fix: frontend metrics to be renamed from nv_llm_http_service_* => dynamo_frontend_* (#2176)

Co-authored-by: Keiven Chang <[email protected]>
keivenchang and keivenchang authored Aug 1, 2025
commit 8c75ed799170e2c8d7f3df42eab6af7ecbd4b5eb
2 changes: 1 addition & 1 deletion components/metrics/Cargo.toml
@@ -38,4 +38,4 @@ tracing = { workspace = true }
# TODO: Update axum to 0.8
axum = { version = "0.6" }
clap = { version = "4.5", features = ["derive", "env"] }
reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls"] }
reqwest = { version = "0.12.22", default-features = false, features = ["json", "rustls-tls"] }
44 changes: 27 additions & 17 deletions components/metrics/README.md
@@ -1,13 +1,23 @@
# Metrics

The `metrics` component is a utility that can collect, aggregate, and publish
metrics from a Dynamo deployment. After collecting and aggregating metrics from
workers, it exposes them via an HTTP `/metrics` endpoint in Prometheus format
that other applications or visualization tools like Prometheus server and Grafana can
pull from.

**Note**: This is a demo implementation. The metrics component is currently under active development and this documentation will change as the implementation evolves.
- In this demo the metrics names use the prefix "llm", but in production they will be prefixed with "nv_llm" (e.g., the HTTP `/metrics` endpoint will serve metrics with "nv_llm" prefixes)
⚠️ **DEPRECATION NOTICE** ⚠️

**This `metrics` component is unmaintained and being deprecated.**

The deprecated `metrics` component is being replaced by the **`MetricsRegistry`** built-in functionality that is now available directly in the `DistributedRuntime` framework. The `MetricsRegistry` provides:

**For new projects and existing deployments, please migrate to using `MetricsRegistry` instead of this component.**

This component may be migrated to the MetricsRegistry in the future.

**📖 See the [Dynamo MetricsRegistry Guide](../../docs/guides/metrics.md) for detailed information on using the new metrics system.**

---

The deprecated `metrics` component is a utility for collecting, aggregating, and publishing metrics from a Dynamo deployment, but it is unmaintained and being deprecated in favor of `MetricsRegistry`.

**Note**: This is a demo implementation. The deprecated `metrics` component is no longer under active development.
- In this demo the metrics names use the prefix "llm", but in production they will be prefixed with "dynamo" (e.g., the HTTP `/metrics` endpoint will serve metrics with "dynamo" prefixes)
- This demo will only work when using examples/llm/configs/agg.yml-- other configurations will not work

<div align="center">
@@ -16,7 +26,7 @@ pull from.

## Quickstart

To start the `metrics` component, simply point it at the `namespace/component/endpoint`
To start the deprecated `metrics` component, simply point it at the `namespace/component/endpoint`
trio for the Dynamo workers that you're interested in monitoring metrics on.

This will:
@@ -45,14 +55,14 @@ will get automatically discovered and the warnings will stop.

## Workers

The `metrics` component needs running workers to gather metrics from,
The deprecated `metrics` component needs running workers to gather metrics from,
so below are some examples of workers and how they can be monitored.

### Mock Worker

To try out how `metrics` works, there is a demo Rust-based
To try out how the deprecated `metrics` component works, there is a demo Rust-based
[mock worker](src/bin/mock_worker.rs) that provides sample data through two mechanisms:
1. Exposes a stats handler at `dynamo/MyComponent/my_endpoint` that responds to polling requests (from `metrics`) with randomly generated `ForwardPassMetrics` data
1. Exposes a stats handler at `dynamo/MyComponent/my_endpoint` that responds to polling requests (from the deprecated `metrics` component) with randomly generated `ForwardPassMetrics` data
2. Publishes mock `KVHitRateEvent` data every second to demonstrate event-based metrics

Step 1: Launch a mock workers via the following command (if already built):
@@ -99,11 +109,11 @@ docker compose -f deploy/docker-compose.yml --profile metrics up -d

## Metrics Collection Modes

The metrics component supports two modes for exposing metrics in a Prometheus format:
The deprecated `metrics` component supports two modes for exposing metrics in a Prometheus format:

### Pull Mode (Default)

When running in pull mode (the default), the metrics component will expose a
When running in pull mode (the default), the deprecated `metrics` component will expose a
Prometheus metrics endpoint on the specified host and port that a
Prometheus server or curl client can pull from:

@@ -136,7 +146,7 @@ curl localhost:9091/metrics
### Push Mode

For ephemeral or batch jobs, or when metrics need to be pushed through a firewall,
you can use Push mode. In this mode, the metrics component will periodically push
you can use Push mode. In this mode, the deprecated `metrics` component will periodically push
metrics to an externally hosted
[Prometheus PushGateway](https://prometheus.io/docs/instrumenting/pushing/):

@@ -145,7 +155,7 @@ Start a prometheus push gateway service via docker:
docker run --rm -d -p 9091:9091 --name pushgateway prom/pushgateway
```

Start the metrics component in `--push` mode, specifying the host and port of your PushGateway:
Start the deprecated `metrics` component in `--push` mode, specifying the host and port of your PushGateway:
```bash
# Push metrics to a Prometheus PushGateway every --push-interval seconds
metrics \
@@ -173,7 +183,7 @@ curl 127.0.0.1:9091/metrics
```
## Building/Running from Source

For easy iteration while making edits to the metrics component, you can use `cargo run`
For easy iteration while making edits to the deprecated `metrics` component, you can use `cargo run`
to build and run with your local changes:

```bash
17 changes: 9 additions & 8 deletions components/planner/src/dynamo/planner/utils/prometheus.py
@@ -35,15 +35,16 @@ def _get_average_metric(
increase(metric_sum[interval])/increase(metric_count[interval])

Args:
metric_name: Base metric name (e.g., 'nv_llm_http_service_inter_token_latency_seconds')
metric_name: Base metric name (e.g., 'inter_token_latency_seconds')
interval: Time interval for the query (e.g., '60s')
operation_name: Human-readable name for error logging

Returns:
Average metric value or 0 if no data/error
"""
try:
query = f"increase({metric_name}_sum[{interval}])/increase({metric_name}_count[{interval}])"
full_metric_name = f"dynamo_frontend_{metric_name}"
query = f"increase({full_metric_name}_sum[{interval}])/increase({full_metric_name}_count[{interval}])"
result = self.prom.custom_query(query=query)
if not result:
# No data available yet (no requests made) - return 0 silently
@@ -55,21 +56,21 @@ def _get_average_metric(

def get_avg_inter_token_latency(self, interval: str):
return self._get_average_metric(
"nv_llm_http_service_inter_token_latency_seconds",
"inter_token_latency_seconds",
interval,
"avg inter token latency",
)

def get_avg_time_to_first_token(self, interval: str):
return self._get_average_metric(
"nv_llm_http_service_time_to_first_token_seconds",
"time_to_first_token_seconds",
interval,
"avg time to first token",
)

def get_avg_request_duration(self, interval: str):
return self._get_average_metric(
"nv_llm_http_service_request_duration_seconds",
"request_duration_seconds",
interval,
"avg request duration",
)
@@ -78,7 +79,7 @@ def get_avg_request_count(self, interval: str):
# This function follows a different query pattern than the other metrics
try:
raw_res = self.prom.custom_query(
query=f"increase(nv_llm_http_service_requests_total[{interval}])"
query=f"increase(dynamo_frontend_requests_total[{interval}])"
)
total_count = 0.0
for res in raw_res:
@@ -91,14 +92,14 @@ def get_avg_request_count(self, interval: str):

def get_avg_input_sequence_tokens(self, interval: str):
return self._get_average_metric(
"nv_llm_http_service_input_sequence_tokens",
"input_sequence_tokens",
interval,
"avg input sequence tokens",
)

def get_avg_output_sequence_tokens(self, interval: str):
return self._get_average_metric(
"nv_llm_http_service_output_sequence_tokens",
"output_sequence_tokens",
interval,
"avg output sequence tokens",
)
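The planner change above prepends a `dynamo_frontend_` prefix to a base metric name and then averages a histogram by dividing the increase of its `_sum` series by the increase of its `_count` series. The following is a minimal standalone sketch of that string construction only. `build_average_query` and `FRONTEND_PREFIX` are hypothetical names chosen for illustration and are not part of the planner code; no Prometheus server is contacted.

```python
FRONTEND_PREFIX = "dynamo_frontend_"  # prefix introduced by this change


def build_average_query(metric_name: str, interval: str) -> str:
    """Return a PromQL expression for the mean of a histogram over `interval`.

    Hypothetical helper: it mirrors the f-string in the diff above, nothing more.
    """
    full_metric_name = f"{FRONTEND_PREFIX}{metric_name}"
    return (
        f"increase({full_metric_name}_sum[{interval}])"
        f"/increase({full_metric_name}_count[{interval}])"
    )


query = build_average_query("inter_token_latency_seconds", "60s")
print(query)
```

Keeping the prefix in one place means callers pass only the base name, which is the same design choice the diff makes inside `_get_average_metric`.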
17 changes: 10 additions & 7 deletions deploy/metrics/README.md
@@ -60,7 +60,7 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container

- Start the [components/metrics](../../components/metrics/README.md) application to begin monitoring for metric events from dynamo workers and aggregating them on a Prometheus metrics endpoint: `http://localhost:9091/metrics`.
- Uncomment the appropriate lines in prometheus.yml to poll port 9091.
- Start worker(s) that publishes KV Cache metrics: [examples/rust/service_metrics/bin/server](../../lib/runtime/examples/service_metrics/README.md)` can populate dummy KV Cache metrics.
- Start worker(s) that publishes KV Cache metrics: [lib/runtime/examples/service_metrics/README.md](../../lib/runtime/examples/service_metrics/README.md) can populate dummy KV Cache metrics.


## Configuration
@@ -95,16 +95,19 @@ The following configuration files should be present in this directory:
- [grafana_dashboards/grafana-dcgm-metrics.json](./grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
- [grafana_dashboards/grafana-llm-metrics.json](./grafana_dashboards/grafana-llm-metrics.json): This file, which is being phased out, contains the Grafana dashboard configuration for LLM-specific metrics. It requires an additional `metrics` component to operate concurrently. A new version is under development.

## Running the example `metrics` component
## Running the deprecated `metrics` component

IMPORTANT: This section is being phased out, and some metrics may not function as expected. A new solution is under development.
⚠️ **DEPRECATION NOTICE** ⚠️

When you run the example [components/metrics](../../components/metrics/README.md) component, it exposes a Prometheus /metrics endpoint with the followings (defined in [../../components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):
- `llm_requests_active_slots`: Number of currently active request slots per worker
When you run the example [components/metrics](../../components/metrics/README.md) component, it exposes a Prometheus /metrics endpoint with the following metrics (defined in [components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):

**⚠️ The following `llm_kv_*` metrics are deprecated:**

- `llm_requests_active_slots`: Active request slots per worker
- `llm_requests_total_slots`: Total available request slots per worker
- `llm_kv_blocks_active`: Number of active KV blocks per worker
- `llm_kv_blocks_active`: Active KV blocks per worker
- `llm_kv_blocks_total`: Total KV blocks available per worker
- `llm_kv_hit_rate_percent`: Cumulative KV Cache hit percent per worker
- `llm_kv_hit_rate_percent`: KV Cache hit percent per worker
- `llm_load_avg`: Average load across workers
- `llm_load_std`: Load standard deviation across workers
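As an illustration of how the gauges listed above could be consumed, here is a rough sketch that parses a Prometheus text-format scrape and derives a KV block utilization ratio from `llm_kv_blocks_active` and `llm_kv_blocks_total`. The sample text, the `worker_id` label name, and the numbers are all made up for the example; the real label set exposed by the component is not shown in this diff. The parser is deliberately simplistic and assumes lines without trailing timestamps.

```python
def parse_samples(text: str) -> dict[str, float]:
    """Map each 'name{labels}' series to its value, skipping comments and blanks."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumes "series value" with no timestamp after the value.
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


# Hypothetical scrape output; the label name and the numbers are invented.
SAMPLE = """\
# TYPE llm_kv_blocks_active gauge
llm_kv_blocks_active{worker_id="0"} 40
# TYPE llm_kv_blocks_total gauge
llm_kv_blocks_total{worker_id="0"} 160
"""

samples = parse_samples(SAMPLE)
utilization = (
    samples['llm_kv_blocks_active{worker_id="0"}']
    / samples['llm_kv_blocks_total{worker_id="0"}']
)
print(f"KV block utilization: {utilization:.2%}")
```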

22 changes: 11 additions & 11 deletions deploy/metrics/grafana_dashboards/grafana-dynamo-dashboard.json
@@ -27,7 +27,7 @@
"type": "prometheus",
"uid": "P1809F7CD0C75ACF3"
},
"description": "nv_llm_http_service_requests_total (1m)",
"description": "dynamo_frontend_requests_total (1m)",
"fieldConfig": {
"defaults": {
"color": {
@@ -106,7 +106,7 @@
"targets": [
{
"editorMode": "code",
"expr": "rate(nv_llm_http_service_requests_total[30s])",
"expr": "rate(dynamo_frontend_requests_total[30s])",
"legendFormat": "{{request_type}}, {{status}},",
"range": true,
"refId": "A"
@@ -120,7 +120,7 @@
"type": "prometheus",
"uid": "P1809F7CD0C75ACF3"
},
"description": "nv_llm_http_service_time_to_first_token_seconds (sum/count)",
"description": "dynamo_frontend_time_to_first_token_seconds (sum/count)",
"fieldConfig": {
"defaults": {
"color": {
@@ -199,7 +199,7 @@
"targets": [
{
"editorMode": "code",
"expr": "1000*(nv_llm_http_service_time_to_first_token_seconds_sum/nv_llm_http_service_time_to_first_token_seconds_count)",
"expr": "1000*(dynamo_frontend_time_to_first_token_seconds_sum/dynamo_frontend_time_to_first_token_seconds_count)",
"legendFormat": "{{model}}",
"range": true,
"refId": "A"
@@ -213,7 +213,7 @@
"type": "prometheus",
"uid": "P1809F7CD0C75ACF3"
},
"description": "nv_llm_http_service_inter_token_latency_seconds (sum/count)",
"description": "dynamo_frontend_inter_token_latency_seconds (sum/count)",
"fieldConfig": {
"defaults": {
"color": {
@@ -292,7 +292,7 @@
"targets": [
{
"editorMode": "code",
"expr": "1000*(nv_llm_http_service_inter_token_latency_seconds_sum/nv_llm_http_service_inter_token_latency_seconds_count)",
"expr": "1000*(dynamo_frontend_inter_token_latency_seconds_sum/dynamo_frontend_inter_token_latency_seconds_count)",
"legendFormat": "{{model}}",
"range": true,
"refId": "A"
@@ -306,7 +306,7 @@
"type": "prometheus",
"uid": "P1809F7CD0C75ACF3"
},
"description": "nv_llm_http_service_request_duration (sum/count)",
"description": "dynamo_frontend_request_duration (sum/count)",
"fieldConfig": {
"defaults": {
"color": {
@@ -385,7 +385,7 @@
"targets": [
{
"editorMode": "code",
"expr": "1000*(nv_llm_http_service_request_duration_seconds_sum / nv_llm_http_service_request_duration_seconds_count)",
"expr": "1000*(dynamo_frontend_request_duration_seconds_sum / dynamo_frontend_request_duration_seconds_count)",
"legendFormat": "{{model}}",
"range": true,
"refId": "A"
@@ -399,7 +399,7 @@
"type": "prometheus",
"uid": "P1809F7CD0C75ACF3"
},
"description": "The length is the number of tokens. nv_llm_http_service_input_sequence_tokens",
"description": "The length is the number of tokens. dynamo_frontend_input_sequence_tokens",
"fieldConfig": {
"defaults": {
"color": {
@@ -478,7 +478,7 @@
"targets": [
{
"editorMode": "code",
"expr": "nv_llm_http_service_input_sequence_tokens_sum / nv_llm_http_service_input_sequence_tokens_count",
"expr": "dynamo_frontend_input_sequence_tokens_sum / dynamo_frontend_input_sequence_tokens_count",
"legendFormat": "ISL",
"range": true,
"refId": "A"
@@ -489,7 +489,7 @@
"uid": "P1809F7CD0C75ACF3"
},
"editorMode": "code",
"expr": "nv_llm_http_service_output_sequence_tokens_sum / nv_llm_http_service_output_sequence_tokens_count",
"expr": "dynamo_frontend_output_sequence_tokens_sum / dynamo_frontend_output_sequence_tokens_count",
"hide": false,
"instant": false,
"legendFormat": "OSL",
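Custom dashboards that still reference the old names would need the same substitution this diff applies by hand. A minimal sketch of one way to do it, assuming the dashboard is plain JSON and that every occurrence of `nv_llm_http_service_` should become `dynamo_frontend_`; `rename_metrics` is a hypothetical helper, and the small `panel` document is a trimmed stand-in, not the real dashboard file.

```python
import json

OLD_PREFIX = "nv_llm_http_service_"
NEW_PREFIX = "dynamo_frontend_"


def rename_metrics(node):
    """Recursively replace the old prefix in every string value of a parsed JSON tree."""
    if isinstance(node, str):
        return node.replace(OLD_PREFIX, NEW_PREFIX)
    if isinstance(node, list):
        return [rename_metrics(item) for item in node]
    if isinstance(node, dict):
        return {key: rename_metrics(value) for key, value in node.items()}
    return node  # numbers, booleans, and None pass through unchanged


# Trimmed, illustrative panel; field names follow the diff above.
panel = json.loads(
    '{"description": "nv_llm_http_service_requests_total (1m)",'
    ' "targets": [{"expr": "rate(nv_llm_http_service_requests_total[30s])",'
    ' "range": true}]}'
)
migrated = rename_metrics(panel)
print(json.dumps(migrated, indent=2))
```

Walking the parsed tree rather than running a text replace on the raw file keeps the output valid JSON and leaves keys untouched, at the cost of reformatting the file on write.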
8 changes: 7 additions & 1 deletion deploy/metrics/grafana_dashboards/grafana-llm-metrics.json
@@ -26,7 +26,13 @@
"distributed under the License is distributed on an \"AS IS\" BASIS,",
"WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.",
"See the License for the specific language governing permissions and",
"limitations under the License."
"limitations under the License.",
"",
"DEPRECATION NOTICE:",
"This dashboard uses deprecated llm_kv_* metrics (llm_kv_blocks_active, llm_kv_blocks_total, llm_kv_hit_rate_percent)",
"that are part of the deprecated metrics aggregation service. These metrics will be removed in a future release.",
"Please migrate to the new MetricsRegistry system which provides dynamo_* metrics instead.",
"See docs/guides/metrics.md for migration guidance."
],
"editable": true,
"fiscalYearStartMonth": 0,
2 changes: 2 additions & 0 deletions deploy/metrics/prometheus.yml
@@ -47,6 +47,8 @@ scrape_configs:
static_configs:
- targets: ['host.docker.internal:8081']

# DEPRECATED: This metrics aggregation service is being deprecated in favor of MetricsRegistry
# The new system uses the 'dynamo-backend' job above instead of this separate service
# This is another demo aggregator that needs to be launched manually. See components/metrics/README.md
# Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 9091/tcp
- job_name: 'metrics-aggregation-service'