fix: frontend metrics to be renamed from nv_llm_http_service_* => dynamo_frontend_* (#2176)

Co-authored-by: Keiven Chang <[email protected]>
ai-dynamo · ryanolson · Aug 2, 2025 · Jul 21, 2025 · Jul 22, 2025 · Jul 22, 2025
commit 8c75ed799170e2c8d7f3df42eab6af7ecbd4b5eb
diff --git a/components/metrics/Cargo.toml b/components/metrics/Cargo.toml
@@ -38,4 +38,4 @@ tracing = { workspace = true }
 # TODO: Update axum to 0.8
 axum = { version = "0.6" }
 clap = { version = "4.5", features = ["derive", "env"] }
-reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls"] }
+reqwest = { version = "0.12.22", default-features = false, features = ["json", "rustls-tls"] }
diff --git a/components/metrics/README.md b/components/metrics/README.md
@@ -1,13 +1,23 @@
 # Metrics
 
-The `metrics` component is a utility that can collect, aggregate, and publish
-metrics from a Dynamo deployment. After collecting and aggregating metrics from
-workers, it exposes them via an HTTP `/metrics` endpoint in Prometheus format
-that other applications or visualization tools like Prometheus server and Grafana can
-pull from.
-
-**Note**: This is a demo implementation. The metrics component is currently under active development and this documentation will change as the implementation evolves.
-- In this demo the metrics names use the prefix "llm", but in production they will be prefixed with "nv_llm" (e.g., the HTTP `/metrics` endpoint will serve metrics with "nv_llm" prefixes)
+⚠️ **DEPRECATION NOTICE** ⚠️
+
+**This `metrics` component is unmaintained and being deprecated.**
+
+The deprecated `metrics` component is being replaced by the **`MetricsRegistry`** built-in functionality that is now available directly in the `DistributedRuntime` framework. The `MetricsRegistry` provides:
+
+**For new projects and existing deployments, please migrate to using `MetricsRegistry` instead of this component.**
+
+This component may be migrated to the MetricsRegistry in the future.
+
+**📖 See the [Dynamo MetricsRegistry Guide](../../docs/guides/metrics.md) for detailed information on using the new metrics system.**
+
+---
+
+The deprecated `metrics` component is a utility for collecting, aggregating, and publishing metrics from a Dynamo deployment, but it is unmaintained and being deprecated in favor of `MetricsRegistry`.
+
+**Note**: This is a demo implementation. The deprecated `metrics` component is no longer under active development.
+- In this demo the metrics names use the prefix "llm", but in production they will be prefixed with "dynamo" (e.g., the HTTP `/metrics` endpoint will serve metrics with "dynamo" prefixes)
 - This demo will only work when using examples/llm/configs/agg.yml-- other configurations will not work
 
 <div align="center">
@@ -16,7 +26,7 @@ pull from.
 
 ## Quickstart
 
-To start the `metrics` component, simply point it at the `namespace/component/endpoint`
+To start the deprecated `metrics` component, simply point it at the `namespace/component/endpoint`
 trio for the Dynamo workers that you're interested in monitoring metrics on.
 
 This will:
@@ -45,14 +55,14 @@ will get automatically discovered and the warnings will stop.
 
 ## Workers
 
-The `metrics` component needs running workers to gather metrics from,
+The deprecated `metrics` component needs running workers to gather metrics from,
 so below are some examples of workers and how they can be monitored.
 
 ### Mock Worker
 
-To try out how `metrics` works, there is a demo Rust-based
+To try out how the deprecated `metrics` component works, there is a demo Rust-based
 [mock worker](src/bin/mock_worker.rs) that provides sample data through two mechanisms:
-1. Exposes a stats handler at `dynamo/MyComponent/my_endpoint` that responds to polling requests (from `metrics`) with randomly generated `ForwardPassMetrics` data
+1. Exposes a stats handler at `dynamo/MyComponent/my_endpoint` that responds to polling requests (from the deprecated `metrics` component) with randomly generated `ForwardPassMetrics` data
 2. Publishes mock `KVHitRateEvent` data every second to demonstrate event-based metrics
 
 Step 1: Launch a mock workers via the following command (if already built):
@@ -99,11 +109,11 @@ docker compose -f deploy/docker-compose.yml --profile metrics up -d
 
 ## Metrics Collection Modes
 
-The metrics component supports two modes for exposing metrics in a Prometheus format:
+The deprecated `metrics` component supports two modes for exposing metrics in a Prometheus format:
 
 ### Pull Mode (Default)
 
-When running in pull mode (the default), the metrics component will expose a
+When running in pull mode (the default), the deprecated `metrics` component will expose a
 Prometheus metrics endpoint on the specified host and port that a
 Prometheus server or curl client can pull from:
 
@@ -136,7 +146,7 @@ curl localhost:9091/metrics
 ### Push Mode
 
 For ephemeral or batch jobs, or when metrics need to be pushed through a firewall,
-you can use Push mode. In this mode, the metrics component will periodically push
+you can use Push mode. In this mode, the deprecated `metrics` component will periodically push
 metrics to an externally hosted
 [Prometheus PushGateway](https://prometheus.io/docs/instrumenting/pushing/):
 
@@ -145,7 +155,7 @@ Start a prometheus push gateway service via docker:
 docker run --rm -d -p 9091:9091 --name pushgateway prom/pushgateway
 ```
 
-Start the metrics component in `--push` mode, specifying the host and port of your PushGateway:
+Start the deprecated `metrics` component in `--push` mode, specifying the host and port of your PushGateway:
 ```bash
 # Push metrics to a Prometheus PushGateway every --push-interval seconds
 metrics \
@@ -173,7 +183,7 @@ curl 127.0.0.1:9091/metrics
 ```
 ## Building/Running from Source
 
-For easy iteration while making edits to the metrics component, you can use `cargo run`
+For easy iteration while making edits to the deprecated `metrics` component, you can use `cargo run`
 to build and run with your local changes:
 
 ```bash

diff --git a/components/planner/src/dynamo/planner/utils/prometheus.py b/components/planner/src/dynamo/planner/utils/prometheus.py
@@ -35,15 +35,16 @@ def _get_average_metric(
         increase(metric_sum[interval])/increase(metric_count[interval])
 
         Args:
-            metric_name: Base metric name (e.g., 'nv_llm_http_service_inter_token_latency_seconds')
+            metric_name: Base metric name (e.g., 'inter_token_latency_seconds')
             interval: Time interval for the query (e.g., '60s')
             operation_name: Human-readable name for error logging
 
         Returns:
             Average metric value or 0 if no data/error
         """
         try:
-            query = f"increase({metric_name}_sum[{interval}])/increase({metric_name}_count[{interval}])"
+            full_metric_name = f"dynamo_frontend_{metric_name}"
+            query = f"increase({full_metric_name}_sum[{interval}])/increase({full_metric_name}_count[{interval}])"
             result = self.prom.custom_query(query=query)
             if not result:
                 # No data available yet (no requests made) - return 0 silently
@@ -55,21 +56,21 @@ def _get_average_metric(
 
     def get_avg_inter_token_latency(self, interval: str):
         return self._get_average_metric(
-            "nv_llm_http_service_inter_token_latency_seconds",
+            "inter_token_latency_seconds",
             interval,
             "avg inter token latency",
         )
 
     def get_avg_time_to_first_token(self, interval: str):
         return self._get_average_metric(
-            "nv_llm_http_service_time_to_first_token_seconds",
+            "time_to_first_token_seconds",
             interval,
             "avg time to first token",
         )
 
     def get_avg_request_duration(self, interval: str):
         return self._get_average_metric(
-            "nv_llm_http_service_request_duration_seconds",
+            "request_duration_seconds",
             interval,
             "avg request duration",
         )
@@ -78,7 +79,7 @@ def get_avg_request_count(self, interval: str):
         # This function follows a different query pattern than the other metrics
         try:
             raw_res = self.prom.custom_query(
-                query=f"increase(nv_llm_http_service_requests_total[{interval}])"
+                query=f"increase(dynamo_frontend_requests_total[{interval}])"
             )
             total_count = 0.0
             for res in raw_res:
@@ -91,14 +92,14 @@ def get_avg_request_count(self, interval: str):
 
     def get_avg_input_sequence_tokens(self, interval: str):
         return self._get_average_metric(
-            "nv_llm_http_service_input_sequence_tokens",
+            "input_sequence_tokens",
             interval,
             "avg input sequence tokens",
         )
 
     def get_avg_output_sequence_tokens(self, interval: str):
         return self._get_average_metric(
-            "nv_llm_http_service_output_sequence_tokens",
+            "output_sequence_tokens",
             interval,
             "avg output sequence tokens",
         )
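
The query that `_get_average_metric` builds after this change can be sketched on its own, without a Prometheus connection. This is a minimal illustration, not the planner's code: the function name `build_average_query` and the `prefix` parameter are hypothetical. Only the `dynamo_frontend_` prefix and the `increase(..._sum)/increase(..._count)` shape are taken from the hunk above.

```python
def build_average_query(
    metric_name: str,
    interval: str,
    prefix: str = "dynamo_frontend_",  # prefix introduced by this commit
) -> str:
    """Return a PromQL expression for the average of a histogram metric.

    `metric_name` is the base name without the prefix, for example
    'inter_token_latency_seconds'. `interval` is a PromQL range such as '60s'.
    """
    full_metric_name = f"{prefix}{metric_name}"
    return (
        f"increase({full_metric_name}_sum[{interval}])"
        f"/increase({full_metric_name}_count[{interval}])"
    )


if __name__ == "__main__":
    # Callers now pass only the base metric name.
    print(build_average_query("inter_token_latency_seconds", "60s"))
```

Keeping the prefix in one place means the call sites pass only base names, so a future rename would touch a single line rather than every getter.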
diff --git a/deploy/metrics/README.md b/deploy/metrics/README.md
@@ -60,7 +60,7 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container
 
    - Start the [components/metrics](../../components/metrics/README.md) application to begin monitoring for metric events from dynamo workers and aggregating them on a Prometheus metrics endpoint: `http://localhost:9091/metrics`.
    - Uncomment the appropriate lines in prometheus.yml to poll port 9091.
-   - Start worker(s) that publishes KV Cache metrics: [examples/rust/service_metrics/bin/server](../../lib/runtime/examples/service_metrics/README.md)` can populate dummy KV Cache metrics.
+   - Start worker(s) that publishes KV Cache metrics: [lib/runtime/examples/service_metrics/README.md](../../lib/runtime/examples/service_metrics/README.md) can populate dummy KV Cache metrics.
 
 
 ## Configuration
@@ -95,16 +95,19 @@ The following configuration files should be present in this directory:
 - [grafana_dashboards/grafana-dcgm-metrics.json](./grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
 - [grafana_dashboards/grafana-llm-metrics.json](./grafana_dashboards/grafana-llm-metrics.json): This file, which is being phased out, contains the Grafana dashboard configuration for LLM-specific metrics. It requires an additional `metrics` component to operate concurrently. A new version is under development.
 
-## Running the example `metrics` component
+## Running the deprecated `metrics` component
 
-IMPORTANT: This section is being phased out, and some metrics may not function as expected. A new solution is under development.
+⚠️ **DEPRECATION NOTICE** ⚠️
 
-When you run the example [components/metrics](../../components/metrics/README.md) component, it exposes a Prometheus /metrics endpoint with the followings (defined in [../../components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):
-- `llm_requests_active_slots`: Number of currently active request slots per worker
+When you run the example [components/metrics](../../components/metrics/README.md) component, it exposes a Prometheus /metrics endpoint with the following metrics (defined in [components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):
+
+**⚠️ The following `llm_kv_*` metrics are deprecated:**
+
+- `llm_requests_active_slots`: Active request slots per worker
 - `llm_requests_total_slots`: Total available request slots per worker
-- `llm_kv_blocks_active`: Number of active KV blocks per worker
+- `llm_kv_blocks_active`: Active KV blocks per worker
 - `llm_kv_blocks_total`: Total KV blocks available per worker
-- `llm_kv_hit_rate_percent`: Cumulative KV Cache hit percent per worker
+- `llm_kv_hit_rate_percent`: KV Cache hit percent per worker
 - `llm_load_avg`: Average load across workers
 - `llm_load_std`: Load standard deviation across workers
 

diff --git a/deploy/metrics/grafana_dashboards/grafana-dynamo-dashboard.json b/deploy/metrics/grafana_dashboards/grafana-dynamo-dashboard.json
@@ -27,7 +27,7 @@
         "type": "prometheus",
         "uid": "P1809F7CD0C75ACF3"
       },
-      "description": "nv_llm_http_service_requests_total (1m)",
+      "description": "dynamo_frontend_requests_total (1m)",
       "fieldConfig": {
         "defaults": {
           "color": {
@@ -106,7 +106,7 @@
       "targets": [
         {
           "editorMode": "code",
-          "expr": "rate(nv_llm_http_service_requests_total[30s])",
+          "expr": "rate(dynamo_frontend_requests_total[30s])",
           "legendFormat": "{{request_type}}, {{status}},",
           "range": true,
           "refId": "A"
@@ -120,7 +120,7 @@
         "type": "prometheus",
         "uid": "P1809F7CD0C75ACF3"
       },
-      "description": "nv_llm_http_service_time_to_first_token_seconds (sum/count)",
+      "description": "dynamo_frontend_time_to_first_token_seconds (sum/count)",
       "fieldConfig": {
         "defaults": {
           "color": {
@@ -199,7 +199,7 @@
       "targets": [
         {
           "editorMode": "code",
-          "expr": "1000*(nv_llm_http_service_time_to_first_token_seconds_sum/nv_llm_http_service_time_to_first_token_seconds_count)",
+          "expr": "1000*(dynamo_frontend_time_to_first_token_seconds_sum/dynamo_frontend_time_to_first_token_seconds_count)",
           "legendFormat": "{{model}}",
           "range": true,
           "refId": "A"
@@ -213,7 +213,7 @@
         "type": "prometheus",
         "uid": "P1809F7CD0C75ACF3"
       },
-      "description": "nv_llm_http_service_inter_token_latency_seconds (sum/count)",
+      "description": "dynamo_frontend_inter_token_latency_seconds (sum/count)",
       "fieldConfig": {
         "defaults": {
           "color": {
@@ -292,7 +292,7 @@
       "targets": [
         {
           "editorMode": "code",
-          "expr": "1000*(nv_llm_http_service_inter_token_latency_seconds_sum/nv_llm_http_service_inter_token_latency_seconds_count)",
+          "expr": "1000*(dynamo_frontend_inter_token_latency_seconds_sum/dynamo_frontend_inter_token_latency_seconds_count)",
           "legendFormat": "{{model}}",
           "range": true,
           "refId": "A"
@@ -306,7 +306,7 @@
         "type": "prometheus",
         "uid": "P1809F7CD0C75ACF3"
       },
-      "description": "nv_llm_http_service_request_duration (sum/count)",
+      "description": "dynamo_frontend_request_duration (sum/count)",
       "fieldConfig": {
         "defaults": {
           "color": {
@@ -385,7 +385,7 @@
       "targets": [
         {
           "editorMode": "code",
-          "expr": "1000*(nv_llm_http_service_request_duration_seconds_sum / nv_llm_http_service_request_duration_seconds_count)",
+          "expr": "1000*(dynamo_frontend_request_duration_seconds_sum / dynamo_frontend_request_duration_seconds_count)",
           "legendFormat": "{{model}}",
           "range": true,
           "refId": "A"
@@ -399,7 +399,7 @@
         "type": "prometheus",
         "uid": "P1809F7CD0C75ACF3"
       },
-      "description": "The length is the number of tokens. nv_llm_http_service_input_sequence_tokens",
+      "description": "The length is the number of tokens. dynamo_frontend_input_sequence_tokens",
       "fieldConfig": {
         "defaults": {
           "color": {
@@ -478,7 +478,7 @@
       "targets": [
         {
           "editorMode": "code",
-          "expr": "nv_llm_http_service_input_sequence_tokens_sum / nv_llm_http_service_input_sequence_tokens_count",
+          "expr": "dynamo_frontend_input_sequence_tokens_sum / dynamo_frontend_input_sequence_tokens_count",
           "legendFormat": "ISL",
           "range": true,
           "refId": "A"
@@ -489,7 +489,7 @@
             "uid": "P1809F7CD0C75ACF3"
           },
           "editorMode": "code",
-          "expr": "nv_llm_http_service_output_sequence_tokens_sum / nv_llm_http_service_output_sequence_tokens_count",
+          "expr": "dynamo_frontend_output_sequence_tokens_sum / dynamo_frontend_output_sequence_tokens_count",
           "hide": false,
           "instant": false,
           "legendFormat": "OSL",

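Dashboards or alert rules kept outside this repository may still use the old names. The rename applied to the expressions above is mechanical, so it could be scripted along the following lines. This is a sketch under assumptions: the helper `rename_frontend_metrics` is hypothetical and not part of the commit, and it only rewrites the `nv_llm_http_service_` prefix where it starts an identifier.

```python
import re

OLD_PREFIX = "nv_llm_http_service_"
NEW_PREFIX = "dynamo_frontend_"

# Match the old prefix only when it is not preceded by another
# metric-name character, so longer unrelated names are left alone.
_OLD_PREFIX_RE = re.compile(r"(?<![A-Za-z0-9_:])" + re.escape(OLD_PREFIX))


def rename_frontend_metrics(expr: str) -> str:
    """Rewrite old frontend metric names in a PromQL expression."""
    return _OLD_PREFIX_RE.sub(NEW_PREFIX, expr)


if __name__ == "__main__":
    old_expr = "rate(nv_llm_http_service_requests_total[30s])"
    print(rename_frontend_metrics(old_expr))
```

Review the rewritten expressions before saving them, since label names and panel descriptions may also mention the old prefix.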
diff --git a/deploy/metrics/grafana_dashboards/grafana-llm-metrics.json b/deploy/metrics/grafana_dashboards/grafana-llm-metrics.json
@@ -26,7 +26,13 @@
     "distributed under the License is distributed on an \"AS IS\" BASIS,",
     "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.",
     "See the License for the specific language governing permissions and",
-    "limitations under the License."
+    "limitations under the License.",
+    "",
+    "DEPRECATION NOTICE:",
+    "This dashboard uses deprecated llm_kv_* metrics (llm_kv_blocks_active, llm_kv_blocks_total, llm_kv_hit_rate_percent)",
+    "that are part of the deprecated metrics aggregation service. These metrics will be removed in a future release.",
+    "Please migrate to the new MetricsRegistry system which provides dynamo_* metrics instead.",
+    "See docs/guides/metrics.md for migration guidance."
   ],
   "editable": true,
   "fiscalYearStartMonth": 0,

diff --git a/deploy/metrics/prometheus.yml b/deploy/metrics/prometheus.yml
@@ -47,6 +47,8 @@ scrape_configs:
     static_configs:
       - targets: ['host.docker.internal:8081']
 
+  # DEPRECATED: This metrics aggregation service is being deprecated in favor of MetricsRegistry
+  # The new system uses the 'dynamo-backend' job above instead of this separate service
   # This is another demo aggregator that needs to be launched manually. See components/metrics/README.md
   # Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 9091/tcp
   - job_name: 'metrics-aggregation-service'