
Commit faafa5f

docs: add a docs/guides/metrics.md (#2160)
Co-authored-by: Keiven Chang <[email protected]>
1 parent efd863d

2 files changed: +379 -21 lines changed

deploy/metrics/README.md

Lines changed: 276 additions & 21 deletions

@@ -2,12 +2,17 @@

This directory contains configuration for visualizing metrics from the metrics aggregation service using Prometheus and Grafana.

> [!NOTE]
> For detailed information about Dynamo's metrics system, including hierarchical metrics, automatic labeling, and usage examples, see the [Metrics Guide](../../docs/guides/metrics.md).

## Overview

### Components

- **Prometheus Server**: Collects and stores metrics from Dynamo services and other components.
- **Grafana**: Provides dashboards by querying the Prometheus Server.

### Topology

Default Service Relationship Diagram:

```mermaid

@@ -29,17 +34,63 @@ The dcgm-exporter service in the Docker Compose network is configured to use por

As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build containers with `--framework VLLM` or `--framework TENSORRTLLM`.

### Available Metrics

#### Component Metrics

The core Dynamo backend system automatically exposes metrics with the `dynamo_component_*` prefix for all components that use the `DistributedRuntime` framework:

- `dynamo_component_concurrent_requests`: Requests currently being processed (gauge)
- `dynamo_component_request_bytes_total`: Total bytes received in requests (counter)
- `dynamo_component_request_duration_seconds`: Request processing time (histogram)
- `dynamo_component_requests_total`: Total requests processed (counter)
- `dynamo_component_response_bytes_total`: Total bytes sent in responses (counter)
- `dynamo_component_system_uptime_seconds`: DistributedRuntime uptime (gauge)
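
Once the Prometheus server from this setup is scraping your workers, you can sanity-check these series through the standard Prometheus HTTP query API. The sketch below is illustrative only; it assumes Prometheus is reachable at `http://localhost:9090` (as listed later in this README) and that at least one `DistributedRuntime`-based component is being scraped.

```bash
# Per-second request rate over the last 5 minutes, summed across all series
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(dynamo_component_requests_total[5m]))'

# Current number of in-flight requests
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=dynamo_component_concurrent_requests'
```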

#### Specialized Component Metrics

Some components expose additional metrics specific to their functionality:

- `dynamo_preprocessor_*`: Metrics specific to preprocessor components

#### Frontend Metrics

When using Dynamo HTTP Frontend (`--framework VLLM` or `--framework TENSORRTLLM`), these metrics are automatically exposed with the `dynamo_frontend_*` prefix and include `model` labels containing the model name:

- `dynamo_frontend_inflight_requests`: Inflight requests (gauge)
- `dynamo_frontend_input_sequence_tokens`: Input sequence length (histogram)
- `dynamo_frontend_inter_token_latency_seconds`: Inter-token latency (histogram)
- `dynamo_frontend_output_sequence_tokens`: Output sequence length (histogram)
- `dynamo_frontend_request_duration_seconds`: LLM request duration (histogram)
- `dynamo_frontend_requests_total`: Total LLM requests (counter)
- `dynamo_frontend_time_to_first_token_seconds`: Time to first token (histogram)
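
As an example of putting these histograms to work, here is a hedged PromQL sketch (run against the Prometheus server at `http://localhost:9090` from this setup) that estimates p95 time-to-first-token per model; it assumes the usual `_bucket` suffix Prometheus appends to histogram series.

```bash
# p95 time-to-first-token per model over the last 5 minutes
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.95, sum by (le, model) (rate(dynamo_frontend_time_to_first_token_seconds_bucket[5m])))'
```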

### Required Files

The following configuration files should be present in this directory:
- [docker-compose.yml](./docker-compose.yml): Defines the Prometheus and Grafana services
- [prometheus.yml](./prometheus.yml): Contains Prometheus scraping configuration
- [grafana-datasources.yml](./grafana-datasources.yml): Contains Grafana datasource configuration
- [grafana_dashboards/grafana-dashboard-providers.yml](./grafana_dashboards/grafana-dashboard-providers.yml): Contains Grafana dashboard provider configuration
- [grafana_dashboards/grafana-dynamo-dashboard.json](./grafana_dashboards/grafana-dynamo-dashboard.json): A general Dynamo dashboard covering both software and hardware metrics
- [grafana_dashboards/grafana-dcgm-metrics.json](./grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
- [grafana_dashboards/grafana-llm-metrics.json](./grafana_dashboards/grafana-llm-metrics.json): Grafana dashboard for LLM-specific metrics; it is being phased out and requires the additional `metrics` component to be running. A replacement is under development.

## Getting Started

### Prerequisites

1. Make sure Docker and Docker Compose are installed on your system.

### Quick Start

1. Start the Dynamo dependencies. Assuming you're at the root of the dynamo repository:

```bash
# Start the basic services (etcd & natsd), along with Prometheus and Grafana
docker compose -f deploy/docker-compose.yml --profile metrics up -d

# Minimum components for Dynamo (without Prometheus and Grafana): etcd/nats/dcgm-exporter
docker compose -f deploy/docker-compose.yml up -d
```

@@ -48,24 +99,22 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container

export CUDA_VISIBLE_DEVICES=0,2
```

2. The following web servers are started. Those ending in /metrics serve Prometheus-format metrics:
   - Grafana: `http://localhost:3001` (default login: dynamo/dynamo)
   - Prometheus Server: `http://localhost:9090`
   - NATS Server: `http://localhost:8222` (monitoring endpoints: /varz, /healthz, etc.)
   - NATS Prometheus Exporter: `http://localhost:7777/metrics`
   - etcd Server: `http://localhost:2379/metrics`
   - DCGM Exporter: `http://localhost:9401/metrics`

3. Optionally, if you want to experiment further, see [components/metrics/README.md](../../components/metrics/README.md) for more details on launching a metrics server (subscribes to NATS), a mock_worker (publishes to NATS), and real workers:
   - Start the [components/metrics](../../components/metrics/README.md) application to begin monitoring for metric events from Dynamo workers and aggregating them on a Prometheus metrics endpoint: `http://localhost:9091/metrics`.
   - Uncomment the appropriate lines in prometheus.yml to poll port 9091.
   - Start worker(s) that publish KV Cache metrics: [lib/runtime/examples/service_metrics/README.md](../../lib/runtime/examples/service_metrics/README.md) can populate dummy KV Cache metrics.

### Configuration

#### Prometheus

The Prometheus configuration is specified in [prometheus.yml](./prometheus.yml). This file is set up to collect metrics from the metrics aggregation service endpoint.

@@ -77,29 +126,233 @@ After making changes to prometheus.yml, it is necessary to reload the configurat

docker compose -f deploy/docker-compose.yml up prometheus -d --force-recreate
```

#### Grafana

Grafana is pre-configured with:
- Prometheus datasource
- Sample dashboard for visualizing service metrics

![grafana image](./grafana-dynamo-composite.png)

### Troubleshooting

1. Verify services are running:

```bash
docker compose ps
```

2. Check logs:

```bash
docker compose logs prometheus
docker compose logs grafana
```

3. For issues with the legacy metrics component (being phased out), see [components/metrics/README.md](../../components/metrics/README.md) for details on the exposed metrics and troubleshooting steps.
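
If Prometheus reports scrape targets as down, it can also help to curl the exporter endpoints from the list earlier in this README directly. This is only a sketch using the default ports of this compose setup:

```bash
# Spot-check the scrape endpoints (default ports from this setup)
curl -sf http://localhost:9090/-/healthy             # Prometheus itself
curl -sf http://localhost:7777/metrics | head -n 5   # NATS Prometheus Exporter
curl -sf http://localhost:2379/metrics | head -n 5   # etcd
curl -sf http://localhost:9401/metrics | head -n 5   # DCGM Exporter
```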

## Developer Guide

### Creating Metrics at Different Hierarchy Levels

#### Runtime-Level Metrics

```rust
use dynamo_runtime::DistributedRuntime;

let runtime = DistributedRuntime::new()?;
let namespace = runtime.namespace("my_namespace")?;
let component = namespace.component("my_component")?;
let endpoint = component.endpoint("my_endpoint")?;

// Create endpoint-level counters (this is a Prometheus Counter type)
let total_requests = endpoint.create_counter(
    "total_requests",
    "Total requests across all namespaces",
    &[]
)?;

let active_connections = endpoint.create_gauge(
    "active_connections",
    "Number of active client connections",
    &[]
)?;
```

#### Namespace-Level Metrics

```rust
let namespace = runtime.namespace("my_model")?;

// Namespace-scoped metrics
let model_requests = namespace.create_counter(
    "model_requests",
    "Requests for this specific model",
    &[]
)?;

let model_latency = namespace.create_histogram(
    "model_latency_seconds",
    "Model inference latency",
    &[],
    &[0.001, 0.01, 0.1, 1.0, 10.0]
)?;
```

#### Component-Level Metrics

```rust
let component = namespace.component("backend")?;

// Component-specific metrics
let backend_requests = component.create_counter(
    "backend_requests",
    "Requests handled by this backend component",
    &[]
)?;

let gpu_memory_usage = component.create_gauge(
    "gpu_memory_bytes",
    "GPU memory usage in bytes",
    &[]
)?;
```

#### Endpoint-Level Metrics

```rust
let endpoint = component.endpoint("generate")?;

// Endpoint-specific metrics
let generate_requests = endpoint.create_counter(
    "generate_requests",
    "Generate endpoint requests",
    &[]
)?;

let generate_latency = endpoint.create_histogram(
    "generate_latency_seconds",
    "Generate endpoint latency",
    &[],
    &[0.001, 0.01, 0.1, 1.0, 10.0]
)?;
```

### Creating Vector Metrics with Dynamic Labels

Use vector metrics when you need to track metrics with different label values:

```rust
// Counter with labels
let requests_by_model = endpoint.create_counter_vec(
    "requests_by_model",
    "Requests by model type",
    &["model_type", "model_size"]
)?;

// Increment with specific labels
requests_by_model.with_label_values(&["llama", "7b"]).inc();
requests_by_model.with_label_values(&["gpt", "13b"]).inc();

// Gauge with labels
let memory_by_gpu = component.create_gauge_vec(
    "gpu_memory_bytes",
    "GPU memory usage by device",
    &["gpu_id", "memory_type"]
)?;

memory_by_gpu.with_label_values(&["0", "allocated"]).set(8192.0);
memory_by_gpu.with_label_values(&["0", "cached"]).set(4096.0);
```

### Creating Histograms

Histograms are useful for measuring distributions of values like latency:

```rust
let latency_histogram = endpoint.create_histogram(
    "request_latency_seconds",
    "Request latency distribution",
    &[],
    &[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0]
)?;

// Record latency values
latency_histogram.observe(0.023); // 23ms
latency_histogram.observe(0.156); // 156ms
```

### Transitioning from Plain Prometheus

If you're currently using plain Prometheus metrics, transitioning to Dynamo's `MetricsRegistry` is straightforward:

#### Before (Plain Prometheus)

```rust
use prometheus::{Counter, Opts, Registry};

// Create a registry to hold metrics
let registry = Registry::new();
let counter_opts = Opts::new("my_counter", "My custom counter");
let counter = Counter::with_opts(counter_opts).unwrap();
registry.register(Box::new(counter.clone())).unwrap();

// Use the counter
counter.inc();

// To expose metrics, you'd need to set up an HTTP server manually
// and implement the /metrics endpoint yourself
```

#### After (Dynamo MetricsRegistry)

```rust
let counter = endpoint.create_counter(
    "my_counter",
    "My custom counter",
    &[]
)?;

counter.inc();
```

**Note:** The metric is automatically registered when created via the endpoint's `create_counter` factory method.

**Benefits of Dynamo's approach:**
- **Automatic registration**: Metrics created via endpoint's `create_*` factory methods are automatically registered with the system
- Automatic labeling with namespace, component, and endpoint information
- Consistent metric naming with `dynamo_` prefix
- Built-in HTTP metrics endpoint when enabled with `DYN_SYSTEM_ENABLED=true`
- Hierarchical metric organization
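
A rough sketch of the built-in HTTP metrics endpoint mentioned above: `DYN_SYSTEM_ENABLED` is documented here, but the port variable and value below are assumptions made for illustration; see the [Metrics Guide](../../docs/guides/metrics.md) for the authoritative settings.

```bash
# DYN_SYSTEM_ENABLED is documented above; the port variable/value are assumed here.
export DYN_SYSTEM_ENABLED=true
export DYN_SYSTEM_PORT=8081   # hypothetical port setting for this sketch

# After starting a worker in this environment, its metrics should be scrapable:
curl -s http://localhost:8081/metrics | grep '^dynamo_'
```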

### Advanced Features

#### Custom Buckets for Histograms

```rust
// Define custom buckets for your use case
let custom_buckets = vec![0.001, 0.01, 0.1, 1.0, 10.0];
let latency = endpoint.create_histogram(
    "api_latency_seconds",
    "API latency in seconds",
    &[],
    &custom_buckets
)?;
```

#### Metric Aggregation

```rust
// Aggregate metrics across multiple endpoints
let total_requests = namespace.create_counter(
    "total_requests",
    "Total requests across all endpoints",
    &[]
)?;
```
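
Roll-ups can also be done at query time in Prometheus rather than in code. A hedged sketch: the grouping label below (`dynamo_component`) is an assumption based on the automatic labeling described earlier; check your `/metrics` output for the exact label names.

```bash
# Request rate rolled up per component across all of its endpoints
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum by (dynamo_component) (rate(dynamo_component_requests_total[5m]))'
```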

## Running the deprecated `components/metrics` program

⚠️ **DEPRECATION NOTICE** ⚠️

When you run the example [components/metrics](../../components/metrics/README.md) program, it exposes a Prometheus /metrics endpoint with the following metrics (defined in [components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):

**⚠️ The following `llm_kv_*` metrics are deprecated:**

@@ -123,3 +376,5 @@ When you run the example [components/metrics](../../components/metrics/README.md

docker compose logs prometheus
docker compose logs grafana
```

3. For issues with the legacy metrics component (being phased out), see [components/metrics/README.md](../../components/metrics/README.md) for details on the exposed metrics and troubleshooting steps.
