diff --git a/README.md b/README.md
index b50671cb45c..efe82f3e4f1 100644
--- a/README.md
+++ b/README.md
@@ -122,7 +122,7 @@ python -m dynamo.sglang.worker deepseek-ai/DeepSeek-R1-Distill-Llama-8B
 #### Send a Request
 ```bash
-curl localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
+curl localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
   "messages": [
     {
diff --git a/components/backends/sglang/README.md b/components/backends/sglang/README.md
index b414855b58a..a83b528cb24 100644
--- a/components/backends/sglang/README.md
+++ b/components/backends/sglang/README.md
@@ -56,10 +56,10 @@ Below we provide a guide that lets you run all of our common deployment patterns
 
 ### Start NATS and ETCD in the background
 
-Start using [Docker Compose](../../deploy/metrics/docker-compose.yml)
+Start using [Docker Compose](../../../deploy/docker-compose.yml)
 
 ```bash
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
 ```
 
 ### Build container
diff --git a/components/backends/trtllm/README.md b/components/backends/trtllm/README.md
index 6775553108f..0a25eb4739e 100644
--- a/components/backends/trtllm/README.md
+++ b/components/backends/trtllm/README.md
@@ -64,9 +64,9 @@ Note: TensorRT-LLM disaggregation does not support conditional disaggregation yet
 
 ### Prerequisites
 
-Start required services (etcd and NATS) using [Docker Compose](../../deploy/metrics/docker-compose.yml)
+Start required services (etcd and NATS) using [Docker Compose](../../../deploy/docker-compose.yml)
 ```bash
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
 ```
 
 ### Build docker
diff --git a/components/backends/vllm/README.md b/components/backends/vllm/README.md
index 7adea341c41..525c8311f5d 100644
--- a/components/backends/vllm/README.md
+++ b/components/backends/vllm/README.md
@@ -15,10 +15,10 @@ See [deployment architectures](../llm/README.md#deployment-architectures) to learn
 
 ### Prerequisites
 
-Start required services (etcd and NATS) using [Docker Compose](../../deploy/metrics/docker-compose.yml):
+Start required services (etcd and NATS) using [Docker Compose](../../../deploy/docker-compose.yml):
 
 ```bash
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
 ```
 
 ### Build and Run docker
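A quick way to verify the new frontend port by hand — a minimal sketch assuming the quickstart worker above is running and registered with a local frontend (`jq` is only used for pretty-printing):

```bash
# Smoke-test the OpenAI-compatible frontend on its new port (8080).
curl -s localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 32
      }' | jq .
```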
diff --git a/components/backends/vllm/multi-node.md b/components/backends/vllm/multi-node.md
index 7479340aa41..6cf928104b6 100644
--- a/components/backends/vllm/multi-node.md
+++ b/components/backends/vllm/multi-node.md
@@ -22,7 +22,7 @@ Start the required services on your head node. These endpoints must be accessible
 
 ```bash
 # On head node (node-1)
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
 ```
 
 Default ports:
diff --git a/components/metrics/README.md b/components/metrics/README.md
index 09a758bd819..bf76151d6ca 100644
--- a/components/metrics/README.md
+++ b/components/metrics/README.md
@@ -94,7 +94,7 @@ To visualize the metrics being exposed on the Prometheus endpoint, see the
 Prometheus and Grafana configurations in [deploy/metrics](../../deploy/metrics):
 
 ```bash
-docker compose -f deploy/metrics/docker-compose.yml --profile metrics up -d
+docker compose -f deploy/docker-compose.yml --profile metrics up -d
 ```
 
 ## Metrics Collection Modes
diff --git a/deploy/metrics/docker-compose.yml b/deploy/docker-compose.yml
similarity index 95%
rename from deploy/metrics/docker-compose.yml
rename to deploy/docker-compose.yml
index 0a15228ed44..804a2fb9f52 100644
--- a/deploy/metrics/docker-compose.yml
+++ b/deploy/docker-compose.yml
@@ -92,7 +92,7 @@ services:
     image: prom/prometheus:v3.4.1
     container_name: prometheus
     volumes:
-      - ./prometheus.yml:/etc/prometheus/prometheus.yml
+      - ./metrics/prometheus.yml:/etc/prometheus/prometheus.yml
     command:
       - '--config.file=/etc/prometheus/prometheus.yml'
       - '--storage.tsdb.path=/prometheus'
@@ -123,8 +123,8 @@ services:
     image: grafana/grafana-enterprise:12.0.1
     container_name: grafana
     volumes:
-      - ./grafana_dashboards:/etc/grafana/provisioning/dashboards
-      - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
+      - ./metrics/grafana_dashboards:/etc/grafana/provisioning/dashboards
+      - ./metrics/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
     environment:
       - GF_SERVER_HTTP_PORT=3001
       # do not make it admin/admin, because you will be prompted to change the password every time
diff --git a/deploy/metrics/README.md b/deploy/metrics/README.md
index 8cfced705fe..9b6043a1bb4 100644
--- a/deploy/metrics/README.md
+++ b/deploy/metrics/README.md
@@ -18,7 +18,7 @@ graph TD
         PROMETHEUS[Prometheus server :9090] -->|:2379/metrics| ETCD_SERVER[etcd-server :2379, :2380]
         PROMETHEUS -->|:9401/metrics| DCGM_EXPORTER[dcgm-exporter :9401]
         PROMETHEUS -->|:7777/metrics| NATS_PROM_EXP
-        PROMETHEUS -->|:8000/metrics| DYNAMOFE[Dynamo HTTP FE :8000]
+        PROMETHEUS -->|:8080/metrics| DYNAMOFE[Dynamo HTTP FE :8080]
         GRAFANA -->|:9090/query API| PROMETHEUS
     end
 ```
@@ -34,9 +34,9 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build containers
 2. Start Dynamo dependencies. Assume you're at the root dynamo path:
 
    ```bash
-   docker compose -f deploy/metrics/docker-compose.yml up -d  # Minimum components for Dynamo: etcd/nats/dcgm-exporter
+   docker compose -f deploy/docker-compose.yml up -d  # Minimum components for Dynamo: etcd/nats/dcgm-exporter
    # or
-   docker compose -f deploy/metrics/docker-compose.yml --profile metrics up -d  # In addition to the above, start Prometheus & Grafana
+   docker compose -f deploy/docker-compose.yml --profile metrics up -d  # In addition to the above, start Prometheus & Grafana
    ```
 
    To target specific GPU(s), export the variable below before running Docker Compose:
diff --git a/deploy/metrics/prometheus.yml b/deploy/metrics/prometheus.yml
index 063d5885574..a08b447a9bf 100644
--- a/deploy/metrics/prometheus.yml
+++ b/deploy/metrics/prometheus.yml
@@ -35,7 +35,7 @@ scrape_configs:
   # This is a demo service that needs to be launched manually. See components/metrics/README.md
   # Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 8000/tcp
-  - job_name: 'llm-demo'
+  - job_name: 'dynamo-backend'
     scrape_interval: 10s
     static_configs:
       - targets: ['host.docker.internal:8000']  # on the "monitoring" network
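With the compose file now at `deploy/docker-compose.yml`, a quick check that the rename and the `dynamo-backend` job name took effect — a sketch assuming the default ports shown above and `jq` on the PATH:

```bash
# From the repository root: start etcd/NATS plus Prometheus & Grafana.
docker compose -f deploy/docker-compose.yml --profile metrics up -d

# List the scrape jobs Prometheus registered; 'dynamo-backend' should
# now appear where the old 'llm-demo' job used to be.
curl -s http://localhost:9090/api/v1/targets | jq -r '.data.activeTargets[].labels.job' | sort -u
```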
diff --git a/deploy/sdk/README.md b/deploy/sdk/README.md
index dcc540434f8..5be0d854431 100644
--- a/deploy/sdk/README.md
+++ b/deploy/sdk/README.md
@@ -97,7 +97,7 @@ You can run this pipeline locally by spinning up ETCD and NATS and then running
 
 ```bash
 # Spin up ETCD and NATS
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
 ```
 
 then
@@ -110,7 +110,7 @@ dynamo serve pipeline:Frontend
 
 Once it's up and running, you can make a request to the pipeline using
 
 ```bash
-curl -X POST http://localhost:8000/generate \
+curl -X POST http://localhost:8080/generate \
   -H "Content-Type: application/json" \
   -d '{"text": "federer"}'
 ```
diff --git a/docs/architecture/dynamo_flow.md b/docs/architecture/dynamo_flow.md
index a80096430e6..32146e1188d 100644
--- a/docs/architecture/dynamo_flow.md
+++ b/docs/architecture/dynamo_flow.md
@@ -23,7 +23,7 @@ This diagram shows the NVIDIA Dynamo disaggregated inference system as implemented
 
 The primary user journey through the system:
 1. **Discovery (S1)**: Client discovers the service endpoint
-2. **Request (S2)**: HTTP client sends API request to Frontend (OpenAI-compatible server on port 8000)
+2. **Request (S2)**: HTTP client sends API request to Frontend (OpenAI-compatible server on port 8080)
 3. **Validate (S3)**: Frontend forwards request to Processor for validation and routing
 4. **Route (S3)**: Processor routes the validated request to appropriate Decode Worker
 
@@ -84,7 +84,7 @@ graph TD
     %% Top Layer - Client & Frontend
     Client["HTTP Client"]
     S1[["1 DISCOVERY"]]
-    Frontend["Frontend<br/>OpenAI Compatible Server<br/>Port 8000<br/>"]
+    Frontend["Frontend<br/>OpenAI Compatible Server<br/>Port 8080<br/>"]
     S2[["2 REQUEST"]]
 
     %% Processing Layer
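The `/generate` request in the SDK README assumes the pipeline has finished starting up. A small readiness loop makes that explicit — a sketch in which polling `/v1/models` (a standard OpenAI-compatible listing endpoint) is an assumption about this frontend:

```bash
# Wait until the frontend (port 8080 after this change) accepts requests,
# then send the same /generate call the SDK README uses.
until curl -sf http://localhost:8080/v1/models > /dev/null; do
  echo "waiting for frontend..."; sleep 2
done
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"text": "federer"}'
```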
diff --git a/docs/examples/README.md b/docs/examples/README.md
index a890ddd9bc6..9ef78cdf246 100644
--- a/docs/examples/README.md
+++ b/docs/examples/README.md
@@ -67,7 +67,7 @@ Look for one that ends in `-frontend` and use it for port forward.
 
 ```bash
 SERVICE_NAME=$(kubectl get svc -n ${NAMESPACE} -o name | grep frontend | sed 's|.*/||' | sed 's|-frontend||' | head -n1)
-kubectl port-forward svc/${SERVICE_NAME}-frontend 8000:8000 -n ${NAMESPACE}
+kubectl port-forward svc/${SERVICE_NAME}-frontend 8080:8080 -n ${NAMESPACE}
 ```
 
 Consult the [Port Forward Documentation](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/)
diff --git a/docs/guides/dynamo_deploy/create_deployment.md b/docs/guides/dynamo_deploy/create_deployment.md
index a34865314c2..50007a096a8 100644
--- a/docs/guides/dynamo_deploy/create_deployment.md
+++ b/docs/guides/dynamo_deploy/create_deployment.md
@@ -88,7 +88,7 @@ Here's a template structure based on the examples:
 
 Consult the corresponding sh file. Each of the python commands to launch a component will go into your yaml spec under the `extraPodSpec: -> mainContainer: -> args:`
 
-The front end is launched with "python3 -m dynamo.frontend [--http-port 8000] [--router-mode kv]"
+The front end is launched with "python3 -m dynamo.frontend [--http-port 8080] [--router-mode kv]"
 
 Each worker will launch `python -m dynamo.YOUR_INFERENCE_BACKEND --model YOUR_MODEL --your-flags ` command.
 If you are a Dynamo contributor, see the [dynamo run guide](../dynamo_run.md) for details on how to run this command.
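Before templating these commands into `extraPodSpec: -> mainContainer: -> args:`, it can help to run them locally first. A sketch combining the frontend flags above with the sglang worker from the quickstart (the worker module and flags vary by backend):

```bash
# Terminal 1: OpenAI-compatible frontend on the new default port, KV routing.
python3 -m dynamo.frontend --http-port 8080 --router-mode kv

# Terminal 2: one worker; module and flags depend on your inference backend.
python -m dynamo.sglang.worker deepseek-ai/DeepSeek-R1-Distill-Llama-8B
```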
"] S2[["2 REQUEST"]] %% Processing Layer diff --git a/docs/examples/README.md b/docs/examples/README.md index a890ddd9bc6..9ef78cdf246 100644 --- a/docs/examples/README.md +++ b/docs/examples/README.md @@ -67,7 +67,7 @@ Look for one that ends in `-frontend` and use it for port forward. ```bash SERVICE_NAME=$(kubectl get svc -n ${NAMESPACE} -o name | grep frontend | sed 's|.*/||' | sed 's|-frontend||' | head -n1) -kubectl port-forward svc/${SERVICE_NAME}-frontend 8000:8000 -n ${NAMESPACE} +kubectl port-forward svc/${SERVICE_NAME}-frontend 8080:8080 -n ${NAMESPACE} ``` Consult the [Port Forward Documentation](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/) diff --git a/docs/guides/dynamo_deploy/create_deployment.md b/docs/guides/dynamo_deploy/create_deployment.md index a34865314c2..50007a096a8 100644 --- a/docs/guides/dynamo_deploy/create_deployment.md +++ b/docs/guides/dynamo_deploy/create_deployment.md @@ -88,7 +88,7 @@ Here's a template structure based on the examples: Consult the corresponding sh file. Each of the python commands to launch a component will go into your yaml spec under the `extraPodSpec: -> mainContainer: -> args:` -The front end is launched with "python3 -m dynamo.frontend [--http-port 8000] [--router-mode kv]" +The front end is launched with "python3 -m dynamo.frontend [--http-port 8080] [--router-mode kv]" Each worker will launch `python -m dynamo.YOUR_INFERENCE_BACKEND --model YOUR_MODEL --your-flags `command. If you are a Dynamo contributor the [dynamo run guide](../dynamo_run.md) for details on how to run this command. diff --git a/docs/guides/planner_benchmark/README.md b/docs/guides/planner_benchmark/README.md index 9e74117f432..4332c3cdb54 100644 --- a/docs/guides/planner_benchmark/README.md +++ b/docs/guides/planner_benchmark/README.md @@ -46,7 +46,7 @@ genai-perf profile \ --tokenizer deepseek-ai/DeepSeek-R1-Distill-Llama-8B \ -m deepseek-ai/DeepSeek-R1-Distill-Llama-8B \ --endpoint-type chat \ - --url http://localhost:8000 \ + --url http://localhost:8080 \ --streaming \ --input-file payload:sin_b512_t600_rr5.0-20.0-150.0_io3000150-3000150-0.2-0.8-10.jsonl ``` @@ -76,7 +76,7 @@ In this example, we use a fixed 2p2d engine as baseline. Planner provides a `--n # TODO # in terminal 2 -genai-perf profile --tokenizer deepseek-ai/DeepSeek-R1-Distill-Llama-8B -m deepseek-ai/DeepSeek-R1-Distill-Llama-8B --service-kind openai --endpoint-type chat --url http://localhost:8000 --streaming --input-file payload:sin_b512_t600_rr5.0-20.0-150.0_io3000150-3000150-0.2-0.8-10.jsonl +genai-perf profile --tokenizer deepseek-ai/DeepSeek-R1-Distill-Llama-8B -m deepseek-ai/DeepSeek-R1-Distill-Llama-8B --service-kind openai --endpoint-type chat --url http://localhost:8080 --streaming --input-file payload:sin_b512_t600_rr5.0-20.0-150.0_io3000150-3000150-0.2-0.8-10.jsonl ``` ## Results diff --git a/docs/runtime/README.md b/docs/runtime/README.md index 29a8ab7bd9e..bcd29b8c70e 100644 --- a/docs/runtime/README.md +++ b/docs/runtime/README.md @@ -44,11 +44,11 @@ cargo test The simplest way to deploy the pre-requisite services is using [docker-compose](https://docs.docker.com/compose/install/linux/), -defined in [deploy/metrics/docker-compose.yml](../../deploy/metrics/docker-compose.yml). +defined in [deploy/docker-compose.yml](../../deploy/docker-compose.yml). 
diff --git a/docs/runtime/README.md b/docs/runtime/README.md
index 29a8ab7bd9e..bcd29b8c70e 100644
--- a/docs/runtime/README.md
+++ b/docs/runtime/README.md
@@ -44,11 +44,11 @@ cargo test
 
 The simplest way to deploy the pre-requisite services is using
 [docker-compose](https://docs.docker.com/compose/install/linux/),
-defined in [deploy/metrics/docker-compose.yml](../../deploy/metrics/docker-compose.yml).
+defined in [deploy/docker-compose.yml](../../deploy/docker-compose.yml).
 
 ```
 # At the root of the repository:
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
 ```
 
 This will deploy a [NATS.io](https://nats.io/) server and an [etcd](https://etcd.io/)
diff --git a/examples/runtime/hello_world/README.md b/examples/runtime/hello_world/README.md
index 44e1e7a62ed..d6ab686b37a 100644
--- a/examples/runtime/hello_world/README.md
+++ b/examples/runtime/hello_world/README.md
@@ -61,7 +61,7 @@ The example demonstrates:
 
    # clone the dynamo repository if necessary
    # git clone https://github.com/ai-dynamo/dynamo.git
    cd dynamo
-   docker compose -f deploy/metrics/docker-compose.yml up -d
+   docker compose -f deploy/docker-compose.yml up -d
   ```
 
 ### Running the Example
diff --git a/lib/runtime/examples/system_metrics/README.md b/lib/runtime/examples/system_metrics/README.md
index 6664cc67198..954af7dcef6 100644
--- a/lib/runtime/examples/system_metrics/README.md
+++ b/lib/runtime/examples/system_metrics/README.md
@@ -18,7 +18,7 @@ cargo build
 
 ### Run Server
 
 ```bash
-export DYN_LOG=1 DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8000
+export DYN_LOG=1 DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081
 cargo run --bin system_server
 ```
@@ -31,7 +31,7 @@ Note: Running the client will increment `service_requests_total`.
 
 ### View Metrics
 
 ```bash
-curl http://localhost:8000/metrics
+curl http://localhost:8081/metrics
 ```
 
 Example output:
@@ -66,7 +66,7 @@ uptime_seconds{namespace="http_server"} 725.997013676
 |----------|-------------|---------|
 | `DYN_LOG` | Enable logging | `0` |
 | `DYN_SYSTEM_ENABLED` | Enable system metrics | `false` |
-| `DYN_SYSTEM_PORT` | HTTP server port | `8000` |
+| `DYN_SYSTEM_PORT` | HTTP server port | `8081` |
 
 ## Metrics
diff --git a/lib/runtime/lib/bindings/python/README.md b/lib/runtime/lib/bindings/python/README.md
index 9ab801fc2cf..17a804373b2 100644
--- a/lib/runtime/lib/bindings/python/README.md
+++ b/lib/runtime/lib/bindings/python/README.md
@@ -44,7 +44,7 @@ cargo test
 
 The simplest way to deploy the pre-requisite services is using
 [docker-compose](https://docs.docker.com/compose/install/linux/),
-defined in the project's root [docker-compose.yml](../../../docker-compose.yml).
+defined in [deploy/docker-compose.yml](../../../../../deploy/docker-compose.yml).
 
 ```
 docker-compose up -d
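With the system-metrics example moved to 8081 (clear of the frontend's 8080 and of its old 8000 default), the updated workflow reads end to end like this — a sketch; the `sleep` is an arbitrary grace period for the server to bind:

```bash
# Run the metrics server on its new port, scrape it once, then shut it down.
export DYN_LOG=1 DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081
cargo run --bin system_server &
SERVER_PID=$!
sleep 2
curl -s http://localhost:8081/metrics | grep -E 'service_requests_total|uptime_seconds'
kill $SERVER_PID
```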