2 changes: 1 addition & 1 deletion README.md
@@ -122,7 +122,7 @@ python -m dynamo.sglang.worker deepseek-ai/DeepSeek-R1-Distill-Llama-8B
#### Send a Request

```bash
-curl localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{
+curl localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
"messages": [
{
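With this hunk, the quickstart request targets port 8080. A quick way to confirm a local frontend is actually listening there — a minimal sketch, assuming the server is up and, like most OpenAI-compatible servers, exposes `/v1/models`:

```bash
# Confirm the frontend answers on the new port. /v1/models is an assumption
# (standard on OpenAI-compatible servers); adjust if your build differs.
curl -s http://localhost:8080/v1/models
```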
4 changes: 2 additions & 2 deletions components/backends/sglang/README.md
@@ -56,10 +56,10 @@ Below we provide a guide that lets you run all of our the common deployment patt

### Start NATS and ETCD in the background

-Start using [Docker Compose](../../deploy/metrics/docker-compose.yml)
+Start using [Docker Compose](../../../deploy/docker-compose.yml)

```bash
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
```

### Build container
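Once etcd and NATS are started from the compose file's new location, a quick status check avoids chasing phantom connection errors later; a sketch, run from the repository root:

```bash
# List the services the relocated compose file started and their status.
# Works regardless of the exact service names defined in the file.
docker compose -f deploy/docker-compose.yml ps
```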
4 changes: 2 additions & 2 deletions components/backends/trtllm/README.md
@@ -64,9 +64,9 @@ Note: TensorRT-LLM disaggregation does not support conditional disaggregation ye

### Prerequisites

-Start required services (etcd and NATS) using [Docker Compose](../../deploy/metrics/docker-compose.yml)
+Start required services (etcd and NATS) using [Docker Compose](../../../deploy/docker-compose.yml)
```bash
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
```

### Build docker
4 changes: 2 additions & 2 deletions components/backends/vllm/README.md
@@ -15,10 +15,10 @@ See [deployment architectures](../llm/README.md#deployment-architectures) to lea

### Prerequisites

-Start required services (etcd and NATS) using [Docker Compose](../../deploy/metrics/docker-compose.yml):
+Start required services (etcd and NATS) using [Docker Compose](../../../deploy/docker-compose.yml):

```bash
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
```

### Build and Run docker
2 changes: 1 addition & 1 deletion components/backends/vllm/multi-node.md
@@ -22,7 +22,7 @@ Start the required services on your head node. These endpoints must be accessibl

```bash
# On head node (node-1)
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
```

Default ports:
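Worker nodes then need to reach these services on the head node. A sketch of the environment variables Dynamo examples commonly use for this, with the standard NATS (4222) and etcd (2379) ports — the variable names are an assumption to verify against your Dynamo version:

```bash
# On a worker node (e.g. node-2): point the runtime at the head node's services.
# NATS_SERVER / ETCD_ENDPOINTS are assumed names; check your version's docs.
export NATS_SERVER="nats://node-1:4222"
export ETCD_ENDPOINTS="http://node-1:2379"
```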
2 changes: 1 addition & 1 deletion components/metrics/README.md
@@ -94,7 +94,7 @@ To visualize the metrics being exposed on the Prometheus endpoint,
see the Prometheus and Grafana configurations in
[deploy/metrics](../../deploy/metrics):
```bash
-docker compose -f deploy/metrics/docker-compose.yml --profile metrics up -d
+docker compose -f deploy/docker-compose.yml --profile metrics up -d
```

## Metrics Collection Modes
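With the `metrics` profile running, both UIs expose standard health endpoints, which makes a smoke test cheap; a sketch assuming the ports used elsewhere in this changeset (Prometheus on 9090, Grafana on 3001 via `GF_SERVER_HTTP_PORT`):

```bash
# Prometheus readiness probe (built-in /-/ready endpoint).
curl -s http://localhost:9090/-/ready
# Grafana health check; 3001 matches GF_SERVER_HTTP_PORT in the compose file.
curl -s http://localhost:3001/api/health
```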
deploy/metrics/docker-compose.yml → deploy/docker-compose.yml
@@ -92,7 +92,7 @@ services:
image: prom/prometheus:v3.4.1
container_name: prometheus
volumes:
-      - ./prometheus.yml:/etc/prometheus/prometheus.yml
+      - ./metrics/prometheus.yml:/etc/prometheus/prometheus.yml
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
@@ -123,8 +123,8 @@ services:
image: grafana/grafana-enterprise:12.0.1
container_name: grafana
volumes:
-      - ./grafana_dashboards:/etc/grafana/provisioning/dashboards
-      - ./grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
+      - ./metrics/grafana_dashboards:/etc/grafana/provisioning/dashboards
+      - ./metrics/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
environment:
- GF_SERVER_HTTP_PORT=3001
# do not make it admin/admin, because you will be prompted to change the password every time
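The re-pointed mounts imply the compose file now sits one directory above the Prometheus and Grafana configs it references; a sketch of the layout this assumes, worth verifying against the tree:

```bash
# Inferred layout after the move:
# deploy/
# ├── docker-compose.yml            # relocated from deploy/metrics/
# └── metrics/
#     ├── prometheus.yml
#     ├── grafana-datasources.yml
#     └── grafana_dashboards/
ls deploy deploy/metrics
```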
6 changes: 3 additions & 3 deletions deploy/metrics/README.md
@@ -18,7 +18,7 @@ graph TD
PROMETHEUS[Prometheus server :9090] -->|:2379/metrics| ETCD_SERVER[etcd-server :2379, :2380]
PROMETHEUS -->|:9401/metrics| DCGM_EXPORTER[dcgm-exporter :9401]
PROMETHEUS -->|:7777/metrics| NATS_PROM_EXP
-    PROMETHEUS -->|:8000/metrics| DYNAMOFE[Dynamo HTTP FE :8000]
+    PROMETHEUS -->|:8080/metrics| DYNAMOFE[Dynamo HTTP FE :8080]
GRAFANA -->|:9090/query API| PROMETHEUS
end
```
@@ -34,9 +34,9 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container
2. Start Dynamo dependencies. Assume you're at the root dynamo path:

```bash
-docker compose -f deploy/metrics/docker-compose.yml up -d # Minimum components for Dynamo: etcd/nats/dcgm-exporter
+docker compose -f deploy/docker-compose.yml up -d # Minimum components for Dynamo: etcd/nats/dcgm-exporter
# or
-docker compose -f deploy/metrics/docker-compose.yml --profile metrics up -d # In addition to the above, start Prometheus & Grafana
+docker compose -f deploy/docker-compose.yml --profile metrics up -d # In addition to the above, start Prometheus & Grafana
```

To target specific GPU(s), export the variable below before running Docker Compose:
2 changes: 1 addition & 1 deletion deploy/metrics/prometheus.yml
@@ -35,7 +35,7 @@ scrape_configs:

# This is a demo service that needs to be launched manually. See components/metrics/README.md
# Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 8000/tcp
-  - job_name: 'llm-demo'
+  - job_name: 'dynamo-backend'
scrape_interval: 10s
static_configs:
- targets: ['host.docker.internal:8000'] # on the "monitoring" network
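A renamed scrape job is an easy place for a typo, and Prometheus can validate configs offline; a sketch using `promtool`, which ships with Prometheus (assumed to be on your PATH):

```bash
# Validate the edited scrape config before restarting the Prometheus container.
promtool check config deploy/metrics/prometheus.yml
```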
4 changes: 2 additions & 2 deletions deploy/sdk/README.md
@@ -97,7 +97,7 @@ You can run this pipeline locally by spinning up ETCD and NATS and then running

```bash
# Spin up ETCD and NATS
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
```

then
@@ -110,7 +110,7 @@ dynamo serve pipeline:Frontend
Once it's up and running, you can make a request to the pipeline using

```bash
-curl -X POST http://localhost:8000/generate \
+curl -X POST http://localhost:8080/generate \
-H "Content-Type: application/json" \
-d '{"text": "federer"}'
```
4 changes: 2 additions & 2 deletions docs/architecture/dynamo_flow.md
@@ -23,7 +23,7 @@ This diagram shows the NVIDIA Dynamo disaggregated inference system as implement
The primary user journey through the system:

1. **Discovery (S1)**: Client discovers the service endpoint
-2. **Request (S2)**: HTTP client sends API request to Frontend (OpenAI-compatible server on port 8000)
+2. **Request (S2)**: HTTP client sends API request to Frontend (OpenAI-compatible server on port 8080)
3. **Validate (S3)**: Frontend forwards request to Processor for validation and routing
4. **Route (S3)**: Processor routes the validated request to appropriate Decode Worker

@@ -84,7 +84,7 @@ graph TD
%% Top Layer - Client & Frontend
Client["<b>HTTP Client</b>"]
S1[["<b>1 DISCOVERY</b>"]]
-    Frontend["<b>Frontend</b><br/><i>OpenAI Compatible Server<br/>Port 8000</i>"]
+    Frontend["<b>Frontend</b><br/><i>OpenAI Compatible Server<br/>Port 8080</i>"]
S2[["<b>2 REQUEST</b>"]]

%% Processing Layer
2 changes: 1 addition & 1 deletion docs/examples/README.md
@@ -67,7 +67,7 @@ Look for one that ends in `-frontend` and use it for port forward.

```bash
SERVICE_NAME=$(kubectl get svc -n ${NAMESPACE} -o name | grep frontend | sed 's|.*/||' | sed 's|-frontend||' | head -n1)
-kubectl port-forward svc/${SERVICE_NAME}-frontend 8000:8000 -n ${NAMESPACE}
+kubectl port-forward svc/${SERVICE_NAME}-frontend 8080:8080 -n ${NAMESPACE}
```

Consult the [Port Forward Documentation](https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/)
2 changes: 1 addition & 1 deletion docs/guides/dynamo_deploy/create_deployment.md
@@ -88,7 +88,7 @@ Here's a template structure based on the examples:
Consult the corresponding sh file. Each of the python commands to launch a component will go into your yaml spec under the
`extraPodSpec: -> mainContainer: -> args:`

-The front end is launched with "python3 -m dynamo.frontend [--http-port 8000] [--router-mode kv]"
+The front end is launched with "python3 -m dynamo.frontend [--http-port 8080] [--router-mode kv]"
Each worker will launch a `python -m dynamo.YOUR_INFERENCE_BACKEND --model YOUR_MODEL --your-flags` command.
If you are a Dynamo contributor, see the [dynamo run guide](../dynamo_run.md) for details on how to run this command.

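As an illustration of what lands in those `args`, the following reuses the frontend command above and the sglang worker invocation shown at the top of this diff; treat it as a sketch, not a canonical config:

```bash
# Frontend container args: OpenAI-compatible HTTP server on 8080 with KV routing.
python3 -m dynamo.frontend --http-port 8080 --router-mode kv

# Worker container args (backend and model borrowed from the README hunk above).
python -m dynamo.sglang.worker deepseek-ai/DeepSeek-R1-Distill-Llama-8B
```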
4 changes: 2 additions & 2 deletions docs/guides/planner_benchmark/README.md
@@ -46,7 +46,7 @@ genai-perf profile \
--tokenizer deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
-m deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
--endpoint-type chat \
-  --url http://localhost:8000 \
+  --url http://localhost:8080 \
--streaming \
--input-file payload:sin_b512_t600_rr5.0-20.0-150.0_io3000150-3000150-0.2-0.8-10.jsonl
```
@@ -76,7 +76,7 @@ In this example, we use a fixed 2p2d engine as baseline. Planner provides a `--n
# TODO

# in terminal 2
-genai-perf profile --tokenizer deepseek-ai/DeepSeek-R1-Distill-Llama-8B -m deepseek-ai/DeepSeek-R1-Distill-Llama-8B --service-kind openai --endpoint-type chat --url http://localhost:8000 --streaming --input-file payload:sin_b512_t600_rr5.0-20.0-150.0_io3000150-3000150-0.2-0.8-10.jsonl
+genai-perf profile --tokenizer deepseek-ai/DeepSeek-R1-Distill-Llama-8B -m deepseek-ai/DeepSeek-R1-Distill-Llama-8B --service-kind openai --endpoint-type chat --url http://localhost:8080 --streaming --input-file payload:sin_b512_t600_rr5.0-20.0-150.0_io3000150-3000150-0.2-0.8-10.jsonl
```

## Results
4 changes: 2 additions & 2 deletions docs/runtime/README.md
@@ -44,11 +44,11 @@ cargo test

The simplest way to deploy the pre-requisite services is using
[docker-compose](https://docs.docker.com/compose/install/linux/),
-defined in [deploy/metrics/docker-compose.yml](../../deploy/metrics/docker-compose.yml).
+defined in [deploy/docker-compose.yml](../../deploy/docker-compose.yml).

```
# At the root of the repository:
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
```

This will deploy a [NATS.io](https://nats.io/) server and an [etcd](https://etcd.io/)
10 changes: 5 additions & 5 deletions examples/multimodal/README.md
@@ -73,7 +73,7 @@ dynamo serve graphs.agg:Frontend -f ./configs/agg-llava.yaml

In another terminal:
```bash
-curl http://localhost:8000/v1/chat/completions \
+curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llava-hf/llava-1.5-7b-hf",
@@ -146,7 +146,7 @@ dynamo serve graphs.disagg:Frontend -f configs/disagg.yaml

In another terminal:
```bash
-curl http://localhost:8000/v1/chat/completions \
+curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llava-hf/llava-1.5-7b-hf",
@@ -224,10 +224,10 @@ in `dynamo deployment get ${DEPLOYMENT_NAME}` and skip the steps to find and for
export FRONTEND_POD=$(kubectl get pods -n ${KUBE_NS} | grep "${DEPLOYMENT_NAME}-frontend" | sort -k1 | tail -n1 | awk '{print $1}')

# Forward the pod's port to localhost
-kubectl port-forward pod/$FRONTEND_POD 8000:8000 -n ${KUBE_NS}
+kubectl port-forward pod/$FRONTEND_POD 8080:8080 -n ${KUBE_NS}

# Test the API endpoint
-curl localhost:8000/v1/chat/completions \
+curl localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llava-hf/llava-1.5-7b-hf",
@@ -378,7 +378,7 @@ dynamo serve graphs.disagg_video:Frontend -f ./configs/disagg_video.yaml

In another terminal:
```bash
-curl -X 'POST' 'http://localhost:8000/v1/chat/completions' -H 'Content-Type: application/json' -d '{
+curl -X 'POST' 'http://localhost:8080/v1/chat/completions' -H 'Content-Type: application/json' -d '{
"model": "llava-hf/LLaVA-NeXT-Video-7B-hf",
"messages": [
{
2 changes: 1 addition & 1 deletion examples/runtime/hello_world/README.md
@@ -61,7 +61,7 @@ The example demonstrates:
# clone the dynamo repository if necessary
# git clone https://github.com/ai-dynamo/dynamo.git
cd dynamo
-docker compose -f deploy/metrics/docker-compose.yml up -d
+docker compose -f deploy/docker-compose.yml up -d
```

### Running the Example
6 changes: 3 additions & 3 deletions lib/runtime/examples/system_metrics/README.md
@@ -18,7 +18,7 @@ cargo build

### Run Server
```bash
-export DYN_LOG=1 DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8000
+export DYN_LOG=1 DYN_SYSTEM_ENABLED=true DYN_SYSTEM_PORT=8081
cargo run --bin system_server
```

@@ -31,7 +31,7 @@ Note: Running the client will increment `service_requests_total`.

### View Metrics
```bash
-curl http://localhost:8000/metrics
+curl http://localhost:8081/metrics
```

Example output:
@@ -66,7 +66,7 @@ uptime_seconds{namespace="http_server"} 725.997013676
|----------|-------------|---------|
| `DYN_LOG` | Enable logging | `0` |
| `DYN_SYSTEM_ENABLED` | Enable system metrics | `false` |
-| `DYN_SYSTEM_PORT` | HTTP server port | `8000` |
+| `DYN_SYSTEM_PORT` | HTTP server port | `8081` |

## Metrics

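Moving this example server off port 8000 keeps it clear of the Dynamo frontend's old default; a quick check that the new port took effect, assuming the server from the Run Server step is still up:

```bash
# Expect the uptime gauge from the sample output above; grep keeps it terse.
curl -s http://localhost:8081/metrics | grep uptime_seconds
```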
2 changes: 1 addition & 1 deletion lib/runtime/lib/bindings/python/README.md
@@ -44,7 +44,7 @@ cargo test

The simplest way to deploy the pre-requisite services is using
[docker-compose](https://docs.docker.com/compose/install/linux/),
-defined in the project's root [docker-compose.yml](../../../docker-compose.yml).
+defined in the project's root [docker-compose.yml](../../../../../deploy/docker-compose.yml).

```
docker-compose up -d