ai-dynamo · dmitry-tokarev-nv · Aug 6, 2025 · Aug 4, 2025 · Aug 4, 2025 · Aug 5, 2025
diff --git a/components/backends/sglang/deploy/README.md b/components/backends/sglang/deploy/README.md
@@ -74,7 +74,7 @@ extraPodSpec:
 
 Before using these templates, ensure you have:
 
-1. **Dynamo Cloud Platform installed** - See [Installing Dynamo Cloud](../../docs/guides/dynamo_deploy/dynamo_cloud.md)
+1. **Dynamo Cloud Platform installed** - See [Installing Dynamo Cloud](../../../../docs/guides/dynamo_deploy/dynamo_cloud.md)
 2. **Kubernetes cluster with GPU support**
 3. **Container registry access** for SGLang runtime images
 4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
@@ -159,4 +159,4 @@ Common issues and solutions:
 3. **Health check failures**: Review model loading logs and increase `initialDelaySeconds`
 4. **Out of memory**: Increase memory limits or reduce model batch size
 
-For additional support, refer to the [deployment troubleshooting guide](../../docs/guides/dynamo_deploy/quickstart.md#troubleshooting).
+For additional support, refer to the [deployment troubleshooting guide](../../../../docs/guides/dynamo_deploy/quickstart.md#troubleshooting).
diff --git a/components/backends/sglang/slurm_jobs/README.md b/components/backends/sglang/slurm_jobs/README.md
@@ -1 +1 @@
-Please refer to [Deploying Dynamo with SGLang on SLURM](../../../../../docs/components/backends/sglang/slurm_jobs/README.md) for more details.
+Please refer to [Deploying Dynamo with SGLang on SLURM](../../../../docs/components/backends/sglang/slurm_jobs/README.md) for more details.
diff --git a/components/backends/trtllm/README.md b/components/backends/trtllm/README.md
@@ -49,12 +49,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 
 | Feature | TensorRT-LLM | Notes |
 |---------|--------------|-------|
-| [**Disaggregated Serving**](../../../architecture/disagg_serving.md) | ✅ |  |
-| [**Conditional Disaggregation**](../../../architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
-| [**KV-Aware Routing**](../../../architecture/kv_cache_routing.md) | ✅ |  |
-| [**SLA-Based Planner**](../../../architecture/sla_planner.md) | 🚧 | Planned |
-| [**Load Based Planner**](../../../architecture/load_planner.md) | 🚧 | Planned |
-| [**KVBM**](../../../architecture/kvbm_architecture.md) | 🚧 | Planned |
+| [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ |  |
+| [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
+| [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ |  |
+| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | 🚧 | Planned |
+| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | Planned |
+| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | 🚧 | Planned |
 
 ### Large Scale P/D and WideEP Features
 
@@ -180,14 +180,14 @@ Below we provide a selected list of advanced examples. Please open up an issue i
 
 ### Multinode Deployment
 
-For comprehensive instructions on multinode serving, see the [multinode-examples.md](./multinode-examples.md) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](./llama4_plus_eagle.md) guide to learn how to use these scripts when a single worker fits on the single node.
+For comprehensive instructions on multinode serving, see the [multinode-examples.md](../../../docs/components/backends/trtllm/multinode-examples.md) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](../../../docs/components/backends/trtllm/llama4_plus_eagle.md) guide to learn how to use these scripts when a single worker fits on the single node.
 
 ### Speculative Decoding
-- **[Llama 4 Maverick Instruct + Eagle Speculative Decoding](./llama4_plus_eagle.md)**
+- **[Llama 4 Maverick Instruct + Eagle Speculative Decoding](../../../docs/components/backends/trtllm/llama4_plus_eagle.md)**
 
 ### Kubernetes Deployment
 
-For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](deploy/README.md)
+For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](../../../docs/components/backends/trtllm/deploy/README.md)
 
 ### Client
 
@@ -216,7 +216,7 @@ DISAGGREGATION_STRATEGY="prefill_first" ./launch/disagg.sh
 
 ## KV Cache Transfer in Disaggregated Serving
 
-Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](./kv-cache-tranfer.md).
+Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](../../../docs/components/backends/trtllm/kv-cache-tranfer.md).
 
 ## Request Migration
 

diff --git a/components/backends/vllm/README.md b/components/backends/vllm/README.md
@@ -35,12 +35,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 
 | Feature | vLLM | Notes |
 |---------|------|-------|
-| [**Disaggregated Serving**](../../../architecture/disagg_serving.md) | ✅ |  |
-| [**Conditional Disaggregation**](../../../architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP |
-| [**KV-Aware Routing**](../../../architecture/kv_cache_routing.md) | ✅ |  |
-| [**SLA-Based Planner**](../../../architecture/sla_planner.md) | ✅ |  |
-| [**Load Based Planner**](../../../architecture/load_planner.md) | 🚧 | WIP |
-| [**KVBM**](../../../architecture/kvbm_architecture.md) | 🚧 | WIP |
+| [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ |  |
+| [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP |
+| [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ |  |
+| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | ✅ |  |
+| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | WIP |
+| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | 🚧 | WIP |
 
 ### Large Scale P/D and WideEP Features
 
@@ -152,7 +152,7 @@ Below we provide a selected list of advanced deployments. Please open up an issu
 
 ### Kubernetes Deployment
 
-For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [vLLM Kubernetes Deployment Guide](deploy/README.md)
+For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [vLLM Kubernetes Deployment Guide](../../../docs/components/backends/vllm/deploy/README.md)
 
 ## Configuration
 

@@ -71,7 +71,7 @@ kubectl get gateway inference-gateway -n my-model
 
 3. **Deploy model**
 
-Follow the steps in [model deployment](../../components/backends/vllm/deploy/README.md) to deploy `Qwen/Qwen3-0.6B` model in aggregate mode using [agg.yaml](../../components/backends/vllm/deploy/agg.yaml) in `my-model` kubernetes namespace.
+Follow the steps in [model deployment](../../docs/components/backends/vllm/deploy/README.md) to deploy `Qwen/Qwen3-0.6B` model in aggregate mode using [agg.yaml](../../components/backends/vllm/deploy/agg.yaml) in `my-model` kubernetes namespace.
 
 Sample commands to deploy model:
 ```bash

diff --git a/docs/API/nixl_connect/README.md b/docs/API/nixl_connect/README.md
@@ -64,6 +64,7 @@ sequenceDiagram
     RemoteWorker -->> LocalWorker: Notify completion (unblock awaiter)
 ```
 
+
 ## Python Classes
 
   - [Connector](connector.md)

diff --git a/docs/index.rst b/docs/index.rst
@@ -143,6 +143,7 @@ The examples below assume you build the latest image yourself from source. If us
    Writing Python Workers in Dynamo <guides/backend.md>
    Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md>
    Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md>
+   Configuring Metrics for Observability <guides/metrics.md>
 
 .. toctree::
    :hidden:

@@ -38,7 +38,6 @@ Learn fundamental Dynamo concepts through these introductory examples:
 - **[Quickstart](basics/quickstart/README.md)** - Simple aggregated serving example with vLLM backend
 - **[Disaggregated Serving](basics/disaggregated_serving/README.md)** - Prefill/decode separation for enhanced performance and scalability
 - **[Multi-node](basics/multinode/README.md)** - Distributed inference across multiple nodes and GPUs
-- **[Multimodal](basics/multimodal/README.md)** - Multimodal model deployment with E/P/D disaggregated serving
 
 ## Deployment Examples
 

@@ -103,7 +103,7 @@ Hello star!
 
 ## Deployment to Kubernetes
 
-Follow the [Quickstart Guide](../../../guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
+Follow the [Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
 Then deploy to kubernetes using
 
 ```bash