Merged
4 changes: 2 additions & 2 deletions components/backends/sglang/deploy/README.md
@@ -74,7 +74,7 @@ extraPodSpec:

Before using these templates, ensure you have:

-1. **Dynamo Cloud Platform installed** - See [Installing Dynamo Cloud](../../docs/guides/dynamo_deploy/dynamo_cloud.md)
+1. **Dynamo Cloud Platform installed** - See [Installing Dynamo Cloud](../../../../docs/guides/dynamo_deploy/dynamo_cloud.md)
2. **Kubernetes cluster with GPU support**
3. **Container registry access** for SGLang runtime images
4. **HuggingFace token secret** (referenced as `envFromSecret: hf-token-secret`)
@@ -159,4 +159,4 @@ Common issues and solutions:
3. **Health check failures**: Review model loading logs and increase `initialDelaySeconds`
4. **Out of memory**: Increase memory limits or reduce model batch size

-For additional support, refer to the [deployment troubleshooting guide](../../docs/guides/dynamo_deploy/quickstart.md#troubleshooting).
+For additional support, refer to the [deployment troubleshooting guide](../../../../docs/guides/dynamo_deploy/quickstart.md#troubleshooting).
2 changes: 1 addition & 1 deletion components/backends/sglang/slurm_jobs/README.md
@@ -1 +1 @@
-Please refer to [Deploying Dynamo with SGLang on SLURM](../../../../../docs/components/backends/sglang/slurm_jobs/README.md) for more details.
+Please refer to [Deploying Dynamo with SGLang on SLURM](../../../../docs/components/backends/sglang/slurm_jobs/README.md) for more details.
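
The number of `../` segments in each link has to match the directory depth of the README that contains it. A minimal sketch for checking that mechanically follows. It is not part of this PR: `resolve_link` and `escapes_repo` are hypothetical helper names, and the sketch assumes repository-relative POSIX paths and uses only the Python standard library.

```python
import posixpath


def resolve_link(source_file: str, link: str) -> str:
    """Resolve a relative Markdown link against the file that contains it.

    Both arguments are repository-relative POSIX paths; any '#anchor'
    suffix on the link is dropped before resolving.
    """
    target = link.split("#", 1)[0]
    base_dir = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base_dir, target))


def escapes_repo(resolved: str) -> bool:
    """True when the resolved path climbs above the repository root."""
    return resolved == ".." or resolved.startswith("../")


source = "components/backends/sglang/slurm_jobs/README.md"
old = resolve_link(source, "../../../../../docs/components/backends/sglang/slurm_jobs/README.md")
new = resolve_link(source, "../../../../docs/components/backends/sglang/slurm_jobs/README.md")

print(old, escapes_repo(old))  # five '..' segments overshoot the root
print(new, escapes_repo(new))  # four '..' segments land inside docs/
```

The README sits four directories deep, so four `../` segments reach the repository root. The same check applied to `components/backends/sglang/deploy/README.md` shows why `../../docs/...` needed two more segments there.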
20 changes: 10 additions & 10 deletions components/backends/trtllm/README.md
@@ -49,12 +49,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

| Feature | TensorRT-LLM | Notes |
|---------|--------------|-------|
-| [**Disaggregated Serving**](../../../architecture/disagg_serving.md) | ✅ | |
-| [**Conditional Disaggregation**](../../../architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
-| [**KV-Aware Routing**](../../../architecture/kv_cache_routing.md) | ✅ | |
-| [**SLA-Based Planner**](../../../architecture/sla_planner.md) | 🚧 | Planned |
-| [**Load Based Planner**](../../../architecture/load_planner.md) | 🚧 | Planned |
-| [**KVBM**](../../../architecture/kvbm_architecture.md) | 🚧 | Planned |
+| [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ | |
+| [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
+| [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ | |
+| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | 🚧 | Planned |
+| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | Planned |
+| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | 🚧 | Planned |

### Large Scale P/D and WideEP Features

@@ -180,14 +180,14 @@ Below we provide a selected list of advanced examples. Please open up an issue i

### Multinode Deployment

-For comprehensive instructions on multinode serving, see the [multinode-examples.md](./multinode-examples.md) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](./llama4_plus_eagle.md) guide to learn how to use these scripts when a single worker fits on the single node.
+For comprehensive instructions on multinode serving, see the [multinode-examples.md](../../../docs/components/backends/trtllm/multinode-examples.md) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](../../../docs/components/backends/trtllm/llama4_plus_eagle.md) guide to learn how to use these scripts when a single worker fits on the single node.

### Speculative Decoding
-- **[Llama 4 Maverick Instruct + Eagle Speculative Decoding](./llama4_plus_eagle.md)**
+- **[Llama 4 Maverick Instruct + Eagle Speculative Decoding](../../../docs/components/backends/trtllm/llama4_plus_eagle.md)**

### Kubernetes Deployment

-For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](deploy/README.md)
+For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](../../../docs/components/backends/trtllm/deploy/README.md)

### Client

@@ -216,7 +216,7 @@ DISAGGREGATION_STRATEGY="prefill_first" ./launch/disagg.sh

## KV Cache Transfer in Disaggregated Serving

-Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](./kv-cache-tranfer.md).
+Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](../../../docs/components/backends/trtllm/kv-cache-tranfer.md).

## Request Migration

14 changes: 7 additions & 7 deletions components/backends/vllm/README.md
@@ -35,12 +35,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

| Feature | vLLM | Notes |
|---------|------|-------|
-| [**Disaggregated Serving**](../../../architecture/disagg_serving.md) | ✅ | |
-| [**Conditional Disaggregation**](../../../architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP |
-| [**KV-Aware Routing**](../../../architecture/kv_cache_routing.md) | ✅ | |
-| [**SLA-Based Planner**](../../../architecture/sla_planner.md) | ✅ | |
-| [**Load Based Planner**](../../../architecture/load_planner.md) | 🚧 | WIP |
-| [**KVBM**](../../../architecture/kvbm_architecture.md) | 🚧 | WIP |
+| [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ | |
+| [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP |
+| [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ | |
+| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | ✅ | |
+| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | WIP |
+| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | 🚧 | WIP |

### Large Scale P/D and WideEP Features

@@ -152,7 +152,7 @@ Below we provide a selected list of advanced deployments. Please open up an issu

### Kubernetes Deployment

-For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [vLLM Kubernetes Deployment Guide](deploy/README.md)
+For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [vLLM Kubernetes Deployment Guide](../../../docs/components/backends/vllm/deploy/README.md)

## Configuration

2 changes: 1 addition & 1 deletion deploy/inference-gateway/README.md
@@ -71,7 +71,7 @@ kubectl get gateway inference-gateway -n my-model

3. **Deploy model**

-Follow the steps in [model deployment](../../components/backends/vllm/deploy/README.md) to deploy `Qwen/Qwen3-0.6B` model in aggregate mode using [agg.yaml](../../components/backends/vllm/deploy/agg.yaml) in `my-model` kubernetes namespace.
+Follow the steps in [model deployment](../../docs/components/backends/vllm/deploy/README.md) to deploy `Qwen/Qwen3-0.6B` model in aggregate mode using [agg.yaml](../../components/backends/vllm/deploy/agg.yaml) in `my-model` kubernetes namespace.

Sample commands to deploy model:
```bash
1 change: 1 addition & 0 deletions docs/API/nixl_connect/README.md
@@ -64,6 +64,7 @@ sequenceDiagram
RemoteWorker -->> LocalWorker: Notify completion (unblock awaiter)
```


## Python Classes

- [Connector](connector.md)
1 change: 1 addition & 0 deletions docs/index.rst
@@ -143,6 +143,7 @@ The examples below assume you build the latest image yourself from source. If us
Writing Python Workers in Dynamo <guides/backend.md>
Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md>
Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md>
+Configuring Metrics for Observability <guides/metrics.md>

.. toctree::
:hidden:
1 change: 0 additions & 1 deletion examples/README.md
@@ -38,7 +38,6 @@ Learn fundamental Dynamo concepts through these introductory examples:
- **[Quickstart](basics/quickstart/README.md)** - Simple aggregated serving example with vLLM backend
- **[Disaggregated Serving](basics/disaggregated_serving/README.md)** - Prefill/decode separation for enhanced performance and scalability
- **[Multi-node](basics/multinode/README.md)** - Distributed inference across multiple nodes and GPUs
-- **[Multimodal](basics/multimodal/README.md)** - Multimodal model deployment with E/P/D disaggregated serving

## Deployment Examples

2 changes: 1 addition & 1 deletion examples/runtime/hello_world/README.md
@@ -103,7 +103,7 @@ Hello star!

## Deployment to Kubernetes

-Follow the [Quickstart Guide](../../../guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
+Follow the [Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
Then deploy to kubernetes using

```bash