Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
38 changes: 19 additions & 19 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,12 +21,30 @@ limitations under the License.
[![Discord](https://dcbadge.limes.pink/api/server/D92uqZRjCZ?style=flat)](https://discord.gg/D92uqZRjCZ)
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ai-dynamo/dynamo)

| **[Roadmap](https://github.com/ai-dynamo/dynamo/issues/762)** | **[Documentation](https://docs.nvidia.com/dynamo/latest/index.html)** | **[Examples](https://github.com/ai-dynamo/dynamo/tree/main/examples)** | **[Design Proposals](https://github.com/ai-dynamo/enhancements)** |
| **[Roadmap](https://github.com/ai-dynamo/dynamo/issues/762)** | **[Documentation](https://docs.nvidia.com/dynamo/latest/index.html)** | **[Support Matrix](docs/support_matrix.md)** | **[Examples](https://github.com/ai-dynamo/dynamo/tree/main/examples)** | **[Design Proposals](https://github.com/ai-dynamo/enhancements)** |

# NVIDIA Dynamo

High-throughput, low-latency inference framework designed for serving generative AI and reasoning models in multi-node distributed environments.

## Framework Support Matrix

| Feature | vLLM | SGLang | TensorRT-LLM |
|---------|----------------------|----------------------------|----------------------------------------|
| [**Disaggregated Serving**](/docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ |
| [**Conditional Disaggregation**](/docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |
| [**KV-Aware Routing**](/docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ |
| [**SLA-Based Planner**](/docs/architecture/sla_planner.md) | ✅ | 🚧 | 🚧 |
| [**Load Based Planner**](/docs/architecture/load_planner.md) | ✅ | 🚧 | 🚧 |
| [**KVBM**](/docs/architecture/kvbm_architecture.md) | 🚧 | 🚧 | 🚧 |

To learn more about each framework and their capabilities, check out each framework's README and deploy them with Dynamo!
- **[vLLM](components/backends/vllm/README.md)**
- **[SGLang](components/backends/sglang/README.md)**
- **[TensorRT-LLM](components/backends/trtllm/README.md)**

Built in Rust for performance and in Python for extensibility, Dynamo is fully open-source and driven by a transparent, OSS (Open Source Software) first development approach.

## The Era of Multi-GPU, Multi-Node

<p align="center">
Expand All @@ -47,24 +65,6 @@ Dynamo is designed to be inference engine agnostic (supports TRT-LLM, vLLM, SGLa
<img src="./docs/images/frontpage-architecture.png" alt="Dynamo architecture" width="600" />
</p>

## Framework Support Matrix

| Feature | vLLM | SGLang | TensorRT-LLM |
|---------|----------------------|----------------------------|----------------------------------------|
| [**Disaggregated Serving**](/docs/architecture/disagg_serving.md) | ✅ | ✅ | ✅ |
| [**Conditional Disaggregation**](/docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | 🚧 | 🚧 |
| [**KV-Aware Routing**](/docs/architecture/kv_cache_routing.md) | ✅ | ✅ | ✅ |
| [**SLA-Based Planner**](/docs/architecture/sla_planner.md) | ✅ | 🚧 | 🚧 |
| [**Load Based Planner**](/docs/architecture/load_planner.md) | ✅ | 🚧 | 🚧 |
| [**KVBM**](/docs/architecture/kvbm_architecture.md) | 🚧 | 🚧 | 🚧 |

To learn more about each framework and their capabilities, check out each framework's README!
- **[vLLM](components/backends/vllm/README.md)**
- **[SGLang](components/backends/sglang/README.md)**
- **[TensorRT-LLM](components/backends/trtllm/README.md)**

Built in Rust for performance and in Python for extensibility, Dynamo is fully open-source and driven by a transparent, OSS (Open Source Software) first development approach.

# Installation

The following examples require a few system level packages.
Expand Down
12 changes: 6 additions & 6 deletions components/backends/sglang/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,12 +34,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

| Feature | SGLang | Notes |
|---------|--------|-------|
| [**Disaggregated Serving**](../../docs/architecture/disagg_serving.md) | ✅ | |
| [**Conditional Disaggregation**](../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
| [**KV-Aware Routing**](../../docs/architecture/kv_cache_routing.md) | ✅ | |
| [**SLA-Based Planner**](../../docs/architecture/sla_planner.md) | ❌ | Planned |
| [**Load Based Planner**](../../docs/architecture/load_planner.md) | ❌ | Planned |
| [**KVBM**](../../docs/architecture/kvbm_architecture.md) | ❌ | Planned |
| [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ | |
| [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP [PR](https://github.com/sgl-project/sglang/pull/7730) |
| [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ | |
| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | ❌ | Planned |
| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | ❌ | Planned |
| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | ❌ | Planned |

### Large Scale P/D and WideEP Features

Expand Down
12 changes: 6 additions & 6 deletions components/backends/trtllm/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -49,12 +49,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

| Feature | TensorRT-LLM | Notes |
|---------|--------------|-------|
| [**Disaggregated Serving**](../../docs/architecture/disagg_serving.md) | ✅ | |
| [**Conditional Disaggregation**](../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
| [**KV-Aware Routing**](../../docs/architecture/kv_cache_routing.md) | ✅ | |
| [**SLA-Based Planner**](../../docs/architecture/sla_planner.md) | 🚧 | Planned |
| [**Load Based Planner**](../../docs/architecture/load_planner.md) | 🚧 | Planned |
| [**KVBM**](../../docs/architecture/kvbm_architecture.md) | 🚧 | Planned |
| [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ | |
| [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
| [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ | |
| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | 🚧 | Planned |
| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | Planned |
| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | 🚧 | Planned |

### Large Scale P/D and WideEP Features

Expand Down
12 changes: 6 additions & 6 deletions components/backends/vllm/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,12 +35,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

| Feature | vLLM | Notes |
|---------|------|-------|
| [**Disaggregated Serving**](../../docs/architecture/disagg_serving.md) | ✅ | |
| [**Conditional Disaggregation**](../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP |
| [**KV-Aware Routing**](../../docs/architecture/kv_cache_routing.md) | ✅ | |
| [**SLA-Based Planner**](../../docs/architecture/sla_planner.md) | ✅ | |
| [**Load Based Planner**](../../docs/architecture/load_planner.md) | 🚧 | WIP |
| [**KVBM**](../../docs/architecture/kvbm_architecture.md) | 🚧 | WIP |
| [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ | |
| [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | WIP |
| [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ | |
| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | ✅ | |
| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | WIP |
| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | 🚧 | WIP |

### Large Scale P/D and WideEP Features

Expand Down
20 changes: 10 additions & 10 deletions examples/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,15 @@ This directory contains practical examples demonstrating how to deploy and use D
> **Want to see a specific example?**
> Open a [GitHub issue](https://github.com/ai-dynamo/dynamo/issues) to request an example you'd like to see, or [open a pull request](https://github.com/ai-dynamo/dynamo/pulls) if you'd like to contribute your own!

## Framework Support

The /examples directory shows how Dynamo broadly works using major inference engines.

If you want to see advanced, framework-specific deployment patterns and best practices, check out the [Components Workflows](../components/backends/) directory:
- **[vLLM](../components/backends/vllm/)** – vLLM-specific deployment and configuration
- **[SGLang](../components/backends/sglang/)** – SGLang integration examples and workflows
- **[TensorRT-LLM](../components/backends/trtllm/)** – TensorRT-LLM workflows and optimizations

## Basics & Tutorials

Learn fundamental Dynamo concepts through these introductory examples:
Expand Down Expand Up @@ -67,13 +76,4 @@ Before running any examples, ensure you have:
- **Docker & Docker Compose** - For containerized services
- **CUDA-compatible GPU** - For LLM inference (except hello_world, which is non-GPU aware)
- **Python 3.9++** - For client scripts and utilities
- **Kubernetes cluster** - For any cloud deployment/K8s examples

## Framework Support

These examples show how Dynamo broadly works using major inference engines.

If you want to see advanced, framework-specific deployment patterns and best practices, check out the [Components Workflows](../components/backends/) directory:
- **[vLLM](../components/backends/vllm/)** – vLLM-specific deployment and configuration
- **[SGLang](../components/backends/sglang/)** – SGLang integration examples and workflows
- **[TensorRT-LLM](../components/backends/trtllm/)** – TensorRT-LLM workflows and optimizations
- **Kubernetes cluster** - For any cloud deployment/K8s examples
Loading