Skip to content
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Prev Previous commit
Next Next commit
docs: clean up multinode example
  • Loading branch information
nealvaidya committed Jul 29, 2025
commit 808daac2204a6afbb8fc898a482bece7b0477573
42 changes: 24 additions & 18 deletions examples/basics/multinode/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,10 @@

This example demonstrates running Dynamo across multiple nodes with **KV-aware routing** to distribute requests between two replicas of a disaggregated model. Each replica consists of dedicated prefill and decode workers, providing high availability and load distribution.

For more information about the core concepts, see:
- [Dynamo Disaggregated Serving](../../docs/architecture/disagg_serving.md)
- [KV Cache Routing Architecture](../../docs/architecture/kv_cache_routing.md)

## Architecture Overview

The multi-node setup consists of:
Expand All @@ -16,19 +20,19 @@ title: Multi-Node Architecture with Full KV Routing (SGLang)
flowchart TD
Client["Users/Clients<br/>(HTTP)"] --> Frontend["Frontend<br/>KV-Aware Router<br/>(Any Node)"]

Frontend --> Router{KV Routing<br/>Decision<br/>(All Workers)}

Router -->|KV-aware route| Decode1["Decode Worker 1<br/>(Node 1)<br/>✓ Emits KV Events"]
Router -->|KV-aware route| Decode2["Decode Worker 2<br/>(Node 2)<br/>✓ Emits KV Events"]
Frontend --> Router{KV Routing<br/>Decision}

Router -->|KV-aware route| Prefill1["Prefill Worker 1<br/>(Node 1)<br/>✓ Emits KV Events"]
Router -->|KV-aware route| Prefill2["Prefill Worker 2<br/>(Node 2)<br/>✓ Emits KV Events"]
Router --> Prefill1["Prefill Worker 1"]
Router --> Prefill2["Prefill Worker 2"]

Prefill1 -->|NIXL Transfer| Decode1
Prefill2 -->|NIXL Transfer| Decode2

Decode1 --> Frontend
Decode2 --> Frontend
Prefill1 -.->|KV Events| Frontend
Prefill2 -.->|KV Events| Frontend

Decode1 --> |Response| Frontend
Decode2 --> |Response| Frontend

Frontend --> Client

Expand All @@ -42,11 +46,6 @@ flowchart TD
Prefill2
end

etcd["etcd<br/>Service Discovery"] -.-> Frontend
etcd -.-> Decode1
etcd -.-> Decode2
etcd -.-> Prefill1
etcd -.-> Prefill2
```

## What is KV-Aware Routing?
Expand Down Expand Up @@ -80,13 +79,17 @@ Note the IP address of this node - you'll need it for worker configuration.

### 2. Software Requirements

Install SGLang and dependencies:
Install [SGLang](https://docs.sglang.ai/) and dependencies:

```bash
pip install sglang[all]
# or follow SGLang installation guide for latest version
```

For more information about the SGLang backend and its integration with Dynamo, see the [SGLang Backend Documentation](../../components/backends/sglang/README.md).

**Note**: This example assumes you have Dynamo installed in your Python environment. If not, ensure you're in the Dynamo development environment or have installed it according to the main project documentation.

### 3. Network Requirements

Ensure the following ports are accessible between nodes:
Expand All @@ -110,6 +113,7 @@ On all nodes, set the etcd and NATS endpoints:

```bash
# Replace with your infrastructure node's IP
# To find your IP address, use: hostname -I | awk '{print $1}'
export ETCD_ENDPOINTS=http://<INFRA_NODE_IP>:2379
export NATS_SERVER=nats://<INFRA_NODE_IP>:4222
export DYN_LOG=debug # Enable debug logging to see routing decisions
Expand Down Expand Up @@ -187,11 +191,13 @@ The frontend will:
- Enable KV-aware routing for intelligent request distribution
- Monitor worker health and adjust routing accordingly

For more details about frontend configuration options, see the [Frontend Component Documentation](../../components/frontend/README).

## Testing the Setup

### Prerequisites

Install the OpenAI Python client library:
Install the [OpenAI Python client](https://github.com/openai/openai-python) library:

```bash
pip install openai
Expand Down Expand Up @@ -335,7 +341,7 @@ python -m dynamo.frontend \

However, for maximum performance with shared prefixes and multi-turn conversations, KV routing provides significant advantages by minimizing redundant computation.


For detailed router configuration and tuning options, see the [KV Router Documentation](../../docs/components/router/README.md).

## Monitoring and Debugging

Expand Down Expand Up @@ -403,8 +409,8 @@ For production deployments, you can fine-tune KV routing behavior:
python -m dynamo.frontend \
--http-port 8000 \
--router-mode kv \
--kv-overlap-weight 1.0 # Weight for cache overlap scoring \
--kv-temperature 0.0 # Temperature for probabilistic routing (0 = deterministic)
--kv-overlap-score-weight 1.0 # Weight for cache overlap scoring \
--router-temperature 0.0 # Temperature for probabilistic routing (0 = deterministic)
```

For more advanced configuration options including custom worker selection, block size tuning, and alternative indexing strategies, see the [KV Cache Routing documentation](../../docs/architecture/kv_cache_routing.md).
Expand Down