Merged

Changes from 1 commit
chore: cleanup dead links
nealvaidya committed Jul 31, 2025
commit 366c8f7c5b26e0c77413ad9b3f51a2725cfb5cc6
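The commit below fixes dead relative links across the repository's Markdown docs. Not part of the PR itself, but for context, this kind of audit can be sketched with a small script — `find_dead_links` and its regex are illustrative assumptions, not Dynamo tooling:

```python
import os
import re

# Markdown inline links: [text](target). A rough audit pattern -- it ignores
# reference-style links and autolinks, which is usually fine for README files.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)[^)]*\)")

def find_dead_links(root):
    """Return (markdown_file, link_target) pairs whose relative target is missing."""
    dead = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".md"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8") as f:
                text = f.read()
            for target in LINK_RE.findall(text):
                # Only audit relative filesystem links, not external URLs.
                if target.startswith(("http://", "https://", "mailto:")):
                    continue
                resolved = os.path.normpath(os.path.join(dirpath, target))
                if not os.path.exists(resolved):
                    dead.append((path, target))
    return dead
```

Run from a repository checkout, a sketch like this would flag links such as the `../../docs/` target in `components/README.md`, the first fix in this commit.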
1 change: 0 additions & 1 deletion benchmarks/llm/README.md
@@ -12,4 +12,3 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
 
-[../../examples/llm/benchmarks/README.md](../../examples/llm/benchmarks/README.md)
2 changes: 1 addition & 1 deletion components/README.md
@@ -77,4 +77,4 @@ To get started with Dynamo components:
 4. **Run deployment scripts** from the engine's launch directory
 5. **Monitor performance** using the metrics component
 
-For detailed instructions, see the README files in each component directory and the main [Dynamo documentation](../../docs/).
+For detailed instructions, see the README files in each component directory and the main [Dynamo documentation](../docs/).
2 changes: 1 addition & 1 deletion components/backends/llama_cpp/README.md
@@ -13,7 +13,7 @@ python -m dynamo.llama_cpp --model-path /data/models/Qwen3-0.6B-Q8_0.gguf [args]
 
 ## Request Migration
 
-In a [Distributed System](#distributed-system), a request may fail due to connectivity issues between the Frontend and the Backend.
+In a Distributed System, a request may fail due to connectivity issues between the Frontend and the Backend.
 
 The Frontend will automatically track which Backends are having connectivity issues with it and avoid routing new requests to the Backends with known connectivity issues.
 
6 changes: 2 additions & 4 deletions components/backends/sglang/README.md
@@ -52,8 +52,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 
 ## Quick Start
 
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node. See our different [architectures](../llm/README.md#deployment-architectures) for a high level overview of each pattern and the architecture diagram for each.
-
+Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
 ### Start NATS and ETCD in the background
 
 Start using [Docker Compose](../../../deploy/docker-compose.yml)
@@ -141,7 +140,7 @@ cd $DYNAMO_ROOT/components/backends/sglang
 
 ## Request Migration
 
-In a [Distributed System](#distributed-system), a request may fail due to connectivity issues between the Frontend and the Backend.
+In a Distributed System, a request may fail due to connectivity issues between the Frontend and the Backend.
 
 The Frontend will automatically track which Backends are having connectivity issues with it and avoid routing new requests to the Backends with known connectivity issues.
 
@@ -164,7 +163,6 @@ Below we provide a selected list of advanced examples. Please open up an issue i
 
 ### Large scale P/D disaggregation with WideEP
 - **[Run DeepSeek-R1 on 104+ H100s](docs/dsr1-wideep-h100.md)**
-- **[Run DeepSeek-R1 on GB200s](docs/dsr1-wideep-gb200.md)**
 
 ### Speculative Decoding
 - **[Deploying DeepSeek-R1 with MTP - coming soon!](.)**
2 changes: 1 addition & 1 deletion components/backends/sglang/docs/dsr1-wideep-h100.md
@@ -5,7 +5,7 @@ SPDX-License-Identifier: Apache-2.0
 
 # Running DeepSeek-R1 Disaggregated with WideEP on H100s
 
-Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://www.nvidia.com/en-us/technologies/ai/deepseek-r1-large-scale-p-d-with-wide-expert-parallelism/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
+Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
 
 ## Instructions
 
6 changes: 3 additions & 3 deletions components/backends/sglang/slurm_jobs/README.md
@@ -1,10 +1,10 @@
 # Example: Deploy Multi-node SGLang with Dynamo on SLURM
 
-This folder implements the example of [SGLang DeepSeek-R1 Disaggregated with WideEP](../dsr1-wideep.md) on a SLURM cluster.
+This folder implements the example of [SGLang DeepSeek-R1 Disaggregated with WideEP](../docs/dsr1-wideep-h100.md) on a SLURM cluster.
 
 ## Overview
 
-The scripts in this folder set up multiple cluster nodes to run the [SGLang DeepSeek-R1 Disaggregated with WideEP](../dsr1-wideep.md) example, with separate nodes handling prefill and decode.
+The scripts in this folder set up multiple cluster nodes to run the [SGLang DeepSeek-R1 Disaggregated with WideEP](../docs/dsr1-wideep-h100.md) example, with separate nodes handling prefill and decode.
 The node setup is done using Python job submission scripts with Jinja2 templates for flexible configuration. The setup also includes GPU utilization monitoring capabilities to track performance during benchmarks.
 
 ## Scripts
@@ -56,7 +56,7 @@ For simplicity of the example, we will make some assumptions about your SLURM cl
    If your cluster supports similar container based plugins, you may be able to
    modify the template to use that instead.
 3. We assume you have already built a recent Dynamo+SGLang container image as
-   described [here](../dsr1-wideep.md#instructions).
+   described [here](../docs/dsr1-wideep-h100.md#instructions).
    This is the image that can be passed to the `--container-image` argument in later steps.
 
 ## Usage
4 changes: 2 additions & 2 deletions components/backends/trtllm/README.md
@@ -204,7 +204,7 @@ Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disag
 
 ## Request Migration
 
-In a [Distributed System](#distributed-system), a request may fail due to connectivity issues between the Frontend and the Backend.
+In a Distributed System, a request may fail due to connectivity issues between the Frontend and the Backend.
 
 The Frontend will automatically track which Backends are having connectivity issues with it and avoid routing new requests to the Backends with known connectivity issues.
 
@@ -220,7 +220,7 @@ The migrated request will continue responding to the original request, allowing
 
 ## Client
 
-See [client](../llm/README.md#client) section to learn how to send request to the deployment.
+See the [quickstart guide](../../../examples/basics/quickstart/README.md#3-send-requests) to learn how to send request to the deployment.
 
 NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.
 
2 changes: 1 addition & 1 deletion components/backends/vllm/README.md
@@ -235,7 +235,7 @@ The [documentation](https://docs.vllm.ai/en/v0.9.2/configuration/serve_args.html
 
 ## Request Migration
 
-In a [Distributed System](#distributed-system), a request may fail due to connectivity issues between the Frontend and the Backend.
+In a Distributed System, a request may fail due to connectivity issues between the Frontend and the Backend.
 
 The Frontend will automatically track which Backends are having connectivity issues with it and avoid routing new requests to the Backends with known connectivity issues.
 
4 changes: 2 additions & 2 deletions deploy/cloud/README.md
@@ -21,6 +21,6 @@ This directory contains the infrastructure components required for the Dynamo cl
 
 For detailed documentation on setting up and using the Dynamo Cloud Platform, please refer to:
 - [Dynamo Cloud Platform Guide](../../docs/guides/dynamo_deploy/dynamo_cloud.md)
-- [Operator Deployment Guide](../../docs/guides/dynamo_deploy/operator_deployment.md)
+- [Operator Deployment Guide](../../docs/guides/dynamo_deploy/dynamo_operator.md)
 
-For a quick start example, see [examples/hello_world/README.md#deploying-to-kubernetes-using-dynamo-cloud-and-dynamo-deploy-cli](../../examples/hello_world/README.md#deploying-to-kubernetes-using-dynamo-cloud-and-dynamo-deploy-cli)
+For a quick start example, see [examples/runtime/hello_world/README.md#deployment-to-kubernetes](../../examples/runtime/hello_world/README.md#deployment-to-kubernetes)
3 changes: 1 addition & 2 deletions deploy/inference-gateway/README.md
@@ -18,8 +18,7 @@ Currently, this setup is only kgateway based Inference Gateway.
 
 1. **Install Dynamo Platform**
 
-   [See Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
-
+   [See Quickstart Guide](../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
 
 2. **Deploy Inference Gateway**
 
2 changes: 1 addition & 1 deletion deploy/metrics/README.md
@@ -87,7 +87,7 @@ Grafana is pre-configured with:
 ## Required Files
 
 The following configuration files should be present in this directory:
-- [docker-compose.yml](./docker-compose.yml): Defines the Prometheus and Grafana services
+- [docker-compose.yml](../docker-compose.yml): Defines the Prometheus and Grafana services
 - [prometheus.yml](./prometheus.yml): Contains Prometheus scraping configuration
 - [grafana-datasources.yml](./grafana-datasources.yml): Contains Grafana datasource configuration
 - [grafana_dashboards/grafana-dashboard-providers.yml](./grafana_dashboards/grafana-dashboard-providers.yml): Contains Grafana dashboard provider configuration
2 changes: 1 addition & 1 deletion docs/API/nixl_connect/connector.md
@@ -28,7 +28,7 @@ The connector provides two methods of moving data between workers:
 
 - Preparing local memory to be read by a remote worker.
 
-In both cases, local memory is registered with the NIXL-based RDMA subsystem via the [`Descriptor`](#descriptor) class and provided to the connector.
+In both cases, local memory is registered with the NIXL-based RDMA subsystem via the [`Descriptor`](descriptor.md) class and provided to the connector.
 The connector then configures the RDMA subsystem to expose the memory for the requested operation and returns an operation control object.
 The operation control object, either a [`ReadableOperation`](readable_operation.md) or a [`WritableOperation`](writable_operation.md),
 provides RDMA metadata ([RdmaMetadata](rdma_metadata.md)) via its `.metadata()` method, functionality to query the operation's current state, as well as the ability to cancel the operation prior to its completion.
2 changes: 1 addition & 1 deletion docs/architecture/dynamo_flow.md
@@ -17,7 +17,7 @@ limitations under the License.
 
 # Dynamo Architecture Flow
 
-This diagram shows the NVIDIA Dynamo disaggregated inference system as implemented in [examples/llm](https://github.com/ai-dynamo/dynamo/tree/main/examples/llm). Color-coded flows indicate different types of operations:
+This diagram shows the NVIDIA Dynamo disaggregated inference system as implemented in [examples/llm](https://github.com/ai-dynamo/dynamo/tree/v0.3.2/examples/llm). Color-coded flows indicate different types of operations:
 
 ## 🔵 Main Request Flow (Blue)
 The primary user journey through the system:
1 change: 0 additions & 1 deletion docs/components/backends/llm/README.md

This file was deleted.

3 changes: 1 addition & 2 deletions docs/guides/dynamo_deploy/README.md
@@ -38,5 +38,4 @@ Users who need more control over their deployments can use the manual deployment
 - Provides full control over deployment parameters
 - Requires manual management of infrastructure components
 - Documentation:
-  - [Using the Deployment Script](manual_helm_deployment.md#using-the-deployment-script): all-in-one script for manual deployment
-  - [Helm Deployment Guide](manual_helm_deployment.md#helm-deployment-guide): detailed instructions for manual deployment
+  - [Helm Deployment Guide](../../../deploy/helm/README.md): detailed instructions for manual deployment
1 change: 0 additions & 1 deletion docs/guides/dynamo_deploy/operator_deployment.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/guides/dynamo_deploy/quickstart.md
@@ -67,7 +67,7 @@ Ensure you have the source code checked out and are in the `dynamo` directory:
 
 ### Set Environment Variables
 
-Our examples use the [`nvcr.io`](https://nvcr.io/nvidia/ai-dynamo/) but you can setup your own values if you use another docker registry.
+Our examples use the `nvcr.io` but you can setup your own values if you use another docker registry.
 
 ```bash
 export NAMESPACE=dynamo-cloud # or whatever you prefer.
10 changes: 3 additions & 7 deletions docs/guides/dynamo_run.md
@@ -211,7 +211,7 @@ The KV-aware routing arguments:
 
 ### Request Migration
 
-In a [Distributed System](#distributed-system), a request may fail due to connectivity issues between the HTTP Server and the Worker Engine.
+In a Distributed System, a request may fail due to connectivity issues between the HTTP Server and the Worker Engine.
 
 The HTTP Server will automatically track which Worker Engines are having connectivity issues with it and avoid routing new requests to the Engines with known connectivity issues.
 
@@ -482,11 +482,11 @@ The trtllm engine requires [etcd](https://etcd.io/) and [nats](https://nats.io/)
 
 ##### Step 1: Build the environment
 
-See instructions [here](https://github.com/ai-dynamo/dynamo/blob/main/examples/tensorrt_llm/README.md#build-docker) to build the dynamo container with TensorRT-LLM.
+See instructions [here](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/trtllm#build-container) to build the dynamo container with TensorRT-LLM.
 
 ##### Step 2: Run the environment
 
-See instructions [here](https://github.com/ai-dynamo/dynamo/blob/main/examples/tensorrt_llm/README.md#run-container) to run the built environment.
+See instructions [here](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/trtllm#run-container) to run the built environment.
 
 ##### Step 3: Execute `dynamo-run` command
 
@@ -679,10 +679,6 @@ Here are some example engines:
 - Chat:
   * [sglang](https://github.com/ai-dynamo/dynamo/blob/main/lib/bindings/python/examples/hello_world/server_sglang_tok.py)
 
-More fully-featured Backend engines (used by `dynamo-run`):
-- [vllm](https://github.com/ai-dynamo/dynamo/blob/main/launch/dynamo-run/src/subprocess/vllm_inc.py)
-- [sglang](https://github.com/ai-dynamo/dynamo/blob/main/launch/dynamo-run/src/subprocess/sglang_inc.py)
-
 ### Debugging
 
 `dynamo-run` and `dynamo-runtime` support [tokio-console](https://github.com/tokio-rs/console). Build with the feature to enable:
4 changes: 2 additions & 2 deletions examples/basics/disaggregated_serving/README.md
@@ -37,8 +37,8 @@ docker compose -f deploy/metrics/docker-compose.yml up -d
 
 ## Components
 
 - [Frontend](../../../components/frontend/README) - HTTP API endpoint that receives requests and forwards them to the decode worker
-- [vLLM Prefill Worker](../../../components/backends/vllm/README) - Specialized worker for prefill phase execution
-- [vLLM Decode Worker](../../../components/backends/vllm/README) - Specialized worker that handles requests and decides between local/remote prefill
+- [vLLM Prefill Worker](../../../components/backends/vllm/README.md) - Specialized worker for prefill phase execution
+- [vLLM Decode Worker](../../../components/backends/vllm/README.md) - Specialized worker that handles requests and decides between local/remote prefill
 
 ```mermaid
 ---
2 changes: 1 addition & 1 deletion examples/basics/multimodal/README.md
@@ -35,7 +35,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 
 ### Components
 
+- workers: For aggregated serving, we have two workers, [encode_worker](../../../components/encode_worker.py) for encoding and [decode_worker](components/decode_worker.py) for prefilling and decoding.
 - processor: Tokenizes the prompt and passes it to the decode worker.
 - frontend: HTTP endpoint to handle incoming requests.
 
2 changes: 1 addition & 1 deletion examples/basics/multinode/README.md
@@ -85,7 +85,7 @@ Install Dynamo with [SGLang](https://docs.sglang.ai/) support:
 pip install ai-dynamo[sglang]
 ```
 
-For more information about the SGLang backend and its integration with Dynamo, see the [SGLang Backend Documentation](../../components/backends/sglang/README.md).
+For more information about the SGLang backend and its integration with Dynamo, see the [SGLang Backend Documentation](../../../components/backends/sglang/README.md).
 
 ### 3. Network Requirements
 
2 changes: 1 addition & 1 deletion examples/basics/quickstart/README.md
@@ -18,7 +18,7 @@ docker compose -f deploy/metrics/docker-compose.yml up -d
 
 ## Components
 
 - [Frontend](../../../components/frontend/README) - A built-in component that launches an OpenAI compliant HTTP server, a pre-processor, and a router in a single process
-- [vLLM Backend](../../../components/backends/vllm/README) - A built-in component that runs vLLM within the Dynamo runtime
+- [vLLM Backend](../../../components/backends/vllm/README.md) - A built-in component that runs vLLM within the Dynamo runtime
 
 ```mermaid
 ---