chore: cleanup dead links (#2208)

Co-authored-by: Dmitry Tokarev <[email protected]>
ai-dynamo · biswapanda · Jul 30, 2025 · Jul 30, 2025 · Jul 30, 2025 · Jul 31, 2025
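
A change like this one could be checked mechanically before merging. The following is a minimal sketch, not part of this commit or of any tooling in the repository: the helper name `find_dead_links` and the link regex are assumptions made for illustration. It scans one markdown file for inline links and reports the relative targets that do not resolve to an existing path. External URLs and same-page anchors are skipped, and `#fragment` suffixes are ignored when checking the path.

```python
import re
from pathlib import Path

# Inline markdown links of the form [text](target); the target ends at
# the first ')' or whitespace character.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

# Anything that starts with a URI scheme such as https: or mailto:.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_dead_links(markdown_file: Path) -> list[str]:
    """Return relative link targets in markdown_file that are missing on disk.

    Hypothetical helper for illustration only.
    """
    dead = []
    text = markdown_file.read_text(encoding="utf-8")
    for target in LINK_RE.findall(text):
        # External URLs and same-page anchors are out of scope here.
        if SCHEME_RE.match(target) or target.startswith("#"):
            continue
        # Drop any #fragment before resolving against the file's directory.
        path_part = target.split("#", 1)[0]
        resolved = (markdown_file.parent / path_part).resolve()
        if not resolved.exists():
            dead.append(target)
    return dead
```

Calling `find_dead_links(Path("components/README.md"))` from the repository root would return the list of unresolved relative targets in that file. Anchor fragments such as `#distributed-system` are not validated by this sketch; checking them would require parsing headings as well.
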
commit 65e89b30a90068c4ae1d34263c9fca448fa8c468
diff --git a/benchmarks/llm/README.md b/benchmarks/llm/README.md
@@ -12,4 +12,3 @@ See the License for the specific language governing permissions and
 limitations under the License.
 -->
 
-[../../examples/llm/benchmarks/README.md](../../examples/llm/benchmarks/README.md)
diff --git a/components/README.md b/components/README.md
@@ -77,4 +77,4 @@ To get started with Dynamo components:
 4. **Run deployment scripts** from the engine's launch directory
 5. **Monitor performance** using the metrics component
 
-For detailed instructions, see the README files in each component directory and the main [Dynamo documentation](../../docs/).
+For detailed instructions, see the README files in each component directory and the main [Dynamo documentation](../docs/).
diff --git a/components/backends/llama_cpp/README.md b/components/backends/llama_cpp/README.md
@@ -13,7 +13,7 @@ python -m dynamo.llama_cpp --model-path /data/models/Qwen3-0.6B-Q8_0.gguf [args]
 
 ## Request Migration
 
-In a [Distributed System](#distributed-system), a request may fail due to connectivity issues between the Frontend and the Backend.
+In a Distributed System, a request may fail due to connectivity issues between the Frontend and the Backend.
 
 The Frontend will automatically track which Backends are having connectivity issues with it and avoid routing new requests to the Backends with known connectivity issues.
 

diff --git a/components/backends/sglang/README.md b/components/backends/sglang/README.md
@@ -52,8 +52,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 
 ## Quick Start
 
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node. See our different [architectures](../llm/README.md#deployment-architectures) for a high level overview of each pattern and the architecture diagram for each.
-
+Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
 ### Start NATS and ETCD in the background
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
-### Start NATS and ETCD in the background
+Below we provide a guide that lets you run all of our common deployment patterns on a single node.
+### Start NATS and ETCD in the background
 
 Start using [Docker Compose](../../../deploy/docker-compose.yml)
@@ -141,7 +140,7 @@ cd $DYNAMO_ROOT/components/backends/sglang
 
 ## Request Migration
 
-In a [Distributed System](#distributed-system), a request may fail due to connectivity issues between the Frontend and the Backend.
+In a Distributed System, a request may fail due to connectivity issues between the Frontend and the Backend.
 
 The Frontend will automatically track which Backends are having connectivity issues with it and avoid routing new requests to the Backends with known connectivity issues.
 
@@ -164,7 +163,6 @@ Below we provide a selected list of advanced examples. Please open up an issue i
 
 ### Large scale P/D disaggregation with WideEP
 - **[Run DeepSeek-R1 on 104+ H100s](docs/dsr1-wideep-h100.md)**
-- **[Run DeepSeek-R1 on GB200s](docs/dsr1-wideep-gb200.md)**
 
 ### Speculative Decoding
 - **[Deploying DeepSeek-R1 with MTP - coming soon!](.)**

diff --git a/components/backends/sglang/docs/dsr1-wideep-h100.md b/components/backends/sglang/docs/dsr1-wideep-h100.md
@@ -5,7 +5,7 @@ SPDX-License-Identifier: Apache-2.0
 
 # Running DeepSeek-R1 Disaggregated with WideEP on H100s
 
-Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://www.nvidia.com/en-us/technologies/ai/deepseek-r1-large-scale-p-d-with-wide-expert-parallelism/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
+Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
-Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
+Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-wideep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
 
 ## Instructions
 

diff --git a/components/backends/sglang/slurm_jobs/README.md b/components/backends/sglang/slurm_jobs/README.md
@@ -1,10 +1,10 @@
 # Example: Deploy Multi-node SGLang with Dynamo on SLURM
 
-This folder implements the example of [SGLang DeepSeek-R1 Disaggregated with WideEP](../dsr1-wideep.md) on a SLURM cluster.
+This folder implements the example of [SGLang DeepSeek-R1 Disaggregated with WideEP](../docs/dsr1-wideep-h100.md) on a SLURM cluster.
 
 ## Overview
 
-The scripts in this folder set up multiple cluster nodes to run the [SGLang DeepSeek-R1 Disaggregated with WideEP](../dsr1-wideep.md) example, with separate nodes handling prefill and decode.
+The scripts in this folder set up multiple cluster nodes to run the [SGLang DeepSeek-R1 Disaggregated with WideEP](../docs/dsr1-wideep-h100.md) example, with separate nodes handling prefill and decode.
 The node setup is done using Python job submission scripts with Jinja2 templates for flexible configuration. The setup also includes GPU utilization monitoring capabilities to track performance during benchmarks.
 
 ## Scripts
@@ -56,7 +56,7 @@ For simplicity of the example, we will make some assumptions about your SLURM cl
    If your cluster supports similar container based plugins, you may be able to
    modify the template to use that instead.
 3. We assume you have already built a recent Dynamo+SGLang container image as
-   described [here](../dsr1-wideep.md#instructions).
+   described [here](../docs/dsr1-wideep-h100.md#instructions).
    This is the image that can be passed to the `--container-image` argument in later steps.
 
 ## Usage

diff --git a/components/backends/trtllm/README.md b/components/backends/trtllm/README.md
@@ -263,7 +263,7 @@ Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disag
 
 ## Request Migration
 
-In a [Distributed System](#distributed-system), a request may fail due to connectivity issues between the Frontend and the Backend.
+In a Distributed System, a request may fail due to connectivity issues between the Frontend and the Backend.
 
 The Frontend will automatically track which Backends are having connectivity issues with it and avoid routing new requests to the Backends with known connectivity issues.
 
@@ -279,7 +279,7 @@ The migrated request will continue responding to the original request, allowing
 
 ## Client
 
-See [client](../llm/README.md#client) section to learn how to send request to the deployment.
+See the [quickstart guide](../../../examples/basics/quickstart/README.md#3-send-requests) to learn how to send request to the deployment.
 
 NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.
 

diff --git a/components/backends/vllm/README.md b/components/backends/vllm/README.md
@@ -235,7 +235,7 @@ The [documentation](https://docs.vllm.ai/en/v0.9.2/configuration/serve_args.html
 
 ## Request Migration
 
-In a [Distributed System](#distributed-system), a request may fail due to connectivity issues between the Frontend and the Backend.
+In a Distributed System, a request may fail due to connectivity issues between the Frontend and the Backend.
 
 The Frontend will automatically track which Backends are having connectivity issues with it and avoid routing new requests to the Backends with known connectivity issues.
 

@@ -21,6 +21,6 @@ This directory contains the infrastructure components required for the Dynamo cl
 
 For detailed documentation on setting up and using the Dynamo Cloud Platform, please refer to:
 - [Dynamo Cloud Platform Guide](../../docs/guides/dynamo_deploy/dynamo_cloud.md)
-- [Operator Deployment Guide](../../docs/guides/dynamo_deploy/operator_deployment.md)
+- [Operator Deployment Guide](../../docs/guides/dynamo_deploy/dynamo_operator.md)
 
-For a quick start example, see [examples/hello_world/README.md#deploying-to-kubernetes-using-dynamo-cloud-and-dynamo-deploy-cli](../../examples/hello_world/README.md#deploying-to-kubernetes-using-dynamo-cloud-and-dynamo-deploy-cli)
+For a quick start example, see [examples/runtime/hello_world/README.md#deployment-to-kubernetes](../../examples/runtime/hello_world/README.md#deployment-to-kubernetes)
@@ -18,8 +18,7 @@ Currently, this setup is only kgateway based Inference Gateway.
 
 1. **Install Dynamo Platform**
 
-[See Quickstart Guide](../../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
-
+[See Quickstart Guide](../../docs/guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
 
 2. **Deploy Inference Gateway**
 

@@ -87,7 +87,7 @@ Grafana is pre-configured with:
 ## Required Files
 
 The following configuration files should be present in this directory:
-- [docker-compose.yml](./docker-compose.yml): Defines the Prometheus and Grafana services
+- [docker-compose.yml](../docker-compose.yml): Defines the Prometheus and Grafana services
 - [prometheus.yml](./prometheus.yml): Contains Prometheus scraping configuration
 - [grafana-datasources.yml](./grafana-datasources.yml): Contains Grafana datasource configuration
 - [grafana_dashboards/grafana-dashboard-providers.yml](./grafana_dashboards/grafana-dashboard-providers.yml): Contains Grafana dashboard provider configuration

diff --git a/docs/API/nixl_connect/connector.md b/docs/API/nixl_connect/connector.md
@@ -28,7 +28,7 @@ The connector provides two methods of moving data between workers:
 
   - Preparing local memory to be read by a remote worker.
 
-In both cases, local memory is registered with the NIXL-based RDMA subsystem via the [`Descriptor`](#descriptor) class and provided to the connector.
+In both cases, local memory is registered with the NIXL-based RDMA subsystem via the [`Descriptor`](descriptor.md) class and provided to the connector.
 The connector then configures the RDMA subsystem to expose the memory for the requested operation and returns an operation control object.
 The operation control object, either a [`ReadableOperation`](readable_operation.md) or a [`WritableOperation`](writable_operation.md),
 provides RDMA metadata ([RdmaMetadata](rdma_metadata.md)) via its `.metadata()` method, functionality to query the operation's current state, as well as the ability to cancel the operation prior to its completion.

diff --git a/docs/architecture/dynamo_flow.md b/docs/architecture/dynamo_flow.md
@@ -17,7 +17,7 @@ limitations under the License.
 
 # Dynamo Architecture Flow
 
-This diagram shows the NVIDIA Dynamo disaggregated inference system as implemented in [examples/llm](https://github.com/ai-dynamo/dynamo/tree/main/examples/llm). Color-coded flows indicate different types of operations:
+This diagram shows the NVIDIA Dynamo disaggregated inference system as implemented in [examples/llm](https://github.com/ai-dynamo/dynamo/tree/v0.3.2/examples/llm). Color-coded flows indicate different types of operations:
 
 ## 🔵 Main Request Flow (Blue)
 The primary user journey through the system:

diff --git a/docs/components/backends/llm/README.md b/docs/components/backends/llm/README.md
diff --git a/docs/guides/dynamo_deploy/README.md b/docs/guides/dynamo_deploy/README.md
@@ -38,5 +38,4 @@ Users who need more control over their deployments can use the manual deployment
 - Provides full control over deployment parameters
 - Requires manual management of infrastructure components
 - Documentation:
-  - [Using the Deployment Script](manual_helm_deployment.md#using-the-deployment-script): all-in-one script for manual deployment
-  - [Helm Deployment Guide](manual_helm_deployment.md#helm-deployment-guide): detailed instructions for manual deployment
+  - [Helm Deployment Guide](../../../deploy/helm/README.md): detailed instructions for manual deployment
diff --git a/docs/guides/dynamo_deploy/operator_deployment.md b/docs/guides/dynamo_deploy/operator_deployment.md
diff --git a/docs/guides/dynamo_deploy/quickstart.md b/docs/guides/dynamo_deploy/quickstart.md
@@ -67,7 +67,7 @@ Ensure you have the source code checked out and are in the `dynamo` directory:
 
 ### Set Environment Variables
 
-Our examples use the [`nvcr.io`](https://nvcr.io/nvidia/ai-dynamo/) but you can setup your own values if you use another docker registry.
+Our examples use the `nvcr.io` but you can setup your own values if you use another docker registry.
 
 ```bash
 export NAMESPACE=dynamo-cloud # or whatever you prefer.

diff --git a/docs/guides/dynamo_run.md b/docs/guides/dynamo_run.md
@@ -211,7 +211,7 @@ The KV-aware routing arguments:
 
 ### Request Migration
 
-In a [Distributed System](#distributed-system), a request may fail due to connectivity issues between the HTTP Server and the Worker Engine.
+In a Distributed System, a request may fail due to connectivity issues between the HTTP Server and the Worker Engine.
 
 The HTTP Server will automatically track which Worker Engines are having connectivity issues with it and avoid routing new requests to the Engines with known connectivity issues.
 
@@ -482,11 +482,11 @@ The trtllm engine requires [etcd](https://etcd.io/) and [nats](https://nats.io/)
 
 ##### Step 1: Build the environment
 
-See instructions [here](https://github.com/ai-dynamo/dynamo/blob/main/examples/tensorrt_llm/README.md#build-docker) to build the dynamo container with TensorRT-LLM.
+See instructions [here](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/trtllm#build-container) to build the dynamo container with TensorRT-LLM.
 
 ##### Step 2: Run the environment
 
-See instructions [here](https://github.com/ai-dynamo/dynamo/blob/main/examples/tensorrt_llm/README.md#run-container) to run the built environment.
+See instructions [here](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/trtllm#run-container) to run the built environment.
 
 ##### Step 3: Execute `dynamo-run` command
 
@@ -679,10 +679,6 @@ Here are some example engines:
 - Chat:
     * [sglang](https://github.com/ai-dynamo/dynamo/blob/main/lib/bindings/python/examples/hello_world/server_sglang_tok.py)
 
-More fully-featured Backend engines (used by `dynamo-run`):
-- [vllm](https://github.com/ai-dynamo/dynamo/blob/main/launch/dynamo-run/src/subprocess/vllm_inc.py)
-- [sglang](https://github.com/ai-dynamo/dynamo/blob/main/launch/dynamo-run/src/subprocess/sglang_inc.py)
-
 ### Debugging
 
 `dynamo-run` and `dynamo-runtime` support [tokio-console](https://github.com/tokio-rs/console). Build with the feature to enable:

@@ -37,8 +37,8 @@ docker compose -f deploy/metrics/docker-compose.yml up -d
 ## Components
 
 - [Frontend](../../../components/frontend/README) - HTTP API endpoint that receives requests and forwards them to the decode worker
-- [vLLM Prefill Worker](../../../components/backends/vllm/README) - Specialized worker for prefill phase execution
-- [vLLM Decode Worker](../../../components/backends/vllm/README) - Specialized worker that handles requests and decides between local/remote prefill
+- [vLLM Prefill Worker](../../../components/backends/vllm/README.md) - Specialized worker for prefill phase execution
+- [vLLM Decode Worker](../../../components/backends/vllm/README.md) - Specialized worker that handles requests and decides between local/remote prefill
 
 ```mermaid
 ---

@@ -35,7 +35,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))
 
 ### Components
 
-- workers: For aggregated serving, we have two workers, [encode_worker](components/encode_worker.py) for encoding and [decode_worker](components/decode_worker.py) for prefilling and decoding.
+- workers: For aggregated serving, we have two workers, [encode_worker](../../../components/encode_worker.py) for encoding and [decode_worker](components/decode_worker.py) for prefilling and decoding.
 - processor: Tokenizes the prompt and passes it to the decode worker.
 - frontend: HTTP endpoint to handle incoming requests.
 

@@ -85,7 +85,7 @@ Install Dynamo with [SGLang](https://docs.sglang.ai/) support:
 pip install ai-dynamo[sglang]
 ```
 
-For more information about the SGLang backend and its integration with Dynamo, see the [SGLang Backend Documentation](../../components/backends/sglang/README.md).
+For more information about the SGLang backend and its integration with Dynamo, see the [SGLang Backend Documentation](../../../components/backends/sglang/README.md).
 
 ### 3. Network Requirements
 

@@ -18,7 +18,7 @@ docker compose -f deploy/metrics/docker-compose.yml up -d
 ## Components
 
 - [Frontend](../../../components/frontend/README) - A built-in component that launches an OpenAI compliant HTTP server, a pre-processor, and a router in a single process
-- [vLLM Backend](../../../components/backends/vllm/README) - A built-in component that runs vLLM within the Dynamo runtime
+- [vLLM Backend](../../../components/backends/vllm/README.md) - A built-in component that runs vLLM within the Dynamo runtime
 
 ```mermaid
 ---