Merged
Changes from 1 commit
21 commits
c83ca40
cp(#2351): Move backend READMEs to docs folder and fix relative path …
rmccorm4 Aug 16, 2025
5bb296a
cp(#2346): Move hello_world example README to docs, swap symlinks, fi…
rmccorm4 Aug 16, 2025
ce14f57
docs: Copy over index.rst and hidden_toctree.rst from v0.4.0 to main
rmccorm4 Aug 17, 2025
5a8d52e
Revert "cp(#2351): Move backend READMEs to docs folder and fix relati…
rmccorm4 Aug 17, 2025
016d2c1
Revert "cp(#2346): Move hello_world example README to docs, swap syml…
rmccorm4 Aug 17, 2025
915847c
Rename multimodal_v1 to multimodal, and fix sglang link
rmccorm4 Aug 17, 2025
201c7f9
Bring back missing benchmark README
rmccorm4 Aug 17, 2025
241c614
Fix all broken links caught by lychee
rmccorm4 Aug 17, 2025
eac4cc5
Add github action for link validation (lychee)
rmccorm4 Aug 17, 2025
fc63b57
Update RELEASE_VERSION to 0.4.0 in dynamo_deploy quickstart
rmccorm4 Aug 17, 2025
ced1a73
Merge branch 'main' into rmccormick/cp_anish_docs_to_main
rmccorm4 Aug 17, 2025
8d6be0a
Remove benchmark README for easier review - will restore in a separat…
rmccorm4 Aug 17, 2025
f7f9350
Merge branch 'rmccormick/cp_anish_docs_to_main' of github.com:ai-dyna…
rmccorm4 Aug 17, 2025
a8e5396
Remove unused env var from link check action
rmccorm4 Aug 17, 2025
8d0e50b
Add WAR for lychee cert error
rmccorm4 Aug 17, 2025
c3ec608
Address CodeRabbit feedback
rmccorm4 Aug 17, 2025
4381aa8
Address CodeRabbit feedback - add TODO in workflow for lychee install
rmccorm4 Aug 17, 2025
ef0b231
Try installing ca-certs for cert errors
rmccorm4 Aug 17, 2025
fdfd807
Set GITHUB_TOKEN to avoid github rate limits on URL checks
rmccorm4 Aug 17, 2025
6b7c690
Add lychee result caching
rmccorm4 Aug 17, 2025
3e60e42
Add lychee result caching docs reference
rmccorm4 Aug 17, 2025
Address CodeRabbit feedback
rmccorm4 committed Aug 17, 2025
commit c3ec608975709e65ec4f297b16da18438588b78f
2 changes: 1 addition & 1 deletion components/backends/sglang/README.md
@@ -52,7 +52,7 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

## Quick Start

-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all of our common deployment patterns on a single node.

### Start NATS and ETCD in the background

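For context on the "Start NATS and ETCD in the background" step referenced above, a minimal sketch (assuming the `nats-server` and `etcd` binaries are installed locally; the repo's own scripts may use Docker Compose instead) is:

```bash
# Minimal local setup sketch: run NATS with JetStream and etcd with default settings.
# This is an illustration, not the exact commands from the repository.
nats-server -js > nats.log 2>&1 &
etcd > etcd.log 2>&1 &
```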
2 changes: 1 addition & 1 deletion components/backends/sglang/deploy/README.md
@@ -159,4 +159,4 @@ Common issues and solutions:
3. **Health check failures**: Review model loading logs and increase `initialDelaySeconds`
4. **Out of memory**: Increase memory limits or reduce model batch size

-For additional support, refer to the [deployment troubleshooting guide](../../../../docs/guides/dynamo_deploy/quickstart.md#troubleshooting).
+For additional support, refer to the [deployment guide](../../../../docs/guides/dynamo_deploy/quickstart.md).
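As an aside on the troubleshooting items above, reviewing model loading logs in a Kubernetes deployment typically looks something like the following (the namespace and label selector are assumptions for illustration):

```bash
# Hypothetical commands for inspecting a failing worker pod;
# replace the namespace and label selector with your deployment's values.
kubectl -n dynamo get pods -l app=sglang-worker
kubectl -n dynamo logs -l app=sglang-worker --tail=200
kubectl -n dynamo describe pod -l app=sglang-worker
```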
2 changes: 1 addition & 1 deletion components/backends/sglang/docs/dsr1-wideep-h100.md
@@ -5,7 +5,7 @@ SPDX-License-Identifier: Apache-2.0

# Running DeepSeek-R1 Disaggregated with WideEP on H100s

-Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
+Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-wideep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).

## Instructions

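The Dockerfile path mentioned in the changed line above can be built in the usual way from the repository root; a sketch (the image tag is an arbitrary choice for illustration):

```bash
# Build the WideEP container image; the tag below is illustrative.
docker build -f container/Dockerfile.sglang-wideep -t dynamo-sglang-wideep:latest .
```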
6 changes: 3 additions & 3 deletions components/backends/trtllm/README.md
@@ -193,7 +193,7 @@ For complete Kubernetes deployment instructions, configurations, and troubleshoo

### Client

-See [client](../vllm/README.md#client) section to learn how to send request to the deployment.
+See [client](../sglang/README.md#testing-the-deployment) section to learn how to send request to the deployment.

NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.

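For the client pointer above, a request typically goes to the frontend's OpenAI-compatible HTTP endpoint. A hedged example follows; the host, port, and model name are assumptions and depend on your deployment:

```bash
# Illustrative request; point this at the node running `python3 -m dynamo.frontend <args>`.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-0.6B",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```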
@@ -218,7 +218,7 @@ DISAGGREGATION_STRATEGY="prefill_first" ./launch/disagg.sh

## KV Cache Transfer in Disaggregated Serving

-Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](./kv-cache-tranfer.md).
+Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](./kv-cache-transfer.md).


## Request Migration
@@ -233,7 +233,7 @@ This allows a request to be migrated up to 3 times before failing. See the [Requ

## Client

-See [client](../vllm/README.md#client) section to learn how to send request to the deployment.
+See [client](../sglang/README.md#testing-the-deployment) section to learn how to send request to the deployment.

NOTE: To send a request to a multi-node deployment, target the node which is running `python3 -m dynamo.frontend <args>`.

2 changes: 1 addition & 1 deletion components/backends/trtllm/deploy/README.md
@@ -241,7 +241,7 @@ TensorRT-LLM supports two methods for KV cache transfer in disaggregated serving
- **UCX** (default): Standard method for KV cache transfer
- **NIXL** (experimental): Alternative transfer method

-For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-tranfer.md).
+For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-transfer.md).

## Request Migration

2 changes: 1 addition & 1 deletion docs/guides/deploy/k8s_metrics.md
@@ -39,7 +39,7 @@ This will create two components:
- A Worker component exposing metrics on its system port

Both components expose a `/metrics` endpoint following the OpenMetrics format, but with different metrics appropriate to their roles. For details about:
-- Deployment configuration: See the [vLLM README](../../components/backends/vllm/README.md)
+- Deployment configuration: See the [vLLM README](../../../components/backends/vllm/README.md)
- Available metrics: See the [metrics guide](../metrics.md)

### Validate the Deployment
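To validate the `/metrics` endpoints described above, one option is a quick port-forward plus curl; the namespace, service name, and port here are placeholders, not the actual names from the deployment:

```bash
# Hypothetical spot check of a component's OpenMetrics-format /metrics endpoint.
kubectl -n dynamo port-forward svc/vllm-worker-metrics 9090:9090 &
curl -s http://localhost:9090/metrics | head -n 20
```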
2 changes: 1 addition & 1 deletion docs/hidden_toctree.rst
@@ -52,7 +52,7 @@
components/backends/trtllm/deploy/README.md
components/backends/trtllm/llama4_plus_eagle.md
components/backends/trtllm/multinode-examples.md
-components/backends/trtllm/kv-cache-tranfer.md
+components/backends/trtllm/kv-cache-transfer.md
components/backends/vllm/deploy/README.md
components/backends/vllm/multi-node.md
