Closed

Changes from 1 commit · 43 commits
992adfb
fix: add better port logic (#2175) (#2192)
alec-flowers Jul 30, 2025
9a93f11
chore: fix install (#2191)
ishandhanani Jul 30, 2025
2a616da
chore: fix QA bugs in documentation/readmes (#2199)
athreesh Jul 30, 2025
d0de1a0
feat: Add trtllm deploy examples for k8s #2133 (#2207)
biswapanda Jul 31, 2025
edccbd5
fix(sglang): disagg yaml worker change and agg kv router fix (#2205)
ishandhanani Jul 31, 2025
54fbff3
fix: add curl and jq for health checks #2203 (#2209)
biswapanda Jul 31, 2025
a9b6b28
fix: Kprashanth/trtllm rc4 cherry pick (#2218)
KrishnanPrash Jul 31, 2025
65e89b3
chore: cleanup dead links (#2208)
nealvaidya Jul 31, 2025
c92dc98
chore: update nixl version to 0.4.1 (#2221) (#2228)
nv-anants Jul 31, 2025
eb58916
chore: Remove multimodal readme. (#2212) (#2234)
krishung5 Jul 31, 2025
e848cf5
fix: Cherry pick pr 2186 release 0.4.0 to fix docs/runtime/README.md …
keivenchang Aug 1, 2025
5e3586d
fix: drop cuda graph bs (batch size) on dsr1 h100 sgl (#2235)
ishandhanani Aug 1, 2025
4fbb4e5
fix: handle groveTerminationDelay and auto-detect grove installation …
julienmancuso Aug 1, 2025
dc13774
fix: Locked triton==3.3.1 since triton 3.4.0 breaks tensorrt-llm 1.0.…
dmitry-tokarev-nv Aug 1, 2025
e5e94ad
fix: sgl instructions point to new frontend (#2245)
ishandhanani Aug 1, 2025
92781d3
fix: Update disagg configs for trtllm 1.0.0rc4 changes (release/0.4.0…
rmccorm4 Aug 4, 2025
58ad4a2
fix: readme instruction (#2265)
ishandhanani Aug 4, 2025
039c061
fix: Update eagle_one configs with speculative_model_dir field (#2283)
rmccorm4 Aug 4, 2025
2a8e251
docs: Backport: Dyn 591 (#2247) to 0.4.0 (#2251)
atchernych Aug 4, 2025
2dc4a4b
fix: trtllm container - ENV var used before declaration (#2277)
dmitry-tokarev-nv Aug 5, 2025
85737ba
fix: Update the NIXL TRTLLM commit version to rc4 (#2285)
tanmayv25 Aug 5, 2025
27c8a97
docs: add instruction to deploy model with inference gateway #2257 (#…
biswapanda Aug 5, 2025
641e49d
fix: fix nil pointer deref in dynamo controller (#2293) (#2299)
mohammedabdulwahhab Aug 5, 2025
1b145bb
fix: fix broken doc links (#2308)
biswapanda Aug 5, 2025
4e4818f
fix: Copy cuda libraries from devel to runtime stage (#2298)
nv-tusharma Aug 5, 2025
c92c1f4
docs: update deploy readme (#2306)
atchernych Aug 5, 2025
6fce98a
fix: Add common and test dependencies to sglang runtime build (#2279)…
nv-tusharma Aug 5, 2025
035d6d8
fix: Revert the commit for DeepGEMM to fix vLLM WideEP (#2302) (#2325)
krishung5 Aug 6, 2025
167c793
fix: Backport/anish index rst into 0.4.0 - fix links in docs and more…
athreesh Aug 6, 2025
409aa9e
docs: Final fixes to links reported by QA (#2334)
athreesh Aug 6, 2025
71126c7
fix: nil pointer deref in dynamo controller (#2335)
mohammedabdulwahhab Aug 6, 2025
f342c30
docs: address sphinx build errors for docs.nvidia.com (#2346)
athreesh Aug 7, 2025
96d1f15
docs: Address vincent issue with trtllm symlink (#2351)
athreesh Aug 7, 2025
e8b37a6
fix: ARM Flashinfer Versioning for 0.4.0 Release (#2363)
zaristei Aug 8, 2025
b5c9278
fix: Pinned PyTorch version for vLLM container (#2356)
krishung5 Aug 8, 2025
b0c1a24
chore: ATTRIBUTIONS-Go.md (#2355)
dmitry-tokarev-nv Aug 8, 2025
0cf8041
Revert "adjust tag to accomodate flashinfer versioning typo" (#2364)
zaristei Aug 8, 2025
bd8e368
fix: use wheel files for installation in trtllm build (#2372) (#2375)
nv-anants Aug 8, 2025
73bcc3b
fix(build): Pin cuda-python>=12,<13 to avoid trtllm breakage (#2379)
rmccorm4 Aug 8, 2025
aa57c6b
fix: turn off kvbm for al2023 support (#2533)
saturley-hall Aug 21, 2025
3f0a725
docs: add trtllm known issue for al2023 (#2604) (#2612)
nv-anants Aug 21, 2025
d98a791
docs: update trtllm know issue message (#2639) (#2643)
nv-anants Aug 22, 2025
37fca1c
fix: prevent crash looping hello world (#2625)
biswapanda Aug 22, 2025
docs: address sphinx build errors for docs.nvidia.com (#2346)
Signed-off-by: Anish <[email protected]>
athreesh authored Aug 7, 2025
commit f342c30e62dfa35b5cfe2d753a30dc6ba307fa30
20 changes: 10 additions & 10 deletions components/backends/trtllm/README.md
Original file line number Diff line number Diff line change
@@ -49,12 +49,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

| Feature | TensorRT-LLM | Notes |
|---------|--------------|-------|
-| [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ | |
-| [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
-| [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ | |
-| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | 🚧 | Planned |
-| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | Planned |
-| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | 🚧 | Planned |
+| [**Disaggregated Serving**](../../../architecture/disagg_serving.md) | ✅ | |
+| [**Conditional Disaggregation**](../../../architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
+| [**KV-Aware Routing**](../../../architecture/kv_cache_routing.md) | ✅ | |
+| [**SLA-Based Planner**](../../../architecture/sla_planner.md) | 🚧 | Planned |
+| [**Load Based Planner**](../../../architecture/load_planner.md) | 🚧 | Planned |
+| [**KVBM**](../../../architecture/kvbm_architecture.md) | 🚧 | Planned |

### Large Scale P/D and WideEP Features

@@ -180,14 +180,14 @@ Below we provide a selected list of advanced examples. Please open up an issue i

### Multinode Deployment

-For comprehensive instructions on multinode serving, see the [multinode-examples.md](../../../docs/components/backends/trtllm/multinode-examples.md) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](../../../docs/components/backends/trtllm/llama4_plus_eagle.md) guide to learn how to use these scripts when a single worker fits on the single node.
+For comprehensive instructions on multinode serving, see the [multinode-examples.md](../../../components/backends/trtllm/multinode-examples.md) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](../../../components/backends/trtllm/llama4_plus_eagle.md) guide to learn how to use these scripts when a single worker fits on the single node.

### Speculative Decoding
-- **[Llama 4 Maverick Instruct + Eagle Speculative Decoding](../../../docs/components/backends/trtllm/llama4_plus_eagle.md)**
+- **[Llama 4 Maverick Instruct + Eagle Speculative Decoding](../../../components/backends/trtllm/llama4_plus_eagle.md)**

### Kubernetes Deployment

-For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](../../../docs/components/backends/trtllm/deploy/README.md)
+For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](../../../components/backends/trtllm/deploy/README.md)

### Client

@@ -216,7 +216,7 @@ DISAGGREGATION_STRATEGY="prefill_first" ./launch/disagg.sh

## KV Cache Transfer in Disaggregated Serving

-Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](../../../docs/components/backends/trtllm/kv-cache-tranfer.md).
+Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](../../../components/backends/trtllm/kv-cache-tranfer.md).

## Request Migration

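The hunks above rewrite relative link targets (dropping the stale `docs/` prefix) so they resolve from the published docs tree. A quick pre-build check for this class of Sphinx breakage is to scan a Markdown file for relative links whose targets do not exist on disk. This is a minimal sketch, not part of this PR; `broken_links` and the link regex are illustrative:

```python
import os
import re

# Capture the target of a Markdown link, stopping at ')' or a '#fragment'.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#]+)")

def broken_links(md_path):
    """Return relative link targets in md_path that do not exist on disk."""
    base = os.path.dirname(md_path)
    with open(md_path) as f:
        text = f.read()
    missing = []
    for target in LINK_RE.findall(text):
        if "://" in target:  # skip absolute URLs (http://, https://, ...)
            continue
        resolved = os.path.normpath(os.path.join(base, target))
        if not os.path.exists(resolved):
            missing.append(target)
    return missing
```

Running such a scan over `components/backends/trtllm/README.md` before and after this commit would flag the old `../../../docs/...` paths as missing in the published layout.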
1 change: 0 additions & 1 deletion docs/examples/runtime/hello_world/README.md

This file was deleted.

119 changes: 119 additions & 0 deletions docs/examples/runtime/hello_world/README.md
@@ -0,0 +1,119 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Hello World Example

This is the simplest Dynamo example demonstrating a basic service using Dynamo's distributed runtime. It showcases the fundamental concepts of creating endpoints and workers in the Dynamo runtime system.

## Architecture

```text
Client (dynamo_worker)
┌─────────────┐
│ Backend │ Dynamo endpoint (/generate)
└─────────────┘
```

## Components

- **Backend**: A Dynamo service with an endpoint that receives text input and streams back greetings for each comma-separated word
- **Client**: A Dynamo worker that connects to the backend service, sends a request, and prints the streamed response

## Implementation Details

The example demonstrates:

- **Endpoint Definition**: Using the `@dynamo_endpoint` decorator to create streaming endpoints
- **Worker Setup**: Using the `@dynamo_worker()` decorator to create distributed runtime workers
- **Service Creation**: Creating services and endpoints using the distributed runtime API
- **Streaming Responses**: Yielding data for real-time streaming
- **Client Integration**: Connecting to services and processing streams
- **Logging**: Basic logging configuration with `configure_dynamo_logging`

## Getting Started

### Prerequisites

Before running this example, ensure you have the following services running:

- **etcd**: A distributed key-value store used for service discovery and metadata storage
- **NATS**: A high-performance message broker for inter-component communication

You can start these services using Docker Compose:

```bash
# clone the dynamo repository if necessary
# git clone https://github.com/ai-dynamo/dynamo.git
cd dynamo
docker compose -f deploy/docker-compose.yml up -d
```

### Running the Example

First, start the backend service:
```bash
cd examples/runtime/hello_world
python hello_world.py
```

Second, in a separate terminal, run the client:
```bash
cd examples/runtime/hello_world
python client.py
```

The client will connect to the backend service and print the streaming results.

### Expected Output

When running the client, you should see streaming output like:
```text
Hello world!
Hello sun!
Hello moon!
Hello star!
```
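The greeting logic that produces this output is plain Python and can be sketched independently of the Dynamo runtime. The helper name `greetings` is illustrative, not the example's actual function; it mirrors the backend's behavior of emitting one greeting per comma-separated word:

```python
def greetings(text):
    """Yield one greeting per comma-separated word, skipping empty entries."""
    for word in text.split(","):
        word = word.strip()
        if word:
            yield f"Hello {word}!"

# Streaming the sample input reproduces the expected output above.
print("\n".join(greetings("world,sun,moon,star")))
```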

## Code Structure

### Backend Service (`hello_world.py`)

- **`content_generator`**: A dynamo endpoint that processes text input and yields greetings
- **`worker`**: A dynamo worker that sets up the service, creates the endpoint, and serves it

### Client (`client.py`)

- **`worker`**: A dynamo worker that connects to the backend service and processes the streaming response

## Deployment to Kubernetes

Follow the [Quickstart Guide](../../../guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
Then deploy to Kubernetes:

```bash
export NAMESPACE=<your-namespace>
cd dynamo
kubectl apply -f examples/runtime/hello_world/deploy/hello_world.yaml -n ${NAMESPACE}
```

To delete your deployment:

```bash
kubectl delete dynamographdeployment hello-world -n ${NAMESPACE}
```
1 change: 0 additions & 1 deletion docs/hidden_toctree.rst
Expand Up @@ -53,7 +53,6 @@
components/backends/trtllm/llama4_plus_eagle.md
components/backends/trtllm/multinode-examples.md
components/backends/trtllm/kv-cache-tranfer.md
components/backends/vllm/deepseek-r1.md
components/backends/vllm/deploy/README.md
components/backends/vllm/multi-node.md

1 change: 0 additions & 1 deletion docs/index.rst
@@ -143,7 +143,6 @@ The examples below assume you build the latest image yourself from source. If us
Writing Python Workers in Dynamo <guides/backend.md>
Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md>
Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md>
Configuring Metrics for Observability <guides/metrics.md>

.. toctree::
:hidden:
119 changes: 0 additions & 119 deletions examples/runtime/hello_world/README.md

This file was deleted.

1 change: 1 addition & 0 deletions examples/runtime/hello_world/README.md