Closed

Changes from 1 commit · 43 commits
992adfb
fix: add better port logic (#2175) (#2192)
alec-flowers Jul 30, 2025
9a93f11
chore: fix install (#2191)
ishandhanani Jul 30, 2025
2a616da
chore: fix QA bugs in documentation/readmes (#2199)
athreesh Jul 30, 2025
d0de1a0
feat: Add trtllm deploy examples for k8s #2133 (#2207)
biswapanda Jul 31, 2025
edccbd5
fix(sglang): disagg yaml worker change and agg kv router fix (#2205)
ishandhanani Jul 31, 2025
54fbff3
fix: add curl and jq for health checks #2203 (#2209)
biswapanda Jul 31, 2025
a9b6b28
fix: Kprashanth/trtllm rc4 cherry pick (#2218)
KrishnanPrash Jul 31, 2025
65e89b3
chore: cleanup dead links (#2208)
nealvaidya Jul 31, 2025
c92dc98
chore: update nixl version to 0.4.1 (#2221) (#2228)
nv-anants Jul 31, 2025
eb58916
chore: Remove multimodal readme. (#2212) (#2234)
krishung5 Jul 31, 2025
e848cf5
fix: Cherry pick pr 2186 release 0.4.0 to fix docs/runtime/README.md …
keivenchang Aug 1, 2025
5e3586d
fix: drop cuda graph bs (batch size) on dsr1 h100 sgl (#2235)
ishandhanani Aug 1, 2025
4fbb4e5
fix: handle groveTerminationDelay and auto-detect grove installation …
julienmancuso Aug 1, 2025
dc13774
fix: Locked triton==3.3.1 since triton 3.4.0 breaks tensorrt-llm 1.0.…
dmitry-tokarev-nv Aug 1, 2025
e5e94ad
fix: sgl instructions point to new frontend (#2245)
ishandhanani Aug 1, 2025
92781d3
fix: Update disagg configs for trtllm 1.0.0rc4 changes (release/0.4.0…
rmccorm4 Aug 4, 2025
58ad4a2
fix: readme instruction (#2265)
ishandhanani Aug 4, 2025
039c061
fix: Update eagle_one configs with speculative_model_dir field (#2283)
rmccorm4 Aug 4, 2025
2a8e251
docs: Backport: Dyn 591 (#2247) to 0.4.0 (#2251)
atchernych Aug 4, 2025
2dc4a4b
fix: trtllm container - ENV var used before declaration (#2277)
dmitry-tokarev-nv Aug 5, 2025
85737ba
fix: Update the NIXL TRTLLM commit version to rc4 (#2285)
tanmayv25 Aug 5, 2025
27c8a97
docs: add instruction to deploy model with inference gateway #2257 (#…
biswapanda Aug 5, 2025
641e49d
fix: fix nil pointer deref in dynamo controller (#2293) (#2299)
mohammedabdulwahhab Aug 5, 2025
1b145bb
fix: fix broken doc links (#2308)
biswapanda Aug 5, 2025
4e4818f
fix: Copy cuda libraries from devel to runtime stage (#2298)
nv-tusharma Aug 5, 2025
c92c1f4
docs: update deploy readme (#2306)
atchernych Aug 5, 2025
6fce98a
fix: Add common and test dependencies to sglang runtime build (#2279)…
nv-tusharma Aug 5, 2025
035d6d8
fix: Revert the commit for DeepGEMM to fix vLLM WideEP (#2302) (#2325)
krishung5 Aug 6, 2025
167c793
fix: Backport/anish index rst into 0.4.0 - fix links in docs and more…
athreesh Aug 6, 2025
409aa9e
docs: Final fixes to links reported by QA (#2334)
athreesh Aug 6, 2025
71126c7
fix: nil pointer deref in dynamo controller (#2335)
mohammedabdulwahhab Aug 6, 2025
f342c30
docs: address sphinx build errors for docs.nvidia.com (#2346)
athreesh Aug 7, 2025
96d1f15
docs: Address vincent issue with trtllm symlink (#2351)
athreesh Aug 7, 2025
e8b37a6
fix: ARM Flashinfer Versioning for 0.4.0 Release (#2363)
zaristei Aug 8, 2025
b5c9278
fix: Pinned PyTorch version for vLLM container (#2356)
krishung5 Aug 8, 2025
b0c1a24
chore: ATTRIBUTIONS-Go.md (#2355)
dmitry-tokarev-nv Aug 8, 2025
0cf8041
Revert "adjust tag to accomodate flashinfer versioning typo" (#2364)
zaristei Aug 8, 2025
bd8e368
fix: use wheel files for installation in trtllm build (#2372) (#2375)
nv-anants Aug 8, 2025
73bcc3b
fix(build): Pin cuda-python>=12,<13 to avoid trtllm breakage (#2379)
rmccorm4 Aug 8, 2025
aa57c6b
fix: turn off kvbm for al2023 support (#2533)
saturley-hall Aug 21, 2025
3f0a725
docs: add trtllm known issue for al2023 (#2604) (#2612)
nv-anants Aug 21, 2025
d98a791
docs: update trtllm know issue message (#2639) (#2643)
nv-anants Aug 22, 2025
37fca1c
fix: prevent crash looping hello world (#2625)
biswapanda Aug 22, 2025
docs: address sphinx build errors for docs.nvidia.com (#2346)
Signed-off-by: Anish <[email protected]>
athreesh authored Aug 7, 2025
commit f342c30e62dfa35b5cfe2d753a30dc6ba307fa30
20 changes: 10 additions & 10 deletions components/backends/trtllm/README.md
Original file line number Diff line number Diff line change
@@ -49,12 +49,12 @@ git checkout $(git describe --tags $(git rev-list --tags --max-count=1))

| Feature | TensorRT-LLM | Notes |
|---------|--------------|-------|
-| [**Disaggregated Serving**](../../../docs/architecture/disagg_serving.md) | ✅ | |
-| [**Conditional Disaggregation**](../../../docs/architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
-| [**KV-Aware Routing**](../../../docs/architecture/kv_cache_routing.md) | ✅ | |
-| [**SLA-Based Planner**](../../../docs/architecture/sla_planner.md) | 🚧 | Planned |
-| [**Load Based Planner**](../../../docs/architecture/load_planner.md) | 🚧 | Planned |
-| [**KVBM**](../../../docs/architecture/kvbm_architecture.md) | 🚧 | Planned |
+| [**Disaggregated Serving**](../../../architecture/disagg_serving.md) | ✅ | |
+| [**Conditional Disaggregation**](../../../architecture/disagg_serving.md#conditional-disaggregation) | 🚧 | Not supported yet |
+| [**KV-Aware Routing**](../../../architecture/kv_cache_routing.md) | ✅ | |
+| [**SLA-Based Planner**](../../../architecture/sla_planner.md) | 🚧 | Planned |
+| [**Load Based Planner**](../../../architecture/load_planner.md) | 🚧 | Planned |
+| [**KVBM**](../../../architecture/kvbm_architecture.md) | 🚧 | Planned |

### Large Scale P/D and WideEP Features

@@ -180,14 +180,14 @@ Below we provide a selected list of advanced examples. Please open up an issue i

### Multinode Deployment

-For comprehensive instructions on multinode serving, see the [multinode-examples.md](../../../docs/components/backends/trtllm/multinode-examples.md) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](../../../docs/components/backends/trtllm/llama4_plus_eagle.md) guide to learn how to use these scripts when a single worker fits on the single node.
+For comprehensive instructions on multinode serving, see the [multinode-examples.md](../../../components/backends/trtllm/multinode-examples.md) guide. It provides step-by-step deployment examples and configuration tips for running Dynamo with TensorRT-LLM across multiple nodes. While the walkthrough uses DeepSeek-R1 as the model, you can easily adapt the process for any supported model by updating the relevant configuration files. You can see [Llama4+eagle](../../../components/backends/trtllm/llama4_plus_eagle.md) guide to learn how to use these scripts when a single worker fits on the single node.

### Speculative Decoding
-- **[Llama 4 Maverick Instruct + Eagle Speculative Decoding](../../../docs/components/backends/trtllm/llama4_plus_eagle.md)**
+- **[Llama 4 Maverick Instruct + Eagle Speculative Decoding](../../../components/backends/trtllm/llama4_plus_eagle.md)**

### Kubernetes Deployment

-For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](../../../docs/components/backends/trtllm/deploy/README.md)
+For complete Kubernetes deployment instructions, configurations, and troubleshooting, see [TensorRT-LLM Kubernetes Deployment Guide](../../../components/backends/trtllm/deploy/README.md)

### Client

@@ -216,7 +216,7 @@ DISAGGREGATION_STRATEGY="prefill_first" ./launch/disagg.sh

## KV Cache Transfer in Disaggregated Serving

-Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](../../../docs/components/backends/trtllm/kv-cache-tranfer.md).
+Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](../../../components/backends/trtllm/kv-cache-tranfer.md).

## Request Migration

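The hunks above rewrite relative link targets (dropping the stale `docs/` prefix) so they resolve from the published docs tree. A quick pre-build check for this class of Sphinx breakage is to scan a Markdown file for relative links whose targets do not exist on disk. This is a minimal sketch, not part of this PR; `broken_links` and the link regex are illustrative:

```python
import os
import re

# Capture the target of a Markdown link, stopping at ')' or a '#fragment'.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#]+)")

def broken_links(md_path):
    """Return relative link targets in md_path that do not exist on disk."""
    base = os.path.dirname(md_path)
    with open(md_path) as f:
        text = f.read()
    missing = []
    for target in LINK_RE.findall(text):
        if "://" in target:  # skip absolute URLs (http://, https://, ...)
            continue
        resolved = os.path.normpath(os.path.join(base, target))
        if not os.path.exists(resolved):
            missing.append(target)
    return missing
```

Running such a scan over `components/backends/trtllm/README.md` before and after this commit would flag the old `../../../docs/...` paths as missing in the published layout.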
1 change: 0 additions & 1 deletion docs/examples/runtime/hello_world/README.md

This file was deleted.

119 changes: 119 additions & 0 deletions docs/examples/runtime/hello_world/README.md
@@ -0,0 +1,119 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Hello World Example

This is the simplest Dynamo example demonstrating a basic service using Dynamo's distributed runtime. It showcases the fundamental concepts of creating endpoints and workers in the Dynamo runtime system.

## Architecture

```text
Client (dynamo_worker)
┌─────────────┐
│ Backend │ Dynamo endpoint (/generate)
└─────────────┘
```

## Components

- **Backend**: A Dynamo service with an endpoint that receives text input and streams back greetings for each comma-separated word
- **Client**: A Dynamo worker that connects to the backend service, sends a request, and prints the streamed response

## Implementation Details

The example demonstrates:

- **Endpoint Definition**: Using the `@dynamo_endpoint` decorator to create streaming endpoints
- **Worker Setup**: Using the `@dynamo_worker()` decorator to create distributed runtime workers
- **Service Creation**: Creating services and endpoints using the distributed runtime API
- **Streaming Responses**: Yielding data for real-time streaming
- **Client Integration**: Connecting to services and processing streams
- **Logging**: Basic logging configuration with `configure_dynamo_logging`

## Getting Started

### Prerequisites

Before running this example, ensure you have the following services running:

- **etcd**: A distributed key-value store used for service discovery and metadata storage
- **NATS**: A high-performance message broker for inter-component communication

You can start these services using Docker Compose:

```bash
# clone the dynamo repository if necessary
# git clone https://github.com/ai-dynamo/dynamo.git
cd dynamo
docker compose -f deploy/docker-compose.yml up -d
```

### Running the Example

First, start the backend service:
```bash
cd examples/runtime/hello_world
python hello_world.py
```

Second, in a separate terminal, run the client:
```bash
cd examples/runtime/hello_world
python client.py
```

The client will connect to the backend service and print the streaming results.

### Expected Output

When running the client, you should see streaming output like:
```text
Hello world!
Hello sun!
Hello moon!
Hello star!
```
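The greeting logic that produces this output is plain Python and can be sketched independently of the Dynamo runtime. The helper name `greetings` is illustrative, not the example's actual function; it mirrors the backend's behavior of emitting one greeting per comma-separated word:

```python
def greetings(text):
    """Yield one greeting per comma-separated word, skipping empty entries."""
    for word in text.split(","):
        word = word.strip()
        if word:
            yield f"Hello {word}!"

# Streaming the sample input reproduces the expected output above.
print("\n".join(greetings("world,sun,moon,star")))
```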

## Code Structure

### Backend Service (`hello_world.py`)

- **`content_generator`**: A dynamo endpoint that processes text input and yields greetings
- **`worker`**: A dynamo worker that sets up the service, creates the endpoint, and serves it

### Client (`client.py`)

- **`worker`**: A dynamo worker that connects to the backend service and processes the streaming response

## Deployment to Kubernetes

Follow the [Quickstart Guide](../../../guides/dynamo_deploy/quickstart.md) to install Dynamo Cloud.
Then deploy to Kubernetes:

```bash
export NAMESPACE=<your-namespace>
cd dynamo
kubectl apply -f examples/runtime/hello_world/deploy/hello_world.yaml -n ${NAMESPACE}
```

To delete your deployment:

```bash
kubectl delete dynamographdeployment hello-world -n ${NAMESPACE}
```
1 change: 0 additions & 1 deletion docs/hidden_toctree.rst
Expand Up @@ -53,7 +53,6 @@
components/backends/trtllm/llama4_plus_eagle.md
components/backends/trtllm/multinode-examples.md
components/backends/trtllm/kv-cache-tranfer.md
components/backends/vllm/deepseek-r1.md
components/backends/vllm/deploy/README.md
components/backends/vllm/multi-node.md

1 change: 0 additions & 1 deletion docs/index.rst
@@ -143,7 +143,6 @@ The examples below assume you build the latest image yourself from source. If us
Writing Python Workers in Dynamo <guides/backend.md>
Disaggregation and Performance Tuning <guides/disagg_perf_tuning.md>
Working with Dynamo Kubernetes Operator <guides/dynamo_deploy/dynamo_operator.md>
Configuring Metrics for Observability <guides/metrics.md>

.. toctree::
:hidden:
119 changes: 0 additions & 119 deletions examples/runtime/hello_world/README.md

This file was deleted.

1 change: 1 addition & 0 deletions examples/runtime/hello_world/README.md