fix the command and wording
richardhuo-nv committed Jun 6, 2025
commit 7f41cdb7d807c4aea1bca8feaa8e7ab72c6209f2
4 changes: 2 additions & 2 deletions examples/tensorrt_llm/README.md
@@ -121,12 +121,12 @@ dynamo serve graphs.disagg_router:Frontend -f ./configs/disagg_router.yaml
#### Aggregated serving with Multi-Token Prediction (MTP) and DeepSeek R1
```bash
cd /workspace/examples/tensorrt_llm
-dynamo serve graphs.disagg_router:Frontend -f configs/deepseek_r1/mtp/mtp_agg.yaml
+dynamo serve graphs.agg:Frontend -f configs/deepseek_r1/mtp/mtp_agg.yaml
```
Notes:
- There is a noticeable latency for the first two inference requests. Please send warm-up requests before starting the benchmark.
- Please keep the `cuda_graph_padding_enabled` setting as `false` in the model engine's configuration. There is a known bug, and the fix will be included in the next release of TensorRT-LLM.
-- Disaggregated support for MTP in Dynamo + TensorRT-LLM is coming soon.
+- MTP support for disaggregation in Dynamo + TensorRT-LLM is coming soon.
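The warm-up note above can be sketched as a small client that sends a couple of throwaway requests before benchmarking. This is a minimal sketch, not part of the diff: the port (`8000`), the OpenAI-style `/v1/chat/completions` endpoint, and the model name `deepseek-r1` are assumptions — substitute whatever your Dynamo frontend actually exposes.

```python
import json
import urllib.request


def build_warmup_request(base_url="http://localhost:8000",
                         model="deepseek-r1",
                         prompt="warmup"):
    # Hypothetical endpoint and model name; adjust to your deployment.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 8,  # keep warm-up responses short
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )


def send_warmup_requests(n=2, **kwargs):
    # The note above observes that the first two requests are slow,
    # so two warm-up requests is a reasonable default.
    for _ in range(n):
        with urllib.request.urlopen(build_warmup_request(**kwargs)) as resp:
            resp.read()
```

Call `send_warmup_requests()` once the server is up, then start the benchmark.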

#### Multi-Node Disaggregated Serving
