build: update trtllm to v1.1.0rc5 to enable trtllm + KVBM integration (#3119)

Signed-off-by: richardhuo-nv <rihuo@nvidia.com>
richardhuo-nv authored and hhzhang16 committed Sep 19, 2025
commit 1648836e18e80821425aea8b089c45475a6010a8
2 changes: 1 addition & 1 deletion README.md
@@ -199,7 +199,7 @@ It is recommended to use [NGC PyTorch Container](https://catalog.ngc.nvidia.com/

> [!Note]
> Ensure that you select a PyTorch container image version that matches the version of TensorRT-LLM you are using.
-> For example, if you are using `tensorrt-llm==1.1.0rc3`, use the PyTorch container image version `25.06`.
+> For example, if you are using `tensorrt-llm==1.1.0rc5`, use the PyTorch container image version `25.06`.
> To find the correct PyTorch container version for your desired `tensorrt-llm` release, visit the [TensorRT-LLM Dockerfile.multi](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docker/Dockerfile.multi) on GitHub. Switch to the branch that matches your `tensorrt-llm` version, and look for the `BASE_TAG` line to identify the recommended PyTorch container tag.

> [!Important]
4 changes: 2 additions & 2 deletions container/build.sh
@@ -89,7 +89,7 @@ TENSORRTLLM_PIP_WHEEL_DIR="/tmp/trtllm_wheel/"
# TensorRT-LLM commit to use for building the trtllm wheel if not provided.
# Important Note: This commit is not used in our CI pipeline. See the CI
# variables to learn how to run a pipeline with a specific commit.
-DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT="e81c50dbd2811ec858eccc2c71b5e7a330ff7e24"
+DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT="0c9430e5a530ba958fc9dca561a3ad865ad9f492"
TRTLLM_COMMIT=""
TRTLLM_USE_NIXL_KVCACHE_EXPERIMENTAL="0"
TRTLLM_GIT_URL=""
@@ -98,7 +98,7 @@ TRTLLM_GIT_URL=""
TENSORRTLLM_INDEX_URL="https://pypi.python.org/simple"
# TODO: Remove the version specification from here and use the ai-dynamo[trtllm] package.
# Need to update the Dockerfile.trtllm to use the ai-dynamo[trtllm] package.
-DEFAULT_TENSORRTLLM_PIP_WHEEL="tensorrt-llm==1.1.0rc3"
+DEFAULT_TENSORRTLLM_PIP_WHEEL="tensorrt-llm==1.1.0rc5"
TENSORRTLLM_PIP_WHEEL=""


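The hunk above pairs each `DEFAULT_*` variable with an empty override (`TENSORRTLLM_PIP_WHEEL=""`, `TRTLLM_COMMIT=""`). The fallback logic itself is outside this diff, so the following is only a minimal sketch of how such a pair is commonly resolved in Bash, assuming an empty override means "use the default". The function name `resolve_wheel` is hypothetical and does not appear in `container/build.sh`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not copied from container/build.sh.
# Prints the override if it is non-empty, otherwise the default.
resolve_wheel() {
    local override="$1"
    local default="$2"
    # ${var:-fallback} expands to fallback when var is unset or empty.
    printf '%s\n' "${override:-$default}"
}
```

Under that assumption, `resolve_wheel "" "tensorrt-llm==1.1.0rc5"` yields the new default pin, while a non-empty first argument takes precedence.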
10 changes: 3 additions & 7 deletions docs/guides/run_kvbm_in_trtllm.md
@@ -27,7 +27,7 @@ To learn what KVBM is, please check [here](https://docs.nvidia.com/dynamo/latest
> - KVBM only supports TensorRT-LLM’s PyTorch backend.
> - To enable disk cache offloading, you must first enable a CPU memory cache offloading.
> - Disable partial reuse `enable_partial_reuse: false` in the LLM API config’s `kv_connector_config` to increase offloading cache hits.
-> - KVBM requires TensorRT-LLM at commit ce580ce4f52af3ad0043a800b3f9469e1f1109f6 or newer.
+> - KVBM requires TensorRT-LLM v1.1.0rc5 or newer.
> - Enabling KVBM metrics with TensorRT-LLM is still a work in progress.

## Quick Start
@@ -38,12 +38,8 @@ To use KVBM in TensorRT-LLM, you can follow the steps below:
# start up etcd for KVBM leader/worker registration and discovery
docker compose -f deploy/docker-compose.yml up -d

-# Build a container that includes TensorRT-LLM and KVBM. Note: KVBM integration is only available in TensorRT-LLM commit dcd110cfac07e577ce01343c455917832b0f3d5e or newer.
-# When building with the --tensorrtllm-commit option, you may notice that https://github.com keeps prompting for a username and password.
-# This happens because cloning TensorRT-LLM can hit GitHub’s rate limit.
-# To work around this, you can keep pressing "Enter" or "Return.".
-# Setting "export GIT_LFS_SKIP_SMUDGE=1" may also reduce the number of prompts.
-./container/build.sh --framework trtllm --tensorrtllm-commit dcd110cfac07e577ce01343c455917832b0f3d5e --enable-kvbm
+# Build a container that includes TensorRT-LLM and KVBM.
+./container/build.sh --framework trtllm --enable-kvbm

# launch the container
./container/run.sh --framework trtllm -it --mount-workspace --use-nixl-gds
2 changes: 1 addition & 1 deletion docs/support_matrix.md
@@ -67,7 +67,7 @@ If you are using a **GPU**, the following GPU models and architectures are suppo
| **Build Dependency** | **Version** |
| :------------------- | :------------------------------------------------------------------------------- |
| **Base Container** | [25.03](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda-dl-base/tags) |
-| **TensorRT-LLM** | 1.1.0rc3 |
+| **TensorRT-LLM** | 1.1.0rc5 |
| **NIXL** | 0.4.1 |
| **vLLM** | 0.10.1.1 |
| **SGLang** | 0.5.0rc2 |
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -48,7 +48,7 @@ Repository = "https://github.com/ai-dynamo/dynamo.git"
[project.optional-dependencies]
trtllm =[
"uvloop",
-"tensorrt-llm==1.1.0rc3",
+"tensorrt-llm==1.1.0rc5",
]

vllm = [
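This commit bumps the same version string in several files (`README.md`, `container/build.sh`, `pyproject.toml`, and the docs), so a quick consistency check may be useful when reviewing similar bumps. The sketch below is a hypothetical helper, not part of this commit. It uses only standard `grep` options and lists every `tensorrt-llm==` pin that differs from the expected version. It only catches `==` pins, so a bare version cell such as the one in `docs/support_matrix.md` is not covered, and the match is a prefix match, so `1.1.0rc5` would also accept `1.1.0rc50`.

```shell
#!/usr/bin/env bash
# Hypothetical review helper; not part of the repository.
# Usage: find_stale_pins <expected-version> <file>...
# Prints file:line:text for each tensorrt-llm pin that is not <expected-version>.
find_stale_pins() {
    local expected="$1"
    shift
    # First grep collects every pin; second grep drops the ones that match
    # the expected version. "|| true" keeps the exit status at 0 when
    # nothing stale is found.
    grep -Hn -e 'tensorrt-llm==' "$@" 2>/dev/null \
        | grep -v -F -e "tensorrt-llm==${expected}" || true
}
```

Run from the repository root, an invocation such as `find_stale_pins 1.1.0rc5 README.md container/build.sh pyproject.toml` should print nothing once all pins agree.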