-
Notifications
You must be signed in to change notification settings - Fork 754
chore: Update trtllm version to 1.1.0rc3 #2930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughUpdates TensorRT-LLM version references to 1.1.0rc1 across docs, build scripts, and dependencies. Normalizes cache_transceiver_config.backend values in TRT-LLM engine YAMLs to uppercase (DEFAULT/UCX). Adjusts SamplingParams import path in two Python files. Revises multimodal support documentation. No functional algorithm or control-flow changes shown. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (6)
components/backends/trtllm/engine_configs/multimodal/prefill.yaml (1)
31-31: Add a trailing newlineYAMLlint flagged missing newline at EOF. Add one to satisfy linters/CI.
- backend: DEFAULT + backend: DEFAULT +components/backends/trtllm/engine_configs/prefill.yaml (1)
29-30: Add trailing newline (YAMLlint).File lacks a newline at EOF.
Apply:
- backend: DEFAULT + backend: DEFAULT +components/backends/trtllm/engine_configs/multimodal/llama4/prefill.yaml (1)
30-31: Add trailing newline (YAMLlint).File lacks a newline at EOF.
Apply:
- backend: DEFAULT + backend: DEFAULT +components/backends/trtllm/engine_configs/multimodal/decode.yaml (1)
29-29: Add trailing newline at EOF.Minor YAML lint nit; add a newline to satisfy linters.
- backend: DEFAULT + backend: DEFAULT +container/Dockerfile.trtllm (2)
480-486: Pin cuda-python consistently across stages to avoid runtime/build skew.Runtime pins cuda-python to <13, but build/dev don’t. If TRT-LLM 1.1.0rc1 is sensitive to >=13, mirror the pin in the build stage so wheels/tests don’t diverge.
Suggested addition near the build-stage install (before installing TRT-LLM):
- pip uninstall -y tensorrt && \ + pip uninstall -y tensorrt && \ + python3 -m pip install "cuda-python>=12,<13" && \Also consider centralizing these pins via ARGs (e.g., TRITON_PIN=3.3.1, CUDA_PYTHON_PIN="<13") and/or a constraints file to keep stages in sync.
143-164: Avoid overwriting NGC’s triton/pytorch_triton
Installingtriton==3.3.1via pip after the TRT-LLM wheel clobbers the NGC‐providedpytorch_tritonmodule in the build stage, resulting in mismatchedtriton/andpytorch_triton-*.dist-infoat runtime. Choose one of:
- Preferred: install TRT-LLM with dependencies disabled (
pip install --no-deps …) and drop the explicitpip install triton==3.3.1.- Or: introduce an
ngc_basestage (FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS ngc_base) and in the runtime stage copytriton/andpytorch_triton-*.dist-infofromngc_baseinstead ofbuild.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (28)
README.md(1 hunks)components/backends/trtllm/engine_configs/decode.yaml(1 hunks)components/backends/trtllm/engine_configs/deepseek_r1/mtp/mtp_decode.yaml(1 hunks)components/backends/trtllm/engine_configs/deepseek_r1/mtp/mtp_prefill.yaml(1 hunks)components/backends/trtllm/engine_configs/deepseek_r1/simple/decode.yaml(1 hunks)components/backends/trtllm/engine_configs/deepseek_r1/simple/prefill.yaml(1 hunks)components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_decode.yaml(1 hunks)components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml(1 hunks)components/backends/trtllm/engine_configs/encode.yaml(1 hunks)components/backends/trtllm/engine_configs/gemma3/vswa_decode.yaml(1 hunks)components/backends/trtllm/engine_configs/gemma3/vswa_prefill.yaml(1 hunks)components/backends/trtllm/engine_configs/gpt_oss/decode.yaml(1 hunks)components/backends/trtllm/engine_configs/gpt_oss/prefill.yaml(1 hunks)components/backends/trtllm/engine_configs/llama4/eagle/eagle_decode.yaml(1 hunks)components/backends/trtllm/engine_configs/llama4/eagle/eagle_prefill.yaml(1 hunks)components/backends/trtllm/engine_configs/multimodal/agg.yaml(1 hunks)components/backends/trtllm/engine_configs/multimodal/decode.yaml(1 hunks)components/backends/trtllm/engine_configs/multimodal/llama4/decode.yaml(1 hunks)components/backends/trtllm/engine_configs/multimodal/llama4/prefill.yaml(1 hunks)components/backends/trtllm/engine_configs/multimodal/prefill.yaml(1 hunks)components/backends/trtllm/engine_configs/prefill.yaml(1 hunks)components/backends/trtllm/multimodal_support.md(0 hunks)components/backends/trtllm/src/dynamo/trtllm/main.py(1 hunks)components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py(1 hunks)container/Dockerfile.trtllm(2 hunks)container/build.sh(1 hunks)docs/support_matrix.md(1 hunks)pyproject.toml(1 hunks)
💤 Files with no reviewable changes (1)
- components/backends/trtllm/multimodal_support.md
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-31T11:26:48.422Z
Learnt from: KrishnanPrash
PR: ai-dynamo/dynamo#2217
File: components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml:18-0
Timestamp: 2025-07-31T11:26:48.422Z
Learning: TRTLLM LLM-API expects all caps for backend field names in configuration files. When migrating TRTLLM configurations, backend values like "WideEP" should be changed to "WIDEEP" to comply with the API requirements.
Applied to files:
components/backends/trtllm/engine_configs/deepseek_r1/simple/decode.yamlcomponents/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_decode.yamlcomponents/backends/trtllm/engine_configs/prefill.yamlcomponents/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yamlcomponents/backends/trtllm/engine_configs/gpt_oss/decode.yamlcomponents/backends/trtllm/engine_configs/multimodal/agg.yamlcomponents/backends/trtllm/engine_configs/decode.yaml
📚 Learning: 2025-08-28T20:43:54.701Z
Learnt from: bhuvan002
PR: ai-dynamo/dynamo#2702
File: components/backends/trtllm/src/dynamo/trtllm/logits_processing/adapter.py:7-13
Timestamp: 2025-08-28T20:43:54.701Z
Learning: CI import failures for optional dependencies like tensorrt_llm can be resolved by removing __init__.py files and restructuring imports, rather than wrapping imports in try/except blocks.
Applied to files:
pyproject.toml
🧬 Code graph analysis (1)
components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py (1)
components/backends/trtllm/src/dynamo/trtllm/engine.py (1)
llm(36-39)
🪛 YAMLlint (1.37.1)
components/backends/trtllm/engine_configs/multimodal/decode.yaml
[error] 29-29: no new line character at the end of file
(new-line-at-end-of-file)
components/backends/trtllm/engine_configs/multimodal/llama4/decode.yaml
[error] 29-29: no new line character at the end of file
(new-line-at-end-of-file)
components/backends/trtllm/engine_configs/multimodal/prefill.yaml
[error] 31-31: no new line character at the end of file
(new-line-at-end-of-file)
components/backends/trtllm/engine_configs/prefill.yaml
[error] 30-30: no new line character at the end of file
(new-line-at-end-of-file)
components/backends/trtllm/engine_configs/multimodal/llama4/prefill.yaml
[error] 31-31: no new line character at the end of file
(new-line-at-end-of-file)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (23)
components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml (1)
43-45: Uppercase backend values confirmed across TRTLLM configsNo lowercase
backendvalues detected; approving code changes.components/backends/trtllm/engine_configs/llama4/eagle/eagle_decode.yaml (1)
51-53: Uppercasing backend is consistent with the rest of the PRChange is safe and aligns with expected enum values.
components/backends/trtllm/engine_configs/gemma3/vswa_decode.yaml (1)
28-30: Consistent backend value casingNo issues spotted.
components/backends/trtllm/engine_configs/deepseek_r1/mtp/mtp_decode.yaml (1)
56-58: OK: backend -> DEFAULTMatches TRT-LLM config expectations.
components/backends/trtllm/engine_configs/multimodal/prefill.yaml (1)
30-31: Backend casing update looks goodNo functional concerns.
components/backends/trtllm/engine_configs/deepseek_r1/simple/prefill.yaml (1)
38-40: Approved: Backend value standardized
Both simple/prefill.yaml and simple/decode.yaml setcache_transceiver_config.backend: DEFAULTand are consistent.components/backends/trtllm/engine_configs/gemma3/vswa_prefill.yaml (1)
29-31: Uppercase backend value acknowledgedChange is correct and consistent with vswa_decode.
components/backends/trtllm/engine_configs/multimodal/llama4/decode.yaml (1)
28-29: Backend token normalized to DEFAULT — LGTM.Uppercasing aligns with TRT-LLM’s config expectations. No further changes needed.
components/backends/trtllm/engine_configs/gpt_oss/decode.yaml (1)
21-23: LGTM — no lowercase cache_transceiver_config backends found across engine_configscomponents/backends/trtllm/engine_configs/gpt_oss/prefill.yaml (1)
23-24: UCX token uppercased — LGTM.Consistent with decode config.
components/backends/trtllm/engine_configs/llama4/eagle/eagle_prefill.yaml (1)
36-37: Backend token normalized to DEFAULT — LGTM.Aligned with repo-wide standardization.
components/backends/trtllm/engine_configs/deepseek_r1/mtp/mtp_prefill.yaml (1)
40-41: Backend token normalized to DEFAULT — LGTM.No functional changes beyond enum casing.
components/backends/trtllm/engine_configs/encode.yaml (1)
29-30: Backend token normalized to DEFAULT — LGTM.Matches TRT-LLM’s all-caps requirement.
components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_decode.yaml (1)
65-66: Approve uppercase cache_transceiver_config backends.
Allcache_transceiver_config.backendvalues are now uppercase (DEFAULTorUCX) across TRTLLM engine configurations—no lowercase occurrences remain.container/build.sh (1)
101-101: Bump to tensorrt-llm==1.1.0rc1 looks good; PR text needs correction.PR description mentions “1.0rc1”; please update it to “1.1.0rc1” for consistency with the change here.
README.md (1)
202-203: Provide exact GitHub tag or branch for v1.1.0rc1I can’t find a tag named
v1.1.0rc1in the NVIDIA/TensorRT-LLM repo; please share the correct GitHub tag or branch link so I can confirm theBASE_TAGin docker/Dockerfile.multi.pyproject.toml (1)
51-53: Version sync and Triton pin LGTM.
trtllmextra now usestensorrt-llm==1.1.0rc1withtriton==3.3.1; matches stated compatibility.Please verify CI resolves the same versions as the container by running a quick lock/resolve in the TRT-LLM environment to catch any transitive conflicts.
components/backends/trtllm/engine_configs/multimodal/decode.yaml (1)
28-29: Backend normalized to DEFAULT.Matches TRT-LLM’s uppercase requirement.
components/backends/trtllm/engine_configs/deepseek_r1/simple/decode.yaml (1)
59-61: Backend normalized to DEFAULT.Change is correct and consistent with other configs.
components/backends/trtllm/engine_configs/multimodal/agg.yaml (1)
28-29: Backend normalized to DEFAULT.Looks good and consistent across multimodal configs.
components/backends/trtllm/engine_configs/decode.yaml (1)
30-31: Backend normalized to DEFAULT.Correct per TRT-LLM API; no other changes observed.
components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py (1)
25-25: Update SamplingParams import in main.py
components/backends/trtllm/src/dynamo/trtllm/main.py:18: replacefrom tensorrt_llm.llmapi.llm import SamplingParamswith the correct 1.1.0rc1 import path.
⛔ Skipped due to learnings
Learnt from: tanmayv25 PR: ai-dynamo/dynamo#1391 File: examples/tensorrt_llm/common/base_engine.py:171-176 Timestamp: 2025-06-05T01:10:51.865Z Learning: In examples/tensorrt_llm/common/base_engine.py, the _init_engine method is called only once during initialization, so direct mutation of the _default_sampling_params object during setup is safe and appropriate.components/backends/trtllm/src/dynamo/trtllm/main.py (1)
18-19: Version bump to 1.1.0rc1 is consistent
Alltensorrt-llm==1.1.0rc1references are correctly updated in pyproject.toml, build.sh, and README.md.
nv-kmcgill53
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please update your PR description with the correct version of TRTLLM you are upgrading to. 1.1.0rc1
KrishnanPrash
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wanted to follow-up on our discussion, were the from tensorrt_llm import MultimodalEncoder and related changes not needed for tensorrt_llm==1.1.0rc1 [Link to Changes]
Will we need to make these changes when we bump up to tensorrt_llm>=1.1.0rc2?
They are part of |
dmitry-tokarev-nv
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please refresh DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT
nv-kmcgill53
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for looking into those upstream version questions. All my observations have been addressed.
|
@indrajit96 you'll need to retroactively sign your unsigned commits |
d032b99 to
26eb264
Compare
26eb264 to
d032b99
Compare
d032b99 to
e63cc74
Compare
e63cc74 to
d032b99
Compare
Signed-off-by: Indrajit Bhosale <[email protected]>
Signed-off-by: Indrajit Bhosale <[email protected]>
Signed-off-by: Indrajit Bhosale <[email protected]>
Signed-off-by: Indrajit Bhosale <[email protected]>
d032b99 to
fe5c619
Compare
|
/ok to test 0656523 |
Signed-off-by: Indrajit Bhosale <[email protected]> Signed-off-by: zhongdaor <[email protected]>
Overview:
Update trtllm version in dynamo container from
1.0.0rc6to1.1.0rc3Details:
Slight change in usage of llmapi Sampling params
Where should the reviewer start?
components/backends/trtllm/src/dynamo/trtllm/main.pyfor API changescontainer/build.shfor Docker changesRelated Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Chores
Documentation
Refactor