
Conversation

@indrajit96
Contributor

@indrajit96 indrajit96 commented Sep 8, 2025

Overview:

Update trtllm version in dynamo container from 1.0.0rc6 to 1.1.0rc3

Details:

Slight change in the usage of the llmapi SamplingParams

Where should the reviewer start?

components/backends/trtllm/src/dynamo/trtllm/main.py for API changes
container/build.sh for Docker changes

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Chores

    • Bumped default TensorRT-LLM to 1.1.0rc1 and aligned dependency pins; updated container/build defaults.
  • Documentation

    • Updated README and support matrix to reflect TensorRT-LLM 1.1.0rc1.
    • Revised TRT-LLM multimodal guidance and important notes.
  • Refactor

    • Standardized backend identifiers in TRT-LLM engine configs (consistent casing like DEFAULT/UCX).
    • Adjusted internal imports to match the latest TensorRT-LLM package layout.

@copy-pr-bot

copy-pr-bot bot commented Sep 8, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Sep 8, 2025

Walkthrough

Updates TensorRT-LLM version references to 1.1.0rc1 across docs, build scripts, and dependencies. Normalizes cache_transceiver_config.backend values in TRT-LLM engine YAMLs to uppercase (DEFAULT/UCX). Adjusts SamplingParams import path in two Python files. Revises multimodal support documentation. No functional algorithm or control-flow changes shown.

Changes

TRT-LLM version references
  Files: README.md, container/Dockerfile.trtllm, container/build.sh, docs/support_matrix.md, pyproject.toml
  Summary: Bump version references and notes from tensorrt-llm==1.0.0rc6 to 1.1.0rc1; comments and optional dependencies updated; build.sh default wheel set to 1.1.0rc1.

Engine config backend normalization (DEFAULT)
  Files: components/backends/trtllm/engine_configs/decode.yaml, .../encode.yaml, .../prefill.yaml, .../multimodal/agg.yaml, .../multimodal/decode.yaml, .../multimodal/prefill.yaml, .../multimodal/llama4/decode.yaml, .../multimodal/llama4/prefill.yaml, .../gemma3/vswa_decode.yaml, .../gemma3/vswa_prefill.yaml, .../llama4/eagle/eagle_decode.yaml, .../llama4/eagle/eagle_prefill.yaml, .../deepseek_r1/*/decode.yaml, .../deepseek_r1/*/prefill.yaml
  Summary: Change cache_transceiver_config.backend from default to DEFAULT.

Engine config backend normalization (UCX)
  Files: components/backends/trtllm/engine_configs/gpt_oss/decode.yaml, .../gpt_oss/prefill.yaml
  Summary: Change cache_transceiver_config.backend from ucx to UCX.

Import path adjustments
  Files: components/backends/trtllm/src/dynamo/trtllm/main.py, .../request_handlers/handler_base.py
  Summary: Update the SamplingParams import to from tensorrt_llm.llmapi.llm import SamplingParams.

Multimodal docs revision
  Files: components/backends/trtllm/multimodal_support.md
  Summary: Remove container build/run/usage sections; add an Important note about known issues and rebuild guidance for multimodal.
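The backend-casing normalization above is mechanical and can be sketched as a small helper. The function below is hypothetical (not part of this PR); it uppercases the value of any backend: key in a config, since the TRT-LLM LLM-API expects all-caps enum names such as DEFAULT and UCX.

```python
import re


def normalize_backend_casing(yaml_text: str) -> str:
    """Uppercase the value of any `backend:` key in a YAML config string.

    TRT-LLM's LLM-API expects all-caps backend enum names (DEFAULT, UCX).
    """
    return re.sub(
        r"^([ \t]*backend:[ \t]*)(\S+)[ \t]*$",
        lambda m: m.group(1) + m.group(2).upper(),
        yaml_text,
        flags=re.MULTILINE,
    )


config = "cache_transceiver_config:\n  backend: default\n"
print(normalize_backend_casing(config))  # backend value becomes DEFAULT
```

Running something like this across engine_configs/ before review would have caught the remaining lowercase values in one pass.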

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

I hop through configs, ears held high,
Flipping “default” to “DEFAULT” as I fly.
UCX in caps—now that’s quite neat,
New wheels to roll, versioned sweet.
With tidy imports in a tidy den,
I thump approval—ship it, friend! 🐇✨



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (6)
components/backends/trtllm/engine_configs/multimodal/prefill.yaml (1)

31-31: Add a trailing newline

YAMLlint flagged missing newline at EOF. Add one to satisfy linters/CI.

-  backend: DEFAULT
+  backend: DEFAULT
+
components/backends/trtllm/engine_configs/prefill.yaml (1)

29-30: Add trailing newline (YAMLlint).

File lacks a newline at EOF.

Apply:

-  backend: DEFAULT
+  backend: DEFAULT
+
components/backends/trtllm/engine_configs/multimodal/llama4/prefill.yaml (1)

30-31: Add trailing newline (YAMLlint).

File lacks a newline at EOF.

Apply:

-  backend: DEFAULT
+  backend: DEFAULT
+
components/backends/trtllm/engine_configs/multimodal/decode.yaml (1)

29-29: Add trailing newline at EOF.

Minor YAML lint nit; add a newline to satisfy linters.

-  backend: DEFAULT
+  backend: DEFAULT
+
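The four EOF-newline nitpicks above amount to the same one-line fix per file. A minimal sketch of a helper (hypothetical, for illustration) that applies it idempotently:

```python
from pathlib import Path


def ensure_trailing_newline(path: Path) -> bool:
    """Append a newline at EOF if one is missing.

    Returns True when the file was modified, False when it was already fine.
    """
    text = path.read_text()
    if text.endswith("\n"):
        return False
    path.write_text(text + "\n")
    return True
```

Looping this over engine_configs/**/*.yaml would silence the YAMLlint new-line-at-end-of-file errors in one commit.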
container/Dockerfile.trtllm (2)

480-486: Pin cuda-python consistently across stages to avoid runtime/build skew.

Runtime pins cuda-python to <13, but build/dev don’t. If TRT-LLM 1.1.0rc1 is sensitive to >=13, mirror the pin in the build stage so wheels/tests don’t diverge.

Suggested addition near the build-stage install (before installing TRT-LLM):

-    pip uninstall -y tensorrt && \
+    pip uninstall -y tensorrt && \
+    python3 -m pip install "cuda-python>=12,<13" && \

Also consider centralizing these pins via ARGs (e.g., TRITON_PIN=3.3.1, CUDA_PYTHON_PIN="<13") and/or a constraints file to keep stages in sync.


143-164: Avoid overwriting NGC’s triton/pytorch_triton
Installing triton==3.3.1 via pip after the TRT-LLM wheel clobbers the NGC‐provided pytorch_triton module in the build stage, resulting in mismatched triton/ and pytorch_triton-*.dist-info at runtime. Choose one of:

  • Preferred: install TRT-LLM with dependencies disabled (pip install --no-deps …) and drop the explicit pip install triton==3.3.1.
  • Or: introduce an ngc_base stage (FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS ngc_base) and in the runtime stage copy triton/ and pytorch_triton-*.dist-info from ngc_base instead of build.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1477f6e and 7a778f7.

📒 Files selected for processing (28)
  • README.md (1 hunks)
  • components/backends/trtllm/engine_configs/decode.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/deepseek_r1/mtp/mtp_decode.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/deepseek_r1/mtp/mtp_prefill.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/deepseek_r1/simple/decode.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/deepseek_r1/simple/prefill.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_decode.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/encode.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/gemma3/vswa_decode.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/gemma3/vswa_prefill.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/gpt_oss/decode.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/gpt_oss/prefill.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/llama4/eagle/eagle_decode.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/llama4/eagle/eagle_prefill.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/multimodal/agg.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/multimodal/decode.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/multimodal/llama4/decode.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/multimodal/llama4/prefill.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/multimodal/prefill.yaml (1 hunks)
  • components/backends/trtllm/engine_configs/prefill.yaml (1 hunks)
  • components/backends/trtllm/multimodal_support.md (0 hunks)
  • components/backends/trtllm/src/dynamo/trtllm/main.py (1 hunks)
  • components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py (1 hunks)
  • container/Dockerfile.trtllm (2 hunks)
  • container/build.sh (1 hunks)
  • docs/support_matrix.md (1 hunks)
  • pyproject.toml (1 hunks)
💤 Files with no reviewable changes (1)
  • components/backends/trtllm/multimodal_support.md
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-31T11:26:48.422Z
Learnt from: KrishnanPrash
PR: ai-dynamo/dynamo#2217
File: components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml:18-0
Timestamp: 2025-07-31T11:26:48.422Z
Learning: TRTLLM LLM-API expects all caps for backend field names in configuration files. When migrating TRTLLM configurations, backend values like "WideEP" should be changed to "WIDEEP" to comply with the API requirements.

Applied to files:

  • components/backends/trtllm/engine_configs/deepseek_r1/simple/decode.yaml
  • components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_decode.yaml
  • components/backends/trtllm/engine_configs/prefill.yaml
  • components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml
  • components/backends/trtllm/engine_configs/gpt_oss/decode.yaml
  • components/backends/trtllm/engine_configs/multimodal/agg.yaml
  • components/backends/trtllm/engine_configs/decode.yaml
📚 Learning: 2025-08-28T20:43:54.701Z
Learnt from: bhuvan002
PR: ai-dynamo/dynamo#2702
File: components/backends/trtllm/src/dynamo/trtllm/logits_processing/adapter.py:7-13
Timestamp: 2025-08-28T20:43:54.701Z
Learning: CI import failures for optional dependencies like tensorrt_llm can be resolved by removing __init__.py files and restructuring imports, rather than wrapping imports in try/except blocks.

Applied to files:

  • pyproject.toml
🧬 Code graph analysis (1)
components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py (1)
components/backends/trtllm/src/dynamo/trtllm/engine.py (1)
  • llm (36-39)
🪛 YAMLlint (1.37.1)
components/backends/trtllm/engine_configs/multimodal/decode.yaml

[error] 29-29: no new line character at the end of file

(new-line-at-end-of-file)

components/backends/trtllm/engine_configs/multimodal/llama4/decode.yaml

[error] 29-29: no new line character at the end of file

(new-line-at-end-of-file)

components/backends/trtllm/engine_configs/multimodal/prefill.yaml

[error] 31-31: no new line character at the end of file

(new-line-at-end-of-file)

components/backends/trtllm/engine_configs/prefill.yaml

[error] 30-30: no new line character at the end of file

(new-line-at-end-of-file)

components/backends/trtllm/engine_configs/multimodal/llama4/prefill.yaml

[error] 31-31: no new line character at the end of file

(new-line-at-end-of-file)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (23)
components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_prefill.yaml (1)

43-45: Uppercase backend values confirmed across TRTLLM configs

No lowercase backend values detected; approving code changes.

components/backends/trtllm/engine_configs/llama4/eagle/eagle_decode.yaml (1)

51-53: Uppercasing backend is consistent with the rest of the PR

Change is safe and aligns with expected enum values.

components/backends/trtllm/engine_configs/gemma3/vswa_decode.yaml (1)

28-30: Consistent backend value casing

No issues spotted.

components/backends/trtllm/engine_configs/deepseek_r1/mtp/mtp_decode.yaml (1)

56-58: OK: backend -> DEFAULT

Matches TRT-LLM config expectations.

components/backends/trtllm/engine_configs/multimodal/prefill.yaml (1)

30-31: Backend casing update looks good

No functional concerns.

components/backends/trtllm/engine_configs/deepseek_r1/simple/prefill.yaml (1)

38-40: Approved: Backend value standardized
Both simple/prefill.yaml and simple/decode.yaml set cache_transceiver_config.backend: DEFAULT and are consistent.

components/backends/trtllm/engine_configs/gemma3/vswa_prefill.yaml (1)

29-31: Uppercase backend value acknowledged

Change is correct and consistent with vswa_decode.

components/backends/trtllm/engine_configs/multimodal/llama4/decode.yaml (1)

28-29: Backend token normalized to DEFAULT — LGTM.

Uppercasing aligns with TRT-LLM’s config expectations. No further changes needed.

components/backends/trtllm/engine_configs/gpt_oss/decode.yaml (1)

21-23: LGTM — no lowercase cache_transceiver_config backends found across engine_configs

components/backends/trtllm/engine_configs/gpt_oss/prefill.yaml (1)

23-24: UCX token uppercased — LGTM.

Consistent with decode config.

components/backends/trtllm/engine_configs/llama4/eagle/eagle_prefill.yaml (1)

36-37: Backend token normalized to DEFAULT — LGTM.

Aligned with repo-wide standardization.

components/backends/trtllm/engine_configs/deepseek_r1/mtp/mtp_prefill.yaml (1)

40-41: Backend token normalized to DEFAULT — LGTM.

No functional changes beyond enum casing.

components/backends/trtllm/engine_configs/encode.yaml (1)

29-30: Backend token normalized to DEFAULT — LGTM.

Matches TRT-LLM’s all-caps requirement.

components/backends/trtllm/engine_configs/deepseek_r1/wide_ep/wide_ep_decode.yaml (1)

65-66: Approve uppercase cache_transceiver_config backends.
All cache_transceiver_config.backend values are now uppercase (DEFAULT or UCX) across TRTLLM engine configurations—no lowercase occurrences remain.

container/build.sh (1)

101-101: Bump to tensorrt-llm==1.1.0rc1 looks good; PR text needs correction.

PR description mentions “1.0rc1”; please update it to “1.1.0rc1” for consistency with the change here.

README.md (1)

202-203: Provide exact GitHub tag or branch for v1.1.0rc1

I can’t find a tag named v1.1.0rc1 in the NVIDIA/TensorRT-LLM repo; please share the correct GitHub tag or branch link so I can confirm the BASE_TAG in docker/Dockerfile.multi.

pyproject.toml (1)

51-53: Version sync and Triton pin LGTM.

trtllm extra now uses tensorrt-llm==1.1.0rc1 with triton==3.3.1; matches stated compatibility.

Please verify CI resolves the same versions as the container by running a quick lock/resolve in the TRT-LLM environment to catch any transitive conflicts.
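One lightweight way to catch the pin drift mentioned above is to compare the installed version against the expected pin in a CI smoke test. A minimal sketch (the helper name and the example pin are illustrative, not part of this PR):

```python
from importlib import metadata


def check_pin(package: str, expected: str) -> bool:
    """Return True iff `package` is installed at exactly `expected`.

    Returns False when the package is not installed at all.
    """
    try:
        return metadata.version(package) == expected
    except metadata.PackageNotFoundError:
        return False


# e.g. assert check_pin("tensorrt-llm", "1.1.0rc1") inside the container
```

This keeps pyproject.toml, build.sh, and the container honest about resolving the same versions.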

components/backends/trtllm/engine_configs/multimodal/decode.yaml (1)

28-29: Backend normalized to DEFAULT.

Matches TRT-LLM’s uppercase requirement.

components/backends/trtllm/engine_configs/deepseek_r1/simple/decode.yaml (1)

59-61: Backend normalized to DEFAULT.

Change is correct and consistent with other configs.

components/backends/trtllm/engine_configs/multimodal/agg.yaml (1)

28-29: Backend normalized to DEFAULT.

Looks good and consistent across multimodal configs.

components/backends/trtllm/engine_configs/decode.yaml (1)

30-31: Backend normalized to DEFAULT.

Correct per TRT-LLM API; no other changes observed.

components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py (1)

25-25: Update SamplingParams import in main.py
components/backends/trtllm/src/dynamo/trtllm/main.py:18: update the SamplingParams import to

from tensorrt_llm.llmapi.llm import SamplingParams

to match the 1.1.0rc1 package layout.

⛔ Skipped due to learnings
Learnt from: tanmayv25
PR: ai-dynamo/dynamo#1391
File: examples/tensorrt_llm/common/base_engine.py:171-176
Timestamp: 2025-06-05T01:10:51.865Z
Learning: In examples/tensorrt_llm/common/base_engine.py, the _init_engine method is called only once during initialization, so direct mutation of the _default_sampling_params object during setup is safe and appropriate.
components/backends/trtllm/src/dynamo/trtllm/main.py (1)

18-19: Version bump to 1.1.0rc1 is consistent
All tensorrt-llm==1.1.0rc1 references are correctly updated in pyproject.toml, build.sh, and README.md.
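Import-path churn like this between release candidates can be absorbed with a small compatibility shim. A sketch, under the assumption that the pre-1.1 layout exported SamplingParams from the top-level tensorrt_llm package (the new path is the one named in this PR):

```python
def resolve_sampling_params():
    """Return the SamplingParams class from whichever import path exists.

    Returns None when tensorrt_llm is not installed at all.
    """
    try:
        # Layout used by tensorrt-llm 1.1.0rc1 and later (per this PR)
        from tensorrt_llm.llmapi.llm import SamplingParams
        return SamplingParams
    except ImportError:
        pass
    try:
        # Assumed pre-1.1 layout; adjust if the older path differed
        from tensorrt_llm import SamplingParams
        return SamplingParams
    except ImportError:
        return None
```

Whether a shim or a hard cutover is preferable depends on how strictly the container pins tensorrt-llm; with an exact pin, the single new import in main.py and handler_base.py is the simpler choice.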

Contributor

@nv-kmcgill53 nv-kmcgill53 left a comment


Please update your PR description with the correct version of TRTLLM you are upgrading to: 1.1.0rc1

Contributor

@KrishnanPrash KrishnanPrash left a comment


Just wanted to follow up on our discussion: were the from tensorrt_llm import MultimodalEncoder and related changes not needed for tensorrt_llm==1.1.0rc1? [Link to Changes]

Will we need to make these changes when we bump up to tensorrt_llm>=1.1.0rc2?

@indrajit96 indrajit96 changed the title chore: Update trtllm version to 1.1.0rc1 chore: Update trtllm version to 1.1.0rc3 Sep 8, 2025
@indrajit96
Copy link
Contributor Author

Just wanted to follow up on our discussion: were the from tensorrt_llm import MultimodalEncoder and related changes not needed for tensorrt_llm==1.1.0rc1? [Link to Changes]

Will we need to make these changes when we bump up to tensorrt_llm>=1.1.0rc2?

They are part of 1.1.0rc3 so bumping up to the same

Contributor

@dmitry-tokarev-nv dmitry-tokarev-nv left a comment


please refresh DEFAULT_EXPERIMENTAL_TRTLLM_COMMIT

Contributor

@nv-kmcgill53 nv-kmcgill53 left a comment


Thank you for looking into those upstream version questions. All my observations have been addressed.

@indrajit96 indrajit96 enabled auto-merge (squash) September 9, 2025 00:59
@rmccorm4
Contributor

rmccorm4 commented Sep 9, 2025

@indrajit96 you'll need to retroactively sign your unsigned commits (git commit --signoff ...) to get the DCO check to pass; the ops team had a script to help there as well.

Signed-off-by: Indrajit Bhosale <[email protected]>
Signed-off-by: Indrajit Bhosale <[email protected]>
Signed-off-by: Indrajit Bhosale <[email protected]>
Signed-off-by: Indrajit Bhosale <[email protected]>
@indrajit96 indrajit96 force-pushed the ibhosale_trtllm_update branch from d032b99 to fe5c619 Compare September 9, 2025 16:55
@nv-anants
Contributor

/ok to test 0656523

@indrajit96 indrajit96 merged commit e8cb972 into main Sep 9, 2025
13 of 14 checks passed
@indrajit96 indrajit96 deleted the ibhosale_trtllm_update branch September 9, 2025 21:30
zhongdaor-nv pushed a commit that referenced this pull request Sep 15, 2025
Signed-off-by: Indrajit Bhosale <[email protected]>
Signed-off-by: zhongdaor <[email protected]>