Conversation

@rmccorm4 rmccorm4 commented Aug 11, 2025

Overview:

Cherry-pick the TRTLLM cuda-python 13 build fix back to main from release/0.4.0

Fixes this runtime error when importing tensorrt_llm after cuda-python 13 gets pulled in:

cannot import name 'cuda' from 'cuda' (unknown location)

See #2379
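
For illustration, a minimal repro-and-fix sketch in shell (the pinned install mirrors the Dockerfile change; the failing import is the one named in the error above):

    # With cuda-python 13.x on the path, the import tensorrt_llm relies on fails:
    python -c "from cuda import cuda"
    # ImportError: cannot import name 'cuda' from 'cuda' (unknown location)

    # The fix: constrain cuda-python to the 12.x series before TensorRT-LLM is installed
    pip install "cuda-python>=12,<13"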

Summary by CodeRabbit

  • Bug Fixes

    • Improved container compatibility by pinning CUDA Python to the 12.x range during TensorRT-LLM installation.
    • Preserves platform-specific Triton installation for amd64.
  • Chores

    • Reworked install sequence to staged pip steps for more reliable dependency resolution.
    • Added installation of local wheels from the workspace in both build and runtime flows.
    • Applied changes across both main build/runtime and runtime-only sections to ensure consistent packaging behavior.

coderabbitai bot commented Aug 11, 2025

Walkthrough

Updates Dockerfile.tensorrt_llm to install cuda-python (>=12,<13) before installing TensorRT-LLM from the extra index, retains the conditional triton 3.3.1 install on amd64, and adds installation of local ai_dynamo_runtime*, ai_dynamo*, and nixl* wheels in both the build and runtime sections. Includes a note about the temporary cuda-python pin.

Changes

TensorRT-LLM install flow update (build stage) — container/Dockerfile.tensorrt_llm
    Split the pip install: first cuda-python>=12,<13, then TensorRT-LLM from the extra index; preserve the amd64-only triton==3.3.1 install; append installation of the local ai_dynamo_runtime*, ai_dynamo*, and nixl* wheels; add an explanatory note about the temporary cuda-python constraint.
Mirrored runtime-stage updates — container/Dockerfile.tensorrt_llm
    Apply the same two-step install and local wheel installations in the runtime stage, maintaining the existing architecture conditionals for triton. A condensed sketch of the sequence follows.
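
For reference, a condensed shell sketch of the staged install order described above (TENSORRTLLM_INDEX_URL and TENSORRTLLM_PIP_WHEEL are the Dockerfile's build args; the local wheel directory is a placeholder, not the actual workspace layout):

    # Step 1: pin cuda-python so dependency resolution cannot pull in 13.x
    pip install "cuda-python>=12,<13"
    # Step 2: install TensorRT-LLM from the extra index, other deps from PyPI
    pip install --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${TENSORRTLLM_PIP_WHEEL}"
    # Step 3 (amd64 only): keep the pinned triton
    if [ "$ARCH" = "amd64" ]; then pip install "triton==3.3.1"; fi
    # Step 4: install the local wheels built in the workspace (path illustrative)
    pip install /workspace/wheels/ai_dynamo_runtime*.whl \
                /workspace/wheels/ai_dynamo*.whl \
                /workspace/wheels/nixl*.whl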

Sequence Diagram(s)

sequenceDiagram
    participant Docker as Docker build
    participant Pip as pip
    participant Index as Extra Index (TensorRT-LLM)
    participant Wheelhouse as Local wheelhouse

    Docker->>Pip: install cuda-python (>=12,<13)
    alt ARCH=amd64
        Docker->>Pip: install triton==3.3.1
    end
    Docker->>Pip: install TensorRT-LLM from extra-index
    Docker->>Pip: install ai_dynamo_runtime*, ai_dynamo*, nixl* (local wheels)

    Note over Docker,Pip: Same sequence repeated in runtime stage

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

I hop through layers, stage by stage,
Pinning CUDA, turning the page.
Triton for amd, wheels aligned,
TensorRT-LLM, neatly timed.
In my burrow of bytes I cheer—
Clean installs now, carrots near! 🥕🐇


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
container/Dockerfile.tensorrt_llm (1)

154-174: Also pin cuda-python<13 in the build-stage TRT-LLM install

The pin is applied in runtime but not in the build stage. Since TRT-LLM is installed here too (from either a local wheel or index), cuda-python 13 can still slip in and break the build. Install the pin right after uninstalling tensorrt so it applies to both local and index paths.

Apply this diff inside the existing RUN:

 RUN [ -f /etc/pip/constraint.txt ] && : > /etc/pip/constraint.txt || true && \
-    pip uninstall -y tensorrt && \
+    pip uninstall -y tensorrt && \
+    pip install "cuda-python>=12,<13" && \
     if [ "$HAS_TRTLLM_CONTEXT" = "1" ]; then \
         # Install from local wheel directory in build context
         WHEEL_FILE=$(find /trtllm_wheel -name "*.whl" | head -n 1); \
         if [ -n "$WHEEL_FILE" ]; then \
             pip install "$WHEEL_FILE"; \
             if [ "$ARCH" = "amd64" ]; then \
                 pip install "triton==3.3.1"; \
             fi; \
         else \
             echo "No wheel file found in /trtllm_wheel directory."; \
             exit 1; \
         fi; \
     else \
         # Install TensorRT-LLM wheel from the provided index URL, allow dependencies from PyPI
         pip install --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${TENSORRTLLM_PIP_WHEEL}"; \
         if [ "$ARCH" = "amd64" ]; then \
             pip install "triton==3.3.1"; \
         fi; \
     fi
🧹 Nitpick comments (2)
container/Dockerfile.tensorrt_llm (2)

490-492: Nit: Use uv pip for triton for consistency in venv

Since runtime installs into VIRTUAL_ENV via uv, prefer uv pip here for consistency and clearer intent.

-    if [ "$ARCH" = "amd64" ]; then \
-        pip install "triton==3.3.1"; \
-    fi; \
+    if [ "$ARCH" = "amd64" ]; then \
+        uv pip install "triton==3.3.1"; \
+    fi; \

147-154: Clarify scope of the “cannot use uv venv for TRT-LLM” note

This note says we can’t use a uv venv for TRT-LLM, yet in the runtime stage we install TRT-LLM inside the uv-created venv. If the restriction only applies to the build stage, consider updating the comment to avoid confusion.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between commits 4385473 and 2733550.

📒 Files selected for processing (1)
  • container/Dockerfile.tensorrt_llm (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
container/Dockerfile.tensorrt_llm (1)

484-493: Pinning cuda-python<13 before TRT-LLM install (runtime) — looks good

This achieves the stated goal and should avoid the TRT-LLM breakage with cuda-python 13.
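
As a quick post-build sanity check (hypothetical commands, not part of the diff):

    # Confirm the pin held and the previously failing import now resolves
    pip show cuda-python | grep ^Version   # expect a 12.x version
    python -c "from cuda import cuda; print('ok')"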

@rmccorm4 rmccorm4 enabled auto-merge (squash) August 12, 2025 21:33
@rmccorm4 rmccorm4 merged commit 7e4eec2 into main Aug 12, 2025
11 of 12 checks passed
@rmccorm4 rmccorm4 deleted the rmccormick/cp-trtllm-cuda13-fix branch August 12, 2025 21:33
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025