Conversation

@rmccorm4 rmccorm4 commented Aug 11, 2025

Overview:

Cherry-pick the TRTLLM cuda-python 13 build fix back to main from release/0.4.0

Fixes this runtime error when importing tensorrt_llm after cuda-python 13 gets pulled in:

cannot import name 'cuda' from 'cuda' (unknown location)

See #2379
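
For illustration, a minimal repro-and-fix sketch in shell (the pinned install mirrors the Dockerfile change; the failing import is the one named in the error above):

    # With cuda-python 13.x on the path, the import tensorrt_llm relies on fails:
    python -c "from cuda import cuda"
    # ImportError: cannot import name 'cuda' from 'cuda' (unknown location)

    # The fix: constrain cuda-python to the 12.x series before TensorRT-LLM is installed
    pip install "cuda-python>=12,<13"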

Summary by CodeRabbit

  • Bug Fixes

    • Improved container compatibility by pinning CUDA Python to the 12.x range during TensorRT-LLM installation.
    • Preserves platform-specific Triton installation for amd64.
  • Chores

    • Reworked install sequence to staged pip steps for more reliable dependency resolution.
    • Added installation of local wheels from the workspace in both build and runtime flows.
    • Applied changes across both main build/runtime and runtime-only sections to ensure consistent packaging behavior.

coderabbitai bot commented Aug 11, 2025

Walkthrough

Updates Dockerfile.tensorrt_llm to install cuda-python (>=12,<13) before installing TensorRT-LLM from the extra index, retains the conditional triton 3.3.1 install on amd64, and adds installation of local ai_dynamo_runtime*, ai_dynamo*, and nixl* wheels in both the build and runtime sections. Includes a note about the temporary cuda-python pin.

Changes

TensorRT-LLM install flow update (build stage) — container/Dockerfile.tensorrt_llm
    Split the pip install: first cuda-python>=12,<13, then TensorRT-LLM from the extra index; preserve the amd64-only triton==3.3.1 install; append installation of the local ai_dynamo_runtime*, ai_dynamo*, and nixl* wheels; add an explanatory note about the temporary cuda-python constraint.
Mirrored runtime-stage updates — container/Dockerfile.tensorrt_llm
    Apply the same two-step install and local wheel installations in the runtime stage, maintaining the existing architecture conditionals for triton. A condensed sketch of the sequence follows.
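
For reference, a condensed shell sketch of the staged install order described above (TENSORRTLLM_INDEX_URL and TENSORRTLLM_PIP_WHEEL are the Dockerfile's build args; the local wheel directory is a placeholder, not the actual workspace layout):

    # Step 1: pin cuda-python so dependency resolution cannot pull in 13.x
    pip install "cuda-python>=12,<13"
    # Step 2: install TensorRT-LLM from the extra index, other deps from PyPI
    pip install --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${TENSORRTLLM_PIP_WHEEL}"
    # Step 3 (amd64 only): keep the pinned triton
    if [ "$ARCH" = "amd64" ]; then pip install "triton==3.3.1"; fi
    # Step 4: install the local wheels built in the workspace (path illustrative)
    pip install /workspace/wheels/ai_dynamo_runtime*.whl \
                /workspace/wheels/ai_dynamo*.whl \
                /workspace/wheels/nixl*.whl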

Sequence Diagram(s)

sequenceDiagram
    participant Docker as Docker build
    participant Pip as pip
    participant Index as Extra Index (TensorRT-LLM)
    participant Wheelhouse as Local wheelhouse

    Docker->>Pip: install cuda-python (>=12,<13)
    alt ARCH=amd64
        Docker->>Pip: install triton==3.3.1
    end
    Docker->>Pip: install TensorRT-LLM from extra-index
    Docker->>Pip: install ai_dynamo_runtime*, ai_dynamo*, nixl* (local wheels)

    Note over Docker,Pip: Same sequence repeated in runtime stage

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

I hop through layers, stage by stage,
Pinning CUDA, turning the page.
Triton for amd, wheels aligned,
TensorRT-LLM, neatly timed.
In my burrow of bytes I cheer—
Clean installs now, carrots near! 🥕🐇


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🔭 Outside diff range comments (1)
container/Dockerfile.tensorrt_llm (1)

154-174: Also pin cuda-python<13 in the build-stage TRT-LLM install

The pin is applied in runtime but not in the build stage. Since TRT-LLM is installed here too (from either a local wheel or index), cuda-python 13 can still slip in and break the build. Install the pin right after uninstalling tensorrt so it applies to both local and index paths.

Apply this diff inside the existing RUN:

 RUN [ -f /etc/pip/constraint.txt ] && : > /etc/pip/constraint.txt || true && \
-    pip uninstall -y tensorrt && \
+    pip uninstall -y tensorrt && \
+    pip install "cuda-python>=12,<13" && \
     if [ "$HAS_TRTLLM_CONTEXT" = "1" ]; then \
         # Install from local wheel directory in build context
         WHEEL_FILE=$(find /trtllm_wheel -name "*.whl" | head -n 1); \
         if [ -n "$WHEEL_FILE" ]; then \
             pip install "$WHEEL_FILE"; \
             if [ "$ARCH" = "amd64" ]; then \
                 pip install "triton==3.3.1"; \
             fi; \
         else \
             echo "No wheel file found in /trtllm_wheel directory."; \
             exit 1; \
         fi; \
     else \
         # Install TensorRT-LLM wheel from the provided index URL, allow dependencies from PyPI
         pip install --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${TENSORRTLLM_PIP_WHEEL}"; \
         if [ "$ARCH" = "amd64" ]; then \
             pip install "triton==3.3.1"; \
         fi; \
     fi
🧹 Nitpick comments (2)
container/Dockerfile.tensorrt_llm (2)

490-492: Nit: Use uv pip for triton for consistency in venv

Since runtime installs into VIRTUAL_ENV via uv, prefer uv pip here for consistency and clearer intent.

-    if [ "$ARCH" = "amd64" ]; then \
-        pip install "triton==3.3.1"; \
-    fi; \
+    if [ "$ARCH" = "amd64" ]; then \
+        uv pip install "triton==3.3.1"; \
+    fi; \

147-154: Clarify scope of the “cannot use uv venv for TRT-LLM” note

This note says we can’t use a uv venv for TRT-LLM, yet in the runtime stage we install TRT-LLM inside the uv-created venv. If the restriction only applies to the build stage, consider updating the comment to avoid confusion.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR, between commits 4385473 and 2733550.

📒 Files selected for processing (1)
  • container/Dockerfile.tensorrt_llm (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
container/Dockerfile.tensorrt_llm (1)

484-493: Pinning cuda-python<13 before TRT-LLM install (runtime) — looks good

This achieves the stated goal and should avoid the TRT-LLM breakage with cuda-python 13.
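
As a quick post-build sanity check (hypothetical commands, not part of the diff):

    # Confirm the pin held and the previously failing import now resolves
    pip show cuda-python | grep ^Version   # expect a 12.x version
    python -c "from cuda import cuda; print('ok')"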

@rmccorm4 rmccorm4 enabled auto-merge (squash) August 12, 2025 21:33
@rmccorm4 rmccorm4 merged commit 7e4eec2 into main Aug 12, 2025
11 of 12 checks passed
@rmccorm4 rmccorm4 deleted the rmccormick/cp-trtllm-cuda13-fix branch August 12, 2025 21:33
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025