fix(build): Pin cuda-python>=12,<13 to avoid trtllm breakage (#2379) #2396
Conversation
Walkthrough

Updates Dockerfile.tensorrt_llm to install cuda-python (>=12,<13) before installing TensorRT-LLM from the extra index, retains the conditional triton 3.3.1 install on amd64, and adds installation of local ai_dynamo_runtime*, ai_dynamo*, and nixl* wheels in both the build and runtime sections. Includes a note on the temporary cuda-python pin.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Docker as Docker build
    participant Pip as pip
    participant Index as Extra Index (TensorRT-LLM)
    participant Wheelhouse as Local wheelhouse
    Docker->>Pip: install cuda-python (>=12,<13)
    alt ARCH=amd64
        Docker->>Pip: install triton==3.3.1
    end
    Docker->>Pip: install TensorRT-LLM from extra-index
    Docker->>Pip: install ai_dynamo_runtime*, ai_dynamo*, nixl* (local wheels)
    Note over Docker,Pip: Same sequence repeated in runtime stage
```
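In Dockerfile terms, the diagram's ordering amounts to something like the following sketch. The `${TENSORRTLLM_INDEX_URL}` and `${TENSORRTLLM_PIP_WHEEL}` build args come from the Dockerfile itself; the `/workspace/wheels` wheelhouse path is illustrative, not the actual location.

```dockerfile
# Sketch only: the pin must come before the TensorRT-LLM install so
# dependency resolution cannot pull in cuda-python 13.
RUN pip install "cuda-python>=12,<13" && \
    if [ "$ARCH" = "amd64" ]; then pip install "triton==3.3.1"; fi && \
    pip install --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${TENSORRTLLM_PIP_WHEEL}" && \
    pip install /workspace/wheels/ai_dynamo_runtime*.whl \
                /workspace/wheels/ai_dynamo*.whl \
                /workspace/wheels/nixl*.whl
```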
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 0
🔭 Outside diff range comments (1)
container/Dockerfile.tensorrt_llm (1)
154-174: Also pin cuda-python<13 in the build-stage TRT-LLM install

The pin is applied in the runtime stage but not in the build stage. Since TRT-LLM is installed here too (from either a local wheel or the index), cuda-python 13 can still slip in and break the build. Install the pin right after uninstalling tensorrt so it applies to both the local-wheel and index paths.
Apply this diff inside the existing RUN:
```diff
 RUN [ -f /etc/pip/constraint.txt ] && : > /etc/pip/constraint.txt || true && \
-    pip uninstall -y tensorrt && \
+    pip uninstall -y tensorrt && \
+    pip install "cuda-python>=12,<13" && \
     if [ "$HAS_TRTLLM_CONTEXT" = "1" ]; then \
         # Install from local wheel directory in build context
         WHEEL_FILE=$(find /trtllm_wheel -name "*.whl" | head -n 1); \
         if [ -n "$WHEEL_FILE" ]; then \
             pip install "$WHEEL_FILE"; \
             if [ "$ARCH" = "amd64" ]; then \
                 pip install "triton==3.3.1"; \
             fi; \
         else \
             echo "No wheel file found in /trtllm_wheel directory."; \
             exit 1; \
         fi; \
     else \
         # Install TensorRT-LLM wheel from the provided index URL, allow dependencies from PyPI
         pip install --extra-index-url "${TENSORRTLLM_INDEX_URL}" "${TENSORRTLLM_PIP_WHEEL}"; \
         if [ "$ARCH" = "amd64" ]; then \
             pip install "triton==3.3.1"; \
         fi; \
     fi
```
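As a quick sanity check after applying the diff, one could confirm the resolved version inside the freshly built image. This is a sketch; the image tag is a placeholder:

```sh
# Placeholder image tag; prints the installed cuda-python version,
# which should resolve to a 12.x release once the pin is in place.
docker run --rm dynamo-trtllm:dev \
  python -c "import importlib.metadata as m; print(m.version('cuda-python'))"
```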
🧹 Nitpick comments (2)
container/Dockerfile.tensorrt_llm (2)
490-492: Nit: Use uv pip for triton for consistency in the venv

Since the runtime stage installs into VIRTUAL_ENV via uv, prefer uv pip here for consistency and clearer intent.
```diff
-    if [ "$ARCH" = "amd64" ]; then \
-        pip install "triton==3.3.1"; \
-    fi; \
+    if [ "$ARCH" = "amd64" ]; then \
+        uv pip install "triton==3.3.1"; \
+    fi; \
```
147-154: Clarify scope of the “cannot use uv venv for TRT-LLM” note

This note says we can’t use a uv venv for TRT-LLM, yet the runtime stage installs TRT-LLM inside the uv-created venv. If the restriction only applies to the build stage, consider updating the comment to avoid confusion.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
container/Dockerfile.tensorrt_llm (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Mirror Repository to GitLab
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
container/Dockerfile.tensorrt_llm (1)
484-493: Pinning cuda-python<13 before the TRT-LLM install (runtime) — looks good

This achieves the stated goal and should avoid the TRT-LLM breakage with cuda-python 13.
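A complementary runtime smoke test, sketched here with the same placeholder image tag as above, exercises the import that cuda-python 13 used to break:

```sh
# If the pin holds in the runtime stage, this import no longer fails.
docker run --rm --gpus all dynamo-trtllm:dev python -c "import tensorrt_llm"
```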
…2396) Signed-off-by: Hannah Zhang <[email protected]>
Overview:
Cherry-pick the TRTLLM cuda-python 13 build fix back to main from release/0.4.0.
Fixes the error raised at runtime when importing tensorrt_llm after cuda-python 13 gets pulled in. See #2379.
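For anyone hitting the same breakage outside the container, a minimal workaround sketch is to re-pin before (or alongside) the TRT-LLM install, mirroring the range used in this PR:

```sh
# Force cuda-python back into the 12.x range that TRT-LLM expects;
# pip will downgrade an already-installed cuda-python 13 to satisfy it.
pip install "cuda-python>=12,<13"
```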
Summary by CodeRabbit

Bug Fixes
- Pinned cuda-python to >=12,<13 before the TensorRT-LLM install to avoid breakage with cuda-python 13.

Chores
- Install the local ai_dynamo_runtime*, ai_dynamo*, and nixl* wheels in both the build and runtime stages.