-
Notifications
You must be signed in to change notification settings - Fork 753
fix: Copy cuda libraries from devel to runtime stage #2301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe Dockerfile for the runtime image was updated to copy the entire CUDA development toolkit directory from the base development image into the runtime image. This change ensures the CUDA toolkit is present in the runtime environment, with no other modifications to dependencies, environment variables, or build steps. Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~2 minutes Possibly related PRs
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
container/Dockerfile.sglang(1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: grahamking
PR: ai-dynamo/dynamo#1177
File: container/Dockerfile.vllm:102-105
Timestamp: 2025-05-28T22:54:46.875Z
Learning: In Dockerfiles, when appending to environment variables that may not exist in the base image, Docker validation will fail if you reference undefined variables with ${VARIABLE} syntax. In such cases, setting the environment variable directly (e.g., ENV CPATH=/usr/include) rather than appending is the appropriate approach.
📚 Learning: in dockerfiles, when appending to environment variables that may not exist in the base image, docker...
Learnt from: grahamking
PR: ai-dynamo/dynamo#1177
File: container/Dockerfile.vllm:102-105
Timestamp: 2025-05-28T22:54:46.875Z
Learning: In Dockerfiles, when appending to environment variables that may not exist in the base image, Docker validation will fail if you reference undefined variables with ${VARIABLE} syntax. In such cases, setting the environment variable directly (e.g., ENV CPATH=/usr/include) rather than appending is the appropriate approach.
Applied to files:
container/Dockerfile.sglang
📚 Learning: the fault tolerance tests in tests/fault_tolerance/ are designed to run only in the mounted containe...
Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/scenarios.py:57-57
Timestamp: 2025-07-01T15:39:56.789Z
Learning: The fault tolerance tests in tests/fault_tolerance/ are designed to run only in the mounted container environment, so hardcoded absolute paths with `/workspace/` prefix are intentional and should not be changed to relative paths.
Applied to files:
container/Dockerfile.sglang
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - vllm
|
/ok to test 37a3b49 |
pvijayakrish
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
|
/ok to test bd03999 |
Overview:
Cherry-pick from release/0.4.0 to main: #2298
We are encountering the following issue in CI and during local validation runs with the sglang runtime container:
2025-08-04T22:37:58.555820Z 01O [BASH] /bin/sh: 1: /usr/local/cuda/bin/nvcc: not found
This indicates that the cuda runtime container is missing the nvcc compiler which is a required dependency for sglang to built JIT kernels on the fly. This PR copies over cuda libraries from devel to runtime to ensure that the cuda binaries are the same across both containers. With this change, the CI tests should pass.
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit