feat: vllm container - CUDA 13 #4763
Conversation
Walkthrough
CUDA version upgraded from 12.9 to 13.0 across build and container infrastructure. The base image is updated to nvidia/cuda-dl-base, runtime dependencies are revised, and the vLLM installation is refactored to use prebuilt PyPI wheels together with a forked repository for nvshmem support.
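A minimal sketch of what a wheel-based install with a CUDA 13 torch backend could look like, assuming a VLLM_VER variable and the cu130 backend mentioned later in this review (pins and variable names are illustrative, not taken from the diff):
# illustrative only
uv pip install "vllm==${VLLM_VER}" --torch-backend=cu130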
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
container/Dockerfile.vllm (4 hunks)
container/build.sh (2 hunks)
container/deps/vllm/install_vllm.sh (2 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
📚 Learning: 2025-12-02T18:13:40.065Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 4698
File: .github/workflows/container-validation-dynamo.yml:68-68
Timestamp: 2025-12-02T18:13:40.065Z
Learning: In the ai-dynamo/dynamo repository, backend-specific tests (vllm, sglang, trtllm) are intentionally excluded from the container-validation-dynamo.yml workflow using "not (vllm or sglang or trtllm)" because they run in a separate container-validation-backends.yml workflow that has dedicated jobs for each backend. This separation keeps framework-agnostic tests separate from backend-specific tests.
Applied to files:
container/build.sh
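As a hedged illustration of the marker expression this learning refers to (the actual workflow invocation is not shown here):
pytest -m "not (vllm or sglang or trtllm)"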
📚 Learning: 2025-08-18T16:52:15.659Z
Learnt from: nnshah1
Repo: ai-dynamo/dynamo PR: 2489
File: container/deps/vllm/install_vllm.sh:151-152
Timestamp: 2025-08-18T16:52:15.659Z
Learning: The VLLM_PRECOMPILED_WHEEL_LOCATION environment variable, when exported, automatically triggers vLLM's build system to use the precompiled wheel instead of building from source, even when using standard `uv pip install .` commands in container/deps/vllm/install_vllm.sh.
Applied to files:
container/deps/vllm/install_vllm.sh
📚 Learning: 2025-07-22T10:22:28.972Z
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
Applied to files:
container/deps/vllm/install_vllm.sh
container/Dockerfile.vllm
📚 Learning: 2025-07-21T00:10:56.947Z
Learnt from: zaristei
Repo: ai-dynamo/dynamo PR: 2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.
Applied to files:
container/deps/vllm/install_vllm.sh
📚 Learning: 2025-08-18T16:52:15.659Z
Learnt from: nnshah1
Repo: ai-dynamo/dynamo PR: 2489
File: container/deps/vllm/install_vllm.sh:151-152
Timestamp: 2025-08-18T16:52:15.659Z
Learning: The VLLM_PRECOMPILED_WHEEL_LOCATION environment variable is an official vLLM environment variable that, when exported, automatically triggers vLLM's build system to use the specified precompiled wheel instead of building from source. This works even with standard `uv pip install .` commands without requiring explicit reference to the variable in the install command. The vLLM build system internally detects and uses this environment variable.
Applied to files:
container/deps/vllm/install_vllm.sh
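A minimal sketch of the pattern this learning describes; the wheel URL is a placeholder, not a real artifact location:
export VLLM_PRECOMPILED_WHEEL_LOCATION="https://example.invalid/vllm-precompiled.whl"  # placeholder URL
cd vllm && uv pip install .  # vLLM's build detects the variable and uses the precompiled wheel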
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Applied to files:
container/Dockerfile.vllm
📚 Learning: 2025-12-03T01:14:42.094Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 4657
File: container/Dockerfile.vllm:264-265
Timestamp: 2025-12-03T01:14:42.094Z
Learning: In container/Dockerfile.vllm, the recursive chmod -R g+w operation during early user setup (at lines 264-265, when creating the dynamo user and initializing /workspace, /home/dynamo, /opt/dynamo) is an intentional exception to the pattern of avoiding recursive operations, as it handles pre-existing paths and dotfiles created by useradd -m before bulk content is copied.
Applied to files:
container/Dockerfile.vllm
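A hedged Dockerfile sketch of the early user-setup pattern this exception refers to (the UID and exact paths are illustrative, not the actual lines 264-265):
RUN useradd -m -u 1000 dynamo \
    && mkdir -p /workspace /opt/dynamo \
    && chown dynamo:dynamo /workspace /home/dynamo /opt/dynamo \
    && chmod -R g+w /workspace /home/dynamo /opt/dynamo  # recursive on purpose: covers dotfiles created by useradd -m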
📚 Learning: 2025-08-05T22:51:59.230Z
Learnt from: dmitry-tokarev-nv
Repo: ai-dynamo/dynamo PR: 2300
File: pyproject.toml:64-66
Timestamp: 2025-08-05T22:51:59.230Z
Learning: The ai-dynamo/dynamo project does not ship ARM64 wheels, so platform markers to restrict dependencies to x86_64 are not needed in pyproject.toml dependencies.
Applied to files:
container/Dockerfile.vllm
📚 Learning: 2025-12-03T01:04:32.053Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 4657
File: container/Dockerfile.sglang:216-223
Timestamp: 2025-12-03T01:04:32.053Z
Learning: In container/Dockerfile.sglang, the recursive chown -R and chmod -R operations during early user setup (when creating the dynamo user and initializing /sgl-workspace, /workspace, /home/dynamo, /opt/dynamo) are intentional exceptions to the pattern of avoiding recursive operations, as they handle pre-existing paths and dotfiles created by useradd -m.
Applied to files:
container/Dockerfile.vllm
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: sglang (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: trtllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
container/Dockerfile.vllm (2)
276-284: LGTM: pip --no-cache optimization applied consistently. The addition of --no-cache flags to pip install commands reduces image layer size and prevents stale cache issues during rebuilds. This is applied consistently across the runtime wheel installation and the kvbm optional dependencies.
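For reference, the two common spellings of this flag, shown with a placeholder wheel path (which spelling the Dockerfile actually uses is not reproduced in this excerpt):
python3 -m pip install --no-cache-dir ./some-wheel.whl  # pip spelling (placeholder path)
uv pip install --no-cache ./some-wheel.whl              # uv spelling (placeholder path)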
191-192: CUDA 13.0 library symlink names are correct. The symlinks to libcublas.so.13 and libcublasLt.so.13 are the correct SONAMEs for the CUDA 13.0 runtime. No changes needed.
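A hedged sketch of what such symlinks typically look like; CUBLAS_DIR is a hypothetical source directory, not the path used in the Dockerfile:
ln -sf "$CUBLAS_DIR/libcublas.so.13"   /usr/local/lib/libcublas.so.13    # illustrative paths only
ln -sf "$CUBLAS_DIR/libcublasLt.so.13" /usr/local/lib/libcublasLt.so.13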
container/deps/vllm/install_vllm.sh (2)
14-15: LGTM: CUDA 13.0 and VLLM version parameterization applied correctly. The refactoring to parameterize VLLM_VER and VLLM_REF, along with the CUDA_VERSION update to 13.0 and the derived TORCH_BACKEND (cu130), provides good flexibility for future version updates. The parameter validation in the help text and argument parsing is comprehensive.
Also applies to: 25-25, 100-100
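A hedged sketch of this kind of parameterization (the VLLM_VER default is a placeholder; only CUDA 13.0, the cu130 backend, and the nvshmem branch name appear in this review):
VLLM_VER=${VLLM_VER:-"0.x.y"}                            # placeholder default
VLLM_REF=${VLLM_REF:-"nvshmem-3.3.24-cuda-13"}
CUDA_VERSION=${CUDA_VERSION:-"13.0"}
TORCH_BACKEND="cu$(echo "$CUDA_VERSION" | tr -d '.')"    # 13.0 -> cu130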
146-146: UV_NO_CACHE=1 is valid but only controls uv's cache, not pip's.
UV_NO_CACHE=1 is the official uv environment variable (available since v0.1.2), equivalent to --no-cache, and correctly prevents uv from using its cache. However, this does not disable pip's internal HTTP/wheel cache. If complete cache bypass is needed for kernel compilation, also set PIP_NO_CACHE_DIR=1 or ensure the called scripts (install_deepgemm.sh, install_python_libraries.sh) use --no-cache-dir with their uv pip install commands.
Also applies to: 148-148, 155-155
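A minimal sketch of the belt-and-suspenders approach suggested here:
export UV_NO_CACHE=1         # disables uv's cache (equivalent to --no-cache)
export PIP_NO_CACHE_DIR=1    # additionally disables pip's own HTTP/wheel cache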
container/build.sh (1)
109-109: Base image tag is valid and available. The VLLM_BASE_IMAGE_TAG "25.10-cuda13.0-devel-ubuntu24.04" exists on nvcr.io/nvidia/cuda-dl-base and is officially published by NVIDIA. The devel variant includes the CUDA 13.0 runtime and development tools as expected.
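A hedged sketch of how such a base image is typically wired into a Dockerfile (the ARG names are assumptions, not the actual build.sh/Dockerfile variables):
ARG BASE_IMAGE=nvcr.io/nvidia/cuda-dl-base
ARG BASE_IMAGE_TAG=25.10-cuda13.0-devel-ubuntu24.04
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS base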
echo "\n=== Cloning vLLM repository ==="
# Clone needed for DeepGEMM and EP kernels install scripts
cd $INSTALLATION_DIR
git clone https://github.com/dmitry-tokarev-nv/vllm vllm # TODO: switch to official repo when the nvshmem fix is merged
cd vllm
git checkout nvshmem-3.3.24-cuda-13
echo "✓ vLLM repository cloned"
🧩 Analysis chain
🌐 Web query:
dmitry-tokarev-nv vllm nvshmem-3.3.24-cuda-13 fork maintained merge
💡 Result:
I couldn't find a GitHub fork or branch named "dmitry-tokarev-nv" or a branch/PR called "nvshmem-3.3.24-cuda-13" in the vLLM repo or forks. vLLM mainline and related issues reference CUDA/compatibility work, and NVSHMEM 3.3.24 (CUDA 13 support) is an NVIDIA release, but there’s no public evidence the specific fork/merge you named exists or was merged. [1][2][3]
Sources:
- NVSHMEM 3.3.24 release notes (CUDA 13 compatibility). [1]
- vLLM docs / repo (mentions CUDA build binaries and ongoing CUDA compatibility issues). [2][3]
🏁 Script executed:
# Search for any related documentation or comments about this fork in the codebase
rg -i "dmitry-tokarev-nv|nvshmem.*cuda" --type sh --type md --type txtRepository: ai-dynamo/dynamo
Length of output: 287
🏁 Script executed:
# Check if there are any GitHub issues or documentation referencing this fork
find . -type f \( -name "*.md" -o -name "*.txt" -o -name "*.sh" \) | xargs grep -l "nvshmem\|dmitry-tokarev" 2>/dev/null | head -20
Repository: ai-dynamo/dynamo
Length of output: 97
Replace fork reference with official vLLM repository or document internal fork access.
The referenced fork (dmitry-tokarev-nv/vllm) and branch (nvshmem-3.3.24-cuda-13) cannot be publicly verified to exist on GitHub. This introduces a critical risk:
- Build will fail immediately if the fork is private, archived, or deleted
- No public documentation or merge timeline exists for the nvshmem fix
- Production build flow depends on an unverifiable external resource
Either:
- Use the official vLLM repository and specify a concrete version/branch (sketched below)
- Host the fork internally with documented access controls and maintenance guarantees
- Provide explicit documentation of the fork's location and expected lifecycle
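For example, a hedged sketch of the first option, pinning the official repository to a concrete tag (v0.x.y is a placeholder, not a verified release):
git clone --depth 1 --branch v0.x.y https://github.com/vllm-project/vllm vllm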
/ok to test 9549ebb
Overview:
TODO:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit