feat: vllm container - CUDA 13 #4763
Conversation
Walkthrough
CUDA version upgraded from 12.9 to 13.0 across build and container infrastructure. The base image is updated to nvidia/cuda-dl-base, runtime dependencies are revised, and the vLLM installation is refactored to use prebuilt PyPI wheels together with a forked repository for nvshmem support.
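A minimal sketch of what a wheel-based install with a CUDA 13 torch backend could look like, assuming a VLLM_VER variable and the cu130 backend mentioned later in this review (pins and variable names are illustrative, not taken from the diff):
# illustrative only
uv pip install "vllm==${VLLM_VER}" --torch-backend=cu130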
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
container/Dockerfile.vllm (4 hunks)
container/build.sh (2 hunks)
container/deps/vllm/install_vllm.sh (2 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
📚 Learning: 2025-12-02T18:13:40.065Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 4698
File: .github/workflows/container-validation-dynamo.yml:68-68
Timestamp: 2025-12-02T18:13:40.065Z
Learning: In the ai-dynamo/dynamo repository, backend-specific tests (vllm, sglang, trtllm) are intentionally excluded from the container-validation-dynamo.yml workflow using "not (vllm or sglang or trtllm)" because they run in a separate container-validation-backends.yml workflow that has dedicated jobs for each backend. This separation keeps framework-agnostic tests separate from backend-specific tests.
Applied to files:
container/build.sh
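As a hedged illustration of the marker expression this learning refers to (the actual workflow invocation is not shown here):
pytest -m "not (vllm or sglang or trtllm)"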
📚 Learning: 2025-08-18T16:52:15.659Z
Learnt from: nnshah1
Repo: ai-dynamo/dynamo PR: 2489
File: container/deps/vllm/install_vllm.sh:151-152
Timestamp: 2025-08-18T16:52:15.659Z
Learning: The VLLM_PRECOMPILED_WHEEL_LOCATION environment variable, when exported, automatically triggers vLLM's build system to use the precompiled wheel instead of building from source, even when using standard `uv pip install .` commands in container/deps/vllm/install_vllm.sh.
Applied to files:
container/deps/vllm/install_vllm.sh
📚 Learning: 2025-07-22T10:22:28.972Z
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
Applied to files:
container/deps/vllm/install_vllm.sh
container/Dockerfile.vllm
📚 Learning: 2025-07-21T00:10:56.947Z
Learnt from: zaristei
Repo: ai-dynamo/dynamo PR: 2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.
Applied to files:
container/deps/vllm/install_vllm.sh
📚 Learning: 2025-08-18T16:52:15.659Z
Learnt from: nnshah1
Repo: ai-dynamo/dynamo PR: 2489
File: container/deps/vllm/install_vllm.sh:151-152
Timestamp: 2025-08-18T16:52:15.659Z
Learning: The VLLM_PRECOMPILED_WHEEL_LOCATION environment variable is an official vLLM environment variable that, when exported, automatically triggers vLLM's build system to use the specified precompiled wheel instead of building from source. This works even with standard `uv pip install .` commands without requiring explicit reference to the variable in the install command. The vLLM build system internally detects and uses this environment variable.
Applied to files:
container/deps/vllm/install_vllm.sh
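A minimal sketch of the pattern this learning describes; the wheel URL is a placeholder, not a real artifact location:
export VLLM_PRECOMPILED_WHEEL_LOCATION="https://example.invalid/vllm-precompiled.whl"  # placeholder URL
cd vllm && uv pip install .  # vLLM's build detects the variable and uses the precompiled wheel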
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Applied to files:
container/Dockerfile.vllm
📚 Learning: 2025-12-03T01:14:42.094Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 4657
File: container/Dockerfile.vllm:264-265
Timestamp: 2025-12-03T01:14:42.094Z
Learning: In container/Dockerfile.vllm, the recursive chmod -R g+w operation during early user setup (at lines 264-265, when creating the dynamo user and initializing /workspace, /home/dynamo, /opt/dynamo) is an intentional exception to the pattern of avoiding recursive operations, as it handles pre-existing paths and dotfiles created by useradd -m before bulk content is copied.
Applied to files:
container/Dockerfile.vllm
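A hedged Dockerfile sketch of the early user-setup pattern this exception refers to (the UID and exact paths are illustrative, not the actual lines 264-265):
RUN useradd -m -u 1000 dynamo \
    && mkdir -p /workspace /opt/dynamo \
    && chown dynamo:dynamo /workspace /home/dynamo /opt/dynamo \
    && chmod -R g+w /workspace /home/dynamo /opt/dynamo  # recursive on purpose: covers dotfiles created by useradd -m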
📚 Learning: 2025-08-05T22:51:59.230Z
Learnt from: dmitry-tokarev-nv
Repo: ai-dynamo/dynamo PR: 2300
File: pyproject.toml:64-66
Timestamp: 2025-08-05T22:51:59.230Z
Learning: The ai-dynamo/dynamo project does not ship ARM64 wheels, so platform markers to restrict dependencies to x86_64 are not needed in pyproject.toml dependencies.
Applied to files:
container/Dockerfile.vllm
📚 Learning: 2025-12-03T01:04:32.053Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 4657
File: container/Dockerfile.sglang:216-223
Timestamp: 2025-12-03T01:04:32.053Z
Learning: In container/Dockerfile.sglang, the recursive chown -R and chmod -R operations during early user setup (when creating the dynamo user and initializing /sgl-workspace, /workspace, /home/dynamo, /opt/dynamo) are intentional exceptions to the pattern of avoiding recursive operations, as they handle pre-existing paths and dotfiles created by useradd -m.
Applied to files:
container/Dockerfile.vllm
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: sglang (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: trtllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
container/Dockerfile.vllm (2)
276-284: LGTM: pip --no-cache optimization applied consistently. The addition of --no-cache flags to pip install commands reduces image layer size and prevents stale cache issues during rebuilds. This is applied consistently across the runtime wheel installation and the kvbm optional dependencies.
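For reference, the two common spellings of this flag, shown with a placeholder wheel path (which spelling the Dockerfile actually uses is not reproduced in this excerpt):
python3 -m pip install --no-cache-dir ./some-wheel.whl  # pip spelling (placeholder path)
uv pip install --no-cache ./some-wheel.whl              # uv spelling (placeholder path)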
191-192: CUDA 13.0 library symlink names are correct. The symlinks to libcublas.so.13 and libcublasLt.so.13 are the correct SONAMEs for the CUDA 13.0 runtime. No changes needed.
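A hedged sketch of what such symlinks typically look like; CUBLAS_DIR is a hypothetical source directory, not the path used in the Dockerfile:
ln -sf "$CUBLAS_DIR/libcublas.so.13"   /usr/local/lib/libcublas.so.13    # illustrative paths only
ln -sf "$CUBLAS_DIR/libcublasLt.so.13" /usr/local/lib/libcublasLt.so.13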
container/deps/vllm/install_vllm.sh (2)
14-15: LGTM: CUDA 13.0 and VLLM version parameterization applied correctly. The refactoring to parameterize VLLM_VER and VLLM_REF, along with the CUDA_VERSION update to 13.0 and the derived TORCH_BACKEND (cu130), provides good flexibility for future version updates. The parameter validation in the help text and argument parsing is comprehensive.
Also applies to: 25-25, 100-100
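A hedged sketch of this kind of parameterization (the VLLM_VER default is a placeholder; only CUDA 13.0, the cu130 backend, and the nvshmem branch name appear in this review):
VLLM_VER=${VLLM_VER:-"0.x.y"}                            # placeholder default
VLLM_REF=${VLLM_REF:-"nvshmem-3.3.24-cuda-13"}
CUDA_VERSION=${CUDA_VERSION:-"13.0"}
TORCH_BACKEND="cu$(echo "$CUDA_VERSION" | tr -d '.')"    # 13.0 -> cu130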
146-146: UV_NO_CACHE=1 is valid but only controls uv's cache, not pip's.
UV_NO_CACHE=1 is the official uv environment variable (available since v0.1.2), equivalent to --no-cache, and correctly prevents uv from using its cache. However, this does not disable pip's internal HTTP/wheel cache. If complete cache bypass is needed for kernel compilation, also set PIP_NO_CACHE_DIR=1 or ensure the called scripts (install_deepgemm.sh, install_python_libraries.sh) use --no-cache-dir with their uv pip install commands.
Also applies to: 148-148, 155-155
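A minimal sketch of the belt-and-suspenders approach suggested here:
export UV_NO_CACHE=1         # disables uv's cache (equivalent to --no-cache)
export PIP_NO_CACHE_DIR=1    # additionally disables pip's own HTTP/wheel cache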
container/build.sh (1)
109-109: Base image tag is valid and available. The VLLM_BASE_IMAGE_TAG "25.10-cuda13.0-devel-ubuntu24.04" exists on nvcr.io/nvidia/cuda-dl-base and is officially published by NVIDIA. The devel variant includes the CUDA 13.0 runtime and development tools as expected.
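A hedged sketch of how such a base image is typically wired into a Dockerfile (the ARG names are assumptions, not the actual build.sh/Dockerfile variables):
ARG BASE_IMAGE=nvcr.io/nvidia/cuda-dl-base
ARG BASE_IMAGE_TAG=25.10-cuda13.0-devel-ubuntu24.04
FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} AS base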
echo "\n=== Cloning vLLM repository ==="
# Clone needed for DeepGEMM and EP kernels install scripts
cd $INSTALLATION_DIR
git clone https://github.com/dmitry-tokarev-nv/vllm vllm # TODO: switch to official repo when the nvshmem fix is merged
cd vllm
git checkout nvshmem-3.3.24-cuda-13
echo "✓ vLLM repository cloned"
🧩 Analysis chain
🌐 Web query:
dmitry-tokarev-nv vllm nvshmem-3.3.24-cuda-13 fork maintained merge
💡 Result:
I couldn't find a GitHub fork or branch named "dmitry-tokarev-nv" or a branch/PR called "nvshmem-3.3.24-cuda-13" in the vLLM repo or forks. vLLM mainline and related issues reference CUDA/compatibility work, and NVSHMEM 3.3.24 (CUDA 13 support) is an NVIDIA release, but there’s no public evidence the specific fork/merge you named exists or was merged. [1][2][3]
Sources:
- NVSHMEM 3.3.24 release notes (CUDA 13 compatibility). [1]
- vLLM docs / repo (mentions CUDA build binaries and ongoing CUDA compatibility issues). [2][3]
🏁 Script executed:
# Search for any related documentation or comments about this fork in the codebase
rg -i "dmitry-tokarev-nv|nvshmem.*cuda" --type sh --type md --type txtRepository: ai-dynamo/dynamo
Length of output: 287
🏁 Script executed:
# Check if there are any GitHub issues or documentation referencing this fork
find . -type f \( -name "*.md" -o -name "*.txt" -o -name "*.sh" \) | xargs grep -l "nvshmem\|dmitry-tokarev" 2>/dev/null | head -20
Repository: ai-dynamo/dynamo
Length of output: 97
Replace fork reference with official vLLM repository or document internal fork access.
The referenced fork (dmitry-tokarev-nv/vllm) and branch (nvshmem-3.3.24-cuda-13) cannot be publicly verified to exist on GitHub. This introduces a critical risk:
- Build will fail immediately if the fork is private, archived, or deleted
- No public documentation or merge timeline exists for the nvshmem fix
- Production build flow depends on an unverifiable external resource
Either:
- Use the official vLLM repository and specify a concrete version/branch (sketched below)
- Host the fork internally with documented access controls and maintenance guarantees
- Provide explicit documentation of the fork's location and expected lifecycle
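For example, a hedged sketch of the first option, pinning the official repository to a concrete tag (v0.x.y is a placeholder, not a verified release):
git clone --depth 1 --branch v0.x.y https://github.com/vllm-project/vllm vllm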
/ok to test 9549ebb
Overview:
TODO:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit