
Conversation

@dmitry-tokarev-nv dmitry-tokarev-nv commented Dec 5, 2025

Overview:

TODO:

  • Handle ARM: build from source
  • Do not replace the CUDA 12 container; only add CUDA 13, and update build.sh and the CI logic to build and test CUDA 13
  • Test the 25.11 base container

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Chores
    • Updated CUDA runtime support from version 12.9 to 13.0
    • Updated container base images to latest compatible versions
    • Optimized Python package installation with cache improvements
    • Enhanced vLLM installation process using prebuilt wheels
    • Improved build configuration for ARM64 environments


Signed-off-by: Dmitry Tokarev <[email protected]>
@dmitry-tokarev-nv dmitry-tokarev-nv requested review from a team as code owners December 5, 2025 00:03
@github-actions github-actions bot added the feat label Dec 5, 2025

coderabbitai bot commented Dec 5, 2025

Walkthrough

CUDA version upgraded from 12.9 to 13.0 across build and container infrastructure. Base image updated to nvidia/cuda-dl-base, runtime dependencies revised, and vLLM installation refactored to use prebuilt PyPI wheels with a forked repository for nvshmem support.

Changes

CUDA 13.0 Upgrade and Base Image Configuration
Files: container/Dockerfile.vllm, container/build.sh
Updated the CUDA version from 12.9 to 13.0 across the Dockerfile and build scripts. Changed the base image from nvcr.io/nvidia/cuda:12.9-devel-ubuntu24.04 to nvcr.io/nvidia/cuda-dl-base:25.10-cuda13.0-devel-ubuntu24.04. Updated the libcublas/libcublasLt symlinks to version 13, added a CPATH environment variable, adjusted apt package dependencies, and updated build arguments (VLLM_BASE_IMAGE_TAG, RUNTIME_IMAGE_TAG, TORCH_BACKEND).

vLLM Installation Workflow Refactoring
Files: container/deps/vllm/install_vllm.sh
Introduced VLLM_VER and VLLM_REF parameterization, replaced direct GitHub vLLM cloning with prebuilt wheel installation from PyPI, added cloning of a forked nvshmem-variant repository (nvshmem-3.3.24-cuda-13 branch), relocated the repository operations before the DeepGEMM/EP kernel builds, and enabled UV_NO_CACHE=1 for all pip operations. Minor fixes include a typo correction and clarified comments. A rough sketch of this flow follows.
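The refactored flow in install_vllm.sh is easiest to picture as two steps: install the prebuilt wheel, then clone the repository only for the in-tree kernel install scripts. A minimal bash sketch, assuming the fork and branch named in this PR; the version handling and the exact flag plumbing are assumptions, not the script's literal contents:

#!/usr/bin/env bash
set -euo pipefail

# Placeholders: the real values come from build args / script flags in the PR.
VLLM_VER="${VLLM_VER:?set to the pinned vLLM version}"
CUDA_VERSION="${CUDA_VERSION:-13.0}"
TORCH_BACKEND="cu${CUDA_VERSION//./}"   # 13.0 -> cu130

# Step 1: install vLLM from a prebuilt PyPI wheel instead of building from source.
uv pip install "vllm==${VLLM_VER}" --torch-backend "${TORCH_BACKEND}"

# Step 2: clone the repository anyway, because the DeepGEMM / EP kernel install
# scripts live in-tree (fork and branch names taken from this PR).
git clone https://github.com/dmitry-tokarev-nv/vllm vllm
git -C vllm checkout nvshmem-3.3.24-cuda-13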

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Focus areas:
    • container/deps/vllm/install_vllm.sh — Significant restructuring of installation flow; verify wheel URL construction logic, repository branching strategy, and UV_NO_CACHE flag placement
    • container/Dockerfile.vllm — Validate CPATH environment variable correctness and apt package availability in new base image
    • Cross-file consistency — Ensure CUDA version changes (12.9 → 13.0) and torch backend updates align across all three files
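For the cross-file consistency item above, a quick grep over the three changed files can surface stale version strings; a minimal sketch, run from the repository root:

# Any remaining 12.9 / cu129 hits would indicate an incomplete upgrade.
grep -nE '12\.9|cu129|13\.0|cu130' \
  container/Dockerfile.vllm container/build.sh container/deps/vllm/install_vllm.sh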

Poem

🐰 Docker wheels do spin and whirl,
CUDA thirteen takes a twirl,
From point-nine to the latest call,
PyPI wheels deliver all!
Nvshmem forks now lead the way,
Hops of code for GPU day! 🐾

Pre-merge checks

❌ Failed checks (1 warning)
Description check: ⚠️ Warning
  Explanation: The pull request description is incomplete and contains TODO items rather than substantive information about the changes made.
  Resolution: Replace TODO items with detailed descriptions of the actual changes implemented. Provide specifics on CUDA upgrades, file modifications, and testing performed. Fill in the actual GitHub issue number instead of the placeholder #xxx.
✅ Passed checks (2 passed)
Title check: ✅ Passed
  The title 'feat: vllm container - CUDA 13' clearly and directly summarizes the main change: upgrading the vLLM container to use CUDA 13.0.
Docstring Coverage: ✅ Passed
  No functions found in the changed files to evaluate docstring coverage; the docstring coverage check was skipped.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8c4a5f and d0e46c7.

📒 Files selected for processing (3)
  • container/Dockerfile.vllm (4 hunks)
  • container/build.sh (2 hunks)
  • container/deps/vllm/install_vllm.sh (2 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
📚 Learning: 2025-12-02T18:13:40.065Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 4698
File: .github/workflows/container-validation-dynamo.yml:68-68
Timestamp: 2025-12-02T18:13:40.065Z
Learning: In the ai-dynamo/dynamo repository, backend-specific tests (vllm, sglang, trtllm) are intentionally excluded from the container-validation-dynamo.yml workflow using "not (vllm or sglang or trtllm)" because they run in a separate container-validation-backends.yml workflow that has dedicated jobs for each backend. This separation keeps framework-agnostic tests separate from backend-specific tests.

Applied to files:

  • container/build.sh
📚 Learning: 2025-08-18T16:52:15.659Z
Learnt from: nnshah1
Repo: ai-dynamo/dynamo PR: 2489
File: container/deps/vllm/install_vllm.sh:151-152
Timestamp: 2025-08-18T16:52:15.659Z
Learning: The VLLM_PRECOMPILED_WHEEL_LOCATION environment variable, when exported, automatically triggers vLLM's build system to use the precompiled wheel instead of building from source, even when using standard `uv pip install .` commands in container/deps/vllm/install_vllm.sh.

Applied to files:

  • container/deps/vllm/install_vllm.sh
📚 Learning: 2025-07-22T10:22:28.972Z
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.

Applied to files:

  • container/deps/vllm/install_vllm.sh
  • container/Dockerfile.vllm
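For context, the usage that learning describes looks roughly like this (a sketch only; the bare package spec is illustrative):

uv pip install vllm --torch-backend=auto   # resolves the PyTorch build matching the container's CUDA version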
📚 Learning: 2025-07-21T00:10:56.947Z
Learnt from: zaristei
Repo: ai-dynamo/dynamo PR: 2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.

Applied to files:

  • container/deps/vllm/install_vllm.sh
📚 Learning: 2025-08-18T16:52:15.659Z
Learnt from: nnshah1
Repo: ai-dynamo/dynamo PR: 2489
File: container/deps/vllm/install_vllm.sh:151-152
Timestamp: 2025-08-18T16:52:15.659Z
Learning: The VLLM_PRECOMPILED_WHEEL_LOCATION environment variable is an official vLLM environment variable that, when exported, automatically triggers vLLM's build system to use the specified precompiled wheel instead of building from source. This works even with standard `uv pip install .` commands without requiring explicit reference to the variable in the install command. The vLLM build system internally detects and uses this environment variable.

Applied to files:

  • container/deps/vllm/install_vllm.sh
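A short sketch of the mechanism that learning describes; the wheel URL is a placeholder, not a real artifact:

export VLLM_PRECOMPILED_WHEEL_LOCATION="https://example.com/vllm-precompiled.whl"
cd vllm
uv pip install .   # the build system detects the variable and skips compiling from source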
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

  • container/Dockerfile.vllm
📚 Learning: 2025-12-03T01:14:42.094Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 4657
File: container/Dockerfile.vllm:264-265
Timestamp: 2025-12-03T01:14:42.094Z
Learning: In container/Dockerfile.vllm, the recursive chmod -R g+w operation during early user setup (at lines 264-265, when creating the dynamo user and initializing /workspace, /home/dynamo, /opt/dynamo) is an intentional exception to the pattern of avoiding recursive operations, as it handles pre-existing paths and dotfiles created by useradd -m before bulk content is copied.

Applied to files:

  • container/Dockerfile.vllm
📚 Learning: 2025-08-05T22:51:59.230Z
Learnt from: dmitry-tokarev-nv
Repo: ai-dynamo/dynamo PR: 2300
File: pyproject.toml:64-66
Timestamp: 2025-08-05T22:51:59.230Z
Learning: The ai-dynamo/dynamo project does not ship ARM64 wheels, so platform markers to restrict dependencies to x86_64 are not needed in pyproject.toml dependencies.

Applied to files:

  • container/Dockerfile.vllm
📚 Learning: 2025-12-03T01:04:32.053Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 4657
File: container/Dockerfile.sglang:216-223
Timestamp: 2025-12-03T01:04:32.053Z
Learning: In container/Dockerfile.sglang, the recursive chown -R and chmod -R operations during early user setup (when creating the dynamo user and initializing /sgl-workspace, /workspace, /home/dynamo, /opt/dynamo) are intentional exceptions to the pattern of avoiding recursive operations, as they handle pre-existing paths and dotfiles created by useradd -m.

Applied to files:

  • container/Dockerfile.vllm
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: sglang (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
container/Dockerfile.vllm (2)

276-284: LGTM: pip --no-cache optimization applied consistently.

The addition of --no-cache flags to pip install commands reduces image layer size and prevents stale cache issues during rebuilds. This is applied consistently across runtime wheel installation and kvbm optional dependencies.
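As an illustration of the pattern (the wheel path is hypothetical, not the Dockerfile's actual path):

uv pip install --no-cache /workspace/dist/*.whl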


191-192: CUDA 13.0 library symlink names are correct.

The symlinks to libcublas.so.13 and libcublasLt.so.13 are the correct SONAMEs for CUDA 13.0 runtime. No changes needed.
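For reference, the symlinks being described look roughly like this; the library directory is an assumption, not the Dockerfile's actual path:

ln -sf /usr/local/cuda/lib64/libcublas.so.13   /usr/local/cuda/lib64/libcublas.so
ln -sf /usr/local/cuda/lib64/libcublasLt.so.13 /usr/local/cuda/lib64/libcublasLt.so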

container/deps/vllm/install_vllm.sh (2)

14-15: LGTM: CUDA 13.0 and VLLM version parameterization applied correctly.

The refactoring to parameterize VLLM_VER and VLLM_REF, along with the CUDA_VERSION update to 13.0 and derived TORCH_BACKEND (cu130), provides good flexibility for future version updates. The parameter validation in the help text and argument parsing is comprehensive.

Also applies to: 25-25, 100-100


146-146: UV_NO_CACHE=1 is valid but only controls uv's cache, not pip's.

UV_NO_CACHE=1 is the official uv environment variable (available since v0.1.2) equivalent to --no-cache and correctly prevents uv from using its cache. However, this does not disable pip's internal HTTP/wheel cache. If complete cache bypass is needed for kernel compilation, also set PIP_NO_CACHE_DIR=1 or ensure the called scripts (install_deepgemm.sh, install_python_libraries.sh) use --no-cache-dir with their uv pip install commands.

Also applies to: 148-148, 155-155
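A sketch of the belt-and-suspenders variant suggested above (the package name is illustrative only):

export UV_NO_CACHE=1        # covers uv's own cache
export PIP_NO_CACHE_DIR=1   # covers pip's cache if pip is invoked directly
uv pip install --no-cache requests   # per-command form of the same uv setting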

container/build.sh (1)

109-109: Base image tag is valid and available.

The VLLM_BASE_IMAGE_TAG "25.10-cuda13.0-devel-ubuntu24.04" exists on nvcr.io/nvidia/cuda-dl-base and is officially published by NVIDIA. The devel variant includes CUDA 13.0 runtime and development tools as expected.
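A one-line way to confirm the tag locally (assumes Docker and access to nvcr.io):

docker pull nvcr.io/nvidia/cuda-dl-base:25.10-cuda13.0-devel-ubuntu24.04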

Comment on lines 134 to 140
echo "\n=== Cloning vLLM repository ==="
# Clone needed for DeepGEMM and EP kernels install scripts
cd $INSTALLATION_DIR
git clone https://github.com/dmitry-tokarev-nv/vllm vllm # TODO: switch to official repo when the nvshmem fix is merged
cd vllm
git checkout nvshmem-3.3.24-cuda-13
echo "✓ vLLM repository cloned"

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

dmitry-tokarev-nv vllm nvshmem-3.3.24-cuda-13 fork maintained merge

💡 Result:

I couldn't find a GitHub fork or branch named "dmitry-tokarev-nv" or a branch/PR called "nvshmem-3.3.24-cuda-13" in the vLLM repo or forks. vLLM mainline and related issues reference CUDA/compatibility work, and NVSHMEM 3.3.24 (CUDA 13 support) is an NVIDIA release, but there’s no public evidence the specific fork/merge you named exists or was merged. [1][2][3]

Sources:

  • NVSHMEM 3.3.24 release notes (CUDA 13 compatibility). [1]
  • vLLM docs / repo (mentions CUDA build binaries and ongoing CUDA compatibility issues). [2][3]

🏁 Script executed:

# Search for any related documentation or comments about this fork in the codebase
rg -i "dmitry-tokarev-nv|nvshmem.*cuda" --type sh --type md --type txt

Repository: ai-dynamo/dynamo

Length of output: 287


🏁 Script executed:

# Check if there are any GitHub issues or documentation referencing this fork
find . -type f \( -name "*.md" -o -name "*.txt" -o -name "*.sh" \) | xargs grep -l "nvshmem\|dmitry-tokarev" 2>/dev/null | head -20

Repository: ai-dynamo/dynamo

Length of output: 97


Replace fork reference with official vLLM repository or document internal fork access.

The referenced fork (dmitry-tokarev-nv/vllm) and branch (nvshmem-3.3.24-cuda-13) cannot be publicly verified to exist on GitHub. This introduces a critical risk:

  • Build will fail immediately if the fork is private, archived, or deleted
  • No public documentation or merge timeline exists for the nvshmem fix
  • Production build flow depends on an unverifiable external resource

Either:

  1. Use the official vLLM repository and specify a concrete version/branch
  2. Host the fork internally with documented access controls and maintenance guarantees
  3. Provide explicit documentation of the fork's location and expected lifecycle
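If the fork has to stay for now, pinning the checkout to an immutable commit rather than a moving branch at least makes the build reproducible; a sketch, with a placeholder variable for the SHA:

git clone https://github.com/dmitry-tokarev-nv/vllm vllm
cd vllm
# Fail fast if no pinned commit is provided; a branch can move or be deleted.
git checkout "${VLLM_NVSHMEM_FIX_COMMIT:?set to a pinned commit SHA}"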

@dmitry-tokarev-nv dmitry-tokarev-nv marked this pull request as draft December 5, 2025 17:02
Signed-off-by: Dmitry Tokarev <[email protected]>
@dmitry-tokarev-nv
Contributor Author

/ok to test 9549ebb

Signed-off-by: Dmitry Tokarev <[email protected]>
