Conversation

@hhzhang16 (Contributor) commented Apr 16, 2025

Overview:

This PR updates the top-level Earthfile and adds container/Earthfile to support building a compact Dynamo vLLM Docker image, both for local use and for CI.

Details:

  • Added a new target dynamo-base-docker-llm that builds the image and verifies that both Dynamo and vLLM install correctly
  • Created container/Earthfile containing a vllm-build target that builds a patched version of the vllm package; it only needs to rebuild when the containers/deps/vllm subdirectory changes
  • Reduced the vLLM image size from 34GB (dynamo:latest-vllm-local-dev) to 13GB with the Earthly build

A simple docker run demonstrating NIXL and GPU support:

> docker run --gpus all --rm -it my-registry/dynamo-base-docker-llm /bin/bash
root@36daedf25032:/workspace# python3
Python 3.12.3 (main, Feb  4 2025, 14:48:35) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import vllm
INFO 04-17 05:21:44 __init__.py:190] Automatically detected platform cuda.
WARNING 04-17 05:21:44 cuda.py:336] Detected different devices in the system: 
WARNING 04-17 05:21:44 cuda.py:336] NVIDIA GeForce GT 1030
WARNING 04-17 05:21:44 cuda.py:336] NVIDIA RTX A6000
WARNING 04-17 05:21:44 cuda.py:336] NVIDIA RTX A6000
WARNING 04-17 05:21:44 cuda.py:336] Please make sure to set `CUDA_DEVICE_ORDER=PCI_BUS_ID` to avoid unexpected behavior.
INFO 04-17 05:21:44 nixl.py:16] NIXL is available
>>> 
  • Confirmed that the 13GB image works as a Dynamo base image: the resulting deployment succeeds, and the image builder on K8s can handle it (the 34GB image crashes the image builder)
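The build-time verification described above (importing dynamo and vllm inside the image) generalizes to a small smoke-test helper. This is an illustrative sketch, not code from the PR; the module names passed in would be whatever the image is expected to provide:

```python
import importlib.util


def verify_imports(modules):
    """Return the subset of module names that cannot be found.

    Uses find_spec rather than importing, so a broken module body
    won't crash the check -- it only confirms the package is present.
    """
    return [m for m in modules if importlib.util.find_spec(m) is None]


if __name__ == "__main__":
    # In the image this might be: verify_imports(["dynamo", "vllm"])
    missing = verify_imports(["sys", "json"])
    if missing:
        raise SystemExit(f"Missing modules: {missing}")
```

A check like this could back the image's RUN verification step, failing the build with a clear message instead of a bare ImportError.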

Where should the reviewer start?

Run earthly +dynamo-base-docker-llm to test locally

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features

    • Introduced a new build process for CUDA-enabled environments, including a reusable setup for CUDA toolkit and NVIDIA drivers.
    • Added a new build target for creating a Docker image with minimal NIXL dependencies and pre-installed vllm Python wheel.
    • Added a new multi-stage container build process supporting vllm, uv, and nixl components.
  • Improvements

    • Streamlined Docker image build and push steps with simplified commands and updated environment variable usage.
    • Enhanced build orchestration by reorganizing and parameterizing build targets for greater flexibility and maintainability.
  • Documentation

    • Updated build and push instructions to reflect new Earthly-based workflow and naming conventions.

@copy-pr-bot bot commented Apr 16, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@hhzhang16 hhzhang16 marked this pull request as ready for review April 18, 2025 22:26
@github-actions bot

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Jun 21, 2025
@coderabbitai coderabbitai bot commented Jun 21, 2025

Walkthrough

A new multi-stage Earthfile was added for building vLLM, uv, and NIXL components in a CUDA-enabled environment. The main Earthfile was refactored to introduce reusable CUDA setup, update Docker image argument conventions, add a dedicated LLM base image build, and reorganize build orchestration targets. The README was updated to reflect the new Earthly-based image build and push workflow.

Changes

| File(s) | Change Summary |
| --- | --- |
| Earthfile | Added SETUP_CUDA function; refactored CUDA setup; renamed Docker args; added dynamo-base-docker-llm and orchestration targets; updated image build flow. |
| README.md | Updated instructions to use Earthly with new Docker argument conventions and image naming. |
| container/Earthfile | New file: defines multi-stage builds for vllm-build, uv-source, nixl-source, and nixl-base with artifact management and CUDA support. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Earthfile
    participant Docker
    participant ArtifactStore

    User->>Earthfile: Run +dynamo-base-docker-llm (with DOCKER_SERVER, IMAGE_TAG)
    Earthfile->>Docker: Build image with CUDA, NIXL, vLLM, UCX
    Earthfile->>ArtifactStore: Copy NIXL/vLLM/uv artifacts from container/Earthfile
    Earthfile->>Docker: Install dependencies, set env vars, verify imports
    Earthfile->>Docker: Push built image to DOCKER_SERVER with IMAGE_TAG

Suggested labels

size/M, feat, build

Suggested reviewers

  • tanmayv25
  • nnshah1
  • alec-flowers
  • ishandhanani

Poem

In the warren where builds are spun,
Earthfiles now dance, their work begun.
CUDA and NIXL, vLLM too—
All in one image, shiny and new!
Docker tags hop, arguments leap,
This bunny’s code is tidy and neat.
🐇✨



@github-actions github-actions bot added the feat label Jun 21, 2025
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
container/Earthfile (2)

60-61: Consider using a more modern manylinux tag.

The manylinux1 tag is deprecated. Consider using manylinux2014 or manylinux_2_17 for better library support while maintaining compatibility.

-        sed -i 's/Tag: cp38-abi3-linux_x86_64/Tag: cp38-abi3-manylinux1_x86_64/g' ${VLLM_PATCHED_PACKAGE_NAME}-${VLLM_PATCHED_PACKAGE_VERSION}.dist-info/WHEEL && \
-        sed -i "s/-cp38-abi3-linux_x86_64.whl/-cp38-abi3-manylinux1_x86_64.whl/g" ${VLLM_PATCHED_PACKAGE_NAME}-${VLLM_PATCHED_PACKAGE_VERSION}.dist-info/RECORD && \
+        sed -i 's/Tag: cp38-abi3-linux_x86_64/Tag: cp38-abi3-manylinux2014_x86_64/g' ${VLLM_PATCHED_PACKAGE_NAME}-${VLLM_PATCHED_PACKAGE_VERSION}.dist-info/WHEEL && \
+        sed -i "s/-cp38-abi3-linux_x86_64.whl/-cp38-abi3-manylinux2014_x86_64.whl/g" ${VLLM_PATCHED_PACKAGE_NAME}-${VLLM_PATCHED_PACKAGE_VERSION}.dist-info/RECORD && \
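For context on the sed edits above: the platform tag being rewritten also appears in the wheel's filename, which per PEP 427 has the form name-version-python_tag-abi_tag-platform_tag.whl, and the filename, WHEEL metadata, and RECORD entries must all agree. A minimal parser for the five-field form, as an illustrative sketch (the example filename and version below are hypothetical):

```python
def wheel_tags(filename: str):
    """Split a PEP 427 wheel filename into (python, abi, platform) tags.

    Assumes the simple five-field form without the optional build tag;
    dashes never appear inside the name or version fields because PEP 427
    escapes them to underscores.
    """
    stem = filename[: -len(".whl")]
    _name, _version, py, abi, plat = stem.split("-")
    return py, abi, plat
```

As an aside, the wheel project's `wheel tags` subcommand (available in recent wheel releases) can retag a wheel and update its metadata consistently, which may be less fragile than hand-editing WHEEL and RECORD with sed.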

83-83: Consider providing a default value for NIXL_COMMIT.

The NIXL_COMMIT argument is required but has no default value, which could cause build failures if not provided.

-    ARG NIXL_COMMIT
+    ARG NIXL_COMMIT=main  # or a specific commit hash

Alternatively, add a build-time check to ensure it's provided:

     ARG NIXL_COMMIT
+    RUN test -n "${NIXL_COMMIT}" || (echo "Error: NIXL_COMMIT must be provided" && exit 1)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 831bc72 and 5b0e15f.

📒 Files selected for processing (3)
  • Earthfile (4 hunks)
  • README.md (1 hunks)
  • container/Earthfile (1 hunks)
🔇 Additional comments (8)
container/Earthfile (2)

67-70: LGTM!

Clean and efficient way to extract the uv binary from the official image.


144-144: The previous ripgrep run didn’t search any files—let’s locate the Earthfile and inspect both the base image and the CUDA flag:

#!/bin/bash
# Locate the Earthfile
earthfile=$(find . -type f -name Earthfile | head -n1)
echo "Using Earthfile: $earthfile"

# Show the Docker base image line
grep -n "FROM nvcr.io/nvidia/cuda-dl-base" "$earthfile" || echo "Base image line not found"

# Show UCX build flags for CUDA
grep -n "with-cuda" "$earthfile" || echo "No 'with-cuda' flags found"

# Show any references to /usr/local/cuda
grep -n "/usr/local/cuda" "$earthfile" || echo "No '/usr/local/cuda' references found"
README.md (2)

60-64: LGTM!

The updated build instructions using Earthly are clear and align with the new build process.


68-68: LGTM!

Correctly references the new LLM-optimized image with the -llm suffix.

Earthfile (4)

61-85: LGTM!

Excellent implementation of a reusable CUDA setup function with comprehensive environment variables and proper error handling.


100-100: LGTM!

Good refactoring to use the reusable SETUP_CUDA function.


150-151: LGTM!

The new argument names are more descriptive and flexible than the CI-specific ones.


224-241: LGTM!

Excellent reorganization of build targets into logical groups. The separation between cloud components and base images improves maintainability and allows for more granular builds.


RUN mkdir -p /workspace/dist
RUN mkdir /tmp/vllm && \
    pip download --only-binary=:all: --no-deps --dest /tmp/vllm vllm==v${VLLM_REF} && \

⚠️ Potential issue

Fix the version specifier in the pip download command (container/Earthfile, line 50).

The PyPI version format doesn't include the 'v' prefix. This will cause the download to fail.

Apply this diff to fix the version specifier:

-        pip download --only-binary=:all: --no-deps --dest /tmp/vllm vllm==v${VLLM_REF} && \
+        pip download --only-binary=:all: --no-deps --dest /tmp/vllm vllm==${VLLM_REF} && \
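If the build needs to accept both tag-style refs (v0.7.2) and plain versions (0.7.2), a small normalization helper avoids this class of mismatch. An illustrative sketch, not code from the PR:

```python
def pypi_version(ref: str) -> str:
    """Map a git-style release ref like 'v0.7.2' to the PyPI version '0.7.2'.

    Release tags often carry a leading 'v' that PyPI project versions
    normally lack; only strip it when a digit follows, so names like
    'vllm' pass through untouched.
    """
    if ref.startswith("v") and ref[1:2].isdigit():
        return ref[1:]
    return ref
```

A wrapper script could then run `pip download ... vllm==$(pypi_version "$VLLM_REF")` regardless of how the ref was supplied.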

rm -rf /workspace/ai_dynamo_vllm-*.whl

# Verify both Dynamo and vllm are properly installed
RUN python3 -c "import dynamo; import vllm" || (echo "Failed to import Dynamo or vllm" && exit 1)

🛠️ Refactor suggestion

Use the virtual environment's Python for import verification (Earthfile, line 216).

The verification command uses python3 which might not be the virtual environment Python. Use the activated environment's Python instead.

-    RUN python3 -c "import dynamo; import vllm" || (echo "Failed to import Dynamo or vllm" && exit 1)
+    RUN . /opt/dynamo/venv/bin/activate && python -c "import dynamo; import vllm" || (echo "Failed to import Dynamo or vllm" && exit 1)
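A related runtime check is confirming that the current interpreter actually belongs to the intended venv, rather than assuming activation worked. An illustrative Python sketch (the venv path is whatever the image uses; nothing here is from the PR):

```python
import sys
from pathlib import Path


def running_inside(venv_dir: str) -> bool:
    """True if the current interpreter belongs to the given venv.

    A venv is active when sys.prefix differs from sys.base_prefix;
    we additionally require that sys.prefix points at the expected
    directory, so activating the *wrong* venv is also caught.
    """
    if sys.prefix == sys.base_prefix:
        return False  # no venv active at all
    return Path(sys.prefix) == Path(venv_dir).resolve()
```

A verification step could assert `running_inside("/opt/dynamo/venv")` before importing, turning a silent wrong-interpreter problem into an explicit build failure.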

@github-actions github-actions bot removed the Stale label Jul 4, 2025
@grahamking (Contributor) commented

Earthly is going away: #2154

@grahamking grahamking closed this Aug 13, 2025