Member

@saturley-hall saturley-hall commented Aug 15, 2025

Overview:

This removes KVBM from the ai-dynamo-runtime wheel so that Dynamo can run with Python 3.12 in environments where the glibc version is not 2.38 (such as AWS AL2023).

Details:

Enabling the KVBM feature in the ai-dynamo-runtime wheel requires packaging libnixl.so. Unfortunately, there are no publicly available NVIDIA containers that contain both CUDA and a manylinux build environment, so there is no easy way to build libnixl.so such that its glibc requirement matches the 2.28 reported by the wheel.

With KVBM removed, the wheel no longer contains libnixl.so and has a glibc dependency of 2.28.
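
The glibc floor that a wheel advertises is encoded in its PEP 600 platform tag (manylinux_X_Y_arch). The following is a minimal sketch of reading that floor from a wheel filename and comparing it with a host's glibc version. The filename and the host version are illustrative assumptions, not values taken from a real build, and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# PEP 600 platform tags look like manylinux_<glibc major>_<glibc minor>_<arch>.
_MANYLINUX = re.compile(r"manylinux_(\d+)_(\d+)_")


def glibc_floor(wheel_filename: str) -> Optional[Tuple[int, int]]:
    """Return the (major, minor) glibc floor named in the wheel's manylinux tag."""
    match = _MANYLINUX.search(wheel_filename)
    if match is None:
        return None  # e.g. a pure-Python "py3-none-any" wheel
    return int(match.group(1)), int(match.group(2))


def glibc_satisfied(wheel_filename: str, host_glibc: Tuple[int, int]) -> bool:
    """True when the host glibc is at least the floor the wheel advertises."""
    floor = glibc_floor(wheel_filename)
    return floor is None or host_glibc >= floor


# Hypothetical filename; the real wheel name may differ.
runtime_wheel = "ai_dynamo_runtime-0.4.0.post0-cp312-cp312-manylinux_2_28_x86_64.whl"
print(glibc_floor(runtime_wheel))
# (2, 34) is an illustrative host glibc version, not a measured one.
print(glibc_satisfied(runtime_wheel, (2, 34)))
```

The tag only reflects what the build declared. A bundled shared library can still require newer glibc symbols than the tag states, which is the mismatch described above.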

Where should the reviewer start?

After pulling this branch in the dynamo repo, run the following to build the container and copy the wheels out into a folder called ./wheelhouse:

container/build.sh --framework none --tag dynamo:base-aws
docker create --name aws dynamo:base-aws
docker cp aws:/opt/dynamo/wheelhouse .
docker rm aws

In the top level of the dynamo directory, add the following file named Containerfile.al2023:

FROM public.ecr.aws/amazonlinux/amazonlinux:2023 AS base

COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

ENV VIRTUAL_ENV=/opt/dynamo/venv
RUN mkdir /opt/dynamo && \
    uv venv --python 3.12 ${VIRTUAL_ENV}
ENV PATH="$VIRTUAL_ENV/bin:$PATH" \
    UV_COMPILE_BYTECODE=1 \
    UV_LINK_MODE=copy

COPY wheelhouse /wheelhouse

RUN yum install -y clang

RUN --mount=type=cache,target=/root/.cache/uv \
    echo "oof4" >> /hi_there.txt && \
    uv pip install \
    /wheelhouse/ai_dynamo_runtime*cp312*.whl \
    /wheelhouse/ai_dynamo-0.4.0.post0-py3-none-any.whl[vllm]

Now build this container:

docker build -f Containerfile.al2023 --tag al2023:p312-vllm . 

To test the container, bring up Dynamo in the normal way using Qwen3-0.6B:

docker compose -f ~/ai-dynamo/dynamo/deploy/docker-compose.yml up -d
docker run --gpus all -d --name al2023_frontend --network host --ipc host al2023:p312-vllm python -m dynamo.frontend
docker run --gpus all -d --name al2023_worker -v ~/.cache/huggingface:/root/.cache/huggingface --network host --ipc host al2023:p312-vllm python -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager --no-enable-prefix-caching

Wait a moment while the model downloads and the service starts. If the request fails because you issued it too early, wait a second and try again.
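
The wait-and-retry step can also be scripted. This is a minimal sketch of a generic polling helper, not part of Dynamo. The names wait_until_ready and http_ok are hypothetical, and the attempt count and delay are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def wait_until_ready(probe: Callable[[], bool],
                     attempts: int = 30,
                     delay: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no need to sleep after the final failed attempt
    return False


def http_ok(url: str, timeout: float = 5.0) -> bool:
    """Return True if a GET to url answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts and non-2xx responses all count as "not ready".
        return False
```

A probe can be any callable that returns True once the service answers, for example a lambda that wraps http_ok with a URL the frontend is known to serve.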

curl localhost:8080/v1/chat/completions   -H "Content-Type: application/json"   -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [
    {
        "role": "user",
        "content": "Hello, how are you?"
    }
    ],
    "stream":false,
    "max_tokens": 300
}' | jq
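
For scripted checks, the same request can be built with the Python standard library. This is a sketch that mirrors the curl call above. The helper names build_chat_request and send are hypothetical, and send only works while the frontend and worker containers are running.

```python
import json
import urllib.request


def build_chat_request(prompt: str,
                       model: str = "Qwen/Qwen3-0.6B",
                       base_url: str = "http://localhost:8080",
                       max_tokens: int = 300) -> urllib.request.Request:
    """Build the same POST that the curl command sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body (needs the services to be up)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network; only send() does.
request = build_chat_request("Hello, how are you?")
print(request.get_method(), request.full_url)
```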

Once you have confirmed that it works, you can clean up the running containers and images and delete the ./wheelhouse/ folder and ./Containerfile.al2023:

docker stop al2023_frontend al2023_worker
docker rm al2023_frontend al2023_worker
docker image rm al2023:p312-vllm dynamo:base-aws
rm -rf ./wheelhouse
rm ./Containerfile.al2023

Related Issues:

Fixes OPS-623

Summary by CodeRabbit

  • Chores
    • Updated package version to 0.4.0.post0 and aligned dependency versions accordingly.
    • Switched Python bindings from dynamic (VCS-based) to a fixed static version.
    • Adjusted container build to produce Python wheels with default features; overall build workflow unchanged.
  • Notes
    • No changes to public APIs or user-facing functionality.

Contributor

coderabbitai bot commented Aug 15, 2025

Walkthrough

Removed the block-manager feature from maturin wheel builds in container Dockerfiles and switched Python package versioning to a fixed 0.4.0.post0, updating the root dependency to match. Other build steps and conditionals remain unchanged.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Docker wheel build feature flag removal: container/Dockerfile, container/Dockerfile.vllm | Dropped --features block-manager from maturin build commands; other build steps (including cargo with block-manager in vLLM and RELEASE_BUILD conditionals) unchanged. |
| Python package version pinning: lib/bindings/python/pyproject.toml, pyproject.toml | Switched from dynamic version to 0.4.0.post0; updated project version and ai-dynamo-runtime dependency to 0.4.0.post0. |

Sequence Diagram(s)

sequenceDiagram
    participant Dev as Developer
    participant Docker as Docker Build
    participant Maturin as maturin
    participant Registry as Wheel Artifact

    Dev->>Docker: Build image (Dockerfile/ Dockerfile.vllm)
    Docker->>Maturin: build (no --features block-manager)
    Maturin-->>Registry: Produce wheel with default features
    Docker-->>Dev: Image with wheel

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

A rabbit taps the Docker wall,
“No block flags now—just build them all.”
Wheels roll out, version post-zero,
Tags align, a tidy hero.
In burrows deep, the pyproject sings—
Hop, release, with simpler things. 🐇✨

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
pyproject.toml (1)

28-28: Pinning ai-dynamo-runtime to 0.4.0.post0 keeps packages in lockstep

Given the tight coupling between ai-dynamo and ai-dynamo-runtime, this explicit pin is reasonable for the wheel change.

If you expect to publish multiple post releases quickly, consider a compatible pin (e.g., ~=0.4.0.post0) to reduce churn in the root package when only runtime is patched.
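
To see which versions a compatible-release pin such as ~=0.4.0.post0 would accept, here is a simplified, stdlib-only sketch of the PEP 440 rule. The post suffix is ignored for the prefix match, so the pin is equivalent to >=0.4.0.post0, ==0.4.*. The sketch only understands versions of the form N.N...[.postN], and its function names are hypothetical; the packaging library provides the complete implementation.

```python
import re
from typing import Tuple

_VERSION = re.compile(r"^(\d+(?:\.\d+)*)(?:\.post(\d+))?$")


def parse(version: str) -> Tuple[Tuple[int, ...], int]:
    """Split 'X.Y.Z[.postN]' into (release tuple, post number); -1 means no post part."""
    match = _VERSION.match(version)
    if match is None:
        raise ValueError(f"unsupported version form: {version!r}")
    release = tuple(int(part) for part in match.group(1).split("."))
    post = int(match.group(2)) if match.group(2) is not None else -1
    return release, post


def satisfies_compatible_pin(candidate: str, pinned: str) -> bool:
    """Simplified check of `candidate ~= pinned` for the version forms parse() accepts."""
    cand_release, cand_post = parse(candidate)
    pin_release, pin_post = parse(pinned)
    if len(pin_release) < 2:
        raise ValueError("a compatible-release pin needs at least two release segments")
    # Pad with zeros so that, for example, 0.4 and 0.4.0 compare as equal releases.
    width = max(len(cand_release), len(pin_release))
    cand_padded = cand_release + (0,) * (width - len(cand_release))
    pin_padded = pin_release + (0,) * (width - len(pin_release))
    prefix = pin_release[:-1]  # drop the last release segment; the .post suffix is ignored
    at_least_pin = (cand_padded, cand_post) >= (pin_padded, pin_post)
    same_series = cand_padded[:len(prefix)] == prefix
    return at_least_pin and same_series


for candidate in ("0.4.0", "0.4.0.post0", "0.4.0.post1", "0.4.1", "0.5.0"):
    print(candidate, satisfies_compatible_pin(candidate, "0.4.0.post0"))
```

Under this rule a later post release or a 0.4.x patch release of the runtime would satisfy the root package's pin without a new root release, while 0.5.0 would not.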

container/Dockerfile (1)

231-236: Add --no-default-features to all maturin builds to avoid enabling KVBM / bundling libnixl

I ran the inspection script but no .whl files were found under dist, so I couldn't confirm wheel contents or GLIBC symbols — add this flag defensively and re-run the wheel check after building.

  • File needing change:
    • container/Dockerfile — around lines 231–236

Apply:

-    maturin build --release --out /opt/dynamo/dist && \
+    maturin build --release --no-default-features --out /opt/dynamo/dist && \
     if [ "$RELEASE_BUILD" = "true" ]; then \
         # do not enable KVBM feature, ensure compatibility with lower glibc
-        uv run --python 3.11 maturin build --release --out /opt/dynamo/dist && \
-        uv run --python 3.10 maturin build --release --out /opt/dynamo/dist; \
+        uv run --python 3.11 maturin build --release --no-default-features --out /opt/dynamo/dist && \
+        uv run --python 3.10 maturin build --release --no-default-features --out /opt/dynamo/dist; \
     fi

Please re-run the wheel-inspection script (or provide dist/*.whl) so we can verify no libnixl is bundled and the wheel targets the expected manylinux tag.
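
The inspection script itself is not shown in this thread. As a stand-in, the sketch below illustrates one way to check a wheel for bundled libnixl files: a wheel is a zip archive, so its member list can be searched directly. The helper names and the member names in the fake wheels are hypothetical.

```python
import io
import zipfile
from typing import List


def members_matching(wheel, needle: str = "libnixl") -> List[str]:
    """List archive members whose file name contains `needle`.

    `wheel` may be a path or a binary file object.
    """
    with zipfile.ZipFile(wheel) as archive:
        return [name for name in archive.namelist()
                if needle in name.rsplit("/", 1)[-1]]


def make_fake_wheel(member_names: List[str]) -> io.BytesIO:
    """Build an in-memory stand-in for a wheel so the check can be tried offline."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in member_names:
            archive.writestr(name, b"")
    buffer.seek(0)
    return buffer


# Member names below are hypothetical, not taken from a real build.
with_kvbm = make_fake_wheel(["dynamo/_core.abi3.so", "ai_dynamo_runtime.libs/libnixl.so"])
without_kvbm = make_fake_wheel(["dynamo/_core.abi3.so"])
print(members_matching(with_kvbm))
print(members_matching(without_kvbm))
```

Against a real build, the same function can be pointed at each file under the wheel output directory; an empty list for every wheel indicates that no file with libnixl in its name was packaged.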

container/Dockerfile.vllm (1)

371-376: Good: maturin builds omit KVBM; also pass --no-default-features for robustness

Same rationale as the main Dockerfile — explicitly disable default features for all maturin invocations to prevent accidental inclusion if defaults change.

Apply:

-    maturin build --release --out /workspace/dist && \
+    maturin build --release --no-default-features --out /workspace/dist && \
     if [ "$RELEASE_BUILD" = "true" ]; then \
         # do not enable KVBM feature, ensure compatibility with lower glibc
-        uv run --python 3.11 maturin build --release --out /workspace/dist && \
-        uv run --python 3.10 maturin build --release --out /workspace/dist; \
+        uv run --python 3.11 maturin build --release --no-default-features --out /workspace/dist && \
+        uv run --python 3.10 maturin build --release --no-default-features --out /workspace/dist; \
     fi

Note: Earlier in this file, the workspace cargo build is invoked with --features dynamo-llm/block-manager (Line 364 in context). That’s fine for producing binaries like metrics, but it won’t affect maturin’s feature set as long as maturin explicitly disables defaults as recommended above.

Use the same wheel inspection script suggested for the main Dockerfile to ensure no nixl artifacts and GLIBC_2.28 targeting.
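
One rough way to check GLIBC targeting is to scan a shared library's bytes for GLIBC_X.Y version strings and take the highest, similar in spirit to piping strings into grep. The sketch below is a heuristic under that assumption, not the inspection script referred to above. It does not parse ELF structures, and the function name is hypothetical; tools such as objdump give an authoritative answer.

```python
import re
from typing import Optional, Tuple

# Symbol version names such as GLIBC_2.28 appear as plain ASCII in the binary.
_GLIBC = re.compile(rb"GLIBC_(\d+)\.(\d+)")


def max_glibc_symbol_version(binary: bytes) -> Optional[Tuple[int, int]]:
    """Return the highest GLIBC_<major>.<minor> string found, or None if there is none."""
    versions = {(int(major), int(minor)) for major, minor in _GLIBC.findall(binary)}
    return max(versions) if versions else None


# Synthetic bytes standing in for the contents of a .so file.
sample = b"\x00GLIBC_2.2.5\x00GLIBC_2.28\x00GLIBC_2.17\x00"
print(max_glibc_symbol_version(sample))
```

A result above (2, 28) for any shared object inside the wheel would suggest that the wheel needs a newer glibc than its tag claims.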

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between acbdabc and 7f4e017.

📒 Files selected for processing (4)
  • container/Dockerfile (1 hunks)
  • container/Dockerfile.vllm (1 hunks)
  • lib/bindings/python/pyproject.toml (1 hunks)
  • pyproject.toml (2 hunks)
🔇 Additional comments (1)
pyproject.toml (1)

18-18: Approve — Python packages bumped to 0.4.0.post0; Rust/Cargo remains 0.4.0

Brief: pyproject files were updated to 0.4.0.post0 while Cargo.toml (workspace + local crates) remains 0.4.0. This is expected (Cargo uses semver and doesn't accept PEP 440 post-release suffixes). Ensure tags/releases are created consistently so downstream tooling picks the intended post-release.

Files to note:

  • pyproject.toml — version = "0.4.0.post0"
  • lib/bindings/python/pyproject.toml — version = "0.4.0.post0"
  • Cargo.toml — workspace.package version = "0.4.0" (and local crates: dynamo-runtime / dynamo-llm / dynamo-tokens = "0.4.0"; async_zmq dependency = "0.4.0")
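
The relationship between the two version schemes can be checked mechanically. This is a minimal sketch, assuming versions of the form X.Y.Z with an optional .postN suffix; the helper names are hypothetical and nothing here reads the actual pyproject.toml or Cargo.toml files.

```python
import re

# Only the simple X.Y.Z[.postN] form used in this PR is handled.
_PY_VERSION = re.compile(r"^(\d+\.\d+\.\d+)(?:\.post\d+)?$")


def cargo_base(python_version: str) -> str:
    """Strip a PEP 440 post-release suffix to get the matching semver version."""
    match = _PY_VERSION.match(python_version)
    if match is None:
        raise ValueError(f"unsupported version form: {python_version!r}")
    return match.group(1)


def versions_aligned(python_version: str, cargo_version: str) -> bool:
    """True when the Python version is the Cargo version plus an optional .postN."""
    return cargo_base(python_version) == cargo_version


print(versions_aligned("0.4.0.post0", "0.4.0"))
```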

@saturley-hall saturley-hall merged commit ffae72b into main Aug 15, 2025
12 of 13 checks passed
@saturley-hall saturley-hall deleted the harrison/disable-kvbm branch August 15, 2025 20:43
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Labels

build system (Related to build processes), fix, size/M

4 participants