Skip to content

Conversation

@ishandhanani
Copy link
Contributor

@ishandhanani ishandhanani commented Dec 5, 2025

😄

sgl-project/sglang#13861

Summary by CodeRabbit

Release Notes

  • Chores
    • Streamlined Docker build infrastructure with simplified multi-stage configuration and improved flexibility for source provisioning options
    • Enhanced architecture-aware dependency handling
    • Removed obsolete build configuration file

✏️ Tip: You can customize this high-level summary in your review settings.

@ishandhanani ishandhanani requested review from a team as code owners December 5, 2025 00:02
@github-actions github-actions bot added the feat label Dec 5, 2025
@ishandhanani ishandhanani changed the title feat: sglang dockerfile to leverage upstream runtime [wip] feat: sglang dockerfile to leverage upstream runtime Dec 5, 2025
@ishandhanani ishandhanani marked this pull request as draft December 5, 2025 00:03
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 5, 2025

Walkthrough

The Docker container definitions are refactored to simplify the build process. The Dockerfile.sglang file is restructured with a runtime-centric base image, introduces BRANCH_TYPE-controlled conditional logic for Dynamo installation, and adds new build arguments. The Dockerfile.sglang-wideep file is entirely removed.

Changes

Cohort / File(s) Change Summary
Dockerfile Refactoring
container/Dockerfile.sglang
Simplified multi-stage Dockerfile that switches to runtime-centric base image, introduces BRANCH_TYPE ARG for conditional local/remote Dynamo installation, adds SGLANG_IMAGE_TAG and CUDA_VERSION build arguments, adds SGLANG_FORCE_SHUTDOWN environment variable, incorporates architecture-aware NATS/ETCD binary installation, and reorganizes Python binding build steps under BRANCH_TYPE branches.
File Removal
container/Dockerfile.sglang-wideep
File deleted. Previously contained multi-stage Docker build with BRANCH_TYPE-based Dynamo installation logic, NATS/ETCD binary setup, and Python binding builds with maturin.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • BRANCH_TYPE conditional logic: Verify that all three paths (local, remote, else) execute correctly and handle edge cases appropriately.
  • Architecture-aware binary installation: Validate NATS/ETCD binary selection and installation for different architectures.
  • Multi-stage build flow: Confirm the new runtime-centric base image integration and local_src stage dependencies are correctly structured.
  • Environment variable propagation: Ensure SGLANG_FORCE_SHUTDOWN and other new environment variables are properly set and inherited in the build stages.

Poem

🐰 Hoppy hops through layers bright,
Dockerfile simplified—what a delight!
Old wideeep file says goodbye,
BRANCH_TYPE paths branch high,
Docker dreams in runtime's light!

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description is incomplete and lacks required sections from the template. It only contains an emoji and a link reference without addressing Overview, Details, Where to start, or Related Issues. Add comprehensive description sections including Overview, Details, Where should the reviewer start, and Related Issues following the repository template.
✅ Passed checks (2 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Title check ✅ Passed The title accurately reflects the main change: refactoring the sglang Dockerfile to use an upstream runtime base image instead of building everything from scratch.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
container/Dockerfile.sglang (2)

45-64: Consolidate and document the complex multi-branch conditional RUN command.

The 19-line RUN instruction (lines 45–64) implements three separate build paths using shell conditionals, making it difficult to read, maintain, and modify. The branches diverge only in the source path (/tmp/local_src vs /sgl-workspace/dynamo), yet duplicate substantial logic.

Consider splitting this into dedicated stages or separate RUN instructions for each build path, or extracting the common logic into a reusable script. For example:

# Option 1: Use a helper script to encapsulate build logic
COPY --from=local_src /src /tmp/local_src

RUN if [ "$BRANCH_TYPE" = "local" ]; then \
      cp -r /tmp/local_src /sgl-workspace/dynamo; \
    elif [ "$BRANCH_TYPE" = "remote" ]; then \
      git clone https://github.com/ai-dynamo/dynamo.git /sgl-workspace/dynamo; \
    fi

ENV CARGO_BUILD_JOBS=${CARGO_BUILD_JOBS:-16}

# Extract the build+install logic into a dedicated RUN step for clarity
RUN if [ "$BRANCH_TYPE" = "local" ] || [ "$BRANCH_TYPE" = "remote" ]; then \
      DYNAMO_PATH=/sgl-workspace/dynamo; \
      SOURCE_REQS="${DYNAMO_PATH}/container/deps/requirements.txt"; \
      [ "$BRANCH_TYPE" = "local" ] && SOURCE_REQS=/tmp/local_src/container/deps/requirements.txt; \
      cd ${DYNAMO_PATH}/lib/bindings/python && \
      pip install --break-system-packages maturin && \
      maturin build --release && \
      pip install --break-system-packages target/wheels/*.whl && \
      cd ${DYNAMO_PATH} && \
      pip install --break-system-packages -e . && \
      pip install --break-system-packages --requirement ${SOURCE_REQS}; \
    else \
      pip install --break-system-packages ai-dynamo; \
    fi \
  && rm -rf /tmp/local_src

This reduces duplication and improves readability while maintaining the same behavior.


11-14: Review the multi-stage source-copy pattern for efficiency.

Using FROM scratch AS local_src with COPY . /src and later COPY --from=local_src /src /tmp/local_src adds an intermediate stage. While valid, a direct COPY . /tmp/local_src in the main stage would be more straightforward, unless the scratch stage serves another purpose (e.g., clarity, future reuse across stages).

If this pattern is not required for consistency across other framework Dockerfiles (vllm, etc.), consider simplifying to a direct COPY in the main FROM stage.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b8c4a5f and a3d153e.

📒 Files selected for processing (2)
  • container/Dockerfile.sglang (1 hunks)
  • container/Dockerfile.sglang-wideep (0 hunks)
💤 Files with no reviewable changes (1)
  • container/Dockerfile.sglang-wideep
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 4657
File: container/Dockerfile.sglang:216-223
Timestamp: 2025-12-03T01:04:32.053Z
Learning: In container/Dockerfile.sglang, the recursive chown -R and chmod -R operations during early user setup (when creating the dynamo user and initializing /sgl-workspace, /workspace, /home/dynamo, /opt/dynamo) are intentional exceptions to the pattern of avoiding recursive operations, as they handle pre-existing paths and dotfiles created by useradd -m.
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3051
File: container/templates/Dockerfile.vllm.j2:221-233
Timestamp: 2025-09-16T17:16:07.820Z
Learning: In the dynamo project, when converting Dockerfiles to Jinja2 templates, the primary goal is backward compatibility - generated Dockerfiles must be identical to the originals. Security improvements or other enhancements that would change the generated output are out of scope for templating PRs and should be addressed separately.
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.
📚 Learning: 2025-12-03T01:04:32.053Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 4657
File: container/Dockerfile.sglang:216-223
Timestamp: 2025-12-03T01:04:32.053Z
Learning: In container/Dockerfile.sglang, the recursive chown -R and chmod -R operations during early user setup (when creating the dynamo user and initializing /sgl-workspace, /workspace, /home/dynamo, /opt/dynamo) are intentional exceptions to the pattern of avoiding recursive operations, as they handle pre-existing paths and dotfiles created by useradd -m.

Applied to files:

  • container/Dockerfile.sglang
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

  • container/Dockerfile.sglang
📚 Learning: 2025-09-16T17:16:07.820Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3051
File: container/templates/Dockerfile.vllm.j2:221-233
Timestamp: 2025-09-16T17:16:07.820Z
Learning: In the dynamo project, when converting Dockerfiles to Jinja2 templates, the primary goal is backward compatibility - generated Dockerfiles must be identical to the originals. Security improvements or other enhancements that would change the generated output are out of scope for templating PRs and should be addressed separately.

Applied to files:

  • container/Dockerfile.sglang
📚 Learning: 2025-12-02T18:13:40.065Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 4698
File: .github/workflows/container-validation-dynamo.yml:68-68
Timestamp: 2025-12-02T18:13:40.065Z
Learning: In the ai-dynamo/dynamo repository, backend-specific tests (vllm, sglang, trtllm) are intentionally excluded from the container-validation-dynamo.yml workflow using "not (vllm or sglang or trtllm)" because they run in a separate container-validation-backends.yml workflow that has dedicated jobs for each backend. This separation keeps framework-agnostic tests separate from backend-specific tests.

Applied to files:

  • container/Dockerfile.sglang
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • container/Dockerfile.sglang
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project's devcontainer setup, hard-coded container names in devcontainer.json files serve as templates that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • container/Dockerfile.sglang
📚 Learning: 2025-09-03T01:10:12.599Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2822
File: container/Dockerfile.vllm:343-352
Timestamp: 2025-09-03T01:10:12.599Z
Learning: In the dynamo project's local-dev Docker targets, USER_UID and USER_GID build args are intentionally left without default values to force explicit UID/GID mapping during build time, preventing file permission issues in local development environments where container users need to match host user permissions for mounted volumes.

Applied to files:

  • container/Dockerfile.sglang
📚 Learning: 2025-07-03T10:14:30.570Z
Learnt from: fsaady
Repo: ai-dynamo/dynamo PR: 1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

Applied to files:

  • container/Dockerfile.sglang
📚 Learning: 2025-12-03T01:14:42.094Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 4657
File: container/Dockerfile.vllm:264-265
Timestamp: 2025-12-03T01:14:42.094Z
Learning: In container/Dockerfile.vllm, the recursive chmod -R g+w operation during early user setup (at lines 264-265, when creating the dynamo user and initializing /workspace, /home/dynamo, /opt/dynamo) is an intentional exception to the pattern of avoiding recursive operations, as it handles pre-existing paths and dotfiles created by useradd -m before bulk content is copied.

Applied to files:

  • container/Dockerfile.sglang
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: operator (amd64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: operator (arm64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
container/Dockerfile.sglang (1)

45-45: Verify the justification and consistency of --break-system-packages flags.

The --break-system-packages flag is used throughout the conditional build logic (lines 47–62). The inline comment states "SGLang does not use a venv in their container," which justifies bypassing PEP 668 protections. Ensure this is consistent with other framework Dockerfiles (e.g., Dockerfile.vllm) and document this decision if it is unique to sglang.

Per learnings on consistency across framework-specific Dockerfiles (PR 2797), verify that this approach aligns with the patterns used in sibling Dockerfiles.

Copy link
Contributor

@dmitry-tokarev-nv dmitry-tokarev-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move etcd and nats before Dynamo as dynamo is updated more frequently.
Let's add --no-cache-dir to pip install

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants