

@ishandhanani
Collaborator

@ishandhanani ishandhanani commented Nov 24, 2025

Will merge this in after next release

This PR turns the Dockerfile into a multi-stage build that splits SGLang builds into base, framework, and runtime stages. The runtime image is roughly half the size of the framework image:


REPOSITORY   TAG              IMAGE ID       SIZE
sglang       framework-test   be66a8e51a09   39.3GB
sglang       runtime-test     a4dac91fe030   20GB
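
For readers unfamiliar with the pattern, here is a minimal sketch of a three-stage layout. It is not the Dockerfile from this PR: the nvidia/cuda base tags, the /opt/venv path, and the plain `pip install sglang` step are assumptions chosen for illustration. The point it shows is that the runtime stage starts from a fresh, smaller CUDA runtime image and receives only the built artifacts through `COPY --from=framework`, so compilers, headers, and build caches from the earlier stages never reach it.

```dockerfile
# Hypothetical sketch of a base/framework/runtime layout.
# Image tags and paths are illustrative, NOT taken from the SGLang Dockerfile.
ARG CUDA_VERSION=12.9.1

# base: toolchain and system libraries shared by the build stages
FROM nvidia/cuda:${CUDA_VERSION}-cudnn-devel-ubuntu22.04 AS base
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 python3-pip python3-venv git \
 && rm -rf /var/lib/apt/lists/*

# framework: full build/dev image; installs SGLang into a self-contained venv
FROM base AS framework
RUN python3 -m venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"
RUN pip install --no-cache-dir sglang

# runtime: slim CUDA runtime image (same Ubuntu release, so the venv's
# python3 symlink still resolves) that only receives the built venv
FROM nvidia/cuda:${CUDA_VERSION}-cudnn-runtime-ubuntu22.04 AS runtime
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 \
 && rm -rf /var/lib/apt/lists/*
COPY --from=framework /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:${PATH}"
```

Each stage can be built on its own with `docker build --target`, for example `docker build --target framework -t sglang:framework-test .` and `docker build --target runtime -t sglang:runtime-test .`, which would yield tags like the two listed above.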

Tests

  1. cu13 arm - https://github.com/ishandhanani/srt-slurm/blob/main/recipies/gb300-fp4/1p2.yaml
  2. cu12 arm - https://github.com/ishandhanani/srt-slurm/blob/main/recipies/gb200-fp4/1p2d.yaml
  3. x86 - all simple agg/disagg examples

@gemini-code-assist
Contributor

Summary of Changes

Hello @ishandhanani, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive refactoring of the Dockerfile to implement a multi-stage build strategy. The core objective is to create a highly optimized and lightweight runtime image for SGLang, distinct from the development and build environment. This change enhances build performance, ensures correct CUDA architecture compatibility, and improves the overall reliability of the Docker image creation process.

Highlights

  • Multi-stage Dockerfile Implementation: The Dockerfile has been refactored into a multi-stage build process, introducing distinct base, framework, and runtime stages. Separating build-time dependencies from the final runtime environment reduces image size and improves build efficiency.
  • Lightweight Runtime Image: A new runtime stage has been added, designed to be a lightweight production image. It includes only essential runtime dependencies and copies pre-built artifacts from the framework stage, drastically reducing the final image footprint for deployment.
  • Optimized Dependency Management and Caching: Dependency installations across all stages have been reorganized for better Docker layer caching. RUN commands are consolidated, and system dependencies are grouped by category for clarity. Packages that are not strictly necessary have been removed from the earlier stages.
  • CUDA Architecture Compatibility Fix: The DeepEP build now selects its CUDA target architectures correctly. Specifically, the CUDA 12.8.1 build no longer targets the Blackwell 10.3 architecture, which only CUDA 12.9.1 and later can compile for, resolving potential compilation issues.
  • Enhanced Build Robustness: Retry logic (curl --retry 3 --retry-delay 2) has been added to various download commands for external dependencies (e.g., GDRCopy, DeepEP, Triton, clangd, CMake, just, oh-my-zsh plugins), making the build process more resilient to transient network issues.
  • SGL-Router Integration and Size Optimization: The sgl-router component is now built and installed within the framework stage. Crucially, the Rust toolchain used for its compilation is removed immediately after the build, ensuring that it does not contribute to the final image size.
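
The retry, CUDA-architecture, and Rust-cleanup items above can be illustrated with a hypothetical framework-stage fragment. The /sgl-workspace/... source paths, the architecture lists, and the use of TORCH_CUDA_ARCH_LIST to steer the DeepEP build are assumptions for illustration, not values copied from the PR; the fragment also assumes torch and DeepEP's other build dependencies are already installed.

```dockerfile
# Hypothetical fragment; paths and arch lists are illustrative only.
ARG CUDA_VERSION=12.9.1
FROM nvidia/cuda:${CUDA_VERSION}-cudnn-devel-ubuntu22.04 AS framework
# Re-declare the global ARG so RUN commands in this stage can see it.
ARG CUDA_VERSION

RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates python3-pip \
 && rm -rf /var/lib/apt/lists/*

# Choose DeepEP target architectures from the toolkit version. In this sketch
# only 12.8.x is treated as too old for 10.3; everything else gets it.
RUN case "${CUDA_VERSION}" in \
      12.8.*) ARCHS="9.0;10.0" ;; \
      *)      ARCHS="9.0;10.0;10.3" ;; \
    esac \
 && echo "DeepEP arch list: ${ARCHS}" \
 && TORCH_CUDA_ARCH_LIST="${ARCHS}" pip install --no-build-isolation /sgl-workspace/DeepEP

# Install Rust (downloads retried on transient errors), build sgl-router,
# then uninstall the toolchain in the SAME RUN so ~/.cargo and ~/.rustup
# are never committed to a layer.
RUN curl --proto '=https' --tlsv1.2 -fsSL --retry 3 --retry-delay 2 https://sh.rustup.rs \
      | sh -s -- -y \
 && . "$HOME/.cargo/env" \
 && pip install --no-cache-dir /sgl-workspace/sglang/sgl-router \
 && rustup self uninstall -y
```

Running the uninstall in a later RUN would not shrink the image: the toolchain would still be stored in the earlier layer, which is why the build and cleanup share one instruction here.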

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant improvements to the Dockerfile by refactoring it into a multi-stage build with base, framework, and runtime stages. This is a great change that improves maintainability, reduces image size for production, and optimizes the build process. The introduction of a lightweight runtime stage is particularly valuable. Other notable improvements include adding retry logic for downloads, fixing a critical CUDA architecture compilation bug, and better organization of dependencies.

I have one suggestion to further optimize the runtime stage by combining apt operations to reduce layers and remove redundant commands. Overall, this is an excellent contribution.
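
The inline suggestion itself is not reproduced on this page, so the fragment below is only a sketch of the general pattern it describes, assuming a Debian/Ubuntu-based runtime image: update, install, and cleanup in a single RUN, which produces one layer and keeps the apt package lists out of the image. The base tag and package names are placeholders.

```dockerfile
# Hypothetical runtime-stage fragment; base image and packages are illustrative.
FROM nvidia/cuda:12.9.1-cudnn-runtime-ubuntu22.04 AS runtime

# One RUN -> one layer. Running `apt-get update` in the same layer as the
# install also avoids reusing a stale, cached package index.
RUN apt-get update \
 && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
      ca-certificates \
      libnuma1 \
      python3 \
 && rm -rf /var/lib/apt/lists/*
```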

@slin1237
Collaborator

PR looks good to me.
We are currently using this image to launch the router and engine in the same container, and I think some other people are doing this too.
Since we are also releasing the framework image, just make sure to write this note on the release page when we do the release.


@nv-tusharma nv-tusharma left a comment


I've left a few comments, so feel free to check those out, but overall LGTM.

@Fridge003
Collaborator

Need to add some explanation for the runtime image here
https://github.com/sgl-project/sglang/blob/main/docs/get_started/install.md

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 5, 2025
@ishandhanani ishandhanani merged commit 498ea41 into main Dec 5, 2025
45 checks passed
@ishandhanani ishandhanani deleted the ishan/dockerfile-opt branch December 5, 2025 08:28
@hnyls2002
Collaborator

In the new Docker image:

  $ zsh
  bash: zsh: command not found

@ishandhanani
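
If zsh is needed inside the runtime image, one possible workaround is a small derived image. This is a sketch only: it assumes the image is Ubuntu-based, that its default user is root, and that sglang:runtime-test (the local tag from the PR description) is the image in question; BASE_IMAGE is a hypothetical build argument to point at whatever tag you actually pulled.

```dockerfile
# Hypothetical derived image that adds zsh on top of an SGLang image.
ARG BASE_IMAGE=sglang:runtime-test
FROM ${BASE_IMAGE}

# Assumes apt is available and the default user can install packages (root).
RUN apt-get update \
 && apt-get install -y --no-install-recommends zsh \
 && rm -rf /var/lib/apt/lists/*
```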

yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request Dec 8, 2025
Kevin-XiongC pushed a commit to novitalabs/sglang that referenced this pull request Dec 9, 2025

Labels

documentation Improvements or additions to documentation

6 participants