
Conversation

@biswapanda (Contributor) commented Aug 26, 2025

Overview:

Cherry pick: #2727

The liveness and readiness checks for the hello world example should exit 0.
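A minimal sketch of what the corrected probes could look like, assuming exec-style probes in examples/runtime/hello_world/deploy/hello_world.yaml; the exact manifest layout is an assumption, but the probe shape follows standard Kubernetes semantics:

```yaml
# Sketch only: probes whose command always exits 0, so the hello_world pod
# reports live/ready without needing a real health endpoint.
livenessProbe:
  exec:
    command: ["sh", "-c", "exit 0"]
  periodSeconds: 10
readinessProbe:
  exec:
    command: ["sh", "-c", "exit 0"]
  periodSeconds: 10
```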

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Per-component /dev/shm configuration via new SharedMemory field in CRDs/Helm (see the spec sketch after this summary); frontend/worker-aware defaults for commands, env, ports, and probes.
    • Readiness gating for SGLang requests until model registration completes.
    • New multimodal example deployment (LLaVA, aggregated).
  • Bug Fixes

    • Safer handling of missing output tokens in SGLang decode stream.
    • Reduced CUDA graph batch size on H100 job script; increased readiness timeouts.
  • Documentation

    • New Quickstart (local), Installation, Examples gallery; major docs reorg.
    • Updated health-check guidance; numerous model examples switched to Qwen/Qwen3-0.6B.
  • Chores

    • Version bumps: TensorRT-LLM 1.0.0rc6, vLLM 0.10.1.1, UCX v1.19.0; images include Prometheus.
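For the /dev/shm feature above, here is a hedged sketch of how the new sharedMemory field might be written on a component spec. It is based only on the disabled/size shape and the 8Gi default described in the walkthrough below; the enclosing spec structure is an assumption, not the CRD's verbatim schema.

```yaml
# Sketch only: per-component shared-memory settings (shape per the walkthrough:
# sharedMemory with disabled/size, defaulting to an 8Gi /dev/shm mount).
# The surrounding DynamoComponentDeployment structure is assumed.
spec:
  sharedMemory:
    disabled: false   # set true to opt the component out of /dev/shm
    size: 8Gi         # stated default size
```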

copy-pr-bot bot commented Aug 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@biswapanda biswapanda changed the base branch from main to release/0.4.1 August 26, 2025 22:59
@biswapanda biswapanda closed this Aug 26, 2025
coderabbitai bot (Contributor) commented Aug 26, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

The PR updates docs and configs broadly, switches default/example models to Qwen/Qwen3-0.6B, bumps multiple container/dependency versions, removes an internal Rust macro crate, adds shared-memory configuration to CRDs/operator, enhances Helm templates for frontend/worker roles, and introduces SGLang worker readiness gating and tokenizer-init behavior changes.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Licenses**<br>`ATTRIBUTIONS-Go.md` | Adds MIT/BSD-3-Clause attributions for testify and go-difflib (duplicated blocks). |
| **Rust workspace/macros**<br>`Cargo.toml`, `lib/async-openai-macros/*`, `lib/async-openai/Cargo.toml` | Removes local proc-macro crate; updates dependency to crates.io `async-openai-macros = "0.1.0"`. |
| **SGLang runtime behavior**<br>`components/backends/sglang/src/dynamo/sglang/{args.py,main.py,register.py,request_handlers/decode_handler.py}`, `components/backends/sglang/slurm_jobs/scripts/worker_setup.py` | Forces `--skip-tokenizer-init true` with warning; adds readiness gate before serving; `register_llm_with_runtime_config` now returns bool; defensive handling for missing `output_ids`; frontend start cmd updated. |
| **SGLang deploy/docs/model switch**<br>`components/backends/sglang/{README.md,deploy/*,launch/*.sh,docs/*}` | Switches example/deploy model refs to Qwen/Qwen3-0.6B; link fixes; doc updates (HiCache flag `--hicache-size` → `--hicache-ratio`); multinode doc tweaks. |
| **TRTLLM deploy/engine configs**<br>`components/backends/trtllm/{README.md,deploy/*,engine_configs/llama4/**,llama4_plus_eagle.md,gpt-oss.md}` | Switches model refs to Qwen; removes/adjusts Eagle configs (deletions and parameter changes); streamlines multimodal docs; adds readiness guidance. |
| **vLLM updates**<br>`components/backends/vllm/deploy/README.md`, `components/backends/vllm/deploy/agg_router.yaml` | Moves GPU limit to service level; doc links updated; adds architecture links. |
| **Operator & CRDs: shared memory + backend detection**<br>`deploy/cloud/helm/crds/templates/nvidia.com_*.yaml`, `deploy/cloud/operator/api/v1alpha1/*`, `deploy/cloud/operator/internal/{consts/consts.go,dynamo/graph.go}`, `deploy/cloud/operator/config/crd/bases/nvidia.com_*.yaml`, `deploy/cloud/operator/api/v1alpha1/zz_generated.deepcopy.go`, `deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go`, `deploy/cloud/operator/internal/dynamo/graph_test.go` | Adds `sharedMemory` spec (disabled/size) to CRDs/types; defaults `/dev/shm` and 8Gi; generates deepcopy; integrates per-component shm volume/mount; refines env merge precedence; introduces `BackendFrameworkNoop` and relaxed detection. |
| **Helm templates (frontend/worker-aware)**<br>`deploy/helm/chart/templates/{deployment.yaml,grove-podgangset.yaml,service.yaml}` | Adds componentType-aware defaults for command/args, env, ports, and probes; services render for frontend; adds readiness/liveness behavior per role; `terminationDelay` in PodGangSet. |
| **Container builds & deps**<br>`container/Dockerfile*`, `container/build.sh`, `container/deps/vllm/install_vllm.sh` | Pins UCX to v1.19.0; updates TRT-LLM base/runtime tags and PyTorch/dep versions; copies Prometheus into runtime images; updates vLLM ref/wheel; bumps NGC/TRT-LLM vars in build script. |
| **Docs restructure & links**<br>`docs/**/*`, `README.md`, `deploy/inference-gateway/README.md`, `docs/conf.py` | Major docs reorg (index, sections, includes, install/quickstart); updates support matrix; removes legacy pages; Sphinx config overhauled; adds example gallery and installation/architecture pages. |
| **Examples**<br>`examples/runtime/hello_world/{README.md,client.py,deploy/hello_world.yaml}`, `examples/basics/multinode/README.md`, `examples/multimodal/deploy/agg_llava.yaml` | Adds retry loop in client; adjusts probes/args and `backendFramework` in hello_world; minor code snippet fix; adds vLLM multimodal LLaVA aggregated deployment manifest. |
| **Tests**<br>`tests/kvbm/test_determinism.py`, `tests/serve/{test_sglang.py,test_vllm.py}` | Updates model refs in SGLang tests; relaxes chat content assertion; extends vLLM timeout; simplifies kvbm markers. |
| **Python packaging**<br>`pyproject.toml` | Bumps optional deps: `tensorrt-llm` to 1.0.0rc6; `vllm[flashinfer]` to 0.10.1.1. |
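To ground the operator row above: a sharedMemory setting like the one sketched earlier typically materializes as a memory-backed emptyDir mounted at /dev/shm. This is the standard Kubernetes idiom, not the operator's verbatim template; the volume and container names here are hypothetical.

```yaml
# Sketch of plausible rendered pod-spec output: a memory-backed emptyDir
# capped at the configured size and mounted at /dev/shm.
spec:
  volumes:
    - name: shm              # hypothetical volume name
      emptyDir:
        medium: Memory
        sizeLimit: 8Gi       # default size per the walkthrough
  containers:
    - name: worker           # hypothetical container name
      volumeMounts:
        - name: shm
          mountPath: /dev/shm
```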

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Frontend as Frontend (dynamo.frontend)
  participant Worker as SGLang Worker
  participant Runtime as Runtime Registry

  rect rgba(200,220,255,0.25)
    note over Worker: Startup
    Worker->>Worker: parse args
    alt skip_tokenizer_init not set
      Worker->>Worker: warn and set skip_tokenizer_init=true
    end
    par
      Worker->>Runtime: register_llm_with_runtime_config()
      Runtime-->>Worker: success (bool)
    and
      Worker->>Worker: start endpoints (generate via gate)
    end
    alt registration failed
      Worker->>Worker: shutdown runtime, raise error
    else registration succeeded
      Worker->>Worker: set ready_event
    end
  end

  rect rgba(200,255,200,0.25)
    note over Client,Worker: Request flow after readiness
    Client->>Frontend: /v1/chat/completions
    Frontend->>Worker: dyn://sglang.generate (queued until ready)
    Worker-->>Frontend: stream tokens
    Frontend-->>Client: response
  end

  rect rgba(255,230,200,0.25)
    note over Worker: Decode stream safety
    Worker->>Worker: process stream
    alt output_ids missing
      Worker->>Worker: raise ValueError (descriptive)
    end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes


Poem

A bunny taps the keys with glee,
Swaps DeepSeek paths for Qwen3,
Charts grow wise to shm’s new size,
Workers wait till regs arise.
Docs realign, containers shine—
Hop, hop! Releases hop in time.
(carrot-shaped commits ☺)

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.2.2)

Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/product/migration-guide for migration instructions.
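This failure usually means the repository's golangci-lint config predates the v2 schema, which requires an explicit version field; an empty value produces exactly this error. A minimal sketch of the header a 2.x binary expects at the top of .golangci.yml (assuming the project uses the YAML config format):

```yaml
# Sketch: golangci-lint v2 configs must declare their schema version;
# a missing/empty value triggers the "unsupported version" error above.
version: "2"
```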

