
Conversation

@dmitry-tokarev-nv (Contributor) commented Aug 22, 2025

Overview:

Cherry-pick for #2611

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Switched default example/model configs to Qwen/Qwen3-0.6B across sglang and TensorRT-LLM deployments and launch scripts.
    • Added Prometheus to vLLM runtime images.
  • Bug Fixes

    • Improved sglang robustness: safer handling when token IDs are missing and automatic tokenizer init behavior.
    • Refined GPU resource configuration for vLLM decode workers.
  • Documentation

    • Updated setup guides, health checks, and examples; simplified multimodal instructions; refreshed HiCache guidance; updated version notes.
  • Chores

    • Upgraded base images and dependencies (TensorRT-LLM rc6, PyTorch stack, UCX pinning).
  • Tests

    • Updated model references, increased timeouts, and narrowed markers.

copy-pr-bot bot commented Aug 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions bot added the fix label Aug 22, 2025
dmitry-tokarev-nv changed the base branch from main to release/0.4.1 on August 22, 2025 at 02:31
dmitry-tokarev-nv merged commit 2429b48 into release/0.4.1 on Aug 22, 2025
3 of 4 checks passed
dmitry-tokarev-nv deleted the dtokarev-0.4.1-disable-kvbm-tests branch on August 22, 2025 at 02:31
dmitry-tokarev-nv changed the title from "fix: 0.4.1 disable kvbm tests" to "fix: 0.4.1 disable kvbm tests (CP #2611)" on Aug 22, 2025
coderabbitai bot (Contributor) commented Aug 22, 2025

Caution: Review failed. The pull request is closed.

Walkthrough

Bulk updates switch default model references to Qwen/Qwen3-0.6B, bump TensorRT-LLM pins to 1.0.0rc6 with container/version alignments, adjust UCX refs to v1.19.0, remove the local async-openai-macros crate in favor of a published dependency, and add SGLang runtime behavior tweaks (frontend tokenization default, decode error guard). Tests and docs updated accordingly.

Changes

Cohort / File(s) and summary of changes:

Rust macros removal & workspace update
Files: Cargo.toml, lib/async-openai-macros/Cargo.toml, lib/async-openai-macros/src/lib.rs, lib/async-openai/Cargo.toml
Removed the local async-openai-macros crate from the workspace and repo; lib/async-openai now depends on the published async-openai-macros = "0.1.0".

SGLang model/config/docs switch
Files: components/backends/sglang/README.md, components/backends/sglang/deploy/*.yaml, components/backends/sglang/launch/*.sh, components/backends/sglang/docs/*
Replaced model identifiers from deepseek-ai/DeepSeek-R1-Distill-Llama-8B with Qwen/Qwen3-0.6B; added --skip-tokenizer-init where noted; unified some invocation entrypoints; switched the HiCache example from a size setting to a ratio setting.

SGLang runtime behavior
Files: components/backends/sglang/src/dynamo/sglang/args.py, .../request_handlers/decode_handler.py, components/backends/sglang/slurm_jobs/scripts/worker_setup.py, components/backends/sglang/slurm_jobs/scripts/h100.sh
Enforced skip_tokenizer_init=True by default (with a warning); guarded against missing output_ids with a descriptive error; changed the frontend startup command; reduced --cuda-graph-bs from 256 to 128 in the H100 script. A minimal sketch of the new guard appears below.

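The following is a hedged sketch of what the tokenizer-init default and the missing-output_ids guard described above could look like; the function names, the argument object, and the response shape are illustrative assumptions, not the exact code in args.py or decode_handler.py.

```python
import logging

logger = logging.getLogger(__name__)


def enforce_skip_tokenizer_init(args):
    # Assumed helper: the Dynamo frontend tokenizes requests, so the SGLang
    # worker is forced to skip its own tokenizer initialization; warn if the
    # caller had configured otherwise.
    if not getattr(args, "skip_tokenizer_init", False):
        logger.warning(
            "Overriding skip_tokenizer_init to True; tokenization is handled "
            "by the Dynamo frontend."
        )
        args.skip_tokenizer_init = True
    return args


def extract_output_ids(res: dict) -> list:
    # Assumed guard: raise a descriptive error (including the keys that were
    # present) instead of a bare KeyError when output_ids is missing.
    output_ids = res.get("output_ids")
    if output_ids is None:
        raise ValueError(
            f"Decode response is missing 'output_ids'; available keys: {list(res.keys())}"
        )
    return output_ids
```
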
TRT-LLM model/config/docs switch
Files: components/backends/trtllm/deploy/*.yaml, components/backends/trtllm/launch/*.sh, components/backends/trtllm/README.md, components/backends/trtllm/deploy/README.md, components/backends/trtllm/gemma3_sliding_window_attention.md, components/backends/trtllm/gpt-oss.md
Switched the model to Qwen/Qwen3-0.6B in deploy and launch files; cleaned up MTP sections; added readiness health-check guidance; updated notes.

Container/tooling pins (UCX/TRT-LLM/PyTorch)
Files: container/Dockerfile*, container/build.sh, pyproject.toml, README.md, docs/support_matrix.md
Bumped the UCX ref to v1.19.0; updated the TRT-LLM base tag to 25.06 and the wheel to 1.0.0rc6, with related pins (Torch, FlashAttention, NetworkX, CUDA runtime); introduced/promoted Prometheus and LD paths in the vLLM Dockerfile; added a cuda-python pin note and uv usage; added a support-matrix footnote.

vLLM deployment resource placement
Files: components/backends/vllm/deploy/agg_router.yaml
Moved the GPU limit from container-level to pod-level resources.limits.gpu: "1".

Tests adjustments
Files: tests/serve/test_sglang.py, tests/serve/test_vllm.py, tests/kvbm/test_determinism.py
Updated model references in the SGLang tests; relaxed one content-length assertion; increased the vLLM readiness timeout from 300 to 500 seconds; narrowed the kvbm test markers to keep only kvbm. A sketch of what such marker and timeout changes typically look like follows.
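As a hedged illustration only: the pytest marker narrowing and the readiness-timeout bump described above might look roughly like the snippet below. The kvbm marker name comes from the summary; the constant name, the polled URL, and the helper are assumptions rather than the actual test code.

```python
import time

import pytest
import requests  # assumption: the tests poll an HTTP health endpoint


# Module-level markers narrowed so only the kvbm marker remains.
pytestmark = [pytest.mark.kvbm]

# Readiness timeout raised from 300 to 500 seconds so slow model downloads
# and engine warmup do not fail the serve tests prematurely.
READINESS_TIMEOUT_S = 500


def wait_until_ready(url: str, timeout_s: int = READINESS_TIMEOUT_S) -> None:
    # Poll a health/readiness endpoint until it returns 200 or the timeout expires.
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            if requests.get(url, timeout=5).status_code == 200:
                return
        except requests.RequestException:
            pass
        time.sleep(2)
    raise TimeoutError(f"Server at {url} was not ready within {timeout_s}s")
```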

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Frontend as Frontend (Dynamo)
  participant Worker as Worker (SGLang)

  rect rgb(245,248,255)
  note over Frontend: Tokenization enforced (skip_tokenizer_init=True)
  Client->>Frontend: /v1/chat/completions
  Frontend->>Frontend: Tokenize input
  Frontend->>Worker: Prefill/Decode request (no tokenizer init)
  end

  alt Streaming tokens
    Worker-->>Frontend: res with output_ids
    Frontend->>Client: stream tokens
  else Missing output_ids
    Worker-->>Frontend: res without output_ids
    Frontend->>Frontend: raise ValueError with keys hint
    Frontend-->>Client: error response
  end

  Frontend->>Frontend: Detokenize (if needed)
  Frontend-->>Client: final response
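To make the flow above concrete, here is a hedged example of a client call against the OpenAI-compatible endpoint shown in the diagram. The host and port are assumptions; the endpoint path and the Qwen/Qwen3-0.6B default model come from this PR's summary.

```python
import requests  # assumption: the Dynamo frontend listens on localhost:8000 over plain HTTP

payload = {
    "model": "Qwen/Qwen3-0.6B",  # new default model in the example configs
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64,
    "stream": False,
}

# The frontend tokenizes the prompt itself (the SGLang worker runs with
# skip_tokenizer_init=True) and returns a detokenized, OpenAI-style response.
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```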

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

In cables and crates I twitch my nose,
Pins hop forward as the UCX grows.
Qwen now purrs where DeepSeek sat,
Frontend nibbles tokens—imagine that!
Wheels roll rc6, the tests lie in wait—
A rabbit stamps OK: ship this update. 🐰✨



📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between e2e909f and 2d96f92.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (45)
  • Cargo.toml (0 hunks)
  • README.md (2 hunks)
  • components/backends/sglang/README.md (1 hunks)
  • components/backends/sglang/deploy/agg.yaml (1 hunks)
  • components/backends/sglang/deploy/agg_router.yaml (1 hunks)
  • components/backends/sglang/deploy/disagg-multinode.yaml (1 hunks)
  • components/backends/sglang/deploy/disagg.yaml (2 hunks)
  • components/backends/sglang/deploy/disagg_planner.yaml (2 hunks)
  • components/backends/sglang/docs/multinode-examples.md (1 hunks)
  • components/backends/sglang/docs/sgl-hicache-example.md (3 hunks)
  • components/backends/sglang/launch/agg.sh (1 hunks)
  • components/backends/sglang/launch/agg_router.sh (2 hunks)
  • components/backends/sglang/launch/disagg.sh (2 hunks)
  • components/backends/sglang/slurm_jobs/scripts/h100.sh (1 hunks)
  • components/backends/sglang/slurm_jobs/scripts/worker_setup.py (1 hunks)
  • components/backends/sglang/src/dynamo/sglang/args.py (1 hunks)
  • components/backends/sglang/src/dynamo/sglang/request_handlers/decode_handler.py (1 hunks)
  • components/backends/trtllm/README.md (1 hunks)
  • components/backends/trtllm/deploy/README.md (0 hunks)
  • components/backends/trtllm/deploy/agg.yaml (1 hunks)
  • components/backends/trtllm/deploy/agg_router.yaml (1 hunks)
  • components/backends/trtllm/deploy/disagg.yaml (2 hunks)
  • components/backends/trtllm/deploy/disagg_router.yaml (2 hunks)
  • components/backends/trtllm/gemma3_sliding_window_attention.md (1 hunks)
  • components/backends/trtllm/gpt-oss.md (1 hunks)
  • components/backends/trtllm/launch/agg.sh (1 hunks)
  • components/backends/trtllm/launch/agg_router.sh (1 hunks)
  • components/backends/trtllm/launch/disagg.sh (1 hunks)
  • components/backends/trtllm/launch/disagg_router.sh (1 hunks)
  • components/backends/vllm/deploy/agg_router.yaml (1 hunks)
  • container/Dockerfile (2 hunks)
  • container/Dockerfile.kvbm (1 hunks)
  • container/Dockerfile.sglang (1 hunks)
  • container/Dockerfile.sglang-wideep (2 hunks)
  • container/Dockerfile.trtllm (5 hunks)
  • container/Dockerfile.vllm (2 hunks)
  • container/build.sh (4 hunks)
  • docs/support_matrix.md (1 hunks)
  • lib/async-openai-macros/Cargo.toml (0 hunks)
  • lib/async-openai-macros/src/lib.rs (0 hunks)
  • lib/async-openai/Cargo.toml (1 hunks)
  • pyproject.toml (1 hunks)
  • tests/kvbm/test_determinism.py (1 hunks)
  • tests/serve/test_sglang.py (2 hunks)
  • tests/serve/test_vllm.py (1 hunks)


