[feat]: Emit fixed-size KV blocks events #6824

faradawn · 2025-06-03T03:12:52Z

Motivation

Since the KV cache often cannot fit on one node, we would like to broadcast cache admission and eviction events to other nodes for KV transfer.

Many KV event consumers, such as Nvidia Dynamo, only accept fixed-size blocks. This PR changes KV event emission to use fixed-size pages. This keeps the wire contract (ZMQ) standard and robust for more consumers, expanding the capability of KV transfer.

Modifications

The radix-cache emitter now slices multi-page nodes into uniform, fixed-size blocks before queuing events.
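As a rough illustration of the slicing step, here is a minimal sketch. The helper name `split_into_pages` is hypothetical and not taken from the PR, and the sketch assumes a trailing partial page is simply emitted as a shorter slice; how the real emitter treats keys that are not a multiple of the page size is not stated here.

```python
from typing import List


def split_into_pages(tokens: List[int], page_size: int) -> List[List[int]]:
    """Split a node's token key into consecutive slices of at most page_size tokens.

    Hypothetical helper for illustration only; not the PR's actual code.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return [tokens[i : i + page_size] for i in range(0, len(tokens), page_size)]


pages = split_into_pages(list(range(10)), page_size=4)
print(pages)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each slice would then become the payload of one event, so a consumer never has to deal with a multi-page block.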

Checklist

Format your code according to the Code Formatting with Pre-Commit.
Add unit tests as outlined in the Running Unit Tests.
Update documentation / docstrings / example tutorials as needed, according to Writing Documentation.
Provide throughput / latency benchmark results and accuracy evaluation results as needed, according to Benchmark and Profiling and Accuracy Results.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.
Please feel free to join our Slack channel at https://slack.sglang.ai to discuss your PR.

gemini-code-assist

Hello @faradawn, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request by @faradawn. The main goal of this PR is to change how KV cache events (BlockStored, BlockRemoved) are emitted by the radix cache. Instead of emitting a single event for potentially large, variable-sized blocks corresponding to a node's key, the events are now broken down into fixed-size pages. This modification is intended to make the ZMQ wire contract simpler and more robust for consumers of these events, such as Nvidia Dynamo.

Highlights

KV Event Granularity: The core change is that KV cache events (BlockStored, BlockRemoved) are now emitted on a per-page basis, where each page contains a fixed size of contiguous tokens (self.page_size), rather than emitting one event for the entire, potentially multi-page, node key.
Consumer Compatibility: This change is motivated by the need to provide a simpler and more uniform wire contract for external consumers of the KV cache events, specifically mentioning Nvidia Dynamo.
Radix Cache Implementation: The logic for recording store and remove events in the RadixCache class (_record_store_event and _record_remove_event) has been updated to iterate over the node's key in self.page_size chunks and enqueue a separate event for each chunk.

Changelog

python/sglang/srt/mem_cache/radix_cache.py
- Modified the _record_store_event method to iterate through the node's key in steps of self.page_size and emit a BlockStored event for each page slice (around lines 463-482).
- Added a comment in _record_store_event explaining the rationale for emitting fixed-size page events for consumers like Dynamo (around lines 465-467).
- Adjusted the parent hash calculation in _record_store_event to correctly handle the root node (which has no parent) by setting parent_block_hash to None (around line 468).
- Modified the _record_remove_event method to iterate through the node's key in steps of self.page_size and emit a BlockRemoved event for each page slice, for symmetry with the store event (around lines 484-493).
- Added a comment in _record_remove_event explaining the symmetry with the store event (around lines 486-487).
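The changelog above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from `radix_cache.py`: the class `PagedEventRecorder`, its `queue` attribute, and the exact event fields are hypothetical stand-ins, and the parent hash is computed from the parent's full key as this changelog describes (a later comment in this thread changes it to the preceding chunk's hash).

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass
class BlockStored:
    # Field names are assumptions made for this sketch.
    block_hashes: List[int]
    parent_block_hash: Optional[int]
    token_ids: List[int]


@dataclass
class BlockRemoved:
    block_hashes: List[int]


class PagedEventRecorder:
    """Hypothetical stand-in for the event-recording part of a radix cache."""

    def __init__(self, page_size: int) -> None:
        if page_size <= 0:
            raise ValueError("page_size must be positive")
        self.page_size = page_size
        self.queue: List[Union[BlockStored, BlockRemoved]] = []

    def record_store(
        self, key: Sequence[int], parent_key: Optional[Sequence[int]] = None
    ) -> None:
        # A node without a parent (the root case) gets parent_block_hash = None.
        parent_hash = hash(tuple(parent_key)) if parent_key is not None else None
        for start in range(0, len(key), self.page_size):
            page = list(key[start : start + self.page_size])
            self.queue.append(BlockStored([hash(tuple(page))], parent_hash, page))

    def record_remove(self, key: Sequence[int]) -> None:
        # Mirrors record_store so every stored page has a matching removal.
        for start in range(0, len(key), self.page_size):
            page = tuple(key[start : start + self.page_size])
            self.queue.append(BlockRemoved([hash(page)]))


recorder = PagedEventRecorder(page_size=2)
recorder.record_store([1, 2, 3, 4], parent_key=None)
recorder.record_remove([1, 2, 3, 4])
print(len(recorder.queue))  # 4: two BlockStored events, then two BlockRemoved events
```

Keeping the store and remove paths symmetric means a consumer can match each removal to an earlier store by block hash alone.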

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Data flows in blocks,
Fixed pages, simple locks,
For systems downstream.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request effectively refactors the KV event emission to use fixed-size blocks, which should indeed help in simplifying the contract for consumers like Dynamo. The changes are localized to the event recording methods and appear logically sound.

Key improvements include:

Paged emission of BlockStored and BlockRemoved events.
Robust handling of parent_block_hash when a node's parent is None in _record_store_event.

A few points to consider for further improvement and clarity are detailed in the specific comments. Additionally, adding unit tests for this new event emission logic would be highly beneficial to ensure correctness across various scenarios (e.g., different key lengths relative to page size).

Summary of Findings

Robustness of parent_block_hash Calculation: The modification to parent_block_hash = hash(tuple(node.parent.key)) if node.parent else None in _record_store_event correctly handles cases where node.parent might be None, preventing potential AttributeError exceptions. This enhances the robustness of the event recording mechanism.
Clarity of parent_block_hash Semantics: The parent_block_hash in BlockStored events is calculated from the parent node's full key. This might differ from the block_hashes emitted when the parent node (if multi-page) was itself stored (which would be page-based hashes). It's important to ensure this semantic is clear and sufficient for consumers.
Unit Testing for New Event Logic: The changes introduce new logic for emitting events in fixed-size blocks. It would be beneficial to add or update unit tests to cover this new behavior, including scenarios with different key lengths relative to page_size, and edge cases like empty keys (if possible) or keys shorter than page_size.
Potential Redundancy in if not page_tokens: check: In both _record_store_event (lines 471-472) and _record_remove_event (lines 490-491), the check if not page_tokens: continue appears to be potentially redundant. Given that the loop for start in range(0, len(node.key), self.page_size) iterates such that start < len(node.key), and assuming self.page_size > 0, the slice node.key[start : start + self.page_size] should always yield a non-empty list if node.key itself is non-empty. If node.key is empty, the loop doesn't run. This was not commented on directly due to review settings (low severity).
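The redundancy argument in the last finding can be checked by brute force. The sketch below uses a hypothetical helper, `page_slices`, that reproduces only the loop shape quoted in the finding, and asserts that no slice is ever empty for a range of key lengths and positive page sizes.

```python
from typing import List


def page_slices(key: List[int], page_size: int) -> List[List[int]]:
    """Reproduce the loop shape from the finding: step through key in page_size strides."""
    slices = []
    for start in range(0, len(key), page_size):
        slices.append(key[start : start + page_size])
    return slices


for page_size in range(1, 6):
    for length in range(0, 20):
        slices = page_slices(list(range(length)), page_size)
        # start < len(key) always holds inside the loop, so every slice has >= 1 token.
        assert all(len(s) > 0 for s in slices)
        # The slices cover the key exactly once.
        assert sum(len(s) for s in slices) == length

print("no empty slices found")
```

Only the final slice can be shorter than `page_size`; an empty key produces no slices at all, so the `if not page_tokens: continue` guard is never triggered under these assumptions.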

Merge Readiness

The pull request introduces a valuable feature and the implementation is largely on the right track. However, before merging, I recommend addressing the point about parent_block_hash semantics for clarity with consumers and, importantly, adding unit tests to cover the new event emission logic. Addressing these points will further improve the quality and maintainability of the code. I am unable to approve the pull request myself; please ensure other reviewers approve it after these considerations are addressed.
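A sketch of what such unit tests might look like, using only the standard library. The function `emit_pages` is a hypothetical stand-in for the paged emission logic, not the PR's code; real tests would exercise the RadixCache event methods directly, and the expected behavior for keys shorter than the page size is an assumption of this sketch.

```python
import unittest
from typing import List


def emit_pages(key: List[int], page_size: int) -> List[List[int]]:
    """Hypothetical stand-in: one emitted payload per page_size slice of the key."""
    return [key[i : i + page_size] for i in range(0, len(key), page_size)]


class PagedEmissionTest(unittest.TestCase):
    def test_key_is_exact_multiple_of_page_size(self):
        self.assertEqual(emit_pages([1, 2, 3, 4], 2), [[1, 2], [3, 4]])

    def test_key_shorter_than_page_size(self):
        # Assumed behavior in this sketch: a single short page is emitted.
        self.assertEqual(emit_pages([1, 2], 4), [[1, 2]])

    def test_empty_key_emits_nothing(self):
        self.assertEqual(emit_pages([], 4), [])


# Run the suite without calling sys.exit, so this also works inside a larger script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(PagedEmissionTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```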

python/sglang/srt/mem_cache/radix_cache.py

faradawn · 2025-06-05T20:35:17Z

The file now slices the prompt into page_size chunks and publishes one BlockStored event per chunk.

For each chunk the parent_block_hash points to the hash of the preceding chunk, giving downstream consumers (like Nvidia Dynamo) an accurate prefix chain.
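A minimal sketch of this chaining, under stated assumptions: the function `chained_page_events` and its tuple layout are hypothetical, each block hash here is computed from the chunk's own tokens only, and the parent hash of the first chunk is supplied by the caller (None at the root). The actual hashing scheme in `radix_cache.py` may differ.

```python
from typing import List, Optional, Tuple

# (block_hash, parent_block_hash, token_ids) for one page; layout is an assumption.
PageEvent = Tuple[int, Optional[int], List[int]]


def chained_page_events(
    tokens: List[int], page_size: int, parent_hash: Optional[int] = None
) -> List[PageEvent]:
    """Emit one event per page, each pointing at the hash of the preceding page."""
    events: List[PageEvent] = []
    prev_hash = parent_hash
    for start in range(0, len(tokens), page_size):
        page = tokens[start : start + page_size]
        block_hash = hash(tuple(page))
        events.append((block_hash, prev_hash, page))
        prev_hash = block_hash  # the next page's parent is this page
    return events


events = chained_page_events([1, 2, 3, 4, 5, 6], page_size=2)
for block_hash, parent, page in events:
    print(page, "parent is None" if parent is None else "parent is previous page")
```

A consumer can follow `parent_block_hash` backwards from any page to rebuild the prefix that the page extends.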

Thanks @trevor-m for pointing it out and @ispobock for the suggestion!

ispobock

LGTM

ispobock · 2025-06-09T14:11:31Z

@faradawn Could you fix the lint ci?

faradawn · 2025-06-09T21:14:10Z

Hi @ispobock, linting is fixed and the pre-commit hooks pass. Ready for the GitHub workflows!

faradawn · 2025-06-10T18:30:38Z

Hi @ispobock @zhyncs @merrymercy, is there a way to launch the GitHub Actions? I think this PR is ready.

ispobock · 2025-06-11T07:37:18Z

@zhyncs Please help merge this PR.

faradawn · 2025-06-11T22:45:34Z

Thank you @ispobock for reviewing the PR. Thank you @zhyncs for running the tests and merging it!

change kv event to fixed size

7ab4ee7

faradawn requested review from Ying1123, hnyls2002, merrymercy and xiezhq-hermann as code owners June 3, 2025 03:12

gemini-code-assist bot reviewed Jun 3, 2025

gemini-code-assist bot suggested changes Jun 3, 2025

python/sglang/srt/mem_cache/radix_cache.py (outdated, resolved)

faradawn mentioned this pull request Jun 3, 2025

feat: sglang integration of dynamo event manager ai-dynamo/dynamo#1323

Closed

ispobock reviewed Jun 5, 2025

python/sglang/srt/mem_cache/radix_cache.py (resolved)

faradawn and others added 2 commits June 5, 2025 13:10

make parent hash to be the prev block hash

1cdfb26

Merge branch 'main' into change-kv-event-to-fixed-size

efae289

faradawn changed the title ~~feat: change kv event to fixed-size blocks~~ [feat]: Emit fixed-size KV blocks events Jun 5, 2025

Merge branch 'main' into change-kv-event-to-fixed-size

31b54bb

trevor-m requested changes Jun 5, 2025

python/sglang/srt/mem_cache/radix_cache.py (outdated, resolved)

faradawn added 3 commits June 6, 2025 08:54

fix the first parent hash

19455f5

Merge branch 'main' of https://github.com/sgl-project/sglang into change-kv-event-to-fixed-size

dfd4703

Merge branch 'change-kv-event-to-fixed-size' of github.com:faradawn/sglang into change-kv-event-to-fixed-size

e09484c

trevor-m requested changes Jun 6, 2025

python/sglang/srt/mem_cache/radix_cache.py (outdated, resolved)

faradawn added 2 commits June 7, 2025 11:42

Merge branch 'main' of https://github.com/sgl-project/sglang into change-kv-event-to-fixed-size

44bc3e7

fix parent black hash alignment

738ad98

trevor-m approved these changes Jun 7, 2025

faradawn and others added 2 commits June 7, 2025 15:40

Merge branch 'main' into change-kv-event-to-fixed-size

bf3ffd1

Merge branch 'main' into change-kv-event-to-fixed-size

65551dc

ispobock approved these changes Jun 9, 2025

ishandhanani approved these changes Jun 9, 2025

Merge branch 'main' into change-kv-event-to-fixed-size

fdf5b19

zhyncs temporarily deployed to prod June 9, 2025 14:07 — with GitHub Actions Inactive

fix eof new line format

cf1d5d5

faradawn and others added 5 commits June 9, 2025 15:51

Merge branch 'main' of https://github.com/sgl-project/sglang into change-kv-event-to-fixed-size

b97a9b2

Merge branch 'sgl-project:main' into change-kv-event-to-fixed-size

6e3e08b

Merge branch 'change-kv-event-to-fixed-size' of github.com:faradawn/sglang into change-kv-event-to-fixed-size

ca4b76e

fix code formatting

422185b

add EOF new line and pre-commit hook passed

f664442

Merge branch 'main' into change-kv-event-to-fixed-size

9a631e0

Merge branch 'main' into change-kv-event-to-fixed-size

1375eb5

zhyncs merged commit 777688b into sgl-project:main Jun 11, 2025
49 of 52 checks passed

jianan-gu pushed a commit to jianan-gu/sglang that referenced this pull request Jun 12, 2025

[feat]: Emit fixed-size KV blocks events (sgl-project#6824)

6ef5687

faradawn mentioned this pull request Jun 22, 2025

feat: add kv router to sglang ai-dynamo/dynamo#1605

Merged


5 participants
