-
Notifications
You must be signed in to change notification settings - Fork 4.1k
[feat]: Emit fixed-size KV blocks events #6824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[feat]: Emit fixed-size KV blocks events #6824
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @faradawn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini here, providing a summary of this pull request by @faradawn. The main goal of this PR is to change how KV cache events (BlockStored, BlockRemoved) are emitted by the radix cache. Instead of emitting a single event for potentially large, variable-sized blocks corresponding to a node's key, the events are now broken down into fixed-size pages. This modification is intended to simplify and make the ZMQ wire contract more robust for consumers of these events, such as Nvidia Dynamo.
Highlights
- KV Event Granularity: The core change is that KV cache events (
BlockStored,BlockRemoved) are now emitted on a per-page basis, where each page contains a fixed size of contiguous tokens (self.page_size), rather than emitting one event for the entire, potentially multi-page, node key. - Consumer Compatibility: This change is motivated by the need to provide a simpler and more uniform wire contract for external consumers of the KV cache events, specifically mentioning Nvidia Dynamo.
- Radix Cache Implementation: The logic for recording store and remove events in the
RadixCacheclass (_record_store_eventand_record_remove_event) has been updated to iterate over the node's key inself.page_sizechunks and enqueue a separate event for each chunk.
Changelog
- python/sglang/srt/mem_cache/radix_cache.py
- Modified the
_record_store_eventmethod to iterate through the node's key in steps ofself.page_sizeand emit aBlockStoredevent for each page slice (around lines 463-482). - Added a comment in
_record_store_eventexplaining the rationale for emitting fixed-size page events for consumers like Dynamo (around lines 465-467). - Adjusted the parent hash calculation in
_record_store_eventto correctly handle the root node (which has no parent) by settingparent_block_hashtoNone(around line 468). - Modified the
_record_remove_eventmethod to iterate through the node's key in steps ofself.page_sizeand emit aBlockRemovedevent for each page slice, for symmetry with the store event (around lines 484-493). - Added a comment in
_record_remove_eventexplaining the symmetry with the store event (around lines 486-487).
- Modified the
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Data flows in blocks,
Fixed pages, simple locks,
For systems downstream.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request effectively refactors the KV event emission to use fixed-size blocks, which should indeed help in simplifying the contract for consumers like Dynamo. The changes are localized to the event recording methods and appear logically sound.
Key improvements include:
- Paged emission of
BlockStoredandBlockRemovedevents. - Robust handling of
parent_block_hashwhen a node's parent isNonein_record_store_event.
A few points to consider for further improvement and clarity are detailed in the specific comments. Additionally, adding unit tests for this new event emission logic would be highly beneficial to ensure correctness across various scenarios (e.g., different key lengths relative to page size).
Summary of Findings
- Robustness of
parent_block_hashCalculation: The modification toparent_block_hash = hash(tuple(node.parent.key)) if node.parent else Nonein_record_store_eventcorrectly handles cases wherenode.parentmight beNone, preventing potentialAttributeErrorexceptions. This enhances the robustness of the event recording mechanism. - Clarity of
parent_block_hashSemantics: Theparent_block_hashinBlockStoredevents is calculated from the parent node's full key. This might differ from theblock_hashesemitted when the parent node (if multi-page) was itself stored (which would be page-based hashes). It's important to ensure this semantic is clear and sufficient for consumers. - Unit Testing for New Event Logic: The changes introduce new logic for emitting events in fixed-size blocks. It would be beneficial to add or update unit tests to cover this new behavior, including scenarios with different key lengths relative to
page_size, and edge cases like empty keys (if possible) or keys shorter thanpage_size. - Potential Redundancy in
if not page_tokens:check: In both_record_store_event(lines 471-472) and_record_remove_event(lines 490-491), the checkif not page_tokens: continueappears to be potentially redundant. Given that the loopfor start in range(0, len(node.key), self.page_size)iterates such thatstart < len(node.key), and assumingself.page_size > 0, the slicenode.key[start : start + self.page_size]should always yield a non-empty list ifnode.keyitself is non-empty. Ifnode.keyis empty, the loop doesn't run. This was not commented on directly due to review settings (low severity).
Merge Readiness
The pull request introduces a valuable feature and the implementation is largely on the right track. However, before merging, I recommend addressing the point about parent_block_hash semantics for clarity with consumers and, importantly, adding unit tests to cover the new event emission logic. Addressing these points will further improve the quality and maintainability of the code. I am unable to approve the pull request myself; please ensure other reviewers approve it after these considerations are addressed.
|
The file now slices the prompt into page_size chunks and publishes one BlockStored event per chunk. For each chunk the parent_block_hash points to the hash of the preceding chunk, giving downstream consumers (like Nvidia Dynamo) an accurate prefix chain. Thanks @trevor-m for pointing it out and @ispobock for the suggestion! |
…nge-kv-event-to-fixed-size
…glang into change-kv-event-to-fixed-size
…nge-kv-event-to-fixed-size
ispobock
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
@faradawn Could you fix the lint ci? |
…nge-kv-event-to-fixed-size
…glang into change-kv-event-to-fixed-size
|
Hi @ispobock, linting fixed and pre-commit hooks passed. Ready for github workflows! |
|
Hi @ispobock @zhyncs @merrymercy , is there a way to launch the github actions? Think this PR is ready |
|
@zhyncs Please help merge this PR. |

Motivation
Since KV cache often cannot fit on one node, we would like to boardcast cache admission and eviction events to other nodes for KV transfer.
Many KV event consumers like Nvidia Dynamo only accept fixed-sized blocks. This PR changes the emission of kv event into fixed-size pages. This keeps the wire contract (ZMQ) standard and robust for more consumers, expanding the capability of KV transfer.
Modifications
the radix-cache emitter now slices multi-page nodes into uniform, fixed-size blocks before queuing events.
Checklist