@faradawn faradawn commented Jun 3, 2025

Motivation

Since the KV cache often cannot fit on one node, we would like to broadcast cache admission and eviction events to other nodes for KV transfer.

Many KV event consumers, such as Nvidia Dynamo, only accept fixed-size blocks. This PR changes KV event emission to use fixed-size pages. This keeps the ZMQ wire contract uniform and robust for more consumers, expanding the capability of KV transfer.

Modifications

The radix-cache emitter now slices multi-page nodes into uniform, fixed-size blocks before queuing events.
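As a rough illustration of the slicing step (not the code from this PR), the sketch below splits a node's token key into consecutive chunks of `page_size` tokens. The helper name `slice_into_pages` is hypothetical, and the PR text does not say whether keys are always page-aligned, so the sketch simply lets a trailing chunk be shorter.

```python
from typing import List


def slice_into_pages(tokens: List[int], page_size: int) -> List[List[int]]:
    """Split a token sequence into consecutive chunks of at most page_size tokens.

    Hypothetical helper for illustration; the real emitter lives in RadixCache.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # range() stops before len(tokens), so every slice starts inside the list.
    return [tokens[start:start + page_size] for start in range(0, len(tokens), page_size)]


# An 8-token node with page_size=4 yields two full pages.
pages = slice_into_pages([1, 2, 3, 4, 5, 6, 7, 8], page_size=4)
```

Each resulting chunk would then become the payload of one event instead of the whole node key being sent at once.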

@gemini-code-assist gemini-code-assist bot left a comment

Hello @faradawn, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request by @faradawn. The main goal of this PR is to change how KV cache events (BlockStored, BlockRemoved) are emitted by the radix cache. Instead of emitting a single event for a potentially large, variable-sized block corresponding to a node's key, the events are now broken down into fixed-size pages. This modification is intended to simplify the ZMQ wire contract and make it more robust for consumers of these events, such as Nvidia Dynamo.

Highlights

  • KV Event Granularity: The core change is that KV cache events (BlockStored, BlockRemoved) are now emitted on a per-page basis, where each page contains a fixed size of contiguous tokens (self.page_size), rather than emitting one event for the entire, potentially multi-page, node key.
  • Consumer Compatibility: This change is motivated by the need to provide a simpler and more uniform wire contract for external consumers of the KV cache events, specifically mentioning Nvidia Dynamo.
  • Radix Cache Implementation: The logic for recording store and remove events in the RadixCache class (_record_store_event and _record_remove_event) has been updated to iterate over the node's key in self.page_size chunks and enqueue a separate event for each chunk.

Changelog

  • python/sglang/srt/mem_cache/radix_cache.py
    • Modified the _record_store_event method to iterate through the node's key in steps of self.page_size and emit a BlockStored event for each page slice (around lines 463-482).
    • Added a comment in _record_store_event explaining the rationale for emitting fixed-size page events for consumers like Dynamo (around lines 465-467).
    • Adjusted the parent hash calculation in _record_store_event to correctly handle the root node (which has no parent) by setting parent_block_hash to None (around line 468).
    • Modified the _record_remove_event method to iterate through the node's key in steps of self.page_size and emit a BlockRemoved event for each page slice, for symmetry with the store event (around lines 484-493).
    • Added a comment in _record_remove_event explaining the symmetry with the store event (around lines 486-487).
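The changelog above can be sketched roughly as follows. This is a simplified stand-in, not the contents of radix_cache.py: the `Node` and `EventRecorder` classes, the `token_ids` field, and the exact shape of `BlockStored` and `BlockRemoved` are assumptions made for illustration. Only the names `block_hashes`, `parent_block_hash`, and `page_size`, and the rule that a missing parent yields `None`, come from the discussion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical stand-in for a radix tree node.
    key: List[int]
    parent: Optional["Node"] = None


@dataclass
class BlockStored:
    # Field layout is assumed; only the two hash names appear in the PR text.
    block_hashes: List[int]
    parent_block_hash: Optional[int]
    token_ids: List[int]


@dataclass
class BlockRemoved:
    block_hashes: List[int]


class EventRecorder:
    """Hypothetical recorder that emits one event per page of a node's key."""

    def __init__(self, page_size: int):
        self.page_size = page_size
        self.queue: list = []

    def record_store(self, node: Node) -> None:
        # A node without a parent gets parent_block_hash=None.
        parent_hash = hash(tuple(node.parent.key)) if node.parent else None
        for start in range(0, len(node.key), self.page_size):
            page = node.key[start:start + self.page_size]
            self.queue.append(BlockStored([hash(tuple(page))], parent_hash, page))

    def record_remove(self, node: Node) -> None:
        # Mirrors record_store so removals match stores page for page.
        for start in range(0, len(node.key), self.page_size):
            page = node.key[start:start + self.page_size]
            self.queue.append(BlockRemoved([hash(tuple(page))]))


root = Node(key=[])
child = Node(key=[10, 11, 12, 13, 14, 15], parent=root)
recorder = EventRecorder(page_size=2)
recorder.record_store(child)
recorder.record_remove(child)
```

With a six-token key and `page_size=2`, the queue ends up holding three store events followed by three remove events, and the hashes in the remove events match those in the store events.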

Data flows in blocks,
Fixed pages, simple locks,
For systems downstream.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request effectively refactors the KV event emission to use fixed-size blocks, which should indeed help in simplifying the contract for consumers like Dynamo. The changes are localized to the event recording methods and appear logically sound.

Key improvements include:

  • Paged emission of BlockStored and BlockRemoved events.
  • Robust handling of parent_block_hash when a node's parent is None in _record_store_event.

A few points to consider for further improvement and clarity are detailed in the specific comments. Additionally, adding unit tests for this new event emission logic would be highly beneficial to ensure correctness across various scenarios (e.g., different key lengths relative to page size).

Summary of Findings

  • Robustness of parent_block_hash Calculation: The modification to parent_block_hash = hash(tuple(node.parent.key)) if node.parent else None in _record_store_event correctly handles cases where node.parent might be None, preventing potential AttributeError exceptions. This enhances the robustness of the event recording mechanism.
  • Clarity of parent_block_hash Semantics: The parent_block_hash in BlockStored events is calculated from the parent node's full key. This might differ from the block_hashes emitted when the parent node (if multi-page) was itself stored (which would be page-based hashes). It's important to ensure this semantic is clear and sufficient for consumers.
  • Unit Testing for New Event Logic: The changes introduce new logic for emitting events in fixed-size blocks. It would be beneficial to add or update unit tests to cover this new behavior, including scenarios with different key lengths relative to page_size, and edge cases like empty keys (if possible) or keys shorter than page_size.
  • Potential Redundancy in if not page_tokens: check: In both _record_store_event (lines 471-472) and _record_remove_event (lines 490-491), the check if not page_tokens: continue appears to be potentially redundant. Given that the loop for start in range(0, len(node.key), self.page_size) iterates such that start < len(node.key), and assuming self.page_size > 0, the slice node.key[start : start + self.page_size] should always yield a non-empty list if node.key itself is non-empty. If node.key is empty, the loop doesn't run. This was not commented on directly due to review settings (low severity).
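The redundancy argument in the last finding can be checked with a small experiment. The sketch below is only an illustration with a hypothetical helper, `page_slices`. It reproduces the loop shape quoted in the review, counts empty slices over a range of key lengths and positive page sizes, and also covers the cases the review suggests testing (empty keys and keys shorter than the page size).

```python
import math


def page_slices(key, page_size):
    """Return the slices produced by the paged loop described in the review."""
    return [key[start:start + page_size] for start in range(0, len(key), page_size)]


# Exhaustively check small key lengths against small positive page sizes.
empty_slices_seen = 0
count_mismatches = 0
for key_len in range(0, 33):
    for page_size in range(1, 9):
        slices = page_slices(list(range(key_len)), page_size)
        # An empty slice here would mean the guard is actually needed.
        empty_slices_seen += sum(1 for s in slices if not s)
        # The number of pages should be ceil(key_len / page_size).
        if len(slices) != math.ceil(key_len / page_size):
            count_mismatches += 1
```

Both counters stay at zero for these inputs, which is consistent with the claim that the guard never fires when `page_size > 0`. A similar loop could serve as the basis for the suggested unit tests.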

Merge Readiness

The pull request introduces a valuable feature and the implementation is largely on the right track. However, before merging, I recommend addressing the point about parent_block_hash semantics for clarity with consumers and, importantly, adding unit tests to cover the new event emission logic. Addressing these points will further improve the quality and maintainability of the code. I am unable to approve the pull request myself; please ensure other reviewers approve it after these considerations are addressed.

@faradawn faradawn changed the title from "feat: change kv event to fixed-size blocks" to "[feat]: Emit fixed-size KV blocks events" on Jun 5, 2025
faradawn commented Jun 5, 2025

The file now slices the prompt into page_size chunks and publishes one BlockStored event per chunk.

For each chunk the parent_block_hash points to the hash of the preceding chunk, giving downstream consumers (like Nvidia Dynamo) an accurate prefix chain.
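The chaining described here can be sketched as below. This is an illustration under assumptions, not the merged code: the function name `chain_pages` is hypothetical, each hash is computed from the chunk's own tokens only, and the first chunk's parent is passed in by the caller (`None` when there is no preceding block).

```python
from typing import List, Optional, Tuple


def chain_pages(
    tokens: List[int], page_size: int, parent_hash: Optional[int] = None
) -> List[Tuple[int, Optional[int]]]:
    """Return (block_hash, parent_block_hash) pairs, one per page_size chunk."""
    events = []
    for start in range(0, len(tokens), page_size):
        block_hash = hash(tuple(tokens[start:start + page_size]))
        events.append((block_hash, parent_hash))
        # The next chunk's parent is the chunk that was just emitted.
        parent_hash = block_hash
    return events


events = chain_pages([1, 2, 3, 4, 5, 6], page_size=2)
```

A consumer can then walk the chain from any block back to the first one by following `parent_block_hash`, which is what gives it the prefix order of the pages.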

Thanks @trevor-m for pointing it out and @ispobock for the suggestion!

@ispobock ispobock left a comment

LGTM

ispobock commented Jun 9, 2025

@faradawn Could you fix the lint CI?

faradawn commented Jun 9, 2025

Hi @ispobock, linting is fixed and the pre-commit hooks pass. Ready for the GitHub workflows!

@faradawn

Hi @ispobock @zhyncs @merrymercy, is there a way to launch the GitHub Actions? I think this PR is ready.

@ispobock

@zhyncs Please help merge this PR.

@zhyncs zhyncs merged commit 777688b into sgl-project:main Jun 11, 2025
49 of 52 checks passed
@faradawn

Thank you @ispobock for reviewing the PR. Thank you @zhyncs for running the tests and merging it!
