feat: Update docs to indicate need to use consistent hashing for KV events in backend engines #2981
Merged
PeaBrane
merged 14 commits into
ai-dynamo:main
from
qimcis:docs/consistent-hashing-for-kv-events
Sep 18, 2025
Changes from 1 commit
Commits
6f8e1f4
fix: devcontainer.json typo from b6b3a767c (#2976)
qimcis 385040d
fix import order
qimcis 539b58a
Merge branch 'main' into docs/consistent-hashing-for-kv-events
qimcis c232989
remove guide and uneccessary changes, added docstrings
qimcis 2e4de8c
Merge branch 'main' into docs/consistent-hashing-for-kv-events
qimcis e0d9806
revert index.rst
qimcis ff51102
fix cargo fmt issue
qimcis b7a7a2a
fix clippy error
qimcis 8fc6099
Merge branch 'main' into docs/consistent-hashing-for-kv-events
PeaBrane 557f06f
revert indexer.rs
PeaBrane da3c230
extra empty line at end
PeaBrane 34b9760
revert args.py
PeaBrane 1eac768
only keep hashing notes for vllm
PeaBrane 52e76be
use fixed seed for router benchmarking
PeaBrane
Signed-off-by: Keiven Chang <[email protected]> Signed-off-by: Chi McIsaac <[email protected]>
commit 6f8e1f47d2b33e8731a0f7aa20f8c76871a31845
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| <!-- | ||
| SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. | ||
| SPDX-License-Identifier: Apache-2.0 | ||
| --> | ||
|
|
||
| # KV Events & Hashing Consistency | ||
|
|
||
| This guide explains how Dynamo computes and consumes KV cache block hashes, and how to ensure consistent hashing across engines, processes, and nodes. | ||
|
|
||
| ## Canonical Hashing (Router) | ||
|
|
||
| - Algorithm: xxh3-64 | ||
| - Seed: 1337 | ||
| - Token encoding: u32 tokens serialized via little-endian `to_le_bytes` | ||
| - Scope: Computes "local block hashes" used by the router/indexer to match cached prefixes. | ||
|
|
||
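The token encoding above can be illustrated with a minimal sketch. It only shows how a token sequence might be split into blocks and serialized as little-endian u32 bytes before hashing; it does not compute xxh3-64. The helper name `iter_full_blocks` is hypothetical, and dropping a trailing partial block is an assumption not stated in this guide.

```python
import struct
from typing import Iterator, List


def iter_full_blocks(tokens: List[int], kv_block_size: int) -> Iterator[bytes]:
    """Yield the serialized bytes of each complete block of tokens.

    Hypothetical helper for illustration only. Assumption: a trailing
    partial block is skipped rather than hashed.
    """
    if kv_block_size <= 0:
        raise ValueError("kv_block_size must be positive")
    full = len(tokens) - len(tokens) % kv_block_size
    for start in range(0, full, kv_block_size):
        block = tokens[start:start + kv_block_size]
        # "<I" is a little-endian unsigned 32-bit integer, the same layout
        # that Rust's u32::to_le_bytes produces for each token.
        yield struct.pack(f"<{kv_block_size}I", *block)


# Five tokens with a block size of 4 give one complete block of 16 bytes.
blocks = list(iter_full_blocks([1, 2, 3, 4, 5], kv_block_size=4))
print(blocks[0].hex())
```

These bytes are what a hash function seeded as described above would consume; the real hashing should always go through the reference implementations listed below.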
| Reference implementations: | ||
| - Rust (primary): `lib/llm/src/kv_router/indexer.rs` (`compute_block_hash_for_seq`) | ||
| - Python binding: `dynamo._core.compute_block_hash_for_seq_py` (delegates to the Rust implementation) | ||
|
|
||
| Note: | ||
| - `kv_block_size` must be identical between the engine that publishes KV events and the router. A mismatch will yield different local block hashes and break prefix matching. | ||
|
|
||
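To see why a `kv_block_size` mismatch breaks matching, consider the sketch below. It uses BLAKE2b from the standard library purely as a stand-in for xxh3-64, so the values it prints are not Dynamo hashes; the function name `standin_block_hashes` is hypothetical.

```python
import hashlib
import struct


def standin_block_hashes(tokens, kv_block_size):
    """Hash each complete block of tokens.

    Assumption: BLAKE2b (8-byte digest) stands in for xxh3-64 so that the
    example needs no third-party package. Only the chunking behavior is
    representative, not the hash values.
    """
    hashes = []
    full = len(tokens) - len(tokens) % kv_block_size
    for start in range(0, full, kv_block_size):
        block = tokens[start:start + kv_block_size]
        data = struct.pack(f"<{len(block)}I", *block)
        digest = hashlib.blake2b(data, digest_size=8).digest()
        hashes.append(int.from_bytes(digest, "little"))
    return hashes


tokens = list(range(16))
engine_view = standin_block_hashes(tokens, kv_block_size=8)  # engine setting
router_view = standin_block_hashes(tokens, kv_block_size=4)  # router setting

# Different block boundaries mean different inputs to the hash, so the two
# sides share no block hashes even though the token sequence is identical.
overlap = set(engine_view) & set(router_view)
print(len(engine_view), len(router_view), len(overlap))
```

With matching block sizes the two lists are identical, which is the condition the router relies on for prefix matching.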
| Reference test vector check: | ||
| - Tokens `[1,2,3,4]`, `kv_block_size=4` → `14643705804678351452` | ||
|
|
||
| ## Engine Block IDs vs Router Hashes | ||
|
|
||
| - LocalBlockHash (router): Canonical value used for KV matching. | ||
| - ExternalSequenceBlockHash (engine): Engine-provided block identifiers to link parent/child and removals; MUST be deterministic within a deployment. | ||
|
|
||
| The router recomputes LocalBlockHash from tokens on ingest. If a parent link or a removal references an unknown ExternalSequenceBlockHash, the router logs a warning (or an error-level message if `DYN_KV_ENFORCE_ENGINE_HASH_STABILITY=1`). | ||
|
|
||
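The behavior described above can be sketched roughly as follows. This is not the router's code: `BlockIndexSketch`, `report_level`, and the log messages are hypothetical, and only the environment variable name comes from this guide. The sketch assumes an operation that references an unknown engine block hash is skipped after being logged.

```python
import logging
import os

logger = logging.getLogger("kv_router_sketch")


def report_level(env=os.environ):
    """Return ERROR when enforcement is enabled, otherwise WARNING."""
    enforce = env.get("DYN_KV_ENFORCE_ENGINE_HASH_STABILITY") == "1"
    return logging.ERROR if enforce else logging.WARNING


class BlockIndexSketch:
    """Tracks engine-provided block hashes (hypothetical, illustration only)."""

    def __init__(self, env=os.environ):
        self.known = set()  # engine block hashes seen so far
        self.level = report_level(env)

    def store(self, external_hash, parent_hash=None):
        # A parent that was never stored suggests unstable engine hashes.
        if parent_hash is not None and parent_hash not in self.known:
            logger.log(self.level, "unknown parent %s; skipping store", parent_hash)
            return False
        self.known.add(external_hash)
        return True

    def remove(self, external_hash):
        if external_hash not in self.known:
            logger.log(self.level, "unknown block %s; skipping removal", external_hash)
            return False
        self.known.discard(external_hash)
        return True


index = BlockIndexSketch(env={})
index.store(10)
index.store(11, parent_hash=10)
skipped = index.store(12, parent_hash=99)  # logged at WARNING, then skipped
```

Passing `env` explicitly keeps the sketch testable without mutating the process environment.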
| ## Engine Configuration Tips | ||
|
|
||
| The goal is to ensure that emitted KV events are deterministic across ranks/restarts. | ||
|
|
||
| General: | ||
| - Set `PYTHONHASHSEED=0` for Python processes to eliminate hash randomization. | ||
|
|
||
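The effect of `PYTHONHASHSEED=0` can be checked with a short sketch that launches two fresh interpreters and compares the built-in `hash()` of the same string. The helper name `str_hash_in_child` and the sample string are arbitrary choices for illustration.

```python
import os
import subprocess
import sys


def str_hash_in_child(seed):
    """Run a fresh interpreter and return hash('kv-block') as it sees it."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", "print(hash('kv-block'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(out.stdout.strip())


# With the seed pinned to 0, hash randomization is disabled, so separate
# processes agree on the value.
first = str_hash_in_child("0")
second = str_hash_in_child("0")
print(first == second)
```

Without the variable set, string hashes generally differ from one process to the next, which is why any engine-side identifier derived from Python's `hash()` would not be stable across ranks or restarts.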
| vLLM: | ||
| - If your version supports it, select a deterministic prefix-caching hash algorithm, e.g. `--prefix-caching-hash-algo sha256`. | ||
| - Keep `enable_prefix_caching=True` when emitting KV events. | ||
|
|
||
| SGLang: | ||
| - Ensure events use deterministic block IDs across processes. If applicable, set `PYTHONHASHSEED=0`. | ||
|
|
||
| TensorRT-LLM: | ||
| - Use a stable `--random-seed` where applicable and validate that KV event block IDs are deterministic across launches. | ||
|
|
||
| ## Observability and Enforcement | ||
|
|
||
| - The router logs a warning, including remediation hints, when a parent link is missing or a removal refers to an unknown block ID. | ||
| - Set `DYN_KV_ENFORCE_ENGINE_HASH_STABILITY=1` to promote these warnings to error-level logs. This does not abort processing; the router still skips the offending operation. | ||
|
|
||
| ## Quick Self-Check | ||
|
|
||
| From Python: | ||
|
|
||
| ```python | ||
| from dynamo._core import compute_block_hash_for_seq_py | ||
| assert compute_block_hash_for_seq_py([1,2,3,4], 4)[0] == 14643705804678351452 | ||
| ``` | ||
|
|
||
| If this check fails on any node, verify the environment variables and engine flags described above. | ||
| This self‑check only validates the router’s canonical hashing path (known‑answer test); it does not validate that engine‑emitted block IDs are deterministic. | ||