feat: allow using ApproxKvIndexer for routing via use_kv_events flag #1869

PeaBrane · 2025-07-10T20:39:42Z

Overview:

Additionally:

- Guard find_best_match so that only one request can run it at a time. This trades some throughput for routing decisions that were empirically closer to optimal.
- Set a threshold for TimerManager to rebuild the binary heap when it grows too large (too many stale entries). This should not normally be needed unless the timer duration is set very long.
- Cosmetic cleanups.

Summary by CodeRabbit

Documentation
- Updated the user guide for the CLI tool to include a new optional argument for KV event handling, with clearer explanations and improved formatting for related options.
New Features
- Added a configuration option to control whether the router listens to KV events or uses an approximate prediction method for cached blocks.
Improvements
- Enhanced internal logic for managing cached block routing, including improved concurrency control and more efficient handling of stale entries.

No changes to public APIs outside of the new configuration option.

coderabbitai · 2025-07-10T20:50:04Z

Walkthrough

The changes introduce a new use_kv_events flag to the Dynamo KV router, allowing users to choose between the original event-driven KvIndexer and a new approximate ApproxKvIndexer. The router and its configuration are updated to support this flag, with unified indexer handling, concurrency improvements, and documentation updates reflecting the new option.

Changes

| File(s) | Change Summary |
|---|---|
| docs/guides/dynamo_run.md | Updated documentation to describe the new `--use-kv-events` CLI argument, explain its behavior, and clarify related options. |
| launch/dynamo-run/src/flags.rs | Added the `use_kv_events` flag to CLI flags and passed it to router configuration. |
| lib/llm/src/discovery/model_manager.rs | Simplified logic to use the `use_kv_events` flag from config directly when constructing the router. |
| lib/llm/src/kv_router.rs | Added `use_kv_events` to `KvRouterConfig`, unified `KvIndexer` and `ApproxKvIndexer` under an `Indexer` enum, added mutex for concurrency, and updated router logic to select indexer based on the flag. |
| lib/llm/src/kv_router/approx.rs | Enhanced `TimerManager` with a threshold and heap rebuild logic, updated `ApproxKvIndexer` to use new timer manager signature, and adjusted tests accordingly. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI
    participant Flags
    participant RouterConfig
    participant KvRouter
    participant KvIndexer
    participant ApproxKvIndexer

    User->>CLI: Run with --use-kv-events=[true|false]
    CLI->>Flags: Parse arguments
    Flags->>RouterConfig: Pass use_kv_events flag
    RouterConfig->>KvRouter: Instantiate with config
    alt use_kv_events = true
        KvRouter->>KvIndexer: Create KvIndexer
    else use_kv_events = false
        KvRouter->>ApproxKvIndexer: Create ApproxKvIndexer
    end
    KvRouter->>KvRouter: Route requests using selected indexer
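
The first steps of the diagram (parse the flag, then hand it to the router config) can be sketched with the standard library alone. This is only an illustration: `parse_use_kv_events` is a hypothetical helper, not the PR's actual flag-parsing code, and the only details taken from the PR are the flag spelling `--use-kv-events=[true|false]` and its default of true.

```rust
/// Hypothetical helper (not from the PR): extracts the value of
/// `--use-kv-events=<bool>` from raw arguments, defaulting to `true`.
pub fn parse_use_kv_events(args: &[&str]) -> Result<bool, String> {
    // Default matches the documented behavior: listen to KV events.
    let mut value = true;
    for arg in args {
        if let Some(raw) = arg.strip_prefix("--use-kv-events=") {
            value = raw
                .parse::<bool>()
                .map_err(|e| format!("invalid --use-kv-events value {:?}: {}", raw, e))?;
        }
    }
    Ok(value)
}
```

With no flag present the helper returns `Ok(true)`, so existing invocations keep the event-driven indexer.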

Possibly related PRs

feat: Approximate KV Routing #1636: Introduces and implements the ApproxKvIndexer and approximate KV routing, which is directly extended and integrated by this PR.

Poem

In the warren of code, a new flag appears,
Choose your indexer—let's all give three cheers!
With events or a guess, the router will know,
Which blocks are cached, where data should go.
🐇✨ Now the routing is clever and neat—
A hop, a skip, and requests can't be beat!

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

lib/llm/src/kv_router/approx.rs (1)

196-197: Consider making the threshold configurable.

While the fixed threshold of 50 is reasonable, consider making it configurable through the constructor parameters for better flexibility in different deployment scenarios.

lib/llm/src/kv_router.rs (2)

108-125: Clean abstraction over indexer implementations.

The Indexer enum effectively unifies the interface. Regarding the TODO comment: Rust doesn't auto-derive trait implementations for enums, but you could use a macro crate like enum_dispatch to automate this boilerplate.
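
A minimal sketch of the hand-written delegation this comment refers to, assuming simplified stand-in types. The two structs, the `find_matches` method, and the `new` constructor below are hypothetical and synchronous; the PR's real indexers have richer APIs. Only the enum name, the variant names, and selection by `use_kv_events` come from the PR.

```rust
use std::collections::HashSet;

/// Stand-in for the event-driven indexer (hypothetical fields).
pub struct KvIndexer {
    cached: HashSet<u64>,
}

/// Stand-in for the approximate indexer (hypothetical fields).
pub struct ApproxKvIndexer {
    predicted: HashSet<u64>,
}

impl KvIndexer {
    /// Counts how many leading block hashes are known to be cached.
    pub fn find_matches(&self, block_hashes: &[u64]) -> usize {
        block_hashes.iter().take_while(|h| self.cached.contains(*h)).count()
    }
}

impl ApproxKvIndexer {
    /// Counts how many leading block hashes are predicted to be cached.
    pub fn find_matches(&self, block_hashes: &[u64]) -> usize {
        block_hashes.iter().take_while(|h| self.predicted.contains(*h)).count()
    }
}

/// One type the router can hold regardless of which indexer is active.
pub enum Indexer {
    KvIndexer(KvIndexer),
    ApproxKvIndexer(ApproxKvIndexer),
}

impl Indexer {
    /// Selects the variant from the `use_kv_events` flag.
    pub fn new(use_kv_events: bool, known: HashSet<u64>) -> Self {
        if use_kv_events {
            Indexer::KvIndexer(KvIndexer { cached: known })
        } else {
            Indexer::ApproxKvIndexer(ApproxKvIndexer { predicted: known })
        }
    }

    /// The boilerplate in question: every shared method needs a match arm per variant.
    pub fn find_matches(&self, block_hashes: &[u64]) -> usize {
        match self {
            Indexer::KvIndexer(i) => i.find_matches(block_hashes),
            Indexer::ApproxKvIndexer(i) => i.find_matches(block_hashes),
        }
    }

    pub fn name(&self) -> &'static str {
        match self {
            Indexer::KvIndexer(_) => "KvIndexer",
            Indexer::ApproxKvIndexer(_) => "ApproxKvIndexer",
        }
    }
}
```

Each additional shared method adds another `match` of the same shape, which is the repetition a dispatch macro would generate automatically.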

160-165: Consider making the TTL configurable.

The 120-second TTL for ApproxKvIndexer is hardcoded. Consider making this configurable through KvRouterConfig to allow tuning based on deployment characteristics.

Apply this diff to make the TTL configurable:

 pub struct KvRouterConfig {
     pub overlap_score_weight: f64,
     pub router_temperature: f64,
     pub use_kv_events: bool,
+    pub approx_indexer_ttl_secs: u64,
     pub max_num_batched_tokens: u32,
 }

 impl Default for KvRouterConfig {
     fn default() -> Self {
         Self {
             overlap_score_weight: 1.0,
             router_temperature: 0.5,
             use_kv_events: true,
+            approx_indexer_ttl_secs: 120,
             max_num_batched_tokens: 8192,
         }
     }
 }

Then use it in the instantiation:

-                Duration::from_secs(120),
+                Duration::from_secs(config.approx_indexer_ttl_secs),

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cd9411 and 5321a2a.

📒 Files selected for processing (5)

docs/guides/dynamo_run.md (2 hunks)
launch/dynamo-run/src/flags.rs (2 hunks)
lib/llm/src/discovery/model_manager.rs (1 hunks)
lib/llm/src/kv_router.rs (11 hunks)
lib/llm/src/kv_router/approx.rs (6 hunks)

🧰 Additional context used

🧠 Learnings (5)

📓 Common learnings

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scheduler.rs:260-266
Timestamp: 2025-05-30T06:34:12.785Z
Learning: In the KV router scheduler code, PeaBrane prefers fail-fast behavior over silent failure handling. When accessing worker metrics data that could be out-of-bounds (like dp_rank indexing), explicit panics are preferred over graceful degradation with continue statements to ensure data integrity issues are caught early.

docs/guides/dynamo_run.md (1)

Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The `@dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects the `runtime` parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature `async def get_metrics(runtime, log_dir)` decorated with `@dynamo_worker()` can be called as `get_metrics(log_dir)` because the decorator wrapper injects the runtime parameter.

lib/llm/src/discovery/model_manager.rs (5)

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.

Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.

Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.

Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.

lib/llm/src/kv_router/approx.rs (1)

Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.

lib/llm/src/kv_router.rs (8)

Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scheduler.rs:260-266
Timestamp: 2025-05-30T06:34:12.785Z
Learning: In the KV router scheduler code, PeaBrane prefers fail-fast behavior over silent failure handling. When accessing worker metrics data that could be out-of-bounds (like dp_rank indexing), explicit panics are preferred over graceful degradation with continue statements to ensure data integrity issues are caught early.

Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.

Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.

Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.

Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
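
The shutdown pattern in this learning can be illustrated with the standard library's channel instead of Tokio's. This is an analogous sketch, not the code from `lib/llm/tests/block_manager.rs`: with `std::sync::mpsc`, the receiver's iterator ends once every sender is dropped, which plays the role that `rx.recv()` returning `None` plays in Tokio. The name `spawn_batcher` is hypothetical.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical background task: sums values until all senders are dropped.
pub fn spawn_batcher() -> (mpsc::Sender<u32>, thread::JoinHandle<u32>) {
    let (tx, rx) = mpsc::channel::<u32>();
    let handle = thread::spawn(move || {
        let mut sum = 0;
        // The loop ends by itself when the last Sender is dropped;
        // no explicit shutdown message is needed.
        for value in rx {
            sum += value;
        }
        sum
    });
    (tx, handle)
}
```

Dropping the returned sender is therefore enough to let the thread finish and be joined.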

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: Mirror Repository to GitLab
GitHub Check: Build and Test - vllm
GitHub Check: pre-merge-rust (lib/bindings/python)
GitHub Check: pre-merge-rust (lib/runtime/examples)
GitHub Check: pre-merge-rust (.)

🔇 Additional comments (13)

launch/dynamo-run/src/flags.rs (2)

131-136: LGTM! Well-documented CLI flag addition.

The new use_kv_events flag is properly implemented with clear documentation explaining its purpose and default behavior.

224-224: Correctly passes the new flag to router configuration.

The use_kv_events flag is properly propagated to the KvRouterConfig constructor.

docs/guides/dynamo_run.md (2)

11-11: Documentation accurately reflects the new CLI option.

The usage example correctly shows the new --use-kv-events flag with its default value.

204-210: Excellent documentation of the KV routing options.

The reformatted bullet points improve readability, and the explanation of the --use-kv-events flag clearly describes when to use each indexer type.

lib/llm/src/discovery/model_manager.rs (2)

215-215: Appropriate use of clone for ownership transfer.

The change from borrowing to cloning is correct since DefaultWorkerSelector::new needs ownership of the config. The KvRouterConfig struct is small and derives Clone, making this an efficient operation.

220-223: Good simplification of the configuration logic.

Directly using the use_kv_events flag from the config is cleaner than the previous approach of deriving it from other fields.

lib/llm/src/kv_router/approx.rs (3)

84-97: Good addition of threshold mechanism for heap management.

The threshold field is well-documented and will help prevent unbounded growth of stale entries in the expiration heap.

100-108: Efficient heap rebuild implementation.

The rebuild_heap method correctly reconstructs the heap from the authoritative timers map, effectively removing all stale entries.

127-131: Smart threshold-based rebuild trigger.

The condition self.expirations.len() > self.timers.len() * self.threshold effectively triggers rebuilds when stale entries accumulate beyond the threshold multiplier.
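
A self-contained sketch of the mechanism described in these three comments, under stated assumptions. The field names `timers`, `expirations`, and `threshold` and the rebuild condition follow the review text; everything else (the `u64` keys, the `insert` and `pop_expired` signatures, passing `now` explicitly, taking the threshold in the constructor) is hypothetical and chosen to keep the example deterministic.

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};
use std::time::{Duration, Instant};

pub struct TimerManager {
    ttl: Duration,
    /// Authoritative map: key -> current expiry.
    timers: HashMap<u64, Instant>,
    /// Min-heap of (expiry, key). Refreshing a key leaves its old entry behind as stale.
    expirations: BinaryHeap<Reverse<(Instant, u64)>>,
    /// Rebuild once the heap is more than `threshold` times larger than `timers`.
    threshold: usize,
}

impl TimerManager {
    pub fn new(ttl: Duration, threshold: usize) -> Self {
        TimerManager { ttl, timers: HashMap::new(), expirations: BinaryHeap::new(), threshold }
    }

    /// Reconstructs the heap from the authoritative map, dropping all stale entries.
    fn rebuild_heap(&mut self) {
        self.expirations = self.timers.iter().map(|(&k, &t)| Reverse((t, k))).collect();
    }

    /// Inserts or refreshes a key, then rebuilds if stale entries have piled up.
    pub fn insert(&mut self, key: u64, now: Instant) {
        let expiry = now + self.ttl;
        self.timers.insert(key, expiry);
        self.expirations.push(Reverse((expiry, key)));
        if self.expirations.len() > self.timers.len() * self.threshold {
            self.rebuild_heap();
        }
    }

    /// Pops every key whose current expiry is at or before `now`.
    pub fn pop_expired(&mut self, now: Instant) -> Vec<u64> {
        let mut expired = Vec::new();
        while let Some(&Reverse((expiry, key))) = self.expirations.peek() {
            if expiry > now {
                break;
            }
            self.expirations.pop();
            // A heap entry counts only if it matches the authoritative expiry.
            if self.timers.get(&key) == Some(&expiry) {
                self.timers.remove(&key);
                expired.push(key);
            }
        }
        expired
    }

    pub fn heap_len(&self) -> usize {
        self.expirations.len()
    }
}
```

Refreshing the same key repeatedly is the case the threshold targets: without the rebuild, the heap would gain one stale entry per refresh until those entries expired.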

lib/llm/src/kv_router.rs (4)

71-71: Well-structured configuration extension.

The use_kv_events field is properly integrated into the config struct with a sensible default value of true, maintaining backward compatibility.

Also applies to: 82-82, 94-101

137-139: Consider the performance implications of the mutex.

The mutex serializes all find_best_match calls. As your TODO suggests, benchmark whether making the subroutines synchronous would be more efficient than async with a mutex. This could be a bottleneck under high concurrent load.

Also applies to: 220-222
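
To make the serialization concrete, here is a small sketch that uses `std::sync::Mutex` and OS threads rather than the async mutex the router would use. The `Router` type, its least-loaded selection rule, and `route_from_threads` are hypothetical; the point is only that holding one guard across the whole read-decide-update step means no two requests choose a worker from the same stale view.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

pub struct Router {
    /// Guard that serializes the entire select-and-record step.
    find_best_match_mutex: Mutex<()>,
    /// Number of requests assigned to each worker (hypothetical load metric).
    loads: Vec<AtomicUsize>,
}

impl Router {
    pub fn new(num_workers: usize) -> Self {
        Router {
            find_best_match_mutex: Mutex::new(()),
            loads: (0..num_workers).map(|_| AtomicUsize::new(0)).collect(),
        }
    }

    /// Picks the least-loaded worker and records the assignment under the guard.
    pub fn find_best_match(&self) -> usize {
        let _guard = self.find_best_match_mutex.lock().unwrap();
        let best = (0..self.loads.len())
            .min_by_key(|&i| self.loads[i].load(Ordering::SeqCst))
            .expect("router needs at least one worker");
        self.loads[best].fetch_add(1, Ordering::SeqCst);
        best
    }

    pub fn loads(&self) -> Vec<usize> {
        self.loads.iter().map(|l| l.load(Ordering::SeqCst)).collect()
    }
}

/// Drives the router from several threads at once.
pub fn route_from_threads(router: Arc<Router>, threads: usize, calls_per_thread: usize) {
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let router = Arc::clone(&router);
            thread::spawn(move || {
                for _ in 0..calls_per_thread {
                    router.find_best_match();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

Because each decision sees the result of the previous one, concurrent callers end up with evenly spread assignments, at the cost of running one selection at a time.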

227-228: Good refactoring and proper state management.

Using the compute_block_hash_for_seq helper function improves code reuse, and properly notifying ApproxKvIndexer about routing decisions is essential for maintaining its internal state.

Also applies to: 241-246
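
Why the approximate indexer must be told about routing decisions can be seen in a toy model. The sketch below is an assumption-laden illustration, not the PR's `ApproxKvIndexer`: the type `ApproxIndexer`, the methods `record_routing_decision` and `overlap`, and the flat `(worker, block_hash)` map are all hypothetical. It assumes that blocks sent to a worker stay cached there until a TTL elapses.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

pub struct ApproxIndexer {
    ttl: Duration,
    /// (worker_id, block_hash) -> time at which the prediction expires.
    predicted: HashMap<(u64, u64), Instant>,
}

impl ApproxIndexer {
    pub fn new(ttl: Duration) -> Self {
        ApproxIndexer { ttl, predicted: HashMap::new() }
    }

    /// Called after routing: assume the chosen worker now caches these blocks.
    pub fn record_routing_decision(&mut self, worker: u64, block_hashes: &[u64], now: Instant) {
        for &hash in block_hashes {
            self.predicted.insert((worker, hash), now + self.ttl);
        }
    }

    /// Counts leading blocks of a request predicted to still be cached on `worker`.
    pub fn overlap(&self, worker: u64, block_hashes: &[u64], now: Instant) -> usize {
        block_hashes
            .iter()
            .take_while(|&&hash| {
                self.predicted
                    .get(&(worker, hash))
                    .map_or(false, |&expiry| expiry > now)
            })
            .count()
    }
}
```

Without the `record_routing_decision` call the map would stay empty and every overlap would be zero, since no KV events arrive to fill it in this mode.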

208-208: Correct mutex initialization.

The mutex is properly initialized for synchronization purposes.

PeaBrane added 4 commits July 9, 2025 21:31

approx kv router

cea7331

correct local_block_hashes, and some approx.rs cleanups

1082da9

block on find_best_match

14eee77

use_kv_events false uses ApproxKvIndexer

b8d5b3e

pull-request-size bot added the size/L label Jul 10, 2025

copy-pr-bot bot temporarily deployed to GITLAB July 10, 2025 20:39 Inactive

github-actions bot added the feat label Jul 10, 2025

copy-pr-bot bot temporarily deployed to GITLAB July 10, 2025 20:40 Inactive

typo

5321a2a

copy-pr-bot bot temporarily deployed to GITLAB July 10, 2025 20:40 Inactive

PeaBrane requested a review from jthomson04 July 10, 2025 20:41

PeaBrane marked this pull request as ready for review July 10, 2025 20:42

copy-pr-bot bot temporarily deployed to GITLAB July 10, 2025 20:45 Inactive

coderabbitai bot reviewed Jul 10, 2025

jthomson04 reviewed Jul 10, 2025

docs/guides/dynamo_run.md (outdated, resolved)

launch/dynamo-run/src/flags.rs (outdated, resolved)

lib/llm/src/kv_router/approx.rs (resolved)

clippy

a060ef4

copy-pr-bot bot temporarily deployed to GITLAB July 10, 2025 20:52 Inactive

PeaBrane self-assigned this Jul 10, 2025

short description of approx router

4e63f6f

copy-pr-bot bot temporarily deployed to GITLAB July 10, 2025 21:14 Inactive

copy-pr-bot bot temporarily deployed to GITLAB July 10, 2025 21:15 Inactive

jthomson04 approved these changes Jul 10, 2025

tedzhouhk reviewed Jul 10, 2025

docs/guides/dynamo_run.md (outdated, resolved)

lib/llm/src/kv_router/approx.rs (resolved)

Update docs/guides/dynamo_run.md

c42dcc0

Co-authored-by: Hongkuan Zhou <[email protected]> Signed-off-by: Yan Ru Pei <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB July 10, 2025 22:07 Inactive

copy-pr-bot bot temporarily deployed to GITLAB July 10, 2025 22:10 Inactive

PeaBrane enabled auto-merge (squash) July 10, 2025 22:20

PeaBrane merged commit 13640e1 into main Jul 10, 2025
13 of 14 checks passed

PeaBrane deleted the rupei/kv-router-appro branch July 10, 2025 22:46

This was referenced Sep 5, 2025

feat: adds kv indexer metrics #2905

Merged

feat: Update docs to indicate need to use consistent hashing for KV events in backend engines #2981

Merged
