
Conversation


@kthui kthui commented May 20, 2025

Overview:

  • Add async Python binding methods for the KV Block Manager.
  • Introduce the outer dimension.
  • Introduce the Layer class.
  • Refactor the DLPack implementation.
  • Add tests for the Layer class.

Details:

N/A

Where should the reviewer start?

Start with the async Python bindings for the block manager. Then review the test cases for the Layer class, and finally the Layer class implementation.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

Summary by CodeRabbit

  • New Features

    • Introduced a Layer class with DLPack interoperability for efficient data sharing.
    • Added asynchronous methods to allocate host and device blocks.
    • Enhanced Block objects to support Python container and iterator protocols, allowing iteration, indexing, and conversion to lists of layers.
    • Updated tensor shapes to include an additional dimension for improved data representation.
  • Bug Fixes

    • Improved error handling for block allocation and configuration failures, returning Python runtime errors instead of panicking.
  • Tests

    • Added and updated tests to cover asynchronous block allocation, layer access, iteration, and data copying between host and device blocks, including support for an additional tensor dimension.
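The summary above describes the new API surface. The sketch below is a pure-Python stand-in (the real Layer/Block/BlockManager live in the Rust bindings; these mock classes only mirror the call pattern of `allocate_host_blocks`, `Block.__getitem__`, and `to_list` as described in the summary):

```python
import asyncio

# Stand-in classes sketching the API shape described above; the real
# implementations are PyO3-exposed Rust types, not these mocks.
class Layer:
    def __init__(self, index):
        self.index = index

class Block:
    """Supports the container protocol added in this PR."""
    def __init__(self, num_layers):
        self._layers = [Layer(i) for i in range(num_layers)]

    def __len__(self):
        return len(self._layers)

    def __getitem__(self, idx):
        return self._layers[idx]

    def to_list(self):
        return list(self._layers)

class BlockManager:
    async def allocate_host_blocks(self, count):
        # The real method awaits the Rust-side allocator; this just
        # fabricates blocks so the call pattern can be shown.
        return [Block(num_layers=2) for _ in range(count)]

async def main():
    manager = BlockManager()
    blocks = await manager.allocate_host_blocks(3)
    # __getitem__ alone makes Block iterable via the sequence protocol
    layer_indices = [layer.index for block in blocks for layer in block]
    print(len(blocks), layer_indices)  # → 3 [0, 1, 0, 1, 0, 1]

asyncio.run(main())
```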


copy-pr-bot bot commented May 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@kthui kthui self-assigned this May 20, 2025
@github-actions github-actions bot added the feat label May 20, 2025
@pull-request-size pull-request-size bot added size/L and removed size/S labels May 21, 2025
@kthui kthui force-pushed the jacky-kvbm-py-async branch from 3306ec8 to 867883d Compare May 21, 2025 01:42
@kthui kthui marked this pull request as ready for review May 21, 2025 01:48
@kthui kthui enabled auto-merge (squash) May 21, 2025 01:48
@kthui kthui force-pushed the jacky-kvbm-py-async branch 2 times, most recently from e04422e to 70ad4cf Compare May 28, 2025 23:12

coderabbitai bot commented May 28, 2025

Walkthrough

The changes introduce a new Python-exposed Layer class representing a block layer with DLPack support, enhance the Block class with Python container and iterator protocols, and add asynchronous allocation methods to BlockManager. DLPack interoperability is refactored for both blocks and layers. Tests and type hints are updated to reflect new tensor shapes, async APIs, and layer-level access.

Changes

File(s) and change summary:

  • lib/bindings/python/rust/llm/block_manager.rs: Refactored to use a shared Tokio runtime; improved error handling; added async methods allocate_host_blocks and allocate_device_blocks; exposed the new layer::Layer class; removed the tokio_runtime field.
  • lib/bindings/python/rust/llm/block_manager/block.rs: Refactored DLPack logic directly into Block; added a field for Python iterator state; implemented Python container and iterator protocols; simplified device info retrieval; removed the DlPackTensor struct.
  • lib/bindings/python/rust/llm/block_manager/block_list.rs: Updated to_list and __iter__ method signatures to use explicit Python GIL tokens and mutable references, simplifying GIL handling and iterator reset logic.
  • lib/bindings/python/rust/llm/block_manager/dlpack.rs: New file implementing DLPack tensor and device info interoperability for blocks and layers, including helper functions for DLPack capsule and device tuple creation.
  • lib/bindings/python/rust/llm/block_manager/layer.rs: New file defining the Layer struct with DLPack support and Python bindings, including methods for DLPack capsule and device info retrieval.
  • lib/bindings/python/src/dynamo/_core.pyi: Added the Layer class; updated Block with container/iterator protocols and to_list; added async allocation methods to BlockManager; updated docstrings for DLPack support and exception behavior.
  • lib/bindings/python/tests/test_block_manager.py: Refactored for the new tensor shape with an extra dimension; updated to use async allocation methods; added tests for layer access, iteration, and layer-level copying; revised permutation and data integrity checks; introduced a new fixture for fresh managers.

Sequence Diagram(s)

sequenceDiagram
    participant Python as Python User
    participant BlockManager as BlockManager (Rust)
    participant BlockList as BlockList
    participant Block as Block
    participant Layer as Layer

    Python->>BlockManager: await allocate_host_blocks(count)
    BlockManager->>BlockList: create BlockList of Blocks
    BlockList-->>Python: BlockList

    Python->>BlockList: __iter__()
    BlockList-->>Python: iterator

    loop over BlockList
        Python->>Block: __getitem__(index)
        Block-->>Layer: return Layer
        Python->>Layer: __dlpack__()
        Layer-->>Python: DLPack capsule
    end

Poem

In the meadow of memory blocks,
Layers now hop with DLPack socks.
Async bunnies leap to allocate,
Iterators nibble—oh, how great!
With shapes that stretch and tests anew,
This rabbit’s code just grew and grew.
🐇✨


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2a26025 and 8baca8e.

📒 Files selected for processing (1)
  • lib/bindings/python/rust/llm/block_manager.rs (7 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lib/bindings/python/rust/llm/block_manager.rs
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: Build and Test - vllm

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.


@kthui kthui force-pushed the jacky-kvbm-py-async branch from 70ad4cf to 0d03d94 Compare May 29, 2025 00:03

@kthui kthui force-pushed the jacky-kvbm-py-async branch from 0d03d94 to 1690afc Compare May 29, 2025 00:20
@kthui kthui force-pushed the jacky-kvbm-py-async branch from 1690afc to 42bfbb7 Compare May 29, 2025 00:22

@coderabbitai coderabbitai bot left a comment


Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

♻️ Duplicate comments (3)
lib/bindings/python/rust/llm/block_manager/dlpack.rs (1)

114-129: ⚠️ Potential issue

unwrap() panics and bare PyTuple still break compilation – please propagate errors and return Ok

The exact same problems flagged in the previous review are still present:

  1. unwrap() on every Python‐level call (eval, import, getattr, call1) will abort the interpreter when Python raises.
  2. The function promises PyResult<…> but returns a raw PyTuple, producing a mismatched types compilation error.
-    let dev_type_list = py.eval(c_str!("[('CPU', 1), ...)]"), None, None).unwrap();
+    // `eval` expects &str; pass a string slice and propagate errors
+    let dev_type_list = py.eval(
+        "[('CPU', 1), ('CUDA', 2), ('CPU_PINNED', 3), ('OPENCL', 4), \
+          ('VULKAN', 7), ('METAL', 8), ('VPI', 9), ('ROCM', 10)]",
+        None,
+        None,
+    )?;

-    let dev_type_enum = py
-        .import("enum")
-        .unwrap()
-        .getattr("Enum")
-        .unwrap()
-        .call1(("DLDeviceType", dev_type_list))
-        .unwrap();
+    let dev_type_enum = py
+        .import("enum")?
+        .getattr("Enum")?
+        .call1(("DLDeviceType", dev_type_list))?;

 ...
-    PyTuple::new(py, dev)
+    Ok(PyTuple::new(py, dev))

Fixing these two issues will unblock compilation and ensure Python exceptions are surfaced as PyErr rather than hard-panics.

lib/bindings/python/rust/llm/block_manager/block.rs (1)

47-49: 🛠️ Refactor suggestion

Iterator state inside Block is still racy – use a dedicated iterator object

Keeping py_itr_idx inside the shared Block means:

  • Nested loops (for l1 in block: … for l2 in block: …) corrupt each other.
  • Two async tasks iterating the same Block race on the counter.

The earlier review recommended moving the index into a separate BlockIter object; the code still uses the in-place counter. Please refactor as suggested:

-    // Python iterator state
-    py_itr_idx: usize,
+    // (removed)
 #[pymethods]
 impl Block {
-    fn __iter__(mut slf: PyRefMut<'_, Self>) -> PyResult<PyRefMut<'_, Self>> {
-        slf.py_itr_idx = 0;
-        Ok(slf)
-    }
-
-    fn __next__(&mut self) -> PyResult<layer::Layer> {
-        if self.py_itr_idx >= self.num_layers() {
-            return Err(pyo3::exceptions::PyStopIteration::new_err(()));
-        }
-        let layer = …;
-        self.py_itr_idx += 1;
-        Ok(layer)
-    }
+    fn __iter__(slf: PyRef<Self>) -> BlockIter {
+        BlockIter { block: slf.into(), idx: 0 }
+    }
 }
+
+#[pyclass]
+struct BlockIter {
+    block: Block,
+    idx: usize,
+}
+
+#[pymethods]
+impl BlockIter {
+    fn __iter__(slf: PyRef<Self>) -> PyRef<Self> { slf }
+    fn __next__(&mut self) -> PyResult<layer::Layer> {
+        if self.idx >= self.block.num_layers() {
+            return Err(pyo3::exceptions::PyStopIteration::new_err(()));
+        }
+        let layer = self.block.__getitem__(self.idx)?;
+        self.idx += 1;
+        Ok(layer)
+    }
+}

This isolates iteration state, eliminates data races, and matches Python’s iterator semantics.

Also applies to: 112-133
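The same hazard can be shown in pure Python (this is an illustration of the review suggestion, not the PR's Rust code): keeping the cursor on the container corrupts nested loops, while a dedicated iterator object keeps each loop independent.

```python
class SharedCursorBlock:
    """Anti-pattern: __iter__ resets a counter stored on the block itself."""
    def __init__(self, layers):
        self.layers = layers
        self._idx = 0

    def __iter__(self):
        self._idx = 0  # the inner loop of a nested iteration resets this too
        return self

    def __next__(self):
        if self._idx >= len(self.layers):
            raise StopIteration
        layer = self.layers[self._idx]
        self._idx += 1
        return layer

class BlockIter:
    """Dedicated iterator: one cursor per loop, as the suggested BlockIter."""
    def __init__(self, block):
        self._block = block
        self._idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._idx >= len(self._block.layers):
            raise StopIteration
        layer = self._block.layers[self._idx]
        self._idx += 1
        return layer

class Block:
    def __init__(self, layers):
        self.layers = layers

    def __iter__(self):
        return BlockIter(self)

shared = SharedCursorBlock(["a", "b"])
broken = [(x, y) for x in shared for y in shared]  # inner loop clobbers cursor
block = Block(["a", "b"])
correct = [(x, y) for x in block for y in block]
print(broken)   # → [('a', 'a'), ('a', 'b')]  — half the pairs are lost
print(correct)  # → [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
```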

lib/bindings/python/tests/test_block_manager.py (1)

382-396: Custom main() still bypasses pytest – skip markers are ignored

Invoking the test functions manually runs GPU tests unconditionally, even on CPU-only machines, negating the @pytest.mark.skipif decorators flagged earlier. Please delete the main() block and rely on pytest:

-if __name__ == "__main__":
-    asyncio.run(main())
+# Intentionally left blank – run with `pytest -q`
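The point that skip decorators only take effect under a test runner can be demonstrated with the stdlib unittest module (used here as a stand-in for pytest, whose markers behave analogously; CUDA_AVAILABLE stands in for torch.cuda.is_available()):

```python
import unittest

CUDA_AVAILABLE = False  # stand-in for torch.cuda.is_available()

class BlockTests(unittest.TestCase):
    def test_cpu_block_access(self):
        # Host-only logic: no CUDA guard needed, runs on CPU-only CI too.
        self.assertEqual(len([0] * 4), 4)

    @unittest.skipUnless(CUDA_AVAILABLE, "CUDA not available")
    def test_device_copy(self):
        self.fail("would touch the GPU")

# Only a test runner honors the skip decorator; invoking the test
# function directly (as a hand-rolled main() would) ignores it entirely.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(BlockTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, len(result.skipped))  # → 2 1
```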
🧹 Nitpick comments (4)
lib/bindings/python/rust/llm/block_manager/dlpack.rs (2)

43-56: Pinned memory is reported as generic CPU – potential interoperability mismatch

Device::CPU is returned for BlockType::Pinned, but many frameworks (PyTorch, CuPy) expose a distinct CUDA-host / pinned device (kDLCPUHost, DLDeviceType::kDLCUDAHost). Mapping everything to plain CPU may silently fall back to pageable memory and degrade performance.

If dlpack / dlpark supports CudaHost, return that instead; otherwise document the limitation clearly so downstream users are not surprised.
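The suggested mapping can be sketched in Python terms. The numeric codes come from the DLPack DLDeviceType enum (kDLCPU = 1, kDLCUDA = 2, kDLCUDAHost = 3); the block-type names are assumptions mirroring the Rust enum discussed above:

```python
# DLPack device-type codes (from the DLDeviceType enum).
DL_DEVICE_TYPE = {"CPU": 1, "CUDA": 2, "CUDA_HOST": 3}

def dlpack_device(block_type: str, device_id: int = 0) -> tuple:
    """Map a block's storage kind to a DLPack (device_type, device_id) pair.

    The block_type values here are hypothetical stand-ins for the
    Rust-side BlockType variants.
    """
    if block_type == "Pinned":
        # Pinned host memory should advertise kDLCUDAHost, not plain CPU,
        # so consumers like PyTorch/CuPy recognize it as page-locked.
        return (DL_DEVICE_TYPE["CUDA_HOST"], device_id)
    if block_type == "Device":
        return (DL_DEVICE_TYPE["CUDA"], device_id)
    return (DL_DEVICE_TYPE["CPU"], 0)

print(dlpack_device("Pinned"))  # → (3, 0)
```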


58-76: Enum mapping lacks exhaustiveness checks – future dtypes may trigger UB

The match is non-exhaustive with respect to the upstream enum but relies on the compiler to detect missing variants. Add a wildcard arm returning a sensible default (unreachable!() or DataType::F32) or derive #[non_exhaustive] on the upstream enum to make future additions compile-time errors.

             dynamo_llm::common::dtype::DType::I64 => DataType::I64,
+            // Prevent future UB if new variants are added
+            _ => unreachable!("Unhandled DType variant in DlPack mapping"),
lib/bindings/python/tests/test_block_manager.py (2)

108-136: CPU-only tests are unnecessarily skipped when CUDA is absent

test_cpu_block_access and test_block_list_iteration operate solely on host memory but are wrapped in @skipif(not torch.cuda.is_available()). This prevents valuable coverage on CPU-only CI jobs.

Remove the CUDA guard for purely host tests to keep them runnable everywhere.


38-55: new_block_manager() creates device blocks by default – tests that only need host blocks pay the cost

Most CPU-side tests don’t use GPU allocations, yet new_block_manager() always requests DEVICE_NUM_BLOCKS. Consider adding parameters or a second fixture that provisions only the resources required for the test, reducing startup time and VRAM usage.

🛑 Comments failed to post (1)
lib/bindings/python/rust/llm/block_manager/block.rs (1)

166-195: ⚠️ Potential issue

Unsafe pointer may dangle after releasing the lock

ptr is captured while the mutex is held but used after the lock is released when the DLPack capsule is consumed in Python. If another thread resizes / frees the block before the consumer finishes, the pointer may dangle.

Consider:

  • Keeping the mutex guard alive inside DlPackTensor (store the MutexGuard) so memory cannot be mutated while a DLPack view exists, or
  • Exposing a read-only view and document that the block must outlive all exported tensors.

Without this, users can trigger use-after-free through concurrency.

🤖 Prompt for AI Agents
In lib/bindings/python/rust/llm/block_manager/block.rs around lines 166 to 195,
the unsafe pointer `ptr` is obtained while holding a mutex lock but used after
the lock is released, risking a dangling pointer if the block is mutated
concurrently. To fix this, modify the code to keep the mutex guard alive for the
lifetime of the DlPackTensor by storing the MutexGuard inside it, ensuring the
memory cannot be mutated or freed while the DLPack view exists. Alternatively,
expose a read-only view and clearly document that the block must outlive all
exported tensors to prevent use-after-free errors.
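The guard-lifetime fix has a direct Python-level analogue: CPython refuses to resize a buffer while an exported memoryview exists, which is the same invariant "keep the lock alive while a DLPack view is out" would enforce on the Rust side.

```python
buf = bytearray(b"block-data")
view = memoryview(buf)  # comparable to a DLPack capsule over block memory

try:
    buf.extend(b"!")    # mutating the backing storage while the view lives
    blocked = False
except BufferError:     # CPython blocks the resize instead of dangling
    blocked = True

view.release()          # drop the view (the "guard"); mutation now succeeds
buf.extend(b"!")
print(blocked, bytes(buf))  # → True b'block-data!'
```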

@kthui kthui changed the title feat: Add async Python bindings to KVBM feat: KVBM async Python bindings and Layer class May 29, 2025
@kthui kthui requested a review from PeaBrane as a code owner May 29, 2025 17:08
@kthui kthui disabled auto-merge May 29, 2025 17:08
@kthui kthui enabled auto-merge (squash) May 29, 2025 17:15
@kthui kthui merged commit 7677f74 into main May 29, 2025
9 checks passed
@kthui kthui deleted the jacky-kvbm-py-async branch May 29, 2025 17:49
@coderabbitai coderabbitai bot mentioned this pull request Aug 2, 2025