feat: KVBM async Python bindings and Layer class #1141
Conversation
Walkthrough

The changes introduce a new Python-exposed `Layer` class with DLPack interoperability, async block-allocation bindings on the `BlockManager`, and container/iterator protocol support on `Block` and `BlockList` objects.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Python as Python User
    participant BlockManager as BlockManager (Rust)
    participant BlockList as BlockList
    participant Block as Block
    participant Layer as Layer
    Python->>BlockManager: await allocate_host_blocks(count)
    BlockManager->>BlockList: create BlockList of Blocks
    BlockList-->>Python: BlockList
    Python->>BlockList: __iter__()
    BlockList-->>Python: iterator
    loop over BlockList
        Python->>Block: __getitem__(index)
        Block-->>Python: return Layer
        Python->>Layer: __dlpack__()
        Layer-->>Python: DLPack capsule
    end
```
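For orientation, a hypothetical end-to-end usage of this flow from Python might look like the sketch below. Only `allocate_host_blocks`, `Block.__getitem__`, and `Layer.__dlpack__` appear in the diagram above; the `BlockManager` construction and the surrounding scaffolding are assumptions, not the bindings' confirmed API.

```python
import asyncio

import torch  # any DLPack-aware framework works as the consumer


async def main():
    bm = BlockManager(...)  # hypothetical constructor/configuration

    # Async allocation of host (CPU) blocks, as in the sequence diagram.
    blocks = await bm.allocate_host_blocks(2)

    for block in blocks:        # BlockList implements __iter__
        layer = block[0]        # Block.__getitem__ returns a Layer
        # Layer implements __dlpack__, so DLPack consumers can wrap it
        # as a zero-copy tensor view.
        t = torch.from_dlpack(layer)
        print(t.shape, t.dtype)


asyncio.run(main())
```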
Caution
Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
♻️ Duplicate comments (3)

lib/bindings/python/rust/llm/block_manager/dlpack.rs (1)

114-129: ⚠️ Potential issue

`unwrap()` panics and bare `PyTuple` still break compilation – please propagate errors and return `Ok`

The exact same problems flagged in the previous review are still present:

- `unwrap()` on every Python-level call (`eval`, `import`, `getattr`, `call1`) will abort the interpreter when Python raises.
- The function promises `PyResult<…>` but returns a raw `PyTuple`, producing a mismatched-types compilation error.

```diff
-let dev_type_list = py.eval(c_str!("[('CPU', 1), ...)]"), None, None).unwrap();
+// `eval` expects &str; pass a string slice and propagate errors
+let dev_type_list = py.eval(
+    "[('CPU', 1), ('CUDA', 2), ('CPU_PINNED', 3), ('OPENCL', 4), \
+     ('VULKAN', 7), ('METAL', 8), ('VPI', 9), ('ROCM', 10)]",
+    None,
+    None,
+)?;

-let dev_type_enum = py
-    .import("enum")
-    .unwrap()
-    .getattr("Enum")
-    .unwrap()
-    .call1(("DLDeviceType", dev_type_list))
-    .unwrap();
+let dev_type_enum = py
+    .import("enum")?
+    .getattr("Enum")?
+    .call1(("DLDeviceType", dev_type_list))?;

 ...

-PyTuple::new(py, dev)
+Ok(PyTuple::new(py, dev))
```

Fixing these two issues will unblock compilation and ensure Python exceptions are surfaced as `PyErr` rather than hard panics.

lib/bindings/python/rust/llm/block_manager/block.rs (1)
47-49: 🛠️ Refactor suggestion

Iterator state inside `Block` is still racy – use a dedicated iterator object

Keeping `py_itr_idx` inside the shared `Block` means:

- Nested loops (`for l1 in block: … for l2 in block: …`) corrupt each other.
- Two async tasks iterating the same `Block` race on the counter.

The earlier review recommended moving the index into a separate `BlockIter` object; the code still uses the in-place counter. Please refactor as suggested:

```diff
-// Python iterator state
-py_itr_idx: usize,
+// (removed)
```

```diff
 #[pymethods]
 impl Block {
-    fn __iter__(mut slf: PyRefMut<'_, Self>) -> PyResult<PyRefMut<'_, Self>> {
-        slf.py_itr_idx = 0;
-        Ok(slf)
-    }
-
-    fn __next__(&mut self) -> PyResult<layer::Layer> {
-        if self.py_itr_idx >= self.num_layers() {
-            return Err(pyo3::exceptions::PyStopIteration::new_err(()));
-        }
-        let layer = …;
-        self.py_itr_idx += 1;
-        Ok(layer)
-    }
+    fn __iter__(slf: PyRef<Self>) -> BlockIter {
+        BlockIter { block: slf.into(), idx: 0 }
+    }
 }
+
+#[pyclass]
+struct BlockIter {
+    block: Block,
+    idx: usize,
+}
+
+#[pymethods]
+impl BlockIter {
+    fn __iter__(slf: PyRef<Self>) -> PyRef<Self> { slf }
+
+    fn __next__(&mut self) -> PyResult<layer::Layer> {
+        if self.idx >= self.block.num_layers() {
+            return Err(pyo3::exceptions::PyStopIteration::new_err(()));
+        }
+        let layer = self.block.__getitem__(self.idx)?;
+        self.idx += 1;
+        Ok(layer)
+    }
+}
```

This isolates iteration state, eliminates data races, and matches Python's iterator semantics.

Also applies to: 112-133
lib/bindings/python/tests/test_block_manager.py (1)

382-396: Custom `main()` still bypasses `pytest` – skip markers are ignored

Invoking the test functions manually runs GPU tests unconditionally, even on CPU-only machines, negating the `@pytest.mark.skipif` decorators flagged earlier. Please delete the `main()` block and rely on `pytest`:

```diff
-if __name__ == "__main__":
-    asyncio.run(main())
+# Intentionally left blank – run with `pytest -q`
```
🧹 Nitpick comments (4)

lib/bindings/python/rust/llm/block_manager/dlpack.rs (2)

43-56: Pinned memory is reported as generic CPU – potential interoperability mismatch

`Device::CPU` is returned for `BlockType::Pinned`, but many frameworks (PyTorch, CuPy) expose a distinct CUDA-host / pinned device (`DLDeviceType::kDLCUDAHost`). Mapping everything to plain CPU may silently fall back to pageable memory and degrade performance.

If dlpack / dlpark supports `CudaHost`, return that instead; otherwise document the limitation clearly so downstream users are not surprised.
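For context, the sketch below shows how this mapping surfaces to a consumer through the standard DLPack protocol. It assumes `Layer` also implements `__dlpack_device__`, which the protocol requires alongside `__dlpack__`; the numeric codes come from the DLPack specification.

```python
# DLPack device-type codes from the spec:
#   kDLCPU = 1, kDLCUDA = 2, kDLCUDAHost = 3 (pinned host memory)
KDL_CPU, KDL_CUDA, KDL_CUDA_HOST = 1, 2, 3

dev_type, dev_id = layer.__dlpack_device__()
if dev_type == KDL_CPU:
    # With the current mapping, pinned blocks also land here, so a
    # framework cannot distinguish pinned from pageable memory and may
    # skip pinned-memory fast paths (e.g. async host-to-device copies).
    pass
elif dev_type == KDL_CUDA_HOST:
    # What a consumer would ideally observe for BlockType::Pinned.
    pass
```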
58-76: Enum mapping lacks exhaustiveness checks – future dtypes may trigger UB

The `match` is non-exhaustive with respect to the upstream enum and relies on the compiler to detect missing variants. Add a wildcard arm returning a sensible default (`unreachable!()` or `DataType::F32`), or mark the upstream enum `#[non_exhaustive]` to make future additions compile-time errors:

```diff
 dynamo_llm::common::dtype::DType::I64 => DataType::I64,
+// Prevent future UB if new variants are added
+_ => unreachable!("Unhandled DType variant in DlPack mapping"),
```

lib/bindings/python/tests/test_block_manager.py (2)
108-136: CPU-only tests are unnecessarily skipped when CUDA is absent

`test_cpu_block_access` and `test_block_list_iteration` operate solely on host memory but are wrapped in `@skipif(not torch.cuda.is_available())`. This prevents valuable coverage on CPU-only CI jobs.

Remove the CUDA guard for purely host tests to keep them runnable everywhere, e.g. as sketched below.
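One way to structure this is to reserve the skip marker for tests that genuinely touch device memory. The sketch below assumes a `block_manager` fixture and `pytest-asyncio`, neither of which is confirmed by the diff.

```python
import pytest
import torch

# Gate only the tests that actually need a GPU.
requires_cuda = pytest.mark.skipif(
    not torch.cuda.is_available(), reason="CUDA device required"
)


@pytest.mark.asyncio
async def test_cpu_block_access(block_manager):
    # Host-only path: runs on CPU-only CI as well.
    blocks = await block_manager.allocate_host_blocks(1)
    assert len(blocks) == 1


@requires_cuda
@pytest.mark.asyncio
async def test_device_block_access(block_manager):
    ...  # device path, skipped automatically when no GPU is present
```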
38-55: `new_block_manager()` creates device blocks by default – tests that only need host blocks pay the cost

Most CPU-side tests don't use GPU allocations, yet `new_block_manager()` always requests `DEVICE_NUM_BLOCKS`. Consider adding parameters, or a second fixture that provisions only the resources required for each test, reducing startup time and VRAM usage; one possible shape is sketched below.
🛑 Comments failed to post (1)

lib/bindings/python/rust/llm/block_manager/block.rs (1)

166-195: ⚠️ Potential issue

Unsafe pointer may dangle after releasing the lock

`ptr` is captured while the mutex is held but used after the lock is released, when the DLPack capsule is consumed in Python. If another thread resizes or frees the block before the consumer finishes, the pointer may dangle.

Consider:

- Keeping the mutex guard alive inside `DlPackTensor` (store the `MutexGuard`) so memory cannot be mutated while a DLPack view exists, or
- Exposing a read-only view and documenting that the block must outlive all exported tensors.

Without this, users can trigger use-after-free through concurrency.
🤖 Prompt for AI Agents
In lib/bindings/python/rust/llm/block_manager/block.rs around lines 166 to 195, the unsafe pointer `ptr` is obtained while holding a mutex lock but used after the lock is released, risking a dangling pointer if the block is mutated concurrently. To fix this, modify the code to keep the mutex guard alive for the lifetime of the DlPackTensor by storing the MutexGuard inside it, ensuring the memory cannot be mutated or freed while the DLPack view exists. Alternatively, expose a read-only view and clearly document that the block must outlive all exported tensors to prevent use-after-free errors.
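Until one of those fixes lands, the safe pattern on the Python side is to keep the `Block` alive for as long as any exported tensor exists. The sketch below illustrates the intended discipline; it is not an enforced guarantee.

```python
import torch

block = blocks[0]
view = torch.from_dlpack(block[0])  # zero-copy view into the block's memory

# ... use `view` ...

# Drop the view before releasing the block; freeing or resizing the block
# while `view` is alive is exactly the use-after-free described above.
del view
del block
```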
Overview:
Details:
N/A
Where should the reviewer start?
Start with the async Python bindings for the block manager. Then review the test case for the `Layer` class, and finally the `Layer` class implementation itself.
Related Issues:
N/A
Summary by CodeRabbit
New Features
- New `Layer` class with DLPack interoperability for efficient data sharing.
- `Block` objects now support Python container and iterator protocols, allowing iteration, indexing, and conversion to lists of layers.

Bug Fixes
Tests