feat: Support Dynamo KVBM with TRTLLM Disagg #3527
Walkthrough: This pull request adds KV cache connector support to the TensorRT-LLM engine integration.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ❌ Failed checks (2 warnings) | ✅ Passed checks (1 passed)
Actionable comments posted: 1
🧹 Nitpick comments (5)
lib/bindings/kvbm/src/block_manager/vllm/connector/trtllm_leader.rs (1)
328-338: Per-request `scheduled_tokens` wiring looks correct; consider tightening missing-key handling

Using `scheduler_output.num_scheduled_tokens.get(request_id).unwrap_or(&0)` for both new and cached requests aligns with the Python-side dict keyed by stringified `request_id`, and passing `scheduled_tokens` into `slot.apply_scheduler_output` matches the new slot API.

One thing to consider: silently defaulting to `0` if the key is missing will mask any mismatch between the Python and Rust schedulers. A lightweight guard like a `debug_assert!(scheduler_output.num_scheduled_tokens.contains_key(request_id))` (or at least a debug log) would make integration issues much easier to diagnose while keeping release behavior unchanged.

Also applies to: 364-374
components/src/dynamo/trtllm/main.py (1)
32-32: Verify `connector_module` path and `kv_connector_config` wiring semantics

The conditional construction of `KvCacheConnectorConfig` from `config.connector` and passing `kv_connector_config` through `arg_map` is a clean way to make connector support optional and fail fast on invalid values.

Two details worth double-checking:

1. Module path consistency: `build_kv_connector_config` uses `connector_module="dynamo.llm.trtllm_integration.connector"`, while the existing deterministic tests for aggregated mode reference `"kvbm.trtllm_integration.connector"` as the module. Please confirm that `dynamo.llm.trtllm_integration.connector` actually exposes `DynamoKVBMConnectorLeader` / `DynamoKVBMConnectorWorker`, or consider aligning this string with the tested path to avoid runtime import errors.
2. `None` semantics for `kv_connector_config`: when `config.connector` is unset, the key is still present in `arg_map` with a value of `None`. That should be acceptable if `kv_connector_config` is an optional field in the underlying LLM args, but if tensorrt-llm distinguishes between "key absent" and "key present with `None`", it may be slightly safer to omit the key entirely in that case (a sketch follows below).

Neither point blocks the overall design, but they're worth verifying against the tensorrt-llm API and the existing kvbm integration tests.

Also applies to: 106-117, 192-193, 210-210
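On the second point, a minimal sketch of the omit-when-unset approach, assuming `arg_map` is a plain dict; `build_kv_connector_config` is the PR's helper, and the surrounding structure is illustrative, not the actual code:

```python
def build_arg_map(config):
    """Illustrative fragment; the real arg_map carries many more engine options."""
    arg_map = {}
    # Only attach kv_connector_config when a connector was actually requested,
    # so tensorrt-llm never sees an explicit None for the key.
    if config.connector:
        arg_map["kv_connector_config"] = build_kv_connector_config(config)
    return arg_map
```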
tests/kvbm_integration/test_determinism_disagg.py (3)
173-181: Use of fixed `/tmp/...yaml` paths is fine for tests but can be made more robust

The prefill/decode configs are written to fixed filenames under `/tmp`:

```python
prefill_config_path = os.environ.get(..., "/tmp/kvbm_llm_api_prefill_config.yaml")
decode_config_path = os.environ.get(..., "/tmp/kvbm_llm_api_decode_config.yaml")
...
yaml.dump(..., default_flow_style=False, sort_keys=False)
```

For CI and single-test runs this is usually acceptable, but it does open you up to path collisions if multiple test processes run concurrently on the same host.

If this becomes an issue, you could switch to `tempfile.NamedTemporaryFile` or include something like the PID or port in the default filenames; otherwise, this is probably fine as-is for an internal integration test.

Also applies to: 244-247
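If collisions ever become a problem, a sketch of the `tempfile`-based alternative might look like this; the helper name is illustrative, and the env-var fallback behavior is kept from the test:

```python
import os
import tempfile

def _unique_config_path(env_var: str, prefix: str) -> str:
    """Illustrative helper: prefer the env var override, else a unique temp file."""
    path = os.environ.get(env_var)
    if path:
        return path
    # delete=False keeps the file on disk so the launched server can read it;
    # the unique name avoids collisions between concurrent test processes.
    with tempfile.NamedTemporaryFile(
        mode="w", prefix=prefix, suffix=".yaml", delete=False
    ) as f:
        return f.name

prefill_config_path = _unique_config_path(
    "KVBM_TRTLLM_LLMAPI_PREFILL_CONFIG_PATH", "kvbm_llm_api_prefill_config_"
)
```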
424-456: Improved health logging is helpful; consider tightening control flow slightly

The extra logging around the health check and model endpoint status, plus the explicit `requests.exceptions.RequestException` handling, should make TRTLLM startup issues much easier to diagnose.

If you want to mirror static-analysis expectations, you could move `return response.status_code == 200` into an `else` block under the second `if response.status_code != 200` for slightly clearer control flow, but functionally the current structure is sound.
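For illustration, the suggested shape might look roughly like this; the endpoint, timeout, and logging details are assumptions, since only the status-code handling is visible in the diff:

```python
import requests

def model_endpoint_ready(url: str) -> bool:
    """Illustrative sketch of the suggested control flow; not the actual test code."""
    try:
        response = requests.get(url, timeout=5)
    except requests.exceptions.RequestException as exc:
        print(f"Health check request failed: {exc}")
        return False
    if response.status_code != 200:
        print(f"Model endpoint not ready: HTTP {response.status_code}")
        return False
    else:
        # Happy-path return lives in the else block, per the static-analysis hint.
        return response.status_code == 200
```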
507-512: Server-type selection now supports TRTLLM; skip message could be updated

Falling back to `ServerType.trtllm` when `vllm` is absent but `tensorrt_llm` is present is a sensible way to reuse the same test harness.

The only minor nit is the skip message:

```python
else:
    pytest.skip("vllm module is not available in the current environment.")
```

If neither backend is present, this message is slightly misleading now that TRTLLM is also a valid option; you may want to reword it to mention both backends, or say something like "No supported LLM backend (vllm or tensorrt_llm) is available".
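A sketch of the reworded selection; probing via `importlib` and the `ServerType.vllm` member name are assumptions, since the real fixture's detection logic is not shown here:

```python
import importlib.util
import pytest

from tests.kvbm_integration.common import ServerType  # import path assumed

def pick_server_type():
    """Illustrative backend selection with a skip message covering both backends."""
    if importlib.util.find_spec("vllm") is not None:
        return ServerType.vllm
    elif importlib.util.find_spec("tensorrt_llm") is not None:
        return ServerType.trtllm
    else:
        pytest.skip(
            "No supported LLM backend (vllm or tensorrt_llm) is available "
            "in the current environment."
        )
```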
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- components/src/dynamo/trtllm/main.py (4 hunks)
- components/src/dynamo/trtllm/utils/trtllm_utils.py (3 hunks)
- lib/bindings/kvbm/python/kvbm/trtllm_integration/connector/kvbm_connector_leader.py (1 hunks)
- lib/bindings/kvbm/src/block_manager/vllm/connector/leader/slot.rs (0 hunks)
- lib/bindings/kvbm/src/block_manager/vllm/connector/trtllm_leader.rs (2 hunks)
- tests/kvbm_integration/test_determinism_disagg.py (6 hunks)
💤 Files with no reviewable changes (1)
- lib/bindings/kvbm/src/block_manager/vllm/connector/leader/slot.rs
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/distributed/transfer.rs:6-6
Timestamp: 2025-09-18T21:47:44.143Z
Learning: For PR ai-dynamo/dynamo#2989, the ConnectorTransferBatcher architectural issues will be addressed in a follow-up PR by removing the duplicate batching logic and integrating distributed transfers with the existing TransferBatcher + LocalTransferManager pipeline, rather than adding bounded concurrency primitives like Semaphore.
🧬 Code graph analysis (4)
tests/kvbm_integration/test_determinism_disagg.py (2)
- tests/kvbm_integration/common.py (1)
  - ServerType (133-135)
- tests/kvbm_integration/test_determinism_agg.py (1)
  - _set_up_trtllm_config (122-164)
lib/bindings/kvbm/src/block_manager/vllm/connector/trtllm_leader.rs (1)
- lib/bindings/kvbm/src/block_manager/vllm/connector/leader/slot.rs (2)
  - request_id (95-95)
  - request_id (460-462)
lib/bindings/kvbm/python/kvbm/trtllm_integration/connector/kvbm_connector_leader.py (2)
- lib/bindings/kvbm/src/block_manager/vllm/connector.rs (1)
  - add_num_scheduled_tokens (79-82)
- lib/bindings/kvbm/src/block_manager/vllm/connector/leader/slot.rs (2)
  - request_id (95-95)
  - request_id (460-462)
components/src/dynamo/trtllm/main.py (2)
components/src/dynamo/trtllm/utils/trtllm_utils.py (1)
Config(29-97)components/src/dynamo/vllm/args.py (1)
Config(30-78)
🪛 Ruff (0.14.5)
tests/kvbm_integration/test_determinism_disagg.py
175-175: Probable insecure usage of temporary file or directory: "/tmp/kvbm_llm_api_prefill_config.yaml"
(S108)
180-180: Probable insecure usage of temporary file or directory: "/tmp/kvbm_llm_api_decode_config.yaml"
(S108)
228-235: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
237-242: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
452-452: Consider moving this statement to an else block
(TRY300)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: operator (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (arm64)
- GitHub Check: trtllm (arm64)
- GitHub Check: trtllm (amd64)
- GitHub Check: sglang (arm64)
- GitHub Check: Mirror Repository to GitLab
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: tests (lib/runtime/examples)
- GitHub Check: tests (.)
- GitHub Check: clippy (.)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
lib/bindings/kvbm/python/kvbm/trtllm_integration/connector/kvbm_connector_leader.py (1)
76-82: `num_scheduled_tokens` export is consistent with the Rust API

The dict comprehension that merges `scheduler_output.new_requests` and `scheduler_output.cached_requests` into a `{request_id: num_scheduled_tokens}` mapping is exactly what `RustSchedulerOutput.add_num_scheduled_tokens` expects, and the key format (`str(req.request_id)`) matches what the Rust side uses for lookups.

Just make sure that every entry in both `new_requests` and `cached_requests` always has `num_scheduled_tokens` populated; otherwise the Rust side will quietly treat the request as having `0` scheduled tokens due to the `unwrap_or(&0)` default.
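For reference, a sketch of the shape being described; the field and class names are taken from the review text above, and the assertion is an illustrative guard, not part of the PR:

```python
def export_num_scheduled_tokens(scheduler_output, rust_scheduler_output):
    """Illustrative: mirrors the comprehension described above, plus a guard."""
    # Merge new and cached requests into {request_id: num_scheduled_tokens},
    # keyed by the stringified request_id that the Rust lookup expects.
    num_scheduled_tokens = {
        str(req.request_id): req.num_scheduled_tokens
        for req in (*scheduler_output.new_requests, *scheduler_output.cached_requests)
    }
    # Illustrative guard: fail loudly here instead of letting the Rust side's
    # unwrap_or(&0) silently treat a missing value as zero scheduled tokens.
    assert all(v is not None for v in num_scheduled_tokens.values())
    rust_scheduler_output.add_num_scheduled_tokens(num_scheduled_tokens)
```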
components/src/dynamo/trtllm/utils/trtllm_utils.py (1)

63-63: Connector option plumbing is straightforward and consistent

Adding `connector` to `Config`, exposing it via `--connector` (with explicit choices), and wiring `config.connector = args.connector` keeps the configuration path simple and type-safe. This should compose cleanly with the new `build_kv_connector_config` helper in `main.py` and is easy to extend if more connectors are added later.

Also applies to: 279-285, 368-368
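A minimal sketch of that pattern; only the `--connector` flag name and the `kvbm` choice come from the PR, while the stand-in `Config` class is illustrative:

```python
import argparse

class Config:
    """Stand-in for the real Config in trtllm_utils (which has many more fields)."""

    connector = None

parser = argparse.ArgumentParser()
# Explicit choices make an invalid --connector value fail at parse time
# instead of surfacing later as a confusing runtime import error.
parser.add_argument("--connector", choices=["kvbm"], default=None)

config = Config()
config.connector = parser.parse_args(["--connector", "kvbm"]).connector
print(config.connector)  # -> "kvbm"
```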
```python
def _set_up_trtllm_config(self, gpu_cache_blocks):
    # Mostly the same parameters here as in the
    prefill_config_path = os.environ.get(
        "KVBM_TRTLLM_LLMAPI_PREFILL_CONFIG_PATH",
        "/tmp/kvbm_llm_api_prefill_config.yaml",
    )

    decode_config_path = os.environ.get(
        "KVBM_TRTLLM_LLMAPI_DECODE_CONFIG_PATH",
        "/tmp/kvbm_llm_api_decode_config.yaml",
    )

    llm_api_config: Dict[str, Any] = {}
    llm_api_config["kv_cache_config"] = {
        "enable_partial_reuse": False,
        "free_gpu_memory_fraction": 0.10,
        "tokens_per_block": 16,
    }

    # GPU blocks override
    if gpu_cache_blocks is not None:
        del llm_api_config["kv_cache_config"]["free_gpu_memory_fraction"]
        llm_api_config["kv_cache_config"]["max_tokens"] = (
            int(gpu_cache_blocks) * 32
        )  # TRTLLM defaults 32 tokens per block

    prefill_config = deepcopy(llm_api_config)
    prefill_config["disable_overlap_scheduler"] = True
    prefill_config["cache_transceiver_config"] = {
        "backend": "DEFAULT",
        "max_tokens_in_buffer": 16384,
    }
    prefill_config["cuda_graph_config"] = None

    decode_config = deepcopy(llm_api_config)
    decode_config["disable_overlap_scheduler"] = False
    decode_config["cache_transceiver_config"] = {
        "backend": "DEFAULT",
        "max_tokens_in_buffer": 65536,
    }

    model = os.environ.get(
        "KVBM_MODEL_ID", "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
    )

    cmd_root = [
        "python3",
        "-m",
        "dynamo.trtllm",
        "--model",
        model,
        "--kv-block-size",
        "16",
        "--max-num-tokens",
        "8000",
    ]

    self.prefiller_cmd = cmd_root + [
        "--extra-engine-args",
        prefill_config_path,
        "--disaggregation-mode",
        "prefill",
        "--connector",
        "kvbm",
    ]

    self.decoder_cmd = cmd_root + [
        "--extra-engine-args",
        decode_config_path,
        "--disaggregation-mode",
        "decode",
    ]

    with open(prefill_config_path, "w") as f:
        yaml.dump(prefill_config, f, default_flow_style=False, sort_keys=False)
    with open(decode_config_path, "w") as f:
        yaml.dump(decode_config, f, default_flow_style=False, sort_keys=False)
```
🧩 Analysis chain
GPU cache blocks override likely doubles effective blocks due to inconsistent multiplier
In `_set_up_trtllm_config`, you configure:

```python
llm_api_config["kv_cache_config"] = {
    "enable_partial_reuse": False,
    "free_gpu_memory_fraction": 0.10,
    "tokens_per_block": 16,
}
...
if gpu_cache_blocks is not None:
    del llm_api_config["kv_cache_config"]["free_gpu_memory_fraction"]
    llm_api_config["kv_cache_config"]["max_tokens"] = int(gpu_cache_blocks) * 32  # TRTLLM defaults 32 tokens per block
```

With `tokens_per_block` explicitly set to 16, this `* 32` multiplier makes the effective number of blocks `max_tokens / tokens_per_block = 2 * gpu_cache_blocks`, which doesn't match the name of the override or the comment (that comment matches the aggregated-mode helper, where `tokens_per_block` stayed at the TRTLLM default of 32).
To keep the override semantics intuitive, consider deriving max_tokens from the configured tokens_per_block, e.g.:
```diff
- llm_api_config["kv_cache_config"]["max_tokens"] = (
-     int(gpu_cache_blocks) * 32
- )  # TRTLLM defaults 32 tokens per block
+ tokens_per_block = llm_api_config["kv_cache_config"]["tokens_per_block"]
+ llm_api_config["kv_cache_config"]["max_tokens"] = int(gpu_cache_blocks) * tokens_per_block
```

This keeps the block count aligned with `gpu_cache_blocks` regardless of what `tokens_per_block` you choose for the test.
I'll verify the review comment by examining the actual code and its context.
🏁 Script executed:
```bash
#!/bin/bash
# Locate and examine the test file to verify the GPU cache blocks override logic
find . -name "test_determinism_disagg.py" -type f | head -1 | xargs -I {} cat {}
```

Length of output: 20913
🏁 Script executed:
```bash
#!/bin/bash
# Search for other usages of tokens_per_block to understand the intended semantics
rg -n "tokens_per_block" --type py -C 3
```

Length of output: 2376
🏁 Script executed:
```bash
#!/bin/bash
# Search for other TRTLLM config patterns or gpu_cache_blocks references
rg -n "gpu_cache_blocks|max_tokens" --type py -C 2
```

Length of output: 68469
Fix `max_tokens` multiplier to use the configured `tokens_per_block` instead of hard-coded 32

In `_set_up_trtllm_config` (lines 171–195), the code explicitly sets `tokens_per_block` to 16 but then multiplies `gpu_cache_blocks` by 32 when computing `max_tokens`. This creates an effective block count of 2× the parameter value:

```python
llm_api_config["kv_cache_config"]["tokens_per_block"] = 16
...
llm_api_config["kv_cache_config"]["max_tokens"] = int(gpu_cache_blocks) * 32
# → effective blocks = (gpu_cache_blocks * 32) / 16 = 2 * gpu_cache_blocks
```

The comment references "TRTLLM defaults 32 tokens per block," but that default is overridden here. Use the configured value instead:
```diff
-llm_api_config["kv_cache_config"]["max_tokens"] = (
-    int(gpu_cache_blocks) * 32
-) # TRTLLM defaults 32 tokens per block
+tokens_per_block = llm_api_config["kv_cache_config"]["tokens_per_block"]
+llm_api_config["kv_cache_config"]["max_tokens"] = int(gpu_cache_blocks) * tokens_per_block
```

This ensures the override semantics remain intuitive and the block count stays aligned with the `gpu_cache_blocks` parameter.
🤖 Prompt for AI Agents
In tests/kvbm_integration/test_determinism_disagg.py around lines 171-247, the
code sets kv_cache_config["tokens_per_block"]=16 but then computes max_tokens
using a hard-coded 32 multiplier, effectively doubling the intended blocks;
change the computation to use the configured tokens_per_block value (read
tokens_per_block from kv_cache_config or a local variable) when calculating
max_tokens so max_tokens = int(gpu_cache_blocks) * tokens_per_block (and remove
the misleading comment about TRTLLM default or update it to reflect using the
configured value).
please update https://github.com/ai-dynamo/dynamo/blob/main/docs/kvbm/trtllm-setup.md to reflect disagg is now supported
```rust
        num_scheduled_tokens: usize,
    ) -> Result<(), SlotError>;

    // TRT-LLM does not include scheduled tokens in the scheduler output.
```
so trtllm 1.2.0 includes scheduled tokens now?
Yes, 1.2.0rc2 supports it
we can run connector with trtllm starting 1.1.0rc5, if we remove this part, what happens with 1.1.0rc5?
1.1.0rc5 would break with this MR. We could (in theory) detect the TRTLLM version and fall back to the non-scheduled-tokens implementation, but that could be super ugly.
But the scheduled-tokens output in rc2 (as well as the scheduled-tokens handling on the KVBM-side) is required for Dynamo + kvbm to work.
we'd need to handle it in some way, even if it's simple detect trtllm version -> fail if incompatible
we'd need to also adjust docs, where 1.1.0rc5 is mentioned as supported
Updated docs, and added a little check that the num_scheduled_tokens field exists; throws an error if it doesn't.
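For context, the guard described here might look roughly like this on the Python side; the attribute location and message wording are assumptions, so see the PR for the actual check:

```python
def require_scheduled_tokens_support(scheduler_output) -> None:
    """Illustrative: fail fast on TRT-LLM builds that predate num_scheduled_tokens."""
    for req in scheduler_output.new_requests:
        if not hasattr(req, "num_scheduled_tokens"):
            raise RuntimeError(
                "KVBM requires TensorRT-LLM >= 1.2.0rc2: scheduler output "
                "entries must expose num_scheduled_tokens (older releases "
                "such as 1.1.0rc5 are not supported)."
            )
```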
@oandreeva-nv Can you please re-review?
Summary by CodeRabbit
New Features
- `--connector` command-line argument for connector selection

Tests