Merge remote changes: resolve conflicts in TensorRT-LLM README and distributed runtime docs
athreesh committed Jul 29, 2025
commit ba8bf2f14271141fa0efb34f44dd82398a6522c2
16 changes: 16 additions & 0 deletions components/backends/trtllm/README.md
@@ -202,6 +202,22 @@ DISAGGREGATION_STRATEGY="prefill_first" ./launch/disagg.sh

Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](./kv-cache-tranfer.md).

## Request Migration

In a [Distributed System](#distributed-system), a request may fail due to connectivity issues between the Frontend and the Backend.

The Frontend automatically tracks which Backends have connectivity issues with it and avoids routing new requests to those Backends.
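
The tracking behavior above can be pictured as a filter applied before routing. The following is a minimal Python sketch under stated assumptions: `BackendTracker`, `mark_unreachable`, and `pick` are hypothetical names, not part of Dynamo's API, and random selection stands in for whatever routing policy the Frontend actually uses.

```python
import random


class BackendTracker:
    """Hypothetical tracker: remembers Backends with known connectivity issues."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.unreachable = set()

    def mark_unreachable(self, backend):
        # Record a connectivity failure observed by the Frontend.
        self.unreachable.add(backend)

    def mark_reachable(self, backend):
        # Clear the flag once connectivity is restored.
        self.unreachable.discard(backend)

    def healthy(self):
        # Backends without known connectivity issues, in their original order.
        return [b for b in self.backends if b not in self.unreachable]

    def pick(self, exclude=()):
        # Route a new request only to Backends not flagged as unreachable.
        candidates = [b for b in self.healthy() if b not in exclude]
        if not candidates:
            raise RuntimeError("no reachable Backend available")
        return random.choice(candidates)
```

Any selection policy could replace `random.choice`; the point of the sketch is only that Backends with known connectivity issues are filtered out before a new request is routed.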

For requests already in progress, the `--migration-limit` flag, set on the Backend, tells the Frontend how many times a request may be migrated to another Backend if connectivity to the current Backend is lost.

For example,
```bash
python3 -m dynamo.trtllm ... --migration-limit=3
```
indicates that a request to this model may be migrated to another Backend up to 3 times before the request fails, should the Frontend detect a connectivity issue with the current Backend.
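
To make the counting concrete: a limit of 3 allows up to 3 migrations, so a single request gets at most 4 attempts (the original Backend plus 3 others). The Python sketch below illustrates that rule only; `send` and `ConnectionLost` are hypothetical stand-ins for the Frontend-to-Backend transport and are not Dynamo APIs.

```python
class ConnectionLost(Exception):
    """Hypothetical error raised when the Frontend loses its link to a Backend."""


def serve_with_migration(request, backends, send, migration_limit):
    """Try Backends in order, migrating on connection loss up to migration_limit times.

    `send(backend, request)` is a hypothetical callable that returns a
    response or raises ConnectionLost.
    """
    migrations = 0
    for backend in backends:
        try:
            return send(backend, request)
        except ConnectionLost:
            if migrations >= migration_limit:
                raise  # limit reached: fail the request
            migrations += 1  # migrate to the next Backend
    raise ConnectionLost("no Backend left to migrate to")
```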

The migrated request continues responding to the original request, so the transition between Backends is seamless and the overall request failure rate at the Frontend is reduced, improving the user experience.
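
One way to picture how a migrated request can keep responding to the original request is to resend the prompt together with the tokens already streamed, so the next Backend continues where the previous one stopped. This is an assumption made for illustration, not a description of Dynamo's internals; `generate` and `ConnectionLost` are hypothetical names.

```python
class ConnectionLost(Exception):
    """Hypothetical error: the stream from a Backend was cut off."""


def stream_with_migration(prompt, backends, generate, migration_limit):
    """Yield tokens for `prompt`, resuming on the next Backend after a failure.

    `generate(backend, prompt, prefix)` is a hypothetical generator that
    yields new tokens, given the tokens already produced (`prefix`).
    """
    produced = []
    migrations = 0
    for backend in backends:
        try:
            for token in generate(backend, prompt, list(produced)):
                produced.append(token)
                yield token  # the client sees one continuous stream
            return  # stream finished normally
        except ConnectionLost:
            if migrations >= migration_limit:
                raise  # limit reached: fail the request
            migrations += 1
    raise ConnectionLost("no Backend left to migrate to")
```

Because tokens already delivered are passed along as `prefix`, the client never sees them twice, which is what makes the switch between Backends invisible from the caller's side in this sketch.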

## Client

See the [client](../llm/README.md#client) section to learn how to send requests to the deployment.