ai-dynamo · athreesh · Jul 29, 2025 · Jul 24, 2025 · Jul 28, 2025 · Jul 28, 2025
diff --git a/components/backends/trtllm/README.md b/components/backends/trtllm/README.md
@@ -202,6 +202,22 @@ DISAGGREGATION_STRATEGY="prefill_first" ./launch/disagg.sh
 
 Dynamo with TensorRT-LLM supports two methods for transferring KV cache in disaggregated serving: UCX (default) and NIXL (experimental). For detailed information and configuration instructions for each method, see the [KV cache transfer guide](./kv-cache-tranfer.md).
 
+## Request Migration
+
+In a [Distributed System](#distributed-system), a request may fail due to connectivity issues between the Frontend and the Backend.
+
+The Frontend automatically tracks which Backends it is having connectivity issues with and avoids routing new requests to them.
+
+For ongoing requests, the `--migration-limit` flag, set on the Backend, tells the Frontend how many times a request may be migrated to another Backend if connectivity to the current Backend is lost.
+
+For example,
+```bash
+python3 -m dynamo.trtllm ... --migration-limit=3
+```
+indicates that a request to this model may be migrated to another Backend up to 3 times if the Frontend detects a connectivity issue with the current Backend. After that, the request fails.
+
+A migrated request continues responding to the original request, so the transition between Backends is seamless to the client and the overall request failure rate at the Frontend is reduced.
+
 ## Client
 
 See [client](../llm/README.md#client) section to learn how to send request to the deployment.