
@kthui (Contributor) commented Aug 18, 2025

Overview:

This is a side branch for checking in the smaller Request Cancellation features, fixes, and tests, so that everything is merged into main before the 0.5.0 code freeze.

Details:

Features and Fixes:

  • Relay the cancellation signal between streams across the migration layer.
  • Relay the context_id between streams across the migration layer.
  • Add a debug print when --migration-limit is received on the MDC.
  • Allow Python outgoing calls to optionally supply a context.
  • Make vLLM abort the request when its context is stopped or killed.
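The two migration-layer items above can be pictured with a small asyncio sketch. It assumes a hypothetical `Context` (a stable `id` plus a stop flag), a hypothetical `StreamDisconnected` error, and a hypothetical `open_stream(ctx, resume_from)` callable standing in for a worker stream; none of these are Dynamo's actual API. The idea shown is that every replacement stream opened on migration receives the same context, so the context_id is preserved and a stop issued by the caller reaches whichever stream is currently live.

```python
import asyncio
import uuid


class StreamDisconnected(Exception):
    """Hypothetical error: the worker serving a stream went away mid-response."""


class Context:
    """Hypothetical request context: a stable id plus a stop flag."""

    def __init__(self, context_id=None):
        self.id = context_id or str(uuid.uuid4())
        self._stopped = False

    def stop_generating(self):
        self._stopped = True

    def is_stopped(self):
        return self._stopped


async def migrate(ctx, open_stream, migration_limit):
    """Yield items from open_stream(ctx, resume_from), reopening the stream on
    StreamDisconnected up to migration_limit times. Each new stream receives the
    same ctx, so the context_id and any stop request carry over to it."""
    produced = 0
    attempts = 0
    while True:
        try:
            async for item in open_stream(ctx, produced):
                if ctx.is_stopped():
                    return  # caller cancelled: stop pulling from the stream
                produced += 1
                yield item
            return  # stream finished normally
        except StreamDisconnected:
            if ctx.is_stopped():
                return  # no point migrating a cancelled request
            attempts += 1
            if attempts > migration_limit:
                raise


def fake_worker(total, fail_after, seen_ids):
    """Hypothetical worker: the first stream it opens drops at item fail_after."""
    calls = 0

    async def open_stream(ctx, resume_from):
        nonlocal calls
        calls += 1
        seen_ids.append(ctx.id)  # record which context each stream was given
        for i in range(resume_from, total):
            if calls == 1 and i == fail_after:
                raise StreamDisconnected()
            if ctx.is_stopped():
                return  # the worker observes the relayed stop
            yield i

    return open_stream


async def demo():
    ctx = Context("req-1")
    seen = []
    items = [x async for x in migrate(ctx, fake_worker(5, 2, seen), migration_limit=1)]
    return items, seen


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In `demo`, the first stream drops after two items, the sketch reopens it with the same context, and the caller receives items 0 through 4 exactly once, with both streams having seen the same context id.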

Tests:

  • Python unit tests on incoming and outgoing context cancellation.
  • E2E vLLM tests for request abort, and for decode request abort in disaggregated (disagg) serving.

Where should the reviewer start?

I recommend starting with the /docs updates, then the following:

Python cancellation support:

  1. Look at /lib/bindings/python/rust/tests/test_cancellation to see how Python can cancel a request or be notified of cancellation.
  2. Look at /lib/bindings/python/rust/context.rs and /lib/bindings/python/rust/lib.rs for the implementation.

Rust child and parent context:

  1. Look at /lib/runtime/src/engine.rs to see how a child context is linked to a parent context.
  2. Look at /lib/runtime/src/pipeline/context.rs for an example of how the interface in engine.rs is implemented.
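The real parent/child linkage is the Rust code in engine.rs and pipeline/context.rs; as a rough illustration only, here is a minimal Python sketch of the same idea. The class and its `stop_generating()` / `kill()` / `is_stopped()` / `is_killed()` method names are assumptions, not the actual interface. Cancellation flows downward: stopping or killing a parent cascades to every linked child, a child linked after its parent was already cancelled starts out cancelled, and cancelling a child leaves its parent untouched.

```python
class Context:
    """Hypothetical context with graceful stop and immediate kill, where a
    child context follows its parent's cancellation state (never the reverse)."""

    def __init__(self, parent=None):
        self._stopped = False
        self._killed = False
        self._children = []
        if parent is not None:
            parent.link_child(self)

    def link_child(self, child):
        self._children.append(child)
        # A child linked after the parent was cancelled starts out cancelled.
        if self._killed:
            child.kill()
        elif self._stopped:
            child.stop_generating()

    def stop_generating(self):
        # Graceful stop: propagate to every linked child.
        self._stopped = True
        for child in self._children:
            child.stop_generating()

    def kill(self):
        # Immediate kill implies stop; propagate to every linked child.
        self._stopped = True
        self._killed = True
        for child in self._children:
            child.kill()

    def is_stopped(self):
        return self._stopped

    def is_killed(self):
        return self._killed
```

Keeping propagation one-directional means a downstream call (for example, a prefill request spawned by a decode request) can be cancelled along with its caller without a failing child tearing down the parent request.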

vLLM abort on cancel:

  1. Look at /components/backends/vllm/src/dynamo/vllm/handlers.py to see how requests are aborted in vLLM on cancellation.
  2. Look at /tests/fault_tolerance/test_request_cancellation.py for the E2E cancellation tests.
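The abort-on-cancel pattern can be sketched with asyncio. `FakeEngine` is a hypothetical stand-in for an async engine exposing `generate()` and `abort(request_id)`; it is not vLLM's API, and handlers.py may well be structured differently. `Context` and `handle()` are likewise illustrative names. A background task waits on the context's stop signal and aborts the engine request, so the abort is issued even while the handler is blocked waiting for the next token.

```python
import asyncio


class FakeEngine:
    """Hypothetical stand-in for an async LLM engine (not vLLM's real API)."""

    def __init__(self):
        self.aborted = set()

    async def generate(self, request_id, n_tokens):
        for i in range(n_tokens):
            if request_id in self.aborted:
                return  # an aborted request emits no further tokens
            await asyncio.sleep(0)  # hand control back, as real token decoding would
            yield f"tok{i}"

    async def abort(self, request_id):
        self.aborted.add(request_id)


class Context:
    """Hypothetical minimal request context with an awaitable stop signal."""

    def __init__(self):
        self._stopped = asyncio.Event()

    def stop_generating(self):
        self._stopped.set()

    def is_stopped(self):
        return self._stopped.is_set()

    async def stopped(self):
        await self._stopped.wait()


async def handle(engine, request_id, ctx, n_tokens):
    """Stream tokens from the engine and abort the request once ctx is stopped."""

    async def abort_on_stop():
        await ctx.stopped()
        await engine.abort(request_id)

    # Watch for cancellation in the background, so the abort is issued even
    # while the loop below is blocked waiting for the next token.
    watcher = asyncio.create_task(abort_on_stop())
    try:
        async for tok in engine.generate(request_id, n_tokens):
            if ctx.is_stopped():
                break  # drop any token that arrives after the stop
            yield tok
    finally:
        if ctx.is_stopped():
            await watcher  # stop signal is set, so the abort completes promptly
        else:
            watcher.cancel()  # finished normally: nothing to abort


async def demo():
    engine = FakeEngine()
    ctx = Context()
    got = []
    async for tok in handle(engine, "req-1", ctx, n_tokens=100):
        got.append(tok)
        if len(got) == 3:
            ctx.stop_generating()  # e.g. the HTTP client disconnected
    return got, engine.aborted


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Checking `ctx.is_stopped()` only between tokens would not fire until the engine produced another token; the watcher task closes that gap, and the `finally` block makes sure the abort has actually been issued before the handler returns.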

Related Issues:

N/A

Summary by CodeRabbit

  • New Features

    • Introduced request cancellation across generation and streaming, including prefill/decode workflows.
    • Added hierarchical context propagation to abort in-flight requests promptly.
    • Python SDK: new Context class and optional context parameter on client methods to monitor/control cancellation.
    • Improved cancellation logging for better observability.
  • Documentation

    • New Request Cancellation architecture doc.
    • Backend and runtime guides updated with cancellation usage and HTTP disconnect behavior.
    • Fault-tolerance docs expanded with cancellation scenarios.
  • Tests

    • Added unit, integration, and end-to-end cancellation tests, including vLLM and disaggregated prefill/decode setups.

@kthui kthui self-assigned this Aug 18, 2025
@copy-pr-bot

copy-pr-bot bot commented Aug 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@kthui kthui changed the title from "FT: Request Cancellation feature and test for 0.5.0" to "feat: FT Request Cancellation feature and test for 0.5.0" Aug 18, 2025
@github-actions github-actions bot added the feat label Aug 18, 2025
@kthui kthui force-pushed the ft-request-cancel-0.5.0 branch from b869c74 to 77cae01 August 19, 2025 18:26
@pull-request-size pull-request-size bot added size/L and removed size/M labels Aug 19, 2025
@kthui kthui force-pushed the ft-request-cancel-0.5.0 branch from 77cae01 to 681067e August 23, 2025 01:52
@kthui kthui force-pushed the ft-request-cancel-0.5.0 branch from 681067e to 8f12b18 August 25, 2025 19:10
@kthui kthui marked this pull request as ready for review August 27, 2025 17:08
Contributor

@michaelfeil michaelfeil left a comment


Looks correct at a high level; something we would use. Left some questions.

Contributor

@grahamking grahamking left a comment


We should address the Mutex in a follow-up PR.

@kthui
Contributor Author

kthui commented Aug 29, 2025

We should address the Mutex in a follow-up PR.

Yes, DIS-569.

Contributor

@keivenchang keivenchang left a comment


Thanks for answering the questions and thanks for this feat!


6 participants