
Conversation

@krishung5 (Contributor) commented Aug 27, 2025

Overview:

Add vLLM multimodal video support.

Closes DIS-155
Fixes #1946

Details:

  • Video processing changes live mainly in video_encode_worker.py and video_utils.py.
  • Rust files updated to add a video URL field to the OpenAI chat format (see the request sketch below).
  • Some refactoring to avoid duplicated code between the image and video pipelines.
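To make the new request shape concrete, here is a rough client-side sketch. The endpoint, port, model name, and video URL are placeholders, and the exact nesting of the video_url content part is an assumption modeled on the existing image_url convention, not something copied from this PR.

```python
# Hypothetical request shape: endpoint, model name, and video URL are placeholders,
# and the "video_url" content-part nesting is assumed to mirror "image_url".
import httpx

payload = {
    "model": "llava-hf/LLaVA-NeXT-Video-7B-hf",  # placeholder model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what happens in this video."},
                {"type": "video_url", "video_url": {"url": "https://example.com/sample.mp4"}},
            ],
        }
    ],
    "max_tokens": 128,
}

response = httpx.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=120)
print(response.json()["choices"][0]["message"]["content"])
```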


Summary by CodeRabbit

  • New Features

    • Added end-to-end video input support to the multimodal pipeline (image/video via a unified input).
    • Introduced a video encoding worker and utilities for loading, decoding, sampling, and resizing video.
    • Provided aggregated and disaggregated video serving demos with launch scripts.
    • Enabled video content in client SDK/types alongside images and text.
    • Added a shared, reusable HTTP client for faster media fetching (a sketch of this pattern follows this summary).
  • Documentation

    • Expanded README with step-by-step video-serving guides, example requests, and sample outputs.
  • Tests

    • Added automated test configuration and payloads for video aggregation flows.
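As a rough illustration of the shared-client idea, the sketch below shows one way such a helper could look. The function name, module layout, and defaults here are assumptions for illustration, not the code added in examples/multimodal/utils/http_client.py.

```python
# A minimal sketch of a shared async HTTP client helper; the function name and
# defaults are assumptions, not the repository's actual implementation.
from typing import Optional

import httpx

_client: Optional[httpx.AsyncClient] = None


def get_http_client(timeout: float = 30.0) -> httpx.AsyncClient:
    """Return a process-wide httpx.AsyncClient, creating it on first use.

    Reusing one client keeps connection pools warm, which speeds up repeated
    media (image/video) fetches compared to creating a client per call.
    """
    global _client
    if _client is None or _client.is_closed:
        _client = httpx.AsyncClient(timeout=timeout, follow_redirects=True)
    return _client


async def fetch_bytes(url: str) -> bytes:
    """Download media bytes using the shared client."""
    response = await get_http_client().get(url)
    response.raise_for_status()
    return response.content
```

The main payoff of a single long-lived client is avoiding new TCP/TLS handshakes for every frame or image download, which is where most of the latency savings come from.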

@krishung5 (Contributor, Author) commented:

/ok to test b6d2d2f

@krishung5 (Contributor, Author) commented:

/ok to test d382e44

@coderabbitai bot (Contributor) commented Aug 27, 2025

Walkthrough

Adds end-to-end video modality support to the multimodal examples: new video encode worker, video utilities, protocol/model updates, processor/PD worker adaptations, HTTP client refactor, launch scripts for aggregated/disaggregated video serving, README sections, Rust async-openai type extensions for video content, and vLLM tests for video pipelines.

Changes

Cohort / File(s) — Summary of changes
Docs: Video pipelines
examples/multimodal/README.md
Adds sections for aggregated and disaggregated video serving with components, graphs, launch scripts, curl examples, and sample responses.
Processor & request structure
examples/multimodal/components/processor.py, examples/multimodal/components/encode_worker.py, examples/multimodal/components/worker.py, examples/multimodal/utils/protocol.py
Switches to MultiModalInput container (image_url/video_url). Updates request handling, signatures, and field clearing. Adds video fields (image_grid_thw, embeddings_shape). Removes EncodeRequest. Adjusts PD worker dtype and modality routing.
Video encode worker & utilities
examples/multimodal/components/video_encode_worker.py, examples/multimodal/utils/video_utils.py
Adds VllmEncodeWorker for video decoding, frame sampling/resizing, RDMA packaging, and streaming. Introduces async helpers for loading, decoding, sampling, resizing, and RDMA tensor prep (a minimal sampling/resizing sketch follows this change list).
HTTP client and image loader
examples/multimodal/utils/http_client.py, examples/multimodal/utils/image_loader.py
Adds a shared httpx.AsyncClient getter. ImageLoader now takes a configurable HTTP timeout and uses the shared client per call.
Model multimodal data construction
examples/multimodal/utils/model.py
Extends construct_mm_data to support video_numpy and refactors image paths (incl. Qwen-specific). Removes get_vision_embeddings_info.
Launch scripts (video)
examples/multimodal/launch/video_agg.sh, examples/multimodal/launch/video_disagg.sh
Adds orchestrators for aggregated/disaggregated video pipelines with GPU assignment and cleanup traps.
Rust async-openai: chat/message/responses
lib/async-openai/src/types/chat.rs, .../message.rs, .../responses.rs
Extends API to support video URL content: new structs/enums (VideoUrl, content parts, input video). Updates message content and delta variants.
Tests: vLLM video config & payload
tests/serve/test_vllm.py
Adds VLLMConfig dataclass and video_agg config. Adjusts multimodal detection and adds video payload generation.
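
For orientation, here is a minimal sketch of what uniform frame sampling and resizing can look like. The function name, the OpenCV dependency, the sampling strategy, and the target size are assumptions for illustration; the actual video_encode_worker.py / video_utils.py code may differ (decoding backend and RDMA packaging are omitted here).

```python
# A rough sketch of uniform frame sampling and resizing for a decoded video;
# function names, the OpenCV dependency, and the uniform-sampling strategy are
# assumptions about how such a utility might work, not the PR's implementation.
import cv2
import numpy as np


def sample_and_resize_frames(
    video_path: str,
    num_frames: int = 8,
    target_size: tuple[int, int] = (336, 336),
) -> np.ndarray:
    """Decode a video, pick `num_frames` evenly spaced frames, and resize them.

    Returns an array of shape (num_frames, H, W, 3) in RGB order, which is the
    kind of "video_numpy" payload a multimodal engine can consume.
    """
    capture = cv2.VideoCapture(video_path)
    total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    if total <= 0:
        capture.release()
        raise ValueError(f"Could not read frames from {video_path}")

    # Evenly spaced frame indices across the whole clip.
    indices = np.linspace(0, total - 1, num=min(num_frames, total), dtype=int)

    frames = []
    for idx in indices:
        capture.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame_bgr = capture.read()
        if not ok:
            continue
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        frames.append(cv2.resize(frame_rgb, target_size))
    capture.release()

    if not frames:
        raise ValueError(f"No frames decoded from {video_path}")
    return np.stack(frames)
```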

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Frontend as Ingress Frontend
  participant Processor
  participant Encode as VideoEncodeWorker
  participant PD as VllmPDWorker (Prefill/Decode)
  participant Model as vLLM Engine

  Client->>Frontend: POST /chat (video_url, prompt)
  Frontend->>Processor: Forward request
  Processor->>Encode: vLLMMultimodalRequest(multimodal_input.video_url)
  Encode->>Encode: Load + decode video, sample frames, resize
  Encode->>PD: RDMA descriptor (video tensor), metadata
  PD->>Model: Prefill/Decode with video mm-data
  Model-->>PD: Tokens / outputs (stream or final)
  PD-->>Processor: MyRequestOutput stream
  Processor-->>Frontend: Aggregate/forward
  Frontend-->>Client: Response
sequenceDiagram
  autonumber
  participant Client
  participant Frontend
  participant Processor
  participant Encode as VideoEncodeWorker
  participant Prefill as VllmPDWorker (Prefill)
  participant Decode as VllmDecodeWorker
  participant Model

  Client->>Frontend: POST /chat (video_url)
  Frontend->>Processor: Forward
  Processor->>Encode: Request video embeddings/tensor
  Encode-->>Processor: RDMA descriptor (video tensor)
  Processor->>Prefill: Prefill with mm-data (video)
  Prefill->>Decode: Handoff KV/state
  Decode->>Model: Decode tokens
  Model-->>Decode: Outputs
  Decode-->>Processor: Stream outputs
  Processor-->>Frontend: Forward
  Frontend-->>Client: Response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Possibly related PRs

Poem

A twitch of whiskers, reels in flight,
Frames hop by in rainbow light.
I nibble bytes, then chew the stream—
From video seeds, I sprout a dream.
Prefill, decode—thump-thump, hooray!
Multimodal meadows, we bound today.
(🥕▶️🐇)


@krishung5 (Contributor, Author) commented:

/ok to test 6f54bf1

@indrajit96 (Contributor) left a comment:

LGTM, some minor suggestions

@krishung5 (Contributor, Author) commented:

/ok to test 37d9859

krishung5 requested a review from indrajit96 August 27, 2025 23:49
@indrajit96 (Contributor) left a comment:

LGTM!

krishung5 enabled auto-merge (squash) August 27, 2025 23:52
@krishung5 (Contributor, Author) commented:

@coderabbitai resolve

@coderabbitai bot (Contributor) commented Aug 28, 2025

✅ Actions performed

Comments resolved.

krishung5 merged commit 78a1107 into main Aug 28, 2025
14 of 15 checks passed
krishung5 deleted the krish/video branch August 28, 2025 03:41
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
Signed-off-by: krishung5 <[email protected]>
Signed-off-by: Jason Zhou <[email protected]>
michaelshin pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: krishung5 <[email protected]>
Signed-off-by: Michael Shin <[email protected]>
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: krishung5 <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025

Successfully merging this pull request may close these issues.

Error processing video request in multimodal example

4 participants