
Conversation

@krishung5 (Contributor) commented Aug 27, 2025

Overview:

Add vLLM multimodal video support.

Closes DIS-155
Fixes #1946

Details:

  • Video processing changes live mainly in video_encode_worker.py and video_utils.py.
  • Rust files updated to add a video URL field to the OpenAI chat format (see the request sketch below).
  • Some refactoring to avoid duplicated code between the image and video pipelines.
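To make the new request shape concrete, here is a rough client-side sketch. The endpoint, port, model name, and video URL are placeholders, and the exact nesting of the video_url content part is an assumption modeled on the existing image_url convention, not something copied from this PR.

```python
# Hypothetical request shape: endpoint, model name, and video URL are placeholders,
# and the "video_url" content-part nesting is assumed to mirror "image_url".
import httpx

payload = {
    "model": "llava-hf/LLaVA-NeXT-Video-7B-hf",  # placeholder model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what happens in this video."},
                {"type": "video_url", "video_url": {"url": "https://example.com/sample.mp4"}},
            ],
        }
    ],
    "max_tokens": 128,
}

response = httpx.post("http://localhost:8000/v1/chat/completions", json=payload, timeout=120)
print(response.json()["choices"][0]["message"]["content"])
```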


Summary by CodeRabbit

  • New Features

    • Added end-to-end video input support to the multimodal pipeline (image/video via a unified input).
    • Introduced a video encoding worker and utilities for loading, decoding, sampling, and resizing video.
    • Provided aggregated and disaggregated video serving demos with launch scripts.
    • Enabled video content in client SDK/types alongside images and text.
    • Added a shared, reusable HTTP client for faster media fetching (a sketch of this pattern follows this summary).
  • Documentation

    • Expanded README with step-by-step video-serving guides, example requests, and sample outputs.
  • Tests

    • Added automated test configuration and payloads for video aggregation flows.
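As a rough illustration of the shared-client idea, the sketch below shows one way such a helper could look. The function name, module layout, and defaults here are assumptions for illustration, not the code added in examples/multimodal/utils/http_client.py.

```python
# A minimal sketch of a shared async HTTP client helper; the function name and
# defaults are assumptions, not the repository's actual implementation.
from typing import Optional

import httpx

_client: Optional[httpx.AsyncClient] = None


def get_http_client(timeout: float = 30.0) -> httpx.AsyncClient:
    """Return a process-wide httpx.AsyncClient, creating it on first use.

    Reusing one client keeps connection pools warm, which speeds up repeated
    media (image/video) fetches compared to creating a client per call.
    """
    global _client
    if _client is None or _client.is_closed:
        _client = httpx.AsyncClient(timeout=timeout, follow_redirects=True)
    return _client


async def fetch_bytes(url: str) -> bytes:
    """Download media bytes using the shared client."""
    response = await get_http_client().get(url)
    response.raise_for_status()
    return response.content
```

The main payoff of a single long-lived client is avoiding new TCP/TLS handshakes for every frame or image download, which is where most of the latency savings come from.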

@krishung5 (Contributor, Author) commented:

/ok to test b6d2d2f

@krishung5 (Contributor, Author) commented:

/ok to test d382e44

@coderabbitai bot (Contributor) commented Aug 27, 2025

Walkthrough

Adds end-to-end video modality support to the multimodal examples: new video encode worker, video utilities, protocol/model updates, processor/PD worker adaptations, HTTP client refactor, launch scripts for aggregated/disaggregated video serving, README sections, Rust async-openai type extensions for video content, and vLLM tests for video pipelines.

Changes

Cohort / File(s) — Summary of changes
Docs: Video pipelines
examples/multimodal/README.md
Adds sections for aggregated and disaggregated video serving with components, graphs, launch scripts, curl examples, and sample responses.
Processor & request structure
examples/multimodal/components/processor.py, examples/multimodal/components/encode_worker.py, examples/multimodal/components/worker.py, examples/multimodal/utils/protocol.py
Switches to MultiModalInput container (image_url/video_url). Updates request handling, signatures, and field clearing. Adds video fields (image_grid_thw, embeddings_shape). Removes EncodeRequest. Adjusts PD worker dtype and modality routing.
Video encode worker & utilities
examples/multimodal/components/video_encode_worker.py, examples/multimodal/utils/video_utils.py
Adds VllmEncodeWorker for video decoding, frame sampling/resizing, RDMA packaging, and streaming. Introduces async helpers for loading, decoding, sampling, resizing, and RDMA tensor prep (a minimal sampling/resizing sketch follows this change list).
HTTP client and image loader
examples/multimodal/utils/http_client.py, examples/multimodal/utils/image_loader.py
Adds a shared httpx.AsyncClient getter. ImageLoader now takes a configurable HTTP timeout and uses the shared client per call.
Model multimodal data construction
examples/multimodal/utils/model.py
Extends construct_mm_data to support video_numpy and refactors image paths (incl. Qwen-specific). Removes get_vision_embeddings_info.
Launch scripts (video)
examples/multimodal/launch/video_agg.sh, examples/multimodal/launch/video_disagg.sh
Adds orchestrators for aggregated/disaggregated video pipelines with GPU assignment and cleanup traps.
Rust async-openai: chat/message/responses
lib/async-openai/src/types/chat.rs, .../message.rs, .../responses.rs
Extends API to support video URL content: new structs/enums (VideoUrl, content parts, input video). Updates message content and delta variants.
Tests: vLLM video config & payload
tests/serve/test_vllm.py
Adds VLLMConfig dataclass and video_agg config. Adjusts multimodal detection and adds video payload generation.
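
For orientation, here is a minimal sketch of what uniform frame sampling and resizing can look like. The function name, the OpenCV dependency, the sampling strategy, and the target size are assumptions for illustration; the actual video_encode_worker.py / video_utils.py code may differ (decoding backend and RDMA packaging are omitted here).

```python
# A rough sketch of uniform frame sampling and resizing for a decoded video;
# function names, the OpenCV dependency, and the uniform-sampling strategy are
# assumptions about how such a utility might work, not the PR's implementation.
import cv2
import numpy as np


def sample_and_resize_frames(
    video_path: str,
    num_frames: int = 8,
    target_size: tuple[int, int] = (336, 336),
) -> np.ndarray:
    """Decode a video, pick `num_frames` evenly spaced frames, and resize them.

    Returns an array of shape (num_frames, H, W, 3) in RGB order, which is the
    kind of "video_numpy" payload a multimodal engine can consume.
    """
    capture = cv2.VideoCapture(video_path)
    total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
    if total <= 0:
        capture.release()
        raise ValueError(f"Could not read frames from {video_path}")

    # Evenly spaced frame indices across the whole clip.
    indices = np.linspace(0, total - 1, num=min(num_frames, total), dtype=int)

    frames = []
    for idx in indices:
        capture.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame_bgr = capture.read()
        if not ok:
            continue
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        frames.append(cv2.resize(frame_rgb, target_size))
    capture.release()

    if not frames:
        raise ValueError(f"No frames decoded from {video_path}")
    return np.stack(frames)
```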

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Frontend as Ingress Frontend
  participant Processor
  participant Encode as VideoEncodeWorker
  participant PD as VllmPDWorker (Prefill/Decode)
  participant Model as vLLM Engine

  Client->>Frontend: POST /chat (video_url, prompt)
  Frontend->>Processor: Forward request
  Processor->>Encode: vLLMMultimodalRequest(multimodal_input.video_url)
  Encode->>Encode: Load + decode video, sample frames, resize
  Encode->>PD: RDMA descriptor (video tensor), metadata
  PD->>Model: Prefill/Decode with video mm-data
  Model-->>PD: Tokens / outputs (stream or final)
  PD-->>Processor: MyRequestOutput stream
  Processor-->>Frontend: Aggregate/forward
  Frontend-->>Client: Response
sequenceDiagram
  autonumber
  participant Client
  participant Frontend
  participant Processor
  participant Encode as VideoEncodeWorker
  participant Prefill as VllmPDWorker (Prefill)
  participant Decode as VllmDecodeWorker
  participant Model

  Client->>Frontend: POST /chat (video_url)
  Frontend->>Processor: Forward
  Processor->>Encode: Request video embeddings/tensor
  Encode-->>Processor: RDMA descriptor (video tensor)
  Processor->>Prefill: Prefill with mm-data (video)
  Prefill->>Decode: Handoff KV/state
  Decode->>Model: Decode tokens
  Model-->>Decode: Outputs
  Decode-->>Processor: Stream outputs
  Processor-->>Frontend: Forward
  Frontend-->>Client: Response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

Possibly related PRs

Poem

A twitch of whiskers, reels in flight,
Frames hop by in rainbow light.
I nibble bytes, then chew the stream—
From video seeds, I sprout a dream.
Prefill, decode—thump-thump, hooray!
Multimodal meadows, we bound today.
(🥕▶️🐇)


@krishung5 (Contributor, Author) commented:

/ok to test 6f54bf1

@indrajit96 (Contributor) left a comment:

LGTM, some minor suggestions

@krishung5 (Contributor, Author) commented:

/ok to test 37d9859

krishung5 requested a review from indrajit96 August 27, 2025 23:49
@indrajit96 (Contributor) left a comment:

LGTM!

krishung5 enabled auto-merge (squash) August 27, 2025 23:52
@krishung5 (Contributor, Author) commented:

@coderabbitai resolve

@coderabbitai bot (Contributor) commented Aug 28, 2025

✅ Actions performed

Comments resolved.

krishung5 merged commit 78a1107 into main Aug 28, 2025
14 of 15 checks passed
krishung5 deleted the krish/video branch August 28, 2025 03:41
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
Signed-off-by: krishung5 <[email protected]>
Signed-off-by: Jason Zhou <[email protected]>
michaelshin pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: krishung5 <[email protected]>
Signed-off-by: Michael Shin <[email protected]>
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: krishung5 <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025

Successfully merging this pull request may close these issues.

Error processing video request in multimodal example

4 participants