-
Notifications
You must be signed in to change notification settings - Fork 766
feat: Add vLLM multimodal video support #2738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/ok to test b6d2d2f |
|
/ok to test d382e44 |
WalkthroughAdds end-to-end video modality support to the multimodal examples: new video encode worker, video utilities, protocol/model updates, processor/PD worker adaptations, HTTP client refactor, launch scripts for aggregated/disaggregated video serving, README sections, Rust async-openai type extensions for video content, and vLLM tests for video pipelines. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant Frontend as Ingress Frontend
participant Processor
participant Encode as VideoEncodeWorker
participant PD as VllmPDWorker (Prefill/Decode)
participant Model as vLLM Engine
Client->>Frontend: POST /chat (video_url, prompt)
Frontend->>Processor: Forward request
Processor->>Encode: vLLMMultimodalRequest(multimodal_input.video_url)
Encode->>Encode: Load + decode video, sample frames, resize
Encode->>PD: RDMA descriptor (video tensor), metadata
PD->>Model: Prefill/Decode with video mm-data
Model-->>PD: Tokens / outputs (stream or final)
PD-->>Processor: MyRequestOutput stream
Processor-->>Frontend: Aggregate/forward
Frontend-->>Client: Response
sequenceDiagram
autonumber
participant Client
participant Frontend
participant Processor
participant Encode as VideoEncodeWorker
participant Prefill as VllmPDWorker (Prefill)
participant Decode as VllmDecodeWorker
participant Model
Client->>Frontend: POST /chat (video_url)
Frontend->>Processor: Forward
Processor->>Encode: Request video embeddings/tensor
Encode-->>Processor: RDMA descriptor (video tensor)
Processor->>Prefill: Prefill with mm-data (video)
Prefill->>Decode: Handoff KV/state
Decode->>Model: Decode tokens
Model-->>Decode: Outputs
Decode-->>Processor: Stream outputs
Processor-->>Frontend: Forward
Frontend-->>Client: Response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60–90 minutes Possibly related PRs
Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
Signed-off-by: krishung5 <[email protected]>
Signed-off-by: krishung5 <[email protected]>
Signed-off-by: krishung5 <[email protected]>
|
/ok to test 6f54bf1 |
indrajit96
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, some minor suggestions
Signed-off-by: krishung5 <[email protected]>
Signed-off-by: krishung5 <[email protected]>
|
/ok to test 37d9859 |
indrajit96
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
@coderabbitai resolve |
✅ Actions performedComments resolved. |
Signed-off-by: krishung5 <[email protected]> Signed-off-by: Jason Zhou <[email protected]>
Signed-off-by: krishung5 <[email protected]> Signed-off-by: Michael Shin <[email protected]>
Signed-off-by: krishung5 <[email protected]> Signed-off-by: Krishnan Prashanth <[email protected]>
Signed-off-by: krishung5 <[email protected]> Signed-off-by: nnshah1 <[email protected]>
Overview:
Add vLLM multimodal video support.
Closes DIS-155
Fixes #1946
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Documentation
Tests