-
Notifications
You must be signed in to change notification settings - Fork 736
docs: consolidate multimodal docs #4842
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Neal Vaidya <[email protected]>
Signed-off-by: Neal Vaidya <[email protected]>
WalkthroughThis pull request reorganizes multimodal documentation by consolidating backend-specific guides scattered across individual backend folders into a centralized Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
docs/backends/sglang/README.md(2 hunks)docs/backends/sglang/multimodal_epd.md(0 hunks)docs/backends/sglang/multimodal_sglang_guide.md(0 hunks)docs/backends/trtllm/README.md(1 hunks)docs/backends/trtllm/multimodal_support.md(0 hunks)docs/backends/trtllm/multimodal_trtllm_guide.md(0 hunks)docs/backends/trtllm/multinode/multinode-multimodal-example.md(0 hunks)docs/backends/vllm/multimodal.md(0 hunks)docs/backends/vllm/multimodal_vllm_guide.md(0 hunks)docs/conf.py(1 hunks)docs/hidden_toctree.rst(0 hunks)docs/index.rst(1 hunks)docs/multimodal/index.md(1 hunks)docs/multimodal/multimodal_intro.md(0 hunks)docs/multimodal/sglang.md(1 hunks)docs/multimodal/trtllm.md(1 hunks)docs/multimodal/vllm.md(1 hunks)docs/project.json(1 hunks)
💤 Files with no reviewable changes (9)
- docs/backends/vllm/multimodal_vllm_guide.md
- docs/backends/sglang/multimodal_epd.md
- docs/backends/trtllm/multinode/multinode-multimodal-example.md
- docs/backends/sglang/multimodal_sglang_guide.md
- docs/backends/trtllm/multimodal_support.md
- docs/hidden_toctree.rst
- docs/multimodal/multimodal_intro.md
- docs/backends/vllm/multimodal.md
- docs/backends/trtllm/multimodal_trtllm_guide.md
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
Applied to files:
docs/multimodal/trtllm.mddocs/multimodal/sglang.mddocs/backends/trtllm/README.mddocs/multimodal/vllm.mddocs/multimodal/index.md
📚 Learning: 2025-10-28T05:48:37.621Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_utils/model.py:39-42
Timestamp: 2025-10-28T05:48:37.621Z
Learning: In components/src/dynamo/vllm/multimodal_utils/model.py, the AutoModel.from_pretrained call with trust_remote_code=True in the load_vision_model function is intentional and expected for the vLLM multimodal implementation.
Applied to files:
docs/multimodal/trtllm.mddocs/backends/trtllm/README.md
📚 Learning: 2025-10-28T04:09:48.264Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.
Applied to files:
docs/multimodal/trtllm.md
📚 Learning: 2025-09-04T19:03:06.643Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.
Applied to files:
docs/project.json
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.
Applied to files:
docs/project.json
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project's devcontainer setup, hard-coded container names in devcontainer.json files serve as templates that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.
Applied to files:
docs/project.json
📚 Learning: 2025-12-02T18:13:40.065Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 4698
File: .github/workflows/container-validation-dynamo.yml:68-68
Timestamp: 2025-12-02T18:13:40.065Z
Learning: In the ai-dynamo/dynamo repository, backend-specific tests (vllm, sglang, trtllm) are intentionally excluded from the container-validation-dynamo.yml workflow using "not (vllm or sglang or trtllm)" because they run in a separate container-validation-backends.yml workflow that has dedicated jobs for each backend. This separation keeps framework-agnostic tests separate from backend-specific tests.
Applied to files:
docs/multimodal/index.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (12)
docs/project.json (1)
1-1: No changes to review.This file contains only metadata and shows no observable changes in the diff.
docs/multimodal/sglang.md (1)
1-50: Approve new SGLang multimodal documentation with verification note.The new comprehensive guide is well-structured, clearly explains E/PD and E/P/D patterns (correctly excluding EPD and EP/D per support matrix), and provides good technical detail on bootstrap coordination, vision encoding, and NIXL usage. The content aligns with the centralized multimodal index and appropriate cross-references are in place.
Verify that the launch scripts referenced in the guide exist:
examples/backends/sglang/launch/multimodal_agg.sh(line 113)examples/backends/sglang/launch/multimodal_disagg.sh(line 176)Also confirm the Qwen model references (lines 123, 186) are current and match available versions.
docs/index.rst (1)
61-61: Approve documentation index update.The toctree entry has been correctly updated to reference the new centralized multimodal documentation index at
multimodal/index.md, reflecting the consolidation goal of the PR.docs/backends/trtllm/README.md (1)
243-243: Approve TRT-LLM documentation link update.The reference has been correctly updated to point to the new centralized TRT-LLM multimodal guide at
../../multimodal/trtllm.md. The generic description avoids pattern-specific claims and correctly reflects the consolidation.docs/multimodal/vllm.md (1)
1-100: Approve comprehensive vLLM multimodal documentation.The new guide is thorough, well-organized, and covers all four deployment patterns with support for images, video, and audio. The security requirement for
--enable-multimodalis clearly called out, and experimental features are appropriately marked. Content aligns with the main multimodal index support matrix and terminology standards.Verify that the launch scripts exist in the referenced locations:
examples/backends/vllm/launch/agg_multimodal_epd.sh(line 107)examples/backends/vllm/launch/disagg_multimodal_epd.sh(line 170)examples/backends/vllm/launch/agg_multimodal_llama.sh(line 197)examples/backends/vllm/launch/disagg_multimodal_llama.sh(line 248)- Video and audio launch scripts (lines 282-283, 366-367)
Also confirm that the model references (Qwen, LLaVA, Llama versions) are current and widely available.
docs/multimodal/index.md (2)
1-66: Approve centralized multimodal documentation index.The new index provides an excellent consolidation point with clear definitions of all four EPD terminology variants (EPD, E/PD, E/P/D, EP/D), accurate support matrices for each backend, and proper input format compatibility information. The toctree structure correctly references the three backend-specific guides.
Verify support matrix accuracy by having framework PICs review their respective sections:
- vLLM PIC: Confirm line 44 (supports all four patterns, images, video, audio experimental)
- TRT-LLM PIC: Confirm line 45 (E/PD not supported, E/P/D WIP, EP/D and EPD supported)
- SGLang PIC: Confirm line 46 (E/PD and E/P/D only, images only)
This aligns with the PR guidance that "Framework PICs are asked to check their respective framework pages for any inaccuracies introduced during the migration."
81-179: Approve architecture pattern definitions.The four deployment patterns are clearly explained with helpful text descriptions, ASCII diagrams, component tables, and guidance on when to use each pattern. Terminology is consistent throughout and aligns with the standardization goal.
docs/multimodal/trtllm.md (3)
1-100: Approve comprehensive TRT-LLM multimodal documentation.The guide provides thorough coverage of TRT-LLM multimodal capabilities, correctly documents supported patterns (EPD and EP/D for production, E/P/D with WIP caveats), and clearly distinguishes between HTTP/HTTPS URLs and pre-computed embeddings. The experimental nature of E/P/D with embeddings is properly marked, and multi-node SLURM deployment guidance is valuable.
Verify the following engine configuration and script paths exist:
Engine config files referenced for aggregated serving (line 71):
examples/backends/trtllm/engine_configs/llama4/multimodal/agg.yamlEngine config files referenced for disaggregated serving (lines 112-113):
examples/backends/trtllm/engine_configs/qwen2-vl-7b-instruct/prefill.yamlexamples/backends/trtllm/engine_configs/qwen2-vl-7b-instruct/decode.yamlLaunch scripts:
examples/backends/trtllm/launch/disagg.sh(referenced with MODALITY="multimodal", line 116)examples/backends/trtllm/launch/epd_disagg.sh(line 200)SLURM example scripts (line 270):
examples/basics/multinode/trtllm/srun_disaggregated.shand related scripts
146-175: Approve pre-computed embeddings E/P/D documentation.The section clearly explains the experimental status, required TensorRT-LLM commit, supported file types, and two embedding formats (simple tensor and dictionary). The zero-copy NIXL architecture is well-explained with sequence diagrams and configuration examples.
266-323: Approve multi-node SLURM deployment guidance.The environment setup, disaggregated launch procedure, and expected output walkthrough provide clear steps for multi-node deployment. The cleanup section is helpful.
docs/backends/sglang/README.md (1)
41-41: The relative path resolves correctly.The hyperlink at line 41 (and 279) uses the relative path
../../multimodal/sglang.mdfromdocs/backends/sglang/README.md, which correctly resolves todocs/multimodal/sglang.md.docs/conf.py (1)
89-97: Multimodal redirect mappings look consistent and depth‑correctThe new redirects correctly resolve from the old HTML locations to the consolidated
multimodalpages (including the deeperbackends/trtllm/multinode/...case), andmultimodal/multimodal_intro→index.htmlkeeps users on the new multimodal index. This block is coherent with the PR’s goal of centralizing multimodal docs.
|
Thanks for pulling this together! Overall lgtm, left a few comments. |
Signed-off-by: Neal Vaidya <[email protected]>
Signed-off-by: Neal Vaidya <[email protected]>
krishung5
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
Overview:
Pull together all the various multimodal docs into a single folder. Also standardize on EPD terminology using EPD, E/PD, E/P/D, and EP/D to describe the different disagg strategies.
Where should the reviewer start?
Start with
multimodal/index.md. Would appreciate if framework PICs could look at the respective framework pages to ensure no inaccuracies were added during the migration.Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.