docs: consolidate multimodal docs #4842

nealvaidya · 2025-12-10T04:36:49Z

Overview:

Pull together all the various multimodal docs into a single folder. Also standardize on EPD terminology using EPD, E/PD, E/P/D, and EP/D to describe the different disagg strategies.

Where should the reviewer start?

Start with multimodal/index.md. Would appreciate if framework PICs could look at the respective framework pages to ensure no inaccuracies were added during the migration.

Summary by CodeRabbit

Documentation
- Reorganized multimodal inference documentation into a centralized guide with backend-specific sections for vLLM, TensorRT-LLM, and SGLang.
- Consolidated multimodal deployment patterns, architecture guidance, and workflow examples under a single location.
- Added automatic redirects to maintain compatibility with previously linked documentation pages.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Signed-off-by: Neal Vaidya <[email protected]>

coderabbitai · 2025-12-10T04:41:18Z

Walkthrough

This pull request reorganizes multimodal documentation by consolidating backend-specific guides scattered across individual backend folders into a centralized docs/multimodal/ directory, with new unified backend documentation files for vLLM, TensorRT-LLM, and SGLang. Old documentation files are removed, links are updated, and HTTP redirects are configured to preserve backward compatibility.

Changes

Cohort / File(s)	Summary
Centralized Multimodal Documentation `docs/multimodal/index.md`, `docs/multimodal/sglang.md`, `docs/multimodal/trtllm.md`, `docs/multimodal/vllm.md`	Four new documentation files created in a dedicated multimodal directory providing unified guides for multimodal inference, architecture patterns, deployment strategies, component roles, and supported models across vLLM, TensorRT-LLM, and SGLang backends.
Deleted Backend-Specific Multimodal Documentation `docs/backends/sglang/multimodal_epd.md`, `docs/backends/sglang/multimodal_sglang_guide.md`, `docs/backends/trtllm/multimodal_support.md`, `docs/backends/trtllm/multimodal_trtllm_guide.md`, `docs/backends/trtllm/multinode/multinode-multimodal-example.md`, `docs/backends/vllm/multimodal.md`, `docs/backends/vllm/multimodal_vllm_guide.md`, `docs/multimodal/multimodal_intro.md`	Eight documentation files removed from dispersed locations; content consolidated into centralized multimodal guides.
Backend README Link Updates `docs/backends/sglang/README.md`, `docs/backends/trtllm/README.md`	Hyperlinks updated to reference new centralized multimodal documentation paths instead of local backend-specific files.
Navigation and Redirect Configuration `docs/conf.py`	Eight new redirect mappings added to preserve backward compatibility for removed and relocated multimodal documentation files.
Documentation Structure Updates `docs/hidden_toctree.rst`, `docs/index.rst`	Removed dispersed multimodal entries from toctree; updated primary multimodal link from `multimodal/multimodal_intro.md` to `multimodal/index.md`.
Project Metadata `docs/project.json`	No functional changes detected.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Verify redirect mappings in docs/conf.py are correct and cover all deleted/moved files
Check link consistency in updated backend README files and toctree entries
Validate content consolidation by confirming key multimodal concepts are preserved in centralized files
Confirm navigation structure maintains logical organization for users navigating multimodal documentation

Poem

🐰 Documentation paths did intertwine,
So we gathered them in one place divine,
Old links now redirect with care,
Multimodal guides consolidated there! ✨
No code was changed, just files rearranged,
For cleaner docs—beautifully arranged!

Pre-merge checks

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description covers the Overview and Where should the reviewer start sections, but is missing the Details and Related Issues sections required by the template.	Add a Details section explaining the specific changes (file reorganization, terminology standardization) and include a Related Issues section with relevant GitHub issue references.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main objective: consolidating multimodal documentation into a centralized folder structure.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f57fd72 and 36a4b9d.

📒 Files selected for processing (18)

docs/backends/sglang/README.md (2 hunks)
docs/backends/sglang/multimodal_epd.md (0 hunks)
docs/backends/sglang/multimodal_sglang_guide.md (0 hunks)
docs/backends/trtllm/README.md (1 hunks)
docs/backends/trtllm/multimodal_support.md (0 hunks)
docs/backends/trtllm/multimodal_trtllm_guide.md (0 hunks)
docs/backends/trtllm/multinode/multinode-multimodal-example.md (0 hunks)
docs/backends/vllm/multimodal.md (0 hunks)
docs/backends/vllm/multimodal_vllm_guide.md (0 hunks)
docs/conf.py (1 hunks)
docs/hidden_toctree.rst (0 hunks)
docs/index.rst (1 hunks)
docs/multimodal/index.md (1 hunks)
docs/multimodal/multimodal_intro.md (0 hunks)
docs/multimodal/sglang.md (1 hunks)
docs/multimodal/trtllm.md (1 hunks)
docs/multimodal/vllm.md (1 hunks)
docs/project.json (1 hunks)

💤 Files with no reviewable changes (9)

docs/backends/vllm/multimodal_vllm_guide.md
docs/backends/sglang/multimodal_epd.md
docs/backends/trtllm/multinode/multinode-multimodal-example.md
docs/backends/sglang/multimodal_sglang_guide.md
docs/backends/trtllm/multimodal_support.md
docs/hidden_toctree.rst
docs/multimodal/multimodal_intro.md
docs/backends/vllm/multimodal.md
docs/backends/trtllm/multimodal_trtllm_guide.md

🧰 Additional context used

🧠 Learnings (7)

📚 Learning: 2025-09-16T19:47:30.312Z

Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.

Applied to files:

docs/multimodal/trtllm.md
docs/multimodal/sglang.md
docs/backends/trtllm/README.md
docs/multimodal/vllm.md
docs/multimodal/index.md

📚 Learning: 2025-10-28T05:48:37.621Z

Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_utils/model.py:39-42
Timestamp: 2025-10-28T05:48:37.621Z
Learning: In components/src/dynamo/vllm/multimodal_utils/model.py, the AutoModel.from_pretrained call with trust_remote_code=True in the load_vision_model function is intentional and expected for the vLLM multimodal implementation.

Applied to files:

docs/multimodal/trtllm.md
docs/backends/trtllm/README.md

📚 Learning: 2025-10-28T04:09:48.264Z

Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.

Applied to files:

docs/multimodal/trtllm.md

📚 Learning: 2025-09-04T19:03:06.643Z

Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.

Applied to files:

docs/project.json

📚 Learning: 2025-08-30T20:43:10.091Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

docs/project.json

📚 Learning: 2025-08-30T20:43:10.091Z

Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project's devcontainer setup, hard-coded container names in devcontainer.json files serve as templates that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

docs/project.json

📚 Learning: 2025-12-02T18:13:40.065Z

Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 4698
File: .github/workflows/container-validation-dynamo.yml:68-68
Timestamp: 2025-12-02T18:13:40.065Z
Learning: In the ai-dynamo/dynamo repository, backend-specific tests (vllm, sglang, trtllm) are intentionally excluded from the container-validation-dynamo.yml workflow using "not (vllm or sglang or trtllm)" because they run in a separate container-validation-backends.yml workflow that has dedicated jobs for each backend. This separation keeps framework-agnostic tests separate from backend-specific tests.

Applied to files:

docs/multimodal/index.md

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Build and Test - dynamo

🔇 Additional comments (12)

docs/project.json (1)

1-1: No changes to review.

This file contains only metadata and shows no observable changes in the diff.

docs/multimodal/sglang.md (1)

1-50: Approve new SGLang multimodal documentation with verification note.

The new comprehensive guide is well-structured, clearly explains E/PD and E/P/D patterns (correctly excluding EPD and EP/D per support matrix), and provides good technical detail on bootstrap coordination, vision encoding, and NIXL usage. The content aligns with the centralized multimodal index and appropriate cross-references are in place.

Verify that the launch scripts referenced in the guide exist:

examples/backends/sglang/launch/multimodal_agg.sh (line 113)

examples/backends/sglang/launch/multimodal_disagg.sh (line 176)

Also confirm the Qwen model references (lines 123, 186) are current and match available versions.

docs/index.rst (1)

61-61: Approve documentation index update.

The toctree entry has been correctly updated to reference the new centralized multimodal documentation index at multimodal/index.md, reflecting the consolidation goal of the PR.

docs/backends/trtllm/README.md (1)

243-243: Approve TRT-LLM documentation link update.

The reference has been correctly updated to point to the new centralized TRT-LLM multimodal guide at ../../multimodal/trtllm.md. The generic description avoids pattern-specific claims and correctly reflects the consolidation.

docs/multimodal/vllm.md (1)

1-100: Approve comprehensive vLLM multimodal documentation.

The new guide is thorough, well-organized, and covers all four deployment patterns with support for images, video, and audio. The security requirement for --enable-multimodal is clearly called out, and experimental features are appropriately marked. Content aligns with the main multimodal index support matrix and terminology standards.

Verify that the launch scripts exist in the referenced locations:

examples/backends/vllm/launch/agg_multimodal_epd.sh (line 107)

examples/backends/vllm/launch/disagg_multimodal_epd.sh (line 170)

examples/backends/vllm/launch/agg_multimodal_llama.sh (line 197)

examples/backends/vllm/launch/disagg_multimodal_llama.sh (line 248)

Video and audio launch scripts (lines 282-283, 366-367)

Also confirm that the model references (Qwen, LLaVA, Llama versions) are current and widely available.

docs/multimodal/index.md (2)

1-66: Approve centralized multimodal documentation index.

The new index provides an excellent consolidation point with clear definitions of all four EPD terminology variants (EPD, E/PD, E/P/D, EP/D), accurate support matrices for each backend, and proper input format compatibility information. The toctree structure correctly references the three backend-specific guides.

Verify support matrix accuracy by having framework PICs review their respective sections:

vLLM PIC: Confirm line 44 (supports all four patterns, images, video, audio experimental)

TRT-LLM PIC: Confirm line 45 (E/PD not supported, E/P/D WIP, EP/D and EPD supported)

SGLang PIC: Confirm line 46 (E/PD and E/P/D only, images only)

This aligns with the PR guidance that "Framework PICs are asked to check their respective framework pages for any inaccuracies introduced during the migration."

81-179: Approve architecture pattern definitions.

The four deployment patterns are clearly explained with helpful text descriptions, ASCII diagrams, component tables, and guidance on when to use each pattern. Terminology is consistent throughout and aligns with the standardization goal.

docs/multimodal/trtllm.md (3)

1-100: Approve comprehensive TRT-LLM multimodal documentation.

The guide provides thorough coverage of TRT-LLM multimodal capabilities, correctly documents supported patterns (EPD and EP/D for production, E/P/D with WIP caveats), and clearly distinguishes between HTTP/HTTPS URLs and pre-computed embeddings. The experimental nature of E/P/D with embeddings is properly marked, and multi-node SLURM deployment guidance is valuable.

Verify the following engine configuration and script paths exist:

Engine config files referenced for aggregated serving (line 71):

examples/backends/trtllm/engine_configs/llama4/multimodal/agg.yaml

Engine config files referenced for disaggregated serving (lines 112-113):

examples/backends/trtllm/engine_configs/qwen2-vl-7b-instruct/prefill.yaml

examples/backends/trtllm/engine_configs/qwen2-vl-7b-instruct/decode.yaml

Launch scripts:

examples/backends/trtllm/launch/disagg.sh (referenced with MODALITY="multimodal", line 116)

examples/backends/trtllm/launch/epd_disagg.sh (line 200)

SLURM example scripts (line 270):

examples/basics/multinode/trtllm/srun_disaggregated.sh and related scripts

146-175: Approve pre-computed embeddings E/P/D documentation.

The section clearly explains the experimental status, required TensorRT-LLM commit, supported file types, and two embedding formats (simple tensor and dictionary). The zero-copy NIXL architecture is well-explained with sequence diagrams and configuration examples.

266-323: Approve multi-node SLURM deployment guidance.

The environment setup, disaggregated launch procedure, and expected output walkthrough provide clear steps for multi-node deployment. The cleanup section is helpful.

docs/backends/sglang/README.md (1)

41-41: The relative path resolves correctly.

The hyperlink at line 41 (and 279) uses the relative path ../../multimodal/sglang.md from docs/backends/sglang/README.md, which correctly resolves to docs/multimodal/sglang.md.

docs/conf.py (1)

89-97: Multimodal redirect mappings look consistent and depth‑correct

The new redirects correctly resolve from the old HTML locations to the consolidated multimodal pages (including the deeper backends/trtllm/multinode/... case), and multimodal/multimodal_intro → index.html keeps users on the new multimodal index. This block is coherent with the PR’s goal of centralizing multimodal docs.

docs/backends/sglang/README.md

docs/multimodal/index.md

docs/multimodal/sglang.md

docs/multimodal/index.md

docs/backends/sglang/README.md

krishung5 · 2025-12-10T17:21:05Z

Thanks for pulling this together! Overall lgtm, left a few comments.

Signed-off-by: Neal Vaidya <[email protected]>

krishung5

LGTM, thanks

docs: consolidate multimodal docs

36a4b9d

Signed-off-by: Neal Vaidya <[email protected]>

nealvaidya requested review from a team as code owners December 10, 2025 04:36

pull-request-size bot added the size/XXL label Dec 10, 2025

github-actions bot added the docs label Dec 10, 2025

fix broken link

7ad4346

Signed-off-by: Neal Vaidya <[email protected]>

coderabbitai bot reviewed Dec 10, 2025

View reviewed changes

docs/backends/sglang/README.md Outdated Show resolved Hide resolved

krishung5 reviewed Dec 10, 2025

View reviewed changes

docs/multimodal/index.md Outdated Show resolved Hide resolved

krishung5 reviewed Dec 10, 2025

View reviewed changes

docs/multimodal/sglang.md Outdated Show resolved Hide resolved

krishung5 reviewed Dec 10, 2025

View reviewed changes

docs/multimodal/index.md Show resolved Hide resolved

krishung5 reviewed Dec 10, 2025

View reviewed changes

docs/backends/sglang/README.md Outdated Show resolved Hide resolved

nealvaidya added 2 commits December 10, 2025 19:33

address comments

82e1efc

Signed-off-by: Neal Vaidya <[email protected]>

clarify E/P/D arch

c16a446

Signed-off-by: Neal Vaidya <[email protected]>

krishung5 approved these changes Dec 11, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

docs: consolidate multimodal docs #4842

docs: consolidate multimodal docs #4842

Uh oh!

nealvaidya commented Dec 10, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Dec 10, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

krishung5 commented Dec 10, 2025

Uh oh!

krishung5 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

docs: consolidate multimodal docs #4842

Are you sure you want to change the base?

docs: consolidate multimodal docs #4842

Uh oh!

Conversation

nealvaidya commented Dec 10, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Where should the reviewer start?

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 10, 2025

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

krishung5 commented Dec 10, 2025

Uh oh!

krishung5 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

nealvaidya commented Dec 10, 2025 •

edited by coderabbitai bot

Loading