
@indrajit96 indrajit96 commented Nov 20, 2025

Overview:

Add comprehensive multimodal guides for the vLLM, SGLang, and TRT-LLM backends, documenting architectures, deployment modes, input formats, and known limitations.

Details:

  • New: docs/backends/vllm/multimodal_vllm_guide.md - Complete vLLM multimodal reference
  • New: docs/backends/trtllm/multimodal_trtllm_guide.md - Complete TRT-LLM multimodal reference
  • New: docs/backends/sglang/multimodal_sglang_guide.md - Complete SGLang multimodal reference

Summary by CodeRabbit

  • Documentation
    • Added comprehensive multimodal backend guides for SGLang, TRT-LLM, and vLLM inference backends
    • Guides document architecture patterns, deployment modes, component responsibilities, and known limitations
    • Includes practical configuration examples and architectural diagrams
    • Updated documentation index to surface new guides


Signed-off-by: Indrajit Bhosale <[email protected]>
@rmccorm4 rmccorm4 left a comment


Can you update https://github.com/ai-dynamo/dynamo/blob/main/docs/multimodal/multimodal_intro.md at the bottom with links to each of the backend specific docs as a central location?
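A sketch of what that central links section might look like at the bottom of multimodal_intro.md (the heading and link text are illustrative; the paths are the guides added in this PR, relative to docs/multimodal/):

```markdown
## Backend-Specific Multimodal Guides

- [vLLM Multimodal Guide](../backends/vllm/multimodal_vllm_guide.md)
- [SGLang Multimodal Guide](../backends/sglang/multimodal_sglang_guide.md)
- [TRT-LLM Multimodal Guide](../backends/trtllm/multimodal_trtllm_guide.md)
```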

Signed-off-by: Indrajit Bhosale <[email protected]>
@rmccorm4 rmccorm4 changed the title 2/3 Done docs: Add multimodal documentation vllm, sglang, and trtllm backends Nov 21, 2025
@github-actions github-actions bot added the docs label Nov 21, 2025
@rmccorm4 rmccorm4 left a comment


re: https://github.com/ai-dynamo/dynamo/actions/runs/19580969130/job/56078496698?pr=4510

#15 5.473 checking consistency... /workspace/dynamo/docs/backends/sglang/multimodal_sglang_guide.md: WARNING: document isn't included in any toctree [toc.not_included]
#15 5.474 /workspace/dynamo/docs/backends/trtllm/multimodal_trtllm_guide.md: WARNING: document isn't included in any toctree [toc.not_included]
#15 5.474 /workspace/dynamo/docs/backends/vllm/multimodal_vllm_guide.md: WARNING: document isn't included in any toctree [toc.not_included]

Needs these files added to docs/hidden_toctree.rst
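Sphinx emits `toc.not_included` for any document not reachable from a toctree; listing the pages in a hidden toctree silences it without surfacing them in the navigation. A sketch of the entries (the actual hidden_toctree.rst interleaves them into its existing per-backend sections, so exact placement may differ):

```rst
.. toctree::
   :hidden:

   backends/sglang/multimodal_sglang_guide.md
   backends/trtllm/multimodal_trtllm_guide.md
   backends/vllm/multimodal_vllm_guide.md
```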

@rmccorm4

@krishung5 can you help review the docs here? The main point is to clearly document what is supported in each backend today with respect to multimodality, and to highlight at least one key example or model for each.


@krishung5 krishung5 left a comment


Thanks for putting up this doc, great work! One minor comment: we have multimodal docs for all three frameworks, so maybe we can link them in the guide here somehow? i.e. vllm, trtllm here and here, and sglang

@krishung5 krishung5 marked this pull request as ready for review December 4, 2025 22:58
@coderabbitai

coderabbitai bot commented Dec 4, 2025

Walkthrough

Adds three new comprehensive multimodal backend guides for SGLang, TRT-LLM, and vLLM, documenting support matrices, deployment architectures, component configurations, inter-component communication patterns, and known limitations. Updates documentation registry to include these new guides.

Changes

Cohort / File(s) Summary
Multimodal Backend Guides
docs/backends/sglang/multimodal_sglang_guide.md, docs/backends/trtllm/multimodal_trtllm_guide.md, docs/backends/vllm/multimodal_vllm_guide.md
Adds three new documentation files covering multimodal support architecture, deployment patterns (aggregated/disaggregated modes), component responsibilities, NATS/NIXL inter-component communication, vision encoding specifics, model registration patterns, supported models, and comprehensive gaps/limitations documentation for each backend.
Documentation Registry
docs/hidden_toctree.rst
Registers four new documentation entries in the hidden toctree: three multimodal backend guides (TRT-LLM, SGLang, vLLM) and one vLLM LMCache integration guide.
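The input formats these guides document follow the OpenAI chat-completions shape, where message content is a list of typed parts. A minimal sketch of such a multimodal request payload in Python (the model name and image URL are placeholders, not values taken from the guides):

```python
import json

# Hypothetical OpenAI-style multimodal chat request. The content array mixes
# a text part with an image_url part; the frontend routes such requests
# through the multimodal workflow rather than the text-only path.
payload = {
    "model": "Qwen/Qwen2.5-VL-7B-Instruct",  # placeholder model name
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "http://example.com/cat.png"},
                },
            ],
        }
    ],
    "max_tokens": 128,
}

# Serialize as it would be sent in the HTTP request body.
body = json.dumps(payload)
```

The presence of a non-text content part is what distinguishes this from a plain text request at the preprocessor level.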

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Specific areas requiring attention:
    • Architectural consistency across the three backend guides (ensure deployment pattern descriptions align conceptually)
    • Accuracy of component responsibility descriptions and command-line flags
    • Completeness and correctness of gaps/limitations sections, particularly regarding current vs. planned support
    • Technical accuracy of NATS/NIXL communication patterns and token expansion mechanics described in each guide
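One mechanic flagged above, token expansion, can be sketched as follows: the tokenized prompt's single image placeholder is expanded into N positions that the vision embeddings later occupy, so token count and embedding count line up. The sentinel ID and expansion count below are illustrative, not any backend's actual values:

```python
from typing import List

IMAGE_PLACEHOLDER = -200   # illustrative sentinel for the <image> token
NUM_IMAGE_TOKENS = 576     # illustrative patch count (e.g. a 24x24 grid)

def expand_image_tokens(token_ids: List[int]) -> List[int]:
    """Replace each image placeholder with NUM_IMAGE_TOKENS copies so the
    prompt length matches the number of vision embeddings injected later."""
    expanded: List[int] = []
    for tok in token_ids:
        if tok == IMAGE_PLACEHOLDER:
            expanded.extend([IMAGE_PLACEHOLDER] * NUM_IMAGE_TOKENS)
        else:
            expanded.append(tok)
    return expanded

# 4 text tokens + 1 placeholder -> 4 text tokens + 576 image slots
prompt = [101, IMAGE_PLACEHOLDER, 2054, 2003, 102]
expanded = expand_image_tokens(prompt)
```

A mismatch between the expansion count and the encoder's actual output length is exactly the kind of hard-to-diagnose error the sglang guide's limitations section mentions.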


Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'docs: Add multimodal documentation vllm, sglang, and trtllm backends' directly and specifically describes the main change: adding multimodal documentation guides for three backends.
Description check ✅ Passed The PR description covers the required template sections: Overview (clear purpose), Details (specific files added), but lacks 'Where should the reviewer start?' and 'Related Issues' sections.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

📜 Review details

Configuration used: Path: .coderabbit.yaml


📥 Commits

Reviewing files that changed from the base of the PR and between b8c4a5f and 81402c2.

📒 Files selected for processing (4)
  • docs/backends/sglang/multimodal_sglang_guide.md (1 hunks)
  • docs/backends/trtllm/multimodal_trtllm_guide.md (1 hunks)
  • docs/backends/vllm/multimodal_vllm_guide.md (1 hunks)
  • docs/hidden_toctree.rst (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.

Applied to files:

  • docs/backends/trtllm/multimodal_trtllm_guide.md
  • docs/backends/vllm/multimodal_vllm_guide.md
  • docs/backends/sglang/multimodal_sglang_guide.md
📚 Learning: 2025-10-28T05:48:37.621Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_utils/model.py:39-42
Timestamp: 2025-10-28T05:48:37.621Z
Learning: In components/src/dynamo/vllm/multimodal_utils/model.py, the AutoModel.from_pretrained call with trust_remote_code=True in the load_vision_model function is intentional and expected for the vLLM multimodal implementation.

Applied to files:

  • docs/backends/trtllm/multimodal_trtllm_guide.md
🪛 LanguageTool
docs/backends/trtllm/multimodal_trtllm_guide.md

[uncategorized] ~175-~175: Do not mix variants of the same word (‘precompute’ and ‘pre-compute’) within a single text.
Context: ...dding path | No | | Encode → Prefill (Precomputed Embeddings) | NIXL metadata (pre-comp...

(EN_WORD_COHERENCY)


[uncategorized] ~186-~186: Do not mix variants of the same word (‘precompute’ and ‘pre-compute’) within a single text.
Context: ...UCX or NIXL) | | E->P->D Disaggregated (Precomputed Embeddings) | [`examples/backends/trtll...

(EN_WORD_COHERENCY)

docs/backends/sglang/multimodal_sglang_guide.md

[style] ~464-~464: To elevate your writing, try using a synonym here.
Context: ...ng difficulty**: Token count mismatches hard to diagnose Comparison with vLLM: ...

(HARD_TO)

🪛 markdownlint-cli2 (0.18.1)

docs/backends/trtllm/multimodal_trtllm_guide.md

  • MD040 (fenced-code-language), "Fenced code blocks should have a language specified": lines 33, 62, 86, 115
  • MD036 (no-emphasis-as-heading), "Emphasis used instead of a heading": lines 224, 235
  • MD024 (no-duplicate-heading), "Multiple headings with the same content": line 295

docs/backends/vllm/multimodal_vllm_guide.md

  • MD040 (fenced-code-language), "Fenced code blocks should have a language specified": lines 35, 51, 84, 112, 177, 183, 198

docs/backends/sglang/multimodal_sglang_guide.md

  • MD040 (fenced-code-language), "Fenced code blocks should have a language specified": lines 35, 51, 84, 112, 177, 183, 198, 222, 375
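Every MD040 finding has the same fix: add a language to the opening fence. A sketch (the block contents here are illustrative, not taken from the guides):

````markdown
Before, flagged by MD040:

```
curl http://localhost:8000/v1/chat/completions
```

After, with a language on the fence:

```bash
curl http://localhost:8000/v1/chat/completions
```
````

Besides silencing the lint, the language tag enables syntax highlighting in the rendered docs.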

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: Mirror Repository to GitLab
🔇 Additional comments (2)
docs/hidden_toctree.rst (2)

78-78: Verify LMCache_Integration.md file exists.

Line 78 adds backends/vllm/LMCache_Integration.md to the toctree, but this file was not included in the PR files for review. Confirm that this file exists or is intended to be added in this PR.


54-54: Toctree entries are well-structured.

The four new toctree entries (lines 54, 65, 78, 81) are correctly formatted and logically placed within their respective backend sections. They follow the existing naming and ordering conventions.

Also applies to: 65-65, 78-78, 81-81

@krishung5

Originally we had two TRT-LLM multimodal docs, multimodal_support.md and multimodal_epd.md. I think it would be confusing to have three multimodal docs in total (including this new guide), so I consolidated the first two into multimodal_support.md.

@krishung5

/ok to test b2870e2

@krishung5 krishung5 requested a review from rmccorm4 December 6, 2025 06:50

@rmccorm4 rmccorm4 left a comment


Thanks for picking this up Kris!

  1. Only one small comment left for this PR: #4510 (comment)

  2. For a future PR maybe, I think it's a little confusing to users that sglang has a multimodal_epd.md but vllm/trtllm describe it in separate doc. Maybe we can move sglang EPD info into single doc, or split out the other backend EPDs to separate doc -- either way is fine, just to be consistent for all 3.

@krishung5
Copy link
Contributor

> For a future PR maybe, I think it's a little confusing to users that sglang has a multimodal_epd.md but vllm/trtllm describe it in separate doc. Maybe we can move sglang EPD info into single doc, or split out the other backend EPDs to separate doc -- either way is fine, just to be consistent for all 3.

I think we could maybe just rename the sglang multimodal_epd.md to multimodal.md. For the other two backends, there's only one multimodal doc (we'd need to rename multimodal_support.md to multimodal.md for TRTLLM as well to have consistent naming across all the backends). Sglang didn't have the non-EPD workflow, which is why that doc was named with the epd keyword.

@krishung5 krishung5 merged commit 94d145a into main Dec 9, 2025
26 of 27 checks passed
@krishung5 krishung5 deleted the mm_docs branch December 9, 2025 15:38
karen-sy pushed a commit that referenced this pull request Dec 9, 2025
ryanolson added a commit that referenced this pull request Dec 10, 2025
Merged 238 commits from main branch to bring the feature branch up to date.

Key conflicts resolved:
- Removed lib/kvbm-kernels references (deleted in main)
- Kept nova/nova-backend/kvbm workspace members from feature branch
- Maintained v2 module API refactoring from feature branch
- Updated Cargo.lock files to reflect new dependencies

Major updates from main include:
- LoRA support for vLLM (#4810)
- Multimodal documentation (#4510)
- Scaling adapter features (#4699, #4825)
- Tool calling support (#4822, #4722)
- NIXL connect improvements (#4433)

Signed-off-by: Ryan Olson <[email protected]>
zxue2 pushed a commit to zxue2/dynamo that referenced this pull request Dec 11, 2025