-
Notifications
You must be signed in to change notification settings - Fork 767
fix: support multimodal models with non-standard Jinja2 tags #4379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe changes improve handling of non-standard Jinja2 templates across multiple components. A regex-based sanitization replaces non-standard tags with placeholders before minijinja validation. Token ID extraction becomes more flexible, handling both single values and arrays. Text config layers become optional for multimodal models. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks✅ Passed checks (3 passed)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
lib/llm/src/preprocessor/prompt/template/formatters.rs (2)
51-51: Consider compiling the regex once for better performance.The regex is compiled on every function call. For a frequently used function, consider using
lazy_staticorstd::sync::OnceLockto compile it once.Apply this pattern:
use std::sync::OnceLock; static BLOCK_REGEX: OnceLock<Regex> = OnceLock::new(); fn replace_non_standard_blocks(template: &str) -> String { let re = BLOCK_REGEX.get_or_init(|| { Regex::new(r"\{%\s*([a-zA-Z_][a-zA-Z0-9_]*)\s*%\}").unwrap() }); // ... rest of the function }
59-63: Note: Case information is lost during round-trip transformation.The placeholder uses
to_uppercase()(line 61) and the Python restoration uses.lower()(chat_processor.py line 157), which means original case is not preserved. For example,{% Generation %}would become{% generation %}after the round-trip.This is acceptable if Jinja2 tag names are conventionally lowercase, but document this behavior if case preservation is ever needed.
components/src/dynamo/vllm/multimodal_utils/chat_processor.py (1)
152-152: Consider moving the import to the top of the file.While importing
reinside the conditional block is functionally correct, Python convention is to place all imports at the module level for better visibility and consistency.Apply this change:
import json +import re import time from typing import AsyncIterator, List, Optional, Protocol, Union, runtime_checkableAnd remove line 152.
lib/llm/src/model_card.rs (1)
673-730: Consider extracting the duplicated eos_token_id parsing logic.The logic to parse
eos_token_idas either a number or array (lines 678-701 and 711-730) is duplicated. This could be refactored into a helper function to improve maintainability.Extract to a helper function:
fn parse_eos_token_id(v: &serde_json::Value) -> Option<Vec<TokenIdType>> { if v.is_number() { v.as_number() .and_then(|n| n.as_u64()) .map(|n| vec![n as TokenIdType]) } else if v.is_array() { let arr = v.as_array().unwrap(); // Safety: We just checked Some( arr.iter() .filter_map(|inner_v| { inner_v .as_number() .and_then(|n| n.as_u64()) .map(|n| n as TokenIdType) }) .collect(), ) } else { None } }Then use it in both places:
let final_eos_token_ids: Vec<TokenIdType> = config .eos_token_id .as_ref() .or(text_config.eos_token_id.as_ref()) .and_then(parse_eos_token_id) .or_else(|| { crate::file_json_field::<serde_json::Value>(&gencfg_path, "eos_token_id") .ok() .and_then(|v| parse_eos_token_id(&v)) }) .ok_or_else(|| { anyhow::anyhow!( "missing eos_token_id in config.json and generation_config.json, cannot load" ) })?;
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (3)
components/src/dynamo/vllm/multimodal_utils/chat_processor.py(1 hunks)lib/llm/src/model_card.rs(2 hunks)lib/llm/src/preprocessor/prompt/template/formatters.rs(3 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 2778
File: components/backends/vllm/src/dynamo/vllm/args.py:150-151
Timestamp: 2025-08-29T12:32:52.257Z
Learning: In the custom_jinja_template implementation, KrishnanPrash chose to defer file validation to maintain consistency with how other template files (chat_template.jinja and tokenizer_config.json) are processed in the template loading pipeline, rather than validating the file path during argument parsing.
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3165
File: components/backends/sglang/src/dynamo/sglang/args.py:201-202
Timestamp: 2025-09-22T18:09:23.513Z
Learning: The Rust validation for custom_jinja_template paths is already implemented in lib/bindings/python/rust/lib.rs using PathBuf::from() and path.exists() checks with PyFileNotFoundError. Both vLLM and SGLang benefit from this validation since they both call register_llm(). The missing piece is path expansion (~ and environment variables) in the Python argument processing before passing to the Rust layer.
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3035
File: lib/runtime/src/metrics/prometheus_names.rs:49-53
Timestamp: 2025-09-16T00:26:37.092Z
Learning: keivenchang prefers consistency in metric naming standardization over strict adherence to Prometheus conventions about gauge vs counter suffixes. When standardizing metrics naming, prioritize consistency across the codebase rather than technical pedantry about individual metric type conventions.
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3051
File: container/templates/Dockerfile.trtllm.j2:424-437
Timestamp: 2025-09-16T17:16:03.785Z
Learning: keivenchang prioritizes maintaining exact backward compatibility during migration/refactoring PRs, even when bugs are identified in the original code. Fixes should be deferred to separate PRs after the migration is complete.
📚 Learning: 2025-09-22T18:09:23.513Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3165
File: components/backends/sglang/src/dynamo/sglang/args.py:201-202
Timestamp: 2025-09-22T18:09:23.513Z
Learning: Both vLLM and SGLang implementations of --custom-jinja-template pass the path directly to register_llm() without any validation. KrishnanPrash suggested implementing early validation in the Rust layer (lib/bindings/python/rust/lib.rs) using PathBuf::from() and path.exists() checks with PyFileNotFoundError for consistent error handling across both backends.
Applied to files:
lib/llm/src/preprocessor/prompt/template/formatters.rs
📚 Learning: 2025-09-22T18:09:23.513Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3165
File: components/backends/sglang/src/dynamo/sglang/args.py:201-202
Timestamp: 2025-09-22T18:09:23.513Z
Learning: KrishnanPrash suggested adding early validation for custom Jinja template paths in the Rust layer (lib/bindings/python/rust/lib.rs) to benefit both vLLM and SGLang workflows, using PathBuf::from() and path.exists() checks with appropriate PyFileNotFoundError handling.
Applied to files:
lib/llm/src/preprocessor/prompt/template/formatters.rs
📚 Learning: 2025-09-22T18:09:23.513Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3165
File: components/backends/sglang/src/dynamo/sglang/args.py:201-202
Timestamp: 2025-09-22T18:09:23.513Z
Learning: The Rust validation for custom_jinja_template paths is already implemented in lib/bindings/python/rust/lib.rs using PathBuf::from() and path.exists() checks with PyFileNotFoundError. Both vLLM and SGLang benefit from this validation since they both call register_llm(). The missing piece is path expansion (~ and environment variables) in the Python argument processing before passing to the Rust layer.
Applied to files:
lib/llm/src/preprocessor/prompt/template/formatters.rs
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.
Applied to files:
lib/llm/src/model_card.rs
🧬 Code graph analysis (1)
lib/llm/src/model_card.rs (1)
lib/llm/src/lib.rs (1)
file_json_field(63-100)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: trtllm (amd64)
- GitHub Check: sglang (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: vllm (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: Build and Test - dynamo
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: clippy (.)
- GitHub Check: clippy (launch/dynamo-run)
- GitHub Check: clippy (lib/bindings/python)
- GitHub Check: tests (.)
- GitHub Check: tests (lib/bindings/python)
- GitHub Check: tests (lib/runtime/examples)
🔇 Additional comments (6)
lib/llm/src/preprocessor/prompt/template/formatters.rs (3)
128-133: LGTM! Appropriate sanitization before template validation.The template is correctly sanitized before being added to the minijinja Environment, allowing validation to pass even with non-standard blocks.
155-157: LGTM! Consistent sanitization applied to all templates.The same sanitization is applied to templates in the map, ensuring consistent handling across different template types.
17-71: Remove review comment; the concern does not reflect actual vLLM template syntax.vLLM's generation block is implemented as
{% generation %} ... {% endgeneration %}without call-time arguments, not with arguments as the review suggests. minijinja does not expose custom block tag extensions in its public API, so the function's purpose is specifically to preprocess templates by removing non-standard blocks to allow minijinja validation of the remaining syntax. The regex correctly matches simple block tags like{% generation %}. Code search found no non-standard blocks with arguments in actual templates.Likely an incorrect or invalid review comment.
lib/llm/src/model_card.rs (3)
705-730: LGTM! Enhanced eos_token_id parsing supports both single values and arrays.The updated logic correctly handles both single integer and array formats for
eos_token_idingeneration_config.json, improving flexibility for different model configurations.
695-700: Good error handling for invalid eos_token_id formats.The error logging when
eos_token_idis neither a number nor an array provides helpful debugging information.
624-625: No issues found — field is not consumed by any code.The
num_hidden_layersfield appears only at its definition and is never accessed anywhere in the codebase. Since no code reads this field, making itOption<usize>has no impact on existing consumers. The change is safe, and deserialization will work correctly as serde handlesOptiontypes natively.
|
Thank you for fixing the EPD flow. Just a few comments, about the other multimodal flow. My mental model for multimodal models in dynamo+vLLM:
If we use the Normal Path in Dynamo + vLLM with this current PR, an inference request like this: curl -X POST http://localhost:8000/v1/chat/completions -H 'Content-Type: application/json' -d '{
"model": "llava-hf/llava-1.5-7b-hf",
"messages": [
{"role": "user", "content": "Who is Jensen Huang?"},
{"role": "assistant", "content": "Answer with clear detail. Do not omit details."}
]
}' | jqAfter the application of the chat template, the rendered prompt will look like this: USER: Who is Jensen Huang? ASSISTANT: __JINJA_BLOCK_GENERATION__Answer with clear detail. Do not omit details. __JINJA_BLOCK_ENDGENERATION__The same cleanup that we do for the epd flow, in my opinion, is needed for the regular flow as well. But instead of re-introducing the custom block tags, we need to remove them all together. This could look something like this: fn render() {
....
let rendered = tmpl.render(&ctx)?;
let cleaned = super::formatters::remove_placeholder_blocks(&rendered);
Ok(cleaned)
}In fn remove_placeholder_blocks(rendered: &str) -> String {
use regex::Regex;
let re = Regex::new(r"__JINJA_BLOCK_[A-Z_]+__").unwrap();
re.replace_all(rendered, "").to_string()
}Or we can make it more specfic to only remove the generation tags? Let me know what you think. |
|
Put up #4380 with some additional changes I needed to get this working. Still need to clean the PR up and do some more testing. Will ping you when it is ready for review. |
|
Does this imply that tokenization is being done by vllm on the backend? If yes, do we need to do anything with the template in the frontend? The frontend becomes a proxy, maybe we can pass the request straight though. To do that call |
|
Can we update the doc here to remove the limitation of llava: https://github.com/ai-dynamo/dynamo/blob/main/docs/backends/vllm/multimodal.md |
GuanLuo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. But as Graham brought it up, I believe the multi-modal path does register the model with ModelInput.Text as it needs to do something with the raw text (unless this part has been ported to the frontend preprocessor), I am curious if the chat template validation is performed, in the frontend, regardless of the model input type.
- Add remove_known_non_jinja2_tags to strip {% generation %} tags before minijinja validation
- Fixes LLaVA and other models using vLLM-specific template extensions
- Make num_hidden_layers optional in model card parsing for multimodal compatibility
- Handle eos_token_id as single value or array in generation_config.json
Signed-off-by: Keiven Chang <[email protected]>
80d9922 to
ef0165f
Compare
KrishnanPrash
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 👍 . Great Work Here 💯
I'm learning this as I'm reading along (@KrishnanPrash can correct me) but in watcher.rs, it looks like:
Can you just arbitrarily register ModelInput type? I thought you need to match it with whatever the model is (e.g. Text if multimodal, otherwise Tokens). @KrishnanPrash? |
Signed-off-by: Keiven Chang <[email protected]>
Copying something from a previous PR comment. Let me know if this addresses your doubts.
|
…mo#4379) Signed-off-by: Keiven Chang <[email protected]>
…mo#4379) Signed-off-by: Keiven Chang <[email protected]>
Overview:
Enables support for multimodal models (like LLaVA) that use non-standard Jinja2 template tags unrecognized by minijinja. Implements a placeholder replacement strategy where the Rust frontend transforms custom tags for validation, and the Python backend restores them before vLLM processing.
Details:
replace_non_standard_blocks()informatters.rsto convert{% generation %}→__JINJA_BLOCK_GENERATION__for minijinja validationchat_processor.pybefore passing to vLLM's custom AssistantTracker extensionnum_hidden_layersoptional in model card parsing for multimodal models that omit it intext_configeos_token_idparsing to handle both single integer values and arrays ingeneration_config.jsonWhere should the reviewer start?
Start with
lib/llm/src/preprocessor/prompt/template/formatters.rsto see the placeholder replacement logic, then reviewcomponents/src/dynamo/vllm/multimodal_utils/chat_processor.pyto see how placeholders are restored before vLLM processing.Before the fix:
After the fix, you can test completions endpoint (proper format for LLaVA)
/coderabbit profile chill
Summary by CodeRabbit
New Features
Bug Fixes