fix: Support for msg[content] as a list #4380
Conversation
Implement placeholder replacement for non-standard Jinja2 tags (e.g., {% generation %})
that minijinja doesn't recognize but vLLM's custom extensions require.
Rust frontend replaces tags with __JINJA_BLOCK_<TAG>__ for validation, Python backend
restores them before vLLM processing. Also fix model card parsing for multimodal models:
make num_hidden_layers optional and handle eos_token_id as single value or array.
Signed-off-by: Keiven Chang <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
Force-pushed from 9267bbf to d5eaec9
Signed-off-by: Krishnan Prashanth <[email protected]>
@coderabbitai review this PR.
@KrishnanPrash: I'll review the changes in this PR now. This PR introduces support for handling message content as both strings and arrays to accommodate different chat template requirements (standard vs. multimodal templates). I'll analyze the implementation of the detection mechanism, normalization logic, and overall approach.

🧠 Learnings used · ✅ Actions performed: review triggered.
@KrishnanPrash: I'll review the PR changes now. Based on the PR objectives, I can see this implements bidirectional content format normalization to handle the different expectations between standard and multimodal chat templates. I'll analyze the implementation details and provide feedback.

🧠 Learnings used · ✅ Actions performed: review triggered.
Walkthrough

Three files in the LLM prompt template system were modified to add runtime template analysis for content-array usage. A new field tracks whether templates require content arrays, a detection mechanism analyzes templates at initialization, and bidirectional content normalization replaces the prior single-pass array-fixing logic.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
🧹 Nitpick comments (2)
lib/llm/src/preprocessor/prompt/template/oai.rs (2)
76-137: Bidirectional normalization logic looks correct and robust

The `may_be_fix_msg_content` implementation handles the key cases cleanly:

- `preserve_arrays = true`: only string `content` fields are converted to `[{"type":"text","text":...}]`; existing arrays (including mixed multimodal) are left untouched.
- `preserve_arrays = false`: only non-empty, text-only arrays are flattened into a `"\n"`-joined string; empty arrays, mixed arrays, and non-text arrays are preserved.
- Non-array `messages` or messages without `content` are passed through unchanged.

This matches the intent for standard vs. multimodal templates and is idempotent for all the covered shapes. The only minor nit is that the function returns a `minijinja::Value` even though it operates entirely on `serde_json::Value` and is immediately re-serialized in callers; if this grows more uses, consider returning `serde_json::Value` instead to avoid double conversions, but it's not required for correctness.
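A minimal Python sketch of the bidirectional behavior described above (the name mirrors the Rust `may_be_fix_msg_content`, which operates on `serde_json::Value`; this sketch only covers the shapes listed):

```python
def maybe_fix_msg_content(messages, preserve_arrays):
    """Normalize each message's 'content' field; returns a new message list.

    preserve_arrays=True:  wrap plain strings as [{"type": "text", "text": ...}]
    preserve_arrays=False: flatten non-empty, text-only arrays into a "\n"-joined
                           string; leave mixed, non-text, and empty arrays untouched.
    """
    fixed = []
    for msg in messages:
        msg = dict(msg)  # shallow copy so the caller's messages are not mutated
        content = msg.get("content")
        if preserve_arrays and isinstance(content, str):
            msg["content"] = [{"type": "text", "text": content}]
        elif (not preserve_arrays
              and isinstance(content, list) and content
              and all(isinstance(p, dict) and p.get("type") == "text" for p in content)):
            msg["content"] = "\n".join(p["text"] for p in content)
        fixed.append(msg)
    return fixed
```

Messages without a `content` field, and arrays containing `image_url` parts, pass through unchanged in both directions, which is what makes the function idempotent over the covered shapes.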
169-177: Minor redundancy in message serialization, but behavior is fine
`NvCreateChatCompletionRequest::messages` now does `to_value(&self.inner.messages)` and then `Value::from_serialize(&messages_json)`, and later `render` converts that `Value` back into `serde_json::Value`. Functionally this is safe and consistent; the extra round trip is just a small overhead and not a blocker. If you touch this again, you could consider reducing the number of conversions, but it's not urgent.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- lib/llm/src/preprocessor/prompt/template.rs (1 hunk)
- lib/llm/src/preprocessor/prompt/template/formatters.rs (3 hunks)
- lib/llm/src/preprocessor/prompt/template/oai.rs (14 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
Applied to files:
- lib/llm/src/preprocessor/prompt/template/formatters.rs
- lib/llm/src/preprocessor/prompt/template/oai.rs
📚 Learning: 2025-09-10T22:32:12.978Z
Learnt from: zhongdaor-nv
Repo: ai-dynamo/dynamo PR: 2999
File: lib/parsers/src/tool_calling/harmony/harmony_parser.rs:250-256
Timestamp: 2025-09-10T22:32:12.978Z
Learning: In lib/parsers/src/tool_calling/harmony/harmony_parser.rs, the team prefers to maintain identical code patterns between parse_tool_calls_harmony and parse_tool_calls_harmony_complete functions, including message.content[0] indexing, to ensure consistency between streaming and complete parser implementations.
Applied to files:
lib/llm/src/preprocessor/prompt/template/oai.rs
🪛 GitHub Actions: Rust pre-merge checks
lib/llm/src/preprocessor/prompt/template/formatters.rs
[error] 1-1: cargo fmt -- --check failed with exit code 1. Code formatting issues detected in the file.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: sglang (arm64)
- GitHub Check: trtllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: sglang (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: vllm (arm64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
lib/llm/src/preprocessor/prompt/template.rs (1)
104-110: New `requires_content_arrays` flag is correctly scoped and integrated

Adding this internal flag to `HfTokenizerConfigJsonFormatter` is consistent with how `formatters.rs::new` now computes and stores template capabilities; no API surface changes or obvious correctness risks here.

lib/llm/src/preprocessor/prompt/template/oai.rs (2)
303-315: Render path correctly drives normalization from `requires_content_arrays`

Using `self.requires_content_arrays` to pick the direction of normalization before `normalize_tool_arguments_in_messages` keeps the template contract centralized:

- Standard templates: text-only arrays are flattened into strings.
- Multimodal templates: string content is wrapped into array form.

This aligns with the detection wiring in `formatters.rs` and keeps the rest of the rendering logic unchanged.
454-983: Test coverage for content normalization is thorough and well targeted

The expanded tests exercise:
- Array→string flattening (single and multiple messages, empty arrays).
- Preservation of mixed and non‑text content arrays.
- Interaction with tool argument normalization in multimodal scenarios.
- String→array conversion and array preservation when `preserve_arrays = true`.

This gives good confidence that the new normalization behavior is stable across both standard and multimodal templates and across legacy tool-calling shapes. No additional cases are strictly required.
lib/llm/src/preprocessor/prompt/template/formatters.rs (2)
74-108: Content-array detection heuristic is reasonable and safely conservative

`detect_content_array_usage`'s strategy of rendering the `default` template once with array content and once with string content, then checking for `"template_test"` in the output, is a pragmatic way to infer whether the template truly depends on array semantics:

- Templates that require `content[0].text` will succeed only for the array case, correctly flagging `requires_content_arrays = true`.
- Templates that are agnostic or support both forms will typically render `"template_test"` in both cases, yielding `false`, which is safe because normalization will then prefer simpler string content.
- Any render or lookup errors degrade to empty output via `unwrap_or_default`, which just biases toward `false` (no arrays required) rather than breaking initialization.

Given this is only used to drive normalization behavior, not the functional correctness of templates themselves, the trade-offs look acceptable.
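The probe strategy can be sketched independently of minijinja; here `render` stands in for rendering the environment's default template, and the probe string and message shapes are simplifications of the Rust code:

```python
PROBE_TEXT = "template_test"

def detect_content_array_usage(render):
    """Return True only when the template renders the probe text for
    array-shaped content but not for plain-string content."""
    array_msgs = [{"role": "user", "content": [{"type": "text", "text": PROBE_TEXT}]}]
    string_msgs = [{"role": "user", "content": PROBE_TEXT}]

    def safe_render(msgs):
        try:
            return render(msgs)
        except Exception:
            return ""  # mirror unwrap_or_default: errors bias toward False

    array_ok = PROBE_TEXT in safe_render(array_msgs)
    string_ok = PROBE_TEXT in safe_render(string_msgs)
    return array_ok and not string_ok
```

A template that indexes `content[0]["text"]` raises on string content, so only the array probe succeeds and the function returns True; a string-oriented or agnostic template renders the probe in both cases and the function conservatively returns False.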
205-219: Integration of detection into formatter initialization looks good

Calling `detect_content_array_usage(&env)` in `HfTokenizerConfigJsonFormatter::new` and storing the result in `requires_content_arrays`, with a one-time `tracing::info!` log, cleanly wires the capability check into the formatter:

- The environment is fully configured (filters, functions, templates) before detection runs, so the probe sees realistic conditions.
- The flag is then consumed by `OAIPromptFormatter::render` to choose the normalization direction, without further branching here.
- Defaulting `supports_add_generation_prompt` and this new flag at initialization keeps runtime rendering cheap.

This integration is straightforward and fits the existing initialization pattern.
Signed-off-by: Krishnan Prashanth <[email protected]>
Can this be an independent fix targeting main instead of keivenchang/MDC-fix-on-main-nvbugs5662072? Or does it need to target Keiven's branch?
If we merge Keiven's PR (without this one), special models (with custom block tags) would still be usable with dynamo+vLLM, but could fail if their inference requests are malformed (msg[content] is a string, but the model chat template wants a list). I will leave it up to you which branch this fix should target. I guess it's more just a question of model support.
// Detect at model load time whether this template requires content arrays
let requires_content_arrays = detect_content_array_usage(&env);
How does vllm-serve, sglang, trtllm-serve, etc. solve this problem?
From previous investigations, vLLM's pre-processor does something like:

- Feeds the chat template to Jinja2, which internally parses it and generates an AST.
- Relies on Jinja2's internal representations to check whether a for loop iterates over messages[content].
- If yes, leaves messages[content] as an array.
- If no, flattens messages[content] to a string. (Similar to what we do in feat: Convert message[content] from list to string. #3067)

And from my limited investigation into MiniJinja, it does not expose anything similar to that.
// Detect at model load time whether this template requires content arrays
let requires_content_arrays = detect_content_array_usage(&env);

tracing::info!(
info is too noisy for this; maybe use debug or trace instead if we need this log
I was really sad to learn from Prashanth that the JSON formats are not versioned (or don't specify versions), and some model makers (Mistral) don't even publish formal specs. So I guess the best thing to do is to keep writing code to detect changes in the JSON.

That said, the templates will keep changing every once in a while, even many times throughout the year. It would be nice if we (Dynamo) documented all the different types of templates that we can currently handle (and future ones we are working on). It would help programmers understand the different formats, and it would get picked up by Cursor/Claude to write code supporting these examples. Can you add something like this in a *.md file? It'll help you (and AI) tremendously.
- Add remove_known_non_jinja2_tags to strip {% generation %} tags before minijinja validation
- Fixes LLaVA and other models using vLLM-specific template extensions
- Make num_hidden_layers optional in model card parsing for multimodal compatibility
- Handle eos_token_id as single value or array in generation_config.json
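The last bullet is the kind of shape-tolerant parsing involved; a sketch of accepting eos_token_id as either a single id or a list from generation_config.json (the helper name is hypothetical, not the actual Rust code):

```python
def normalize_eos_token_ids(raw):
    """Accept eos_token_id as a single int or a list of ints; return a list.

    generation_config.json files ship both shapes, e.g. 2 or [2, 32000].
    """
    if raw is None:
        return []
    if isinstance(raw, list):
        return [int(t) for t in raw]
    return [int(raw)]
```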
Signed-off-by: Keiven Chang <[email protected]>
Force-pushed from 80d9922 to ef0165f
…nth/msg_content_list_or_string Signed-off-by: KrishnanPrash <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
# Restore all __JINJA_BLOCK_<TAG>__ placeholders back to {% tag %}
chat_template = re.sub(
    r"__JINJA_BLOCK_([A-Z_]+)__",
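For context, the mask/restore round trip described in the PR (Rust frontend replaces tags with __JINJA_BLOCK_<TAG>__ placeholders for validation, Python backend restores them before vLLM processing) can be sketched as follows; the tag list and helper names are illustrative, not the actual implementation:

```python
import re

# Illustrative tag list; the real set comes from remove_known_non_jinja2_tags
KNOWN_NON_JINJA2_TAGS = ["generation", "endgeneration"]

def mask_non_jinja_tags(template):
    """Replace {% tag %} blocks that minijinja cannot parse with placeholders."""
    for tag in KNOWN_NON_JINJA2_TAGS:
        pattern = r"\{%\s*" + re.escape(tag) + r"\s*%\}"
        template = re.sub(pattern, "__JINJA_BLOCK_" + tag.upper() + "__", template)
    return template

def restore_non_jinja_tags(template):
    """Restore all __JINJA_BLOCK_<TAG>__ placeholders back to {% tag %}."""
    return re.sub(r"__JINJA_BLOCK_([A-Z_]+)__",
                  lambda m: "{% " + m.group(1).lower() + " %}", template)
```

Because the placeholder encodes the original tag name, the restore step needs no knowledge of the tag list, which is why a single regex suffices on the Python side.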
Is this still needed? I feel like this came from Keiven's branch and later got changed?
Signed-off-by: KrishnanPrash <[email protected]>
Overview:
Chat templates have conflicting expectations for message content format:
- Standard templates expect a plain string: "Hello"
- Multimodal templates expect an array: [{"type": "text", "text": "Hello"}]

When the wrong format is provided, content goes missing or renders as malformed in prompts. The fix normalizes content in whichever direction the template requires:

- Standard templates: text-only arrays are flattened into a newline-joined string (e.g. "t1\nt2")
- Multimodal templates: string content is wrapped into array form ([{"type": "text", "text": "..."}])

Details:

- Add detect_content_array_usage() in formatters.rs
- Add a requires_content_arrays field to HfTokenizerConfigJsonFormatter
- Make may_be_fix_msg_content() bidirectional with a preserve_arrays parameter

Related PRs: