Conversation

@KrishnanPrash
Contributor

@KrishnanPrash KrishnanPrash commented Nov 16, 2025

Overview:

Chat templates have conflicting expectations for message content format:

  • Standard templates expect strings: "Hello"
  • Multimodal templates (llava) expect arrays: [{"type": "text", "text": "Hello"}]

When the wrong format is provided, content goes missing or renders as malformed text in the prompt. This PR resolves the mismatch in three steps:

  1. Detection at model load: Test-render the template with both formats to detect requirements
  2. Normalization per request: Convert between formats based on template needs
    • Standard templates: Flatten text-only arrays → strings ("t1\nt2")
    • Multimodal templates: Wrap strings → arrays ([{"type": "text", "text": "..."}])
  3. Smart preservation: Mixed content (text + images) always kept as-is
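The normalization rules above can be sketched in Python (illustrative only; the actual implementation is the Rust `may_be_fix_msg_content()` in `oai.rs`, and the function and behavior shown here are a simplified stand-in):

```python
def normalize_content(message: dict, preserve_arrays: bool) -> dict:
    """Normalize message['content'] between string and array forms."""
    content = message.get("content")
    if preserve_arrays:
        # Multimodal templates: wrap a bare string into a one-element text array.
        if isinstance(content, str):
            return {**message, "content": [{"type": "text", "text": content}]}
    else:
        # Standard templates: flatten non-empty, text-only arrays to a string.
        if (isinstance(content, list) and content
                and all(isinstance(p, dict) and p.get("type") == "text"
                        for p in content)):
            return {**message,
                    "content": "\n".join(p["text"] for p in content)}
    # Mixed content (text + images), empty arrays, and everything else
    # pass through unchanged in both directions.
    return message
```

Note that mixed text+image arrays are preserved regardless of direction, which keeps the function idempotent across both template styles.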

Details:

  • Added detect_content_array_usage() in formatters.rs
  • Added requires_content_arrays field to HfTokenizerConfigJsonFormatter
  • Made may_be_fix_msg_content() bidirectional with preserve_arrays parameter
  • Updated render pipeline to apply normalization automatically

Related PRs:

Summary by CodeRabbit

  • Improvements
    • Enhanced message content formatting to better detect and handle different template styles.
    • Improved flexibility in converting between different message content formats for better template compatibility.

Implement placeholder replacement for non-standard Jinja2 tags (e.g., {% generation %})
that minijinja doesn't recognize but vLLM's custom extensions require.

Rust frontend replaces tags with __JINJA_BLOCK_<TAG>__ for validation, Python backend
restores them before vLLM processing. Also fix model card parsing for multimodal models:
make num_hidden_layers optional and handle eos_token_id as single value or array.
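The mask-and-restore round trip described above can be sketched as follows (a Python sketch under stated assumptions: the tag list, function names, and exact regex are illustrative; only the `__JINJA_BLOCK_<TAG>__` placeholder scheme comes from the description):

```python
import re

# Assumed tag list; the real code targets known vLLM extension tags.
KNOWN_NON_JINJA_TAGS = ["generation", "endgeneration"]

def mask_tags(template: str) -> str:
    """Replace non-standard {% tag %} blocks with placeholders that
    pass through minijinja validation untouched."""
    for tag in KNOWN_NON_JINJA_TAGS:
        template = re.sub(
            r"\{%-?\s*" + tag + r"\s*-?%\}",
            f"__JINJA_BLOCK_{tag.upper()}__",
            template,
        )
    return template

def restore_tags(template: str) -> str:
    """Restore all __JINJA_BLOCK_<TAG>__ placeholders back to {% tag %}."""
    return re.sub(
        r"__JINJA_BLOCK_([A-Z_]+)__",
        lambda m: "{% " + m.group(1).lower() + " %}",
        template,
    )
```

Masking happens in the Rust frontend before validation; restoring happens in the Python backend before handing the template to vLLM.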

Signed-off-by: Keiven Chang <[email protected]>
@KrishnanPrash KrishnanPrash marked this pull request as ready for review November 17, 2025 13:05
@KrishnanPrash
Contributor Author

KrishnanPrash commented Nov 17, 2025

@coderabbitai review this PR.

@coderabbitai
Contributor

coderabbitai bot commented Nov 17, 2025

@KrishnanPrash: I'll review the changes in this PR now. This PR introduces support for handling message content as both strings and arrays to accommodate different chat template requirements (standard vs multimodal templates). I'll analyze the implementation of the detection mechanism, normalization logic, and overall approach.


🧠 Learnings used
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Contributor

coderabbitai bot commented Nov 17, 2025

@KrishnanPrash I'll review the PR changes now. Based on the PR objectives, I can see this implements bidirectional content format normalization to handle the different expectations between standard and multimodal chat templates. I'll analyze the implementation details and provide feedback.


🧠 Learnings used
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Contributor

coderabbitai bot commented Nov 17, 2025

Walkthrough

Three files in the LLM prompt template system were modified to add runtime template analysis for content array usage. A new field tracks whether templates require content arrays, a detection mechanism analyzes templates at initialization, and bidirectional content normalization replaces prior single-pass array fixing logic.

Changes

  • Template Content Array Infrastructure (lib/llm/src/preprocessor/prompt/template.rs, lib/llm/src/preprocessor/prompt/template/formatters.rs): Added requires_content_arrays: bool field to HfTokenizerConfigJsonFormatter. Implemented private detect_content_array_usage() helper that test-renders templates with array-based and string-based message formats to determine content array requirements. Integrated analysis into HfTokenizerConfigJsonFormatter::new() initialization.
  • Bidirectional Content Normalization (lib/llm/src/preprocessor/prompt/template/oai.rs): Updated may_be_fix_msg_content() to accept a preserve_arrays parameter enabling bidirectional normalization: strings convert to single-element text arrays when preserve_arrays=true; text-only arrays flatten to strings when false. Replaced auto-fixing logic with conditional behavior in the OAIPromptFormatter render path using the detected requires_content_arrays flag. Expanded tests for both normalization directions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Examine the detect_content_array_usage() logic to ensure test contexts accurately simulate both template formats
  • Verify the bidirectional normalization in may_be_fix_msg_content() handles edge cases (empty arrays, mixed content types, etc.)
  • Confirm preserve_arrays flag is threaded correctly through the OAIPromptFormatter render path
  • Validate test coverage adequately exercises both normalization directions across multimodal and standard templates

Poem

🐰 With whiskers twitched and nose held high,
I probe your templates as they fly,
Content arrays dance and sway,
Bidirectional, come what may!
No more guessing—detect and know,
What format makes your templates glow!

Pre-merge checks

✅ Passed checks (3 passed)
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient (required threshold: 80.00%).
  • Title check: ✅ Passed. The PR title 'fix: Support for msg[content] as a list' directly describes the main objective: enabling support for message content as both lists and strings through bidirectional normalization.
  • Description check: ✅ Passed. The PR description clearly addresses the problem, implementation approach, and specific file changes. It includes related PRs but lacks a GitHub issue reference.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (2)
lib/llm/src/preprocessor/prompt/template/oai.rs (2)

76-137: Bidirectional normalization logic looks correct and robust

The may_be_fix_msg_content implementation handles the key cases cleanly:

  • preserve_arrays = true: only string content fields are converted to [{"type":"text","text":...}]; existing arrays (including mixed multimodal) are left untouched.
  • preserve_arrays = false: only non‑empty, text‑only arrays are flattened into a "\n"‑joined string; empty arrays, mixed arrays, and non‑text arrays are preserved.
  • Non‑array messages or messages without content are passed through unchanged.

This matches the intent for standard vs multimodal templates and is idempotent for all the covered shapes. The only minor nit is that the function returns a minijinja::Value even though it operates entirely on serde_json::Value and is immediately re‑serialized in callers; if this grows more uses, consider returning serde_json::Value instead to avoid double conversions, but it’s not required for correctness.


169-177: Minor redundancy in message serialization, but behavior is fine

NvCreateChatCompletionRequest::messages now does to_value(&self.inner.messages) and then Value::from_serialize(&messages_json), and later render converts that Value back into serde_json::Value. Functionally this is safe and consistent; the extra round‑trip is just a small overhead and not a blocker. If you touch this again, you could consider reducing the number of conversions, but it’s not urgent.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 879b1d5 and 7e662ce.

📒 Files selected for processing (3)
  • lib/llm/src/preprocessor/prompt/template.rs (1 hunks)
  • lib/llm/src/preprocessor/prompt/template/formatters.rs (3 hunks)
  • lib/llm/src/preprocessor/prompt/template/oai.rs (14 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.

Applied to files:

  • lib/llm/src/preprocessor/prompt/template/formatters.rs
  • lib/llm/src/preprocessor/prompt/template/oai.rs
📚 Learning: 2025-09-10T22:32:12.978Z
Learnt from: zhongdaor-nv
Repo: ai-dynamo/dynamo PR: 2999
File: lib/parsers/src/tool_calling/harmony/harmony_parser.rs:250-256
Timestamp: 2025-09-10T22:32:12.978Z
Learning: In lib/parsers/src/tool_calling/harmony/harmony_parser.rs, the team prefers to maintain identical code patterns between parse_tool_calls_harmony and parse_tool_calls_harmony_complete functions, including message.content[0] indexing, to ensure consistency between streaming and complete parser implementations.

Applied to files:

  • lib/llm/src/preprocessor/prompt/template/oai.rs
🪛 GitHub Actions: Rust pre-merge checks
lib/llm/src/preprocessor/prompt/template/formatters.rs

[error] 1-1: cargo fmt -- --check failed with exit code 1. Code formatting issues detected in the file.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: sglang (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
lib/llm/src/preprocessor/prompt/template.rs (1)

104-110: New requires_content_arrays flag is correctly scoped and integrated

Adding this internal flag to HfTokenizerConfigJsonFormatter is consistent with how formatters.rs::new now computes and stores template capabilities; no API surface changes or obvious correctness risks here.

lib/llm/src/preprocessor/prompt/template/oai.rs (2)

303-315: Render path correctly drives normalization from requires_content_arrays

Using self.requires_content_arrays to pick the direction of normalization before normalize_tool_arguments_in_messages keeps the template contract centralized:

  • Standard templates: text‑only arrays are flattened into strings.
  • Multimodal templates: string content is wrapped into array form.

This aligns with the detection wiring in formatters.rs and keeps the rest of the rendering logic unchanged.


454-983: Test coverage for content normalization is thorough and well targeted

The expanded tests exercise:

  • Array→string flattening (single and multiple messages, empty arrays).
  • Preservation of mixed and non‑text content arrays.
  • Interaction with tool argument normalization in multimodal scenarios.
  • String→array conversion and array preservation when preserve_arrays=true.

This gives good confidence that the new normalization behavior is stable across both standard and multimodal templates and across legacy tool‑calling shapes. No additional cases are strictly required.

lib/llm/src/preprocessor/prompt/template/formatters.rs (2)

74-108: Content‑array detection heuristic is reasonable and safely conservative

detect_content_array_usage’s strategy—rendering the default template once with array content and once with string content, then checking for "template_test"—is a pragmatic way to infer whether the template truly depends on array semantics:

  • Templates that require content[0].text will succeed only for the array case, correctly flagging requires_content_arrays = true.
  • Templates that are agnostic or support both forms will typically render "template_test" in both cases, yielding false, which is safe because normalization will then prefer simpler string content.
  • Any render or lookup errors degrade to empty output via unwrap_or_default, which just biases toward false (no arrays required) rather than breaking initialization.

Given this is only used to drive normalization behavior and not functional correctness of templates themselves, the trade‑offs look acceptable.
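The probe can be expressed language-agnostically as a function over a render callable (a Python sketch; the real detect_content_array_usage() renders via minijinja, and the toy render functions below are hypothetical stand-ins for real templates):

```python
MARKER = "template_test"

def detect_content_array_usage(render) -> bool:
    """Probe: render a marker message in array form and in string form.

    `render` maps a list of messages to a rendered prompt string and may
    raise on template errors; errors degrade to empty output, biasing the
    answer toward False (no arrays required).
    """
    def try_render(messages):
        try:
            return render(messages)
        except Exception:
            return ""

    array_out = try_render(
        [{"role": "user", "content": [{"type": "text", "text": MARKER}]}])
    string_out = try_render([{"role": "user", "content": MARKER}])
    # Arrays are required only when the marker survives the array form
    # but not the plain-string form.
    return MARKER in array_out and MARKER not in string_out

# Toy stand-ins for rendered templates:
def multimodal_render(messages):
    # llava-style: expects content to be a list of {"type", "text"} parts
    return "".join(p["text"] for m in messages for p in m["content"])

def standard_render(messages):
    # plain template: stringifies content as-is
    return "".join(str(m["content"]) for m in messages)
```

The multimodal stand-in raises on string content (flagging arrays as required), while the standard stand-in renders the marker either way (flagging them as not required), mirroring the conservative behavior described above.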


205-219: Integration of detection into formatter initialization looks good

Calling detect_content_array_usage(&env) in HfTokenizerConfigJsonFormatter::new and storing the result in requires_content_arrays, with a one‑time tracing::info! log, cleanly wires the capability check into the formatter:

  • The environment is fully configured (filters, functions, templates) before detection runs, so the probe sees realistic conditions.
  • The flag is then consumed by OAIPromptFormatter::render to choose normalization direction, without further branching here.
  • Defaulting supports_add_generation_prompt and this new flag at initialization keeps runtime rendering cheap.

This integration is straightforward and fits the existing initialization pattern.

Signed-off-by: Krishnan Prashanth <[email protected]>
@KrishnanPrash KrishnanPrash changed the title Working support for msg[content] as a list feat: Support for msg[content] as a list Nov 17, 2025
@github-actions github-actions bot added the feat label Nov 17, 2025
@KrishnanPrash KrishnanPrash changed the title feat: Support for msg[content] as a list fix: Support for msg[content] as a list Nov 17, 2025
@github-actions github-actions bot added fix and removed feat labels Nov 17, 2025
Contributor


Can this be an independent fix targeting main instead of keivenchang/MDC-fix-on-main-nvbugs5662072? Or does it need to target Keiven's branch?

Contributor Author

@KrishnanPrash KrishnanPrash Nov 17, 2025


If we merge Keiven's PR (without this one), special models (with custom block tags) would still be usable with dynamo+vLLM, but could fail if their inference requests are malformed (msg[content] is a string, but the model chat template wants a list). I will leave it up to you on which branch this fix should target. I guess it's more a question of model support.

}

// Detect at model load time whether this template requires content arrays
let requires_content_arrays = detect_content_array_usage(&env);
Contributor


How does vllm-serve, sglang, trtllm-serve, etc. solve this problem?

Contributor Author

@KrishnanPrash KrishnanPrash Nov 17, 2025


From previous investigations, vLLM's pre-processor does something like:

  • Feeds the chat template to Jinja2, which internally parses and generates an AST.
  • Relies on internal representations of Jinja2 to check if a for loop is done over messages[content].
  • If yes, leave messages[content] as an array.
  • If no, flatten messages[content] to a string. (Similar to what we do in feat: Convert message[content] from list to string. #3067)

And from my limited investigation into MiniJinja it does not expose anything similar to that.
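Without AST access, one hedged alternative is a purely textual heuristic over the template source, e.g. scanning for iteration over, or indexing into, message content (illustrative only; this is not the approach the PR takes, and a regex heuristic would misfire on comments or unusual spellings):

```python
import re

def template_iterates_content(template_src: str) -> bool:
    """Heuristic: does the template loop over, or index into,
    message content as an array?"""
    patterns = [
        # {% for item in message.content %} / {% for p in m['content'] %}
        r"\{%-?\s*for\s+\w+\s+in\s+(?:message|m)\s*"
        r"(?:\.content|\[\s*['\"]content['\"]\s*\])",
        # content[0].text style access
        r"content\s*\[\s*0\s*\]",
    ]
    return any(re.search(p, template_src) for p in patterns)
```

By contrast, the test-render probe in this PR exercises the template's actual behavior rather than its surface syntax, which is why it was chosen over source scanning.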

// Detect at model load time whether this template requires content arrays
let requires_content_arrays = detect_content_array_usage(&env);

tracing::info!(
Contributor


info is too noisy for this, maybe debug or trace instead if we need this log

Contributor

@keivenchang keivenchang left a comment


I was really sad to learn from Prashanth that the JSON formats are not versioned (or don't specify versions), and some model makers (Mistral) don't even publish formal specs. So I guess the best we can do is keep writing code to detect changes in the JSON.

That said, the templates will keep changing every once in a while, possibly many times throughout the year. It would be nice if we (Dynamo) documented all the different types of templates that we currently handle (and future ones that we are working on). It would help programmers understand the different formats, and it would get picked up by Cursor/Claude to write code supporting these examples. Can you add something like this in a *.md file? It'll help you (and AI) tremendously.

- Add remove_known_non_jinja2_tags to strip {% generation %} tags before minijinja validation
- Fixes LLaVA and other models using vLLM-specific template extensions
- Make num_hidden_layers optional in model card parsing for multimodal compatibility
- Handle eos_token_id as single value or array in generation_config.json
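The eos_token_id change can be sketched as a small normalization step (a Python sketch, assuming the field shapes named in the bullet above; the function name is illustrative and the real handling lives in the Rust model card parser):

```python
def normalize_eos_token_ids(value) -> list:
    """Accept eos_token_id from generation_config.json as a single int
    or a list of ints; always return a list."""
    if value is None:
        return []
    if isinstance(value, int):
        return [value]
    if isinstance(value, list):
        return [int(v) for v in value]
    raise TypeError(f"unexpected eos_token_id: {value!r}")
```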

Signed-off-by: Keiven Chang <[email protected]>
@keivenchang keivenchang force-pushed the keivenchang/MDC-fix-on-main-nvbugs5662072 branch from 80d9922 to ef0165f Compare November 18, 2025 03:57
@keivenchang keivenchang requested review from a team as code owners November 18, 2025 03:57
…nth/msg_content_list_or_string

Signed-off-by: KrishnanPrash <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
@KrishnanPrash KrishnanPrash changed the base branch from keivenchang/MDC-fix-on-main-nvbugs5662072 to main November 19, 2025 17:24
@rmccorm4 rmccorm4 changed the base branch from main to keivenchang/MDC-fix-on-main-nvbugs5662072 November 19, 2025 17:31

# Restore all __JINJA_BLOCK_<TAG>__ placeholders back to {% tag %}
chat_template = re.sub(
    r"__JINJA_BLOCK_([A-Z_]+)__",
    lambda m: "{% " + m.group(1).lower() + " %}",
    chat_template,
)
Contributor


Is this still needed? I feel like this came from Keiven's branch and later got changed?

Base automatically changed from keivenchang/MDC-fix-on-main-nvbugs5662072 to main November 19, 2025 22:55