-
Notifications
You must be signed in to change notification settings - Fork 752
fix: use correct prompt template in agg_qwen.yaml #2909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: GuanLuo <[email protected]>
WalkthroughUpdates the Processor's --prompt-template in examples/multimodal/deploy/agg_qwen.yaml from a simple USER/ASSISTANT format to a structured multimodal template including a system instruction and vision tokens (<|vision_start|>, image pad, <|vision_end|>, assistant tag). No other lines in the file are modified. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant U as User
participant P as Processor (agg_qwen)
participant M as Qwen Model
U->>P: Submit prompt + image
Note over P: Build multimodal prompt<br/>- System instruction<br/>- User text<br/>- <|vision_start|> image pad <|vision_end|><br/>- Assistant tag
P->>M: Send formatted prompt
M-->>P: Generate assistant response
P-->>U: Return response
Estimated code review effort🎯 2 (Simple) | ⏱️ ~8 minutes Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
examples/multimodal/deploy/agg_qwen.yaml (2)
68-68: Verify newline handling: '\n' may remain literal with current quoting.The YAML single-quoted scalar plus shell double-quotes typically preserves backslash-n as two characters, not real newlines. If
processor.pyexpects actual line breaks, switch to a YAML block scalar to embed real newlines.Apply:
- - 'python3 components/processor.py --model Qwen/Qwen2.5-VL-7B-Instruct --prompt-template "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|image_pad|><|vision_end|><prompt><|im_end|>\n<|im_start|>assistant\n"' + - >- + python3 components/processor.py --model Qwen/Qwen2.5-VL-7B-Instruct + --prompt-template "<|im_start|>system + You are a helpful assistant.<|im_end|> + <|im_start|>user + <|vision_start|><|image_pad|><|vision_end|><prompt><|im_end|> + <|im_start|>assistant + "Run-time check suggestion:
- Log the received template in
processor.py(repr-style) to confirm embedded line breaks.
68-68: Consider externalizing the prompt template.For readability and easier edits, mount a ConfigMap file and use a flag like
--prompt-template-fileinstead of inlining the entire template.Example change to args (volume/ConfigMap omitted here for brevity):
- - >- - python3 components/processor.py --model Qwen/Qwen2.5-VL-7B-Instruct - --prompt-template "<|im_start|>system - You are a helpful assistant.<|im_end|> - <|im_start|>user - <|vision_start|><|image_pad|><|vision_end|><prompt><|im_end|> - <|im_start|>assistant - " + - python3 components/processor.py --model Qwen/Qwen2.5-VL-7B-Instruct --prompt-template-file /etc/prompts/qwen_vl_chat.tmpl
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
examples/multimodal/deploy/agg_qwen.yaml(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - vllm
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
examples/multimodal/deploy/agg_qwen.yaml (2)
68-68: Template looks correct for Qwen2.5-VL chat format.Roles/tokens and trailing assistant tag are consistent with Qwen’s chat template. Nice catch.
68-68: Confirm image injection path matches tokenized prompt.Using <|vision_start|><|image_pad|><|vision_end|> assumes the runtime consumes text tokens for images; some vLLM paths expect images via a separate multimodal payload. Verify your Processor actually inserts/attaches the image(s) accordingly.
If multiple images are possible, ensure you repeat <|image_pad|> per image or programmatically expand the template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Side question:
Is there a way we can verify such deployments locally using minkube or something?
And do we have any documented steps for the same if above is true?
Probably question for @biswapanda or @atchernych |
Signed-off-by: GuanLuo <[email protected]>
Signed-off-by: GuanLuo <[email protected]>
Signed-off-by: GuanLuo <[email protected]> Signed-off-by: Gavin.Zhu <[email protected]>
Signed-off-by: GuanLuo <[email protected]> Signed-off-by: nnshah1 <[email protected]>
Signed-off-by: GuanLuo <[email protected]> Signed-off-by: nnshah1 <[email protected]>
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit