chore: add agg_qwen.yaml to multimodal deploy #2872

GuanLuo · 2025-09-04T17:48:47Z

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

New Features
- Added a ready-to-deploy multimodal Qwen example using a vLLM backend, enabling image+text prompts and responses.
- Includes coordinated services for frontend, encoding, VLM prefill, and processing, with single-GPU workers and HF token support.
- Provides a default prompt template for multimodal interactions.
Documentation
- Introduces a new example manifest demonstrating end-to-end setup of a multimodal Qwen deployment, helping users quickly spin up and test the workflow.

Signed-off-by: GuanLuo <[email protected]>

coderabbitai · 2025-09-04T17:56:24Z

Walkthrough

Adds a new Kubernetes DynamoGraphDeployment manifest for a multimodal Qwen (Qwen2.5-VL-7B-Instruct) example using the vLLM backend. Defines four services—Frontend, EncodeWorker, VLMWorker, Processor—each in namespace agg-qwen with replicas=1, per-service images, GPU limits for workers, envFromSecret for Hugging Face token, and component-specific commands/args.

Changes

| Cohort / File(s) | Summary |
|---|---|
| New multimodal deployment manifest `examples/multimodal/deploy/agg_qwen.yaml` | Introduces a DynamoGraphDeployment (apiVersion nvidia.com/v1alpha1) named agg-qwen using backendFramework vllm with four services (Frontend, EncodeWorker, VLMWorker, Processor), per-service images, GPU limits for workers, envFromSecret hf-token-secret, workingDir, and component-specific command/args targeting Qwen/Qwen2.5-VL-7B-Instruct. |

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant Frontend as Frontend Service (Pod)
    participant Encode as EncodeWorker (Pod, GPU=1)
    participant VLM as VLMWorker (Pod, GPU=1, prefill)
    participant Proc as Processor (Pod, GPU=1)

    User->>Frontend: Submit multimodal request (text + image)
    Frontend->>Encode: Run encode_worker.py\n--model Qwen2.5-VL-7B-Instruct
    Encode-->>Frontend: Encoded embeddings
    Frontend->>VLM: Run worker.py --worker-type prefill\n--model Qwen2.5-VL-7B-Instruct
    VLM-->>Frontend: Prefill outputs
    Frontend->>Proc: Run processor.py\n--prompt-template "USER: <image> <prompt> ASSISTANT:"
    Proc-->>Frontend: Final response
    Frontend-->>User: Return answer

    note over Encode,VLM: Pods use envFromSecret=hf-token-secret

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

feat: add multimodal k8s deployment manifest #1836 — Adds a similar DynamoGraphDeployment manifest for a multimodal agg-phi3v setup with analogous services and GPU/env configurations.
feat: add multimodal deployment example for llava based on vllm v1 #2628 — Introduces a multimodal agg-llava DynamoGraphDeployment using vLLM with the same service structure and patterns, differing mainly by model selection.

Poem

Thump-thump goes my coder heart,
New pods hop in—each plays a part.
Qwen now listens, sees, and speaks,
GPU carrots for busy peaks.
Secrets tucked, commands aligned—
Deploy, enjoy, results refined.
Happy hops through DAGs entwined! 🥕🐇

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (8)

examples/multimodal/deploy/agg_qwen.yaml (8)

4-8: Confirm CRD scope and set metadata.namespace

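Add metadata.namespace: agg-qwen so the CR is namespaced and grouped with its Pods. If the CRD is cluster-scoped, ignore this. Otherwise, omitting the field does not block apply, but the CR lands in whichever namespace the current kubectl context or -n flag selects, which may not be agg-qwen.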

Apply:

 apiVersion: nvidia.com/v1alpha1
 kind: DynamoGraphDeployment
 metadata:
   name: agg-qwen
+  namespace: agg-qwen

11-18: Frontend image is placeholder; consider pull settings and ports

If this is meant to run as-is, add imagePullPolicy, imagePullSecrets (private registry), and expose containerPort if the Frontend serves traffic.

Apply:

       extraPodSpec:
+        # imagePullSecrets is a Pod-level field, so it sits beside mainContainer
+        # (assuming extraPodSpec accepts Pod-level fields):
+        # imagePullSecrets:
+        #   - name: my-registry-creds
         mainContainer:
           image: my-registry/vllm-runtime:my-tag
+          imagePullPolicy: IfNotPresent
+          ports:
+            - containerPort: 8080

26-35: Prefer exec-form command/args over /bin/sh -c

Exec form improves signal handling and avoids shell-quoting pitfalls.

Apply:

         mainContainer:
           image: my-registry/vllm-runtime:my-tag
           workingDir: /workspace/examples/multimodal
-          command:
-            - /bin/sh
-            - -c
-          args:
-            - python3 components/encode_worker.py --model Qwen/Qwen2.5-VL-7B-Instruct
+          command: ["python3"]
+          args:
+            - components/encode_worker.py
+            - --model
+            - Qwen/Qwen2.5-VL-7B-Instruct

44-51: Use exec-form for VLMWorker too

Avoid /bin/sh -c.

Apply:

         mainContainer:
           image: my-registry/vllm-runtime:my-tag
           workingDir: /workspace/examples/multimodal
-          command:
-            - /bin/sh
-            - -c
-          args:
-            - python3 components/worker.py --model Qwen/Qwen2.5-VL-7B-Instruct --worker-type prefill
+          command: ["python3"]
+          args:
+            - components/worker.py
+            - --model
+            - Qwen/Qwen2.5-VL-7B-Instruct
+            - --worker-type
+            - prefill

60-68: Processor quoting: make prompt-template unambiguous

Exec form avoids shell escaping; preserve literal "\n" by keeping YAML single quotes.

Apply:

         mainContainer:
           image: my-registry/vllm-runtime:my-tag
           workingDir: /workspace/examples/multimodal
-          command:
-            - /bin/sh
-            - -c
-          args:
-            - 'python3 components/processor.py --model Qwen/Qwen2.5-VL-7B-Instruct --prompt-template "USER: <image>\n<prompt> ASSISTANT:"'
+          command: ["python3"]
+          args:
+            - components/processor.py
+            - --model
+            - Qwen/Qwen2.5-VL-7B-Instruct
+            - --prompt-template
+            - 'USER: <image>\n<prompt> ASSISTANT:'
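The exec-form suggestions for the three workers pass each argument to python3 directly instead of handing one string to /bin/sh -c. As a rough check, the sketch below uses Python's shlex module as a stand-in for POSIX shell word splitting (an approximation, not the container runtime itself) and compares the shell-form string with the exec-form list. The helper name shell_argv is hypothetical.

```python
import shlex

# Shell form: one string that the manifest hands to `/bin/sh -c`.
# "\\n" in Python source is a backslash followed by "n", matching the
# two literal characters that the single-quoted YAML scalar contains.
shell_form = (
    "python3 components/processor.py --model Qwen/Qwen2.5-VL-7B-Instruct "
    '--prompt-template "USER: <image>\\n<prompt> ASSISTANT:"'
)

# Exec form: command plus args, one list element per argument.
exec_form = [
    "python3",
    "components/processor.py",
    "--model",
    "Qwen/Qwen2.5-VL-7B-Instruct",
    "--prompt-template",
    "USER: <image>\\n<prompt> ASSISTANT:",
]


def shell_argv(command):
    """Approximate the argv a POSIX shell would build from `command`."""
    return shlex.split(command, posix=True)


if __name__ == "__main__":
    argv = shell_argv(shell_form)
    # Both forms should deliver the same arguments to python3.
    print(argv == exec_form)
    # The template keeps a backslash and an "n", not a newline character.
    print(repr(argv[-1]))
```

In both forms the template reaches processor.py with a literal backslash followed by n. Switching the YAML scalar to double quotes would instead turn `\n` into a real newline before the process ever sees it, which is why the single quotes matter.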

19-20: Secret reference structure: verify CRD expects a string

If the CRD mirrors Kubernetes PodSpec, it might require envFrom with secretRef.name. Confirm the schema; adjust if needed.

Example if standard:

envFrom:
  - secretRef:
      name: hf-token-secret

Also applies to: 36-37, 53-54

11-68: Add basic ops hardening (probes, securityContext, scheduling)

- Liveness/readiness probes for Python services.
- securityContext (runAsNonRoot, allowPrivilegeEscalation: false).
- nodeSelector/tolerations to land on GPU nodes.

Example:

         mainContainer:
           image: my-registry/vllm-runtime:my-tag
+          securityContext:
+            runAsNonRoot: true
+            allowPrivilegeEscalation: false
+          livenessProbe:
+            tcpSocket: { port: 8080 }
+            initialDelaySeconds: 20
+            periodSeconds: 10
+          readinessProbe:
+            tcpSocket: { port: 8080 }
+            initialDelaySeconds: 5
+            periodSeconds: 5
+        nodeSelector:
+          nvidia.com/gpu.present: "true"
+        tolerations:
+          - key: "nvidia.com/gpu"
+            operator: "Exists"
+            effect: "NoSchedule"

Tune ports/handlers per component.

8-13: Namespace existence

Ensure Namespace agg-qwen is created. Provide a companion manifest.

Example:

apiVersion: v1
kind: Namespace
metadata:
  name: agg-qwen

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4581fcc and dfe0b0d.

📒 Files selected for processing (1)

examples/multimodal/deploy/agg_qwen.yaml (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Build and Test - dynamo

🔇 Additional comments (2)

examples/multimodal/deploy/agg_qwen.yaml (2)
8-68: Overall: solid example scaffold

Structure and component split look good; once the above tweaks are applied, this should be a reliable runnable example.

19-26: Add matching resource requests and retain the CRD’s gpu key
The DynamoGraphDeployment CRD schema defines resources.limits.gpu and resources.requests.gpu (it does not recognize nvidia.com/gpu), so don’t rename the key. To avoid BestEffort QoS, add matching requests for GPU, CPU, and memory:
       resources:
         limits:
           gpu: "1"
+        requests:
+          gpu: "1"
+          cpu: "2"
+          memory: "8Gi"
Likely an incorrect or invalid review comment.
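
The QoS remark in this comment can be checked against the standard Kubernetes rules, which consider only cpu and memory; extended resources such as GPUs do not affect the class. The following is a minimal single-container sketch. It assumes the CRD's gpu key ends up as an extended resource and that the operator injects no default cpu or memory values, neither of which is confirmed in this thread. The qos_class helper is hypothetical and compares quantities as plain strings, so "2" and "2000m" would not be treated as equal.

```python
# Only these resources take part in Kubernetes QoS classification.
QOS_RESOURCES = ("cpu", "memory")


def qos_class(requests, limits):
    """Classify a single-container Pod as BestEffort, Burstable, or Guaranteed."""
    req = {k: v for k, v in requests.items() if k in QOS_RESOURCES}
    lim = {k: v for k, v in limits.items() if k in QOS_RESOURCES}

    # No cpu/memory requests or limits at all: BestEffort.
    if not req and not lim:
        return "BestEffort"

    # Guaranteed needs a limit for both cpu and memory, with each request
    # equal to its limit (an unset request defaults to the limit).
    for name in QOS_RESOURCES:
        if name not in lim:
            return "Burstable"
        if req.get(name, lim[name]) != lim[name]:
            return "Burstable"
    return "Guaranteed"


if __name__ == "__main__":
    # Manifest as merged: only a GPU limit.
    print(qos_class({}, {"gpu": "1"}))
    # Reviewer's suggestion: cpu/memory requests without matching limits.
    print(qos_class({"gpu": "1", "cpu": "2", "memory": "8Gi"}, {"gpu": "1"}))
```

Under these assumptions the suggested requests would move the Pod from BestEffort to Burstable; reaching Guaranteed would additionally need cpu and memory limits equal to the requests.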

examples/multimodal/deploy/agg_qwen.yaml

indrajit96

LGTM!

biswapanda

lgtm

chore: add agg_qwen.yaml to multimodal deploy

dfe0b0d

Signed-off-by: GuanLuo <[email protected]>

GuanLuo requested review from hhzhang16, indrajit96, krishung5 and whoisj as code owners September 4, 2025 17:48

pull-request-size bot added the size/M label Sep 4, 2025

github-actions bot added the chore label Sep 4, 2025

coderabbitai bot reviewed Sep 4, 2025

examples/multimodal/deploy/agg_qwen.yaml (resolved)

examples/multimodal/deploy/agg_qwen.yaml (resolved)

indrajit96 approved these changes Sep 4, 2025

biswapanda approved these changes Sep 4, 2025

saturley-hall merged commit c403d18 into main Sep 4, 2025
11 checks passed

saturley-hall deleted the GuanLuo-patch-2 branch September 4, 2025 19:17

GuanLuo added a commit that referenced this pull request Sep 4, 2025

chore: add agg_qwen.yaml to multimodal deploy (#2872)

5aaef62

Signed-off-by: GuanLuo <[email protected]>

saturley-hall pushed a commit that referenced this pull request Sep 4, 2025

chore: add agg_qwen.yaml to multimodal deploy (#2872) (#2880)

cc098d1

Signed-off-by: GuanLuo <[email protected]>

dillon-cullinan pushed a commit that referenced this pull request Sep 5, 2025

chore: add agg_qwen.yaml to multimodal deploy (#2872)

4c311a7

Signed-off-by: GuanLuo <[email protected]>

coderabbitai bot mentioned this pull request Sep 5, 2025

fix: use correct prompt template in agg_qwen.yaml #2909

Merged

nnshah1 pushed a commit that referenced this pull request Sep 8, 2025

chore: add agg_qwen.yaml to multimodal deploy (#2872)

e7b2e56

Signed-off-by: GuanLuo <[email protected]> Signed-off-by: nnshah1 <[email protected]>

coderabbitai bot mentioned this pull request Sep 11, 2025

docs: added example for a frontend shared across multiple models #3008

Merged

5 participants
