
Conversation


@GuanLuo GuanLuo commented Sep 4, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added a ready-to-deploy multimodal Qwen example using a vLLM backend, enabling image+text prompts and responses.
    • Includes coordinated services for frontend, encoding, VLM prefill, and processing, with single-GPU workers and HF token support.
    • Provides a default prompt template for multimodal interactions.
  • Documentation

    • Introduces a new example manifest demonstrating end-to-end setup of a multimodal Qwen deployment, helping users quickly spin up and test the workflow.


coderabbitai bot commented Sep 4, 2025

Walkthrough

Adds a new Kubernetes DynamoGraphDeployment manifest for a multimodal Qwen (Qwen2.5-VL-7B-Instruct) example using the vLLM backend. Defines four services—Frontend, EncodeWorker, VLMWorker, Processor—each in namespace agg-qwen with replicas=1, per-service images, GPU limits for workers, envFromSecret for Hugging Face token, and component-specific commands/args.

Changes

Cohort / File(s): New multimodal deployment manifest (examples/multimodal/deploy/agg_qwen.yaml)
Summary: Introduces a DynamoGraphDeployment (apiVersion nvidia.com/v1alpha1) named agg-qwen using backendFramework vllm with four services (Frontend, EncodeWorker, VLMWorker, Processor), per-service images, GPU limits for workers, envFromSecret hf-token-secret, workingDir, and component-specific command/args targeting Qwen/Qwen2.5-VL-7B-Instruct.
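
For orientation, here is a condensed sketch of how agg_qwen.yaml is laid out, assembled from the fields quoted in the review below. The nesting of the services map, dynamoNamespace, envFromSecret, and resources is an assumption where the review diffs do not show it, and my-registry/vllm-runtime:my-tag is the placeholder image from the PR:

apiVersion: nvidia.com/v1alpha1
kind: DynamoGraphDeployment
metadata:
  name: agg-qwen
spec:
  backendFramework: vllm
  services:                          # assumed key; the review quotes only per-service bodies
    Frontend:
      dynamoNamespace: agg-qwen      # assumed field behind the "namespace agg-qwen" wording above
      replicas: 1
      extraPodSpec:
        mainContainer:
          image: my-registry/vllm-runtime:my-tag
    VLMWorker:
      dynamoNamespace: agg-qwen
      replicas: 1
      envFromSecret: hf-token-secret # plain-string form discussed in the review below
      resources:
        limits:
          gpu: "1"
      extraPodSpec:
        mainContainer:
          image: my-registry/vllm-runtime:my-tag
          workingDir: /workspace/examples/multimodal
          command:
            - /bin/sh
            - -c
          args:
            - python3 components/worker.py --model Qwen/Qwen2.5-VL-7B-Instruct --worker-type prefill
    # EncodeWorker and Processor follow the same shape with their own scripts
    # (components/encode_worker.py and components/processor.py; see the review comments below)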

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant Frontend as Frontend Service (Pod)
    participant Encode as EncodeWorker (Pod, GPU=1)
    participant VLM as VLMWorker (Pod, GPU=1, prefill)
    participant Proc as Processor (Pod, GPU=1)

    User->>Frontend: Submit multimodal request (text + image)
    Frontend->>Encode: Run encode_worker.py\n--model Qwen2.5-VL-7B-Instruct
    Encode-->>Frontend: Encoded embeddings
    Frontend->>VLM: Run worker.py --worker-type prefill\n--model Qwen2.5-VL-7B-Instruct
    VLM-->>Frontend: Prefill outputs
    Frontend->>Proc: Run processor.py\n--prompt-template "USER: <image> <prompt> ASSISTANT:"
    Proc-->>Frontend: Final response
    Frontend-->>User: Return answer

    note over Encode,VLM: Pods use envFromSecret=hf-token-secret

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

Thump-thump goes my coder heart,
New pods hop in—each plays a part.
Qwen now listens, sees, and speaks,
GPU carrots for busy peaks.
Secrets tucked, commands aligned—
Deploy, enjoy, results refined.
Happy hops through DAGs entwined! 🥕🐇




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (8)
examples/multimodal/deploy/agg_qwen.yaml (8)

4-8: Confirm CRD scope and set metadata.namespace

Add metadata.namespace: agg-qwen so the CR is namespaced and grouped with its Pods. If the CRD is cluster-scoped, ignore; otherwise the CR will land in whatever namespace kubectl targets by default rather than agg-qwen.

Apply:

 apiVersion: nvidia.com/v1alpha1
 kind: DynamoGraphDeployment
 metadata:
   name: agg-qwen
+  namespace: agg-qwen

11-18: Frontend image is placeholder; consider pull settings and ports

If this is meant to run as-is, add imagePullPolicy, imagePullSecrets (private registry), and expose containerPort if the Frontend serves traffic.

Apply:

       extraPodSpec:
         mainContainer:
-          image: my-registry/vllm-runtime:my-tag
+          image: my-registry/vllm-runtime:my-tag
+          imagePullPolicy: IfNotPresent
+          ports:
+            - containerPort: 8080
+          # imagePullSecrets:
+          #   - name: my-registry-creds

26-35: Prefer exec-form command/args over /bin/sh -c

Exec form improves signal handling and avoids shell-quoting pitfalls.

Apply:

         mainContainer:
           image: my-registry/vllm-runtime:my-tag
           workingDir: /workspace/examples/multimodal
-          command:
-            - /bin/sh
-            - -c
-          args:
-            - python3 components/encode_worker.py --model Qwen/Qwen2.5-VL-7B-Instruct
+          command: ["python3"]
+          args:
+            - components/encode_worker.py
+            - --model
+            - Qwen/Qwen2.5-VL-7B-Instruct

44-51: Use exec-form for VLMWorker too

Avoid /bin/sh -c.

Apply:

         mainContainer:
           image: my-registry/vllm-runtime:my-tag
           workingDir: /workspace/examples/multimodal
-          command:
-            - /bin/sh
-            - -c
-          args:
-            - python3 components/worker.py --model Qwen/Qwen2.5-VL-7B-Instruct --worker-type prefill
+          command: ["python3"]
+          args:
+            - components/worker.py
+            - --model
+            - Qwen/Qwen2.5-VL-7B-Instruct
+            - --worker-type
+            - prefill

60-68: Processor quoting: make prompt-template unambiguous

Exec form avoids shell escaping; preserve literal "\n" by keeping YAML single quotes.

Apply:

         mainContainer:
           image: my-registry/vllm-runtime:my-tag
           workingDir: /workspace/examples/multimodal
-          command:
-            - /bin/sh
-            - -c
-          args:
-            - 'python3 components/processor.py --model Qwen/Qwen2.5-VL-7B-Instruct --prompt-template "USER: <image>\n<prompt> ASSISTANT:"'
+          command: ["python3"]
+          args:
+            - components/processor.py
+            - --model
+            - Qwen/Qwen2.5-VL-7B-Instruct
+            - --prompt-template
+            - 'USER: <image>\n<prompt> ASSISTANT:'

19-20: Secret reference structure: verify CRD expects a string

If the CRD mirrors Kubernetes PodSpec, it might require envFrom with secretRef.name. Confirm the schema; adjust if needed.

Example if standard:

envFrom:
  - secretRef:
      name: hf-token-secret

Also applies to: 36-37, 53-54
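
For comparison, both shapes side by side; the plain-string form is what the example manifest uses (per the file summary above), and whether the CRD schema actually accepts it is the open question in this comment:

# Shorthand as written in agg_qwen.yaml (valid only if the CRD takes a plain secret name)
envFromSecret: hf-token-secret

# Standard PodSpec form, if the schema mirrors core/v1
envFrom:
  - secretRef:
      name: hf-token-secret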


11-68: Add basic ops hardening (probes, securityContext, scheduling)

  • Liveness/readiness probes for Python services.
  • securityContext (runAsNonRoot, allowPrivilegeEscalation: false).
  • nodeSelector/tolerations to land on GPU nodes.

Example:

         mainContainer:
           image: my-registry/vllm-runtime:my-tag
+          securityContext:
+            runAsNonRoot: true
+            allowPrivilegeEscalation: false
+          livenessProbe:
+            tcpSocket: { port: 8080 }
+            initialDelaySeconds: 20
+            periodSeconds: 10
+          readinessProbe:
+            tcpSocket: { port: 8080 }
+            initialDelaySeconds: 5
+            periodSeconds: 5
+        nodeSelector:
+          nvidia.com/gpu.present: "true"
+        tolerations:
+          - key: "nvidia.com/gpu"
+            operator: "Exists"
+            effect: "NoSchedule"

Tune ports/handlers per component.


8-13: Namespace existence

Ensure Namespace agg-qwen is created. Provide a companion manifest.

Example:

apiVersion: v1
kind: Namespace
metadata:
  name: agg-qwen
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between 4581fcc and dfe0b0d.

📒 Files selected for processing (1)
  • examples/multimodal/deploy/agg_qwen.yaml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
examples/multimodal/deploy/agg_qwen.yaml (2)

8-68: Overall: solid example scaffold

Structure and component split look good; once the above tweaks are applied, this should be a reliable runnable example.


19-26: Add matching resource requests and retain the CRD’s gpu key
The DynamoGraphDeployment CRD schema defines resources.limits.gpu and resources.requests.gpu (it does not recognize nvidia.com/gpu), so don’t rename the key. To avoid BestEffort QoS, add matching requests for GPU, CPU, and memory:

       resources:
         limits:
           gpu: "1"
+        requests:
+          gpu: "1"
+          cpu: "2"
+          memory: "8Gi"

Likely an incorrect or invalid review comment.


@indrajit96 indrajit96 left a comment


LGTM!


@biswapanda biswapanda left a comment


lgtm

@saturley-hall saturley-hall merged commit c403d18 into main Sep 4, 2025
11 checks passed
@saturley-hall saturley-hall deleted the GuanLuo-patch-2 branch September 4, 2025 19:17
GuanLuo added a commit that referenced this pull request Sep 4, 2025
saturley-hall pushed a commit that referenced this pull request Sep 4, 2025
dillon-cullinan pushed a commit that referenced this pull request Sep 5, 2025
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
