Conversation

@krishung5 krishung5 commented Dec 11, 2025

Overview:

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Chores
    • Updated the vLLM backend example launch script so the multimodal worker binds to GPU 1 instead of GPU 0, separating it from the encode worker on GPU 0 and avoiding device contention in multi-GPU deployments.

@krishung5 krishung5 requested review from a team as code owners December 11, 2025 08:11
@github-actions github-actions bot added the fix label Dec 11, 2025

coderabbitai bot commented Dec 11, 2025

Walkthrough

A shell script configuration for a multimodal vLLM backend was modified to assign a worker process to a different GPU device (from GPU 0 to GPU 1) in the multi-device setup.

Changes

Cohort: GPU Device Configuration
  • File: examples/backends/vllm/launch/agg_multimodal_epd.sh
  • Change: Multimodal worker process CUDA device binding changed from CUDA_VISIBLE_DEVICES=0 to CUDA_VISIBLE_DEVICES=1

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

  • Single-line configuration change with no logic alterations
  • No dependencies or cross-file impacts to verify

Poem

🐰 A GPU swap, so swift and clean,
From zero to one, the device's scene,
The worker hops to a new home today,
Configuration whispers have paved the way! ✨

Pre-merge checks

❌ Failed checks (1 warning)
  • Description check — ⚠️ Warning
    Explanation: The pull request description contains only template placeholders with no substantive content in any required sections (Overview, Details, Where should the reviewer start, Related Issues).
    Resolution: Fill in all required sections: provide an overview of the change, explain why CUDA_VISIBLE_DEVICES needed adjustment, identify the affected script, and link the actual GitHub issue number.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed
    The title accurately reflects the main change: relocating the multimodal worker from CUDA_VISIBLE_DEVICES=0 to CUDA_VISIBLE_DEVICES=1 in the agg_multimodal_epd.sh script.
  • Docstring Coverage — ✅ Passed
    No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
examples/backends/vllm/launch/agg_multimodal_epd.sh (1)

82-83: Consider documenting GPU allocation strategy.

The GPU assignment (Encode Worker on GPU 0, PD Worker on GPU 1) is a key deployment detail that should be more explicitly documented in the script header or comments to help users understand the multi-GPU requirement and potential scaling implications.

Consider adding a brief comment above the worker launch section to explain the GPU allocation:

 # run E/P/D workers
+# Note: Encode and PD workers are assigned to separate GPUs (0 and 1) for independent scaling
 CUDA_VISIBLE_DEVICES=0 python -m dynamo.vllm --multimodal-encode-worker --enable-multimodal --model $MODEL_NAME &
 CUDA_VISIBLE_DEVICES=1 python -m dynamo.vllm --multimodal-worker --enable-multimodal --enable-mm-embeds --model $MODEL_NAME $EXTRA_ARGS &
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c6b440e and 0cc7bc5.

📒 Files selected for processing (1)
  • examples/backends/vllm/launch/agg_multimodal_epd.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
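As context for the learning above, a hedged example of the install invocation it refers to; the exact command line is an assumption based on the learning text, not taken from this PR:

 uv pip install vllm --torch-backend=auto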
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: operator (amd64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
examples/backends/vllm/launch/agg_multimodal_epd.sh (1)

82-83: GPU allocation fix is correct, but verify multi-GPU availability.

The change properly separates the Encode Worker (GPU 0) and PD Worker (GPU 1) to prevent resource contention, which aligns with the EPD architecture's design goals. However, the script assumes at least 2 GPUs are available without validation.

Verify the following:

  1. Is multi-GPU availability documented as a system requirement for this deployment?
  2. Should the script include error handling or validation to check GPU availability before launching workers?
  3. Are there edge cases (e.g., single-GPU systems) that should be explicitly addressed or documented?

You may want to check if there are related deployment docs or runtime guards elsewhere in the codebase that handle GPU validation; a minimal sketch of such a guard is shown below.
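As one possible shape for the guard mentioned in point 2 above, here is a minimal sketch that could sit near the top of the script. It assumes nvidia-smi is available on the host; the REQUIRED_GPUS variable and the error message are illustrative and not part of the existing script:

 # Hypothetical pre-flight check for agg_multimodal_epd.sh: make sure at least
 # two GPUs are visible before launching the encode worker (GPU 0) and the
 # PD worker (GPU 1). Assumes nvidia-smi is present on the host.
 REQUIRED_GPUS=2
 AVAILABLE_GPUS=$(nvidia-smi --list-gpus | wc -l)

 if [ "${AVAILABLE_GPUS}" -lt "${REQUIRED_GPUS}" ]; then
     echo "Error: this example needs ${REQUIRED_GPUS} GPUs, but only ${AVAILABLE_GPUS} were detected." >&2
     exit 1
 fi

On a single-GPU machine this fails fast with a clear message instead of letting both workers land on the same device.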

@krishung5 krishung5 merged commit 9b7152d into main Dec 11, 2025
30 checks passed
@krishung5 krishung5 deleted the krish/fix-cuda-device branch December 11, 2025 13:44