fix: Fix CUDA_VISIBLE_DEVICES for agg_multimodal_epd.sh #4888
Walkthrough
A shell script configuration for a multimodal vLLM backend was modified to assign a worker process to a different GPU device (from GPU 0 to GPU 1) in the multi-device setup.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 0
🧹 Nitpick comments (1)
examples/backends/vllm/launch/agg_multimodal_epd.sh (1)
82-83: Consider documenting GPU allocation strategy. The GPU assignment (Encode Worker on GPU 0, PD Worker on GPU 1) is a key deployment detail that should be more explicitly documented in the script header or comments to help users understand the multi-GPU requirement and potential scaling implications.
Consider adding a brief comment above the worker launch section to explain the GPU allocation:
```diff
 # run E/P/D workers
+# Note: Encode and PD workers are assigned to separate GPUs (0 and 1) for independent scaling
 CUDA_VISIBLE_DEVICES=0 python -m dynamo.vllm --multimodal-encode-worker --enable-multimodal --model $MODEL_NAME &
 CUDA_VISIBLE_DEVICES=1 python -m dynamo.vllm --multimodal-worker --enable-multimodal --enable-mm-embeds --model $MODEL_NAME $EXTRA_ARGS &
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/backends/vllm/launch/agg_multimodal_epd.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
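For illustration, a minimal sketch of the install pattern this learning refers to (the bare `vllm` package spec is an assumption; the actual command in container/deps/vllm/install_vllm.sh may pin versions and index URLs differently):

```bash
# Hypothetical illustration of the pattern described in the learning above.
# --torch-backend=auto asks the installer to select a PyTorch build that
# matches the container's CUDA version rather than the default wheel.
uv pip install vllm --torch-backend=auto
```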
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: operator (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
examples/backends/vllm/launch/agg_multimodal_epd.sh (1)
82-83: GPU allocation fix is correct, but verify multi-GPU availability. The change properly separates the Encode Worker (GPU 0) and PD Worker (GPU 1) to prevent resource contention, which aligns with the EPD architecture's design goals. However, the script assumes at least 2 GPUs are available without validation.
Verify the following:
- Is multi-GPU availability documented as a system requirement for this deployment?
- Should the script include error handling or validation to check GPU availability before launching workers?
- Are there edge cases (e.g., single-GPU systems) that should be explicitly addressed or documented?
You may want to check if there are related deployment docs or runtime guards elsewhere in the codebase that handle GPU validation.
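If a guard were added directly in the script, a minimal sketch could look like the following (purely illustrative; no such check exists in the script today, and the exact detection method is an assumption):

```bash
# Hypothetical pre-launch guard: abort if fewer than 2 GPUs are visible,
# since the Encode Worker and PD Worker are pinned to GPUs 0 and 1.
REQUIRED_GPUS=2
AVAILABLE_GPUS=$(nvidia-smi --list-gpus | wc -l)
if [ "${AVAILABLE_GPUS}" -lt "${REQUIRED_GPUS}" ]; then
  echo "Error: ${REQUIRED_GPUS} GPUs required (workers pinned to GPUs 0 and 1), but only ${AVAILABLE_GPUS} detected." >&2
  exit 1
fi
```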
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)