fix: Fix CUDA_VISIBLE_DEVICES for agg_multimodal_epd.sh #4888
Walkthrough
A shell script configuration for a multimodal vLLM backend was modified to assign a worker process to a different GPU device (from GPU 0 to GPU 1) in the multi-device setup.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
Pre-merge checks
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 0
🧹 Nitpick comments (1)
examples/backends/vllm/launch/agg_multimodal_epd.sh (1)
82-83: Consider documenting GPU allocation strategy. The GPU assignment (Encode Worker on GPU 0, PD Worker on GPU 1) is a key deployment detail that should be more explicitly documented in the script header or comments to help users understand the multi-GPU requirement and potential scaling implications.
Consider adding a brief comment above the worker launch section to explain the GPU allocation:
```diff
 # run E/P/D workers
+# Note: Encode and PD workers are assigned to separate GPUs (0 and 1) for independent scaling
 CUDA_VISIBLE_DEVICES=0 python -m dynamo.vllm --multimodal-encode-worker --enable-multimodal --model $MODEL_NAME &
 CUDA_VISIBLE_DEVICES=1 python -m dynamo.vllm --multimodal-worker --enable-multimodal --enable-mm-embeds --model $MODEL_NAME $EXTRA_ARGS &
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/backends/vllm/launch/agg_multimodal_epd.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
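For illustration, a minimal sketch of the install pattern this learning refers to (the bare `vllm` package spec is an assumption; the actual command in container/deps/vllm/install_vllm.sh may pin versions and index URLs differently):

```bash
# Hypothetical illustration of the pattern described in the learning above.
# --torch-backend=auto asks the installer to select a PyTorch build that
# matches the container's CUDA version rather than the default wheel.
uv pip install vllm --torch-backend=auto
```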
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: operator (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
examples/backends/vllm/launch/agg_multimodal_epd.sh (1)
82-83: GPU allocation fix is correct, but verify multi-GPU availability. The change properly separates the Encode Worker (GPU 0) and PD Worker (GPU 1) to prevent resource contention, which aligns with the EPD architecture's design goals. However, the script assumes at least 2 GPUs are available without validation.
Verify the following:
- Is multi-GPU availability documented as a system requirement for this deployment?
- Should the script include error handling or validation to check GPU availability before launching workers?
- Are there edge cases (e.g., single-GPU systems) that should be explicitly addressed or documented?
You may want to check if there are related deployment docs or runtime guards elsewhere in the codebase that handle GPU validation.
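If a guard were added directly in the script, a minimal sketch could look like the following (purely illustrative; no such check exists in the script today, and the exact detection method is an assumption):

```bash
# Hypothetical pre-launch guard: abort if fewer than 2 GPUs are visible,
# since the Encode Worker and PD Worker are pinned to GPUs 0 and 1.
REQUIRED_GPUS=2
AVAILABLE_GPUS=$(nvidia-smi --list-gpus | wc -l)
if [ "${AVAILABLE_GPUS}" -lt "${REQUIRED_GPUS}" ]; then
  echo "Error: ${REQUIRED_GPUS} GPUs required (workers pinned to GPUs 0 and 1), but only ${AVAILABLE_GPUS} detected." >&2
  exit 1
fi
```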
Overview:
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)