fix: Pin ARM version for VLLM FlashInfer #2358
Conversation
👋 Hi zaristei! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀
Will try building and verifying that the FlashInfer version is correct on both x86 and ARM.
/ok to test 023bdae |
Walkthrough

The changes update the FlashInfer version reference in the Dockerfile and modify the installation script to always install FlashInfer from the source repository, removing the previous conditional logic for ARM64 architecture. The installation process is now consistent across all architectures.
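For context, a minimal sketch in shell of the simplification described above. The clone/checkout lines mirror the script as quoted later in this review; the final install step is an assumption, not quoted from the script:

```bash
# Before (sketch): ARM64 built from source, x86_64 installed a prebuilt wheel.
# if [ "$(uname -m)" = "aarch64" ]; then
#     ...build FlashInfer from source...
# else
#     uv pip install flashinfer-python==<pinned version>
# fi

# After: a single path for every architecture — always build from source.
cd $INSTALLATION_DIR
git clone https://github.com/flashinfer-ai/flashinfer.git --recursive
cd flashinfer
git checkout $FLASHINF_REF
python -m pip install -v .   # assumed install step; exact command may differ
```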
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
Actionable comments posted: 1
🔭 Outside diff range comments (1)
container/deps/vllm/install_vllm.sh (1)
28-29: Synchronise default FlashInfer ref with Dockerfile

The default still points at v0.2.8rc1, while the Dockerfile now passes v0.2.8.rc1. Keeping these values aligned prevents accidental installs of two different revisions when the script is invoked outside the Docker build.

```diff
-FLASHINF_REF="v0.2.8rc1"
+FLASHINF_REF="v0.2.8.rc1"
```
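As an illustration of the alignment being asked for, a hedged sketch follows; exactly how the Dockerfile plumbs the ref into the script is an assumption:

```bash
# container/deps/vllm/install_vllm.sh (sketch) — default kept in lockstep
# with the value container/Dockerfile.vllm passes (assumed via a build arg):
FLASHINF_REF="v0.2.8.rc1"

# Invoked standalone, outside the Docker build, the script then still
# checks out the same revision the container image uses:
git checkout "$FLASHINF_REF"
```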
♻️ Duplicate comments (1)
container/deps/vllm/install_vllm.sh (1)
170-174: Use shallow clone to speed up FlashInfer source install

Cloning the full repository (~250 MB) noticeably increases build times. A shallow fetch of the required ref is sufficient and aligns with prior feedback in PR #2020.

```diff
-cd $INSTALLATION_DIR
-git clone https://github.com/flashinfer-ai/flashinfer.git --recursive
-cd flashinfer
-git checkout $FLASHINF_REF
+cd $INSTALLATION_DIR
+git clone --depth 1 --branch "$FLASHINF_REF" \
+    https://github.com/flashinfer-ai/flashinfer.git
+cd flashinfer
```
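Two caveats with the suggested diff: `git clone --branch` resolves branch and tag names but not bare commit SHAs, and the suggestion drops the original `--recursive` flag. If `FLASHINF_REF` can be an arbitrary commit, or if FlashInfer's submodules are needed for the build, a shallow fetch along these lines (a sketch, not a tested replacement) covers both:

```bash
# Shallow-fetch an arbitrary ref (tag, branch, or SHA) with submodules.
git init flashinfer && cd flashinfer
git remote add origin https://github.com/flashinfer-ai/flashinfer.git
git fetch --depth 1 origin "$FLASHINF_REF"
git checkout FETCH_HEAD
git submodule update --init --recursive --depth 1
```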
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- container/Dockerfile.vllm (1 hunks)
- container/deps/vllm/install_vllm.sh (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: zaristei
PR: ai-dynamo/dynamo#2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.
Learnt from: dmitry-tokarev-nv
PR: ai-dynamo/dynamo#2300
File: pyproject.toml:64-66
Timestamp: 2025-08-05T22:51:59.230Z
Learning: The ai-dynamo/dynamo project does not ship ARM64 wheels, so platform markers to restrict dependencies to x86_64 are not needed in pyproject.toml dependencies.
📚 Learning: 2025-07-21T00:10:56.947Z
Learnt from: zaristei
PR: ai-dynamo/dynamo#2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.
Applied to files:
- container/Dockerfile.vllm
- container/deps/vllm/install_vllm.sh
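For concreteness, the fail-fast pattern this learning describes might look like the following in shell; the wheel name and version pin are placeholders, not the script's actual values:

```bash
set -euo pipefail

if [ "$(uname -m)" = "aarch64" ]; then
    # Fallback resolution is known broken on ARM64, so abort immediately
    # if the pinned wheel cannot be installed, rather than trying alternatives.
    uv pip install "torch==2.7.0" || {
        echo "Pinned PyTorch wheel failed to install on ARM64; exiting." >&2
        exit 1
    }
fi
```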
📚 Learning: 2025-07-22T10:22:28.972Z
Learnt from: ptarasiewiczNV
PR: ai-dynamo/dynamo#2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
Applied to files:
container/deps/vllm/install_vllm.sh
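A hypothetical invocation matching that learning (the vLLM version pin is a placeholder):

```bash
# The backend flag selects the PyTorch distribution matching the
# container's CUDA stack during the vLLM install.
uv pip install "vllm==0.10.0" --torch-backend=auto
```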
This PR is stale because it has been open 30 days with no activity. Remove the stale label or comment, or this PR will be closed in 5 days.
This PR has been closed due to inactivity. If you believe this PR is still relevant, please feel free to reopen it with additional context or information. |
Overview:
Fixes the unpinned ARM version for FlashInfer in the vLLM container.
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)