refactor cuda graph runner #6770
Signed-off-by: junq <[email protected]>
Caution: Review failed. The pull request is closed.
📝 Walkthrough
Introduces a new CUDA graph execution engine for PyTorch LLMs, integrates it into the existing model engine, updates warmup/forward/cleanup paths to delegate CUDA-graph handling to the engine, removes prior bespoke graph logic, and updates a third-party submodule pointer.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant ModelEngine as PyTorchModelEngine
participant CudaGraphEngine as CUDAGraphModelEngine
participant CUDA as CUDA Graph
Client->>ModelEngine: forward(batch, inputs)
ModelEngine->>CudaGraphEngine: pad_batch(scheduled_requests)
CudaGraphEngine-->>ModelEngine: padded batch (context)
ModelEngine->>CudaGraphEngine: execute(batch, inputs, forward_fn)
alt first run or missing graph
CudaGraphEngine->>CUDA: capture(warmup + forward)
CUDA-->>CudaGraphEngine: graph handle + outputs ref
else replay
CudaGraphEngine->>CUDA: replay()
CUDA-->>CudaGraphEngine: outputs
end
CudaGraphEngine-->>ModelEngine: graph_output or None
alt got graph_output
ModelEngine-->>Client: graph_output
else fallback
ModelEngine->>ModelEngine: eager forward_fn()
ModelEngine-->>Client: eager output
end
sequenceDiagram
participant Control as Control Flow
participant ModelEngine
participant CudaGraphEngine
Control->>ModelEngine: warmup()
ModelEngine->>CudaGraphEngine: execute(warmup batch, inputs, forward_fn)
CudaGraphEngine->>CudaGraphEngine: capture graph for batch size
CudaGraphEngine-->>ModelEngine: output (ignored/validated)
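The control flow in the two diagrams above (pad, capture on first use, replay afterwards, eager fallback, and warmup) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the PR's implementation: `SketchGraphRunner`, `forward`, `warmup`, and the `captures`/`replays` counters are hypothetical names, batches are plain lists, and "capture" simply stores the callable where a real runner would record a CUDA graph (for example with `torch.cuda.CUDAGraph`).

```python
from typing import Callable, Dict, List, Optional

ForwardFn = Callable[[List[int]], List[int]]


class SketchGraphRunner:
    """Pure-Python stand-in for a CUDA graph runner (hypothetical, no GPU needed)."""

    def __init__(self, supported_batch_sizes: List[int]) -> None:
        self.supported_batch_sizes = sorted(supported_batch_sizes)
        self.graphs: Dict[int, ForwardFn] = {}  # one "graph" per batch size
        self.captures = 0
        self.replays = 0

    def pad_batch(self, batch: List[int]) -> Optional[List[int]]:
        """Pad to the smallest supported size; None if no size fits."""
        for size in self.supported_batch_sizes:
            if len(batch) <= size:
                return batch + [0] * (size - len(batch))
        return None

    def execute(self, padded: List[int], forward_fn: ForwardFn) -> Optional[List[int]]:
        """Capture on first use of a batch size, replay afterwards.

        Returns None when the batch size is unsupported, so the caller
        can fall back to eager execution.
        """
        size = len(padded)
        if size not in self.supported_batch_sizes:
            return None
        if size not in self.graphs:
            # Assumption: "capture" here just remembers the callable.
            self.graphs[size] = forward_fn
            self.captures += 1
        else:
            self.replays += 1
        return self.graphs[size](padded)


def forward(runner: SketchGraphRunner, batch: List[int], forward_fn: ForwardFn) -> List[int]:
    """Model-engine side: try the graph path, otherwise run eagerly."""
    padded = runner.pad_batch(batch)
    if padded is not None:
        graph_output = runner.execute(padded, forward_fn)
        if graph_output is not None:
            return graph_output[: len(batch)]  # drop padding rows
    return forward_fn(batch)  # eager fallback


def warmup(runner: SketchGraphRunner, forward_fn: ForwardFn) -> None:
    """Capture one graph per supported batch size; outputs are ignored."""
    for size in runner.supported_batch_sizes:
        runner.execute([0] * size, forward_fn)
```

With `runner = SketchGraphRunner([2, 4])` and a `forward_fn` that doubles each element, a batch of three items is padded to four and captured on first use, a later batch of the same padded size is replayed, and a batch of five items bypasses the runner and runs eagerly.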
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~35 minutes
Summary by CodeRabbit
New Features
Refactor
Chores