feat: add kv router to sglang #1605
👋 Hi faradawn! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
Walkthrough
This update introduces runtime metrics publishing and ZMQ KV event publishing capabilities to the SGLang worker, adds a configurable router mode to the frontend, and updates the deployment example documentation to reflect these new features. The Dockerfile and configuration files are also updated to support the new functionality.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Frontend
participant Worker
participant MetricsPublisher
participant ZmqKvEventPublisher
User->>Frontend: Send request (specifying router mode)
Frontend->>Worker: Forward request
Worker->>Worker: _update_metrics()
Worker->>MetricsPublisher: Publish updated metrics
Worker->>ZmqKvEventPublisher: Publish KV events
Worker-->>Frontend: Return response
Frontend-->>User: Return result
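The order of calls in the diagram can be sketched in plain Python. This is only an illustration: `FakeWorker`, `RecordingPublisher`, `frontend_handle`, and every method name other than `_update_metrics` are hypothetical stand-ins, not the actual Dynamo or SGLang classes.

```python
from dataclasses import dataclass, field


@dataclass
class RecordingPublisher:
    """Hypothetical stand-in for a metrics or KV event publisher."""
    published: list = field(default_factory=list)

    def publish(self, payload: dict) -> None:
        # Record the payload instead of sending it anywhere.
        self.published.append(payload)


@dataclass
class FakeWorker:
    """Hypothetical worker that mirrors the call order in the diagram."""
    metrics_publisher: RecordingPublisher
    kv_event_publisher: RecordingPublisher
    requests_seen: int = 0

    def _update_metrics(self) -> None:
        # Step 1: refresh metrics and publish them.
        self.requests_seen += 1
        self.metrics_publisher.publish({"requests_seen": self.requests_seen})

    def generate(self, prompt: str) -> str:
        self._update_metrics()
        # Step 2: publish a KV event (the payload shape is an assumption).
        self.kv_event_publisher.publish({"event": "stored", "prompt_len": len(prompt)})
        # Step 3: return the response to the frontend.
        return f"echo: {prompt}"


def frontend_handle(worker: FakeWorker, prompt: str) -> str:
    """Hypothetical frontend that forwards the request to a worker."""
    return worker.generate(prompt)
```

Calling `frontend_handle(worker, "hi")` leaves one entry in each publisher's `published` list, in the same order as the arrows in the diagram.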
Actionable comments posted: 0
🧹 Nitpick comments (1)
examples/sglang/components/worker.py (1)
68-80: Placeholder metrics need implementation. The TODO comments correctly identify that these metrics are using placeholder/random values. Ensure these are replaced with actual engine metrics before production use.
Would you like me to help identify the correct SGLang engine APIs to retrieve these metrics or open an issue to track this implementation?
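One way to keep placeholder values from being mistaken for real data is to tag them and log a warning whenever they are produced. The sketch below assumes a hypothetical `WorkerMetrics` dataclass and field names; the real metrics publisher in the PR may use different fields.

```python
import logging
import random
from dataclasses import dataclass

logger = logging.getLogger("placeholder_metrics")


@dataclass
class WorkerMetrics:
    """Hypothetical metric fields; the real publisher's fields may differ."""
    active_slots: int
    total_slots: int
    kv_usage: float  # fraction of the KV cache in use, in [0.0, 1.0)
    is_placeholder: bool


def placeholder_metrics(total_slots: int = 8, seed=None) -> WorkerMetrics:
    """Return random stand-in metrics and warn that they are not real."""
    rng = random.Random(seed)
    logger.warning(
        "Publishing placeholder metrics; real engine metrics are not wired up yet."
    )
    return WorkerMetrics(
        active_slots=rng.randint(0, total_slots),
        total_slots=total_slots,
        kv_usage=rng.random(),
        is_placeholder=True,
    )
```

The `is_placeholder` flag is an illustrative design choice: a consumer can drop or down-weight such readings until engine metrics are available.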
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
container/Dockerfile.sglang (1 hunks)
examples/sglang/README.md (1 hunks)
examples/sglang/components/frontend.py (2 hunks)
examples/sglang/components/worker.py (4 hunks)
examples/sglang/configs/agg.yaml (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
examples/sglang/components/worker.py
[error] 89-89: Class 'SGLangWorker' has no 'dynamo_address' member
(E1101)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (5)
container/Dockerfile.sglang (1)
138-139: LGTM! The sglang commit update to include ZMQ KV event publisher aligns perfectly with the PR's objective of integrating KV router functionality.
examples/sglang/configs/agg.yaml (1)
23-23: LGTM! The page-size parameter appropriately configures the KV cache block size for the worker.
examples/sglang/README.md (1)
74-79: LGTM! Clear documentation of the new KV router option with a practical example command.
examples/sglang/components/frontend.py (1)
48-48: LGTM! The configurable router mode with "round-robin" default provides flexibility while maintaining backward compatibility.
Also applies to: 85-86
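A minimal sketch of a config field with a "round-robin" default that is forwarded to a subprocess command line. `FrontendConfig` and `build_command` are hypothetical names, and the `--router-mode` flag spelling is an assumption made for illustration; check the actual `frontend.py` and `dynamo-run` help output for the real argument.

```python
from dataclasses import dataclass


@dataclass
class FrontendConfig:
    """Hypothetical config; only the `router` default comes from the review."""
    router: str = "round-robin"  # default preserves the previous behavior


def build_command(cfg: FrontendConfig) -> list:
    """Build the argument list for the subprocess (flag name is assumed)."""
    return ["dynamo-run", "--router-mode", cfg.router]
```

With no override, `build_command(FrontendConfig())` ends in `"round-robin"`, so existing deployments behave as before; passing `router="kv"` switches modes without touching the rest of the command.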
examples/sglang/components/worker.py (1)
38-44: Well-implemented KV router integration! The metrics publishing and ZMQ KV event publisher setup correctly supports the KV router functionality. The integration with the existing worker lifecycle is clean and follows best practices.
Also applies to: 65-66, 82-84, 91-115, 127-137, 173-173
Walkthrough
This update introduces metrics publishing and ZMQ KV event publishing to the SGLang worker, adds a configurable router mode for the frontend, and updates documentation and configuration files to reflect these new options. The Dockerfile is updated to use a newer SGLang commit supporting these features.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Frontend
participant SGLangWorker
participant MetricsPublisher
participant ZmqKvEventPublisher
User->>Frontend: Send request (with router mode)
Frontend->>SGLangWorker: Forward request
SGLangWorker->>SGLangWorker: _update_metrics()
SGLangWorker->>MetricsPublisher: Publish metrics
SGLangWorker->>ZmqKvEventPublisher: Relay KV events (if any)
SGLangWorker->>Frontend: Return response
Frontend->>User: Return response
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/sglang/configs/agg.yaml (1)
23-23: Configuration looks good. Consider adding documentation.
The page-size parameter correctly configures the KV cache block size for the worker.
Consider adding a comment to document what this parameter controls:
 SGLangWorker:
   model-path: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
   served-model-name: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+  # KV cache block size for memory management
   page-size: 16
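To make the role of a block size concrete, the sketch below splits a token sequence into fixed-size blocks and gives each block a hash that depends on all preceding blocks, so two requests sharing a prefix share leading hashes. The hashing scheme and function names are illustrative assumptions, not the scheme used by SGLang or Dynamo.

```python
import hashlib


def split_into_blocks(token_ids, page_size=16):
    """Return only the full blocks; a trailing partial block is dropped."""
    n_full = len(token_ids) // page_size
    return [token_ids[i * page_size:(i + 1) * page_size] for i in range(n_full)]


def chained_block_hashes(token_ids, page_size=16):
    """Hash each block together with its parent's hash (illustrative scheme)."""
    hashes = []
    parent = b""
    for block in split_into_blocks(token_ids, page_size):
        digest = hashlib.sha256(parent + repr(block).encode()).digest()
        hashes.append(digest.hex())
        parent = digest
    return hashes
```

With `page_size=16`, a 40-token prompt yields two full blocks. A second prompt that shares only the first 16 tokens produces the same first hash and a different second hash, which is the property prefix-aware routing relies on.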
🔇 Additional comments (10)
container/Dockerfile.sglang (1)
138-139: LGTM! Dependency update for KV event publishing support. The SGLang commit update is well-documented and aligns with the PR objectives to integrate KV router functionality.
Please verify that this specific commit is stable and tested:
Is SGLang commit 777688b8929c877e4e28c2eac208d776abe4c3af from PR #68244 stable and ready for production use?
examples/sglang/components/frontend.py (2)
48-48: Router configuration properly implemented. The router field with "round-robin" default provides a clean way to configure routing modes.
85-86: Command line argument passing looks correct. The router mode is properly passed to the dynamo-run subprocess.
examples/sglang/README.md (1)
74-79: Documentation clearly explains KV router usage. The new section properly demonstrates how to enable the KV router mode using the command line flag.
examples/sglang/components/worker.py (6)
38-44: Import additions look good. The new imports for metrics and KV event publishing are properly organized.
65-66: Metrics publisher initialization is correct. The WorkerMetricsPublisher is properly instantiated in the constructor.
82-84: Async endpoint creation pattern looks good. The method properly uses the dynamo context to create the metrics endpoint.
91-115: LLM registration and metrics initialization properly implemented. The addition of kv_cache_block_size parameter and initial metrics publishing are correct. The async task creation for the metrics endpoint is a good pattern.
127-137: ZMQ KV Event Publisher configuration looks correct. The publisher is properly configured with worker ID and KV block size. Keeping a reference to prevent garbage collection is a good practice.
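The point about keeping a reference can be demonstrated with the standard library alone. In the sketch below, `DummyPublisher` is a hypothetical stand-in for a long-lived publisher object; the real class and its constructor arguments may differ. The behavior shown relies on CPython's reference counting.

```python
import gc
import weakref


class DummyPublisher:
    """Hypothetical stand-in for a long-lived event publisher."""

    def __init__(self, worker_id: int, kv_block_size: int):
        self.worker_id = worker_id
        self.kv_block_size = kv_block_size


class WorkerKeepingRef:
    def __init__(self):
        # Storing the publisher on the instance keeps it alive
        # for as long as the worker itself is alive.
        self._publisher = DummyPublisher(worker_id=0, kv_block_size=16)
        self.probe = weakref.ref(self._publisher)


class WorkerDroppingRef:
    def __init__(self):
        # Only a local variable refers to the publisher, so it becomes
        # unreachable as soon as __init__ returns.
        publisher = DummyPublisher(worker_id=0, kv_block_size=16)
        self.probe = weakref.ref(publisher)
```

After construction, `WorkerKeepingRef().probe()` still returns the publisher while the worker is referenced, whereas the probe on a `WorkerDroppingRef` returns `None` once the local goes out of scope.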
173-173: Metrics update on request processing. Calling _update_metrics at the start of request processing ensures metrics are current.
|
Nice @faradawn - this looks great. I see some TODOs around grabbing engine metrics. There are actually a few in-flight PRs for this on our side and the SGLang side. Goal is to merge them in this week. In the meantime - in the README can you go ahead and call out those TODOs in a
Once these are in - I/we can double back and update based on engine metrics. LMK what you think |
|
Hi @ishandhanani, I have added the pending PRs to the README as a note. Let me know if there are additional things to modify! |
Thanks @faradawn. Let's try to make sure we get the metrics in as fast as possible as a follow-up. Right now it's not actually working.
|
Thanks @alec-flowers! I have added logger.warn and mentioned this in the README so that users will know. |
|
Hi @ishandhanani and @alec-flowers, here is the log for kv router: 1) worker score calculation is working, 2) our placeholder warning is shown. Let me know if there is any issue! |
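For readers unfamiliar with what a worker score might look like, here is a rough sketch. The formula, weights, and field names are assumptions made for illustration and are not taken from the Dynamo router; the idea is only that a worker with more cached prefix blocks and lower load scores higher.

```python
def score_worker(overlap_blocks, request_blocks, kv_usage, overlap_weight=1.0):
    """Illustrative score: reward cached prefix blocks, penalize KV cache load."""
    hit_ratio = overlap_blocks / request_blocks if request_blocks else 0.0
    return overlap_weight * hit_ratio - kv_usage


def pick_worker(candidates, request_blocks):
    """Return the id of the highest-scoring worker.

    `candidates` maps a worker id to a dict with the hypothetical keys
    `overlap_blocks` and `kv_usage`.
    """
    return max(
        candidates,
        key=lambda wid: score_worker(
            candidates[wid]["overlap_blocks"],
            request_blocks,
            candidates[wid]["kv_usage"],
        ),
    )
```

Under a formula like this, placeholder load values would distort the ranking, which is why the thread treats real engine metrics as a required follow-up.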
ishandhanani
left a comment
Nice!
|
@ishandhanani and @alec-flowers do you know how to merge? I don't seem to have write access to the repo. Thanks! |
Head branch was pushed to by a user without write access
Overview:
On the SGLang side, kv event emitting has been implemented: sgl-project/sglang#6824
This PR wires up the Dynamo side.
Details:
KV router can be spun up successfully