refactor: break profile_sla into different files; feat: support vllm_v1 #1588
Walkthrough
The profiling script and its related utilities were refactored for modularity, moving local utility functions and constants into new utility modules. Backend-specific configuration handling was introduced via a dictionary of modifier classes. Plotting and benchmarking logic were externalized, and the main script now selects backend-specific behaviors based on a new argument.
Sequence Diagram(s)
sequenceDiagram
participant User
participant profile_sla.py
participant ConfigModifier
participant Utils
participant GenAIPerf
participant Plot
User->>profile_sla.py: Run with --backend <vllm_v0/vllm_v1>
profile_sla.py->>ConfigModifier: Select config modifier by backend
profile_sla.py->>Utils: Detect GPUs, prepare deployment
profile_sla.py->>ConfigModifier: Convert and set config for profiling
profile_sla.py->>Utils: Launch deployment, wait for server
loop Profiling (Prefill/Decode)
profile_sla.py->>GenAIPerf: Run benchmark (prefill/decode)
GenAIPerf-->>profile_sla.py: Return profiling results
end
profile_sla.py->>Plot: Generate plots from results
profile_sla.py-->>User: Save and report results
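The backend selection step in the walkthrough and diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the class names, the `CONFIG_MODIFIERS` dictionary, and the config keys are assumptions made for the example. Only `--backend`, the `vllm_v0`/`vllm_v1` values, and the method name `set_config_tp_size` appear in the review itself.

```python
import argparse


class VllmV0ConfigModifier:
    """Hypothetical modifier for a vllm_v0-style config (key name is assumed)."""

    @staticmethod
    def set_config_tp_size(config: dict, tp_size: int) -> dict:
        updated = dict(config)  # copy so the caller's config is not mutated
        updated["tensor-parallel-size"] = tp_size
        return updated


class VllmV1ConfigModifier:
    """Hypothetical modifier for a vllm_v1-style config (key name is assumed)."""

    @staticmethod
    def set_config_tp_size(config: dict, tp_size: int) -> dict:
        updated = dict(config)
        updated["tensor_parallel_size"] = tp_size
        return updated


# Dictionary of modifier classes keyed by backend name (assumed layout).
CONFIG_MODIFIERS = {
    "vllm_v0": VllmV0ConfigModifier,
    "vllm_v1": VllmV1ConfigModifier,
}


def parse_backend(argv: list) -> str:
    """Parse the required --backend argument, restricted to known backends."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--backend", required=True, choices=sorted(CONFIG_MODIFIERS))
    return parser.parse_args(argv).backend


def select_config_modifier(backend: str):
    """Look up the modifier class for the chosen backend."""
    return CONFIG_MODIFIERS[backend]
```

Keying the lookup on the same dictionary that feeds `choices` means an unsupported backend is rejected at argument parsing rather than surfacing later as a `KeyError`.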
Actionable comments posted: 7
🔭 Outside diff range comments (2)
benchmarks/profiler/profile_sla.py (2)
125-129: log2(0) crash when no GPU present
Guard against available_gpus == 0 to avoid ValueError.
    if available_gpus == 0:
        logger.error("No GPU detected – cannot profile.")
        exit(1)
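To make the failure mode concrete: `math.log2(0)` raises `ValueError: math domain error`, so any code that derives tensor-parallel sizes from the GPU count needs a guard first. The sketch below is an assumption about how such a helper might look. `candidate_tp_sizes` is a hypothetical name and not a function from the PR, and it raises an exception instead of calling `exit(1)` so that it stays testable.

```python
import math
from typing import List


def candidate_tp_sizes(available_gpus: int) -> List[int]:
    """Return power-of-two TP sizes that fit on the available GPUs.

    Hypothetical helper: guards against zero GPUs before calling log2.
    """
    if available_gpus <= 0:
        # Without this check, math.log2(0) raises ValueError (math domain error).
        raise RuntimeError("No GPU detected; cannot profile.")
    max_exponent = int(math.log2(available_gpus))
    return [2**exp for exp in range(max_exponent + 1)]
```

A caller such as the profiling script could catch the `RuntimeError`, log it, and exit with a non-zero status.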
338-362: tp_size is undefined – should be best_prefill_tp
The selected-prefill interpolation block re-uses tp_size, which is out of scope and will raise NameError.
    -prefill_config = config_modifier.set_config_tp_size(prefill_config, tp_size)
    +prefill_config = config_modifier.set_config_tp_size(
    +    prefill_config, best_prefill_tp
    +)
     …
    -logger.info(f"Starting dynamo serve with TP size {tp_size}...")
    +dynamo_serve_cmd = get_dynamo_serve_cmd(prefill_config_fn)
    +logger.info(f"Starting dynamo serve with TP size {best_prefill_tp}…")
Apply the same substitution for all subsequent references inside this block.
♻️ Duplicate comments (2)
benchmarks/profiler/utils/config.py (1)
218-236: Same undefined-variable issue as above
Replicate the fix for the v1 path.
benchmarks/profiler/profile_sla.py (1)
430-446: Decode interpolation block suffers from the same variable leak
Replace tp_size with best_decode_tp for log messages and throughput calculation.
🧹 Nitpick comments (2)
benchmarks/profiler/utils/genai_perf.py (1)
154-160: Prefer context-manager for Popen
Using with subprocess.Popen(…) as proc: ensures the process is waited on and its pipes are closed even if an exception is raised inside the block.
Also applies to: 187-193, 204-210
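As an illustration of the context-manager form, here is a minimal sketch. `run_and_capture` is a hypothetical helper, not the function in genai_perf.py, and the command it runs in the usage note is a placeholder.

```python
import subprocess
import sys
from typing import List, Tuple


def run_and_capture(cmd: List[str]) -> Tuple[int, str, str]:
    """Run cmd and return (returncode, stdout, stderr).

    On leaving the with-block, Popen closes its pipes and waits for the
    child, so the process is not left unreaped if communicate() raises.
    """
    with subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    ) as proc:
        stdout, stderr = proc.communicate()
    return proc.returncode, stdout, stderr
```

For example, `run_and_capture([sys.executable, "-c", "print('ok')"])` runs a short child Python process and returns its exit code and captured output.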
benchmarks/profiler/utils/plot.py (1)
68-72: Call plt.tight_layout() before saving
This avoids clipped labels when figures are resized.
    +plt.tight_layout()
     plt.savefig(plot_path, dpi=300)
Also applies to: 97-101, 144-170, 218-221
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
benchmarks/profiler/profile_sla.py (12 hunks)
benchmarks/profiler/utils/config.py (1 hunks)
benchmarks/profiler/utils/defaults.py (1 hunks)
benchmarks/profiler/utils/genai_perf.py (1 hunks)
benchmarks/profiler/utils/plot.py (1 hunks)
benchmarks/profiler/utils/utils.py (1 hunks)
components/planner/src/dynamo/planner/defaults.py (1 hunks)
components/planner/src/dynamo/planner/local_connector.py (1 hunks)
docs/architecture/load_planner.md (1 hunks)
docs/architecture/sla_planner.md (1 hunks)
🧰 Additional context used
🪛 Pylint (3.3.7)
benchmarks/profiler/utils/utils.py
[refactor] 85-100: Too many nested blocks (6/5)
(R1702)
components/planner/src/dynamo/planner/defaults.py
[refactor] 49-49: Too few public methods (0/2)
(R0903)
[refactor] 54-54: Too few public methods (0/2)
(R0903)
benchmarks/profiler/utils/config.py
[refactor] 104-107: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 111-114: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 204-207: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 211-214: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
benchmarks/profiler/utils/genai_perf.py
[refactor] 66-66: Too many arguments (6/5)
(R0913)
[refactor] 66-66: Too many positional arguments (6/5)
(R0917)
[refactor] 99-99: Too many arguments (7/5)
(R0913)
[refactor] 99-99: Too many positional arguments (7/5)
(R0917)
[refactor] 161-169: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 154-159: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 172-172: Too many arguments (6/5)
(R0913)
[refactor] 172-172: Too many positional arguments (6/5)
(R0917)
[refactor] 187-192: Consider using 'with' for resource-allocating operations
(R1732)
[refactor] 211-219: Unnecessary "else" after "return", remove the "else" and de-indent the code inside it
(R1705)
[refactor] 204-209: Consider using 'with' for resource-allocating operations
(R1732)
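For readers unfamiliar with the R1705 findings listed above, the pattern Pylint flags and its usual rewrite look roughly like this. The function below is a hypothetical stand-in, not code from config.py or genai_perf.py.

```python
def describe_exit_code(returncode: int) -> str:
    """Summarize a benchmark exit code.

    The flagged shape is:
        if returncode == 0:
            return "benchmark succeeded"
        else:
            return f"benchmark failed with exit code {returncode}"
    Because the if-branch always returns, the else is redundant.
    """
    if returncode == 0:
        return "benchmark succeeded"
    # De-indented: only reached when the early return above did not fire.
    return f"benchmark failed with exit code {returncode}"
```

The behavior is unchanged; the rewrite only removes one nesting level.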
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (4)
benchmarks/profiler/utils/defaults.py (1)
16-31: LGTM! Clean constant definition for decode profiling.
The DECODE_NUM_REQUESTS_RANGE constant is well-structured with reasonable values for profiling decode workloads across different request counts. The modularization of this constant into a dedicated defaults module improves code organization.
components/planner/src/dynamo/planner/defaults.py (1)
49-62: Backend-specific component name mapping looks good.
The design cleanly separates component names by backend version using dedicated classes and a mapping dictionary. This enables runtime selection of appropriate component names based on the backend parameter.
Note: The static analysis warnings about "too few public methods" are false positives - these classes are intentionally designed as data containers for constants, not traditional classes with methods.
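A constants-only container of the kind described here might look like the sketch below. All class names, attribute names, and string values are illustrative assumptions; the review does not show the actual contents of planner/defaults.py.

```python
class VllmV0ComponentNames:  # pylint: disable=too-few-public-methods
    """Hypothetical component names for the vllm_v0 backend."""

    prefill_worker = "PrefillWorker"
    decode_worker = "VllmWorker"


class VllmV1ComponentNames:  # pylint: disable=too-few-public-methods
    """Hypothetical component names for the vllm_v1 backend."""

    prefill_worker = "VllmPrefillWorker"
    decode_worker = "VllmDecodeWorker"


# Mapping used to pick the right names at runtime (assumed name and layout).
WORKER_COMPONENT_NAMES = {
    "vllm_v0": VllmV0ComponentNames,
    "vllm_v1": VllmV1ComponentNames,
}


def decode_worker_name(backend: str) -> str:
    """Return the decode worker component name for the given backend."""
    return WORKER_COMPONENT_NAMES[backend].decode_worker
```

The inline `too-few-public-methods` disable is one way to silence R0903 for classes that intentionally hold only constants.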
docs/architecture/load_planner.md (1)
30-31: Documentation correctly updated to reflect refactoring changes.
The module path change from utils.profile_sla to profile_sla and the addition of the required --backend argument are consistent with the refactoring and backend-specific support introduced in this PR.
components/planner/src/dynamo/planner/local_connector.py (1)
35-42: Backend parameter added but not utilized.
The backend parameter is documented in the constructor but not assigned to any instance variable or used within the constructor body.
Verify if the backend parameter should be stored as an instance variable or used elsewhere:
    #!/bin/bash
    # Description: Check if backend parameter is used in other methods or subclasses
    # Expected: Find usage patterns or confirm it's preparation for future implementation
    # Search for backend usage in LocalConnector and related files
    ast-grep --pattern 'self.backend'
    rg -A 3 -B 3 'backend.*str' components/planner/src/dynamo/planner/
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Hongkuan Zhou <[email protected]>