feat: Add --use-ai-configurator to profile_sla.py #3079
Conversation
Walkthrough

Introduces AI-configurator-based performance estimation into the profiling flow, adding a new estimator module, pluggable callbacks in prefill/decode utilities, CLI flags to select AI-based paths, and tests validating AI-configurator integration and argument handling. Legacy live-deployment behavior remains unchanged when the AI-configurator option is not used.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor User
    participant CLI as profile_sla CLI
    participant Prefill as profile_prefill*
    participant Decode as profile_decode*
    participant Est as AIConfiguratorPerfEstimator
    participant Bench as Live Benchmark
    rect rgba(230,245,255,0.5)
        Note over CLI: --use-ai-configurator=true
        User->>CLI: run_profile(args)
        CLI->>Est: init(system, backend, version, model)
        CLI->>Prefill: profile_prefill_aiconfigurator(get_ttft via Est)
        Prefill->>Est: estimate_prefill_perf(isl)
        Est-->>Prefill: {context_latency}
        CLI->>Est: get_max_kv_tokens(isl, osl)
        CLI->>Decode: profile_decode_aiconfigurator(get_itl/throughput via Est)
        Decode->>Est: estimate_perf(isl, osl, batch)
        Est-->>Decode: {tpot, tokens/s/gpu}
        Decode-->>CLI: raw_data + plots
        Prefill-->>CLI: raw_data + plots
    end
    rect rgba(240,240,240,0.5)
        Note over CLI: --use-ai-configurator=false
        User->>CLI: run_profile(args)
        CLI->>Prefill: profile_prefill (benchmark)
        Prefill->>Bench: benchmark_prefill(isl)
        Bench-->>Prefill: TTFT
        CLI->>Decode: profile_decode (benchmark)
        Decode->>Bench: benchmark_decode(isl, osl, num_requests)
        Bench-->>Decode: ITL, throughput
        Decode-->>CLI: raw_data + plots
        Prefill-->>CLI: raw_data + plots
    end
    note right of CLI: *Helpers use callbacks to obtain metrics
```
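The diagram's key design point is that the prefill/decode helpers are metric-source-agnostic: the CLI wires in either a live-benchmark callback or an estimator-backed callback. A minimal, self-contained Python sketch of that callback pattern follows; the class and function names here are illustrative stand-ins, not the actual module API.

```python
from typing import Callable, Dict, List


class FakeEstimator:
    """Illustrative stand-in for the AI-configurator-backed estimator."""

    def estimate_prefill_perf(self, isl: int) -> Dict[str, float]:
        # Pretend TTFT scales linearly with input sequence length.
        return {"context_latency": isl / 10}


def profile_prefill_with_callback(
    isl_sweep: List[int], get_ttft: Callable[[int], float]
) -> List[float]:
    # The helper only sweeps; the callback decides whether the metric
    # comes from a live benchmark or from the estimator.
    return [get_ttft(isl) for isl in isl_sweep]


estimator = FakeEstimator()
ttfts = profile_prefill_with_callback(
    [1000, 2000],
    lambda isl: estimator.estimate_prefill_perf(isl)["context_latency"],
)
print(ttfts)  # [100.0, 200.0]
```

The same helper could be driven by a live-benchmark callback with no code changes, which is what the legacy path in the diagram does.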
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (2 passed)
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
benchmarks/profiler/profile_sla.py (2)
399-404: Return early if no prefill results to avoid `min()` on an empty list
When prefill produced no results, the code logs an error but still proceeds to call `min(prefill_ttft)`, which crashes. Exit the recommendations phase early. Apply this diff:

```diff
 else:
     logger.info("Analyzing results and generate recommendations...")
     # Safety guards: no results → exit early with a clear message
     if not (prefill_tp_size and prefill_ttft and prefill_thpt_per_gpu):
         logger.error("No prefill results produced; skipping recommendations.")
+        return
```
461-470: Guard KV-cache utilization division by zero
`decode_kv_cache_size[...]` can be 0 (parser returned 0), causing a crash. Add a safe path. Apply this diff:

```diff
-selected_decode_kv_cache_utilization = (
-    decode_concurrency[selected_decode_idx]
-    * (args.isl + (args.osl / 2))
-    / decode_kv_cache_size[selected_decode_idx]
-)
+denom = decode_kv_cache_size[selected_decode_idx]
+if denom <= 0:
+    logger.warning("KV cache size is non-positive; skipping utilization recommendation.")
+    selected_decode_kv_cache_utilization = 0.0
+else:
+    selected_decode_kv_cache_utilization = (
+        decode_concurrency[selected_decode_idx]
+        * (args.isl + (args.osl / 2))
+        / denom
+    )
```
🧹 Nitpick comments (10)
benchmarks/profiler/utils/estimate_perf.py (1)
157-159: Document backend's private API usage
Using `self.backend._get_memory_usage` relies on a private method (prefixed with underscore). This could break in future aiconfigurator updates. Consider adding a comment acknowledging this dependency:

```diff
 def get_mem_usage(bs: int):
+    # Note: Using private API _get_memory_usage from aiconfigurator backend
     return self.backend._get_memory_usage(
         model, self.database, bs, 1, isl, osl
     )["total"]
```

tests/profiler/test_profile_sla_aiconfigurator.py (2)
20-20: Remove unused `noqa` directive
The `noqa: E402` directive is unnecessary, as E402 is not enabled in your Ruff configuration.

```diff
-from benchmarks.profiler.profile_sla import run_profile  # noqa: E402
+from benchmarks.profiler.profile_sla import run_profile
```
31-31: Consider using a more secure temporary directory
Using `/tmp/test_profiling_results` could potentially lead to security issues in shared environments. Consider using Python's `tempfile` facilities for test directories. You could enhance test isolation by using a unique temporary directory per test:

```python
@pytest.fixture
def trtllm_args(self, tmp_path):
    class Args:
        backend = "trtllm"
        config = "components/backends/trtllm/deploy/disagg.yaml"
        output_dir = str(tmp_path / "test_profiling_results")
        # ... rest of the args
```

This would leverage pytest's built-in `tmp_path` fixture for better test isolation.

benchmarks/profiler/profile_sla.py (7)
328-335: Handle empty concurrency sweep and fix log message
Skip TP sizes where `max_concurrency <= 0` (or none of the values fit) and fix the misleading log message. Apply this diff:

```diff
 if not args.dry_run:
     sweep_num_request = [
         num for num in DECODE_NUM_REQUESTS_RANGE if num <= max_concurrency
     ]
-    logger.info(
-        f"Sweeping num_request range based on maximum number of kv tokens: {sweep_num_request}"
-    )
+    if not sweep_num_request:
+        logger.warning(
+            f"No feasible concurrency values under max_concurrency={max_concurrency}; skipping TP{tp_size}."
+        )
+        continue
+    logger.info(
+        f"Sweeping num_request range based on computed max concurrency ({max_concurrency}): {sweep_num_request}"
+    )
```
392-395: Filter out empty-engine results before plotting decode
Avoid plotting with empty series to prevent downstream errors/warnings. Apply this diff:

```diff
-if decode_results:
-    plot_decode_performance(decode_results, args.itl, args.output_dir)
+if decode_results:
+    nonempty_results = [
+        (tp, itl_list, thpt_list)
+        for (tp, itl_list, thpt_list) in decode_results
+        if itl_list and thpt_list
+    ]
+    if nonempty_results:
+        plot_decode_performance(nonempty_results, args.itl, args.output_dir)
+    else:
+        logger.warning("No non-empty decode results to plot.")
```
96-109: Consolidate flag validation (addresses Ruff TRY003/TRY301)
Reduce repetition and produce a single actionable error listing missing flags when `--use-ai-configurator` is set. Apply this diff:

```diff
-if args.use_ai_configurator:
-    if not args.aic_system:
-        raise ValueError(
-            "Must provide --aic-system when using --use-ai-configurator."
-        )
-    if not args.aic_model_name:
-        raise ValueError(
-            "Must provide --aic-model-name when using --use-ai-configurator."
-        )
-    if not args.backend_version:
-        raise ValueError(
-            "Must provide --backend-version when using --use-ai-configurator."
-        )
+if args.use_ai_configurator:
+    missing = []
+    if not args.aic_system:
+        missing.append("--aic-system")
+    if not args.aic_model_name:
+        missing.append("--aic-model-name")
+    if not args.backend_version:
+        missing.append("--backend-version")
+    if missing:
+        raise ValueError(
+            "Missing required flags for --use-ai-configurator: " + ", ".join(missing)
+        )
```
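The suggested consolidation can be sketched as a standalone function; the function name and parameter list are illustrative, not the PR's actual code:

```python
def validate_aic_flags(use_ai_configurator, aic_system, aic_model_name, backend_version):
    """Raise one actionable error listing every missing companion flag."""
    if not use_ai_configurator:
        # Legacy live-deployment path: companion flags are not required.
        return
    missing = []
    if not aic_system:
        missing.append("--aic-system")
    if not aic_model_name:
        missing.append("--aic-model-name")
    if not backend_version:
        missing.append("--backend-version")
    if missing:
        raise ValueError(
            "Missing required flags for --use-ai-configurator: " + ", ".join(missing)
        )


validate_aic_flags(False, None, None, None)  # no-op when the flag is unset
```

Collecting all missing flags before raising means the user fixes the command line once, instead of replaying one error per flag.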
112-116: Avoid forcing lowercase for `--aic-system`
Lowercasing may not match database keys (depends on aiconfigurator DB naming). Prefer passing the value verbatim and normalize inside the estimator if needed. Apply this diff:

```diff
-args.aic_system.lower(),
+args.aic_system,
```

To confirm expected casing, please verify what `AIConfiguratorPerfEstimator` (and the underlying DB) expects for `system` names (e.g., `h100_sxm` vs `H100_SXM`). If normalization is needed, do it centrally in the estimator.
571-583: Decode interpolation: compute `max_kv_tokens` once and skip when zero
Mirror the earlier guard: bail out early if `max_kv_tokens <= 0` to avoid empty plots or later errors. Apply this diff:

```diff
 elif args.use_ai_configurator:
     max_kv_tokens = ai_configurator_perf_estimator.get_max_kv_tokens(
         args.isl, args.osl, tp_size=best_decode_tp
     )
+    if max_kv_tokens <= 0:
+        logger.warning("Estimator returned non-positive max_kv_tokens; skipping decode interpolation.")
+        max_kv_tokens = 0
+        # Optionally return early or continue without plotting; choose one:
+        return
     profile_decode_aiconfigurator(
```
627-629: Surface estimator DB-not-found errors with actionable guidance
Catching `ValueError` from the estimator (e.g., DB not found) can produce clearer logs before re-raising. Apply this diff:

```diff
-except Exception as e:
+except ValueError as e:
+    logger.error(f"AI-configurator initialization failed: {e}. "
+                 "Check --aic-system/--aic-model-name/--backend-version are supported by the local DB.")
+    raise
+except Exception as e:
     logger.error(f"Profile job failed with error: {e}")
     raise
```
321-327: Skip decode TP when log parsing yields zero KV cache
If `get_kv_cache_size_from_dynamo_log` returns 0, `max_concurrency` becomes 0 and later steps may misbehave. Guard and continue. Apply this diff:

```diff
 max_kv_tokens = config_modifier.get_kv_cache_size_from_dynamo_log(
     f"{work_dir}/{client.deployment_name}/{WORKER_COMPONENT_NAMES[args.backend].decode_worker_k8s_name.lower()}/0.log"
 )
-max_concurrency = max_kv_tokens // (args.isl + args.osl)
+max_concurrency = max_kv_tokens // (args.isl + args.osl) if (args.isl + args.osl) > 0 else 0
+if max_concurrency <= 0:
+    logger.warning("Parsed KV cache size is zero; skipping this TP size.")
+    await client.delete_deployment()
+    deployment_clients.remove(client)
+    logger.info("Deployment deleted")
+    continue
```
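The concurrency ceiling in this diff reduces to a simple formula: each in-flight request pins roughly `isl + osl` KV-cache tokens, so the maximum concurrency is the integer ratio, with non-positive inputs guarded. A standalone sketch (the helper name is illustrative, not from the PR):

```python
def safe_max_concurrency(max_kv_tokens: int, isl: int, osl: int) -> int:
    # Each in-flight request reserves about isl + osl KV-cache tokens,
    # so the ceiling is the integer ratio; guard non-positive inputs
    # to avoid ZeroDivisionError and nonsensical negative values.
    total = isl + osl
    if total <= 0 or max_kv_tokens <= 0:
        return 0
    return max_kv_tokens // total


print(safe_max_concurrency(100_000, 3000, 500))  # 28
print(safe_max_concurrency(0, 3000, 500))        # 0
```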
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- benchmarks/profiler/profile_sla.py (12 hunks)
- benchmarks/profiler/utils/estimate_perf.py (1 hunks)
- benchmarks/profiler/utils/profile_decode.py (4 hunks)
- benchmarks/profiler/utils/profile_prefill.py (4 hunks)
- tests/profiler/test_profile_sla_aiconfigurator.py (1 hunks)
- tests/profiler/test_profile_sla_dryrun.py (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/profiler/test_profile_sla_aiconfigurator.py (2)
- benchmarks/profiler/profile_sla.py (1): run_profile (63-634)
- tests/profiler/test_profile_sla_dryrun.py (4): trtllm_args (99-125), Args (30-51), Args (59-80), Args (102-123)

benchmarks/profiler/profile_sla.py (6)
- benchmarks/profiler/utils/estimate_perf.py (5): estimate_perf (72-113), AIConfiguratorPerfEstimator (29-210), estimate_prefill_perf (115-135), get_max_batch_size (137-188), get_max_kv_tokens (190-210)
- benchmarks/profiler/utils/profile_decode.py (2): profile_decode (93-128), profile_decode_aiconfigurator (130-155)
- benchmarks/profiler/utils/profile_prefill.py (2): profile_prefill (72-100), profile_prefill_aiconfigurator (102-125)
- benchmarks/profiler/utils/config.py (4): get_kv_cache_size_from_dynamo_log (195-196, 383-403, 591-603, 828-854)
- deploy/utils/dynamo_deployment.py (2): get_service_url (211-217), delete_deployment (463-481)
- benchmarks/profiler/utils/genai_perf.py (1): benchmark_decode (189-247)

benchmarks/profiler/utils/profile_decode.py (2)
- benchmarks/profiler/utils/estimate_perf.py (2): estimate_perf (72-113), AIConfiguratorPerfEstimator (29-210)
- benchmarks/profiler/utils/genai_perf.py (1): benchmark_decode (189-247)

benchmarks/profiler/utils/profile_prefill.py (2)
- benchmarks/profiler/utils/estimate_perf.py (3): estimate_perf (72-113), AIConfiguratorPerfEstimator (29-210), estimate_prefill_perf (115-135)
- benchmarks/profiler/utils/genai_perf.py (1): benchmark_prefill (154-186)
🪛 Ruff (0.12.2)
tests/profiler/test_profile_sla_aiconfigurator.py
20-20: Unused noqa directive (non-enabled: E402)
Remove unused noqa directive
(RUF100)
31-31: Probable insecure usage of temporary file or directory: "/tmp/test_profiling_results"
(S108)
benchmarks/profiler/utils/estimate_perf.py
53-55: Avoid specifying long messages outside the exception class
(TRY003)
benchmarks/profiler/profile_sla.py
98-100: Abstract raise to an inner function
(TRY301)
98-100: Avoid specifying long messages outside the exception class
(TRY003)
102-104: Abstract raise to an inner function
(TRY301)
102-104: Avoid specifying long messages outside the exception class
(TRY003)
106-108: Abstract raise to an inner function
(TRY301)
106-108: Avoid specifying long messages outside the exception class
(TRY003)
benchmarks/profiler/utils/profile_decode.py
26-26: Unused function argument: num_gpus
(ARG001)
102-102: Unused function argument: ai_configurator_perf_estimator
(ARG001)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3079/merge) by ilyasher.
benchmarks/profiler/utils/profile_decode.py
[error] 1-1: Ruff linting: 1 error found and auto-fixed by pre-commit; please re-run pre-commit to verify.
benchmarks/profiler/utils/profile_prefill.py
[error] 1-1: Ruff linting: 1 error found and auto-fixed by pre-commit; please re-run pre-commit to verify.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (6)
benchmarks/profiler/utils/estimate_perf.py (1)
53-55: Move exception message construction into ValueError
Following TRY003, avoid constructing long error messages outside the exception class. Apply this fix:

```python
if not self.database:
    raise ValueError(
        f"Database not found for system: {system}, backend: {backend}, version: {version}"
    )
```

Wait, the code already has the message inside the `ValueError`. This appears to be a false positive from the static analysis tool.

tests/profiler/test_profile_sla_dryrun.py (1)
48-51: LGTM! Consistent addition of AI configurator fields
The new fields (`use_ai_configurator`, `aic_system`, `aic_model_name`, `backend_version`) are properly initialized across all three test fixtures, maintaining consistency with the new AI configurator functionality.
Also applies to: 77-80, 120-123

tests/profiler/test_profile_sla_aiconfigurator.py (1)
tests/profiler/test_profile_sla_aiconfigurator.py (1)
87-103: LGTM! Comprehensive test coverage for AI configurator
The parameterized tests provide good coverage across multiple backend versions and model names, ensuring the AI configurator integration works with various configurations.
benchmarks/profiler/utils/profile_decode.py (2)
130-155: LGTM! Clean implementation of AI configurator integration
The `profile_decode_aiconfigurator` function properly delegates to the helper with a clean callback implementation that extracts the required metrics from the estimator's response.

1-156: Run pre-commit to apply Ruff auto-fixes
Pipeline reported a Ruff lint error that pre-commit auto-fixed; run pre-commit locally (e.g., `pre-commit run --all-files`) and commit the resulting fixes for benchmarks/profiler/utils/profile_decode.py.
benchmarks/profiler/utils/profile_prefill.py (1)
1-126: Run pre-commit or remove unused import `Tuple` in benchmarks/profiler/utils/profile_prefill.py
Ruff flagged an unused import; the file still contains `from typing import Callable, Optional, Tuple` (line 5). Run `pre-commit run --all-files` or remove `Tuple` and commit.
shall we add doc to
@tedzhouhk I just updated the README
Signed-off-by: Ilya Sherstyuk <[email protected]>
Co-authored-by: Hongkuan Zhou <[email protected]> Signed-off-by: Ilya Sherstyuk <[email protected]>
/ok to test 6b6c005
Overview:

This PR adds a `--use-ai-configurator` flag to `profile_sla.py`. If this flag is passed, `profile_sla.py` will use `aiconfigurator` to estimate the model perf for the various configs it tests, instead of running an actual dynamo deployment.

Advantages of `--use-ai-configurator`: `profile_sla.py` takes seconds instead of hours.

Disadvantages:
Example usage
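The original example did not survive the page capture. A plausible invocation might look like the following; the flag names are confirmed by this PR, but the module path and all argument values are illustrative assumptions:

```shell
# Hypothetical invocation: system/model/version values are placeholders,
# not values documented in this PR.
python -m benchmarks.profiler.profile_sla \
  --use-ai-configurator \
  --aic-system h200_sxm \
  --aic-model-name MY_MODEL \
  --backend-version 1.0.0
```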
Details:
- Add `--use-ai-configurator` flag as well as `--backend-version`, `--aic-model-name`, `--aic-system`
- Update `profile_sla.py` to call `aiconfigurator` when `--use-ai-configurator` is passed
- Add `estimate_perf.py`, which contains a helper class for using `aiconfigurator`
- Add tests for `profile_sla.py` + `--use-ai-configurator`

Where should the reviewer start?
Reviewer can read the added tests, then changes to `profile_sla.py`, then other files.
New Features
Tests