
Conversation

@hhzhang16
Contributor

@hhzhang16 hhzhang16 commented Sep 23, 2025

Overview:

Instead of --input name=url, use:

--benchmark_name
--endpoint_url
--model

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added support for External HTTP endpoints.
    • Introduced new CLI flags: --benchmark_name and --endpoint_url for single-endpoint benchmarking.
  • Documentation

    • Updated usage examples and guidance to use --benchmark_name and --endpoint_url (replacing --input).
    • Revised plotting instructions to organize outputs by benchmark name.
    • Removed Sequential Execution guidance.
    • Updated in-cluster job examples to reflect new flags.
    • Clarified naming rules and reserved names.
  • Chores

    • Simplified benchmark configuration by removing multi-input label mappings in samples.

@hhzhang16 hhzhang16 requested review from a team as code owners September 23, 2025 21:15
@copy-pr-bot

copy-pr-bot bot commented Sep 23, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Contributor

coderabbitai bot commented Sep 23, 2025

Walkthrough

The CLI switches from --input label=URL to explicit --benchmark_name and --endpoint_url. Validation now uses validate_benchmark_name and validate_endpoint. The in-cluster job YAML and documentation are updated accordingly. Plotting and output directories are keyed by benchmark name. Supported backends mention external HTTP endpoints.
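
As a rough illustration of the flag change described above, here is a minimal argparse sketch. It is not the code in benchmarks/utils/benchmark.py: the helper names `build_parser` and `to_mapping`, the help strings, and the choice of which flags are required are all assumptions. Only the flag names and the single `{name: url}` mapping come from this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the flags named in this PR."""
    parser = argparse.ArgumentParser(description="Single-endpoint benchmark (sketch)")
    # Replaces the old `--input label=URL` form with two explicit flags.
    parser.add_argument("--benchmark_name", required=True,
                        help="Name used to key outputs and plots")
    parser.add_argument("--endpoint_url", required=True,
                        help="Endpoint to benchmark")
    # Whether --model is required, and its default, are assumptions.
    parser.add_argument("--model", default=None, help="Model name")
    return parser


def to_mapping(args: argparse.Namespace) -> dict[str, str]:
    """Build the single {name: url} mapping handed to the workflow."""
    return {args.benchmark_name: args.endpoint_url}


# Parse an explicit argv list so the sketch runs without real CLI input.
args = build_parser().parse_args(
    ["--benchmark_name", "my-run", "--endpoint_url", "http://localhost:8000"]
)
print(to_mapping(args))
```

In the real script the mapping would presumably be passed to run_benchmark_workflow after validation. That call is omitted here because its full signature is not shown in this conversation.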

Changes

| Cohort / File(s) | Summary |
|---|---|
| CLI and validation refactor: benchmarks/utils/benchmark.py | Replaces multi-input parsing/validation with single-endpoint validators (validate_endpoint, validate_benchmark_name); removes validate_inputs and parse_input; updates main flow to accept --benchmark_name and --endpoint_url and pass a single mapping to the workflow. |
| In-cluster job config: benchmarks/incluster/benchmark_job.yaml | Updates args from --input "<label>=<url>" to --benchmark_name "<name>" and --endpoint_url "<url>"; removes commented multi-input guidance. |
| Docs: README: benchmarks/README.md | Revises flags and examples to use --benchmark_name and --endpoint_url; removes sequential execution notes; updates plotting and supported backends text. |
| Docs: guide: docs/benchmarks/benchmarking.md | Rewrites examples and terminology from input labels to benchmark names; updates client/in-cluster usage, naming restrictions, plotting references, and multi-namespace guidance. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant C as CLI (benchmark.py)
  participant V as Validators
  participant R as Benchmark Runner
  participant E as Endpoint (HTTP/In-cluster)

  U->>C: Invoke with --benchmark_name, --endpoint_url
  C->>V: validate_benchmark_name(name)
  V-->>C: ok / error
  C->>V: validate_endpoint(url)
  V-->>C: ok / error
  rect rgba(220,240,255,0.6)
    note right of C: Build single mapping {name: url}
    C->>R: run_benchmark_workflow({name: url})
  end
  R->>E: Send benchmark requests
  E-->>R: Responses/metrics
  R-->>C: Results
  C-->>U: Outputs & plots under name/

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

I hop through flags, a nimble sprite,
From labels lost to names in light.
One endpoint path, clear as the sun,
Bench by name—then metrics run.
Plots burrow neatly, carrot-bright—
Thump-thump! A tidy benchmark night. 🥕🐇

Pre-merge checks

❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description Check | ⚠️ Warning | The PR description includes a brief Overview that states the new flags but leaves the Details and "Where should the reviewer start?" sections empty and uses a placeholder issue number, so it does not meet the repository template's expectations for a complete description and lacks guidance reviewers need. | Please expand the Details section with a concise summary of code and documentation changes, list specific files or functions for reviewers to inspect under "Where should the reviewer start?", provide testing or verification steps and expected behavior, and replace the placeholder issue number with the correct issue reference or remove it. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title Check | ✅ Passed | The title correctly identifies the primary change (how inputs are provided to the benchmark script), so it relates closely to the changeset; the wording is slightly awkward but still concise and informative for reviewers. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
docs/benchmarks/benchmarking.md (1)

341-347: Replace ${INPUT_NAME} with ${BENCHMARK_NAME} in retrieval commands.

The CLI no longer uses input labels; this will confuse users and copy/paste will fail.

 python3 -m deploy.utils.download_pvc_results \
   --namespace $NAMESPACE \
-  --output-dir ./benchmarks/results/${INPUT_NAME} \
-  --folder /data/results/${INPUT_NAME} \
+  --output-dir ./benchmarks/results/${BENCHMARK_NAME} \
+  --folder /data/results/${BENCHMARK_NAME} \
   --no-config
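
To make the path convention in this suggestion concrete, the sketch below assembles the same retrieval command from a benchmark name. The function name `download_command` is hypothetical. The module path, flags, and directory layout are copied from the suggested diff above, and nothing here is confirmed beyond that.

```python
def download_command(namespace: str, benchmark_name: str) -> list[str]:
    """Build the PVC retrieval command, keyed by benchmark name (sketch)."""
    return [
        "python3", "-m", "deploy.utils.download_pvc_results",
        "--namespace", namespace,
        # Local and in-cluster result folders share the benchmark name.
        "--output-dir", f"./benchmarks/results/{benchmark_name}",
        "--folder", f"/data/results/{benchmark_name}",
        "--no-config",
    ]


# Print the command as a single shell-style line for inspection.
print(" ".join(download_command("my-namespace", "my-run")))
```

Using one variable for both paths keeps the local results directory aligned with the folder written inside the cluster, which is the point of replacing ${INPUT_NAME}.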
🧹 Nitpick comments (2)
benchmarks/utils/benchmark.py (2)

15-33: Also validate port range for HTTP(S) endpoints.

Currently HTTP(S) endpoints skip port checks; add a simple bound check.

     if is_running_in_cluster():
         # Allow HTTP(S) or internal service URLs like host[:port][/path]
         if has_http_scheme(v):
-            pass
+            parts = urlsplit(v)
+            if parts.port is not None and not (1 <= parts.port <= 65535):
+                raise ValueError("Endpoint port must be 1-65535")
         else:
             parts = urlsplit(f"//{v}")
             host_ok = bool(parts.hostname)
             port_ok = parts.port is None or (1 <= parts.port <= 65535)
             if not (host_ok and port_ok):
                 raise ValueError(
                     f"Endpoint must be HTTP(S) or internal service URL. Got: {endpoint}"
                 )
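
The port check suggested above can be tried in isolation with the sketch below. `check_port` is a hypothetical helper, and the body of `has_http_scheme` is a stand-in for the one in benchmarks/utils/workflow.py, whose implementation is not shown here. One detail worth knowing: the `port` attribute returned by `urlsplit` already raises ValueError for non-numeric ports and for ports above 65535, so the explicit bound check mainly catches port 0.

```python
from urllib.parse import urlsplit


def has_http_scheme(value: str) -> bool:
    # Stand-in for benchmarks/utils/workflow.has_http_scheme (body is an assumption).
    return value.lower().startswith(("http://", "https://"))


def check_port(endpoint: str) -> None:
    """Raise ValueError unless the port, if present, is within 1-65535."""
    # Bare host[:port] values need a "//" prefix so urlsplit sees a netloc.
    target = endpoint if has_http_scheme(endpoint) else f"//{endpoint}"
    # Accessing .port raises ValueError for non-numeric or >65535 ports.
    port = urlsplit(target).port
    if port is not None and not (1 <= port <= 65535):
        raise ValueError("Endpoint port must be 1-65535")


# These pass silently; check_port("http://localhost:0") would raise ValueError.
check_port("http://localhost:8000")
check_port("my-svc.my-namespace.svc.cluster.local:8000")
```

Applying the same check to both branches, as this sketch does, would let the HTTP(S) and internal-URL paths share one code path instead of duplicating the range test.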

58-62: Clarify argparse help to mention in‑cluster service URLs.

-    help="HTTP endpoint URL to benchmark (e.g., http://localhost:8000)",
+    help="Endpoint to benchmark: HTTP(S) URL (e.g., http://localhost:8000) or in-cluster service URL host[:port]",
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a71802 and 9ca24fd.

📒 Files selected for processing (4)
  • benchmarks/README.md (2 hunks)
  • benchmarks/incluster/benchmark_job.yaml (1 hunks)
  • benchmarks/utils/benchmark.py (2 hunks)
  • docs/benchmarks/benchmarking.md (13 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
benchmarks/utils/benchmark.py (2)
deploy/utils/kubernetes.py (1)
  • is_running_in_cluster (27-30)
benchmarks/utils/workflow.py (2)
  • has_http_scheme (12-14)
  • run_benchmark_workflow (86-105)
🪛 Ruff (0.13.1)
benchmarks/utils/benchmark.py

27-29: Avoid specifying long messages outside the exception class

(TRY003)


32-32: Avoid specifying long messages outside the exception class

(TRY003)


38-38: Avoid specifying long messages outside the exception class

(TRY003)


44-44: Avoid specifying long messages outside the exception class

(TRY003)


48-48: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo

Contributor

@biswapanda biswapanda left a comment

few minor points. lgtm otherwise

@hhzhang16
Contributor Author

@tmonty12 @biswapanda sounds good, I'll add that note!

@hhzhang16
Contributor Author

/ok to test 4a905f5

@hhzhang16 hhzhang16 enabled auto-merge (squash) September 24, 2025 22:04
@hhzhang16 hhzhang16 merged commit fb12b67 into main Sep 24, 2025
15 checks passed
@hhzhang16 hhzhang16 deleted the hannahz/dyn-1032-update-how-inputs-are-input-into-the-benchmark-script branch September 24, 2025 22:27
kylehh pushed a commit that referenced this pull request Sep 25, 2025
4 participants