feat: update how inputs are input into the benchmark script #3187

hhzhang16 · 2025-09-23T21:15:24Z

Overview:

Instead of --input name=url, use:

--benchmark_name
--endpoint_url
--model

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #xxx

Summary by CodeRabbit

New Features
- Added support for external HTTP endpoints.
- Introduced new CLI flags: --benchmark_name and --endpoint_url for single-endpoint benchmarking.
Documentation
- Updated usage examples and guidance to use --benchmark_name and --endpoint_url (replacing --input).
- Revised plotting instructions to organize outputs by benchmark name.
- Removed Sequential Execution guidance.
- Updated in-cluster job examples to reflect new flags.
- Clarified naming rules and reserved names.
Chores
- Simplified benchmark configuration by removing multi-input label mappings in samples.

Signed-off-by: Hannah Zhang <[email protected]>

copy-pr-bot · 2025-09-23T21:15:27Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2025-09-23T21:22:15Z

Walkthrough

The CLI switches from --input label=URL to explicit --benchmark_name and --endpoint_url. Validation now uses validate_benchmark_name and validate_endpoint. The in-cluster job YAML and documentation are updated accordingly. Plotting and output directories are keyed by benchmark name. Supported backends mention external HTTP endpoints.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CLI and validation refactor `benchmarks/utils/benchmark.py` | Replaces multi-input parsing/validation with single-endpoint validators (`validate_endpoint`, `validate_benchmark_name`); removes `validate_inputs` and `parse_input`; updates main flow to accept `--benchmark_name` and `--endpoint_url` and pass a single mapping to the workflow. |
| In-cluster job config `benchmarks/incluster/benchmark_job.yaml` | Updates args from `--input "<label>=<url>"` to `--benchmark_name "<name>"` and `--endpoint_url "<url>"`; removes commented multi-input guidance. |
| Docs: README `benchmarks/README.md` | Revises flags and examples to use `--benchmark_name` and `--endpoint_url`; removes sequential execution notes; updates plotting and supported backends text. |
| Docs: guide `docs/benchmarks/benchmarking.md` | Rewrites examples and terminology from input labels to benchmark names; updates client/in-cluster usage, naming restrictions, plotting references, and multi-namespace guidance. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant C as CLI (benchmark.py)
  participant V as Validators
  participant R as Benchmark Runner
  participant E as Endpoint (HTTP/In-cluster)

  U->>C: Invoke with --benchmark_name, --endpoint_url
  C->>V: validate_benchmark_name(name)
  V-->>C: ok / error
  C->>V: validate_endpoint(url)
  V-->>C: ok / error
  rect rgba(220,240,255,0.6)
    note right of C: Build single mapping {name: url}
    C->>R: run_benchmark_workflow({name: url})
  end
  R->>E: Send benchmark requests
  E-->>R: Responses/metrics
  R-->>C: Results
  C-->>U: Outputs & plots under name/
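
The flow in the diagram above can be sketched in Python roughly as follows. This is a minimal illustration, not the code from `benchmarks/utils/benchmark.py`: the flag names come from the walkthrough (a later commit in this PR switches argument names to `-` instead of `_`), while `build_parser`, `parse_single_endpoint`, the help strings, the optional `--model`, and the empty-name check are assumptions made for the example. The PR's actual naming rules and reserved names are not reproduced here.

```python
import argparse
from typing import Dict, List, Optional


def validate_benchmark_name(name: str) -> str:
    """Hypothetical check: reject empty names only.

    The real validator in the PR also enforces naming rules and reserved
    names, which are not shown in this thread.
    """
    cleaned = name.strip()
    if not cleaned:
        raise ValueError("Benchmark name must not be empty")
    return cleaned


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one name and one endpoint instead of --input label=URL."""
    parser = argparse.ArgumentParser(description="Benchmark a single endpoint")
    parser.add_argument("--benchmark_name", required=True,
                        help="Name used to label this run's results")
    parser.add_argument("--endpoint_url", required=True,
                        help="Endpoint to benchmark (e.g., http://localhost:8000)")
    # Assumption: --model is optional in this sketch.
    parser.add_argument("--model", default=None,
                        help="Model name to benchmark")
    return parser


def parse_single_endpoint(argv: Optional[List[str]] = None) -> Dict[str, str]:
    """Parse the flags and build the single {name: url} mapping from the diagram."""
    args = build_parser().parse_args(argv)
    name = validate_benchmark_name(args.benchmark_name)
    return {name: args.endpoint_url}


# Passing argv explicitly keeps the example independent of sys.argv.
mapping = parse_single_endpoint(
    ["--benchmark_name", "my-run", "--endpoint_url", "http://localhost:8000"]
)
print(mapping)
```

In the real script the resulting mapping would be handed to `run_benchmark_workflow`; that call is omitted here because its full signature is not shown in this thread.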

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

feat: remove kubectl dependencies from benchmarking #3098 — Also changes benchmarking CLI and validation in benchmarks/utils/benchmark.py, overlapping the transition away from earlier input handling.
feat: allow in-cluster perf benchmarks with a kubectl one-liner #3144 — Modifies input validation logic in the same module; related to the shift from validate_inputs to endpoint/name validators.

Poem

I hop through flags, a nimble sprite,
From labels lost to names in light.
One endpoint path, clear as the sun,
Bench by name—then metrics run.
Plots burrow neatly, carrot-bright—
Thump-thump! A tidy benchmark night. 🥕🐇

Pre-merge checks

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description Check | ⚠️ Warning | The PR description includes a brief Overview that states the new flags but leaves the Details and "Where should the reviewer start?" sections empty and uses a placeholder issue number, so it does not meet the repository template's expectations for a complete description and lacks guidance reviewers need. | Please expand the Details section with a concise summary of code and documentation changes, list specific files or functions for reviewers to inspect under "Where should the reviewer start?", provide testing or verification steps and expected behavior, and replace the placeholder issue number with the correct issue reference or remove it. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title correctly identifies the primary change (how inputs are provided to the benchmark script), so it relates closely to the changeset; the wording is slightly awkward but still concise and informative for reviewers. |

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

docs/benchmarks/benchmarking.md (1)
341-347: Replace ${INPUT_NAME} with ${BENCHMARK_NAME} in retrieval commands.

The CLI no longer uses input labels; this will confuse users and copy/paste will fail.
 python3 -m deploy.utils.download_pvc_results \
   --namespace $NAMESPACE \
-  --output-dir ./benchmarks/results/${INPUT_NAME} \
-  --folder /data/results/${INPUT_NAME} \
+  --output-dir ./benchmarks/results/${BENCHMARK_NAME} \
+  --folder /data/results/${BENCHMARK_NAME} \
   --no-config

🧹 Nitpick comments (2)

benchmarks/utils/benchmark.py (2)

15-33: Also validate port range for HTTP(S) endpoints.

Currently HTTP(S) endpoints skip port checks; add a simple bound check.

     if is_running_in_cluster():
         # Allow HTTP(S) or internal service URLs like host[:port][/path]
         if has_http_scheme(v):
-            pass
+            parts = urlsplit(v)
+            if parts.port is not None and not (1 <= parts.port <= 65535):
+                raise ValueError("Endpoint port must be 1-65535")
         else:
             parts = urlsplit(f"//{v}")
             host_ok = bool(parts.hostname)
             port_ok = parts.port is None or (1 <= parts.port <= 65535)
             if not (host_ok and port_ok):
                 raise ValueError(
                     f"Endpoint must be HTTP(S) or internal service URL. Got: {endpoint}"
                 )
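
A self-contained sketch of the suggested check, under stated assumptions: `has_http_scheme` below is a stand-in for the helper in `benchmarks/utils/workflow.py`, the `in_cluster` parameter replaces the call to `is_running_in_cluster()`, and requiring HTTP(S) outside the cluster is an assumption about the surrounding code. One detail worth knowing is that `urlsplit(...).port` in the Python standard library already raises `ValueError` for non-numeric or out-of-range ports, so an explicit `1 <= port <= 65535` check mainly adds rejection of port 0.

```python
from urllib.parse import urlsplit


def has_http_scheme(value: str) -> bool:
    """Stand-in for the helper of the same name in benchmarks/utils/workflow.py."""
    return value.lower().startswith(("http://", "https://"))


def validate_endpoint(endpoint: str, in_cluster: bool = False) -> str:
    """Validate an HTTP(S) URL or, in-cluster, a bare host[:port][/path]."""
    v = endpoint.strip()
    if has_http_scheme(v):
        parts = urlsplit(v)
    elif in_cluster:
        # Prefix "//" so urlsplit parses the bare value as a network location.
        parts = urlsplit(f"//{v}")
    else:
        # Assumption: outside the cluster only HTTP(S) URLs are accepted.
        raise ValueError(f"Endpoint must be an HTTP(S) URL. Got: {endpoint}")

    try:
        # Accessing .port raises ValueError for non-numeric or too-large ports.
        port = parts.port
    except ValueError as exc:
        raise ValueError(f"Endpoint port must be 1-65535. Got: {endpoint}") from exc

    if port == 0:
        raise ValueError(f"Endpoint port must be 1-65535. Got: {endpoint}")
    if not parts.hostname:
        raise ValueError(f"Endpoint must include a host. Got: {endpoint}")
    return v


print(validate_endpoint("http://localhost:8000"))
print(validate_endpoint("my-service.my-namespace:8000", in_cluster=True))
```

Applying the same port handling to both branches keeps the HTTP(S) and internal-service paths consistent, which is the gap the review comment points at.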

58-62: Clarify argparse help to mention in‑cluster service URLs.

-    help="HTTP endpoint URL to benchmark (e.g., http://localhost:8000)",
+    help="Endpoint to benchmark: HTTP(S) URL (e.g., http://localhost:8000) or in-cluster service URL host[:port]",

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4a71802 and 9ca24fd.

📒 Files selected for processing (4)

benchmarks/README.md (2 hunks)
benchmarks/incluster/benchmark_job.yaml (1 hunks)
benchmarks/utils/benchmark.py (2 hunks)
docs/benchmarks/benchmarking.md (13 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

benchmarks/utils/benchmark.py (2)

deploy/utils/kubernetes.py (1)

is_running_in_cluster (27-30)

benchmarks/utils/workflow.py (2)

has_http_scheme (12-14)

run_benchmark_workflow (86-105)

🪛 Ruff (0.13.1)

benchmarks/utils/benchmark.py

27-29: Avoid specifying long messages outside the exception class

(TRY003)

32-32: Avoid specifying long messages outside the exception class

(TRY003)

38-38: Avoid specifying long messages outside the exception class

(TRY003)

44-44: Avoid specifying long messages outside the exception class

(TRY003)

48-48: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Build and Test - dynamo

biswapanda

few minor points. lgtm otherwise

hhzhang16 · 2025-09-24T18:42:31Z

@tmonty12 @biswapanda sounds good, I'll add that note!

hhzhang16 · 2025-09-24T21:52:29Z

/ok to test 4a905f5

hhzhang16 added 3 commits September 23, 2025 10:25

feat: update benchmark script benchmark input

0fdd219

Signed-off-by: Hannah Zhang <[email protected]>

feat: update benchmark job yaml to reflect new input changes

4ebc94f

Signed-off-by: Hannah Zhang <[email protected]>

docs: update benchmarking docs with new input format

9ca24fd

Signed-off-by: Hannah Zhang <[email protected]>

hhzhang16 requested review from a team as code owners September 23, 2025 21:15

pull-request-size bot added the size/L label Sep 23, 2025

github-actions bot added the feat label Sep 23, 2025

coderabbitai bot reviewed Sep 23, 2025

benchmarks/incluster/benchmark_job.yaml (outdated, resolved)

benchmarks/README.md (outdated, resolved)

docs/benchmarks/benchmarking.md (resolved)

docs/benchmarks/benchmarking.md (outdated, resolved)

hhzhang16 added 4 commits September 23, 2025 14:29

feat: update benchmark name

7ec41a1

Signed-off-by: Hannah Zhang <[email protected]>

feat: make endpoint-url description clearer

b879405

Signed-off-by: Hannah Zhang <[email protected]>

feat: use - instead of _ for arg names

4822f82

Signed-off-by: Hannah Zhang <[email protected]>

feat: update image name

447e05e

Signed-off-by: Hannah Zhang <[email protected]>

biswapanda reviewed Sep 24, 2025

benchmarks/incluster/benchmark_job.yaml (outdated, resolved)

biswapanda reviewed Sep 24, 2025

benchmarks/utils/benchmark.py (resolved)

feat: remove plotting from benchmark script

5ce4bc7

Signed-off-by: Hannah Zhang <[email protected]>

biswapanda reviewed Sep 24, 2025

hhzhang16 added 2 commits September 24, 2025 11:32

docs: adding some clarifications

f5b5c96

Signed-off-by: Hannah Zhang <[email protected]>

Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dyn-1032-update-how-inputs-are-input-into-the-benchmark-script

d2ff2fd

tmonty12 approved these changes Sep 24, 2025

benchmarks/incluster/benchmark_job.yaml (outdated, resolved)

hhzhang16 added 3 commits September 24, 2025 11:54

docs: add note about container image

239d1db

Signed-off-by: Hannah Zhang <[email protected]>

docs: remove INPUT_NAME env var

d0a3c44

Signed-off-by: Hannah Zhang <[email protected]>

Merge branch 'main' of github.com:ai-dynamo/dynamo into hannahz/dyn-1032-update-how-inputs-are-input-into-the-benchmark-script

4a905f5

hhzhang16 enabled auto-merge (squash) September 24, 2025 22:04

hhzhang16 merged commit fb12b67 into main Sep 24, 2025
15 checks passed

hhzhang16 deleted the hannahz/dyn-1032-update-how-inputs-are-input-into-the-benchmark-script branch September 24, 2025 22:27

kylehh pushed a commit that referenced this pull request Sep 25, 2025

feat: update how inputs are input into the benchmark script (#3187)

3a6ff34

Signed-off-by: Hannah Zhang <[email protected]> Signed-off-by: Kyle H <[email protected]>
