Conversation

AsadShahid04 commented Nov 11, 2025

Summary

This PR restores the lost benchmarking guide for the benchmarks/llm scripts, addressing issue #2031. The guide was accidentally removed when the examples/llm directory was deleted in PR #1899.

Changes

  • Restored comprehensive benchmarking guide at benchmarks/llm/README.md

    • Detailed instructions for using perf.sh and plot_pareto.py scripts
    • Updated deployment methods (replaced outdated dynamo serve with current Kubernetes and local deployment approaches)
    • Added prerequisites, hardware configuration notes, and troubleshooting sections
    • Included examples for both aggregated and disaggregated serving modes
    • Added instructions for single-node and multi-node deployments
  • Updated main benchmarks README at benchmarks/README.md

    • Added reference to the new LLM benchmarking guide
  • Fixed bug in perf.sh (bonus fix)

    • Modified the script to create per-concurrency subdirectories (-concurrency1/, -concurrency2/, etc.) as expected by plot_pareto.py (see the sketch after this list)
    • This ensures the documented workflow works end-to-end
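
A minimal sketch of the per-concurrency layout this fix produces (not the actual perf.sh code; the base directory name and concurrency values here are placeholders):

```bash
#!/usr/bin/env bash
# Sketch only: illustrates the "<base>-concurrency<N>/" layout that plot_pareto.py expects.
# Directory names and concurrency values are placeholders, not the real perf.sh defaults.
set -euo pipefail

base_dir="results"                      # placeholder base name
for concurrency in 1 2 4 8 16; do
  out_dir="${base_dir}-concurrency${concurrency}"
  mkdir -p "${out_dir}"
  # ... run the benchmark at this concurrency and write its artifacts into "${out_dir}" ...
done
```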

Testing

  • ✅ Tested locally on macOS (Docker setup)
  • ✅ Tested on brev.dev cloud workspace (Ubuntu 22.04, NVIDIA L40S GPU)
  • ✅ Verified perf.sh creates correct directory structure
  • ✅ Verified plot_pareto.py can parse and generate plots from results
  • ✅ Tested with Qwen/Qwen3-0.6B model

Reference

The original guide content was retrieved from commit 35c56065bb490e12bba84a6abf8107dc1f2c7529 and updated with current deployment methods.

Fixes #2031

@hhzhang16 @athreesh

Summary by CodeRabbit

  • Documentation
    • Enhanced benchmarking documentation with detailed tools and framework information.
    • Added comprehensive LLM benchmarking guide covering deployment options (Kubernetes and local), setup prerequisites, hardware recommendations, and multi-tool workflows.
    • Included troubleshooting, monitoring guidance, and Pareto frontier plot interpretation for performance analysis.

- Restore benchmarking guide for perf.sh and plot_pareto.py scripts
- Replace outdated dynamo serve references with current deployment methods
  - Add Kubernetes deployment examples using DynamoGraphDeployment
  - Add local deployment examples using python -m dynamo.frontend + workers (sketched below)
- Document script usage, command-line options, and result interpretation
- Add comprehensive examples for single-node and multi-node benchmarking
- Update benchmarks/README.md to reference the new LLM benchmarking guide
- Include troubleshooting section and additional resources

TODO - Still needs to be done:
- Test all commands locally to verify they work as documented
- Test deployment examples on brev.dev to ensure cloud compatibility
- Verify hardware configuration section is still accurate
- Test perf.sh and plot_pareto.py scripts with actual deployments
- Validate all links and references are correct

Fixes ai-dynamo#2031
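
A rough sketch of the "python -m dynamo.frontend + workers" local flow mentioned in the commit message above; the worker entrypoint and flags are assumptions rather than verified commands (the restored benchmarks/llm/README.md is the authoritative reference), and the model reuses the Qwen/Qwen3-0.6B example from this PR's testing:

```bash
# Sketch only: local (non-Kubernetes) deployment, assuming a vLLM worker entrypoint.
# Module names other than dynamo.frontend and all flags are assumptions, not verified.

# 1. Start the OpenAI-compatible frontend.
python -m dynamo.frontend &

# 2. Start a backend worker serving the example model (assumed entrypoint and flag).
python -m dynamo.vllm --model Qwen/Qwen3-0.6B &

# 3. When the frontend is ready, run perf.sh against its /v1/chat/completions endpoint.
```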

copy-pr-bot bot commented Nov 11, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

👋 Hi AsadShahid04! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

github-actions bot added the external-contribution label (Pull request is from an external contributor) on Nov 11, 2025

coderabbitai bot commented Nov 11, 2025

Walkthrough

This PR restores and expands the LLM benchmarking documentation previously lost when the examples/llm directory was deleted. It adds a "Benchmarking Tools" subsection to benchmarks/README.md and replaces a placeholder in benchmarks/llm/README.md with comprehensive documentation covering deployment options, benchmarking workflows, and troubleshooting guidance.

Changes

  • Cohort: Documentation Restoration and Enhancement
  • Files: benchmarks/README.md, benchmarks/llm/README.md
  • Summary: Adds "Benchmarking Tools" subsection with framework and script details to README.md; replaces "Coming soon." placeholder with comprehensive LLM benchmarking guide including prerequisites, Kubernetes/local deployment steps, disaggregated/aggregated configurations, perf.sh and plot_pareto.py usage documentation, and troubleshooting guidance.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • Documentation-only changes with no code logic or interdependencies to verify
  • Focus review on accuracy of deployment instructions (Kubernetes and local configurations)
  • Verify completeness and clarity of perf.sh and plot_pareto.py command-line examples
  • Confirm hardware recommendations and prerequisites are current and accurate
  • Check consistency of cross-references between the two README files

Poem

🐰 A guide once lost, now hops back to light,
Benchmarks and baselines, restored just right!
From Pareto plots to deployment's dance,
The tools are documented—give benchmarking a chance!

Pre-merge checks

✅ Passed checks (5 passed)
  • Title check (✅ Passed): The title clearly and concisely summarizes the main change: restoring the LLM benchmarking guide and fixing issue #2031.
  • Linked Issues check (✅ Passed): The PR successfully meets all objectives from issue #2031: restores the benchmarking guide, updates deprecated deployment methods, documents perf.sh and plot_pareto.py usage, and fixes the bonus perf.sh bug.
  • Out of Scope Changes check (✅ Passed): All changes are directly related to restoring and improving the benchmarking guide and fixing perf.sh as outlined in issue #2031; no out-of-scope modifications detected.
  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
  • Description check (✅ Passed): The pull request provides a comprehensive description that covers all template sections: Overview (summary of changes), Details (specific file changes and improvements), Where to start (specific files mentioned: benchmarks/llm/README.md, benchmarks/README.md, perf.sh), Related Issues (uses 'Fixes #2031' action keyword).


coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
benchmarks/llm/README.md (3)

213-213: Convert emphasized section headers to Markdown headings.

Lines 213 and 217 use bold emphasis (**text**) instead of proper Markdown heading syntax, which violates MD036 style guidelines. Convert these to proper headings:

- **Option 1: Kubernetes (Recommended)**
+ ### Option 1: Kubernetes (Recommended)

- **Option 2: Local**
+ ### Option 2: Local

This improves document structure and consistency with the rest of the guide.

Also applies to: 217-217


418-418: Hyphenate compound modifier "two-node".

Line 418 should use "two-node" as a hyphenated compound modifier before the noun:

- **Two node comparison**:
+ **Two-node comparison**:

374-374: Specify language identifier for code fence.

Line 374 defines a code block without a language identifier. Add bash or text to improve syntax highlighting and readability:

- ```
+ ```bash
 artifacts_root/
 ├── artifacts_0/
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9b674b and 8738150.

📒 Files selected for processing (2)
  • benchmarks/README.md (1 hunks)
  • benchmarks/llm/README.md (1 hunks)
🧰 Additional context used
🪛 GitHub Check: Check for broken markdown links
benchmarks/llm/README.md

[failure] 563-563:
Broken link: Metrics and Visualization - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/benchmarks/llm/README.md?plain=1#L563


[failure] 562-562:
Broken link: Performance Tuning Guide - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/benchmarks/llm/README.md?plain=1#L562


[failure] 548-548:
Broken link: Performance Tuning Guide - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/benchmarks/llm/README.md?plain=1#L548

🪛 LanguageTool
benchmarks/llm/README.md

[grammar] ~418-~418: Use a hyphen to join words.
Context: ...facts-root-dir artifacts_root **Two node comparison**: bash python3 benc...

(QB_NEW_EN_HYPHEN)

🪛 markdownlint-cli2 (0.18.1)
benchmarks/llm/README.md

213-213: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


217-217: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


374-374: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
benchmarks/llm/README.md (1)

1-563: Comprehensive LLM benchmarking guide restoration is well-structured and thorough.

This guide successfully restores and significantly expands the lost benchmarking documentation. The content is well-organized with clear sections covering:

  • Prerequisites with hardware specifications
  • Multiple deployment options (Kubernetes and local)
  • Single-node and multi-node disaggregated deployments
  • vLLM aggregated baselines
  • Detailed perf.sh and plot_pareto.py usage instructions
  • Result interpretation and metrics explanations
  • Troubleshooting guidance

The documentation structure flows well and provides actionable examples. The cross-reference from benchmarks/README.md correctly points to this comprehensive guide.

Verify that the three broken links (lines 548, 562, 563) are corrected or removed before merge, as flagged in the previous review comment.

benchmarks/README.md (1)

72-85: Well-structured addition of benchmarking tools index.

The new "Benchmarking Tools" section provides a clear index to different benchmarking capabilities in the directory:

  • Links to general framework (with reference to complete guide)
  • Links to LLM benchmarking scripts with Pareto plots
  • Links to router and profiler tools

The cross-reference to the LLM benchmarking guide (line 85) correctly directs users to the comprehensive documentation restored in benchmarks/llm/README.md. This improves documentation discoverability and user experience.

- Replace ../../docs/guides/disagg_perf_tuning.md with ../../docs/performance/tuning.md (2 occurrences)
- Replace ../../deploy/metrics/README.md with ../../deploy/metrics/k8s/README.md

Fixes broken links that were pointing to non-existent files.
hhzhang16 left a comment

The flow is tightly tailored for vLLM with a specific model and hardware. Have you tested with other models and backends?

@hhzhang16

There seems to be some good overlap with this guide: https://github.com/AsadShahid04/dynamo/blob/docs/restore-llm-benchmarking-guide/docs/benchmarks/benchmarking.md

Could you look into what it could take to merge the two benchmarking guides and scripts?

…hardware note

- Replace DeepSeek-R1-Distill-Llama-70B-FP8-dynamic with Qwen/Qwen3-0.6B throughout
  (smaller model better for examples and testing)
- Change 'suboptimal results' to 'different results' for less judgmental wording

Addresses review comments from PR ai-dynamo#4234
AsadShahid04 commented Nov 13, 2025

> There seems to be some good overlap with this guide: https://github.com/AsadShahid04/dynamo/blob/docs/restore-llm-benchmarking-guide/docs/benchmarks/benchmarking.md
>
> Could you look into what it could take to merge the two benchmarking guides and scripts?

Thanks for pointing that out! I've analyzed the overlap between the two guides. Here's what I found:

Analysis

Overlap:

  • Both use AIPerf and have similar prerequisites
  • Both support Kubernetes and local deployments
  • Similar troubleshooting content

Key differences:

The benchmarks/llm/README.md guide (using perf.sh + plot_pareto.py; a workflow sketch follows after these lists) is focused on LLM benchmarking with:

  • Pareto frontier plots (unique to this tool)
  • Detailed disaggregated/aggregated deployment examples
  • Parallelism parameter tracking (TP, DP, prefill-TP, decode-TP)
  • Bash script simplicity

The docs/benchmarks/benchmarking.md guide (using benchmarks.utils) is more general with:

  • Server-side (in-cluster) benchmarking support
  • Works with any HTTP endpoint, not just LLM
  • Multiple plot types (not just Pareto)
  • More flexible Python API
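
For concreteness, the two-step perf.sh + plot_pareto.py workflow looks roughly like this; perf.sh's own arguments are omitted here (they are documented in the restored guide), while the plot_pareto.py invocation mirrors the --artifacts-root-dir option that appears in the review output earlier in this thread:

```bash
# Sketch of the two-step workflow (perf.sh arguments omitted; see benchmarks/llm/README.md).

# 1. Run the benchmark sweep; results land under artifacts_root/ with
#    per-concurrency subdirectories for each run.
bash benchmarks/llm/perf.sh    # supply the model/endpoint arguments from the guide

# 2. Generate Pareto frontier plots from the collected results.
python3 benchmarks/llm/plot_pareto.py --artifacts-root-dir artifacts_root
```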

Options

Option 1: Unified guide with tool selection (my recommendation)

  • Create a single guide that helps users choose the right tool upfront
  • Preserve both tools since they serve different needs
  • Consolidate shared content (prerequisites, troubleshooting)
  • Keep detailed examples in tool-specific sections

This would create a structure like:

  • Overview and tool selection guide
  • Shared prerequisites
  • Section for perf.sh (LLM-focused, Pareto plots)
  • Section for benchmarks.utils (general, server-side, multiple plots)
  • Common topics (result interpretation, troubleshooting)

Option 2: Migrate to single tool

  • Deprecate perf.sh and enhance benchmarks.utils with Pareto plots and parallelism params
  • Pros: Single tool to maintain
  • Cons: Breaking change, significant development effort

Option 3: Keep separate, add cross-references

  • Minimal changes, just add cross-references and a "choosing your tool" section
  • Pros: No breaking changes, minimal work
  • Cons: Still some duplication, potential user confusion

Recommendation

I think Option 1 makes the most sense because both tools are valuable for different use cases. The perf.sh tool is simpler for LLM benchmarking with Pareto analysis, while benchmarks.utils is more flexible for general endpoints and server-side benchmarking.

Next Steps

I can create a unified guide that consolidates the shared content and provides clear guidance on when to use each tool. This would involve:

  • Creating a unified structure at docs/benchmarks/README.md
  • Moving shared content to common sections
  • Keeping benchmarks/llm/README.md as a quick reference for perf.sh users
  • Adding cross-references between the guides

Question: Should we create a separate issue for this merge work and approve this PR in the meantime? The current PR restores the LLM benchmarking guide which was accidentally removed, and the merge work is a separate improvement that can be done afterward. @hhzhang16

@AsadShahid04

> The flow is tightly tailored for vLLM with a specific model and hardware. Have you tested with other models and backends?

perf.sh only sends HTTP requests to /v1/chat/completions (OpenAI-compatible), so it works with any backend that exposes that API. The examples use vLLM for deployment, but the benchmarking step is the same.
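
To make that concrete, a request to the endpoint looks roughly like this; the host and port are assumptions (use whatever address the frontend is serving on), and the model name reuses the Qwen/Qwen3-0.6B example from this PR:

```bash
# Example request shape for an OpenAI-compatible /v1/chat/completions endpoint.
# The host/port below is an assumption; substitute your deployment's frontend address.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-0.6B",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'
```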

Clarify that perf.sh workflow works with vLLM, SGLang, and TensorRT-LLM
since they all expose the same OpenAI-compatible HTTP API. Examples use
vLLM for clarity, but the same workflow applies to other backends.

Addresses review comment about testing with other models and backends.
@hhzhang16

I'm okay with merging this first, but I would like to see Option 1 implemented in the medium-long term! Taking another look over the MR now

@AsadShahid04

> I'm okay with merging this first, but I would like to see Option 1 implemented in the medium-long term! Taking another look over the MR now

Sounds good! Let me know if you want me to make another issue once this MR is closed. Thanks!

hhzhang16 left a comment

@hhzhang16

> I'm okay with merging this first, but I would like to see Option 1 implemented in the medium-long term! Taking another look over the MR now
>
> Sounds good! Let me know if you want me to make another issue once this MR is closed. Thanks!

That would be amazing, thanks 🙇

AsadShahid04 and others added 2 commits November 18, 2025 18:43
- Fix broken link from deploy/metrics/k8s/README.md to docs/observability/prometheus-grafana.md
- Addresses review comment from PR ai-dynamo#4234
AsadShahid04 commented Nov 19, 2025

> Quick note: seeing this, could you double check? Broken link: Metrics and Visualization - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/benchmarks/llm/README.md?plain=1#L566

Just fixed!

Who else needs to review to close this pull request?

@hhzhang16

My approval is enough, you just need to fix the CI issues!

@dagil-nvidia

@BenHamm - can you take a look at this PR?


Labels

external-contribution (Pull request is from an external contributor), size/XL

Development

Successfully merging this pull request may close these issues: [DOCS]: Bring back benchmarking guide