[DOCS] Restore LLM benchmarking guide (Fixes #2031) #4234
Conversation
- Restore benchmarking guide for `perf.sh` and `plot_pareto.py` scripts
- Replace outdated `dynamo serve` references with current deployment methods
- Add Kubernetes deployment examples using DynamoGraphDeployment
- Add local deployment examples using `python -m dynamo.frontend` + workers
- Document script usage, command-line options, and result interpretation
- Add comprehensive examples for single-node and multi-node benchmarking
- Update benchmarks/README.md to reference the new LLM benchmarking guide
- Include troubleshooting section and additional resources

TODO - Still needs to be done:
- Test all commands locally to verify they work as documented
- Test deployment examples on brev.dev to ensure cloud compatibility
- Verify hardware configuration section is still accurate
- Test perf.sh and plot_pareto.py scripts with actual deployments
- Validate all links and references are correct

Fixes ai-dynamo#2031
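For readers skimming the thread, a minimal sketch of the local deployment shape this commit describes. Only `python -m dynamo.frontend` is named above; the port, endpoint path, and worker command are assumptions and depend on the chosen backend:

```bash
# Start the OpenAI-compatible frontend (module name taken from the commit message above)
python -m dynamo.frontend &

# Start a backend worker alongside it (the exact command depends on the backend
# -- vLLM, SGLang, or TensorRT-LLM -- and is not specified in this commit message)

# Sanity-check the endpoint before benchmarking (port 8000 is an assumption)
curl -s localhost:8000/v1/models
```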
👋 Hi AsadShahid04! Thank you for contributing to ai-dynamo/dynamo.
Walkthrough

This PR restores and expands the LLM benchmarking documentation previously lost when the `examples/llm` directory was removed.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 1
🧹 Nitpick comments (3)
benchmarks/llm/README.md (3)
**213-213: Convert emphasized section headers to Markdown headings.**

Lines 213 and 217 use bold emphasis (`**text**`) instead of proper Markdown heading syntax, which violates MD036 style guidelines. Convert these to proper headings:

```diff
- **Option 1: Kubernetes (Recommended)**
+ ### Option 1: Kubernetes (Recommended)

- **Option 2: Local**
+ ### Option 2: Local
```

This improves document structure and consistency with the rest of the guide.

Also applies to: 217-217
**418-418: Hyphenate compound modifier "two-node".**

Line 418 should use "two-node" as a hyphenated compound modifier before the noun:

```diff
- **Two node comparison**:
+ **Two-node comparison**:
```
**374-374: Specify language identifier for code fence.**

Line 374 defines a code block without a language identifier. Add `bash` or `text` to improve syntax highlighting and readability:

````diff
- ```
+ ```bash
  artifacts_root/
  ├── artifacts_0/
````
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `benchmarks/README.md` (1 hunk)
- `benchmarks/llm/README.md` (1 hunk)
🧰 Additional context used
🪛 GitHub Check: Check for broken markdown links
benchmarks/llm/README.md
[failure] 563-563:
Broken link: Metrics and Visualization - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/benchmarks/llm/README.md?plain=1#L563
[failure] 562-562:
Broken link: Performance Tuning Guide - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/benchmarks/llm/README.md?plain=1#L562
[failure] 548-548:
Broken link: Performance Tuning Guide - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/benchmarks/llm/README.md?plain=1#L548
🪛 LanguageTool
benchmarks/llm/README.md
[grammar] ~418-~418: Use a hyphen to join words.
Context: ...facts-root-dir artifacts_root **Two node comparison**: bash python3 benc...
(QB_NEW_EN_HYPHEN)
🪛 markdownlint-cli2 (0.18.1)
benchmarks/llm/README.md
213-213: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
217-217: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
374-374: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
benchmarks/llm/README.md (1)
**1-563: Comprehensive LLM benchmarking guide restoration is well-structured and thorough.**

This guide successfully restores and significantly expands the lost benchmarking documentation. The content is well organized, with clear sections covering:
- Prerequisites with hardware specifications
- Multiple deployment options (Kubernetes and local)
- Single-node and multi-node disaggregated deployments
- vLLM aggregated baselines
- Detailed perf.sh and plot_pareto.py usage instructions
- Result interpretation and metrics explanations
- Troubleshooting guidance
The documentation structure flows well and provides actionable examples. The cross-reference from benchmarks/README.md correctly points to this comprehensive guide.
Verify that the three broken links (lines 548, 562, 563) are corrected or removed before merge, as flagged in the previous review comment.
benchmarks/README.md (1)
**72-85: Well-structured addition of benchmarking tools index.**

The new "Benchmarking Tools" section provides a clear index to the different benchmarking capabilities in the directory:
- Links to general framework (with reference to complete guide)
- Links to LLM benchmarking scripts with Pareto plots
- Links to router and profiler tools
The cross-reference to the LLM benchmarking guide (line 85) correctly directs users to the comprehensive documentation restored in benchmarks/llm/README.md. This improves documentation discoverability and user experience.
- Replace `../../docs/guides/disagg_perf_tuning.md` with `../../docs/performance/tuning.md` (2 occurrences)
- Replace `../../deploy/metrics/README.md` with `../../deploy/metrics/k8s/README.md`

Fixes broken links that were pointing to non-existent files.
hhzhang16
left a comment
The flow is tightly tailored for vLLM with a specific model and hardware. Have you tested with other models and backends?
There seems to be some good overlap with this guide: https://github.com/AsadShahid04/dynamo/blob/docs/restore-llm-benchmarking-guide/docs/benchmarks/benchmarking.md Could you look into what it could take to merge the two benchmarking guides and scripts?
…hardware note

- Replace `DeepSeek-R1-Distill-Llama-70B-FP8-dynamic` with `Qwen/Qwen3-0.6B` throughout (smaller model better for examples and testing)
- Change 'suboptimal results' to 'different results' for less judgmental wording

Addresses review comments from PR ai-dynamo#4234
Thanks for pointing that out! I've analyzed the overlap between the two guides. Here's what I found:

**Analysis**

Overlap:

Key differences:

**Options**

Option 1: Unified guide with tool selection (my recommendation)

This would create a structure like:

Option 2: Migrate to single tool

Option 3: Keep separate, add cross-references

**Recommendation**

I think Option 1 makes the most sense because both tools are valuable for different use cases. The `perf.sh` tool is simpler for LLM benchmarking with Pareto analysis, while `benchmarks.utils` is more flexible for general endpoints and server-side benchmarking.

**Next Steps**

I can create a unified guide that consolidates the shared content and provides clear guidance on when to use each tool. This would involve:
Question: Should we create a separate issue for this merge work and approve this PR in the meantime? The current PR restores the LLM benchmarking guide, which was accidentally removed, and the merge work is a separate improvement that can be done afterward. @hhzhang16
Clarify that perf.sh workflow works with vLLM, SGLang, and TensorRT-LLM since they all expose the same OpenAI-compatible HTTP API. Examples use vLLM for clarity, but the same workflow applies to other backends. Addresses review comment about testing with other models and backends.
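Since the commit above leans on the shared OpenAI-compatible HTTP API, a quick illustration of why the workflow is backend-agnostic: the benchmark client only ever talks to this endpoint, so any backend serving it can be measured the same way. The port and model name below are placeholders, not values from this PR:

```bash
# Any backend (vLLM, SGLang, TensorRT-LLM) exposing the OpenAI-compatible API
# can be benchmarked identically; only the endpoint URL and model name change.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen3-0.6B",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 16
      }'
```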
I'm okay with merging this first, but I would like to see Option 1 implemented in the medium-long term! Taking another look over the MR now.
Sounds good! Let me know if you want me to make another issue once this MR is closed. Thanks!
hhzhang16
left a comment
Quick note: seeing this, could you double check?
Broken link: Metrics and Visualization - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/benchmarks/llm/README.md?plain=1#L566
That would be amazing, thanks 🙇
- Fix broken link from `deploy/metrics/k8s/README.md` to `docs/observability/prometheus-grafana.md`
- Addresses review comment from PR ai-dynamo#4234
Just fixed! Who else needs to review to close this pull request?
My approval is enough, you just need to fix the CI issues!
@BenHamm - can you take a look at this PR? |
Summary
This PR restores the lost benchmarking guide for the `benchmarks/llm` scripts, addressing issue #2031. The guide was accidentally removed when the `examples/llm` directory was deleted in PR #1899.

Changes

- Restored comprehensive benchmarking guide at `benchmarks/llm/README.md`
  - Covers the `perf.sh` and `plot_pareto.py` scripts
  - Replaces outdated `dynamo serve` references with current Kubernetes and local deployment approaches
- Updated main benchmarks README at `benchmarks/README.md`
- Fixed bug in `perf.sh` (bonus fix)
  - Results are now written to concurrency-suffixed subdirectories (`-concurrency1/`, `-concurrency2/`, etc.) as expected by `plot_pareto.py`

Testing

- `perf.sh` creates the correct directory structure
- `plot_pareto.py` can parse and generate plots from results

Reference

The original guide content was retrieved from commit `35c56065bb490e12bba84a6abf8107dc1f2c7529` and updated with current deployment methods.

Fixes #2031
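As context for the bonus fix, a hedged sketch of the per-concurrency layout that `plot_pareto.py` expects; the directory names are hypothetical placeholders built from the `-concurrencyN/` suffixes mentioned above, not the script's actual output names:

```bash
# Hypothetical layout under the artifacts root (names are illustrative only)
# artifacts_root/
# ├── run-concurrency1/   # one results directory per tested concurrency level
# ├── run-concurrency2/
# ├── run-concurrency4/
# └── ...                 # plot_pareto.py parses results from these directories
ls artifacts_root/
```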
@hhzhang16 @athreesh