docs: Add note that benchmarking workflow works with all backends
Clarify that perf.sh workflow works with vLLM, SGLang, and TensorRT-LLM
since they all expose the same OpenAI-compatible HTTP API. Examples use
vLLM for clarity, but the same workflow applies to other backends.

Addresses review comment about testing with other models and backends.
AsadShahid04 committed Nov 13, 2025
commit 2e65deb87fd9d8cf15208a9d5208d3a414ab087f
3 changes: 3 additions & 0 deletions benchmarks/llm/README.md
@@ -19,6 +19,9 @@

This guide provides detailed steps on benchmarking Large Language Models (LLMs) using the `perf.sh` and `plot_pareto.py` scripts in single and multi-node configurations. These scripts use [AIPerf](https://github.com/triton-inference-server/perf_analyzer) to collect performance metrics and generate Pareto frontier visualizations.

> [!Note]
> This workflow works with all Dynamo backends (vLLM, SGLang, TensorRT-LLM), since they all expose the same OpenAI-compatible HTTP API. The examples in this guide use vLLM for clarity, but you can benchmark SGLang or TensorRT-LLM deployments with the same workflow; just deploy your workers with `python -m dynamo.sglang` or `python -m dynamo.trtllm` instead (a sketch follows this note).

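As a rough illustration of that note (the model name, port, and flags below are placeholders rather than this guide's actual values), only the worker launch line changes between backends; the benchmark side stays the same:

```bash
# Launch workers with the backend of your choice -- only this line changes.
# (Flags and model are illustrative; see your backend's deployment guide.)
python -m dynamo.vllm --model Qwen/Qwen2.5-7B-Instruct      # vLLM
# python -m dynamo.sglang --model Qwen/Qwen2.5-7B-Instruct  # SGLang
# python -m dynamo.trtllm --model Qwen/Qwen2.5-7B-Instruct  # TensorRT-LLM

# Every backend serves the same OpenAI-compatible HTTP API, so the
# benchmarking steps in this guide are identical (endpoint assumed):
curl -s http://localhost:8000/v1/models   # sanity-check the frontend
./perf.sh                                 # then benchmark as described below
```
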
## Overview

The benchmarking tools in this directory help you:
@@ -560,4 +563,4 @@
- **[AIPerf Documentation](https://github.com/triton-inference-server/perf_analyzer/blob/main/genai-perf/docs/tutorial.md)** - Learn more about AIPerf benchmarking
- **[Dynamo Benchmarking Guide](../../docs/benchmarks/benchmarking.md)** - General benchmarking framework documentation
- **[Performance Tuning Guide](../../docs/performance/tuning.md)** - Optimize your deployment configuration
- **[Metrics and Visualization](../../deploy/metrics/k8s/README.md)** - Monitor deployments with Prometheus and Grafana

Check failure on line 566 in benchmarks/llm/README.md
GitHub Actions / Check for broken markdown links
Broken link: [Metrics and Visualization](../../deploy/metrics/k8s/README.md) - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/benchmarks/llm/README.md?plain=1#L566