docs: Add note that benchmarking workflow works with all backends
Clarify that perf.sh workflow works with vLLM, SGLang, and TensorRT-LLM
since they all expose the same OpenAI-compatible HTTP API. Examples use
vLLM for clarity, but the same workflow applies to other backends.

Addresses review comment about testing with other models and backends.
AsadShahid04 committed Nov 13, 2025
commit 2e65deb87fd9d8cf15208a9d5208d3a414ab087f
3 changes: 3 additions & 0 deletions benchmarks/llm/README.md
@@ -19,6 +19,9 @@

This guide provides detailed steps on benchmarking Large Language Models (LLMs) using the `perf.sh` and `plot_pareto.py` scripts in single and multi-node configurations. These scripts use [AIPerf](https://github.com/triton-inference-server/perf_analyzer) to collect performance metrics and generate Pareto frontier visualizations.

> [!Note]
> This workflow works with all Dynamo backends (vLLM, SGLang, TensorRT-LLM), since they all expose the same OpenAI-compatible HTTP API. The examples in this guide use vLLM for clarity, but you can benchmark SGLang or TensorRT-LLM deployments with the same workflow; just deploy your workers with `python -m dynamo.sglang` or `python -m dynamo.trtllm` instead (a sketch follows this note).

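As a rough illustration of that note (the model name, port, and flags below are placeholders rather than this guide's actual values), only the worker launch line changes between backends; the benchmark side stays the same:

```bash
# Launch workers with the backend of your choice -- only this line changes.
# (Flags and model are illustrative; see your backend's deployment guide.)
python -m dynamo.vllm --model Qwen/Qwen2.5-7B-Instruct      # vLLM
# python -m dynamo.sglang --model Qwen/Qwen2.5-7B-Instruct  # SGLang
# python -m dynamo.trtllm --model Qwen/Qwen2.5-7B-Instruct  # TensorRT-LLM

# Every backend serves the same OpenAI-compatible HTTP API, so the
# benchmarking steps in this guide are identical (endpoint assumed):
curl -s http://localhost:8000/v1/models   # sanity-check the frontend
./perf.sh                                 # then benchmark as described below
```
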
## Overview

The benchmarking tools in this directory help you:
@@ -560,4 +563,4 @@
- **[AIPerf Documentation](https://github.com/triton-inference-server/perf_analyzer/blob/main/genai-perf/docs/tutorial.md)** - Learn more about AIPerf benchmarking
- **[Dynamo Benchmarking Guide](../../docs/benchmarks/benchmarking.md)** - General benchmarking framework documentation
- **[Performance Tuning Guide](../../docs/performance/tuning.md)** - Optimize your deployment configuration
- **[Metrics and Visualization](../../deploy/metrics/k8s/README.md)** - Monitor deployments with Prometheus and Grafana

Check failure on line 566 in benchmarks/llm/README.md
GitHub Actions / Check for broken markdown links
Broken link: [Metrics and Visualization](../../deploy/metrics/k8s/README.md) - View: https://github.com/ai-dynamo/dynamo/blob/HEAD/benchmarks/llm/README.md?plain=1#L566