docs: add hyperlinks to recipe table for easier navigation
BenHamm authored and tanmayv25 committed Nov 18, 2025
commit 037c3bd884d4c3cd7ccf87fd9c8ba62deaf5982b
20 changes: 10 additions & 10 deletions recipes/README.md
@@ -9,16 +9,16 @@ Production-tested Kubernetes deployment recipes for LLM inference using NVIDIA D

| Model | Framework | Mode | GPUs | Deployment | Benchmark Recipe | Notes |
|-------|-----------|------|------|------------|------------------|-------|
| **Llama-3-70B** | vLLM | Aggregated | 4x H100/H200 | ✅ | ✅ | FP8 dynamic quantization |
| **Llama-3-70B** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation |
| **Llama-3-70B** | vLLM | Disagg (Multi-Node) | 16x H100/H200 | ✅ | ✅ | 2 nodes, 8 GPUs each |
| **Qwen3-32B-FP8** | TensorRT-LLM | Aggregated | 4x GPU | ✅ | ✅ | FP8 quantization |
| **Qwen3-32B-FP8** | TensorRT-LLM | Disaggregated | 8x GPU | ✅ | ✅ | Prefill + Decode separation |
| **GPT-OSS-120B** | TensorRT-LLM | Aggregated | 4x GB200 | ✅ | ✅ | Blackwell only, WideEP |
| **GPT-OSS-120B** | TensorRT-LLM | Disaggregated | TBD | ❌ | ❌ | Engine configs only, no K8s manifest |
| **DeepSeek-R1** | SGLang | Disagg WideEP | 8x H200 | ✅ | ❌ | Benchmark recipe pending |
| **DeepSeek-R1** | SGLang | Disagg WideEP | 16x H200 | ✅ | ❌ | Benchmark recipe pending |
| **DeepSeek-R1** | TensorRT-LLM | Disagg WideEP (GB200) | 32+4 GB200 | ✅ | ✅ | Multi-node: 8 decode + 1 prefill nodes |
| **[Llama-3-70B](llama-3-70b/vllm/agg/)** | vLLM | Aggregated | 4x H100/H200 | ✅ | ✅ | FP8 dynamic quantization |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-multi-node/)** | vLLM | Disagg (Multi-Node) | 16x H100/H200 | ✅ | ✅ | 2 nodes, 8 GPUs each |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GPU | ✅ | ✅ | FP8 quantization |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | 8x GPU | ✅ | ✅ | Prefill + Decode separation |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GB200 | ✅ | ✅ | Blackwell only, WideEP |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | TBD | ❌ | ❌ | Engine configs only, no K8s manifest |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-8gpu/)** | SGLang | Disagg WideEP | 8x H200 | ✅ | ❌ | Benchmark recipe pending |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-16gpu/)** | SGLang | Disagg WideEP | 16x H200 | ✅ | ❌ | Benchmark recipe pending |
| **[DeepSeek-R1](deepseek-r1/trtllm/disagg/wide_ep/gb200/)** | TensorRT-LLM | Disagg WideEP (GB200) | 32+4 GB200 | ✅ | ✅ | Multi-node: 8 decode + 1 prefill nodes |

**Legend:**
- **Deployment**: ✅ = Complete `deploy.yaml` manifest available | ❌ = Missing or incomplete