Skip to content
Merged
Prev Previous commit
Add a new column
  • Loading branch information
tanmayv25 committed Nov 18, 2025
commit f7881a08b413314b340709b2cf4a3d746eba51a4
20 changes: 10 additions & 10 deletions recipes/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,16 +9,16 @@ Production-tested Kubernetes deployment recipes for LLM inference using NVIDIA D

| Model | Framework | Mode | GPUs | Deployment | Benchmark Recipe | Notes |GAIE integration |
|-------|-----------|------|------|------------|------------------|-------|------------------|
| **[Llama-3-70B](llama-3-70b/vllm/agg/)** | vLLM | Aggregated | 4x H100/H200 | βœ… | βœ… | FP8 dynamic quantization | βœ… |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | βœ… | βœ… | Prefill + Decode separation |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-multi-node/)** | vLLM | Disagg (Multi-Node) | 16x H100/H200 | βœ… | βœ… | 2 nodes, 8 GPUs each |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GPU | βœ… | βœ… | FP8 quantization |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | 8x GPU | βœ… | βœ… | Prefill + Decode separation |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GB200 | βœ… | βœ… | Blackwell only, WideEP |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | TBD | ❌ | ❌ | Engine configs only, no K8s manifest |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-8gpu/)** | SGLang | Disagg WideEP | 8x H200 | βœ… | ❌ | Benchmark recipe pending |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-16gpu/)** | SGLang | Disagg WideEP | 16x H200 | βœ… | ❌ | Benchmark recipe pending |
| **[DeepSeek-R1](deepseek-r1/trtllm/disagg/wide_ep/gb200/)** | TensorRT-LLM | Disagg WideEP (GB200) | 32+4 GB200 | βœ… | βœ… | Multi-node: 8 decode + 1 prefill nodes |
| **[Llama-3-70B](llama-3-70b/vllm/agg/)** | vLLM | Aggregated | 4x H100/H200 | βœ… | βœ… | FP8 dynamic quantization | βœ… | ❌ |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | βœ… | βœ… | Prefill + Decode separation | ❌ |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-multi-node/)** | vLLM | Disagg (Multi-Node) | 16x H100/H200 | βœ… | βœ… | 2 nodes, 8 GPUs each | ❌ |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GPU | βœ… | βœ… | FP8 quantization | ❌ |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | 8x GPU | βœ… | βœ… | Prefill + Decode separation | ❌ |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GB200 | βœ… | βœ… | Blackwell only, WideEP | ❌ |
| **[GPT-OSS-120B](gpt-oss-120b/trtllm/disagg/)** | TensorRT-LLM | Disaggregated | TBD | ❌ | ❌ | Engine configs only, no K8s manifest | ❌ |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-8gpu/)** | SGLang | Disagg WideEP | 8x H200 | βœ… | ❌ | Benchmark recipe pending | ❌ |
| **[DeepSeek-R1](deepseek-r1/sglang/disagg-16gpu/)** | SGLang | Disagg WideEP | 16x H200 | βœ… | ❌ | Benchmark recipe pending | ❌ |
| **[DeepSeek-R1](deepseek-r1/trtllm/disagg/wide_ep/gb200/)** | TensorRT-LLM | Disagg WideEP (GB200) | 32+4 GB200 | βœ… | βœ… |Multi-node: 8 decode + 1 prefill nodes | ❌ |

**Legend:**
- **Deployment**: βœ… = Complete `deploy.yaml` manifest available | ❌ = Missing or incomplete
Expand Down
Loading