Add back GAIE integration
Signed-off-by: Anna Tchernych <[email protected]>
atchernych committed Nov 17, 2025
commit 9410f19d82b7cfdace2bd7d677b782f767fef5c3
18 changes: 15 additions & 3 deletions recipes/README.md
@@ -7,9 +7,9 @@ Production-tested Kubernetes deployment recipes for LLM inference using NVIDIA D

## Available Recipes

| Model | Framework | Mode | GPUs | Deployment | Benchmark Recipe | Notes |
|-------|-----------|------|------|------------|------------------|-------|
| **[Llama-3-70B](llama-3-70b/vllm/agg/)** | vLLM | Aggregated | 4x H100/H200 | ✅ | ✅ | FP8 dynamic quantization |
| Model | Framework | Mode | GPUs | Deployment | Benchmark Recipe | Notes | GAIE integration |
|-------|-----------|------|------|------------|------------------|-------|------------------|
| **[Llama-3-70B](llama-3-70b/vllm/agg/)** | vLLM | Aggregated | 4x H100/H200 | ✅ | ✅ | FP8 dynamic quantization | ✅ |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-single-node/)** | vLLM | Disagg (Single-Node) | 8x H100/H200 | ✅ | ✅ | Prefill + Decode separation |
| **[Llama-3-70B](llama-3-70b/vllm/disagg-multi-node/)** | vLLM | Disagg (Multi-Node) | 16x H100/H200 | ✅ | ✅ | 2 nodes, 8 GPUs each |
| **[Qwen3-32B-FP8](qwen3-32b-fp8/trtllm/agg/)** | TensorRT-LLM | Aggregated | 4x GPU | ✅ | ✅ | FP8 quantization |
@@ -147,6 +147,18 @@ kubectl logs -f job/<benchmark-job-name> -n ${NAMESPACE}
kubectl logs job/<benchmark-job-name> -n ${NAMESPACE} | tail -50
```

**Step 4: GAIE Integration (Optional)**

For Llama-3-70B with vLLM (Aggregated), an example integration with the Inference Gateway (GAIE) is provided.

Follow [Deploy Inference Gateway Section 2](../deploy/inference-gateway/README.md#2-deploy-inference-gateway) to install GAIE, then apply the manifests:
```bash
export DEPLOY_PATH=llama-3-70b/vllm/agg/
#DEPLOY_PATH=<model>/<framework>/<mode>/
kubectl apply -R -f "$DEPLOY_PATH/gaie/k8s-manifests" -n "$NAMESPACE"
```
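After applying the manifests, you can check that the GAIE resources came up before routing traffic through the gateway. A minimal sketch, assuming the manifests in `gaie/k8s-manifests` create a Gateway API `Gateway`/`HTTPRoute` and a GAIE `InferencePool` (resource names and the exposed route depend on the manifests, so adjust accordingly):

```shell
# Verify the GAIE and Gateway API resources were created
# (resource names below are illustrative; check the manifests in
# $DEPLOY_PATH/gaie/k8s-manifests for the actual names)
kubectl get inferencepools -n "$NAMESPACE"
kubectl get gateways,httproutes -n "$NAMESPACE"

# Once the gateway reports an address, send a test request through it
GATEWAY_IP=$(kubectl get gateway -n "$NAMESPACE" \
  -o jsonpath='{.items[0].status.addresses[0].value}')
curl "http://${GATEWAY_IP}/v1/models"
```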

## Example Deployments

### Llama-3-70B with vLLM (Aggregated)