doc: add instruction to deploy model with inference gateway
biswapanda committed Aug 14, 2025
commit e822e24de5b54ffa7cce2f64cde21b631ecffd89
components/backends/sglang/deploy/README.md (27 additions, 1 deletion)

@@ -103,8 +103,34 @@

### 3. Deploy

Use the following steps to deploy a model from a deployment file.

First, export the `NAMESPACE` you used in your Dynamo Cloud installation, then create a Kubernetes secret for your HuggingFace token.
```bash
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN} \
-n ${NAMESPACE}
```
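
If you want to double-check that the secret landed in the right namespace (an optional step, not required by the deployment files), you can query it:

```bash
# Confirm the HuggingFace token secret exists in the target namespace
kubectl get secret hf-token-secret -n ${NAMESPACE}
```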

Then, deploy the model using the deployment file.

```bash
export DEPLOYMENT_FILE=agg.yaml
kubectl apply -f $DEPLOYMENT_FILE -n ${NAMESPACE}
```
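
After applying the manifest, you can watch the pods come up before sending any traffic; the exact pod names depend on the deployment file you chose:

```bash
# Watch pods in the namespace until all containers report Ready (Ctrl+C to stop)
kubectl get pods -n ${NAMESPACE} -w
```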

### 4. Using Custom Dynamo Frameworks Image for SGLang

To use a custom Dynamo frameworks image for SGLang, update the deployment file with `yq`:

```bash
export DEPLOYMENT_FILE=agg.yaml
export FRAMEWORK_RUNTIME_IMAGE=<sglang-image>

yq '.spec.services.[].extraPodSpec.mainContainer.image = env(FRAMEWORK_RUNTIME_IMAGE)' $DEPLOYMENT_FILE > $DEPLOYMENT_FILE.generated
kubectl apply -f $DEPLOYMENT_FILE.generated -n $NAMESPACE
```
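
As a quick sanity check before applying, you can read the image field back out of the generated manifest with the same `yq` path used above:

```bash
# Print the image configured for each service in the generated file
yq '.spec.services.[].extraPodSpec.mainContainer.image' $DEPLOYMENT_FILE.generated
```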

## Model Configuration
components/backends/trtllm/README.md (28 additions, 2 deletions)

@@ -214,15 +214,41 @@
For Kubernetes deployment, YAML manifests are provided in the `deploy/` directory.

#### Deploy to Kubernetes

See the [Create Deployment Guide](../../../docs/guides/dynamo_deploy/create_deployment.md) to learn how to deploy a model from a deployment file.

First, export the `NAMESPACE` you used in your Dynamo Cloud installation, then create a Kubernetes secret for your HuggingFace token.
```bash
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN} \
-n ${NAMESPACE}
```

Then, deploy the model using the deployment file.

```bash
cd dynamo
cd components/backends/trtllm/deploy
export DEPLOYMENT_FILE=agg.yaml
kubectl apply -f $DEPLOYMENT_FILE -n $NAMESPACE
```
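
Once the pods are running, one way to smoke-test the deployment is to port-forward the frontend and query its OpenAI-compatible endpoint. The service name below is a placeholder, and the port may differ depending on your configuration, so list the services first:

```bash
# Find the frontend service created by the deployment
kubectl get svc -n $NAMESPACE

# Forward it locally (replace <frontend-service> with the name from the listing above)
kubectl port-forward svc/<frontend-service> 8000:8000 -n $NAMESPACE

# In another shell, list the models served by the frontend
curl http://localhost:8000/v1/models
```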

#### Using Custom Dynamo Frameworks Image for TensorRT-LLM

To use a custom Dynamo frameworks image for TensorRT-LLM, update the deployment file with `yq`:

```bash
export DEPLOYMENT_FILE=agg.yaml
export FRAMEWORK_RUNTIME_IMAGE=<trtllm-image>

yq '.spec.services.[].extraPodSpec.mainContainer.image = env(FRAMEWORK_RUNTIME_IMAGE)' $DEPLOYMENT_FILE > $DEPLOYMENT_FILE.generated
kubectl apply -f $DEPLOYMENT_FILE.generated -n $NAMESPACE
```
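
If you would rather have the API server validate the generated manifest before actually creating anything, a server-side dry run works here as well:

```bash
# Validate the generated manifest against the cluster without persisting it
kubectl apply -f $DEPLOYMENT_FILE.generated -n $NAMESPACE --dry-run=server
```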

#### Configuration Options

To change the `DYN_LOG` level, edit the YAML file by adding:

```yaml
# (snippet truncated in the diff view)
```
components/backends/vllm/README.md (30 additions, 4 deletions)

@@ -180,15 +180,41 @@
For Kubernetes deployment, YAML manifests are provided in the `deploy/` directory.

#### Deploy to Kubernetes

Use the following steps to deploy a model from a deployment file.

First, export the `NAMESPACE` you used in your Dynamo Cloud installation, then create a Kubernetes secret for your HuggingFace token.
```bash
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN} \
-n ${NAMESPACE}
```

Then, deploy the model using the deployment file.

```bash
cd <dynamo-source-root>/components/backends/vllm/deploy
export DEPLOYMENT_FILE=agg.yaml

kubectl apply -f $DEPLOYMENT_FILE -n $NAMESPACE
```
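
To follow a worker while it starts up (for example, while it downloads the model and initializes the engine), tail its logs; the pod name below is a placeholder taken from `kubectl get pods`:

```bash
# List the pods created by the deployment, then follow one pod's logs
kubectl get pods -n $NAMESPACE
kubectl logs -f <pod-name> -n $NAMESPACE
```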

#### Using Custom Dynamo Frameworks Image for vLLM

To use a custom Dynamo frameworks image for vLLM, update the deployment file with `yq`:

```bash
export DEPLOYMENT_FILE=agg.yaml
export FRAMEWORK_RUNTIME_IMAGE=<vllm-image>

yq '.spec.services.[].extraPodSpec.mainContainer.image = env(FRAMEWORK_RUNTIME_IMAGE)' $DEPLOYMENT_FILE > $DEPLOYMENT_FILE.generated
kubectl apply -f $DEPLOYMENT_FILE.generated -n $NAMESPACE
```
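
A simple way to review exactly what the override changed is to diff the original file against the generated one before applying it:

```bash
# Show the image fields rewritten by yq (diff exits non-zero when the files differ)
diff $DEPLOYMENT_FILE $DEPLOYMENT_FILE.generated
```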

#### Configuration Options

To change the `DYN_LOG` level, edit the YAML file by adding:

```yaml
# (snippet truncated in the diff view)
```
deploy/inference-gateway/README.md (11 additions, 1 deletion)

@@ -70,7 +70,17 @@
```bash
kubectl get gateway inference-gateway -n my-model
# inference-gateway kgateway x.x.x.x True 1m
```

3. **Deploy model**

Follow the steps in [model deployment](../../components/backends/vllm/README.md) to deploy the `Qwen/Qwen3-0.6B` model in aggregate mode using [agg.yaml](../../components/backends/vllm/deploy/agg.yaml) in the `my-model` Kubernetes namespace.

Sample commands to deploy the model:
```bash
cd <dynamo-source-root>/components/backends/vllm/deploy
kubectl apply -f agg.yaml -n my-model
```
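
Before installing the gateway Helm chart in the next step, it is worth waiting until the model pods in `my-model` are ready, for example:

```bash
# Block until all pods in the my-model namespace report Ready (or time out after 5 minutes)
kubectl wait --for=condition=Ready pods --all -n my-model --timeout=300s
```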

4. **Install dynamo gaie helm chart**

The Inference Gateway is configured through the `inference-gateway-resources.yaml` file.
