doc: add instruction to deploy model with inference gateway
biswapanda committed Aug 14, 2025
commit e822e24de5b54ffa7cce2f64cde21b631ecffd89
components/backends/sglang/deploy/README.md (27 additions, 1 deletion)

@@ -103,8 +103,34 @@

### 3. Deploy

Use the following steps to deploy a model from a deployment file.

First, export the `NAMESPACE` you used in your Dynamo Cloud installation, then create a Kubernetes secret for your HuggingFace token.
```bash
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN} \
-n ${NAMESPACE}
```
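
If you want to double-check that the secret landed in the right namespace (an optional step, not required by the deployment files), you can query it:

```bash
# Confirm the HuggingFace token secret exists in the target namespace
kubectl get secret hf-token-secret -n ${NAMESPACE}
```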

Then, deploy the model using the deployment file.

```bash
export DEPLOYMENT_FILE=agg.yaml
kubectl apply -f $DEPLOYMENT_FILE -n ${NAMESPACE}
```
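
After applying the manifest, you can watch the pods come up before sending any traffic; the exact pod names depend on the deployment file you chose:

```bash
# Watch pods in the namespace until all containers report Ready (Ctrl+C to stop)
kubectl get pods -n ${NAMESPACE} -w
```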

### 4. Using Custom Dynamo Frameworks Image for SGLang

To use a custom Dynamo frameworks image for SGLang, update the deployment file with `yq`:

```bash
export DEPLOYMENT_FILE=agg.yaml
export FRAMEWORK_RUNTIME_IMAGE=<sglang-image>

yq '.spec.services.[].extraPodSpec.mainContainer.image = env(FRAMEWORK_RUNTIME_IMAGE)' $DEPLOYMENT_FILE > $DEPLOYMENT_FILE.generated
kubectl apply -f $DEPLOYMENT_FILE.generated -n $NAMESPACE
```
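
As a quick sanity check before applying, you can read the image field back out of the generated manifest with the same `yq` path used above:

```bash
# Print the image configured for each service in the generated file
yq '.spec.services.[].extraPodSpec.mainContainer.image' $DEPLOYMENT_FILE.generated
```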

## Model Configuration
components/backends/trtllm/README.md (28 additions, 2 deletions)

@@ -214,15 +214,41 @@
For Kubernetes deployment, YAML manifests are provided in the `deploy/` directory.

#### Deploy to Kubernetes

See the [Create Deployment Guide](../../../docs/guides/dynamo_deploy/create_deployment.md) to learn how to deploy a model from a deployment file.

First, export the `NAMESPACE` you used in your Dynamo Cloud installation, then create a Kubernetes secret for your HuggingFace token.
```bash
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN} \
-n ${NAMESPACE}
```

Then, deploy the model using the deployment file.

```bash
cd dynamo
cd components/backends/trtllm/deploy
export DEPLOYMENT_FILE=agg.yaml
kubectl apply -f $DEPLOYMENT_FILE -n $NAMESPACE
```
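
Once the pods are running, one way to smoke-test the deployment is to port-forward the frontend and query its OpenAI-compatible endpoint. The service name below is a placeholder, and the port may differ depending on your configuration, so list the services first:

```bash
# Find the frontend service created by the deployment
kubectl get svc -n $NAMESPACE

# Forward it locally (replace <frontend-service> with the name from the listing above)
kubectl port-forward svc/<frontend-service> 8000:8000 -n $NAMESPACE

# In another shell, list the models served by the frontend
curl http://localhost:8000/v1/models
```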

#### Using Custom Dynamo Frameworks Image for TensorRT-LLM

To use a custom Dynamo frameworks image for TensorRT-LLM, update the deployment file with `yq`:

```bash
export DEPLOYMENT_FILE=agg.yaml
export FRAMEWORK_RUNTIME_IMAGE=<trtllm-image>

yq '.spec.services.[].extraPodSpec.mainContainer.image = env(FRAMEWORK_RUNTIME_IMAGE)' $DEPLOYMENT_FILE > $DEPLOYMENT_FILE.generated
kubectl apply -f $DEPLOYMENT_FILE.generated -n $NAMESPACE
```
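
If you would rather have the API server validate the generated manifest before actually creating anything, a server-side dry run works here as well:

```bash
# Validate the generated manifest against the cluster without persisting it
kubectl apply -f $DEPLOYMENT_FILE.generated -n $NAMESPACE --dry-run=server
```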

#### Configuration Options

To change the `DYN_LOG` level, edit the YAML file by adding:

```yaml
# (snippet truncated in the diff view)
```
components/backends/vllm/README.md (30 additions, 4 deletions)

@@ -180,15 +180,41 @@
For Kubernetes deployment, YAML manifests are provided in the `deploy/` directory.

#### Deploy to Kubernetes

Use the following steps to deploy a model from a deployment file.

First, export the `NAMESPACE` you used in your Dynamo Cloud installation, then create a Kubernetes secret for your HuggingFace token.
```bash
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN} \
-n ${NAMESPACE}
```

Then, deploy the model using the deployment file.

```bash
cd <dynamo-source-root>/components/backends/vllm/deploy
export DEPLOYMENT_FILE=agg.yaml

kubectl apply -f $DEPLOYMENT_FILE -n $NAMESPACE
```
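
To follow a worker while it starts up (for example, while it downloads the model and initializes the engine), tail its logs; the pod name below is a placeholder taken from `kubectl get pods`:

```bash
# List the pods created by the deployment, then follow one pod's logs
kubectl get pods -n $NAMESPACE
kubectl logs -f <pod-name> -n $NAMESPACE
```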

#### Using Custom Dynamo Frameworks Image for vLLM

To use a custom Dynamo frameworks image for vLLM, update the deployment file with `yq`:

```bash
export DEPLOYMENT_FILE=agg.yaml
export FRAMEWORK_RUNTIME_IMAGE=<vllm-image>

yq '.spec.services.[].extraPodSpec.mainContainer.image = env(FRAMEWORK_RUNTIME_IMAGE)' $DEPLOYMENT_FILE > $DEPLOYMENT_FILE.generated
kubectl apply -f $DEPLOYMENT_FILE.generated -n $NAMESPACE
```
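
A simple way to review exactly what the override changed is to diff the original file against the generated one before applying it:

```bash
# Show the image fields rewritten by yq (diff exits non-zero when the files differ)
diff $DEPLOYMENT_FILE $DEPLOYMENT_FILE.generated
```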

#### Configuration Options

To change the `DYN_LOG` level, edit the YAML file by adding:

```yaml
# (snippet truncated in the diff view)
```
deploy/inference-gateway/README.md (11 additions, 1 deletion)

@@ -70,7 +70,17 @@
```bash
kubectl get gateway inference-gateway -n my-model
# inference-gateway kgateway x.x.x.x True 1m
```

3. **Deploy model**

Follow the steps in [model deployment](../../components/backends/vllm/README.md) to deploy the `Qwen/Qwen3-0.6B` model in aggregate mode using [agg.yaml](../../components/backends/vllm/deploy/agg.yaml) in the `my-model` Kubernetes namespace.

Sample commands to deploy the model:
```bash
cd <dynamo-source-root>/components/backends/vllm/deploy
kubectl apply -f agg.yaml -n my-model
```
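
Before installing the gateway Helm chart in the next step, it is worth waiting until the model pods in `my-model` are ready, for example:

```bash
# Block until all pods in the my-model namespace report Ready (or time out after 5 minutes)
kubectl wait --for=condition=Ready pods --all -n my-model --timeout=300s
```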

4. **Install dynamo gaie helm chart**

The Inference Gateway is configured through the `inference-gateway-resources.yaml` file.
