This `agg_router.yaml` is adapted from the vLLM deployment [example](https://github.com/ai-dynamo/dynamo/blob/main/components/backends/vllm/deploy/agg_router.yaml), with the following customizations:
- Deployed the `Qwen/Qwen2.5-1.5B-Instruct` model
- Used KV-cache-based routing in the frontend deployment (`--router-mode kv`)
- Mounted a local cache folder, `/YOUR/LOCAL/CACHE/FOLDER`, for reusing model artifacts
- Created 4 replicas for this model deployment by setting `replicas: 4`
- Added a `debug` flag environment variable for observability
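Two of these customizations work together: with `replicas: 4`, a frontend running `--router-mode kv` tries to send each request to the replica that already holds the KV cache for the longest prefix of its prompt, so that prefix does not have to be recomputed during prefill. The toy Python model below illustrates only that idea. `ToyKvRouter`, `block_hashes`, and the 4-token `BLOCK_SIZE` are hypothetical names and values; Dynamo's actual router uses its own block size, hashing, and cost function.

```python
from hashlib import sha256

BLOCK_SIZE = 4  # hypothetical: tokens per KV block (real engines choose their own)


def block_hashes(tokens: list[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each full block of tokens. Each hash chains in the previous one,
    so a block's hash identifies the whole prefix up to and including it."""
    hashes: list[str] = []
    prev = ""
    full = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for i in range(0, full, block_size):
        block = ",".join(map(str, tokens[i:i + block_size]))
        prev = sha256(f"{prev}|{block}".encode()).hexdigest()
        hashes.append(prev)
    return hashes


class ToyKvRouter:
    """Hypothetical prefix-aware router over a fixed number of replicas."""

    def __init__(self, num_workers: int) -> None:
        self.cached: list[set[str]] = [set() for _ in range(num_workers)]
        self.active: list[int] = [0] * num_workers

    def overlap(self, worker: int, hashes: list[str]) -> int:
        """Count how many leading prompt blocks this worker already has cached."""
        n = 0
        for h in hashes:
            if h not in self.cached[worker]:
                break
            n += 1
        return n

    def route(self, tokens: list[int]) -> int:
        hashes = block_hashes(tokens)
        # Prefer the longest cached prefix, then the fewest active requests,
        # then the lowest worker index.
        best = max(
            range(len(self.cached)),
            key=lambda w: (self.overlap(w, hashes), -self.active[w], -w),
        )
        self.cached[best].update(hashes)  # the chosen worker now caches this prompt
        self.active[best] += 1
        return best

    def finish(self, worker: int) -> None:
        """Mark one request on `worker` as complete."""
        self.active[worker] -= 1


if __name__ == "__main__":
    router = ToyKvRouter(4)
    print(router.route(list(range(12))))                   # 0: nothing cached yet
    print(router.route([100 + i for i in range(8)]))        # 1: no overlap, least loaded
    print(router.route(list(range(12)) + [50, 51, 52, 53])) # 0: reuses 3 cached blocks
```

The third prompt shares its first three blocks with the first one, so it goes back to replica 0 even though that replica is busier; a purely load-based router would have picked an idle replica and recomputed the shared prefix.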
Create a Kubernetes secret with your Hugging Face token, then deploy the model.
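As a sketch of what that secret looks like, the Python snippet below builds a standard Kubernetes `Secret` manifest, where values under `data` must be base64-encoded. The secret name `hf-token-secret` and key `HF_TOKEN` are assumptions for illustration; use whatever name and key `agg_router.yaml` actually references. `kubectl create secret generic hf-token-secret --from-literal=HF_TOKEN=<your-token>` produces an equivalent object in one command.

```python
import base64
import json
import os
import sys


def hf_token_secret(token: str, name: str = "hf-token-secret", key: str = "HF_TOKEN") -> dict:
    """Build a Kubernetes Opaque Secret manifest holding a Hugging Face token.

    `name` and `key` are hypothetical defaults; they must match the secret
    reference in the deployment YAML.
    """
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        # Secret `data` values are base64-encoded strings.
        "data": {key: base64.b64encode(token.encode()).decode()},
    }


if __name__ == "__main__":
    # Read the token from the environment instead of hard-coding it.
    token = os.environ.get("HF_TOKEN")
    if token:
        print(json.dumps(hf_token_secret(token), indent=2))
    else:
        print("HF_TOKEN is not set", file=sys.stderr)
```

Because `kubectl apply` accepts JSON as well as YAML, the output can be piped straight in, for example `python make_secret.py | kubectl apply -n <namespace> -f -` (the script name is hypothetical).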