docs: Add AWS ECS deployment example for Dynamo vLLM #2415
Walkthrough
Adds ECS deployment assets for Dynamo vLLM: a README guide and three ECS task definitions for ETCD/NATS (Fargate), frontend (EC2 with GPU), and prefill worker (EC2 with GPU). Instructions include cluster setup, task configuration, environment variables, runtime commands, deployment steps, and basic endpoint testing.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Frontend (ECS EC2 GPU)
participant ETCD
participant NATS
participant PrefillWorker (ECS EC2 GPU)
participant vLLM Server
User->>Frontend (ECS EC2 GPU): HTTP /v1/chat/completions
Frontend (ECS EC2 GPU)->>ETCD: Read/Write KV (coordination)
Frontend (ECS EC2 GPU)->>NATS: Publish work request
NATS-->>PrefillWorker (ECS EC2 GPU): Deliver request
PrefillWorker (ECS EC2 GPU)->>vLLM Server: Generate tokens (prefill/decoding)
PrefillWorker (ECS EC2 GPU)->>NATS: Send partial/final results
NATS-->>Frontend (ECS EC2 GPU): Results stream
Frontend (ECS EC2 GPU)-->>User: Response payload
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 15
🧹 Nitpick comments (12)
examples/deployments/ECS/task_definition_etcd_nats.json (2)
25-30: Security: etcd runs with no auth; restrict access or enable auth
`ALLOW_NONE_AUTHENTICATION=YES` is fine for a quick sample but risky in real environments. At minimum, restrict the task’s security group to only the ECS cluster subnets and the frontend/prefill EC2 instances. Prefer enabling proper auth for etcd.
38-44: Avoid hardcoding region and log group; parameterize or document
Both containers hardcode `awslogs-region` to `us-east-2` and use fixed log group names. If users deploy elsewhere, this breaks expectations. Consider parameterizing via templates or clearly noting required edits.
Also applies to: 85-91
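One way to parameterize this is sketched below, assuming the task definition is rendered by a small script before it is registered. The helper name `awslogs_config` and the log group names are hypothetical; the option keys are the standard `awslogs` driver options.

```python
import json


def awslogs_config(log_group: str, region: str, stream_prefix: str = "ecs") -> dict:
    """Build an ECS logConfiguration block for the awslogs driver."""
    return {
        "logDriver": "awslogs",
        "options": {
            "awslogs-group": log_group,
            "awslogs-region": region,
            "awslogs-stream-prefix": stream_prefix,
        },
    }


# Hypothetical log group names and region; substitute your own.
etcd_logs = awslogs_config("/ecs/dynamo-etcd", region="us-west-2")
nats_logs = awslogs_config("/ecs/dynamo-nats", region="us-west-2")
print(json.dumps(etcd_logs, indent=2))
```

Rendering both containers from the same helper keeps the region in one place, so deploying outside `us-east-2` is a one-argument change.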
examples/deployments/ECS/task_definition_frontend.json (3)
25-27: Prefer a single interpreter and avoid shelling two long-lived processes in one container
You’re backgrounding the frontend and foregrounding vLLM via `sh -c`. That complicates signal handling, shutdown, and health checks. Prefer:
- Split into two containers in one task (sidecar pattern), or
- Use a proper process supervisor (e.g., dumb-init, s6) and a consistent interpreter.
At minimum, make the interpreter consistent (`python` vs. `python3`).
- "cd components/backends/vllm && python -m dynamo.frontend --router-mode kv & python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager"
+ "cd components/backends/vllm && python3 -m dynamo.frontend --router-mode kv & python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager"
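The sidecar option could look roughly like the sketch below, which generates two container definitions with one long-lived process each. The container names, the image placeholder, and the use of `sh -c` with `exec` are assumptions for illustration; GPU resource requirements, ports, and environment are omitted.

```python
import json

# Commands taken from the task definition under review, one per container.
FRONTEND_CMD = "python3 -m dynamo.frontend --router-mode kv"
WORKER_CMD = "python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager"
WORKDIR = "components/backends/vllm"
IMAGE = "<your-image-uri>"  # placeholder, not a real image reference


def container(name: str, image: str, command: str) -> dict:
    """Describe a container that runs exactly one long-lived process."""
    return {
        "name": name,
        "image": image,
        "essential": True,
        # `exec` replaces the shell, so the Python process receives stop signals itself.
        "command": ["sh", "-c", f"cd {WORKDIR} && exec {command}"],
    }


# Hypothetical container names.
containers = [
    container("dynamo-frontend", IMAGE, FRONTEND_CMD),
    container("dynamo-vllm", IMAGE, WORKER_CMD),
]
print(json.dumps({"containerDefinitions": containers}, indent=2))
```

With this split, ECS can report health and exit status per process instead of only for whichever process happens to run in the foreground.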
28-36: Env var consistency with docs and other task: schemes and secret naming
Good: `NATS_SERVER` uses `nats://` here. The README shows `http://` for NATS; fix that doc to use `nats://`. Also, the secret name here (`ngc_nvcr_access`) differs from the prefill worker (`ngc_access`). Align naming or document the difference.
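A check like this could be automated. The sketch below assumes the task definitions are loaded as dictionaries and that the secret is referenced through `repositoryCredentials.credentialsParameter`; the function name `check_consistency` and the sample values (IP address, entry names) are hypothetical.

```python
def check_consistency(task_defs: dict) -> list:
    """Return human-readable problems found across a set of task definitions."""
    problems = []
    secret_names = set()
    for name, task_def in task_defs.items():
        for c in task_def.get("containerDefinitions", []):
            env = {e["name"]: e["value"] for e in c.get("environment", [])}
            nats = env.get("NATS_SERVER")
            if nats is not None and not nats.startswith("nats://"):
                problems.append(f"{name}: NATS_SERVER should use nats://, got {nats}")
            secret = c.get("repositoryCredentials", {}).get("credentialsParameter")
            if secret:
                secret_names.add(secret)
    if len(secret_names) > 1:
        problems.append(f"secret names differ: {sorted(secret_names)}")
    return problems


# Illustrative input only: one entry mimics the http:// scheme shown in the README.
sample = {
    "frontend": {"containerDefinitions": [{
        "environment": [{"name": "NATS_SERVER", "value": "nats://10.0.0.5:4222"}],
        "repositoryCredentials": {"credentialsParameter": "ngc_nvcr_access"},
    }]},
    "readme_example": {"containerDefinitions": [{
        "environment": [{"name": "NATS_SERVER", "value": "http://10.0.0.5:4222"}],
        "repositoryCredentials": {"credentialsParameter": "ngc_access"},
    }]},
}
problems = check_consistency(sample)
for p in problems:
    print(p)
```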
65-65: Host networking caveat
With `"networkMode": "host"`, only one task per instance can bind port 8000. That’s fine for a singleton service, but call it out in the README to avoid scale surprises.
examples/deployments/ECS/task_definition_prefillworker.json (1)
20-28: Env var scheme and naming consistency
Here `NATS_SERVER` uses `nats://` (correct). Ensure README and the frontend task definition’s documentation also use `nats://` consistently. Consider parameterizing `IP_ADDRESS` via task overrides or Service Discovery instead of hardcoding.
examples/deployments/ECS/README.md (6)
31-35: Add blank lines around NATS ports table
Satisfy MD058 for readability.
-
-|Container port|Protocol|Port name| App protocol|
-|-|-|-|-|
-|4222|TCP|4222|HTTP|
-|6222|TCP|6222|HTTP|
-|8222|TCP|8222|HTTP|
+
+|Container port|Protocol|Port name| App protocol|
+|-|-|-|-|
+|4222|TCP|4222|HTTP|
+|6222|TCP|6222|HTTP|
+|8222|TCP|8222|HTTP|
+
49-51: Add blank lines around frontend ports table
MD058 fix.
-
-|Container port|Protocol|Port name| App protocol|
-|-|-|-|-|
-|8000|TCP|8000|HTTP|
+
+|Container port|Protocol|Port name| App protocol|
+|-|-|-|-|
+|8000|TCP|8000|HTTP|
+
46-46: Minor grammar: “prebuild” → “prebuilt”
Polish the doc.
- - Add your Image URL (You can use the prebuild [Dynamo container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/containers/vllm-runtime)) and **Yes** for Essential container. It can be AWS ECR URL or Nvidia NGC URL. If using NGC URL, please also choose **Private registry authentication** and add your Secret Manager ARN or name.
+ - Add your Image URL (You can use the prebuilt [Dynamo container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/containers/vllm-runtime)) and **Yes** for Essential container. It can be AWS ECR URL or Nvidia NGC URL. If using NGC URL, please also choose **Private registry authentication** and add your Secrets Manager ARN or name.
68-79: Use service discovery or DNS over hardcoded IPs
Relying on IPs from task pages is brittle. Consider ECS Service Discovery (AWS Cloud Map) and point `ETCD_ENDPOINTS`/`NATS_SERVER` to DNS names instead. At minimum, call this out as an enhancement.
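For illustration, the environment block could be derived from service-discovery names rather than copied IPs. This sketch assumes a hypothetical Cloud Map namespace `dynamo.local` with services named `etcd` and `nats`, and assumes `ETCD_ENDPOINTS` takes an `http://` URL on the default etcd client port 2379; none of these names come from the PR.

```python
def coordination_env(namespace: str = "dynamo.local") -> list:
    """Build ECS `environment` entries that point at DNS names instead of task IPs."""
    return [
        # 2379 is the default etcd client port; the http:// scheme is an assumption.
        {"name": "ETCD_ENDPOINTS", "value": f"http://etcd.{namespace}:2379"},
        # 4222 is the NATS client port listed in the README port table.
        {"name": "NATS_SERVER", "value": f"nats://nats.{namespace}:4222"},
    ]


env = coordination_env()
for entry in env:
    print(f'{entry["name"]}={entry["value"]}')
```

Because the names stay stable across task restarts, the frontend and prefill worker definitions would not need editing each time the ETCD/NATS task is replaced.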
70-76: Document IAM requirements for pulling private images
Since task defs use `repositoryCredentials`, explicitly mention that the task’s execution role must have `secretsmanager:GetSecretValue` for the provided secret, and which region it lives in.
I can add a minimal IAM policy snippet to the README if you want.
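A minimal policy of that kind might be generated as follows. This is a sketch, not the snippet from the PR: the helper name `secret_read_policy` is hypothetical and the ARN is a made-up placeholder (account ID, region, and suffix are illustrative).

```python
import json


def secret_read_policy(secret_arn: str) -> dict:
    """IAM policy document letting a task execution role read one registry secret."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [secret_arn],
            }
        ],
    }


# Placeholder ARN for illustration only.
ARN = "arn:aws:secretsmanager:us-east-2:123456789012:secret:ngc_nvcr_access-AbCdEf"
policy = secret_read_policy(ARN)
secret_region = ARN.split(":")[3]  # the region is the fourth ARN field
print(json.dumps(policy, indent=2))
print("secret region:", secret_region)
```

Scoping `Resource` to the single secret ARN, rather than `*`, keeps the execution role limited to the registry credentials it actually needs.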
14-37: Security note: etcd and NATS exposure
Add a note to restrict etcd and NATS to private subnets/security groups and avoid public access, especially with no auth on etcd and with NATS monitoring port 8222 enabled.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
examples/deployments/ECS/README.md (1 hunks)
examples/deployments/ECS/task_definition_etcd_nats.json (1 hunks)
examples/deployments/ECS/task_definition_frontend.json (1 hunks)
examples/deployments/ECS/task_definition_prefillworker.json (1 hunks)
🧰 Additional context used
🪛 LanguageTool
examples/deployments/ECS/README.md
[grammar] ~46-~46: Ensure spelling is correct
Context: ...` - Add your Image URL (You can use the prebuild [Dynamo container](https://catalog.ngc....
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
🪛 markdownlint-cli2 (0.17.2)
examples/deployments/ECS/README.md
24-24: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
35-35: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
51-51: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
58-58: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2415/merge) by kylehh.
examples/deployments/ECS/README.md
[error] 1-1: Trailing whitespace detected by pre-commit; the hook modified the file. Re-run pre-commit to verify.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
nealvaidya left a comment
One thing that could be clarified
713b563 to 7b6dd51
Signed-off-by: Neal Vaidya <[email protected]>
Signed-off-by: Neal Vaidya <[email protected]>
7b6dd51 to ad9aee9
Signed-off-by: Neal Vaidya <[email protected]>
2ccdb0f to 9ebffdd
[like] Kyle Huang reacted to the message:
From: Neal Vaidya
Sent: Tuesday, September 9, 2025 7:38:01 PM
Subject: Re: [ai-dynamo/dynamo] docs: Add AWS ECS deployment example for Dynamo vLLM (PR #2415)
Merged #2415 into main.
Signed-off-by: Neal Vaidya <[email protected]>
Co-authored-by: Neal Vaidya <[email protected]>
Signed-off-by: zhongdaor <[email protected]>
Overview:
Add AWS ECS deployment example for Dynamo vLLM
Details:
Summary by CodeRabbit