Skip to content

Conversation

@kylehh
Copy link
Contributor

@kylehh kylehh commented Aug 12, 2025

Overview:

Add AWS ECS deployment example for Dynamo vLLM

Details:

  1. Create ECS clusters
  2. Create 3 task definations
  • etcd/nats
  • vLLM frontend and decode worker
  • vLLM prefill node
  1. Deploy 3 tasks

Summary by CodeRabbit

  • New Features
    • Added sample AWS ECS task definitions: ETCD/NATS (Fargate), Dynamo vLLM Frontend (EC2 GPU), and Prefill Worker (EC2 GPU), including logging and environment configuration.
  • Documentation
    • Introduced an end-to-end guide for deploying Dynamo vLLM on AWS ECS, covering cluster setup, task deployment, configuration, and testing endpoints (/v1/models, /v1/chat/completions).

@copy-pr-bot
Copy link

copy-pr-bot bot commented Aug 12, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 12, 2025

Walkthrough

Adds ECS deployment assets for Dynamo vLLM: a README guide and three ECS task definitions for ETCD/NATS (Fargate), frontend (EC2 with GPU), and prefill worker (EC2 with GPU). Instructions include cluster setup, task configuration, environment variables, runtime commands, deployment steps, and basic endpoint testing.

Changes

Cohort / File(s) Summary
Documentation
examples/deployments/ECS/README.md
New end-to-end ECS deployment guide for Dynamo vLLM covering cluster setup, ETCD/NATS, frontend, prefill worker tasks, deployment, and testing commands.
ECS Task Definitions
examples/deployments/ECS/task_definition_etcd_nats.json, examples/deployments/ECS/task_definition_frontend.json, examples/deployments/ECS/task_definition_prefillworker.json
New ECS task JSONs: (1) Fargate task with etcd and nats containers, CloudWatch logs, ports, env; (2) EC2 host-mode GPU frontend running router+vLLM on port 8000; (3) EC2 GPU prefill worker running vLLM in prefill mode; all with placeholder roles/credentials and logging.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Frontend (ECS EC2 GPU)
  participant ETCD
  participant NATS
  participant PrefillWorker (ECS EC2 GPU)
  participant vLLM Server

  User->>Frontend (ECS EC2 GPU): HTTP /v1/chat/completions
  Frontend (ECS EC2 GPU)->>ETCD: Read/Write KV (coordination)
  Frontend (ECS EC2 GPU)->>NATS: Publish work request
  NATS-->>PrefillWorker (ECS EC2 GPU): Deliver request
  PrefillWorker (ECS EC2 GPU)->>vLLM Server: Generate tokens (prefill/decoding)
  PrefillWorker (ECS EC2 GPU)->>NATS: Send partial/final results
  NATS-->>Frontend (ECS EC2 GPU): Results stream
  Frontend (ECS EC2 GPU)-->>User: Response payload
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Poem

A rabbit boots up ECS at dawn,
ETCD hums, and NATS hops on.
Frontend listens—port eight-thousand’s clear,
Prefill workers nibble tokens near.
vLLM streams like carrot gold,
CloudWatch logs the tales it told.
Deploy, test—burrow boldly, behold! 🥕✨


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

🧹 Nitpick comments (12)
examples/deployments/ECS/task_definition_etcd_nats.json (2)

25-30: Security: etcd runs with no auth; restrict access or enable auth

ALLOW_NONE_AUTHENTICATION=YES is fine for a quick sample but risky in real environments. At minimum, restrict the task’s security group to only the ECS cluster subnets and the frontend/prefill EC2 instances. Prefer enabling proper auth for etcd.


38-44: Avoid hardcoding region and log group; parameterize or document

Both containers hardcode awslogs-region to us-east-2 and use fixed log group names. If users deploy elsewhere, this breaks expectations. Consider parameterizing via templates or clearly noting required edits.

Also applies to: 85-91

examples/deployments/ECS/task_definition_frontend.json (3)

25-27: Prefer a single interpreter and avoid shelling two long-lived processes in one container

You’re backgrounding the frontend and foregrounding vLLM via sh -c. That complicates signal handling, shutdown, and health. Prefer:

  • Split into two containers in one task (sidecar pattern), or
  • Use a proper process supervisor (e.g., dumb-init, s6) and consistent interpreter.

At minimum, make the interpreter consistent.

-                "cd components/backends/vllm && python -m dynamo.frontend --router-mode kv & python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager"
+                "cd components/backends/vllm && python3 -m dynamo.frontend --router-mode kv & python3 -m dynamo.vllm --model Qwen/Qwen3-0.6B --enforce-eager"

28-36: Env var consistency with docs and other task: schemes and secret naming

Good: NATS_SERVER uses nats:// here. In the README, it shows http:// for NATS—fix that doc to nats://. Also, the secret name here (ngc_nvcr_access) differs from the prefill worker (ngc_access). Align naming or document the difference.


65-65: Host networking caveat

With "networkMode": "host", only one task per instance can bind port 8000. That’s fine for a singleton service, but call it out in the README to avoid scale surprises.

examples/deployments/ECS/task_definition_prefillworker.json (1)

20-28: Env var scheme and naming consistency

Here NATS_SERVER uses nats:// (correct). Ensure README and the frontend task definition’s documentation also use nats:// consistently. Consider parameterizing IP_ADDRESS via task overrides or Service Discovery instead of hardcoding.

examples/deployments/ECS/README.md (6)

31-35: Add blank lines around NATS ports table

Satisfy MD058 for readability.

-
-|Container port|Protocol|Port name| App protocol|
-|-|-|-|-|
-|4222|TCP|4222|HTTP|
-|6222|TCP|6222|HTTP|
-|8222|TCP|8222|HTTP|
+ 
+|Container port|Protocol|Port name| App protocol|
+|-|-|-|-|
+|4222|TCP|4222|HTTP|
+|6222|TCP|6222|HTTP|
+|8222|TCP|8222|HTTP|
+

49-51: Add blank lines around frontend ports table

MD058 fix.

-
-|Container port|Protocol|Port name| App protocol|
-|-|-|-|-|
-|8000|TCP|8000|HTTP|
+ 
+|Container port|Protocol|Port name| App protocol|
+|-|-|-|-|
+|8000|TCP|8000|HTTP|
+

46-46: Minor grammar: “prebuild” → “prebuilt”

Polish the doc.

- - Add your Image URL (You can use the prebuild [Dynamo container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/containers/vllm-runtime)) and **Yes** for Essential container. It can be AWS ECR URL or Nvidia NGC URL. If using NGC URL, please also choose **Private registry authentication** and add your Secret Manager ARN or name. 
+ - Add your Image URL (You can use the prebuilt [Dynamo container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-dynamo/containers/vllm-runtime)) and **Yes** for Essential container. It can be AWS ECR URL or Nvidia NGC URL. If using NGC URL, please also choose **Private registry authentication** and add your Secrets Manager ARN or name.

68-79: Use service discovery or DNS over hardcoded IPs

Relying on IPs from task pages is brittle. Consider ECS Service Discovery (AWS Cloud Map) and point ETCD_ENDPOINTS/NATS_SERVER to DNS names instead. At minimum, call this out as an enhancement.


70-76: Document IAM requirements for pulling private images

Since task defs use repositoryCredentials, explicitly mention that the task’s execution role must have secretsmanager:GetSecretValue for the provided secret, and which region it lives in.

I can add a minimal IAM policy snippet to the README if you want.


14-37: Security note: etcd and NATS exposure

Add a note to restrict etcd and NATS to private subnets/security groups and avoid public access, especially with no auth on etcd and with NATS monitoring port 8222 enabled.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18bb779 and 255df37.

📒 Files selected for processing (4)
  • examples/deployments/ECS/README.md (1 hunks)
  • examples/deployments/ECS/task_definition_etcd_nats.json (1 hunks)
  • examples/deployments/ECS/task_definition_frontend.json (1 hunks)
  • examples/deployments/ECS/task_definition_prefillworker.json (1 hunks)
🧰 Additional context used
🪛 LanguageTool
examples/deployments/ECS/README.md

[grammar] ~46-~46: Ensure spelling is correct
Context: ...` - Add your Image URL (You can use the prebuild [Dynamo container](https://catalog.ngc....

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.17.2)
examples/deployments/ECS/README.md

24-24: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


35-35: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


51-51: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


58-58: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2415/merge) by kylehh.
examples/deployments/ECS/README.md

[error] 1-1: Trailing whitespace detected by pre-commit; the hook modified the file. Re-run pre-commit to verify.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo

Copy link
Contributor

@nealvaidya nealvaidya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that could be clarified

@nealvaidya nealvaidya changed the title Add AWS ECS deployment example for Dynamo vLLM docs: Add AWS ECS deployment example for Dynamo vLLM Sep 2, 2025
@github-actions github-actions bot added the docs label Sep 2, 2025
Signed-off-by: Neal Vaidya <[email protected]>
Signed-off-by: Neal Vaidya <[email protected]>
@nealvaidya nealvaidya merged commit 8c665c1 into main Sep 9, 2025
12 of 13 checks passed
@nealvaidya nealvaidya deleted the khuang-ecs branch September 9, 2025 19:37
@kylehh
Copy link
Contributor Author

kylehh commented Sep 9, 2025 via email

zhongdaor-nv pushed a commit that referenced this pull request Sep 15, 2025
Signed-off-by: Neal Vaidya <[email protected]>
Co-authored-by: Neal Vaidya <[email protected]>
Signed-off-by: zhongdaor <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants