-
Notifications
You must be signed in to change notification settings - Fork 736
docs: Update EKS guide for Dynamo v0.6.0 #4448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
👋 Hi Wenhan-Tan! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
WalkthroughAdds comprehensive EKS deployment documentation for Dynamo, covering cluster creation, EFS configuration, platform installation, inference graph deployment, and endpoint testing procedures. Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes
Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
examples/deployments/EKS/README.md (3)
1-11: Fix heading hierarchy to follow Markdown standards.The document starts with
# Dynamo Deployment on EKS(h1) but jumps directly to### Step 1. Create EKS cluster(h3), skipping the h2 level. Heading levels should increment by only one level at a time.Apply this diff to fix the heading hierarchy:
-# Dynamo Deployment on EKS +# Dynamo Deployment on EKS This guide covers steps of creating an Amazon EKS cluster... -[Step 1. Create EKS cluster](#step-1-create-eks-cluster) +## Steps -[Step 2. Install Dynamo Kubernetes Platform](#step-2-install-dynamo-kubernetes-platform) +[Step 1. Create EKS cluster](#step-1-create-eks-cluster) -[Step 3. Deploy Dynamo Inference Graph](#step-3-deploy-dynamo-inference-graph) +[Step 2. Install Dynamo Kubernetes Platform](#step-2-install-dynamo-kubernetes-platform) -### Step 1. Create EKS cluster +[Step 3. Deploy Dynamo Inference Graph](#step-3-deploy-dynamo-inference-graph) + +## Step 1. Create EKS cluster
94-94: Fix bare URL and improve phrasing.Line 94 contains a bare URL that should be wrapped in Markdown link syntax. Additionally, the phrasing "able to access" is more clearly expressed as "can access".
Apply this diff:
-Follow the steps to create an EFS file system: https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/docs/efs-create-filesystem.md. Make sure you mount subnets in the last step correctly. This will affect whether your nodes are able to access the created EFS file system. +Follow the [steps to create an EFS file system](https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/docs/efs-create-filesystem.md). Make sure you mount subnets correctly in the final step. This will affect whether your nodes can access the created EFS file system.Also applies to: 94-94
17-84: Add language identifiers to all fenced code blocks.All code blocks lack language specifications, which impacts syntax highlighting and readability in documentation viewers. Based on context, add appropriate language identifiers:
yamlfor Kubernetes manifests andbashfor shell commands.Examples of the fixes needed:
-``` +```yaml apiVersion: eksctl.io/v1alpha5 kind: ClusterConfig-``` +```bash eksctl create cluster -f <FILENAME>.yaml-``` +```bash # Create Image Pull Secret export DOCKER_SERVER=<ECR_REGISTRY>Apply similar fixes to all remaining code blocks. Use
yamlfor Kubernetes manifests and YAML configs, andbashfor shell commands and command examples.Also applies to: 88-90, 100-114, 118-120, 126-142, 145-153, 157-159, 163-169, 177-187, 191-199, 205-396, 400-402, 406-408, 412-433, 437-455, 459-461
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
examples/deployments/EKS/README.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
examples/deployments/EKS/README.md
[style] ~94-~94: As a shorter alternative for ‘able to’, consider using “can”.
Context: ...ly. This will affect whether your nodes are able to access the created EFS file system. ##...
(BE_ABLE_TO)
🪛 markdownlint-cli2 (0.18.1)
examples/deployments/EKS/README.md
11-11: Heading levels should only increment by one level at a time
Expected: h2; Actual: h3
(MD001, heading-increment)
17-17: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
88-88: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
94-94: Bare URL used
(MD034, no-bare-urls)
100-100: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
118-118: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
126-126: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
145-145: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
157-157: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
163-163: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
177-177: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
191-191: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
205-205: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
400-400: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
406-406: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
412-412: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
437-437: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
459-459: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
examples/deployments/EKS/README.md
Outdated
|
|
||
| #### c) Create Dynamo Inference Graph | ||
|
|
||
| For this example, we'll deploy `Qwen/Qwen3-32B` in disaggregated mode. We'll create 12 prefill workers with TP1 and 1 decode worker with TP4 for a total of 16 x H200 GPUs (p5en.48xlarge). Please change `<DYNAMO_TRTLLM_IMAGE>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let us note, this is disagg with kv router
| fsx: true | ||
| - name: p5en-ng | ||
| instanceType: p5en.48xlarge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add note saying that you can switch instance type to g6e or any other instance if p5en is not available
|
|
||
| ### Step 3. Deploy Dynamo Inference Graph | ||
|
|
||
| #### a) Build Dynamo TRTLLM runtime image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's also show prebuilt NGC container way, show both ways - NGC container and build image
| #### b) Run below to install Dynamo Kubernetes Platform | ||
| ``` | ||
| # Install CRDs | ||
| helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-0.6.0.tgz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious - v0.6.1 is already out, with v0.7.0 coming soon, should this just be updated in preparation for v0.7.0 instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure we can wait for 0.7.0 since it'll be out this week
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Wenhan, Kshitiz - v0.7.0 is out, so this can be updated: #4448 (comment)
Can we close out this PR if updates are still needed? It'll get progressively harder to merge the longer it stays open.
| sharedMemory: | ||
| size: 64Gi | ||
| envs: | ||
| - name: DYN_LOG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these debug log levels intentional or we'd comment it out and leave a note to enable for diagnosis?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already left a note for all these env vars. I can definitely comment them out
| -n dynamo-system | ||
| ``` | ||
|
|
||
| #### b) Run below to install Dynamo Kubernetes Platform |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we shouldn't just point to the production install doc here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works too. Probably even better
Overview:
Update EKS guide for Dynamo 0.6.0
Summary by CodeRabbit