Commit 3831d59

feat: Add Grove and Kai scheduler as part of dynamo cloud helm chart
Signed-off-by: Julien Mancuso <[email protected]>
1 parent f4420dd commit 3831d59

12 files changed: +1066 -177 lines changed

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# dynamo-platform

A Helm chart for NVIDIA Dynamo Platform.

![Version: 0.5.0](https://img.shields.io/badge/Version-0.5.0-informational?style=flat-square) ![Type: application](https://img.shields.io/badge/Type-application-informational?style=flat-square)

## 🚀 Overview

The Dynamo Platform Helm chart deploys the complete Dynamo Cloud infrastructure on Kubernetes, including:

- **Dynamo Operator**: Kubernetes operator for managing Dynamo deployments
- **NATS**: High-performance messaging system for component communication
- **etcd**: Distributed key-value store for operator state management
- **Grove**: Multi-node inference orchestration (optional)
- **Kai Scheduler**: Advanced workload scheduling (optional)

## 📋 Prerequisites

- Kubernetes cluster (v1.20+)
- Helm 3.8+
- Sufficient cluster resources for your deployment scale
- Container registry access (if using private images)

## 🔧 Configuration

## Requirements

| Repository | Name | Version |
|------------|------|---------|
| file://components/operator | dynamo-operator | 0.5.0 |
| https://charts.bitnami.com/bitnami | etcd | 11.1.0 |
| https://nats-io.github.io/k8s/helm/charts/ | nats | 1.3.2 |
| oci://ghcr.io/nvidia/grove | grove(grove-charts) | v0.0.0-6e30275 |
| oci://ghcr.io/nvidia/kai-scheduler | kai-scheduler | v0.8.1 |

## Values

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| dynamo-operator.enabled | bool | `true` | Whether to enable the Dynamo Kubernetes operator deployment |
| dynamo-operator.natsAddr | string | `""` | NATS server address for operator communication (leave empty to use the bundled NATS chart). Format: "nats://hostname:port" |
| dynamo-operator.etcdAddr | string | `""` | etcd server address for operator state storage (leave empty to use the bundled etcd chart). Format: "http://hostname:port" or "https://hostname:port" |
| dynamo-operator.namespaceRestriction.enabled | bool | `true` | Whether to restrict operator to specific namespaces |
| dynamo-operator.namespaceRestriction.targetNamespace | string | `nil` | Target namespace for operator deployment (leave empty for current namespace) |
| dynamo-operator.controllerManager.tolerations | list | `[]` | Node tolerations for controller manager pods |
| dynamo-operator.controllerManager.manager.image.repository | string | `"nvcr.io/nvidia/ai-dynamo/kubernetes-operator"` | Official NVIDIA Dynamo operator image repository |
| dynamo-operator.controllerManager.manager.image.tag | string | `""` | Image tag (leave empty to use chart default) |
| dynamo-operator.controllerManager.manager.image.pullPolicy | string | `"IfNotPresent"` | Image pull policy (when to pull the image) |
| dynamo-operator.controllerManager.manager.args[0] | string | `"--health-probe-bind-address=:8081"` | Health probe endpoint for Kubernetes health checks |
| dynamo-operator.controllerManager.manager.args[1] | string | `"--metrics-bind-address=127.0.0.1:8080"` | Metrics endpoint for Prometheus scraping (localhost only for security) |
| dynamo-operator.imagePullSecrets | list | `[]` | Secrets for pulling private container images |
| dynamo-operator.dynamo.groveTerminationDelay | string | `"15m"` | How long to wait before forcefully terminating Grove instances |
| dynamo-operator.dynamo.internalImages.debugger | string | `"python:3.12-slim"` | Debugger image for troubleshooting deployments |
| dynamo-operator.dynamo.enableRestrictedSecurityContext | bool | `false` | Whether to enable restricted security contexts for enhanced security |
| dynamo-operator.dynamo.dockerRegistry.useKubernetesSecret | bool | `false` | Whether to use Kubernetes secrets for registry authentication |
| dynamo-operator.dynamo.dockerRegistry.server | string | `nil` | Docker registry server URL |
| dynamo-operator.dynamo.dockerRegistry.username | string | `nil` | Registry username |
| dynamo-operator.dynamo.dockerRegistry.password | string | `nil` | Registry password (consider using existingSecretName instead) |
| dynamo-operator.dynamo.dockerRegistry.existingSecretName | string | `nil` | Name of existing Kubernetes secret containing registry credentials |
| dynamo-operator.dynamo.dockerRegistry.secure | bool | `true` | Whether the registry uses HTTPS |
| dynamo-operator.dynamo.ingress.enabled | bool | `false` | Whether to create ingress resources |
| dynamo-operator.dynamo.ingress.className | string | `nil` | Ingress class name (e.g., "nginx", "traefik") |
| dynamo-operator.dynamo.ingress.tlsSecretName | string | `"my-tls-secret"` | Secret name containing TLS certificates |
| dynamo-operator.dynamo.istio.enabled | bool | `false` | Whether to enable Istio integration |
| dynamo-operator.dynamo.istio.gateway | string | `nil` | Istio gateway name for routing |
| dynamo-operator.dynamo.ingressHostSuffix | string | `""` | Host suffix for generated ingress hostnames |
| dynamo-operator.dynamo.virtualServiceSupportsHTTPS | bool | `false` | Whether VirtualServices should support HTTPS routing |
| grove.enabled | bool | `false` | Whether to enable Grove for multi-node inference coordination. If enabled, the Grove operator is deployed cluster-wide |
| kai-scheduler.enabled | bool | `false` | Whether to enable Kai Scheduler for intelligent resource allocation. If enabled, the Kai Scheduler operator is deployed cluster-wide |
| etcd.enabled | bool | `true` | Whether to deploy the bundled etcd chart. Disable it to use an external etcd instance |
| nats.enabled | bool | `true` | Whether to deploy the bundled NATS chart. Disable it to use an external NATS instance |
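
As an illustration of how these keys combine, the following values override is a minimal sketch that enables the optional Grove and Kai Scheduler components and points the operator at external NATS and etcd instances instead of the bundled charts. Every key comes from the table above; the file name and both hostnames are hypothetical placeholders, and the ports are simply the conventional NATS and etcd client ports.

```yaml
# values-override.yaml (hypothetical file name)

dynamo-operator:
  enabled: true
  # Hypothetical external endpoints, written in the formats the table documents.
  natsAddr: "nats://nats.example.internal:4222"
  etcdAddr: "http://etcd.example.internal:2379"

# Optional components, both disabled by default.
# Per the table, each operator is deployed cluster-wide when enabled.
grove:
  enabled: true
kai-scheduler:
  enabled: true

# Skip the bundled charts because external instances are configured above.
etcd:
  enabled: false
nats:
  enabled: false
```

A file like this would typically be passed with Helm's `-f` flag, for example `helm upgrade --install <release-name> <chart-path> -f values-override.yaml`, where the release name and chart path depend on your environment.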

### NATS Configuration

For detailed NATS configuration options beyond `nats.enabled`, please refer to the official NATS Helm chart documentation:
**[NATS Helm Chart Documentation](https://github.com/nats-io/k8s/tree/main/helm/charts/nats)**

### etcd Configuration

For detailed etcd configuration options beyond `etcd.enabled`, please refer to the official Bitnami etcd Helm chart documentation:
**[etcd Helm Chart Documentation](https://github.com/bitnami/charts/tree/main/bitnami/etcd)**
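
Helm forwards any values nested under a dependency's name to that subchart, so options from the upstream NATS and etcd charts can be set under the `nats:` and `etcd:` keys of this chart's values. The sketch below shows the shape only. The nested keys (`config.jetstream.enabled` and `replicaCount`) are assumptions drawn from the upstream charts, not from this chart's documentation, so confirm them against the chart versions listed under Requirements before relying on them.

```yaml
# Sketch only: nested keys are assumptions about the upstream subcharts.

nats:
  enabled: true
  config:
    jetstream:
      enabled: true   # assumed upstream NATS chart option

etcd:
  enabled: true
  replicaCount: 3     # assumed upstream Bitnami etcd chart option
```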

## 📚 Additional Resources

- [Dynamo Cloud Deployment Guide](../../../../docs/guides/dynamo_deploy/dynamo_cloud.md)
- [NATS Documentation](https://docs.nats.io/)
- [etcd Documentation](https://etcd.io/docs/)
- [Kubernetes Operator Pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/)

----------------------------------------------
Autogenerated from chart metadata using [helm-docs v1.14.2](https://github.com/norwoodj/helm-docs/releases/v1.14.2)
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

{{ template "chart.header" . }}

{{ template "chart.description" . }}

{{ template "chart.versionBadge" . }}{{ template "chart.typeBadge" . }}{{ template "chart.appVersionBadge" . }}

## 🚀 Overview

The Dynamo Platform Helm chart deploys the complete Dynamo Cloud infrastructure on Kubernetes, including:

- **Dynamo Operator**: Kubernetes operator for managing Dynamo deployments
- **NATS**: High-performance messaging system for component communication
- **etcd**: Distributed key-value store for operator state management
- **Grove**: Multi-node inference orchestration (optional)
- **Kai Scheduler**: Advanced workload scheduling (optional)

## 📋 Prerequisites

- Kubernetes cluster (v1.20+)
- Helm 3.8+
- Sufficient cluster resources for your deployment scale
- Container registry access (if using private images)

## 🔧 Configuration

{{ template "chart.requirementsSection" . }}

{{ template "chart.valuesSection" . }}

### NATS Configuration

For detailed NATS configuration options beyond `nats.enabled`, please refer to the official NATS Helm chart documentation:
**[NATS Helm Chart Documentation](https://github.com/nats-io/k8s/tree/main/helm/charts/nats)**

### etcd Configuration

For detailed etcd configuration options beyond `etcd.enabled`, please refer to the official Bitnami etcd Helm chart documentation:
**[etcd Helm Chart Documentation](https://github.com/bitnami/charts/tree/main/bitnami/etcd)**

## 📚 Additional Resources

- [Dynamo Cloud Deployment Guide](../../../../docs/guides/dynamo_deploy/dynamo_cloud.md)
- [NATS Documentation](https://docs.nats.io/)
- [etcd Documentation](https://etcd.io/docs/)
- [Kubernetes Operator Pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/)

{{ template "helm-docs.versionFooter" . }}
