feat: add scaling adapter #4699
Conversation
Signed-off-by: Julien Mancuso <[email protected]>
Walkthrough

This pull request removes the built-in autoscaling field from the DynamoComponentDeployment and DynamoGraphDeployment CRDs, introducing a new dedicated DynamoGraphDeploymentScalingAdapter CRD and controller to manage per-service scaling instead. Autoscaling validation is removed from webhooks, the Autoscaling struct from API types, and HPA generation logic from controllers. Documentation is added for the new autoscaling architecture.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Pre-merge checks

- ❌ Failed checks (1 warning, 1 inconclusive)
- ✅ Passed checks (1 passed)
Actionable comments posted: 5
🧹 Nitpick comments (5)
deploy/cloud/operator/internal/controller/common.go (1)

66-95: Consider renaming for clarity.

The function name `servicesEqual` suggests a full equality comparison, but it only compares replica counts. The comment correctly describes the behavior, but the name could be more precise to avoid future confusion.

```diff
-// servicesEqual compares two services maps to detect changes in replica counts
-func servicesEqual(old, new map[string]*v1alpha1.DynamoComponentDeploymentSharedSpec) bool {
+// servicesReplicasEqual compares two services maps to detect changes in replica counts
+func servicesReplicasEqual(old, new map[string]*v1alpha1.DynamoComponentDeploymentSharedSpec) bool {
```

deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeploymentscalingadapters.yaml (1)
98-108: Consider adding a default value for replicas.

The `replicas` field is required but has no default. While this is acceptable for a scaling adapter (the user/autoscaler should specify the value), consider whether a default of `1` would improve UX for initial creation.

If a default is desired, add it to the schema:

```diff
 replicas:
   description: |-
     Replicas is the desired number of replicas for the target service.
     This field is modified by external autoscalers (HPA/KEDA/Planner) or manually by users.
+  default: 1
   format: int32
   minimum: 0
   type: integer
```

deploy/cloud/operator/internal/controller/dynamographdeploymentscalingadapter_controller.go (3)
78-85: Consider graceful handling when the referenced DGD is deleted.

When the DGD is not found, the controller returns an error, causing exponential-backoff requeues. If the DGD was intentionally deleted, this will generate continuous reconciliation attempts.

Consider: (1) setting a status condition indicating the DGD is missing, or (2) not requeuing when the DGD is gone (the adapter may be orphaned intentionally).

```diff
 if err := r.Get(ctx, dgdKey, dgd); err != nil {
 	if errors.IsNotFound(err) {
 		logger.Error(err, "Referenced DGD not found", "dgd", dgdKey)
-		// DGD doesn't exist, can't proceed
-		return ctrl.Result{}, err
+		// DGD doesn't exist - don't requeue (avoids infinite retry)
+		r.Recorder.Eventf(adapter, corev1.EventTypeWarning, "DGDNotFound",
+			"Referenced DynamoGraphDeployment %s not found", dgdKey.Name)
+		return ctrl.Result{}, nil
 	}
 	return ctrl.Result{}, err
 }
```
103-120: Initial reconciliation may trigger a false "out-of-band" sync.

On first reconcile of a new adapter, `adapter.Status.Replicas` will be `0` (the zero value). If the DGD service has, say, `3` replicas, this logic will detect it as an "out-of-band DGD change" and sync the adapter spec to `3`.

While this might be acceptable behavior (the adapter adopts the DGD's current state), the log message "Detected out-of-band DGD change" is misleading for the initial sync case. Consider distinguishing between:

- Initial status population
- Actual out-of-band DGD edits

```diff
 // 4. Detect out-of-band DGD changes (Scenario 1: User manually edited DGD)
 // If DGD replicas differ from adapter status, DGD was modified externally
-if currentReplicas != adapter.Status.Replicas {
-	logger.Info("Detected out-of-band DGD change, syncing adapter from DGD",
+if adapter.Status.Replicas != 0 && currentReplicas != adapter.Status.Replicas {
+	logger.Info("Detected out-of-band DGD change, syncing adapter from DGD",
```

Or use a condition/annotation to track initialization state.
149-156: Minor: comment numbering skips from step 5 to step 7.

Step 6 is missing in the comment sequence. This is a cosmetic issue but may cause confusion.

```diff
-	// 7. Update adapter status
+	// 6. Update adapter status
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (27)
- deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml (0 hunks)
- deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml (0 hunks)
- deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeploymentscalingadapters.yaml (1 hunks)
- deploy/cloud/helm/platform/components/operator/templates/manager-rbac.yaml (2 hunks)
- deploy/cloud/operator/api/v1alpha1/common.go (0 hunks)
- deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (0 hunks)
- deploy/cloud/operator/api/v1alpha1/dynamographdeploymentscalingadapter_types.go (1 hunks)
- deploy/cloud/operator/api/v1alpha1/zz_generated.deepcopy.go (1 hunks)
- deploy/cloud/operator/cmd/main.go (1 hunks)
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml (0 hunks)
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (0 hunks)
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeploymentscalingadapters.yaml (1 hunks)
- deploy/cloud/operator/config/rbac/role.yaml (2 hunks)
- deploy/cloud/operator/internal/consts/consts.go (0 hunks)
- deploy/cloud/operator/internal/controller/common.go (1 hunks)
- deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (0 hunks)
- deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (4 hunks)
- deploy/cloud/operator/internal/controller/dynamographdeploymentscalingadapter_controller.go (1 hunks)
- deploy/cloud/operator/internal/dynamo/graph.go (2 hunks)
- deploy/cloud/operator/internal/dynamo/graph_test.go (10 hunks)
- deploy/cloud/operator/internal/webhook/validation/dynamocomponentdeployment_test.go (0 hunks)
- deploy/cloud/operator/internal/webhook/validation/dynamographdeployment_test.go (0 hunks)
- deploy/cloud/operator/internal/webhook/validation/shared.go (0 hunks)
- deploy/cloud/operator/internal/webhook/validation/shared_test.go (0 hunks)
- docs/_sections/k8s_deployment.rst (1 hunks)
- docs/kubernetes/api_reference.md (17 hunks)
- docs/kubernetes/autoscaling.md (1 hunks)
💤 Files with no reviewable changes (12)
- deploy/cloud/operator/internal/webhook/validation/dynamographdeployment_test.go
- deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go
- deploy/cloud/operator/api/v1alpha1/common.go
- deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go
- deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml
- deploy/cloud/operator/internal/consts/consts.go
- deploy/cloud/operator/internal/webhook/validation/shared.go
- deploy/cloud/operator/internal/webhook/validation/dynamocomponentdeployment_test.go
- deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml
- deploy/cloud/operator/internal/webhook/validation/shared_test.go
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.465Z
Learning: The `stopSignal` field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
📚 Learning: 2025-09-04T19:03:06.643Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.
Applied to files:
- deploy/cloud/operator/api/v1alpha1/dynamographdeploymentscalingadapter_types.go
- docs/kubernetes/api_reference.md
- deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeploymentscalingadapters.yaml
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeploymentscalingadapters.yaml
- deploy/cloud/operator/config/rbac/role.yaml
📚 Learning: 2025-07-18T16:04:47.465Z
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.465Z
Learning: The `stopSignal` field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.
Applied to files:
- deploy/cloud/operator/api/v1alpha1/dynamographdeploymentscalingadapter_types.go
- docs/kubernetes/api_reference.md
- deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeploymentscalingadapters.yaml
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeploymentscalingadapters.yaml
- deploy/cloud/operator/internal/dynamo/graph.go
- deploy/cloud/operator/internal/dynamo/graph_test.go
📚 Learning: 2025-07-18T16:05:05.534Z
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Applied to files:
- docs/kubernetes/api_reference.md
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeploymentscalingadapters.yaml
- deploy/cloud/operator/internal/dynamo/graph.go
- deploy/cloud/operator/internal/dynamo/graph_test.go
📚 Learning: 2025-07-18T16:04:31.771Z
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.
Applied to files:
- docs/kubernetes/api_reference.md
- deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeploymentscalingadapters.yaml
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeploymentscalingadapters.yaml
📚 Learning: 2025-11-05T20:23:29.539Z
Learnt from: nv-hwoo
Repo: ai-dynamo/dynamo PR: 4112
File: examples/backends/sglang/deploy/agg.yaml:76-78
Timestamp: 2025-11-05T20:23:29.539Z
Learning: In DynamoGraphDeployment (DGD) YAML manifests, the volumeMounts field uses `mountPoint` (not the standard Kubernetes `mountPath`) to specify where to mount volumes. This is defined in the DGD custom resource definition API.
Applied to files:
docs/kubernetes/api_reference.md
📚 Learning: 2025-06-04T13:09:53.416Z
Learnt from: julienmancuso
Repo: ai-dynamo/dynamo PR: 1365
File: deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go:171-178
Timestamp: 2025-06-04T13:09:53.416Z
Learning: The `DYN_DEPLOYMENT_CONFIG` environment variable (commonconsts.DynamoDeploymentConfigEnvVar) in the Dynamo operator will never be set via ValueFrom (secrets/config maps), only via direct Value assignment. The GetDynamoDeploymentConfig method correctly only checks env.Value for this specific environment variable.
Applied to files:
deploy/cloud/operator/internal/dynamo/graph_test.go
📚 Learning: 2025-05-29T16:29:45.152Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 1266
File: deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go:85-85
Timestamp: 2025-05-29T16:29:45.152Z
Learning: In the Dynamo codebase, ComponentTypePlanner constants with different cases ("Planner" vs "planner") are intentional and serve different purposes: component type in config vs component label. These should not be made consistent as they handle different contexts.
Applied to files:
deploy/cloud/operator/internal/dynamo/graph_test.go
🧬 Code graph analysis (6)

deploy/cloud/operator/api/v1alpha1/dynamographdeploymentscalingadapter_types.go (1)
- deploy/cloud/operator/api/v1alpha1/groupversion_info.go (1)
  - SchemeBuilder (35-35)

deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (4)
- deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go (1)
  - DynamoGraphDeployment (63-71)
- deploy/cloud/operator/internal/controller_common/resource.go (1)
  - SyncResource (60-195)
- deploy/cloud/operator/api/v1alpha1/dynamographdeploymentscalingadapter_types.go (4)
  - DynamoGraphDeploymentScalingAdapter (83-89)
  - DynamoGraphDeploymentScalingAdapterSpec (25-35)
  - DynamoGraphDeploymentServiceRef (38-48)
  - DynamoGraphDeploymentScalingAdapterList (94-98)
- deploy/cloud/operator/internal/consts/consts.go (2)
  - KubeLabelDynamoGraphDeploymentName (39-39)
  - KubeLabelDynamoComponent (40-40)

deploy/cloud/operator/internal/dynamo/graph.go (1)
- deploy/cloud/operator/internal/consts/consts.go (1)
  - KubeLabelDynamoComponent (40-40)

deploy/cloud/operator/cmd/main.go (2)
- deploy/cloud/operator/internal/controller/dynamographdeploymentscalingadapter_controller.go (1)
  - DynamoGraphDeploymentScalingAdapterReconciler (45-50)
- deploy/cloud/operator/internal/controller_common/predicate.go (1)
  - Config (61-87)

deploy/cloud/operator/internal/dynamo/graph_test.go (1)
- deploy/cloud/operator/internal/consts/consts.go (3)
  - KubeLabelDynamoComponent (40-40)
  - KubeLabelDynamoSelector (32-32)
  - KubeLabelDynamoGraphDeploymentName (39-39)

deploy/cloud/operator/api/v1alpha1/zz_generated.deepcopy.go (1)
- deploy/cloud/operator/api/v1alpha1/dynamographdeploymentscalingadapter_types.go (5)
  - DynamoGraphDeploymentScalingAdapter (83-89)
  - DynamoGraphDeploymentScalingAdapterList (94-98)
  - DynamoGraphDeploymentScalingAdapterSpec (25-35)
  - DynamoGraphDeploymentScalingAdapterStatus (51-65)
  - DynamoGraphDeploymentServiceRef (38-48)
🪛 markdownlint-cli2 (0.18.1)
docs/kubernetes/autoscaling.md
18-18: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
docs/kubernetes/api_reference.md
293-293: Link fragments should be valid
(MD051, link-fragments)
632-632: Link fragments should be valid
(MD051, link-fragments)
651-651: Link fragments should be valid
(MD051, link-fragments)
714-714: Link fragments should be valid
(MD051, link-fragments)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: sglang (arm64)
- GitHub Check: trtllm (amd64)
- GitHub Check: sglang (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (29)
deploy/cloud/operator/internal/dynamo/graph.go (1)

1037-1046: Service-scoped labels for Grove cliques are consistent and adapter-friendly.

Passing `serviceName` into `generateLabels` and adding `KubeLabelDynamoComponent = componentName` makes all cliques for a given service share the same selector (`KubeLabelDynamoSelector`) and component label, matching how `DynamoComponentDeployment` is labeled and simplifying per-service scaling selection. The behavior looks correct and is well-aligned with the scaling adapter design.

Also applies to: 1074-1084
deploy/cloud/operator/internal/dynamo/graph_test.go (1)

1297-1303: Tests correctly track the new component-level label and selector semantics.

All updated expectations for `KubeLabelDynamoComponent` and `KubeLabelDynamoSelector` on cliques and component deployments line up with the implementation in `generateLabels` and `GenerateDynamoComponentsDeployments`: selectors are now per-service (`<graph>-<lower(serviceName)>`), and every resource carries a `KubeLabelDynamoComponent` equal to the service key. This keeps tests in sync with the new scaling/labeling model.

Also applies to: 1473-1481, 1873-1885, 2050-2062, 2191-2200, 2353-2361, 2775-2787, 2939-2949, 3079-3088, 3241-3249
docs/_sections/k8s_deployment.rst (1)

13-13: LGTM!

The new toctree entry for the Autoscaling documentation follows the existing pattern and correctly references the new `autoscaling.md` file.

deploy/cloud/operator/internal/controller/common.go (1)

57-64: LGTM!

The `getServiceKeys` helper is correctly implemented with proper capacity pre-allocation for the slice.

docs/kubernetes/autoscaling.md (3)
18-39: Excellent architecture diagram.

The ASCII diagram clearly illustrates the relationship between DGD, scaling adapters, and autoscalers. The static analysis warning about a missing language specification can safely be ignored here, as this is an ASCII diagram, not executable code.

136-161: Well-documented HPA example.

The example correctly demonstrates targeting the DGDSA via `scaleTargetRef` and includes sensible defaults for stabilization windows. Good practice showing both scale-up and scale-down behavior configuration.
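For orientation, an HPA of the shape the guide describes, pointing at the adapter's scale subresource, would look roughly like the following. Resource names, namespace, and thresholds here are illustrative, not copied from the PR; the group/version is inferred from the CRD file names (`nvidia.com`, `v1alpha1`):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa              # illustrative name
spec:
  scaleTargetRef:
    apiVersion: nvidia.com/v1alpha1   # assumed from the CRD group/version
    kind: DynamoGraphDeploymentScalingAdapter
    name: my-graph-frontend           # assumed adapter name
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
```

A KEDA ScaledObject could point at the same `scaleTargetRef`, since both mechanisms go through the standard scale subresource.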
363-395: Helpful troubleshooting section.

The troubleshooting guidance covers common issues with practical kubectl commands. Good inclusion of the raw API check for custom metrics availability.
deploy/cloud/operator/api/v1alpha1/dynamographdeploymentscalingadapter_types.go (6)

24-35: Well-designed spec structure.

The spec correctly uses `int32` for replicas (matching Kubernetes conventions) with `Minimum=0` validation to support scale-to-zero scenarios with KEDA. Required validation ensures the adapter always has a valid reference.

37-48: LGTM!

The `DynamoGraphDeploymentServiceRef` cleanly encapsulates the reference to a specific service within a DGD, with appropriate validation to prevent empty strings.

50-65: LGTM!

The status fields correctly support the scale subresource requirements: `Replicas` for current state, `Selector` for HPA pod discovery, and `LastScaleTime` for scaling observability.

67-74: Proper scale subresource configuration.

The kubebuilder markers correctly configure the scale subresource paths, enabling seamless integration with HPA, KEDA, and other scale-aware controllers. The print columns provide a good quick-view experience with `kubectl get dgdsa`.
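For readers unfamiliar with these markers: the controller-gen annotations that produce a scale subresource of this shape typically look roughly like the sketch below. The exact marker lines and print columns live in the PR's types file and may differ:

```go
// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:subresource:scale:specpath=.spec.replicas,statuspath=.status.replicas,selectorpath=.status.selector
// +kubebuilder:printcolumn:name="Replicas",type=integer,JSONPath=`.spec.replicas`
```

The `specpath`/`statuspath`/`selectorpath` values are what controller-gen writes into the CRD's `subresources.scale` stanza.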
83-89: LGTM!

The main type follows standard Kubernetes CRD patterns with embedded TypeMeta/ObjectMeta and separate Spec/Status structs. The `omitempty` on Spec is safe since the required field validations will be enforced by the webhook.

100-102: LGTM!

The init function correctly registers both the resource and list types with the scheme builder.
deploy/cloud/operator/api/v1alpha1/zz_generated.deepcopy.go (1)

569-676: Auto-generated deepcopy functions look correct.

The controller-gen output correctly handles:

- Value copy for `Spec` (since `DGDRef` contains only string fields)
- Deep copy for the `Status.LastScaleTime` pointer via `DeepCopy()`
- Standard `DeepCopyObject` for `runtime.Object` interface compliance

docs/kubernetes/api_reference.md (1)
297-373: Well-documented new scaling adapter API.

The documentation for `DynamoGraphDeploymentScalingAdapter` is comprehensive, covering the resource purpose, spec fields (`replicas`, `dgdRef`), and status fields (`replicas`, `selector`, `lastScaleTime`). The integration with HPA/KEDA via the scale subresource is clearly explained.

deploy/cloud/operator/cmd/main.go (1)
581-589: LGTM: controller registration follows the established pattern.

The `DynamoGraphDeploymentScalingAdapterReconciler` registration is consistent with other controllers in this file. All required dependencies (`Client`, `Scheme`, `Recorder`, `Config`) are properly wired, and error handling follows the same pattern.

deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeploymentscalingadapters.yaml (2)
131-136: Well-designed scale subresource configuration.

The scale subresource is correctly configured for HPA/KEDA compatibility with proper paths for `specReplicasPath`, `statusReplicasPath`, and `labelSelectorPath`. This enables standard Kubernetes autoscalers to interact with the adapter.
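Concretely, the stanza these paths refer to would look roughly like this in the CRD version definition (values inferred from the path names mentioned in the review, shown here only as a sketch):

```yaml
subresources:
  status: {}
  scale:
    specReplicasPath: .spec.replicas
    statusReplicasPath: .status.replicas
    labelSelectorPath: .status.selector
```

With this in place, `kubectl scale` and the HPA/KEDA controllers read and write replicas through the scale subresource rather than patching the full object.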
1-136: LGTM: CRD definition is complete and follows Kubernetes conventions.

The CRD properly defines all required components for a scaling adapter: spec with target reference and desired replicas, status for observed state, and the scale subresource for HPA/KEDA integration. The printer columns provide useful `kubectl get` output.

deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (4)
660-684: LGTM: cleanup logic for orphaned adapters.

The cleanup logic correctly lists adapters by the DGD name label and removes those referencing services that no longer exist in the DGD spec. Error handling properly ignores `NotFound` errors, and events are recorded for observability.
689-693: LGTM with a minor note on an edge case.

The naming function correctly lowercases service names for DNS subdomain compliance. Note that services differing only by case (e.g., "Frontend" vs "frontend") would produce the same adapter name, but this is unlikely in practice given typical service naming conventions.
714-720: Appropriate predicate configuration for owned adapters.

The predicates correctly:
- Ignore create events (avoids redundant reconciliation after creating adapter)
- React to delete events (ensures adapters are recreated if deleted)
- Ignore update events (adapter controller manages status; spec changes flow from DGD)
This design ensures the DGD controller creates adapters but doesn't fight with the adapter controller for updates.
229-234: LGTM: scaling adapter reconciliation placement in the flow.

The scaling adapter reconciliation is appropriately placed after PVC reconciliation and before service-specific resource reconciliation. This ensures adapters exist before any autoscaling-dependent logic runs.

deploy/cloud/helm/platform/components/operator/templates/manager-rbac.yaml (1)
deploy/cloud/helm/platform/components/operator/templates/manager-rbac.yaml (1)
382-401: No action needed.

The DynamoGraphDeploymentScalingAdapter controller does not use finalizers, so the missing `dynamographdeploymentscalingadapters/finalizers` permission is not required. The RBAC configuration is correct.

deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeploymentscalingadapters.yaml (2)
1-51: LGTM! Well-structured CRD with proper HPA/KEDA integration.

The CRD metadata, naming conventions, and printer columns are appropriately configured. The scale subresource paths correctly reference the schema fields.

131-136: LGTM! Scale subresource is correctly configured for HPA compatibility.

The `labelSelectorPath`, `specReplicasPath`, and `statusReplicasPath` correctly reference the schema fields, enabling standard Kubernetes autoscalers to interact with this resource.

deploy/cloud/operator/internal/controller/dynamographdeploymentscalingadapter_controller.go (4)
44-50: LGTM! Reconciler struct follows standard patterns.

The struct correctly embeds the Kubernetes client and includes the event recorder for observability.

122-147: LGTM! Scaling logic correctly updates the DGD and records events.

The logic properly updates the DGD's service replicas and records both Kubernetes events and `LastScaleTime` for observability.

87-101: > Likely an incorrect or invalid review comment.

171-200: LGTM! Controller setup with efficient watch predicates.

Good use of `GenerationChangedPredicate` for the primary resource and custom predicates for DGD watches. The `servicesEqual` comparison ensures reconciliation only on meaningful changes, properly comparing replica counts between service maps with nil-safe handling.
Merged 238 commits from the main branch to bring the feature branch up to date.

Key conflicts resolved:
- Removed lib/kvbm-kernels references (deleted in main)
- Kept nova/nova-backend/kvbm workspace members from the feature branch
- Maintained the v2 module API refactoring from the feature branch
- Updated Cargo.lock files to reflect new dependencies

Major updates from main include:
- LoRA support for vLLM (#4810)
- Multimodal documentation (#4510)
- Scaling adapter features (#4699, #4825)
- Tool calling support (#4822, #4722)
- NIXL connect improvements (#4433)

Signed-off-by: Ryan Olson <[email protected]>
Overview:
design proposal : https://docs.google.com/document/d/1qHDYqbfMO6WJY2vM30YurAh1Gm3Gk7eKnZWdsakFn6M/edit?usp=sharing
This MR introduces the DGD scaling adapter, which allows plugging standard Kubernetes scaling mechanisms (HPA, KEDA, ...) into DGD services.
See the autoscaling user guide in this MR for more details.