-
Notifications
You must be signed in to change notification settings - Fork 750
feat: refactor docker registry secret management in operator #1337
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis change introduces a new boolean Helm value, Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Helm
participant Kubernetes
participant Operator
User->>Helm: Deploy chart with useKubernetesSecret=true
Helm->>Kubernetes: Render ServiceAccount with imagePullSecrets (dynamo-regcred)
Helm->>Kubernetes: Render Secret (dynamo-regcred)
Helm->>Kubernetes: Render Deployment with DOCKER_CONFIG env and volume
Kubernetes->>Operator: Start pods with mounted Docker config
Operator->>Registry: Authenticate using Docker config from secret
Registry-->>Operator: Responds to image pull/auth requests
Possibly related PRs
Poem
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (11)
deploy/cloud/operator/go.mod (4)
41-43: Review indirect containerd dependencies.
github.com/containerd/errdefs,github.com/containerd/errdefs/pkg, andgithub.amrom.workers.dev/containerd/logwere added as indirect requirements. These are typically transitive dependencies of the Docker client. If your code doesn’t directly reference them, rungo mod tidyto prune unused modules. If they are used directly, promote them into the mainrequireblock for clarity.
53-53: Prune unused indirect dependencyhttpsnoop.The
github.com/felixge/httpsnooppackage appears as an indirect requirement, probably from observability tooling. If it’s not used by your code or tests, rungo mod tidyto eliminate it and reduce module bloat.
74-76: Assess new Moby image-spec and TTY modules.Indirect additions of
github.com/moby/docker-image-spec,github.com/moby/sys/atomicwriter, andgithub.amrom.workers.dev/moby/termincrease the dependency footprint. Ensure these are transitively required by the Docker client and not directly imported; otherwise, clean them up withgo mod tidy.
92-98: Check OpenTelemetry module alignment.Several
go.opentelemetry.io/...packages (v1.36.0) were added as indirect dependencies. If your controller code directly uses OpenTelemetry APIs, promote the needed modules to the primaryrequireblock. Otherwise, prune withgo mod tidy. Also verify that all OTel modules are on compatible minor/patch versions to avoid conflicts.deploy/cloud/helm/dynamo-platform-values.yaml (1)
40-40: Enable Kubernetes secret usage by default
You’ve added the newuseKubernetesSecret: trueflag underdockerRegistry, which aligns with the PR’s goal of simplifying secret handling. Confirm that any documentation (README and comments in related values files) is updated to mention this new setting.deploy/cloud/helm/platform/components/operator/templates/regcred-secret.yaml (1)
15-25: Wrap secret manifest with Helm conditional
Thedynamo-regcredSecret is now correctly enclosed in anifblock driven byuseKubernetesSecret, so it will only be rendered when needed. The YAMLLint error at line 15 is a known false positive due to Helm templating syntax ({{- if ... }}); you can ignore it or add a linter disable directive.🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 15-15: syntax error: expected the node content, but found '-'
(syntax)
deploy/cloud/helm/platform/values.yaml (1)
46-46: Add descriptive comment for new flag
The defaultuseKubernetesSecret: falseis appropriate. To help users understand its purpose, consider adding a comment above this line, e.g.:# Create and use a Kubernetes secret for Docker registry credentials when enabled useKubernetesSecret: falsedeploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (1)
1046-1046: Optional: Remove redundant mock in failing test.In the
nil instanceIDtest the mock ServiceAccount (with the new label) is never used, since validation fails before listing SAs. You can remove this redundant mock to simplify the test.deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml (1)
18-18: YAML templating may require quoting around templated name.Some YAML linters (e.g., YAMLLint) flag:
error: expected <block end>, but found '<scalar>'
for unquoted Helm template expressions. Consider wrapping the templated name in quotes:-name: {{ include "dynamo-operator.fullname" . }}-image-builder +name: "{{ include "dynamo-operator.fullname" . }}-image-builder"🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 18-18: syntax error: expected , but found ''
(syntax)
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (1)
1699-1710: Consider de-duplicatingImagePullSecretsbefore assigning
imagePullSecretsis built by blindly appending the single secret + the list from spec.
If the same secret name appears in both places you’ll end up with duplicates, which causes the
Pod admission controller to reject the spec in newer Kubernetes versions (≥1.28).-imagePullSecrets = append(imagePullSecrets, opt.dynamoComponent.Spec.ImagePullSecrets...) +for _, s := range opt.dynamoComponent.Spec.ImagePullSecrets { + if !slices.ContainsFunc(imagePullSecrets, func(x corev1.LocalObjectReference) bool { return x.Name == s.Name }) { + imagePullSecrets = append(imagePullSecrets, s) + } +}(Use
slices.ContainsFuncfromgolang.org/x/exp/slicesor a tiny loop.)Not blocking, but saves users from puzzling “duplicate name” errors.
deploy/cloud/operator/internal/controller/dynamocomponent_controller.go (1)
684-686: Variable shadows imported package
filters := filters.NewArgs()overrides the importedfilterspackage name, which can confuse
readers and static analysis. Rename the variable, e.g.args := filters.NewArgs().
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
deploy/cloud/operator/go.sumis excluded by!**/*.sum
📒 Files selected for processing (14)
deploy/cloud/helm/dynamo-platform-values.yaml(1 hunks)deploy/cloud/helm/platform/components/operator/templates/component-serviceaccount.yaml(1 hunks)deploy/cloud/helm/platform/components/operator/templates/deployment.yaml(2 hunks)deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml(1 hunks)deploy/cloud/helm/platform/components/operator/templates/regcred-secret.yaml(2 hunks)deploy/cloud/helm/platform/components/operator/templates/secret-env.yaml(1 hunks)deploy/cloud/helm/platform/components/operator/values.yaml(2 hunks)deploy/cloud/helm/platform/values.yaml(1 hunks)deploy/cloud/operator/api/dynamo/schemas/schemas.go(0 hunks)deploy/cloud/operator/go.mod(4 hunks)deploy/cloud/operator/internal/consts/consts.go(1 hunks)deploy/cloud/operator/internal/controller/dynamocomponent_controller.go(3 hunks)deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go(2 hunks)deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go(5 hunks)
💤 Files with no reviewable changes (1)
- deploy/cloud/operator/api/dynamo/schemas/schemas.go
🧰 Additional context used
🧬 Code Graph Analysis (2)
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (1)
deploy/cloud/operator/internal/consts/consts.go (2)
KubeLabelDynamoComponentPod(59-59)KubeLabelValueTrue(56-56)
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (1)
deploy/cloud/operator/internal/consts/consts.go (2)
KubeLabelDynamoComponentPod(59-59)KubeLabelValueTrue(56-56)
🪛 YAMLlint (1.37.1)
deploy/cloud/helm/platform/components/operator/templates/regcred-secret.yaml
[error] 15-15: syntax error: expected the node content, but found '-'
(syntax)
deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml
[error] 18-18: syntax error: expected , but found ''
(syntax)
deploy/cloud/helm/platform/components/operator/templates/component-serviceaccount.yaml
[error] 18-18: syntax error: expected , but found ''
(syntax)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (11)
deploy/cloud/operator/go.mod (2)
12-12: Confirm necessity of direct Docker client dependency.The addition of
github.com/docker/docker v28.2.2+incompatibleas a direct requirement may not be necessary if your code only uses a subset that could be pulled in indirectly. Please verify that the operator code explicitly imports this module (e.g.,github.com/docker/docker/client) and that the+incompatiblesuffix is intentional. Consider running:go mod why github.com/docker/dockerto ensure it’s required.
48-49: Validate docker/go-connections and go-units usage.The new indirect dependencies
github.com/docker/go-connectionsandgithub.amrom.workers.dev/docker/go-unitsare likely pulled in by the Docker client. Confirm whether your code imports these packages directly; if not,go mod tidyshould remove them, otherwise move them to the directrequirelist.deploy/cloud/operator/internal/consts/consts.go (1)
59-59: Verify removal of old constant references
RenamingKubeLabelDynamoDeploymentPodtoKubeLabelDynamoComponentPodis correct. Ensure that all references to the old constant (KubeLabelDynamoDeploymentPod) have been removed or replaced across the codebase, including templates, controller logic, and tests.deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (2)
896-896: Update ServiceAccount label key toKubeLabelDynamoComponentPod.Test mock uses the new
commonconsts.KubeLabelDynamoComponentPodconstant to select the ServiceAccount, matching the refactored controller logic.
967-968: Reflect mocked ServiceAccount in PodTemplateSpec.The
ImagePullSecretsandServiceAccountNamefields are correctly set toniland the mocked"default-test-sa", respectively, aligning with the updated secret-handling logic ingenerateLeaderWorkerSet.deploy/cloud/helm/platform/components/operator/values.yaml (2)
76-82: AddserviceAccount.annotationsblocks for new components.The new empty annotation maps for
imageBuilderandcomponentsenable customization of ServiceAccounts via Helm values. This supports the recently added templates without breaking existing setups.
99-101: IntroduceuseKubernetesSecretflag for registry credentials.The boolean
useKubernetesSecretreplaces the previousinClusterServerstring, giving operators clear control over whether a Kubernetes Secret (dynamo-regcred) is created and used for Docker registry authentication.deploy/cloud/helm/platform/components/operator/templates/deployment.yaml (2)
50-53: Conditional DOCKER_CONFIG forkube-rbac-proxy.The
DOCKER_CONFIGenvironment variable is correctly gated byuseKubernetesSecret, directing clients to/docker/.dockerwhen the Docker registry secret is in use.
130-142: Mount Docker config secret into manager PodSpec.The
docker-configvolume and its mount at/docker/.dockerare conditionally added under the same flag, ensuring pods have access to the registry credentials only when requested. This matches the ServiceAccount behavior.deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (1)
1738-1740: LGTM – correct constant renamedThe selector now uses the new
KubeLabelDynamoComponentPodkey, keeping it aligned with the updated chart template.deploy/cloud/operator/internal/controller/dynamocomponent_controller.go (1)
1128-1134: Nice conditional – avoids leaking emptyDOCKER_CONFIGThe
DOCKER_CONFIGenv-var is now injected only when a secret is mounted.
Keeps the builder pods clean.
deploy/cloud/helm/platform/components/operator/templates/secret-env.yaml
Outdated
Show resolved
Hide resolved
deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml
Show resolved
Hide resolved
deploy/cloud/helm/platform/components/operator/templates/component-serviceaccount.yaml
Outdated
Show resolved
Hide resolved
deploy/cloud/operator/internal/controller/dynamocomponent_controller.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
deploy/cloud/helm/platform/components/operator/templates/component-serviceaccount.yaml (1)
15-31: LGTM! ServiceAccount template follows established patterns correctly.The template structure is sound and the conditional
imagePullSecretsimplementation aligns with the PR objective to enable optional Kubernetes secret usage for Docker registry authentication.Note: The static analysis YAML error is a false positive - yamllint doesn't understand Helm template syntax.
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 18-18: syntax error: expected , but found ''
(syntax)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
deploy/cloud/operator/go.sumis excluded by!**/*.sum
📒 Files selected for processing (7)
deploy/cloud/helm/platform/components/operator/templates/component-serviceaccount.yaml(1 hunks)deploy/cloud/helm/platform/components/operator/templates/deployment.yaml(2 hunks)deploy/cloud/helm/platform/components/operator/templates/image-builder-serviceaccount.yaml(1 hunks)deploy/cloud/helm/platform/components/operator/templates/planner.yaml(1 hunks)deploy/cloud/helm/platform/components/operator/templates/secret-env.yaml(1 hunks)deploy/cloud/operator/go.mod(3 hunks)deploy/cloud/operator/internal/controller/dynamocomponent_controller.go(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- deploy/cloud/helm/platform/components/operator/templates/secret-env.yaml
- deploy/cloud/helm/platform/components/operator/templates/deployment.yaml
🧰 Additional context used
🧠 Learnings (1)
deploy/cloud/helm/platform/components/operator/templates/image-builder-serviceaccount.yaml (1)
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1337
File: deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml:0-0
Timestamp: 2025-06-03T15:26:55.699Z
Learning: The image-builder ServiceAccount in deploy/cloud/helm/platform/components/operator/templates/image-builer-serviceaccount.yaml does not need imagePullSecrets, unlike the component ServiceAccount.
🪛 YAMLlint (1.37.1)
deploy/cloud/helm/platform/components/operator/templates/planner.yaml
[error] 22-22: syntax error: could not find expected ':'
(syntax)
deploy/cloud/helm/platform/components/operator/templates/component-serviceaccount.yaml
[error] 18-18: syntax error: expected , but found ''
(syntax)
deploy/cloud/helm/platform/components/operator/templates/image-builder-serviceaccount.yaml
[error] 18-18: syntax error: expected , but found ''
(syntax)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (5)
deploy/cloud/operator/go.mod (1)
11-11: LGTM! Excellent choice of dependencies for cloud-native authentication.The addition of cloud-specific credential helpers (ECR, ACR, GCR) and the official Google container registry client aligns perfectly with the PR objective to simplify Docker secret handling by leveraging cloud-specific mechanisms.
Also applies to: 13-13, 15-15
deploy/cloud/helm/platform/components/operator/templates/image-builder-serviceaccount.yaml (1)
15-26: LGTM! ServiceAccount structure is correct.The ServiceAccount template follows proper Kubernetes manifest structure and correctly omits
imagePullSecretsas intended for the image-builder component, based on the retrieved learning context.Note: The static analysis YAML error appears to be a false positive caused by yamllint not understanding Helm template syntax (
{{ include ... }}).🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 18-18: syntax error: expected , but found ''
(syntax)
deploy/cloud/operator/internal/controller/dynamocomponent_controller.go (3)
41-46: Excellent addition of cloud-native credential helpers.The new imports for ECR, ACR, and GCR credential helpers, along with the official Google container registry client, provide robust multi-cloud authentication support that aligns perfectly with the PR objectives.
664-690: LGTM! Proper registry-aware image existence checking.The
checkImageExistsfunction has been correctly implemented usinggo-containerregistrywith a multi-keychain approach that supports authentication from various sources (Docker config, GCR, ECR, ACR). This is a significant improvement over the previous Docker daemon client approach.🧰 Tools
🪛 golangci-lint (1.64.8)
664-664:
checkImageExists-dockerRegistryis unused(unparam)
1128-1134: Good conditional environment variable setting.The
DOCKER_CONFIGenvironment variable is now only set when a Docker config JSON secret is provided, which aligns with the optional Kubernetes secret usage pattern introduced in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
deploy/cloud/operator/internal/controller/dynamocomponent_controller.go (3)
597-621: Remove unnecessary error return type.The static analysis is correct - this method never returns an error. The error return type should be removed for cleaner code.
Apply this diff to simplify the method signature:
-func (r *DynamoComponentReconciler) getDockerRegistry(DynamoComponent *nvidiacomv1alpha1.DynamoComponent) (dockerRegistry *schemas.DockerRegistrySchema, err error) { +func (r *DynamoComponentReconciler) getDockerRegistry(DynamoComponent *nvidiacomv1alpha1.DynamoComponent) *schemas.DockerRegistrySchema {And update the return statement:
- return + return dockerRegistryYou'll also need to update the caller at line 706 to remove error handling.
🧰 Tools
🪛 golangci-lint (1.64.8)
597-597:
(*DynamoComponentReconciler).getDockerRegistry- resulterris alwaysnil(unparam)
664-690: Excellent refactoring! Consider improving error handling.The new implementation correctly uses the go-containerregistry library and addresses the previous Docker client issue. The multi-keychain approach properly supports authentication from multiple sources.
However, the error handling could be more robust:
Consider using more specific error checking instead of string matching:
_, err = remote.Head(ref, remote.WithAuthFromKeychain(keychain)) if err != nil { - if strings.Contains(err.Error(), "404") { + var transportErr *transport.Error + if errors.As(err, &transportErr) && transportErr.StatusCode == http.StatusNotFound { return false, nil } return false, fmt.Errorf("checking image: %w", err) }You'll need to add imports:
import ( "errors" "net/http" "github.com/google/go-containerregistry/pkg/v1/remote/transport" )
597-621: Excellent architectural improvement!This refactoring successfully achieves the PR objectives by:
- ✅ Removing direct Docker secret credential management
- ✅ Enabling cloud-native authentication (GCR, ECR, ACR credential helpers)
- ✅ Supporting conditional Kubernetes secret usage
- ✅ Using proper containerregistry client instead of Docker daemon client
The changes make the operator more flexible and cloud-native, allowing it to leverage workload identity and other cloud authentication mechanisms while maintaining backward compatibility through conditional secret usage.
Also applies to: 664-690, 1128-1133
🧰 Tools
🪛 golangci-lint (1.64.8)
597-597:
(*DynamoComponentReconciler).getDockerRegistry- resulterris alwaysnil(unparam)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deploy/cloud/operator/internal/controller/dynamocomponent_controller.go(7 hunks)
🧰 Additional context used
🪛 golangci-lint (1.64.8)
deploy/cloud/operator/internal/controller/dynamocomponent_controller.go
597-597: (*DynamoComponentReconciler).getDockerRegistry - result err is always nil
(unparam)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (3)
deploy/cloud/operator/internal/controller/dynamocomponent_controller.go (3)
41-46: LGTM! Appropriate imports for containerregistry functionality.The new imports correctly support the refactored image existence checking using the official Google containerregistry library and credential helpers for major cloud providers.
1128-1133: LGTM! Correct conditional secret usage.The conditional setting of
DOCKER_CONFIGproperly aligns with the PR objective to only use Kubernetes secrets when explicitly configured, allowing for alternative authentication methods like workload identity.
266-266: LGTM! Correct function call integration.The call to
checkImageExistsis properly updated to work with the refactored function signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
deploy/cloud/operator/internal/controller/dynamocomponent_controller.go (1)
659-685: Excellent fix for the image existence checking!This implementation correctly addresses the critical issue from previous reviews by using the proper registry-aware client instead of the Docker daemon client. The multi-keychain approach provides robust authentication across different cloud providers.
Minor suggestion: The 404 error detection using
strings.Contains(err.Error(), "404")might be fragile. Consider checking for more specific error types if the library provides them.- if strings.Contains(err.Error(), "404") { + // Check for common "not found" error patterns + if strings.Contains(err.Error(), "404") || + strings.Contains(err.Error(), "manifest unknown") || + strings.Contains(err.Error(), "not found") {
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
deploy/cloud/operator/internal/controller/dynamocomponent_controller.go(8 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (4)
deploy/cloud/operator/internal/controller/dynamocomponent_controller.go (4)
41-46: Excellent improvement to container registry authentication!The new imports from official Google container registry libraries with multi-cloud credential helpers (ECR, ACR, GCR) provide a much more robust and secure authentication approach compared to the previous implementation.
593-616: Good simplification of Docker registry configuration retrieval.Removing error handling is appropriate here since the method now only constructs a configuration struct from environment/config values rather than performing credential retrieval operations that could fail.
699-708: Appropriate simplification following dependency changes.Removing error handling is correct since
getDockerRegistry()no longer returns errors. The method now cleanly constructs and returns theImageInfostruct.
1119-1124: Good practice: conditional environment variable setting.Only setting
DOCKER_CONFIGwhendockerConfigJSONSecretNameis provided prevents potential issues with empty configuration paths and aligns with the PR's objective of conditionally using Kubernetes secrets for Docker authentication.
Overview:
simplify docker secret
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Chores