Skip to content

Conversation

@julienmancuso
Copy link
Contributor

@julienmancuso julienmancuso commented Aug 22, 2025

Overview:

fix env vars override

Summary by CodeRabbit

  • Bug Fixes

    • Adjusted environment variable merge order for container specs when overrides are applied. Component-level variables are now applied last, after base and main container values, reducing unexpected overrides across components (including Worker). No other functional changes.
  • Tests

    • Added unit test for Worker pod spec generation with main container overrides, validating environment variables, probes, ports, and mounts.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Aug 22, 2025

Walkthrough

The environment variable merge order in GenerateBasePodSpec was changed to first merge existing container env with ExtraPodSpec.MainContainer env, then merge component env on top. A new unit test validates Worker component PodSpec generation with overrides. No public APIs changed.

Changes

Cohort / File(s) Summary
Operator pod-spec env merging
deploy/cloud/operator/internal/dynamo/graph.go
Adjusted env merging sequence: capture pre-merge container env, merge with ExtraPodSpec.MainContainer env, then merge component env. No other behavior changes.
Unit tests for Worker pod-spec
deploy/cloud/operator/internal/dynamo/graph_test.go
Added TestGenerateBasePodSpec_Worker validating pod spec construction and env merging with MainContainer overrides for Worker.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Caller
  participant G as GenerateBasePodSpec
  participant B as Base Container
  participant M as Extra.MainContainer
  participant E as Component Envs

  C->>G: GenerateBasePodSpec(...)
  G->>B: Read B.Env (pre-merge)
  Note right of G: containerEnvs = copy(B.Env)

  G->>B: Apply M overrides to container
  Note right of G: container now has post-merge Env

  G->>B: B.Env = MergeEnvs(containerEnvs, B.Env)
  Note over G,B: First merge: pre-merge env + main container env

  G->>B: B.Env = MergeEnvs(E, B.Env)
  Note over G,B: Second merge: component env on top (new precedence)

  G-->>C: PodSpec
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

In pods I hop with tidy cheer,
I shuffle envs both far and near.
First base and main, then components reign,
A merge so neat, precise, arcane.
With tests to guard the warren’s way—
Deploy, dear cluster, hop all day! 🐇✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.2.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
deploy/cloud/operator/internal/dynamo/graph.go (1)

704-712: Take a defensive copy of env slice before merge

containerEnvs := container.Env only copies the slice header and may alias the underlying array. If mergo modifies the slice in place for any reason, this can lead to subtle bugs. Copy the slice to get a stable snapshot.

Apply this diff:

-           containerEnvs := container.Env
+           // Take a snapshot of the pre-merge envs to avoid aliasing the underlying array
+           containerEnvs := append([]corev1.EnvVar(nil), container.Env...)
deploy/cloud/operator/internal/dynamo/graph_test.go (4)

4406-4413: Fix confusing test case name

This test lives under TestGenerateBasePodSpec_Worker but its case name reads “Planner component…”. Rename to reflect worker behavior to avoid confusion during triage.

Apply this diff:

-            name: "Planner component should have planner service account",
+            name: "Worker: MainContainer overrides + component env merge + probes",

4525-4529: Stabilize env var ordering before diff to reduce flakiness

Unlike other tests in this file, this one compares full PodSpecs without sorting env vars first. Insert a stable sort on both actual and expected container env slices before calling cmp.Diff.

Apply this diff:

-            diff := cmp.Diff(tt.expectedPodSpec, podSpec)
+            // Stabilize env var order (name asc) before diff to avoid order-related flakes.
+            for i := range podSpec.Containers {
+                podSpec.Containers[i].Env = sortEnvVars(podSpec.Containers[i].Env)
+            }
+            for i := range tt.expectedPodSpec.Containers {
+                tt.expectedPodSpec.Containers[i].Env = sortEnvVars(tt.expectedPodSpec.Containers[i].Env)
+            }
+            diff := cmp.Diff(tt.expectedPodSpec, podSpec)

4491-4499: Optional: simplify quantity pointer construction

You can avoid the lambda by using resource.NewQuantity(512*1024*1024, resource.BinarySI) directly, or a small helper to keep expected structures concise. Current code is fine but a bit noisy.

Example:

-                                SizeLimit: func() *resource.Quantity { q := resource.MustParse("512Mi"); return &q }(),
+                                SizeLimit: resource.NewQuantity(512*1024*1024, resource.BinarySI),

4525-4529: Prefer cmp options over full-struct equality for longevity

cmp.Diff on the full PodSpec will break if unrelated defaulted fields are introduced. Consider cmpopts.EquateEmpty() and field-by-field comparisons (or cmp filters) focused on what matters in this test (envs, probes, ports, shm mount). This keeps the test resilient to orthogonal changes.

Want me to propose a cmp.Options block that narrows the comparison?

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 02e59bb and 4b0a237.

📒 Files selected for processing (2)
  • deploy/cloud/operator/internal/dynamo/graph.go (1 hunks)
  • deploy/cloud/operator/internal/dynamo/graph_test.go (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
🧬 Code graph analysis (1)
deploy/cloud/operator/internal/dynamo/graph_test.go (5)
deploy/cloud/operator/internal/dynamo/graph.go (4)
  • Config (64-74)
  • GenerateBasePodSpec (680-816)
  • BackendFrameworkSGLang (599-599)
  • RoleMain (571-571)
deploy/cloud/operator/internal/controller_common/predicate.go (1)
  • Config (40-48)
deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (2)
  • DynamoComponentDeploymentOverridesSpec (56-58)
  • DynamoComponentDeploymentSharedSpec (60-111)
deploy/cloud/operator/internal/consts/consts.go (4)
  • ComponentTypeWorker (42-42)
  • DynamoSystemPortName (16-16)
  • DynamoSystemPort (15-15)
  • MultinodeDeploymentTypeGrove (65-65)
deploy/cloud/operator/api/dynamo/common/common.go (1)
  • ExtraPodSpec (58-61)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (4)
deploy/cloud/operator/internal/dynamo/graph.go (3)

704-712: Good fix: preserve base env then let MainContainer override it

Capturing pre-merge container.Env and then re-merging after mergo.Merge correctly ensures ExtraPodSpec.MainContainer envs take precedence over defaults from the base container. This addresses the “overwritten by base” bug.


704-712: Confirm intended “no-delete” semantics for envs

With the current re-merge, a user cannot remove a default env by setting main.Env to an empty slice (defaults get reintroduced from containerEnvs). If deletions (unsetting) should be supported, we’ll need an explicit deletion mechanism (e.g., a MergePolicy, or honoring an “unset” annotation per env var).

Would you like me to draft a small MergeEnvsWithPolicy helper to support explicit deletion?


719-719: Verify final precedence: component.Envs currently lose to container/MainContainer

MergeEnvs(common, specific) gives “specific” (2nd arg) higher precedence. Using MergeEnvs(component.Envs, container.Env) means values coming from the base+MainContainer will override values defined in component.Envs. Is that what we want?

  • If “component-level overrides should win,” flip the merge order.
  • If “MainContainer is the most specific,” keep as is.

If component-level overrides should win, apply:

-   container.Env = MergeEnvs(component.Envs, container.Env)
+   // Let component-level envs override base/MainContainer envs
+   container.Env = MergeEnvs(container.Env, component.Envs)

Follow-up: I can add/adjust tests to lock this precedence (base < MainContainer < component or base < component < MainContainer). Which order should we codify?

deploy/cloud/operator/internal/dynamo/graph_test.go (1)

4417-4440: Add an explicit “env precedence on key collision” assertion

This test ensures both component and MainContainer envs are present, but doesn’t cover collisions. Add a sub-test (or extend this case) with the same key in both component.Envs and MainContainer.Env to lock in the intended precedence. This will prevent regressions if merge order changes again.

I can push a follow-up test like below once you confirm the intended precedence:

         tests := []struct {
             name            string
             component       *v1alpha1.DynamoComponentDeploymentOverridesSpec
             expectedPodSpec *corev1.PodSpec
         }{
+            {
+                name: "Worker: env key collision precedence",
+                component: &v1alpha1.DynamoComponentDeploymentOverridesSpec{
+                    DynamoComponentDeploymentSharedSpec: v1alpha1.DynamoComponentDeploymentSharedSpec{
+                        Envs: []corev1.EnvVar{
+                            {Name: "OVERLAP", Value: "from-component"},
+                        },
+                        ComponentType: commonconsts.ComponentTypeWorker,
+                        ExtraPodSpec: &common.ExtraPodSpec{
+                            MainContainer: &corev1.Container{
+                                Command: []string{"python3"},
+                                Args:    []string{"-m", "dynamo.worker"},
+                                Env: []corev1.EnvVar{
+                                    {Name: "OVERLAP", Value: "from-main"},
+                                },
+                            },
+                        },
+                    },
+                },
+                expectedPodSpec: &corev1.PodSpec{
+                    Containers: []corev1.Container{
+                        {
+                            Name:    "main",
+                            Command: []string{"python3"},
+                            Args:    []string{"-m", "dynamo.worker"},
+                            Env: []corev1.EnvVar{
+                                // TODO: set Value to "from-main" or "from-component"
+                                // depending on the confirmed precedence.
+                                {Name: "OVERLAP", Value: "from-main"},
+                            },
+                            // ... keep the rest minimal for this focused assertion ...
+                        },
+                    },
+                },
+            },
         }

@julienmancuso julienmancuso merged commit 8ad558d into main Aug 24, 2025
9 of 12 checks passed
@julienmancuso julienmancuso deleted the jsm/dep-336 branch August 24, 2025 13:41
julienmancuso added a commit that referenced this pull request Aug 25, 2025
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
nv-anants pushed a commit that referenced this pull request Aug 28, 2025
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: Krishnan Prashanth <[email protected]>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants