@GavinZhu-GMI GavinZhu-GMI commented Sep 12, 2025

Overview:

When the planner automatically adds VllmDecodeWorker and VllmPrefillWorker
components via Kubernetes API patches, these components were not inheriting
the dynamoNamespace configuration from existing components in the same
DynamoGraphDeployment. This caused namespace mismatch errors like:

"namespace mismatch for component VllmDecodeWorker: graph uses namespace
dynamo but component specifies dynamo-xxx"

Details:

Modified the getDynamoNamespace() function in deploy/cloud/operator/internal/dynamo/graph.go to:

  1. Separate component processing: Distinguish between components with explicit namespace settings and those without
  2. Inheritance logic: Use the first explicitly set namespace as the authoritative graph namespace
  3. Conflict detection: Maintain existing validation for namespace mismatches between explicitly configured components
  4. Fallback handling: Only use default namespace generation when no components have explicit settings
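The four rules above can be sketched as a small standalone function. This is an illustrative approximation, not the code from graph.go: the `component` type, the `resolveNamespace` name, and the `fallback` parameter are hypothetical stand-ins, and the service names are sorted so that the "first" explicit namespace is chosen deterministically.

```go
package main

import (
	"fmt"
	"sort"
)

// component is a hypothetical stand-in for one service entry in the graph spec.
type component struct {
	DynamoNamespace *string
}

// resolveNamespace returns the single namespace shared by the whole graph.
// Components without an explicit namespace are skipped (they inherit the
// result); two different explicit values are reported as a conflict; the
// fallback is used only when no component sets a namespace.
func resolveNamespace(services map[string]component, fallback string) (string, error) {
	names := make([]string, 0, len(services))
	for name := range services {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic choice of the first explicit namespace

	resolved := ""
	for _, name := range names {
		ns := services[name].DynamoNamespace
		if ns == nil || *ns == "" {
			continue // no explicit setting: this component inherits
		}
		if resolved == "" {
			resolved = *ns
			continue
		}
		if *ns != resolved {
			return "", fmt.Errorf(
				"namespace mismatch for component %s: graph uses namespace %s but component specifies %s",
				name, resolved, *ns)
		}
	}
	if resolved == "" {
		return fallback, nil
	}
	return resolved, nil
}

func main() {
	explicit := "dynamo"
	services := map[string]component{
		"Frontend":         {DynamoNamespace: &explicit},
		"VllmDecodeWorker": {}, // e.g. added by the planner without a namespace
	}
	ns, err := resolveNamespace(services, "default-ns")
	fmt.Println(ns, err) // prints "dynamo <nil>"
}
```

With this shape, a planner-added worker that omits the namespace no longer triggers a mismatch, while two components that explicitly disagree still fail validation.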

Where should the reviewer start?

Start with the getDynamoNamespace() function in deploy/cloud/operator/internal/dynamo/graph.go.

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • No issue has been raised yet.

Summary by CodeRabbit

  • New Features
    • Apply a graph-wide Dynamo namespace automatically to all component deployments.
    • Use a sensible default Dynamo namespace when none is specified.
  • Bug Fixes
    • Fix inconsistent namespace selection by standardizing resolution across components.
    • Add validation to detect and block conflicting per-component Dynamo namespaces with clear errors.

copy-pr-bot bot commented Sep 12, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

👋 Hi GavinZhu-GMI! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the external-contribution Pull request is from an external contributor label Sep 12, 2025
coderabbitai bot commented Sep 12, 2025

Walkthrough

Adds graph-level DynamoNamespace resolution with conflict detection and defaulting, and forces all generated DynamoComponentDeployments to inherit that namespace. The resolver scans components, detects mismatches among explicit namespaces, derives a single graph namespace (explicit or default), and applies it during deployment generation.

Changes

Cohort / File(s) | Summary of Changes
Dynamo graph resolution and propagation (deploy/cloud/operator/internal/dynamo/graph.go) | Implement graph-wide DynamoNamespace resolution with conflict detection and default fallback; track components with/without explicit namespace; assign resolved namespace to each generated DynamoComponentDeployment. No exported/public signatures changed.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Operator
  participant Graph as DynamoGraph
  participant Resolver as getDynamoNamespace
  participant Default as GetDefaultDynamoNamespace
  participant DeployGen as GenerateDynamoComponentsDeployments
  participant K8s as Kubernetes

  Operator->>Resolver: Resolve graph DynamoNamespace (Graph)
  Resolver->>Graph: Scan services/components
  Note over Resolver: Collect components with/without explicit DynamoNamespace
  alt Multiple explicit namespaces differ
    Resolver-->>Operator: Error (namespace mismatch)
  else Any explicit namespace present
    Resolver-->>Operator: Resolved = first explicit namespace
  else No explicit namespaces
    Resolver->>Default: Request default namespace
    Default-->>Resolver: Default namespace
    Resolver-->>Operator: Resolved = default
  end

  Operator->>DeployGen: Generate component deployments (Resolved namespace)
  loop For each component
    DeployGen->>DeployGen: Set deployment.Spec.DynamoNamespace = Resolved
    DeployGen->>K8s: Create/Apply component deployment
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.

✅ Passed checks (2 passed)
Check name | Status | Explanation
Title Check | ✅ Passed | The title accurately and concisely summarizes the primary change: ensuring auto-generated components inherit the graph Dynamo namespace. It directly reflects the modifications to namespace resolution described in the changeset, is specific to the bug being fixed, and contains no extraneous noise.
Description Check | ✅ Passed | The description follows the repository template and provides a clear Overview, Details, Where to start, and Related Issues note; it explains the bug, the change to getDynamoNamespace, the inheritance and conflict-detection behavior, and suggests a reviewer entry point. The Related Issues section correctly notes none exist, but the reviewer guidance is a bit informal and could use a precise file path and function pointers.

Poem

I thump my paws on fertile ground,
One name for fields, the graph profound.
No quarrels now where keys reside,
A single warren, unified.
If none declare, the burrow’s known—
Defaults embraced, deployments sown.
Hop! The namespace seeds are grown.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (5)
deploy/cloud/operator/internal/dynamo/graph.go (5)

133-136: Prefer ptr.To for string pointers; avoid taking address of local.

Using &graphDynamoNamespace works but aliases the same pointer across all deployments and relies on escape analysis. Use ptr.To for clarity and idiomatic K8s code.

-        deployment.Spec.DynamoNamespace = &graphDynamoNamespace
+        deployment.Spec.DynamoNamespace = ptr.To(graphDynamoNamespace)
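The aliasing concern can be illustrated with a minimal sketch. The `spec` type and both helper functions are hypothetical; `to` is a local stand-in with the same shape as `ptr.To` from k8s.io/utils/ptr, so the example runs without that dependency.

```go
package main

import "fmt"

// to mirrors the shape of ptr.To from k8s.io/utils/ptr: it returns a pointer
// to a fresh copy of v. It is defined locally to keep the sketch self-contained.
func to[T any](v T) *T { return &v }

// spec is a hypothetical stand-in for a deployment spec with a pointer field.
type spec struct {
	DynamoNamespace *string
}

// assignShared points every spec at the same string variable.
func assignShared(n int, ns string) []spec {
	out := make([]spec, n)
	for i := range out {
		out[i].DynamoNamespace = &ns
	}
	return out
}

// assignCopies gives every spec its own copy of the string.
func assignCopies(n int, ns string) []spec {
	out := make([]spec, n)
	for i := range out {
		out[i].DynamoNamespace = to(ns)
	}
	return out
}

func main() {
	shared := assignShared(2, "dynamo")
	*shared[0].DynamoNamespace = "changed"
	fmt.Println(*shared[1].DynamoNamespace) // prints "changed": both alias one string

	copies := assignCopies(2, "dynamo")
	*copies[0].DynamoNamespace = "changed"
	fmt.Println(*copies[1].DynamoNamespace) // prints "dynamo": each has its own copy
}
```

Both forms are valid Go; the difference only matters if some later code writes through one of the pointers.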

194-205: Remove unused tracking slices.

componentsWithNamespace and componentsWithoutNamespace are collected but never used; they add noise and risk future confusion.

-    componentsWithNamespace := make([]string, 0)
-    componentsWithoutNamespace := make([]string, 0)
-    
-    // First pass: collect components with explicit namespace settings
+    // Iterate services and resolve the graph namespace from explicit settings
     for componentName, component := range parentDynamoGraphDeployment.Spec.Services {
         dynamoNamespace := ""
         if component.DynamoNamespace != nil && *component.DynamoNamespace != "" {
             dynamoNamespace = *component.DynamoNamespace
-            componentsWithNamespace = append(componentsWithNamespace, componentName)
-        } else {
-            componentsWithoutNamespace = append(componentsWithoutNamespace, componentName)
         }

197-216: Stabilize conflict detection by iterating over sorted service names.

Map iteration order is random; sorting yields deterministic, testable error messages.

-    // First pass: collect components with explicit namespace settings
-    for componentName, component := range parentDynamoGraphDeployment.Spec.Services {
+    // Iterate deterministically for stable behavior/messages
+    serviceNames := make([]string, 0, len(parentDynamoGraphDeployment.Spec.Services))
+    for name := range parentDynamoGraphDeployment.Spec.Services {
+        serviceNames = append(serviceNames, name)
+    }
+    sort.Strings(serviceNames)
+    for _, componentName := range serviceNames {
+        component := parentDynamoGraphDeployment.Spec.Services[componentName]
         dynamoNamespace := ""
         if component.DynamoNamespace != nil && *component.DynamoNamespace != "" {
             dynamoNamespace = *component.DynamoNamespace
         }
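As a standalone illustration of the sorted-iteration pattern suggested above, the following sketch collects map keys, sorts them, and then walks the map in that order. The `sortedKeys` helper and the sample service map are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedKeys returns the keys of m in ascending order so that callers can
// iterate a map deterministically.
func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Hypothetical service name -> explicit namespace mapping.
	services := map[string]string{
		"VllmPrefillWorker": "dynamo-xxx",
		"Frontend":          "dynamo",
		"VllmDecodeWorker":  "dynamo",
	}
	// Always visits Frontend, VllmDecodeWorker, VllmPrefillWorker in that order,
	// so the component named in a mismatch error is the same on every run.
	for _, name := range sortedKeys(services) {
		fmt.Println(name, services[name])
	}
}
```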

930-937: Use ptr.To and confirm in-place mutation is intended.

  • Use ptr.To for consistency with K8s style.
  • This mutates the input CR (dynamoDeployment.Spec.Services[...]); verify this side-effect is desired.
-        component.DynamoNamespace = &dynamoNamespace
+        component.DynamoNamespace = ptr.To(dynamoNamespace)

If mutation is not desired, deep-copy the component before modification and pass the copy downstream.
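A minimal sketch of the copy-before-modify approach, assuming the services map holds pointers to components (an assumption; the real field types are not shown in this thread). The `component` type and `withNamespace` helper are hypothetical. Generated Kubernetes API types usually provide a DeepCopy method, which would be the more robust choice for structs with nested pointers, slices, or maps.

```go
package main

import "fmt"

// component is a hypothetical stand-in for a service entry in the CR spec.
type component struct {
	DynamoNamespace *string
	Replicas        int32
}

// withNamespace returns a copy of c with DynamoNamespace set to ns and leaves
// the original untouched. A shallow struct copy is enough here only because
// the single pointer field is replaced rather than written through.
func withNamespace(c *component, ns string) *component {
	clone := *c
	clone.DynamoNamespace = &ns // ns is a per-call copy, so the pointer is fresh
	return &clone
}

func main() {
	services := map[string]*component{"VllmDecodeWorker": {Replicas: 1}}

	patched := withNamespace(services["VllmDecodeWorker"], "dynamo")

	// The input CR entry is unchanged; only the copy carries the namespace.
	fmt.Println(services["VllmDecodeWorker"].DynamoNamespace == nil, *patched.DynamoNamespace)
}
```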


1010-1031: Add KubeLabelDynamoNamespace to Grove path for parity with component deployments.

Component CRs set the label; Grove cliques/pods don’t. Adding it keeps selectors and tooling consistent.

     labels[commonconsts.KubeLabelDynamoGraphDeploymentName] = dynamoDeployment.Name
     if component.ComponentType != "" {
         labels[commonconsts.KubeLabelDynamoComponentType] = component.ComponentType
     }
+    if component.DynamoNamespace != nil && *component.DynamoNamespace != "" {
+        labels[commonconsts.KubeLabelDynamoNamespace] = *component.DynamoNamespace
+    }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c8ecc40 and 4b68ea2.

📒 Files selected for processing (1)
  • deploy/cloud/operator/internal/dynamo/graph.go (2 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/3020/merge) by GavinZhu-GMI.
deploy/cloud/operator/internal/dynamo/graph.go

[error] 1-1: Command failed: pre-commit run --show-diff-on-failure --color=always --all-files. Trailing-whitespace check failed; the hook modified deploy/cloud/operator/internal/dynamo/graph.go. Re-run 'pre-commit run --all-files' and commit the changes.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo

@github-actions

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Oct 12, 2025
@GavinZhu-GMI GavinZhu-GMI requested a review from a team as a code owner October 13, 2025 00:40
@github-actions github-actions bot removed the Stale label Oct 13, 2025

Labels

external-contribution Pull request is from an external contributor fix size/S
