


@julienmancuso julienmancuso commented Aug 22, 2025

Overview:

Do not fail if backendFramework cannot be detected.

Summary by CodeRabbit

  • New Features

    • Added a “No-op” backend option; deployments with no detectable backend now proceed without error.
    • Enabled explicit selection of the no-op backend.
    • Enhanced multinode deployment support by applying backend-specific adjustments during pod/container generation.
  • Refactor

    • Simplified backend detection and configuration handling to treat “no backend” as a valid outcome.
  • Tests

    • Updated test cases to expect the no-op backend for no-detection scenarios and removed error expectations accordingly.

@nealvaidya


coderabbitai bot commented Aug 22, 2025

Walkthrough

Introduces a “noop” backend framework sentinel and a NoopBackend that implements the extended Backend interface (UpdateContainer now also takes a MultinodeDeployer). Adds a helper that converts a DynamoComponentDeployment to an OverridesSpec, updates the backend factory and pod-spec wiring to pass a multinode deployer, and changes the detection logic (and its tests) to return noop instead of an error.

Changes

Cohort / File(s) Summary
Backend framework, factory, and wiring
deploy/cloud/operator/internal/dynamo/graph.go
Added BackendFrameworkNoop ("noop") sentinel; implemented public NoopBackend; extended Backend interface to include multinodeDeployer in UpdateContainer; added ConvertDynamoComponentDeploymentToSpec(...); updated BackendFactory to return NoopBackend for noop; GenerateBasePodSpec now acquires a MultinodeDeployer and passes it to backend.UpdateContainer; detection/determination logic returns noop when nothing detected.
Tests updated for noop semantics
deploy/cloud/operator/internal/dynamo/graph_test.go
Revised tests to expect BackendFrameworkNoop instead of errors for no-detection/failure cases; updated test names and expectations accordingly; shifted from expectError to asserting concrete BackendFramework outcomes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as Caller
  participant D as detectBackendFrameworkFromArgs
  participant DF as determineBackendFramework
  participant BF as BackendFactory
  participant G as GenerateBasePodSpec
  participant MF as MultinodeDeployerFactory
  participant B as Backend (incl. NoopBackend)

  U->>D: Analyze args
  D-->>U: BackendFramework (noop if none)

  U->>DF: Combine detection + explicit config
  alt explicit provided and compatible with detection
    DF-->>U: explicit framework
  else explicit conflicts with detection
    DF-->>U: error on mismatch
  else no explicit
    DF-->>U: detected framework, or noop (sentinel) if none
  end

  U->>BF: Create backend for framework
  BF-->>U: Backend instance (NoopBackend for noop)

  U->>G: Generate base pod spec
  G->>MF: Get MultinodeDeployer
  MF-->>G: MultinodeDeployer
  G->>B: UpdateContainer(container, nodes, role, componentSpec, svc, multinodeDeployer)
  B-->>G: Container updated (no-op if NoopBackend)
  G->>B: UpdatePodSpec(podSpec, nodes, role, componentSpec, svc)
  B-->>G: PodSpec updated (no-op if NoopBackend)
  G-->>U: Final pod spec

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • feat: add grove multinode support #2269 — Touches the same backend/multinode abstractions and factory/wiring in graph.go, likely part of or precursor to these sentinel/noop and interface changes.

Poem

I twitch my ears at “noop” so neat,
A silent hop, no thumping feat—
Backend burrow calm and clear,
Multinode winds now whisper near.
Specs convert, containers glide,
I nibble tests with quiet pride.
Code-carrots stacked, let’s ride! 🥕🐇

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.2.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
deploy/cloud/operator/internal/dynamo/graph.go (2)

1056-1059: Normalize explicit backend string to avoid case/whitespace mismatches

If users set Spec.BackendFramework to "SGLANG" or include surrounding whitespace, the value won't match the lowercase tokens produced by detection. Normalize it once when deriving explicitFramework.

-    var explicitFramework BackendFramework
-    if explicitBackendFramework != "" {
-        explicitFramework = BackendFramework(explicitBackendFramework)
-    }
+    var explicitFramework BackendFramework
+    if explicitBackendFramework != "" {
+        // normalize user input to canonical lowercase, trimmed form
+        normalized := strings.ToLower(strings.TrimSpace(explicitBackendFramework))
+        explicitFramework = BackendFramework(normalized)
+    }

1082-1083: Document the new noop sentinel behavior for future readers

determineBackendFramework now returns BackendFrameworkNoop when a worker has neither a detected backend nor an explicit config. Consider updating the function comment to state this contract explicitly.

-// determineBackendFramework is the core logic for hybrid backend framework detection
-// Takes extracted parameters and applies the detection logic
+// determineBackendFramework is the core logic for hybrid backend framework detection.
+// Behavior:
+// - Non-worker components: always return BackendFrameworkNoop.
+// - Workers: try detect from command/args; if none detected and no explicit config, return BackendFrameworkNoop.
+// - If both detected and explicit exist and differ (excluding noop), return an error.
deploy/cloud/operator/internal/dynamo/graph_test.go (1)

3547-3569: Add noop to BackendFramework enum test for completeness

Now that BackendFrameworkNoop is a first-class sentinel, include it in the enum assertions to prevent regressions.

 func TestBackendFrameworkEnum(t *testing.T) {
   // Test that backend framework constants are defined correctly
   if BackendFrameworkSGLang != "sglang" {
     t.Errorf("BackendFrameworkSGLang = %v, want \"sglang\"", BackendFrameworkSGLang)
   }
   if BackendFrameworkVLLM != "vllm" {
     t.Errorf("BackendFrameworkVLLM = %v, want \"vllm\"", BackendFrameworkVLLM)
   }
   if BackendFrameworkTRTLLM != "trtllm" {
     t.Errorf("BackendFrameworkTRTLLM = %v, want \"trtllm\"", BackendFrameworkTRTLLM)
   }
+  if BackendFrameworkNoop != "noop" {
+    t.Errorf("BackendFrameworkNoop = %v, want \"noop\"", BackendFrameworkNoop)
+  }
 
   // Test that frameworks can be compared
-  frameworks := []BackendFramework{BackendFrameworkSGLang, BackendFrameworkVLLM, BackendFrameworkTRTLLM}
+  frameworks := []BackendFramework{BackendFrameworkSGLang, BackendFrameworkVLLM, BackendFrameworkTRTLLM, BackendFrameworkNoop}
   for _, framework := range frameworks {
     switch framework {
-    case BackendFrameworkSGLang, BackendFrameworkVLLM, BackendFrameworkTRTLLM:
+    case BackendFrameworkSGLang, BackendFrameworkVLLM, BackendFrameworkTRTLLM, BackendFrameworkNoop:
       // Expected
     default:
       t.Errorf("Unexpected framework value: %v", framework)
     }
   }
 }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro


📥 Commits

Reviewing files that changed from the base of the PR and between b92a805 and d8aa401.

📒 Files selected for processing (2)
  • deploy/cloud/operator/internal/dynamo/graph.go (3 hunks)
  • deploy/cloud/operator/internal/dynamo/graph_test.go (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
🧬 Code graph analysis (1)
deploy/cloud/operator/internal/dynamo/graph_test.go (3)
deploy/cloud/operator/internal/dynamo/graph.go (1)
  • BackendFrameworkNoop (1026-1026)
deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (2)
  • DynamoComponentDeploymentOverridesSpec (56-58)
  • DynamoComponentDeploymentSharedSpec (60-111)
deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go (1)
  • DynamoGraphDeployment (63-71)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (8)
deploy/cloud/operator/internal/dynamo/graph.go (3)

1015-1015: Return noop on no detection — aligns with PR goal

Switching to BackendFrameworkNoop avoids hard failures when args don’t reveal a backend. This matches the PR intent and keeps workers deployable in ambiguous cases.


1062-1065: Ignore noop in mismatch checks — good guard

Excluding BackendFrameworkNoop from mismatch comparisons prevents false conflicts when detection yields “no-op/unknown.” Looks correct.


1068-1070: Preference order (detected > explicit) is sensible

Preferring a positive detection over an explicit config (except noop) minimizes config drift. Looks good.

deploy/cloud/operator/internal/dynamo/graph_test.go (5)

3630-3633: Test updated to expect noop when nothing detected — good coverage

Asserting BackendFrameworkNoop here matches the new sentinel behavior.


3713-3716: Worker no detection/no explicit → noop path validated

This test cements the non-failing fallback. Looks good.


3718-3723: Worker “detection failure” scenario now expecting noop — consistent

Aligns with the relaxed behavior.


3846-3855: Component-level: no detection/no explicit → noop — correct expectation

Matches determineBackendFramework contract.


3857-3871: Component-level: benign args now result in noop — correct

Good to see both paths (component-only and graph-level explicit) covered.

@julienmancuso

> Should the spec in https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml be updated too to reflect this?

@nealvaidya, there is no change for this field. Noop is not a value that users should have to set.
This change is for corner cases (the Hello World example) where components of type "worker" are defined with dummy scripts.

@nealvaidya
Copy link
Contributor

> @nealvaidya, there is no change for this field. Noop is not a value that users should have to set.
> This change is for corner cases (the Hello World example) where components of type "worker" are defined with dummy scripts.

Apologies if I'm misunderstanding, but I don't think we should consider this a corner case -- it should be explicitly supported that users can write and deploy their own components that don't utilize any of the existing vllm/sglang/trtllm backends. If you're saying the user doesn't need to explicitly specify the backend as "none", we should state in the property description or in some documentation that it is optional (and ideally what the effect is if you do set it)


@biswapanda biswapanda left a comment


Thanks! LGTM

@julienmancuso

> > @nealvaidya, there is no change for this field. Noop is not a value that users should have to set.
> > This change is for corner cases (the Hello World example) where components of type "worker" are defined with dummy scripts.
>
> Apologies if I'm misunderstanding, but I don't think we should consider this a corner case -- it should be explicitly supported that users can write and deploy their own components that don't utilize any of the existing vllm/sglang/trtllm backends. If you're saying the user doesn't need to explicitly specify the backend as "none", we should state in the property description or in some documentation that it is optional (and ideally what the effect is if you do set it)

This field is used by the operator to determine which multinode configuration to inject (Ray for vLLM, additional parameters for SGLang, and mpirun for TRT-LLM). What I'm saying is that if the backend is none of these three, the operator does nothing and users don't have to specify anything.

@julienmancuso julienmancuso merged commit 390a339 into main Aug 24, 2025
12 of 14 checks passed
@julienmancuso julienmancuso deleted the jsm/dep-338 branch August 24, 2025 13:40
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025

6 participants