-
Notifications
You must be signed in to change notification settings - Fork 753
fix: do not fail if backendFramework cannot be detected #2655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Should the spec in https://github.com/ai-dynamo/dynamo/blob/main/deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml be updated too to reflect this? |
WalkthroughIntroduces a “noop” backend framework sentinel, adds a NoopBackend implementing extended Backend interface (now takes a MultinodeDeployer in UpdateContainer), adds a helper to convert DynamoComponentDeployment to OverridesSpec, updates backend factory and pod-spec wiring to pass a multinode deployer, and adjusts detection logic/tests to return noop instead of errors. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant U as Caller
participant D as detectBackendFrameworkFromArgs
participant DF as determineBackendFramework
participant BF as BackendFactory
participant G as GenerateBasePodSpec
participant MF as MultinodeDeployerFactory
participant B as Backend (incl. NoopBackend)
U->>D: Analyze args
D-->>U: BackendFramework (noop if none)
U->>DF: Combine detection + explicit config
alt explicit provided
DF-->>U: explicit if compatible
else DF-->>U: error on mismatch
else no explicit
DF-->>U: noop (sentinel)
end
U->>BF: Create backend for framework
BF-->>U: Backend instance (NoopBackend for noop)
U->>G: Generate base pod spec
G->>MF: Get MultinodeDeployer
MF-->>G: MultinodeDeployer
G->>B: UpdateContainer(container, nodes, role, componentSpec, svc, multinodeDeployer)
B-->>G: Container updated (no-op if NoopBackend)
G->>B: UpdatePodSpec(podSpec, nodes, role, componentSpec, svc)
B-->>G: PodSpec updated (no-op if NoopBackend)
G-->>U: Final pod spec
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Poem
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 golangci-lint (2.2.2)Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
deploy/cloud/operator/internal/dynamo/graph.go (2)
1056-1059: Normalize explicit backend string to avoid case/whitespace mismatchesIf users set Spec.BackendFramework as "SGLANG" or with spaces, it won’t match detected lowercase tokens. Normalize once when deriving explicitFramework.
- var explicitFramework BackendFramework - if explicitBackendFramework != "" { - explicitFramework = BackendFramework(explicitBackendFramework) - } + var explicitFramework BackendFramework + if explicitBackendFramework != "" { + // normalize user input to canonical lowercase, trimmed form + normalized := strings.ToLower(strings.TrimSpace(explicitBackendFramework)) + explicitFramework = BackendFramework(normalized) + }
1082-1083: Document the new noop sentinel behavior for future readersdetermineBackendFramework now returns BackendFrameworkNoop when a worker has neither detection nor explicit config. Consider updating the function comment to state this contract explicitly.
-// determineBackendFramework is the core logic for hybrid backend framework detection -// Takes extracted parameters and applies the detection logic +// determineBackendFramework is the core logic for hybrid backend framework detection. +// Behavior: +// - Non-worker components: always return BackendFrameworkNoop. +// - Workers: try detect from command/args; if none detected and no explicit config, return BackendFrameworkNoop. +// - If both detected and explicit exist and differ (excluding noop), return an error.deploy/cloud/operator/internal/dynamo/graph_test.go (1)
3547-3569: Add noop to BackendFramework enum test for completenessNow that BackendFrameworkNoop is a first-class sentinel, include it in the enum assertions to prevent regressions.
func TestBackendFrameworkEnum(t *testing.T) { // Test that backend framework constants are defined correctly if BackendFrameworkSGLang != "sglang" { t.Errorf("BackendFrameworkSGLang = %v, want \"sglang\"", BackendFrameworkSGLang) } if BackendFrameworkVLLM != "vllm" { t.Errorf("BackendFrameworkVLLM = %v, want \"vllm\"", BackendFrameworkVLLM) } if BackendFrameworkTRTLLM != "trtllm" { t.Errorf("BackendFrameworkTRTLLM = %v, want \"trtllm\"", BackendFrameworkTRTLLM) } + if BackendFrameworkNoop != "noop" { + t.Errorf("BackendFrameworkNoop = %v, want \"noop\"", BackendFrameworkNoop) + } // Test that frameworks can be compared - frameworks := []BackendFramework{BackendFrameworkSGLang, BackendFrameworkVLLM, BackendFrameworkTRTLLM} + frameworks := []BackendFramework{BackendFrameworkSGLang, BackendFrameworkVLLM, BackendFrameworkTRTLLM, BackendFrameworkNoop} for _, framework := range frameworks { switch framework { - case BackendFrameworkSGLang, BackendFrameworkVLLM, BackendFrameworkTRTLLM: + case BackendFrameworkSGLang, BackendFrameworkVLLM, BackendFrameworkTRTLLM, BackendFrameworkNoop: // Expected default: t.Errorf("Unexpected framework value: %v", framework) } } }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
deploy/cloud/operator/internal/dynamo/graph.go(3 hunks)deploy/cloud/operator/internal/dynamo/graph_test.go(4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
🧬 Code graph analysis (1)
deploy/cloud/operator/internal/dynamo/graph_test.go (3)
deploy/cloud/operator/internal/dynamo/graph.go (1)
BackendFrameworkNoop(1026-1026)deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (2)
DynamoComponentDeploymentOverridesSpec(56-58)DynamoComponentDeploymentSharedSpec(60-111)deploy/cloud/operator/api/v1alpha1/dynamographdeployment_types.go (1)
DynamoGraphDeployment(63-71)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (8)
deploy/cloud/operator/internal/dynamo/graph.go (3)
1015-1015: Return noop on no detection — aligns with PR goalSwitching to BackendFrameworkNoop avoids hard failures when args don’t reveal a backend. This matches the PR intent and keeps workers deployable in ambiguous cases.
1062-1065: Ignore noop in mismatch checks — good guardExcluding BackendFrameworkNoop from mismatch comparisons prevents false conflicts when detection yields “no-op/unknown.” Looks correct.
1068-1070: Preference order (detected > explicit) is sensiblePreferring a positive detection over an explicit config (except noop) minimizes config drift. Looks good.
deploy/cloud/operator/internal/dynamo/graph_test.go (5)
3630-3633: Test updated to expect noop when nothing detected — good coverageAsserting BackendFrameworkNoop here matches the new sentinel behavior.
3713-3716: Worker no detection/no explicit → noop path validatedThis test cements the non-failing fallback. Looks good.
3718-3723: Worker “detection failure” scenario now expecting noop — consistentAligns with the relaxed behavior.
3846-3855: Component-level: no detection/no explicit → noop — correct expectationMatches determineBackendFramework contract.
3857-3871: Component-level: benign args now result in noop — correctGood to see both paths (component-only and graph-level explicit) covered.
@nealvaidya , there is no change for this field. Noop is not a value that users should have to set. |
Apologies if I'm misunderstanding, but I don't think we should consider this a corner case -- it should be explicitly supported that users can write and deploy their own components that don't utilize any of the existing vllm/sglang/trtllm backends. If you're saying the user doesn't need to explicitly specify the backend as "none", we should state in the property description or in some documentation that it is optional (and ideally what the effect is if you do set it) |
biswapanda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM
This is used by the operator to determine what kind of multinode configuration it has to inject/use (ray for vllm, sglang additional parameters and mpirun for trtllm). What I'm saying is that if not 1 of these 3, the operator will do nothing and users doesn't have to specify anything. |
Signed-off-by: Hannah Zhang <[email protected]>
Signed-off-by: Jason Zhou <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
Signed-off-by: nnshah1 <[email protected]>
Overview:
do not fail if backendFramework cannot be detected
Summary by CodeRabbit
New Features
Refactor
Tests