fix: service args and operator fixes #1297
Walkthrough

This update primarily increases version numbers across multiple Rust, Python, and Docker-related files, adjusts dependency versions, and introduces resource configuration changes in both code and example YAMLs. It also modifies logic for resource field handling, adds validation for GPU resource types, and restricts specific Kubernetes argument injection to a targeted component type.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant ServiceInterface
    participant ServiceInfo
    participant ServiceConfig
    User->>ServiceInterface: Define service (with/without image/resources)
    ServiceInterface->>ServiceInfo: from_service(service)
    ServiceInfo->>ServiceInfo: Resolve image (service.config.image or DYNAMO_IMAGE)
    ServiceInfo->>ServiceConfig: Use resources field (with validated cpu/gpu/memory)
    ServiceInfo->>User: Return ServiceInfo instance
```
Actionable comments posted: 1
🧹 Nitpick comments (1)
docs/support_matrix.md (1)
71-71: Minor formatting: add a blank line around the table.

The static analysis tool suggests adding a blank line around the table for better markdown formatting.

```diff
 | **Build Dependency** | **Version** |
 |----------------------|-------------|
+| **Base Container** | [25.03](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda-dl-base/tags) |
```

🧰 Tools
🪛 markdownlint-cli2 (0.17.2)
71-71: Tables should be surrounded by blank lines (MD058, blanks-around-tables)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (3)
- `Cargo.lock` is excluded by `!**/*.lock`
- `lib/bindings/python/Cargo.lock` is excluded by `!**/*.lock`
- `lib/runtime/examples/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (15)
- `Cargo.toml` (2 hunks)
- `container/Dockerfile.vllm` (1 hunk)
- `container/build.sh` (1 hunk)
- `deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go` (2 hunks)
- `deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go` (1 hunk)
- `deploy/sdk/src/dynamo/sdk/cli/build.py` (1 hunk)
- `deploy/sdk/src/dynamo/sdk/core/lib.py` (0 hunks)
- `deploy/sdk/src/dynamo/sdk/core/protocol/interface.py` (2 hunks)
- `deploy/sdk/src/dynamo/sdk/tests/test_resources.py` (2 hunks)
- `docs/support_matrix.md` (1 hunk)
- `examples/llm/configs/agg_router.yaml` (1 hunk)
- `lib/bindings/python/Cargo.toml` (2 hunks)
- `lib/llm/Cargo.toml` (1 hunk)
- `lib/runtime/examples/Cargo.toml` (1 hunk)
- `pyproject.toml` (2 hunks)
💤 Files with no reviewable changes (1)
- deploy/sdk/src/dynamo/sdk/core/lib.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
deploy/sdk/src/dynamo/sdk/tests/test_resources.py (1)
deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (2)
- `ServiceInterface` (105-198)
- `config` (116-118)
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (2)
deploy/cloud/operator/internal/consts/consts.go (1)
- `KubeLabelDynamoComponent` (47-47)

deploy/cloud/operator/internal/dynamo/graph.go (1)
- `ComponentTypePlanner` (48-48)
🪛 markdownlint-cli2 (0.17.2)
docs/support_matrix.md
71-71: Tables should be surrounded by blank lines (MD058, blanks-around-tables)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/bindings/python)
🔇 Additional comments (22)
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (1)
85-85: Details
❌ Incorrect review comment
Verify the component type casing consistency.

There's a potential inconsistency with the component type constant. The new constant `ComponentTypePlanner = "Planner"` uses capital "P", but there's already a similar constant `ComponentTypePlanner = "planner"` (lowercase) in `deploy/cloud/operator/internal/dynamo/graph.go` line 48. Please verify which casing is correct for the component type identification:
🏁 Script executed:
```bash
#!/bin/bash
# Search for component type usages and labels to determine the correct casing
echo "=== Searching for Planner component type references ==="
rg -i "planner" --type go -A 2 -B 2
echo -e "\n=== Searching for component type label usage ==="
rg "KubeLabelDynamoComponent.*[Pp]lanner" --type go -A 2 -B 2
echo -e "\n=== Searching for component type constants ==="
ast-grep --pattern $'ComponentTypePlanner = $_'
```

Length of output: 13159
Ignore the casing discrepancy – it’s intentional
The two `ComponentTypePlanner` constants serve different contexts and must remain as-is:

- In `deploy/cloud/operator/internal/dynamo/graph.go` (and its tests), `ComponentTypePlanner = "planner"` (lowercase) is used for Kubernetes label values (`commonconsts.KubeLabelDynamoComponent`).
- In `deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go` (and related JSON-based DYN_DEPLOYMENT_CONFIG), `ComponentTypePlanner = "Planner"` (capitalized) maps to configuration keys and command-line flags.

No changes required.
Likely an incorrect or invalid review comment.
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (1)
937-937: Correct test update reflecting the conditional environment argument logic.

The test correctly removes the `--test-lws-deploy-service.environment=kubernetes` argument from the expected command since the test deployment doesn't have the Planner component type label. This properly validates the new conditional behavior implemented in the controller.

deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (3)
66-70: LGTM! Good validation for GPU field.

The validator ensures consistent string handling for GPU values, converting integers to strings when needed. This prevents type inconsistencies.
77-77: Details
✅ Verification successful
Verify field rename from 'resource' to 'resources' across codebase.
The field has been renamed from `resource` to `resources` in ServiceConfig. This is a breaking change that requires updates in all code that accesses this field.
🏁 Script executed:
```bash
#!/bin/bash
# Description: Check for usage of the old 'resource' field name that needs updating

# Search for any code accessing the old 'resource' field
rg "\.resource\b" --type py -A 2 -B 2

# Search for any remaining references to old field in tests or other files
ast-grep --pattern $'$_.resource'
```

Length of output: 393
No outdated `.resource` references found

A search for `.resource` returned only an import of the `resource` module, with no attribute accesses using the old field name. The rename to `resources` has been applied consistently across the codebase.
62-64: Details
✅ Verification successful
Verify resource type changes don't break existing code.
The resource fields have been changed from mixed types to all strings with updated defaults:

- `cpu`: changed from `int` (default 1) to `str` (default "1")
- `memory`: default increased from "100Mi" to "500Mi"
- `gpu`: now enforced as string type with a validator

These type changes could break existing code that expects integer CPU values or relies on the old memory default.
🏁 Script executed:
```bash
#!/bin/bash
# Description: Check for any code that might be affected by the resource type changes

# Search for direct access to cpu field that might expect integer
rg -A 3 -B 3 "\.cpu\s*=" --type py

# Search for any hardcoded references to the old memory default
rg "100Mi" --type py

# Search for potential integer assignments to cpu field
ast-grep --pattern $'$_.cpu = $INT'
```

Length of output: 1405
ResourceConfig string-based fields verified

- Tests in `deploy/sdk/src/dynamo/sdk/tests/test_resources.py` already assert string values ("2", "1", "4Gi"), confirming consumer code expects strings.
- No occurrences of the old "100Mi" default were found in the codebase.
- Deployment logic in `deploy/sdk/src/dynamo/sdk/core/protocol/deployment.py` strips and validates string CPU formats.
- The `gpu` validator aligns with existing usage of string values.
- The `ServiceConfig.resource` → `resources` rename is consistently applied (tests reference `.resources`).

No breaking changes detected.
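The string-based schema and int-to-string GPU coercion described in this review can be sketched with a plain stdlib dataclass. This is an illustrative model under stated assumptions, not the actual SDK source (which presumably uses a pydantic validator); field names and defaults follow the review text.

```python
from dataclasses import dataclass


@dataclass
class ResourceConfig:
    """Illustrative sketch of the described schema, not the SDK source."""
    cpu: str = "1"         # formerly int with default 1
    memory: str = "500Mi"  # default raised from "100Mi"
    gpu: str = "0"

    def __post_init__(self):
        # Mirror the described gpu validator: accept an int, store a string.
        if isinstance(self.gpu, int):
            self.gpu = str(self.gpu)
```

With this shape, `ResourceConfig(gpu=1).gpu` comes back as the string `"1"`, matching the string assertions in the updated tests.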
examples/llm/configs/agg_router.yaml (1)
43-44: LGTM! Resource specifications align with the new schema.

The explicit CPU and memory resource specifications align well with the updated ResourceConfig schema. The allocations (10 CPUs, 20Gi memory, 1 GPU) are reasonable for a VLLM worker service.
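Such a service-level resources block might look like the following; the service key and exact nesting are assumptions for illustration, not copied from `agg_router.yaml`:

```yaml
# Hypothetical excerpt; actual keys and nesting in agg_router.yaml may differ
VllmWorker:
  resources:
    cpu: "10"
    memory: 20Gi
    gpu: "1"
```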
pyproject.toml (2)
18-18: Version bump to 0.3.0 looks appropriate.

The minor version bump from 0.2.1 to 0.3.0 is consistent with the scope of changes, including resource configuration updates and operator fixes.
32-32: Runtime dependency version correctly updated.

The ai-dynamo-runtime dependency version is properly updated to match the project version, ensuring compatibility.
lib/runtime/examples/Cargo.toml (1)
24-24: Version consistency maintained across workspace.

The version bump to 0.3.0 maintains consistency with the Python package and overall project versioning.
lib/llm/Cargo.toml (1)
84-84: LGTM - Dependency version update aligns with workspace version bump.

The `nixl-sys` dependency version has been updated from `0.2.1-rc.3` to `0.3.0-rc.2`, which aligns with the broader workspace version bump to `0.3.0`. The change from rc.3 to rc.2 is expected, as this represents a new major/minor version series with its own release candidate numbering.

lib/bindings/python/Cargo.toml (2)
22-22: LGTM - Version bump aligns with workspace updates.

The version update from `0.2.1` to `0.3.0` is consistent with the coordinated workspace version bump across all related packages.

78-78: Good addition of trailing newline.

Adding the trailing newline improves file formatting consistency.
Cargo.toml (2)
31-31: LGTM - Workspace version bump to 0.3.0.

The workspace version has been properly updated from `0.2.1` to `0.3.0`, coordinating the version bump across the entire project.

42-44: LGTM - Local dependency versions updated consistently.

All local workspace dependencies (`dynamo-runtime`, `dynamo-llm`, `dynamo-tokens`) have been updated consistently to version `0.3.0`, ensuring proper version alignment across the workspace.

deploy/sdk/src/dynamo/sdk/tests/test_resources.py (2)
19-19: Good addition of ServiceInterface import.

Adding the `ServiceInterface` import enables proper type annotation and interface-based testing, which improves type safety and code clarity.
44-48: LGTM - Improved test assertions with proper typing.

The updates improve the test in several ways:

- Proper type annotation using `ServiceInterface` provides better type safety
- Updated assertions using `dyn_svc.config` correctly test the interface contract
- Resource value assertions verify that the decorator properly configures CPU ("2"), GPU ("1"), and memory ("4Gi") as string values
- The change to string-based resource values aligns with the updated resource configuration model
container/Dockerfile.vllm (1)
172-172: LGTM! Version bump is coordinated with other package updates.

The version update from "0.8.4.post1" to "0.8.4.post2" is consistent with the overall project version bump and aligns with documentation updates.
container/build.sh (1)
112-112: LGTM! NIXL commit hash update aligns with dependency version bump.

The commit hash update corresponds to the nixl-sys dependency upgrade and follows the coordinated version bump strategy.
docs/support_matrix.md (1)
65-69: LGTM! Coordinated version updates are consistent across all packages.

All version updates follow the coordinated release pattern and align with changes in other files:
- ai-dynamo and related packages: 0.2.1 → 0.3.0
- ai-dynamo-vllm: 0.8.4.post1 → 0.8.4.post2
- NIXL: 0.2.1 → 0.3.0
Also applies to: 74-74, 76-76, 79-79
deploy/sdk/src/dynamo/sdk/cli/build.py (3)
132-136: LGTM! Improved image resolution with proper fallback logic.

The new image resolution logic provides flexibility by checking `service.config.image` first, then falling back to the global `DYNAMO_IMAGE` constant. The assertion ensures that an image is always specified, improving robustness.
143-143: LGTM! Using the resolved image variable improves code clarity.

Using the `image` variable instead of directly accessing `service.config.image` improves code clarity and ensures the fallback logic is properly applied.
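The fallback order described above can be sketched as follows; `resolve_image`, the `SimpleNamespace` stand-ins, and the `DYNAMO_IMAGE` default are illustrative assumptions, not the actual `build.py` code:

```python
import os
from types import SimpleNamespace

# Assumed global fallback; build.py's actual constant may be defined differently.
DYNAMO_IMAGE = os.environ.get("DYNAMO_IMAGE", "dynamo:latest")


def resolve_image(service) -> str:
    # Prefer an image set on the service config, else fall back to the global.
    image = getattr(service.config, "image", None) or DYNAMO_IMAGE
    assert image is not None, "no image specified and no DYNAMO_IMAGE fallback"
    return image


# Hypothetical service objects standing in for the SDK's service type.
svc_with = SimpleNamespace(config=SimpleNamespace(image="my-image:v1"))
svc_without = SimpleNamespace(config=SimpleNamespace(image=None))
```

Here `resolve_image(svc_with)` returns the per-service image, while `svc_without` falls through to the global default.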
141-141: Details
✅ Verification successful
Verify the field name change is consistent across the codebase.
The field name changed from `resource` to `resources`. This appears to be part of a schema update, but please ensure this change is consistent across all files that reference this field.
🏁 Script executed:
```bash
#!/bin/bash
# Description: Verify all references to the resource/resources field are updated consistently
# Expected: All references should use 'resources' not 'resource'

# Search for any remaining references to the old field name
rg -A 2 -B 2 "\.resource\b" --type py

# Search for the new field name usage
rg -A 2 -B 2 "\.resources\b" --type py
```

Length of output: 1265
Field rename consistency verified
All references to the config field now use `.resources`. The only remaining `resource` occurrences are module imports in `allocator.py` (`from dynamo.sdk.lib.resource import …`), not attribute accesses. No further changes needed.
```go
if componentType, exists := opt.dynamoComponentDeployment.Labels[commonconsts.KubeLabelDynamoComponent]; exists && componentType == ComponentTypePlanner {
	args = append(args, fmt.Sprintf("--%s.environment=%s", opt.dynamoComponentDeployment.Spec.ServiceName, KubernetesDeploymentStrategy))
}
```
💡 Verification agent
🧩 Analysis chain
Good implementation of conditional environment argument injection.
The logic correctly restricts the --environment=kubernetes argument to only Planner components as specified in the PR objectives. The implementation properly checks the component type before adding the argument.
Please verify that KubeLabelDynamoComponent contains the component type (not just the component name) for proper matching:
🏁 Script executed:

```bash
#!/bin/bash
# Verify the label contains component type information
echo "=== Checking KubeLabelDynamoComponent usage ==="
rg "KubeLabelDynamoComponent" --type go -A 3 -B 3
echo -e "\n=== Checking how component labels are set ==="
ast-grep --pattern $'labels[commonconsts.KubeLabelDynamoComponent] = $_'
```

Length of output: 22753
Use the correct label key for component-type checks
The code is currently using nvidia.com/dynamo-component (which holds the component name) to decide when to inject the Planner environment flag. It should instead read nvidia.com/dynamo-component-type (which holds the component type). Please update the conditional as follows:
```diff
--- a/deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go
+++ b/deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go
@@ -1457,7 +1457,7 @@ func (r *DynamoComponentDeploymentReconciler) reconcile(...) {
 	}
-	if componentType, exists := opt.dynamoComponentDeployment.Labels[commonconsts.KubeLabelDynamoComponent]; exists && componentType == ComponentTypePlanner {
+	if componentType, exists := opt.dynamoComponentDeployment.Labels[commonconsts.KubeLabelDynamoComponentType]; exists && componentType == ComponentTypePlanner {
 		args = append(args, fmt.Sprintf("--%s.environment=%s", opt.dynamoComponentDeployment.Spec.ServiceName, KubernetesDeploymentStrategy))
 	}
 }
```

• File: deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go
• Lines: 1458–1460
This change ensures the Planner-only --environment=kubernetes flag is tied to the type label, not the name label.
🤖 Prompt for AI Agents
In
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go
around lines 1458 to 1460, the code incorrectly uses the label key for component
name (commonconsts.KubeLabelDynamoComponent) to check the component type. Update
the conditional to use the correct label key
commonconsts.KubeLabelDynamoComponentType that holds the component type. This
ensures the environment argument is only injected for Planner components based
on their type, not their name.
Overview:
Cherry-pick: #1255
Linear ticket for the issue : https://linear.app/nvidia-dynamo/issue/DEP-130/dynamo-build-populate-default-image-name
This PR cherry-picks the following fixes -