Conversation

@biswapanda biswapanda commented May 28, 2025

Details:

  • Use the default DYNAMO_IMAGE, which is based on an environment variable, when the image is not specified explicitly.

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Bug Fixes

    • Improved reliability in selecting the Docker image for services by ensuring a fallback option is used and errors are raised if no image is set.
  • Style

    • Removed unnecessary debug and logging statements for cleaner output.
  • Enhancements

    • Updated resource configuration fields with improved defaults and consistent typing.
    • Renamed resource configuration field for clarity.
    • Added detailed validation and checks on resource configuration attributes in tests.
    • Expanded resource allocation for a worker service with additional CPU and memory specifications.
    • Restricted environment argument addition to deployments labeled as the "Planner" component type.
  • Tests

    • Updated test cases to verify resource configuration values more explicitly.
    • Adjusted test command arguments to reflect deployment changes.

coderabbitai bot commented May 28, 2025

Walkthrough

The from_service method in ServiceInfo was updated to select a Docker image by checking the service configuration first, then a global constant, and raising an error if neither is set. A debug print statement was removed from the to_package_name static method. Additionally, a logging statement was removed from the service function in the core library. The ResourceConfig class had its fields updated for types and defaults, including a validator for gpu. The ServiceConfig class renamed its resource field to resources. A test was enhanced to assert specific resource configuration values. A new deployment constant ComponentTypePlanner was added, and environment argument injection was restricted to deployments labeled as "Planner." A service configuration YAML was updated to include CPU and memory resources for a worker. The ManagedThread class's task parameter type was changed to expect an asynchronous coroutine.

Changes

File(s) | Change Summary
deploy/sdk/src/dynamo/sdk/cli/build.py | Updated image selection logic in ServiceInfo.from_service; corrected resource to resources; removed debug print from to_package_name.
deploy/sdk/src/dynamo/sdk/core/lib.py | Removed logging statement inside the service function.
deploy/sdk/src/dynamo/sdk/core/protocol/interface.py | Updated ResourceConfig field types and defaults (cpu, memory, gpu); added gpu validator; renamed ServiceConfig field resource to resources.
deploy/sdk/src/dynamo/sdk/tests/test_resources.py | Enhanced test to assert specific resource config values (cpu, gpu, memory) on service instance.
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go | Added constant ComponentTypePlanner; restricted environment argument addition to deployments labeled as "Planner".
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go | Modified test to remove environment argument from leader container command line in deployment reconciler test.
examples/llm/configs/agg_router.yaml | Added CPU and memory resource specifications to VllmWorker service configuration alongside existing GPU.
examples/tensorrt_llm/common/utils.py | Updated ManagedThread class to expect asynchronous coroutine type for task parameter and local variable.

Poem

A Docker image, now chosen with care,
First from config, then global, if there.
Debug prints are gone, the logs are now clean,
Snake_case conversion, silent and keen.
Resources plural, tests check with might,
The rabbits hop on, code shining bright! 🐇


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fb588ae and def8d1f.

📒 Files selected for processing (1)
  • examples/tensorrt_llm/common/utils.py (3 hunks)
✅ Files skipped from review due to trivial changes (1)
  • examples/tensorrt_llm/common/utils.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm

@github-actions github-actions bot added the feat label May 28, 2025
@biswapanda biswapanda self-assigned this May 28, 2025
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
deploy/sdk/src/dynamo/sdk/cli/build.py (1)

132-136: Improve error message accuracy.

The assertion error message mentions "environment variable" but DYNAMO_IMAGE is imported from dynamo.sdk (line 40), not necessarily sourced from an environment variable. This could mislead users about how to resolve the issue.

Consider updating the error message to be more accurate:

-        assert (
-            image is not None
-        ), "Please set DYNAMO_IMAGE environment variable or image field in service config"
+        assert (
+            image is not None
+        ), "Please ensure DYNAMO_IMAGE is configured or set the image field in service config"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8cc1361 and 3d54b89.

📒 Files selected for processing (1)
  • deploy/sdk/src/dynamo/sdk/cli/build.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Validate PR title and add label
  • GitHub Check: Build and Test - vllm
  • GitHub Check: Mirror Repository to GitLab
🔇 Additional comments (1)
deploy/sdk/src/dynamo/sdk/cli/build.py (1)

132-143: Good implementation of fallback mechanism.

The fallback logic for Docker image selection is well-implemented. Using service.config.image or DYNAMO_IMAGE provides a clean way to prioritize service-specific configuration while having a sensible default. The assertion ensures the system fails fast if neither option is available, which is good for debugging.
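
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code in build.py: `resolve_image` and `ServiceConfigSketch` are hypothetical names, and modeling `DYNAMO_IMAGE` as a constant read from the environment is an assumption (the nitpick above notes that the real constant is imported from dynamo.sdk). The sketch also raises `ValueError` instead of using `assert`, because assertions are skipped when Python runs with `-O`.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Assumption: the global default is derived from an environment variable.
DYNAMO_IMAGE: Optional[str] = os.environ.get("DYNAMO_IMAGE")


@dataclass
class ServiceConfigSketch:
    """Hypothetical stand-in for the service config; only `image` matters here."""

    image: Optional[str] = None


def resolve_image(
    config: ServiceConfigSketch, default: Optional[str] = DYNAMO_IMAGE
) -> str:
    """Prefer the service-specific image, then the global default; fail fast otherwise."""
    image = config.image or default
    if not image:
        raise ValueError(
            "Please ensure DYNAMO_IMAGE is configured or set the image field in service config"
        )
    return image
```

Calling `resolve_image(ServiceConfigSketch(image="svc:1"), default="global:1")` returns the service-specific image, while an empty config falls back to the supplied default. With neither set, the function raises instead of silently building without an image.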

@biswapanda biswapanda enabled auto-merge (squash) May 28, 2025 23:16
@biswapanda biswapanda disabled auto-merge May 28, 2025 23:25
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a9238f and 2a58676.

📒 Files selected for processing (2)
  • deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (1 hunks)
  • deploy/sdk/src/dynamo/sdk/tests/test_resources.py (2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
deploy/sdk/src/dynamo/sdk/tests/test_resources.py (1)
deploy/sdk/src/dynamo/sdk/core/protocol/interface.py (2)
  • ServiceInterface (99-192)
  • config (110-112)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (3)
deploy/sdk/src/dynamo/sdk/tests/test_resources.py (3)

19-19: Good addition of type annotation.

Adding the ServiceInterface import and type annotation improves code clarity and enables better IDE support.


44-48: Note: PR objective discrepancy.

The PR is titled "feat: populate default image name" and mentions using the DYNAMO_IMAGE environment variable, but the changes in these files concern renaming the resource configuration field and enhancing tests. This suggests that either some files are missing from this review or the context for the actual default image functionality is incomplete.

Could you confirm if there are additional files in this PR that implement the default image name functionality mentioned in the PR objectives?

Likely an incorrect or invalid review comment.


44-48: Enhanced test assertions are valuable, but verify CPU type conversion.

✅ Verification successful

The enhanced test assertions that check specific resource values are much better than just checking for existence. However, there's a potential type inconsistency: the service decorator specifies cpu: "2" (string) but the test expects cpu == 2 (integer).

Please verify whether the ResourceConfig class converts types automatically. The scripts listed under "Script executed" below check the ResourceConfig field types.

If no automatic conversion exists, consider updating the test to match the expected type:

-    assert dyn_svc.config.resources.cpu == 2
+    assert dyn_svc.config.resources.cpu == "2"

🏁 Script executed:

#!/bin/bash
# Description: Check ResourceConfig field definitions and any type conversion logic
# Expected: Understanding of whether CPU should be string or int

# Search for ResourceConfig class definition and any type conversion
ast-grep --pattern 'class ResourceConfig($_):
  $$$'

# Search for any CPU field type annotations or validation
rg -A 5 -B 5 "cpu.*:" --type py
rg -A 5 -B 5 "cpu.*int" --type py

Length of output: 30213


🏁 Script executed:

rg "from .*BaseModel" -n deploy/sdk/src/dynamo/sdk/core/protocol/interface.py
rg "import BaseModel" -n deploy/sdk/src/dynamo/sdk/core/protocol/interface.py
rg "pydantic" -n deploy/sdk/src/dynamo/sdk/core/protocol/interface.py
rg -A5 -B2 "class ServiceConfig" -n deploy/sdk/src/dynamo/sdk/core/protocol/interface.py

Length of output: 646


No change needed: Pydantic auto-converts CPU string to int.

Pydantic’s BaseModel (imported in deploy/sdk/src/dynamo/sdk/core/protocol/interface.py) will coerce "2" into an int for the cpu: int field, so the test’s assert dyn_svc.config.resources.cpu == 2 is correct.
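
The coercion can be illustrated with a small model. This is a sketch only: `ResourceConfigSketch` is a hypothetical stand-in for the real ResourceConfig, it models just the `cpu` field, and the default value of 1 is an assumption made for illustration. It requires the `pydantic` package and relies on Pydantic's default (non-strict) validation.

```python
from pydantic import BaseModel


class ResourceConfigSketch(BaseModel):
    """Hypothetical stand-in for ResourceConfig; only `cpu` is modeled."""

    cpu: int = 1  # assumed default, for illustration only


# A numeric string passed to an int field is converted during validation.
resources = ResourceConfigSketch(cpu="2")
assert resources.cpu == 2
assert isinstance(resources.cpu, int)
```

A non-numeric string such as "two" would fail validation instead of being converted, so the comparison against the integer 2 in the test is safe only because the decorator passes a numeric string.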

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d70fef8 and 5e8cb05.

📒 Files selected for processing (2)
  • deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (2 hunks)
  • examples/llm/configs/agg_router.yaml (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (2)
deploy/cloud/operator/internal/consts/consts.go (1)
  • KubeLabelDynamoComponent (47-47)
deploy/cloud/operator/internal/dynamo/graph.go (1)
  • ComponentTypePlanner (48-48)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (2)
examples/llm/configs/agg_router.yaml (1)

43-44: LGTM! Appropriate resource configuration.

The CPU and memory resource specifications are well-sized for a VllmWorker service and follow Kubernetes resource specification standards.

deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (1)

1458-1460: Conditional logic implementation looks correct.

The implementation properly restricts the environment argument injection to only Planner component deployments. The logic structure is sound:

  1. Checks for the existence of the component type label
  2. Compares the label value with the ComponentTypePlanner constant
  3. Only appends the environment argument when both conditions are met

However, this logic depends on the casing consistency issue identified in the previous comment. Ensure the label values match the constant definition.
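
The three steps above, and the casing concern, can be sketched in Python for illustration (the controller itself is written in Go). Everything named here is hypothetical: the label key, the value of the constant, the `maybe_add_env_arg` helper, and the example argument are assumptions, not the operator's actual identifiers.

```python
from typing import Dict, List

# Hypothetical values; the real label key and constant live in the operator's Go code.
KUBE_LABEL_DYNAMO_COMPONENT = "example.com/dynamo-component"
COMPONENT_TYPE_PLANNER = "Planner"


def maybe_add_env_arg(
    labels: Dict[str, str], args: List[str], env_arg: str
) -> List[str]:
    """Return a copy of args, with env_arg appended only for Planner-labeled deployments."""
    # Step 1: check that the component type label exists.
    component_type = labels.get(KUBE_LABEL_DYNAMO_COMPONENT)
    # Step 2: compare it with the constant (exact, case-sensitive match).
    if component_type is not None and component_type == COMPONENT_TYPE_PLANNER:
        # Step 3: append the environment argument only when both checks pass.
        return args + [env_arg]
    return list(args)
```

Because the comparison is an exact string match, a label value of "planner" would not trigger the injection under these assumptions, which is why the label values need to match the constant's casing.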

@biswapanda biswapanda force-pushed the bis/dep-130-dynamo-build-populate-name branch from 5e8cb05 to e70d2ca Compare May 29, 2025 07:50
copy-pr-bot bot commented May 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@biswapanda biswapanda force-pushed the bis/dep-130-dynamo-build-populate-name branch from 85e1c93 to fb588ae Compare May 29, 2025 19:22
ajcasagrande commented May 29, 2025

Some of my comments from #1266 (review) apply here.

@biswapanda biswapanda enabled auto-merge (squash) May 30, 2025 00:46
@biswapanda biswapanda merged commit 1ae7641 into main May 30, 2025
6 checks passed
@biswapanda biswapanda deleted the bis/dep-130-dynamo-build-populate-name branch May 30, 2025 01:21
biswapanda added a commit that referenced this pull request May 30, 2025
ZYWNB666 commented Jul 25, 2025

I would like to raise a question about resources that is not covered by the existing issues.

Currently, resources can only be specified as gpu: number, which becomes nvidia.com/gpu in the deployment. Our environment uses the resource renaming feature, however, so the available resources are named nvidia.com/gputype: number. Is there a way to handle this case separately?

For instance, to use an L4 GPU, I only need to set nvidia.com/l4: 2 in the limits. How do I select the GPU type with the gpu: number approach?
