@julienmancuso julienmancuso commented Aug 28, 2025

Overview:

Update the planner to resolve its parent DynamoGraphDeployment from the DYN_PARENT_DGD_K8S_NAME environment variable.

Summary by CodeRabbit

  • New Features

    • Select the parent graph deployment via the DYN_PARENT_DGD_K8S_NAME environment variable.
    • Bulk scale replicas across multiple components with optional blocking until ready.
  • Improvements

    • Simplified deployment discovery with generalized “Parent DynamoGraphDeployment not found” errors.
    • Readiness checks added before scaling; early return if not ready.
  • Tests

    • Added tests for environment-variable selection, readiness checks, and multi-component replica scaling.

coderabbitai bot commented Aug 28, 2025

Walkthrough

Shifts KubernetesAPI to resolve the parent DynamoGraphDeployment via the DYN_PARENT_DGD_K8S_NAME environment variable. Updates KubernetesConnector to use the new API, adds set_component_replicas with readiness checks, and revises tests accordingly. Removes component-arg-based retrieval, simplifies error messages, and introduces delegation via get_parent_graph_deployment.

Changes

| Cohort / File(s) | Summary |
|---|---|
| KubernetesAPI env-var lookup: components/planner/src/dynamo/planner/kube.py | Replace owner-reference-based lookup with env-var-driven retrieval. Add async get_parent_graph_deployment(); update get_graph_deployment() to a parameterless wrapper delegating to parent retrieval. Handle 404→None; import os. |
| Connector API updates and replica management: components/planner/src/dynamo/planner/kubernetes_connector.py | Use new get_graph_deployment() signature; generalize not-found errors. Add set_component_replicas(target_replicas, blocking) with is_deployment_ready checks, per-component update, and optional wait. Remove prior validation helper ensuring shared deployment. |
| KubernetesAPI tests: components/planner/test/kube.py | Add tests for env-var-driven parent lookup, absence of env var, and get_graph_deployment delegation to get_parent_graph_deployment. |
| Connector tests: components/planner/test/kubernetes_connector.py | Update constructor usage (two namespaces). Add mock is_deployment_ready. Adjust add/remove component tests to new API and messages. Add async tests for set_component_replicas, readiness flow, and error cases. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Env as Environment
  participant Planner as KubernetesAPI
  participant K8s as Kubernetes API

  Note over Planner: Retrieve parent graph deployment (new)
  Env->>Planner: DYN_PARENT_DGD_K8S_NAME
  alt Env var set
    Planner->>K8s: _get_graph_deployment_from_name(name)
    alt Found
      K8s-->>Planner: Deployment dict
      Planner-->>Caller: Deployment dict
    else 404 Not Found
      K8s-->>Planner: ApiException(404)
      Planner-->>Caller: None
    else Other error
      K8s--xPlanner: ApiException(other)
      Planner--xCaller: Propagate error
    end
  else Env var missing
    Planner-->>Caller: None
  end
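
The lookup flow in the diagram above can be sketched in a few lines. This is a minimal illustration, not the code from kube.py: ApiException here is a simplified stand-in for the Kubernetes client exception, and FakeKubernetesAPI with its in-memory deployments dict is hypothetical. Only the method names and the DYN_PARENT_DGD_K8S_NAME variable come from the PR.

```python
import asyncio
import os
from typing import Optional


class ApiException(Exception):
    """Simplified stand-in for the Kubernetes client's ApiException (assumption)."""

    def __init__(self, status: int):
        super().__init__(f"API error {status}")
        self.status = status


class FakeKubernetesAPI:
    """Hypothetical in-memory stand-in for the planner's KubernetesAPI."""

    def __init__(self, deployments: dict):
        self._deployments = deployments

    def _get_graph_deployment_from_name(self, name: str) -> dict:
        # Like the real client, raise on a missing object instead of returning None.
        if name not in self._deployments:
            raise ApiException(status=404)
        return self._deployments[name]

    async def get_parent_graph_deployment(self) -> Optional[dict]:
        name = os.environ.get("DYN_PARENT_DGD_K8S_NAME")
        if not name:
            return None  # env var missing: nothing to look up
        try:
            return self._get_graph_deployment_from_name(name)
        except ApiException as e:
            if e.status == 404:
                return None  # not found maps to None
            raise  # any other API error propagates to the caller

    async def get_graph_deployment(self) -> Optional[dict]:
        # Parameterless wrapper that delegates to the parent lookup.
        return await self.get_parent_graph_deployment()


if __name__ == "__main__":
    os.environ["DYN_PARENT_DGD_K8S_NAME"] = "my-graph"
    api = FakeKubernetesAPI({"my-graph": {"metadata": {"name": "my-graph"}}})
    print(asyncio.run(api.get_graph_deployment()))
```

Returning None for both the unset variable and the 404 case leaves the "not found" decision to the caller, which is where the connector raises its generalized ValueError.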
sequenceDiagram
  autonumber
  participant Conn as KubernetesConnector
  participant API as KubernetesAPI
  participant K8s as Kubernetes API

  Note over Conn: set_component_replicas(target_replicas, blocking)
  Conn->>API: get_graph_deployment()
  alt Deployment found
    API-->>Conn: {name, ...}
    Conn->>API: is_deployment_ready(name)
    alt Not ready
      API-->>Conn: False
      Note over Conn: Log warning and return early
      Conn-->>Caller: Return
    else Ready
      API-->>Conn: True
      loop For each component in target_replicas
        Conn->>API: update_graph_replicas(name, component, replicas)
        API-->>Conn: ack
      end
      alt blocking=True
        Conn->>API: wait_for_graph_deployment_ready(name)
        API-->>Conn: ack
      end
      Conn-->>Caller: Return
    end
  else Not found
    API-->>Conn: None
    Conn--xCaller: ValueError("Parent DynamoGraphDeployment not found")
  end
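
The scaling flow in the diagram above might look roughly like the sketch below. It is written as a free function against a hypothetical RecordingAPI stub rather than as the real KubernetesConnector method. The blocking=True default and the metadata.name lookup are assumptions inferred from the review's suggested tests, not confirmed by the PR diff.

```python
import asyncio
import logging
from typing import Dict, Optional

logger = logging.getLogger(__name__)


class RecordingAPI:
    """Hypothetical stub that mimics the KubernetesAPI methods used here."""

    def __init__(self, deployment: Optional[dict], ready: bool = True):
        self.deployment = deployment
        self.ready = ready
        self.calls = []  # records every mutating call, in order

    async def get_graph_deployment(self) -> Optional[dict]:
        return self.deployment

    async def is_deployment_ready(self, name: str) -> bool:
        return self.ready

    async def update_graph_replicas(self, name: str, component: str, replicas: int) -> None:
        self.calls.append(("update", name, component, replicas))

    async def wait_for_graph_deployment_ready(self, name: str) -> None:
        self.calls.append(("wait", name))


async def set_component_replicas(
    api, target_replicas: Dict[str, int], blocking: bool = True
) -> None:
    deployment = await api.get_graph_deployment()
    if deployment is None:
        raise ValueError("Parent DynamoGraphDeployment not found")

    name = deployment["metadata"]["name"]  # assumed location of the name
    if not await api.is_deployment_ready(name):
        # Skip scaling while a previous rollout is still in progress.
        logger.warning("Deployment %s is not ready, skipping scaling", name)
        return

    for component, replicas in target_replicas.items():
        await api.update_graph_replicas(name, component, replicas)

    if blocking:
        await api.wait_for_graph_deployment_ready(name)


if __name__ == "__main__":
    stub = RecordingAPI({"metadata": {"name": "test-graph"}})
    asyncio.run(set_component_replicas(stub, {"component1": 3, "component2": 2}))
    print(stub.calls)
```

Checking readiness once, before the loop, means a not-ready deployment produces no partial updates: either every component in target_replicas is patched or none is.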

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A whisk of wind in cluster air,
I sniff the name that pods now share—
Env vars whisper where to hop,
To find the graph, I do not stop.
Replicas hum, in tidy rows,
I thump approval—on it goes. 🐇✨


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
components/planner/src/dynamo/planner/kube.py (2)

102-109: Handle 404s: align readiness check with API semantics

CustomObjectsApi raises ApiException on a 404 rather than returning an empty result, so the current "if not graph_deployment" branch never executes against a real cluster. Catch the 404 and raise ValueError as intended.

-        graph_deployment = self._get_graph_deployment_from_name(graph_deployment_name)
-
-        if not graph_deployment:
-            raise ValueError(f"Graph deployment {graph_deployment_name} not found")
+        try:
+            graph_deployment = self._get_graph_deployment_from_name(graph_deployment_name)
+        except client.ApiException as e:
+            if e.status == 404:
+                raise ValueError(f"Graph deployment {graph_deployment_name} not found")
+            raise

128-134: Handle 404s inside the wait loop

Same issue as above: catch ApiException and translate 404 to ValueError.

-            graph_deployment = self._get_graph_deployment_from_name(
-                graph_deployment_name
-            )
-
-            if not graph_deployment:
-                raise ValueError(f"Graph deployment {graph_deployment_name} not found")
+            try:
+                graph_deployment = self._get_graph_deployment_from_name(
+                    graph_deployment_name
+                )
+            except client.ApiException as e:
+                if e.status == 404:
+                    raise ValueError(f"Graph deployment {graph_deployment_name} not found")
+                raise
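
Since both call sites need the same translation, one option is a small shared helper. The sketch below is illustrative only: fetch_or_raise_not_found is a hypothetical name, and ApiException is a simplified stand-in for the Kubernetes client exception so the example runs without a cluster.

```python
from typing import Callable


class ApiException(Exception):
    """Simplified stand-in for the Kubernetes client's ApiException (assumption)."""

    def __init__(self, status: int):
        super().__init__(f"API error {status}")
        self.status = status


def fetch_or_raise_not_found(fetch: Callable[[str], dict], graph_deployment_name: str) -> dict:
    """Call fetch(name), translating a 404 into ValueError (hypothetical helper)."""
    try:
        return fetch(graph_deployment_name)
    except ApiException as e:
        if e.status == 404:
            # Chain the original exception so the API error stays in the traceback.
            raise ValueError(
                f"Graph deployment {graph_deployment_name} not found"
            ) from e
        raise  # other statuses (403, 500, ...) are not "not found"
```

With a helper like this, is_deployment_ready and the wait loop could each call fetch_or_raise_not_found(self._get_graph_deployment_from_name, graph_deployment_name) instead of repeating the try/except.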
🧹 Nitpick comments (5)
components/planner/test/kube.py (1)

267-272: Add assertion: ensure no API call when env var is unset

Guard against accidental calls by asserting _get_graph_deployment_from_name wasn’t invoked.

@@
-    with patch.dict(os.environ, {}, clear=True):
-        result = await k8s_api.get_parent_graph_deployment()
-        assert result is None
+    with patch.dict(os.environ, {}, clear=True):
+        with patch.object(k8s_api, "_get_graph_deployment_from_name") as mock_get:
+            result = await k8s_api.get_parent_graph_deployment()
+            assert result is None
+            mock_get.assert_not_called()
components/planner/src/dynamo/planner/kubernetes_connector.py (2)

35-39: Remove unnecessary f-string

String has no placeholders; flagged by Ruff F541.

-            raise ValueError(
-                f"Parent DynamoGraphDeployment not found"
-            )
+            raise ValueError("Parent DynamoGraphDeployment not found")

56-60: Remove unnecessary f-string

Same issue as above.

-            raise ValueError(
-                f"Parent DynamoGraphDeployment not found"
-            )
+            raise ValueError("Parent DynamoGraphDeployment not found")
components/planner/test/kubernetes_connector.py (1)

146-168: Strengthen assertions for multi-component updates

Also verify exact calls per component; optionally add tests for non-ready and non-blocking flows.

@@
+from unittest.mock import call
+
 from dynamo.planner.kubernetes_connector import KubernetesConnector
@@
     # Assert
     mock_kube_api.get_graph_deployment.assert_called_once()
     mock_kube_api.is_deployment_ready.assert_called_once_with("test-graph")
-    # Should be called twice, once for each component
-    assert mock_kube_api.update_graph_replicas.call_count == 2
+    mock_kube_api.update_graph_replicas.assert_has_calls(
+        [call("test-graph", "component1", 3), call("test-graph", "component2", 2)],
+        any_order=True,
+    )
     mock_kube_api.wait_for_graph_deployment_ready.assert_called_once_with("test-graph")

Optionally add:

+@pytest.mark.asyncio
+async def test_set_component_replicas_ignores_when_not_ready(kubernetes_connector, mock_kube_api):
+    target_replicas = {"c1": 2}
+    mock_kube_api.get_graph_deployment.return_value = {"metadata": {"name": "test-graph"}}
+    mock_kube_api.is_deployment_ready.return_value = False
+    await kubernetes_connector.set_component_replicas(target_replicas)
+    mock_kube_api.update_graph_replicas.assert_not_called()
+    mock_kube_api.wait_for_graph_deployment_ready.assert_not_called()
+
+@pytest.mark.asyncio
+async def test_set_component_replicas_non_blocking_does_not_wait(kubernetes_connector, mock_kube_api):
+    target_replicas = {"c1": 2}
+    mock_kube_api.get_graph_deployment.return_value = {"metadata": {"name": "test-graph"}}
+    mock_kube_api.is_deployment_ready.return_value = True
+    await kubernetes_connector.set_component_replicas(target_replicas, blocking=False)
+    mock_kube_api.wait_for_graph_deployment_ready.assert_not_called()
components/planner/src/dynamo/planner/kube.py (1)

143-147: Prefer logging over print for observability

Use module logger instead of print to integrate with runtime logging.

-            print(
+            logger.info(
                 f"[Attempt {attempt + 1}/{max_attempts}] "
                 f"(status: {ready_condition.get('status') if ready_condition else 'N/A'}, "
                 f"message: {ready_condition.get('message') if ready_condition else 'no condition found'})"
             )

Add once at top:

@@
+import logging
 import os
@@
 from kubernetes import client, config
@@
+logger = logging.getLogger(__name__)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d13b6e and 16e26fa.

📒 Files selected for processing (4)
  • components/planner/src/dynamo/planner/kube.py (2 hunks)
  • components/planner/src/dynamo/planner/kubernetes_connector.py (3 hunks)
  • components/planner/test/kube.py (2 hunks)
  • components/planner/test/kubernetes_connector.py (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1308-1312
Timestamp: 2025-06-11T21:29:28.650Z
Learning: User julienmancuso expects replies in English; avoid switching languages unless explicitly requested.
🧬 Code graph analysis (3)
components/planner/test/kube.py (1)
components/planner/src/dynamo/planner/kube.py (2)
  • get_parent_graph_deployment (57-77)
  • get_graph_deployment (79-86)
components/planner/src/dynamo/planner/kubernetes_connector.py (1)
components/planner/src/dynamo/planner/kube.py (1)
  • get_graph_deployment (79-86)
components/planner/test/kubernetes_connector.py (2)
components/planner/src/dynamo/planner/kube.py (4)
  • is_deployment_ready (102-115)
  • get_graph_deployment (79-86)
  • update_graph_replicas (88-100)
  • wait_for_graph_deployment_ready (117-153)
components/planner/src/dynamo/planner/kubernetes_connector.py (2)
  • KubernetesConnector (27-119)
  • set_component_replicas (75-106)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/2774/merge) by julienmancuso.
components/planner/test/kube.py

[error] 1-1: isort: Import order issues detected. The file was auto-fixed by isort during pre-commit.


[error] 1-1: Black formatting applied. File reformatted by Black during pre-commit.


[error] 1-1: ruff: lint issues auto-fixed (3 issues).

components/planner/src/dynamo/planner/kubernetes_connector.py

[error] 1-1: Black formatting applied. File reformatted by Black during pre-commit.


[error] 1-1: ruff: lint issues auto-fixed (3 issues).

components/planner/test/kubernetes_connector.py

[error] 1-1: Black formatting applied. File reformatted by Black during pre-commit.


[error] 1-1: ruff: lint issues auto-fixed (3 issues).

🪛 Ruff (0.12.2)
components/planner/src/dynamo/planner/kubernetes_connector.py

38-38: f-string without any placeholders

Remove extraneous f prefix

(F541)


59-59: f-string without any placeholders

Remove extraneous f prefix

(F541)


85-85: f-string without any placeholders

Remove extraneous f prefix

(F541)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
🔇 Additional comments (5)
components/planner/test/kube.py (2)

18-18: LGTM: import for env handling

os import is appropriate for env-var driven tests.


252-264: Good: happy-path env-var test covers name propagation

Test asserts delegation to _get_graph_deployment_from_name with the env var value.

components/planner/test/kubernetes_connector.py (2)

29-30: LGTM: AsyncMock coverage includes readiness path

Including is_deployment_ready as AsyncMock enables the new flow.


66-71: Good: relax assertion on get_graph_deployment args

This aligns with the new parameterless API.

components/planner/src/dynamo/planner/kube.py (1)

57-76: LGTM: env-var–based parent DGD resolution

Behavior matches PR objective (DYN_PARENT_DGD_K8S_NAME) and gracefully maps 404 to None.

@julienmancuso julienmancuso merged commit 37adc0a into main Aug 29, 2025
15 checks passed
@julienmancuso julienmancuso deleted the jsm/dep-360 branch August 29, 2025 04:24
jasonqinzhou pushed a commit that referenced this pull request Aug 30, 2025
michaelshin pushed a commit that referenced this pull request Sep 2, 2025
KrishnanPrash pushed a commit that referenced this pull request Sep 2, 2025
Signed-off-by: Julien Mancuso <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
nnshah1 pushed a commit that referenced this pull request Sep 8, 2025