Skip to content

Conversation

@tedzhouhk
Copy link
Contributor

@tedzhouhk tedzhouhk commented Sep 6, 2025

Summary by CodeRabbit

  • New Features

    • Added Virtual environment support via a new connector, enabling ETCD-backed scaling.
    • Exposed VirtualConnector in the planner’s public API.
    • CLI: added --namespace option and expanded environment choices to include "virtual".
    • Planner supports asynchronous initialization for connectors to prepare before run.
  • Changes

    • Namespace resolution now uses the provided CLI argument instead of defaults.
  • Documentation

    • Added deployment guidance for Kubernetes and Virtual environments, including ETCD-based flow and configuration examples.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 6, 2025

Walkthrough

Exposes VirtualConnector via planner’s public API; adds a new Virtual environment path, async initialization in Planner, and namespace CLI arg. planner_sla now uses args.namespace. Introduces VirtualConnector with ETCD-backed scaling decisions. CLI accepts environment {kubernetes, virtual}. Updates docs to describe K8s and Virtual deployments.

Changes

Cohort / File(s) Summary of Changes
Public API exposure
components/planner/src/dynamo/planner/__init__.py
Exported VirtualConnector via import and inclusion in __all__.
Planner CLI args
components/planner/src/dynamo/planner/utils/planner_argparse.py
Added --namespace; expanded --environment choices to include "virtual".
Planner core init and flow
components/planner/src/dynamo/planner/utils/planner_core.py, components/planner/src/dynamo/planner/planner_sla.py
Added async init hook Planner._async_init and awaited it before run(). Selected VirtualConnector when environment is "virtual". Switched namespace source to args.namespace. Removed SLAPlannerDefaults usage in SLA init path.
Virtual connector implementation
components/planner/src/dynamo/planner/virtual_connector.py
New ETCD-backed VirtualConnector with async init, state load, scaling decision updates, readiness checks, and optional blocking waits; added constants for polling/timeout.
Documentation
docs/architecture/sla_planner.md
Added K8s and Virtual deployment sections; documented ETCD key structure and scaling decision flow, plus configuration examples.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CLI as CLI
  participant SLA as planner_sla.init_planner
  participant Core as Planner (planner_core)
  participant Conn as Connector
  note over CLI,SLA: Startup with environment and namespace args

  CLI->>SLA: Parse args (env, namespace, backend...)
  SLA->>Core: start_sla_planner(args)
  Core->>Core: select connector (kubernetes|virtual)
  Core->>Conn: instantiate Connector(runtime, namespace, backend)
  alt Connector supports async init
    Core->>Conn: await _async_init()
  end
  Core->>Core: run()
Loading
sequenceDiagram
  autonumber
  participant Planner as Planner
  participant V as VirtualConnector
  participant ETCD as ETCD KV
  rect rgba(220,235,245,0.4)
    note right of V: Async initialization
    Planner->>V: _async_init()
    V->>ETCD: Read /{ns}/planner/state
    V-->>Planner: current num_prefill/num_decode, decision_id
  end

  rect rgba(235,245,220,0.4)
    note over Planner,V: Issue scaling decision
    Planner->>V: add/remove/set (blocking?)
    V->>ETCD: Write desired counts + increment decision_id
  end

  alt blocking wait
    loop until scaled_decision_id == decision_id or timeout
      V->>ETCD: Read scaled_decision_id
      ETCD-->>V: value
      V->>V: check readiness/timeout
    end
    V-->>Planner: completion or timeout
  else non-blocking
    V-->>Planner: return immediately
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

In burrows of code I twitch my ears,
A virtual path now softly appears.
ETCD whispers, decisions align,
Async hops make startup fine.
Kubernetes or clouds unseen—
I nibble namespaces, crisp and clean.
Thump-thump: the planner’s keen! 🐇✨


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (6)
docs/architecture/sla_planner.md (2)

111-121: K8s section: cross-link LGTM; add namespace flag mention

Consider noting the new --namespace CLI to keep CLI/docs aligned.

-**To deploy SLA Planner:**
+**To deploy SLA Planner (specify your namespace via `--namespace` or `DYNAMO_NAMESPACE`):**

124-187: Virtual deployment docs mostly align; polish and consistency fixes

  • Grammar and list style nits; switch “dash” to asterisks to satisfy MD004.
  • Tighten ETCD key phrasing (“stored as a string”) and align timeout wording with constants.
  • Optional: clarify that “Bare metal with local connector is deprecated” does not apply to the new VirtualConnector path.
-**Planner Output Keys** (written by the planner):
-- `num_prefill_workers`: Integer (stored as string) specifying the target number of prefill workers
-- `num_decode_workers`: Integer (stored as string) specifying the target number of decode workers
-- `decision_id`: Integer (stored as string) with incremental ID for each scaling decision (-1 if no decisions made)
+**Planner Output Keys** (written by the planner):
+* `num_prefill_workers`: Integer (stored as a string) specifying the target number of prefill workers
+* `num_decode_workers`: Integer (stored as a string) specifying the target number of decode workers
+* `decision_id`: Integer (stored as a string) incremented for each scaling decision (-1 if none made)

-**Deployment Environment Input Key** (written by the deployment environment):
-- `scaled_decision_id`: Integer (stored as string) specifying the newest decision_id that has been successfully scaled
+**Deployment Environment Input Key** (written by the deployment environment):
+* `scaled_decision_id`: Integer (stored as a string) specifying the newest decision_id that has been successfully scaled

-4. **Timeout Handling**: If a scaling decision isn't acknowledged within 30 minutes (1800 seconds), the planner proceeds with new decisions anyway
+4. **Timeout Handling**: If a scaling decision isn't acknowledged within 30 minutes (1800 seconds), the planner proceeds with new decisions

+> [!NOTE]
+> “Bare metal with local connector is deprecated” refers to the old local connector. VirtualConnector is supported for external/third‑party environments.
components/planner/src/dynamo/planner/virtual_connector.py (1)

171-200: Skip-window logic: reset timestamp when readiness returns

If a prior call set first_skip_timestamp and a later call finds readiness true but then detects “no change,” the skip timer remains set unnecessarily. Harmless, but can confuse logs.

-        is_ready = await self._is_scaling_ready()
+        is_ready = await self._is_scaling_ready()
+        if is_ready and self.first_skip_timestamp is not None:
+            self.first_skip_timestamp = None
components/planner/src/dynamo/planner/__init__.py (1)

7-7: Guard VirtualConnector import to avoid hard optional-deps failure.

Top-level importing VirtualConnector can fail if ETCD/virtual extras aren’t installed, breaking imports even for Kubernetes-only users. Make the export lazy/optional.

Apply this diff to the touched lines:

@@
 __all__ = [
     "PlannerConnector",
     "KubernetesConnector",
-    "VirtualConnector",
     "LoadPlannerDefaults",
     "SLAPlannerDefaults",
     "ServiceConfig",
 ]
@@
-from dynamo.planner.virtual_connector import VirtualConnector
+try:
+    from dynamo.planner.virtual_connector import VirtualConnector  # optional
+    __all__.append("VirtualConnector")
+except Exception:
+    # Leave VirtualConnector unexported if optional deps are missing
+    VirtualConnector = None  # type: ignore

Also applies to: 17-17

components/planner/src/dynamo/planner/utils/planner_core.py (2)

14-14: Nit: consider lazy-importing VirtualConnector.

Importing both connectors at module load pulls optional deps even when unused. You can import VirtualConnector only inside the virtual branch below.

-from dynamo.planner import KubernetesConnector, VirtualConnector
+from dynamo.planner import KubernetesConnector

And inside the virtual branch (see lines 72–75) add:

+from dynamo.planner import VirtualConnector

133-141: Make async init a formal, public hook.

Calling a private _async_init breaks encapsulation. Define async_init() in PlannerConnector (no-op default) and call it here.

Apply this diff here:

-    async def _async_init(self):
+    async def _async_init(self):
         """Async initialization for components that need it"""
         if (
             not self.dryrun
             and hasattr(self, "connector")
-            and hasattr(self.connector, "_async_init")
+            and hasattr(self.connector, "async_init")
         ):
-            await self.connector._async_init()
+            await self.connector.async_init()

Then, in PlannerConnector (outside this file):

class PlannerConnector:
    async def async_init(self) -> None:
        return None

And implement async_init() in VirtualConnector.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1477f6e and 74dd1b2.

📒 Files selected for processing (6)
  • components/planner/src/dynamo/planner/__init__.py (2 hunks)
  • components/planner/src/dynamo/planner/planner_sla.py (1 hunks)
  • components/planner/src/dynamo/planner/utils/planner_argparse.py (1 hunks)
  • components/planner/src/dynamo/planner/utils/planner_core.py (4 hunks)
  • components/planner/src/dynamo/planner/virtual_connector.py (1 hunks)
  • docs/architecture/sla_planner.md (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
components/planner/src/dynamo/planner/utils/planner_argparse.py (1)
components/planner/src/dynamo/planner/defaults.py (1)
  • SLAPlannerDefaults (72-83)
components/planner/src/dynamo/planner/virtual_connector.py (3)
components/planner/src/dynamo/planner/planner_connector.py (1)
  • PlannerConnector (20-29)
lib/bindings/python/src/dynamo/_core.pyi (2)
  • DistributedRuntime (31-54)
  • EtcdKvCache (102-179)
components/planner/src/dynamo/planner/utils/planner_core.py (1)
  • _async_init (133-140)
components/planner/src/dynamo/planner/__init__.py (1)
components/planner/src/dynamo/planner/virtual_connector.py (1)
  • VirtualConnector (23-307)
components/planner/src/dynamo/planner/utils/planner_core.py (1)
components/planner/src/dynamo/planner/virtual_connector.py (1)
  • _async_init (60-79)
🪛 LanguageTool
docs/architecture/sla_planner.md

[grammar] ~134-~134: There might be a mistake here.
Context: ... Output Keys** (written by the planner): - num_prefill_workers: Integer (stored as string) specifying ...

(QB_NEW_EN)


[grammar] ~135-~135: There might be a mistake here.
Context: ...ing the target number of prefill workers - num_decode_workers: Integer (stored as string) specifying ...

(QB_NEW_EN)


[grammar] ~136-~136: There might be a mistake here.
Context: ...ying the target number of decode workers - decision_id: Integer (stored as string) with increm...

(QB_NEW_EN)


[grammar] ~144-~144: There might be a mistake here.
Context: ...writes them to ETCD with an incremented decision_id 2. Change Detection: The planner skips sc...

(QB_NEW_EN)


[grammar] ~145-~145: There might be a mistake here.
Context: ...t counts match current counts, logging: "No scaling needed (prefill=X, decode=Y), skipping ETCD update" 3. Readiness Check: Before making new dec...

(QB_NEW_EN)


[grammar] ~146-~146: There might be a mistake here.
Context: ...perations have completed by checking if scaled_decision_id >= decision_id 4. Timeout Handling: If a scaling decisio...

(QB_NEW_EN)

🪛 markdownlint-cli2 (0.17.2)
docs/architecture/sla_planner.md

135-135: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


136-136: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


137-137: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)


140-140: Unordered list style
Expected: asterisk; Actual: dash

(MD004, ul-style)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (11)
components/planner/src/dynamo/planner/utils/planner_argparse.py (3)

31-33: Add “virtual” env: LGTM

Choices now reflect the new connector path.


34-38: New --namespace flag: LGTM

Good defaulting to SLAPlannerDefaults.namespace; aligns with planner_core and docs.


116-120: Prometheus port attribute exists: SLAPlannerDefaults.prometheus_port is defined in defaults.py (via _get_prometheus_port_from_env()), so the import is valid.

Likely an incorrect or invalid review comment.

components/planner/src/dynamo/planner/virtual_connector.py (5)

126-134: Component name mapping depends on k8s-style names

Using prefill_worker_k8s_name / decode_worker_k8s_name here assumes those identifiers are also used in virtual mode. If virtual environments use different names, scaling commands will be ignored.

Please confirm WORKER_COMPONENT_NAMES[backend] are the canonical names across both k8s and virtual. If not, consider adding a virtual mapping or allowing aliases via config.


153-170: No-op updates are correctly skipped: LGTM

Good guard against unnecessary ETCD writes.


210-219: Write-order: LGTM

Counts written before decision_id keeps consumer logic simple and consistent.


225-245: Blocking wait bounds: LGTM

Bounded retries with clear logs; aligns with docs’ 1800s.


37-46: Runtime dependency check message: LGTM

Clear error if ETCD is unavailable.

components/planner/src/dynamo/planner/planner_sla.py (1)

42-42: Namespace argument default confirmed. The --namespace flag in create_sla_planner_parser uses default=SLAPlannerDefaults.namespace, where SLAPlannerDefaults.namespace is os.environ.get("DYNAMO_NAMESPACE", "vllm-disagg-planner"), ensuring a non-empty default at runtime.

components/planner/src/dynamo/planner/utils/planner_core.py (2)

67-68: Namespace via CLI is correct.

Using args.namespace removes the hidden dependency on defaults and honors user input.


554-555: Good: await initialization before run loop.

Ensures connector state (e.g., ETCD cache) is ready before scaling decisions.

Copy link
Contributor

@michaelshin michaelshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment left but makes sense to me

Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
@tedzhouhk tedzhouhk merged commit 8543d0d into main Sep 9, 2025
14 of 15 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/virtual_connector branch September 9, 2025 18:26
tedzhouhk added a commit that referenced this pull request Sep 10, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants