-
Notifications
You must be signed in to change notification settings - Fork 755
feat: add a virtual connector for 3rd party deployments #2913
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: hongkuanz <[email protected]>
WalkthroughExposes VirtualConnector via planner’s public API; adds a new Virtual environment path, async initialization in Planner, and namespace CLI arg. planner_sla now uses args.namespace. Introduces VirtualConnector with ETCD-backed scaling decisions. CLI accepts environment {kubernetes, virtual}. Updates docs to describe K8s and Virtual deployments. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CLI as CLI
participant SLA as planner_sla.init_planner
participant Core as Planner (planner_core)
participant Conn as Connector
note over CLI,SLA: Startup with environment and namespace args
CLI->>SLA: Parse args (env, namespace, backend...)
SLA->>Core: start_sla_planner(args)
Core->>Core: select connector (kubernetes|virtual)
Core->>Conn: instantiate Connector(runtime, namespace, backend)
alt Connector supports async init
Core->>Conn: await _async_init()
end
Core->>Core: run()
sequenceDiagram
autonumber
participant Planner as Planner
participant V as VirtualConnector
participant ETCD as ETCD KV
rect rgba(220,235,245,0.4)
note right of V: Async initialization
Planner->>V: _async_init()
V->>ETCD: Read /{ns}/planner/state
V-->>Planner: current num_prefill/num_decode, decision_id
end
rect rgba(235,245,220,0.4)
note over Planner,V: Issue scaling decision
Planner->>V: add/remove/set (blocking?)
V->>ETCD: Write desired counts + increment decision_id
end
alt blocking wait
loop until scaled_decision_id == decision_id or timeout
V->>ETCD: Read scaled_decision_id
ETCD-->>V: value
V->>V: check readiness/timeout
end
V-->>Planner: completion or timeout
else non-blocking
V-->>Planner: return immediately
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (6)
docs/architecture/sla_planner.md (2)
111-121: K8s section: cross-link LGTM; add namespace flag mentionConsider noting the new --namespace CLI to keep CLI/docs aligned.
-**To deploy SLA Planner:** +**To deploy SLA Planner (specify your namespace via `--namespace` or `DYNAMO_NAMESPACE`):**
124-187: Virtual deployment docs mostly align; polish and consistency fixes
- Grammar and list style nits; switch “dash” to asterisks to satisfy MD004.
- Tighten ETCD key phrasing (“stored as a string”) and align timeout wording with constants.
- Optional: clarify that “Bare metal with local connector is deprecated” does not apply to the new VirtualConnector path.
-**Planner Output Keys** (written by the planner): -- `num_prefill_workers`: Integer (stored as string) specifying the target number of prefill workers -- `num_decode_workers`: Integer (stored as string) specifying the target number of decode workers -- `decision_id`: Integer (stored as string) with incremental ID for each scaling decision (-1 if no decisions made) +**Planner Output Keys** (written by the planner): +* `num_prefill_workers`: Integer (stored as a string) specifying the target number of prefill workers +* `num_decode_workers`: Integer (stored as a string) specifying the target number of decode workers +* `decision_id`: Integer (stored as a string) incremented for each scaling decision (-1 if none made) -**Deployment Environment Input Key** (written by the deployment environment): -- `scaled_decision_id`: Integer (stored as string) specifying the newest decision_id that has been successfully scaled +**Deployment Environment Input Key** (written by the deployment environment): +* `scaled_decision_id`: Integer (stored as a string) specifying the newest decision_id that has been successfully scaled -4. **Timeout Handling**: If a scaling decision isn't acknowledged within 30 minutes (1800 seconds), the planner proceeds with new decisions anyway +4. **Timeout Handling**: If a scaling decision isn't acknowledged within 30 minutes (1800 seconds), the planner proceeds with new decisions +> [!NOTE] +> “Bare metal with local connector is deprecated” refers to the old local connector. VirtualConnector is supported for external/third‑party environments.components/planner/src/dynamo/planner/virtual_connector.py (1)
171-200: Skip-window logic: reset timestamp when readiness returnsIf a prior call set first_skip_timestamp and a later call finds readiness true but then detects “no change,” the skip timer remains set unnecessarily. Harmless, but can confuse logs.
- is_ready = await self._is_scaling_ready() + is_ready = await self._is_scaling_ready() + if is_ready and self.first_skip_timestamp is not None: + self.first_skip_timestamp = Nonecomponents/planner/src/dynamo/planner/__init__.py (1)
7-7: Guard VirtualConnector import to avoid hard optional-deps failure.Top-level importing
VirtualConnectorcan fail if ETCD/virtual extras aren’t installed, breaking imports even for Kubernetes-only users. Make the export lazy/optional.Apply this diff to the touched lines:
@@ __all__ = [ "PlannerConnector", "KubernetesConnector", - "VirtualConnector", "LoadPlannerDefaults", "SLAPlannerDefaults", "ServiceConfig", ] @@ -from dynamo.planner.virtual_connector import VirtualConnector +try: + from dynamo.planner.virtual_connector import VirtualConnector # optional + __all__.append("VirtualConnector") +except Exception: + # Leave VirtualConnector unexported if optional deps are missing + VirtualConnector = None # type: ignoreAlso applies to: 17-17
components/planner/src/dynamo/planner/utils/planner_core.py (2)
14-14: Nit: consider lazy-importing VirtualConnector.Importing both connectors at module load pulls optional deps even when unused. You can import
VirtualConnectoronly inside the virtual branch below.-from dynamo.planner import KubernetesConnector, VirtualConnector +from dynamo.planner import KubernetesConnectorAnd inside the virtual branch (see lines 72–75) add:
+from dynamo.planner import VirtualConnector
133-141: Make async init a formal, public hook.Calling a private
_async_initbreaks encapsulation. Defineasync_init()inPlannerConnector(no-op default) and call it here.Apply this diff here:
- async def _async_init(self): + async def _async_init(self): """Async initialization for components that need it""" if ( not self.dryrun and hasattr(self, "connector") - and hasattr(self.connector, "_async_init") + and hasattr(self.connector, "async_init") ): - await self.connector._async_init() + await self.connector.async_init()Then, in
PlannerConnector(outside this file):class PlannerConnector: async def async_init(self) -> None: return NoneAnd implement
async_init()inVirtualConnector.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
components/planner/src/dynamo/planner/__init__.py(2 hunks)components/planner/src/dynamo/planner/planner_sla.py(1 hunks)components/planner/src/dynamo/planner/utils/planner_argparse.py(1 hunks)components/planner/src/dynamo/planner/utils/planner_core.py(4 hunks)components/planner/src/dynamo/planner/virtual_connector.py(1 hunks)docs/architecture/sla_planner.md(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
components/planner/src/dynamo/planner/utils/planner_argparse.py (1)
components/planner/src/dynamo/planner/defaults.py (1)
SLAPlannerDefaults(72-83)
components/planner/src/dynamo/planner/virtual_connector.py (3)
components/planner/src/dynamo/planner/planner_connector.py (1)
PlannerConnector(20-29)lib/bindings/python/src/dynamo/_core.pyi (2)
DistributedRuntime(31-54)EtcdKvCache(102-179)components/planner/src/dynamo/planner/utils/planner_core.py (1)
_async_init(133-140)
components/planner/src/dynamo/planner/__init__.py (1)
components/planner/src/dynamo/planner/virtual_connector.py (1)
VirtualConnector(23-307)
components/planner/src/dynamo/planner/utils/planner_core.py (1)
components/planner/src/dynamo/planner/virtual_connector.py (1)
_async_init(60-79)
🪛 LanguageTool
docs/architecture/sla_planner.md
[grammar] ~134-~134: There might be a mistake here.
Context: ... Output Keys** (written by the planner): - num_prefill_workers: Integer (stored as string) specifying ...
(QB_NEW_EN)
[grammar] ~135-~135: There might be a mistake here.
Context: ...ing the target number of prefill workers - num_decode_workers: Integer (stored as string) specifying ...
(QB_NEW_EN)
[grammar] ~136-~136: There might be a mistake here.
Context: ...ying the target number of decode workers - decision_id: Integer (stored as string) with increm...
(QB_NEW_EN)
[grammar] ~144-~144: There might be a mistake here.
Context: ...writes them to ETCD with an incremented decision_id 2. Change Detection: The planner skips sc...
(QB_NEW_EN)
[grammar] ~145-~145: There might be a mistake here.
Context: ...t counts match current counts, logging: "No scaling needed (prefill=X, decode=Y), skipping ETCD update" 3. Readiness Check: Before making new dec...
(QB_NEW_EN)
[grammar] ~146-~146: There might be a mistake here.
Context: ...perations have completed by checking if scaled_decision_id >= decision_id 4. Timeout Handling: If a scaling decisio...
(QB_NEW_EN)
🪛 markdownlint-cli2 (0.17.2)
docs/architecture/sla_planner.md
135-135: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
136-136: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
137-137: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
140-140: Unordered list style
Expected: asterisk; Actual: dash
(MD004, ul-style)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - vllm
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (11)
components/planner/src/dynamo/planner/utils/planner_argparse.py (3)
31-33: Add “virtual” env: LGTMChoices now reflect the new connector path.
34-38: New --namespace flag: LGTMGood defaulting to SLAPlannerDefaults.namespace; aligns with planner_core and docs.
116-120: Prometheus port attribute exists:SLAPlannerDefaults.prometheus_portis defined in defaults.py (via_get_prometheus_port_from_env()), so the import is valid.Likely an incorrect or invalid review comment.
components/planner/src/dynamo/planner/virtual_connector.py (5)
126-134: Component name mapping depends on k8s-style namesUsing
prefill_worker_k8s_name/decode_worker_k8s_namehere assumes those identifiers are also used in virtual mode. If virtual environments use different names, scaling commands will be ignored.Please confirm WORKER_COMPONENT_NAMES[backend] are the canonical names across both k8s and virtual. If not, consider adding a virtual mapping or allowing aliases via config.
153-170: No-op updates are correctly skipped: LGTMGood guard against unnecessary ETCD writes.
210-219: Write-order: LGTMCounts written before decision_id keeps consumer logic simple and consistent.
225-245: Blocking wait bounds: LGTMBounded retries with clear logs; aligns with docs’ 1800s.
37-46: Runtime dependency check message: LGTMClear error if ETCD is unavailable.
components/planner/src/dynamo/planner/planner_sla.py (1)
42-42: Namespace argument default confirmed. The--namespaceflag increate_sla_planner_parserusesdefault=SLAPlannerDefaults.namespace, whereSLAPlannerDefaults.namespaceisos.environ.get("DYNAMO_NAMESPACE", "vllm-disagg-planner"), ensuring a non-empty default at runtime.components/planner/src/dynamo/planner/utils/planner_core.py (2)
67-68: Namespace via CLI is correct.Using
args.namespaceremoves the hidden dependency on defaults and honors user input.
554-555: Good: await initialization before run loop.Ensures connector state (e.g., ETCD cache) is ready before scaling decisions.
michaelshin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment left but makes sense to me
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Summary by CodeRabbit
New Features
Changes
Documentation