feat: add subComponentType in DGD API and uptake in planner #3200
Walkthrough

Introduces subComponentType across CRDs, operator, deployments, and planner code. Refactors planner and Kubernetes connector to resolve services by sub-component type (prefill/decode), updates Kubernetes API methods, adds new exceptions, and adjusts profiler utilities and tests to be target-aware. Removes embedded Prometheus blocks and adds an optional env override for the Prometheus endpoint.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Profiler as Profiler
    participant Config as Config Modifiers
    participant FS as Logs FS
    note over Profiler,Config: Target-aware profiling (prefill/decode)
    Profiler->>Config: set_config_tp_size(config, tp, target)
    Profiler->>Config: get_kv_cache_size_from_dynamo_log(config, work_dir, name, target)
    Config->>Config: Resolve service by subComponentType (target)
    Config->>FS: get_dynamo_log_path(work_dir, name, resolved_service)
    FS-->>Config: log path
    Config-->>Profiler: KV cache size
```

```mermaid
sequenceDiagram
    autonumber
    participant Planner as SLA Planner
    participant K8sConn as KubernetesConnector
    participant KubeAPI as KubernetesAPI
    note over Planner,K8sConn: Sub-component-based scaling
    Planner->>K8sConn: verify_prefill_and_decode_components_exist(...)
    K8sConn->>KubeAPI: get_graph_deployment(name)
    KubeAPI-->>K8sConn: deployment(spec.services[].subComponentType)
    K8sConn-->>Planner: OK or raise
    Planner->>K8sConn: set_component_replicas([TargetReplica(PREFILL), TargetReplica(DECODE)])
    K8sConn->>KubeAPI: update_graph_replicas(name, component_name, replicas)
    KubeAPI-->>K8sConn: patched
    K8sConn->>KubeAPI: get_graph_deployment(name)
    K8sConn->>KubeAPI: is_deployment_ready(deployment)
    KubeAPI-->>K8sConn: ready?
    K8sConn-->>Planner: scaling complete
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Pre-merge checks: ❌ Failed checks (2 warnings) | ✅ Passed checks (1 passed)
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (10)
docs/kubernetes/sla_planner_deployment.md (1)
52-60: Remove Prometheus from expected pods; it now runs under kube-prometheus-stack in the monitoring namespace

Docs say Prometheus blocks were removed in favor of kube-prometheus-stack. The expected pods list still shows a Prometheus pod in the app namespace, which is misleading.

Apply:

```diff
 vllm-disagg-planner-frontend-*   1/1   Running
-vllm-disagg-planner-prometheus-* 1/1   Running
 vllm-disagg-planner-planner-*    1/1   Running
 vllm-disagg-planner-backend-*    1/1   Running
 vllm-disagg-planner-prefill-*    1/1   Running
+# Prometheus is provided by kube-prometheus-stack in the "monitoring" namespace and is not expected here.
```

components/planner/src/dynamo/planner/utils/planner_core.py (1)
395-420: Potential UnboundLocalError: next_num_p/next_num_d may be undefined when predictions are None.

If predict_load() returns any None, the compute block is skipped and we later reference undefined variables in target_replicas.

Apply:

```diff
 next_num_req, next_isl, next_osl = self.predict_load()
-if next_num_req is not None and next_isl is not None and next_osl is not None:
+if next_num_req is None or next_isl is None or next_osl is None:
+    logger.info("Missing load predictions; skipping adjustment")
+    return
+else:
     try:
         next_num_p, next_num_d = self._compute_replica_requirements(
             next_num_req, next_isl, next_osl
         )
     except Exception as e:
         logger.error(f"Failed to compute number of replicas: {e}")
         return
     if not self.args.no_operation:
         # Ensure change for virtual connector does not break things
         target_replicas = [
             TargetReplica(
                 sub_component_type=SubComponentType.PREFILL,
                 component_name=self.prefill_component_name,
                 desired_replicas=next_num_p,
             ),
             TargetReplica(
                 sub_component_type=SubComponentType.DECODE,
                 component_name=self.decode_component_name,
                 desired_replicas=next_num_d,
             ),
         ]
```

components/planner/src/dynamo/planner/kubernetes_connector.py (1)
221-234: Fix CLI: map --component to SubComponentType and correct the default.

`add_component`/`remove_component` expect `SubComponentType`, but the CLI passes a string and defaults to "planner" (invalid). This will raise AttributeError at runtime.

Apply this diff:

```diff
 parser = argparse.ArgumentParser()
 parser.add_argument("--dynamo_namespace", type=str, default="dynamo")
 parser.add_argument("--k8s_namespace", type=str, default="default")
 parser.add_argument("--action", type=str, choices=["add", "remove"])
-parser.add_argument("--component", type=str, default="planner")
+parser.add_argument(
+    "--component",
+    type=str,
+    choices=[t.value for t in SubComponentType],
+    default=SubComponentType.PREFILL.value,
+    help="Target sub-component to scale",
+)
 parser.add_argument("--blocking", action="store_true")
 args = parser.parse_args()
 connector = KubernetesConnector(args.dynamo_namespace, args.k8s_namespace)
 if args.action == "add":
-    task = connector.add_component(args.component, args.blocking)
+    task = connector.add_component(SubComponentType(args.component), args.blocking)
 elif args.action == "remove":
-    task = connector.remove_component(args.component, args.blocking)
+    task = connector.remove_component(SubComponentType(args.component), args.blocking)
 asyncio.run(task)
```

components/planner/test/kube.py (3)
141-146: Patching an async method with a non-awaitable will fail

get_graph_deployment is async; patching it to return a dict makes the await blow up. Use AsyncMock.

Apply this diff:

```diff
-        with patch.object(
-            k8s_api, "get_graph_deployment", return_value=mock_deployment
-        ):
+        with patch.object(k8s_api, "get_graph_deployment", new_callable=AsyncMock) as mock_get:
+            mock_get.return_value = mock_deployment
             # Test with minimal attempts and delay for faster testing
             await k8s_api.wait_for_graph_deployment_ready(
                 "test-deployment", max_attempts=2, delay_seconds=0.1
             )
```

Also add AsyncMock to imports:

```python
# at top of file
from unittest.mock import MagicMock, patch, AsyncMock
```
167-173: Same AsyncMock issue in the timeout test

Patch the async method with AsyncMock.

Apply this diff:

```diff
-        with patch.object(
-            k8s_api, "get_graph_deployment", return_value=mock_deployment
-        ):
+        with patch.object(k8s_api, "get_graph_deployment", new_callable=AsyncMock) as mock_get:
+            mock_get.return_value = mock_deployment
             # Test with minimal attempts and delay for faster testing
             with pytest.raises(TimeoutError) as exc_info:
                 await k8s_api.wait_for_graph_deployment_ready(
                     "test-deployment", max_attempts=2, delay_seconds=0.1
                 )
```
202-208: Async patching needed here as well

Use AsyncMock to patch get_graph_deployment.

Apply this diff:

```diff
-        with patch.object(
-            k8s_api, "get_graph_deployment", return_value=mock_deployment
-        ):
+        with patch.object(k8s_api, "get_graph_deployment", new_callable=AsyncMock) as mock_get:
+            mock_get.return_value = mock_deployment
             # Test with minimal attempts and delay for faster testing
             with pytest.raises(TimeoutError) as exc_info:
                 await k8s_api.wait_for_graph_deployment_ready(
                     "test-deployment", max_attempts=2, delay_seconds=0.1
                 )
```
306-312: Raise the standardized EmptyTargetReplicasError

Aligns with the KubernetesConnector and the public exception API.

```diff
-        if not target_replicas:
-            raise ValueError("target_replicas cannot be empty")
+        if not target_replicas:
+            raise EmptyTargetReplicasError()
```

Also add the import:

```diff
-from dynamo.planner.utils.exceptions import DeploymentValidationError
+from dynamo.planner.utils.exceptions import DeploymentValidationError, EmptyTargetReplicasError
```

benchmarks/profiler/utils/config.py (3)
243-244: Guard optional flag removal.
`.remove` raises if the flag is absent.

Apply this diff:

```diff
-    args.remove("--is-prefill-worker")
+    if "--is-prefill-worker" in args:
+        args.remove("--is-prefill-worker")
```
400-427: Possible UnboundLocalError in except; brittle parsing for concurrency line.
`line` may be undefined if the file open fails; avoid referencing it in the except. Also avoid slicing `[:-1]` to strip trailing characters.

Apply this diff:

```diff
     try:
         with open(dynamo_log_fn, "r") as f:
             for line in f:
                 if "Maximum concurrency for" in line:
                     line = line.strip().split("Maximum concurrency for ")[1]
                     token_count = int(
                         line.split(" tokens per request: ")[0].replace(",", "")
                     )
-                    concurrency = float(line.split(" tokens per request: ")[1][:-1])
+                    # Strip any trailing punctuation/percent symbols/spaces
+                    concurrency_str = line.split(" tokens per request: ")[1].strip().rstrip("%). ")
+                    concurrency = float(concurrency_str)
@@
-    except Exception as e:
-        logger.warning(
-            f"Failed to parse KV cache size from line: {line}. Error: {e}"
-        )
+    except Exception as e:
+        logger.warning(f"Failed to parse KV cache size from file {dynamo_log_fn}. Error: {e}")
```
687-725: Ensure TRTLLMWorker exists after conversion; fail fast if neither worker is present.

When neither prefill nor decode is found, accessing `cfg.spec.services["TRTLLMWorker"]` will crash.

Apply this diff:

```diff
-    worker_service = cfg.spec.services["TRTLLMWorker"]
+    if "TRTLLMWorker" not in cfg.spec.services:
+        raise ValueError("Missing TRT-LLM worker service after conversion; ensure prefill or decode worker exists")
+    worker_service = cfg.spec.services["TRTLLMWorker"]
```
🧹 Nitpick comments (46)
components/planner/src/dynamo/planner/defaults.py (1)
57-62: Clarify the expected PROMETHEUS_ENDPOINT format and ensure consistency

Confirm whether PROMETHEUS_ENDPOINT must include a scheme (http/https). The current logic returns a full URL when the env var is set, but the fallback path may return a bare hostname (without scheme), which could be inconsistent at call sites.
Would you like a follow-up patch to normalize the return value to a full URL in all branches, or to validate/auto-prepend a default scheme (e.g., http://) when missing?
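A minimal sketch of what such normalization could look like; the helper name and default scheme are illustrative choices, not taken from the PR:

```python
def normalize_prometheus_endpoint(endpoint: str, default_scheme: str = "http") -> str:
    """Return a full URL, prepending a scheme when given a bare host[:port]."""
    # "prometheus:9090" contains no "://", so the default scheme is prepended;
    # "https://prometheus:9090" already carries a scheme and passes through unchanged.
    if "://" not in endpoint:
        return f"{default_scheme}://{endpoint}"
    return endpoint
```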
docs/kubernetes/sla_planner_deployment.md (5)
12-13: Clarify the scrape vs. adjustment interval relationship to avoid empty query windows

State that the planner's adjustment interval should be >= the Prometheus scrape interval (with small slack). Otherwise, planner queries may hit gaps and produce unstable scaling.

You can insert a short NOTE right after the paragraph below Line 17:

```diff
 The adjustment interval can be defined in the planner manifest as an argument. The default interval value can be found [here](/components/planner/src/dynamo/planner/defaults.py#L31).
+> Note
+> Ensure the planner adjustment interval is greater than or equal to the Prometheus scrape interval (plus a few seconds of slack) to avoid querying empty/partial windows and to stabilize scaling decisions.
```
16-17: Use descriptive link text and avoid fragile line anchors

Replace "here" with descriptive text and drop the line anchor to prevent link rot when the file changes.

Apply:

```diff
-The adjustment interval can be defined in the planner manifest as an argument. The default interval value can be found [here](/components/planner/src/dynamo/planner/defaults.py#L31).
+The adjustment interval can be defined in the planner manifest as an argument. See the default value in the [planner defaults (defaults.py)](/components/planner/src/dynamo/planner/defaults.py).
```

This also addresses markdownlint MD059.
16-17: Standardize terminology: decode vs. backend

Elsewhere (Lines 92-98), logs say "decode workers," while the diagram and Components list reference "backend." Standardize on one term (ideally "decode") or explicitly note that "backend" equals "decode" for vLLM to avoid confusion.
109-111: Parameterize the Prometheus service name; it depends on the Helm release

Service naming for kube-prometheus-stack varies by release (e.g., <release>-kube-prometheus-prometheus). Provide discovery + port-forward steps.

Apply:

```diff
-# Verify Prometheus is accessible
-kubectl port-forward svc/prometheus-kube-prometheus-prometheus -n monitoring 9090:9090
-curl "http://localhost:9090/api/v1/query?query=up"
+# Verify Prometheus is accessible
+# Discover the Prometheus service name (replace monitoring if you use a different namespace)
+kubectl -n monitoring get svc | grep prometheus
+# Port-forward (replace RELEASE with your Helm release name)
+kubectl -n monitoring port-forward svc/RELEASE-kube-prometheus-prometheus 9090:9090
+curl "http://localhost:9090/api/v1/query?query=up"
```
126-133: subComponentType troubleshooting: verify minimum CRD version, fix filename, and improve link text
- The filename in the error (“disagg.yaml”) should match earlier instructions (“disagg_planner.yaml”).
- “here” link text violates MD059; use descriptive text.
- “> 0.5.0” may be inaccurate; please confirm the exact minimum CRD version that introduces subComponentType and update the docs.
Apply:
```diff
-Error from server (BadRequest): error when creating "components/backends/vllm/deploy/disagg.yaml": DynamoGraphDeployment in version "v1alpha1" cannot be handled as a DynamoGraphDeployment: strict decoding error: unknown field "spec.services.DecodeWorker.subComponentType", unknown field "spec.services.PrefillWorker.subComponentType"
+Error from server (BadRequest): error when creating "components/backends/vllm/deploy/disagg_planner.yaml": DynamoGraphDeployment in version "v1alpha1" cannot be handled as a DynamoGraphDeployment: strict decoding error: unknown field "spec.services.DecodeWorker.subComponentType", unknown field "spec.services.PrefillWorker.subComponentType"
-This is because the `subComponentType` field has only been added in newer versions of the DynamoGraphDeployment CRD (> 0.5.0). You can upgrade the CRD version by following the instructions [here](/docs/kubernetes/installation_guide.md).
+This occurs because the `subComponentType` field exists only in newer versions of the DynamoGraphDeployment CRD (confirm the minimum version). Upgrade the CRD by following the [Kubernetes installation guide](/docs/kubernetes/installation_guide.md).
```

Please verify and update the exact minimum version in the sentence above. Optionally add a quick check:

```bash
# Check if subComponentType is present in your CRD schema
kubectl explain dynamographdeployment.spec.services.DecodeWorker | grep -i subComponentType || true
kubectl explain dynamographdeployment.spec.services.PrefillWorker | grep -i subComponentType || true
```

deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml (2)
788-788: Grammar tweak: "allows overriding"

Tighten the wording for clarity and consistency.

```diff
-          description: Dynamo namespace of the service (allows to override the Dynamo namespace of the service defined in annotations inside the Dynamo archive)
+          description: Dynamo namespace of the service (allows overriding the Dynamo namespace defined in annotations inside the Dynamo archive for this service)
```
918-920: Doc polish: fix grammar and expand "k8s" to "Kubernetes"; verify the casing of mainContainer

Use formal terminology and fix "allows to override". Also, ensure the field is actually named "mainContainer" (lower camelCase) to avoid confusion.

```diff
-          ExtraPodSpec allows to override the main pod spec configuration.
-          It is a k8s standard PodSpec. It also contains a MainContainer (standard k8s Container) field
-          that allows overriding the main container configuration.
+          ExtraPodSpec allows overriding the main pod spec configuration.
+          It is a standard Kubernetes PodSpec. It also contains a mainContainer (standard Kubernetes Container) field
+          that allows overriding the main container configuration.
```

Please confirm the actual field name is "mainContainer". If it's "MainContainer" in the schema, keep the casing here.
components/planner/src/dynamo/planner/kubernetes_connector.py (1)
186-193: Remove the unused variable `available_types`.

It's computed but never used.

Apply this diff:

```diff
-        # Collect all available subComponentTypes for better error messages
-        available_types = []
         matching_services = []
         for curr_name, curr_service in services.items():
             service_sub_type = curr_service.get("subComponentType", "")
-            if service_sub_type:
-                available_types.append(service_sub_type)
             if service_sub_type == sub_component_type.value:
                 matching_services.append((curr_name, curr_service))
```
995-997: Propagate sub-component label — consider also setting it on the parent DCD labels

Setting KubeLabelDynamoSubComponentType here is good. For consistency and queryability, also add this label when initially constructing the DynamoComponentDeployment labels so the DCD itself is labeled (not just pods/cliques). Example insertion in GenerateDynamoComponentsDeployments where other labels are set:

```go
if component.SubComponentType != "" {
    labels[commonconsts.KubeLabelDynamoSubComponentType] = component.SubComponentType
}
```
1157-1160: Avoid mutating CRD metadata labels in place

podLabels points to the CRD's metadata labels map via getKubeLabels; adding the sub-component label mutates that base map. To reduce side effects, clone before augmenting pod labels:

```go
// after: podLabels := r.getKubeLabels(opt.dynamoComponentDeployment)
podLabels = maps.Clone(podLabels)
```
76-78: Add validation for allowed sub-component values

Constrain values at the CRD level to prevent drift and ease client interoperability.

```diff
-	// SubComponentType indicates the sub-role of this component (for example, "prefill").
-	SubComponentType string `json:"subComponentType,omitempty"`
+	// SubComponentType indicates the sub-role of this component (for example, "prefill").
+	// +kubebuilder:validation:Enum=prefill;decode
+	SubComponentType string `json:"subComponentType,omitempty"`
```

deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml (4)
686-686: Wording nit: "allows overriding"

Minor grammar fix to improve clarity.

```diff
-          description: Dynamo namespace of the service (allows to override the Dynamo namespace of the service defined in annotations inside the Dynamo archive)
+          description: Dynamo namespace of the service (allows overriding the Dynamo namespace defined in annotations inside the Dynamo archive)
```
819-822: Clarify and align the casing of mainContainer

Use the exact field name in the description to avoid confusion and make the intent explicit.

```diff
-          ExtraPodSpec allows to override the main pod spec configuration.
-          It is a k8s standard PodSpec. It also contains a MainContainer (standard k8s Container) field
-          that allows overriding the main container configuration.
+          ExtraPodSpec allows overriding the main PodSpec configuration.
+          It is a standard Kubernetes PodSpec with an additional "mainContainer" (standard k8s Container)
+          field that allows overriding the main container configuration.
```
10243-10243: Align the description with the field name "serviceName"

The current text is ambiguous; clarify that it refers to the Kubernetes Service name for this component.

```diff
-          description: The name of the component
+          description: The name of the Kubernetes Service for this component
```
10257-10259: Constrain subComponentType to known values (and add discoverability)

Add enum validation to prevent typos, and consider exposing the field as a printer column.

Apply enum to the schema:

```diff
 subComponentType:
   description: SubComponentType indicates the sub-role of this component (for example, "prefill").
   type: string
+  enum:
+    - prefill
+    - decode
```

Additionally, add a printer column (outside this hunk) for quick visibility when listing CRs:

```diff
@@
   additionalPrinterColumns:
   - description: Dynamo component
     jsonPath: .spec.dynamoComponent
     name: DynamoComponent
     type: string
   - description: Available
     jsonPath: .status.conditions[?(@.type=='Available')].status
     name: Available
     type: string
+  - description: Sub-component role
+    jsonPath: .spec.subComponentType
+    name: SubComponent
+    type: string
```

Please confirm planner/operator currently accept only "prefill"/"decode"; if more values are expected imminently, we can switch to a regex pattern or defer the enum.
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (1)
701-705: Use a domain-like SubComponentType and add a nil-case test

```diff
-    SubComponentType: "test-sub-component",
+    SubComponentType: "prefill",
```
- Add a companion test case with SubComponentType unset to assert the sub-component label is omitted (backward compatibility).
deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml (3)
819-821: Guard against conflicting ExtraPodSpec overrides

ExtraPodSpec now documents a MainContainer override; add a CEL validation to prevent setting both mainContainer and containers simultaneously to avoid ambiguous precedence.

Apply this diff:

```diff
 extraPodSpec:
   description: |-
     ExtraPodSpec allows to override the main pod spec configuration.
     It is a k8s standard PodSpec. It also contains a MainContainer (standard k8s Container) field
     that allows overriding the main container configuration.
+  x-kubernetes-validations:
+    - rule: "!(has(self.mainContainer) && has(self.containers) && size(self.containers) > 0)"
+      message: "Set either mainContainer or containers, not both."
```
10243-10244: Clarify the serviceName description

Avoid confusion between "component" and the Kubernetes Service; clarify the description.

```diff
-        serviceName:
-          description: The name of the component
+        serviceName:
+          description: The Kubernetes Service name for this component
```
10257-10260: Strengthen the subComponentType schema (enum + clearer doc)

To prevent typos and align with planner expectations, restrict the allowed values and clarify the examples.

```diff
-        subComponentType:
-          description: SubComponentType indicates the sub-role of this component (for example, "prefill").
-          type: string
+        subComponentType:
+          description: SubComponentType indicates the sub-role of this component (e.g., "prefill" or "decode").
+          type: string
+          enum:
+            - prefill
+            - decode
```

Optional: expose this in kubectl get output by adding an additionalPrinterColumn (outside this hunk):
- Path: .spec.subComponentType
- Name: SubComponentType
- Type: string
Would you like me to propose the exact additionalPrinterColumns snippet in the versions block?
benchmarks/profiler/profile_sla.py (3)
155-156: Avoid cumulative mutation of the config across the TP loop

set_config_tp_size returns a new dict, but you feed the previous iteration's mutated config back in. It's safer to start from the base per iteration to prevent argument duplication or drift across backends.
Consider keeping a base prefill_config = convert_config(config, "prefill") and, inside the loop, calling set_config_tp_size on that base to get an iteration‑local config.
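A sketch of that pattern, assuming `config_modifier` and `tp_sizes` come from the surrounding profiler code; the helper function and result handling are illustrative, not the PR's implementation:

```python
import copy

def sweep_prefill_tp(config_modifier, base_config, tp_sizes):
    """Illustrative only: derive every TP variant from an unmodified base."""
    prefill_base = config_modifier.convert_config(base_config, "prefill")
    configs = {}
    for tp in tp_sizes:
        # Deep-copy the base each iteration so argument rewrites from a
        # previous TP size cannot accumulate into the next one.
        configs[tp] = config_modifier.set_config_tp_size(
            copy.deepcopy(prefill_base), tp, "prefill"
        )
    return configs
```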
281-282: Same concern for the decode TP loop

Reuse the base decode_config from convert_config(config, "decode") each iteration to avoid unwanted accumulation.

610-611: Repeat the KV-tokens fallback for decode interpolation

Mirror the earlier safeguard to avoid empty sweeps when log parsing yields 0.

Apply this diff:

```diff
-    max_kv_tokens = config_modifier.get_kv_cache_size_from_dynamo_log(
-        decode_config, work_dir, client.deployment_name, "decode"
-    )
+    max_kv_tokens = config_modifier.get_kv_cache_size_from_dynamo_log(
+        decode_config, work_dir, client.deployment_name, "decode"
+    ) or (args.isl + args.osl)
```

components/planner/test/kube.py (3)
250-250: Remove the unused fixture argument

mock_custom_api isn't used in this test.

Apply this diff:

```diff
-async def test_get_graph_deployment(k8s_api, mock_custom_api):
+async def test_get_graph_deployment(k8s_api):
```

264-264: Remove the unused fixture argument

mock_custom_api isn't used here either.

Apply this diff:

```diff
-async def test_get_graph_deployment_not_found(k8s_api, mock_custom_api):
+async def test_get_graph_deployment_not_found(k8s_api):
```

139-146: Minor API design note: an async wrapper over a sync client blocks the loop

get_graph_deployment calls a synchronous Kubernetes client; awaiting it doesn't prevent event-loop blocking. Consider making get_graph_deployment synchronous, or running the sync call in a thread executor to avoid blocking.

Proposed change in dynamo/planner/kube.py (outside this diff):

```python
# Option A: make it sync
def get_graph_deployment(self, graph_deployment_name: str) -> dict:
    try:
        return self._get_graph_deployment_from_name(graph_deployment_name)
    except client.ApiException as e:
        if e.status == 404:
            raise DynamoGraphDeploymentNotFoundError(
                deployment_name=graph_deployment_name, namespace=self.current_namespace
            )
        raise
# and in wait_for_graph_deployment_ready: call it without await

# Option B: keep async and offload
async def get_graph_deployment(self, graph_deployment_name: str) -> dict:
    loop = asyncio.get_running_loop()
    try:
        return await loop.run_in_executor(
            None, self._get_graph_deployment_from_name, graph_deployment_name
        )
    except client.ApiException as e:
        if e.status == 404:
            raise DynamoGraphDeploymentNotFoundError(
                deployment_name=graph_deployment_name, namespace=self.current_namespace
            )
        raise
```

deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (3)
788-788: Tighten the wording for clarity (remove "allows to override").

The suggested edit improves grammar and keeps the meaning intact.

```diff
-          description: Dynamo namespace of the service (allows to override the Dynamo namespace of the service defined in annotations inside the Dynamo archive)
+          description: Dynamo namespace of the service (allows overriding the namespace defined via annotations in the Dynamo archive)
```
918-920: Clarify “ExtraPodSpec” text and fix “MainContainer” casing.
- Use “allows overriding” instead of “allows to override”.
- Refer to the field as “mainContainer” (camelCase) to match schema conventions.
```diff
-          ExtraPodSpec allows to override the main pod spec configuration.
-          It is a k8s standard PodSpec. It also contains a MainContainer (standard k8s Container) field
-          that allows overriding the main container configuration.
+          ExtraPodSpec allows overriding the main PodSpec configuration.
+          It is a standard Kubernetes PodSpec and includes a mainContainer (standard Kubernetes Container) field
+          for overriding the main container configuration.
```
10342-10342: Align terminology: "serviceName" should describe a service.

The current text says "component," which can be confusing given the field is serviceName.

```diff
-          description: The name of the component
+          description: The name of the service (component)
```

components/planner/test/kubernetes_connector.py (3)
35-39: Use Mock for sync method to avoid confusion
`update_graph_replicas` is synchronous. Using `AsyncMock` here is misleading. Switch to `Mock()` for accuracy and to avoid accidental awaits.

```diff
-    mock_api.update_graph_replicas = AsyncMock()
+    mock_api.update_graph_replicas = Mock()
```
59-65: Avoid brittle assertions on exact error strings

Asserting exact message text is fragile. Check for the presence of the env var token instead.

```diff
-    assert set(exception.errors) == {"DYN_PARENT_DGD_K8S_NAME environment variable is not set"}
+    assert any("DYN_PARENT_DGD_K8S_NAME" in e for e in exception.errors)
```
184-189: Pass SubComponentType instead of a string

For clarity and type safety, pass `SubComponentType` (even though the exception is thrown before it's used).

```diff
-    with pytest.raises(DynamoGraphDeploymentNotFoundError):
-        await kubernetes_connector.add_component(component_name)
+    with pytest.raises(DynamoGraphDeploymentNotFoundError):
+        await kubernetes_connector.add_component(SubComponentType.PREFILL)
```

components/planner/src/dynamo/planner/utils/exceptions.py (1)
120-122: Remove the stray trailing comma in the constructor

Minor cleanup: `__init__(self, )` is odd; make it a standard no-arg signature.

```diff
-    def __init__(self, ):
+    def __init__(self):
         message = "target_replicas cannot be empty"
         super().__init__(message)
```

components/planner/src/dynamo/planner/kube.py (2)
104-138: Check readiness before sleeping; don't delay the first attempt

Currently you sleep before the first readiness check, adding unnecessary latency. Check first, then sleep between retries.

```diff
-        for attempt in range(max_attempts):
-            await asyncio.sleep(delay_seconds)
-
-            graph_deployment = await self.get_graph_deployment(
-                graph_deployment_name
-            )
+        for attempt in range(1, max_attempts + 1):
+            graph_deployment = await self.get_graph_deployment(graph_deployment_name)
@@
-            logger.info(
-                f"[Attempt {attempt + 1}/{max_attempts}] "
+            logger.info(
+                f"[Attempt {attempt}/{max_attempts}] "
                 f"(status: {ready_condition.get('status') if ready_condition else 'N/A'}, "
                 f"message: {ready_condition.get('message') if ready_condition else 'no condition found'})"
             )
+            if attempt < max_attempts:
+                await asyncio.sleep(delay_seconds)
```
26-28: Avoid configuring logging at import time

Calling `configure_dynamo_logging()` on import can have unintended side effects. Prefer configuring logging in the application entrypoint and using `logger = logging.getLogger(__name__)` here.
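A minimal sketch of that split, assuming `configure_dynamo_logging` imports from `dynamo.runtime.logging` (per the code graph below); the `main()` wiring is illustrative:

```python
import logging

from dynamo.runtime.logging import configure_dynamo_logging

logger = logging.getLogger(__name__)  # module level: no side effects on import

def main() -> None:
    configure_dynamo_logging()  # configure once, at process startup
    logger.info("planner starting")
```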
components/planner/src/dynamo/planner/virtual_connector.py (1)

282-305: Accept SubComponentType and normalize internally

These helpers should accept `SubComponentType` for symmetry with `KubernetesConnector`. Keep backward compatibility by accepting `str` too.

```diff
-    async def add_component(self, worker_type: str, blocking: bool = True):
+    async def add_component(self, worker_type: str | SubComponentType, blocking: bool = True):
         """Add a component by increasing its replica count by 1"""
-        if worker_type == "prefill":
+        kind = worker_type.value if isinstance(worker_type, SubComponentType) else worker_type
+        if kind == "prefill":
             await self._update_scaling_decision(
                 num_prefill=self.num_prefill_workers + 1
             )
-        elif worker_type == "decode":
+        elif kind == "decode":
             await self._update_scaling_decision(num_decode=self.num_decode_workers + 1)
@@
-    async def remove_component(self, worker_type: str, blocking: bool = True):
+    async def remove_component(self, worker_type: str | SubComponentType, blocking: bool = True):
         """Remove a component by decreasing its replica count by 1"""
-        if worker_type == "prefill":
+        kind = worker_type.value if isinstance(worker_type, SubComponentType) else worker_type
+        if kind == "prefill":
             new_count = max(0, self.num_prefill_workers - 1)
             await self._update_scaling_decision(num_prefill=new_count)
-        elif worker_type == "decode":
+        elif kind == "decode":
             new_count = max(0, self.num_decode_workers - 1)
             await self._update_scaling_decision(num_decode=new_count)
```

benchmarks/profiler/tests/config_test.py (1)
376-448: Consider parametrizing the repeated "convert_config + validate" flows

These blocks repeat across backends/targets. Parametrization would reduce duplication and improve maintainability; see the sketch below.
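One possible shape for that consolidation; the `MODIFIERS` registry, the import path, and the `base_config` fixture are hypothetical stand-ins for whatever the test file already builds per backend:

```python
import pytest

from benchmarks.profiler.utils.config import (  # assumed import path
    SGLangConfigModifier,
    TrtllmConfigModifier,
    VllmV1ConfigModifier,
)

# Hypothetical registry; the real tests may construct modifiers differently.
MODIFIERS = {
    "sglang": SGLangConfigModifier,
    "trtllm": TrtllmConfigModifier,
    "vllm": VllmV1ConfigModifier,
}

@pytest.mark.parametrize("backend", sorted(MODIFIERS))
@pytest.mark.parametrize("target", ["prefill", "decode"])
def test_convert_config(backend, target, base_config):
    converted = MODIFIERS[backend].convert_config(base_config, target)
    # Shared validation applied uniformly to every backend/target pair.
    assert converted.spec.services, "conversion must retain at least one service"
```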
benchmarks/profiler/utils/config.py (9)
182-194: Custom exception for multi-match; resolves Ruff TRY003 and clarifies the failure mode.

Replace the inline ValueError with a dedicated exception and move the message into the class.

Apply this diff:

```diff
-    if len(matching_service_names) > 1:
-        raise ValueError(f"Multiple services with subComponentType {sub_component_type} found")
+    if len(matching_service_names) > 1:
+        raise MultipleSubComponentTypeMatchError(sub_component_type)
```

Add this exception near the logger declarations:

```python
class MultipleSubComponentTypeMatchError(ValueError):
    def __init__(self, sub_component_type: str):
        super().__init__(
            f"Multiple services with subComponentType '{sub_component_type}' found; expected at most one."
        )
```
60-60: Constrain subComponentType to known values.

Use a Literal or Enum to validate only "prefill" or "decode" and catch config typos early.

Example:

```diff
 class Service(BaseModel):
 @@
-    subComponentType: Optional[str] = None
+    subComponentType: Optional[Literal["prefill", "decode"]] = None
```
316-323: The error message mentions decode even for prefill.

Keep the message neutral to match the actual `worker_name`.

Apply this diff:

```diff
-        raise ValueError(
-            f"Missing extraPodSpec or mainContainer in VLLM decode worker service '{worker_name}'"
-        )
+        raise ValueError(
+            f"Missing extraPodSpec or mainContainer in VLLM worker service '{worker_name}'"
+        )
```
487-490: Guard the deletion if the prefill service is absent.

```diff
-    del cfg.spec.services[
-        prefill_worker_name
-    ]
+    if prefill_worker_name in cfg.spec.services:
+        del cfg.spec.services[prefill_worker_name]
```
573-579: Add a prefill fallback for SGLang model-name resolution.

Avoids a KeyError when only prefill exists.

```diff
-    worker_service_name = get_service_name_from_sub_component_type(cfg.spec.services, "decode")
-
-    if not worker_service_name:
-        worker_service_name = WORKER_COMPONENT_NAMES["sglang"].decode_worker_k8s_name
+    worker_service_name = get_service_name_from_sub_component_type(cfg.spec.services, "decode")
+    if not worker_service_name:
+        worker_service_name = get_service_name_from_sub_component_type(cfg.spec.services, "prefill")
+    if not worker_service_name:
+        worker_service_name = WORKER_COMPONENT_NAMES["sglang"].decode_worker_k8s_name
+    worker_service = cfg.spec.services.get(worker_service_name)
+    if not worker_service:
+        worker_service = cfg.spec.services.get(WORKER_COMPONENT_NAMES["sglang"].prefill_worker_k8s_name)
+    if not worker_service:
+        logger.warning(
+            f"Worker service '{worker_service_name}' not found, using default model name: {DEFAULT_MODEL_NAME}"
+        )
+        return DEFAULT_MODEL_NAME
-    worker_service = cfg.spec.services[worker_service_name]
```
684-686: Guard the deletion of the decode worker.

Avoid a `KeyError` if the decode worker isn't in services.

```diff
-    del cfg.spec.services[decode_worker_name]
+    if decode_worker_name in cfg.spec.services:
+        del cfg.spec.services[decode_worker_name]
```
726-739: Same guard for the decode-only conversion path.

```diff
-    worker_service = cfg.spec.services["TRTLLMWorker"]
+    if "TRTLLMWorker" not in cfg.spec.services:
+        raise ValueError("Missing TRT-LLM worker service after conversion; ensure decode or prefill worker exists")
+    worker_service = cfg.spec.services["TRTLLMWorker"]
```
789-792: Safer dict access for the TRT-LLM worker service.

If the resolved name isn't present, this will KeyError. Consider `.get` with an explicit error.

```diff
-    worker_service = cfg.spec.services[
-        worker_name
-    ]
+    worker_service = cfg.spec.services.get(worker_name)
+    if not worker_service:
+        raise ValueError(f"TRT-LLM worker service '{worker_name}' not found in config")
```
368-368: TODO: Frontend resolution by componentType.

Acknowledged. Consider adding a helper similar to `get_service_name_from_sub_component_type` for the frontend when you tackle this.
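One possible shape for such a helper, mirroring the existing one; the `componentType` field name and the `services` value type are assumptions, not confirmed by the PR:

```python
def get_service_name_from_component_type(services: dict, component_type: str):
    """Hypothetical mirror of get_service_name_from_sub_component_type."""
    matches = [
        name
        for name, service in services.items()
        if getattr(service, "componentType", None) == component_type
    ]
    if len(matches) > 1:
        raise ValueError(f"Multiple services with componentType {component_type} found")
    return matches[0] if matches else None
```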
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (29)
- benchmarks/profiler/profile_sla.py (6 hunks)
- benchmarks/profiler/tests/config_test.py (1 hunks)
- benchmarks/profiler/utils/config.py (21 hunks)
- components/backends/sglang/deploy/disagg_planner.yaml (2 hunks)
- components/backends/trtllm/deploy/disagg_planner.yaml (2 hunks)
- components/backends/vllm/deploy/disagg_planner.yaml (2 hunks)
- components/planner/src/dynamo/planner/__init__.py (1 hunks)
- components/planner/src/dynamo/planner/defaults.py (1 hunks)
- components/planner/src/dynamo/planner/kube.py (5 hunks)
- components/planner/src/dynamo/planner/kubernetes_connector.py (2 hunks)
- components/planner/src/dynamo/planner/utils/exceptions.py (1 hunks)
- components/planner/src/dynamo/planner/utils/planner_core.py (4 hunks)
- components/planner/src/dynamo/planner/virtual_connector.py (6 hunks)
- components/planner/test/kube.py (9 hunks)
- components/planner/test/kubernetes_connector.py (8 hunks)
- deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml (4 hunks)
- deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml (4 hunks)
- deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (1 hunks)
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml (4 hunks)
- deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (4 hunks)
- deploy/cloud/operator/internal/consts/consts.go (1 hunks)
- deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (1 hunks)
- deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (3 hunks)
- deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1 hunks)
- deploy/cloud/operator/internal/dynamo/graph.go (1 hunks)
- deploy/cloud/operator/internal/dynamo/graph_test.go (7 hunks)
- docs/kubernetes/sla_planner_deployment.md (4 hunks)
- tests/planner/perf_test_configs/disagg_8b_planner.yaml (2 hunks)
- tests/planner/scaling/disagg_planner.yaml (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-07-18T16:05:05.534Z
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.
Applied to files:
deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml
deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml
deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml
📚 Learning: 2025-09-04T19:03:06.643Z
Learnt from: biswapanda
PR: ai-dynamo/dynamo#2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.
Applied to files:
deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml
deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml
📚 Learning: 2025-07-18T16:04:47.465Z
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.465Z
Learning: The `stopSignal` field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.
Applied to files:
deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml
🧬 Code graph analysis (14)
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller.go (2)
- components/planner/src/dynamo/planner/kubernetes_connector.py (1): `SubComponentType` (38-40)
- deploy/cloud/operator/internal/consts/consts.go (1): `KubeLabelDynamoSubComponentType` (40-40)

deploy/cloud/operator/api/v1alpha1/dynamocomponentdeployment_types.go (1)
- components/planner/src/dynamo/planner/kubernetes_connector.py (1): `SubComponentType` (38-40)

components/planner/test/kube.py (2)
- components/planner/src/dynamo/planner/kube.py (5): `KubernetesAPI` (40-137), `update_graph_replicas` (80-92), `is_deployment_ready` (94-102), `wait_for_graph_deployment_ready` (104-137), `get_graph_deployment` (63-78)
- components/planner/src/dynamo/planner/utils/exceptions.py (1): `DynamoGraphDeploymentNotFoundError` (35-53)

deploy/cloud/operator/internal/dynamo/graph.go (2)
- components/planner/src/dynamo/planner/kubernetes_connector.py (1): `SubComponentType` (38-40)
- deploy/cloud/operator/internal/consts/consts.go (1): `KubeLabelDynamoSubComponentType` (40-40)

components/planner/src/dynamo/planner/__init__.py (1)
- components/planner/src/dynamo/planner/kubernetes_connector.py (2): `TargetReplica` (50-53), `SubComponentType` (38-40)

deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (2)
- deploy/cloud/operator/internal/consts/consts.go (2): `ComponentTypeWorker` (53-53), `KubeLabelDynamoSubComponentType` (40-40)
- components/planner/src/dynamo/planner/kubernetes_connector.py (1): `SubComponentType` (38-40)

components/planner/src/dynamo/planner/virtual_connector.py (2)
- components/planner/src/dynamo/planner/kubernetes_connector.py (6): `SubComponentType` (38-40), `TargetReplica` (50-53), `wait_for_deployment_ready` (126-136), `verify_prefill_and_decode_components_exist` (100-124), `add_component` (67-81), `remove_component` (83-98)
- components/planner/src/dynamo/planner/utils/exceptions.py (1): `DeploymentValidationError` (100-110)

benchmarks/profiler/profile_sla.py (1)
- benchmarks/profiler/utils/config.py (8): `set_config_tp_size` (164-165, 290-335, 523-567, 782-830), `get_kv_cache_size_from_dynamo_log` (176-177, 401-427, 632-651, 896-930)

benchmarks/profiler/tests/config_test.py (1)
- benchmarks/profiler/utils/config.py (20): `Config` (72-75), `SGLangConfigModifier` (430-651), `TrtllmConfigModifier` (654-930), `VllmV1ConfigModifier` (197-427), `Service` (56-61), `get_service_name_from_sub_component_type` (182-193), `break_arguments` (78-90), `parse_override_engine_args` (136-155), `get_model_name` (168-169, 338-366, 570-598, 833-876), `convert_config` (160-161, 199-287, 432-520, 656-779), `set_config_tp_size` (164-165, 290-335, 523-567, 782-830)

components/planner/src/dynamo/planner/utils/planner_core.py (2)
- components/planner/src/dynamo/planner/kubernetes_connector.py (6): `KubernetesConnector` (56-214), `TargetReplica` (50-53), `SubComponentType` (38-40), `set_component_replicas` (138-169), `verify_prefill_and_decode_components_exist` (100-124), `wait_for_deployment_ready` (126-136)
- components/planner/src/dynamo/planner/virtual_connector.py (4): `VirtualConnector` (29-328), `set_component_replicas` (306-328), `verify_prefill_and_decode_components_exist` (115-126), `wait_for_deployment_ready` (95-113)

deploy/cloud/operator/internal/dynamo/graph_test.go (2)
- components/planner/src/dynamo/planner/kubernetes_connector.py (1): `SubComponentType` (38-40)
- deploy/cloud/operator/internal/consts/consts.go (2): `KubeLabelDynamoSubComponentType` (40-40), `ComponentTypeWorker` (53-53)

components/planner/src/dynamo/planner/kubernetes_connector.py (3)
- components/planner/src/dynamo/planner/utils/exceptions.py (5): `DeploymentValidationError` (100-110), `DuplicateSubComponentError` (80-97), `EmptyTargetReplicasError` (113-122), `ComponentError` (56-62), `SubComponentNotFoundError` (65-77)
- benchmarks/profiler/utils/config.py (1): `Service` (56-61)
- components/planner/src/dynamo/planner/kube.py (5): `KubernetesAPI` (40-137), `get_graph_deployment` (63-78), `update_graph_replicas` (80-92), `wait_for_graph_deployment_ready` (104-137), `is_deployment_ready` (94-102)

components/planner/test/kubernetes_connector.py (3)
- components/planner/src/dynamo/planner/utils/exceptions.py (5): `DeploymentValidationError` (100-110), `DuplicateSubComponentError` (80-97), `DynamoGraphDeploymentNotFoundError` (35-53), `EmptyTargetReplicasError` (113-122), `SubComponentNotFoundError` (65-77)
- components/planner/src/dynamo/planner/kubernetes_connector.py (9): `KubernetesConnector` (56-214), `SubComponentType` (38-40), `TargetReplica` (50-53), `get_service_from_sub_component_type_or_name` (173-214), `number_replicas` (47-48), `add_component` (67-81), `remove_component` (83-98), `set_component_replicas` (138-169), `verify_prefill_and_decode_components_exist` (100-124)
- components/planner/src/dynamo/planner/kube.py (4): `is_deployment_ready` (94-102), `get_graph_deployment` (63-78), `update_graph_replicas` (80-92), `wait_for_graph_deployment_ready` (104-137)

components/planner/src/dynamo/planner/kube.py (2)
- components/planner/src/dynamo/planner/utils/exceptions.py (1): `DynamoGraphDeploymentNotFoundError` (35-53)
- lib/bindings/python/src/dynamo/runtime/logging.py (1): `configure_dynamo_logging` (77-107)
🪛 Ruff (0.13.1)
components/planner/test/kube.py
250-250: Unused function argument: mock_custom_api
(ARG001)
264-264: Unused function argument: mock_custom_api
(ARG001)
components/planner/src/dynamo/planner/virtual_connector.py
115-115: Unused method argument: prefill_component_name
(ARG002)
115-115: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
115-115: Unused method argument: decode_component_name
(ARG002)
115-115: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
components/planner/src/dynamo/planner/kubernetes_connector.py
100-100: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
100-100: PEP 484 prohibits implicit Optional
Convert to T | None
(RUF013)
components/planner/src/dynamo/planner/kube.py
77-77: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
benchmarks/profiler/utils/config.py
189-189: Avoid specifying long messages outside the exception class
(TRY003)
🪛 markdownlint-cli2 (0.18.1)
docs/kubernetes/sla_planner_deployment.md
16-16: Link text should be descriptive
(MD059, descriptive-link-text)
132-132: Link text should be descriptive
(MD059, descriptive-link-text)
🔇 Additional comments (31)
docs/kubernetes/sla_planner_deployment.md (1)
30-30: LGTM: the kube-prometheus-stack prerequisite is clear

The prerequisite aligns with removing embedded Prometheus from planner deployments.
components/planner/src/dynamo/planner/utils/planner_core.py (2)
14-14: Ensure re-exports or import directly from submodules
Importing `KubernetesConnector`, `VirtualConnector`, `TargetReplica`, and `SubComponentType` from the `dynamo.planner` root requires them to be exposed in `dynamo/planner/__init__.py`. Confirm they're listed in `__all__` (or explicitly imported), or switch to importing each from its defining submodule.
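A sketch of what that exposure could look like; the import sources follow the code graph in this review, but the exact `__all__` contents are an assumption:

```python
# components/planner/src/dynamo/planner/__init__.py (illustrative)
from dynamo.planner.kubernetes_connector import (
    KubernetesConnector,
    SubComponentType,
    TargetReplica,
)
from dynamo.planner.virtual_connector import VirtualConnector

__all__ = ["KubernetesConnector", "SubComponentType", "TargetReplica", "VirtualConnector"]
```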
96-98: Attribute access is safe; all component classes define both `prefill_worker_k8s_name` and `decode_worker_k8s_name`.

deploy/cloud/operator/internal/dynamo/graph_test.go (1)
65-69: Sub-component labeling and propagation look correct.

Usage of SubComponentType in specs and KubeLabelDynamoSubComponentType in labels is consistent and covered across single-node and multinode cases. Good coverage.
Also applies to: 110-115, 1093-1095, 1246-1247, 1650-1652, 1774-1776, 1925-1927
components/planner/src/dynamo/planner/kubernetes_connector.py (2)
38-54: The shift to SubComponentType- and TargetReplica-based scaling is solid.

Enum and models are clear and typed; fallback logic is handled elsewhere. Nice.
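An approximate reconstruction of those types from the diffs in this review; the pydantic base and the field default are assumptions:

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel

class SubComponentType(Enum):
    PREFILL = "prefill"
    DECODE = "decode"

class TargetReplica(BaseModel):
    sub_component_type: SubComponentType
    component_name: Optional[str] = None  # falls back to hardcoded names when unset
    desired_replicas: int
```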
147-151: Confirm the readiness gate before scaling won't stall recovery.

Skipping updates when the deployment isn't Ready may block corrective scaling during convergence. If intentional, consider documenting the rationale, or allow an override flag (see the sketch below).
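A hypothetical escape hatch; the signature and attribute names are illustrative, not the connector's actual API:

```python
async def set_component_replicas(self, target_replicas, blocking=True, force=False):
    deployment = await self.k8s_api.get_graph_deployment(self.deployment_name)
    if not force and not self.k8s_api.is_deployment_ready(deployment):
        # Default behavior: skip updates while converging; force=True overrides.
        logger.warning("Deployment not ready; skipping replica update (force=True to override)")
        return
    ...
```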
components/planner/src/dynamo/planner/__init__.py (1)
11-13: The public API exports look right.

Re-exporting SubComponentType and TargetReplica aligns callers with the new interface.
Also applies to: 17-17
deploy/cloud/operator/internal/controller/dynamographdeployment_controller.go (1)
128-129: Improved status update logging.

Including the CRD, state, reason, and message will help triage status write failures.
components/backends/vllm/deploy/disagg_planner.yaml (1)
70-71: Correctly typed subComponentType for vLLM workers.

Values match planner expectations: decode/prefill.
If applicable, ensure sglang/trtllm disagg configs also set subComponentType consistently across workers.
Also applies to: 96-97
deploy/cloud/operator/internal/consts/consts.go (1)
40-40: LGTM: new label constant

The key name aligns with existing label conventions.
tests/planner/scaling/disagg_planner.yaml (1)
70-70: LGTM: subComponentType values match the planner enum

decode/prefill align with the planner's expected values.
Also applies to: 94-94
deploy/cloud/operator/internal/controller/dynamocomponentdeployment_controller_test.go (2)
787-787: The leader pod now carries the SubComponentType label: LGTM

Good to see the label propagated to leader pods; it helps selectors/observability stay consistent. Ensure any consumers expect it on leaders too.

898-898: Worker pod SubComponentType label: LGTM

The expectation matches the new API surface and consts.
deploy/cloud/operator/config/crd/bases/nvidia.com_dynamocomponentdeployments.yaml (1)
686-688: Docs tweak LGTM

The description casing/readability improvement looks good.
components/backends/sglang/deploy/disagg_planner.yaml (1)
70-70: subComponentType annotations for workers — LGTM

The additions align with planner/kube lookup by subComponentType, and the runtime args (--disaggregation-mode) are consistent.
Confirm the CRD/operator version deployed in your clusters accepts spec.services.*.subComponentType so these manifests won’t be rejected.
Also applies to: 102-102
components/backends/trtllm/deploy/disagg_planner.yaml (1)
88-88: Explicit subComponentType for TRT-LLM workers — LGTM

Consistent with planner/profiler changes and fallback naming. Probes/args already reflect decode/prefill modes.
Double‑check WORKER_COMPONENT_NAMES["trtllm"] maps to TRTLLMDecodeWorker/TRTLLMPrefillWorker to keep fallback behavior sane.
Also applies to: 137-137
tests/planner/perf_test_configs/disagg_8b_planner.yaml (1)
104-104: vLLM workers labeled with subComponentType — LGTM

Matches planner/kube and profiler expectations.
Also applies to: 161-161
benchmarks/profiler/profile_sla.py (2)
490-491: Best prefill TP application — LGTM

The signature update with the explicit "prefill" target looks correct.

562-563: Best decode TP application — LGTM

The signature update with the explicit "decode" target looks correct.
components/planner/test/kube.py (5)
79-91: update_graph_replicas test — LGTM

Validates the patch payload/path precisely.

105-106: is_deployment_ready tests — LGTM

Covers True/False paths via conditions parsing.

Also applies to: 123-124

182-194: Not-found path coverage — LGTM

Asserts DynamoGraphDeploymentNotFoundError and validates contextual fields.

20-21: Import client — LGTM

Needed for ApiException checks.

24-24: Using the public exception type in tests — LGTM

Validates the new error surface.
deploy/cloud/operator/config/crd/bases/nvidia.com_dynamographdeployments.yaml (1)
10356-10359: Constrain subComponentType with an enum in both CRDs

An open string is error-prone; replace the plain string with an enum of known values (e.g., "prefill", "decode") in both nvidia.com_dynamographdeployments.yaml and nvidia.com_dynamocomponentdeployments.yaml. Confirm the full list of valid subComponentType values in the operator code and update accordingly.

```diff
 subComponentType:
-  description: SubComponentType indicates the sub-role of this component (for example, "prefill").
-  type: string
+  description: SubComponentType indicates the sub-role of this component (e.g., "prefill" or "decode").
+  type: string
+  enum:
+    - prefill
+    - decode
```
396-426: Nice coverage of duplicate/missing sub-componentsGood validation of combined error aggregation and message content for duplicates and missing components.
components/planner/src/dynamo/planner/kube.py (1)
94-103: LGTM: readiness check is correctThe Ready condition detection is straightforward and robust.
components/planner/src/dynamo/planner/virtual_connector.py (1)
95-114: LGTM: non-K8s readiness loop is sensibleThe readiness polling mirrors K8s behavior without raising; good parity.
benchmarks/profiler/tests/config_test.py (1)
1-13: Solid, comprehensive test coverageGood breadth across VLLM/SGLang/TRT-LLM paths with and without subComponentType, including TP updates and arg translation.
benchmarks/profiler/utils/config.py (2)
933-937: Approve code changes
All call sites passconfigas the first argument toget_kv_cache_size_from_dynamo_log; updating the Protocol signature remains the only required change.
28-28: WORKER_COMPONENT_NAMES['trtllm'] defines decode_worker_k8s_name – verified that TrtllmComponentName includes decode_worker_k8s_name (and related attributes), so the lookup is safe.
Force-pushed from 975027e to a972063
Force-pushed from a972063 to 1c7ba91
Force-pushed from 1e48d4f to 20504a2
tedzhouhk left a comment:
Can we update the planner doc to note that the planner now requires the cluster to have a cluster-wide Prometheus server set up, and provide instructions or point to the existing doc?
I updated
Overview:
This PR adds a field `subComponentType` to the DGD service API so that the planner component can determine the `prefill` and `decode` services (components) to scale.

Details:

- `benchmark/profiler`: updates the profiler to support using `subComponentType` to modify configs for prefill/decode profiling. Has backwards support for the hardcoded component names per backend framework.
- `disagg_planner.yaml`: removes the `Prometheus` service definitions in favor of using the `kube-prometheus-stack` k8s installation; adds the `subComponentType` field.
- `components/planner/src/dynamo/planner`: updates `kubernetes_connector.py` to use the `subComponentType` field, with fallback logic for hardcoded component names (see the sketch after this list); updates `planner_core.py` to verify that the DGD deployment has a prefill and a decode component defined (via `subComponentType` or hardcoded component names) before starting the planner loop, ensuring we fail fast when a DGD is invalid from the planner's perspective.
- `deploy/cloud`: defines `subComponentType` in the API as an optional field. Adds a sub-component-type label to downstream k8s resources.
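A hedged sketch of that resolution order; the helper name is illustrative, and the actual logic in `kubernetes_connector.py` is more involved:

```python
def resolve_service_name(services: dict, sub_component_type: str, fallback_name: str) -> str:
    """Prefer an explicit subComponentType match, then the hardcoded name."""
    for name, service in services.items():
        if service.get("subComponentType", "") == sub_component_type:
            return name
    # Backwards compatibility: fall back to the backend's hardcoded component name.
    if fallback_name in services:
        return fallback_name
    raise KeyError(f"No service found for sub-component '{sub_component_type}'")
```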
Edited:

- `Dockerfile*`: removes the prometheus dependency, as `components/planner/src/dynamo/planner/prometheus.py` has been removed. That module would start a prometheus server as a component of the DGD, but it has been deprecated in favor of the cluster-wide prometheus helm installation.

Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)