Skip to content

Conversation

@tedzhouhk
Copy link
Contributor

@tedzhouhk tedzhouhk commented Sep 10, 2025

  • add an --override-engine-args argument in trtllm to override flags in yaml file
  • support trtllm in pre-deployment sweeping, add test
  • support trtllm in sla planner
  • update docs

Summary by CodeRabbit

  • New Features

    • Added TensorRT-LLM support to the SLA-Based Planner and profiler backend options.
    • Introduced Disaggregated Planner deployment pattern for TRT-LLM.
    • Added JSON-based engine-args overrides and dynamic namespace support via environment variable.
    • Improved CLI/argument handling for robust quoting and JSON payloads.
  • Documentation

    • Updated support matrices to mark TRT-LLM as supported.
    • Added deployment guidance and commands for additional backends.
  • Chores

    • Enhanced Docker images: bundled deploy artifacts, installed common deps, and improved dev startup experience.
  • Tests

    • Added dry-run test coverage for the TRT-LLM backend in the profiler.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 10, 2025

Walkthrough

Adds TRT-LLM as a supported backend across planner, profiler, runtime, and docs. Introduces config utilities, CLI flags, and deployment manifests for disaggregated planner workflows. Implements JSON-based engine-args overrides and dynamic namespace endpoints for TRT-LLM. Updates Dockerfiles, profiling tests, and documentation tables to reflect support.

Changes

Cohort / File(s) Summary
Docs: Support matrices and guides
README.md, components/backends/trtllm/README.md, docs/architecture/planner_intro.rst, docs/benchmarks/pre_deployment_profiling.md, docs/guides/dynamo_deploy/sla_planner_deployment.md
Mark SLA-Based Planner/TensorRT-LLM as supported; add commented deploy commands; clarify expected pods; no code changes.
Profiler: Backend options and tests
benchmarks/profiler/profile_sla.py, tests/profiler/test_profile_sla_dryrun.py
Add "trtllm" to --backend choices and help text; add dry-run test for trtllm mirroring existing cases.
Profiler: Config utilities and TRT-LLM modifier
benchmarks/profiler/utils/config.py
Switch arg parsing to shlex; add TrtllmConfigModifier (convert_config, set_config_tp_size, getters, KV cache parsing); export via CONFIG_MODIFIERS and all.
TRT-LLM runtime: Config/CLI and overrides
components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py, components/backends/trtllm/src/dynamo/trtllm/main.py
Add DYN_NAMESPACE-driven endpoints; add --override-engine-args and Config.override_engine_args; deep-merge JSON overrides into engine args with error exit on parse failure.
Planner: Backend registration
components/planner/src/dynamo/planner/defaults.py, components/planner/src/dynamo/planner/utils/planner_argparse.py
Define TrtllmComponentName and register in WORKER_COMPONENT_NAMES; allow "trtllm" in planner CLI backend choices.
Deploy: Disaggregated planner
components/backends/trtllm/deploy/README.md, components/backends/trtllm/deploy/disagg_planner.yaml
Document and add YAML for disaggregated planner deployment with Frontend, Planner, Prometheus, and separate prefill/decode workers; PVC mount for profiling results.
Containers: Build/runtime adjustments
container/Dockerfile.sglang, container/Dockerfile.trtllm
Copy deploy assets into images; install deps via requirements.txt; set banner/entrypoint for dev TRT-LLM image.
Misc: Type-check hygiene
deploy/utils/dynamo_deployment.py
Remove type: ignore from aiofiles import.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant CLI as trtllm_utils.py (CLI)
  participant C as Config
  participant M as main.py (init)
  participant E as TRT-LLM Engine

  U->>CLI: run dynamo.trtllm ... --override-engine-args '{"engine":{"tp":2}}'
  CLI->>C: parse args, set override_engine_args
  C-->>M: Config with override_engine_args
  M->>M: parse JSON (override-engine-args)
  alt JSON parse ok
    M->>M: deep_merge overrides into arg_map
    M->>E: start with merged engine args
    E-->>U: ready
  else JSON parse error
    M-->>U: log error and exit(1)
  end
Loading
sequenceDiagram
  autonumber
  actor Client
  participant FE as Frontend (HTTP 8000)
  participant Planner as SLA Planner
  participant Prom as Prometheus
  participant Pref as TRTLLMPrefillWorker
  participant Dec as TRTLLMDecodeWorker
  note over Planner,Prom: Profiling results PVC mounted (/workspace/profiling_results)

  Client->>FE: request generate
  FE->>Planner: query capacity/targets
  Planner->>Prom: scrape metrics
  Prom-->>Planner: metrics
  Planner-->>FE: routing decision (prefill/decode)
  FE->>Pref: prefill (disagg mode)
  Pref-->>FE: prefill outputs
  FE->>Dec: decode
  Dec-->>FE: tokens
  FE-->>Client: response
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Pre-merge checks (1 passed, 2 warnings)

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The current description consists only of high-level bullet points and does not follow the repository’s required template. It is missing the “Overview,” “Details,” “Where should the reviewer start?”, and “Related Issues” sections, so reviewers lack structured context about intent, specific changes, entry points, and issue linkage. This omission reduces clarity and makes it harder to verify completeness against the PR objectives. Please restructure the description using the provided template by adding an “Overview” section summarizing the PR’s purpose, a “Details” section describing each change, a “Where should the reviewer start?” section listing key files, and a “Related Issues” section with action keywords (e.g., “Closes #xxx”) to link relevant GitHub issues. Completing these sections will ensure consistency, improve reviewer guidance, and enhance traceability.
Docstring Coverage ⚠️ Warning Docstring coverage is 27.78% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (1 passed)
Check name Status Explanation
Title Check ✅ Passed The title “feat: support trtllm in sla-planner” clearly indicates the addition of trtllm support within the SLA planner component, which is indeed one aspect of the PR. However, the changes also introduce an --override-engine-args CLI flag, extend pre-deployment sweeping with new tests, and update a variety of documentation files, so the title only partially reflects the overall scope. As a result, while relevant, it does not fully convey the breadth of enhancements in this pull request.

Poem

A rabbit taps the deploy drum, hop-hop—
Prefill, decode, the planners don’t stop.
Namespaces dance, engines align,
JSON whispers, overrides fine.
Pods wake up in tidy rows—
SLA moonlight, throughput glows.
Carrot-coded, the pipeline flows. 🥕✨

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
docs/benchmarks/pre_deployment_profiling.md (1)

155-174: Add trtllm to backend choices in the example YAML args

The docs still show only vllm/sglang; include trtllm to avoid confusion.

Apply this diff:

           - --backend
-            - <vllm/sglang>
+            - <vllm/sglang/trtllm>
components/planner/src/dynamo/planner/defaults.py (1)

57-70: Fix fallback Prometheus endpoint: missing scheme and port.

When k8s_namespace is empty/default, the function returns just <namespace>-prometheus (no scheme/port), which breaks HTTP clients and is inconsistent with the non-default branch.

Apply:

-        return f"{namespace}-prometheus"
+        # Fallback to in-cluster service DNS without namespace and include port
+        return f"http://{namespace}-prometheus:{port}"
components/backends/trtllm/deploy/README.md (1)

235-236: Fix broken link to client section in trtllm README
The anchor #client in components/backends/trtllm/deploy/README.md (lines 235–236) does not exist in components/backends/vllm/README.md. Update the link to point at the correct section (e.g. #vllm-quick-start) or add a “Client” heading in the vLLM README.

🧹 Nitpick comments (22)
container/Dockerfile.sglang (1)

239-247: Remove duplicate COPY of deploy directory.

deploy/ is copied twice in the runtime stage (Line 240 and Line 246). The new line introduces a redundant layer, needlessly increasing image size and cache bust frequency. Keep only one COPY.

If deploy artifacts are also needed in the framework stage, confirm and add a single COPY there instead of duplicating in runtime.

Apply:

 COPY examples /workspace/examples
- COPY deploy /workspace/deploy
 RUN uv pip install /workspace/benchmarks

 # Copy benchmarks, backends and tests for CI
 COPY tests /workspace/tests
 COPY benchmarks /workspace/benchmarks
 COPY deploy /workspace/deploy
docs/benchmarks/pre_deployment_profiling.md (1)

52-58: Specify code-fence language for sample output

Add a language to satisfy MD040 and improve rendering.

Apply this diff:

-```
+```text
 2025-05-16 15:20:24 - __main__ - INFO - Analyzing results and generate recommendations...
 ...

</blockquote></details>
<details>
<summary>components/planner/src/dynamo/planner/utils/planner_argparse.py (1)</summary><blockquote>

`40-44`: **Avoid drift by deriving backend choices from a single source of truth**

Use the central mapping (defaults.WORKER_COMPONENT_NAMES) for CLI choices so new backends don’t require multiple edits.

Apply this diff:

```diff
-from dynamo.planner.defaults import SLAPlannerDefaults
+from dynamo.planner.defaults import SLAPlannerDefaults, WORKER_COMPONENT_NAMES
@@
-    parser.add_argument(
+    parser.add_argument(
         "--backend",
         default=SLAPlannerDefaults.backend,
-        choices=["vllm", "sglang", "trtllm"],
+        choices=sorted(WORKER_COMPONENT_NAMES.keys()),
         help="Backend type",
     )
docs/guides/dynamo_deploy/sla_planner_deployment.md (1)

50-57: Add code-fence language to the “Expected pods” block

Small docs lint fix (MD040). Recommend text/console.

Apply this diff:

-```
+```text
 # For vLLM:
 vllm-disagg-planner-frontend-*            1/1 Running
 vllm-disagg-planner-prometheus-*          1/1 Running
 vllm-disagg-planner-planner-*             1/1 Running
 vllm-disagg-planner-backend-*             1/1 Running
 vllm-disagg-planner-prefill-*             1/1 Running

</blockquote></details>
<details>
<summary>container/Dockerfile.trtllm (2)</summary><blockquote>

`332-335`: **Install via uv for consistency with the rest of the image**

Optional: use uv pip here (as elsewhere) to avoid mixing installers and to leverage uv’s resolver/caching.

Apply this diff:

```diff
-RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
-    pip install --requirement /tmp/requirements.txt
+RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
+    uv pip install --requirement /tmp/requirements.txt

505-505: Copying entire deploy/ into runtime may bloat the image

If only a subset is needed at runtime/CI, consider copying just required subpaths (e.g., deploy/utils or specific manifests) to keep image size down.

components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (4)

17-23: Unify env var naming for namespace.

Elsewhere we use DYNAMO_NAMESPACE; here we read DYN_NAMESPACE. Recommend reading the same var for consistency.

-DYN_NAMESPACE = os.environ.get("DYN_NAMESPACE", "dynamo")
+DYN_NAMESPACE = os.environ.get("DYNAMO_NAMESPACE", os.environ.get("DYN_NAMESPACE", "dynamo"))

222-227: Clarify flag semantics: it's JSON, not a Python dict.

Help string says “Python dictionary string” but we parse with json.loads. Avoid ambiguity and guide quoting.

-        help='Python dictionary string to override specific engine arguments from the YAML file. Example: \'{"tensor_parallel_size": 2, "kv_cache_config": {"enable_block_reuse": false}}\'',
+        help='JSON string to override specific engine arguments from the YAML file. Example: \'{"tensor_parallel_size": 2, "kv_cache_config": {"enable_block_reuse": false}}\'',

241-259: Nit: boolean flag parsing.

--use-nixl-connect uses type=bool; typical CLI UX is action="store_true". Optional to adjust.

-    parser.add_argument(
-        "--use-nixl-connect",
-        type=bool,
-        default=False,
-        help="Use NIXL Connect for communication between workers.",
-    )
+    parser.add_argument(
+        "--use-nixl-connect",
+        action="store_true",
+        help="Use NIXL Connect for communication between workers.",
+    )

300-371: Use enum .value for argparse defaults
Replace

default=DEFAULT_DISAGGREGATION_MODE

and

default=DEFAULT_DISAGGREGATION_STRATEGY

with

default=DEFAULT_DISAGGREGATION_MODE.value

and

default=DEFAULT_DISAGGREGATION_STRATEGY.value

to align the str type and choices with the actual default values.

components/backends/trtllm/src/dynamo/trtllm/main.py (1)

242-252: Logging engine args is helpful; consider redaction if sensitive.

If args may include secrets in YAML, redact before logging.

benchmarks/profiler/utils/config.py (4)

812-839: Narrow exception and improve parsing resilience.

Catching bare Exception is noisy (BLE001). Also handle large logs safely.

-        except Exception as e:
-            logger.warning(f"Failed to parse KV cache size from log file. Error: {e}")
+        except (OSError, ValueError, re.error) as e:
+            logger.warning(f"Failed to parse KV cache size from log file. Error: {e}")

565-701: Optional: avoid hard-coded TRT-LLM worker names.

You already import WORKER_COMPONENT_NAMES; using it reduces drift if names change.

-            if "TRTLLMPrefillWorker" in cfg.spec.services:
+            prefill_name = WORKER_COMPONENT_NAMES["trtllm"].prefill_worker_k8s_name
+            if prefill_name in cfg.spec.services:
-                cfg.spec.services["TRTLLMWorker"] = cfg.spec.services[
-                    "TRTLLMPrefillWorker"
-                ]
-                del cfg.spec.services["TRTLLMPrefillWorker"]
+                cfg.spec.services["TRTLLMWorker"] = cfg.spec.services[prefill_name]
+                del cfg.spec.services[prefill_name]
...
-            if "TRTLLMDecodeWorker" in cfg.spec.services:
+            decode_name = WORKER_COMPONENT_NAMES["trtllm"].decode_worker_k8s_name
+            if decode_name in cfg.spec.services:
-                cfg.spec.services["TRTLLMWorker"] = cfg.spec.services[
-                    "TRTLLMDecodeWorker"
-                ]
-                del cfg.spec.services["TRTLLMDecodeWorker"]
+                cfg.spec.services["TRTLLMWorker"] = cfg.spec.services[decode_name]
+                del cfg.spec.services[decode_name]

82-95: Tiny improvement to remove_valued_arguments.

Currently removes only the first occurrence. If args can be duplicated, consider a loop.

 def remove_valued_arguments(args: list[str], key: str) -> list[str]:
-    """Remove a valued argument (e.g., --key value) from the arguments list if exists."""
-    if key in args:
-        idx = args.index(key)
-        if idx + 1 < len(args):
-            del args[idx : idx + 2]
+    """Remove all occurrences of a valued argument (e.g., --key value)."""
+    while True:
+        try:
+            idx = args.index(key)
+            del args[idx : min(idx + 2, len(args))]
+        except ValueError:
+            break
     return args

24-31: Logger setup ok; avoid duplicate handlers in repeated imports.

If this module can be imported multiple times, guard handler addition.

-if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
-    logger.addHandler(console_handler)
+if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
+    logger.addHandler(console_handler)

(Ensure a guard is present; if not, add it.)

components/backends/trtllm/deploy/README.md (3)

45-54: Add this new pattern to the “Choose Your Template” list and cross-link the YAML.

Readers won’t discover disagg_planner.yaml from the “Choose Your Template” section. Add a bullet there to keep docs consistent.

Example insertion to Lines 132-138:

- - Use `disagg_router.yaml` for high-performance with KV cache routing and disaggregation
+ - Use `disagg_router.yaml` for high-performance with KV cache routing and disaggregation
+ - Use `disagg_planner.yaml` for SLA-based auto-scaling with a separate planner and Prometheus

55-56: Call out the required PVC for the planner.

disagg_planner.yaml expects a pre-created PVC named dynamo-pvc. Mention it here to prevent failed deploys.

 > [!NOTE]
 > This deployment requires pre-deployment profiling to be completed first. See [Pre-Deployment Profiling](../../../../docs/benchmarks/pre_deployment_profiling.md) for detailed instructions.
+>
+> Additionally, ensure a PersistentVolumeClaim named `dynamo-pvc` exists in the target namespace; the planner mounts it at `/workspace/profiling_results`.

193-194: Fix port-forward target; current example mixes Deployment and Pod naming.

Use the Service or Deployment name without pod UID. The service created by the CRD is typically <deployment-name>-frontend.

-kubectl port-forward deployment/trtllm-v1-disagg-frontend-<pod-uuid-info> 8000:8000
+# Prefer forwarding the Service:
+kubectl port-forward svc/trtllm-disagg-frontend 8000:8000
+# Or forward the Deployment (no pod UID):
+# kubectl port-forward deployment/trtllm-disagg-frontend 8000:8000
tests/profiler/test_profile_sla_dryrun.py (1)

27-49: Avoid hardcoding /tmp; use pytest’s tmp_path for isolation and to silence S108.

This removes shared state and addresses the static-analysis warning.

-    @pytest.fixture
-    def vllm_args(self):
+    @pytest.fixture
+    def vllm_args(self, tmp_path):
@@
-            output_dir = "/tmp/test_profiling_results"
+            output_dir = str(tmp_path / "profiling_results")
@@
-    @pytest.fixture
-    def sglang_args(self):
+    @pytest.fixture
+    def sglang_args(self, tmp_path):
@@
-            output_dir = "/tmp/test_profiling_results"
+            output_dir = str(tmp_path / "profiling_results")
@@
-    @pytest.fixture
-    def trtllm_args(self):
+    @pytest.fixture
+    def trtllm_args(self, tmp_path):
@@
-            output_dir = "/tmp/test_profiling_results"
+            output_dir = str(tmp_path / "profiling_results")

Also applies to: 52-74, 90-114

components/backends/trtllm/deploy/disagg_planner.yaml (3)

117-124: Expose Prometheus port explicitly for clarity and service selectors.

Declare containerPort 8000 to match PROMETHEUS_PORT.

       extraPodSpec:
         mainContainer:
           image: nvcr.io/nvidian/dynamo-dev/dynamo-trtllm-runtime:hzhou-0909-03
           workingDir: /workspace/components/backends/trtllm
+          ports:
+            - name: http
+              containerPort: 8000
           command:
             - python3

21-39: Use a stable, configurable image reference.

Hardcoding a private dev tag (hzhou-0909-03) will break for most users. Parameterize via values/env or reference a public, versioned tag; reflect the same in README.

If you want, I can add a kustomize overlay and a values file to centralize the image/tag.

Also applies to: 72-87, 117-124, 155-172, 189-206


87-94: Avoid using componentType: frontend as a workaround to create a Service.

Prefer a proper componentType or an explicit Service template if the CRD supports it; the current hack may confuse operators.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae38bd4 and ae9595f.

📒 Files selected for processing (17)
  • README.md (1 hunks)
  • benchmarks/profiler/profile_sla.py (1 hunks)
  • benchmarks/profiler/utils/config.py (4 hunks)
  • components/backends/trtllm/README.md (1 hunks)
  • components/backends/trtllm/deploy/README.md (1 hunks)
  • components/backends/trtllm/deploy/disagg_planner.yaml (1 hunks)
  • components/backends/trtllm/src/dynamo/trtllm/main.py (2 hunks)
  • components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (6 hunks)
  • components/planner/src/dynamo/planner/defaults.py (1 hunks)
  • components/planner/src/dynamo/planner/utils/planner_argparse.py (1 hunks)
  • container/Dockerfile.sglang (1 hunks)
  • container/Dockerfile.trtllm (2 hunks)
  • deploy/utils/dynamo_deployment.py (1 hunks)
  • docs/architecture/planner_intro.rst (1 hunks)
  • docs/benchmarks/pre_deployment_profiling.md (1 hunks)
  • docs/guides/dynamo_deploy/sla_planner_deployment.md (1 hunks)
  • tests/profiler/test_profile_sla_dryrun.py (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

  • container/Dockerfile.trtllm
🧬 Code graph analysis (2)
tests/profiler/test_profile_sla_dryrun.py (1)
benchmarks/profiler/profile_sla.py (1)
  • run_profile (56-534)
benchmarks/profiler/utils/config.py (1)
components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (1)
  • Config (28-93)
🪛 Ruff (0.12.2)
components/backends/trtllm/src/dynamo/trtllm/main.py

217-217: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

tests/profiler/test_profile_sla_dryrun.py

97-97: Probable insecure usage of temporary file or directory: "/tmp/test_profiling_results"

(S108)

benchmarks/profiler/utils/config.py

595-597: Avoid specifying long messages outside the exception class

(TRY003)


657-659: Avoid specifying long messages outside the exception class

(TRY003)


726-726: Avoid specifying long messages outside the exception class

(TRY003)


831-831: Do not catch blind exception: Exception

(BLE001)

🪛 markdownlint-cli2 (0.17.2)
docs/guides/dynamo_deploy/sla_planner_deployment.md

50-50: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Build and Test - vllm
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (20)
docs/benchmarks/pre_deployment_profiling.md (1)

17-18: Matrix update looks good

TRT-LLM Dense marked as supported is consistent with the PR scope.

README.md (1)

62-63: SLA-Based Planner support flip to ✅ for TRT-LLM

Docs alignment looks correct with the new backend support.

docs/architecture/planner_intro.rst (1)

47-52: SGLang support appears inconsistent across docs—please verify

README shows SLA Planner supports SGLang (✅), but this table lists SGLang as ❌. If SGLang is supported here, flip to ✅; otherwise, update README for consistency.

If supported, apply:

-   * -
-     - ❌
-     - SGLang
+   * -
+     - ✅
+     - SGLang
components/backends/trtllm/README.md (1)

58-59: Planner support status updated to ✅

Matches the planner/docs changes in this PR.

deploy/utils/dynamo_deployment.py (1)

27-27: Dropping type ignore for aiofiles: ensure typing and packaging are covered

Confirm:

  • aiofiles is installed wherever this utility runs (not only in dev/runtime images).
  • mypy/pyright passes without stubs; otherwise consider adding types or reintroducing a targeted ignore.
docs/guides/dynamo_deploy/sla_planner_deployment.md (1)

37-44: Good: surfaced sglang/trtllm variants

Clearer for users discovering backend-specific manifests.

components/planner/src/dynamo/planner/defaults.py (2)

104-116: TRT-LLM component mapping looks correct for DECODE_FIRST.

Decode-first semantics and endpoints align with runtime naming; no issues spotted.


118-122: Registering 'trtllm' in WORKER_COMPONENT_NAMES is correct.

This enables planner routing; good addition.

components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (3)

49-49: Config surface addition ‘override_engine_args’ looks good.

Field addition and str inclusion are consistent.

Also applies to: 81-81


126-139: Default endpoint help and logic are consistent.

First/next worker defaults are wired correctly.

Also applies to: 317-335


363-364: Wiring the parsed override into Config is correct.

No issues.

benchmarks/profiler/utils/config.py (6)

87-94: Good use of shlex for robust arg splitting.

Handles quoted JSON/paths reliably.


108-110: Joining args with shlex.join is appropriate.

Keeps YAML concise while preserving quoting.


641-699: Overall TRT-LLM convert_config flow looks solid.

Rename/merge, arg normalization, and replicas=1 are consistent.


703-753: TP size override via --override-engine-args is correct.

Also updates GPU requests/limits; good.


755-793: Model name resolution covers aggregated and disagg worker names.

Reasonable fallbacks and warnings.


841-849: Exports look good.

Re-exporting WORKER_COMPONENT_NAMES is convenient for callers.

tests/profiler/test_profile_sla_dryrun.py (1)

115-120: LGTM on the new TRT-LLM dry-run test.

Matches existing vLLM/sglang tests and exercises the new backend path.

benchmarks/profiler/profile_sla.py (1)

552-553: CLI backend choices extended to include trtllm — LGTM.

This aligns with CONFIG_MODIFIERS and new tests.

components/backends/trtllm/deploy/disagg_planner.yaml (1)

41-47: Question: Does the Planner actually need the HuggingFace token?

If not required, drop envFromSecret: hf-token-secret from Planner to reduce secret surface.

biswapanda and others added 7 commits September 10, 2025 09:12
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
This reverts commit f04de1d.

Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
@tedzhouhk tedzhouhk merged commit dcee4db into main Sep 10, 2025
14 of 16 checks passed
@tedzhouhk tedzhouhk deleted the hzhou/trtllm-sweep branch September 10, 2025 21:41
ayushag-nv added a commit that referenced this pull request Sep 15, 2025
Signed-off-by: Biswa Panda <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: ayushag <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Harrison Saturley-Hall <[email protected]>
Signed-off-by: alec-flowers <[email protected]>
Signed-off-by: Julien Mancuso <[email protected]>
Signed-off-by: Pavithra Vijayakrishnan <[email protected]>
Signed-off-by: Harry Kim <[email protected]>
Signed-off-by: Tushar Sharma <[email protected]>
Signed-off-by: GuanLuo <[email protected]>
Signed-off-by: Hannah Zhang <[email protected]>
Signed-off-by: Graham King <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Neal Vaidya <[email protected]>
Co-authored-by: Biswa Panda <[email protected]>
Co-authored-by: Ayush Agarwal <[email protected]>
Co-authored-by: Dillon Cullinan <[email protected]>
Co-authored-by: Harrison Saturley-Hall <[email protected]>
Co-authored-by: alec-flowers <[email protected]>
Co-authored-by: julienmancuso <[email protected]>
Co-authored-by: Pavithra Vijayakrishnan <[email protected]>
Co-authored-by: Harry Kim <[email protected]>
Co-authored-by: Tushar Sharma <[email protected]>
Co-authored-by: Alec <[email protected]>
Co-authored-by: GuanLuo <[email protected]>
Co-authored-by: hhzhang16 <[email protected]>
Co-authored-by: Graham King <[email protected]>
Co-authored-by: Anant Sharma <[email protected]>
Co-authored-by: Greg Clark <[email protected]>
Co-authored-by: Neal Vaidya <[email protected]>
Co-authored-by: nv-nmailhot <[email protected]>
Signed-off-by: ayushag <[email protected]>
zhongdaor-nv pushed a commit that referenced this pull request Sep 15, 2025
Signed-off-by: Biswa Panda <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: ayushag <[email protected]>
Signed-off-by: Dillon Cullinan <[email protected]>
Signed-off-by: Harrison Saturley-Hall <[email protected]>
Signed-off-by: alec-flowers <[email protected]>
Signed-off-by: Julien Mancuso <[email protected]>
Signed-off-by: Pavithra Vijayakrishnan <[email protected]>
Signed-off-by: Harry Kim <[email protected]>
Signed-off-by: Tushar Sharma <[email protected]>
Signed-off-by: GuanLuo <[email protected]>
Signed-off-by: Hannah Zhang <[email protected]>
Signed-off-by: Graham King <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Greg Clark <[email protected]>
Signed-off-by: Neal Vaidya <[email protected]>
Co-authored-by: Biswa Panda <[email protected]>
Co-authored-by: Ayush Agarwal <[email protected]>
Co-authored-by: Dillon Cullinan <[email protected]>
Co-authored-by: Harrison Saturley-Hall <[email protected]>
Co-authored-by: alec-flowers <[email protected]>
Co-authored-by: julienmancuso <[email protected]>
Co-authored-by: Pavithra Vijayakrishnan <[email protected]>
Co-authored-by: Harry Kim <[email protected]>
Co-authored-by: Tushar Sharma <[email protected]>
Co-authored-by: Alec <[email protected]>
Co-authored-by: GuanLuo <[email protected]>
Co-authored-by: hhzhang16 <[email protected]>
Co-authored-by: Graham King <[email protected]>
Co-authored-by: Anant Sharma <[email protected]>
Co-authored-by: Greg Clark <[email protected]>
Co-authored-by: Neal Vaidya <[email protected]>
Co-authored-by: nv-nmailhot <[email protected]>
Signed-off-by: zhongdaor <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.