feat: support trtllm in sla-planner #2980

tedzhouhk · 2025-09-10T01:58:47Z

add an --override-engine-args argument in trtllm to override flags in yaml file
support trtllm in pre-deployment sweeping, add test
support trtllm in sla planner
update docs

Summary by CodeRabbit

New Features
- Added TensorRT-LLM support to the SLA-Based Planner and profiler backend options.
- Introduced Disaggregated Planner deployment pattern for TRT-LLM.
- Added JSON-based engine-args overrides and dynamic namespace support via environment variable.
- Improved CLI/argument handling for robust quoting and JSON payloads.
Documentation
- Updated support matrices to mark TRT-LLM as supported.
- Added deployment guidance and commands for additional backends.
Chores
- Enhanced Docker images: bundled deploy artifacts, installed common deps, and improved dev startup experience.
Tests
- Added dry-run test coverage for the TRT-LLM backend in the profiler.

coderabbitai · 2025-09-10T02:13:12Z

Walkthrough

Adds TRT-LLM as a supported backend across planner, profiler, runtime, and docs. Introduces config utilities, CLI flags, and deployment manifests for disaggregated planner workflows. Implements JSON-based engine-args overrides and dynamic namespace endpoints for TRT-LLM. Updates Dockerfiles, profiling tests, and documentation tables to reflect support.

Changes

Cohort / File(s)	Summary
Docs: Support matrices and guides `README.md`, `components/backends/trtllm/README.md`, `docs/architecture/planner_intro.rst`, `docs/benchmarks/pre_deployment_profiling.md`, `docs/guides/dynamo_deploy/sla_planner_deployment.md`	Mark SLA-Based Planner/TensorRT-LLM as supported; add commented deploy commands; clarify expected pods; no code changes.
Profiler: Backend options and tests `benchmarks/profiler/profile_sla.py`, `tests/profiler/test_profile_sla_dryrun.py`	Add "trtllm" to --backend choices and help text; add dry-run test for trtllm mirroring existing cases.
Profiler: Config utilities and TRT-LLM modifier `benchmarks/profiler/utils/config.py`	Switch arg parsing to shlex; add TrtllmConfigModifier (convert_config, set_config_tp_size, getters, KV cache parsing); export via CONFIG_MODIFIERS and all.
TRT-LLM runtime: Config/CLI and overrides `components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py`, `components/backends/trtllm/src/dynamo/trtllm/main.py`	Add DYN_NAMESPACE-driven endpoints; add --override-engine-args and Config.override_engine_args; deep-merge JSON overrides into engine args with error exit on parse failure.
Planner: Backend registration `components/planner/src/dynamo/planner/defaults.py`, `components/planner/src/dynamo/planner/utils/planner_argparse.py`	Define TrtllmComponentName and register in WORKER_COMPONENT_NAMES; allow "trtllm" in planner CLI backend choices.
Deploy: Disaggregated planner `components/backends/trtllm/deploy/README.md`, `components/backends/trtllm/deploy/disagg_planner.yaml`	Document and add YAML for disaggregated planner deployment with Frontend, Planner, Prometheus, and separate prefill/decode workers; PVC mount for profiling results.
Containers: Build/runtime adjustments `container/Dockerfile.sglang`, `container/Dockerfile.trtllm`	Copy deploy assets into images; install deps via requirements.txt; set banner/entrypoint for dev TRT-LLM image.
Misc: Type-check hygiene `deploy/utils/dynamo_deployment.py`	Remove type: ignore from aiofiles import.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor U as User
  participant CLI as trtllm_utils.py (CLI)
  participant C as Config
  participant M as main.py (init)
  participant E as TRT-LLM Engine

  U->>CLI: run dynamo.trtllm ... --override-engine-args '{"engine":{"tp":2}}'
  CLI->>C: parse args, set override_engine_args
  C-->>M: Config with override_engine_args
  M->>M: parse JSON (override-engine-args)
  alt JSON parse ok
    M->>M: deep_merge overrides into arg_map
    M->>E: start with merged engine args
    E-->>U: ready
  else JSON parse error
    M-->>U: log error and exit(1)
  end

sequenceDiagram
  autonumber
  actor Client
  participant FE as Frontend (HTTP 8000)
  participant Planner as SLA Planner
  participant Prom as Prometheus
  participant Pref as TRTLLMPrefillWorker
  participant Dec as TRTLLMDecodeWorker
  note over Planner,Prom: Profiling results PVC mounted (/workspace/profiling_results)

  Client->>FE: request generate
  FE->>Planner: query capacity/targets
  Planner->>Prom: scrape metrics
  Prom-->>Planner: metrics
  Planner-->>FE: routing decision (prefill/decode)
  FE->>Pref: prefill (disagg mode)
  Pref-->>FE: prefill outputs
  FE->>Dec: decode
  Dec-->>FE: tokens
  FE-->>Client: response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

chore: Update trtllm version to 1.1.0rc3 #2930 — Both touch trtllm main.py, modifying runtime config handling (JSON overrides vs. import changes).
feat: support sglang in sla planner #2421 — Both extend planner backend support, updating WORKER_COMPONENT_NAMES and CLI backend choices.
test: add test for pre-deployment script #2857 — Both modify profiler CLI/config utilities and add backend-specific tests/config modifiers.

Pre-merge checks (1 passed, 2 warnings)

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description Check	⚠️ Warning	The current description consists only of high-level bullet points and does not follow the repository’s required template. It is missing the “Overview,” “Details,” “Where should the reviewer start?”, and “Related Issues” sections, so reviewers lack structured context about intent, specific changes, entry points, and issue linkage. This omission reduces clarity and makes it harder to verify completeness against the PR objectives.	Please restructure the description using the provided template by adding an “Overview” section summarizing the PR’s purpose, a “Details” section describing each change, a “Where should the reviewer start?” section listing key files, and a “Related Issues” section with action keywords (e.g., “Closes #xxx”) to link relevant GitHub issues. Completing these sections will ensure consistency, improve reviewer guidance, and enhance traceability.
Docstring Coverage	⚠️ Warning	Docstring coverage is 27.78% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The title “feat: support trtllm in sla-planner” clearly indicates the addition of trtllm support within the SLA planner component, which is indeed one aspect of the PR. However, the changes also introduce an `--override-engine-args` CLI flag, extend pre-deployment sweeping with new tests, and update a variety of documentation files, so the title only partially reflects the overall scope. As a result, while relevant, it does not fully convey the breadth of enhancements in this pull request.

Poem

A rabbit taps the deploy drum, hop-hop—
Prefill, decode, the planners don’t stop.
Namespaces dance, engines align,
JSON whispers, overrides fine.
Pods wake up in tidy rows—
SLA moonlight, throughput glows.
Carrot-coded, the pipeline flows. 🥕✨

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

docs/benchmarks/pre_deployment_profiling.md (1)
155-174: Add trtllm to backend choices in the example YAML args

The docs still show only vllm/sglang; include trtllm to avoid confusion.

Apply this diff:
           - --backend
-            - <vllm/sglang>
+            - <vllm/sglang/trtllm>
components/planner/src/dynamo/planner/defaults.py (1)
57-70: Fix fallback Prometheus endpoint: missing scheme and port.

When k8s_namespace is empty/default, the function returns just <namespace>-prometheus (no scheme/port), which breaks HTTP clients and is inconsistent with the non-default branch.

Apply:
-        return f"{namespace}-prometheus"
+        # Fallback to in-cluster service DNS without namespace and include port
+        return f"http://{namespace}-prometheus:{port}"
components/backends/trtllm/deploy/README.md (1)

235-236: Fix broken link to client section in trtllm README
The anchor #client in components/backends/trtllm/deploy/README.md (lines 235–236) does not exist in components/backends/vllm/README.md. Update the link to point at the correct section (e.g. #vllm-quick-start) or add a “Client” heading in the vLLM README.

🧹 Nitpick comments (22)

container/Dockerfile.sglang (1)
239-247: Remove duplicate COPY of deploy directory.

deploy/ is copied twice in the runtime stage (Line 240 and Line 246). The new line introduces a redundant layer, needlessly increasing image size and cache bust frequency. Keep only one COPY.

If deploy artifacts are also needed in the framework stage, confirm and add a single COPY there instead of duplicating in runtime.

Apply:
 COPY examples /workspace/examples
- COPY deploy /workspace/deploy
 RUN uv pip install /workspace/benchmarks

 # Copy benchmarks, backends and tests for CI
 COPY tests /workspace/tests
 COPY benchmarks /workspace/benchmarks
 COPY deploy /workspace/deploy
docs/benchmarks/pre_deployment_profiling.md (1)
52-58: Specify code-fence language for sample output

Add a language to satisfy MD040 and improve rendering.

Apply this diff:
-```
+```text
 2025-05-16 15:20:24 - __main__ - INFO - Analyzing results and generate recommendations...
 ...
</blockquote></details>
<details>
<summary>components/planner/src/dynamo/planner/utils/planner_argparse.py (1)</summary><blockquote>

`40-44`: **Avoid drift by deriving backend choices from a single source of truth**

Use the central mapping (defaults.WORKER_COMPONENT_NAMES) for CLI choices so new backends don’t require multiple edits.

Apply this diff:

```diff
-from dynamo.planner.defaults import SLAPlannerDefaults
+from dynamo.planner.defaults import SLAPlannerDefaults, WORKER_COMPONENT_NAMES
@@
-    parser.add_argument(
+    parser.add_argument(
         "--backend",
         default=SLAPlannerDefaults.backend,
-        choices=["vllm", "sglang", "trtllm"],
+        choices=sorted(WORKER_COMPONENT_NAMES.keys()),
         help="Backend type",
     )
docs/guides/dynamo_deploy/sla_planner_deployment.md (1)
50-57: Add code-fence language to the “Expected pods” block

Small docs lint fix (MD040). Recommend text/console.

Apply this diff:
-```
+```text
 # For vLLM:
 vllm-disagg-planner-frontend-*            1/1 Running
 vllm-disagg-planner-prometheus-*          1/1 Running
 vllm-disagg-planner-planner-*             1/1 Running
 vllm-disagg-planner-backend-*             1/1 Running
 vllm-disagg-planner-prefill-*             1/1 Running
</blockquote></details>
<details>
<summary>container/Dockerfile.trtllm (2)</summary><blockquote>

`332-335`: **Install via uv for consistency with the rest of the image**

Optional: use uv pip here (as elsewhere) to avoid mixing installers and to leverage uv’s resolver/caching.

Apply this diff:

```diff
-RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
-    pip install --requirement /tmp/requirements.txt
+RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \
+    uv pip install --requirement /tmp/requirements.txt
505-505: Copying entire deploy/ into runtime may bloat the image

If only a subset is needed at runtime/CI, consider copying just required subpaths (e.g., deploy/utils or specific manifests) to keep image size down.
components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (4)
17-23: Unify env var naming for namespace.

Elsewhere we use DYNAMO_NAMESPACE; here we read DYN_NAMESPACE. Recommend reading the same var for consistency.
-DYN_NAMESPACE = os.environ.get("DYN_NAMESPACE", "dynamo")
+DYN_NAMESPACE = os.environ.get("DYNAMO_NAMESPACE", os.environ.get("DYN_NAMESPACE", "dynamo"))
222-227: Clarify flag semantics: it's JSON, not a Python dict.

Help string says “Python dictionary string” but we parse with json.loads. Avoid ambiguity and guide quoting.
-        help='Python dictionary string to override specific engine arguments from the YAML file. Example: \'{"tensor_parallel_size": 2, "kv_cache_config": {"enable_block_reuse": false}}\'',
+        help='JSON string to override specific engine arguments from the YAML file. Example: \'{"tensor_parallel_size": 2, "kv_cache_config": {"enable_block_reuse": false}}\'',
241-259: Nit: boolean flag parsing.

--use-nixl-connect uses type=bool; typical CLI UX is action="store_true". Optional to adjust.
-    parser.add_argument(
-        "--use-nixl-connect",
-        type=bool,
-        default=False,
-        help="Use NIXL Connect for communication between workers.",
-    )
+    parser.add_argument(
+        "--use-nixl-connect",
+        action="store_true",
+        help="Use NIXL Connect for communication between workers.",
+    )
300-371: Use enum .value for argparse defaults
Replace
default=DEFAULT_DISAGGREGATION_MODE
and
default=DEFAULT_DISAGGREGATION_STRATEGY
with
default=DEFAULT_DISAGGREGATION_MODE.value
and
default=DEFAULT_DISAGGREGATION_STRATEGY.value
to align the str type and choices with the actual default values.
components/backends/trtllm/src/dynamo/trtllm/main.py (1)

242-252: Logging engine args is helpful; consider redaction if sensitive.

If args may include secrets in YAML, redact before logging.
benchmarks/profiler/utils/config.py (4)
812-839: Narrow exception and improve parsing resilience.

Catching bare Exception is noisy (BLE001). Also handle large logs safely.
-        except Exception as e:
-            logger.warning(f"Failed to parse KV cache size from log file. Error: {e}")
+        except (OSError, ValueError, re.error) as e:
+            logger.warning(f"Failed to parse KV cache size from log file. Error: {e}")
565-701: Optional: avoid hard-coded TRT-LLM worker names.

You already import WORKER_COMPONENT_NAMES; using it reduces drift if names change.
-            if "TRTLLMPrefillWorker" in cfg.spec.services:
+            prefill_name = WORKER_COMPONENT_NAMES["trtllm"].prefill_worker_k8s_name
+            if prefill_name in cfg.spec.services:
-                cfg.spec.services["TRTLLMWorker"] = cfg.spec.services[
-                    "TRTLLMPrefillWorker"
-                ]
-                del cfg.spec.services["TRTLLMPrefillWorker"]
+                cfg.spec.services["TRTLLMWorker"] = cfg.spec.services[prefill_name]
+                del cfg.spec.services[prefill_name]
...
-            if "TRTLLMDecodeWorker" in cfg.spec.services:
+            decode_name = WORKER_COMPONENT_NAMES["trtllm"].decode_worker_k8s_name
+            if decode_name in cfg.spec.services:
-                cfg.spec.services["TRTLLMWorker"] = cfg.spec.services[
-                    "TRTLLMDecodeWorker"
-                ]
-                del cfg.spec.services["TRTLLMDecodeWorker"]
+                cfg.spec.services["TRTLLMWorker"] = cfg.spec.services[decode_name]
+                del cfg.spec.services[decode_name]
82-95: Tiny improvement to remove_valued_arguments.

Currently removes only the first occurrence. If args can be duplicated, consider a loop.
 def remove_valued_arguments(args: list[str], key: str) -> list[str]:
-    """Remove a valued argument (e.g., --key value) from the arguments list if exists."""
-    if key in args:
-        idx = args.index(key)
-        if idx + 1 < len(args):
-            del args[idx : idx + 2]
+    """Remove all occurrences of a valued argument (e.g., --key value)."""
+    while True:
+        try:
+            idx = args.index(key)
+            del args[idx : min(idx + 2, len(args))]
+        except ValueError:
+            break
     return args
24-31: Logger setup ok; avoid duplicate handlers in repeated imports.

If this module can be imported multiple times, guard handler addition.
-if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
-    logger.addHandler(console_handler)
+if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
+    logger.addHandler(console_handler)
(Ensure a guard is present; if not, add it.)
components/backends/trtllm/deploy/README.md (3)
45-54: Add this new pattern to the “Choose Your Template” list and cross-link the YAML.

Readers won’t discover disagg_planner.yaml from the “Choose Your Template” section. Add a bullet there to keep docs consistent.

Example insertion to Lines 132-138:
- - Use `disagg_router.yaml` for high-performance with KV cache routing and disaggregation
+ - Use `disagg_router.yaml` for high-performance with KV cache routing and disaggregation
+ - Use `disagg_planner.yaml` for SLA-based auto-scaling with a separate planner and Prometheus
55-56: Call out the required PVC for the planner.

disagg_planner.yaml expects a pre-created PVC named dynamo-pvc. Mention it here to prevent failed deploys.
 > [!NOTE]
 > This deployment requires pre-deployment profiling to be completed first. See [Pre-Deployment Profiling](../../../../docs/benchmarks/pre_deployment_profiling.md) for detailed instructions.
+>
+> Additionally, ensure a PersistentVolumeClaim named `dynamo-pvc` exists in the target namespace; the planner mounts it at `/workspace/profiling_results`.
193-194: Fix port-forward target; current example mixes Deployment and Pod naming.

Use the Service or Deployment name without pod UID. The service created by the CRD is typically <deployment-name>-frontend.
-kubectl port-forward deployment/trtllm-v1-disagg-frontend-<pod-uuid-info> 8000:8000
+# Prefer forwarding the Service:
+kubectl port-forward svc/trtllm-disagg-frontend 8000:8000
+# Or forward the Deployment (no pod UID):
+# kubectl port-forward deployment/trtllm-disagg-frontend 8000:8000
tests/profiler/test_profile_sla_dryrun.py (1)
27-49: Avoid hardcoding /tmp; use pytest’s tmp_path for isolation and to silence S108.

This removes shared state and addresses the static-analysis warning.
-    @pytest.fixture
-    def vllm_args(self):
+    @pytest.fixture
+    def vllm_args(self, tmp_path):
@@
-            output_dir = "/tmp/test_profiling_results"
+            output_dir = str(tmp_path / "profiling_results")
@@
-    @pytest.fixture
-    def sglang_args(self):
+    @pytest.fixture
+    def sglang_args(self, tmp_path):
@@
-            output_dir = "/tmp/test_profiling_results"
+            output_dir = str(tmp_path / "profiling_results")
@@
-    @pytest.fixture
-    def trtllm_args(self):
+    @pytest.fixture
+    def trtllm_args(self, tmp_path):
@@
-            output_dir = "/tmp/test_profiling_results"
+            output_dir = str(tmp_path / "profiling_results")
Also applies to: 52-74, 90-114
components/backends/trtllm/deploy/disagg_planner.yaml (3)
117-124: Expose Prometheus port explicitly for clarity and service selectors.

Declare containerPort 8000 to match PROMETHEUS_PORT.
       extraPodSpec:
         mainContainer:
           image: nvcr.io/nvidian/dynamo-dev/dynamo-trtllm-runtime:hzhou-0909-03
           workingDir: /workspace/components/backends/trtllm
+          ports:
+            - name: http
+              containerPort: 8000
           command:
             - python3
21-39: Use a stable, configurable image reference.

Hardcoding a private dev tag (hzhou-0909-03) will break for most users. Parameterize via values/env or reference a public, versioned tag; reflect the same in README.

If you want, I can add a kustomize overlay and a values file to centralize the image/tag.

Also applies to: 72-87, 117-124, 155-172, 189-206

87-94: Avoid using componentType: frontend as a workaround to create a Service.

Prefer a proper componentType or an explicit Service template if the CRD supports it; the current hack may confuse operators.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae38bd4 and ae9595f.

📒 Files selected for processing (17)

README.md (1 hunks)
benchmarks/profiler/profile_sla.py (1 hunks)
benchmarks/profiler/utils/config.py (4 hunks)
components/backends/trtllm/README.md (1 hunks)
components/backends/trtllm/deploy/README.md (1 hunks)
components/backends/trtllm/deploy/disagg_planner.yaml (1 hunks)
components/backends/trtllm/src/dynamo/trtllm/main.py (2 hunks)
components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (6 hunks)
components/planner/src/dynamo/planner/defaults.py (1 hunks)
components/planner/src/dynamo/planner/utils/planner_argparse.py (1 hunks)
container/Dockerfile.sglang (1 hunks)
container/Dockerfile.trtllm (2 hunks)
deploy/utils/dynamo_deployment.py (1 hunks)
docs/architecture/planner_intro.rst (1 hunks)
docs/benchmarks/pre_deployment_profiling.md (1 hunks)
docs/guides/dynamo_deploy/sla_planner_deployment.md (1 hunks)
tests/profiler/test_profile_sla_dryrun.py (2 hunks)

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2025-08-30T20:43:49.632Z

Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

container/Dockerfile.trtllm

🧬 Code graph analysis (2)

tests/profiler/test_profile_sla_dryrun.py (1)

benchmarks/profiler/profile_sla.py (1)

run_profile (56-534)

benchmarks/profiler/utils/config.py (1)

components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (1)

Config (28-93)

🪛 Ruff (0.12.2)

components/backends/trtllm/src/dynamo/trtllm/main.py

217-217: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

tests/profiler/test_profile_sla_dryrun.py

97-97: Probable insecure usage of temporary file or directory: "/tmp/test_profiling_results"

(S108)

benchmarks/profiler/utils/config.py

595-597: Avoid specifying long messages outside the exception class

(TRY003)

657-659: Avoid specifying long messages outside the exception class

(TRY003)

726-726: Avoid specifying long messages outside the exception class

(TRY003)

831-831: Do not catch blind exception: Exception

(BLE001)

🪛 markdownlint-cli2 (0.17.2)

docs/guides/dynamo_deploy/sla_planner_deployment.md

50-50: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Build and Test - vllm
GitHub Check: Build and Test - dynamo

🔇 Additional comments (20)

docs/benchmarks/pre_deployment_profiling.md (1)

17-18: Matrix update looks good

TRT-LLM Dense marked as supported is consistent with the PR scope.

README.md (1)

62-63: SLA-Based Planner support flip to ✅ for TRT-LLM

Docs alignment looks correct with the new backend support.
docs/architecture/planner_intro.rst (1)
47-52: SGLang support appears inconsistent across docs—please verify

README shows SLA Planner supports SGLang (✅), but this table lists SGLang as ❌. If SGLang is supported here, flip to ✅; otherwise, update README for consistency.

If supported, apply:
-   * -
-     - ❌
-     - SGLang
+   * -
+     - ✅
+     - SGLang
components/backends/trtllm/README.md (1)

58-59: Planner support status updated to ✅

Matches the planner/docs changes in this PR.

deploy/utils/dynamo_deployment.py (1)

27-27: Dropping type ignore for aiofiles: ensure typing and packaging are covered

Confirm:

aiofiles is installed wherever this utility runs (not only in dev/runtime images).

mypy/pyright passes without stubs; otherwise consider adding types or reintroducing a targeted ignore.

docs/guides/dynamo_deploy/sla_planner_deployment.md (1)

37-44: Good: surfaced sglang/trtllm variants

Clearer for users discovering backend-specific manifests.

components/planner/src/dynamo/planner/defaults.py (2)

104-116: TRT-LLM component mapping looks correct for DECODE_FIRST.

Decode-first semantics and endpoints align with runtime naming; no issues spotted.

118-122: Registering 'trtllm' in WORKER_COMPONENT_NAMES is correct.

This enables planner routing; good addition.

components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (3)

49-49: Config surface addition ‘override_engine_args’ looks good.

Field addition and str inclusion are consistent.

Also applies to: 81-81

126-139: Default endpoint help and logic are consistent.

First/next worker defaults are wired correctly.

Also applies to: 317-335

363-364: Wiring the parsed override into Config is correct.

No issues.

benchmarks/profiler/utils/config.py (6)

87-94: Good use of shlex for robust arg splitting.

Handles quoted JSON/paths reliably.

108-110: Joining args with shlex.join is appropriate.

Keeps YAML concise while preserving quoting.

641-699: Overall TRT-LLM convert_config flow looks solid.

Rename/merge, arg normalization, and replicas=1 are consistent.

703-753: TP size override via --override-engine-args is correct.

Also updates GPU requests/limits; good.

755-793: Model name resolution covers aggregated and disagg worker names.

Reasonable fallbacks and warnings.

841-849: Exports look good.

Re-exporting WORKER_COMPONENT_NAMES is convenient for callers.

tests/profiler/test_profile_sla_dryrun.py (1)

115-120: LGTM on the new TRT-LLM dry-run test.

Matches existing vLLM/sglang tests and exercises the new backend path.

benchmarks/profiler/profile_sla.py (1)

552-553: CLI backend choices extended to include trtllm — LGTM.

This aligns with CONFIG_MODIFIERS and new tests.

components/backends/trtllm/deploy/disagg_planner.yaml (1)

41-47: Question: Does the Planner actually need the HuggingFace token?

If not required, drop envFromSecret: hf-token-secret from Planner to reduce secret surface.

benchmarks/profiler/utils/config.py

components/backends/trtllm/deploy/disagg_planner.yaml

components/backends/trtllm/src/dynamo/trtllm/main.py

Signed-off-by: Biswa Panda <[email protected]> Signed-off-by: hongkuanz <[email protected]>

Signed-off-by: hongkuanz <[email protected]>

This reverts commit f04de1d. Signed-off-by: hongkuanz <[email protected]>

Signed-off-by: hongkuanz <[email protected]>

…/trtllm-sweep Signed-off-by: hongkuanz <[email protected]>

benchmarks/profiler/utils/config.py

components/backends/trtllm/src/dynamo/trtllm/main.py

benchmarks/profiler/utils/config.py

Signed-off-by: hongkuanz <[email protected]>

components/backends/trtllm/src/dynamo/trtllm/main.py

Signed-off-by: Biswa Panda <[email protected]> Signed-off-by: hongkuanz <[email protected]> Signed-off-by: ayushag <[email protected]> Signed-off-by: Dillon Cullinan <[email protected]> Signed-off-by: Harrison Saturley-Hall <[email protected]> Signed-off-by: alec-flowers <[email protected]> Signed-off-by: Julien Mancuso <[email protected]> Signed-off-by: Pavithra Vijayakrishnan <[email protected]> Signed-off-by: Harry Kim <[email protected]> Signed-off-by: Tushar Sharma <[email protected]> Signed-off-by: GuanLuo <[email protected]> Signed-off-by: Hannah Zhang <[email protected]> Signed-off-by: Graham King <[email protected]> Signed-off-by: Anant Sharma <[email protected]> Signed-off-by: Greg Clark <[email protected]> Signed-off-by: Neal Vaidya <[email protected]> Co-authored-by: Biswa Panda <[email protected]> Co-authored-by: Ayush Agarwal <[email protected]> Co-authored-by: Dillon Cullinan <[email protected]> Co-authored-by: Harrison Saturley-Hall <[email protected]> Co-authored-by: alec-flowers <[email protected]> Co-authored-by: julienmancuso <[email protected]> Co-authored-by: Pavithra Vijayakrishnan <[email protected]> Co-authored-by: Harry Kim <[email protected]> Co-authored-by: Tushar Sharma <[email protected]> Co-authored-by: Alec <[email protected]> Co-authored-by: GuanLuo <[email protected]> Co-authored-by: hhzhang16 <[email protected]> Co-authored-by: Graham King <[email protected]> Co-authored-by: Anant Sharma <[email protected]> Co-authored-by: Greg Clark <[email protected]> Co-authored-by: Neal Vaidya <[email protected]> Co-authored-by: nv-nmailhot <[email protected]> Signed-off-by: ayushag <[email protected]>

Signed-off-by: Biswa Panda <[email protected]> Signed-off-by: hongkuanz <[email protected]> Signed-off-by: ayushag <[email protected]> Signed-off-by: Dillon Cullinan <[email protected]> Signed-off-by: Harrison Saturley-Hall <[email protected]> Signed-off-by: alec-flowers <[email protected]> Signed-off-by: Julien Mancuso <[email protected]> Signed-off-by: Pavithra Vijayakrishnan <[email protected]> Signed-off-by: Harry Kim <[email protected]> Signed-off-by: Tushar Sharma <[email protected]> Signed-off-by: GuanLuo <[email protected]> Signed-off-by: Hannah Zhang <[email protected]> Signed-off-by: Graham King <[email protected]> Signed-off-by: Anant Sharma <[email protected]> Signed-off-by: Greg Clark <[email protected]> Signed-off-by: Neal Vaidya <[email protected]> Co-authored-by: Biswa Panda <[email protected]> Co-authored-by: Ayush Agarwal <[email protected]> Co-authored-by: Dillon Cullinan <[email protected]> Co-authored-by: Harrison Saturley-Hall <[email protected]> Co-authored-by: alec-flowers <[email protected]> Co-authored-by: julienmancuso <[email protected]> Co-authored-by: Pavithra Vijayakrishnan <[email protected]> Co-authored-by: Harry Kim <[email protected]> Co-authored-by: Tushar Sharma <[email protected]> Co-authored-by: Alec <[email protected]> Co-authored-by: GuanLuo <[email protected]> Co-authored-by: hhzhang16 <[email protected]> Co-authored-by: Graham King <[email protected]> Co-authored-by: Anant Sharma <[email protected]> Co-authored-by: Greg Clark <[email protected]> Co-authored-by: Neal Vaidya <[email protected]> Co-authored-by: nv-nmailhot <[email protected]> Signed-off-by: zhongdaor <[email protected]>

tedzhouhk requested review from a team, Aphoh, PeaBrane, alec-flowers, atchernych, biswapanda, hhzhang16, hutm, ishandhanani, jasonqinzhou, julienmancuso, michaelshin, mohammedabdulwahhab, nnshah1, ptarasiewiczNV, richardhuo-nv, rmccorm4 and tanmayv25 as code owners September 10, 2025 01:58

pull-request-size bot added the size/XL label Sep 10, 2025

github-actions bot added the feat label Sep 10, 2025

coderabbitai bot reviewed Sep 10, 2025

View reviewed changes

biswapanda and others added 7 commits September 10, 2025 09:12

chore: update CODEOWNERS for multimodal examples (#2878)

1aa8c32

Signed-off-by: Biswa Panda <[email protected]> Signed-off-by: hongkuanz <[email protected]>

first attempt

ece24b1

Signed-off-by: hongkuanz <[email protected]>

add aiofiles

c5633e1

Signed-off-by: hongkuanz <[email protected]>

fix trtllm docker file

0846ae7

Signed-off-by: hongkuanz <[email protected]>

Revert "fix trtllm docker file"

90b5ff1

This reverts commit f04de1d. Signed-off-by: hongkuanz <[email protected]>

install requirement.txt

f2a66cb

Signed-off-by: hongkuanz <[email protected]>

add trtllm to sla planner

4001324

Signed-off-by: hongkuanz <[email protected]>

tedzhouhk and others added 4 commits September 10, 2025 09:13

feat: add a virtual connector for 3rd party deployments (#2913)

37213b6

Signed-off-by: hongkuanz <[email protected]>

fix: dyn namespace scoping for trtllm

3af3425

Signed-off-by: hongkuanz <[email protected]>

pc

436307c

Signed-off-by: hongkuanz <[email protected]>

remove duplicate

e62f664

Signed-off-by: hongkuanz <[email protected]>

tedzhouhk force-pushed the hzhou/trtllm-sweep branch from ae9595f to e62f664 Compare September 10, 2025 16:13

tedzhouhk requested review from a team, indrajit96, krishung5, nealvaidya and whoisj as code owners September 10, 2025 16:13

copy-pr-bot bot temporarily deployed to GITLAB September 10, 2025 16:13 Inactive

pull-request-size bot added size/XXL and removed size/XL labels Sep 10, 2025

copy-pr-bot bot temporarily deployed to GITLAB September 10, 2025 16:14 Inactive

Merge branch 'main' of https://github.com/ai-dynamo/dynamo into hzhou…

a4a3e66

…/trtllm-sweep Signed-off-by: hongkuanz <[email protected]>

pull-request-size bot added size/XL and removed size/XXL labels Sep 10, 2025

copy-pr-bot bot temporarily deployed to GITLAB September 10, 2025 16:15 Inactive

copy-pr-bot bot temporarily deployed to GITLAB September 10, 2025 16:19 Inactive

hhzhang16 reviewed Sep 10, 2025

View reviewed changes

benchmarks/profiler/utils/config.py Outdated Show resolved Hide resolved

components/backends/trtllm/src/dynamo/trtllm/main.py Outdated Show resolved Hide resolved

benchmarks/profiler/utils/config.py Outdated Show resolved Hide resolved

address pr comment

e6ac2a7

Signed-off-by: hongkuanz <[email protected]>

copy-pr-bot bot temporarily deployed to GITLAB September 10, 2025 20:29 Inactive

hhzhang16 approved these changes Sep 10, 2025

View reviewed changes

nv-kmcgill53 requested changes Sep 10, 2025

View reviewed changes

components/backends/trtllm/src/dynamo/trtllm/main.py Show resolved Hide resolved

nv-kmcgill53 approved these changes Sep 10, 2025

View reviewed changes

tedzhouhk merged commit dcee4db into main Sep 10, 2025
14 of 16 checks passed

tedzhouhk deleted the hzhou/trtllm-sweep branch September 10, 2025 21:41

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: support trtllm in sla-planner #2980

feat: support trtllm in sla-planner #2980

Uh oh!

tedzhouhk commented Sep 10, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Sep 10, 2025

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

19 participants

feat: support trtllm in sla-planner #2980

feat: support trtllm in sla-planner #2980

Uh oh!

Conversation

tedzhouhk commented Sep 10, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Sep 10, 2025

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Pre-merge checks (1 passed, 2 warnings)

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

19 participants

tedzhouhk commented Sep 10, 2025 •

edited by coderabbitai bot

Loading