-
Notifications
You must be signed in to change notification settings - Fork 755
feat: support trtllm in sla-planner #2980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds TRT-LLM as a supported backend across planner, profiler, runtime, and docs. Introduces config utilities, CLI flags, and deployment manifests for disaggregated planner workflows. Implements JSON-based engine-args overrides and dynamic namespace endpoints for TRT-LLM. Updates Dockerfiles, profiling tests, and documentation tables to reflect support. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor U as User
participant CLI as trtllm_utils.py (CLI)
participant C as Config
participant M as main.py (init)
participant E as TRT-LLM Engine
U->>CLI: run dynamo.trtllm ... --override-engine-args '{"engine":{"tp":2}}'
CLI->>C: parse args, set override_engine_args
C-->>M: Config with override_engine_args
M->>M: parse JSON (override-engine-args)
alt JSON parse ok
M->>M: deep_merge overrides into arg_map
M->>E: start with merged engine args
E-->>U: ready
else JSON parse error
M-->>U: log error and exit(1)
end
sequenceDiagram
autonumber
actor Client
participant FE as Frontend (HTTP 8000)
participant Planner as SLA Planner
participant Prom as Prometheus
participant Pref as TRTLLMPrefillWorker
participant Dec as TRTLLMDecodeWorker
note over Planner,Prom: Profiling results PVC mounted (/workspace/profiling_results)
Client->>FE: request generate
FE->>Planner: query capacity/targets
Planner->>Prom: scrape metrics
Prom-->>Planner: metrics
Planner-->>FE: routing decision (prefill/decode)
FE->>Pref: prefill (disagg mode)
Pref-->>FE: prefill outputs
FE->>Dec: decode
Dec-->>FE: tokens
FE-->>Client: response
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Pre-merge checks (1 passed, 2 warnings)❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Poem
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
docs/benchmarks/pre_deployment_profiling.md (1)
155-174: Add trtllm to backend choices in the example YAML argsThe docs still show only vllm/sglang; include trtllm to avoid confusion.
Apply this diff:
- --backend - - <vllm/sglang> + - <vllm/sglang/trtllm>components/planner/src/dynamo/planner/defaults.py (1)
57-70: Fix fallback Prometheus endpoint: missing scheme and port.When
k8s_namespaceis empty/default, the function returns just<namespace>-prometheus(no scheme/port), which breaks HTTP clients and is inconsistent with the non-default branch.Apply:
- return f"{namespace}-prometheus" + # Fallback to in-cluster service DNS without namespace and include port + return f"http://{namespace}-prometheus:{port}"components/backends/trtllm/deploy/README.md (1)
235-236: Fix broken link to client section in trtllm README
The anchor#clientin components/backends/trtllm/deploy/README.md (lines 235–236) does not exist in components/backends/vllm/README.md. Update the link to point at the correct section (e.g.#vllm-quick-start) or add a “Client” heading in the vLLM README.
🧹 Nitpick comments (22)
container/Dockerfile.sglang (1)
239-247: Remove duplicate COPY of deploy directory.
deploy/is copied twice in the runtime stage (Line 240 and Line 246). The new line introduces a redundant layer, needlessly increasing image size and cache bust frequency. Keep only one COPY.If deploy artifacts are also needed in the framework stage, confirm and add a single COPY there instead of duplicating in runtime.
Apply:
COPY examples /workspace/examples - COPY deploy /workspace/deploy RUN uv pip install /workspace/benchmarks # Copy benchmarks, backends and tests for CI COPY tests /workspace/tests COPY benchmarks /workspace/benchmarks COPY deploy /workspace/deploydocs/benchmarks/pre_deployment_profiling.md (1)
52-58: Specify code-fence language for sample outputAdd a language to satisfy MD040 and improve rendering.
Apply this diff:
-``` +```text 2025-05-16 15:20:24 - __main__ - INFO - Analyzing results and generate recommendations... ...</blockquote></details> <details> <summary>components/planner/src/dynamo/planner/utils/planner_argparse.py (1)</summary><blockquote> `40-44`: **Avoid drift by deriving backend choices from a single source of truth** Use the central mapping (defaults.WORKER_COMPONENT_NAMES) for CLI choices so new backends don’t require multiple edits. Apply this diff: ```diff -from dynamo.planner.defaults import SLAPlannerDefaults +from dynamo.planner.defaults import SLAPlannerDefaults, WORKER_COMPONENT_NAMES @@ - parser.add_argument( + parser.add_argument( "--backend", default=SLAPlannerDefaults.backend, - choices=["vllm", "sglang", "trtllm"], + choices=sorted(WORKER_COMPONENT_NAMES.keys()), help="Backend type", )docs/guides/dynamo_deploy/sla_planner_deployment.md (1)
50-57: Add code-fence language to the “Expected pods” blockSmall docs lint fix (MD040). Recommend text/console.
Apply this diff:
-``` +```text # For vLLM: vllm-disagg-planner-frontend-* 1/1 Running vllm-disagg-planner-prometheus-* 1/1 Running vllm-disagg-planner-planner-* 1/1 Running vllm-disagg-planner-backend-* 1/1 Running vllm-disagg-planner-prefill-* 1/1 Running</blockquote></details> <details> <summary>container/Dockerfile.trtllm (2)</summary><blockquote> `332-335`: **Install via uv for consistency with the rest of the image** Optional: use uv pip here (as elsewhere) to avoid mixing installers and to leverage uv’s resolver/caching. Apply this diff: ```diff -RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \ - pip install --requirement /tmp/requirements.txt +RUN --mount=type=bind,source=./container/deps/requirements.txt,target=/tmp/requirements.txt \ + uv pip install --requirement /tmp/requirements.txt
505-505: Copying entire deploy/ into runtime may bloat the imageIf only a subset is needed at runtime/CI, consider copying just required subpaths (e.g., deploy/utils or specific manifests) to keep image size down.
components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (4)
17-23: Unify env var naming for namespace.Elsewhere we use
DYNAMO_NAMESPACE; here we readDYN_NAMESPACE. Recommend reading the same var for consistency.-DYN_NAMESPACE = os.environ.get("DYN_NAMESPACE", "dynamo") +DYN_NAMESPACE = os.environ.get("DYNAMO_NAMESPACE", os.environ.get("DYN_NAMESPACE", "dynamo"))
222-227: Clarify flag semantics: it's JSON, not a Python dict.Help string says “Python dictionary string” but we parse with
json.loads. Avoid ambiguity and guide quoting.- help='Python dictionary string to override specific engine arguments from the YAML file. Example: \'{"tensor_parallel_size": 2, "kv_cache_config": {"enable_block_reuse": false}}\'', + help='JSON string to override specific engine arguments from the YAML file. Example: \'{"tensor_parallel_size": 2, "kv_cache_config": {"enable_block_reuse": false}}\'',
241-259: Nit: boolean flag parsing.
--use-nixl-connectusestype=bool; typical CLI UX isaction="store_true". Optional to adjust.- parser.add_argument( - "--use-nixl-connect", - type=bool, - default=False, - help="Use NIXL Connect for communication between workers.", - ) + parser.add_argument( + "--use-nixl-connect", + action="store_true", + help="Use NIXL Connect for communication between workers.", + )
300-371: Use enum.valuefor argparse defaults
Replacedefault=DEFAULT_DISAGGREGATION_MODEand
default=DEFAULT_DISAGGREGATION_STRATEGYwith
default=DEFAULT_DISAGGREGATION_MODE.valueand
default=DEFAULT_DISAGGREGATION_STRATEGY.valueto align the
strtype andchoiceswith the actual default values.components/backends/trtllm/src/dynamo/trtllm/main.py (1)
242-252: Logging engine args is helpful; consider redaction if sensitive.If args may include secrets in YAML, redact before logging.
benchmarks/profiler/utils/config.py (4)
812-839: Narrow exception and improve parsing resilience.Catching bare
Exceptionis noisy (BLE001). Also handle large logs safely.- except Exception as e: - logger.warning(f"Failed to parse KV cache size from log file. Error: {e}") + except (OSError, ValueError, re.error) as e: + logger.warning(f"Failed to parse KV cache size from log file. Error: {e}")
565-701: Optional: avoid hard-coded TRT-LLM worker names.You already import
WORKER_COMPONENT_NAMES; using it reduces drift if names change.- if "TRTLLMPrefillWorker" in cfg.spec.services: + prefill_name = WORKER_COMPONENT_NAMES["trtllm"].prefill_worker_k8s_name + if prefill_name in cfg.spec.services: - cfg.spec.services["TRTLLMWorker"] = cfg.spec.services[ - "TRTLLMPrefillWorker" - ] - del cfg.spec.services["TRTLLMPrefillWorker"] + cfg.spec.services["TRTLLMWorker"] = cfg.spec.services[prefill_name] + del cfg.spec.services[prefill_name] ... - if "TRTLLMDecodeWorker" in cfg.spec.services: + decode_name = WORKER_COMPONENT_NAMES["trtllm"].decode_worker_k8s_name + if decode_name in cfg.spec.services: - cfg.spec.services["TRTLLMWorker"] = cfg.spec.services[ - "TRTLLMDecodeWorker" - ] - del cfg.spec.services["TRTLLMDecodeWorker"] + cfg.spec.services["TRTLLMWorker"] = cfg.spec.services[decode_name] + del cfg.spec.services[decode_name]
82-95: Tiny improvement to remove_valued_arguments.Currently removes only the first occurrence. If args can be duplicated, consider a loop.
def remove_valued_arguments(args: list[str], key: str) -> list[str]: - """Remove a valued argument (e.g., --key value) from the arguments list if exists.""" - if key in args: - idx = args.index(key) - if idx + 1 < len(args): - del args[idx : idx + 2] + """Remove all occurrences of a valued argument (e.g., --key value).""" + while True: + try: + idx = args.index(key) + del args[idx : min(idx + 2, len(args))] + except ValueError: + break return args
24-31: Logger setup ok; avoid duplicate handlers in repeated imports.If this module can be imported multiple times, guard handler addition.
-if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers): - logger.addHandler(console_handler) +if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers): + logger.addHandler(console_handler)(Ensure a guard is present; if not, add it.)
components/backends/trtllm/deploy/README.md (3)
45-54: Add this new pattern to the “Choose Your Template” list and cross-link the YAML.Readers won’t discover
disagg_planner.yamlfrom the “Choose Your Template” section. Add a bullet there to keep docs consistent.Example insertion to Lines 132-138:
- - Use `disagg_router.yaml` for high-performance with KV cache routing and disaggregation + - Use `disagg_router.yaml` for high-performance with KV cache routing and disaggregation + - Use `disagg_planner.yaml` for SLA-based auto-scaling with a separate planner and Prometheus
55-56: Call out the required PVC for the planner.
disagg_planner.yamlexpects a pre-created PVC nameddynamo-pvc. Mention it here to prevent failed deploys.> [!NOTE] > This deployment requires pre-deployment profiling to be completed first. See [Pre-Deployment Profiling](../../../../docs/benchmarks/pre_deployment_profiling.md) for detailed instructions. +> +> Additionally, ensure a PersistentVolumeClaim named `dynamo-pvc` exists in the target namespace; the planner mounts it at `/workspace/profiling_results`.
193-194: Fix port-forward target; current example mixes Deployment and Pod naming.Use the Service or Deployment name without pod UID. The service created by the CRD is typically
<deployment-name>-frontend.-kubectl port-forward deployment/trtllm-v1-disagg-frontend-<pod-uuid-info> 8000:8000 +# Prefer forwarding the Service: +kubectl port-forward svc/trtllm-disagg-frontend 8000:8000 +# Or forward the Deployment (no pod UID): +# kubectl port-forward deployment/trtllm-disagg-frontend 8000:8000tests/profiler/test_profile_sla_dryrun.py (1)
27-49: Avoid hardcoding /tmp; use pytest’s tmp_path for isolation and to silence S108.This removes shared state and addresses the static-analysis warning.
- @pytest.fixture - def vllm_args(self): + @pytest.fixture + def vllm_args(self, tmp_path): @@ - output_dir = "/tmp/test_profiling_results" + output_dir = str(tmp_path / "profiling_results") @@ - @pytest.fixture - def sglang_args(self): + @pytest.fixture + def sglang_args(self, tmp_path): @@ - output_dir = "/tmp/test_profiling_results" + output_dir = str(tmp_path / "profiling_results") @@ - @pytest.fixture - def trtllm_args(self): + @pytest.fixture + def trtllm_args(self, tmp_path): @@ - output_dir = "/tmp/test_profiling_results" + output_dir = str(tmp_path / "profiling_results")Also applies to: 52-74, 90-114
components/backends/trtllm/deploy/disagg_planner.yaml (3)
117-124: Expose Prometheus port explicitly for clarity and service selectors.Declare containerPort 8000 to match PROMETHEUS_PORT.
extraPodSpec: mainContainer: image: nvcr.io/nvidian/dynamo-dev/dynamo-trtllm-runtime:hzhou-0909-03 workingDir: /workspace/components/backends/trtllm + ports: + - name: http + containerPort: 8000 command: - python3
21-39: Use a stable, configurable image reference.Hardcoding a private dev tag (
hzhou-0909-03) will break for most users. Parameterize via values/env or reference a public, versioned tag; reflect the same in README.If you want, I can add a kustomize overlay and a values file to centralize the image/tag.
Also applies to: 72-87, 117-124, 155-172, 189-206
87-94: Avoid usingcomponentType: frontendas a workaround to create a Service.Prefer a proper
componentTypeor an explicit Service template if the CRD supports it; the current hack may confuse operators.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (17)
README.md(1 hunks)benchmarks/profiler/profile_sla.py(1 hunks)benchmarks/profiler/utils/config.py(4 hunks)components/backends/trtllm/README.md(1 hunks)components/backends/trtllm/deploy/README.md(1 hunks)components/backends/trtllm/deploy/disagg_planner.yaml(1 hunks)components/backends/trtllm/src/dynamo/trtllm/main.py(2 hunks)components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py(6 hunks)components/planner/src/dynamo/planner/defaults.py(1 hunks)components/planner/src/dynamo/planner/utils/planner_argparse.py(1 hunks)container/Dockerfile.sglang(1 hunks)container/Dockerfile.trtllm(2 hunks)deploy/utils/dynamo_deployment.py(1 hunks)docs/architecture/planner_intro.rst(1 hunks)docs/benchmarks/pre_deployment_profiling.md(1 hunks)docs/guides/dynamo_deploy/sla_planner_deployment.md(1 hunks)tests/profiler/test_profile_sla_dryrun.py(2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Applied to files:
container/Dockerfile.trtllm
🧬 Code graph analysis (2)
tests/profiler/test_profile_sla_dryrun.py (1)
benchmarks/profiler/profile_sla.py (1)
run_profile(56-534)
benchmarks/profiler/utils/config.py (1)
components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (1)
Config(28-93)
🪛 Ruff (0.12.2)
components/backends/trtllm/src/dynamo/trtllm/main.py
217-217: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
tests/profiler/test_profile_sla_dryrun.py
97-97: Probable insecure usage of temporary file or directory: "/tmp/test_profiling_results"
(S108)
benchmarks/profiler/utils/config.py
595-597: Avoid specifying long messages outside the exception class
(TRY003)
657-659: Avoid specifying long messages outside the exception class
(TRY003)
726-726: Avoid specifying long messages outside the exception class
(TRY003)
831-831: Do not catch blind exception: Exception
(BLE001)
🪛 markdownlint-cli2 (0.17.2)
docs/guides/dynamo_deploy/sla_planner_deployment.md
50-50: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Build and Test - vllm
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (20)
docs/benchmarks/pre_deployment_profiling.md (1)
17-18: Matrix update looks goodTRT-LLM Dense marked as supported is consistent with the PR scope.
README.md (1)
62-63: SLA-Based Planner support flip to ✅ for TRT-LLMDocs alignment looks correct with the new backend support.
docs/architecture/planner_intro.rst (1)
47-52: SGLang support appears inconsistent across docs—please verifyREADME shows SLA Planner supports SGLang (✅), but this table lists SGLang as ❌. If SGLang is supported here, flip to ✅; otherwise, update README for consistency.
If supported, apply:
- * - - - ❌ - - SGLang + * - + - ✅ + - SGLangcomponents/backends/trtllm/README.md (1)
58-59: Planner support status updated to ✅Matches the planner/docs changes in this PR.
deploy/utils/dynamo_deployment.py (1)
27-27: Dropping type ignore for aiofiles: ensure typing and packaging are coveredConfirm:
- aiofiles is installed wherever this utility runs (not only in dev/runtime images).
- mypy/pyright passes without stubs; otherwise consider adding types or reintroducing a targeted ignore.
docs/guides/dynamo_deploy/sla_planner_deployment.md (1)
37-44: Good: surfaced sglang/trtllm variantsClearer for users discovering backend-specific manifests.
components/planner/src/dynamo/planner/defaults.py (2)
104-116: TRT-LLM component mapping looks correct for DECODE_FIRST.Decode-first semantics and endpoints align with runtime naming; no issues spotted.
118-122: Registering 'trtllm' in WORKER_COMPONENT_NAMES is correct.This enables planner routing; good addition.
components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (3)
49-49: Config surface addition ‘override_engine_args’ looks good.Field addition and str inclusion are consistent.
Also applies to: 81-81
126-139: Default endpoint help and logic are consistent.First/next worker defaults are wired correctly.
Also applies to: 317-335
363-364: Wiring the parsed override into Config is correct.No issues.
benchmarks/profiler/utils/config.py (6)
87-94: Good use of shlex for robust arg splitting.Handles quoted JSON/paths reliably.
108-110: Joining args with shlex.join is appropriate.Keeps YAML concise while preserving quoting.
641-699: Overall TRT-LLM convert_config flow looks solid.Rename/merge, arg normalization, and replicas=1 are consistent.
703-753: TP size override via --override-engine-args is correct.Also updates GPU requests/limits; good.
755-793: Model name resolution covers aggregated and disagg worker names.Reasonable fallbacks and warnings.
841-849: Exports look good.Re-exporting WORKER_COMPONENT_NAMES is convenient for callers.
tests/profiler/test_profile_sla_dryrun.py (1)
115-120: LGTM on the new TRT-LLM dry-run test.Matches existing vLLM/sglang tests and exercises the new backend path.
benchmarks/profiler/profile_sla.py (1)
552-553: CLI backend choices extended to include trtllm — LGTM.This aligns with CONFIG_MODIFIERS and new tests.
components/backends/trtllm/deploy/disagg_planner.yaml (1)
41-47: Question: Does the Planner actually need the HuggingFace token?If not required, drop
envFromSecret: hf-token-secretfrom Planner to reduce secret surface.
Signed-off-by: Biswa Panda <[email protected]> Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
This reverts commit f04de1d. Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
ae9595f to
e62f664
Compare
…/trtllm-sweep Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: hongkuanz <[email protected]>
Signed-off-by: Biswa Panda <[email protected]> Signed-off-by: hongkuanz <[email protected]> Signed-off-by: ayushag <[email protected]> Signed-off-by: Dillon Cullinan <[email protected]> Signed-off-by: Harrison Saturley-Hall <[email protected]> Signed-off-by: alec-flowers <[email protected]> Signed-off-by: Julien Mancuso <[email protected]> Signed-off-by: Pavithra Vijayakrishnan <[email protected]> Signed-off-by: Harry Kim <[email protected]> Signed-off-by: Tushar Sharma <[email protected]> Signed-off-by: GuanLuo <[email protected]> Signed-off-by: Hannah Zhang <[email protected]> Signed-off-by: Graham King <[email protected]> Signed-off-by: Anant Sharma <[email protected]> Signed-off-by: Greg Clark <[email protected]> Signed-off-by: Neal Vaidya <[email protected]> Co-authored-by: Biswa Panda <[email protected]> Co-authored-by: Ayush Agarwal <[email protected]> Co-authored-by: Dillon Cullinan <[email protected]> Co-authored-by: Harrison Saturley-Hall <[email protected]> Co-authored-by: alec-flowers <[email protected]> Co-authored-by: julienmancuso <[email protected]> Co-authored-by: Pavithra Vijayakrishnan <[email protected]> Co-authored-by: Harry Kim <[email protected]> Co-authored-by: Tushar Sharma <[email protected]> Co-authored-by: Alec <[email protected]> Co-authored-by: GuanLuo <[email protected]> Co-authored-by: hhzhang16 <[email protected]> Co-authored-by: Graham King <[email protected]> Co-authored-by: Anant Sharma <[email protected]> Co-authored-by: Greg Clark <[email protected]> Co-authored-by: Neal Vaidya <[email protected]> Co-authored-by: nv-nmailhot <[email protected]> Signed-off-by: ayushag <[email protected]>
Signed-off-by: Biswa Panda <[email protected]> Signed-off-by: hongkuanz <[email protected]> Signed-off-by: ayushag <[email protected]> Signed-off-by: Dillon Cullinan <[email protected]> Signed-off-by: Harrison Saturley-Hall <[email protected]> Signed-off-by: alec-flowers <[email protected]> Signed-off-by: Julien Mancuso <[email protected]> Signed-off-by: Pavithra Vijayakrishnan <[email protected]> Signed-off-by: Harry Kim <[email protected]> Signed-off-by: Tushar Sharma <[email protected]> Signed-off-by: GuanLuo <[email protected]> Signed-off-by: Hannah Zhang <[email protected]> Signed-off-by: Graham King <[email protected]> Signed-off-by: Anant Sharma <[email protected]> Signed-off-by: Greg Clark <[email protected]> Signed-off-by: Neal Vaidya <[email protected]> Co-authored-by: Biswa Panda <[email protected]> Co-authored-by: Ayush Agarwal <[email protected]> Co-authored-by: Dillon Cullinan <[email protected]> Co-authored-by: Harrison Saturley-Hall <[email protected]> Co-authored-by: alec-flowers <[email protected]> Co-authored-by: julienmancuso <[email protected]> Co-authored-by: Pavithra Vijayakrishnan <[email protected]> Co-authored-by: Harry Kim <[email protected]> Co-authored-by: Tushar Sharma <[email protected]> Co-authored-by: Alec <[email protected]> Co-authored-by: GuanLuo <[email protected]> Co-authored-by: hhzhang16 <[email protected]> Co-authored-by: Graham King <[email protected]> Co-authored-by: Anant Sharma <[email protected]> Co-authored-by: Greg Clark <[email protected]> Co-authored-by: Neal Vaidya <[email protected]> Co-authored-by: nv-nmailhot <[email protected]> Signed-off-by: zhongdaor <[email protected]>
--override-engine-argsargument in trtllm to override flags in yaml fileSummary by CodeRabbit
New Features
Documentation
Chores
Tests