Skip to content

Conversation

@nv-anants
Copy link
Contributor

@nv-anants nv-anants commented Oct 21, 2025

Overview:

move engine configs and multi node configs out of components directory. Engine configs were added to recipes folder.

ref: components dir restructing DEP.

  • closes: OPS-1364

Summary by CodeRabbit

  • Documentation

    • Updated configuration file paths in deployment and launch documentation to reflect new directory structure.
    • Updated example commands and scripts with new configuration paths.
  • Chores

    • Reorganized engine configuration file paths to use standardized recipes directory structure.
    • Updated environment variable defaults for engine configuration locations.
    • Added new configuration files for model deployments.

Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
@nv-anants nv-anants marked this pull request as ready for review October 23, 2025 14:40
@nv-anants nv-anants requested review from a team as code owners October 23, 2025 14:40
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 23, 2025

Walkthrough

This PR refactors path resolution across the TRTLLM backend by introducing a DYNAMO_HOME environment variable (defaulting to /workspace) and migrating configuration file references from hard-coded engine_configs/ paths to recipe-based paths structured as recipes/<model>/trtllm/. Changes are applied consistently across deployment scripts, shell launch scripts, YAML configurations, documentation, and example files.

Changes

Cohort / File(s) Summary
Launch scripts
components/backends/trtllm/launch/agg.sh, agg_metrics.sh, agg_router.sh, disagg.sh, disagg_router.sh, epd_disagg.sh, gpt_oss_disagg.sh
Added DYNAMO_HOME environment variable with default /workspace and updated engine argument defaults from engine_configs/* to $DYNAMO_HOME/recipes/<model>/trtllm/*
Deployment YAML files
components/backends/trtllm/deploy/agg.yaml, agg-with-config.yaml, agg_router.yaml, disagg.yaml, disagg-multinode.yaml, disagg_planner.yaml, disagg_router.yaml
Updated container workingDir from /workspace/components/backends/trtllm to /workspace/ and replaced engine_configs/ paths with recipe-based paths (./recipes/<model>/trtllm/*) in extra-engine-args
Documentation path updates
docs/backends/trtllm/README.md, gemma3_sliding_window_attention.md, gpt-oss.md, llama4_plus_eagle.md, multimodal_support.md, docs/backends/trtllm/multinode/multinode-examples.md, multinode-multimodal-example.md
Updated example configurations and command references from engine_configs/ to recipes/ paths
Kubernetes documentation
docs/kubernetes/README.md
Updated example command to use $DYNAMO_HOME/recipes/deepseek-r1-distill-llama-8b/agg.yaml variable reference
Example scripts
examples/basics/multinode/trtllm/srun_aggregated.sh, srun_disaggregated.sh
Updated DEFAULT_MOUNT to three levels up (../../../), replaced engine_configs/ paths with recipes/ paths, and updated startup script locations
Recipe load balancer configs
recipes/deepseek-r1/trtllm/wide_ep/wide_ep_agg.yaml, wide_ep_decode.yaml, wide_ep_prefill.yaml
Updated moe_config.load_balancer path from /mnt/engine_configs/ to /mnt/recipes/ structure
New recipe YAML files
recipes/qwen2-vl-7b-instruct/trtllm/agg.yaml, recipes/qwen3/trtllm/agg.yaml, recipes/qwen3/trtllm/decode.yaml, recipes/qwen3/trtllm/prefill.yaml
Introduced new model-specific TRTLLM configuration files with tensor parallelism, attention, memory, and backend settings
Build/benchmark scripts
benchmarks/router/run_engines.sh
Added DYNAMO_HOME variable and replaced hard-coded component paths with $RECIPE_PATH references

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Rationale: Large cohort of changes (20+ files) but predominantly homogeneous refactoring—repeated path migration pattern applied consistently across multiple file types. Each individual change involves straightforward path substitution with minimal logic complexity. Primary review focus should verify path consistency across files and validate that all recipe directory structures are correctly specified.

Poem

🐰 Paths refactored, recipes bloom,
From engine_configs to workspace room,
DYNAMO_HOME leads the way,
Scripts and configs unified today!
*~ A Rabbit's Delight* 🌱✨

Pre-merge checks

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description Check ❓ Inconclusive The pull request description provides an Overview section and issue reference but is missing two significant sections from the template: the Details section (which should explain specific changes made) and the Where should the reviewer start section (which should guide reviewers to critical files). While the Overview gives a high-level summary and references the relevant DEP, the description lacks structured detail about the scope of changes across the 30+ modified files, the approach taken, or guidance on which files warrant closest review. For a large refactoring PR, this structural incompleteness makes the description incomplete relative to the template requirements. The description should be expanded to include a Details section documenting the changes systematically (e.g., which config files were moved, what path updates were made, any version considerations) and a Where should the reviewer start section highlighting key files such as the launch scripts and deployment YAML files that define the new path structure. This would help reviewers understand the scope and prioritize their review effectively.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The pull request title "refactor: move engine configs out of components directory" directly and clearly describes the primary change across the changeset. The title accurately summarizes the main objective evident from the raw_summary: relocating engine configuration files from components/backends/trtllm/engine_configs/ to the recipes/ directory structure, with corresponding updates to all references across launch scripts, deployment configs, documentation, and examples. The title is concise, specific, and follows standard conventions without vague terminology or noise.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/backends/trtllm/deploy/agg-with-config.yaml (1)

54-54: Critical path resolution mismatch between workingDir and ConfigMap mount.

The ConfigMap is mounted at /workspace/ (line 58), but the workingDir remains /workspace/components/backends/trtllm (line 54). The relative path ./recipes/qwen3/trtllm/agg.yaml will resolve to /workspace/components/backends/trtllm/recipes/qwen3/trtllm/agg.yaml, which does not match the ConfigMap mount location at /workspace/recipes/qwen3/trtllm/agg.yaml.

This will cause the deployment to fail when attempting to load the engine configuration.

Fix: Update workingDir to /workspace/ to match the ConfigMap mount, or update the extra-engine-args path to be absolute (e.g., /workspace/recipes/qwen3/trtllm/agg.yaml).

  mainContainer:
    image: nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:my-tag
-   workingDir: /workspace/components/backends/trtllm
+   workingDir: /workspace/
    volumeMounts:
    - name: nvidia-config
-     mountPath: /workspace/
+     mountPath: /workspace/
      readOnly: true
    command:
    - python3
    - -m
    - dynamo.trtllm
    args:
      - --model-path
      - Qwen/Qwen3-0.6B
      - --served-model-name
      - Qwen/Qwen3-0.6B
      - --extra-engine-args
      - ./recipes/qwen3/trtllm/agg.yaml

Also applies to: 58-58, 70-70

🧹 Nitpick comments (4)
docs/backends/trtllm/llama4_plus_eagle.md (1)

33-33: Update documentation note with full path clarification.

Line 33 references the recipes/llama4/trtllm/eagle folder, but this is a relative path. Consider clarifying whether this path is relative to $DYNAMO_HOME, the container root, or from a specific working directory to reduce user confusion.

-* Make sure the (`eagle3_one_model: true`) is set in the LLM API config inside the `recipes/llama4/trtllm/eagle` folder.
+* Make sure the (`eagle3_one_model: true`) is set in the LLM API config inside the `$DYNAMO_HOME/recipes/llama4/trtllm/eagle` folder.
docs/backends/trtllm/gpt-oss.md (1)

93-93: Documentation correctly reflects recipe-based paths, but consider mentioning DYNAMO_HOME environment variable.

The configuration file references have been properly updated to use recipe-based paths. However, users may benefit from knowing that the paths assume DYNAMO_HOME=/workspace by default, and this environment variable can be customized. Consider adding a brief note in the Prerequisites or Instructions section about this.

Also applies to: 100-100, 150-150, 167-167

recipes/qwen2-vl-7b-instruct/trtllm/agg.yaml (1)

30-33: Helpful inline documentation for configuration changes.

The comments referencing TensorRT-LLM upstream changes provide valuable context for maintainers. Consider adding similar documentation to other recipe files (prefill, decode variants) if they also reflect upstream API changes.

components/backends/trtllm/launch/disagg.sh (1)

6-6: Path resolution inconsistency: shell scripts vs deployment YAMLs.

Shell scripts use absolute paths via $DYNAMO_HOME/recipes/ (Lines 10-11) while deployment YAMLs use relative paths ./recipes/ depending on workingDir. Both approaches assume /workspace/recipes/ exists. While this should work, the inconsistency increases maintenance complexity. Consider standardizing on one approach (e.g., all absolute, or all relative with consistent workingDir).

Also applies to: 10-11

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 94aa2a7 and 10a8aac.

📒 Files selected for processing (32)
  • benchmarks/router/run_engines.sh (2 hunks)
  • components/backends/trtllm/deploy/agg-with-config.yaml (2 hunks)
  • components/backends/trtllm/deploy/agg.yaml (2 hunks)
  • components/backends/trtllm/deploy/agg_router.yaml (2 hunks)
  • components/backends/trtllm/deploy/disagg-multinode.yaml (4 hunks)
  • components/backends/trtllm/deploy/disagg.yaml (4 hunks)
  • components/backends/trtllm/deploy/disagg_planner.yaml (4 hunks)
  • components/backends/trtllm/deploy/disagg_router.yaml (4 hunks)
  • components/backends/trtllm/launch/agg.sh (1 hunks)
  • components/backends/trtllm/launch/agg_metrics.sh (1 hunks)
  • components/backends/trtllm/launch/agg_router.sh (1 hunks)
  • components/backends/trtllm/launch/disagg.sh (1 hunks)
  • components/backends/trtllm/launch/disagg_router.sh (1 hunks)
  • components/backends/trtllm/launch/epd_disagg.sh (1 hunks)
  • components/backends/trtllm/launch/gpt_oss_disagg.sh (1 hunks)
  • docs/backends/trtllm/README.md (1 hunks)
  • docs/backends/trtllm/gemma3_sliding_window_attention.md (4 hunks)
  • docs/backends/trtllm/gpt-oss.md (3 hunks)
  • docs/backends/trtllm/llama4_plus_eagle.md (2 hunks)
  • docs/backends/trtllm/multimodal_support.md (2 hunks)
  • docs/backends/trtllm/multinode/multinode-examples.md (2 hunks)
  • docs/backends/trtllm/multinode/multinode-multimodal-example.md (2 hunks)
  • docs/kubernetes/README.md (1 hunks)
  • examples/basics/multinode/trtllm/srun_aggregated.sh (3 hunks)
  • examples/basics/multinode/trtllm/srun_disaggregated.sh (4 hunks)
  • recipes/deepseek-r1/trtllm/wide_ep/wide_ep_agg.yaml (1 hunks)
  • recipes/deepseek-r1/trtllm/wide_ep/wide_ep_decode.yaml (1 hunks)
  • recipes/deepseek-r1/trtllm/wide_ep/wide_ep_prefill.yaml (1 hunks)
  • recipes/qwen2-vl-7b-instruct/trtllm/agg.yaml (1 hunks)
  • recipes/qwen3/trtllm/agg.yaml (1 hunks)
  • recipes/qwen3/trtllm/decode.yaml (1 hunks)
  • recipes/qwen3/trtllm/prefill.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-07-03T10:14:30.570Z
Learnt from: fsaady
PR: ai-dynamo/dynamo#1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.

Applied to files:

  • examples/basics/multinode/trtllm/srun_aggregated.sh
  • examples/basics/multinode/trtllm/srun_disaggregated.sh
📚 Learning: 2025-07-02T13:20:28.800Z
Learnt from: fsaady
PR: ai-dynamo/dynamo#1730
File: examples/sglang/slurm_jobs/job_script_template.j2:59-59
Timestamp: 2025-07-02T13:20:28.800Z
Learning: In the SLURM job script template at examples/sglang/slurm_jobs/job_script_template.j2, the `--total_nodes` parameter represents the total nodes per worker type (prefill or decode), not the total nodes in the entire cluster. Each worker type needs to know its own group size for distributed coordination.

Applied to files:

  • examples/basics/multinode/trtllm/srun_disaggregated.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: vllm (arm64)
  • GitHub Check: sglang
  • GitHub Check: vllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (31)
components/backends/trtllm/launch/agg_metrics.sh (2)

6-6: DYNAMO_HOME default is consistent across all launch scripts.

Verified: All 7 launch scripts (agg_metrics.sh, agg.sh, agg_router.sh, disagg.sh, disagg_router.sh, epd_disagg.sh, gpt_oss_disagg.sh) use the identical default ${DYNAMO_HOME:-"/workspace"} at line 6.


9-9: No issues found—recipes directory structure is correctly defined.

The referenced path $DYNAMO_HOME/recipes/qwen3/trtllm/agg.yaml exists in the repository at recipes/qwen3/trtllm/agg.yaml, along with the expected decode.yaml and prefill.yaml files. The directory structure follows the established pattern used across all supported models in the repository.

recipes/deepseek-r1/trtllm/wide_ep/wide_ep_decode.yaml (1)

20-20: Load balancer file verified and properly configured.

The eplb.yaml file exists at recipes/deepseek-r1/trtllm/wide_ep/eplb.yaml and is consistently referenced across all three wide_ep config files (decode, prefill, and agg). The file has been correctly migrated to the new recipes path.

components/backends/trtllm/deploy/agg.yaml (2)

39-39: Manual verification required for path and file existence.

The verification could not locate the file components/backends/trtllm/deploy/agg.yaml, the recipes directory, or any agg.yaml files in the repository structure. This could indicate the files are newly added in this PR and not yet present in the cloned state, or the paths differ from what was expected.

Please manually verify that:

  • The referenced engine config file exists at ./recipes/qwen3/trtllm/agg.yaml in the repository
  • The working directory context from which this relative path is resolved is correctly configured in your deployment setup
  • The path change from the previous location is intentional and the file has been moved/created at the new location

28-28: Relative path resolution verified in repository structure.

The repository contains recipes/qwen3/trtllm/agg.yaml at the root level, confirming the relative path ./recipes/qwen3/trtllm/agg.yaml in the args will resolve correctly when workingDir is set to /workspace/. The code change itself is valid.

However, this requires confirming that the deployment container includes the recipes directory mounted or copied at /workspace/recipes/ during image build or pod initialization. Verify with your container deployment configuration and Dockerfile/build scripts to ensure recipes are present in the runtime environment.

docs/backends/trtllm/llama4_plus_eagle.md (1)

57-57: File extension inconsistency confirmed in documentation—requires verification of correct extension.

The documentation does contain the inconsistency: line 57 references eagle_agg.yml (.yml) while lines 65 and 67 reference eagle_prefill.yaml and eagle_decode.yaml (.yaml). However, these config files are deployment-time artifacts (paths reference /mnt/recipes/ which is a runtime location) and do not exist in the repository.

The inconsistency in the documented file extensions is real, but determining which is correct requires verifying against your actual deployment configuration or confirming the intended setup. Please verify whether eagle_agg.yml should use .yml or .yaml to match the other config files.

recipes/deepseek-r1/trtllm/wide_ep/wide_ep_prefill.yaml (1)

20-20: No changes required—eplb.yaml file is correctly migrated and referenced.

The load_balancer configuration file (eplb.yaml) exists at recipes/deepseek-r1/trtllm/wide_ep/eplb.yaml and is properly referenced. The /mnt/recipes/... path is an intentional deployment-time mount point used consistently across all three related config files (wide_ep_prefill.yaml, wide_ep_agg.yaml, and wide_ep_decode.yaml). The migration from engine_configs/ to recipes is complete and verified.

recipes/deepseek-r1/trtllm/wide_ep/wide_ep_agg.yaml (1)

14-14: Load balancer file exists and is correctly referenced.

The eplb.yaml file is present at recipes/deepseek-r1/trtllm/wide_ep/eplb.yaml with valid content. The path reference in wide_ep_agg.yaml at line 14 correctly points to this file, and it has been properly migrated to the recipes directory structure.

docs/backends/trtllm/README.md (1)

165-165: Incorrect review comment.

The documented path ./recipes/deepseek-r1/trtllm/mtp/mtp_agg.yaml is an example environment variable setting for users, not a source repository file. These are external recipe configuration files that users need to obtain or create, relative to $DYNAMO_HOME/components/backends/trtllm where the command executes. This pattern is consistent throughout documentation (gemma3, gpt-oss, multinode examples), where recipe files are user-provided, not tracked in source control. The absence from the repository is expected and correct.

Likely an incorrect or invalid review comment.

docs/backends/trtllm/gpt-oss.md (1)

131-143: Minor path inconsistency between launch script reference and manual command path.

Line 131 references /workspace/components/backends/trtllm, but line 139 shows /workspace/dynamo/components/backends/trtllm. Verify which path is correct for the updated directory structure, or clarify the context (e.g., whether mounted paths differ in containers).

recipes/qwen3/trtllm/decode.yaml (1)

1-31: Recipe configuration file properly structured.

The decode configuration is well-formed with appropriate defaults. The settings align with typical decode-phase optimizations (lower batch size, default cache transceiver backend).

recipes/qwen2-vl-7b-instruct/trtllm/agg.yaml (1)

1-29: Aggregated mode configuration properly structured.

The configuration is complete with appropriate tensor parallelism (8) and batch sizing for non-disaggregated inference.

benchmarks/router/run_engines.sh (1)

7-7: DYNAMO_HOME and RECIPE_PATH properly implemented.

The environment variable defaults are clean and the path construction is consistent. Engine config resolution now correctly uses the recipe-based structure.

Also applies to: 10-10, 89-95

examples/basics/multinode/trtllm/srun_disaggregated.sh (1)

13-13: Recipe config paths properly updated for multinode context.

The engine config paths now correctly reference /mnt/recipes/deepseek-r1/trtllm/wide_ep/ with the new recipe structure. The mount point depth adjustment supports the new repository structure.

Also applies to: 20-20, 24-24

components/backends/trtllm/launch/epd_disagg.sh (1)

6-6: DYNAMO_HOME pattern consistently applied.

The script properly introduces DYNAMO_HOME with a sensible default and constructs the engine config paths correctly. Verify that all three recipe files (prefill, decode, encode) exist in recipes/qwen2-vl-7b-instruct/trtllm/.

Also applies to: 10-12

components/backends/trtllm/deploy/disagg_planner.yaml (2)

89-89: Worker workingDir properly updated for recipe path resolution.

The TRTLLMDecodeWorker and TRTLLMPrefillWorker workingDir changes to /workspace/ correctly align with the recipe-based path structure. Relative paths like ./recipes/qwen3/trtllm/decode.yaml will now resolve correctly.

Also applies to: 100-100, 118-118, 129-129


20-20: Frontend workingDir unchanged—verify this is intentional.

The Frontend container workingDir remains at /workspace/components/backends/trtllm while worker components have been moved to /workspace/. Confirm this is the intended behavior or update for consistency.

components/backends/trtllm/deploy/disagg_router.yaml (1)

31-31: Verify recipes directory structure for both prefill and decode configurations.

Similar to agg_router.yaml, the workingDir changes (Lines 31, 59) shift path resolution for engine configs (Lines 42, 70) to /workspace/recipes/qwen3/trtllm/{prefill,decode}.yaml. Confirm these paths are accessible in deployed environments.

Also applies to: 42-42, 59-59, 70-70

docs/backends/trtllm/multinode/multinode-examples.md (1)

139-139: Verify recipes directory is present at host path mounted to /mnt.

The documentation examples reference /mnt/recipes/... paths in containers, which map to the host via MOUNTS="${PWD}/../:/mnt" (Line 110). Confirm that the recipes directory exists at the parent directory of the documentation folder and contains the referenced config files (wide_ep_agg.yaml, wide_ep_prefill.yaml, etc.).

Also applies to: 168-169

components/backends/trtllm/launch/agg.sh (1)

6-6: Path resolution approach consistent with disagg.sh.

This script follows the same DYNAMO_HOME-based path resolution as disagg.sh. Refer to earlier comment about path resolution inconsistency between shell scripts and deployment YAMLs.

Also applies to: 9-9

docs/backends/trtllm/multinode/multinode-multimodal-example.md (1)

37-37: Verify recipes directory structure includes model-specific subdirectories.

The documentation references /mnt/recipes/llama4/trtllm/multimodal/ paths. Ensure the recipes directory hierarchy includes model-specific subdirectories (llama4, gemma3, deepseek-r1, etc.) and that the sed command on Line 37 correctly targets these new paths.

Also applies to: 103-104

components/backends/trtllm/deploy/disagg.yaml (1)

29-29: Path resolution consistent with other YAML deployments, but see critical issue flagged for gemma3 documentation.

This file follows the same relative path pattern as other deployment YAMLs (workingDir=/workspace/ with ./recipes/... references). However, note the critical path resolution context mismatch identified in the gemma3 documentation file regarding where recipes directory should actually exist.

Also applies to: 40-40, 57-57, 68-68

components/backends/trtllm/deploy/agg_router.yaml (1)

31-31: Verify volume mounts for recipes directory are configured elsewhere or baked into container image.

The file ./recipes/qwen3/trtllm/agg.yaml exists in the repository, and the relative path will correctly resolve to /workspace/recipes/qwen3/trtllm/agg.yaml at runtime. However, the agg_router.yaml shown contains no volume mount configuration for the recipes/ directory. Confirm either:

  • The recipes/ directory is included in the container image at /workspace/recipes/, or
  • Volume mounts for the recipes directory are configured in a parent deployment resource or elsewhere in your setup.
components/backends/trtllm/launch/agg_router.sh (1)

6-6: LGTM!

The DYNAMO_HOME introduction and path update are correct and consistent with the refactoring pattern across the PR.

Also applies to: 9-9

docs/backends/trtllm/multimodal_support.md (1)

28-28: LGTM!

Documentation correctly reflects the new path structure and working directory conventions. The use of relative paths after cd $DYNAMO_HOME is appropriate and clear for users.

Also applies to: 30-30, 78-78, 83-84

recipes/qwen3/trtllm/agg.yaml (1)

1-34: LGTM on configuration structure.

The new YAML configuration is well-documented and includes helpful comments referencing upstream TensorRT-LLM changes. Consider verifying that this configuration works correctly with the qwen3-0.6B model during testing.

recipes/qwen3/trtllm/prefill.yaml (1)

1-30: LGTM on configuration structure.

The prefill-specific configuration includes appropriate tuning for prefill-only workers, notably disabling overlap_scheduler with a clear explanation. The cache_transceiver_config suggests proper disaggregated mode setup.

components/backends/trtllm/launch/disagg_router.sh (1)

6-6: LGTM on pattern, but verify decode.yaml exists.

The DYNAMO_HOME introduction and path updates follow the refactoring pattern. However, this script references recipes/qwen3/trtllm/decode.yaml, which was not provided in the PR files.

Verify that recipes/qwen3/trtllm/decode.yaml exists in the repository or will be created in this PR.

Also applies to: 10-11

components/backends/trtllm/launch/gpt_oss_disagg.sh (1)

6-6: LGTM on pattern, but verify gpt-oss-120b config files exist.

The DYNAMO_HOME introduction is consistent with the PR pattern. The script references recipes/gpt-oss-120b/trtllm/disagg/prefill.yaml and decode.yaml, which were not provided in the PR files.

Verify that the following files exist in the repository or will be created in this PR:

  • recipes/gpt-oss-120b/trtllm/disagg/prefill.yaml
  • recipes/gpt-oss-120b/trtllm/disagg/decode.yaml

Also applies to: 10-11

components/backends/trtllm/deploy/disagg-multinode.yaml (2)

128-142: Potential inconsistency: ConfigMap definitions not used by workers.

The deployment YAML defines prefill.yaml and decode.yaml inline in the ConfigMap (lines 8-63), but the workers reference external files via ./recipes/qwen3/trtllm/prefill.yaml and decode.yaml. This suggests the inline ConfigMap definitions are unused.

This could indicate:

  1. Incomplete refactoring where ConfigMap should be removed, or
  2. The args should reference the ConfigMap-mounted files instead (e.g., /workspace/prefill.yaml)

Please clarify whether the inline ConfigMap definitions should be:

  • Removed entirely (if recipes/ directory is now the source of truth), or
  • Referenced correctly by updating the args to use them

Also applies to: 168-182


131-131: LGTM on path structure.

The workingDir change to /workspace/ correctly enables relative path resolution for ./recipes/... references. This is consistent with the documentation updates.

Also applies to: 171-171

Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
Signed-off-by: Anant Sharma <[email protected]>
Copy link
Contributor

@richardhuo-nv richardhuo-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@nv-anants nv-anants merged commit 8354d32 into main Oct 24, 2025
24 of 25 checks passed
@nv-anants nv-anants deleted the reorg/trtllm-configs branch October 24, 2025 17:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants