[None][chore] unify the backend strings and warn the default backend change #6350

Superjomn · 2025-07-25T03:20:52Z

Description

Add a warning to trtllm-serve and trtllm-bench when --backend is not specified. Because the default backend is changing to pytorch, this guides existing TensorRT users to pass --backend tensorrt to keep the previous behavior.
Unify the backend choices between trtllm-serve and trtllm-bench; for example, trtllm-serve's --backend trt becomes --backend tensorrt.
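A minimal sketch of the intended CLI contract, using the standard library's `argparse` as a stand-in for the commands' actual option parser. The choice list is an assumption modeled on the canonical values this PR introduces (`pytorch`, `tensorrt`, `_autodeploy`). The two ideas it illustrates are `default=None`, which lets a command tell an omitted `--backend` apart from an explicit one, and the removal of the `trt` spelling from the accepted choices.

```python
import argparse
from typing import List, Optional

# Assumed canonical backend names; in the PR these come from
# BackendType.canonical_values(). "trt" is intentionally absent.
CANONICAL_BACKENDS: List[str] = ["pytorch", "tensorrt", "_autodeploy"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="backend-cli-sketch")
    # default=None means "not specified", so the command can warn
    # before falling back to the new default backend.
    parser.add_argument("--backend", choices=CANONICAL_BACKENDS, default=None)
    return parser


def parse_backend(argv: List[str]) -> Optional[str]:
    """Return the --backend value, or None when the flag is omitted."""
    return build_parser().parse_args(argv).backend
```

With this parser, `--backend trt` is rejected by `argparse` with a usage error and exit status 2, so scripts still passing the old spelling fail fast instead of silently picking a backend.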

Summary by CodeRabbit

New Features
- Updated backend selection in CLI commands to use a structured backend enumeration with deferred defaulting to "PyTorch" and user warnings.
- Improved backend name display and validation across benchmarks and serving commands for clearer backend management.
Tests
- Added unit tests to verify backend enum display names and canonical values.
- Updated test parameterization and fixtures to use full backend names ("tensorrt" instead of "trt") for clarity.
Chores
- Replaced hardcoded backend strings with a unified backend enumeration for consistent handling.
- Removed deprecated global backend alias list to streamline backend options.


GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since a lack of user care and validation can break the top of tree.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since a lack of user care and validation can break the top of tree.

coderabbitai · 2025-07-25T03:21:01Z

📝 Walkthrough


A new BackendType enum was introduced to represent supported backend frameworks with multiple aliases and canonical values. The CLI options for backend selection in latency, throughput, and serve commands were updated to use this enum with a default of None, deferring default assignment with a warning. The global variable ALL_SUPPORTED_BACKENDS was removed. Test fixtures and parameterizations were updated to replace the shorthand "trt" with "tensorrt". Unit tests for BackendType were added.
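The walkthrough describes an enum whose members accept several spellings but report one canonical value. The review mentions a `MultiValueEnum` import in `utils.py`, so the PR appears to rely on that; the sketch below approximates the same behavior with only the standard library's `enum.Enum` and its `_missing_` hook. The alias table and display names here are illustrative assumptions, not the PR's actual values.

```python
from enum import Enum
from typing import Dict, List

# Hypothetical alias table: shorthand or legacy spellings -> canonical value.
_BACKEND_ALIASES: Dict[str, str] = {"trt": "tensorrt", "torch": "pytorch"}

# Hypothetical human-readable names, keyed by canonical value.
_BACKEND_DISPLAY_NAMES: Dict[str, str] = {
    "pytorch": "PyTorch",
    "tensorrt": "TensorRT",
    "_autodeploy": "AutoDeploy",
}


class BackendType(Enum):
    """Sketch of a backend enum; each member's value is its canonical name."""

    PYTORCH = "pytorch"
    TENSORRT = "tensorrt"
    _AUTODEPLOY = "_autodeploy"

    @classmethod
    def _missing_(cls, value):
        # Called by BackendType(value) only when value is not a canonical value.
        if not isinstance(value, str):
            return None
        lowered = value.lower()
        canonical = _BACKEND_ALIASES.get(lowered, lowered)
        for member in cls:
            if member.value == canonical:
                return member
        return None  # Enum turns this into a ValueError

    @property
    def display_name(self) -> str:
        return _BACKEND_DISPLAY_NAMES[self.value]

    @classmethod
    def canonical_values(cls) -> List[str]:
        # Definition order, suitable as the CLI's list of choices.
        return [member.value for member in cls]
```

For example, `BackendType("trt")` and `BackendType("tensorrt")` both resolve to `BackendType.TENSORRT`, while an unknown name such as `"onnx"` raises `ValueError`.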

Changes

| File(s) | Change Summary |
|---|---|
| `tensorrt_llm/llmapi/utils.py` | Added `BackendType` enum with multiple aliases and utility methods for backend normalization, default handling, and logging. |
| `tensorrt_llm/bench/benchmark/low_latency.py` | Updated `--backend` CLI option to use `BackendType.canonical_values()` with default `None`; added backend defaulting with warning and info logs; replaced string literals with enum values. |
| `tensorrt_llm/bench/benchmark/throughput.py` | Updated `--backend` CLI option to use `BackendType.canonical_values()` with default `None`; added backend defaulting with warning and info logs; replaced string literals with enum values. |
| `tensorrt_llm/commands/serve.py` | Updated `--backend` CLI option to use `BackendType.canonical_values()` with default `None`; changed `serve` function signature to accept `Optional[str]`; added backend defaulting with warning and info logs; replaced string literals with enum values. |
| `tests/unittest/llmapi/test_utils.py` | Added unit tests for `BackendType` enum members verifying display names, canonical values, and canonical values list. |
| `tensorrt_llm/bench/benchmark/utils/general.py` | Removed global variable `ALL_SUPPORTED_BACKENDS` (list of supported backend strings). |
| `tensorrt_llm/llmapi/__init__.py` | Added `BackendType` to public exports (`__all__`) of the `tensorrt_llm.llmapi` package. |
| `tests/integration/defs/test_e2e.py` | Updated test parameterization decorators to replace `"trt"` with `"tensorrt"` in backend lists for multiple end-to-end tests. |
| `tests/unittest/llmapi/apps/_test_openai_chat.py` | Changed test fixture parameter from `"trt"` to `"tensorrt"` and updated related conditional logic accordingly. |
| `tests/unittest/llmapi/apps/_test_openai_completions.py` | Changed test fixture parameter from `"trt"` to `"tensorrt"` and updated conditional logic in server fixture accordingly. |
| `tests/unittest/llmapi/apps/_test_openai_misc.py` | Changed test fixture parameter from `"trt"` to `"tensorrt"`. |
| `tests/unittest/llmapi/apps/_test_openai_multi_gpu.py` | Changed test fixture parameter from `"trt"` to `"tensorrt"`. |
| `tests/unittest/llmapi/apps/_test_openai_reasoning.py` | Changed test fixture parameter from `"trt"` to `"tensorrt"`. |
| `tests/integration/defs/stress_test/stress_test.py` | Updated backend parameterization from `"trt"` to `"tensorrt"` in stress test. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI_Command
    participant Utils

    User->>CLI_Command: Run command with --backend option (or none)
    CLI_Command->>CLI_Command: Check if backend is None
    alt backend is None
        CLI_Command->>CLI_Command: Log warning about default backend "pytorch"
        CLI_Command->>CLI_Command: Set backend to "pytorch"
    end
    CLI_Command->>Utils: get_default_backend_with_warning(backend)
    Utils-->>CLI_Command: Return validated backend string
    CLI_Command->>Utils: print_backend_info(backend)
    CLI_Command->>User: Continue command execution
```
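The defaulting step in the diagram can be sketched as one shared function, so that `serve`, `latency`, and `throughput` do not each carry their own copy of the check (the duplication the review flags below). This is a simplified, standalone version under stated assumptions: the PR's `get_default_backend_with_warning` is a static method on `BackendType`, and its exact message and return type may differ. Here the function returns a plain string and logs through the standard `logging` module.

```python
import logging
from typing import Optional

logger = logging.getLogger("backend_default_sketch")

# Assumption: mirrors the new default described in this PR.
DEFAULT_BACKEND = "pytorch"


def get_default_backend_with_warning(backend: Optional[str]) -> str:
    """Return ``backend`` unchanged, or warn and fall back to the default.

    Args:
        backend: The value of ``--backend``, or None if the flag was omitted.

    Returns:
        The backend name the command should use.
    """
    if backend is None:
        logger.warning(
            "--backend is not specified; defaulting to %r. The default "
            "changed from 'tensorrt' to 'pytorch' in v1.0. Pass "
            "--backend tensorrt to keep the previous behavior.",
            DEFAULT_BACKEND)
        return DEFAULT_BACKEND
    return backend
```

Each command would call this once, right after option parsing, and use the returned name from then on. The info-level log shown in the diagram as `print_backend_info` is left out of the sketch.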

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~7 minutes

Possibly related PRs

[BREAKING CHANGE]: change default backend to PyTorch in trtllm-serve #5717: Modifies the serve command's --backend option and backend parameter handling similarly, changing default backend behavior and updating backend validation and logging to use the new BackendType enum instead of string literals or fixed lists.
[fix] Fixes to parameter usage and low latency configuration. #6343: Also modifies backend handling in low_latency.py, replacing string literals and ALL_SUPPORTED_BACKENDS with the BackendType enum, complementing backend management improvements.
[nvbug/5320234] fix: test_trtllm_bench_llmapi_launch #6359: Standardizes backend string identifiers by replacing shorthand "trt" with canonical "tensorrt" in tests and benchmark commands, directly related in backend naming conventions.

Suggested labels

Community want to contribute

Suggested reviewers

FrankD412
kaiyux
yilin-void


coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (1)

tensorrt_llm/bench/benchmark/low_latency.py (1)

182-186: Same mapping / duplication issue as in throughput path

Replicate the fix suggested for throughput_command to avoid three divergent copies of the logic.

🧹 Nitpick comments (2)

tensorrt_llm/bench/benchmark/throughput.py (1)

1-3: Missing NVIDIA copyright header

TensorRT-LLM OSS policy requires the header at the top of every source file. Please add it while touching this file.
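To catch a missing header before review, a small check can look for a current-year NVIDIA copyright line near the top of each source file. This is a sketch that assumes the SPDX-style header CodeRabbit proposes for `tests/unittest/llmapi/test_utils.py` (a single year or a `YYYY-YYYY` range); the function name is hypothetical, and the repository's own pre-commit tooling may check something different.

```python
import datetime
import re
from typing import Optional

# Assumed header shape, e.g.
# "# SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & ..."
# Group 1 captures the last (or only) year.
_HEADER_RE = re.compile(
    r"SPDX-FileCopyrightText: Copyright \(c\) (?:\d{4}-)?(\d{4}) NVIDIA CORPORATION")


def has_current_year_header(source: str,
                            year: Optional[int] = None,
                            max_lines: int = 5) -> bool:
    """Return True if one of the first ``max_lines`` lines has the header for ``year``."""
    if year is None:
        year = datetime.date.today().year
    for line in source.splitlines()[:max_lines]:
        match = _HEADER_RE.search(line)
        if match is not None and int(match.group(1)) == year:
            return True
    return False
```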

tensorrt_llm/commands/serve.py (1)

268-272: Backend alias “trt” is not recognised by the message

`backend_repr = "PyTorch" if backend == "pytorch" else "TensorRT"` treats anything other than "pytorch" as TensorRT.
This is a small but visible UX bug: with --backend trt the warning still prints “TensorRT” correctly, but if a new backend (such as _autodeploy) slips through later, it will be mislabeled as “TensorRT”.

Reuse a shared helper (see comment in throughput.py) to keep the mapping single-sourced.
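A minimal sketch of what "single-sourced" could look like: one lookup table that every command reads, which fails loudly on an unknown backend instead of silently calling it TensorRT. The table contents and the names `BACKEND_DISPLAY_NAMES`, `backend_repr_binary`, and `backend_display_name` are hypothetical; a later revision of the PR adds a `get_backend_repr` helper in `tensorrt_llm/llmapi/utils.py` whose exact behavior is not shown here.

```python
from typing import Dict

# Hypothetical single source of truth: canonical value -> display name.
BACKEND_DISPLAY_NAMES: Dict[str, str] = {
    "pytorch": "PyTorch",
    "tensorrt": "TensorRT",
    "_autodeploy": "AutoDeploy",
}


def backend_repr_binary(backend: str) -> str:
    """The pattern flagged above: every non-"pytorch" value becomes TensorRT."""
    return "PyTorch" if backend == "pytorch" else "TensorRT"


def backend_display_name(backend: str) -> str:
    """Look up a backend's display name, rejecting unknown values."""
    try:
        return BACKEND_DISPLAY_NAMES[backend]
    except KeyError:
        raise ValueError(f"Unknown backend: {backend!r}") from None
```

The binary version returns "TensorRT" for `"_autodeploy"`; the table-driven version returns the right name, and adding a backend means adding one table entry that all three commands pick up.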

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6268a60 and 119b6fe.

📒 Files selected for processing (3)

tensorrt_llm/bench/benchmark/low_latency.py (1 hunks)
tensorrt_llm/bench/benchmark/throughput.py (1 hunks)
tensorrt_llm/commands/serve.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

tensorrt_llm/bench/benchmark/low_latency.py
tensorrt_llm/bench/benchmark/throughput.py
tensorrt_llm/commands/serve.py

**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

tensorrt_llm/bench/benchmark/low_latency.py
tensorrt_llm/bench/benchmark/throughput.py
tensorrt_llm/commands/serve.py

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

tensorrt_llm/bench/benchmark/throughput.py

Superjomn · 2025-07-25T05:19:33Z

/bot run

tensorrt-cicd · 2025-07-25T05:24:40Z

PR_Github #12949 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-25T05:52:44Z

PR_Github #12949 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9657 completed with status: 'FAILURE'

Superjomn · 2025-07-25T06:33:20Z

/bot run

tensorrt-cicd · 2025-07-25T06:39:48Z

PR_Github #12968 [ run ] triggered by Bot

tensorrt-cicd · 2025-07-25T07:18:50Z

PR_Github #12968 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #9674 completed with status: 'FAILURE'

FrankD412 · 2025-07-25T17:04:12Z

@Superjomn -- ping, accidentally rebased your branch. Was in the wrong tab.

Superjomn · 2025-07-26T02:29:00Z

@FrankD412 NVM, I will reset the commit and rebase it again.

coderabbitai

Actionable comments posted: 2

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 4324128 and 34efdfe.

📒 Files selected for processing (6)

requirements.txt (1 hunks)
tensorrt_llm/bench/benchmark/low_latency.py (1 hunks)
tensorrt_llm/bench/benchmark/throughput.py (2 hunks)
tensorrt_llm/commands/serve.py (2 hunks)
tensorrt_llm/llmapi/utils.py (2 hunks)
tests/unittest/llmapi/test_llm_utils.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (3)

tensorrt_llm/bench/benchmark/low_latency.py
tensorrt_llm/bench/benchmark/throughput.py
tensorrt_llm/commands/serve.py

🔇 Additional comments (4)

tensorrt_llm/llmapi/utils.py (2)

23-23: LGTM: Proper import for MultiValueEnum support.

The import correctly adds support for the new BackendType enum implementation.

572-573: LGTM: Clean utility function implementation.

The function provides a simple interface to get human-readable backend names using the new enum.

tests/unittest/llmapi/test_llm_utils.py (2)

7-7: LGTM: Proper import for testing the new enum.

The import correctly adds the BackendType enum for unit testing.

64-74: LGTM: Comprehensive BackendType Enum Tests
The new tests cover both display_name and value for all BackendType members, which would surface any recursion errors in the enum implementation. No code changes are needed.

If you’d still like to manually inspect the properties in isolation, run from the repository root so that tensorrt_llm is on PYTHONPATH:
```shell
export PYTHONPATH="$(pwd)"
python3 <<EOF
from tensorrt_llm.llmapi.utils import BackendType
for member in (BackendType.PYTORCH, BackendType.TENSORRT, BackendType._AUTODEPLOY):
    print(member, member.display_name, member.value)
EOF
```

requirements.txt

tensorrt_llm/llmapi/utils.py

Superjomn · 2025-07-27T03:13:08Z

/bot run

Superjomn · 2025-07-27T03:13:54Z

/bot run

coderabbitai

Actionable comments posted: 1

🔭 Outside diff range comments (1)

tests/unittest/llmapi/test_utils.py (1)

1-30: Add required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the top of the file:

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from tensorrt_llm.llmapi.utils import BackendType, get_backend_repr
```

🧹 Nitpick comments (2)

tests/unittest/llmapi/test_utils.py (2)
4-14: Consider renaming test method for better clarity.

The test structure and logic look good, but the method name test_value is somewhat generic. Consider renaming it to test_canonical_value to more precisely describe what property is being tested.

```diff
-    def test_value(self):
+    def test_canonical_value(self):
```
17-29: Consider adding edge case tests for better coverage.

The current tests cover the main functionality well, but consider adding tests for edge cases to ensure robustness:

```python
import pytest  # add at the top of test_utils.py if not already imported


def test_get_backend_repr_edge_cases():
    # Test invalid backend names
    with pytest.raises(ValueError):  # or whatever exception is expected
        get_backend_repr("invalid_backend")

    # Test empty string
    with pytest.raises(ValueError):
        get_backend_repr("")

    # Test None value
    with pytest.raises((ValueError, TypeError)):
        get_backend_repr(None)
```
This would help verify the function's behavior with unexpected inputs, which is important since it's used in user-facing commands.

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 34efdfe and 1c52e02.

📒 Files selected for processing (2)

tensorrt_llm/llmapi/utils.py (2 hunks)
tests/unittest/llmapi/test_utils.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tensorrt_llm/llmapi/utils.py

🔇 Additional comments (1)

tests/unittest/llmapi/test_utils.py (1)

1-1: LGTM! Import follows namespace guidelines.

The import correctly maintains the full namespace path and imports only the specific items needed for testing.

tests/unittest/llmapi/test_utils.py

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (1)

tensorrt_llm/llmapi/utils.py (1)

670-688: Fix docstring formatting issues.

The static analysis correctly identifies docstring formatting problems. The docstring needs proper formatting, and the typos ("backend to from", "topytorch") should be fixed.

Apply this diff to fix the docstring formatting:

```diff
     # TODO[Superjom]: Remove this method after v1.0.0 is released.
     @staticmethod
     def get_default_backend_with_warning(
             backend: Optional[str]) -> "BackendType":
-        """ Warn the user if the backend is not set, as we changed the default
-        backend to from tensorrt topytorch from v1.0 """
+        """Warn the user if the backend is not set.
+
+        The default backend changed from tensorrt to pytorch in v1.0.
+
+        Args:
+            backend: Optional backend string to validate.
+
+        Returns:
+            The resolved backend (canonical value).
+        """
```

The method logic is correct for handling default backend assignment with warnings.

🧹 Nitpick comments (1)

tensorrt_llm/commands/serve.py (1)
294-297: Fix docstring punctuation.

The docstring should end with proper punctuation as indicated by the static analysis tool.

Apply this diff to fix the docstring:

```diff
-    """Running an OpenAI API compatible server
+    """Running an OpenAI API compatible server.
```

📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 90a0bfb and 300ac5d.

📒 Files selected for processing (15)

tensorrt_llm/bench/benchmark/low_latency.py (7 hunks)
tensorrt_llm/bench/benchmark/throughput.py (10 hunks)
tensorrt_llm/bench/benchmark/utils/general.py (0 hunks)
tensorrt_llm/commands/serve.py (6 hunks)
tensorrt_llm/llmapi/__init__.py (2 hunks)
tensorrt_llm/llmapi/utils.py (2 hunks)
tests/integration/defs/stress_test/stress_test.py (2 hunks)
tests/integration/defs/test_e2e.py (4 hunks)
tests/integration/test_lists/test-db/l0_a10.yml (2 hunks)
tests/unittest/llmapi/apps/_test_openai_chat.py (2 hunks)
tests/unittest/llmapi/apps/_test_openai_completions.py (2 hunks)
tests/unittest/llmapi/apps/_test_openai_misc.py (1 hunks)
tests/unittest/llmapi/apps/_test_openai_multi_gpu.py (1 hunks)
tests/unittest/llmapi/apps/_test_openai_reasoning.py (1 hunks)
tests/unittest/llmapi/test_utils.py (2 hunks)

💤 Files with no reviewable changes (1)

tensorrt_llm/bench/benchmark/utils/general.py

🚧 Files skipped from review as they are similar to previous changes (11)

tests/unittest/llmapi/apps/_test_openai_multi_gpu.py
tensorrt_llm/llmapi/__init__.py
tests/integration/defs/stress_test/stress_test.py
tests/unittest/llmapi/apps/_test_openai_reasoning.py
tests/unittest/llmapi/apps/_test_openai_misc.py
tests/integration/test_lists/test-db/l0_a10.yml
tests/unittest/llmapi/test_utils.py
tests/unittest/llmapi/apps/_test_openai_chat.py
tests/integration/defs/test_e2e.py
tensorrt_llm/bench/benchmark/low_latency.py
tests/unittest/llmapi/apps/_test_openai_completions.py

🧰 Additional context used

🧠 Learnings (6)

📓 Common learnings

Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...

Applied to files:

tensorrt_llm/bench/benchmark/throughput.py
tensorrt_llm/commands/serve.py
tensorrt_llm/llmapi/utils.py

📚 Learning: in tensorrt_llm/executor/worker.py, the lora adapter cache optimization logic that checks `is_adapte...

Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

Applied to files:

tensorrt_llm/bench/benchmark/throughput.py

📚 Learning: in tensorrt-llm's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()...

Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.

Applied to files:

tensorrt_llm/bench/benchmark/throughput.py

📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx} : when practical, a switch statement controlled by an enum should...

Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-05T06:09:30.350Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx} : When practical, a switch statement controlled by an enum should have a case for each enum value and not have a default clause.

Applied to files:

tensorrt_llm/bench/benchmark/throughput.py
tensorrt_llm/commands/serve.py

📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...

Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

tensorrt_llm/bench/benchmark/throughput.py
tensorrt_llm/commands/serve.py

🪛 Ruff (0.12.2)

tensorrt_llm/commands/serve.py

294-297: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)

tensorrt_llm/llmapi/utils.py

671-673: 1 blank line required between summary line and description

(D205)

671-673: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (23)

tensorrt_llm/llmapi/utils.py (6)

23-23: LGTM!

The import of MultiValueEnum from aenum is appropriate for creating the backend enum with multiple aliases per member.

640-648: LGTM!

The BackendType enum class definition is well-structured with appropriate multi-value aliases. The enum members correctly define display names, canonical values, and alternative aliases for each backend type.
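For readers without aenum at hand, the multi-value idea can be approximated with the standard-library enum module. This is a minimal sketch, not the PR's implementation. Each member's value is a tuple (display name, canonical value, then any extra aliases), and `_missing_` resolves a lookup by any of those strings. The `"trt"` alias is an assumption based on the old `--backend trt` spelling.

```python
from enum import Enum


class BackendType(Enum):
    # Each value is (display name, canonical value, *aliases).
    # The "trt" alias is an assumption, kept for the old CLI spelling.
    PYTORCH = ("PyTorch", "pytorch")
    TENSORRT = ("TensorRT", "tensorrt", "trt")

    @classmethod
    def _missing_(cls, value):
        # Called when BackendType(value) matches no full tuple value:
        # search every member's strings for an exact match instead.
        if isinstance(value, str):
            for member in cls:
                if value in member.value:
                    return member
        # Returning None makes Enum raise ValueError for unknown names.
        return None

    @property
    def display_name(self) -> str:
        return self.value[0]

    @property
    def canonical_value(self) -> str:
        return self.value[1]

    def __str__(self) -> str:
        return self.display_name
```

With this sketch, `BackendType("trt")` and `BackendType("tensorrt")` both resolve to `BackendType.TENSORRT`, while an unknown name such as `"onnx"` raises `ValueError`.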

650-652: LGTM!

The canonical_value property correctly returns the canonical string identifier (second value in the tuple) for each backend type.

654-657: LGTM!

The default_value method correctly returns the canonical value of the PYTORCH backend, providing a centralized default backend selection.

659-662: LGTM!

The canonical_values method provides a clean way to get all supported backend canonical values, effectively replacing the previous global ALL_SUPPORTED_BACKENDS list.
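A sketch of why deriving the list from the enum removes the need for a separately maintained `ALL_SUPPORTED_BACKENDS`. The method names follow the review; the bodies are assumptions, and the members are trimmed to two for brevity.

```python
from enum import Enum
from typing import List


class BackendType(Enum):
    # (display name, canonical value); a trimmed-down sketch.
    PYTORCH = ("PyTorch", "pytorch")
    TENSORRT = ("TensorRT", "tensorrt")

    @property
    def canonical_value(self) -> str:
        return self.value[1]

    @staticmethod
    def default_value() -> str:
        # Single place that decides the default backend.
        return BackendType.PYTORCH.canonical_value

    @staticmethod
    def canonical_values() -> List[str]:
        # Derived from the members in definition order, so adding a
        # backend updates every CLI that uses this list automatically.
        return [member.canonical_value for member in BackendType]
```

Because `staticmethod` objects are descriptors, Enum does not turn them into members, so only `PYTORCH` and `TENSORRT` appear when iterating.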

665-668: LGTM!

The print_backend_info method provides a consistent way to log backend information across different commands and modules.
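A possible shape for a shared logging helper, assuming it formats the member with its display name via `__str__`. The split into `format_backend_info` and `print_backend_info` and the message wording are hypothetical, chosen here only so the output is easy to check.

```python
import logging
from enum import Enum

logger = logging.getLogger("backend_sketch")


class BackendType(Enum):
    PYTORCH = ("PyTorch", "pytorch")
    TENSORRT = ("TensorRT", "tensorrt")

    def __str__(self) -> str:
        # Display name, used in user-facing messages.
        return self.value[0]


def format_backend_info(backend: BackendType) -> str:
    # Hypothetical helper: one message format shared by every command.
    # "!s" forces str(), i.e. the display name defined above.
    return f"Using the {backend!s} backend"


def print_backend_info(backend: BackendType) -> None:
    logger.info(format_backend_info(backend))
```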

tensorrt_llm/bench/benchmark/throughput.py (10)

30-30: LGTM!

The import of BackendType from tensorrt_llm.llmapi correctly replaces the previous ALL_SUPPORTED_BACKENDS import, providing better type safety and centralized backend management.

47-51: LGTM!

The CLI option update correctly uses BackendType.canonical_values() for choices and defers default assignment to runtime, enabling the warning mechanism for the default backend change.
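The point of a `None` default is that the command can only warn when it can tell that the flag was omitted. The sketch below shows the same pattern with the standard-library argparse instead of click (swapped in so it runs without extra dependencies); the backend list is hard-coded as a stand-in for `BackendType.canonical_values()`.

```python
import argparse
from typing import List, Optional

# Stand-in for BackendType.canonical_values() in this sketch.
CANONICAL_BACKENDS: List[str] = ["pytorch", "tensorrt"]


def parse_backend(argv: List[str]) -> Optional[str]:
    parser = argparse.ArgumentParser(prog="bench-sketch")
    # default=None (rather than "pytorch") lets later code distinguish
    # "--backend omitted" from "--backend pytorch chosen explicitly".
    parser.add_argument("--backend", choices=CANONICAL_BACKENDS, default=None)
    return parser.parse_args(argv).backend
```

Since only canonical values are accepted, the old `trt` spelling is rejected with a usage error (argparse exits with `SystemExit`).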

257-259: LGTM!

The backend processing correctly uses the new BackendType methods for default assignment with warning and info printing, implementing the intended backward compatibility mechanism.
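The warn-on-default step can be sketched with the standard logging module. This is an assumption-laden sketch: it returns the canonical string rather than a `BackendType` to stay short, and the warning text is illustrative rather than the PR's.

```python
import logging
from typing import Optional

logger = logging.getLogger("backend_sketch")

DEFAULT_BACKEND = "pytorch"


def get_default_backend_with_warning(backend: Optional[str]) -> str:
    """Return the requested backend, warning when the default is applied.

    Args:
        backend: The value of --backend, or None if the flag was omitted.

    Returns:
        The canonical backend name to use.
    """
    if backend is None:
        # Only an omitted flag triggers the warning; an explicit
        # "--backend pytorch" stays silent.
        logger.warning(
            "--backend not specified; defaulting to '%s'. Pass "
            "--backend tensorrt to keep the previous TensorRT behavior.",
            DEFAULT_BACKEND)
        return DEFAULT_BACKEND
    return backend
```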

272-272: LGTM!

The backend variable is now properly typed as BackendType instead of a string, providing better type safety and preventing potential errors.

313-313: LGTM!

The backend comparison using BackendType.TENSORRT instead of string literal provides better type safety and consistency.

324-324: LGTM!

The backend comparison using BackendType.TENSORRT enum member is consistent with the new typing approach.
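Why every comparison has to be migrated: assuming `BackendType` does not also subclass `str` (the review does not say either way), a leftover string comparison does not fail loudly, it just evaluates to `False`. The members below are simplified to single values for the sketch.

```python
from enum import Enum


class BackendType(Enum):
    # Simplified single-value members; the PR's members carry several values.
    PYTORCH = "pytorch"
    TENSORRT = "tensorrt"


def is_tensorrt_unmigrated(backend: BackendType) -> bool:
    # Leftover pre-migration check: a plain Enum member never equals a
    # str, so this returns False even for BackendType.TENSORRT.
    return backend == "tensorrt"


def is_tensorrt(backend: BackendType) -> bool:
    # Enum members are singletons, so an identity check is reliable.
    return backend is BackendType.TENSORRT
```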

340-340: LGTM!

The error message correctly uses BackendType.canonical_values() to provide users with the current list of supported backends.

373-376: LGTM!

The backend conversion logic correctly handles the case where exec_backend is a BackendType instance, converting it to its canonical string value for runtime configuration.

395-395: LGTM!

Using backend.canonical_value for the kwargs ensures the runtime receives the correct string identifier.
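A short sketch of converting at the boundary. It assumes, as the review describes, that `__str__` returns the display name, which is exactly why the runtime kwargs should use `canonical_value` rather than `str(backend)`. `build_runtime_kwargs` is a hypothetical helper.

```python
from enum import Enum
from typing import Any, Dict


class BackendType(Enum):
    # (display name, canonical value)
    PYTORCH = ("PyTorch", "pytorch")
    TENSORRT = ("TensorRT", "tensorrt")

    @property
    def canonical_value(self) -> str:
        return self.value[1]

    def __str__(self) -> str:
        return self.value[0]


def build_runtime_kwargs(backend: BackendType) -> Dict[str, Any]:
    # Hypothetical helper: the runtime expects the canonical string,
    # not the display name that str(backend) would produce.
    return {"backend": backend.canonical_value}
```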

397-397: LGTM!

The comparison using BackendType.PYTORCH enum member is consistent with the new typing approach and more maintainable than string literals.

tensorrt_llm/commands/serve.py (7)

28-28: LGTM!

The import of BackendType from tensorrt_llm.llmapi.utils is consistent with the migration to the unified backend enum approach.

75-75: LGTM!

The backend parameter type change from str to BackendType with default BackendType.PYTORCH improves type safety and consistency.

141-141: LGTM!

The backend comparison using BackendType.PYTORCH enum member instead of string literal provides better type safety and consistency.

164-164: LGTM!

The backend comparison in launch_server correctly uses BackendType.PYTORCH enum member for type-safe backend checking.

189-194: LGTM!

The CLI option update correctly uses BackendType.canonical_values() for choices and defers default assignment to runtime, consistent with the approach in benchmark commands.

283-293: LGTM!

The serve function signature correctly maintains the Optional[str] type for the CLI parameter while the function body properly converts it to BackendType.

300-301: LGTM!

The backend processing correctly uses BackendType.get_default_backend_with_warning() for default assignment with warning and BackendType.print_backend_info() for consistent logging.

tensorrt-cicd · 2025-08-05T07:56:55Z

PR_Github #14097 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10637 completed with status: 'FAILURE'

Superjomn · 2025-08-05T11:23:15Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-08-05T11:28:57Z

PR_Github #14146 [ run ] triggered by Bot

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tensorrt_llm/llmapi/utils.py (1)
640-689: Well-designed BackendType enum implementation.

The BackendType enum effectively unifies backend string representations across the codebase with the following strengths:

Clear separation of display names and canonical values using MultiValueEnum

Proper __str__ implementation returning the display name

Comprehensive static methods for unified behavior across CLI commands

Appropriate warning handling for the default backend change from TensorRT to PyTorch

The implementation follows coding guidelines and successfully addresses the PR objectives of unifying backend strings and warning about default backend changes.

Consider addressing the minor docstring formatting issue flagged by static analysis:
-        """ Warn the user if the backend is not set, as we changed the default
-        backend to from tensorrt topytorch from v1.0 """
+        """Warn the user if the backend is not set, as the default backend changed from tensorrt to pytorch in v1.0.
+        
+        Args:
+            backend: Optional backend string to validate
+            
+        Returns:
+            BackendType instance for the specified or default backend
+        """

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cdc0003 and 8762c7a.

📒 Files selected for processing (15)

tensorrt_llm/bench/benchmark/low_latency.py (7 hunks)
tensorrt_llm/bench/benchmark/throughput.py (10 hunks)
tensorrt_llm/bench/benchmark/utils/general.py (0 hunks)
tensorrt_llm/commands/serve.py (6 hunks)
tensorrt_llm/llmapi/__init__.py (2 hunks)
tensorrt_llm/llmapi/utils.py (2 hunks)
tests/integration/defs/stress_test/stress_test.py (2 hunks)
tests/integration/defs/test_e2e.py (4 hunks)
tests/integration/test_lists/test-db/l0_a10.yml (2 hunks)
tests/unittest/llmapi/apps/_test_openai_chat.py (2 hunks)
tests/unittest/llmapi/apps/_test_openai_completions.py (2 hunks)
tests/unittest/llmapi/apps/_test_openai_misc.py (1 hunks)
tests/unittest/llmapi/apps/_test_openai_multi_gpu.py (1 hunks)
tests/unittest/llmapi/apps/_test_openai_reasoning.py (1 hunks)
tests/unittest/llmapi/test_utils.py (2 hunks)

💤 Files with no reviewable changes (1)

tensorrt_llm/bench/benchmark/utils/general.py

🚧 Files skipped from review as they are similar to previous changes (12)

tensorrt_llm/llmapi/__init__.py
tests/unittest/llmapi/apps/_test_openai_chat.py
tests/integration/defs/stress_test/stress_test.py
tests/unittest/llmapi/apps/_test_openai_reasoning.py
tests/unittest/llmapi/apps/_test_openai_multi_gpu.py
tests/integration/test_lists/test-db/l0_a10.yml
tests/unittest/llmapi/apps/_test_openai_misc.py
tensorrt_llm/bench/benchmark/low_latency.py
tests/unittest/llmapi/apps/_test_openai_completions.py
tests/unittest/llmapi/test_utils.py
tensorrt_llm/bench/benchmark/throughput.py
tests/integration/defs/test_e2e.py

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without reflection.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

tensorrt_llm/commands/serve.py
tensorrt_llm/llmapi/utils.py

**/*.{cpp,h,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

tensorrt_llm/commands/serve.py
tensorrt_llm/llmapi/utils.py

🧠 Learnings (5)

📓 Common learnings

Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...

Applied to files:

tensorrt_llm/commands/serve.py
tensorrt_llm/llmapi/utils.py

📚 Learning: applies to **/*.py : the code developed for tensorrt-llm should conform to python 3.8+....

Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-05T07:49:09.844Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.

Applied to files:

tensorrt_llm/commands/serve.py
tensorrt_llm/llmapi/utils.py

📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx} : when practical, a switch statement controlled by an enum should...

Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-05T07:49:09.844Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx} : When practical, a switch statement controlled by an enum should have a case for each enum value and not have a default clause.

Applied to files:

tensorrt_llm/commands/serve.py

📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...

Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

tensorrt_llm/commands/serve.py

🧬 Code Graph Analysis (1)

tensorrt_llm/llmapi/utils.py (2)

tests/unittest/llmapi/apps/_test_openai_chat.py (1)

backend (24-25)

tensorrt_llm/logger.py (1)

warning (131-132)

🪛 Ruff (0.12.2)

tensorrt_llm/commands/serve.py

294-297: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)

tensorrt_llm/llmapi/utils.py

671-673: 1 blank line required between summary line and description

(D205)

671-673: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (7)

tensorrt_llm/llmapi/utils.py (1)

23-23: LGTM!

The import of MultiValueEnum from aenum is correctly placed and necessary for implementing the BackendType enum with multiple string aliases per backend.

tensorrt_llm/commands/serve.py (6)

28-28: LGTM!

The import of BackendType is correctly placed and necessary for implementing unified backend handling in the serve command.

75-75: LGTM!

The parameter type change from Optional[str] to BackendType with default BackendType.PYTORCH improves type safety and aligns with the unified backend approach. The default value correctly reflects the new PyTorch default mentioned in the PR objectives.

141-141: LGTM!

The backend comparison update from string-based to enum-based comparison (backend == BackendType.PYTORCH) improves type safety and maintainability while preserving the same logic.

164-164: LGTM!

The backend comparison update in launch_server correctly uses enum-based comparison (backend == BackendType.PYTORCH) instead of string comparison, maintaining the same logic while improving type safety.

189-194: Excellent CLI option update for backend consistency.

The changes effectively implement unified backend handling:

click.Choice(BackendType.canonical_values()) ensures CLI choices match the enum definition

default=None defers default assignment to enable the warning mechanism

Help text correctly uses 'tensorrt' (canonical value) for consistency

These changes align perfectly with the PR objectives of unifying backend strings and supporting the default backend change warning.

283-301: Excellent implementation of unified backend handling.

The serve function updates successfully implement the PR objectives:

BackendType.get_default_backend_with_warning(backend) handles both default assignment and the warning about the default backend change from TensorRT to PyTorch

BackendType.print_backend_info(backend) provides consistent backend logging

The implementation maintains backward compatibility while enabling type-safe enum-based backend handling

The code correctly processes the CLI string input into a BackendType instance that's used throughout the rest of the function, ensuring consistency with the unified backend approach.

FrankD412 · 2025-08-05T16:39:57Z

@Superjomn -- just a heads up, I had someone find an issue with the latency sub-command in trtllm-bench so I have a PR out to consolidate some of the logic. I'm waiting on this PR. See #6458

tensorrt-cicd · 2025-08-05T18:41:25Z

PR_Github #14146 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10673 completed with status: 'FAILURE'

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tensorrt_llm/commands/serve.py (1)
301-304: Add missing punctuation to docstring.

The docstring's summary line should end with closing punctuation, as PEP 257 recommends and Ruff's D415 check enforces.
-    """Running an OpenAI API compatible server
+    """Running an OpenAI API compatible server.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f565d76 and dd082ca.

📒 Files selected for processing (3)

tensorrt_llm/bench/benchmark/throughput.py (10 hunks)
tensorrt_llm/commands/serve.py (6 hunks)
tests/integration/test_lists/test-db/l0_a10.yml (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

tests/integration/test_lists/test-db/l0_a10.yml
tensorrt_llm/bench/benchmark/throughput.py

🧰 Additional context used

📓 Path-based instructions (2)

**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

tensorrt_llm/commands/serve.py

**/*.{cpp,h,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

tensorrt_llm/commands/serve.py

🧠 Learnings (5)

📓 Common learnings

Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...

Applied to files:

tensorrt_llm/commands/serve.py

📚 Learning: applies to **/*.py : the code developed for tensorrt-llm should conform to python 3.8+....

Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T00:54:56.009Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.

Applied to files:

tensorrt_llm/commands/serve.py

📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx} : when practical, a switch statement controlled by an enum should...

Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T00:54:56.009Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx} : When practical, a switch statement controlled by an enum should have a case for each enum value and not have a default clause.

Applied to files:

tensorrt_llm/commands/serve.py

📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...

Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

tensorrt_llm/commands/serve.py

🧬 Code Graph Analysis (1)

tensorrt_llm/commands/serve.py (1)

tensorrt_llm/llmapi/utils.py (7)

BackendType (640-688)

get (354-367)

get (420-437)

get_default_backend_with_warning (672-688)

canonical_value (651-652)

canonical_values (660-662)

print_backend_info (666-668)

🪛 Ruff (0.12.2)

tensorrt_llm/commands/serve.py

301-304: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (7)

tensorrt_llm/commands/serve.py (7)

28-28: Import addition looks good.

The BackendType import is necessary for the enum integration and follows proper Python import conventions.

75-75: Good type safety improvement.

Changing the parameter type from string to BackendType with a proper enum default improves type safety and consistency.

141-141: Enum comparison is more type-safe.

Using backend == BackendType.PYTORCH instead of string comparison improves type safety and consistency with the enum integration.

161-167: Robust backend validation and conversion.

The backend handling logic properly validates and converts between string and BackendType instances, with appropriate error handling for invalid types. This supports a smooth transition during the enum integration.
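The review does not quote lines 161-167, so the following is only an assumption of what such validation typically looks like: accept either a canonical string or a member, and reject anything else with a specific exception. Members are simplified to single values.

```python
from enum import Enum
from typing import Union


class BackendType(Enum):
    PYTORCH = "pytorch"
    TENSORRT = "tensorrt"


def to_backend_type(backend: Union[str, BackendType]) -> BackendType:
    """Normalize a backend given as a canonical string or a BackendType.

    Raises:
        ValueError: If a string does not name a known backend.
        TypeError: If the argument is neither a string nor a BackendType.
    """
    if isinstance(backend, BackendType):
        return backend
    if isinstance(backend, str):
        # Lookup by value raises ValueError for unknown names.
        return BackendType(backend)
    raise TypeError(
        f"backend must be str or BackendType, got {type(backend).__name__}")
```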

171-171: Consistent enum usage.

The enum comparison backend == BackendType.PYTORCH is consistent with the type-safe approach used throughout the refactor.

196-201: Excellent CLI option improvements.

Using BackendType.canonical_values() ensures CLI choices stay synchronized with the enum definition. The default=None approach allows proper default handling with warnings, and the help text now uses the canonical 'tensorrt' naming for consistency.

290-308: Clean backend integration with proper user feedback.

The serve function properly converts the CLI string parameter to BackendType using the warning method and provides user feedback via print_backend_info. This maintains CLI compatibility while leveraging the enum benefits internally.

Superjomn · 2025-08-06T01:26:55Z

/bot run --disable-fail-fast

tensorrt-cicd · 2025-08-06T01:32:05Z

PR_Github #14200 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-06T07:25:47Z

PR_Github #14200 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10725 completed with status: 'FAILURE'

Signed-off-by: Superjomn <[email protected]>

Superjomn · 2025-08-06T08:09:25Z

/bot run --stage-list "A10-PyTorch-1, H100_PCIe-PyTorch-1"

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

tensorrt_llm/llmapi/utils.py (1)

670-688: Fix docstring formatting issues.

The method implementation is correct, but the docstring needs formatting improvements based on static analysis hints.

Apply this diff to fix the docstring formatting:

-    def get_default_backend_with_warning(
-            backend: Optional[str]) -> "BackendType":
-        """ Warn the user if the backend is not set, as we changed the default
-        backend to from tensorrt topytorch from v1.0 """
+    def get_default_backend_with_warning(
+            backend: Optional[str]) -> "BackendType":
+        """Warn the user if the backend is not set.
+        
+        The default backend changed from tensorrt to pytorch in v1.0.
+        
+        Args:
+            backend: Optional backend string to validate
+            
+        Returns:
+            BackendType instance for the validated backend
+        """

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 698af17 and 30f134c.

📒 Files selected for processing (16)

tensorrt_llm/bench/benchmark/low_latency.py (7 hunks)
tensorrt_llm/bench/benchmark/throughput.py (10 hunks)
tensorrt_llm/bench/benchmark/utils/general.py (0 hunks)
tensorrt_llm/commands/serve.py (6 hunks)
tensorrt_llm/llmapi/__init__.py (2 hunks)
tensorrt_llm/llmapi/llm.py (1 hunks)
tensorrt_llm/llmapi/utils.py (2 hunks)
tests/integration/defs/stress_test/stress_test.py (2 hunks)
tests/integration/defs/test_e2e.py (4 hunks)
tests/integration/test_lists/test-db/l0_a10.yml (2 hunks)
tests/unittest/llmapi/apps/_test_openai_chat.py (2 hunks)
tests/unittest/llmapi/apps/_test_openai_completions.py (2 hunks)
tests/unittest/llmapi/apps/_test_openai_misc.py (1 hunks)
tests/unittest/llmapi/apps/_test_openai_multi_gpu.py (1 hunks)
tests/unittest/llmapi/apps/_test_openai_reasoning.py (1 hunks)
tests/unittest/llmapi/test_utils.py (2 hunks)

💤 Files with no reviewable changes (1)

tensorrt_llm/bench/benchmark/utils/general.py

✅ Files skipped from review due to trivial changes (1)

tensorrt_llm/llmapi/llm.py

🚧 Files skipped from review as they are similar to previous changes (12)

tensorrt_llm/llmapi/__init__.py
tests/integration/defs/stress_test/stress_test.py
tests/unittest/llmapi/apps/_test_openai_reasoning.py
tests/unittest/llmapi/test_utils.py
tests/unittest/llmapi/apps/_test_openai_completions.py
tests/integration/defs/test_e2e.py
tests/unittest/llmapi/apps/_test_openai_misc.py
tests/integration/test_lists/test-db/l0_a10.yml
tests/unittest/llmapi/apps/_test_openai_chat.py
tensorrt_llm/bench/benchmark/low_latency.py
tests/unittest/llmapi/apps/_test_openai_multi_gpu.py
tensorrt_llm/bench/benchmark/throughput.py

🧰 Additional context used

🧠 Learnings (3)

📓 Common learnings

Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...

Applied to files:

tensorrt_llm/commands/serve.py
tensorrt_llm/llmapi/utils.py

📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...

Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

tensorrt_llm/commands/serve.py

🪛 Ruff (0.12.2)

tensorrt_llm/commands/serve.py

299-302: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)

tensorrt_llm/llmapi/utils.py

671-673: 1 blank line required between summary line and description

(D205)

671-673: First line should end with a period, question mark, or exclamation point

Add closing punctuation

(D415)
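Ruff reports D415 when a docstring's summary line lacks closing punctuation, and D205 when the summary line is not followed by exactly one blank line before the description. The function below is a hypothetical illustration of a docstring that passes both checks. It is not the code Ruff flagged in serve.py or utils.py.

```python
from typing import Iterable, List


def canonical_backend_names(names: Iterable[str]) -> List[str]:
    """Return the backend names as a list, preserving their order.

    The summary line above ends with a period (D415), and exactly one blank
    line separates it from this description (D205).
    """
    # Hypothetical helper used only to carry the docstring example.
    return list(names)
```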

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Pre-commit Check

🔇 Additional comments (12)

tensorrt_llm/llmapi/utils.py (4)

23-23: LGTM!

The import of MultiValueEnum from aenum is correctly placed and necessary for the new BackendType enum implementation.

640-649: LGTM!

The BackendType enum class design is well-structured, using MultiValueEnum to support multiple aliases per backend. The __str__ method correctly returns the display name (first value) as requested in past reviews.

650-652: LGTM!

The canonical_value property correctly returns the canonical backend identifier (second value in the tuple).

654-668: LGTM!

The static utility methods provide a clean and consistent API for backend handling. The implementation correctly extracts canonical values and provides informative logging.
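The comments above describe BackendType as an aenum MultiValueEnum. Its first value is the display name and its second value is the canonical identifier. The sketch below illustrates that shape under stated assumptions. It replaces MultiValueEnum with the standard-library enum.Enum and a _missing_ hook, so it runs without aenum. The member list and the resolve() helper are hypothetical and are not the PR's code.

```python
import logging
from enum import Enum
from typing import List, Optional, Union

logger = logging.getLogger(__name__)


class BackendType(Enum):
    """Hypothetical sketch: each value is (display name, canonical value)."""

    PYTORCH = ("PyTorch", "pytorch")
    TENSORRT = ("TensorRT", "tensorrt")

    @classmethod
    def _missing_(cls, value):
        # Emulate MultiValueEnum: accept any string in a member's tuple.
        if isinstance(value, str):
            for member in cls:
                if value in member.value:
                    return member
        return None  # Enum then raises ValueError for unknown names.

    def __str__(self) -> str:
        # Display name is the first value, e.g. "PyTorch".
        return self.value[0]

    @property
    def canonical_value(self) -> str:
        # Canonical CLI identifier is the second value, e.g. "pytorch".
        return self.value[1]

    @staticmethod
    def canonical_values() -> List[str]:
        # Candidate strings for a CLI --backend option.
        return [member.canonical_value for member in BackendType]

    @staticmethod
    def resolve(backend: Union[str, "BackendType", None]) -> "BackendType":
        # Hypothetical helper: the default is applied here, not in the CLI,
        # so users who omit --backend can be warned about the new default.
        if backend is None:
            logger.warning(
                "--backend not specified; defaulting to %s. "
                "Pass --backend tensorrt to keep the TensorRT behavior.",
                BackendType.PYTORCH)
            return BackendType.PYTORCH
        # Accepts both BackendType members and any of their string values.
        return BackendType(backend)
```

Enum returns a member unchanged when it is passed back into the constructor, so resolve() handles enum instances and strings through the same path.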

tensorrt_llm/commands/serve.py (8)

28-28: LGTM!

The import of BackendType is correctly placed and necessary for the backend standardization changes.

75-75: LGTM!

The parameter type change to BackendType with a sensible default improves type safety and consistency.

141-141: LGTM!

The backend assignment logic correctly uses BackendType.PYTORCH enum comparison instead of string literals.

161-166: LGTM!

The backend validation and conversion logic properly handles both BackendType instances and string values, and warns when no backend is specified.

169-169: LGTM!

The backend comparison now correctly uses the BackendType.PYTORCH enum instead of string literals.

194-199: LGTM!

The CLI option updates correctly use BackendType.canonical_values() for choices and defer default assignment to provide user warnings. The help text now consistently uses "tensorrt" as requested in past reviews.

288-298: LGTM!

The function signature updates properly handle the backend parameter as an optional string, maintaining CLI compatibility.

305-306: LGTM!

The backend processing and logging using BackendType methods provides consistent behavior and user feedback across the codebase.
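The 161-166 and 194-199 comments describe a CLI option that takes its choices from BackendType.canonical_values() and has no default. Leaving the default unset lets the command warn before it falls back to PyTorch. The following is a minimal sketch of that pattern under stated assumptions. It uses argparse as a stand-in for the project's CLI framework and a hard-coded list in place of canonical_values(). The names build_parser and resolve_backend are hypothetical.

```python
import argparse
import warnings
from typing import List, Optional

# Hypothetical stand-in for BackendType.canonical_values().
CANONICAL_BACKENDS: List[str] = ["pytorch", "tensorrt"]
DEFAULT_BACKEND = "pytorch"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="serve-sketch")
    # default=None (not "pytorch") lets the command tell an omitted flag
    # apart from an explicit choice, so it can warn only when omitted.
    parser.add_argument(
        "--backend",
        choices=CANONICAL_BACKENDS,
        default=None,
        help="Backend to use: 'pytorch' (default) or 'tensorrt'.")
    return parser


def resolve_backend(backend: Optional[str]) -> str:
    # Apply the deferred default and tell existing TensorRT users how to opt out.
    if backend is None:
        warnings.warn(
            f"--backend was not specified; defaulting to '{DEFAULT_BACKEND}'. "
            "Pass --backend tensorrt to keep the previous TensorRT behavior.",
            UserWarning,
            stacklevel=2)
        return DEFAULT_BACKEND
    return backend
```

If the option used default="pytorch", an omitted flag and an explicit --backend pytorch would look identical to the command, and the warning could not be targeted.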

tensorrt-cicd · 2025-08-06T08:15:26Z

PR_Github #14272 [ run ] triggered by Bot

tensorrt-cicd · 2025-08-06T11:30:53Z

PR_Github #14272 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10776 (Partly Tested) completed with status: 'FAILURE'

Superjomn · 2025-08-06T14:11:49Z

Given that this introduces a breaking change so close to the 1.0 code freeze, I’d like to postpone merging this PR until after the 1.0 release. In the meantime, I’ll update it to:

- Accept both trt and tensorrt as valid backends to avoid an unexpected break
- Emit a deprecation warning when trt is used
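One way to implement this plan is an alias table that maps the old spelling to the new one and warns whenever the old spelling is used. The sketch below assumes the valid names are pytorch and tensorrt. The alias table and normalize_backend are hypothetical and are not the PR's eventual code.

```python
import warnings

# Hypothetical alias table: deprecated spelling -> canonical backend name.
_DEPRECATED_BACKEND_ALIASES = {"trt": "tensorrt"}
_VALID_BACKENDS = {"pytorch", "tensorrt"}


def normalize_backend(backend: str) -> str:
    """Map a user-supplied backend name to its canonical form.

    Deprecated aliases are still accepted but emit a DeprecationWarning.
    """
    if backend in _DEPRECATED_BACKEND_ALIASES:
        canonical = _DEPRECATED_BACKEND_ALIASES[backend]
        warnings.warn(
            f"--backend {backend} is deprecated; use --backend {canonical} instead.",
            DeprecationWarning,
            stacklevel=2)
        return canonical
    if backend not in _VALID_BACKENDS:
        raise ValueError(f"Unknown backend: {backend!r}")
    return backend
```

Python's default warning filters can hide DeprecationWarning unless it is triggered from code in __main__. For a user-facing CLI notice, a logger warning or a FutureWarning may therefore be the more visible choice.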

Superjomn requested a review from a team as a code owner July 25, 2025 03:20

Superjomn requested a review from FrankD412 July 25, 2025 03:20

coderabbitai bot reviewed Jul 25, 2025

tensorrt_llm/bench/benchmark/throughput.py (outdated, resolved)

Superjomn requested review from LinPoly and kaiyux July 25, 2025 03:24

Superjomn force-pushed the add-warning-bench-serve branch from 119b6fe to 27dc49c July 25, 2025 03:30

Superjomn requested a review from a team as a code owner July 25, 2025 03:30

Superjomn requested review from juney-nvidia and removed request for juney-nvidia July 25, 2025 03:30

FrankD412 reviewed Jul 25, 2025

tensorrt_llm/bench/benchmark/throughput.py (outdated, resolved)

Superjomn changed the title ~~chore: add waring for the default backend on serve and bench commands~~ chore: add warning for the default backend on serve and bench commands Jul 25, 2025

Superjomn force-pushed the add-warning-bench-serve branch from 27dc49c to 4324128 July 25, 2025 06:33

Superjomn force-pushed the add-warning-bench-serve branch from aba9fb8 to 34efdfe July 27, 2025 01:54

coderabbitai bot requested review from litaotju and nv-guomingz July 27, 2025 01:54

coderabbitai bot reviewed Jul 27, 2025

requirements.txt (outdated, resolved)

tensorrt_llm/llmapi/utils.py (outdated, resolved)

Superjomn force-pushed the add-warning-bench-serve branch from 34efdfe to 1a558c2 July 27, 2025 03:12

Superjomn force-pushed the add-warning-bench-serve branch from 1a558c2 to 1c52e02 July 27, 2025 03:13

coderabbitai bot reviewed Jul 27, 2025

tests/unittest/llmapi/test_utils.py (outdated, resolved)

coderabbitai bot reviewed Aug 5, 2025

FrankD412 approved these changes Aug 5, 2025

Superjomn force-pushed the add-warning-bench-serve branch 2 times, most recently from cdc0003 to 8762c7a August 5, 2025 11:23

coderabbitai bot reviewed Aug 5, 2025

FrankD412 mentioned this pull request Aug 5, 2025

[None][fix] Update to pull LLM from a central location. #6458

Merged

Superjomn force-pushed the add-warning-bench-serve branch 2 times, most recently from f565d76 to dd082ca August 6, 2025 01:19

coderabbitai bot reviewed Aug 6, 2025

Superjomn force-pushed the add-warning-bench-serve branch from dd082ca to 698af17 August 6, 2025 01:26

Superjomn added the Release Blocker label Aug 6, 2025

Superjomn added 3 commits August 6, 2025 16:06

- init (8812113), Signed-off-by: Superjomn
- fix (2db961c), Signed-off-by: Superjomn
- fix list (30f134c), Signed-off-by: Superjomn

Superjomn force-pushed the add-warning-bench-serve branch from 698af17 to 30f134c August 6, 2025 08:07

coderabbitai bot reviewed Aug 6, 2025

Superjomn removed the Release Blocker label Aug 6, 2025

chzblych removed their request for review September 2, 2025 05:12

Conversation

Superjomn commented Jul 25, 2025 • edited by coderabbitai bot

coderabbitai bot commented Jul 25, 2025 • edited

coderabbitai bot left a comment

Superjomn commented Jul 25, 2025

tensorrt-cicd commented Jul 25, 2025

tensorrt-cicd commented Jul 25, 2025

Superjomn commented Jul 25, 2025

tensorrt-cicd commented Jul 25, 2025

tensorrt-cicd commented Jul 25, 2025

FrankD412 commented Jul 25, 2025

Superjomn commented Jul 26, 2025

coderabbitai bot left a comment

Superjomn commented Jul 27, 2025

Superjomn commented Jul 27, 2025

coderabbitai bot left a comment

coderabbitai bot left a comment

tensorrt-cicd commented Aug 5, 2025

Superjomn commented Aug 5, 2025

tensorrt-cicd commented Aug 5, 2025

coderabbitai bot left a comment

FrankD412 commented Aug 5, 2025
