[None][chore] unify the backend strings and warn the default backend change #6350
Conversation
📝 Walkthrough

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI_Command
    participant Utils
    User->>CLI_Command: Run command with --backend option (or none)
    CLI_Command->>CLI_Command: Check if backend is None
    alt backend is None
        CLI_Command->>CLI_Command: Log warning about default backend "pytorch"
        CLI_Command->>CLI_Command: Set backend to "pytorch"
    end
    CLI_Command->>Utils: get_default_backend_with_warning(backend)
    Utils-->>CLI_Command: Return validated backend string
    CLI_Command->>Utils: print_backend_info(backend)
    CLI_Command->>User: Continue command execution
```
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~7 minutes
Actionable comments posted: 1
♻️ Duplicate comments (1)
tensorrt_llm/bench/benchmark/low_latency.py (1)
182-186: Same mapping / duplication issue as in throughput path. Replicate the fix suggested for `throughput_command` to avoid three divergent copies of the logic.
🧹 Nitpick comments (2)
tensorrt_llm/bench/benchmark/throughput.py (1)
1-3: Missing NVIDIA copyright header. TensorRT-LLM OSS policy requires the header at the top of every source file. Please add it while touching this file.
tensorrt_llm/commands/serve.py (1)
268-272: Backend alias “trt” is not recognised by the message.

`backend_repr = "PyTorch" if backend == "pytorch" else "TensorRT"` treats anything other than `"pytorch"` as TensorRT. Small but visible UX bug: if the user passes `--backend trt`, the warning still prints “TensorRT” correctly, but if a new backend (`_autodeploy`) slips through later, the wording becomes wrong. Reuse a shared helper (see comment in throughput.py) to keep the mapping single-sourced.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- tensorrt_llm/bench/benchmark/low_latency.py (1 hunks)
- tensorrt_llm/bench/benchmark/throughput.py (1 hunks)
- tensorrt_llm/commands/serve.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: Python code should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a Python file, prefer docstrings over comments.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
- tensorrt_llm/bench/benchmark/low_latency.py
- tensorrt_llm/bench/benchmark/throughput.py
- tensorrt_llm/commands/serve.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
- tensorrt_llm/bench/benchmark/low_latency.py
- tensorrt_llm/bench/benchmark/throughput.py
- tensorrt_llm/commands/serve.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
Force-pushed 119b6fe to 27dc49c
/bot run

PR_Github #12949 [ run ] triggered by Bot

PR_Github #12949 [ run ] completed with state
Force-pushed 27dc49c to 4324128
/bot run

PR_Github #12968 [ run ] triggered by Bot

PR_Github #12968 [ run ] completed with state
@Superjomn -- ping, accidentally rebased your branch. Was in the wrong tab.

@FrankD412 NVM, I will reset the commit and rebase it again.
Force-pushed aba9fb8 to 34efdfe
Actionable comments posted: 2
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- requirements.txt (1 hunks)
- tensorrt_llm/bench/benchmark/low_latency.py (1 hunks)
- tensorrt_llm/bench/benchmark/throughput.py (2 hunks)
- tensorrt_llm/commands/serve.py (2 hunks)
- tensorrt_llm/llmapi/utils.py (2 hunks)
- tests/unittest/llmapi/test_llm_utils.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- tensorrt_llm/bench/benchmark/low_latency.py
- tensorrt_llm/bench/benchmark/throughput.py
- tensorrt_llm/commands/serve.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md) — same Python rules as listed in the first review above.
Files:
- tests/unittest/llmapi/test_llm_utils.py
- tensorrt_llm/llmapi/utils.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md) — NVIDIA copyright header rule as listed in the first review above.
Files:
- tests/unittest/llmapi/test_llm_utils.py
- tensorrt_llm/llmapi/utils.py
🔇 Additional comments (4)
tensorrt_llm/llmapi/utils.py (2)
23-23: LGTM: Proper import for MultiValueEnum support. The import correctly adds support for the new `BackendType` enum implementation.
572-573: LGTM: Clean utility function implementation. The function provides a simple interface to get human-readable backend names using the new enum.
tests/unittest/llmapi/test_llm_utils.py (2)
7-7: LGTM: Proper import for testing the new enum. The import correctly adds the `BackendType` enum for unit testing.
64-74: LGTM: Comprehensive BackendType Enum Tests

The new tests cover both `display_name` and `value` for all `BackendType` members, which would surface any recursion errors in the enum implementation. No code changes are needed. If you’d still like to manually inspect the properties in isolation, run from the repository root so that `tensorrt_llm` is on `PYTHONPATH`:

```shell
export PYTHONPATH=$(pwd)
python3 <<EOF
from tensorrt_llm.llmapi.utils import BackendType
for member in (BackendType.PYTORCH, BackendType.TENSORRT, BackendType._AUTODEPLOY):
    print(member, member.display_name, member.value)
EOF
```
Force-pushed 34efdfe to 1a558c2
/bot run
Force-pushed 1a558c2 to 1c52e02
/bot run
Actionable comments posted: 1
🔭 Outside diff range comments (1)
tests/unittest/llmapi/test_utils.py (1)
1-30: Add required NVIDIA copyright header. According to the coding guidelines, all TensorRT-LLM source files should contain an NVIDIA copyright header that includes the current year.

Add the copyright header at the top of the file:

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 from tensorrt_llm.llmapi.utils import BackendType, get_backend_repr
```
🧹 Nitpick comments (2)
tests/unittest/llmapi/test_utils.py (2)
4-14: Consider renaming test method for better clarity. The test structure and logic look good, but the method name `test_value` is somewhat generic. Consider renaming it to `test_canonical_value` to more precisely describe what property is being tested.

```diff
-    def test_value(self):
+    def test_canonical_value(self):
```
17-29: Consider adding edge case tests for better coverage. The current tests cover the main functionality well, but consider adding tests for edge cases to ensure robustness:

```python
def test_get_backend_repr_edge_cases():
    # Test invalid backend names
    with pytest.raises(ValueError):  # or whatever exception is expected
        get_backend_repr("invalid_backend")

    # Test empty string
    with pytest.raises(ValueError):
        get_backend_repr("")

    # Test None value
    with pytest.raises((ValueError, TypeError)):
        get_backend_repr(None)
```

This would help verify the function's behavior with unexpected inputs, which is important since it's used in user-facing commands.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- tensorrt_llm/llmapi/utils.py (2 hunks)
- tests/unittest/llmapi/test_utils.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- tensorrt_llm/llmapi/utils.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md) — same Python rules as listed in the first review above.
Files:
- tests/unittest/llmapi/test_utils.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md) — NVIDIA copyright header rule as listed in the first review above.
Files:
- tests/unittest/llmapi/test_utils.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (1)
tests/unittest/llmapi/test_utils.py (1)
1-1: LGTM! Import follows namespace guidelines. The import correctly maintains the full namespace path and imports only the specific items needed for testing.
Actionable comments posted: 0
♻️ Duplicate comments (1)
tensorrt_llm/llmapi/utils.py (1)
670-688: Fix docstring formatting issues. The static analysis correctly identifies docstring formatting problems. The docstring needs proper formatting and the typo should be fixed.

Apply this diff to fix the docstring formatting:

```diff
-    # TODO[Superjom]: Remove this method after v1.0.0 is released.
-    @staticmethod
-    def get_default_backend_with_warning(
-            backend: Optional[str]) -> "BackendType":
-        """ Warn the user if the backend is not set, as we changed the default
-        backend to from tensorrt topytorch from v1.0 """
+    # TODO[Superjom]: Remove this method after v1.0.0 is released.
+    @staticmethod
+    def get_default_backend_with_warning(
+            backend: Optional[str]) -> "BackendType":
+        """Warn the user if the backend is not set, as we changed the default backend from tensorrt to pytorch from v1.0.
+
+        Args:
+            backend: Optional backend string to validate
+
+        Returns:
+            Valid backend string (canonical value)
+        """
```

The method logic is correct for handling default backend assignment with warnings.
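The behaviour this method implements can be sketched in simplified string form (the reviewed method returns a `BackendType` member; the logger name and message wording here are assumptions):

```python
import logging

logger = logging.getLogger(__name__)

G_DEFAULT_BACKEND = "pytorch"  # canonical default since v1.0


def get_default_backend_with_warning(backend):
    """Return the backend, warning when it was not set explicitly.

    Simplified sketch of the reviewed helper: it warns because the
    default changed from TensorRT to PyTorch in v1.0, then falls back
    to the canonical default value.
    """
    if backend is None:
        logger.warning(
            "No backend specified; defaulting to %r. Note the default "
            "changed from 'tensorrt' to 'pytorch' in v1.0.",
            G_DEFAULT_BACKEND)
        return G_DEFAULT_BACKEND
    return backend
```

An explicit `--backend` value passes through untouched, so only users relying on the old implicit default see the warning.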
🧹 Nitpick comments (1)
tensorrt_llm/commands/serve.py (1)
294-297: Fix docstring punctuation. The docstring should end with proper punctuation as indicated by the static analysis tool.

Apply this diff to fix the docstring:

```diff
-    """Running an OpenAI API compatible server
+    """Running an OpenAI API compatible server.
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- tensorrt_llm/bench/benchmark/low_latency.py (7 hunks)
- tensorrt_llm/bench/benchmark/throughput.py (10 hunks)
- tensorrt_llm/bench/benchmark/utils/general.py (0 hunks)
- tensorrt_llm/commands/serve.py (6 hunks)
- tensorrt_llm/llmapi/__init__.py (2 hunks)
- tensorrt_llm/llmapi/utils.py (2 hunks)
- tests/integration/defs/stress_test/stress_test.py (2 hunks)
- tests/integration/defs/test_e2e.py (4 hunks)
- tests/integration/test_lists/test-db/l0_a10.yml (2 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat.py (2 hunks)
- tests/unittest/llmapi/apps/_test_openai_completions.py (2 hunks)
- tests/unittest/llmapi/apps/_test_openai_misc.py (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_multi_gpu.py (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_reasoning.py (1 hunks)
- tests/unittest/llmapi/test_utils.py (2 hunks)
💤 Files with no reviewable changes (1)
- tensorrt_llm/bench/benchmark/utils/general.py
🚧 Files skipped from review as they are similar to previous changes (11)
- tests/unittest/llmapi/apps/_test_openai_multi_gpu.py
- tensorrt_llm/llmapi/__init__.py
- tests/integration/defs/stress_test/stress_test.py
- tests/unittest/llmapi/apps/_test_openai_reasoning.py
- tests/unittest/llmapi/apps/_test_openai_misc.py
- tests/integration/test_lists/test-db/l0_a10.yml
- tests/unittest/llmapi/test_utils.py
- tests/unittest/llmapi/apps/_test_openai_chat.py
- tests/integration/defs/test_e2e.py
- tensorrt_llm/bench/benchmark/low_latency.py
- tests/unittest/llmapi/apps/_test_openai_completions.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md) — same Python rules as listed in the first review above.
Files:
- tensorrt_llm/bench/benchmark/throughput.py
- tensorrt_llm/commands/serve.py
- tensorrt_llm/llmapi/utils.py
**/*.{cpp,h,cu,py,cc,cxx,hpp}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md) — NVIDIA copyright header rule as listed in the first review above.
Files:
- tensorrt_llm/bench/benchmark/throughput.py
- tensorrt_llm/commands/serve.py
- tensorrt_llm/llmapi/utils.py
🧠 Learnings (6)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
- tensorrt_llm/bench/benchmark/throughput.py
- tensorrt_llm/commands/serve.py
- tensorrt_llm/llmapi/utils.py
📚 Learning: in tensorrt_llm/executor/worker.py, the lora adapter cache optimization logic that checks `is_adapte...
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
Applied to files:
tensorrt_llm/bench/benchmark/throughput.py
📚 Learning: in tensorrt-llm's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()...
Learnt from: yechank-nvidia
PR: NVIDIA/TensorRT-LLM#6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using `from_shared_tensor()` is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call `strip_for_generation()` to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
Applied to files:
tensorrt_llm/bench/benchmark/throughput.py
📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx} : when practical, a switch statement controlled by an enum should...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-05T06:09:30.350Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx} : When practical, a switch statement controlled by an enum should have a case for each enum value and not have a default clause.
Applied to files:
- tensorrt_llm/bench/benchmark/throughput.py
- tensorrt_llm/commands/serve.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
- tensorrt_llm/bench/benchmark/throughput.py
- tensorrt_llm/commands/serve.py
🪛 Ruff (0.12.2)
tensorrt_llm/commands/serve.py
294-297: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
tensorrt_llm/llmapi/utils.py
671-673: 1 blank line required between summary line and description
(D205)
671-673: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (23)
tensorrt_llm/llmapi/utils.py (6)
23-23: LGTM! The import of `MultiValueEnum` from `aenum` is appropriate for creating the backend enum with multiple aliases per member.

640-648: LGTM! The `BackendType` enum class definition is well-structured with appropriate multi-value aliases. The enum members correctly define display names, canonical values, and alternative aliases for each backend type.

650-652: LGTM! The `canonical_value` property correctly returns the canonical string identifier (second value in the tuple) for each backend type.

654-657: LGTM! The `default_value` method correctly returns the canonical value of the PYTORCH backend, providing a centralized default backend selection.

659-662: LGTM! The `canonical_values` method provides a clean way to get all supported backend canonical values, effectively replacing the previous global `ALL_SUPPORTED_BACKENDS` list.

665-668: LGTM! The `print_backend_info` method provides a consistent way to log backend information across different commands and modules.

tensorrt_llm/bench/benchmark/throughput.py (10)
30-30: LGTM! The import of `BackendType` from `tensorrt_llm.llmapi` correctly replaces the previous `ALL_SUPPORTED_BACKENDS` import, providing better type safety and centralized backend management.

47-51: LGTM! The CLI option update correctly uses `BackendType.canonical_values()` for choices and defers default assignment to runtime, enabling the warning mechanism for the default backend change.

257-259: LGTM! The backend processing correctly uses the new `BackendType` methods for default assignment with warning and info printing, implementing the intended backward compatibility mechanism.

272-272: LGTM! The backend variable is now properly typed as `BackendType` instead of a string, providing better type safety and preventing potential errors.

313-313: LGTM! The backend comparison using `BackendType.TENSORRT` instead of a string literal provides better type safety and consistency.

324-324: LGTM! The backend comparison using the `BackendType.TENSORRT` enum member is consistent with the new typing approach.

340-340: LGTM! The error message correctly uses `BackendType.canonical_values()` to provide users with the current list of supported backends.

373-376: LGTM! The backend conversion logic correctly handles the case where `exec_backend` is a `BackendType` instance, converting it to its canonical string value for runtime configuration.

395-395: LGTM! Using `backend.canonical_value` for the kwargs ensures the runtime receives the correct string identifier.

397-397: LGTM! The comparison using the `BackendType.PYTORCH` enum member is consistent with the new typing approach and more maintainable than string literals.

tensorrt_llm/commands/serve.py (7)
28-28: LGTM! The import of `BackendType` from `tensorrt_llm.llmapi.utils` is consistent with the migration to the unified backend enum approach.

75-75: LGTM! The backend parameter type change from `str` to `BackendType` with default `BackendType.PYTORCH` improves type safety and consistency.

141-141: LGTM! The backend comparison using the `BackendType.PYTORCH` enum member instead of a string literal provides better type safety and consistency.

164-164: LGTM! The backend comparison in `launch_server` correctly uses the `BackendType.PYTORCH` enum member for type-safe backend checking.

189-194: LGTM! The CLI option update correctly uses `BackendType.canonical_values()` for choices and defers default assignment to runtime, consistent with the approach in benchmark commands.

283-293: LGTM! The serve function signature correctly maintains the `Optional[str]` type for the CLI parameter while the function body properly converts it to `BackendType`.

300-301: LGTM! The backend processing correctly uses `BackendType.get_default_backend_with_warning()` for default assignment with warning and `BackendType.print_backend_info()` for consistent logging.
PR_Github #14097 [ run ] completed with state

Force-pushed cdc0003 to 8762c7a

/bot run --disable-fail-fast

PR_Github #14146 [ run ] triggered by Bot
Actionable comments posted: 0
🧹 Nitpick comments (1)
tensorrt_llm/llmapi/utils.py (1)
640-689: Well-designed BackendType enum implementation. The `BackendType` enum effectively unifies backend string representations across the codebase with the following strengths:

- Clear separation of display names and canonical values using `MultiValueEnum`
- Proper `__str__` implementation returning the display name
- Comprehensive static methods for unified behavior across CLI commands
- Appropriate warning handling for the default backend change from TensorRT to PyTorch

The implementation follows coding guidelines and successfully addresses the PR objectives of unifying backend strings and warning about default backend changes.

Consider addressing the minor docstring formatting issue flagged by static analysis:

```diff
-        """ Warn the user if the backend is not set, as we changed the default
-        backend to from tensorrt topytorch from v1.0 """
+        """Warn the user if the backend is not set, as we changed the default backend from tensorrt to pytorch from v1.0.
+
+        Args:
+            backend: Optional backend string to validate
+
+        Returns:
+            BackendType instance for the specified or default backend
+        """
```
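The enum pattern described above can be approximated with the standard library alone. The review references `aenum.MultiValueEnum`; this sketch emulates the multi-value behaviour with plain `enum.Enum`, and the member tuples and alias set are assumptions:

```python
from enum import Enum


class BackendType(Enum):
    """Stdlib approximation of the aenum.MultiValueEnum pattern the review
    describes: each member carries a display name, a canonical CLI value,
    and optional aliases (exact values here are assumptions)."""
    PYTORCH = ("PyTorch", "pytorch")
    TENSORRT = ("TensorRT", "tensorrt", "trt")
    AUTODEPLOY = ("AutoDeploy", "_autodeploy")

    def __init__(self, display_name, canonical_value, *aliases):
        # Enum unpacks each member's tuple value into __init__.
        self.display_name = display_name
        self.canonical_value = canonical_value
        self.aliases = frozenset(aliases)

    def __str__(self):
        # Human-readable form, used when printing backend info.
        return self.display_name

    @classmethod
    def from_string(cls, value):
        """Resolve a canonical value or alias to a member."""
        for member in cls:
            if value == member.canonical_value or value in member.aliases:
                return member
        raise ValueError(f"unknown backend: {value!r}")

    @classmethod
    def canonical_values(cls):
        """All supported canonical strings, suitable for CLI choices."""
        return [member.canonical_value for member in cls]
```

Because `from_string` also accepts aliases like `"trt"`, the display-name mapping stays single-sourced and the “trt prints TensorRT” concern raised earlier is handled in one place.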
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- tensorrt_llm/bench/benchmark/low_latency.py (7 hunks)
- tensorrt_llm/bench/benchmark/throughput.py (10 hunks)
- tensorrt_llm/bench/benchmark/utils/general.py (0 hunks)
- tensorrt_llm/commands/serve.py (6 hunks)
- tensorrt_llm/llmapi/__init__.py (2 hunks)
- tensorrt_llm/llmapi/utils.py (2 hunks)
- tests/integration/defs/stress_test/stress_test.py (2 hunks)
- tests/integration/defs/test_e2e.py (4 hunks)
- tests/integration/test_lists/test-db/l0_a10.yml (2 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat.py (2 hunks)
- tests/unittest/llmapi/apps/_test_openai_completions.py (2 hunks)
- tests/unittest/llmapi/apps/_test_openai_misc.py (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_multi_gpu.py (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_reasoning.py (1 hunks)
- tests/unittest/llmapi/test_utils.py (2 hunks)
💤 Files with no reviewable changes (1)
- tensorrt_llm/bench/benchmark/utils/general.py
🚧 Files skipped from review as they are similar to previous changes (12)
- tensorrt_llm/llmapi/__init__.py
- tests/unittest/llmapi/apps/_test_openai_chat.py
- tests/integration/defs/stress_test/stress_test.py
- tests/unittest/llmapi/apps/_test_openai_reasoning.py
- tests/unittest/llmapi/apps/_test_openai_multi_gpu.py
- tests/integration/test_lists/test-db/l0_a10.yml
- tests/unittest/llmapi/apps/_test_openai_misc.py
- tensorrt_llm/bench/benchmark/low_latency.py
- tests/unittest/llmapi/apps/_test_openai_completions.py
- tests/unittest/llmapi/test_utils.py
- tensorrt_llm/bench/benchmark/throughput.py
- tests/integration/defs/test_e2e.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without reflection.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tensorrt_llm/commands/serve.py
tensorrt_llm/llmapi/utils.py
**/*.{cpp,h,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tensorrt_llm/commands/serve.py
tensorrt_llm/llmapi/utils.py
🧠 Learnings (5)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tensorrt_llm/commands/serve.py
tensorrt_llm/llmapi/utils.py
📚 Learning: applies to **/*.py : the code developed for tensorrt-llm should conform to python 3.8+....
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-05T07:49:09.844Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.
Applied to files:
tensorrt_llm/commands/serve.py
tensorrt_llm/llmapi/utils.py
📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx} : when practical, a switch statement controlled by an enum should...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-05T07:49:09.844Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx} : When practical, a switch statement controlled by an enum should have a case for each enum value and not have a default clause.
Applied to files:
tensorrt_llm/commands/serve.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
tensorrt_llm/commands/serve.py
🧬 Code Graph Analysis (1)
tensorrt_llm/llmapi/utils.py (2)
tests/unittest/llmapi/apps/_test_openai_chat.py (1)
backend (24-25)
tensorrt_llm/logger.py (1)
warning(131-132)
🪛 Ruff (0.12.2)
tensorrt_llm/commands/serve.py
294-297: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
tensorrt_llm/llmapi/utils.py
671-673: 1 blank line required between summary line and description
(D205)
671-673: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (7)
tensorrt_llm/llmapi/utils.py (1)
23-23: LGTM! The import of `MultiValueEnum` from `aenum` is correctly placed and necessary for implementing the `BackendType` enum with multiple string aliases per backend.
tensorrt_llm/commands/serve.py (6)
28-28: LGTM! The import of `BackendType` is correctly placed and necessary for implementing unified backend handling in the serve command.
75-75: LGTM! The parameter type change from `Optional[str]` to `BackendType` with default `BackendType.PYTORCH` improves type safety and aligns with the unified backend approach. The default value correctly reflects the new PyTorch default mentioned in the PR objectives.
141-141: LGTM! The backend comparison update from string-based to enum-based comparison (`backend == BackendType.PYTORCH`) improves type safety and maintainability while preserving the same logic.
164-164: LGTM! The backend comparison update in `launch_server` correctly uses enum-based comparison (`backend == BackendType.PYTORCH`) instead of string comparison, maintaining the same logic while improving type safety.
189-194: Excellent CLI option update for backend consistency. The changes effectively implement unified backend handling:
- `click.Choice(BackendType.canonical_values())` ensures CLI choices match the enum definition
- `default=None` defers default assignment to enable the warning mechanism
- Help text correctly uses 'tensorrt' (the canonical value) for consistency
These changes align with the PR objectives of unifying backend strings and supporting the default backend change warning.
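The deferred-default pattern discussed above (CLI choices drawn from the enum's canonical values, `default=None` so the tool can detect an omitted flag and warn) can be sketched with stdlib `argparse`. This is an illustrative stand-in, not the actual `trtllm-serve` CLI: the hard-coded `CANONICAL_BACKENDS` list and the warning text are assumptions, and the real code uses `click` and the project logger.

```python
import argparse
import warnings

# Stand-in for BackendType.canonical_values(); the real list comes from the enum.
CANONICAL_BACKENDS = ["pytorch", "tensorrt"]

def parse_backend(argv):
    parser = argparse.ArgumentParser()
    # default=None distinguishes "user omitted --backend" from "user chose pytorch",
    # which is what enables the one-time deprecation-style warning.
    parser.add_argument("--backend", choices=CANONICAL_BACKENDS, default=None)
    args = parser.parse_args(argv)
    if args.backend is None:
        warnings.warn(
            "--backend not specified; defaulting to 'pytorch'. "
            "Pass --backend tensorrt to keep the pre-1.0 behavior.")
        args.backend = "pytorch"
    return args.backend
```

Existing TRT users who pass `--backend tensorrt` explicitly never see the warning; only callers relying on the old implicit default do.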
283-301: Excellent implementation of unified backend handling. The serve function updates implement the PR objectives:
- `BackendType.get_default_backend_with_warning(backend)` handles both default assignment and the warning about the default backend change from TensorRT to PyTorch
- `BackendType.print_backend_info(backend)` provides consistent backend logging
- The implementation maintains backward compatibility while enabling type-safe enum-based backend handling
The code correctly processes the CLI string input into a `BackendType` instance that is used throughout the rest of the function, ensuring consistency with the unified backend approach.
|
@Superjomn -- just a heads up, I had someone find an issue with the |
|
PR_Github #14146 [ run ] completed with state |
f565d76 to
dd082ca
Compare
Actionable comments posted: 0
🧹 Nitpick comments (1)
tensorrt_llm/commands/serve.py (1)
301-304: Add missing punctuation to docstring. The docstring should end with proper punctuation according to Python documentation standards.
- """Running an OpenAI API compatible server
+ """Running an OpenAI API compatible server.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- tensorrt_llm/bench/benchmark/throughput.py (10 hunks)
- tensorrt_llm/commands/serve.py (6 hunks)
- tests/integration/test_lists/test-db/l0_a10.yml (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/integration/test_lists/test-db/l0_a10.yml
- tensorrt_llm/bench/benchmark/throughput.py
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tensorrt_llm/commands/serve.py
**/*.{cpp,h,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tensorrt_llm/commands/serve.py
🧠 Learnings (5)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tensorrt_llm/commands/serve.py
📚 Learning: applies to **/*.py : the code developed for tensorrt-llm should conform to python 3.8+....
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T00:54:56.009Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.
Applied to files:
tensorrt_llm/commands/serve.py
📚 Learning: applies to **/*.{cpp,h,hpp,cc,cxx} : when practical, a switch statement controlled by an enum should...
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-08-06T00:54:56.009Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx} : When practical, a switch statement controlled by an enum should have a case for each enum value and not have a default clause.
Applied to files:
tensorrt_llm/commands/serve.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
tensorrt_llm/commands/serve.py
🧬 Code Graph Analysis (1)
tensorrt_llm/commands/serve.py (1)
tensorrt_llm/llmapi/utils.py (7)
BackendType (640-688)
get (354-367)
get (420-437)
get_default_backend_with_warning (672-688)
canonical_value (651-652)
canonical_values (660-662)
print_backend_info (666-668)
🪛 Ruff (0.12.2)
tensorrt_llm/commands/serve.py
301-304: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (7)
tensorrt_llm/commands/serve.py (7)
28-28: Import addition looks good. The `BackendType` import is necessary for the enum integration and follows proper Python import conventions.
75-75: Good type safety improvement. Changing the parameter type from string to `BackendType` with a proper enum default improves type safety and consistency.
141-141: Enum comparison is more type-safe. Using `backend == BackendType.PYTORCH` instead of string comparison improves type safety and consistency with the enum integration.
161-167: Robust backend validation and conversion. The backend handling logic properly validates and converts between string and `BackendType` instances, with appropriate error handling for invalid types. This supports a smooth transition during the enum integration.
171-171: Consistent enum usage. The enum comparison `backend == BackendType.PYTORCH` is consistent with the type-safe approach used throughout the refactor.
196-201: Excellent CLI option improvements. Using `BackendType.canonical_values()` ensures CLI choices stay synchronized with the enum definition. The `default=None` approach allows proper default handling with warnings, and the help text now uses the canonical 'tensorrt' naming for consistency.
290-308: Clean backend integration with proper user feedback. The serve function properly converts the CLI string parameter to `BackendType` using the warning method and provides user feedback via `print_backend_info`. This maintains CLI compatibility while leveraging the enum benefits internally.
dd082ca to
698af17
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #14200 [ run ] triggered by Bot |
|
PR_Github #14200 [ run ] completed with state |
Signed-off-by: Superjomn <[email protected]>
Signed-off-by: Superjomn <[email protected]>
Signed-off-by: Superjomn <[email protected]>
698af17 to
30f134c
Compare
|
/bot run --stage-list "A10-PyTorch-1, H100_PCIe-PyTorch-1" |
Actionable comments posted: 0
🧹 Nitpick comments (1)
tensorrt_llm/llmapi/utils.py (1)
670-688: Fix docstring formatting issues. The method implementation is correct, but the docstring needs formatting improvements based on static analysis hints.
Apply this diff to fix the docstring formatting:
-    def get_default_backend_with_warning(
-            backend: Optional[str]) -> "BackendType":
-        """ Warn the user if the backend is not set, as we changed the default
-        backend to from tensorrt topytorch from v1.0 """
+    def get_default_backend_with_warning(
+            backend: Optional[str]) -> "BackendType":
+        """Warn the user if the backend is not set.
+
+        We changed the default backend from tensorrt to pytorch from v1.0.
+
+        Args:
+            backend: Optional backend string to validate
+
+        Returns:
+            BackendType instance for the validated backend
+        """
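For readers skimming the review, here is a hedged stdlib sketch of the helper under discussion once the docstring fix is applied. It is not the TensorRT-LLM implementation: the real class uses `aenum.MultiValueEnum` and the project logger, while this sketch reduces the enum to two plain members and warns via `warnings`.

```python
import enum
import warnings
from typing import Optional

class BackendType(enum.Enum):
    # Illustrative two-member enum; the real class supports multiple aliases.
    PYTORCH = "pytorch"
    TENSORRT = "tensorrt"

    @staticmethod
    def get_default_backend_with_warning(
            backend: Optional[str]) -> "BackendType":
        """Warn the user if the backend is not set.

        We changed the default backend from tensorrt to pytorch from v1.0.
        """
        if backend is None:
            # Only callers relying on the implicit default see this warning.
            warnings.warn(
                "No backend specified; defaulting to 'pytorch' (changed from "
                "'tensorrt' in v1.0). Pass backend='tensorrt' to keep the old "
                "behavior.")
            return BackendType.PYTORCH
        return BackendType(backend)
```

An invalid string raises `ValueError` from the enum lookup, which matches the validation behavior the review describes.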
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- tensorrt_llm/bench/benchmark/low_latency.py (7 hunks)
- tensorrt_llm/bench/benchmark/throughput.py (10 hunks)
- tensorrt_llm/bench/benchmark/utils/general.py (0 hunks)
- tensorrt_llm/commands/serve.py (6 hunks)
- tensorrt_llm/llmapi/__init__.py (2 hunks)
- tensorrt_llm/llmapi/llm.py (1 hunks)
- tensorrt_llm/llmapi/utils.py (2 hunks)
- tests/integration/defs/stress_test/stress_test.py (2 hunks)
- tests/integration/defs/test_e2e.py (4 hunks)
- tests/integration/test_lists/test-db/l0_a10.yml (2 hunks)
- tests/unittest/llmapi/apps/_test_openai_chat.py (2 hunks)
- tests/unittest/llmapi/apps/_test_openai_completions.py (2 hunks)
- tests/unittest/llmapi/apps/_test_openai_misc.py (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_multi_gpu.py (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_reasoning.py (1 hunks)
- tests/unittest/llmapi/test_utils.py (2 hunks)
💤 Files with no reviewable changes (1)
- tensorrt_llm/bench/benchmark/utils/general.py
✅ Files skipped from review due to trivial changes (1)
- tensorrt_llm/llmapi/llm.py
🚧 Files skipped from review as they are similar to previous changes (12)
- tensorrt_llm/llmapi/__init__.py
- tests/integration/defs/stress_test/stress_test.py
- tests/unittest/llmapi/apps/_test_openai_reasoning.py
- tests/unittest/llmapi/test_utils.py
- tests/unittest/llmapi/apps/_test_openai_completions.py
- tests/integration/defs/test_e2e.py
- tests/unittest/llmapi/apps/_test_openai_misc.py
- tests/integration/test_lists/test-db/l0_a10.yml
- tests/unittest/llmapi/apps/_test_openai_chat.py
- tensorrt_llm/bench/benchmark/low_latency.py
- tests/unittest/llmapi/apps/_test_openai_multi_gpu.py
- tensorrt_llm/bench/benchmark/throughput.py
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
📚 Learning: in tensorrt-llm testing, it's common to have both cli flow tests (test_cli_flow.py) and pytorch api ...
Learnt from: moraxu
PR: NVIDIA/TensorRT-LLM#6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Applied to files:
tensorrt_llm/commands/serve.py
tensorrt_llm/llmapi/utils.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
tensorrt_llm/commands/serve.py
🪛 Ruff (0.12.2)
tensorrt_llm/commands/serve.py
299-302: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
tensorrt_llm/llmapi/utils.py
671-673: 1 blank line required between summary line and description
(D205)
671-673: First line should end with a period, question mark, or exclamation point
Add closing punctuation
(D415)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (12)
tensorrt_llm/llmapi/utils.py (4)
23-23: LGTM! The import of `MultiValueEnum` from `aenum` is correctly placed and necessary for the new `BackendType` enum implementation.
640-649: LGTM! The `BackendType` enum class design is well-structured, using `MultiValueEnum` to support multiple aliases per backend. The `__str__` method correctly returns the display name (first value) as requested in past reviews.
650-652: LGTM! The `canonical_value` property correctly returns the canonical backend identifier (second value in the tuple).
654-668: LGTM! The static utility methods provide a clean and consistent API for backend handling. The implementation correctly extracts canonical values and provides informative logging.
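The multi-alias idea the review describes (one enum member answering to several strings, with a display name and a canonical CLI value) can be approximated without `aenum` using stdlib `Enum` and the `_missing_` hook. This is a sketch of the pattern, not the project's code: the extra aliases `"torch"` and `"trt"` are illustrative assumptions.

```python
from enum import Enum

class BackendType(Enum):
    # value = (display name, canonical CLI value, extra aliases) -- illustrative.
    PYTORCH = ("PyTorch", "pytorch", ("torch",))
    TENSORRT = ("TensorRT", "tensorrt", ("trt",))

    def __str__(self) -> str:
        # Display name first, as the review requests for __str__.
        return self.value[0]

    @property
    def canonical_value(self) -> str:
        return self.value[1]

    @classmethod
    def canonical_values(cls):
        # Suitable for feeding CLI choices, e.g. click.Choice(...).
        return [m.value[1] for m in cls]

    @classmethod
    def _missing_(cls, value):
        # Accept any alias, mimicking aenum.MultiValueEnum lookup-by-any-value.
        for m in cls:
            if value in (m.value[0], m.value[1], *m.value[2]):
                return m
        return None
```

With this, `BackendType("trt")` and `BackendType("tensorrt")` resolve to the same member, which is the unification the PR title refers to.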
tensorrt_llm/commands/serve.py (8)
28-28: LGTM! The import of `BackendType` is correctly placed and necessary for the backend standardization changes.
75-75: LGTM! The parameter type change to `BackendType` with a sensible default improves type safety and consistency.
141-141: LGTM! The backend assignment logic correctly uses the `BackendType.PYTORCH` enum comparison instead of string literals.
161-166: LGTM! The backend validation and conversion logic properly handles both `BackendType` instances and string values, with an appropriate warning for missing backends.
169-169: LGTM! The backend comparison now correctly uses the `BackendType.PYTORCH` enum instead of string literals.
194-199: LGTM! The CLI option updates correctly use `BackendType.canonical_values()` for choices and defer default assignment to provide user warnings. The help text now consistently uses "tensorrt" as requested in past reviews.
288-298: LGTM! The function signature updates properly handle the backend parameter as an optional string, maintaining CLI compatibility.
305-306: LGTM! The backend processing and logging using `BackendType` methods provide consistent behavior and user feedback across the codebase.
|
PR_Github #14272 [ run ] triggered by Bot |
|
PR_Github #14272 [ run ] completed with state |
|
Given that this introduces a breaking change so close to the 1.0 code freeze, I’d like to postpone merging this PR until after the 1.0 release. In the meantime, I’ll update it to:
|
Description
--backendis not specified, this will guide the existing TRT users to specify--backend tensorrtto keep the existing behavior--backend trtto--backend tensorrtSummary by CodeRabbit
Summary by CodeRabbit
New Features
Tests
Chores
Description
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...
Provide a user friendly way for developers to interact with a Jenkins server.
Run `/bot [-h|--help]` to print this help message. See details below for each supported subcommand.
Details
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]
Launch build/test pipelines. All previously running jobs will be killed.
- `--reuse-test (optional)pipeline-id` (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- `--disable-reuse-test` (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- `--disable-fail-fast` (OPTIONAL) : Disable fail fast on build/tests/infra failures.
- `--skip-test` (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- `--stage-list "A10-PyTorch-1, xxx"` (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- `--gpu-type "A30, H100_PCIe"` (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- `--test-backend "pytorch, cpp"` (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- `--only-multi-gpu-test` (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--disable-multi-gpu-test` (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- `--add-multi-gpu-test` (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
- `--post-merge` (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- `--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx"` (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- `--detailed-log` (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- `--debug` (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the `stage-list` parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.
For guidance on mapping tests to stage names, see
`docs/source/reference/ci-overview.md` and the `scripts/test_to_stage_mapping.py` helper.

kill
kill
Kill all running builds associated with pull request.
skip
skip --comment COMMENT
Skip testing for latest commit on pull request. `--comment "Reason for skipping build/test"` is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline
reuse-pipeline
Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.