[None][feat] Add C++ RequestSpecificException #6362
Conversation
📝 Walkthrough

A request-specific exception system was added: a new C++ RequestSpecificException carrying error codes and request IDs, pybind bindings and an exceptions submodule, integration into the network/transmission and batch manager paths, Python executor error wrapping, helper utilities, a version header, and unit tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant PythonUser as Python User
    participant PyModule as TRTLLM Python Module
    participant PyBind as pybind11 Bindings
    participant Cpp as C++ Layer
    participant Promise as Promise/Future
    PythonUser->>PyModule: Import exceptions submodule
    PyModule->>PyBind: init exception bindings
    PyBind->>Cpp: expose RequestSpecificException & error codes
    PythonUser->>PyModule: Call API that triggers network/op
    PyModule->>Cpp: execute operation
    Cpp-->>Cpp: throws RequestSpecificException(requestID, errorCode)
    Cpp->>Promise: set exception on promise with requestID & errorCode
    Cpp-->>PyBind: exception translated when crossing to Python
    PyBind-->>PythonUser: raise Python RequestSpecificException (with request_id, error_code)
```
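To make the flow above concrete, here is a minimal, self-contained sketch of such an exception type. It is modeled on the constructor quoted later in this review; the RequestErrorCode values, the getter names, and the use of snprintf in place of the project's fmtstr helper are assumptions for illustration, not the PR's actual API.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the PR's RequestErrorCode enum; the real
// enumerators are not shown in this review.
enum class RequestErrorCode : uint32_t
{
    UNKNOWN = 0,
    TIMEOUT = 1,
};

// Sketch modeled on the constructor quoted below; getter names are assumed.
class RequestSpecificException : public std::runtime_error
{
public:
    RequestSpecificException(
        char const* file, std::size_t line, char const* msg, uint64_t requestID, RequestErrorCode errorCode)
        : std::runtime_error{formatMessage(file, line, msg, requestID, errorCode)}
        , mRequestID{requestID}
        , mErrorCode{errorCode}
    {
    }

    uint64_t getRequestId() const noexcept
    {
        return mRequestID;
    }

    RequestErrorCode getErrorCode() const noexcept
    {
        return mErrorCode;
    }

private:
    // snprintf stands in for the project's fmtstr helper.
    static std::string formatMessage(
        char const* file, std::size_t line, char const* msg, uint64_t requestID, RequestErrorCode errorCode)
    {
        char buf[512];
        std::snprintf(buf, sizeof(buf), "%s (Request ID: %" PRIu64 ", Error Code: %u) (%s:%zu)", msg, requestID,
            static_cast<uint32_t>(errorCode), file, line);
        return std::string{buf};
    }

    uint64_t mRequestID;
    RequestErrorCode mErrorCode;
};
```

The request ID and error code ride along as members, so a catch site can route per-request failures without parsing the what() string.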
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 3
🧹 Nitpick comments (1)
cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp (1)
84-85: Consider providing accurate file/line information for Python-created exceptions.

Using `__FILE__` and `__LINE__` in the convenience functions will always report the binding file location rather than where the exception was created in Python. Consider accepting file/line as optional parameters or documenting this limitation.

```diff
-    [](uint64_t request_id, tc::RequestErrorCode error_code, std::string const& message)
-    { return tc::RequestSpecificException(__FILE__, __LINE__, message.c_str(), request_id, error_code); },
-    py::arg("request_id"), py::arg("error_code"), py::arg("message"),
+    [](uint64_t request_id, tc::RequestErrorCode error_code, std::string const& message,
+        std::string const& file = __FILE__, std::size_t line = __LINE__)
+    { return tc::RequestSpecificException(file.c_str(), line, message.c_str(), request_id, error_code); },
+    py::arg("request_id"), py::arg("error_code"), py::arg("message"),
+    py::arg("file") = __FILE__, py::arg("line") = __LINE__,
```

Also applies to: 93-93
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- cpp/include/tensorrt_llm/common/tllmException.h (2 hunks)
- cpp/tensorrt_llm/common/tllmException.cpp (1 hunks)
- cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp (1 hunks)
- cpp/tensorrt_llm/pybind/common/tllmExceptions.h (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp,cc,cxx}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0, nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and must be case-insensitive unique within a compilation target.
All types (including class names) should use camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces should use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace should use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace should use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable should use camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables should use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic-number/literal constants are uppercase snakecase with prefix...
Files:
- cpp/tensorrt_llm/pybind/common/tllmExceptions.h
- cpp/tensorrt_llm/common/tllmException.cpp
- cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp
- cpp/include/tensorrt_llm/common/tllmException.h
**/*.{h,hpp}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Files:
- cpp/tensorrt_llm/pybind/common/tllmExceptions.h
- cpp/include/tensorrt_llm/common/tllmException.h
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
- cpp/tensorrt_llm/pybind/common/tllmExceptions.h
- cpp/tensorrt_llm/common/tllmException.cpp
- cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp
- cpp/include/tensorrt_llm/common/tllmException.h
🔇 Additional comments (2)
cpp/tensorrt_llm/pybind/common/tllmExceptions.h (1)
1-33: LGTM! Header file follows all coding guidelines. The new header file is well-structured with a proper copyright header, namespace closing comment, and Doxygen documentation.
cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp (1)
99-99: Namespace closing comment check. The coding guidelines require a comment on a namespace's closing brace; the file already ends with `} // namespace tensorrt_llm::pybind::common`, so this is already correct and no change is needed.
```cpp
RequestSpecificException::RequestSpecificException(
    char const* file, std::size_t line, char const* msg, uint64_t requestID, RequestErrorCode errorCode)
    : std::runtime_error{fmtstr(
        "%s (Request ID: %lu, Error Code: %u) (%s:%zu)", msg, requestID, static_cast<uint32_t>(errorCode), file, line)}
    , mRequestID{requestID}
    , mErrorCode{errorCode}
{
}
```
Use portable format specifiers for cross-platform compatibility.

The format specifier %lu for uint64_t is not portable across different platforms. Use the appropriate format macro from <cinttypes>. Add the include at the top of the file:

```cpp
#include <cinttypes>
```

Then update the format string:

```diff
-    "%s (Request ID: %lu, Error Code: %u) (%s:%zu)", msg, requestID, static_cast<uint32_t>(errorCode), file, line)}
+    "%s (Request ID: %" PRIu64 ", Error Code: %u) (%s:%zu)", msg, requestID, static_cast<uint32_t>(errorCode), file, line)}
```
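A small standalone illustration of the suggested fix: on LLP64 platforms (64-bit Windows) unsigned long is 32-bit, so %lu is wrong for uint64_t, while PRIu64 from <cinttypes> expands to the correct specifier everywhere.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a uint64_t portably; the helper name is illustrative only.
std::string describeRequest(uint64_t requestId)
{
    char buf[64];
    std::snprintf(buf, sizeof(buf), "Request ID: %" PRIu64, requestId);
    return std::string{buf};
}
```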
Force-pushed 511920c to f0a2bd4 (compare).
Actionable comments posted: 3
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- cpp/include/tensorrt_llm/common/tllmException.h (2 hunks)
- cpp/tensorrt_llm/common/tllmException.cpp (1 hunks)
- cpp/tensorrt_llm/pybind/CMakeLists.txt (1 hunks)
- cpp/tensorrt_llm/pybind/bindings.cpp (2 hunks)
- cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp (1 hunks)
- cpp/tensorrt_llm/pybind/common/tllmExceptions.h (1 hunks)
- tests/unittest/others/test_exception.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- cpp/tensorrt_llm/pybind/common/tllmExceptions.h
🚧 Files skipped from review as they are similar to previous changes (5)
- cpp/tensorrt_llm/pybind/CMakeLists.txt
- cpp/tensorrt_llm/common/tllmException.cpp
- cpp/include/tensorrt_llm/common/tllmException.h
- cpp/tensorrt_llm/pybind/bindings.cpp
- cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without reflection.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tests/unittest/others/test_exception.py
**/*.{cpp,h,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tests/unittest/others/test_exception.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-07-30T06:11:42.362Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.
tests/unittest/others/test_exception.py (1)
Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
🔇 Additional comments (6)
tests/unittest/others/test_exception.py (6)
6-13: LGTM! Well-structured basic exception test. The test correctly validates exception creation, catching, and message handling with appropriate assertions.

25-33: LGTM! Proper attribute existence validation. The test correctly checks for the expected attributes with clear assertions and helpful comments about the C++ integration context.

35-45: LGTM! Comprehensive traceback validation. The test properly validates exception traceback information using appropriate sys.exc_info() calls and meaningful assertions.

47-58: LGTM! Correct exception equality testing. The test appropriately validates that separate exception instances are distinct objects, even with identical messages.

60-70: LGTM! Proper exception chaining validation. The test correctly uses Python's exception chaining syntax and validates the `__cause__` attribute appropriately.

83-96: LGTM! Comprehensive multiple instance testing. The test effectively validates creation of multiple exception instances with proper message handling and type checking.
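The chaining test above exercises Python's `raise ... from` and `__cause__`; on the C++ side of the bindings the closest standard mechanism is std::throw_with_nested paired with std::rethrow_if_nested. A minimal sketch (illustrative, not the PR's code):

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Throws a wrapper exception with the original attached as its nested cause,
// then walks the chain back out, mirroring Python's `raise ... from ...` and
// subsequent `__cause__` inspection.
std::string nestedCauseMessage()
{
    try
    {
        try
        {
            throw std::runtime_error("low-level failure");
        }
        catch (...)
        {
            // Attach the in-flight exception as the nested cause.
            std::throw_with_nested(std::runtime_error("wrapper"));
        }
    }
    catch (std::exception const& outer)
    {
        try
        {
            std::rethrow_if_nested(outer); // rethrows the original cause
        }
        catch (std::exception const& inner)
        {
            return inner.what();
        }
    }
    return {};
}
```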
Missing required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files, including .py files, must contain an NVIDIA copyright header with the current year. Add the copyright header at the beginning of the file:

```diff
+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import sys
```
```python
def test_exception_inheritance():
    """Test that exception properly inherits from base Exception."""
    try:
        raise RequestSpecificException("Test inheritance")
    except Exception as e:  # Should catch base Exception
        assert isinstance(e, RequestSpecificException)
    except RequestSpecificException as e:  # Should also catch specific type
        assert isinstance(e, RequestSpecificException)
```
Fix unreachable except block.

The second except block will never execute because RequestSpecificException inherits from Exception, so it will always be caught by the first except block. Apply this diff to fix the test logic:

```diff
 def test_exception_inheritance():
     """Test that exception properly inherits from base Exception."""
-    try:
-        raise RequestSpecificException("Test inheritance")
-    except Exception as e:  # Should catch base Exception
-        assert isinstance(e, RequestSpecificException)
-    except RequestSpecificException as e:  # Should also catch specific type
-        assert isinstance(e, RequestSpecificException)
+    # Test catching as base Exception
+    try:
+        raise RequestSpecificException("Test inheritance")
+    except Exception as e:
+        assert isinstance(e, RequestSpecificException)
+
+    # Test catching as specific type
+    try:
+        raise RequestSpecificException("Test inheritance")
+    except RequestSpecificException as e:
+        assert isinstance(e, RequestSpecificException)
```
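The ordering pitfall flagged here is not Python-specific: C++ catch handlers are also matched in declaration order, so a base-class handler listed first makes a later derived-class handler dead code (compilers typically warn about this). A small illustration, where RequestError is a hypothetical type standing in for any derived exception:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical derived exception for illustration.
struct RequestError : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Handlers are tried in order: the base-class handler always wins here,
// so the derived-class handler below it can never run.
std::string whichHandler()
{
    try
    {
        throw RequestError("boom");
    }
    catch (std::exception const&) // matches first: RequestError IS-A std::exception
    {
        return "base";
    }
    catch (RequestError const&) // unreachable, mirroring the Python bug above
    {
        return "derived";
    }
}
```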
```python
def test_cpp_exception_translation():
    """Test that C++ exceptions are properly translated to Python."""
    try:
        # This would normally be triggered by C++ code
        # For now, we'll test the Python exception creation
        raise RequestSpecificException("Test C++ translation")
    except RequestSpecificException as e:
        # Check that the exception was properly created
        assert isinstance(e, RequestSpecificException)
```
🛠️ Refactor suggestion

Test doesn't match its intended purpose.

The test name suggests it tests C++ exception translation, but it only creates and catches a Python exception. This doesn't validate the actual C++ to Python exception translation mechanism. Consider either renaming the test to reflect what it actually does:

```diff
-def test_cpp_exception_translation():
-    """Test that C++ exceptions are properly translated to Python."""
+def test_python_exception_creation():
+    """Test Python exception creation and basic functionality."""
```

Or implementing actual C++ translation testing if there's a way to trigger C++ exceptions from the test environment.
Force-pushed f0a2bd4 to 2e12da4 (compare).
Actionable comments posted: 5
🧹 Nitpick comments (2)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (1)
154-156: Consider using a more meaningful request ID instead of -1. Both timeout exceptions use -1 as the request ID. Since this is a connection-level operation, consider extracting a meaningful request ID from the context or connection state if available, or document why -1 is appropriate here.

Also applies to: 191-193
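As a purely illustrative alternative to the -1 sentinel (not what the PR implements), the "no associated request" state can be made explicit with std::optional, so callers cannot confuse a sentinel with a real ID:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Illustrative helper: an empty optional means "connection-level, no request".
std::string describeTimeout(std::optional<uint64_t> requestId)
{
    if (requestId.has_value())
    {
        return "timeout for request " + std::to_string(*requestId);
    }
    return "connection-level timeout (no specific request)";
}
```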
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (1)
20-22: Consider consolidating related includes. The timeout implementation added new includes; consider grouping them with the other standard library includes for better organization.

```diff
 #include "connection.h"
 #include "tensorrt_llm/common/envUtils.h"
-#include <chrono>
 #include <string>
+#include <chrono>
 #include <thread>
 #include <unistd.h>
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
- cpp/include/tensorrt_llm/common/tllmException.h (2 hunks)
- cpp/include/tensorrt_llm/executor/cacheCommunicator.h (2 hunks)
- cpp/tensorrt_llm/common/tllmException.cpp (1 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (3 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (1 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp (1 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h (1 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (4 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.h (3 hunks)
- cpp/tensorrt_llm/pybind/CMakeLists.txt (1 hunks)
- cpp/tensorrt_llm/pybind/bindings.cpp (2 hunks)
- cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp (1 hunks)
- cpp/tensorrt_llm/pybind/common/tllmExceptions.h (1 hunks)
- tests/unittest/others/test_exception.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- cpp/tensorrt_llm/pybind/CMakeLists.txt
- cpp/tensorrt_llm/pybind/bindings.cpp
- cpp/tensorrt_llm/common/tllmException.cpp
- cpp/tensorrt_llm/pybind/common/tllmExceptions.h
- cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp
- tests/unittest/others/test_exception.py
- cpp/include/tensorrt_llm/common/tllmException.h
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp,cc,cxx}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0 (only used in comparison for checking signness/existence/emptiness) and nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and all files involved in the compilation of a target must have filenames that are case-insensitive unique.
All types (including class names) are camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable uses camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic...
Files:
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.h
- cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
- cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h
- cpp/include/tensorrt_llm/executor/cacheCommunicator.h
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
**/*.{h,hpp}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Files:
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.h
- cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h
- cpp/include/tensorrt_llm/executor/cacheCommunicator.h
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.h
- cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
- cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h
- cpp/include/tensorrt_llm/executor/cacheCommunicator.h
- cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
🧠 Learnings (3)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.h (6)
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-07-31T04:50:23.290Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-07-31T04:50:23.290Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-07-31T04:50:23.290Z
Learning: Applies to **/*.cu : CUDA code includes code that must be compiled with a CUDA compiler, such as code using device, managed, constant, global, or kernel launching syntax.
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-07-31T04:50:23.290Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.
cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (1)
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
cpp/include/tensorrt_llm/executor/cacheCommunicator.h (1)
Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (7)
cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp (2)
30-35: LGTM! Proper interface alignment with timeout parameter. The implementation correctly adds the timeout parameter to maintain interface compatibility while explicitly ignoring it for MPI connections. The void cast properly suppresses unused parameter warnings and the comment clearly explains the behavior.
37-42: LGTM! Consistent timeout parameter handling. The recv method implementation mirrors the send method approach, maintaining consistency in how the timeout parameter is handled for MPI connections.
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.h (3)
23-23: LGTM! Required include for timeout support. The chrono header is necessary for the std::chrono::milliseconds timeout parameter type.
32-32: LGTM! Required include for exception handling. The tllmException header is necessary for the RequestSpecificException that will be thrown on timeout according to the implementation.
51-54: LGTM! Proper method signature updates for timeout support. The send and recv method signatures are correctly updated with the timeout parameter, maintaining the override relationship with the base Connection interface and providing a sensible default value.
cpp/include/tensorrt_llm/executor/cacheCommunicator.h (2)
20-20: LGTM! Required include for timeout parameter type. The chrono header is properly added to support the std::chrono::milliseconds timeout parameter in the Connection interface methods.
50-56: LGTM! Proper base interface update for timeout support. The Connection interface methods are correctly updated with the timeout parameter as pure virtual functions. The default timeout value maintains backward compatibility while ensuring all derived implementations support the timeout parameter.
if (timeout.count() > 0)
{
    // For AgentConnection, we'll use a simple timeout check on the status
    // This is a simplified approach since the underlying transfer agent may not support timeouts
    auto start = std::chrono::steady_clock::now();
    while (!status->isCompleted())
    {
        auto now = std::chrono::steady_clock::now();
        if (now - start > timeout)
        {
            TLLM_THROW("AgentConnection::send timeout after " + std::to_string(timeout.count()) + "ms");
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
}
🛠️ Refactor suggestion
Inefficient polling implementation for timeout handling
The current implementation uses a busy-wait loop with 1ms sleep intervals. This approach is inefficient and can cause unnecessary CPU usage and latency.
Consider using a more efficient approach. If the underlying status object supports it, use a wait_for method or condition variable. Alternatively, consider implementing timeout support in the transfer agent itself.
- auto start = std::chrono::steady_clock::now();
- while (!status->isCompleted())
- {
- auto now = std::chrono::steady_clock::now();
- if (now - start > timeout)
- {
- TLLM_THROW("AgentConnection::send timeout after " + std::to_string(timeout.count()) + "ms");
- }
- std::this_thread::sleep_for(std::chrono::milliseconds(1));
- }
+ // If the status object supports wait_for:
+ if (!status->wait_for(timeout))
+ {
+ throw NEW_TLLM_REQUEST_SPECIFIC_EXCEPTION_WITH_ERROR_CODE(-1, common::RequestErrorCode::NETWORK_TIMEOUT,
+ "AgentConnection::send timeout after " + std::to_string(timeout.count()) + "ms");
+ }Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
between lines 99 and 113, the current timeout handling uses a busy-wait loop
with 1ms sleeps, causing inefficient CPU usage. To fix this, replace the polling
loop with a more efficient mechanism such as using a condition variable or a
wait_for method on the status object if available. If these are not supported,
consider adding timeout support directly in the transfer agent to avoid active
waiting and reduce CPU load.
auto now = std::chrono::steady_clock::now();
if (now - start > timeout)
{
    TLLM_THROW("AgentConnection::send timeout after " + std::to_string(timeout.count()) + "ms");
🛠️ Refactor suggestion
Use consistent exception types across connection implementations
The AgentConnection uses TLLM_THROW while UcxConnection uses NEW_TLLM_REQUEST_SPECIFIC_EXCEPTION_WITH_ERROR_CODE. For consistency with the rest of the codebase and to support request-specific error handling, use the same exception type.
- TLLM_THROW("AgentConnection::send timeout after " + std::to_string(timeout.count()) + "ms");
+ throw NEW_TLLM_REQUEST_SPECIFIC_EXCEPTION_WITH_ERROR_CODE(-1, common::RequestErrorCode::NETWORK_TIMEOUT,
+ "AgentConnection::send timeout after " + std::to_string(timeout.count()) + "ms");- TLLM_THROW("AgentConnection::recv timeout after " + std::to_string(timeout.count()) + "ms");
+ throw NEW_TLLM_REQUEST_SPECIFIC_EXCEPTION_WITH_ERROR_CODE(-1, common::RequestErrorCode::NETWORK_TIMEOUT,
+ "AgentConnection::recv timeout after " + std::to_string(timeout.count()) + "ms");Also applies to: 146-146
🤖 Prompt for AI Agents
In cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp at
lines 109 and 146, replace the use of TLLM_THROW with
NEW_TLLM_REQUEST_SPECIFIC_EXCEPTION_WITH_ERROR_CODE to maintain consistency with
other connection implementations and enable request-specific error handling.
Adjust the exception throwing code accordingly to include the error code and any
relevant request context as required by the macro.
std::thread(
    [&]()
    {
        mAgentConnectionManager->waitForSyncInfo(mRemoteAgentName, syncInfo);
        promise.set_value();
    })
    .detach();
Critical: Detached thread continues running after timeout
The detached thread will continue running even after a timeout occurs, which can lead to:
- Resource leaks (thread keeps running indefinitely)
- Race conditions (thread may complete after the function returns)
- Potential crashes if the promise is destroyed while the thread is still running
The thread should not be detached. Instead, manage its lifecycle properly:
- std::thread(
- [&]()
- {
- mAgentConnectionManager->waitForSyncInfo(mRemoteAgentName, syncInfo);
- promise.set_value();
- })
- .detach();
+ auto worker = std::thread(
+ [&]()
+ {
+ try {
+ mAgentConnectionManager->waitForSyncInfo(mRemoteAgentName, syncInfo);
+ promise.set_value();
+ } catch (...) {
+ promise.set_exception(std::current_exception());
+ }
+ });
+
+ auto status = future.wait_for(timeout);
+ if (status == std::future_status::timeout)
+ {
+ // Note: Cannot safely interrupt waitForSyncInfo, so thread cleanup is problematic
+ // This design needs to be reconsidered - waitForSyncInfo should support cancellation
+ worker.detach(); // Still problematic but documented
+ throw NEW_TLLM_REQUEST_SPECIFIC_EXCEPTION_WITH_ERROR_CODE(-1, common::RequestErrorCode::NETWORK_TIMEOUT,
+ "AgentConnection::recv timeout after " + std::to_string(timeout.count()) + "ms");
+ }
+ worker.join();

However, this still has issues because waitForSyncInfo cannot be interrupted. A better solution would be to make waitForSyncInfo timeout-aware.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
around lines 135 to 141, the detached thread running waitForSyncInfo continues
after a timeout, risking resource leaks and race conditions. To fix this, do not
detach the thread; instead, keep a handle to join it properly. Additionally,
modify waitForSyncInfo to support timeout awareness so the thread can exit early
if needed, preventing it from running indefinitely.
void send(DataContext const& ctx, void const* data, size_t size,
    std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;
void recv(DataContext const& ctx, void* data, size_t size,
    std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;
Add missing chrono include for timeout parameter.
The method signatures correctly add the timeout parameter, but the file is missing the necessary #include <chrono> header for the std::chrono::milliseconds type.
Add the missing include:
#pragma once
+#include <chrono>
#include "tensorrt_llm/batch_manager/cacheTransBuffer.h"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
void send(DataContext const& ctx, void const* data, size_t size,
    std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;
void recv(DataContext const& ctx, void* data, size_t size,
    std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;

#pragma once
#include <chrono>
#include "tensorrt_llm/batch_manager/cacheTransBuffer.h"
🤖 Prompt for AI Agents
In cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h around
lines 174 to 177, the methods use std::chrono::milliseconds for the timeout
parameter but the file lacks the necessary #include <chrono> directive. Add
#include <chrono> at the top of the file to ensure the std::chrono::milliseconds
type is recognized and the code compiles correctly.
void send(DataContext const& ctx, void const* data, size_t size,
    std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;
void recv(DataContext const& ctx, void* data, size_t size,
    std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;
Add missing chrono include for timeout parameter.
The method signatures correctly add the timeout parameter, but the file is missing the necessary #include <chrono> header for the std::chrono::milliseconds type.
Add the missing include:
#pragma once
+#include <chrono>
#include "tensorrt_llm/common/cudaUtils.h"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
void send(DataContext const& ctx, void const* data, size_t size,
    std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;
void recv(DataContext const& ctx, void* data, size_t size,
    std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;

#pragma once
#include <chrono>
#include "tensorrt_llm/common/cudaUtils.h"
🤖 Prompt for AI Agents
In cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h around
lines 33 to 36, the methods use std::chrono::milliseconds for the timeout
parameter but the file lacks the #include <chrono> directive. Add #include
<chrono> at the top of the file to ensure the std::chrono::milliseconds type is
recognized and the code compiles correctly.
2e12da4 to
b4d365a
Compare
Actionable comments posted: 1
🔭 Outside diff range comments (1)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)
18-18: Replace #pragma once with traditional header guard.

According to the coding guidelines, header files should use a preprocessor guard with the format TRTLLM_ followed by the filename in caps. Replace #pragma once with the appropriate guard.

-#pragma once
+#ifndef TRTLLM_DATATRANSCEIVER_H
+#define TRTLLM_DATATRANSCEIVER_H

And add at the end of the file before the namespace closing:

+#endif // TRTLLM_DATATRANSCEIVER_H
 } // namespace tensorrt_llm::batch_manager
🧹 Nitpick comments (3)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (2)
146-154: Consider using actual request ID when available.

The exception handling implementation is solid, but consider using the actual request ID from mRequest when available instead of always using kUNKNOWN_REQUEST_ID. This would provide more specific error context for debugging.

-    throw common::RequestSpecificException(
-        __FILE__, __LINE__, e.what(), common::kUNKNOWN_REQUEST_ID, common::RequestErrorCode::kNETWORK_ERROR);
+    auto requestId = (mRequest != nullptr) ? mRequest->getRequestId() : common::kUNKNOWN_REQUEST_ID;
+    throw common::RequestSpecificException(
+        __FILE__, __LINE__, e.what(), requestId, common::RequestErrorCode::kNETWORK_ERROR);
159-167: Consider using actual request ID when available.

Same suggestion as for the send method: consider using the actual request ID from mRequest when available instead of always using kUNKNOWN_REQUEST_ID.

-    throw common::RequestSpecificException(
-        __FILE__, __LINE__, e.what(), common::kUNKNOWN_REQUEST_ID, common::RequestErrorCode::kNETWORK_ERROR);
+    auto requestId = (mRequest != nullptr) ? mRequest->getRequestId() : common::kUNKNOWN_REQUEST_ID;
+    throw common::RequestSpecificException(
+        __FILE__, __LINE__, e.what(), requestId, common::RequestErrorCode::kNETWORK_ERROR);

tensorrt_llm/_torch/pyexecutor/py_executor.py (1)
1282-1301: Well-designed exception handling wrapper. The helper method follows good design principles by centralizing exception handling logic and properly differentiating between request-specific and system errors. The implementation correctly handles both exception types and provides appropriate logging.
Consider adding type hints for better code documentation:
- def _check_cache_transfer_status_helper(self,
-                                         method_name: str,
-                                         method_call,
-                                         atLeastNum: int = 0):
+ def _check_cache_transfer_status_helper(self,
+                                         method_name: str,
+                                         method_call: callable,
+                                         atLeastNum: int = 0):
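The pattern this helper implements can be sketched in isolation. Everything below (`RequestSpecificError`, `check_transfer_status_helper`, the error-code values) is an illustrative stand-in for the bound C++ types, not the actual TensorRT-LLM API:

```python
import logging

logger = logging.getLogger(__name__)


class RequestSpecificError(Exception):
    """Stand-in for the bound C++ RequestSpecificException (illustrative only)."""

    def __init__(self, message: str, request_id: int, error_code: int):
        super().__init__(message)
        self.request_id = request_id
        self.error_code = error_code


def check_transfer_status_helper(method_name: str, method_call, at_least_num: int = 0):
    """Run a transfer-status check, converting request-specific failures into a
    list of failed request IDs while letting system errors propagate."""
    try:
        return method_call(at_least_num)
    except RequestSpecificError as e:
        # Request-specific: log and report so only that request is terminated.
        logger.error("%s failed for request %d (error code %d): %s",
                     method_name, e.request_id, e.error_code, e)
        return [e.request_id]
    # Any other exception propagates and is treated as a system-level error.


def failing_call(at_least_num):
    raise RequestSpecificError("network timeout", request_id=42, error_code=3)


failed = check_transfer_status_helper("check_context_transfer_status", failing_call)
print(failed)  # [42]
```

The key design choice mirrored here is that the `except` clause is limited to the single request-scoped exception type, so genuine system failures still take down the executor loop instead of being silently converted into per-request errors.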
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (1 hunks)
- tensorrt_llm/_torch/pyexecutor/py_executor.py (8 hunks)
✅ Files skipped from review due to trivial changes (1)
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.
Files:
tensorrt_llm/_torch/pyexecutor/py_executor.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.
Files:
tensorrt_llm/_torch/pyexecutor/py_executor.py
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
**/*.{cpp,h,hpp,cc,cxx}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0 (only used in comparison for checking signness/existence/emptiness) and nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and all files involved in the compilation of a target must have filenames that are case-insensitive unique.
All types (including class names) are camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable uses camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope and function-...
Files:
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
**/*.{h,hpp}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.
Files:
cpp/tensorrt_llm/batch_manager/dataTransceiver.h
🧠 Learnings (2)
📚 Learning: in tensorrt_llm/executor/worker.py, the lora adapter cache optimization logic that checks `is_adapte...
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
Applied to files:
tensorrt_llm/_torch/pyexecutor/py_executor.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.
Applied to files:
tensorrt_llm/_torch/pyexecutor/py_executor.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tensorrt_llm/_torch/pyexecutor/py_executor.py (4)
19-19: LGTM! The import follows Python coding guidelines by maintaining the namespace when importing.
1302-1306: LGTM! Clean wrapper implementation that properly delegates to the helper method with appropriate NVTX profiling annotation.
1308-1312: LGTM! Consistent implementation that follows the same pattern as the context transfer status wrapper.
683-683: LGTM! All direct cache transfer status calls have been properly replaced with the new wrapper methods, maintaining the existing behavior while adding structured exception handling.
Also applies to: 880-880, 1151-1151, 1252-1252, 1272-1272
def _handle_errors(self,
                   error_msg: Optional[str] = None,
                   request_ids: Optional[List[int]] = None):
    error_responses = {}
    error_msg = error_msg or "error"
    for request in self.active_requests:
        if request_ids is not None and request.py_request_id not in request_ids:
            continue
        req_id = request.py_request_id
        request.state = LlmRequestState.GENERATION_COMPLETE
        self._terminate_request(request)
        error_responses[req_id] = LlmResponse(
            request_id=req_id,
            error_msg=error_msg,
            client_id=request.py_client_id)
    if request_ids is not None:
        for req_id in request_ids:
            self.active_requests.remove(req_id)
    else:
        self.active_requests.clear()
    self._enqueue_responses(error_responses)
Fix the request removal logic.
There's a critical bug in the request removal logic. Line 1439 attempts to remove request IDs from active_requests, but active_requests contains LlmRequest objects, not IDs.
- if request_ids is not None:
- for req_id in request_ids:
- self.active_requests.remove(req_id)
- else:
- self.active_requests.clear()
+ if request_ids is not None:
+ self.active_requests = [req for req in self.active_requests
+ if req.py_request_id not in request_ids]
+ else:
+ self.active_requests.clear()📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def _handle_errors(self,
                   error_msg: Optional[str] = None,
                   request_ids: Optional[List[int]] = None):
    error_responses = {}
    error_msg = error_msg or "error"
    for request in self.active_requests:
        if request_ids is not None and request.py_request_id not in request_ids:
            continue
        req_id = request.py_request_id
        request.state = LlmRequestState.GENERATION_COMPLETE
        self._terminate_request(request)
        error_responses[req_id] = LlmResponse(
            request_id=req_id,
            error_msg=error_msg,
            client_id=request.py_client_id)
    if request_ids is not None:
        for req_id in request_ids:
            self.active_requests.remove(req_id)
    else:
        self.active_requests.clear()
    self._enqueue_responses(error_responses)

def _handle_errors(self,
                   error_msg: Optional[str] = None,
                   request_ids: Optional[List[int]] = None):
    error_responses = {}
    error_msg = error_msg or "error"
    for request in self.active_requests:
        if request_ids is not None and request.py_request_id not in request_ids:
            continue
        req_id = request.py_request_id
        request.state = LlmRequestState.GENERATION_COMPLETE
        self._terminate_request(request)
        error_responses[req_id] = LlmResponse(
            request_id=req_id,
            error_msg=error_msg,
            client_id=request.py_client_id)
    if request_ids is not None:
        self.active_requests = [
            req for req in self.active_requests
            if req.py_request_id not in request_ids
        ]
    else:
        self.active_requests.clear()
    self._enqueue_responses(error_responses)
🤖 Prompt for AI Agents
In tensorrt_llm/_torch/pyexecutor/py_executor.py around lines 1421 to 1442, the
code incorrectly tries to remove request IDs from the active_requests list,
which actually contains LlmRequest objects. To fix this, update the removal
logic to remove the corresponding LlmRequest objects whose py_request_id matches
the IDs in request_ids, instead of removing by ID directly. This ensures the
correct objects are removed from active_requests.
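The failure mode and the fix are easy to demonstrate with a minimal stand-in for LlmRequest (illustrative only):

```python
class LlmRequest:
    """Minimal stand-in for the real LlmRequest (illustrative only)."""

    def __init__(self, py_request_id: int):
        self.py_request_id = py_request_id


active_requests = [LlmRequest(1), LlmRequest(2), LlmRequest(3)]

# Buggy: list.remove(2) compares the int 2 against LlmRequest objects using
# ==, never matches, and raises ValueError -- the list holds objects, not IDs.
try:
    active_requests.remove(2)
except ValueError:
    pass  # demonstrates the bug described in the review

# Fixed: rebuild the list, filtering by each object's py_request_id.
request_ids = {2}
active_requests = [
    req for req in active_requests if req.py_request_id not in request_ids
]
print([req.py_request_id for req in active_requests])  # [1, 3]
```

The list comprehension also avoids mutating the list while matching against it, which is why the suggested fix rebuilds `active_requests` rather than calling `remove` in a loop.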
{
public:
    explicit RequestSpecificException(
        char const* file, std::size_t line, char const* msg, uint64_t requestID, RequestErrorCode errorCode);
Please use const std::string& instead of const char* if possible.
explicit RequestSpecificException(
    char const* file, std::size_t line, char const* msg, uint64_t requestID, RequestErrorCode errorCode);

~RequestSpecificException() noexcept override;
Please confirm the behavior of overriding a destructor.
Signed-off-by: Shunkang <[email protected]>
Signed-off-by: Shunkang <[email protected]>
Signed-off-by: Shunkang <[email protected]>
This reverts commit 0c284df.
Signed-off-by: Shunkang <[email protected]>
Signed-off-by: Shunkang <[email protected]>
Signed-off-by: Shunkang <[email protected]>
Signed-off-by: Shunkang <[email protected]>
a69ce96 to
d9478fd
Compare
Actionable comments posted: 4
🧹 Nitpick comments (4)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (3)
29-31: Using-aliases added but not used; either leverage them for request-aware errors or remove to keep scope clean

Given the PR's goal (introduce RequestSpecificException), consider actually using it in this file (e.g., in the catch block) to attach an error code and correlation info, if available. If this TU isn't the right place to emit request-aware exceptions, drop these aliases to avoid namespace pollution.
As a minimal cleanup, you can remove them now and reintroduce when needed:
-// Using declarations to shorten the code
-using RequestSpecificException = tensorrt_llm::common::RequestSpecificException;
-using RequestErrorCode = tensorrt_llm::common::RequestErrorCode;

Would you like me to draft a conversion of the catch block to throw RequestSpecificException with an appropriate RequestErrorCode and correlation (e.g., connection/request id), consistent with your new exception model?
46-62: Avoid busy-wait spin loops; use the same callback+future pattern used elsewhere in this file

The constructor handshake currently spins CPU with while (!req->isCompleted()) loops. Prefer the non-blocking UCX ops with a completion callback and a promise/future, mirroring your send()/recv() methods, to prevent CPU burn and improve responsiveness.
Suggested refactor:
- std::shared_ptr<ucxx::Request> recvRequest
-     = mEndpoint->tagRecv(reinterpret_cast<void*>(&mConnectionIdInPeer), sizeof(mConnectionIdInPeer),
-         ucxx::Tag(ResponserTag), ucxx::TagMaskFull);
- while (!recvRequest->isCompleted())
-     ;
- recvRequest->checkError();
+ {
+     std::promise<void> p;
+     auto f = p.get_future();
+     auto cb = [&](ucs_status_t, ucxx::RequestCallbackUserData) { p.set_value(); };
+     auto recvRequest = mEndpoint->tagRecv(
+         reinterpret_cast<void*>(&mConnectionIdInPeer),
+         sizeof(mConnectionIdInPeer),
+         ucxx::Tag(ResponserTag),
+         ucxx::TagMaskFull,
+         /* enableCallback = */ false,
+         cb);
+     if (!recvRequest->isCompleted())
+     {
+         f.get();
+     }
+     recvRequest->checkError();
+ }

- auto sendTag = ucxx::Tag(mConnectionIdInPeer << 32 | (RequesterTag & 0xFFFFFFFF));
- std::shared_ptr<ucxx::Request> sendRequest
-     = mEndpoint->tagSend(reinterpret_cast<void*>(&mConnectionId), sizeof(mConnectionId), sendTag);
- while (!sendRequest->isCompleted())
-     ;
- sendRequest->checkError();
+ {
+     auto sendTag = ucxx::Tag(mConnectionIdInPeer << 32 | (RequesterTag & 0xFFFFFFFF));
+     std::promise<void> p;
+     auto f = p.get_future();
+     auto cb = [&](ucs_status_t, ucxx::RequestCallbackUserData) { p.set_value(); };
+     auto sendRequest = mEndpoint->tagSend(
+         reinterpret_cast<void*>(&mConnectionId),
+         sizeof(mConnectionId),
+         sendTag,
+         /* enableCallback = */ false,
+         cb);
+     if (!sendRequest->isCompleted())
+     {
+         f.get();
+     }
+     sendRequest->checkError();
+ }

- std::shared_ptr<ucxx::Request> sendRequest = mEndpoint->tagSend(
-     reinterpret_cast<void*>(&mConnectionId), sizeof(mConnectionId), ucxx::Tag(ResponserTag));
- while (!sendRequest->isCompleted())
-     ;
- sendRequest->checkError();
+ {
+     std::promise<void> p;
+     auto f = p.get_future();
+     auto cb = [&](ucs_status_t, ucxx::RequestCallbackUserData) { p.set_value(); };
+     auto sendRequest = mEndpoint->tagSend(
+         reinterpret_cast<void*>(&mConnectionId),
+         sizeof(mConnectionId),
+         ucxx::Tag(ResponserTag),
+         /* enableCallback = */ false,
+         cb);
+     if (!sendRequest->isCompleted())
+     {
+         f.get();
+     }
+     sendRequest->checkError();
+ }

- auto recvTag = ucxx::Tag(mConnectionId << 32 | (RequesterTag & 0xFFFFFFFF));
- std::shared_ptr<ucxx::Request> recvRequest = mEndpoint->tagRecv(
-     reinterpret_cast<void*>(&mConnectionIdInPeer), sizeof(mConnectionIdInPeer), recvTag, ucxx::TagMaskFull);
- while (!recvRequest->isCompleted())
-     ;
- recvRequest->checkError();
+ {
+     auto recvTag = ucxx::Tag(mConnectionId << 32 | (RequesterTag & 0xFFFFFFFF));
+     std::promise<void> p;
+     auto f = p.get_future();
+     auto cb = [&](ucs_status_t, ucxx::RequestCallbackUserData) { p.set_value(); };
+     auto recvRequest = mEndpoint->tagRecv(
+         reinterpret_cast<void*>(&mConnectionIdInPeer),
+         sizeof(mConnectionIdInPeer),
+         recvTag,
+         ucxx::TagMaskFull,
+         /* enableCallback = */ false,
+         cb);
+     if (!recvRequest->isCompleted())
+     {
+         f.get();
+     }
+     recvRequest->checkError();
+ }

Also applies to: 66-80
29-32: Replace magic literals (0xFFFFFFFF and 32) with named constexprs per style guide

Per guidelines, avoid non-trivial magic literals. Define file-local constants and use them in tag computations.

 namespace tensorrt_llm::executor::kv_cache
 {
 // Using declarations to shorten the code
 using RequestSpecificException = tensorrt_llm::common::RequestSpecificException;
 using RequestErrorCode = tensorrt_llm::common::RequestErrorCode;
+
+namespace
+{
+constexpr uint64_t kTAG_MASK_32 = 0xFFFFFFFFULL;
+constexpr uint64_t kTAG_SHIFT_32 = 32ULL;
+} // anonymous namespace

- uint64_t tag
-     = ((mSendTagPrefix & 0xFFFFFFFF) << 32) | static_cast<uint64_t>(batch_manager::TransceiverTag::kID_TAG);
+ uint64_t tag = ((mSendTagPrefix & kTAG_MASK_32) << kTAG_SHIFT_32)
+     | static_cast<uint64_t>(batch_manager::TransceiverTag::kID_TAG);

- uint64_t sendTag = ((mSendTagPrefix & 0xFFFFFFFF) << 32) | (static_cast<uint64_t>(ctx.getTag()) & (0xFFFFFFFF));
+ uint64_t sendTag = ((mSendTagPrefix & kTAG_MASK_32) << kTAG_SHIFT_32)
+     | (static_cast<uint64_t>(ctx.getTag()) & kTAG_MASK_32);

- uint64_t recvTag = ((mRecvTagPrefix & 0xFFFFFFFF) << 32) | (static_cast<uint64_t>(ctx.getTag()) & (0xFFFFFFFF));
+ uint64_t recvTag = ((mRecvTagPrefix & kTAG_MASK_32) << kTAG_SHIFT_32)
+     | (static_cast<uint64_t>(ctx.getTag()) & kTAG_MASK_32);

Also applies to: 117-119, 149-151, 174-175
cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h (1)
1-15: Update copyright year to reflect current year.The copyright should reflect the current year (2025) according to the coding guidelines.
Apply this diff to update the copyright header:
```diff
-/*
- * Copyright (c) 2023-2024, NVIDIA CORPORATION. All rights reserved.
+/*
+ * Copyright (c) 2023-2025, NVIDIA CORPORATION. All rights reserved.
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- cpp/include/tensorrt_llm/common/tllmException.h (2 hunks)
- cpp/include/tensorrt_llm/executor/dataTransceiverState.h (1 hunks)
- cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h (1 hunks)
- cpp/include/tensorrt_llm/executor/version.h (1 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (3 hunks)
- cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1 hunks)
- cpp/tensorrt_llm/common/tllmException.cpp (1 hunks)
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- cpp/include/tensorrt_llm/executor/version.h
🚧 Files skipped from review as they are similar to previous changes (4)
- cpp/include/tensorrt_llm/common/tllmException.h
- cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
- cpp/tensorrt_llm/batch_manager/dataTransceiver.h
- cpp/tensorrt_llm/common/tllmException.cpp
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh}: In C++, close namespaces with a comment naming the namespace (e.g., } // namespace foo)
Prefer const/constexpr variables over #define for constants
Declare variables const if not modified after initialization
Use Allman brace style in C++
C++ filenames use lowerCamelCase and must be case-insensitively unique within a build target
C++ type names use UpperCamelCase
Local variables, methods, and namespaces use lowerCamelCase
Global non-static variables not in anonymous namespace use gPrefix lowerCamelCase (e.g., gExample)
Static globals or globals in anonymous namespaces use sPrefix lowerCamelCase
Locally visible static variables start with 's' (e.g., static std::once_flag sFlag;)
Member variables use mPrefix lowerCamelCase; public members may omit but are encouraged to use 'm'
Constants (enums, global/static/function-scope magic numbers) use kPREFIXED_UPPER_SNAKE (e.g., kDIGIT_NUM)
If macros are unavoidable, use UPPER_SNAKE_CASE (prefer constants over #define)
Constructor parameter that conflicts with a public member name gets trailing underscore (foo_)
Literal suffixes should be uppercase (e.g., 1234L not 1234l)
C++: use spaces only; indent 4 spaces
Run clang-format (LLVM style) before submitting; wrap lines at 120 characters
If formatting must be bypassed, use // clang-format off/on around the section
Prefer smart pointers; use unique_ptr for sole ownership, shared_ptr for shared; weak_ptr only in exceptional cases
Do not use deprecated pre-C++11 smart pointers
Use C++ style comments; avoid C comments except special inline cases; prefer // single-line
Capitalize and punctuate full-sentence comments
Follow Doxygen rules: use //! for comments and //!< for members in C++
Disable code with #if/#endif and mnemonic conditions; avoid commented-out code; avoid dead code
Do not throw exceptions across library boundaries
Use least-forceful casts; avoid removing const/volatile; avoid C-style and functional casts (except constructors); p...
Files:
- cpp/include/tensorrt_llm/executor/dataTransceiverState.h
- cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
**/*.{h,hpp,hxx,hh,cuh,cpp,cxx,cc,cu}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Parameter names must be consistent between declarations and definitions
Files:
- cpp/include/tensorrt_llm/executor/dataTransceiverState.h
- cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
**/*.{h,hpp,hxx,hh,cuh}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Header files must use include guards named TRTLLM_<FILENAME>_H without leading/trailing underscores (e.g., TRTLLM_FOO_BAR_HELLO_H)
Files:
- cpp/include/tensorrt_llm/executor/dataTransceiverState.h
- cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
Prepend NVIDIA copyright header (current year) to all source files
Files:
- cpp/include/tensorrt_llm/executor/dataTransceiverState.h
- cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h
- cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
**/*.{cpp,cxx,cc,cu}
📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)
**/*.{cpp,cxx,cc,cu}: Avoid literal values except for 0, nullptr, true, false; use named constexpr for other literals
Place semicolon of empty for/while loop on a new line
Always use brace-delimited bodies for switch/while/do-for/if/else
Use inline C comments in argument lists when parameter meaning is unclear (e.g., /* checkForErrors = */ false)
Do not use assignment in subexpressions (e.g., if (x = y) ... is forbidden)
Switch on enums should enumerate all values and omit default to catch new values at compile time
Structure switch statements; prohibit fallthrough except between empty cases; each case ends with break or throw; return at end of case not allowed; put break inside braces for compound case
Prefer anonymous namespaces over static for internal linkage of functions
Every defined function must be called at least once (no unused methods)
Files:
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
🔇 Additional comments (3)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (1)
23-23: Include of tllmException.h is appropriate and consistent

Bringing in tllmException.h here is correct given the use of TLLM_THROW/TLLM_CHECK_WITH_INFO in this TU. No issues.
cpp/include/tensorrt_llm/executor/dataTransceiverState.h (1)
32-33: LGTM! Forward declaration properly placed.

The forward declaration of `Serialization` is correctly positioned before the `kv_cache` namespace to support the friend declarations in `CacheState` and `CommState`. This enables the serialization functionality without requiring the full definition at this point.

cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h (1)
70-70: Serialization::serialize Overload Confirmed

The call to `Serialization::serialize(state)` maps to the following declarations in cpp/include/tensorrt_llm/executor/serialization.h:

- `static std::vector<char> serialize(DataTransceiverState const& dataTransceiverState);`
- `static void serialize(DataTransceiverState const& dataTransceiverState, std::ostream& os);`

No missing or forward-only declarations; this overload is available and correctly declared.
```
 * limitations under the License.
 */

#pragma once
```
Missing include guard.
Header files must use include guards named TRTLLM_<FILENAME>_H according to the coding guidelines.
Apply this diff to add the required include guard:
```diff
-#pragma once
+#ifndef TRTLLM_DATATRANSCEIVERSTATE_UTILS_H
+#define TRTLLM_DATATRANSCEIVERSTATE_UTILS_H
```

And add the closing endif at the end of the file:

```diff
 } // namespace tensorrt_llm::executor
+
+#endif // TRTLLM_DATATRANSCEIVERSTATE_UTILS_H
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h around line 17,
the header is missing the required include guard; add a top-level include guard
using the macro name TRTLLM_DATATRANSCEIVERSTATE_UTILS_H (per guideline) and
place the corresponding #endif at the end of the file so the file is properly
protected against multiple inclusion.
```cpp
inline std::vector<char> createDataTransceiverStateSocket(std::vector<SizeType32> const& nbKvHeadsPerLayer,
    SizeType32 sizePerHead, SizeType32 tokensPerBlock, SizeType32 tensorParallelism, SizeType32 pipelineParallelism,
    nvinfer1::DataType dataType, std::vector<std::string> const& socketAddresses,
    kv_cache::CacheState::AttentionType attentionType, int kvFactor, bool enableAttentionDP, int dpRank, int dpSize,
    int rank)
```
🛠️ Refactor suggestion
Consider adding input validation for critical parameters.
Both functions accept multiple parameters that could be invalid (e.g., empty vectors, zero values for critical dimensions). Consider adding basic validation to prevent runtime errors downstream.
Add validation at the beginning of both functions:
```diff
 inline std::vector<char> createDataTransceiverStateSocket(std::vector<SizeType32> const& nbKvHeadsPerLayer,
     SizeType32 sizePerHead, SizeType32 tokensPerBlock, SizeType32 tensorParallelism, SizeType32 pipelineParallelism,
     nvinfer1::DataType dataType, std::vector<std::string> const& socketAddresses,
     kv_cache::CacheState::AttentionType attentionType, int kvFactor, bool enableAttentionDP, int dpRank, int dpSize,
     int rank)
 {
+    if (nbKvHeadsPerLayer.empty() || socketAddresses.empty() || sizePerHead == 0 || tokensPerBlock == 0)
+    {
+        throw std::invalid_argument("Invalid parameters: vectors cannot be empty and dimensions must be positive");
+    }
```

Also applies to: 91-95
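A minimal sketch of such up-front validation, assuming `SizeType32` is a 32-bit signed integer; `validateTransceiverArgs` is a hypothetical helper, not part of the PR, and the real function takes many more parameters.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using SizeType32 = std::int32_t; // Assumption: SizeType32 is a 32-bit signed integer.

// Hypothetical helper mirroring the checks the review asks for: reject
// empty vectors and non-positive dimensions before any downstream use.
void validateTransceiverArgs(std::vector<SizeType32> const& nbKvHeadsPerLayer,
    std::vector<std::string> const& socketAddresses, SizeType32 sizePerHead, SizeType32 tokensPerBlock)
{
    if (nbKvHeadsPerLayer.empty() || socketAddresses.empty())
    {
        throw std::invalid_argument("Invalid parameters: vectors cannot be empty");
    }
    if (sizePerHead <= 0 || tokensPerBlock <= 0)
    {
        throw std::invalid_argument("Invalid parameters: dimensions must be positive");
    }
}
```

Throwing `std::invalid_argument` with a message naming the failing parameter makes the failure actionable at the call site instead of surfacing as a crash deep in serialization.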
🤖 Prompt for AI Agents
In cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h around lines
46-50 (and similarly 91-95), add input validation at the start of both
functions: check that vectors (nbKvHeadsPerLayer, socketAddresses) are not
empty, that sizePerHead, tokensPerBlock, tensorParallelism, pipelineParallelism,
kvFactor are > 0, that rank, dpRank are >= 0 and
dpSize/tensorParallelism/pipelineParallelism are >= 1 and that rank/dpRank <
total ranks (e.g., < dpSize or < tensorParallelism*pipelineParallelism as
applicable); validate that attentionType and dataType are within expected enums
if applicable. If any check fails, throw a clear std::invalid_argument with a
message indicating which parameter is invalid (or return an error code if
project style prefers), so downstream code cannot proceed with invalid
dimensions. Ensure mirror checks are added to the second function at lines 91-95
with consistent error messages.
```cpp
for (size_t i = 0; i < socketAddresses.size(); ++i)
{
    kv_cache::SocketState socketState{static_cast<uint16_t>(8000 + i), socketAddresses[i]};
    socketStates.emplace_back(std::move(socketState));
}
```
Potential integer overflow and type safety concerns.
There are several issues with the socket state creation:
- `static_cast<uint16_t>(8000 + i)` could overflow if `i` is large
- Using `size_t` for the loop variable but casting to `uint16_t`
- No validation of port range limits
Apply this diff to add proper bounds checking and type safety:
for (size_t i = 0; i < socketAddresses.size(); ++i)
{
- kv_cache::SocketState socketState{static_cast<uint16_t>(8000 + i), socketAddresses[i]};
+ if (8000 + i > std::numeric_limits<uint16_t>::max())
+ {
+ throw std::runtime_error("Port number exceeds uint16_t range");
+ }
+ kv_cache::SocketState socketState{static_cast<uint16_t>(8000 + i), socketAddresses[i]};
socketStates.emplace_back(std::move(socketState));
}

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
for (size_t i = 0; i < socketAddresses.size(); ++i)
{
    if (8000 + i > std::numeric_limits<uint16_t>::max())
    {
        throw std::runtime_error("Port number exceeds uint16_t range");
    }
    kv_cache::SocketState socketState{static_cast<uint16_t>(8000 + i), socketAddresses[i]};
    socketStates.emplace_back(std::move(socketState));
}
```
🤖 Prompt for AI Agents
In cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h around lines
58-62, the code unsafely casts 8000 + i to uint16_t and uses size_t for the loop
without validating port range; change the loop to iterate with a well-typed
integer (e.g. uint32_t or size_t) but compute the port as a checked integer:
compute uint32_t port = basePort + static_cast<uint32_t>(i); verify port is
within the valid TCP port range (>=1 and <= 65535) and fits into uint16_t before
casting; if the check fails, handle it (throw an exception or return an error)
instead of silently wrapping; finally static_cast<uint16_t>(port) only after the
check and use that safe value when constructing kv_cache::SocketState.
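The checked computation the prompt describes can be sketched as a small helper: compute the port in a wider type, validate the range, and narrow only after the check. `checkedPort` is a hypothetical name; the base port 8000 comes from the snippet under review.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: widen before arithmetic, validate, then narrow.
std::uint16_t checkedPort(std::size_t index, std::uint32_t basePort = 8000)
{
    std::uint64_t const port = static_cast<std::uint64_t>(basePort) + index;
    if (port > std::numeric_limits<std::uint16_t>::max())
    {
        throw std::runtime_error("Port number exceeds uint16_t range");
    }
    return static_cast<std::uint16_t>(port);
}
```

With 8000 as the base, indices above 57535 push the port past 65535 and trigger the exception instead of silently wrapping.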
```cpp
for (size_t i = 0; i < agentNames.size(); ++i)
{
    agentStates.emplace_back(agentNames[i], "127.0.0.1:" + std::to_string(8000 + i));
}
```
🛠️ Refactor suggestion
Potential integer overflow in agent connection info.
Similar to the socket version, 8000 + i can exceed the valid port range without any bounds checking before it is formatted into the connection string.
Apply this diff to add bounds checking:
for (size_t i = 0; i < agentNames.size(); ++i)
{
+ if (8000 + i > std::numeric_limits<uint16_t>::max())
+ {
+ throw std::runtime_error("Port number exceeds uint16_t range");
+ }
    agentStates.emplace_back(agentNames[i], "127.0.0.1:" + std::to_string(8000 + i));
}

📝 Committable suggestion
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
for (size_t i = 0; i < agentNames.size(); ++i)
{
    if (8000 + i > std::numeric_limits<uint16_t>::max())
    {
        throw std::runtime_error("Port number exceeds uint16_t range");
    }
    agentStates.emplace_back(agentNames[i], "127.0.0.1:" + std::to_string(8000 + i));
}
```
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores
Description
Test Coverage
GitHub Bot Help
/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.
Run
Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.
Details
run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.
- --reuse-test (optional)pipeline-id (OPTIONAL): Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.
- --disable-reuse-test (OPTIONAL): Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.
- --disable-fail-fast (OPTIONAL): Disable fail fast on build/tests/infra failures.
- --skip-test (OPTIONAL): Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.
- --stage-list "A10-PyTorch-1, xxx" (OPTIONAL): Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.
- --gpu-type "A30, H100_PCIe" (OPTIONAL): Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.
- --test-backend "pytorch, cpp" (OPTIONAL): Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.
- --only-multi-gpu-test (OPTIONAL): Only run the multi-GPU tests. Note: Does NOT update GitHub check status.
- --disable-multi-gpu-test (OPTIONAL): Disable the multi-GPU tests. Note: Does NOT update GitHub check status.
- --add-multi-gpu-test (OPTIONAL): Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.
- --post-merge (OPTIONAL): Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.
- --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL): Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".
- --detailed-log (OPTIONAL): Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.
- --debug (OPTIONAL): Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see
docs/source/reference/ci-overview.md and the scripts/test_to_stage_mapping.py helper.

kill
kill

Kill all running builds associated with pull request.
skip
skip --comment COMMENT

Skip testing for latest commit on pull request.
--comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline
reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.