
Conversation

Collaborator

@Shunkangz Shunkangz commented Jul 25, 2025

Summary by CodeRabbit

  • New Features

    • New request-specific exception with error codes exposed in C++ and Python; Python exception objects include request_id and error_code.
    • Structured error handling wrappers in the Python executor to isolate and handle errors per request.
  • Bug Fixes

    • Improved propagation of request-specific errors across network/data transmission paths.
  • Tests

    • Added unit tests covering the new exception and its Python integration.
  • Chores

    • Updated build/module initialization to include the new exception bindings.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is given. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensures that all builds and tests run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous, since lack of user care and validation can cause the top of tree to break.

Contributor

coderabbitai bot commented Jul 25, 2025

📝 Walkthrough

A request-specific exception system was added: a new C++ RequestSpecificException carrying error codes and request IDs, pybind bindings and an exceptions submodule, integration into the network/transmission and batch manager paths, Python executor error wrapping, helper utilities, a version header, and unit tests.

Changes

Cohort / File(s) Change Summary
C++ Exception Infrastructure
cpp/include/tensorrt_llm/common/tllmException.h, cpp/tensorrt_llm/common/tllmException.cpp
Add RequestErrorCode enum, kUNKNOWN_REQUEST_ID, RequestSpecificException class (stores request ID and error code), accessors, destructor, and macro NEW_TLLM_REQUEST_SPECIFIC_EXCEPTION_WITH_ERROR_CODE.
Python Bindings for Exceptions
cpp/tensorrt_llm/pybind/common/tllmExceptions.h, cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp, cpp/tensorrt_llm/pybind/CMakeLists.txt, cpp/tensorrt_llm/pybind/bindings.cpp
Add pybind11 bindings for RequestErrorCode and RequestSpecificException, create Python exception type with request_id and error_code attributes, register translator, add new exceptions submodule and update build sources.
Batch Manager & Transmission Integration
cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp, cpp/tensorrt_llm/batch_manager/dataTransceiver.h, cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
Catch and propagate RequestSpecificException in requester/responder paths and promise setting; wrap TransferSession::send/recv with try-catch to translate std::exception into RequestSpecificException (network error).
Executor / State Utilities & Version
cpp/include/tensorrt_llm/executor/dataTransceiverState.h, cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h, cpp/include/tensorrt_llm/executor/version.h
Add forward declaration for Serialization; new inline helpers to create/serialize DataTransceiverState for socket/agent setups; add kTensorRtLlmVersion constant header.
Python Executor Refactor
tensorrt_llm/_torch/pyexecutor/py_executor.py
Introduce wrapper methods _check_cache_transfer_status_helper, _check_disagg_ctx_cache_transfer_status, _check_disagg_gen_cache_transfer_status to catch RequestSpecificException and other exceptions; extend _handle_errors to accept optional request IDs.
Tests
tests/unittest/others/test_exception.py
Add tests for Python RequestSpecificException behavior: attributes, inheritance, traceback, equality, chaining, C++ translation simulation, multiple instances.
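The changes above say the Python-facing exception exposes request_id and error_code attributes. The real class is created in C++ via pybind11, but its documented interface can be sketched with a hypothetical pure-Python stand-in (the NETWORK_ERROR code value and the transfer_kv_cache helper below are assumptions for illustration):

```python
# Hypothetical pure-Python stand-in for the bound exception type described
# above; it only mirrors the documented interface (request_id, error_code).

# Sentinel mirroring the kUNKNOWN_REQUEST_ID constant mentioned in the summary.
UNKNOWN_REQUEST_ID = 2**64 - 1


class RequestSpecificException(Exception):
    """Exception tied to a single request, carrying its ID and an error code."""

    def __init__(self, message, request_id=UNKNOWN_REQUEST_ID, error_code=0):
        super().__init__(message)
        self.request_id = request_id
        self.error_code = error_code


def transfer_kv_cache(request_id):
    # Stand-in for a C++ call that fails with a network error.
    raise RequestSpecificException(
        f"kv-cache transfer failed for request {request_id}",
        request_id=request_id,
        error_code=1,  # hypothetical NETWORK_ERROR code
    )
```

With this shape, a caller can attribute a failure to the offending request instead of tearing down the whole executor loop.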

Sequence Diagram(s)

sequenceDiagram
    participant PythonUser as Python User
    participant PyModule as TRTLLM Python Module
    participant PyBind as pybind11 Bindings
    participant Cpp as C++ Layer
    participant Promise as Promise/Future

    PythonUser->>PyModule: Import exceptions submodule
    PyModule->>PyBind: init exception bindings
    PyBind->>Cpp: expose RequestSpecificException & error codes

    PythonUser->>PyModule: Call API that triggers network/op
    PyModule->>Cpp: execute operation
    Cpp-->>Cpp: throws RequestSpecificException(requestID, errorCode)
    Cpp->>Promise: set exception on promise with requestID & errorCode
    Cpp-->>PyBind: exception translated when crossing to Python
    PyBind-->>PythonUser: raise Python RequestSpecificException (with request_id, error_code)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes


Suggested reviewers

  • DomBrown
  • chzblych
  • chuangz0


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp (1)

84-85: Consider providing accurate file/line information for Python-created exceptions.

Using __FILE__ and __LINE__ in the convenience functions will always report the binding file location rather than where the exception was created in Python. Consider accepting file/line as optional parameters or documenting this limitation.

-        [](uint64_t request_id, tc::RequestErrorCode error_code, std::string const& message)
-        { return tc::RequestSpecificException(__FILE__, __LINE__, message.c_str(), request_id, error_code); },
-        py::arg("request_id"), py::arg("error_code"), py::arg("message"),
+        [](uint64_t request_id, tc::RequestErrorCode error_code, std::string const& message,
+           std::string const& file = __FILE__, std::size_t line = __LINE__)
+        { return tc::RequestSpecificException(file.c_str(), line, message.c_str(), request_id, error_code); },
+        py::arg("request_id"), py::arg("error_code"), py::arg("message"),
+        py::arg("file") = __FILE__, py::arg("line") = __LINE__,

Also applies to: 93-93

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 470544c and 3883557.

📒 Files selected for processing (4)
  • cpp/include/tensorrt_llm/common/tllmException.h (2 hunks)
  • cpp/tensorrt_llm/common/tllmException.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/common/tllmExceptions.h (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp,cc,cxx}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0, nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and must be case-insensitive unique within a compilation target.
All types (including class names) should use camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces should use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace should use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace should use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable should use camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables should use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic-number/literal constants are uppercase snakecase with prefix...

Files:

  • cpp/tensorrt_llm/pybind/common/tllmExceptions.h
  • cpp/tensorrt_llm/common/tllmException.cpp
  • cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp
  • cpp/include/tensorrt_llm/common/tllmException.h
**/*.{h,hpp}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Files:

  • cpp/tensorrt_llm/pybind/common/tllmExceptions.h
  • cpp/include/tensorrt_llm/common/tllmException.h
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • cpp/tensorrt_llm/pybind/common/tllmExceptions.h
  • cpp/tensorrt_llm/common/tllmException.cpp
  • cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp
  • cpp/include/tensorrt_llm/common/tllmException.h
🔇 Additional comments (2)
cpp/tensorrt_llm/pybind/common/tllmExceptions.h (1)

1-33: LGTM! Header file follows all coding guidelines.

The new header file is well-structured with proper copyright header, namespace closing comment, and Doxygen documentation.

cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp (1)

99-99: Namespace closing comment is already present.

The closing brace already carries the required } // namespace tensorrt_llm::pybind::common comment, so no change is needed here.

Comment on lines +109 to +116
RequestSpecificException::RequestSpecificException(
char const* file, std::size_t line, char const* msg, uint64_t requestID, RequestErrorCode errorCode)
: std::runtime_error{fmtstr(
"%s (Request ID: %lu, Error Code: %u) (%s:%zu)", msg, requestID, static_cast<uint32_t>(errorCode), file, line)}
, mRequestID{requestID}
, mErrorCode{errorCode}
{
}
Contributor

⚠️ Potential issue

Use portable format specifiers for cross-platform compatibility.

The format specifier %lu for uint64_t is not portable across different platforms. Use the appropriate format macro from <cinttypes>.

Add the include at the top of the file:

#include <cinttypes>

Then update the format string:

-        "%s (Request ID: %lu, Error Code: %u) (%s:%zu)", msg, requestID, static_cast<uint32_t>(errorCode), file, line)}
+        "%s (Request ID: %" PRIu64 ", Error Code: %u) (%s:%zu)", msg, requestID, static_cast<uint32_t>(errorCode), file, line)}
🤖 Prompt for AI Agents
In cpp/tensorrt_llm/common/tllmException.cpp around lines 109 to 116, the format
specifier %lu used for the uint64_t requestID is not portable across platforms.
To fix this, include <cinttypes> at the top of the file and replace %lu with the
portable macro PRIu64 in the format string to ensure cross-platform
compatibility.

@Shunkangz Shunkangz force-pushed the catch_kv_cache_transfer_exception branch 3 times, most recently from 511920c to f0a2bd4 Compare July 31, 2025 03:20
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 511920c and f0a2bd4.

📒 Files selected for processing (7)
  • cpp/include/tensorrt_llm/common/tllmException.h (2 hunks)
  • cpp/tensorrt_llm/common/tllmException.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/CMakeLists.txt (1 hunks)
  • cpp/tensorrt_llm/pybind/bindings.cpp (2 hunks)
  • cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/common/tllmExceptions.h (1 hunks)
  • tests/unittest/others/test_exception.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • cpp/tensorrt_llm/pybind/common/tllmExceptions.h
🚧 Files skipped from review as they are similar to previous changes (5)
  • cpp/tensorrt_llm/pybind/CMakeLists.txt
  • cpp/tensorrt_llm/common/tllmException.cpp
  • cpp/include/tensorrt_llm/common/tllmException.h
  • cpp/tensorrt_llm/pybind/bindings.cpp
  • cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL).
Python constants should use upper snake_case (e.g., MY_CONSTANT).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for Python classes and functions, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without reflection.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tests/unittest/others/test_exception.py
**/*.{cpp,h,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tests/unittest/others/test_exception.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-07-30T06:11:42.362Z
Learning: Applies to **/*.py : When using try-except blocks in Python, limit the except to the smallest set of errors possible.
tests/unittest/others/test_exception.py (1)

Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

🔇 Additional comments (6)
tests/unittest/others/test_exception.py (6)

6-13: LGTM! Well-structured basic exception test.

The test correctly validates exception creation, catching, and message handling with appropriate assertions.


25-33: LGTM! Proper attribute existence validation.

The test correctly checks for the expected attributes with clear assertions and helpful comments about the C++ integration context.


35-45: LGTM! Comprehensive traceback validation.

The test properly validates exception traceback information using appropriate sys.exc_info() calls and meaningful assertions.


47-58: LGTM! Correct exception equality testing.

The test appropriately validates that separate exception instances are distinct objects, even with identical messages.


60-70: LGTM! Proper exception chaining validation.

The test correctly uses Python's exception chaining syntax and validates the cause attribute appropriately.


83-96: LGTM! Comprehensive multiple instance testing.

The test effectively validates creation of multiple exception instances with proper message handling and type checking.

@@ -0,0 +1,95 @@
import sys
Contributor

⚠️ Potential issue

Missing required NVIDIA copyright header.

According to the coding guidelines, all TensorRT-LLM source files including .py files must contain an NVIDIA copyright header with the current year.

Add the copyright header at the beginning of the file:

+# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# SPDX-License-Identifier: Apache-2.0
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
 import sys
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import sys
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
import sys
🤖 Prompt for AI Agents
In tests/unittest/others/test_exception.py at line 1, the file is missing the
required NVIDIA copyright header. Add the standard NVIDIA copyright header with
the current year at the very top of the file before any imports or code,
following the project's coding guidelines.

Comment on lines +15 to +23
def test_exception_inheritance():
"""Test that exception properly inherits from base Exception."""
try:
raise RequestSpecificException("Test inheritance")
except Exception as e: # Should catch base Exception
assert isinstance(e, RequestSpecificException)
except RequestSpecificException as e: # Should also catch specific type
assert isinstance(e, RequestSpecificException)

Contributor

⚠️ Potential issue

Fix unreachable except block.

The second except block will never execute because RequestSpecificException inherits from Exception, so it will always be caught by the first except block.

Apply this diff to fix the test logic:

 def test_exception_inheritance():
     """Test that exception properly inherits from base Exception."""
-    try:
-        raise RequestSpecificException("Test inheritance")
-    except Exception as e:  # Should catch base Exception
-        assert isinstance(e, RequestSpecificException)
-    except RequestSpecificException as e:  # Should also catch specific type
-        assert isinstance(e, RequestSpecificException)
+    # Test catching as base Exception
+    try:
+        raise RequestSpecificException("Test inheritance")
+    except Exception as e:
+        assert isinstance(e, RequestSpecificException)
+    
+    # Test catching as specific type
+    try:
+        raise RequestSpecificException("Test inheritance")
+    except RequestSpecificException as e:
+        assert isinstance(e, RequestSpecificException)
📝 Committable suggestion


Suggested change
def test_exception_inheritance():
"""Test that exception properly inherits from base Exception."""
try:
raise RequestSpecificException("Test inheritance")
except Exception as e: # Should catch base Exception
assert isinstance(e, RequestSpecificException)
except RequestSpecificException as e: # Should also catch specific type
assert isinstance(e, RequestSpecificException)
def test_exception_inheritance():
"""Test that exception properly inherits from base Exception."""
# Test catching as base Exception
try:
raise RequestSpecificException("Test inheritance")
except Exception as e:
assert isinstance(e, RequestSpecificException)
# Test catching as specific type
try:
raise RequestSpecificException("Test inheritance")
except RequestSpecificException as e:
assert isinstance(e, RequestSpecificException)
🤖 Prompt for AI Agents
In tests/unittest/others/test_exception.py around lines 15 to 23, the second
except block for RequestSpecificException is unreachable because the first
except block catches all Exceptions including RequestSpecificException. To fix
this, remove the first except block catching Exception and keep only the except
block for RequestSpecificException to properly test the inheritance and catching
of the specific exception.
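The ordering rule behind this fix can be demonstrated directly: Python tries except clauses top to bottom and uses the first match, so a handler for a base class placed first shadows any later handler for a subclass. A small self-contained illustration (MySpecificError is a stand-in for the bound exception type):

```python
class MySpecificError(Exception):
    """Stand-in for a subclass such as RequestSpecificException."""


def catch_general_first():
    try:
        raise MySpecificError("boom")
    except Exception:          # matches first: MySpecificError is an Exception
        return "general"
    except MySpecificError:    # unreachable, mirroring the bug flagged above
        return "specific"


def catch_specific_first():
    try:
        raise MySpecificError("boom")
    except MySpecificError:    # now the specific handler gets a chance
        return "specific"
    except Exception:
        return "general"
```

Splitting the test into two separate try blocks, as the suggested diff does, avoids relying on handler order entirely.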

Comment on lines +72 to +81
def test_cpp_exception_translation():
"""Test that C++ exceptions are properly translated to Python."""
try:
# This would normally be triggered by C++ code
# For now, we'll test the Python exception creation
raise RequestSpecificException("Test C++ translation")
except RequestSpecificException as e:
# Check that the exception was properly created
assert isinstance(e, RequestSpecificException)

Contributor

🛠️ Refactor suggestion

Test doesn't match its intended purpose.

The test name suggests it tests C++ exception translation, but it only creates and catches a Python exception. This doesn't validate the actual C++ to Python exception translation mechanism.

Consider either:

  1. Renaming the test to reflect what it actually does:

-def test_cpp_exception_translation():
-    """Test that C++ exceptions are properly translated to Python."""
+def test_python_exception_creation():
+    """Test Python exception creation and basic functionality."""

  2. Implementing actual C++ translation testing, if there's a way to trigger C++ exceptions from the test environment.
📝 Committable suggestion


Suggested change
def test_cpp_exception_translation():
"""Test that C++ exceptions are properly translated to Python."""
try:
# This would normally be triggered by C++ code
# For now, we'll test the Python exception creation
raise RequestSpecificException("Test C++ translation")
except RequestSpecificException as e:
# Check that the exception was properly created
assert isinstance(e, RequestSpecificException)
def test_python_exception_creation():
"""Test Python exception creation and basic functionality."""
try:
# This would normally be triggered by C++ code
# For now, we'll test the Python exception creation
raise RequestSpecificException("Test C++ translation")
except RequestSpecificException as e:
# Check that the exception was properly created
assert isinstance(e, RequestSpecificException)
🤖 Prompt for AI Agents
In tests/unittest/others/test_exception.py around lines 72 to 81, the test named
test_cpp_exception_translation does not actually test C++ exception translation
but only raises and catches a Python exception. To fix this, either rename the
test to reflect that it only tests Python exception creation or, if possible,
modify the test to trigger a real C++ exception and verify its translation to
Python exceptions.
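Once translation works end to end, the attributes on the translated exception are what let the executor isolate a failure to one request, as in the _handle_errors refactor described in the walkthrough. A hedged sketch of that routing pattern (RequestSpecificException here is a local stand-in, not the real bound class; function names are assumptions):

```python
# Local stand-in for the pybind-translated C++ exception carrying
# request_id and error_code.
class RequestSpecificException(Exception):
    def __init__(self, message, request_id, error_code):
        super().__init__(message)
        self.request_id = request_id
        self.error_code = error_code


def check_cache_transfer_status(poll_fn):
    """Run poll_fn, isolating a request-specific failure to its request.

    Returns a dict mapping failed request IDs to error messages; an empty
    dict means the poll succeeded. Other exception types propagate.
    """
    errors = {}
    try:
        poll_fn()
    except RequestSpecificException as e:
        errors[e.request_id] = str(e)
    return errors


def failing_poll():
    # Simulates a C++-side network failure during a kv-cache receive.
    raise RequestSpecificException(
        "socket reset during kv-cache recv", request_id=7, error_code=1)
```

Only the offending request is marked failed; unrelated requests in the same batch continue unaffected.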

@Shunkangz Shunkangz force-pushed the catch_kv_cache_transfer_exception branch from f0a2bd4 to 2e12da4 Compare July 31, 2025 07:46
@coderabbitai coderabbitai bot requested a review from nv-guomingz July 31, 2025 07:46
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (2)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (1)

154-156: Consider using a more meaningful request ID instead of -1

Both timeout exceptions use -1 as the request ID. Since this is a connection-level operation, consider extracting a meaningful request ID from the context or connection state if available, falling back to the kUNKNOWN_REQUEST_ID sentinel introduced in this PR, or documenting why -1 is appropriate here.

Also applies to: 191-193

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (1)

20-22: Consider consolidating related includes

The timeout implementation added new includes. Consider grouping them with other standard library includes for better organization.

 #include "connection.h"
 #include "tensorrt_llm/common/envUtils.h"
-#include <chrono>
 #include <string>
+#include <chrono>
 #include <thread>
 #include <unistd.h>
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0a2bd4 and 2e12da4.

📒 Files selected for processing (14)
  • cpp/include/tensorrt_llm/common/tllmException.h (2 hunks)
  • cpp/include/tensorrt_llm/executor/cacheCommunicator.h (2 hunks)
  • cpp/tensorrt_llm/common/tllmException.cpp (1 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (3 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h (1 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp (1 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h (1 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (4 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.h (3 hunks)
  • cpp/tensorrt_llm/pybind/CMakeLists.txt (1 hunks)
  • cpp/tensorrt_llm/pybind/bindings.cpp (2 hunks)
  • cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/common/tllmExceptions.h (1 hunks)
  • tests/unittest/others/test_exception.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • cpp/tensorrt_llm/pybind/CMakeLists.txt
  • cpp/tensorrt_llm/pybind/bindings.cpp
  • cpp/tensorrt_llm/common/tllmException.cpp
  • cpp/tensorrt_llm/pybind/common/tllmExceptions.h
  • cpp/tensorrt_llm/pybind/common/tllmExceptions.cpp
  • tests/unittest/others/test_exception.py
  • cpp/include/tensorrt_llm/common/tllmException.h
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{cpp,h,hpp,cc,cxx}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0 (only used in comparison for checking signness/existence/emptiness) and nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and all files involved in the compilation of a target must have filenames that are case-insensitive unique.
All types (including class names) are camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable uses camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope, and function-scope magic...

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.h
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h
  • cpp/include/tensorrt_llm/executor/cacheCommunicator.h
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
**/*.{h,hpp}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.h
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h
  • cpp/include/tensorrt_llm/executor/cacheCommunicator.h
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.h
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
  • cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h
  • cpp/include/tensorrt_llm/executor/cacheCommunicator.h
  • cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
🧠 Learnings (3)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.h (6)

Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-07-31T04:50:23.290Z
Learning: Applies to **/*.{cpp,h,hpp,cc,cxx,cu,py} : All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Learnt from: moraxu
PR: #6303
File: tests/integration/test_lists/qa/examples_test_list.txt:494-494
Timestamp: 2025-07-28T17:06:08.621Z
Learning: In TensorRT-LLM testing, it's common to have both CLI flow tests (test_cli_flow.py) and PyTorch API tests (test_llm_api_pytorch.py) for the same model. These serve different purposes: CLI flow tests validate the traditional command-line workflow, while PyTorch API tests validate the newer LLM API backend. Both are legitimate and should coexist.

Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-07-31T04:50:23.290Z
Learning: Applies to **/*.{h,hpp} : Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-07-31T04:50:23.290Z
Learning: Applies to **/*.cu : CUDA code includes code that must be compiled with a CUDA compiler, such as code using device, managed, constant, global, or kernel launching syntax.

Learnt from: CR
PR: NVIDIA/TensorRT-LLM#0
File: CODING_GUIDELINES.md:0-0
Timestamp: 2025-07-31T04:50:23.290Z
Learning: Applies to **/*.py : The code developed for TensorRT-LLM should conform to Python 3.8+.

cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

cpp/include/tensorrt_llm/executor/cacheCommunicator.h (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (7)
cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.cpp (2)

30-35: LGTM! Proper interface alignment with timeout parameter.

The implementation correctly adds the timeout parameter to maintain interface compatibility while explicitly ignoring it for MPI connections. The void cast properly suppresses unused parameter warnings and the comment clearly explains the behavior.


37-42: LGTM! Consistent timeout parameter handling.

The recv method implementation mirrors the send method approach, maintaining consistency in how the timeout parameter is handled for MPI connections.

cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.h (3)

23-23: LGTM! Required include for timeout support.

The chrono header is necessary for the std::chrono::milliseconds timeout parameter type.


32-32: LGTM! Required include for exception handling.

The tllmException header is necessary for the RequestSpecificException that will be thrown on timeout according to the implementation.


51-54: LGTM! Proper method signature updates for timeout support.

The send and recv method signatures are correctly updated with the timeout parameter, maintaining the override relationship with the base Connection interface and providing a sensible default value.

cpp/include/tensorrt_llm/executor/cacheCommunicator.h (2)

20-20: LGTM! Required include for timeout parameter type.

The chrono header is properly added to support the std::chrono::milliseconds timeout parameter in the Connection interface methods.


50-56: LGTM! Proper base interface update for timeout support.

The Connection interface methods are correctly updated with the timeout parameter as pure virtual functions. The default timeout value maintains backward compatibility while ensuring all derived implementations support the timeout parameter.

Comment on lines 99 to 113
    if (timeout.count() > 0)
    {
        // For AgentConnection, we'll use a simple timeout check on the status
        // This is a simplified approach since the underlying transfer agent may not support timeouts
        auto start = std::chrono::steady_clock::now();
        while (!status->isCompleted())
        {
            auto now = std::chrono::steady_clock::now();
            if (now - start > timeout)
            {
                TLLM_THROW("AgentConnection::send timeout after " + std::to_string(timeout.count()) + "ms");
            }
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }
🛠️ Refactor suggestion

⚠️ Potential issue

Inefficient polling implementation for timeout handling

The current implementation uses a busy-wait loop with 1ms sleep intervals. This approach is inefficient and can cause unnecessary CPU usage and latency.

Consider using a more efficient approach. If the underlying status object supports it, use a wait_for method or condition variable. Alternatively, consider implementing timeout support in the transfer agent itself.

-        auto start = std::chrono::steady_clock::now();
-        while (!status->isCompleted())
-        {
-            auto now = std::chrono::steady_clock::now();
-            if (now - start > timeout)
-            {
-                TLLM_THROW("AgentConnection::send timeout after " + std::to_string(timeout.count()) + "ms");
-            }
-            std::this_thread::sleep_for(std::chrono::milliseconds(1));
-        }
+        // If the status object supports wait_for:
+        if (!status->wait_for(timeout))
+        {
+            throw NEW_TLLM_REQUEST_SPECIFIC_EXCEPTION_WITH_ERROR_CODE(-1, common::RequestErrorCode::NETWORK_TIMEOUT,
+                "AgentConnection::send timeout after " + std::to_string(timeout.count()) + "ms");
+        }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
between lines 99 and 113, the current timeout handling uses a busy-wait loop
with 1ms sleeps, causing inefficient CPU usage. To fix this, replace the polling
loop with a more efficient mechanism such as using a condition variable or a
wait_for method on the status object if available. If these are not supported,
consider adding timeout support directly in the transfer agent to avoid active
waiting and reduce CPU load.
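The event-based alternative the prompt describes can be sketched in Python (a stand-in for the C++ status object; all names here are illustrative, not the actual TensorRT-LLM API):

```python
import threading
import time


class TransferStatus:
    """Stand-in for the C++ transfer status: completion is signaled, not polled."""

    def __init__(self):
        self._done = threading.Event()

    def mark_completed(self):
        self._done.set()

    def wait_for(self, timeout_s: float) -> bool:
        # Blocks until completion or timeout; no 1 ms busy-wait loop.
        return self._done.wait(timeout_s)


def send_with_timeout(status: TransferStatus, timeout_s: float) -> None:
    if not status.wait_for(timeout_s):
        raise TimeoutError(f"send timeout after {int(timeout_s * 1000)}ms")


status = TransferStatus()
threading.Thread(target=lambda: (time.sleep(0.01), status.mark_completed())).start()
send_with_timeout(status, 1.0)  # returns as soon as the event fires
```

The C++ equivalent would be a `std::condition_variable` with `wait_for` inside the status object: the waiter sleeps in the kernel until either the transfer completes or the deadline passes, instead of waking every millisecond.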

            auto now = std::chrono::steady_clock::now();
            if (now - start > timeout)
            {
                TLLM_THROW("AgentConnection::send timeout after " + std::to_string(timeout.count()) + "ms");
🛠️ Refactor suggestion

Use consistent exception types across connection implementations

The AgentConnection uses TLLM_THROW while UcxConnection uses NEW_TLLM_REQUEST_SPECIFIC_EXCEPTION_WITH_ERROR_CODE. For consistency with the rest of the codebase and to support request-specific error handling, use the same exception type.

-                TLLM_THROW("AgentConnection::send timeout after " + std::to_string(timeout.count()) + "ms");
+                throw NEW_TLLM_REQUEST_SPECIFIC_EXCEPTION_WITH_ERROR_CODE(-1, common::RequestErrorCode::NETWORK_TIMEOUT,
+                    "AgentConnection::send timeout after " + std::to_string(timeout.count()) + "ms");
-            TLLM_THROW("AgentConnection::recv timeout after " + std::to_string(timeout.count()) + "ms");
+            throw NEW_TLLM_REQUEST_SPECIFIC_EXCEPTION_WITH_ERROR_CODE(-1, common::RequestErrorCode::NETWORK_TIMEOUT,
+                "AgentConnection::recv timeout after " + std::to_string(timeout.count()) + "ms");

Also applies to: 146-146

🤖 Prompt for AI Agents
In cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp at
lines 109 and 146, replace the use of TLLM_THROW with
NEW_TLLM_REQUEST_SPECIFIC_EXCEPTION_WITH_ERROR_CODE to maintain consistency with
other connection implementations and enable request-specific error handling.
Adjust the exception throwing code accordingly to include the error code and any
relevant request context as required by the macro.

Comment on lines 135 to 141
        std::thread(
            [&]()
            {
                mAgentConnectionManager->waitForSyncInfo(mRemoteAgentName, syncInfo);
                promise.set_value();
            })
            .detach();
⚠️ Potential issue

Critical: Detached thread continues running after timeout

The detached thread will continue running even after a timeout occurs, which can lead to:

  • Resource leaks (thread keeps running indefinitely)
  • Race conditions (thread may complete after the function returns)
  • Potential crashes if the promise is destroyed while the thread is still running

The thread should not be detached. Instead, manage its lifecycle properly:

-        std::thread(
-            [&]()
-            {
-                mAgentConnectionManager->waitForSyncInfo(mRemoteAgentName, syncInfo);
-                promise.set_value();
-            })
-            .detach();
+        auto worker = std::thread(
+            [&]()
+            {
+                try {
+                    mAgentConnectionManager->waitForSyncInfo(mRemoteAgentName, syncInfo);
+                    promise.set_value();
+                } catch (...) {
+                    promise.set_exception(std::current_exception());
+                }
+            });
+        
+        auto status = future.wait_for(timeout);
+        if (status == std::future_status::timeout)
+        {
+            // Note: Cannot safely interrupt waitForSyncInfo, so thread cleanup is problematic
+            // This design needs to be reconsidered - waitForSyncInfo should support cancellation
+            worker.detach(); // Still problematic but documented
+            throw NEW_TLLM_REQUEST_SPECIFIC_EXCEPTION_WITH_ERROR_CODE(-1, common::RequestErrorCode::NETWORK_TIMEOUT,
+                "AgentConnection::recv timeout after " + std::to_string(timeout.count()) + "ms");
+        }
+        worker.join();

However, this still has issues because waitForSyncInfo cannot be interrupted. A better solution would be to make waitForSyncInfo timeout-aware.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.cpp
around lines 135 to 141, the detached thread running waitForSyncInfo continues
after a timeout, risking resource leaks and race conditions. To fix this, do not
detach the thread; instead, keep a handle to join it properly. Additionally,
modify waitForSyncInfo to support timeout awareness so the thread can exit early
if needed, preventing it from running indefinitely.
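The join-with-timeout pattern the comment describes can be sketched in Python (illustrative names only; as noted above, the real fix requires making waitForSyncInfo itself cancellation-aware):

```python
import concurrent.futures
import threading
import time


def wait_for_sync_info(done: threading.Event, cancel: threading.Event) -> None:
    # Cancellation-aware stand-in for waitForSyncInfo: it checks a cancel
    # flag instead of blocking forever, so a timed-out caller can reclaim it.
    while not done.is_set():
        if cancel.is_set():
            return
        time.sleep(0.001)


def recv_with_timeout(done: threading.Event, timeout_s: float) -> bool:
    cancel = threading.Event()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(wait_for_sync_info, done, cancel)
        try:
            future.result(timeout=timeout_s)
            return True
        except concurrent.futures.TimeoutError:
            cancel.set()  # let the worker exit so the pool can shut down cleanly
            return False


done = threading.Event()
assert recv_with_timeout(done, 0.05) is False  # nothing signals, so we time out
```

Because the worker honors the cancel flag, the pool's shutdown (the implicit join) completes promptly after a timeout, which is exactly what a detached thread cannot guarantee.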

Comment on lines 174 to 177
    void send(DataContext const& ctx, void const* data, size_t size,
        std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;
    void recv(DataContext const& ctx, void* data, size_t size,
        std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;
⚠️ Potential issue

Add missing chrono include for timeout parameter.

The method signatures correctly add the timeout parameter, but the file is missing the necessary #include <chrono> header for the std::chrono::milliseconds type.

Add the missing include:

 #pragma once
 
+#include <chrono>
 #include "tensorrt_llm/batch_manager/cacheTransBuffer.h"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    void send(DataContext const& ctx, void const* data, size_t size,
        std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;
    void recv(DataContext const& ctx, void* data, size_t size,
        std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;

#pragma once

#include <chrono>
#include "tensorrt_llm/batch_manager/cacheTransBuffer.h"
🤖 Prompt for AI Agents
In cpp/tensorrt_llm/executor/cache_transmission/agent_utils/connection.h around
lines 174 to 177, the methods use std::chrono::milliseconds for the timeout
parameter but the file lacks the necessary #include <chrono> directive. Add
#include <chrono> at the top of the file to ensure the std::chrono::milliseconds
type is recognized and the code compiles correctly.

Comment on lines 33 to 36
    void send(DataContext const& ctx, void const* data, size_t size,
        std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;
    void recv(DataContext const& ctx, void* data, size_t size,
        std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;
⚠️ Potential issue

Add missing chrono include for timeout parameter.

The method signatures correctly add the timeout parameter, but the file is missing the necessary #include <chrono> header for the std::chrono::milliseconds type.

Add the missing include:

 #pragma once
 
+#include <chrono>
 #include "tensorrt_llm/common/cudaUtils.h"
📝 Committable suggestion


Suggested change
    void send(DataContext const& ctx, void const* data, size_t size,
        std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;
    void recv(DataContext const& ctx, void* data, size_t size,
        std::chrono::milliseconds timeout = std::chrono::milliseconds{0}) const override;

#pragma once

#include <chrono>
#include "tensorrt_llm/common/cudaUtils.h"
🤖 Prompt for AI Agents
In cpp/tensorrt_llm/executor/cache_transmission/mpi_utils/connection.h around
lines 33 to 36, the methods use std::chrono::milliseconds for the timeout
parameter but the file lacks the #include <chrono> directive. Add #include
<chrono> at the top of the file to ensure the std::chrono::milliseconds type is
recognized and the code compiles correctly.

@Shunkangz Shunkangz force-pushed the catch_kv_cache_transfer_exception branch from 2e12da4 to b4d365a Compare August 1, 2025 07:14
@Shunkangz Shunkangz marked this pull request as ready for review August 4, 2025 09:02
@Shunkangz Shunkangz changed the title Add C++ RequestSpecificException [None][feat] Add C++ RequestSpecificException Aug 4, 2025
@Shunkangz Shunkangz requested a review from a team as a code owner August 4, 2025 09:04
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🔭 Outside diff range comments (1)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1)

18-18: Replace #pragma once with traditional header guard.

According to the coding guidelines, header files should use a preprocessor guard with the format TRTLLM_ followed by the filename in caps. Replace #pragma once with the appropriate guard.

-#pragma once
+#ifndef TRTLLM_DATATRANSCEIVER_H
+#define TRTLLM_DATATRANSCEIVER_H

And add at the end of the file before the namespace closing:

+#endif // TRTLLM_DATATRANSCEIVER_H
 } // namespace tensorrt_llm::batch_manager
🧹 Nitpick comments (3)
cpp/tensorrt_llm/batch_manager/dataTransceiver.h (2)

146-154: Consider using actual request ID when available.

The exception handling implementation is solid, but consider using the actual request ID from mRequest when available instead of always using kUNKNOWN_REQUEST_ID. This would provide more specific error context for debugging.

-            throw common::RequestSpecificException(
-                __FILE__, __LINE__, e.what(), common::kUNKNOWN_REQUEST_ID, common::RequestErrorCode::kNETWORK_ERROR);
+            auto requestId = (mRequest != nullptr) ? mRequest->getRequestId() : common::kUNKNOWN_REQUEST_ID;
+            throw common::RequestSpecificException(
+                __FILE__, __LINE__, e.what(), requestId, common::RequestErrorCode::kNETWORK_ERROR);

159-167: Consider using actual request ID when available.

Same suggestion as for the send method - consider using the actual request ID from mRequest when available instead of always using kUNKNOWN_REQUEST_ID.

-            throw common::RequestSpecificException(
-                __FILE__, __LINE__, e.what(), common::kUNKNOWN_REQUEST_ID, common::RequestErrorCode::kNETWORK_ERROR);
+            auto requestId = (mRequest != nullptr) ? mRequest->getRequestId() : common::kUNKNOWN_REQUEST_ID;
+            throw common::RequestSpecificException(
+                __FILE__, __LINE__, e.what(), requestId, common::RequestErrorCode::kNETWORK_ERROR);
tensorrt_llm/_torch/pyexecutor/py_executor.py (1)

1282-1301: Well-designed exception handling wrapper.

The helper method follows good design principles by centralizing exception handling logic and properly differentiating between request-specific and system errors. The implementation correctly handles both exception types and provides appropriate logging.

Consider adding type hints for better code documentation:

-    def _check_cache_transfer_status_helper(self,
-                                            method_name: str,
-                                            method_call,
-                                            atLeastNum: int = 0):
+    def _check_cache_transfer_status_helper(self,
+                                            method_name: str,
+                                            method_call: Callable,
+                                            atLeastNum: int = 0):
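The error-isolation idea behind this wrapper can be sketched independently of the PR's code (the exception's constructor and the handler names here are assumptions for illustration, not the actual py_executor API):

```python
import logging


class RequestSpecificException(Exception):
    """Stand-in for the bound C++ exception carrying per-request context."""

    def __init__(self, message, request_id, error_code):
        super().__init__(message)
        self.request_id = request_id
        self.error_code = error_code


def check_transfer_status(method_name, method_call, on_request_error, at_least_num=0):
    """Run a cache-transfer status check, failing only the offending request."""
    try:
        return method_call(at_least_num)
    except RequestSpecificException as e:
        # Request-specific failure: report it for this request alone.
        logging.error("%s failed for request %s (code %s): %s",
                      method_name, e.request_id, e.error_code, e)
        on_request_error(e.request_id, str(e))
    # Any other exception propagates unchanged and is treated as a system error.


def failing_call(at_least_num):
    raise RequestSpecificException("net down", request_id=7, error_code=3)


failed = {}
check_transfer_status("check_context_transfer_status", failing_call,
                      lambda rid, msg: failed.__setitem__(rid, msg))
assert failed == {7: "net down"}
```

The key property is the narrow `except`: only the request-scoped exception is swallowed and routed to per-request handling, while anything else still surfaces as a process-level error.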
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b4d365a and a69ce96.

📒 Files selected for processing (3)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (1 hunks)
  • tensorrt_llm/_torch/pyexecutor/py_executor.py (8 hunks)
✅ Files skipped from review due to trivial changes (1)
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.py: The code developed for TensorRT-LLM should conform to Python 3.8+.
Indent Python code with 4 spaces. Do not use tabs.
Always maintain the namespace when importing in Python, even if only one class or function from a module is used.
Python filenames should use snake_case (e.g., some_file.py).
Python classes should use PascalCase (e.g., class SomeClass).
Python functions and methods should use snake_case (e.g., def my_awesome_function():).
Python local variables should use snake_case. Prefix k for variable names that start with a number (e.g., k_99th_percentile = ...).
Python global variables should use upper snake_case and prefix G (e.g., G_MY_GLOBAL = ...).
Python constants should use upper snake_case (e.g., MY_CONSTANT = ...).
Avoid shadowing variables declared in an outer scope in Python.
Initialize all externally visible members of a Python class in the constructor.
For interfaces that may be used outside a file, prefer docstrings over comments in Python.
Comments in Python should be reserved for code within a function, or interfaces that are local to a file.
Use Google style docstrings for classes and functions in Python, which can be parsed by Sphinx.
Attributes and variables in Python can be documented inline; attribute docstrings will be rendered under the docstring for the class.
Avoid using reflection in Python when functionality can be easily achieved without it.
When using try-except blocks in Python, limit the except to the smallest set of errors possible.
When using try-except blocks to handle multiple possible variable types in Python, keep the body of the try as small as possible, using the else block to implement the logic.

Files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
**/*.{cpp,h,hpp,cc,cxx,cu,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

All TensorRT-LLM Open Source Software code should contain an NVIDIA copyright header that includes the current year. This includes .cpp, .h, .cu, .py, and any other source files which are compiled or interpreted.

Files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
**/*.{cpp,h,hpp,cc,cxx}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,h,hpp,cc,cxx}: Closing braces of namespaces should have a comment saying the namespace it closes (e.g., } // namespace foo)
Prefer const or constexpr variables over #defines whenever possible, as the latter are not visible to the compiler.
A variable that is not modified after its initialization should be declared as const.
Except 0 (only used in comparison for checking signness/existence/emptiness) and nullptr, true, false, all other literals should only be used for variable initialization.
Use the Allman indentation style for braces in C++ code.
Put the semicolon for an empty for or while loop in a new line.
The statement forming the body of a switch, while, do .. while or for statement shall be a compound statement (use brace-delimited statements).
If and else should always be followed by brace-delimited statements, even if empty or a single statement.
C++ filenames should use camel case with first letter lowercase (e.g., thisIsAFilename.cpp), and all files involved in the compilation of a target must have filenames that are case-insensitive unique.
All types (including class names) are camel case with uppercase first letter (e.g., FooBarClass).
Local variables, methods, and namespaces use camel case with first letter lowercase (e.g., localFooBar).
Non-magic-number global variables that are non-static and not defined in anonymous namespace use camel case prefixed by a lower case 'g' (e.g., gDontUseGlobalFoos).
Non-magic-number global variables that are static or defined in an anonymous namespace use camel case prefixed by a lower case 's' (e.g., sMutableStaticGlobal).
Locally visible static variable uses camel case with lowercase prefix 's' as the first letter of the name (e.g., static std::once_flag sFlag;).
Class member variables use camel case prefixed with an 'm' (e.g., mNbFooValues). Public member variables do not require the 'm' prefix but it is encouraged for clarity.
Enumerations, global constants, static constants at class-scope and function-...

Files:

  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
**/*.{h,hpp}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Use a preprocessor guard in header files. The guard name must have prefix TRTLLM_ followed by the filename, all in caps, and no trailing underscore.

Files:

  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
🧠 Learnings (2)
📚 Learning: in tensorrt_llm/executor/worker.py, the lora adapter cache optimization logic that checks `is_adapte...
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.402Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

Applied to files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
📚 Learning: in tensorrt-llm, examples directory can have different dependency versions than the root requirement...
Learnt from: yibinl-nvidia
PR: NVIDIA/TensorRT-LLM#6506
File: examples/models/core/mixtral/requirements.txt:3-3
Timestamp: 2025-08-01T15:14:45.673Z
Learning: In TensorRT-LLM, examples directory can have different dependency versions than the root requirements.txt file. Version conflicts between root and examples dependencies are acceptable because examples are designed to be standalone and self-contained.

Applied to files:

  • tensorrt_llm/_torch/pyexecutor/py_executor.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (4)
tensorrt_llm/_torch/pyexecutor/py_executor.py (4)

19-19: LGTM!

The import follows Python coding guidelines by maintaining the namespace when importing.


1302-1306: LGTM!

Clean wrapper implementation that properly delegates to the helper method with appropriate NVTX profiling annotation.


1308-1312: LGTM!

Consistent implementation that follows the same pattern as the context transfer status wrapper.


683-683: LGTM!

All direct cache transfer status calls have been properly replaced with the new wrapper methods, maintaining the existing behavior while adding structured exception handling.

Also applies to: 880-880, 1151-1151, 1252-1252, 1272-1272

Comment on lines 1421 to 1557
def _handle_errors(self,
error_msg: Optional[str] = None,
request_ids: Optional[List[int]] = None):
error_responses = {}
error_msg = error_msg or "error"
for request in self.active_requests:
if request_ids is not None and request.py_request_id not in request_ids:
continue
req_id = request.py_request_id
request.state = LlmRequestState.GENERATION_COMPLETE
self._terminate_request(request)
error_responses[req_id] = LlmResponse(
request_id=req_id,
error_msg=error_msg,
client_id=request.py_client_id)
self.active_requests.clear()

if request_ids is not None:
for req_id in request_ids:
self.active_requests.remove(req_id)
else:
self.active_requests.clear()
self._enqueue_responses(error_responses)

⚠️ Potential issue

Fix the request removal logic.

There's a critical bug in the request removal logic. Line 1439 attempts to remove request IDs from active_requests, but active_requests contains LlmRequest objects, not IDs.

-        if request_ids is not None:
-            for req_id in request_ids:
-                self.active_requests.remove(req_id)
-        else:
-            self.active_requests.clear()
+        if request_ids is not None:
+            self.active_requests = [req for req in self.active_requests 
+                                  if req.py_request_id not in request_ids]
+        else:
+            self.active_requests.clear()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _handle_errors(self,
error_msg: Optional[str] = None,
request_ids: Optional[List[int]] = None):
error_responses = {}
error_msg = error_msg or "error"
for request in self.active_requests:
if request_ids is not None and request.py_request_id not in request_ids:
continue
req_id = request.py_request_id
request.state = LlmRequestState.GENERATION_COMPLETE
self._terminate_request(request)
error_responses[req_id] = LlmResponse(
request_id=req_id,
error_msg=error_msg,
client_id=request.py_client_id)
self.active_requests.clear()
if request_ids is not None:
for req_id in request_ids:
self.active_requests.remove(req_id)
else:
self.active_requests.clear()
self._enqueue_responses(error_responses)
def _handle_errors(self,
error_msg: Optional[str] = None,
request_ids: Optional[List[int]] = None):
error_responses = {}
error_msg = error_msg or "error"
for request in self.active_requests:
if request_ids is not None and request.py_request_id not in request_ids:
continue
req_id = request.py_request_id
request.state = LlmRequestState.GENERATION_COMPLETE
self._terminate_request(request)
error_responses[req_id] = LlmResponse(
request_id=req_id,
error_msg=error_msg,
client_id=request.py_client_id)
if request_ids is not None:
self.active_requests = [
req for req in self.active_requests
if req.py_request_id not in request_ids
]
else:
self.active_requests.clear()
self._enqueue_responses(error_responses)
🤖 Prompt for AI Agents
In tensorrt_llm/_torch/pyexecutor/py_executor.py around lines 1421 to 1442, the
code incorrectly tries to remove request IDs from the active_requests list,
which actually contains LlmRequest objects. To fix this, update the removal
logic to remove the corresponding LlmRequest objects whose py_request_id matches
the IDs in request_ids, instead of removing by ID directly. This ensures the
correct objects are removed from active_requests.
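The failure mode and the fix are easy to reproduce with a minimal standalone model; the `Req` class below is a stand-in for `LlmRequest` (only the `py_request_id` attribute is taken from the source), not the real class:

```python
class Req:
    """Stand-in for LlmRequest: active_requests stores objects, not ids."""

    def __init__(self, py_request_id):
        self.py_request_id = py_request_id


active_requests = [Req(1), Req(2), Req(3)]

# Buggy version: list.remove() matches elements by equality, and an int
# never compares equal to a Req object, so this raises ValueError.
try:
    active_requests.remove(2)
except ValueError:
    pass  # "list.remove(x): x not in list"

# Fixed version: rebuild the list, filtering on the id attribute.
failed_ids = {2}
active_requests = [r for r in active_requests if r.py_request_id not in failed_ids]
assert [r.py_request_id for r in active_requests] == [1, 3]
```

This is why the suggested diff replaces the `remove` loop with a list comprehension keyed on `py_request_id`.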

{
public:
explicit RequestSpecificException(
char const* file, std::size_t line, char const* msg, uint64_t requestID, RequestErrorCode errorCode);

Please use const std::string& instead of const char* if possible.

explicit RequestSpecificException(
char const* file, std::size_t line, char const* msg, uint64_t requestID, RequestErrorCode errorCode);

~RequestSpecificException() noexcept override;

Please confirm the intended behavior of overriding the destructor here.
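On the Python side, the PR summary states that exception objects expose `request_id` and `error_code`. A minimal sketch of such an exception type, with illustrative names and error codes rather than the actual bindings, could look like:

```python
import enum


class RequestErrorCode(enum.IntEnum):
    # Illustrative codes; the real enum is defined in the C++ bindings.
    UNKNOWN = 0
    KV_CACHE_TRANSFER_FAILED = 1


class RequestSpecificError(RuntimeError):
    """Error tied to one request, so a single failure need not abort the batch."""

    def __init__(self, msg: str, request_id: int, error_code: RequestErrorCode):
        super().__init__(msg)
        self.request_id = request_id
        self.error_code = error_code


try:
    raise RequestSpecificError(
        "kv-cache transfer failed",
        request_id=42,
        error_code=RequestErrorCode.KV_CACHE_TRANSFER_FAILED,
    )
except RequestSpecificError as e:
    assert e.request_id == 42
    assert e.error_code is RequestErrorCode.KV_CACHE_TRANSFER_FAILED
```

Carrying the request id on the exception is what lets `_handle_errors` terminate only the affected request instead of every active one.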

@Shunkangz Shunkangz force-pushed the catch_kv_cache_transfer_exception branch from a69ce96 to d9478fd Compare August 13, 2025 09:42
@Shunkangz Shunkangz requested a review from a team as a code owner August 13, 2025 09:42

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (4)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (3)

29-31: Using-aliases added but not used; either leverage them for request-aware errors or remove to keep scope clean

Given the PR’s goal (introduce RequestSpecificException), consider actually using it in this file (e.g., in the catch block) to attach an error code and correlation info, if available. If this TU isn’t the right place to emit request-aware exceptions, drop these aliases to avoid namespace pollution.

As a minimal cleanup, you can remove them now and reintroduce when needed:

-// Using declarations to shorten the code
-using RequestSpecificException = tensorrt_llm::common::RequestSpecificException;
-using RequestErrorCode = tensorrt_llm::common::RequestErrorCode;

Would you like me to draft a conversion of the catch block to throw RequestSpecificException with an appropriate RequestErrorCode and correlation (e.g., connection/request id), consistent with your new exception model?


46-62: Avoid busy-wait spin loops; use the same callback+future pattern used elsewhere in this file

The constructor handshake currently spins CPU with while (!req->isCompleted()) loops. Prefer the non-blocking UCX ops with a completion callback and a promise/future, mirroring your send()/recv() methods, to prevent CPU burn and improve responsiveness.

Suggested refactor:

-            std::shared_ptr<ucxx::Request> recvRequest
-                = mEndpoint->tagRecv(reinterpret_cast<void*>(&mConnectionIdInPeer), sizeof(mConnectionIdInPeer),
-                    ucxx::Tag(ResponserTag), ucxx::TagMaskFull);
-            while (!recvRequest->isCompleted())
-                ;
-            recvRequest->checkError();
+            {
+                std::promise<void> p;
+                auto f = p.get_future();
+                auto cb = [&](ucs_status_t, ucxx::RequestCallbackUserData) { p.set_value(); };
+                auto recvRequest = mEndpoint->tagRecv(
+                    reinterpret_cast<void*>(&mConnectionIdInPeer),
+                    sizeof(mConnectionIdInPeer),
+                    ucxx::Tag(ResponserTag),
+                    ucxx::TagMaskFull,
+                    /* enableCallback = */ false,
+                    cb);
+                if (!recvRequest->isCompleted())
+                {
+                    f.get();
+                }
+                recvRequest->checkError();
+            }
 
-            auto sendTag = ucxx::Tag(mConnectionIdInPeer << 32 | (RequesterTag & 0xFFFFFFFF));
-            std::shared_ptr<ucxx::Request> sendRequest
-                = mEndpoint->tagSend(reinterpret_cast<void*>(&mConnectionId), sizeof(mConnectionId), sendTag);
-            while (!sendRequest->isCompleted())
-                ;
-            sendRequest->checkError();
+            {
+                auto sendTag = ucxx::Tag(mConnectionIdInPeer << 32 | (RequesterTag & 0xFFFFFFFF));
+                std::promise<void> p;
+                auto f = p.get_future();
+                auto cb = [&](ucs_status_t, ucxx::RequestCallbackUserData) { p.set_value(); };
+                auto sendRequest = mEndpoint->tagSend(
+                    reinterpret_cast<void*>(&mConnectionId),
+                    sizeof(mConnectionId),
+                    sendTag,
+                    /* enableCallback = */ false,
+                    cb);
+                if (!sendRequest->isCompleted())
+                {
+                    f.get();
+                }
+                sendRequest->checkError();
+            }
-            std::shared_ptr<ucxx::Request> sendRequest = mEndpoint->tagSend(
-                reinterpret_cast<void*>(&mConnectionId), sizeof(mConnectionId), ucxx::Tag(ResponserTag));
-            while (!sendRequest->isCompleted())
-                ;
-            sendRequest->checkError();
+            {
+                std::promise<void> p;
+                auto f = p.get_future();
+                auto cb = [&](ucs_status_t, ucxx::RequestCallbackUserData) { p.set_value(); };
+                auto sendRequest = mEndpoint->tagSend(
+                    reinterpret_cast<void*>(&mConnectionId),
+                    sizeof(mConnectionId),
+                    ucxx::Tag(ResponserTag),
+                    /* enableCallback = */ false,
+                    cb);
+                if (!sendRequest->isCompleted())
+                {
+                    f.get();
+                }
+                sendRequest->checkError();
+            }
 
-            auto recvTag = ucxx::Tag(mConnectionId << 32 | (RequesterTag & 0xFFFFFFFF));
-            std::shared_ptr<ucxx::Request> recvRequest = mEndpoint->tagRecv(
-                reinterpret_cast<void*>(&mConnectionIdInPeer), sizeof(mConnectionIdInPeer), recvTag, ucxx::TagMaskFull);
-            while (!recvRequest->isCompleted())
-                ;
-            recvRequest->checkError();
+            {
+                auto recvTag = ucxx::Tag(mConnectionId << 32 | (RequesterTag & 0xFFFFFFFF));
+                std::promise<void> p;
+                auto f = p.get_future();
+                auto cb = [&](ucs_status_t, ucxx::RequestCallbackUserData) { p.set_value(); };
+                auto recvRequest = mEndpoint->tagRecv(
+                    reinterpret_cast<void*>(&mConnectionIdInPeer),
+                    sizeof(mConnectionIdInPeer),
+                    recvTag,
+                    ucxx::TagMaskFull,
+                    /* enableCallback = */ false,
+                    cb);
+                if (!recvRequest->isCompleted())
+                {
+                    f.get();
+                }
+                recvRequest->checkError();
+            }

Also applies to: 66-80
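The callback-plus-future idea above can be sketched outside UCX; the following Python illustration of the pattern (the `start_transfer` helper is a stand-in for a non-blocking `tagSend`/`tagRecv`, not real ucxx code) contrasts the spin loop with waiting on a completion signal:

```python
import threading
import time


def start_transfer(on_complete):
    """Stand-in for a non-blocking transfer that fires a completion callback."""

    def worker():
        time.sleep(0.01)  # simulated network latency
        on_complete()     # completion callback, as in the suggested refactor

    threading.Thread(target=worker).start()


# Busy-wait (what the constructor handshake currently does): burns a CPU core.
done_flag = {"done": False}
start_transfer(lambda: done_flag.__setitem__("done", True))
while not done_flag["done"]:
    pass  # spin

# Callback + event/future (the suggested pattern): sleeps until signaled.
done = threading.Event()
start_transfer(done.set)
done.wait()  # no CPU spin; the callback wakes the waiter
assert done.is_set()
```

The `std::promise`/`std::future` pair in the diff plays the same role as `threading.Event` here: the waiter blocks cheaply until the completion callback signals it.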


29-32: Replace magic literals (0xFFFFFFFF and 32) with named constexprs per style guide

Per guidelines, avoid non-trivial magic literals. Define file-local constants and use them in tag computations.

 namespace tensorrt_llm::executor::kv_cache
 {
 
-// Using declarations to shorten the code
-using RequestSpecificException = tensorrt_llm::common::RequestSpecificException;
-using RequestErrorCode = tensorrt_llm::common::RequestErrorCode;
+// Using declarations to shorten the code
+using RequestSpecificException = tensorrt_llm::common::RequestSpecificException;
+using RequestErrorCode = tensorrt_llm::common::RequestErrorCode;
+
+namespace {
+constexpr uint64_t kTAG_MASK_32 = 0xFFFFFFFFULL;
+constexpr uint64_t kTAG_SHIFT_32 = 32ULL;
+} // anonymous namespace
-    uint64_t tag
-        = ((mSendTagPrefix & 0xFFFFFFFF) << 32) | static_cast<uint64_t>(batch_manager::TransceiverTag::kID_TAG);
+    uint64_t tag = ((mSendTagPrefix & kTAG_MASK_32) << kTAG_SHIFT_32)
+        | static_cast<uint64_t>(batch_manager::TransceiverTag::kID_TAG);
-    uint64_t sendTag = ((mSendTagPrefix & 0xFFFFFFFF) << 32) | (static_cast<uint64_t>(ctx.getTag()) & (0xFFFFFFFF));
+    uint64_t sendTag = ((mSendTagPrefix & kTAG_MASK_32) << kTAG_SHIFT_32)
+        | (static_cast<uint64_t>(ctx.getTag()) & kTAG_MASK_32);
-    uint64_t recvTag = ((mRecvTagPrefix & 0xFFFFFFFF) << 32) | (static_cast<uint64_t>(ctx.getTag()) & (0xFFFFFFFF));
+    uint64_t recvTag = ((mRecvTagPrefix & kTAG_MASK_32) << kTAG_SHIFT_32)
+        | (static_cast<uint64_t>(ctx.getTag()) & kTAG_MASK_32);

Also applies to: 117-119, 149-151, 174-175
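The tag packing itself is simple enough to check in isolation. With the proposed named constants, the 64-bit layout is (prefix in the high 32 bits, local tag in the low 32 bits; values below are illustrative):

```python
K_TAG_MASK_32 = 0xFFFFFFFF
K_TAG_SHIFT_32 = 32


def make_tag(prefix: int, local_tag: int) -> int:
    """Pack a 32-bit prefix and a 32-bit local tag into one 64-bit tag."""
    return ((prefix & K_TAG_MASK_32) << K_TAG_SHIFT_32) | (local_tag & K_TAG_MASK_32)


tag = make_tag(0x1234, 0xABCD)
assert tag >> K_TAG_SHIFT_32 == 0x1234      # high half carries the prefix
assert tag & K_TAG_MASK_32 == 0xABCD        # low half carries the local tag
```

Naming the mask and shift makes the intended bit layout explicit at each call site, which is the point of the constexpr suggestion.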

cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h (1)

1-15: Update copyright year to reflect current year.

The copyright should reflect the current year (2025) according to the coding guidelines.

Apply this diff to update the copyright header:

-/*
- * Copyright (c) 2023-2024, NVIDIA CORPORATION.  All rights reserved.
+/*
+ * Copyright (c) 2023-2025, NVIDIA CORPORATION.  All rights reserved.
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a69ce96 and d9478fd.

📒 Files selected for processing (8)
  • cpp/include/tensorrt_llm/common/tllmException.h (2 hunks)
  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h (1 hunks)
  • cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h (1 hunks)
  • cpp/include/tensorrt_llm/executor/version.h (1 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp (3 hunks)
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h (1 hunks)
  • cpp/tensorrt_llm/common/tllmException.cpp (1 hunks)
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • cpp/include/tensorrt_llm/executor/version.h
🚧 Files skipped from review as they are similar to previous changes (4)
  • cpp/include/tensorrt_llm/common/tllmException.h
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.cpp
  • cpp/tensorrt_llm/batch_manager/dataTransceiver.h
  • cpp/tensorrt_llm/common/tllmException.cpp
🧰 Additional context used
📓 Path-based instructions (5)
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh}: In C++, close namespaces with a comment naming the namespace (e.g., } // namespace foo)
Prefer const/constexpr variables over #define for constants
Declare variables const if not modified after initialization
Use Allman brace style in C++
C++ filenames use lowerCamelCase and must be case-insensitively unique within a build target
C++ type names use UpperCamelCase
Local variables, methods, and namespaces use lowerCamelCase
Global non-static variables not in anonymous namespace use gPrefix lowerCamelCase (e.g., gExample)
Static globals or globals in anonymous namespaces use sPrefix lowerCamelCase
Locally visible static variables start with 's' (e.g., static std::once_flag sFlag;)
Member variables use mPrefix lowerCamelCase; public members may omit but are encouraged to use 'm'
Constants (enums, global/static/function-scope magic numbers) use kPREFIXED_UPPER_SNAKE (e.g., kDIGIT_NUM)
If macros are unavoidable, use UPPER_SNAKE_CASE (prefer constants over #define)
Constructor parameter that conflicts with a public member name gets trailing underscore (foo_)
Literal suffixes should be uppercase (e.g., 1234L not 1234l)
C++: use spaces only; indent 4 spaces
Run clang-format (LLVM style) before submitting; wrap lines at 120 characters
If formatting must be bypassed, use // clang-format off/on around the section
Prefer smart pointers; use unique_ptr for sole ownership, shared_ptr for shared; weak_ptr only in exceptional cases
Do not use deprecated pre-C++11 smart pointers
Use C++ style comments; avoid C comments except special inline cases; prefer // single-line
Capitalize and punctuate full-sentence comments
Follow Doxygen rules: use //! for comments and //!< for members in C++
Disable code with #if/#endif and mnemonic conditions; avoid commented-out code; avoid dead code
Do not throw exceptions across library boundaries
Use least-forceful casts; avoid removing const/volatile; avoid C-style and functional casts (except constructors); p...

Files:

  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
  • cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
**/*.{h,hpp,hxx,hh,cuh,cpp,cxx,cc,cu}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Parameter names must be consistent between declarations and definitions

Files:

  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
  • cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
**/*.{h,hpp,hxx,hh,cuh}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Header files must use include guards named TRTLLM_<FILENAME>_H, with no leading or trailing underscores (e.g., TRTLLM_FOO_BAR_HELLO_H)

Files:

  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
  • cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h
**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh,py}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

Prepend NVIDIA copyright header (current year) to all source files

Files:

  • cpp/include/tensorrt_llm/executor/dataTransceiverState.h
  • cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h
  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
**/*.{cpp,cxx,cc,cu}

📄 CodeRabbit Inference Engine (CODING_GUIDELINES.md)

**/*.{cpp,cxx,cc,cu}: Avoid literal values except for 0, nullptr, true, false; use named constexpr for other literals
Place semicolon of empty for/while loop on a new line
Always use brace-delimited bodies for switch/while/do-for/if/else
Use inline C comments in argument lists when parameter meaning is unclear (e.g., /* checkForErrors = */ false)
Do not use assignment in subexpressions (e.g., if (x = y) ... is forbidden)
Switch on enums should enumerate all values and omit default to catch new values at compile time
Structure switch statements; prohibit fallthrough except between empty cases; each case ends with break or throw; return at end of case not allowed; put break inside braces for compound case
Prefer anonymous namespaces over static for internal linkage of functions
Every defined function must be called at least once (no unused methods)

Files:

  • cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp
🔇 Additional comments (3)
cpp/tensorrt_llm/executor/cache_transmission/ucx_utils/connection.cpp (1)

23-23: Include of tllmException.h is appropriate and consistent

Bringing in tllmException.h here is correct given the use of TLLM_THROW/TLLM_CHECK_WITH_INFO in this TU. No issues.

cpp/include/tensorrt_llm/executor/dataTransceiverState.h (1)

32-33: LGTM! Forward declaration properly placed.

The forward declaration of Serialization is correctly positioned before the kv_cache namespace to support the friend declarations in CacheState and CommState. This enables the serialization functionality without requiring the full definition at this point.

cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h (1)

70-70: Serialization::serialize Overload Confirmed

The call to Serialization::serialize(state) maps to the following declarations in cpp/include/tensorrt_llm/executor/serialization.h:

  • static std::vector<char> serialize(DataTransceiverState const& dataTransceiverState);
  • static void serialize(DataTransceiverState const& dataTransceiverState, std::ostream& os);

No missing declarations or forward-only declarations—this overload is available and correctly declared.

* limitations under the License.
*/

#pragma once

⚠️ Potential issue

Missing include guard.

Header files must use include guards named TRTLLM_<FILENAME>_H according to the coding guidelines.

Apply this diff to add the required include guard:

-#pragma once
+#ifndef TRTLLM_DATA_TRANSCEIVER_STATE_UTILS_H
+#define TRTLLM_DATA_TRANSCEIVER_STATE_UTILS_H

And add the closing endif at the end of the file:

 } // namespace tensorrt_llm::executor
+
+#endif // TRTLLM_DATA_TRANSCEIVER_STATE_UTILS_H

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h around line 17,
the header is missing the required include guard; add a top-level include guard
using the macro name TRTLLM_DATA_TRANSCEIVER_STATE_UTILS_H (per the guideline) and
place the corresponding #endif at the end of the file so the file is properly
protected against multiple inclusion.

Comment on lines +46 to +50
inline std::vector<char> createDataTransceiverStateSocket(std::vector<SizeType32> const& nbKvHeadsPerLayer,
SizeType32 sizePerHead, SizeType32 tokensPerBlock, SizeType32 tensorParallelism, SizeType32 pipelineParallelism,
nvinfer1::DataType dataType, std::vector<std::string> const& socketAddresses,
kv_cache::CacheState::AttentionType attentionType, int kvFactor, bool enableAttentionDP, int dpRank, int dpSize,
int rank)

🛠️ Refactor suggestion

Consider adding input validation for critical parameters.

Both functions accept multiple parameters that could be invalid (e.g., empty vectors, zero values for critical dimensions). Consider adding basic validation to prevent runtime errors downstream.

Add validation at the beginning of both functions:

 inline std::vector<char> createDataTransceiverStateSocket(std::vector<SizeType32> const& nbKvHeadsPerLayer,
     SizeType32 sizePerHead, SizeType32 tokensPerBlock, SizeType32 tensorParallelism, SizeType32 pipelineParallelism,
     nvinfer1::DataType dataType, std::vector<std::string> const& socketAddresses,
     kv_cache::CacheState::AttentionType attentionType, int kvFactor, bool enableAttentionDP, int dpRank, int dpSize,
     int rank)
 {
+    if (nbKvHeadsPerLayer.empty() || socketAddresses.empty() || sizePerHead == 0 || tokensPerBlock == 0)
+    {
+        throw std::invalid_argument("Invalid parameters: vectors cannot be empty and dimensions must be positive");
+    }

Also applies to: 91-95

🤖 Prompt for AI Agents
In cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h around lines
46-50 (and similarly 91-95), add input validation at the start of both
functions: check that vectors (nbKvHeadsPerLayer, socketAddresses) are not
empty, that sizePerHead, tokensPerBlock, tensorParallelism, pipelineParallelism,
kvFactor are > 0, that rank, dpRank are >= 0 and
dpSize/tensorParallelism/pipelineParallelism are >= 1 and that rank/dpRank <
total ranks (e.g., < dpSize or < tensorParallelism*pipelineParallelism as
applicable); validate that attentionType and dataType are within expected enums
if applicable. If any check fails, throw a clear std::invalid_argument with a
message indicating which parameter is invalid (or return an error code if
project style prefers), so downstream code cannot proceed with invalid
dimensions. Ensure mirror checks are added to the second function at lines 91-95
with consistent error messages.
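A sketch of the suggested guard, written as a standalone Python function mirroring the C++ parameter names (the function itself is illustrative, not part of the codebase), makes the checks concrete:

```python
def validate_state_args(nb_kv_heads_per_layer, socket_addresses, size_per_head,
                        tokens_per_block, tensor_parallelism, pipeline_parallelism):
    """Illustrative parameter checks mirroring the suggested C++ validation."""
    if not nb_kv_heads_per_layer:
        raise ValueError("nbKvHeadsPerLayer must not be empty")
    if not socket_addresses:
        raise ValueError("socketAddresses must not be empty")
    for name, value in (("sizePerHead", size_per_head),
                        ("tokensPerBlock", tokens_per_block),
                        ("tensorParallelism", tensor_parallelism),
                        ("pipelineParallelism", pipeline_parallelism)):
        if value <= 0:
            raise ValueError(f"{name} must be positive, got {value}")


# Valid arguments pass silently; an empty vector is rejected with a clear message.
validate_state_args([8, 8], ["127.0.0.1:8000"], 64, 32, 1, 1)
```

Naming the offending parameter in the message is what makes the failure actionable downstream, as the prompt above asks for.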

Comment on lines +58 to +62
for (size_t i = 0; i < socketAddresses.size(); ++i)
{
kv_cache::SocketState socketState{static_cast<uint16_t>(8000 + i), socketAddresses[i]};
socketStates.emplace_back(std::move(socketState));
}

⚠️ Potential issue

Potential integer overflow and type safety concerns.

There are several issues with the socket state creation:

  1. static_cast<uint16_t>(8000 + i) could overflow if i is large
  2. Using size_t for loop variable but casting to uint16_t
  3. No validation of port range limits

Apply this diff to add proper bounds checking and type safety:

     for (size_t i = 0; i < socketAddresses.size(); ++i)
     {
-        kv_cache::SocketState socketState{static_cast<uint16_t>(8000 + i), socketAddresses[i]};
+        if (8000 + i > std::numeric_limits<uint16_t>::max())
+        {
+            throw std::runtime_error("Port number exceeds uint16_t range");
+        }
+        kv_cache::SocketState socketState{static_cast<uint16_t>(8000 + i), socketAddresses[i]};
         socketStates.emplace_back(std::move(socketState));
     }
📝 Committable suggestion


Suggested change
for (size_t i = 0; i < socketAddresses.size(); ++i)
{
kv_cache::SocketState socketState{static_cast<uint16_t>(8000 + i), socketAddresses[i]};
socketStates.emplace_back(std::move(socketState));
}
for (size_t i = 0; i < socketAddresses.size(); ++i)
{
if (8000 + i > std::numeric_limits<uint16_t>::max())
{
throw std::runtime_error("Port number exceeds uint16_t range");
}
kv_cache::SocketState socketState{static_cast<uint16_t>(8000 + i), socketAddresses[i]};
socketStates.emplace_back(std::move(socketState));
}
🤖 Prompt for AI Agents
In cpp/include/tensorrt_llm/executor/dataTransceiverStateUtils.h around lines
58-62, the code unsafely casts 8000 + i to uint16_t and uses size_t for the loop
without validating port range; change the loop to iterate with a well-typed
integer (e.g. uint32_t or size_t) but compute the port as a checked integer:
compute uint32_t port = basePort + static_cast<uint32_t>(i); verify port is
within the valid TCP port range (>=1 and <= 65535) and fits into uint16_t before
casting; if the check fails, handle it (throw an exception or return an error)
instead of silently wrapping; finally static_cast<uint16_t>(port) only after the
check and use that safe value when constructing kv_cache::SocketState.
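The bounds check being asked for can be exercised directly; this standalone sketch (base port and helper name are illustrative) computes the port in a wide integer and rejects anything that would wrap a uint16_t:

```python
BASE_PORT = 8000
MAX_PORT = 65535  # upper bound of uint16_t and of the TCP port range


def port_for_index(i: int) -> int:
    """Compute BASE_PORT + i, refusing values that would wrap a uint16_t."""
    port = BASE_PORT + i  # computed at full width, then range-checked
    if port > MAX_PORT:
        raise ValueError(f"port {port} exceeds uint16_t range")
    return port


assert port_for_index(0) == 8000
assert port_for_index(57535) == 65535  # largest index that still fits
try:
    port_for_index(57536)
    assert False, "expected ValueError"
except ValueError:
    pass
```

The key design point matches the AI-agent prompt: do the arithmetic in a wider type, validate the range, and only then narrow to uint16_t.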

Comment on lines +103 to +106
for (size_t i = 0; i < agentNames.size(); ++i)
{
agentStates.emplace_back(agentNames[i], "127.0.0.1:" + std::to_string(8000 + i));
}

🛠️ Refactor suggestion

Potential integer overflow in agent connection info.

Similar to the socket version, the computed port 8000 + i can exceed the valid uint16_t port range, and there is no bounds checking before it is formatted into the connection string.

Apply this diff to add bounds checking:

     for (size_t i = 0; i < agentNames.size(); ++i)
     {
+        if (8000 + i > std::numeric_limits<uint16_t>::max())
+        {
+            throw std::runtime_error("Port number exceeds uint16_t range");
+        }
         agentStates.emplace_back(agentNames[i], "127.0.0.1:" + std::to_string(8000 + i));
     }
📝 Committable suggestion


Suggested change
for (size_t i = 0; i < agentNames.size(); ++i)
{
agentStates.emplace_back(agentNames[i], "127.0.0.1:" + std::to_string(8000 + i));
}
for (size_t i = 0; i < agentNames.size(); ++i)
{
if (8000 + i > std::numeric_limits<uint16_t>::max())
{
throw std::runtime_error("Port number exceeds uint16_t range");
}
agentStates.emplace_back(
agentNames[i],
"127.0.0.1:" + std::to_string(8000 + i)
);
}

@Shunkangz Shunkangz closed this Aug 15, 2025