
Conversation

@Linda-Stadter
Collaborator

@Linda-Stadter commented Jul 20, 2025

Do not merge!

This draft sets nanobind as the default binding library and executes all tests in the CI pipeline with nanobind.

Summary by CodeRabbit

  • New Features

    • Improved handling of optional boolean parameters in Python bindings, allowing explicit use of None for more flexible configuration.
    • Enhanced conversion of backend types in configuration, ensuring proper enum mapping for backend selection.
  • Bug Fixes

    • Corrected the processing of logprobs parameters to ensure consistent boolean handling in sampling requests.
  • Chores

    • Updated the default binding type for build scripts to use "nanobind" instead of "pybind".
    • Refined exception translation logic in Python bindings for better error handling.

Description

Test Coverage

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provides a user-friendly way for developers to interact with the Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages that don't match the specified backends. Only [pytorch, cpp, tensorrt, triton] are supported. Examples: "pytorch, cpp" (does not run test stages with the tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with the pull request.

skip

skip --comment COMMENT

Skip testing for the latest commit on the pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause the top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause the top of tree to break.

@coderabbitai
Contributor

coderabbitai bot commented Jul 20, 2025

Walkthrough

The changes update exception handling in the nanobind bindings, revise constructor argument handling for OutputConfig, switch the build script's default binding type to nanobind, ensure backend strings are converted to enums in the cache transceiver config, and enforce boolean conversion of logprobs when constructing OpenAI protocol sampling parameters.

Changes

File(s) and change summary:

  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp: Refactored exception registration; exceptions are now stored as static nb::object variables for translation.
  • cpp/tensorrt_llm/nanobind/executor/request.cpp: Changed the OutputConfig constructor binding to use optional bools (default None) with value_or(false) logic.
  • scripts/build_wheel.py: Changed the default binding_type from "pybind" to "nanobind" in main and in the CLI argument parser.
  • tensorrt_llm/llmapi/llm_args.py: Added an import and converted the backend string to an enum in the CacheTransceiverConfig._to_pybind method.
  • tensorrt_llm/serve/openai_protocol.py: Changed logprobs handling to ensure _return_log_probs is always a boolean in the sampling param methods.

Sequence Diagram(s)

sequenceDiagram
    participant Python
    participant OutputConfig
    Python->>OutputConfig: __init__(optional<bool>..., optional<vector>)
    OutputConfig-->>OutputConfig: Convert each optional<bool> to bool using value_or(false)
    OutputConfig-->>Python: Constructed OutputConfig instance
sequenceDiagram
    participant CacheTransceiverConfig
    participant _CacheTransceiverBackendType
    CacheTransceiverConfig->>_CacheTransceiverBackendType: from_string(self.backend)
    _CacheTransceiverBackendType-->>CacheTransceiverConfig: Enum value
    CacheTransceiverConfig-->>pybind: Pass enum value (not string)
sequenceDiagram
    participant CompletionRequest
    participant SamplingParams
    CompletionRequest->>CompletionRequest: to_sampling_params()
    CompletionRequest-->>SamplingParams: Set _return_log_probs = bool(self.logprobs)
    SamplingParams-->>CompletionRequest: SamplingParams instance
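
A minimal C++ illustration of the optional-boolean handling shown in the OutputConfig diagram above; the variable names are illustrative and this is not the binding code itself:

    #include <optional>

    int main()
    {
        // A keyword argument left as None on the Python side arrives as an
        // empty std::optional<bool>; value_or(false) restores the old default.
        std::optional<bool> returnLogProbs;               // Python passed None
        bool defaulted = returnLogProbs.value_or(false);  // -> false

        std::optional<bool> explicitValue = true;         // Python passed True
        bool enabled = explicitValue.value_or(false);     // -> true

        return (defaulted == false && enabled == true) ? 0 : 1;
    }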

Estimated code review effort

2 (~15 minutes)

Suggested reviewers

  • DomBrown
  • chzblych

Poem

The bindings now are neat and kind,
Optional bools with peace of mind.
Exceptions wrapped, no longer stray,
Enum conversions lead the way.
Logprobs now are truly true—
Hopping forward, code anew!
🐇✨


📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fda758a and 28ff348.

📒 Files selected for processing (3)
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (2 hunks)
  • cpp/tensorrt_llm/nanobind/executor/request.cpp (1 hunks)
  • scripts/build_wheel.py (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • scripts/build_wheel.py
  • cpp/tensorrt_llm/nanobind/executor/request.cpp
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Linda-Stadter
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #12383 [ run ] triggered by Bot

Contributor

@coderabbitai bot left a comment


Actionable comments posted: 15

🧹 Nitpick comments (14)
cpp/tensorrt_llm/nanobind/executor/request.h (1)

20-27: Consider forward-declaring nanobind::module_ instead of including the full header

#include <nanobind/nanobind.h> pulls in a fairly large header and propagates it to every TU that includes request.h, which can noticeably impact build times.
A lightweight alternative is to forward-declare nanobind::module_ and use the fully-qualified name in the declaration; the implementation file can keep the heavy include.

-#include <nanobind/nanobind.h>
-namespace nb = nanobind;
-
-namespace tensorrt_llm::nanobind::executor
-{
-// Register bindings for executor API.
-void initRequestBindings(nb::module_& m);
+namespace nanobind { class module_; }
+
+namespace tensorrt_llm::nanobind::executor
+{
+// Register bindings for executor API.
+void initRequestBindings(nanobind::module_& m);
 }

This avoids polluting including translation units with the nb alias and trims compile dependencies.

cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1)

20-29: Same compile-time concern: replace heavy include with forward declaration

For consistency and faster incremental builds, forward-declare nanobind::module_ and drop the header/alias here too.

-#include <nanobind/nanobind.h>
-namespace nb = nanobind;
+namespace nanobind { class module_; }

Then update the signature:

-static void initBindings(nb::module_& m);
+static void initBindings(nanobind::module_& m);
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1)

18-28: Reduce header fan-out by forward-declaring nanobind::module_

As with the other binding headers, swap the full include for a forward declaration to keep compile units slim.

-#include <nanobind/nanobind.h>
-namespace nb = nanobind;
+namespace nanobind { class module_; }

And adjust the method parameter accordingly.

cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h (1)

20-28: Lightweight forward declaration recommended

Follow the same pattern to avoid repeatedly including Nanobind’s heavy header from public headers.

-#include <nanobind/nanobind.h>
-namespace nb = nanobind;
+namespace nanobind { class module_; }

This keeps compile times predictable when the testing headers are transitively included.

cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h (1)

18-28: Apply the forward-declaration pattern here as well

Consistently forward-declaring nanobind::module_ across all binding headers will maximise the compile-time benefit and avoid leaking the nb alias into every translation unit.

-#include <nanobind/nanobind.h>
-namespace nb = nanobind;
+namespace nanobind { class module_; }

Then use nanobind::module_ in initBindings.

cpp/tensorrt_llm/nanobind/common/bindTypes.h (1)

44-65: Solid pickling implementation with minor suggestion.

The serialization logic correctly preserves set contents through vector conversion and reconstruction. The validation and placement new usage are appropriate.

Consider using a more descriptive error message:

-                    throw std::runtime_error("Invalid state!");
+                    throw std::runtime_error("Invalid state: expected tuple with 1 element for set deserialization");
tests/unittest/bindings/test_executor_bindings.py (2)

1204-1211: Remove commented debug code

The commented debugging code for checking picklability should be removed to keep the test clean.

-# for prop in ['is_final', 'output_token_ids', 'cum_log_probs', 'log_probs', 'context_logits', 'generation_logits', 'encoder_output', 'finish_reasons', 'sequence_index', 'is_sequence_final', 'decoding_iter', 'context_phase_params', 'request_perf_metrics']:
-#     try:
-#         value = getattr(result, prop)
-#         pickle.dumps(value)
-#         print(f"✓ {prop} pickles successfully")
-#     except Exception as e:
-#         print(f"✗ {prop} fails: {e}")
-

1923-1929: Consider removing or making debug prints conditional

Debug print statements have been added to the test. While useful during development, they may clutter test output in CI/CD pipelines.

Consider either removing the debug prints or making them conditional based on an environment variable:

-for i, (req_id, client_id, logits) in enumerate(
-        zip(req_id_batch, client_id_batch, logits_batch)):
-    print(f"DEBUG: Request {i}: req_id={req_id}, client_id={client_id}")
-    print(
-        f"DEBUG: Request {i}: logits dtype={logits.dtype}, shape={logits.shape}, device={logits.device}"
-    )
+for i, (req_id, client_id, logits) in enumerate(
+        zip(req_id_batch, client_id_batch, logits_batch)):
+    if os.environ.get('DEBUG_TESTS'):
+        print(f"DEBUG: Request {i}: req_id={req_id}, client_id={client_id}")
+        print(
+            f"DEBUG: Request {i}: logits dtype={logits.dtype}, shape={logits.shape}, device={logits.device}"
+        )
cpp/tensorrt_llm/nanobind/executor/bindings.cpp (1)

248-253: Consider handling fractional milliseconds or documenting the precision loss.

The conversion from double to int64_t in line 252 will truncate fractional milliseconds. If sub-millisecond precision is needed, consider using std::chrono::duration<double, std::milli> instead.

-                if (timeout_ms)
-                {
-                    return self.getLatestEvents(std::chrono::milliseconds(static_cast<int64_t>(*timeout_ms)));
-                }
+                if (timeout_ms)
+                {
+                    // Note: Fractional milliseconds are truncated
+                    return self.getLatestEvents(std::chrono::milliseconds(static_cast<int64_t>(*timeout_ms)));
+                }
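
For reference, a standalone sketch of the precision difference being described; this is plain std::chrono arithmetic, not tied to the binding API:

    #include <chrono>
    #include <cstdio>

    int main()
    {
        double timeout_ms = 1.75; // hypothetical value received from Python
        auto precise = std::chrono::duration<double, std::milli>(timeout_ms);

        // duration_cast truncates toward zero; round keeps the nearest millisecond.
        auto truncated = std::chrono::duration_cast<std::chrono::milliseconds>(precise); // 1 ms
        auto rounded = std::chrono::round<std::chrono::milliseconds>(precise);           // 2 ms

        std::printf("truncated=%lld ms, rounded=%lld ms\n",
            static_cast<long long>(truncated.count()),
            static_cast<long long>(rounded.count()));
        return 0;
    }
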
cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp (1)

44-44: Complete the enum value documentation.

The documentation for CUM_LOG_PROBS is incomplete.

-        .value("CUM_LOG_PROBS", OutputContentType::kCUM_LOG_PROBS, "Cumulative Log");
+        .value("CUM_LOG_PROBS", OutputContentType::kCUM_LOG_PROBS, "Cumulative Log Probs");
cpp/tensorrt_llm/nanobind/common/customCasters.h (2)

44-47: Update comment to reference nanobind instead of pybind.

The comment references "Pybind" but this is a nanobind implementation. Update the comment to avoid confusion.

-// Pybind requires to have a central include in order for type casters to work.
+// Nanobind requires to have a central include in order for type casters to work.
 // Opaque bindings add a type caster, so they have the same requirement.
-// See the warning in https://pybind11.readthedocs.io/en/stable/advanced/cast/custom.html
+// See the nanobind documentation on custom type casters.

280-291: Document that this caster copies referenced objects.

The from_cpp method creates copies of all referenced objects rather than maintaining references. This changes the semantics but may be necessary for Python interop.

Add a comment explaining this behavior:

 static handle from_cpp(VectorType const& src, rv_policy policy, cleanup_list* cleanup) noexcept
 {
-
+    // Note: We copy the referenced objects to avoid lifetime issues when passing to Python
     std::vector<T> result;
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (1)

254-331: Consider refactoring the 49-parameter constructor for better maintainability.

The constructor has 49 parameters, which makes it difficult to maintain and use correctly.

Consider using a builder pattern or configuration struct to group related parameters. This would improve readability and reduce the chance of parameter ordering errors. For example (a rough sketch follows this list):

  • Group tensor-related parameters into a TensorConfig struct
  • Group LoRA-related parameters into a LoraConfig struct
  • Group multimodal parameters into a MultimodalConfig struct
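
A rough sketch of the grouping idea; the struct and field names below are hypothetical and not taken from the codebase:

    #include <cstdint>
    #include <optional>
    #include <vector>

    // "Tensor" stands in for whatever tensor type the bindings actually use.
    using Tensor = std::vector<float>;

    // Hypothetical parameter groups: call sites then name what they set instead
    // of relying on the position of argument 37 out of 49.
    struct LoraParams
    {
        std::optional<std::uint64_t> taskId;
        std::optional<Tensor> weights;
        std::optional<Tensor> config;
    };

    struct MultimodalParams
    {
        std::optional<Tensor> embeddings;
        std::optional<std::vector<std::int64_t>> positions;
    };

    // The constructor would then take a handful of structs, for example:
    // LlmRequest(CoreParams core, TensorParams tensors, LoraParams lora, MultimodalParams mm);
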
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (1)

584-584: Naming inconsistency between parameter and property methods.

The parameter mm_embedding_offloading (line 584) maps to getPromptTableOffloading/setPromptTableOffloading methods (lines 631-632). Consider using a consistent name to avoid confusion.

Either rename the parameter to match the methods:

-    nb::arg("gather_generation_logits") = false, nb::arg("mm_embedding_offloading") = false,
+    nb::arg("gather_generation_logits") = false, nb::arg("prompt_table_offloading") = false,

Or use an alias in the property definition to maintain backward compatibility while clarifying the mapping.

Also applies to: 631-632

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 98428f3 and 7ede3a9.

📒 Files selected for processing (47)
  • cpp/CMakeLists.txt (2 hunks)
  • cpp/tensorrt_llm/nanobind/CMakeLists.txt (2 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/bindings.cpp (2 hunks)
  • cpp/tensorrt_llm/nanobind/common/bindTypes.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/common/customCasters.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/executor/bindings.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/executor/bindings.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/executor/executor.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/executor/executor.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/executor/executorConfig.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/executor/request.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/executor/request.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/runtime/bindings.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/runtime/moeBindings.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h (1 hunks)
  • cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp (1 hunks)
  • cpp/tensorrt_llm/nanobind/userbuffers/bindings.h (1 hunks)
  • cpp/tensorrt_llm/pybind/bindings.cpp (2 hunks)
  • cpp/tensorrt_llm/pybind/executor/bindings.cpp (1 hunks)
  • cpp/tensorrt_llm/pybind/executor/executorConfig.cpp (1 hunks)
  • examples/models/core/llama/summarize_long.py (1 hunks)
  • examples/models/core/qwen2audio/run.py (1 hunks)
  • examples/models/core/qwenvl/run.py (1 hunks)
  • jenkins/Build.groovy (4 hunks)
  • jenkins/L0_Test.groovy (3 hunks)
  • scripts/build_wheel.py (2 hunks)
  • tensorrt_llm/builder.py (1 hunks)
  • tensorrt_llm/commands/build.py (2 hunks)
  • tensorrt_llm/llmapi/llm_args.py (2 hunks)
  • tensorrt_llm/runtime/model_runner.py (1 hunks)
  • tests/integration/test_lists/test-db/l0_a10.yml (1 hunks)
  • tests/unittest/bindings/test_bindings_ut.py (0 hunks)
  • tests/unittest/bindings/test_executor_bindings.py (7 hunks)
💤 Files with no reviewable changes (1)
  • tests/unittest/bindings/test_bindings_ut.py
🧰 Additional context used
🧠 Learnings (5)
cpp/tensorrt_llm/pybind/executor/executorConfig.cpp (1)
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.374Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
tests/integration/test_lists/test-db/l0_a10.yml (1)
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.374Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
tensorrt_llm/llmapi/llm_args.py (1)
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.374Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (1)
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.374Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (1)
Learnt from: amitz-nv
PR: NVIDIA/TensorRT-LLM#5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.374Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks `is_adapter_in_cpu_cache()` and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.
🧬 Code Graph Analysis (7)
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (3)
cpp/tensorrt_llm/nanobind/batch_manager/bindings.h (1)
  • tensorrt_llm (23-28)
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1)
  • tensorrt_llm (22-29)
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (4)
  • initBindings (293-460)
  • initBindings (293-293)
  • initBindings (462-490)
  • initBindings (462-462)
cpp/tensorrt_llm/nanobind/batch_manager/algorithms.h (11)
cpp/tensorrt_llm/nanobind/batch_manager/bindings.h (1)
  • tensorrt_llm (23-28)
cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.h (1)
  • tensorrt_llm (22-29)
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (2)
  • tensorrt_llm (23-30)
  • tensorrt_llm (32-39)
cpp/tensorrt_llm/nanobind/executor/bindings.h (1)
  • tensorrt_llm (23-29)
cpp/tensorrt_llm/nanobind/runtime/bindings.h (1)
  • tensorrt_llm (24-30)
cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.h (1)
  • tensorrt_llm (24-29)
cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (4)
  • initBindings (293-460)
  • initBindings (293-293)
  • initBindings (462-490)
  • initBindings (462-462)
cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (2)
  • initBindings (55-523)
  • initBindings (55-55)
cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (2)
  • initBindings (55-178)
  • initBindings (55-55)
cpp/tensorrt_llm/nanobind/executor/bindings.cpp (2)
  • initBindings (48-261)
  • initBindings (48-48)
cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (2)
  • initBindings (102-362)
  • initBindings (102-102)
cpp/tensorrt_llm/pybind/bindings.cpp (1)
cpp/include/tensorrt_llm/runtime/modelConfig.h (1)
  • KVCacheTypeFromString (76-94)
cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp (2)
cpp/include/tensorrt_llm/runtime/modelConfig.h (3)
  • useGptAttentionPlugin (293-296)
  • setKVCacheType (586-589)
  • useLoraPlugin (540-543)
cpp/include/tensorrt_llm/batch_manager/llmRequest.h (1)
  • setDraftTokens (1093-1096)
cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp (1)
cpp/tensorrt_llm/nanobind/executor/executor.cpp (2)
  • shutdown (179-188)
  • shutdown (179-179)
cpp/tensorrt_llm/nanobind/runtime/bindings.cpp (4)
cpp/include/tensorrt_llm/runtime/request.h (1)
  • generatedTokensPerEngineStep (50-50)
cpp/tensorrt_llm/batch_manager/runtimeBuffers.cpp (2)
  • create (74-212)
  • create (74-78)
cpp/include/tensorrt_llm/runtime/cudaStream.h (1)
  • synchronize (81-96)
cpp/tensorrt_llm/nanobind/runtime/moeBindings.cpp (2)
  • initMoeBindings (49-122)
  • initMoeBindings (49-49)
cpp/tensorrt_llm/nanobind/executor/executorConfig.cpp (2)
cpp/include/tensorrt_llm/runtime/decodingInput.h (1)
  • sinkTokenLength (47-47)
tensorrt_llm/llmapi/llm_args.py (3)
  • SchedulerConfig (606-624)
  • PeftCacheConfig (628-690)
  • ExtendedRuntimePerfKnobConfig (836-862)
🪛 Ruff (0.12.2)
tests/unittest/bindings/test_executor_bindings.py

97-97: Local variable executor is assigned to but never used

Remove assignment to unused variable executor

(F841)


1203-1203: Line too long (260 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (52)
cpp/tensorrt_llm/pybind/executor/executorConfig.cpp (1)

339-339: LGTM! Correct type casting for pickle deserialization.

The fix ensures cuda_graph_cache_size is properly cast to SizeType32 during deserialization, which aligns with the getter method and property definition.

examples/models/core/qwen2audio/run.py (1)

125-126: LGTM! Good refactoring for enum parsing consistency.

The change from direct KVCacheType construction to using KVCacheType.from_string() improves API consistency and provides better error handling while maintaining the same fallback behavior.
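
A sketch of the parsing pattern referred to here; the import path, enum member, and fallback are assumptions made for illustration rather than details taken from the script:

    from tensorrt_llm.bindings import KVCacheType  # import path assumed

    def parse_kv_cache_type(value: str) -> KVCacheType:
        # from_string validates the input instead of relying on the raw enum constructor
        try:
            return KVCacheType.from_string(value)
        except Exception:
            return KVCacheType.PAGED  # hypothetical fallback for this sketch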

cpp/CMakeLists.txt (2)

202-202: LGTM! Proper support for dual binding architecture.

The condition extension to include BUILD_DEEP_EP ensures pybind11 is available for deep execution path features regardless of the primary binding type, supporting the nanobind integration strategy.


221-221: LGTM! Maintains consistency with subdirectory inclusion logic.

This change ensures pybind11 headers are available when BUILD_DEEP_EP is enabled, maintaining consistency with the subdirectory addition on line 202.

cpp/tensorrt_llm/nanobind/executor/bindings.h (1)

1-30: LGTM! Well-designed header following best practices.

The header demonstrates good design with proper licensing, clean namespace organization, minimal includes, and a focused API declaration. Fits well with the modular nanobind binding architecture.

cpp/tensorrt_llm/nanobind/runtime/moeBindings.h (1)

1-30: LGTM! Consistent design with good modular separation.

The header follows the same excellent patterns as other nanobind headers with proper licensing, clean namespace organization, and focused API. The separation of MOE bindings maintains good modularity in the runtime bindings architecture.

examples/models/core/qwenvl/run.py (1)

121-122: LGTM! Proper enum parsing standardization.

The change to use KVCacheType.from_string() instead of direct constructor aligns with the codebase-wide standardization effort for enum parsing and provides better validation.

scripts/build_wheel.py (2)

301-301: LGTM! Strategic shift to nanobind as default.

The change to default to nanobind instead of pybind aligns with the PR's nanobind integration focus and represents a strategic shift toward nanobind as the preferred Python binding mechanism.


864-864: LGTM! Consistent default value alignment.

The argument parser default correctly matches the function parameter default, ensuring consistency in the command-line interface.

tensorrt_llm/commands/build.py (2)

41-55: Excellent enum parsing helper implementation.

The enum_type helper function is well-designed with:

  • Support for both enum instances and string values
  • Proper use of from_string method for validation
  • Clear error messages listing valid values
  • Reusable pattern for other enum types

This improves the robustness of command-line argument parsing.
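
A rough sketch of the helper pattern being praised; this is not the actual implementation, and it assumes the enum class exposes a from_string method and a __members__ mapping:

    import argparse

    def enum_type(enum_class):
        """Return an argparse 'type' callable that accepts enum members or strings."""

        def parse(value):
            if isinstance(value, enum_class):
                return value
            try:
                return enum_class.from_string(value)
            except Exception:
                valid = ", ".join(enum_class.__members__.keys())
                raise argparse.ArgumentTypeError(
                    f"Invalid value {value!r}; expected one of: {valid}")

        return parse

    # Hypothetical usage:
    # parser.add_argument("--kv_cache_type", type=enum_type(KVCacheType), default=None)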


151-151: LGTM! Proper integration of enum helper.

The use of enum_type(KVCacheType) provides better validation and error handling for the --kv_cache_type argument compared to direct enum usage.

tests/integration/test_lists/test-db/l0_a10.yml (1)

193-207: LGTM! Appropriate nanobind test configuration.

The new l0_a10_nanobind test entry is well-configured:

  • Targets appropriate hardware (A10 GPUs with single GPU setup)
  • Uses correct platform constraints (Ubuntu Linux)
  • Focuses on pre_merge testing for rapid CI feedback
  • Tests the relevant unittest/bindings suite for nanobind validation

This supports the PR's nanobind integration testing objectives effectively.

examples/models/core/llama/summarize_long.py (1)

100-100: LGTM! Consistent enum parsing standardization.

The change to use KVCacheType.from_string() instead of direct constructor follows the same standardization pattern applied across the codebase and provides better validation for string-to-enum conversion.

tensorrt_llm/builder.py (1)

596-596: LGTM: Standardized enum parsing approach

The change from direct KVCacheType() construction to KVCacheType.from_string() improves enum handling by using an explicit parsing method. This aligns with the broader codebase effort to standardize enum string parsing and supports the nanobind integration.

cpp/tensorrt_llm/nanobind/executor/executorConfig.h (1)

1-31: LGTM: Well-structured header file

The header file follows proper C++ conventions with:

  • Correct copyright header
  • Proper include guard using #pragma once
  • Clean namespace organization
  • Clear function declaration for nanobind binding initialization
cpp/tensorrt_llm/nanobind/runtime/bindings.h (1)

1-31: LGTM: Properly structured header with dual initialization pattern

The header file follows excellent C++ conventions and the dual function approach (initBindings and initBindingsEarly) suggests a thoughtful two-phase initialization pattern for complex runtime bindings.

cpp/tensorrt_llm/nanobind/batch_manager/bindings.h (1)

1-29: LGTM: Consistent and well-structured binding header

The header file maintains consistency with other nanobind binding headers in the codebase, following proper C++ conventions with correct copyright, include guards, and clear namespace organization.

tensorrt_llm/llmapi/llm_args.py (2)

33-33: LGTM! Import addition follows established patterns.

The import of CacheTransceiverBackendType is properly aliased and follows the naming convention used by other executor binding imports in this file.


895-895: LGTM! Enum conversion follows established patterns.

The conversion from string to enum using from_string() method is consistent with the pattern used throughout the codebase for type-safe enum handling in the pybind layer. This change improves robustness by ensuring proper enum conversion rather than passing raw strings.

tensorrt_llm/runtime/model_runner.py (1)

89-89: LGTM! Standardizes KVCacheType parsing with from_string method.

This change aligns with the codebase-wide standardization effort to use KVCacheType.from_string() for string-to-enum conversion, ensuring consistent and robust parsing throughout the TensorRT LLM codebase.

cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.h (1)

1-40: LGTM! Well-structured nanobind binding header.

The header follows standard patterns for nanobind bindings with:

  • Proper copyright and license headers
  • Correct include guard with #pragma once
  • Appropriate namespace organization
  • Consistent class structure with static initBindings methods
  • Matches the pattern established in other nanobind binding headers
cpp/tensorrt_llm/pybind/executor/bindings.cpp (1)

247-257: LGTM! Improves API clarity and type safety.

The lambda wrapper enhances the binding by:

  • Making the timeout unit explicit (timeout_ms vs timeout)
  • Providing type safety with proper conversion from double to int64_t for std::chrono::milliseconds
  • Correctly handling both timeout provided and nullopt cases
  • Aligning with the nanobind binding patterns used elsewhere in the codebase
cpp/tensorrt_llm/nanobind/CMakeLists.txt (3)

6-21: LGTM! Comprehensive expansion of nanobind source files.

The expanded source list properly includes all the new nanobind binding files across:

  • Batch manager components (algorithms, cache management, etc.)
  • Executor bindings
  • Runtime bindings including MOE support
  • Testing and user buffer bindings

This aligns with the PR's goal of providing comprehensive Python bindings via nanobind.


32-35: LGTM! Appropriate conditional NVSHMEM linking.

The conditional linking of NVSHMEM libraries when ENABLE_NVSHMEM is set is the correct approach for handling this optional dependency.


37-55: LGTM! Updated linking configuration supports expanded functionality.

The updates appropriately support the expanded nanobind bindings:

  • Addition of ${CUDA_NVML_LIB} for GPU memory management
  • CUDA toolkit stubs rpath for proper runtime linking on Linux
  • Standardized error message compilation flag
cpp/tensorrt_llm/pybind/bindings.cpp (2)

173-173: LGTM! Standardizes KVCacheType binding with from_string method.

This change aligns with the codebase-wide effort to standardize KVCacheType string-to-enum conversion using the dedicated from_string static method, ensuring consistency across both pybind and nanobind bindings.


358-361: LGTM! Improves error handling robustness.

Replacing the debug-only assertion with a runtime exception improves the robustness of the SamplingConfig deserialization by ensuring proper error handling in both debug and release builds.

cpp/tensorrt_llm/nanobind/common/bindTypes.h (3)

1-22: LGTM - proper header structure.

The copyright header and includes are well-structured and include only necessary dependencies.


24-31: Good template design.

The namespace organization and function signature follow nanobind conventions appropriately.


32-43: Excellent container binding implementation.

The binding correctly exposes standard set operations and Python special methods. The use of overload_cast for erase properly disambiguates method overloads, and the keep_alive<0, 1>() annotation on the iterator ensures memory safety.

jenkins/L0_Test.groovy (4)

67-69: Good configuration addition.

The NANOBIND_CONFIG field follows the established pattern for build configuration constants.


77-77: Consistent configuration entry.

The nanobind entry in BUILD_CONFIGS follows the established pattern with an appropriate tarball filename.


1749-1749: Well-structured test configuration.

The "A10-Nanobind" test stage configuration follows the established format and uses the appropriate test list l0_a10_nanobind.


1826-1828: Consistent configuration logic.

The nanobind detection logic follows the established pattern for other build configurations and is properly integrated into the existing conditional structure.

cpp/tensorrt_llm/nanobind/userbuffers/bindings.cpp (4)

1-26: Well-structured header and includes.

The copyright header, includes, and namespace aliases are appropriate and follow good practices.


32-36: Safe and well-designed class binding.

The UBBuffer binding correctly exposes read-only properties and safely converts the address pointer to intptr_t for Python interaction. This prevents unsafe pointer manipulation from Python while still providing access to the address value.


38-44: Consistent and safe function bindings.

The function bindings properly handle address conversions between intptr_t and void*, maintaining consistency with the UBBuffer class binding. The parameter and return type handling is appropriate.


45-47: Complete and proper binding implementation.

The final function binding and namespace closure are correctly implemented.

jenkins/Build.groovy (3)

50-54: LGTM: Configuration constants are properly defined.

The new nanobind configuration constants follow the established naming convention and are correctly annotated with @Field for pipeline shared variables.


65-69: LGTM: Nanobind build configurations are well-structured.

The new nanobind configurations correctly:

  • Add --binding_type nanobind parameter for nanobind-specific builds
  • Use distinct tarball names with "nanobind-" prefix to avoid conflicts
  • Inherit appropriate cmake variables and wheel architectures from their vanilla counterparts
  • Maintain consistency between x86_64 and aarch64 variants

Also applies to: 85-89


542-543: LGTM: Nanobind build stage is properly integrated.

The new "Build TRT-LLM Nanobind" stage correctly:

  • Uses conditional logic to select appropriate architecture-specific nanobind configuration
  • Follows the established pattern of existing parallel build stages
  • Integrates seamlessly with the existing prepareLLMBuild function
tests/unittest/bindings/test_executor_bindings.py (2)

1276-1276: API update looks correct

The change from datetime.timedelta(seconds=1) to 1000 milliseconds aligns with the updated nanobind binding API.


1518-1520: Good use of enum type

Using trtllm.CacheTransceiverBackendType.UCX instead of the string "UCX" improves type safety and API consistency.

cpp/tensorrt_llm/nanobind/batch_manager/algorithms.cpp (1)

55-178: Well-structured nanobind implementation

The nanobind bindings for batch manager algorithms are well-implemented with:

  • Consistent binding patterns across all algorithm classes
  • Proper handling of optional parameters with std::nullopt defaults
  • Appropriate tensor conversions between PyTorch and TensorRT-LLM formats
  • Clean structured return types for Python consumption
cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.cpp (1)

40-131: Comprehensive and well-designed conversion implementation

The LlmRequest conversion implementation is excellent:

  • Clean helper functions for optional tensor conversion
  • Proper CUDA stream unwrapping in callback adapter
  • Deep copies of token vectors with appropriate shared_ptr usage
  • All 49 parameters properly handled with clear comments
  • Consistent use of from_torch() for tensor conversions
cpp/tensorrt_llm/nanobind/batch_manager/llmRequest.h (1)

35-38: Good documentation of the nanobind limitation workaround.

The comment clearly explains why c10::Stream is used instead of c10::cuda::CUDAStream and provides a reference to the test example.

cpp/tensorrt_llm/nanobind/testing/modelSpecBinding.cpp (1)

46-85: Well-structured ModelSpec bindings.

The bindings correctly use reference_internal policy for method chaining and provide a proper copy implementation.

cpp/tensorrt_llm/nanobind/batch_manager/cacheTransceiver.cpp (1)

77-104: Well-implemented cache transceiver bindings.

The trampoline class correctly implements all pure virtual methods, and the bindings properly expose the C++ API to Python with appropriate naming conventions.

cpp/tensorrt_llm/nanobind/executor/executor.h (1)

30-127: Well-designed executor wrapper with proper GIL handling.

The Executor class provides a comprehensive Python binding interface with:

  • Multiple constructor overloads for different initialization scenarios
  • Proper GIL release in blocking operations (awaitResponses methods)
  • Clean RAII with unique_ptr for the underlying executor
  • Appropriate context manager protocol support
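
A minimal, self-contained sketch of the GIL-release pattern referenced above; the module and function names are made up for illustration, and only the nb::gil_scoped_release usage is the point:

    #include <chrono>
    #include <thread>

    #include <nanobind/nanobind.h>

    namespace nb = nanobind;

    // Release the GIL around a blocking C++ call so other Python threads can
    // make progress while we wait; the GIL is re-acquired automatically when
    // `release` goes out of scope.
    int blocking_wait(int milliseconds)
    {
        nb::gil_scoped_release release;
        std::this_thread::sleep_for(std::chrono::milliseconds(milliseconds));
        return milliseconds;
    }

    NB_MODULE(gil_demo, m)
    {
        m.def("blocking_wait", &blocking_wait);
    }
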
cpp/tensorrt_llm/nanobind/bindings.cpp (1)

79-79: Consider the implications of disabling leak warnings.

Disabling nanobind leak warnings might hide memory management issues. Consider enabling them during development/testing and only disabling in production if necessary.

Is there a specific reason for disabling leak warnings? If it's due to known false positives, consider documenting the rationale.

cpp/tensorrt_llm/nanobind/executor/executor.cpp (2)

179-188: Good practice: GIL release during shutdown

The implementation correctly releases the Python GIL before calling the executor shutdown, allowing spawned threads to make forward progress. The logging statements help with debugging.


158-164: Fix incorrect data pointer assignment for encoder buffer

The encoder data pointer is incorrectly assigned from the decoder buffer.

Apply this fix:

 uint8_t const* encoderData = reinterpret_cast<uint8_t const*>(encoderEngineBuffer.data());
 size_t encoderSize = encoderEngineBuffer.size();
-uint8_t const* decoderData = reinterpret_cast<uint8_t const*>(decoderEngineBuffer.data());
+uint8_t const* decoderData = reinterpret_cast<uint8_t const*>(decoderEngineBuffer.data());
 size_t decoderSize = decoderEngineBuffer.size();

Likely an incorrect or invalid review comment.

@tensorrt-cicd
Collaborator

PR_Github #12383 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9202 completed with status: 'FAILURE'

@Linda-Stadter
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #12390 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #12390 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9209 completed with status: 'FAILURE'

Contributor

@coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
cpp/tensorrt_llm/nanobind/executor/request.cpp (1)

902-908: Make the parameter a const reference for better const-correctness.

The function doesn't modify the input bytes, so the parameter should be a const reference.

 m.def("deserialize_result",
-    [](nb::bytes& x)
+    [](nb::bytes const& x)
     {
         std::string str(x.c_str(), x.size());
         std::istringstream is(str);
         return tle::serialize_utils::deserialize<tle::Result>(is);
     });
♻️ Duplicate comments (4)
cpp/tensorrt_llm/nanobind/executor/executor.cpp (1)

98-112: Add validation for numpy array memory layout and null pointer check

The previous review comment about validating numpy array contiguity and adding null pointer checks is still applicable to this refactored code.

tests/unittest/bindings/test_executor_bindings.py (1)

70-101: Add assertions to validate the executor creation

The test creates an executor with managed weights but doesn't verify that it was created successfully. The executor variable is assigned but never used.

cpp/tensorrt_llm/nanobind/executor/request.cpp (2)

87-87: Fix undefined behavior in __setstate__ methods.

All __setstate__ methods use placement new without first calling the destructor, causing undefined behavior that could lead to resource leaks.

Also applies to: 187-188, 208-210, 270-271, 289-290, 309-310, 329-329, 347-348, 377-379, 393-396, 454-457, 461-463, 516-518, 552-553, 583-603, 749-755, 780-782, 803-804, 827-832, 858-872, 919-920


608-683: Constructor has too many parameters.

The Request constructor with 38 parameters is difficult to use and maintain. Consider using a builder pattern or configuration object.

🧹 Nitpick comments (3)
tests/unittest/bindings/test_executor_bindings.py (3)

1204-1211: Remove commented debug code

This commented debug code should be removed as it's not needed in the test.

-    # for prop in ['is_final', 'output_token_ids', 'cum_log_probs', 'log_probs', 'context_logits', 'generation_logits', 'encoder_output', 'finish_reasons', 'sequence_index', 'is_sequence_final', 'decoding_iter', 'context_phase_params', 'request_perf_metrics']:
-    #     try:
-    #         value = getattr(result, prop)
-    #         pickle.dumps(value)
-    #         print(f"✓ {prop} pickles successfully")
-    #     except Exception as e:
-    #         print(f"✗ {prop} fails: {e}")
-

1923-1930: Consider removing debug print statements

Debug print statements should be removed or converted to proper logging if they need to be retained for diagnostic purposes.

     def logits_post_processor_batched(
             req_id_batch: tp.List[int], logits_batch: tp.List[torch.Tensor],
             ids_batch: tp.List[tp.List[tp.List[int]]], stream_ptr: int,
             client_id_batch: tp.List[tp.Optional[int]]):
         for i, (req_id, client_id, logits) in enumerate(
                 zip(req_id_batch, client_id_batch, logits_batch)):
-            print(f"DEBUG: Request {i}: req_id={req_id}, client_id={client_id}")
-            print(
-                f"DEBUG: Request {i}: logits dtype={logits.dtype}, shape={logits.shape}, device={logits.device}"
-            )
             assert client_id == 123

1199-1200: Format long line for better readability

Line 1200 exceeds the maximum line length limit.

-    result.context_phase_params = trtllm.ContextPhaseParams([1, 2], 123, None,
-                                                            None)
+    result.context_phase_params = trtllm.ContextPhaseParams(
+        [1, 2], 123, None, None)
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7ede3a9 and 58bb0a4.

📒 Files selected for processing (12)
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp (2 hunks)
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp (2 hunks)
  • cpp/tensorrt_llm/nanobind/bindings.cpp (2 hunks)
  • cpp/tensorrt_llm/nanobind/common/bindTypes.h (2 hunks)
  • cpp/tensorrt_llm/nanobind/common/customCasters.h (3 hunks)
  • cpp/tensorrt_llm/nanobind/executor/executor.cpp (2 hunks)
  • cpp/tensorrt_llm/nanobind/executor/request.cpp (8 hunks)
  • cpp/tensorrt_llm/pybind/bindings.cpp (1 hunks)
  • scripts/build_wheel.py (2 hunks)
  • tensorrt_llm/llmapi/llm_args.py (2 hunks)
  • tests/unittest/bindings/test_bindings_ut.py (0 hunks)
  • tests/unittest/bindings/test_executor_bindings.py (5 hunks)
💤 Files with no reviewable changes (1)
  • tests/unittest/bindings/test_bindings_ut.py
✅ Files skipped from review due to trivial changes (3)
  • scripts/build_wheel.py
  • cpp/tensorrt_llm/nanobind/batch_manager/bindings.cpp
  • cpp/tensorrt_llm/pybind/bindings.cpp
🚧 Files skipped from review as they are similar to previous changes (5)
  • cpp/tensorrt_llm/nanobind/bindings.cpp
  • cpp/tensorrt_llm/nanobind/common/bindTypes.h
  • cpp/tensorrt_llm/nanobind/common/customCasters.h
  • cpp/tensorrt_llm/nanobind/batch_manager/kvCacheManager.cpp
  • tensorrt_llm/llmapi/llm_args.py
🧰 Additional context used
🧠 Learnings (2)
cpp/tensorrt_llm/nanobind/executor/executor.cpp (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.374Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

tests/unittest/bindings/test_executor_bindings.py (1)

Learnt from: amitz-nv
PR: #5616
File: tensorrt_llm/executor/worker.py:375-384
Timestamp: 2025-07-17T09:01:27.374Z
Learning: In tensorrt_llm/executor/worker.py, the LoRA adapter cache optimization logic that checks is_adapter_in_cpu_cache() and conditionally passes None for weights/config has a known race condition issue that cannot be solved with simple error handling or verification checks. This is a known limitation that requires a more comprehensive solution.

🪛 Ruff (0.12.2)
tests/unittest/bindings/test_executor_bindings.py

97-97: Local variable executor is assigned to but never used

Remove assignment to unused variable executor

(F841)


1203-1203: Line too long (260 > 120)

(E501)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pre-commit Check
🔇 Additional comments (3)
cpp/tensorrt_llm/nanobind/executor/executor.cpp (1)

146-147: LGTM!

The explicit cast to nb::object correctly aligns with the refactored numpyToTensor function signature.

cpp/tensorrt_llm/nanobind/executor/request.cpp (2)

213-228: Constructor implementation follows correct nanobind pattern.

The use of placement new on self in the __init__ lambda is the proper way to construct objects in nanobind Python bindings.


468-487: Well-structured handling of optional opaque_state parameter.

The constructor correctly differentiates between cases with and without opaque_state, properly converting Python bytes to C++ vector when needed.
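
A sketch of the bytes-to-vector conversion being described, based only on the nb::bytes accessors (c_str() and size()) that already appear elsewhere in this diff; the helper name and the optional wrapper are illustrative:

    #include <cstdint>
    #include <optional>
    #include <vector>

    #include <nanobind/nanobind.h>

    namespace nb = nanobind;

    // Copy an optional Python bytes payload into a byte vector, but only when
    // it was actually provided.
    std::optional<std::vector<std::uint8_t>> toByteVector(std::optional<nb::bytes> const& maybeBytes)
    {
        if (!maybeBytes)
        {
            return std::nullopt;
        }
        auto const* data = reinterpret_cast<std::uint8_t const*>(maybeBytes->c_str());
        return std::vector<std::uint8_t>(data, data + maybeBytes->size());
    }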

@Linda-Stadter
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #12570 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #12570 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9349 completed with status: 'FAILURE'

@Linda-Stadter
Collaborator Author

/bot run

Signed-off-by: Linda-Stadter <[email protected]>
Signed-off-by: Linda-Stadter <[email protected]>
@Linda-Stadter
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #12721 [ run ] triggered by Bot

@Linda-Stadter
Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #12735 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #12721 [ run ] completed with state ABORTED

@Linda-Stadter
Collaborator Author

/bot run

@tensorrt-cicd
Collaborator

PR_Github #12754 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #12735 [ run ] completed with state ABORTED
/LLM/main/L0_MergeRequest_PR pipeline #9481 completed with status: 'FAILURE'

@tensorrt-cicd
Collaborator

PR_Github #12754 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9496 completed with status: 'SUCCESS'
