Conversation

@netanel-haber
Collaborator

@netanel-haber netanel-haber commented Jul 28, 2025

THIS IS A DRAFT PR AND IS NOT CURRENTLY MEANT FOR REVIEW.

It branches off of @dcampora's PR, which is where these changes are actually introduced, so please review them there. I opened this PR against TRTLLM main only so that I could run the CI.

How to get this branch:

@coderabbitai
Contributor

coderabbitai bot commented Jul 28, 2025

Note

Reviews paused

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.
📝 Walkthrough

This update systematically renames configuration parameters, method arguments, and internal variables from enable_trtllm_sampler to use_torch_sampler and from maxBatchSize to maxNumSequences across Python, C++, and YAML files. Additional robustness improvements, logging, and error messaging enhancements are included, along with updates to tests and documentation to reflect the new naming conventions and logic.
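
Because the new flag inverts the old one's meaning (use_torch_sampler=True selects the Torch sampler, whereas enable_trtllm_sampler=True used to select the TRTLLM sampler), the mapping between the two can be captured in a tiny migration helper. This is a hypothetical sketch for illustration only, not code from the PR:

    def migrate_sampler_flag(old_kwargs: dict) -> dict:
        """Map the removed enable_trtllm_sampler flag onto the new use_torch_sampler flag."""
        new_kwargs = dict(old_kwargs)
        if "enable_trtllm_sampler" in new_kwargs:
            new_kwargs["use_torch_sampler"] = not new_kwargs.pop("enable_trtllm_sampler")
        return new_kwargs

    # enable_trtllm_sampler=True (old) is equivalent to use_torch_sampler=False (new).
    assert migrate_sampler_flag({"enable_trtllm_sampler": True}) == {"use_torch_sampler": False}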

Changes

Cohort: Sampler Flag Renaming (Python API, Configs, CLI, Tests)
Files: examples/llm-api/quickstart_advanced.py, tensorrt_llm/llmapi/llm_args.py, tensorrt_llm/_torch/pyexecutor/config.py, tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml, tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml, tests/integration/defs/accuracy/test_llm_api_pytorch.py, tests/unittest/_torch/modeling/test_modeling_nemotron_h.py, tests/unittest/_torch/speculative/test_draft_target.py, tests/unittest/_torch/speculative/test_eagle3.py, tests/unittest/_torch/test_overlap_scheduler.py, tests/unittest/_torch/test_return_logits.py, tests/unittest/api_stability/references/llm.yaml
Summary: Renamed sampler configuration flag from enable_trtllm_sampler to use_torch_sampler throughout argument parsing, config classes, CLI, and tests. Updated logic and docstrings to reflect the new semantics. Adjusted related test and config YAML files accordingly.

Cohort: Sampler Instantiation and Decoding Mode Logic
Files: tensorrt_llm/_torch/pyexecutor/_util.py
Summary: Updated sampler instantiation logic to prefer TorchSampler under more conditions and refined decoding mode selection with robust attribute access and explicit fallbacks.

Cohort: Sampler and State Class Refactoring
Files: tensorrt_llm/_torch/pyexecutor/sampler.py
Summary: Renamed num_seq_slots to max_num_sequences in TorchSampler. Changed finalize_events in SampleStateTRTLLM to optional. Updated TRTLLMSampler to use max_num_sequences instead of max_batch_size. Improved handling of missing finalize_events.

Cohort: Executor Logging
Files: tensorrt_llm/_torch/pyexecutor/py_executor_creator.py
Summary: Added logging of the sampler type after instantiation in the executor creation process.

Cohort: Batch Size to Sequence Count Renaming (C++ API, Runtime, Bindings)
Files: cpp/include/tensorrt_llm/runtime/decoderState.h, cpp/include/tensorrt_llm/runtime/gptDecoder.h, cpp/include/tensorrt_llm/runtime/gptDecoderBatched.h, cpp/include/tensorrt_llm/runtime/iGptDecoderBatched.h, cpp/tensorrt_llm/runtime/decoderState.cpp, cpp/tensorrt_llm/runtime/gptDecoder.cpp, cpp/tensorrt_llm/runtime/gptDecoderBatched.cpp, cpp/tensorrt_llm/pybind/runtime/bindings.cpp, cpp/tensorrt_llm/batch_manager/createNewDecoderRequests.cpp
Summary: Renamed parameters, member variables, and method arguments from maxBatchSize to maxNumSequences in decoder and runtime classes, methods, and Python bindings. Updated all usages, comments, and related API signatures for clarity and consistency.

Cohort: Sampling Kernel Parameter Check Enhancement
Files: cpp/tensorrt_llm/kernels/samplingTopKKernels.h
Summary: Added error messages to parameter range checks using TLLM_CHECK_WITH_INFO for maxTopP and maxTopK.

Cohort: Layer Utility Error Reporting
Files: cpp/tensorrt_llm/layers/layerUtils.h
Summary: Enhanced maxOfBatchSlots to check for negative max values and log detailed error information if detected.

Cohort: Test Data and Model Fixture Updates
Files: tests/integration/defs/test_e2e.py, tests/unittest/_torch/test_beam_search.py, tests/unittest/llmapi/apps/_test_openai_misc.py, tests/unittest/_torch/test_trtllm_sampler.py
Summary: Updated expected keywords in image modality test, changed model fixture to Qwen3-0.6B, adjusted max sequence length, and removed or updated sampler-related parameters in test setups.

Sequence Diagram(s)

sequenceDiagram
    participant CLI/User
    participant LLM Args Parser
    participant LLM Config
    participant LLM Constructor
    participant Sampler Instantiator

    CLI/User->>LLM Args Parser: Pass --use_torch_sampler flag
    LLM Args Parser->>LLM Config: Set use_torch_sampler in config
    LLM Config->>LLM Constructor: Pass use_torch_sampler param
    LLM Constructor->>Sampler Instantiator: Instantiate sampler (Torch or TRTLLM)
    Sampler Instantiator-->>LLM Constructor: Return sampler instance

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

Suggested reviewers

  • dcampora
  • HuiGao-NV
  • venkywonka

@netanel-haber
Collaborator Author

/bot run --only-multi-gpu-test

@coderabbitai coderabbitai bot added the Doc <NV>TRTLLM's textual/illustrative materials: API refs, guides, tutorials. Improvement & clarity. label Jul 28, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)

589-589: Fix line length violations for better readability.

Static analysis indicates lines exceed the 120-character limit. Consider breaking these long lines for better code readability.

For line 589:

-    if pytorch_backend_config.use_torch_sampler or pytorch_backend_config.enable_mixed_sampler or engine.spec_config is not None:
+    if (pytorch_backend_config.use_torch_sampler or 
+        pytorch_backend_config.enable_mixed_sampler or 
+        engine.spec_config is not None):

For line 659:

-            "Model is built with 'explicit draft tokens' decoding, but decoding mode is something else. Overwriting decoding mode."
+            "Model is built with 'explicit draft tokens' decoding, but decoding mode is "
+            "something else. Overwriting decoding mode."

Also applies to: 659-659

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 03632a6 and af3fd56.

📒 Files selected for processing (19)
  • examples/llm-api/quickstart_advanced.py (2 hunks)
  • tensorrt_llm/_torch/pyexecutor/_util.py (6 hunks)
  • tensorrt_llm/_torch/pyexecutor/config.py (1 hunks)
  • tensorrt_llm/_torch/pyexecutor/sampler.py (3 hunks)
  • tensorrt_llm/llmapi/llm_args.py (2 hunks)
  • tests/integration/defs/accuracy/test_llm_api_pytorch.py (3 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1 hunks)
  • tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2 hunks)
  • tests/integration/defs/test_e2e.py (1 hunks)
  • tests/integration/test_lists/waives.txt (0 hunks)
  • tests/unittest/_torch/modeling/test_modeling_nemotron_h.py (1 hunks)
  • tests/unittest/_torch/speculative/test_draft_target.py (1 hunks)
  • tests/unittest/_torch/speculative/test_eagle3.py (1 hunks)
  • tests/unittest/_torch/test_beam_search.py (0 hunks)
  • tests/unittest/_torch/test_overlap_scheduler.py (4 hunks)
  • tests/unittest/_torch/test_return_logits.py (4 hunks)
  • tests/unittest/_torch/test_trtllm_sampler.py (0 hunks)
  • tests/unittest/api_stability/references/llm.yaml (1 hunks)
  • tests/unittest/llmapi/apps/_test_openai_misc.py (2 hunks)
💤 Files with no reviewable changes (3)
  • tests/unittest/_torch/test_beam_search.py
  • tests/unittest/_torch/test_trtllm_sampler.py
  • tests/integration/test_lists/waives.txt
🧰 Additional context used
🧠 Learnings (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)

Learnt from: yechank-nvidia
PR: #6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using from_shared_tensor() is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call strip_for_generation() to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
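
Purely as an illustration of the flow this learning describes, here is a rough control-flow sketch; the request object, its attributes, and the phase check are placeholders, with only from_shared_tensor() and strip_for_generation() taken from the note above:

    def prepare_multimodal(request):
        """Hypothetical sketch only; attribute and phase names are placeholders."""
        if request.is_context_phase:  # placeholder attribute name
            # Context phase: recover the shared tensor data exactly once.
            request.mm_data = request.mm_handle.from_shared_tensor()
        else:
            # Generation phase: reuse the already-recovered tensors and only strip
            # the multimodal data that is no longer needed.
            request.mm_data = request.mm_data.strip_for_generation()
        return request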

🧬 Code Graph Analysis (2)
tests/unittest/_torch/test_overlap_scheduler.py (2)
tests/unittest/_torch/test_trtllm_sampler.py (3)
  • create_llm (25-38)
  • model_path (21-22)
  • test_case (15-17)
tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (1)
  • model_path (31-44)
tensorrt_llm/_torch/pyexecutor/_util.py (3)
tensorrt_llm/_torch/pyexecutor/sampler.py (3)
  • TorchSampler (208-457)
  • EarlyStopSampler (70-97)
  • TRTLLMSampler (486-956)
tensorrt_llm/logger.py (1)
  • warning (131-132)
cpp/include/tensorrt_llm/executor/types.h (3)
  • DecodingMode (532-574)
  • DecodingMode (813-816)
  • Eagle (582-591)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/_util.py

589-589: Line too long (129 > 120)

(E501)


659-659: Line too long (131 > 120)

(E501)

🔇 Additional comments (27)
tests/integration/defs/test_e2e.py (2)

2030-2030: LGTM: Test data update reflects model behavior change.

The keyword change from "clouds" to "waves" appears to be a legitimate update to match the actual output behavior of the qwen2.5-vl-7b-instruct model.


2032-2035: LGTM: Improved traffic keyword expectations.

The updated keywords improve the test by:

  • Removing the duplicate "traffic" entry
  • Replacing specific terms like "bus" and "police" with more general traffic concepts
  • Using more accurate descriptors like "lanes", "congestion", and "road"

This appears to reflect improved model behavior that produces more general traffic descriptions.

tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2)

14-14: LGTM: Standardized sampler configuration parameter.

The change from enable_trtllm_sampler: True to use_torch_sampler: False maintains the same behavior (using TRTLLM sampler) while aligning with the codebase-wide standardization effort.


30-30: LGTM: Consistent sampler configuration across server types.

The generation_servers section correctly uses the same standardized use_torch_sampler: False parameter, maintaining consistency with the context_servers configuration.

tests/unittest/_torch/speculative/test_draft_target.py (1)

44-44: LGTM: Explicit sampler configuration improves test determinism.

Adding use_torch_sampler=True to the common configuration ensures both the speculative and reference LLM instances use the same sampler, making the test behavior explicit and deterministic.

tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1)

15-15: LGTM: Standardized sampler configuration for NGram test.

The addition of use_torch_sampler: True to the context_servers configuration follows the codebase standardization effort and explicitly specifies the sampler choice for this test scenario.

tests/unittest/_torch/speculative/test_eagle3.py (1)

63-63: LGTM: Consistent sampler configuration for Eagle3 test.

Adding use_torch_sampler=True to the common configuration ensures both speculative and reference LLM instances use the same sampler, providing consistent and deterministic test behavior across all Eagle3 test parameter combinations.

tests/unittest/api_stability/references/llm.yaml (1)

106-109: LGTM: Parameter renaming aligns with standardization effort.

The renaming from enable_trtllm_sampler to use_torch_sampler correctly reflects the new semantics, and the status promotion from prototype to beta indicates the feature is becoming more stable.

tests/unittest/_torch/modeling/test_modeling_nemotron_h.py (1)

44-44: LGTM: Semantic equivalence maintained.

The change from enable_trtllm_sampler=True to use_torch_sampler=False correctly maintains the same behavior (using TRTLLM sampler) while adopting the new parameter naming convention.

tensorrt_llm/_torch/pyexecutor/config.py (1)

57-60: LGTM: Clean parameter renaming with updated documentation.

The field renaming from enable_trtllm_sampler to use_torch_sampler is semantically correct, and the updated docstring clearly explains the new behavior. The default value of False maintains backward compatibility by defaulting to the TRTLLM sampler.
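
As a rough sketch of what the renamed field plausibly looks like (the name, default, and intent come from the diff; the surrounding dataclass shape is assumed):

    from dataclasses import dataclass

    @dataclass
    class PyTorchConfig:
        # ... other backend options elided ...
        use_torch_sampler: bool = False
        """If True, use the Torch sampler; if False (the default), keep the TRTLLM sampler."""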

examples/llm-api/quickstart_advanced.py (2)

57-59: LGTM: Command-line argument updated consistently.

The argument parser correctly updates from --enable_trtllm_sampler to --use_torch_sampler while maintaining the same default behavior.


208-208: LGTM: LLM constructor parameter updated consistently.

The LLM constructor call correctly uses the new use_torch_sampler parameter, maintaining consistency with the updated command-line argument.
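
A minimal sketch of the plumbing these two comments describe; the flag name matches the diff, while the parser setup and model path are illustrative:

    import argparse

    from tensorrt_llm import LLM  # import path assumed for illustration

    parser = argparse.ArgumentParser()
    parser.add_argument("--use_torch_sampler", default=False, action="store_true",
                        help="Use the Torch sampler instead of the TRTLLM sampler.")
    args = parser.parse_args()

    # The parsed flag is forwarded unchanged to the LLM constructor.
    llm = LLM(model="Qwen3/Qwen3-0.6B-Base", use_torch_sampler=args.use_torch_sampler)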

tests/integration/defs/accuracy/test_llm_api_pytorch.py (3)

189-189: LGTM! Parameter renaming aligns with codebase refactoring.

The change from enable_trtllm_sampler=True to use_torch_sampler=True is consistent with the broader parameter standardization effort described in the PR objectives.


222-222: LGTM! Parameter renaming maintains test intent.

The change from enable_trtllm_sampler=False to use_torch_sampler=False correctly maintains the original test behavior while aligning with the parameter standardization.


648-649: LGTM! Reasonable test configuration addition.

The addition of the max_batch_size=64 parameter aligns with similar configurations used in other tests and likely optimizes the test setup for the Qwen3-8B model.

tests/unittest/llmapi/apps/_test_openai_misc.py (2)

15-15: LGTM! Model update for test optimization.

The change to use "Qwen3/Qwen3-0.6B-Base" instead of the TinyLlama model is a reasonable test configuration update, likely providing better performance or compatibility for the test suite.


28-32: LGTM! Well-documented parameter adjustment.

The max_seq_len update to "32768" correctly aligns with the new model's max_position_embeddings. The added comment provides valuable context for future maintainers.

tests/unittest/_torch/test_return_logits.py (2)

19-44: LGTM! Consistent parameter renaming with correct logic adaptation.

The changes from enable_trtllm_sampler to use_torch_sampler are well-executed:

  • Parameter names updated in decorator, function signature, and LLM instantiation
  • Conditional logic correctly inverted from if not enable_trtllm_sampler to if use_torch_sampler
  • Test behavior remains consistent with the new parameter semantics

86-111: LGTM! Consistent parameter updates in async test function.

The parameter renaming in the async test function follows the same correct pattern as the synchronous version, maintaining test consistency while aligning with the codebase refactoring.

tests/unittest/_torch/test_overlap_scheduler.py (1)

24-77: LGTM! Comprehensive and consistent parameter refactoring.

All aspects of the parameter renaming from enable_trtllm_sampler to use_torch_sampler are correctly implemented:

  • Function signature and dictionary key updated consistently
  • Pytest parameterization and test function parameter updated
  • Logic inversion for stop_words setting correctly maintains original behavior
  • All function calls updated with new parameter name

The changes maintain test functionality while aligning with the broader codebase standardization effort.

tensorrt_llm/llmapi/llm_args.py (2)

1898-1902: LGTM! Improved field naming and documentation.

The renaming from enable_trtllm_sampler to use_torch_sampler with updated semantics makes the configuration more intuitive. The description clearly indicates that True means using the Torch sampler instead of the TRTLLM sampler, and the status upgrade to "beta" reflects increased stability.


2198-2198: LGTM! Consistent parameter name update.

The parameter name change from enable_trtllm_sampler to use_torch_sampler in the PyTorchConfig constructor call correctly reflects the field rename and maintains consistency across the codebase.

tensorrt_llm/_torch/pyexecutor/sampler.py (2)

481-482: LGTM! Well-documented optional field enhancement.

Making finalize_events optional with a clear docstring explanation improves the flexibility of SampleStateTRTLLM creation, especially for the _forward_step_inter_pp use case mentioned.


894-894: LGTM! Proper defensive null check.

The addition of finalize_events is not None prevents potential errors when finalize_events is None, which aligns well with the earlier change making this field optional.
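
A condensed sketch of the two changes above, using placeholder types rather than the real SampleStateTRTLLM fields:

    from dataclasses import dataclass
    from typing import Dict, Optional

    @dataclass
    class SampleStateTRTLLM:  # simplified stand-in for the real class
        finalize_events: Optional[Dict[str, object]] = None
        """May be None, e.g. when the state is built by _forward_step_inter_pp."""

    def finalize_requests(state: SampleStateTRTLLM) -> None:
        # Defensive guard: only touch the finalize events if they were recorded.
        if state.finalize_events is not None:
            for name in state.finalize_events:
                print(f"synchronizing {name}")  # stand-in for real event synchronization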

tensorrt_llm/_torch/pyexecutor/_util.py (3)

589-590: LGTM! Flag renaming aligns with PR objectives.

The change from enable_trtllm_sampler to use_torch_sampler correctly reflects the standardized configuration parameter naming throughout the codebase. The broadened condition appropriately selects TorchSampler in more scenarios.


594-596: LGTM! Correct fallback sampler for generation models.

The fallback to TRTLLMSampler with proper parameters aligns with the reversed semantics of the use_torch_sampler flag. When the flag is False (default), using TRTLLMSampler is the appropriate behavior.


618-619: LGTM! Consistent defensive attribute access pattern.

The systematic replacement of direct attribute access with getattr(executor_config.speculative_config, "attribute_name", False) improves robustness by preventing AttributeError when attributes are missing. The consistent use of False as the default value across all cases maintains logical coherence.

Also applies to: 626-627, 637-638, 645-646, 656-657, 664-666, 676-677, 684-685, 695-696
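
A condensed sketch of the selection logic and defensive-access pattern described above; the config and engine objects are stand-ins, and the speculative attribute name is illustrative:

    def choose_sampler(cfg, engine, executor_config):
        # Broadened condition: prefer the Torch sampler when the flag is set, when
        # mixed sampling is enabled, or when a speculative-decoding config exists.
        if (cfg.use_torch_sampler or cfg.enable_mixed_sampler
                or engine.spec_config is not None):
            return "TorchSampler"  # placeholder for TorchSampler(...)

        # Defensive attribute access: a missing attribute on speculative_config
        # defaults to False instead of raising AttributeError.
        spec = executor_config.speculative_config
        if getattr(spec, "is_explicit_draft_tokens", False):  # attribute name illustrative
            decoding_mode = "ExplicitDraftTokens"
        else:
            decoding_mode = "Auto"
        return "TRTLLMSampler", decoding_mode  # placeholder for TRTLLMSampler(...)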

@tensorrt-cicd
Collaborator

PR_Github #13213 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #13213 [ run ] completed with state FAILURE

@coderabbitai coderabbitai bot requested a review from liji-nv July 28, 2025 15:45
@netanel-haber
Collaborator Author

/bot run --only-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #13215 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #13215 [ run ] completed with state FAILURE

@nv-guomingz nv-guomingz changed the title from "[TRTLLM-6121] TRTLLMSampler PP support" to "[TRTLLM-6121] TRTLLM Sampler PP support" on Jul 28, 2025
@netanel-haber netanel-haber changed the title from "[TRTLLM-6121] TRTLLM Sampler PP support" to "NOT MEANT FOR REVIEW YET [TRTLLM-6121] TRTLLM Sampler PP support" on Jul 29, 2025
@netanel-haber
Collaborator Author

/bot run --only-multi-gpu-test

@tensorrt-cicd
Collaborator

PR_Github #13359 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #13359 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #9989 (Partly Tested) completed with status: 'FAILURE'

@netanel-haber
Collaborator Author

/bot run --only-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #13381 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #13381 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10007 (Partly Tested) completed with status: 'FAILURE'

@netanel-haber
Collaborator Author

/bot run --only-multi-gpu-test --disable-fail-fast

@coderabbitai coderabbitai bot requested a review from HuiGao-NV July 30, 2025 12:09
dcampora and others added 16 commits August 5, 2025 14:30
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
…coder classes

- Updated the parameter names and related comments in the DecoderState and GptDecoder classes to reflect the change from maxBatchSize to maxNumSequences.
- Adjustments were made in the setup methods, member variables, and associated bindings in the Python interface.
- This change improves clarity regarding the number of sequences being processed.

Signed-off-by: Robin Kobus <[email protected]>
`Optional` to accommodate `_forward_step_inter_pp` which creates a `SampleState` without `finalize_events`

Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Netanel Haber <[email protected]>

something

Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
@netanel-haber netanel-haber force-pushed the user/nhaber/fix/TRTLLM-6121-trtllm-sampler-pp-support branch from 400138a to dacf557 on August 5, 2025 21:05
@netanel-haber
Collaborator Author

/bot run --only-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #14185 [ run ] triggered by Bot

@netanel-haber
Collaborator Author

/bot run --only-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #14189 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #14185 [ run ] completed with state ABORTED

@tensorrt-cicd
Collaborator

PR_Github #14189 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10714 (Partly Tested) completed with status: 'FAILURE'

@netanel-haber
Collaborator Author

/bot run --only-multi-gpu-test --disable-fail-fast

@tensorrt-cicd
Collaborator

PR_Github #14267 [ run ] triggered by Bot

@tensorrt-cicd
Collaborator

PR_Github #14267 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #10772 (Partly Tested) completed with status: 'SUCCESS'
