NOT MEANT FOR REVIEW YET [TRTLLM-6121] TRTLLM Sampler PP support #6415
Conversation
Note: Reviews paused. Use the following commands to manage reviews:
📝 Walkthrough
This update systematically renames configuration parameters, method arguments, and internal variables from `enable_trtllm_sampler` to `use_torch_sampler` across the codebase.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CLI/User
    participant LLM Args Parser
    participant LLM Config
    participant LLM Constructor
    participant Sampler Instantiator
    CLI/User->>LLM Args Parser: Pass --use_torch_sampler flag
    LLM Args Parser->>LLM Config: Set use_torch_sampler in config
    LLM Config->>LLM Constructor: Pass use_torch_sampler param
    LLM Constructor->>Sampler Instantiator: Instantiate sampler (Torch or TRTLLM)
    Sampler Instantiator-->>LLM Constructor: Return sampler instance
```
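A rough Python sketch of the selection step the diagram ends with, using hypothetical stand-ins for the real TensorRT-LLM classes and config (only the flag name `use_torch_sampler` comes from the PR):

```python
from dataclasses import dataclass


class TorchSampler:      # stand-in for the PyTorch-based sampler
    pass


class TRTLLMSampler:     # stand-in for the C++-backed TRTLLM sampler
    pass


@dataclass
class PyTorchConfig:     # hypothetical slice of the backend config
    use_torch_sampler: bool = False   # default keeps the TRTLLM sampler


def instantiate_sampler(config: PyTorchConfig):
    # True selects the Torch sampler; False (the default) keeps the TRTLLM sampler.
    return TorchSampler() if config.use_torch_sampler else TRTLLMSampler()


print(type(instantiate_sampler(PyTorchConfig(use_torch_sampler=True))).__name__)  # TorchSampler
```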
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20–25 minutes

Suggested reviewers
/bot run --only-multi-gpu-test
Actionable comments posted: 1
🧹 Nitpick comments (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
589-589: Fix line length violations for better readability. Static analysis indicates lines exceed the 120-character limit. Consider breaking these long lines for better code readability.
For line 589:
```diff
-    if pytorch_backend_config.use_torch_sampler or pytorch_backend_config.enable_mixed_sampler or engine.spec_config is not None:
+    if (pytorch_backend_config.use_torch_sampler or
+            pytorch_backend_config.enable_mixed_sampler or
+            engine.spec_config is not None):
```

For line 659:

```diff
-    "Model is built with 'explicit draft tokens' decoding, but decoding mode is something else. Overwriting decoding mode."
+    "Model is built with 'explicit draft tokens' decoding, but decoding mode is "
+    "something else. Overwriting decoding mode."
```

Also applies to: 659-659
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (19)
- examples/llm-api/quickstart_advanced.py (2 hunks)
- tensorrt_llm/_torch/pyexecutor/_util.py (6 hunks)
- tensorrt_llm/_torch/pyexecutor/config.py (1 hunks)
- tensorrt_llm/_torch/pyexecutor/sampler.py (3 hunks)
- tensorrt_llm/llmapi/llm_args.py (2 hunks)
- tests/integration/defs/accuracy/test_llm_api_pytorch.py (3 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1 hunks)
- tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2 hunks)
- tests/integration/defs/test_e2e.py (1 hunks)
- tests/integration/test_lists/waives.txt (0 hunks)
- tests/unittest/_torch/modeling/test_modeling_nemotron_h.py (1 hunks)
- tests/unittest/_torch/speculative/test_draft_target.py (1 hunks)
- tests/unittest/_torch/speculative/test_eagle3.py (1 hunks)
- tests/unittest/_torch/test_beam_search.py (0 hunks)
- tests/unittest/_torch/test_overlap_scheduler.py (4 hunks)
- tests/unittest/_torch/test_return_logits.py (4 hunks)
- tests/unittest/_torch/test_trtllm_sampler.py (0 hunks)
- tests/unittest/api_stability/references/llm.yaml (1 hunks)
- tests/unittest/llmapi/apps/_test_openai_misc.py (2 hunks)
💤 Files with no reviewable changes (3)
- tests/unittest/_torch/test_beam_search.py
- tests/unittest/_torch/test_trtllm_sampler.py
- tests/integration/test_lists/waives.txt
🧰 Additional context used
🧠 Learnings (1)
tensorrt_llm/_torch/pyexecutor/_util.py (1)
Learnt from: yechank-nvidia
PR: #6254
File: tensorrt_llm/_torch/pyexecutor/model_engine.py:1201-1204
Timestamp: 2025-07-22T09:22:14.726Z
Learning: In TensorRT-LLM's multimodal processing pipeline, shared tensor recovery using from_shared_tensor() is only needed during the context phase. Generation requests reuse the already-recovered tensor data and only need to call strip_for_generation() to remove unnecessary multimodal data while preserving the recovered tensors. This avoids redundant tensor recovery operations during generation.
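A hedged sketch of the pattern that note describes; the two method names are taken from the note itself, while the request object and its fields are assumptions:

```python
def prepare_multimodal_data(request, is_context_phase: bool):
    """Recover shared tensors once (context phase), then only strip for generation."""
    if is_context_phase:
        # Context phase: recover tensor data from the shared-tensor handle.
        request.multimodal_data = request.multimodal_data.from_shared_tensor()
    else:
        # Generation phase: tensors were already recovered during context, so just
        # drop the multimodal payload that generation no longer needs.
        request.multimodal_data = request.multimodal_data.strip_for_generation()
    return request
```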
🧬 Code Graph Analysis (2)
tests/unittest/_torch/test_overlap_scheduler.py (2)
- tests/unittest/_torch/test_trtllm_sampler.py (3)
  - create_llm (25-38)
  - model_path (21-22)
  - test_case (15-17)
- tests/integration/defs/disaggregated/test_disaggregated_single_gpu.py (1)
  - model_path (31-44)
tensorrt_llm/_torch/pyexecutor/_util.py (3)
- tensorrt_llm/_torch/pyexecutor/sampler.py (3)
  - TorchSampler (208-457)
  - EarlyStopSampler (70-97)
  - TRTLLMSampler (486-956)
- tensorrt_llm/logger.py (1)
  - warning (131-132)
- cpp/include/tensorrt_llm/executor/types.h (3)
  - DecodingMode (532-574)
  - DecodingMode (813-816)
  - Eagle (582-591)
🪛 Ruff (0.12.2)
tensorrt_llm/_torch/pyexecutor/_util.py
589-589: Line too long (129 > 120)
(E501)
659-659: Line too long (131 > 120)
(E501)
🔇 Additional comments (27)
tests/integration/defs/test_e2e.py (2)
2030-2030: LGTM: Test data update reflects model behavior change. The keyword change from "clouds" to "waves" appears to be a legitimate update to match the actual output behavior of the qwen2.5-vl-7b-instruct model.
2032-2035: LGTM: Improved traffic keyword expectations. The updated keywords improve the test by:
- Removing the duplicate "traffic" entry
- Replacing specific terms like "bus" and "police" with more general traffic concepts
- Using more accurate descriptors like "lanes", "congestion", and "road"
This appears to reflect improved model behavior that produces more general traffic descriptions.
tests/integration/defs/disaggregated/test_configs/disagg_config_trtllm_sampler.yaml (2)
14-14: LGTM: Standardized sampler configuration parameter. The change from `enable_trtllm_sampler: True` to `use_torch_sampler: False` maintains the same behavior (using the TRTLLM sampler) while aligning with the codebase-wide standardization effort.
30-30: LGTM: Consistent sampler configuration across server types. The generation_servers section correctly uses the same standardized `use_torch_sampler: False` parameter, maintaining consistency with the context_servers configuration.
tests/unittest/_torch/speculative/test_draft_target.py (1)
44-44: LGTM: Explicit sampler configuration improves test determinism. Adding `use_torch_sampler=True` to the common configuration ensures both the speculative and reference LLM instances use the same sampler, making the test behavior explicit and deterministic.
tests/integration/defs/disaggregated/test_configs/disagg_config_ngram.yaml (1)
15-15: LGTM: Standardized sampler configuration for NGram test. The addition of `use_torch_sampler: True` to the context_servers configuration follows the codebase standardization effort and explicitly specifies the sampler choice for this test scenario.
tests/unittest/_torch/speculative/test_eagle3.py (1)
63-63: LGTM: Consistent sampler configuration for Eagle3 test. Adding `use_torch_sampler=True` to the common configuration ensures both speculative and reference LLM instances use the same sampler, providing consistent and deterministic test behavior across all Eagle3 test parameter combinations.
tests/unittest/api_stability/references/llm.yaml (1)
106-109: LGTM: Parameter renaming aligns with standardization effort. The renaming from `enable_trtllm_sampler` to `use_torch_sampler` correctly reflects the new semantics, and the status promotion from `prototype` to `beta` indicates the feature is becoming more stable.
tests/unittest/_torch/modeling/test_modeling_nemotron_h.py (1)
44-44: LGTM: Semantic equivalence maintained. The change from `enable_trtllm_sampler=True` to `use_torch_sampler=False` correctly maintains the same behavior (using the TRTLLM sampler) while adopting the new parameter naming convention.
tensorrt_llm/_torch/pyexecutor/config.py (1)
57-60: LGTM: Clean parameter renaming with updated documentation. The field renaming from `enable_trtllm_sampler` to `use_torch_sampler` is semantically correct, and the updated docstring clearly explains the new behavior. The default value of `False` maintains backward compatibility by defaulting to the TRTLLM sampler.
examples/llm-api/quickstart_advanced.py (2)
57-59: LGTM: Command-line argument updated consistently. The argument parser correctly updates from `--enable_trtllm_sampler` to `--use_torch_sampler` while maintaining the same default behavior.
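A minimal sketch of how such a flag is typically declared with argparse and forwarded; the constructor call and helper names here are illustrative, not the exact quickstart code:

```python
import argparse


def parse_args():
    parser = argparse.ArgumentParser(description="sampler flag sketch")
    # Absent -> False (keep the TRTLLM sampler); passed -> True (use the Torch sampler).
    parser.add_argument(
        "--use_torch_sampler",
        action="store_true",
        default=False,
        help="Use the Torch sampler instead of the TRTLLM sampler.",
    )
    return parser.parse_args()


def build_llm_kwargs(args):
    # Hypothetical forwarding step mirroring the reviewed change: the parsed flag
    # is passed straight through to the LLM constructor keyword of the same name.
    return {"use_torch_sampler": args.use_torch_sampler}


if __name__ == "__main__":
    print(build_llm_kwargs(parse_args()))
```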
208-208: LGTM: LLM constructor parameter updated consistently. The LLM constructor call correctly uses the new `use_torch_sampler` parameter, maintaining consistency with the updated command-line argument.
tests/integration/defs/accuracy/test_llm_api_pytorch.py (3)
189-189: LGTM! Parameter renaming aligns with codebase refactoring. The change from `enable_trtllm_sampler=True` to `use_torch_sampler=True` is consistent with the broader parameter standardization effort described in the PR objectives.
222-222: LGTM! Parameter renaming maintains test intent. The change from `enable_trtllm_sampler=False` to `use_torch_sampler=False` correctly maintains the original test behavior while aligning with the parameter standardization.
648-649: LGTM! Reasonable test configuration addition. The addition of the `max_batch_size=64` parameter aligns with similar configurations used in other tests and likely optimizes the test setup for the Qwen3-8B model.
tests/unittest/llmapi/apps/_test_openai_misc.py (2)
15-15: LGTM! Model update for test optimization. The change to use "Qwen3/Qwen3-0.6B-Base" instead of the TinyLlama model is a reasonable test configuration update, likely providing better performance or compatibility for the test suite.
28-32: LGTM! Well-documented parameter adjustment. The `max_seq_len` update to "32768" correctly aligns with the new model's max_position_embeddings. The added comment provides valuable context for future maintainers.
tests/unittest/_torch/test_return_logits.py (2)
19-44: LGTM! Consistent parameter renaming with correct logic adaptation. The changes from `enable_trtllm_sampler` to `use_torch_sampler` are well-executed:
- Parameter names updated in decorator, function signature, and LLM instantiation
- Conditional logic correctly inverted from `if not enable_trtllm_sampler` to `if use_torch_sampler` (see the sketch after this list)
- Test behavior remains consistent with the new parameter semantics
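A small self-contained sketch of why the inversion preserves behavior under the renamed flag (branch names are illustrative):

```python
def branch_old(enable_trtllm_sampler: bool) -> str:
    # Old test logic: the special-case path ran whenever the TRTLLM sampler was NOT enabled.
    return "torch-path" if not enable_trtllm_sampler else "trtllm-path"


def branch_new(use_torch_sampler: bool) -> str:
    # New test logic: the same path, expressed directly in terms of the renamed flag.
    return "torch-path" if use_torch_sampler else "trtllm-path"


# The rename flips the flag's meaning (use_torch_sampler == not enable_trtllm_sampler),
# so both versions pick the same branch for every configuration.
for enable_trtllm in (True, False):
    assert branch_old(enable_trtllm) == branch_new(not enable_trtllm)
print("branches match")
```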
86-111: LGTM! Consistent parameter updates in async test function. The parameter renaming in the async test function follows the same correct pattern as the synchronous version, maintaining test consistency while aligning with the codebase refactoring.
tests/unittest/_torch/test_overlap_scheduler.py (1)
24-77: LGTM! Comprehensive and consistent parameter refactoring. All aspects of the parameter renaming from `enable_trtllm_sampler` to `use_torch_sampler` are correctly implemented:
- Function signature and dictionary key updated consistently
- Pytest parameterization and test function parameter updated
- Logic inversion for the `stop_words` setting correctly maintains original behavior
- All function calls updated with new parameter name
The changes maintain test functionality while aligning with the broader codebase standardization effort.
tensorrt_llm/llmapi/llm_args.py (2)
1898-1902: LGTM! Improved field naming and documentation. The renaming from `enable_trtllm_sampler` to `use_torch_sampler` with updated semantics makes the configuration more intuitive. The description clearly indicates that `True` means using the Torch sampler instead of the TRTLLM sampler, and the status upgrade to "beta" reflects increased stability.
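A minimal sketch of what such a field could look like, assuming a Pydantic-style model; the class name, the status metadata, and the exact description text are assumptions, not the real llm_args.py declaration:

```python
from pydantic import BaseModel, Field


class SamplerArgs(BaseModel):
    # Hypothetical stand-in for the LLM argument model.
    use_torch_sampler: bool = Field(
        default=False,
        description="If True, use the Torch sampler instead of the TRTLLM sampler.",
        # The review mentions a status promotion to "beta"; recording it as extra
        # schema metadata is an assumption made for this sketch.
        json_schema_extra={"status": "beta"},
    )


print(SamplerArgs().use_torch_sampler)  # False: the TRTLLM sampler stays the default
```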
2198-2198: LGTM! Consistent parameter name update. The parameter name change from `enable_trtllm_sampler` to `use_torch_sampler` in the `PyTorchConfig` constructor call correctly reflects the field rename and maintains consistency across the codebase.
tensorrt_llm/_torch/pyexecutor/sampler.py (2)
481-482: LGTM! Well-documented optional field enhancement. Making `finalize_events` optional with a clear docstring explanation improves the flexibility of `SampleStateTRTLLM` creation, especially for the `_forward_step_inter_pp` use case mentioned.
894-894: LGTM! Proper defensive null check. The addition of `finalize_events is not None` prevents potential errors when `finalize_events` is `None`, which aligns well with the earlier change making this field optional.
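A simplified sketch of the reviewed pattern; the real `SampleStateTRTLLM` carries CUDA events and more fields, so the types here are placeholders:

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SampleStateTRTLLM:
    # Optional so callers such as _forward_step_inter_pp can build a state
    # without finalize events; None means "nothing to finalize later".
    finalize_events: Optional[Dict[str, object]] = None


def finalize(state: SampleStateTRTLLM) -> None:
    # Defensive check mirroring the reviewed change: only iterate the events
    # when they were actually provided.
    if state.finalize_events is not None:
        for name, event in state.finalize_events.items():
            print(f"finalizing {name}: {event}")


finalize(SampleStateTRTLLM())                               # safe no-op
finalize(SampleStateTRTLLM(finalize_events={"beam": 0}))    # finalizing beam: 0
```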
tensorrt_llm/_torch/pyexecutor/_util.py (3)
589-590: LGTM! Flag renaming aligns with PR objectives. The change from `enable_trtllm_sampler` to `use_torch_sampler` correctly reflects the standardized configuration parameter naming throughout the codebase. The broadened condition appropriately selects `TorchSampler` in more scenarios.
594-596: LGTM! Correct fallback sampler for generation models. The fallback to `TRTLLMSampler` with proper parameters aligns with the reversed semantics of the `use_torch_sampler` flag. When the flag is `False` (default), using `TRTLLMSampler` is the appropriate behavior.
618-619: LGTM! Consistent defensive attribute access pattern. The systematic replacement of direct attribute access with `getattr(executor_config.speculative_config, "attribute_name", False)` improves robustness by preventing `AttributeError` when attributes are missing. The consistent use of `False` as the default value across all cases maintains logical coherence.
Also applies to: 626-627, 637-638, 645-646, 656-657, 664-666, 676-677, 684-685, 695-696
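A short illustration of the `getattr` pattern the comment endorses; the attribute names below are made up for the example:

```python
from types import SimpleNamespace

# A speculative_config that defines some flags but not others.
speculative_config = SimpleNamespace(spec_dec_mode="eagle")

# Direct access would raise AttributeError for a missing attribute:
#   speculative_config.use_draft_tokens  ->  AttributeError
# The defensive form falls back to False instead, so downstream boolean logic
# keeps working for configs that predate the newer attributes.
use_draft_tokens = getattr(speculative_config, "use_draft_tokens", False)
spec_dec_mode = getattr(speculative_config, "spec_dec_mode", False)

print(use_draft_tokens, spec_dec_mode)  # False eagle
```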
PR_Github #13213 [ run ] triggered by Bot
PR_Github #13213 [ run ] completed with state
/bot run --only-multi-gpu-test
PR_Github #13215 [ run ] triggered by Bot
PR_Github #13215 [ run ] completed with state
/bot run --only-multi-gpu-test
PR_Github #13359 [ run ] triggered by Bot
PR_Github #13359 [ run ] completed with state
/bot run --only-multi-gpu-test --disable-fail-fast
PR_Github #13381 [ run ] triggered by Bot
PR_Github #13381 [ run ] completed with state
/bot run --only-multi-gpu-test --disable-fail-fast
Signed-off-by: Daniel Campora <[email protected]>
…ition_embeddings. Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
…coder classes - Updated the parameter names and related comments in the DecoderState and GptDecoder classes to reflect the change from maxBatchSize to maxNumSequences. - Adjustments were made in the setup methods, member variables, and associated bindings in the Python interface. - This change improves clarity regarding the number of sequences being processed. Signed-off-by: Robin Kobus <[email protected]>
`Optional` to accommodate `_forward_step_inter_pp` which creates a `SampleState` without `finalize_events` Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Netanel Haber <[email protected]> something Signed-off-by: Netanel Haber <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Daniel Campora <[email protected]>
Signed-off-by: Netanel Haber <[email protected]>
400138a to dacf557
/bot run --only-multi-gpu-test --disable-fail-fast
PR_Github #14185 [ run ] triggered by Bot
Signed-off-by: Netanel Haber <[email protected]>
/bot run --only-multi-gpu-test --disable-fail-fast
PR_Github #14189 [ run ] triggered by Bot
PR_Github #14185 [ run ] completed with state
PR_Github #14189 [ run ] completed with state
Signed-off-by: Netanel Haber <[email protected]>
…p-support Signed-off-by: Netanel Haber <[email protected]>
/bot run --only-multi-gpu-test --disable-fail-fast
PR_Github #14267 [ run ] triggered by Bot
PR_Github #14267 [ run ] completed with state
THIS IS A DRAFT PR AND IS NOT CURRENTLY MEANT FOR REVIEW.
It branches off @dcampora's PR, which actually adds these changes, so please review there. I opened it against TRTLLM main so I could run the CI.
How to get this branch: