[feat] Implement pytorch sampler for MTP #5627
Conversation
```python
    lora_config: Optional[LoraConfig] = None,
    is_draft_model: bool = False,
):
    torch.manual_seed(0)
```
Is this intentional?
Without this line, TRTLLM would malfunction because each GPU would produce a different answer. The fixed seed was necessary to prevent divergent calculations across GPUs that would otherwise start from different seeds.
There's a larger problem: currently the user cannot pass in a seed for individual requests. Ideally we would not depend on the global torch seed, but this is difficult within PyTorch because its CUDA graph capture seems to make assumptions that break the ability to use a custom RNG.
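For illustration, a minimal sketch of the failure mode and the fix (hypothetical helper, not the PR's code): `torch.multinomial` only agrees across tensor-parallel ranks when every rank's RNG state matches, which the fixed global seed guarantees.

```python
import torch

def sample_on_rank(logits: torch.Tensor, seed: int = 0) -> torch.Tensor:
    # Seeding the global RNG identically on every rank keeps multinomial
    # sampling deterministic across GPUs; without it, each tensor-parallel
    # rank would draw a different token and the ranks would diverge.
    torch.manual_seed(seed)
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```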
```python
# Strict acceptance
else:
    if self.is_thop:
        if False:
```
What is happening here?
We did not implement the C++ MTP sampling kernel for thop, so this is a hack to force the PyTorch implementation of the sampler. I'll revert this line for now, but I'm not sure whether the code added by this PR will work properly without forcing this case.
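A hedged sketch of the control flow being described (names are assumptions, not the PR's actual code): the `if False:` pins execution to the PyTorch sampler, since the C++ thop branch has no kernel behind it.

```python
import torch

def mtp_sample(logits: torch.Tensor, is_thop: bool) -> torch.Tensor:
    # Sketch only: `is_thop` would normally select the C++ (thop) kernel,
    # but no C++ MTP sampling kernel exists yet, so `if False:` keeps both
    # branches on the pure-PyTorch path.
    if is_thop:
        if False:  # the hack: never enter the unimplemented C++ path
            raise NotImplementedError("C++ MTP sampling kernel for thop")
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)
```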
Signed-off-by: Patrick Reiter Horn <[email protected]>
Force-pushed from bee8f0b to d4a928c.
```python
import psutil
import safetensors
from tensorrt_llm._torch.pyexecutor.llm_request import LlmRequest
```
This line seems to duplicate line 24.
@pathorn Thanks for your contribution. Could you please run the code formatting tool? https://github.com/NVIDIA/TensorRT-LLM/blob/main/CONTRIBUTING.md#coding-style
Hi, thanks @pathorn for the contribution, and @QiJune and @netanel-haber for reviewing. @nvxuanyuc and I will help get this merged in the coming weeks.
```python
    next_tokens = torch.multinomial(softmax, num_samples=1).squeeze(-1)
    return next_tokens, softmax


def flashinfer_sample(
```
flashinfer_sample is not invoked anywhere; it can be deleted.
```python
# generator = torch.Generator(device="cuda")
# generator.manual_seed(0)
# next_tokens = flashinfer_sample(adjusted_logits, top_k, top_p, generator)
# logits = apply_top_k_top_p(logits, top_k, top_p)
```
Clean up the commented-out code blocks (here and in other files).
Description
Previously, speculative decoding was always greedy and did not support sampling parameters.
This PR implements the temperature, top-p, top-k, and min-p sampling parameters in Python when using MTP speculative decoding (for DeepSeek).
It also adds a log-probs return value from the PyTorch sampler (intended to be used together with #5620).
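For reference, here is a minimal sketch of how these four parameters can compose in pure PyTorch (illustrative only; the helper name and exact filtering order are assumptions, not this PR's code):

```python
import torch

def sample_with_params(
    logits: torch.Tensor,    # [batch, vocab]
    temperature: float = 1.0,
    top_k: int = 0,          # 0 disables top-k
    top_p: float = 1.0,      # 1.0 disables top-p
    min_p: float = 0.0,      # 0.0 disables min-p
):
    # Temperature scaling.
    if temperature != 1.0:
        logits = logits / temperature
    probs = torch.softmax(logits, dim=-1)

    # Top-k: zero out everything below the k-th largest probability.
    if top_k > 0:
        kth = torch.topk(probs, top_k, dim=-1).values[..., -1, None]
        probs = torch.where(probs < kth, torch.zeros_like(probs), probs)

    # Top-p (nucleus): keep the smallest prefix whose cumulative mass
    # reaches top_p; the top token is always kept.
    if top_p < 1.0:
        sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
        cumulative = torch.cumsum(sorted_probs, dim=-1)
        drop = cumulative - sorted_probs > top_p
        sorted_probs = sorted_probs.masked_fill(drop, 0.0)
        probs = torch.zeros_like(probs).scatter_(-1, sorted_idx, sorted_probs)

    # Min-p: drop tokens below min_p times the max probability.
    if min_p > 0.0:
        threshold = min_p * probs.max(dim=-1, keepdim=True).values
        probs = torch.where(probs < threshold, torch.zeros_like(probs), probs)

    # Renormalize the surviving mass and sample.
    probs = probs / probs.sum(dim=-1, keepdim=True)
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(-1)
    # Log-prob of the sampled token under the filtered distribution.
    log_probs = torch.log(probs.gather(-1, next_tokens.unsqueeze(-1))).squeeze(-1)
    return next_tokens, log_probs
```

Each filter zeroes out disallowed tokens and the remaining mass is renormalized before sampling; note that the returned log-prob is taken under the filtered distribution, not the raw softmax.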
Future work: support this PyTorch sampler for non-MTP models, and figure out why it is faster than the C++ sampler.
Test Coverage
None