Commit 64e3bfa

[None][fix] Fix KV cache recompute in draft_target spec decode (#7348)

Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>

1 parent f156221 commit 64e3bfa

1 file changed: +4 −2 lines

tensorrt_llm/_torch/speculative/model_drafter.py (4 additions, 2 deletions)

```diff
@@ -151,8 +151,10 @@ def _create_draft_request_for_request(
             assert num_draft_tokens == 0
             return self._create_context_request(request, input_tokens)

-        # No tokens accepted - generation request
-        elif num_accepted_tokens == 0:
+        # No tokens accepted - generation request. This only applies to speculation algorithms
+        # that need to recompute KV cache for accepted tokens like eagle3.
+        elif num_accepted_tokens == 0 or not self.spec_config.spec_dec_mode.needs_kv_cache_recompute(
+        ):
             return self._create_generation_request(request, input_tokens)

         # Tokens accepted - chunked context request
```
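The fixed condition can be sketched in isolation. This is a hypothetical, simplified stand-in for the drafter's dispatch logic (only `needs_kv_cache_recompute` mirrors the diff; `SpecDecMode` and `classify_draft_request` here are illustrative, not the real TensorRT-LLM classes):

```python
class SpecDecMode:
    """Simplified stand-in for the speculative decoding mode object."""

    def __init__(self, recompute: bool):
        self._recompute = recompute

    def needs_kv_cache_recompute(self) -> bool:
        # True for algorithms like eagle3 that must recompute KV cache
        # for accepted tokens; False for modes like draft_target.
        return self._recompute


def classify_draft_request(num_accepted_tokens: int, mode: SpecDecMode) -> str:
    """Return which kind of draft request the fixed branch would create.

    The first-iteration context-request path is omitted for brevity.
    """
    # After the fix: a plain generation request is used either when no
    # tokens were accepted, or when the algorithm never needs to
    # recompute KV cache for accepted tokens.
    if num_accepted_tokens == 0 or not mode.needs_kv_cache_recompute():
        return "generation"
    # Tokens accepted AND KV cache recompute needed: chunked context.
    return "chunked_context"
```

Before this commit, a `draft_target` request with accepted tokens would have fallen into the chunked-context branch even though that mode does not recompute KV cache; the extra `or not ... needs_kv_cache_recompute()` clause routes it to a generation request instead.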
