
Commit f8759a7

fixing the eagle serving
1 parent 485a756


2 files changed: +1 / -3 lines changed


components/backends/trtllm/engine_configs/llama4/eagle_one_model/eagle_agg.yml

Lines changed: 1 addition & 2 deletions
@@ -30,11 +30,10 @@ speculative_config:
 
 kv_cache_config:
 free_gpu_memory_fraction: 0.5
-enable_block_reuse: true # true when target and draft are same kv dtype
+enable_block_reuse: false # true when target and draft are same kv dtype
 
 cuda_graph_config:
 padding_enabled: true
 max_batch_size: 8
-dtype: fp8
 
 print_iter_log: true
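The inline comment on `enable_block_reuse` in eagle_agg.yml states a rule: block reuse should be `true` only when the target and draft models share a KV-cache dtype. The Python sketch below restates that rule as a simple config check. It is a minimal sketch under stated assumptions. `check_block_reuse` is a hypothetical helper, not part of TensorRT-LLM or Dynamo. The KV dtypes are passed in by hand rather than read from the engine. The `fp8`/`bf16` pairing in the usage lines is illustrative only, because the diff does not say which dtype the draft model uses.

```python
from collections.abc import Mapping
from typing import Any


def check_block_reuse(kv_cache_config: Mapping[str, Any],
                      target_kv_dtype: str,
                      draft_kv_dtype: str) -> list[str]:
    """Hypothetical helper: report a problem when enable_block_reuse is true
    but the target and draft KV-cache dtypes differ, per the YAML comment."""
    problems: list[str] = []
    # Require the key explicitly instead of assuming an engine default.
    reuse = bool(kv_cache_config["enable_block_reuse"])
    if reuse and target_kv_dtype != draft_kv_dtype:
        problems.append(
            f"enable_block_reuse is true but target KV dtype {target_kv_dtype!r} "
            f"differs from draft KV dtype {draft_kv_dtype!r}"
        )
    return problems


# kv_cache_config values from eagle_agg.yml before and after this commit.
before = {"free_gpu_memory_fraction": 0.5, "enable_block_reuse": True}
after = {"free_gpu_memory_fraction": 0.5, "enable_block_reuse": False}

# Assumed, illustrative dtypes: the diff does not state the draft model's KV dtype.
print(check_block_reuse(before, "fp8", "bf16"))
print(check_block_reuse(after, "fp8", "bf16"))
```

Passing the dtypes explicitly keeps the sketch independent of how the serving stack loads its YAML. A real check would need to read them from wherever the target and draft KV-cache dtypes are actually configured.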

components/backends/trtllm/engine_configs/llama4/eagle_one_model/eagle_prefill.yaml

Lines changed: 0 additions & 1 deletion
@@ -33,7 +33,6 @@ speculative_config:
 kv_cache_config:
 free_gpu_memory_fraction: 0.5
 enable_block_reuse: true
-dtype: fp8
 
 cache_transceiver_config:
 backend: default
