Attempt to fix GPU OOM issue
KrishnanPrash committed Jul 29, 2025
commit 82ffd54aa561b2c9d614c16e462714af26d71ec9
@@ -22,7 +22,7 @@ max_num_tokens: 256
 max_seq_len: 8448
 
 kv_cache_config:
-  free_gpu_memory_fraction: 0.7
+  free_gpu_memory_fraction: 0.3
   dtype: fp8
 
 cuda_graph_config:
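In TensorRT-LLM, kv_cache_config.free_gpu_memory_fraction sets what fraction of the GPU memory still free after the model weights load is reserved for the KV-cache pool. Dropping it from 0.7 to 0.3 shrinks that pool, leaving more headroom for activation buffers and CUDA graph capture at the cost of fewer cacheable tokens. If a fixed budget is preferable to a fraction, the config also exposes an explicit token cap (a minimal sketch, assuming kv_cache_config.max_tokens is supported by the TensorRT-LLM version in use; the 65536 value is hypothetical, and when both knobs are set the smaller resulting allocation should win):

kv_cache_config:
  # Hard cap on tokens held in the KV cache, independent of how much
  # GPU memory happens to be free at startup (hypothetical value).
  max_tokens: 65536
  dtype: fp8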
@@ -30,7 +30,7 @@ max_num_tokens: 8192
 max_seq_len: 8192
 
 kv_cache_config:
-  free_gpu_memory_fraction: 0.75
+  free_gpu_memory_fraction: 0.3
   dtype: fp8  # NOTE: This dtype must match in both prefill/decode configs
 
 # NOTE: pytorch_backend_config section flattened since: https://github.com/NVIDIA/TensorRT-LLM/pull/4603
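The fraction is lowered in both the prefill and decode configs because each engine sizes its own KV pool independently. For a rough sense of scale (illustrative numbers, not measured in this PR): on an 80 GB GPU with about 60 GB free after weights load, 0.75 would reserve roughly 45 GB for KV cache, while 0.3 reserves roughly 18 GB, trading cache capacity for the headroom that was apparently being exhausted.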