Commit 1c53561

added fast process alert
1 parent b10c17c commit 1c53561

1 file changed: +6 -1 lines changed

flash_rl/vllm_patch.py

Lines changed: 6 additions & 1 deletion
@@ -108,7 +108,12 @@ def hacked_process_weights_after_loading(
         quant_method = getattr(module, "quant_method", None)
         if isinstance(quant_method, QuantizeMethodBase):

-            if isinstance(quant_method, Fp8LinearMethod) or isinstance(quant_method, CompressedTensorsW8A8Int8):
+            if isinstance(quant_method, Fp8LinearMethod):
+                # for fast processing, we will do manual processing later
+                assert not quant_method.use_marlin, 'marlin (w8a16) does not support fp8_fast processing'
+                continue
+
+            if isinstance(quant_method, CompressedTensorsW8A8Int8):
                 # for fast processing, we will do manual processing later
                 continue
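The behavior change in this hunk can be sketched in isolation. The snippet below is a minimal illustration, not the patched function itself: the three classes are hypothetical stubs standing in for the vLLM types named in the diff (their real class hierarchy and constructors are not shown here and may differ), and `deferred_modules` is an invented helper that only records which modules the guard would skip.

```python
from types import SimpleNamespace


# Hypothetical stubs: they mimic only what the diff touches, not the real vLLM classes.
class QuantizeMethodBase:
    pass


class Fp8LinearMethod(QuantizeMethodBase):
    def __init__(self, use_marlin=False):
        self.use_marlin = use_marlin


class CompressedTensorsW8A8Int8(QuantizeMethodBase):  # base class is an assumption
    pass


def deferred_modules(named_modules):
    """Return names of modules whose weight processing is deferred (skipped here)."""
    deferred = []
    for name, module in named_modules:
        quant_method = getattr(module, "quant_method", None)
        if isinstance(quant_method, QuantizeMethodBase):
            if isinstance(quant_method, Fp8LinearMethod):
                # Same check as the commit: fail early if the Marlin path is active.
                assert not quant_method.use_marlin, (
                    "marlin (w8a16) does not support fp8_fast processing"
                )
                deferred.append(name)
                continue
            if isinstance(quant_method, CompressedTensorsW8A8Int8):
                deferred.append(name)
                continue
    return deferred


modules = [
    ("fp8_layer", SimpleNamespace(quant_method=Fp8LinearMethod())),
    ("int8_layer", SimpleNamespace(quant_method=CompressedTensorsW8A8Int8())),
    ("plain_layer", SimpleNamespace()),  # no quant_method attribute
]
print(deferred_modules(modules))
```

Splitting the combined `isinstance` check into two branches is what makes room for the FP8-only assertion; the INT8 branch keeps its previous skip behavior. Since Python removes `assert` statements when run with `-O`, a variant that must always enforce the check would raise an exception explicitly instead.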
