Merged
Changes from 1 commit
update docstring
Signed-off-by: Kaihui-intel <[email protected]>
Kaihui-intel committed Aug 14, 2024
commit 0894f0e4b3b133342328190966299b987c32f4df
1 change: 1 addition & 0 deletions neural_compressor/torch/algorithms/weight_only/gptq.py
@@ -213,6 +213,7 @@ def __init__(
dataloader: an iterable containing calibration datasets, contains (inputs, targets)
use_layer_wise (bool): Enables quantizing the model per layer. Defaults to False.
model_path (str): Model path that is used to load state_dict per layer.
quant_lm_head (bool): Indicates whether to quantize the lm_head layer in transformers. Defaults to False.
device (str): cpu or cuda.
"""
# model
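The `quant_lm_head` behavior documented above can be pictured with a small hedged sketch (toy code, not the neural_compressor implementation; the function and layer names are hypothetical): when the flag is False, the `lm_head` layer is simply left out of the set of layers selected for weight-only quantization.

```python
# Hypothetical sketch of the quant_lm_head gate (not the actual
# neural_compressor code): select which named layers get quantized.

def select_layers_to_quantize(layer_names, quant_lm_head=False):
    """Return the layer names that should be weight-only quantized.

    When quant_lm_head is False (the documented default), the lm_head
    layer is excluded from quantization.
    """
    if quant_lm_head:
        return list(layer_names)
    return [name for name in layer_names if name != "lm_head"]

layers = ["model.layers.0.self_attn.q_proj", "model.layers.0.mlp.down_proj", "lm_head"]
print(select_layers_to_quantize(layers))                      # lm_head excluded
print(select_layers_to_quantize(layers, quant_lm_head=True))  # lm_head included
```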
2 changes: 1 addition & 1 deletion neural_compressor/torch/quantization/config.py
@@ -604,7 +604,7 @@ def __init__(
double_quant_bits (int): Number of bits used to represent double_quant scale, default is 4.
double_quant_use_sym (bool): Indicates whether double_quant scales are symmetric, default is True.
double_quant_group_size (int): Size of double_quant groups, default is 32.
quant_lm_head (bool): Indicates whether quantize the lm_head layer in transformers。 Default is False.
quant_lm_head (bool): Indicates whether to quantize the lm_head layer in transformers, default is False.
use_auto_scale (bool): Enables best scales search based on activation distribution, default is True.
use_auto_clip (bool): Enables clip range search. Defaults to True.
folding (bool): Allows inserting a mul before a linear layer when the scale cannot be absorbed by the last layer,
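The `double_quant_*` parameters in the docstring above describe quantizing the per-group weight scales a second time. A minimal hedged sketch of that idea (toy code, not the neural_compressor implementation; `double_quant_scales` is a hypothetical helper), assuming simple asymmetric rounding to `double_quant_bits` levels:

```python
# Hypothetical sketch of double quantization (not the actual
# neural_compressor code): the per-group float scales are themselves
# quantized to low-bit integers plus one float "super scale".

def double_quant_scales(scales, bits=4):
    """Quantize float scales to `bits`-bit codes with one float
    super-scale, then return the reconstructed (dequantized) scales."""
    levels = (1 << bits) - 1                       # 15 levels for 4 bits
    super_scale = max(scales) / levels             # one float kept per group of scales
    codes = [round(s / super_scale) for s in scales]   # low-bit integer codes
    return [c * super_scale for c in codes]

scales = [0.11, 0.52, 0.33, 0.75]
print(double_quant_scales(scales))  # close to the originals, stored in 4 bits each
```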