PyTorch implementation of "Near-Lossless Post-Training Quantization of Deep Neural Networks via a Piecewise Linear Approximation"
There are 5 main arguments:
- quantize: whether to quantize parameters (per-channel) and activations (per-tensor)
- imagenet_path: path to the folder that contains the train/val folders of the ImageNet data
- model: the model architecture, one of ['mobilenetv2', 'resnet50', 'inceptionv3']; defaults to mobilenetv2
- qtype: the weight-quantization scheme, one of ['uniform', 'pws', 'pwg', 'pwl']; defaults to uniform
- bits_weight: number of bits for weight quantization; defaults to 8
Run the 4-bit PWS-quantized mobilenetv2 model with:

```
python main_cls.py --quantize --qtype pws --model mobilenetv2 --bits_weight 4
```
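The other documented flags combine the same way; for example, an 8-bit uniform-quantized resnet50 with an explicit data path:

```
python main_cls.py --quantize --qtype uniform --model resnet50 --bits_weight 8 --imagenet_path /path/to/imagenet
```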
The quantization in this repo is fake (simulated) quantization: tensors are quantized and immediately dequantized, so inference is NOT done in pure INT8 arithmetic.
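As a rough illustration of what fake quantization means here, the sketch below rounds a weight tensor onto an integer grid and immediately maps it back to float, so all arithmetic stays in floating point. The function name and the symmetric per-channel scheme are illustrative assumptions, not this repo's exact code:

```python
import torch

def fake_quantize_per_channel(w: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Simulated (fake) symmetric per-channel weight quantization.

    Illustrative sketch only, not this repo's implementation. Each output
    channel (dim 0) gets its own scale; values are rounded to the integer
    grid and mapped straight back to float.
    """
    qmax = 2 ** (bits - 1) - 1                    # e.g. 127 for 8 bits
    flat = w.reshape(w.shape[0], -1)
    scale = flat.abs().amax(dim=1) / qmax         # one scale per channel
    scale = scale.clamp(min=1e-8)                 # avoid division by zero
    scale = scale.reshape((-1,) + (1,) * (w.dim() - 1))  # broadcast over dim 0
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale                              # back to float right away

# Per-tensor activation quantization is analogous, with a single scale
# for the whole tensor instead of one per channel.
```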
Done:
- Uniform quantization
- PWS quantization (the piecewise idea is sketched below)

TODO:
- update results for classification models
- PWG quantization
- PWL quantization
- detection model
- segmentation model
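The piecewise schemes (pws/pwg/pwl) follow the paper's core idea: instead of one uniform grid over the full range, the range is split at a breakpoint into a dense center region and sparse tails, each quantized with its own uniform scale. The sketch below uses an arbitrary fixed breakpoint and gives each region the full bit budget for simplicity; the paper instead optimizes the breakpoint (e.g., by search or from a Gaussian/Laplacian fit) and accounts for the extra region bookkeeping, so treat this only as a hedged illustration:

```python
import torch

def piecewise_fake_quantize(w: torch.Tensor, bits: int = 4,
                            p_ratio: float = 0.25) -> torch.Tensor:
    """Hedged sketch of piecewise linear (fake) quantization.

    The range [-m, m] is split at a breakpoint p into a dense center
    region [-p, p] and sparse tails; each region gets its own uniform
    grid. Here p is a fixed fraction of m purely for illustration.
    """
    qmax = 2 ** (bits - 1) - 1
    m = w.abs().max()
    p = p_ratio * m

    in_center = w.abs() <= p
    scale_center = (p / qmax).clamp(min=1e-8)      # fine grid for the center
    scale_tail = ((m - p) / qmax).clamp(min=1e-8)  # coarse grid for the tails

    # Center region: round on the fine grid.
    q_center = torch.round(w / scale_center).clamp(-qmax, qmax) * scale_center

    # Tail regions: quantize the offset beyond the breakpoint, keep the sign.
    offset = (w.abs() - p).clamp(min=0)
    q_tail = torch.sign(w) * (
        p + torch.round(offset / scale_tail).clamp(0, qmax) * scale_tail
    )

    return torch.where(in_center, q_center, q_tail)
```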