
Conversation

Contributor

@changwangss changwangss commented Aug 27, 2024

What does this PR do?

This PR is based on #841. INC 3.0.2 has been released, so we plan to remove the ITREX dependency and rely on INC to apply weight-only quantization (WOQ). @echarlaix

# quantize
from neural_compressor.transformers import GPTQConfig
from optimum.intel.neural_compressor import INCModelForCausalLM

quantization_config = GPTQConfig(tokenizer=tokenizer_name, dataset=dataset_name)
model = INCModelForCausalLM.from_pretrained(model_name_or_path, quantization_config=quantization_config)
model.save_pretrained("output_dir")

# loading
model = INCModelForCausalLM.from_pretrained("output_dir")

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Signed-off-by: changwangss <[email protected]>
)
trainer.model = quantizer._quantized_model

if optim_args.apply_quantization and optim_args.quantization_approach in {"weight_only"}:
Member


Suggested change:
- if optim_args.apply_quantization and optim_args.quantization_approach in {"weight_only"}:
+ if optim_args.apply_quantization and optim_args.quantization_approach == "weight_only":

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: changwangss <[email protected]>
Comment on lines 142 to 145
warnings.warn(
"Weight only quantization model loading provided by intel_extension_for_transformers is deprecated and it is provided by INC now.",
DeprecationWarning,
)
Collaborator

Could this be determined from the model itself (i.e., that the model was quantized through ITREX)?

Contributor Author

This is not noticeable to users; only the code inside optimum-intel changes, from importing ITREX to importing INC. Unfortunately, the model does not have an attribute that indicates its source.

Collaborator

@echarlaix echarlaix Sep 6, 2024

Added a check in 08091bc: it inspects the quantization configuration (when present) and verifies that the algorithm parameter matches a weight-only quantization method.
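
For illustration, a minimal sketch of what such a configuration-based check could look like; the names WOQ_ALGORITHMS and is_woq_checkpoint are hypothetical, and this is not the actual 08091bc implementation:

from typing import Optional

# Hypothetical set of weight-only algorithm names the check might accept.
WOQ_ALGORITHMS = {"rtn", "gptq", "awq", "teq", "autoround"}

def is_woq_checkpoint(quantization_config: Optional[dict]) -> bool:
    # A checkpoint counts as weight-only quantized when a quantization
    # config is present and its algorithm field matches a known WOQ method.
    if not quantization_config:
        return False
    algorithm = quantization_config.get("quant_method", "")
    return str(algorithm).lower() in WOQ_ALGORITHMS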

@echarlaix echarlaix merged commit 8a015a6 into huggingface:main Sep 9, 2024
echarlaix added a commit that referenced this pull request Sep 10, 2024
* add inc woq and remove itrex dependency

Signed-off-by: changwangss <[email protected]>

* Update optimum/intel/neural_compressor/modeling_base.py

Co-authored-by: Ella Charlaix <[email protected]>

* Update optimum/intel/neural_compressor/modeling_base.py

Co-authored-by: Ella Charlaix <[email protected]>

* Update optimum/intel/neural_compressor/modeling_base.py

Co-authored-by: Ella Charlaix <[email protected]>

* Update optimum/intel/neural_compressor/modeling_base.py

Co-authored-by: Ella Charlaix <[email protected]>

* fix code according comment

Signed-off-by: changwangss <[email protected]>

* add logger setting

Signed-off-by: changwangss <[email protected]>

* improve ut

Signed-off-by: changwangss <[email protected]>

* move woq quantization to quantization.py

Signed-off-by: changwangss <[email protected]>

* Update examples/neural_compressor/language-modeling/run_clm.py

Co-authored-by: Ilyas Moutawwakil <[email protected]>

* Update examples/neural_compressor/language-modeling/run_clm.py

Co-authored-by: Ilyas Moutawwakil <[email protected]>

* remove dependency

Signed-off-by: changwangss <[email protected]>

* Update examples/neural_compressor/language-modeling/run_clm.py

* add woq saving and loading ut and logger info

Signed-off-by: changwangss <[email protected]>

* set transformers version limit

Signed-off-by: changwangss <[email protected]>

* fix installation neural_compressor[pt]

Signed-off-by: changwangss <[email protected]>

* improve ut

Signed-off-by: changwangss <[email protected]>

* refactoring

* Refactor

* revert

* fix datasets loading issue

Signed-off-by: changwangss <[email protected]>

* fix

---------

Signed-off-by: changwangss <[email protected]>
Co-authored-by: Ella Charlaix <[email protected]>
Co-authored-by: Ilyas Moutawwakil <[email protected]>
Co-authored-by: Ella Charlaix <[email protected]>
@changwangss changwangss deleted the wangchang/inc_woq branch May 26, 2025 05:00