
Conversation

Contributor

@changwangss changwangss commented Aug 27, 2024

What does this PR do?

This PR is based on #841. INC 3.0.2 has been released, so we plan to remove the ITREX dependency and rely on INC to apply weight-only quantization (WOQ). @echarlaix

# quantize
from neural_compressor.transformers import GPTQConfig
from optimum.intel.neural_compressor import INCModelForCausalLM

quantization_config = GPTQConfig(tokenizer=tokenizer_name, dataset=dataset_name)
model = INCModelForCausalLM.from_pretrained(model_name_or_path, quantization_config=quantization_config)
model.save_pretrained("output_dir")

# loading
model = INCModelForCausalLM.from_pretrained("output_dir")

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Signed-off-by: changwangss <[email protected]>
)
trainer.model = quantizer._quantized_model

if optim_args.apply_quantization and optim_args.quantization_approach in {"weight_only"}:
Member


Suggested change:
- if optim_args.apply_quantization and optim_args.quantization_approach in {"weight_only"}:
+ if optim_args.apply_quantization and optim_args.quantization_approach == "weight_only":

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Signed-off-by: changwangss <[email protected]>
Comment on lines 142 to 145
warnings.warn(
"Weight only quantization model loading provided by intel_extension_for_transformers is deprecated and it is provided by INC now.",
DeprecationWarning,
)
Collaborator

Could this be determined from the model itself (i.e., that the model was quantized through ITREX)?

Contributor Author

This is not noticeable to users; only the code inside optimum-intel changes, from importing ITREX to importing INC. Unfortunately, the model does not have an attribute that indicates its source.

Collaborator

@echarlaix echarlaix Sep 6, 2024

Added a check in 08091bc: it inspects the quantization configuration (when present) and verifies that the algorithm parameter matches a weight-only quantization method.
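
For illustration, a minimal sketch of what such a configuration-based check could look like; the names WOQ_ALGORITHMS and is_woq_checkpoint are hypothetical, and this is not the actual 08091bc implementation:

from typing import Optional

# Hypothetical set of weight-only algorithm names the check might accept.
WOQ_ALGORITHMS = {"rtn", "gptq", "awq", "teq", "autoround"}

def is_woq_checkpoint(quantization_config: Optional[dict]) -> bool:
    # A checkpoint counts as weight-only quantized when a quantization
    # config is present and its algorithm field matches a known WOQ method.
    if not quantization_config:
        return False
    algorithm = quantization_config.get("quant_method", "")
    return str(algorithm).lower() in WOQ_ALGORITHMS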

@echarlaix echarlaix merged commit 8a015a6 into huggingface:main Sep 9, 2024
echarlaix added a commit that referenced this pull request Sep 10, 2024
* add inc woq and remove itrex dependency

Signed-off-by: changwangss <[email protected]>

* Update optimum/intel/neural_compressor/modeling_base.py

Co-authored-by: Ella Charlaix <[email protected]>

* Update optimum/intel/neural_compressor/modeling_base.py

Co-authored-by: Ella Charlaix <[email protected]>

* Update optimum/intel/neural_compressor/modeling_base.py

Co-authored-by: Ella Charlaix <[email protected]>

* Update optimum/intel/neural_compressor/modeling_base.py

Co-authored-by: Ella Charlaix <[email protected]>

* fix code according comment

Signed-off-by: changwangss <[email protected]>

* add logger setting

Signed-off-by: changwangss <[email protected]>

* improve ut

Signed-off-by: changwangss <[email protected]>

* move woq quantization to quantization.py

Signed-off-by: changwangss <[email protected]>

* Update examples/neural_compressor/language-modeling/run_clm.py

Co-authored-by: Ilyas Moutawwakil <[email protected]>

* Update examples/neural_compressor/language-modeling/run_clm.py

Co-authored-by: Ilyas Moutawwakil <[email protected]>

* remove dependency

Signed-off-by: changwangss <[email protected]>

* Update examples/neural_compressor/language-modeling/run_clm.py

* add woq saving and loading ut and logger info

Signed-off-by: changwangss <[email protected]>

* set transformers version limit

Signed-off-by: changwangss <[email protected]>

* fix installation neural_compressor[pt]

Signed-off-by: changwangss <[email protected]>

* improve ut

Signed-off-by: changwangss <[email protected]>

* refactoring

* Refactor

* revert

* fix datasets loading issue

Signed-off-by: changwangss <[email protected]>

* fix

---------

Signed-off-by: changwangss <[email protected]>
Co-authored-by: Ella Charlaix <[email protected]>
Co-authored-by: Ilyas Moutawwakil <[email protected]>
Co-authored-by: Ella Charlaix <[email protected]>
@changwangss changwangss deleted the wangchang/inc_woq branch May 26, 2025 05:00