
Extend usage for olora finetune script #2308

Merged
BenjaminBossan merged 17 commits into huggingface:main from jiqing-feng:olora on Jan 8, 2025
Conversation

@jiqing-feng (Contributor) commented Jan 7, 2025

I compared a lot of LoRA fine-tuning scripts and found that this one is the best: it is clear and easy to understand. So I want to extend this script to support more platforms, and also DDP, with minimal changes.

I have tested the script on CUDA and on CPU, with 4-process DDP, on OPT and Llama-3. It shows reasonable results.

It would be great if this PR could be merged so we can apply the script to more models and devices.

@BenjaminBossan (Member) left a comment:

Thanks for making these additions to the OLoRA fine-tuning script. Overall I think these are good additions; I just have a few small comments.

Also pinging @tokenizer-decode in case you want to review as well, but it's not necessary for this change, as OLoRA is not directly involved.

```
Please add `--device_map cpu` if you want to run the finetuning on CPU.

If you want to train a quantized model such as AWQ or GPTQ, which do not support the olora init method, please pass `--init_lora_weights gaussian`.
```
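A minimal sketch of what this README guidance implies, assuming a hypothetical pre-quantized GPTQ checkpoint id (substitute any GPTQ/AWQ model); `init_lora_weights="gaussian"` is a documented `LoraConfig` option, whereas the OLoRA init needs the original float weights for its QR decomposition:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Hypothetical pre-quantized checkpoint id, for illustration only.
model_id = "some-org/llama-3-8b-GPTQ"

# device_map="cpu" mirrors the new --device_map cpu flag.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")

# OLoRA init cannot run on GPTQ/AWQ weights, so fall back to
# gaussian-initialized LoRA for these models.
config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],
    init_lora_weights="gaussian",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```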
@BenjaminBossan (Member):

Would it be possible to use AWQ or GPTQ right now? If the user passes `--quantize`, the usage of bitsandbytes is hard-coded.
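A minimal sketch of the kind of hard-coded bitsandbytes path being referred to here, assuming 4-bit quantization via `BitsAndBytesConfig`; the exact code in the script may differ:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

def load_model(model_id: str, quantize: bool):
    # When --quantize is passed, bitsandbytes 4-bit is used unconditionally;
    # there is no switch for AWQ or GPTQ in this path.
    if quantize:
        bnb_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
        )
        return AutoModelForCausalLM.from_pretrained(
            model_id, quantization_config=bnb_config
        )
    return AutoModelForCausalLM.from_pretrained(model_id)
```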

@jiqing-feng (Contributor, Author):

Yes, we can just pass an already quantized model. I have updated the readme accordingly.

@buyukakyuz (Contributor):
Seems like a good addition to me. I haven't run the script, but it looks good at first glance. You can ping me if anything is needed. Thanks btw @jiqing-feng.

@jiqing-feng marked this pull request as ready for review on January 8, 2025.
jiqing-feng and others added 3 commits on January 8, 2025 (co-authored by Benjamin Bossan).
@jiqing-feng (Contributor, Author):

Hi @BenjaminBossan, I have addressed all your comments; please review the new changes. Thanks!

```
output_dir=output_dir,
save_total_limit=3,
load_best_model_at_end=True,
ddp_find_unused_parameters=False if world_size > 1 else True,
```
@BenjaminBossan (Member):

Why should this be set to `True` if `world_size` is not greater than 1? AFAICT, the default is `None`.

@jiqing-feng (Contributor, Author):

Without this parameter, it prints the following warning when running DDP:

```
[rank0]:[W108 19:06:54.523105147 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())
```

(The same warning is printed by every rank.)

@BenjaminBossan (Member):

I mean changing the line to `ddp_find_unused_parameters=False if world_size > 1 else None`. Is that what you tested?

@jiqing-feng (Contributor, Author):

The parameter should be set to True here.

@jiqing-feng (Contributor, Author):

Right, it should be None if there is no DDP.

@jiqing-feng (Contributor, Author):

Fixed
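A minimal sketch of the agreed-upon fix, assuming the world size is read from the standard `WORLD_SIZE` environment variable set by torchrun; the other `TrainingArguments` values are placeholders, not the script's actual settings:

```python
import os

from transformers import TrainingArguments

# torchrun/accelerate set WORLD_SIZE for distributed runs; default to 1.
world_size = int(os.environ.get("WORLD_SIZE", 1))

training_args = TrainingArguments(
    output_dir="olora-out",   # placeholder
    eval_strategy="epoch",    # load_best_model_at_end needs eval/save to match
    save_strategy="epoch",
    save_total_limit=3,
    load_best_model_at_end=True,
    # False under DDP avoids the extra autograd-graph traversal warned about
    # above; None (the library default) when not running distributed.
    ddp_find_unused_parameters=False if world_size > 1 else None,
)
```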

@HuggingFaceDocBuilderDev (bot):

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@jiqing-feng (Contributor, Author):

Please trigger the test again. Thanks!

@jiqing-feng (Contributor, Author):

Format fixed.

@jiqing-feng (Contributor, Author):

The failed tests do not seem to be related to my changes.

@BenjaminBossan (Member) left a comment:

The changes LGTM, thanks.

The failing CI is indeed unrelated; I'll merge.

@BenjaminBossan merged commit c207885 into huggingface:main on Jan 8, 2025. 10 of 14 checks passed.
Guy-Bilitski pushed a commit to Guy-Bilitski/peft that referenced this pull request on May 13, 2025:
- allow DDP
- make it work on CPU
- set seed and dtype

Related: `dequantize_bnb_weight` is updated not to move weights to CUDA if it is not available.
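For context, a minimal sketch of the device-guard pattern that change implies, using a hypothetical helper name; the actual `dequantize_bnb_weight` implementation in peft may differ:

```python
import torch

def move_for_dequantization(weight: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper illustrating the guard: only move the weight to
    # CUDA when CUDA is actually available, so dequantization also works
    # on CPU-only machines.
    if torch.cuda.is_available():
        return weight.cuda()
    return weight
```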
@jiqing-feng deleted the olora branch on October 9, 2025.