Extend usage for olora finetune script #2308

BenjaminBossan merged 17 commits into huggingface:main
Conversation
BenjaminBossan left a comment
Thanks for making these additions to the OLoRA fine-tuning script. Overall I think these are good additions, I just have a few small comments.
Also pinging @tokenizer-decode in case you also want to review but it's not necessary for this change, as OLoRA is not directly involved.
examples/olora_finetuning/README.md (outdated)

> Please add `--device_map cpu` if you want to run the finetuning on CPU.
>
> If you want to train a quantized model such as AWQ or GPTQ, which do not support the OLoRA init method, please pass `--init_lora_weights gaussian`.
Would it be possible to use AWQ or GPTQ right now? If the user passes `--quantize`, the usage of bitsandbytes is hard-coded.
Yes, we can just pass a quantized model. I have updated it in the README.
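For illustration, a minimal sketch of what passing an already-quantized model with gaussian init could look like (the GPTQ checkpoint id and target modules below are placeholders, not taken from this PR):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Load a pre-quantized checkpoint (any AWQ/GPTQ model id works the same way);
# the id below is only an example.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",
    device_map="auto",
)

# OLoRA init is not supported for AWQ/GPTQ layers, so fall back to gaussian init.
config = LoraConfig(init_lora_weights="gaussian", target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, config)
```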
Seems like a good addition to me. I haven't run the script, but it looks good at first glance. You can ping me if anything is needed. Thanks btw @jiqing-feng
Hi @BenjaminBossan. I have addressed all your comments; please review the new changes. Thanks!
    output_dir=output_dir,
    save_total_limit=3,
    load_best_model_at_end=True,
    ddp_find_unused_parameters=False if world_size > 1 else True,
Why should this be set to True if world_size is not greater than 1? AFAICT, the default is None.
Without this parameter, the following warning is printed when running DDP:
    [rank1]:[W108 19:06:54.522318540 reducer.cpp:1400] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters in the forward pass. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters in the forward pass, consider turning this flag off. Note that this warning may be a false positive if your model has flow control causing later iterations to have unused parameters. (function operator())

(the same warning is printed by each of the other ranks)
I mean changing the line to: `ddp_find_unused_parameters=False if world_size > 1 else None`. Is that what you tested?
The parameter should be set to True here.
Right, it should be None if no DDP.
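Concretely, the suggested change is something like the following (a sketch of only the relevant argument; `world_size` is assumed to come from the environment, as is typical for DDP launches):

```python
import os

from transformers import TrainingArguments

# WORLD_SIZE > 1 means the script was launched for DDP (e.g. via torchrun).
world_size = int(os.environ.get("WORLD_SIZE", 1))

training_args = TrainingArguments(
    output_dir="olora-output",  # placeholder path for this sketch
    # Disable the unused-parameter search under DDP to avoid the warning above;
    # leave it at the default (None) when not running DDP.
    ddp_find_unused_parameters=False if world_size > 1 else None,
)
```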
Please trigger the tests again. Thanks!

Format fixed.

The failed tests do not seem to be related to my changes.
BenjaminBossan left a comment
The changes LGTM, thanks.
Failing CI is indeed unrelated, I'll merge.
- allow DDP
- make it work on CPU
- set seed and dtype

Related: `dequantize_bnb_weight` is updated not to move to cuda if not available (sketched below).
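For context, a minimal sketch of the kind of device guard meant by the `dequantize_bnb_weight` note (illustrative only; the actual PEFT function does more than this, and the helper name here is made up):

```python
import torch


def _maybe_move_to_cuda(weight: torch.Tensor) -> torch.Tensor:
    # Illustrative helper, not the real PEFT code: move the tensor to CUDA only
    # when a CUDA device is actually available, so dequantization also works on
    # CPU-only machines instead of failing on an unconditional .cuda() call.
    if torch.cuda.is_available() and weight.device.type != "cuda":
        return weight.cuda()
    return weight
```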
I compared a lot of LoRA finetuning scripts and found this one to be the best; it's clear and easy to understand. So I want to extend this script to support more platforms and also DDP, with minimal changes.

I have tested the script on CUDA and on CPU with 4-process DDP, on OPT and Llama-3, and it shows reasonable results.

It would be great if this PR could be merged so we can apply it to more models and devices.