Randlora documentation and some example usage #2524

Merged
githubnemo merged 14 commits into huggingface:main from PaulAlbert31:randlora_docs
May 7, 2025

Conversation

@PaulAlbert31
Contributor

Hi @BenjaminBossan and others,
This is a follow up to #2464 and issue #2441.

I have drafted documentation for RandLora and slightly updated the example usage in the model.py docstring.

Since RandLora performs well compared to LoRA on the PEFT model comparison suite, is it also possible to add RandLora to a PEFT leaderboard, or is that something you don't do at the moment?

Happy to iterate or give more example usages.

@BenjaminBossan
Member

Thanks for the follow up.

I haven't reviewed this PR yet, as something has gone wrong when you applied your diff. There are many lines like:

<<<<<<< HEAD
=======
from torch.nn.init import _calculate_correct_fan
>>>>>>> 649a35b (randlora integration - more work to do to conform to quantization practices)

Could you please check and fix those?

As to adding an experiment to the MetaMathQA method comparison suite, yes, that can be done and added to this PR. Please follow the steps described here.

@PaulAlbert31
Contributor Author

Hi @BenjaminBossan,
Sorry about that, I missed those. I have now set up pre-commit as per the docs to hopefully avoid things like this happening again.

I have removed the diff lines and added the MetaMathQA config.

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for adding the RandLora documentation and experiment config. The docs are really well written, well done.

I only found some minor issues that should be easily resolved, please check.

For better adoption, I would also recommend adding a full example. This can be as easy as copying one from the examples/ directory and making the necessary adjustments for RandLora. This can also be done in a later PR if you prefer.


PaulAlbert31 and others added 2 commits May 5, 2025 15:48
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
@PaulAlbert31
Contributor Author

@BenjaminBossan
Thanks for catching the typos in the docs. I have now added an example in the examples/ folder following DoRA's structure.

I am still investigating the large memory usage of RandLora I observed when running randlora_finetune.py. This goes against what I have observed outside of the peft library.
My current guess is that the random bases are copied across layers or cast from fp16 to fp32 at some point, which causes the large memory usage or even OOM in some cases. I'll open another PR if I find a fix.

Please let me know in case I missed something.
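A minimal sketch for checking the dtype hypothesis above (it assumes the adapter parameters and buffers contain "randlora" in their names and that `peft_model` is the wrapped model; both are assumptions):

```python
import torch

# Inspect the RandLora parameters and buffers to see whether the shared random
# bases were duplicated per layer or upcast from fp16 to fp32.
for name, tensor in list(peft_model.named_parameters()) + list(peft_model.named_buffers()):
    if "randlora" in name.lower():
        print(name, tuple(tensor.shape), tensor.dtype, tensor.requires_grad)
```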

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for adding the examples. Overall, they look good, but they still need some "fine-tuning". Please check my comments.

Regarding the notebook, I get an error when trying to open it on GitHub. Other people seem to face the same error, maybe this fix works.

I am still investigating the large memory usage of RandLora I observed when running randlora_finetune.py. This goes against what I have observed outside of the peft library.

Thanks for investigating, please create a PR as soon as you find the underlying issue. Is the example you're comparing it to also using Trainer? In my experience, comparing a vanilla PyTorch training loop vs Trainer can be quite difficult, as there are so many things going on under the hood.

tokenizer=tokenizer,
)
trainer.train()
peft_model.save_pretrained("randlora-llama-3-8b")
Member

The name doesn't fit the base model.

peft_model.save_pretrained("randlora-llama-3-8b")
```

There is no additional change needed to your standard PEFT training procedure, simply swap your LoRAConfig for a RandLoraConfig. Note however that RandLora's trainable parameter count is **inversely proportional** to the rank parameter `r`. Lower `r` to increase and increase it to reduce trainable parameters of RandLora.
Member

Suggested change
There is no additional change needed to your standard PEFT training procedure, simply swap your LoRAConfig for a RandLoraConfig. Note however that RandLora's trainable parameter count is **inversely proportional** to the rank parameter `r`. Lower `r` to increase and increase it to reduce trainable parameters of RandLora.
There is no additional change needed to your standard PEFT training procedure, simply swap your `LoraConfig` for a `RandLoraConfig`. Note however that RandLora's trainable parameter count is **inversely proportional** to the rank parameter `r`. Lower `r` to increase and increase it to reduce trainable parameters of RandLora.
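To make the swap concrete, a minimal sketch (the model name and hyperparameters are placeholders; it assumes `RandLoraConfig` is importable from `peft` like the other config classes):

```python
from transformers import AutoModelForCausalLM
from peft import RandLoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Same workflow as LoRA, only the config class changes. Remember that a larger
# `r` means *fewer* trainable parameters for RandLora, not more.
config = RandLoraConfig(
    r=32,
    target_modules=["k_proj", "v_proj"],
)
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```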

python examples/randlora_finetuning/randlora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --data_path timdettmers/openassistant-guanaco --use_lora --randlora_alpha
```

RandLora can be made to use sparse or very sparse random bases. These sparse matrices can help reduce overfitting. To add `--very_sparse` to run with very sparse matrice or run the following for sparse matrices:
Member

Suggested change
RandLora can be made to use sparse or very sparse random bases. These sparse matrices can help reduce overfitting. To add `--very_sparse` to run with very sparse matrice or run the following for sparse matrices:
RandLora can be made to use sparse or very sparse random bases. These sparse matrices can help reduce overfitting. Add `--very_sparse` to run with very sparse matrices or `--sparse` for sparse matrices:

RandLora can be made to use sparse or very sparse random bases. These sparse matrices can help reduce overfitting. To add `--very_sparse` to run with very sparse matrice or run the following for sparse matrices:

```bash
python examples/randlora_finetuning/randlora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --quantize --sparse
Member

Suggested change
python examples/randlora_finetuning/randlora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --quantize --sparse
python examples/randlora_finetuning/randlora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --sparse

Let's remove it here as the option is discussed in the example below.

python examples/randlora_finetuning/randlora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --quantize
```

By default the RandLora layers are the key and value layers of LLama model. Adding adapters on more layers will increase memory usage. If you whish to choose a different set of layers for RandLora to be applied on, you can simply define it using:
Member

Suggested change
By default the RandLora layers are the key and value layers of LLama model. Adding adapters on more layers will increase memory usage. If you whish to choose a different set of layers for RandLora to be applied on, you can simply define it using:
By default the RandLora layers are the key and value layers of LLama model. Adding adapters on more layers will increase memory usage. If you wish to choose a different set of layers for RandLora to be applied on, you can simply define it using:
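As an illustration of choosing a different set of layers, a short sketch (module names are illustrative and depend on the base model's architecture):

```python
from peft import RandLoraConfig

# Apply RandLora to all attention projections instead of only the key/value layers.
config = RandLoraConfig(
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```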

push_to_hub=push_to_hub,
hub_model_id=hub_model_id,
gradient_accumulation_steps=16,
fp16=True,
Member

Should this not depend on the torch_dtype that was chosen earlier?
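One possible way to tie the flag to the dtype chosen earlier, as a sketch (the `torch_dtype` variable and output directory are placeholders, not necessarily the script's actual names):

```python
import torch
from transformers import TrainingArguments

torch_dtype = torch.bfloat16  # whatever dtype was selected earlier in the script

training_args = TrainingArguments(
    output_dir="randlora-output",
    # Enable mixed precision only when it matches the chosen dtype,
    # instead of hard-coding fp16=True.
    fp16=(torch_dtype == torch.float16),
    bf16=(torch_dtype == torch.bfloat16),
    gradient_accumulation_steps=16,
)
```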

This 👆🏻 will, by default, load the model set up with PEFT's RandLora config. Now if you want to quickly compare it with LoRA, all you need to do is add ` --use_lora` on the command line and reduce `--randlora_alpha` to 2x the rank. So the same example as above would be 👇🏻:

```bash
python examples/randlora_finetuning/randlora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --data_path timdettmers/openassistant-guanaco --use_lora --randlora_alpha
Member

--randlora_alpha is missing a value.
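Following the 2x-the-rank rule stated in the quoted text, and assuming a rank of 32 purely for the sake of illustration, the completed command could look like this (the value 64 is illustrative, not taken from the PR):

```bash
python examples/randlora_finetuning/randlora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --data_path timdettmers/openassistant-guanaco --use_lora --randlora_alpha 64
```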

save_total_limit=2,
push_to_hub=push_to_hub,
hub_model_id=hub_model_id,
gradient_accumulation_steps=16,
Member

I'd say either remove this argument or make it configurable.

Contributor Author

Changed to 16 // batch_size to ensure the minimum effective batch size after accumulation is 16. Is that suitable?
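A sketch of what that change could look like in the training arguments (variable names are assumptions, and the `max(1, ...)` guard is added here to avoid a zero value for batch sizes above 16):

```python
from transformers import TrainingArguments

batch_size = 4  # per-device batch size, e.g. taken from the script's CLI arguments

training_args = TrainingArguments(
    output_dir="randlora-output",
    per_device_train_batch_size=batch_size,
    # Keep the effective batch size around 16 after accumulation.
    gradient_accumulation_steps=max(1, 16 // batch_size),
    fp16=True,
)
```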

Collaborator

Because you found that it has to accumulate to 16 to work properly? Maybe it's worth mentioning that as a comment.


if module_shape != largest_shape:
    largest_shape = tuple(max(a, b) for a, b in zip(largest_shape, module_shape))
    # largest_shape = tuple(max(a, b) for a, b in zip(largest_shape, module_shape))
Member

Remove

largest_shape = (
    max(max(module_shape), max(largest_shape)),
    max(min(module_shape), min(largest_shape)),
)
Member

Could you please explain this change?

Contributor Author

@PaulAlbert31 PaulAlbert31 May 6, 2025

This is a change I implemented to try to reduce the memory usage, which did not work. I didn't mean to commit it, so I'll revert for now.

The change constrains the bases to be as small as possible and use a transpose view if possible.

Given a two-layer network with sizes (D, d) and (d, D) where D > d, the current behavior for rank 32 is to create a randlora_B random base of size (D, d//32, 32) and a randlora_A of size (32, 1, D) so that the bases can be sliced and reused in both layers.

This new behavior changes to randlora_B (D, 32, d//32) and randlora_A (32, 1, d) and transposes the update to fit the size of the second matrix.

This is supposed to be the default behavior but I missed the problem in the RandLora pull request. I'll delay this change until I find a fix for the high memory usage.
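A small numeric sketch of the shapes being described, using illustrative dimensions (D = 4096, d = 1024, rank 32; the variable names are not the library's):

```python
D, d, r = 4096, 1024, 32  # layer dims with D > d, and the RandLora rank

# Current behavior: bases sized against the larger dimension D.
current_B = (D, d // r, r)   # (4096, 32, 32)
current_A = (r, 1, D)        # (32, 1, 4096)

# Reverted change: bases sized against the smaller dimension d, with a
# transposed view of the update used for the (d, D) layer.
proposed_B = (D, r, d // r)  # (4096, 32, 32)
proposed_A = (r, 1, d)       # (32, 1, 1024)

print(current_A, proposed_A)  # the A base shrinks from D to d in its last dim
```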

@PaulAlbert31 PaulAlbert31 force-pushed the randlora_docs branch 2 times, most recently from 149c2b6 to 0ca1d44 on May 6, 2025 03:07
notebook fix and remove broken link
@PaulAlbert31
Contributor Author

Thanks for the feedback @BenjaminBossan, I have implemented your suggested changes.
I also had a bit of a fight to fix the notebook but I thankfully came out on top in the end.

Here is a command I used in case the issue happens with other contributions:
jupyter nbconvert --clear-output --inplace qrandlora_finetuning.ipynb

This fix is suggested in the same thread you linked: https://github.com/orgs/community/discussions/155944#discussioncomment-12856952

Let me know if there is more to improve

Collaborator

@githubnemo githubnemo left a comment

Thanks for the fixes. I'm taking over the review from @BenjaminBossan but there's not much left to do as it seems :)

Just a few nitpicks from my side.


RandLora is expected to increase performance over LoRA for equivalent amounts of trainable parameters, mostly for larger equivalent amounts (> LoRA rank 4).

RandLora's perfromance increase comes with two limitations:
Collaborator

Suggested change
RandLora's perfromance increase comes with two limitations:
RandLora's performance increase comes with two limitations:


Because reducing the rank of RandLora's random bases will increase their number, RandLora can become slower to train than LoRA for very small ranks where typically, ranks below 4 will result in a large training time increase. This does not affect inference though as the RandLora adapters can be merged into the pretrained weight matrices.

RandLora additionally supports training with sparse, unary random bases (only containing -1, 0 and 1). These bases are as described in [Bingham et al.](https://cs-people.bu.edu/evimaria/cs565/kdd-rp.pdf) and [Ping et al.](https://hastie.su.domains/Papers/Ping/KDD06_rp.pdf) and could theoretically be used to reduce compute needs by performing aggregations instead of matrix multiplications to create the weight update. This is not currently supported. Although it does not currently reduce compute, using sparse random bases in RandLora can reduce overfitting in some cases. For users intersted in using sparse unary bases, the `sparse` option is recommended over the `very_sparse` one that can reduce perfromance.
Collaborator

s/perfromance/performance :)

I'm probably missing lingo here but I haven't found confirmation from a quick search so I have to ask: Is unary correct in this case? Isn't the base ternary?

Contributor Author

Yes, good point, thanks. Ternary is the correct term.
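For readers unfamiliar with the referenced constructions, a minimal sketch of a sparse ternary random basis in the spirit of Bingham et al. and the very sparse projections of Li et al.; the exact sparsity and scaling RandLora uses may differ, so treat this purely as an illustration:

```python
import torch

def sparse_ternary_basis(rows: int, cols: int, very_sparse: bool = False) -> torch.Tensor:
    """Random matrix with entries in {-1, 0, +1}, scaled to preserve variance.

    s = 3 gives the classic sparse construction; s = sqrt(rows) gives the
    'very sparse' variant. Entries are +1 or -1 with probability 1/(2s) each,
    and 0 with probability 1 - 1/s.
    """
    s = rows ** 0.5 if very_sparse else 3.0
    probs = torch.tensor([1 / (2 * s), 1 - 1 / s, 1 / (2 * s)])
    values = torch.tensor([-1.0, 0.0, 1.0])
    idx = torch.multinomial(probs, rows * cols, replacement=True)
    return (values[idx] * s ** 0.5).reshape(rows, cols)

basis = sparse_ternary_basis(1024, 32, very_sparse=True)
print((basis == 0).float().mean())  # fraction of zeros grows with sparsity
```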

@PaulAlbert31
Contributor Author

Hi @githubnemo, thanks for your comments and for catching the typos.
This new update addresses the comments; let me know if there is more to do!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@githubnemo githubnemo marked this pull request as ready for review May 7, 2025 09:40
Collaborator

@githubnemo githubnemo left a comment

This is great, thanks a lot for the thorough documentation, example and integration into the method comparison suite.

@githubnemo githubnemo dismissed BenjaminBossan’s stale review May 7, 2025 09:44

@githubnemo took over the review and all points of the review were addressed.

@githubnemo githubnemo merged commit 6c48949 into huggingface:main May 7, 2025
14 checks passed
@githubnemo githubnemo mentioned this pull request May 12, 2025
efraimdahl pushed a commit to efraimdahl/peft that referenced this pull request Jul 12, 2025
This is a follow up to huggingface#2464 and issue huggingface#2441.

Entails documentation for RandLora and slightly updated example usage in the model.py docstring.

Also adds RandLoRA to method comparison.

---------

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>