Randlora documentation and some example usage #2524
githubnemo merged 14 commits into huggingface:main
Conversation
Thanks for the follow up. I haven't reviewed this PR yet, as something has gone wrong when you applied your diff. There are many lines like: Could you please check and fix those? As to adding an experiment to the MetaMathQA method comparison suite, yes, that can be done and added to this PR. Please follow the steps described here.
8cc4f35 to 196ba70
docs merge …into randlora_docs
docs merge squash
Hi @BenjaminBossan, I have removed the diff lines and added the MetaMathQA config.
BenjaminBossan left a comment
Thanks for adding the RandLora documentation and experiment config. The docs are really well written, well done.
I only found some minor issues that should be easily resolved, please check.
For better adoption, I would also recommend adding a full example. This can be as easy as copying one from the examples/ directory and making the necessary adjustments for RandLora. This can also be done in a later PR if you prefer.
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
@BenjaminBossan I am still investigating the large memory usage of RandLora I observed when running randlora_finetune.py. This goes against what I have observed outside of the peft library. Please let me know in case I missed something.
BenjaminBossan left a comment
Thanks for adding the examples. Overall, they look good, but they still need some "fine-tuning". Please check my comments.
Regarding the notebook, I get an error when trying to open it on GitHub. Other people seem to face the same error, maybe this fix works.
> I am still investigating the large memory usage of RandLora I observed when running randlora_finetune.py. This goes against what I have observed outside of the peft library.
Thanks for investigating, please create a PR as soon as you find the underlying issue. Is the example you're comparing it to also using Trainer? In my experience, comparing a vanilla PyTorch training loop vs Trainer can be quite difficult, as there are so many things going on under the hood.
```python
    tokenizer=tokenizer,
)
trainer.train()
peft_model.save_pretrained("randlora-llama-3-8b")
```
The name doesn't fit the base model.
> There is no additional change needed to your standard PEFT training procedure, simply swap your LoRAConfig for a RandLoraConfig. Note however that RandLora's trainable parameter count is **inversely proportional** to the rank parameter `r`. Lower `r` to increase and increase it to reduce trainable parameters of RandLora.

Suggested change:

> There is no additional change needed to your standard PEFT training procedure, simply swap your `LoraConfig` for a `RandLoraConfig`. Note however that RandLora's trainable parameter count is **inversely proportional** to the rank parameter `r`. Lower `r` to increase and increase it to reduce trainable parameters of RandLora.
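For illustration, a minimal sketch of that swap (field names such as `randlora_alpha` are assumed here from the CLI flags discussed in this PR, not taken verbatim from the docs under review):

```python
# Minimal sketch of swapping LoraConfig for RandLoraConfig.
from transformers import AutoModelForCausalLM
from peft import RandLoraConfig, get_peft_model

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
config = RandLoraConfig(
    r=32,  # lower r -> more trainable parameters, higher r -> fewer
    randlora_alpha=64,  # assumed scaling field, mirroring the --randlora_alpha flag
    target_modules=["k_proj", "v_proj"],  # key/value projections, the default discussed below
)
peft_model = get_peft_model(base_model, config)
peft_model.print_trainable_parameters()
```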
```bash
python examples/randlora_finetuning/randlora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --data_path timdettmers/openassistant-guanaco --use_lora --randlora_alpha
```

> RandLora can be made to use sparse or very sparse random bases. These sparse matrices can help reduce overfitting. To add `--very_sparse` to run with very sparse matrice or run the following for sparse matrices:

Suggested change:

> RandLora can be made to use sparse or very sparse random bases. These sparse matrices can help reduce overfitting. Add `--very_sparse` to run with very sparse matrices or `--sparse` for sparse matrices:
```bash
python examples/randlora_finetuning/randlora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --quantize --sparse
```

Suggested change:

```bash
python examples/randlora_finetuning/randlora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --sparse
```

Let's remove it here as the option is discussed in the example below.
```bash
python examples/randlora_finetuning/randlora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --quantize
```

> By default the RandLora layers are the key and value layers of LLama model. Adding adapters on more layers will increase memory usage. If you whish to choose a different set of layers for RandLora to be applied on, you can simply define it using:

Suggested change:

> By default the RandLora layers are the key and value layers of LLama model. Adding adapters on more layers will increase memory usage. If you wish to choose a different set of layers for RandLora to be applied on, you can simply define it using:
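As a rough sketch of what that might look like (the module names here are assumptions for a Llama-style model, not the exact snippet from the docs):

```python
from peft import RandLoraConfig

# Targeting more projection layers than the default key/value pair
# increases memory usage, as noted above.
config = RandLoraConfig(
    r=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```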
```python
    push_to_hub=push_to_hub,
    hub_model_id=hub_model_id,
    gradient_accumulation_steps=16,
    fp16=True,
```

Should this not depend on the torch_dtype that was chosen earlier?
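One possible way to tie the precision flags to the earlier dtype choice (a sketch only; the script's actual argument names may differ):

```python
import torch
from transformers import TrainingArguments

torch_dtype = torch.bfloat16  # e.g. whatever was parsed from the script's dtype argument

training_args = TrainingArguments(
    output_dir="randlora-output",
    fp16=torch_dtype == torch.float16,
    bf16=torch_dtype == torch.bfloat16,
    gradient_accumulation_steps=16,
)
```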
> This 👆🏻 by default will load the model in peft set up with RandLora config. Now if you wanna quickly compare it with Lora, all you need to do is to input `--use_lora` in the command line and reduce `--randlora_alpha` to 2x the rank. So same above example would be 👇🏻;

```bash
python examples/randlora_finetuning/randlora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --data_path timdettmers/openassistant-guanaco --use_lora --randlora_alpha
```

`--randlora_alpha` is missing a value.
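For example (the alpha value below is a placeholder, chosen per the 2x-rank rule of thumb quoted above under the assumption of rank 16; it is not the value from the docs):

```bash
python examples/randlora_finetuning/randlora_finetuning.py --base_model meta-llama/Meta-Llama-3-8B --data_path timdettmers/openassistant-guanaco --use_lora --randlora_alpha 32
```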
```python
    save_total_limit=2,
    push_to_hub=push_to_hub,
    hub_model_id=hub_model_id,
    gradient_accumulation_steps=16,
```

I'd say either remove this argument or make it configurable.
Changed to `16 // batch_size` to ensure the effective batch size after accumulation is 16. Is that suitable?
Because you found that it has to be 16 accumulation steps to work properly? Maybe it's worthwhile mentioning that as a comment.
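A sketch of the change being discussed (variable names assumed), with the kind of comment suggested above:

```python
batch_size = 4  # per-device batch size (assumed name)

# Keep the effective batch size at roughly 16, regardless of the per-device batch size.
gradient_accumulation_steps = max(16 // batch_size, 1)
effective_batch_size = batch_size * gradient_accumulation_steps  # 16 here
```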
src/peft/tuners/randlora/model.py (outdated)

```python
if module_shape != largest_shape:
    largest_shape = tuple(max(a, b) for a, b in zip(largest_shape, module_shape))
    # largest_shape = tuple(max(a, b) for a, b in zip(largest_shape, module_shape))
```

src/peft/tuners/randlora/model.py (outdated)

```python
largest_shape = (
    max(max(module_shape), max(largest_shape)),
    max(min(module_shape), min(largest_shape)),
)
```
Could you please explain this change?
This is a change I implemented to try to reduce the memory usage, which did not work. I didn't mean to commit it, so I'll revert for now.
The change constrains the bases to be as small as possible and uses a transpose view where possible.
Given a two layer network with sizes (D, d) and (d, D) where D > d, the current behavior for a rank 32 is to create a randlora_B random base of size (D, d//32, 32) and randlora_A of size (32, 1, D) so that the bases can be sliced and reused in both layers.
This new behavior changes to randlora_B (D, 32, d//32) and randlora_A (32, 1, d) and transposes the update to fit the size of the second matrix.
This was supposed to be the default behavior but I missed the problem in the RandLora pull request. I'll delay this change until I find a fix for the high memory usage.
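To make the shapes concrete, here is a small numerical illustration of the two behaviors described above (not the PEFT code itself; the dimensions are made up):

```python
# Two linear layers of shape (D, d) and (d, D), with D > d and rank r.
D, d, r = 4096, 2048, 32

# Current behavior: bases sized against the largest dimension D.
randlora_B_shape = (D, d // r, r)   # (4096, 64, 32)
randlora_A_shape = (r, 1, D)        # (32, 1, 4096)

# Reverted change: keep the bases as small as possible and transpose the
# update (a view) to fit the (d, D) layer.
randlora_B_shape_small = (D, r, d // r)  # (4096, 32, 64)
randlora_A_shape_small = (r, 1, d)       # (32, 1, 2048)
```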
149c2b6 to 0ca1d44
notebook fix and remove broken link
0ca1d44 to 07e3780
Thanks for the feedback @BenjaminBossan, I have implemented your suggested changes. Here is a command I used in case the issue happens with other contributions: This fix is suggested in the same thread you linked: https://github.com/orgs/community/discussions/155944#discussioncomment-12856952 Let me know if there is more to improve.
githubnemo left a comment
Thanks for the fixes. I'm taking over the review from @BenjaminBossan but there's not much left to do as it seems :)
Just a few nitpicks from my side.
> RandLora is expected to increase performance over LoRA for equivalent amounts of trainable parameters, mostly for larger equivalent amounts (> LoRA rank 4).
>
> RandLora's perfromance increase comes with two limitations:
Suggested change:

> RandLora's performance increase comes with two limitations:
> Because reducing the rank of RandLora's random bases will increase their number, RandLora can become slower to train than LoRA for very small ranks where typically, ranks below 4 with result in a large training time increase. This does not affect inference though as the RandLora adapters can be merged into the pretrained weight matrices.
>
> RandLora additionally supports training with sparse, unary random bases (only containing -1, 0 and 1). These bases are as described in [Bingham et al.](https://cs-people.bu.edu/evimaria/cs565/kdd-rp.pdf) and [Ping et al.](https://hastie.su.domains/Papers/Ping/KDD06_rp.pdf) and could theoretically be used to reduce compute needs by performing aggregations instead of matrix multiplications to create the weight update. This is not currently supported. Although it does not currently reduce compute, using sparse random bases in RandLora can reduce overfitting in some cases. For users intersted in using sparse unary bases, the `sparse` option is recommended over the `very_sparse` one that can reduce perfromance.
s/perfromance/performance :)
I'm probably missing lingo here but I haven't found confirmation from a quick search so I have to ask: Is unary correct in this case? Isn't the base ternary?
Yes, good point, thanks. Ternary is the correct term.
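For illustration, a sketch of how such ternary bases are typically drawn in the cited random-projection papers (an illustration only, not the PEFT implementation; the scaling factor used in the papers is omitted so the entries stay in {-1, 0, 1}):

```python
import torch

def ternary_basis(rows: int, cols: int, very_sparse: bool = False) -> torch.Tensor:
    # sparse: s = 3 -> P(+1) = P(-1) = 1/6, P(0) = 2/3
    # very sparse: s = sqrt(rows) -> most entries are exactly zero
    s = rows ** 0.5 if very_sparse else 3.0
    probs = torch.tensor([1 / (2 * s), 1 - 1 / s, 1 / (2 * s)])
    values = torch.tensor([-1.0, 0.0, 1.0])
    idx = torch.multinomial(probs, rows * cols, replacement=True)
    return values[idx].reshape(rows, cols)

basis = ternary_basis(4096, 32, very_sparse=True)
```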
Hi @githubnemo, thanks for your comment and catching the typos.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
githubnemo left a comment
This is great, thanks a lot for the thorough documentation, example and integration into the method comparison suite.
@githubnemo took over the review and all points of the review were addressed.
This is a follow up to huggingface#2464 and issue huggingface#2441. Entails documentation for RandLora and slightly updated example usage in the model.py docstring. Also adds RandLoRA to method comparison. --------- Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Hi @BenjaminBossan and others,
This is a follow up to #2464 and issue #2441.
I have drafted a documentation for RandLora and slightly updated the example usage in the model.py docstring.
Since RandLora performs well compared to Lora on the PEFT model comparison suite, is it also possible to add RandLora to a PEFT leaderboard, or is that something you don't do at the moment?
Happy to iterate or give more example usages.