
Commit edaab7f

FEAT Convert non-LoRA PEFT adapters to LoRA (#2939)
This adds the possibility to convert a non-LoRA adapter into a LoRA adapter. Not all PEFT adapters will support this, but many will. The conversion is not exact, so some loss of performance is expected: the higher the rank, the lower the loss, but also the less efficient the adapter. For now, this only supports linear layers. Still, the conversion has some advantages:

- In PEFT, LoRA supports more features than most other methods, e.g. mixed adapter batches. The converted adapter can be used with those features.
- Some downstream packages support LoRA adapters but not other PEFT methods, e.g. Diffusers. The conversion allows using a non-LoRA adapter with those packages.

Users can pass a fixed rank for the LoRA adapter, or a float to use a dynamic rank based on a threshold on the contribution of the singular values.

Unrelated changes:
- The VB-LoRA layer had no __repr__, so it was added.
- The return type annotation of set_peft_model_state_dict was incorrect.
1 parent 9bb8947 commit edaab7f

File tree

28 files changed: +1488 −3 lines changed

docs/source/_toctree.yml

Lines changed: 2 additions & 0 deletions

@@ -155,5 +155,7 @@
       title: Hotswapping adapters
     - local: package_reference/functional
       title: Functions for PEFT integration
+    - local: package_reference/lora_conversion
+      title: Converting non-LoRA adapters to LoRA
     title: Utilities
   title: API reference

docs/source/package_reference/hotswap.md

Lines changed: 2 additions & 0 deletions

@@ -69,6 +69,8 @@ Hotswapping works with transformers models and diffusers models. However, there
 - It only works for the same PEFT method, so no swapping LoRA and LoHa, for example.
 - The adapter that is being swapped in must target the same layers as the previous adapter or a subset of those layers. It cannot target new layers. Therefore, if possible, start with the adapter that targets most layers.

+## API
+
 [[autodoc]] utils.hotswap.hotswap_adapter
     - all

docs/source/package_reference/lora_conversion.md (new file)

Lines changed: 146 additions & 0 deletions
<!--⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# LoRA conversion

Functions that allow converting non-LoRA PEFT models to LoRA models.

## Description

PEFT supports dozens of different parameter-efficient fine-tuning techniques. The most popular one by far is LoRA, which means that many other packages support LoRA as well. For example, [Diffusers](https://huggingface.co/docs/diffusers/main/en/api/loaders/lora) can load LoRA adapters to change the capabilities of diffusion models, and [vLLM](https://docs.vllm.ai/en/stable/features/lora/) can serve models with LoRA adapters. Unfortunately, the other, non-LoRA PEFT methods are rarely supported. Therefore, even if another PEFT method would work better for your specific use case, you may be prevented from using it because downstream packages don't support it.

Here we present a potential solution. PEFT offers two functions, [`save_as_lora`] and [`convert_to_lora`], which allow you to convert a PEFT adapter into a LoRA adapter. Not all PEFT methods support this for now, but for those that do, you can start with the PEFT method that works best for you and later use the resulting adapter as if it were a LoRA adapter.

## Example

The LoRA rank of the converted adapter can either be fixed, by passing an int > 0 as the `rank` argument, or dynamic, adapting to each layer, by passing a float between 0 and 1 as the `rank` argument. Dynamic ranks can potentially be more efficient (same performance with fewer parameters).

### Fixed LoRA rank

The usage of [`save_as_lora`] is relatively straightforward:

```python
from peft import get_peft_model, save_as_lora

# first load and train your non-LoRA PEFT model as normal
base_model = ...
non_lora_config = ...
model = get_peft_model(base_model, non_lora_config)
# check that this PEFT method can indeed be converted to LoRA
assert model.supports_lora_conversion()
...  # train the model

# the rank of the LoRA adapter that you want to convert to
target_rank = 64
# save as a LoRA checkpoint
save_as_lora(output_path, model, rank=target_rank)
```

This will create a LoRA checkpoint at `output_path` that you can load like any other LoRA adapter, or use in downstream packages such as Diffusers or vLLM.
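For instance, loading the saved checkpoint back with PEFT could look like this (a minimal sketch, assuming the same base model and `output_path` as above):

```python
from peft import PeftModel

# re-create the base model, then attach the converted LoRA checkpoint
base_model = ...
lora_model = PeftModel.from_pretrained(base_model, output_path)
```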
The [`convert_to_lora`] function is useful if you don't want to save the converted LoRA adapter but instead want to use the converted weights right away, for example to perform evaluations:

```python
from peft import convert_to_lora, get_peft_model, set_peft_model_state_dict

base_model = ...
non_lora_config = ...
model = get_peft_model(base_model, non_lora_config)
...  # train the model

# the rank of the LoRA adapter that you want to convert to
target_rank = 64
# get the lora config and state dict of the converted lora model
lora_config, lora_state_dict = convert_to_lora(model, rank=target_rank)
# reload the base model, or use model.unload()
base_model = ...
# apply the lora config to the base model
lora_model = get_peft_model(base_model, lora_config)
# load the LoRA weights onto the base model
set_peft_model_state_dict(lora_model, lora_state_dict)
```

### Dynamic LoRA rank

In the examples above, we used a fixed LoRA rank for conversion. However, it is conceivable that some layers don't require a high rank to be accurately converted, while other layers require a higher rank. To accommodate this, PEFT offers the option to pass a float between 0 and 1 as the `rank` argument. Let's say you pass `rank=0.5`. This means that for each layer, the rank of the LoRA adapter is chosen such that the LoRA adapter explains 50% of the variance in the weight change introduced by the original adapter. In more technical terms, under the hood we perform a [Singular Value Decomposition](https://en.wikipedia.org/wiki/Singular_value_decomposition) on the weight contribution of the adapter and then take the top singular values that, when normalized, sum up to the passed value.
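Conceptually, the per-layer rank selection works roughly like the following sketch (illustrative only, not the actual implementation; `delta_weight` stands in for the weight contribution of one adapted linear layer):

```python
import torch

delta_weight = torch.randn(4096, 4096)  # hypothetical weight contribution of one layer
threshold = 0.5  # the float passed as `rank`

# singular values, in descending order
_, S, _ = torch.linalg.svd(delta_weight)
# smallest rank whose normalized singular values sum up to at least the threshold
normalized_cumsum = S.cumsum(dim=0) / S.sum()
rank = int((normalized_cumsum < threshold).sum().item()) + 1
```

In practice, you only need to pass the float to [`save_as_lora`] or [`convert_to_lora`]: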
```python
# set a dynamic rank by passing a float
threshold = 0.7
# save as a LoRA checkpoint
save_as_lora(output_path, model, rank=threshold)
# or get the lora config and state dict directly:
lora_config, lora_state_dict = convert_to_lora(model, rank=threshold)
# inspect the different ranks per layer:
print(lora_config.rank_pattern)
```

Using this type of dynamic LoRA rank can be useful if the contribution of the different layers varies a lot. The disadvantage is that some layers may end up with a very high LoRA rank, which can lead to memory spikes. Please test what works best for your use case.

### LoRA to LoRA conversion

It is also possible to convert a LoRA adapter into another LoRA adapter. The main reason to do this is to reduce the rank of the LoRA adapter: if, after training, you want to shrink the LoRA adapter, use [`save_as_lora`] or [`convert_to_lora`] and pass a smaller rank. This gives you a new LoRA adapter with a smaller memory and storage footprint.
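As a minimal sketch, shrinking a trained LoRA adapter works just like the conversions shown above:

```python
# `model` holds a trained LoRA adapter, e.g. of rank 64; save it as a rank-16 LoRA
save_as_lora(output_path, model, rank=16)
```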
## Metrics

### Non-LoRA to LoRA conversion

Of course, converting one PEFT adapter into another adapter is a lossy process, and the new adapter will most likely not perform as well as the initial adapter. Therefore, it is highly advised to **evaluate the converted LoRA adapter**. This way, you can make sure that the converted adapter performs well enough for your use case. As a general rule, the higher the rank of the LoRA adapter, the better it will approximate your initial adapter. This means that the converted LoRA adapter may require more parameters than the original adapter to achieve similar performance.
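A simple sanity check is to score the original adapter and the converted LoRA on the same held-out data (a sketch only; `evaluate` and `eval_dataset` stand in for your own evaluation loop and data):

```python
# score the original PEFT model
original_score = evaluate(model, eval_dataset)

# convert, load the LoRA weights onto a freshly loaded base model, and score again
lora_config, lora_state_dict = convert_to_lora(model, rank=32)
lora_model = get_peft_model(base_model, lora_config)
set_peft_model_state_dict(lora_model, lora_state_dict)
converted_score = evaluate(lora_model, eval_dataset)

print(f"original: {original_score:.4f}, converted: {converted_score:.4f}")
```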
To give an example, here are some numbers that were derived on the [PEFT MetaMathQA benchmark](https://github.com/huggingface/peft/tree/main/method_comparison/MetaMathQA). For this, a [LoHa](https://huggingface.co/docs/peft/package_reference/loha) adapter was used to fine-tune `meta-llama/Llama-3.2-3B` on MetaMathQA and evaluated on GSM8K. The initial LoHa adapter had rank 32, resulting in 18,350,080 trainable parameters, and a test accuracy of 41.85%. Evaluation required 12.25 GB of memory. The checkpoint was converted into LoRA with different values for `rank`. The results are:

| rank | trainable parameters | test accuracy (%) | accuracy change | memory reserved (max, GB) | memory increase |
|------|---------------------:|------------------:|----------------:|--------------------------:|----------------:|
| 8    | 2293760              | 37.60             | -4.25           | 12.41                     | 0.16            |
| 16   | 4587520              | 38.89             | -2.96           | 12.15                     | -0.10           |
| 32   | 9175040              | 40.11             | -1.74           | 12.41                     | 0.16            |
| 64   | 18350080             | 39.20             | -2.65           | 12.18                     | -0.07           |
|      |                      |                   |                 |                           |                 |
| 0.4  | 2428928              | 37.60             | -4.25           | 12.41                     | 0.16            |
| 0.5  | 4761600              | 40.18             | -1.67           | 12.41                     | 0.16            |
| 0.6  | 8857600              | 39.42             | -2.43           | 12.41                     | 0.16            |
| 0.7  | 16230400             | 39.04             | -2.81           | 12.15                     | -0.10           |

As you can see, we can attain a test accuracy that comes close to the original LoHa adapter if the rank is sufficiently high. Choosing the right rank is a tradeoff between model performance and model efficiency. To reproduce this experiment, follow the script at https://github.com/huggingface/peft/tree/main/scripts/evaluate-lora-conversion.py.

Note that the number of trainable parameters does not translate one to one into memory usage. Some PEFT methods require more memory and some less, even with the same number of trainable parameters. Therefore, even if the converted LoRA adapter has more parameters than the original one, it could still be more memory efficient when serving.

### LoRA to LoRA conversion

Similar to the experiment above, we can also evaluate LoRA to LoRA conversion (i.e. LoRA compression). Here, we start with a LoRA adapter of rank 64 trained with RS-LoRA on the same setup as above. The initial adapter has 18,350,080 trainable parameters, a test accuracy of 52.92%, and requires 12.58 GB of memory for evaluation. The following table shows the results of converting this adapter to LoRA adapters of smaller rank:

| rank | trainable parameters | test accuracy (%) | accuracy change | memory reserved (max, GB) | memory increase |
|------|---------------------:|------------------:|----------------:|--------------------------:|----------------:|
| 8    | 2293760              | 43.37             | -9.55           | 12.38                     | -0.20           |
| 16   | 4587520              | 48.90             | -4.02           | 12.38                     | -0.20           |
| 32   | 9175040              | 51.48             | -1.44           | 12.49                     | -0.09           |
| 48   | 13762560             | 52.01             | -0.91           | 12.38                     | -0.20           |
|      |                      |                   |                 |                           |                 |
| 0.5  | 2150400              | 44.12             | -8.80           | 12.37                     | -0.21           |
| 0.6  | 3082240              | 47.54             | -5.38           | 12.37                     | -0.21           |
| 0.7  | 4448256              | 50.49             | -2.43           | 12.37                     | -0.21           |
| 0.8  | 6510592              | 50.11             | -2.81           | 12.37                     | -0.21           |
| 0.9  | 10022912             | 51.55             | -1.37           | 12.38                     | -0.20           |
| 0.95 | 12976128             | 52.62             | -0.30           | 12.39                     | -0.19           |

For instance, with rank 0.95, the accuracy gap shrinks to just 0.3 percentage points while the number of parameters is reduced by roughly 30%. Also note that these compressed LoRAs can be better than LoRAs trained directly at the lower rank -- e.g. for rank 32, training directly results in a test accuracy of 48.22%, while conversion from rank 64 results in 51.48%.

## Caveats

There are some limitations to the LoRA conversion. As mentioned above, a reduction in performance is expected and the converted LoRA will most likely be less parameter efficient than the original adapter. Moreover, LoRA conversion has these limitations:

- Right now, only adapters applied to linear layers can be converted.
- Not all PEFT methods currently support LoRA conversion.

If there is a lot of demand to extend LoRA conversion, please let us know and we will make it work with more layer types and PEFT methods.

## API

### Convert a non-LoRA model to a LoRA model, return the `LoraConfig` and `state_dict`

[[autodoc]] tuners.lora.conversion.convert_to_lora
    - all

### Convert a non-LoRA model to a LoRA model, save the adapter checkpoint and config at the given path

[[autodoc]] tuners.lora.conversion.save_as_lora
    - all
scripts/evaluate-lora-conversion.py (new file)

Lines changed: 157 additions & 0 deletions
#!/usr/bin/env python3
# Copyright 2025-present the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Script to evaluate a PEFT checkpoint converted into a LoRA on GSM8K

To run this script, first train a PEFT model on MetaMathQA as described here:

https://github.com/huggingface/peft/tree/main/method_comparison/MetaMathQA

Call the script with the `-v` (verbose) option. When that run finishes, it will save a checkpoint of that model and
print a message like this: "Saved PEFT checkpoint to ...". Use this path as the `--path` argument to this script.

Example usage:

```bash
# Convert to LoRA with rank 8 and evaluate it
python evaluate-lora-conversion.py --path /path/to/peft/checkpoint --rank 8
# Convert to LoRA with dynamic rank (50% singular value threshold) and evaluate it
python evaluate-lora-conversion.py --path /path/to/peft/checkpoint --rank 0.5
# Evaluate the original PEFT model without LoRA conversion
python evaluate-lora-conversion.py --path /path/to/peft/checkpoint
```

The script will report the evaluation accuracy, maximum CUDA memory reserved, and evaluation time for the converted LoRA
model.

"""

import argparse
import importlib.util
import os
import sys
import time

import torch
from transformers import AutoModelForCausalLM

from peft import PeftModel, convert_to_lora, get_peft_model, set_peft_model_state_dict


# Load the MetaMathQA benchmark helpers (data.py, utils.py, run.py) directly from the repository files
# and register them in sys.modules so that run.py can import them by name.
root = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))

spec = importlib.util.spec_from_file_location("data", os.path.join(root, "method_comparison", "MetaMathQA", "data.py"))
mm_data = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mm_data)
sys.modules["data"] = mm_data

spec = importlib.util.spec_from_file_location(
    "utils", os.path.join(root, "method_comparison", "MetaMathQA", "utils.py")
)
mm_utils = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mm_utils)
sys.modules["utils"] = mm_utils

spec = importlib.util.spec_from_file_location("run", os.path.join(root, "method_comparison", "MetaMathQA", "run.py"))
mm_run = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mm_run)


def noop(*args, **kwargs):
    pass


def evaluate_model(model, tokenizer, ds_test):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    tic = time.perf_counter()
    predictions, responses = mm_run.evaluate(
        model=model,
        tokenizer=tokenizer,
        ds=ds_test,
        batch_size=50,
        generate_kwargs={"max_length": 800, "max_new_tokens": 300, "pad_token_id": tokenizer.eos_token_id},
        use_tqdm=True,
    )
    toc = time.perf_counter()
    accuracy_peft = mm_utils.get_accuracy(predictions=predictions, responses=responses)
    cuda_mem_reserved_max = torch.cuda.memory_reserved(0)
    print(f"Evaluation Accuracy: {100 * accuracy_peft:.2f}%")
    print(f"Max CUDA Memory Reserved: {cuda_mem_reserved_max / (1024**3):.2f} GB")
    print(f"Evaluation Time: {toc - tic:.0f} seconds")


def main(path_peft_model: str, rank: int | float | None) -> None:
    model_id = "meta-llama/Llama-3.2-3B"
    tokenizer = mm_utils.get_tokenizer(model_id=model_id, max_seq_length=768)
    _, _, ds_test = mm_data.get_train_valid_test_datasets(
        tokenizer=tokenizer, query_template="Question: {query} Think step by step.\nAnswer:", print_fn=noop
    )

    model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16).to(0)
    model = PeftModel.from_pretrained(model, path_peft_model)
    if rank is None:
        print("Evaluating the original PEFT model without LoRA conversion...")
        model.set_adapter("default")
        model.print_trainable_parameters()
        model.eval()
        evaluate_model(model, tokenizer, ds_test)
        return

    print(f"Converting PEFT model to LoRA with rank={rank}...")
    tic = time.perf_counter()
    lora_config, lora_state_dict = convert_to_lora(model, rank=rank, progressbar=True)
    toc = time.perf_counter()
    print(f"Conversion completed in {toc - tic:.0f} seconds.")

    del model
    torch.cuda.empty_cache()
    model = AutoModelForCausalLM.from_pretrained(model_id, dtype=torch.bfloat16).to(0)

    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()

    load_result = set_peft_model_state_dict(model, lora_state_dict)
    assert not load_result.unexpected_keys, (
        f"Unexpected keys when loading LoRA state dict: {load_result.unexpected_keys}"
    )

    del lora_state_dict
    model.eval()
    evaluate_model(model, tokenizer, ds_test)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Evaluate a PEFT checkpoint converted into a LoRA on GSM8K")
    parser.add_argument(
        "--path",
        type=str,
        required=True,
        help="Path to the input PEFT checkpoint",
    )
    parser.add_argument(
        "--rank",
        required=False,
        default=None,
        help="Rank for the LoRA decomposition (int, float, or None for no conversion)",
    )

    args = parser.parse_args()
    if args.rank is not None:
        if "." in str(args.rank):
            args.rank = float(args.rank)
        else:
            args.rank = int(args.rank)
    main(args.path, args.rank)

src/peft/__init__.py

Lines changed: 4 additions & 0 deletions

@@ -117,9 +117,11 @@
     WaveFTModel,
     XLoraConfig,
     XLoraModel,
+    convert_to_lora,
     create_arrow_model,
     get_eva_state_dict,
     initialize_lora_eva_weights,
+    save_as_lora,
 )
 from .tuners.cartridge.utils import (
     compose_cartridge_adapters,
@@ -244,6 +246,7 @@
     "bloom_model_postprocess_past_key_value",
     "cast_mixed_precision_params",
     "compose_cartridge_adapters",
+    "convert_to_lora",
     "create_arrow_model",
     "get_eva_state_dict",
     "get_layer_status",
@@ -259,6 +262,7 @@
     "prepare_model_for_kbit_training",
     "prompt_embeddings_from_past_key_values",
     "replace_lora_weights_loftq",
+    "save_as_lora",
     "set_peft_model_state_dict",
     "shift_tokens_right",
 ]

src/peft/peft_model.py

Lines changed: 15 additions & 0 deletions

@@ -1618,6 +1618,21 @@ def create_or_update_model_card(self, output_dir: str):
         card.text = "\n".join(lines)
         card.save(filename)

+    def supports_lora_conversion(self, adapter_name: str = "default") -> bool:
+        """
+        Whether it is possible for the adapter of this model to be converted to LoRA.
+
+        Normally, this works if the PEFT method is additive, i.e. W' = W_base + delta_weight.
+        """
+        peft_config = self.active_peft_config
+        if peft_config.is_prompt_learning:
+            return False
+
+        if not hasattr(self.base_model, "supports_lora_conversion"):
+            return False
+
+        return self.base_model.supports_lora_conversion()
+

 class PeftModelForSequenceClassification(PeftModel):
     """

src/peft/tuners/__init__.py

Lines changed: 4 additions & 0 deletions

@@ -35,9 +35,11 @@
     LoraConfig,
     LoraModel,
     LoraRuntimeConfig,
+    convert_to_lora,
     create_arrow_model,
     get_eva_state_dict,
     initialize_lora_eva_weights,
+    save_as_lora,
 )
 from .miss import MissConfig, MissModel
 from .mixed import MixedModel
@@ -132,7 +134,9 @@
     "WaveFTModel",
     "XLoraConfig",
     "XLoraModel",
+    "convert_to_lora",
     "create_arrow_model",
     "get_eva_state_dict",
     "initialize_lora_eva_weights",
+    "save_as_lora",
 ]
