Description
System Info
transformers version: 4.49.0
Platform: Linux-6.6.0-72.0.0.64.oe2403.x86_64-x86_64-with-glibc2.38
Python version: 3.10.16
Huggingface_hub version: 0.29.1
Safetensors version: 0.5.3
Accelerate version: 1.4.0
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (GPU?): 2.2.2+cu121 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using distributed or parallel set-up in script?:
Using GPU in script?:
GPU type: NVIDIA L4
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder
- My own task or dataset (give details below)
Reproduction
I load an adapter for Qwen/Qwen2.5-0.5B using the following code and an error occurs:
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer, pipeline
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
peft_model_id = "/home/chenjq/pythonWork/nlp/Qwen2.5-0.5B-SFT-Capybara/checkpoint-31"
# peft_model_id = args.output_dir
tokenizer = AutoTokenizer.from_pretrained(peft_model_id)
# Load Model with PEFT adapter
model = AutoPeftModelForCausalLM.from_pretrained(
peft_model_id,
device_map="auto",
torch_dtype=torch.float16
)
The error info is as follows:
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Traceback (most recent call last):
File "/home/chenjq/.pycharm_helpers/pydev/pydevd.py", line 1500, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/home/chenjq/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/chenjq/pythonWork/nlp/test14.py", line 11, in <module>
model = AutoPeftModelForCausalLM.from_pretrained(
File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/peft/auto.py", line 130, in from_pretrained
return cls._target_peft_class.from_pretrained(
File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/peft/peft_model.py", line 581, in from_pretrained
load_result = model.load_adapter(
File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/peft/peft_model.py", line 1239, in load_adapter
load_result = set_peft_model_state_dict(
File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 451, in set_peft_model_state_dict
load_result = model.load_state_dict(peft_model_state_dict, strict=False)
File "/home/chenjq/miniconda3/envs/nlp/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 896]) from checkpoint, the shape in current model is torch.Size([151665, 896]).
Process finished with exit code 1
However, if I use the following code to load the model, everything works fine:
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
base_model_name ='/home/models/qwen/Qwen2.5-0.5B'
adapter_model_name = "/home/chenjq/pythonWork/nlp/Qwen2.5-0.5B-SFT-Capybara/checkpoint-31"
model = AutoModelForCausalLM.from_pretrained(base_model_name)
model = PeftModel.from_pretrained(model, adapter_model_name)
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
Some info from here that may help:
Hi everyone! I did some research and found out that the error occurs because len(tokenizer) (151665) and the embedding size (151936) of Qwen/Qwen2.5-0.5B do not match. _BaseAutoPeftModel.from_pretrained resizes the base model embeddings to match the tokenizer (here) and, as a result, it is unable to load the saved weights. I think a possible solution might be to only resize the base model embeddings if the tokenizer size differs from the base tokenizer size. What do you think?
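To make the mismatch easier to see, here is a minimal sketch (assuming the same local base-model path as in the working snippet above; the printed numbers are the ones from the error message) that compares len(tokenizer) with the number of rows in the base checkpoint's embedding matrix:
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative sketch only; base_model_name is the local path used above
base_model_name = '/home/models/qwen/Qwen2.5-0.5B'
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Rows of the base model's embedding matrix (same size as lm_head for Qwen2.5-0.5B)
embedding_rows = model.get_input_embeddings().weight.shape[0]
print(len(tokenizer))   # 151665 -> what AutoPeftModelForCausalLM resizes the model to
print(embedding_rows)   # 151936 -> the shape the adapter's saved lm_head expects
Because the adapter's modules_to_save copy of lm_head was saved with 151936 rows, shrinking the base model to len(tokenizer) before loading the adapter produces the size-mismatch error above.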
The adapter was trained using the following code:
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer
from peft import LoraConfig
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '1'
dataset = load_dataset("trl-lib/Capybara", split="train")
dataset = dataset.select(range(500))
MODEL_ID = 'Qwen/Qwen2.5-0.5B'
peft_config = LoraConfig(
r=16,
lora_alpha=32,
lora_dropout=0.05,
target_modules="all-linear",
modules_to_save=["lm_head", "embed_token"],
task_type="CAUSAL_LM",
)
args = SFTConfig(
output_dir="Qwen2.5-0.5B-SFT-Capybara", # directory to save and repository id
num_train_epochs=1, # number of training epochs
per_device_train_batch_size=4, # batch size per device during training
gradient_accumulation_steps=4, # number of steps before performing a backward/update pass
gradient_checkpointing=True, # use gradient checkpointing to save memory
optim="adamw_torch_fused", # use fused adamw optimizer
logging_steps=10, # log every 10 steps
save_strategy="epoch", # save checkpoint every epoch
bf16=True, # use bfloat16 precision
tf32=True, # use tf32 precision
learning_rate=2e-4, # learning rate, based on QLoRA paper
max_grad_norm=0.3, # max gradient norm based on QLoRA paper
warmup_ratio=0.03, # warmup ratio based on QLoRA paper
lr_scheduler_type="constant", # use constant learning rate scheduler
push_to_hub=False, # push model to hub
# report_to="tensorboard", # report metrics to tensorboard
)
trainer = SFTTrainer(
MODEL_ID,
train_dataset=dataset,
args=args,
peft_config=peft_config
)
trainer.train()
print('end')
Expected behavior
I expect the model to load and predict normally.