This is the training code for fine-tuning Stable Diffusion v3.5. The scripts are adapted from the diffusers library.
This project includes the following components:
- Full fine-tuning of the SDv3.5 model
- Fine-tuning the SDv3.5 model using LoRA
- Fine-tuning the SDv3.5 model with DreamBooth combined with LoRA
- RLHF-based fine-tuning of the SDv3.5 model using DDPO with an aesthetic scorer
- RLHF-based fine-tuning of the SDv3.5 model using GRPO with an aesthetic scorer
- DPO-based fine-tuning of the SDv1.5 model
- RLHF-based fine-tuning of the SDv1.5 model using ReFL with a text-image matching scorer
Let's dive into Stable Diffusion v3.5!
```shell
pip install -r requirements.txt
```
| Path | Description |
| --- | --- |
| `datas/` | Path to dataset (from the HuggingFace hub) containing the training data of instance images or prompts |
| `models/` | Path to pretrained models downloaded from huggingface.co/models |
| `outputs/` | The output directory where the model predictions and checkpoints will be written |
| `scripts/` | Main scripts for running SDv3.5 training |
| `src/` | Main pipeline and trainer |
| `demo.py` / `demo.sh` | Examples of running SDv3.5 inference |
| `requirements.txt` / `setup.py` | Basic pip requirements |
| `train*.py` | Main training scripts |
```
models
|-- aesthetics-predictor-v1-vit-large-patch14
|-- clip-vit-large-patch14
|-- improved-aesthetic-predictor
`-- stable-diffusion-3.5-medium
```

Download `improved-aesthetic-predictor` from: improved-aesthetic-predictor
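For reference, the improved aesthetic predictor is a small MLP head applied to an L2-normalized CLIP ViT-L/14 image embedding, producing a scalar aesthetic score. A toy sketch with random stand-in parameters (the real checkpoint defines its own layer sizes and weights):

```python
import numpy as np

def aesthetic_score(clip_embedding, weights, biases):
    """Score an L2-normalized CLIP image embedding with a small MLP head.

    `weights`/`biases` are hypothetical stand-ins for the predictor's
    learned parameters; the real checkpoint ships its own layers.
    """
    x = clip_embedding / np.linalg.norm(clip_embedding)  # L2-normalize first
    for i, (w, b) in enumerate(zip(weights, biases)):
        x = x @ w + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers
    return x.item()  # scalar aesthetic score

# toy usage with random parameters (768 = CLIP ViT-L/14 embedding size)
rng = np.random.default_rng(0)
emb = rng.normal(size=768)
ws = [rng.normal(size=(768, 64)) * 0.01, rng.normal(size=(64, 1)) * 0.01]
bs = [np.zeros(64), np.zeros(1)]
score = aesthetic_score(emb, ws, bs)
```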
```
datas
|-- dogs
`-- pokemon
```

- Full fine-tuning of the SDv3.5 model
```shell
bash scripts/train_full_finetuning_sd3.sh
```

- Fine-tuning the SDv3.5 model using LoRA
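Conceptually, LoRA keeps the pretrained weight `W` frozen and learns a low-rank update `ΔW = B @ A` with rank `r << min(d, k)`, scaled by `alpha / r`. A minimal NumPy sketch (sizes and names are illustrative, not the script's actual API):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r, alpha = 16, 16, 4, 8          # illustrative sizes; rank r << d, k

W = rng.normal(size=(d, k))            # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01     # trainable, initialized small
B = np.zeros((d, r))                   # trainable, zero-init so training starts at W

def lora_forward(x, W, A, B, alpha, r):
    # y = x W^T + (alpha / r) * x (B A)^T  -- only A and B receive gradients
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(2, k))
y0 = lora_forward(x, W, A, B, alpha, r)
# With B zero-initialized, the LoRA branch contributes nothing yet:
assert np.allclose(y0, x @ W.T)
```

Only `A` and `B` are updated during training, which is why LoRA checkpoints stay small.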
```shell
bash scripts/train_text_to_image_lora_sd3.sh
```

- Fine-tuning the SDv3.5 model with DreamBooth combined with LoRA
```shell
bash scripts/train_dreambooth_lora_sd3.sh
```

- RLHF-based fine-tuning of the SDv3.5 model using DDPO with an aesthetic scorer
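DDPO treats the denoising chain as an MDP and optimizes a PPO-style clipped surrogate against the aesthetic reward. A minimal sketch of that objective (the clip value here is a generic PPO default, not necessarily the script's):

```python
import numpy as np

def ddpo_clipped_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate used per denoising step in DDPO.

    `clip_eps=0.2` is an illustrative value; the actual script may use
    a different clip range.
    """
    ratio = np.exp(logp_new - logp_old)                   # importance weight
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return float(-np.minimum(unclipped, clipped).mean())  # maximize reward -> minimize negative

# sanity check: when new == old policy, ratio is 1 and loss is -mean(advantage)
adv = np.array([1.0, -0.5, 2.0])
lp = np.array([-1.0, -2.0, -0.5])
loss = ddpo_clipped_loss(lp, lp, adv)
assert abs(loss + adv.mean()) < 1e-9
```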
```shell
bash scripts/train_aesthetic_ddpo_sd3.sh
```

- RLHF-based fine-tuning of the SDv3.5 model using GRPO with an aesthetic scorer
> **Note:** This part of the code may have issues and needs further refinement.
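The core idea of GRPO is to replace a learned value baseline with group-relative advantages computed over several samples generated for the same prompt. A minimal sketch:

```python
import numpy as np

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: normalize one prompt's group of rewards.

    GRPO needs no critic network; the group mean/std of rewards for
    samples from the same prompt serves as the baseline.
    """
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# e.g. four images sampled for one prompt, scored by the aesthetic model
adv = grpo_advantages([5.0, 6.0, 7.0, 8.0])
```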
```shell
bash scripts/train_aesthetic_rlhf_grpo_lora_sd3.sh
```

- DPO-based fine-tuning of the SDv1.5 model
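DPO trains directly on (preferred, rejected) pairs: the per-pair loss is `-log sigmoid(beta * margin)`, where the margin compares how much the policy favors the winner over the loser, relative to a frozen reference model. A minimal sketch (the `beta` value is illustrative; the script's default may differ):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO preference loss on one (preferred, rejected) pair of log-probs."""
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    # -log(sigmoid(beta * margin))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# if the policy already favors the winner more than the reference does,
# the margin is positive and the loss falls below log(2)
loss = dpo_loss(-1.0, -3.0, -2.0, -2.5, beta=0.1)
```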
```shell
bash scripts/train_dpo_sd_v1_5.sh
```

- RLHF-based fine-tuning of the SDv1.5 model using ReFL with a text-image matching scorer
```shell
bash scripts/train_refl_v1_5.sh
```

- `--pretrained_model_name_or_path`: what model to train/initialize from
- `--output_dir`: where to save/log to
- `--seed`: training seed (not set by default)
- `--max_train_steps`: how many train steps to take
- `--gradient_accumulation_steps`
- `--train_batch_size`: see above notes in script for actual BS
- `--checkpointing_steps`: how often to save model
- `--gradient_checkpointing`: turned on automatically for SDXL
- `--learning_rate`
- `--scale_lr`: found this to be very helpful but isn't default in code
- `--lr_scheduler`: type of LR warmup/decay; default is linear warmup to constant
- `--lr_warmup_steps`: number of scheduler warmup steps
- `--dataset_name`: path to dataset (from the HuggingFace hub) containing the training data of instance images or prompts
- `--cache_dir`: where the dataset is cached locally (users will want to change this to fit their file system)
- `--resolution`: defaults to 512 for non-SDXL, 1024 for SDXL
- `--random_crop` and `--no_hflip`: change data augmentation
- `--dataloader_num_workers`: number of total dataloader workers
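The default `--lr_scheduler` behavior (linear warmup to constant) ramps the learning rate linearly from 0 over `--lr_warmup_steps` steps and then holds it at `--learning_rate`; with `--scale_lr`, the base rate is typically first multiplied by the effective batch size. A sketch of both behaviors (names and the exact scaling rule are illustrative, not lifted from the scripts):

```python
def lr_at_step(step, base_lr, warmup_steps):
    """Linear warmup from 0 to base_lr, then constant thereafter."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr

def scaled_base_lr(base_lr, grad_accum_steps, train_batch_size, num_processes):
    """What --scale_lr typically does: scale by the effective batch size."""
    return base_lr * grad_accum_steps * train_batch_size * num_processes

# 1e-4 scaled by an effective batch size of 4 * 8 * 2 = 64
base = scaled_base_lr(1e-4, grad_accum_steps=4, train_batch_size=8, num_processes=2)
```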
