
Customized Generation Reimagined: Fidelity and Editability Harmonized

Jian Jin, Yang Shen, Zhenyong Fu†, Jian Yang

(†corresponding author)

PCA Lab, Key Lab of Intelligent Perception and Systems for High-Dimensional Information of Ministry of Education, and Jiangsu Key Lab of Image and Video Understanding for Social Security, School of Computer Science and Engineering, Nanjing University of Science and Technology

ECCV 2024

📖 Abstract

(Figure: results of multi-concept customized generation)

Customized generation aims to incorporate a novel concept into a pre-trained text-to-image model, enabling new generations of the concept in novel contexts guided by textual prompts. However, customized generation suffers from an inherent trade-off between concept fidelity and editability, i.e., between precisely modeling the concept and faithfully adhering to the prompts. Previous methods reluctantly seek a compromise and struggle to achieve both high concept fidelity and ideal prompt alignment simultaneously. In this paper, we propose a "Divide, Conquer, then Integrate" (DCI) framework, which performs a surgical adjustment in the early stage of denoising to liberate the fine-tuned model from the fidelity-editability trade-off at inference. The two conflicting components in the trade-off are decoupled and individually conquered by two collaborative branches, which are then selectively integrated to preserve high concept fidelity while achieving faithful prompt adherence. To obtain a better fine-tuned model, we introduce an Image-specific Context Optimization (ICO) strategy for model customization. ICO replaces manual prompt templates with learnable image-specific contexts, providing an adaptive and precise fine-tuning direction to promote the overall performance. Extensive experiments demonstrate the effectiveness of our method in reconciling the fidelity-editability trade-off.
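The core idea above — two collaborative branches conquering fidelity and editability separately, then selectively integrated in the early denoising steps — can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the spatial mask, and the `early_steps` cutoff are assumptions made purely for illustration.

```python
import numpy as np

def integrate_branches(fidelity_pred, editability_pred, mask, step, early_steps=10):
    """Toy illustration of selective integration of two denoising branches.

    During the early denoising steps, concept regions (mask == 1) take the
    fidelity branch's prediction while the remaining regions follow the
    editability branch; after the cutoff, a single branch is used.
    """
    if step < early_steps:
        return mask * fidelity_pred + (1 - mask) * editability_pred
    return fidelity_pred

# Toy 4x4 "latents": the concept occupies the top-left 2x2 block.
f = np.ones((4, 4))        # fidelity branch prediction
e = np.zeros((4, 4))       # editability branch prediction
m = np.zeros((4, 4))
m[:2, :2] = 1              # hypothetical concept mask

out = integrate_branches(f, e, m, step=0)
# Concept region follows the fidelity branch; background follows editability.
```

The actual method operates on diffusion-model latents with masks derived at inference time; the sketch only conveys the "divide, conquer, then integrate" control flow.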

Getting Started

Config environment

```shell
cd stable-diffusion
conda env create -f environment.yaml
conda activate ldm
pip install clip-retrieval tqdm
```

or

```shell
conda env create -f environment.yaml
conda activate ldm
```

Prepare pre-trained model

Download the Stable Diffusion v1.4 model checkpoint:

```shell
wget https://huggingface.co/CompVis/stable-diffusion-v-1-4-original/resolve/main/sd-v1-4.ckpt
```

🚀 Run

Finetuning Stage

Configure the items in train.sh and run:

```shell
bash train.sh
```

You can download the real regularization images here and place them in ./real_reg, or curate them yourself and organize them in the same format.

Inference

Configure the items in inference.sh and run:

```shell
bash inference.sh
```

🖊️ BibTeX

If you find this project useful in your research, please consider citing:

```bibtex
@inproceedings{jin2024customized,
  title={Customized Generation Reimagined: Fidelity and Editability Harmonized},
  author={Jian Jin and Yang Shen and Zhenyong Fu and Jian Yang},
  booktitle={European Conference on Computer Vision},
  year={2024}
}
```

🙏 Acknowledgements

This code is based on Stable Diffusion and Custom Diffusion. We thank the authors for their outstanding work.

📧 Contact

Should you have any questions or suggestions, please contact jinj@njust.edu.cn.
