PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model

[CVPR 2025] Official code of the paper "PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model". Paper link

Taking the first image on the left as an example, what do you see at first glance? A painting of a path through a forest (zoom in for a detailed look), or a human face (zoom out for a more global view)? Built on an off-the-shelf text-to-image diffusion model, we contribute a plug-and-play method that naturally dissolves a reference image (shown in the bottom-right corner) into arbitrary scenes described by a text prompt, providing a free lunch for synthesizing optical illusion hidden pictures with diffusion models. Best viewed with zoom-in. Abundant results of our generated optical illusion hidden pictures are displayed in the paper.

Contributions

  1. We pioneer the generation of optical illusion hidden pictures from the perspective of text-guided image-to-image (I2I) translation.
  2. We propose a concise and elegant method that realizes deep fusion of image structure and text semantics via dynamic phase manipulation in the LDM feature space, producing contextually harmonious illusion pictures.
  3. We propose asynchronous phase transfer to enable flexible control over the degree of hidden image discernibility.
  4. Our method requires no training or optimization, providing a free lunch for synthesizing illusion pictures with off-the-shelf T2I diffusion models.

Method overview

Built on the pre-trained Latent Diffusion Model (LDM), PTDiffusion is composed of three diffusion trajectories. The inversion trajectory inverts the reference image into the LDM Gaussian noise space. The reconstruction trajectory recovers the reference image from the inverted noise embedding. The sampling trajectory samples the final illusion image from random noise guided by the text prompt. The reconstruction and sampling trajectories are bridged by our proposed phase transfer module, which dynamically transplants diffusion features' phase spectra from the reconstruction trajectory into the sampling trajectory to smoothly blend the source image structure with the textual semantics in the LDM feature space.
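The core idea of phase transfer can be illustrated outside the diffusion pipeline. The sketch below is purely illustrative (it is not the repository's actual implementation and operates on plain NumPy arrays rather than LDM features): it combines the amplitude spectrum of one array with the phase spectrum of another via the 2D FFT, so the output inherits the structure of the phase donor and the appearance statistics of the amplitude donor.

```python
import numpy as np

def phase_transfer(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Illustrative phase transfer: combine the amplitude spectrum of
    `target` with the phase spectrum of `source` (the structure carrier)."""
    src_fft = np.fft.fft2(source)
    tgt_fft = np.fft.fft2(target)
    amplitude = np.abs(tgt_fft)      # appearance carrier
    phase = np.angle(src_fft)        # structure carrier
    blended = amplitude * np.exp(1j * phase)
    return np.real(np.fft.ifft2(blended))

# Toy example: transplant the phase of a "reference" array into random features.
rng = np.random.default_rng(0)
ref = rng.standard_normal((64, 64))
feat = rng.standard_normal((64, 64))
out = phase_transfer(ref, feat)
```

Transferring an array's phase back onto its own amplitude reconstructs the array exactly, which is a quick sanity check for the operation.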

Environment

We use an Anaconda environment with Python 3.8 and PyTorch 2.0, which can be built with the following commands.
First, create a new conda virtual environment:


conda create -n PTDiffusion python=3.8

Then, install PyTorch using conda:


conda activate PTDiffusion
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia

Lastly, install the remaining required packages from requirements.txt:


pip install -r requirements.txt

Download pre-trained models

Our method requires the pre-trained Stable Diffusion model and the CLIP text encoder.

  1. Download the Stable Diffusion v1.5 model checkpoint file v1-5-pruned-emaonly.ckpt and put it right into the "models" folder. It can be downloaded from Hugging Face, or from Google Drive.
  2. Download the clip-vit-large-patch14 model and put it right into the "openai" folder. It can be downloaded from here with the demo code, or manually downloaded file by file from here. We also provide a Google Drive link to download it for convenience.

Run the code

Our model is training-free; you can translate a given reference image into an optical illusion hidden picture by directly running the following inference script:


python inference.py

In the inference script inference.py, you can manually set the path of the reference image, as well as the target text prompt describing the scene content of the generated illusion image.

The parameters "direct_transfer_steps", "decayed_transfer_steps", "async_ahead_steps", and "exponent" in the sample_illusion_image function, as well as the parameters "contrast", "add_noise", and "noise_value" in the load_ref_img function, are all tunable to suit different input reference images.
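To build intuition for how "direct_transfer_steps", "decayed_transfer_steps", and "exponent" might interact, the sketch below shows one plausible schedule: full phase transfer for an initial window, followed by polynomial decay. This is a hypothetical illustration only; the schedule name, signature, and decay rule are assumptions and do not reproduce the repository's actual code.

```python
def transfer_weight(step: int, direct_transfer_steps: int,
                    decayed_transfer_steps: int, exponent: float) -> float:
    """Hypothetical transfer-strength schedule (for intuition only):
    weight 1.0 during the direct-transfer window, then polynomial
    decay to 0.0 over the decayed-transfer window."""
    if step < direct_transfer_steps:
        return 1.0
    if step < direct_transfer_steps + decayed_transfer_steps:
        progress = (step - direct_transfer_steps) / decayed_transfer_steps
        return (1.0 - progress) ** exponent
    return 0.0

# Example: 10 direct steps, 20 decayed steps, quadratic decay.
weights = [transfer_weight(t, 10, 20, 2.0) for t in range(40)]
```

Under this reading, a larger "exponent" makes the hidden structure fade out faster after the direct-transfer window, reducing the discernibility of the reference image in late denoising steps.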

Test the demo

We also provide a Jupyter notebook demo for ease of visualization; please open it by running the following command:


jupyter notebook demo.ipynb

Citation


@inproceedings{gao2025ptdiffusion,
  title={PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model},
  author={Gao, Xiang and Yang, Shuai and Liu, Jiaying},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={18240--18249},
  year={2025}
}
