Junsheng121
Contributor

@Junsheng121 Junsheng121 commented Dec 26, 2023

What does this PR do?

Issue: https://github.com/huggingface/diffusers/issues/6313
This PR implements a Null-Text Inversion pipeline. NullTextPipeline is adapted mostly from the official code: https://github.com/google/prompt-to-prompt/blob/main/null_text_w_ptp.ipynb

Usage:

from diffusers.schedulers import DDIMScheduler
from pipeline_null_text_inversion import NullTextPipeline
import torch

invert_prompt = "A lying cat"
input_image = "siamese.jpg"
steps = 50
prompt = "A lying cat"

# The pipeline must run in torch.float32.
scheduler = DDIMScheduler(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.0120, beta_schedule="scaled_linear")
pipeline = NullTextPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", scheduler=scheduler, torch_dtype=torch.float32).to("cuda")

# Invert the image: returns the inverted latent and the optimized null-text (unconditional) embeddings.
inverted_latent, uncond = pipeline.invert(input_image, invert_prompt, num_inner_steps=5, early_stop_epsilon=1e-5, num_inference_steps=steps)
# Reconstruct (or edit) the image from the inverted latent and save it next to the input file.
pipeline(prompt, uncond, inverted_latent, guidance_scale=7.5, num_inference_steps=steps).images[0].save(input_image + ".output.jpg")

invert_prompt is used for DDIM inversion and for the null-text optimization.

prompt can be the same as invert_prompt for reconstruction, or be set to a different one, such as "A lying dog", for image editing.

Note that the null-text embedding cannot be optimized successfully in float16.

Fixes # (issue)

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@sayakpaul
Member

Could you please:

  • Remove the image from the PR?
  • Run the styling check (make style && make quality)?
  • Add a section about the pipeline in the README?

Contributor

@patrickvonplaten patrickvonplaten left a comment

Could we remove the image siamese.jpg from the PR to make sure the repo is kept light-weight and add some lines to the README?

@Junsheng121
Contributor Author

Junsheng121 commented Dec 27, 2023 via email

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@patrickvonplaten
Contributor

Now we just need to run make style once and we should be good :-)

@Junsheng121
Contributor Author

Done~

@patrickvonplaten
Contributor

> Done~

Perfect!

@patrickvonplaten patrickvonplaten merged commit d184291 into huggingface:main Jan 5, 2024
@Skquark

Skquark commented Jan 17, 2024

While implementing my UI for Null-Text, I ran into a few issues worth pointing out. The docs show the import as from examples.community.pipeline_null_text_inversion import NullTextPipeline, but it took me a while to figure out that it should be from pipeline_null_text_inversion import NullTextPipeline instead. The scheduler in the example also gave warnings about steps_offset and clip_sample, so it should be scheduler = DDIMScheduler(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.0120, beta_schedule="scaled_linear", steps_offset=1, clip_sample=False). I also found it strange that the input image is supposed to be a path string instead of a PIL image or NumPy array like in every other pipeline, but I can live with that. I also wouldn't mind a callback function for my progress bar; generation seems slow, but it is probably worth it.
Looking forward to testing the results. I just got it working in Diffusion Deluxe, and it seems quite useful for edits. Thanks.
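
For readers following along, the scheduler fix described in this comment can be sketched as below. This is only an illustration under assumptions: SCHEDULER_KWARGS and build_scheduler are hypothetical names that do not appear in the PR, and the values are copied from the comment rather than verified against the merged example.

```python
# Scheduler settings copied from the comment above. The last two keys
# (steps_offset, clip_sample) are the ones reported to avoid the warnings.
SCHEDULER_KWARGS = {
    "num_train_timesteps": 1000,
    "beta_start": 0.00085,
    "beta_end": 0.0120,
    "beta_schedule": "scaled_linear",
    "steps_offset": 1,
    "clip_sample": False,
}


def build_scheduler():
    """Hypothetical helper: build the DDIM scheduler from the settings above.

    diffusers is imported lazily so this module can be loaded without it.
    """
    from diffusers.schedulers import DDIMScheduler

    return DDIMScheduler(**SCHEDULER_KWARGS)
```

Assuming pipeline_null_text_inversion.py is on the import path, the result would be passed as scheduler=build_scheduler() to NullTextPipeline.from_pretrained, together with torch_dtype=torch.float32 as in the usage example from the PR description.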

@Junsheng121
Contributor Author

Junsheng121 commented Jan 18, 2024 via email

@mycfhs

mycfhs commented Mar 14, 2024

Hello! Is there any way to use it with float16?
As you say, float16 fails. I think the reason may be that the gradients in the UNet layers turn to zero because of their float16 dtype. The gradients in the UNet are about 1e-10 to 1e-7, and those smaller than 1e-8 may become zero because of the limited range of float16. The gradients in the following layers then become smaller and smaller until they finally reach zero.

I tried scaling the loss up by a factor of 10 (or 1e3, 1e4) and lowering the learning rate, but nothing worked. Is there any solution? I really, really want to use it with torch.float16 o(╥﹏╥)o
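
The underflow described in this comment can be reproduced in isolation with a minimal standard-library sketch. It is not taken from the pipeline: to_float16 is a hypothetical helper that rounds a value to IEEE 754 half precision, and the sample values are simply the magnitudes quoted above. It shows only how such values round, and does not explain why loss scaling failed in the commenter's setup.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and back (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Smallest positive (subnormal) float16 value, roughly 5.96e-8.
SMALLEST_SUBNORMAL = 2.0 ** -24

# Gradient magnitudes in the range quoted in the comment.
for grad in (1e-7, 1e-8, 1e-10):
    rounded = to_float16(grad)
    status = "underflows to zero" if rounded == 0.0 else "survives"
    print(f"{grad:.0e} -> {rounded!r} ({status})")
```

In this sketch, 1e-8 and 1e-10 round to zero because they lie well below the smallest float16 subnormal, while 1e-7 still maps to a nonzero value.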

AmericanPresidentJimmyCarter pushed a commit to AmericanPresidentJimmyCarter/diffusers that referenced this pull request Apr 26, 2024
* null-text-inversion-implementation

* edited

* edited

* edited

* edited

* edited

* edit

* makestyle

---------

Co-authored-by: Sayak Paul <[email protected]>
@Adenialzz
Contributor

Hi, thanks for contributing. How can this be combined with prompt-to-prompt for image editing? Could you please provide an example code snippet? Thanks.

7 participants