null-text-inversion-pipeline-implementation #6329

Junsheng121 · 2023-12-26T03:46:13Z

What does this PR do?

Issue: https://github.com/huggingface/diffusers/issues/6313
This PR implements a Null-Text Inversion pipeline. NullTextPipeline is mostly adapted from the official code: https://github.com/google/prompt-to-prompt/blob/main/null_text_w_ptp.ipynb

Usage:

from diffusers.schedulers import DDIMScheduler
from pipeline_null_text_inversion import NullTextPipeline
import torch

invert_prompt = "A lying cat"
input_image = "siamese.jpg"
steps = 50
prompt = "A lying cat"

# The pipeline must run in torch.float32; float16 fails to optimize the null-text embedding.
scheduler = DDIMScheduler(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.0120, beta_schedule="scaled_linear")
pipeline = NullTextPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", scheduler=scheduler, torch_dtype=torch.float32).to("cuda")

# DDIM-invert the image, then optimize the null-text (unconditional) embedding.
inverted_latent, uncond = pipeline.invert(input_image, invert_prompt, num_inner_steps=5, early_stop_epsilon=1e-5, num_inference_steps=steps)
# Reconstruct (or edit) from the inverted latent and save the first image.
pipeline(prompt, uncond, inverted_latent, guidance_scale=7.5, num_inference_steps=steps).images[0].save(input_image + ".output.jpg")

invert_prompt is used for DDIM-Inversion and optimization.

prompt can be the same as invert_prompt for reconstruction, or be set to a different prompt such as "A lying dog" for image editing.

Note that with float16 the optimization of the null-text embedding fails.

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline?
Did you read our philosophy doc (important for complex PRs)?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

sayakpaul · 2023-12-26T13:27:58Z

Could you please:

Remove the image from the PR?
Run the styling check (make style && make quality)?
Add a section about the pipeline in the README?

patrickvonplaten

Could we remove the image siamese.jpg from the PR to make sure the repo is kept light-weight and add some lines to the README?

Junsheng121 · 2023-12-27T00:21:10Z

OK


HuggingFaceDocBuilderDev · 2023-12-27T01:36:19Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

patrickvonplaten · 2024-01-02T13:12:54Z

Now we just need to run make style once and we should be good :-)

Junsheng121 · 2024-01-03T03:04:14Z

Done~

patrickvonplaten · 2024-01-05T10:35:18Z

Done~

Perfect!

Skquark · 2024-01-17T07:19:51Z

While implementing my UI for Null-Text, I ran into a few issues worth pointing out. The doc shows the import as from examples.community.pipeline_null_text_inversion import NullTextPipeline, but it took me a while to figure out that it should be from pipeline_null_text_inversion import NullTextPipeline instead. The scheduler in the example also gave warnings about steps_offset and clip_sample, so it should be scheduler = DDIMScheduler(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.0120, beta_schedule="scaled_linear", steps_offset=1, clip_sample=False). I also found it strange that the input image is supposed to be a path string instead of a PIL image or NumPy array like every other pipeline, but I can live with that. I also wouldn't mind a callback function for my progress bar; generation seems slow, but it is probably worth it.
Looking forward to testing the results. I just got it working in Diffusion Deluxe, and it seems quite useful for edits. Thanks.
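
The import and scheduler corrections described in this comment can be collected in one place. This is a minimal sketch, not the pipeline's documented usage: the helper name build_null_text_pipeline and the DDIM_KWARGS constant are hypothetical, it assumes pipeline_null_text_inversion.py is importable from the working directory, and the heavy imports are deferred into the function so the module itself loads without diffusers or a GPU.

```python
# Scheduler settings quoted in the comment above; steps_offset and clip_sample
# are the two settings the comment adds in response to the warnings.
DDIM_KWARGS = {
    "num_train_timesteps": 1000,
    "beta_start": 0.00085,
    "beta_end": 0.0120,
    "beta_schedule": "scaled_linear",
    "steps_offset": 1,
    "clip_sample": False,
}


def build_null_text_pipeline(model_id="runwayml/stable-diffusion-v1-5", device="cuda"):
    """Hypothetical helper: build the community pipeline with the settings above."""
    # Deferred imports: nothing heavy is loaded until the helper is called.
    import torch
    from diffusers.schedulers import DDIMScheduler

    # Assumption: pipeline_null_text_inversion.py sits next to the calling script.
    from pipeline_null_text_inversion import NullTextPipeline

    scheduler = DDIMScheduler(**DDIM_KWARGS)
    # float32 is required; the PR description notes that float16 fails to optimize.
    pipeline = NullTextPipeline.from_pretrained(
        model_id, scheduler=scheduler, torch_dtype=torch.float32
    )
    return pipeline.to(device)
```

Calling build_null_text_pipeline() returns a pipeline that can then be used with invert(...) exactly as in the usage section of the PR description.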

Junsheng121 · 2024-01-18T06:55:46Z

I will fix it later.

Alan Bedian wrote on Wednesday, January 17, 2024 at 15:20:

While implementing my UI for Null-Text, I ran into a few issues worth pointing out. The doc shows the import as from examples.community.pipeline_null_text_inversion import NullTextPipeline, but it took me a while to figure out that it should be from pipeline_null_text_inversion import NullTextPipeline instead. The scheduler in the example also gave warnings about steps_offset and clip_sample, so it should be scheduler = DDIMScheduler(num_train_timesteps=1000, beta_start=0.00085, beta_end=0.0120, beta_schedule="scaled_linear", steps_offset=1, clip_sample=False). I am not sure whether it is compatible with all the other schedulers, but I can stick with DDIM for now. I also found it strange that the input image is supposed to be a path string instead of a PIL image or NumPy array like every other pipeline, but I can live with that. I also wouldn't mind a callback function for my progress bar; generation seems slow, but it is probably worth it.

The main issue, once I finally got it working, is the returned StableDiffusionPipelineOutput, which gave an error about .save not being on str. I checked the returned images value and it was a list containing a string, so ["images"] was the output data. I looked into the script, and at the very end, on line 256, I saw image, output_type=output_type, do_denormalize=[True] * image.shape[0], which looks like the variables are reversed, so that image = output_type, which is "images". The last line is return StableDiffusionPipelineOutput(images=image, nsfw_content_detected=False), where I think images should be a list, but maybe it can take a single image. Everything looks like an easy fix. Looking forward to testing the results; it seems useful for edits. Thanks.
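
Until the pipeline itself is fixed, a caller can guard against the failure reported in the quoted comment (a string where an image was expected). The following is a caller-side sketch only: first_saveable_image is a hypothetical helper, it does not patch the pipeline, and it treats any object with a callable save attribute as an image.

```python
def first_saveable_image(images):
    """Return the first image-like item from a pipeline output, or raise TypeError."""
    # Accept either a single image or a list/tuple of images.
    candidates = images if isinstance(images, (list, tuple)) else [images]
    if not candidates:
        raise TypeError("pipeline returned no images")
    first = candidates[0]
    # A str such as "images" has no .save(); that is the failure reported above.
    if isinstance(first, str) or not callable(getattr(first, "save", None)):
        raise TypeError(
            f"expected an image with .save(), got {type(first).__name__}: {first!r}"
        )
    return first
```

With this in place, first_saveable_image(result.images).save(path) raises a descriptive TypeError instead of an AttributeError on str when the output is malformed.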

mycfhs · 2024-03-14T09:55:25Z

Hello! Is there any way to use it with float16?
As you say, float16 fails. I think the reason may be that the gradients in the UNet layers become zero because of the float16 dtype. The gradients in the UNet are roughly 1e-10 to 1e-7, and those below about 1e-8 may underflow to zero given the representable range of float16. The gradients of the following layers then get smaller and smaller, and finally reach zero.

I tried scaling the loss up by a factor of 10 (or 1e3, 1e4), and also lowering the learning rate, but all of these failed. Is there any way to solve this? I really, really want to use it with torch.float16 o(╥﹏╥)o
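
The underflow described in this comment can be reproduced without a GPU. The following is a small sketch that uses Python's standard struct module (format code "e" is IEEE 754 half precision) instead of PyTorch; the helper name to_float16 is hypothetical, and the sample magnitudes are taken from the range quoted in the comment, not measured from the UNet.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to IEEE 754 half precision and convert it back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Smallest positive float16 value (a subnormal): 2**-24, about 5.96e-8.
SMALLEST_FLOAT16 = 2.0 ** -24

# Magnitudes from the range quoted in the comment above.
for grad in (1e-7, 1e-8, 1e-10):
    print(f"{grad:.0e} -> {to_float16(grad)!r}")

# Scaling before the cast keeps a small value representable.
print(f"1e-8 * 1e4 -> {to_float16(1e-8 * 1e4)!r}")
```

Scaling a value by 1e4 before the cast keeps it representable, which is the idea behind loss scaling. The comment reports that scaling the loss did not rescue the optimization, so underflow of individual gradients may not be the only cause.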

* null-text-inversion-implementation * edited * edited * edited * edited * edited * edit * makestyle --------- Co-authored-by: Sayak Paul <[email protected]>

Adenialzz · 2024-05-08T10:23:45Z

Hi, thanks for contributing. How can this be combined with prompt-to-prompt for image editing? Could you please provide an example code snippet? Thanks.

Junsheng121 added 6 commits December 26, 2023 11:22

null-text-inversion-implementation

966eba2

edited

3529275

edited

5d56886

edited

36c7e13

edited

5b04306

edited

8f7e72f

Merge branch 'main' into null-text-inversion

f489715

patrickvonplaten reviewed Dec 26, 2023

Junsheng121 added 2 commits December 27, 2023 09:19

edit

9d9fff5

Merge branch 'null-text-inversion' of github.com:Junsheng121/diffuser…

9d7242d

…s into null-text-inversion

makestyle

4f3f93e

patrickvonplaten merged commit d184291 into huggingface:main Jan 5, 2024

sayakpaul mentioned this pull request Jan 27, 2024

The implementation of a Null-Text Inversion Pipeline #6313

Closed

2 tasks
