tstramer/diffusers

Diffusers

Definitions

Models: A single neural network that models p_θ(x_{t-1}|x_t) and is trained to "denoise" a noisy input into an image. Examples: UNet, Conditioned UNet, 3D UNet, Transformer UNet

Schedulers: An algorithm that defines the noise schedule and samples noise according to it, for both training and inference. It specifies the alpha and beta schedules, the timesteps, and so on. Examples: Gaussian DDPM, DDIM, PLMS, DEIN

Diffusion Pipeline: An end-to-end pipeline that combines multiple diffusion models, possibly with text encoders or CLIP. Examples: GLIDE, CompVis/Latent-Diffusion, Imagen, DALL-E
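
To make the scheduler definition above more concrete, here is a minimal sketch of a linear beta schedule in plain Python. The function name `linear_beta_schedule` and its defaults (1000 timesteps, betas from 1e-4 to 0.02, the values commonly used for DDPM) are assumptions for illustration and are not taken from this repository.

```python
import math


def linear_beta_schedule(num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Return (betas, alphas, alpha_prods) for a linear noise schedule.

    Hypothetical helper: the defaults are the commonly cited DDPM values,
    not something defined by this repository.
    """
    step = (beta_end - beta_start) / (num_timesteps - 1)
    betas = [beta_start + i * step for i in range(num_timesteps)]
    alphas = [1.0 - beta for beta in betas]

    # alpha_prods[t] is the cumulative product alpha_0 * ... * alpha_t.
    alpha_prods = []
    running = 1.0
    for alpha in alphas:
        running *= alpha
        alpha_prods.append(running)
    return betas, alphas, alpha_prods


betas, alphas, alpha_prods = linear_beta_schedule()

# sqrt(alpha_prod) is the factor that scales the clean image at timestep t:
# close to 1 at the first timestep, close to 0 at the last one.
print(math.sqrt(alpha_prods[0]), math.sqrt(alpha_prods[-1]))
```

The cumulative product is the quantity that a call such as `scheduler.get_alpha_prod(t)` would presumably return; that mapping is an assumption based on the name.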

1. diffusers as a central modular diffusion and sampler library

diffusers is more modularized than transformers. The idea is that researchers and engineers can easily use only parts of the library for their own use cases. It could become a central place for all kinds of models, schedulers, training utilities, and processors that one can mix and match for one's own use case. Both models and schedulers should be loadable from and saveable to the Hub.

Example:

import torch
from diffusers import UNetModel, GaussianDDPMScheduler
import PIL.Image  # the Image submodule must be imported explicitly
import numpy as np

generator = torch.Generator()
generator = generator.manual_seed(6694729458485568)
torch_device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Load models
scheduler = GaussianDDPMScheduler.from_config("fusing/ddpm-lsun-church")
model = UNetModel.from_pretrained("fusing/ddpm-lsun-church").to(torch_device)

# 2. Sample gaussian noise
image = scheduler.sample_noise((1, model.in_channels, model.resolution, model.resolution), device=torch_device, generator=generator)

# 3. Denoise
for t in reversed(range(len(scheduler))):
    # i) define coefficients for time step t
    clipped_image_coeff = 1 / torch.sqrt(scheduler.get_alpha_prod(t))
    clipped_noise_coeff = torch.sqrt(1 / scheduler.get_alpha_prod(t) - 1)
    image_coeff = (1 - scheduler.get_alpha_prod(t - 1)) * torch.sqrt(scheduler.get_alpha(t)) / (1 - scheduler.get_alpha_prod(t))
    clipped_coeff = torch.sqrt(scheduler.get_alpha_prod(t - 1)) * scheduler.get_beta(t) / (1 - scheduler.get_alpha_prod(t))

    # ii) predict noise residual
    with torch.no_grad():
        noise_residual = model(image, t)

    # iii) compute predicted image from residual
    # See 2nd formula at https://github.com/hojonathanho/diffusion/issues/5#issue-896554416 for comparison
    pred_mean = clipped_image_coeff * image - clipped_noise_coeff * noise_residual
    pred_mean = torch.clamp(pred_mean, -1, 1)
    prev_image = clipped_coeff * pred_mean + image_coeff * image

    # iv) sample variance
    prev_variance = scheduler.sample_variance(t, prev_image.shape, device=torch_device, generator=generator)

    # v) sample  x_{t-1} ~ N(prev_image, prev_variance)
    sampled_prev_image = prev_image + prev_variance
    image = sampled_prev_image

# process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# save image
image_pil.save("test.png")
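
The coefficients in the denoising loop above can be checked on a single scalar "pixel" without loading a model. The sketch below uses plain Python and makes several assumptions that the source does not confirm: a linear beta schedule, a cumulative alpha product of 1 before step 0, and the posterior variance beta_t * (1 - alpha_prod_{t-1}) / (1 - alpha_prod_t) as a stand-in for `scheduler.sample_variance`. The names `make_schedule` and `reverse_step` are hypothetical.

```python
import math
import random


def make_schedule(num_timesteps=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear schedule: returns betas and cumulative alpha products."""
    step = (beta_end - beta_start) / (num_timesteps - 1)
    betas = [beta_start + i * step for i in range(num_timesteps)]
    alpha_prods, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_prods.append(running)
    return betas, alpha_prods


def reverse_step(x_t, noise_pred, t, betas, alpha_prods, rng):
    """One DDPM-style reverse step x_t -> x_{t-1} for a scalar sample."""
    alpha_t = 1.0 - betas[t]
    alpha_prod_t = alpha_prods[t]
    # Assumed convention: the cumulative product before step 0 is 1.
    alpha_prod_prev = alpha_prods[t - 1] if t > 0 else 1.0

    # Estimate the clean sample from x_t and the predicted noise, then clip it.
    pred_x0 = x_t / math.sqrt(alpha_prod_t) - math.sqrt(1.0 / alpha_prod_t - 1.0) * noise_pred
    pred_x0 = max(-1.0, min(1.0, pred_x0))

    # Posterior mean: weighted combination of the clean estimate and x_t.
    x0_coeff = math.sqrt(alpha_prod_prev) * betas[t] / (1.0 - alpha_prod_t)
    xt_coeff = (1.0 - alpha_prod_prev) * math.sqrt(alpha_t) / (1.0 - alpha_prod_t)
    mean = x0_coeff * pred_x0 + xt_coeff * x_t

    # No noise is added on the final step (t == 0).
    if t == 0:
        return mean
    variance = betas[t] * (1.0 - alpha_prod_prev) / (1.0 - alpha_prod_t)
    return mean + math.sqrt(variance) * rng.gauss(0.0, 1.0)


betas, alpha_prods = make_schedule()
x0, eps = 0.5, 0.3

# Forward-noise x0 to timestep 0, then undo it with the true noise.
x_t0 = math.sqrt(alpha_prods[0]) * x0 + math.sqrt(1.0 - alpha_prods[0]) * eps
recovered = reverse_step(x_t0, eps, 0, betas, alpha_prods, random.Random(0))
print(abs(recovered - x0) < 1e-9)
```

When the predicted noise equals the noise that was actually added, the clean-sample estimate reproduces `x0` up to rounding, which is a quick sanity check for the two `clipped_*` coefficients.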

2. diffusers as a collection of the most important diffusion systems (GLIDE, DALL-E, ...)

The models directory in the repository hosts the complete code necessary for running a diffusion system as well as for training it. A DiffusionPipeline class makes it easy to run the diffusion model in inference:

Example:

from diffusers import DiffusionPipeline
import PIL.Image
import numpy as np

# load model and scheduler
ddpm = DiffusionPipeline.from_pretrained("fusing/ddpm-lsun-bedroom")

# run pipeline in inference (sample random noise and denoise)
image = ddpm()

# process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# save image
image_pil.save("test.png")
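
As a rough illustration of the design idea, the following sketch shows how a pipeline object might bundle a model and a scheduler behind a single call. It is a toy in plain Python, not the library's `DiffusionPipeline`: the classes `ToyScheduler`, `ToyModel`, and `ToyPipeline`, along with their methods and parameters, are all hypothetical, and the "denoiser" simply pulls each value toward a fixed target.

```python
import random


class ToyScheduler:
    """Hypothetical scheduler: owns the step count, noise sampling, and update rule."""

    def __init__(self, num_steps=20, shrink=0.5):
        self.num_steps = num_steps
        self.shrink = shrink

    def __len__(self):
        return self.num_steps

    def sample_noise(self, size, rng):
        return [rng.gauss(0.0, 1.0) for _ in range(size)]

    def step(self, sample, residual):
        # Remove a fixed fraction of the predicted residual from each value.
        return [x - self.shrink * r for x, r in zip(sample, residual)]


class ToyModel:
    """Hypothetical 'denoiser': predicts each value's offset from a fixed target."""

    def __init__(self, target=0.25):
        self.target = target

    def __call__(self, sample, t):
        return [x - self.target for x in sample]


class ToyPipeline:
    """Bundles a model and a scheduler so that inference is a single call."""

    def __init__(self, model, scheduler):
        self.model = model
        self.scheduler = scheduler

    def __call__(self, size=4, seed=0):
        rng = random.Random(seed)
        # Start from pure noise, then denoise from the last timestep to the first.
        sample = self.scheduler.sample_noise(size, rng)
        for t in reversed(range(len(self.scheduler))):
            residual = self.model(sample, t)
            sample = self.scheduler.step(sample, residual)
        return sample


pipeline = ToyPipeline(ToyModel(target=0.25), ToyScheduler(num_steps=20))
result = pipeline(size=4, seed=0)
print(result)
```

Keeping the loop inside the pipeline means the model and the scheduler can still be used on their own, as in the first example, while callers who only want samples need a single call.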

Library structure:

├── models
│   ├── audio
│   │   └── fastdiff
│   │       ├── modeling_fastdiff.py
│   │       ├── README.md
│   │       └── run_fastdiff.py
│   ├── __init__.py
│   └── vision
│       ├── dalle2
│       │   ├── modeling_dalle2.py
│       │   ├── README.md
│       │   └── run_dalle2.py
│       ├── ddpm
│       │   ├── example.py
│       │   ├── modeling_ddpm.py
│       │   ├── README.md
│       │   └── run_ddpm.py
│       ├── glide
│       │   ├── modeling_glide.py
│       │   ├── modeling_vqvae.py.py
│       │   ├── README.md
│       │   └── run_glide.py
│       ├── imagen
│       │   ├── modeling_dalle2.py
│       │   ├── README.md
│       │   └── run_dalle2.py
│       ├── __init__.py
│       └── latent_diffusion
│           ├── modeling_latent_diffusion.py
│           ├── README.md
│           └── run_latent_diffusion.py
├── pyproject.toml
├── README.md
├── setup.cfg
├── setup.py
├── src
│   └── diffusers
│       ├── configuration_utils.py
│       ├── __init__.py
│       ├── modeling_utils.py
│       ├── models
│       │   ├── __init__.py
│       │   ├── unet_glide.py
│       │   └── unet.py
│       ├── pipeline_utils.py
│       └── schedulers
│           ├── gaussian_ddpm.py
│           ├── __init__.py
├── tests
│   └── test_modeling_utils.py

About

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch
