Reduce huge amounts of code duplication

**Is your feature request related to a problem? Please describe.**
There is a _ridiculous_ amount of code duplication going on in this library. Could anyone please tell me why the schedulers function `betas_for_alpha_bar` is duplicated - identically - **23 times**, appearing in about half of the schedulers files? There are many other duplicated functions such as `retrieve_timesteps`, appearing 6 times. Many many member functions of classes are also duplicated. This seems like extremely bad practice, and makes it very hard to modify these functions.

This is just the tip of the iceberg - there are similar patterns across the whole codebase. The string "# Copied from" occurs in **149 files**, and often many times in each file too. Bad sign. Why are there separate standalone files for `StableDiffusionPipeline`, `StableDiffusionImg2ImgPipeline`, `StableDiffusionInpaintPipeline`, `StableDiffusionPipelineSafe` and so on (and so on and so on) when they have a huge amount in common, and could be expressed as single methods of or even arguments to one pipeline? Even if you want them to be separate classes, surely they should be sharing code via OOP rather than using this huge amount of duplicated code? Surely, functionally, `StableDiffusionControlNetImg2ImgPipeline` could have been expressed as a very simple combination of the existing ControlNet and Img2Img implementations, rather than being its own 1269 line file (with 17 occurrences of "# Copied from")?

In my opinion this methodology will make this codebase increasingly difficult to maintain and contribute to as the features continue to grow. While I unfortunately don't currently have time to attempt this myself, I really believe this package needs a serious refactor. There are already problems with advanced use; if I were to train a ControlNet on an image variation model, I would be unable to use the result with diffusers as `StableDiffusionControlNetImageVariationPipeline` does not yet exist (not that it should!).

It's worth running `pipelines/stable_diffusion/pipeline_stable_diffusion.py` and `pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py` through diff to see the extent of the issue. The code and documentation are extremely similar, with huge identical sections. `check_inputs` is slightly different, Img2Img has a new function `get_timesteps`, there are slight differences to the arguments to `__call__`, and slight differences to the body - and that's it. Both files are ~1000 lines long.

**Describe the solution you'd like.**
_Share code between pipelines and schedulers wherever possible_. There should surely be some kind of `StableDiffusionPipeline` base class given the number of related classes. ControlNet should be incorporated in to existing Stable Diffusion pipelines - to my understanding wherever SD is used, ControlNet may also be used (we just need to provide an image condition), so to me it seems very overkill to have separate pipelines for every ControlNet use case. For example, `StableDiffusionControlNetImg2ImgPipeline` could inherit from `StableDiffusionImg2ImgPipeline`, or `StableDiffusionImg2ImgPipeline` could take optional image conditions as input to its call.

Of course these are just suggestions, it's worth thinking very carefully about how to proceed! But I really believe that change is necessary or this library will become more and more bloated as new features are added.

I was originally intending to add a custom beta schedule to this library (the sqrt schedule from Diffusion-LM), but the amount of code duplication that would be necessary for this really put me off. I believe the noise schedule system needs a rewrite - there should really be just one function which maps from a string (and num timesteps etc) to a list of betas, rather than 23 identical functions + additional logic in many different constructors. (perhaps I should separate this into a separate issue?)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reduce huge amounts of code duplication #6153

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Reduce huge amounts of code duplication #6153

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions