Commit dcebd51 ("first draft")
1 parent ac61eef

4 files changed: +427, -16 lines

docs/source/en/_toctree.yml (4 additions, 0 deletions)

@@ -52,6 +52,8 @@
     title: Image-to-image
   - local: using-diffusers/inpaint
     title: Inpainting
+  - local: using-diffusers/text-img2vid
+    title: Text or image-to-video
   - local: using-diffusers/depth2img
     title: Depth-to-image
   title: Tasks
@@ -315,6 +317,8 @@
     title: Text-to-image
   - local: api/pipelines/stable_diffusion/img2img
     title: Image-to-image
+  - local: api/pipelines/stable_diffusion/svd
+    title: Image-to-video
   - local: api/pipelines/stable_diffusion/inpaint
     title: Inpainting
   - local: api/pipelines/stable_diffusion/depth2img

docs/source/en/api/attnprocessor.md (19 additions, 16 deletions)

@@ -20,41 +20,44 @@ An attention processor is a class for applying different types of attention mechanisms.
 ## AttnProcessor2_0
 [[autodoc]] models.attention_processor.AttnProcessor2_0

-## FusedAttnProcessor2_0
-[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
+## AttnAddedKVProcessor
+[[autodoc]] models.attention_processor.AttnAddedKVProcessor

-## LoRAAttnProcessor
-[[autodoc]] models.attention_processor.LoRAAttnProcessor
+## AttnAddedKVProcessor2_0
+[[autodoc]] models.attention_processor.AttnAddedKVProcessor2_0

-## LoRAAttnProcessor2_0
-[[autodoc]] models.attention_processor.LoRAAttnProcessor2_0
+## CrossFrameAttnProcessor
+[[autodoc]] pipelines.text_to_video_synthesis.CrossFrameAttnProcessor

 ## CustomDiffusionAttnProcessor
 [[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor

 ## CustomDiffusionAttnProcessor2_0
 [[autodoc]] models.attention_processor.CustomDiffusionAttnProcessor2_0

-## AttnAddedKVProcessor
-[[autodoc]] models.attention_processor.AttnAddedKVProcessor
+## CustomDiffusionXFormersAttnProcessor
+[[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor

-## AttnAddedKVProcessor2_0
-[[autodoc]] models.attention_processor.AttnAddedKVProcessor2_0
+## FusedAttnProcessor2_0
+[[autodoc]] models.attention_processor.FusedAttnProcessor2_0
+
+## LoRAAttnProcessor
+[[autodoc]] models.attention_processor.LoRAAttnProcessor
+
+## LoRAAttnProcessor2_0
+[[autodoc]] models.attention_processor.LoRAAttnProcessor2_0

 ## LoRAAttnAddedKVProcessor
 [[autodoc]] models.attention_processor.LoRAAttnAddedKVProcessor

-## XFormersAttnProcessor
-[[autodoc]] models.attention_processor.XFormersAttnProcessor
-
 ## LoRAXFormersAttnProcessor
 [[autodoc]] models.attention_processor.LoRAXFormersAttnProcessor

-## CustomDiffusionXFormersAttnProcessor
-[[autodoc]] models.attention_processor.CustomDiffusionXFormersAttnProcessor
-
 ## SlicedAttnProcessor
 [[autodoc]] models.attention_processor.SlicedAttnProcessor

 ## SlicedAttnAddedKVProcessor
 [[autodoc]] models.attention_processor.SlicedAttnAddedKVProcessor
+
+## XFormersAttnProcessor
+[[autodoc]] models.attention_processor.XFormersAttnProcessor
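As background for this reordering (the list above is now alphabetized), all of these processors share the same entry point: they are attached to a model through `set_attn_processor`. A minimal sketch, not part of the commit; the checkpoint name is only an illustrative example:

```python
# Sketch: swapping the attention processor on a UNet. Assumes the
# diffusers library is installed; the checkpoint is an example only.
import torch
from diffusers import UNet2DConditionModel
from diffusers.models.attention_processor import AttnProcessor2_0

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet", torch_dtype=torch.float16
)

# Apply the PyTorch 2.0 scaled-dot-product-attention processor to every
# attention layer; set_attn_processor also accepts a per-layer dict.
unet.set_attn_processor(AttnProcessor2_0())
```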
docs/source/en/api/pipelines/stable_diffusion/svd.md (new file, 37 additions, 0 deletions; path per the toctree entry above)

@@ -0,0 +1,37 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+-->
+
+# Stable Video Diffusion
+
+Stable Video Diffusion was proposed in [Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets](https://hf.co/papers/2311.15127) by Andreas Blattmann, Tim Dockhorn, Sumith Kulal, Daniel Mendelevitch, Maciej Kilian, Dominik Lorenz, Yam Levi, Zion English, Vikram Voleti, Adam Letts, Varun Jampani, and Robin Rombach.
+
+The abstract from the paper is:
+
+*We present Stable Video Diffusion - a latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained for 2D image synthesis have been turned into generative video models by inserting temporal layers and finetuning them on small, high-quality video datasets. However, training methods in the literature vary widely, and the field has yet to agree on a unified strategy for curating video data. In this paper, we identify and evaluate three different stages for successful training of video LDMs: text-to-image pretraining, video pretraining, and high-quality video finetuning. Furthermore, we demonstrate the necessity of a well-curated pretraining dataset for generating high-quality videos and present a systematic curation process to train a strong base model, including captioning and filtering strategies. We then explore the impact of finetuning our base model on high-quality data and train a text-to-video model that is competitive with closed-source video generation. We also show that our base model provides a powerful motion representation for downstream tasks such as image-to-video generation and adaptability to camera motion-specific LoRA modules. Finally, we demonstrate that our model provides a strong multi-view 3D-prior and can serve as a base to finetune a multi-view diffusion model that jointly generates multiple views of objects in a feedforward fashion, outperforming image-based methods at a fraction of their compute budget. We release code and model weights at this https URL.*
+
+<Tip>
+
+To learn how to use Stable Video Diffusion, take a look at the [Stable Video Diffusion](../../../using-diffusers/svd) guide.
+
+<br>
+
+Check out the [Stability AI](https://huggingface.co/stabilityai) Hub organization for the [base](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid) and [extended frame](https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt) checkpoints!
+
+</Tip>
+
+## StableVideoDiffusionPipeline
+
+[[autodoc]] StableVideoDiffusionPipeline
+
+## StableVideoDiffusionPipelineOutput
+
+[[autodoc]] StableVideoDiffusionPipelineOutput
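For context, here is a minimal usage sketch of the pipeline this new page documents, using the extended-frame checkpoint linked in the Tip. It is not part of the commit; the conditioning-image path, the fp16/offload settings, and the fps value are assumptions:

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # trade some speed for lower peak VRAM

# SVD is image-to-video: generation is conditioned on a single still image.
image = load_image("conditioning.png")  # placeholder path, any RGB image
image = image.resize((1024, 576))       # the resolution SVD was trained at

frames = pipe(image, decode_chunk_size=8).frames[0]  # smaller chunks use less memory
export_to_video(frames, "generated.mp4", fps=7)
```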
