Closed
Labels
bug: Something isn't working
Description
Describe the bug
The docstring for `forward` of `UNet3DConditionModel` states that `sample` should have the shape `(batch, num_frames, channel, height, width)`. However, before any permutation, the model reads `num_frames = sample.shape[2]`. These two statements contradict each other. The model works when frames are at dim=2 and channels at dim=1, which is not what the documentation says.
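The mismatch can be traced without running the model. The sketch below is pure Python and only follows shape tuples. It assumes that, after `num_frames = sample.shape[2]`, `forward` applies `permute(0, 2, 1, 3, 4)`, flattens batch and frames, and passes the result to `conv_in`, which expects `in_channels` input channels. That permute/reshape step is an assumption to check against the installed diffusers source, and `forward_shape_trace` is a hypothetical helper.

```python
def forward_shape_trace(sample_shape, in_channels):
    """Hypothetical helper: trace how forward() would interpret a 5-D sample shape.

    Assumed forward() steps (verify against the installed diffusers version):
        num_frames = sample.shape[2]
        sample = sample.permute(0, 2, 1, 3, 4).reshape(batch * num_frames, -1, H, W)
        sample = conv_in(sample)  # conv_in expects `in_channels` channels
    """
    batch, dim1, dim2, height, width = sample_shape
    num_frames = dim2  # frames are read from dim 2, not dim 1
    # permute(0, 2, 1, 3, 4) gives (batch, dim2, dim1, H, W);
    # flattening batch and frames gives (batch * dim2, dim1, H, W).
    conv_in_input = (batch * num_frames, dim1, height, width)
    channels_ok = conv_in_input[1] == in_channels
    return num_frames, conv_in_input, channels_ok


# Docstring layout: (batch, num_frames, channel, height, width)
print(forward_shape_trace((1, 75, 3, 240, 320), in_channels=3))
# -> (3, (3, 75, 240, 320), False)

# Layout that works: (batch, channel, num_frames, height, width)
print(forward_shape_trace((1, 3, 75, 240, 320), in_channels=3))
# -> (75, (75, 3, 240, 320), True)
```

With the documented layout, the 3 channels are read as the frame count and the 75 frames reach `conv_in` as channels. That explains why only the channels-first layout works.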
Reproduction
```python
import torch
from diffusers import UNet3DConditionModel

model = UNet3DConditionModel(
    sample_size=(240, 320),
    in_channels=3,
    out_channels=3,
    layers_per_block=2,
    block_out_channels=(12,),
    norm_num_groups=2,
    down_block_types=("DownBlock3D",),
    up_block_types=("UpBlock3D",),
    cross_attention_dim=24,
    attention_head_dim=8,
)

# `sample` follows the docstring layout: (batch, num_frames, channel, height, width)
model.forward(
    sample=torch.randn(1, 75, 3, 240, 320),
    timestep=500,
    encoder_hidden_states=torch.ones(1, 75, 24) * 3.0,
)
```
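Until the docstring and code agree, a possible workaround is to swap the frame and channel axes before calling the model. With a real tensor, this would be `sample.permute(0, 2, 1, 3, 4)` (`torch.Tensor.permute`). The pure-Python sketch below applies the same index order to a shape tuple so the effect is easy to see. `permute_shape` and `DOCSTRING_TO_MODEL_ORDER` are hypothetical names used only for illustration.

```python
# Hypothetical constant: swaps num_frames (dim 1) and channel (dim 2)
DOCSTRING_TO_MODEL_ORDER = (0, 2, 1, 3, 4)


def permute_shape(shape, dims):
    """Pure-Python stand-in for torch.Tensor.permute applied to a shape tuple."""
    return tuple(shape[d] for d in dims)


docstring_layout = (1, 75, 3, 240, 320)  # (batch, num_frames, channel, H, W)
model_layout = permute_shape(docstring_layout, DOCSTRING_TO_MODEL_ORDER)
print(model_layout)  # (1, 3, 75, 240, 320): (batch, channel, num_frames, H, W)
```

Because the permutation only swaps two axes, applying it a second time restores the docstring layout.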
Logs
No response
System Info
- `diffusers` version: 0.25.1
- Platform: Linux-6.7.0-0-MANJARO-x86_64-with-glibc2.38
- Python version: 3.11.6
- PyTorch version (GPU?): 2.1.2+cu121 (True)
- Huggingface_hub version: 0.20.2
- Transformers version: 4.36.1
- Accelerate version: 0.25.0
- xFormers version: not installed
- Using GPU in script?: True
- Using distributed or parallel set-up in script?: False
Who can help?