[IP-Adapter] Support multiple IP-Adapters #6573
Changes from 1 commit
```diff
@@ -720,7 +720,6 @@ def __call__(
         attention_mask: Optional[torch.FloatTensor] = None,
         temb: Optional[torch.FloatTensor] = None,
         scale: float = 1.0,
-        **kwargs,
     ) -> torch.Tensor:
         residual = hidden_states
```

```diff
@@ -1197,7 +1196,6 @@ def __call__(
         attention_mask: Optional[torch.FloatTensor] = None,
         temb: Optional[torch.FloatTensor] = None,
         scale: float = 1.0,
-        **kwargs,
     ) -> torch.FloatTensor:
         residual = hidden_states
         if attn.spatial_norm is not None:
```

```diff
@@ -2125,12 +2123,15 @@ def __call__(
         self,
         attn,
         hidden_states,
-        encoder_hidden_states=None,
+        encoder_hidden_states,
         attention_mask=None,
         temb=None,
-        **kwargs,
     ):
         residual = hidden_states

         encoder_hidden_states, ip_hidden_states = encoder_hidden_states
```

Review suggestion on the `encoder_hidden_states, ip_hidden_states = encoder_hidden_states` line:

```suggestion
        if isinstance(encoder_hidden_states, tuple):
            encoder_hidden_states, ip_hidden_states = encoder_hidden_states
        else:
            end_pos = encoder_hidden_states.shape[1] - self.num_tokens
            encoder_hidden_states, ip_hidden_states = (
                encoder_hidden_states[:, :end_pos, :],
                encoder_hidden_states[:, end_pos:, :],
            )
```
Let's try to make the class backwards compatible
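To make the backwards-compatibility point concrete, here is a minimal runnable sketch (the `DummyIPAdapterProcessor` class, its `split` helper, and the tensor shapes are hypothetical, not the PR's code) showing how the suggested fallback would accept both the old concatenated-tensor layout and the new `(text_embeds, ip_embeds)` tuple:

```python
# Hypothetical sketch (not the PR's actual class): the suggested fallback keeps
# both calling conventions working for a processor with `num_tokens` image tokens.
import torch


class DummyIPAdapterProcessor:
    def __init__(self, num_tokens: int):
        self.num_tokens = num_tokens

    def split(self, encoder_hidden_states):
        if isinstance(encoder_hidden_states, tuple):
            # new convention: (text_embeds, ip_embeds) passed explicitly
            encoder_hidden_states, ip_hidden_states = encoder_hidden_states
        else:
            # old convention: image tokens concatenated after the text tokens
            end_pos = encoder_hidden_states.shape[1] - self.num_tokens
            encoder_hidden_states, ip_hidden_states = (
                encoder_hidden_states[:, :end_pos, :],
                encoder_hidden_states[:, end_pos:, :],
            )
        return encoder_hidden_states, ip_hidden_states


proc = DummyIPAdapterProcessor(num_tokens=4)
text = torch.randn(1, 77, 768)
image = torch.randn(1, 4, 768)

# both forms yield the same split
assert proc.split((text, image))[1].shape == (1, 4, 768)
assert proc.split(torch.cat([text, image], dim=1))[1].shape == (1, 4, 768)
```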
Let's not add a kwargs here unnecessarily
IMO it's not unnecessary, and all processors should use this layout, as it is much more flexible and less error-prone when cross_attention_kwargs are actually used. At the moment it's not possible to use a processor with a custom signature without awkward cross_attention_kwargs dict restructuring in attention.py, a core file, since cross_attention_kwargs has to match all receiving function signatures.
I agree that we don't need **kwargs here, but I'm slightly in favor of adding it - I think it's more readable and makes mix-and-match attention processors easier.
For example, the scale argument is not used in the IP-adapter attention calculation at all. However, we have to add it to the signature; otherwise, it won't work, because the IP-adapter attention processor is used together with the default attention processors, which take a scale argument.
Likewise, if we want to add a new argument to the IP-adapter attention processor, we would have to add that argument to the default attention processor's signature too (e.g. if we are going to support an IP-adapter attention mask, we will have to add a new ip-attention-mask argument) - I think that is more confusing than simply appending a **kwargs to the signature.
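A minimal sketch of that mix-and-match concern (the processor classes here are hypothetical stand-ins, not diffusers code): because the same cross_attention_kwargs dict is splatted into every attention processor, each processor has to accept every key, even ones it never uses, unless it takes **kwargs.

```python
# The same cross_attention_kwargs dict is splatted into every attention processor,
# so every processor signature must accept every key, even keys it never uses.


class DefaultProcessor:
    def __call__(self, hidden_states, scale=1.0):
        return hidden_states * scale


class IPAdapterProcessor:
    # `scale` is unused here, but must still be accepted so the shared
    # cross_attention_kwargs don't raise a TypeError.
    def __call__(self, hidden_states, scale=1.0):
        return hidden_states


cross_attention_kwargs = {"scale": 0.5}
for proc in (DefaultProcessor(), IPAdapterProcessor()):
    proc(hidden_states=1.0, **cross_attention_kwargs)  # works for both

# Dropping `scale` from IPAdapterProcessor.__call__ would make the second call fail:
#   TypeError: __call__() got an unexpected keyword argument 'scale'
```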
I agree with Yiyi here.
If you feel strongly here @yiyixuxu, go for it! I still don't think we should add it though, because adding kwargs makes it much harder for people to spot errors: if a function has a **kwargs input, it swallows everything.
E.g. if you have:
`def forward(input_values, attention_scale=1.0, **kwargs):`
...and someone writes `forward(input_values, atention_scale=5.0)` (note the typo in attention_scale here),
then the code still executes, but with attention_scale left at 1.0, because atention_scale is silently swallowed by kwargs.
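For illustration, a runnable version of that example (the names are just the ones used above, purely hypothetical):

```python
# Runnable version of the typo-swallowing example (illustration only).


def forward(input_values, attention_scale=1.0, **kwargs):
    # the typo'd keyword ends up in kwargs instead of raising a TypeError
    return input_values * attention_scale, kwargs


result, swallowed = forward(10, atention_scale=5.0)  # note the typo
print(result)     # 10.0 -> attention_scale silently stayed at 1.0
print(swallowed)  # {'atention_scale': 5.0}
```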
Most programming languages don't allow something like **kwargs, for good reason in my opinion.
It also sets a bad precedent and incentivizes people to add kwargs everywhere.
And it significantly hurts readability: with a kwargs argument you can never be sure what can be passed to the function, which makes the code harder to understand.
But again, I do see the upsides you mention above. Another possibility we could start to explore is validation function decorators, which are used quite heavily in huggingface_hub: https://github.com/huggingface/huggingface_hub/blob/c528f78fa7404869f32a2b04eac0f9964a5c6f9e/src/huggingface_hub/utils/_validators.py#L46. We could use such a validator to remove an argument like scale.
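As a rough illustration of that idea, here is a hypothetical argument-filtering decorator in the spirit of the linked huggingface_hub validators (this is not the hub's implementation, and the function names are made up):

```python
# Hypothetical sketch: drop keyword arguments that the wrapped function does not
# declare, instead of forcing every processor to add them to its signature.
import functools
import inspect
import warnings


def filter_unexpected_kwargs(fn):
    accepted = set(inspect.signature(fn).parameters)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        dropped = sorted(k for k in kwargs if k not in accepted)
        if dropped:
            warnings.warn(f"Ignoring unexpected kwargs: {dropped}")
        kept = {k: v for k, v in kwargs.items() if k in accepted}
        return fn(*args, **kept)

    return wrapper


@filter_unexpected_kwargs
def ip_adapter_call(hidden_states, encoder_hidden_states, attention_mask=None):
    return hidden_states


# `scale` is stripped (with a warning) instead of raising a TypeError
ip_adapter_call(hidden_states=1.0, encoder_hidden_states=None, scale=0.5)
```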
Using validation decorators would be very nice to have. I am happy to take a look at it next week.
thanks for the explanation! makes sense! I will not add **kwargs for now :)
I posted a comment here to complement @patrickvonplaten's comment above. IMO using a decorator to avoid **kwargs is legit for some use cases but more debatable in others (just something to keep in mind).
I'd keep `encoder_hidden_states` optional here to avoid a backwards-breaking change.
ok
but I think `encoder_hidden_states` was not actually an optional argument in the previous code. Even though it followed the same code structure as the default attention processor, and so would technically run when `encoder_hidden_states` is None, that makes no sense: the IP-adapter attention processor cannot be used for self-attention. So I think the `=None` default was a mistake in the signature. If that's the case, it would not be a breaking change to remove `=None`, no?
updated to follow the default attention processor more closely