Add Newbie Image support #12803
Any updates?
FYI: A simplified implementation of Jina CLIP v2 has been merged in ComfyUI, see Comfy-Org/ComfyUI#11415. Compared to the original XLM-RoBERTa, the main difference is that it uses RoPE. If you don't like it that loading the official Jina CLIP v2 requires
gentle ping @sayakpaul @yiyixuxu
            self,
            hidden_size: int = 4096,
            cap_feat_dim: int = 2048,
            pooled_projection_dim: Optional[int] = None,
Instead of modifying transformer_lumina2.py directly, can you implement the modified transformer in a separate file, transformer_newbie.py? You can copy over model classes as needed with the # Copied from mechanism:

    class NewbieCombinedTimestepCaptionEmbedding(nn.Module):
        # Lumina2CombinedTimestepCaptionEmbedding which accepts an additional `pooled_projection_dim`
        ...

    # Since not changed, use copied from
    # Copied from diffusers.models.transformers.transformer_lumina2.Lumina2AttnProcessor2_0
    class Lumina2AttnProcessor2_0:
        ...

The # Copied from marker will ensure that the code stays synced between the two files.
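To make the idea of keeping copies in sync concrete, here is a minimal standard-library sketch of a consistency check driven by such a marker. It is an illustration only and not the check that diffusers itself runs: the helper names `extract_block` and `find_stale_copies`, the module name `pkg.lumina2`, and the sample sources are all hypothetical, and the sketch only handles top-level definitions that must match verbatim.

```python
import re

# Marker format assumed for this sketch: "# Copied from <dotted.module>.<Name>"
COPIED_FROM = re.compile(r"^# Copied from (?P<target>[\w.]+)\s*$")


def extract_block(source: str, name: str) -> str:
    """Return the text of the top-level class/def called `name` in `source`."""
    lines = source.splitlines()
    for i, line in enumerate(lines):
        if re.match(rf"^(class|def) {re.escape(name)}\b", line):
            end = i + 1
            # The body is every following line that is indented or blank.
            while end < len(lines) and (
                lines[end].startswith((" ", "\t")) or not lines[end].strip()
            ):
                end += 1
            return "\n".join(lines[i:end]).rstrip()
    raise KeyError(name)


def find_stale_copies(copy_source: str, originals: dict[str, str]) -> list[str]:
    """List marker targets whose copy no longer matches the original text.

    `originals` maps a (hypothetical) dotted module name to its source text.
    """
    stale = []
    lines = copy_source.splitlines()
    for i, line in enumerate(lines):
        match = COPIED_FROM.match(line)
        if match is None:
            continue
        module, _, name = match.group("target").rpartition(".")
        original = extract_block(originals[module], name)
        copy = extract_block("\n".join(lines[i + 1:]), name)
        if copy != original:
            stale.append(match.group("target"))
    return stale


# Hypothetical sources used only to exercise the sketch.
originals = {"pkg.lumina2": "class Attn:\n    def run(self):\n        return 1\n"}
synced = "# Copied from pkg.lumina2.Attn\nclass Attn:\n    def run(self):\n        return 1\n"
drifted = "# Copied from pkg.lumina2.Attn\nclass Attn:\n    def run(self):\n        return 2\n"

print(find_stale_copies(synced, originals))
print(find_stale_copies(drifted, originals))
```

The point of the design is that a copied class can live in the new file without the new file depending on the old one at import time, while a tooling check still catches drift between the two.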
    >>> device = "cuda"
    >>> model_path = "Disty0/NewBie-image-Exp0.1-Diffusers"
    >>> text_encoder_2 = AutoModel.from_pretrained(model_path, subfolder="text_encoder_2", trust_remote_code=True, torch_dtype=torch.bfloat16)
Can you implement the JinaClip text encoder model in the Newbie pipeline directory, for example at src/diffusers/pipelines/newbie/modeling_jina_clip.py? The model should inherit from diffusers.models.modeling_utils.ModelMixin and diffusers.configuration_utils.ConfigMixin. This will help ensure that text_encoder_2 works with e.g. CPU offloading and removes the dependency on trust_remote_code=True.
        return timesteps, num_inference_steps


    class NewbiePipeline(Lumina2Pipeline):
Can NewbiePipeline inherit directly from DiffusionPipeline (and Lumina2LoraLoaderMixin, if appropriate) rather than Lumina2Pipeline? You can copy over methods (with the # Copied from mechanism) and properties as needed:

    class NewbiePipeline(DiffusionPipeline, Lumina2LoraLoaderMixin):
        ...

        # Copied from diffusers.pipelines.lumina2.Lumina2Pipeline.prepare_latents
        def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None):
            ...

        ...

You don't need to copy over the VAE slicing/tiling methods, as these will be deprecated and users can always call the analogous methods directly on the VAE with e.g. pipe.vae.enable_slicing().
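As a rough picture of the suggested structure, the following sketch uses plain stand-in classes rather than the real diffusers ones. `SketchVAE`, `SketchBasePipeline`, and `SketchNewbiePipeline` are hypothetical names invented for this illustration. It only shows the shape of the design: the pipeline derives from a generic base, holds its components as attributes, and exposes no slicing wrapper of its own, so the caller reaches the VAE directly.

```python
class SketchVAE:
    """Hypothetical stand-in for a VAE component (not the diffusers class)."""

    def __init__(self):
        self.use_slicing = False

    def enable_slicing(self):
        # Flip the flag the decoder would consult before splitting a batch.
        self.use_slicing = True


class SketchBasePipeline:
    """Hypothetical stand-in for a generic pipeline base class."""

    def __init__(self, **components):
        # Register each component as an attribute, e.g. pipe.vae.
        for name, component in components.items():
            setattr(self, name, component)


class SketchNewbiePipeline(SketchBasePipeline):
    """Derives from the generic base, not from a model-specific pipeline.

    No slicing/tiling wrapper methods are defined here on purpose.
    """


pipe = SketchNewbiePipeline(vae=SketchVAE())
pipe.vae.enable_slicing()  # call the method on the component itself
print(pipe.vae.use_slicing)
```

Keeping such helpers on the component means a new pipeline has less surface to maintain, which seems to be the motivation behind the review comment.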
dg845 left a comment
@Disty0, thanks for the PR and thanks for your patience! The main items from the review are as follows:
- Implement the DiT in a separate file transformer_newbie.py.
- Implement JinaClip in a file in the newbie/ pipeline directory.
- Have NewbiePipeline inherit directly from DiffusionPipeline.
What does this PR do?
Adds NewbieAI support to Diffusers.
Adds a pooled_projection_dim config to Lumina2Transformer2DModel and uses pooled projections from the Newbie codebase if it is set to something other than None.
Original NewbieAI model: https://huggingface.co/NewBie-AI/NewBie-image-Exp0.1
NewbieAI in Diffusers format: https://huggingface.co/Disty0/NewBie-image-Exp0.1-Diffusers
Known Issues:
Example code:
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.