@Disty0 (Contributor) commented Dec 7, 2025

What does this PR do?

Adds NewbieAI support to Diffusers.
Adds a pooled_projection_dim config to Lumina2Transformer2DModel and uses pooled projections from the Newbie codebase when it is set to something other than None.

Original NewbieAI model: https://huggingface.co/NewBie-AI/NewBie-image-Exp0.1
NewbieAI in Diffusers format: https://huggingface.co/Disty0/NewBie-image-Exp0.1-Diffusers

Known Issues:

  • JinaClip requires trust_remote_code=True and has to be loaded separately.
  • JinaClip doesn't work with CPU Offload.

Example code:

import torch
from diffusers import NewbiePipeline
from transformers import AutoModel

device = "cuda"
model_path = "Disty0/NewBie-image-Exp0.1-Diffusers"
text_encoder_2 = AutoModel.from_pretrained(model_path, subfolder="text_encoder_2", trust_remote_code=True, torch_dtype=torch.bfloat16)
pipe = NewbiePipeline.from_pretrained(model_path, text_encoder_2=text_encoder_2, torch_dtype=torch.bfloat16)
del text_encoder_2  # safe to drop the local reference; the pipeline holds its own

# Enable memory optimizations.
pipe.enable_model_cpu_offload(device=device)

prompt = """
  <character_1>
  <n>$character_1$</n>
  <gender>1girl</gender>
  <appearance>chibi, red_eyes, blue_hair, long_hair, hair_between_eyes, head_tilt, tareme, closed_mouth</appearance>
  <clothing>school_uniform, serafuku, white_sailor_collar, white_shirt, short_sleeves, red_neckerchief, bow, blue_skirt, miniskirt, pleated_skirt, blue_hat, mini_hat, thighhighs, grey_thighhighs, black_shoes, mary_janes</clothing>
  <expression>happy, smile</expression>
  <action>standing, holding, holding_briefcase</action>
  <position>center_left</position>
  </character_1>

  <character_2>
  <n>$character_2$</n>
  <gender>1girl</gender>
  <appearance>chibi, red_eyes, pink_hair, long_hair, very_long_hair, multi-tied_hair, open_mouth</appearance>
  <clothing>school_uniform, serafuku, white_sailor_collar, white_shirt, short_sleeves, red_neckerchief, bow, red_skirt, miniskirt, pleated_skirt, hair_bow, multiple_hair_bows, white_bow, ribbon_trim, ribbon-trimmed_bow, white_thighhighs, black_shoes, mary_janes, bow_legwear, bare_arms</clothing>
  <expression>happy, smile</expression>
  <action>standing, holding, holding_briefcase, waving</action>
  <position>center_right</position>
  </character_2>

  <general_tags>
  <count>2girls, multiple_girls</count>
  <style>anime_style, digital_art</style>
  <background>white_background, simple_background</background>
  <atmosphere>cheerful</atmosphere>
  <quality>high_resolution, detailed</quality>
  <objects>briefcase</objects>
  <other>alternate_costume</other>
  </general_tags>
"""

negative_prompt = "blurry, worst quality, low quality, deformed hands, bad anatomy, extra limbs, poorly drawn face, mutated, extra eyes, bad proportions"

# JinaClip doesn't work with CPU offload (see Known Issues), so move it to the device manually.
pipe.text_encoder_2 = pipe.text_encoder_2.to(device)
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    height=1024,
    width=1024,
    guidance_scale=2.5,
    num_inference_steps=30,
    generator=torch.manual_seed(42),
).images[0]
display(image)  # assumes an IPython/Jupyter environment; use image.save("newbie.png") in a plain script

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Core library:

@Disty0 changed the title from "Add NewbieAI support" to "Add Newbie Image support" on Dec 7, 2025
@david6666666 commented

Any updates?

@woct0rdho commented

FYI: A simplified implementation of Jina CLIP v2 has been merged in ComfyUI, see Comfy-Org/ComfyUI#11415. Compared to the original XLM-RoBERTa, the main difference is that it uses RoPE (rotary position embeddings).

If you'd rather not require trust_remote_code=True to load the official Jina CLIP v2, you could implement it directly in Transformers.
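
For reference, here is a generic sketch of applying RoPE to attention queries and keys (the textbook NeoX-style formulation, not ComfyUI's or Jina's exact code; all names are illustrative):

import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Split the last dimension in half and rotate the pairs: (x1, x2) -> (-x2, x1).
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q: torch.Tensor, k: torch.Tensor, positions: torch.Tensor, dim: int, base: float = 10000.0):
    # Rotate query/key feature pairs by position-dependent angles so that
    # dot products depend only on relative position.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = positions.float()[:, None] * inv_freq[None, :]  # (seq_len, dim / 2)
    cos = torch.cat((angles, angles), dim=-1).cos()          # (seq_len, dim)
    sin = torch.cat((angles, angles), dim=-1).sin()
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

# Example: 16 tokens, head dimension 64.
q, k = torch.randn(16, 64), torch.randn(16, 64)
q_rot, k_rot = apply_rope(q, k, torch.arange(16), dim=64)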

@vladmandic (Contributor) commented

gentle ping @sayakpaul @yiyixuxu

@yiyixuxu requested a review from dg845 on January 8, 2026.
@dg845 (Collaborator) commented on this diff hunk, Jan 9, 2026:

    self,
    hidden_size: int = 4096,
    cap_feat_dim: int = 2048,
    pooled_projection_dim: Optional[int] = None,

Instead of modifying transformer_lumina2.py directly, can you implement the modified transformer in a separate file transformer_newbie.py? You can copy over model classes as needed with the # Copied from mechanism:

class NewbieCombinedTimestepCaptionEmbedding(nn.Module):
    # Modified from Lumina2CombinedTimestepCaptionEmbedding to accept an additional `pooled_projection_dim`
    ...

# Since this class is unchanged, use the # Copied from comment
# Copied from diffusers.models.transformers.transformer_lumina2.Lumina2AttnProcessor2_0
class Lumina2AttnProcessor2_0:
    ...

The # Copied from comments will ensure that the code stays synced between the two files.
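
For illustration, a minimal sketch of what the modified embedding class could look like (hypothetical internals; only the optional pooled_projection_dim branch is the point here, and the real Lumina2 module also builds timestep embeddings):

from typing import Optional

import torch
from torch import nn


class NewbieCombinedTimestepCaptionEmbedding(nn.Module):
    # Sketch only: pooled_projection_dim=None keeps the original Lumina2
    # behavior; a non-None value adds a projection for pooled text embeddings.
    def __init__(self, hidden_size: int = 4096, cap_feat_dim: int = 2048, pooled_projection_dim: Optional[int] = None):
        super().__init__()
        self.caption_embedder = nn.Linear(cap_feat_dim, hidden_size)
        self.pooled_embedder = None
        if pooled_projection_dim is not None:
            self.pooled_embedder = nn.Linear(pooled_projection_dim, hidden_size)

    def forward(self, caption_feats: torch.Tensor, pooled_projections: Optional[torch.Tensor] = None) -> torch.Tensor:
        emb = self.caption_embedder(caption_feats)
        if self.pooled_embedder is not None and pooled_projections is not None:
            # Broadcast the pooled text embedding across the sequence dimension.
            emb = emb + self.pooled_embedder(pooled_projections).unsqueeze(1)
        return emb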

>>> device = "cuda"
>>> model_path = "Disty0/NewBie-image-Exp0.1-Diffusers"
>>> text_encoder_2 = AutoModel.from_pretrained(model_path, subfolder="text_encoder_2", trust_remote_code=True, torch_dtype=torch.bfloat16)
@dg845 (Collaborator) commented:

Can you implement the JinaClip text encoder model in the Newbie pipeline directory, for example at src/diffusers/pipelines/newbie/modeling_jina_clip.py? The model should inherit from diffusers.models.modeling_utils.ModelMixin and diffusers.configuration_utils.ConfigMixin. This will help ensure that text_encoder_2 works with e.g. CPU offloading and removes the dependency on trust_remote_code=True.
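
As a rough skeleton of that suggestion (class name, config fields, and layers are placeholders; only the ModelMixin/ConfigMixin structure is the point, and inside the diffusers repo these imports would be relative):

import torch
from torch import nn

from diffusers.configuration_utils import ConfigMixin, register_to_config
from diffusers.models.modeling_utils import ModelMixin


class JinaClipTextModel(ModelMixin, ConfigMixin):
    # Inheriting from ModelMixin/ConfigMixin makes the model loadable via
    # from_pretrained(..., subfolder="text_encoder_2") and compatible with
    # device-placement utilities such as CPU offloading.
    @register_to_config
    def __init__(self, hidden_size: int = 1024, num_layers: int = 24):
        super().__init__()
        # Placeholder stack; the real model would implement Jina CLIP v2's
        # XLM-RoBERTa-with-RoPE text tower.
        self.layers = nn.ModuleList(nn.Identity() for _ in range(num_layers))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return hidden_states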

return timesteps, num_inference_steps


class NewbiePipeline(Lumina2Pipeline):
@dg845 (Collaborator) commented:

Can NewbiePipeline inherit directly from DiffusionPipeline (and Lumina2LoraLoaderMixin, if appropriate) rather than Lumina2Pipeline? You can copy over methods (with the # Copied from mechanism) and properties as needed:

class NewbiePipeline(DiffusionPipeline, Lumina2LoraLoaderMixin):
    ...
    # Copied from diffusers.pipelines.lumina2.Lumina2Pipeline.prepare_latents
    def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None):
        ...
    ...

You don't need to copy over the VAE slicing/tiling methods, as these will be deprecated and users can always call the analogous methods directly on the VAE with e.g. pipe.vae.enable_slicing().
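
Concretely, the constructor of such a pipeline might look like the sketch below (the component list is hypothetical, inferred from the example code earlier in this thread; register_modules is the standard DiffusionPipeline registration hook):

from diffusers import DiffusionPipeline
from diffusers.loaders import Lumina2LoraLoaderMixin


class NewbiePipeline(DiffusionPipeline, Lumina2LoraLoaderMixin):
    def __init__(self, transformer, scheduler, vae, text_encoder, tokenizer, text_encoder_2, tokenizer_2):
        super().__init__()
        # register_modules exposes the components to from_pretrained,
        # save_pretrained, and the offloading hooks.
        self.register_modules(
            transformer=transformer,
            scheduler=scheduler,
            vae=vae,
            text_encoder=text_encoder,
            tokenizer=tokenizer,
            text_encoder_2=text_encoder_2,
            tokenizer_2=tokenizer_2,
        )

    # Copied from diffusers.pipelines.lumina2.Lumina2Pipeline.prepare_latents
    def prepare_latents(self, batch_size, num_channels_latents, height, width, dtype, device, generator, latents=None):
        ...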

@dg845 (Collaborator) left a review:

@Disty0, thanks for the PR and thanks for your patience! The main items from the review are as follows:

  1. Implement the DiT in a separate file transformer_newbie.py.
  2. Implement JinaClip in a file in the newbie/ pipeline directory.
  3. Have NewbiePipeline inherit directly from DiffusionPipeline.
