Implement sliding attention in Gemma3 #11409

woct0rdho · 2025-12-19T02:06:39Z

As the NewBie model uses detailed XML prompts that can exceed 1024 tokens, it's time to implement the sliding attention that was previously missing in Gemma3.

I also found a mistake in the config: Gemma3 should use 5 sliding layers followed by 1 non-sliding layer, not vice versa. rope_theta and rope_scale are also swapped to match. (Before sliding attention was implemented, this did not affect the results with < 1024 tokens.)
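For readers unfamiliar with the layout, here is a rough sketch of the idea, not the code from this PR. The helper names `layer_is_sliding` and `sliding_causal_mask` are hypothetical, and the sketch assumes a repeating pattern of 6 layers with the last one in each group being non-sliding, plus a causal mask in which each token sees at most `window` positions including itself.

```python
def layer_is_sliding(layer_idx, pattern=6):
    """Return True for sliding layers, assuming 5 sliding + 1 non-sliding per group.

    `pattern` is an assumed group size; the last layer of each group is global.
    """
    return (layer_idx + 1) % pattern != 0


def sliding_causal_mask(seq_len, window):
    """Build a boolean mask where mask[i][j] is True if query i may attend to key j.

    A key is visible when it is not in the future (j <= i) and lies within
    the last `window` positions, counting the query position itself.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Layers 0-4 are sliding, layer 5 is non-sliding, then the pattern repeats.
flags = [layer_is_sliding(i) for i in range(6)]

# With a window of 2, token 3 only sees tokens 2 and 3.
small = sliding_causal_mask(4, 2)
```

When the sequence is no longer than the window, the sliding mask is identical to a plain causal mask, so the window only starts to matter for prompts longer than it.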

After this PR, the condition tensor from Gemma3 in ComfyUI is much closer to the one from Gemma3 in Transformers.
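The PR does not say how the two tensors were compared. One possible way to quantify "closer", sketched here under that assumption, is to flatten both outputs and report the maximum absolute difference and the cosine similarity. The function names are hypothetical and the inputs are plain Python lists of floats standing in for flattened tensors.

```python
import math


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

A smaller maximum difference and a cosine similarity nearer to 1 would both indicate closer agreement between the two implementations.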

Implement sliding attention in Gemma3

78fc8d8

woct0rdho requested review from Kosinkadink, comfyanonymous and guill as code owners December 19, 2025 02:06

woct0rdho mentioned this pull request Dec 19, 2025

Added support for NewBieModel #11284

Closed

Ruff

cb1eb08

comfyanonymous merged commit 0aa7fa4 into Comfy-Org:master Dec 20, 2025
10 checks passed

woct0rdho deleted the gemma-sliding-attn branch December 20, 2025 06:03

lrivera pushed a commit to Research-Warrant/ComfyUI that referenced this pull request Jan 8, 2026

Implement sliding attention in Gemma3 (Comfy-Org#11409)

ffec387
