SageAttention hacked for NVIDIA Turing GPUs (SM75). See the original repo: https://github.com/thu-ml/SageAttention

MMA "fixed" thanks to https://github.com/1506086927

Sparge attention could, in theory, run given the same treatment.
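
For reference, calling the kernel from Python should look the same as upstream SageAttention. A minimal sketch, assuming this fork builds for SM75 and exposes the upstream `sageattn` API (the `tensor_layout`/`is_causal` kwargs come from the upstream README and are assumptions here):

```python
# Minimal usage sketch, assuming the upstream SageAttention Python API.
import torch
from sageattention import sageattn

# q, k, v laid out as (batch, heads, seq_len, head_dim) for tensor_layout="HND".
q = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 8, 1024, 64, dtype=torch.float16, device="cuda")

# Intended as a drop-in replacement for scaled dot-product attention.
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
```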


Status as of 2.1.1:

Compiles on CUDA 11.8

Fused kernel: working on SM75

qattn: fixed MMA from https://github.com/1506086927

9/1/25 - Triton with the fused kernel works in ComfyUI via the SageAttention command-line option.

12/13/25 - The MMA NaN issue is fixed, but speeds are still not beating xformers when compiled.
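
A rough way to check the speed claim above is a simple event-timed comparison. This sketch uses torch's built-in scaled_dot_product_attention as the baseline instead of xformers (to keep it dependency-free), and assumes the fork is built and importable:

```python
# Rough timing sketch; baseline swapped to torch SDPA rather than xformers.
import torch
import torch.nn.functional as F
from sageattention import sageattn

def bench(fn, iters=50):
    # Warm up, then time with CUDA events (returns ms per call).
    for _ in range(5):
        fn()
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

q, k, v = (torch.randn(1, 8, 4096, 64, dtype=torch.float16, device="cuda") for _ in range(3))
print("sageattn:", bench(lambda: sageattn(q, k, v, tensor_layout="HND")), "ms")
print("sdpa    :", bench(lambda: F.scaled_dot_product_attention(q, k, v)), "ms")
```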