
Conversation

@ggerganov (Member) commented:

cont #16309

Simplify code now that we no longer need to pad the KQ mask for flash attention.


// q: [n_embd_k, n_batch, n_head, ne3 ]
// k: [n_embd_k, n_kv, n_head_kv, ne3 ]
// v: [n_embd_v, n_kv, n_head_kv, ne3 ] !! not transposed !!
A collaborator commented:

There's a use of GGML_KQ_MASK_PAD in the comment on the next line.

@github-actions bot added labels testing (Everything test related), examples, and ggml (changes relating to the ggml tensor library for machine learning) on Dec 10, 2025
@ggerganov ggerganov merged commit 4dff236 into master Dec 10, 2025
68 of 69 checks passed
@ggerganov ggerganov deleted the gg/ggml-remove-kq-mask branch December 10, 2025 18:53


5 participants