This repository contains the sparse attention primitives used in Sparse Transformers.
We hope this code can further accelerate research into sparse attention.
An example Transformer implementation which is close to the version we use internally can be found at https://github.com/openai/blocksparse/blob/master/examples/transformer/enwik8.py.
# Overview of kernels
The repository contains fused implementations of the attention operation, which takes in `Q`, `K`, `V` matrices (all of dimensionality `batch, time, dim`) representing the queries, keys, and values for a sequence. For every query element, a weighted sum of the values is returned, where the weightings are determined by the scaled matrix product of `Q` and `K^T`.
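
For reference, here is a minimal dense (non-fused) NumPy sketch of that computation. It is illustrative only: the function name and shapes are assumptions, and the actual kernels in this repository operate on block-sparse layouts rather than materializing the full `time x time` attention matrix.

```python
import numpy as np

def dense_attention(Q, K, V):
    """Dense scaled dot-product attention sketch (not the block-sparse kernels).

    Q, K, V: arrays of shape (batch, time, dim).
    Returns an array of shape (batch, time, dim).
    """
    dim = Q.shape[-1]
    # Scaled matrix product of Q and K^T -> (batch, time, time)
    scores = np.matmul(Q, np.swapaxes(K, -1, -2)) / np.sqrt(dim)
    # Softmax over the key dimension gives the weighting for each query element.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values for every query element.
    return np.matmul(weights, V)

# Example usage with random data (shapes are arbitrary):
Q = np.random.randn(2, 8, 16)
K = np.random.randn(2, 8, 16)
V = np.random.randn(2, 8, 16)
out = dense_attention(Q, K, V)  # shape (2, 8, 16)
```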