diff --git a/README.md b/README.md index 171f5406..7d298284 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,10 @@ # Algorithmica v3 -Algorithmica is a free and open web book about Computer Science. +Algorithmica is an open-access web book dedicated to the art and science of computing. -If you are concerned with editing, please read the [contributing guide](https://ru.algorithmica.org/contributing/) (in Russian). +You can contribute via [Prose](https://prose.io/) by clicking on the pencil icon on the top right on any page or by editing its source directly on GitHub. We use a slightly different Markdown dialect, so if you are not sure that the change is correct (for example, editing an intricate LaTeX formula), you can install [Hugo](https://gohugo.io/) and build the site locally — or just create a pull request, and a preview link will be automatically generated for you. + +If you happen to speak Russian, please also read the [contributing guidelines](https://ru.algorithmica.org/contributing/). --- @@ -16,11 +18,11 @@ Key technical changes from the [previous version](https://github.com/algorithmic * Rich metadata support (language, sections, TOCs, authors...) * Automated global table of contents * Theming support +* Search support (Lunr) Short-term todo list: -* Search with lunr -* Themes (especially a better dark theme) -* Minor style adjustments for mobile and print versions +* Style adjustments for mobile and print versions * A pdf version of the whole website +* Meta-information support (for Google Scholar and social media) * [Sticky table of contents](https://css-tricks.com/table-of-contents-with-intersectionobserver/) diff --git a/assets/slides.sass b/assets/slides.sass index e69de29b..671ababe 100644 --- a/assets/slides.sass +++ b/assets/slides.sass @@ -0,0 +1,50 @@ +$font-text: 'Source Sans', serif !default +$font-code: 'Inconsolata', monospace !default +$font-headings: 'Garamond', serif !default + +$borders: 1px solid #eaecef !default + +/* fonts */ +@font-face + font-family: 'CMU' + src: url(fonts/cmu.woff2) + +@font-face + font-family: 'Merriweather' + src: url(fonts/merriweather.woff2) + +@font-face + font-family: 'Inconsolata' + src: url(fonts/inconsolata.woff2) + +@font-face + font-family: 'Garamond' + src: url(fonts/garamond.woff2) + +@font-face + font-family: "Open Sans" + src: url(fonts/opensans.woff2) + +@font-face + font-family: "Source Sans" + src: url(fonts/sourcesans.ttf) + +@font-face + font-family: "Crimson" + src: url(fonts/crimson.ttf) + +body + font-family: $font-text + font-size: 24px + +h1 + font-size: 2em + text-align: center + margin-top: 0 + margin-bottom: 20px + +h2 + font-size: 1.5em + +h3 + font-size: 1.25em diff --git a/config.yaml b/config.yaml index 7e4ca1b7..1f196de4 100644 --- a/config.yaml +++ b/config.yaml @@ -8,6 +8,15 @@ outputFormats: baseName: index mediaType: text/html isHTML: true + SearchIndex: + mediaType: "application/json" + baseName: "searchindex" + isPlainText: true + notAlternative: true +outputs: + home: + - HTML + - SearchIndex markup: goldmark: footnote: false # katex conflict @@ -33,8 +42,8 @@ languages: params: repo: "https://github.com/algorithmica-org/algorithmica" reveal_hugo: - theme: white + #theme: white slide_number: true transition: none - #custom_theme: "slides.sass" - #custom_theme_compile: true + custom_theme: "slides.sass" + custom_theme_compile: true diff --git a/content/english/hpc/_index.md b/content/english/hpc/_index.md index 5bb1fe60..9b6aa606 100644 --- a/content/english/hpc/_index.md 
+++ b/content/english/hpc/_index.md @@ -33,17 +33,17 @@ A "release" for an open-source book like this essentially means: - mostly freezing the table of contents (except for the case studies), - doing one final round of heavy copyediting (hopefully, with the help of a professional editor — I still haven’t figured out how commas work in English), - drawing illustrations (I stole a lot of those that are currently displayed), -- making a print-optimized pdf and figuring out the best way to distribute it. +- making a print-optimized PDF and figuring out the best way to distribute it. After that, I will mostly be fixing errors and only doing some minor edits reflecting the changes in technology or new algorithm advancements. The e-book/printed editions will most likely be sold on a "pay what you want" basis, and in any case, the web version will always be fully available online. **Pre-ordering / financially supporting the book.** Due to my unfortunate citizenship and place of birth, you can't — that is, until I find a way that at the same time complies with international sanctions, doesn't sponsor [the war](https://en.wikipedia.org/wiki/2022_Russian_invasion_of_Ukraine), and won't put me in prison for tax evasion. -So, don't bother. If you want to support this book, just share the articles you like on link aggregators and social media and help fix typos — that would be enough. +So, don't bother. If you want to support this book, just share it and help fix typos — that would be enough. **Translations.** The website has a separate functionality for creating and managing translations — and I've already been contacted by some nice people willing to translate the book into Italian and Chinese (and I will personally translate at least some of it into my native Russian). -However, as the book is still evolving, it is probably not the best idea to start translating it at least until Part I is finished. That said, you are very much encouraged to make translations of any articles and publish them in your blogs — just send me the link so that we can merge it back when a centralized translation process starts. +However, as the book is still evolving, it is probably not the best idea to start translating it at least until Part I is finished. That said, you are very much encouraged to make translations of any articles and publish them in your blogs — just send me the link so that we can merge it back when centralized translation starts. **"Translating" the Russian version.** The articles hosted at [ru.algorithmica.org/cs/](https://ru.algorithmica.org/cs/) are not about advanced performance engineering but mostly about classical computer science algorithms — without discussing how to speed them up beyond asymptotic complexity. Most of the information there is not unique and already exists in English on some other places on the internet: for example, the similar-spirited [cp-algorithms.com](https://cp-algorithms.com/). @@ -51,7 +51,7 @@ However, as the book is still evolving, it is probably not the best idea to star There are two highly impactful textbooks on which most computer science courses are built. Both are undoubtedly outstanding, but [one of them](https://en.wikipedia.org/wiki/The_Art_of_Computer_Programming) is 50 years old, and [the other](https://en.wikipedia.org/wiki/Introduction_to_Algorithms) is 30 years old, and [computers have changed a lot](/hpc/complexity/hardware) since then. Asymptotic complexity is not the sole deciding factor anymore. 
In modern practical algorithm design, you choose the approach that makes better use of different types of parallelism available in the hardware over the one that theoretically does fewer raw operations on galaxy-scale inputs. -And yet, the computer science curricula in most colleges completely ignore this shift. Although there are some great courses that aim to correct that — such as "[Performance Engineering of Software Systems](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2018/)" from MIT, "[Programming Parallel Computers](https://ppc.cs.aalto.fi/)" from Aalto University, and some non-academic ones like Denis Bakhvalov's "[Performance Ninja](https://github.com/dendibakh/perf-ninja)" — most computer science graduates still treat the hardware like something from the 90s. +And yet, the computer science curricula in most colleges completely ignore this shift. Although there are some great courses that aim to correct that — such as "[Performance Engineering of Software Systems](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-172-performance-engineering-of-software-systems-fall-2018/)" from MIT, "[Programming Parallel Computers](https://ppc.cs.aalto.fi/)" from Aalto University, and some non-academic ones like Denis Bakhvalov's "[Performance Ninja](https://github.com/dendibakh/perf-ninja)" — most computer science graduates still treat modern hardware like something from the 1990s. What I really want to achieve is that performance engineering becomes taught right after introduction to algorithms. Writing the first comprehensive textbook on the subject is a large part of it, and this is why I rush to finish it by the summer so that the colleges can pick it up in the next academic year. But creating a new course requires more than that: you need a balanced curriculum, course infrastructure, lecture slides, lab assignments… so for some time after finishing the main book, I will be working on course materials and tools for *teaching* performance engineering — and I'm looking forward to collaborating with other people who want to make it a reality as well. @@ -76,7 +76,7 @@ Competitive programming is, in my opinion, misguided. They are doing useless thi The first part covers the basics of computer architecture and optimization of single-threaded algorithms. -It walks through the main CPU optimization topics such as caching, SIMD and pipelining, and provides brief examples in C++, followed by large case studies where we usually achieve a significant speedup over some STL algorithm or data structure. +It walks through the main CPU optimization topics such as caching, SIMD, and pipelining, and provides brief examples in C++, followed by large case studies where we usually achieve a significant speedup over some STL algorithm or data structure. Planned table of contents: @@ -94,7 +94,7 @@ Planned table of contents: 1.4. Functions and Recursion 1.5. Indirect Branching 1.6. Machine Code Layout - 1.7. Interrupts and System Calls + 1.7. System Calls 1.8. Virtualization 3. Instruction-Level Parallelism 3.1. Pipeline Hazards @@ -163,11 +163,11 @@ Planned table of contents: 9.11. AoS and SoA 10. SIMD Parallelism 10.1. Intrinsics and Vector Types - 10.2. Loading and Writing Data - 10.3. Sums and Other Reductions + 10.2. Moving Data + 10.3. Reductions 10.4. Masking and Blending 10.5. In-Register Shuffles - 10.6. Auto-Vectorization + 10.6. Auto-Vectorization and SPMD 11. 
Algorithm Case Studies 11.1. Binary GCD (11.2. Prime Number Sieves) @@ -178,20 +178,22 @@ Planned table of contents: 11.7. Number-Theoretic Transform 11.8. Argmin with SIMD 11.9. Prefix Sum with SIMD - 11.10. Reading and Writing Integers -(11.11. Reading and Writing Floats) -(11.12. String Searching) - 11.13. Sorting - 11.14. Matrix Multiplication + 11.10. Reading Decimal Integers + 11.11. Writing Decimal Integers +(11.12. Reading and Writing Floats) +(11.13. String Searching) + 11.14. Sorting + 11.15. Matrix Multiplication 12. Data Structure Case Studies 12.1. Binary Search 12.2. Static B-Trees - 12.3. Segment Trees -(12.4. Search Trees) -(12.5. Range Minimum Query) - 12.6. Hash Tables -(12.7. Bitmaps) -(12.8. Probabilistic Filters) +(12.3. Search Trees) + 12.4. Segment Trees +(12.5. Tries) +(12.6. Range Minimum Query) + 12.7. Hash Tables +(12.8. Bitmaps) +(12.9. Probabilistic Filters) ``` Among the cool things that we will speed up: @@ -201,18 +203,47 @@ Among the cool things that we will speed up: - 5-10x faster segment trees (compared to Fenwick trees) - 5x faster hash tables (compared to `std::unordered_map`) - 2x faster popcount (compared to repeatedly calling `popcnt`) -- 2x faster parsing series of integers (compared to `scanf`) +- 35x faster parsing series of integers (compared to `scanf`) - ?x faster sorting (compared to `std::sort`) - 2x faster sum (compared to `std::accumulate`) - 2-3x faster prefix sum (compared to naive implementation) - 10x faster argmin (compared to naive implementation) - 10x faster array searching (compared to `std::find`) +- 15x faster search tree (compared to `std::set`) - 100x faster matrix multiplication (compared to "for-for-for") - optimal word-size integer factorization (~0.4ms per 60-bit integer) - optimal Karatsuba Algorithm - optimal FFT -This work is largely based on blog posts, research papers, conference talks and other work authored by a lot of people: +Volume: 450-600 pages +Release date: Q3 2022 + +### Part II: Parallel Algorithms + +Concurrency, models of parallelism, context switching, green threads, concurrent runtimes, cache coherence, synchronization primitives, OpenMP, reductions, scans, list ranking, graph algorithms, lock-free data structures, heterogeneous computing, CUDA, kernels, warps, blocks, matrix multiplication, sorting. + +Volume: 150-200 pages +Release date: 2023-2024? + +### Part III: Distributed Computing + + + +Networking, message passing, actor model, communication-constrained algorithms, distributed primitives, all-reduce, MapReduce, stream processing, query planning, storage, sharding, compression, distributed databases, consistency, reliability, scheduling, workflow engines, cloud computing. + +Release date: ??? (more likely to be completed than not) + +### Part IV: Software & Hardware + + + +LLVM IR, compiler optimizations & back-end, interpreters, JIT-compilation, Cython, JAX, Numba, Julia, OpenCL, DPC++, oneAPI, XLA, (basic) Verilog, FPGAs, ASICs, TPUs and other AI accelerators. + +Release date: ???
(less likely to be completed than not) + +### Acknowledgements + +The book is largely based on blog posts, research papers, conference talks, and other work authored by a lot of people: - [Agner Fog](https://agner.org/optimize/) - [Daniel Lemire](https://lemire.me/en/#publications) @@ -236,35 +267,23 @@ This work is largely based on blog posts, research papers, conference talks and - [Geoff Langdale](https://branchfree.org/) - [Matt Kulukundis](https://twitter.com/JuvHarlequinKFM) - [Georg Sauthoff](https://gms.tf/) +- [Danila Kutenin](https://danlark.org/author/kutdanila/) +- [Ivica Bogosavljević](https://johnysswlab.com/author/ibogi/) +- [Matt Pharr](https://pharr.org/matt/) +- [Jan Wassenberg](https://research.google/people/JanWassenberg/) - [Marshall Lochbaum](https://mlochbaum.github.io/publications.html) +- [Pavel Zemtsov](https://pzemtsov.github.io/) +- [Gustavo Duarte](https://manybutfinite.com/) +- [Nyaan](https://nyaannyaan.github.io/library/) - [Nayuki](https://www.nayuki.io/category/programming) +- [Konstantin](http://const.me/) +- [InstLatX64](https://twitter.com/InstLatX64) - [ridiculous_fish](https://ridiculousfish.com/blog/) +- [Z boson](https://stackoverflow.com/users/2542702/z-boson) - [Creel](https://www.youtube.com/c/WhatsACreel) -Volume: 450-600 pages -Release date: Q2 2022 - -### Part II: Parallel Algorithms - -Concurrency, models of parallelism, green threads and concurrent runtimes, cache coherence, synchronization primitives, OpenMP, reductions, scans, list ranking and graph algorithms, lock-free data structures, heterogeneous computing, CUDA, kernels, warps, blocks, matrix multiplication and sorting. - -Volume: 150-200 pages -Release date: 2023? - -### Part III: Distributed Computing - -Communication-constrained algorithms, message passing, actor model, partitioning, MapReduce, consistency and reliability at scale, storage, compression, scheduling and cloud computing, distributed deep learning. - -Release date: ??? (more likely to be completed than not) - -### Part IV: Compilers and Domain-Specific Architectures - -LLVM IR, compiler optimizations, JIT-compilation, Cython, JAX, Numba, Julia, OpenCL, DPC++ and oneAPI, XLA, Verilog, FPGAs, ASICs, TPUs and other AI accelerators. - -Release date: ??? (less likely to be completed than not) - ### Disclaimer: Technology Choices -The examples in this book use C++, GCC, x86-64, CUDA, and Spark, although the underlying principles we aim to convey are not specific to them. +The examples in this book use C++, GCC, x86-64, CUDA, and Spark, although the underlying principles conveyed are not specific to them. -To clear my conscience, I'm not happy with any of these choices: these technologies just happen to be the most widespread and stable at the moment and thus more helpful to the reader. I would have respectively picked C / Rust, LLVM, arm, OpenCL, and Dask; maybe there will be a 2nd edition in which some of the tech stack is changed. +To clear my conscience, I'm not happy with any of these choices: these technologies just happen to be the most widespread and stable at the moment and thus more helpful to the reader. I would have respectively picked C / Rust / [Carbon?](https://github.com/carbon-language/carbon-lang), LLVM, arm, OpenCL, and Dask; maybe there will be a 2nd edition in which some of the tech stack is changed. 
diff --git a/content/english/hpc/algorithms/argmin.md b/content/english/hpc/algorithms/argmin.md index ccd9f140..2089d083 100644 --- a/content/english/hpc/algorithms/argmin.md +++ b/content/english/hpc/algorithms/argmin.md @@ -3,7 +3,7 @@ title: Argmin with SIMD weight: 7 --- -Computing the *minimum* of an array [easily vectorizable](/hpc/simd/reduction), as it is not different from any other reduction: in AVX2, you just need to use a convenient `_mm256_min_epi32` intrinsic as the inner operation. It computes the minimum of two 8-element vectors in one cycle — even faster than in the scalar case, which requires at least a comparison and a conditional move. +Computing the *minimum* of an array is [easily vectorizable](/hpc/simd/reduction), as it is not different from any other reduction: in AVX2, you just need to use a convenient `_mm256_min_epi32` intrinsic as the inner operation. It computes the minimum of two 8-element vectors in one cycle — even faster than in the scalar case, which requires at least a comparison and a conditional move. Finding the *index* of that minimum element (*argmin*) is much harder, but it is still possible to vectorize very efficiently. In this section, we design an algorithm that computes the argmin (almost) at the speed of computing the minimum and ~15x faster than the naive scalar approach. @@ -164,7 +164,7 @@ int argmin(int *a, int n) { The compiler [optimized the machine code layout](/hpc/architecture/layout), and the CPU is now able to execute the loop at around 2 GFLOPS — a slight but sizeable improvement from 1.5 GFLOPS of the non-hinted loop. -Here is the idea: if we are only updating the minimum a dozen or so times during the entire computation, we can ditch all the vector-blending and index updating and just maintain the minimum and regularly check if it has changed. Inside this check, we can use however slow method of updating the argmin we want because it will only be called a few times. +Here is the idea: if we are only updating the minimum a dozen or so times during the entire computation, we can ditch all the vector-blending and index updating and just maintain the minimum and regularly check if it has changed. Inside this check, we can use whatever slow method of updating the argmin we want because it will only be called a few times. To implement it with SIMD, all we need to do on each iteration is a vector load, a comparison, and a test-if-zero: diff --git a/content/english/hpc/algorithms/factorization.md b/content/english/hpc/algorithms/factorization.md index 4ff8061d..b900eb8c 100644 --- a/content/english/hpc/algorithms/factorization.md +++ b/content/english/hpc/algorithms/factorization.md @@ -1,48 +1,74 @@ --- title: Integer Factorization weight: 3 -draft: true +published: true --- -Integer factorization is interesting because of RSA problem. +The problem of factoring integers into primes is central to computational [number theory](/hpc/number-theory/). It has been [studied](https://www.cs.purdue.edu/homes/ssw/chapter3.pdf) since at least the 3rd century BC, and [many methods](https://en.wikipedia.org/wiki/Category:Integer_factorization_algorithms) have been developed that are efficient for different inputs. -"How big are your numbers?" determines the method to use: +In this case study, we specifically consider the factorization of *word-sized* integers: those on the order of $10^9$ and $10^{18}$.
Atypically for this book, in this one you may actually learn an asymptotically better algorithm: we start with a few basic approaches and gradually build up to the $O(\sqrt[4]{n})$-time *Pollard's rho algorithm* and optimize it to the point where it can factorize 60-bit semiprimes in 0.3-0.4ms, which is ~3 times faster than the previous state-of-the-art. -- Less than 2^16 or so: Lookup table. -- Less than 2^70 or so: Richard Brent's modification of Pollard's rho algorithm. -- Less than 10^50: Lenstra elliptic curve factorization -- Less than 10^100: Quadratic Sieve -- More than 10^100: General Number Field Sieve + +### Benchmark -and do other computations such as computing the greatest common multiple (given that it is not even so that ) (since $\gcd(n, r) = 1$) - -For all methods, we will implement `find_factor` function which returns one divisor ot 1. You can apply it recurively to get the factorization, so whatever asymptotic you had won't affect it: +For all methods, we will implement a `find_factor` function that takes a positive integer $n$ and returns any of its non-trivial divisors (or `1` if the number is prime): ```c++ -typedef uint32_t u32; -typedef uint64_t u64; +// I don't feel like typing "unsigned long long" each time +typedef __uint16_t u16; +typedef __uint32_t u32; +typedef __uint64_t u64; typedef __uint128_t u128; +u64 find_factor(u64 n); +``` + +To find the full factorization, you can apply it to $n$, reduce it, and continue until a new factor can no longer be found: + +```c++ vector<u64> factorize(u64 n) { - vector res; - while (int d = find_factor(n); d > 1) // does it work? - res.push_back(d); - return res; + vector<u64> factorization; + u64 d = find_factor(n); + while (d != 1) { + factorization.push_back(d); + n /= d; + d = find_factor(n); + } + if (n > 1) + factorization.push_back(n); + return factorization; } ``` -## Trial division +After each removed factor, the problem becomes considerably smaller, so the worst-case running time of full factorization is equal to the worst-case running time of a `find_factor` call. + +For many factorization algorithms, including those presented in this section, the running time scales with the smaller prime factor. Therefore, to provide worst-case input, we use *semiprimes:* products of two prime numbers $p \le q$ that are on the same order of magnitude. We generate a $k$-bit semiprime as the product of two random $\lfloor k / 2 \rfloor$-bit primes. + +Since some of the algorithms are inherently randomized, we also tolerate a small (<1%) percentage of false-negative errors (when `find_factor` returns `1` despite the number $n$ being composite), although this rate can be reduced to almost zero without significant performance penalties. + +### Trial division + + + +The most basic approach is to try every integer smaller than $n$ as a divisor: + +```c++ +u64 find_factor(u64 n) { + for (u64 d = 2; d < n; d++) + if (n % d == 0) + return d; + return 1; +} +``` -The smallest divisor has to be a prime number. -We remove the factor from the number, and repeat the process. -If we cannot find any divisor in the range $[2; \sqrt{n}]$, then the number itself has to be prime. +We can notice that if $n$ is divisible by $d < \sqrt n$, then it is also divisible by $\frac{n}{d} > \sqrt n$, and there is no need to check for it separately.
This lets us stop trial division early and only check for potential divisors that do not exceed $\sqrt n$: ```c++ u64 find_factor(u64 n) { @@ -53,13 +79,43 @@ u64 find_factor(u64 n) { } ``` +In our benchmark, $n$ is a semiprime, and we always find the lesser divisor, so both $O(n)$ and $O(\sqrt n)$ implementations perform the same and are able to factorize ~2k 30-bit numbers per second — while taking a full 20 seconds to factorize a single 60-bit number. + +### Lookup Table + +Nowadays, you can type `factor 57` in your Linux terminal or Google search bar to get the factorization of any number. But before computers were invented, it was more practical to use *factorization tables:* special books containing factorizations of the first $N$ numbers. + +We can also use this approach to compute these lookup tables [during compile time](/hpc/compilation/precalc/). To save space, we can store only the smallest divisor of a number. Since the smallest divisor does not exceed $\sqrt n$, we need just one byte per 16-bit integer: + +```c++ +template <int N = (1 << 16)> +struct Precalc { + unsigned char divisor[N]; + + constexpr Precalc() : divisor{} { + for (int i = 0; i < N; i++) + divisor[i] = 1; + for (int i = 2; i * i < N; i++) + if (divisor[i] == 1) + for (int k = i * i; k < N; k += i) + divisor[k] = i; + } +}; + +constexpr Precalc<> P{}; + +u64 find_factor(u64 n) { + return P.divisor[n]; +} +``` + +With this approach, we can process 3M 16-bit integers per second, although it would probably [get slower](/hpc/cpu-cache/bandwidth/) for larger numbers. While it requires just a few milliseconds and 64KB of memory to calculate and store the divisors of the first $2^{16}$ numbers, it does not scale well for larger inputs. + ### Wheel factorization -This is an optimization of the trial division. -The idea is the following. -Once we know that the number is not divisible by 2, we don't need to check every other even number. -This leaves us with only $50\%$ of the numbers to check. -After checking 2, we can simply start with 3 and skip every other number. +To save paper space, pre-computer era factorization tables typically excluded numbers divisible by $2$ and $5$, making the factorization table ½ × ⅘ = 0.4 of its original size. In the decimal numeral system, you can quickly determine whether a number is divisible by $2$ or $5$ (by looking at its last digit) and keep dividing the number $n$ by $2$ or $5$ while it is possible, eventually arriving at some entry in the factorization table. + +We can apply a similar trick to trial division by first checking if the number is divisible by $2$ and then only considering odd divisors: ```c++ u64 find_factor(u64 n) { @@ -72,24 +128,29 @@ u64 find_factor(u64 n) { } ``` -This method can be extended. -If the number is not divisible by 3, we can also ignore all other multiples of 3 in the future computations. -So we only need to check the numbers $5, 7, 11, 13, 17, 19, 23, \dots$. -We can observe a pattern of these remaining numbers. -We need to check all numbers with $d \bmod 6 = 1$ and $d \bmod 6 = 5$. -So this leaves us with only $33.3\%$ percent of the numbers to check. -We can implement this by checking the primes 2 and 3 first, and then start checking with 5 and alternatively skip 1 or 3 numbers. +With 50% fewer divisions to perform, this algorithm works twice as fast. + +This method can be extended: if the number is not divisible by $3$, we can also ignore all multiples of $3$, and the same goes for all other divisors.
The problem is, as we increase the number of primes to exclude, it becomes less straightforward to iterate only over the numbers not divisible by them as they follow an irregular pattern — unless the number of primes is small. + +For example, if we consider $2$, $3$, and $5$, then, among the first $90$ numbers, we only need to check: + +```center +(1,) 7, 11, 13, 17, 19, 23, 29, +31, 37, 41, 43, 47, 49, 53, 59, +61, 67, 71, 73, 77, 79, 83, 89… +``` + +You can notice a pattern: the sequence repeats itself every $30$ numbers. This is not surprising since the remainder modulo $2 \times 3 \times 5 = 30$ is all we need to determine whether a number is divisible by $2$, $3$, or $5$. This means that we only need to check $8$ numbers with specific remainders out of every $30$, proportionally improving the performance: ```c++ u64 find_factor(u64 n) { for (u64 d : {2, 3, 5}) if (n % d == 0) return d; - u64 increments[] = {0, 4, 6, 10, 12, 16, 22, 24}; - u64 sum = 30; - for (u64 d = 7; d * d <= n; d += sum) { - for (u64 k = 0; k < 8; k++) { - u64 x = d + increments[k]; + u64 offsets[] = {0, 4, 6, 10, 12, 16, 22, 24}; + for (u64 d = 7; d * d <= n; d += 30) { + for (u64 offset : offsets) { + u64 x = d + offset; if (n % x == 0) return x; } @@ -98,98 +159,290 @@ u64 find_factor(u64 n) { } ``` -We can extend this even further. -Here is an implementation for the prime number 2, 3 and 5. -It's convenient to use an array to store how much we have to skip. +As expected, it works $\frac{30}{8} = 3.75$ times faster than the naive trial division, processing about 7.6k 30-bit numbers per second. The performance can be improved further by considering more primes, but the returns are diminishing: adding a new prime $p$ reduces the number of iterations by $\frac{1}{p}$ but increases the size of the skip-list by a factor of $p$, requiring proportionally more memory. -### Lookup table +### Precomputed Primes -We will choose to store smallest factors of first $2^16$ — because this way they all fit in just one byte, so we are sort of saving on memory here. +If we keep increasing the number of primes in wheel factorization, we eventually exclude all composite numbers and only check for prime factors. In this case, we don't need this array of offsets but just the array of primes: ```c++ -template -struct Precalc { - char divisor[N]; +const int N = (1 << 16); - constexpr Precalc() : divisor{} { - for (int i = 0; i < N; i++) - divisor[i] = 1; - for (int i = 2; i * i < N; i++) - if (divisor[i] == 1) - for (int k = i * i; k < N; k += i) - divisor[k] = i; +struct Precalc { + u16 primes[6542]; // # of primes under N=2^16 + + constexpr Precalc() : primes{} { + bool marked[N] = {}; + int n_primes = 0; + + for (int i = 2; i < N; i++) { + if (!marked[i]) { + primes[n_primes++] = i; + for (int j = 2 * i; j < N; j += i) + marked[j] = true; + } + } } }; -constexpr Precalc precalc{}; +constexpr Precalc P{}; u64 find_factor(u64 n) { - return precalc.divisor[n]; + for (u16 p : P.primes) + if (n % p == 0) + return p; + return 1; } ``` +This approach lets us process almost 20k 30-bit integers per second, but it does not work for larger (64-bit) numbers unless they have small ($< 2^{16}$) factors. + +Note that this is actually an asymptotic optimization: there are $O(\frac{n}{\ln n})$ primes among the first $n$ numbers, so this algorithm performs $O(\frac{\sqrt n}{\ln \sqrt n})$ operations, while wheel factorization only eliminates a large but constant fraction of divisors. 
If we extend it to 64-bit numbers and precompute every prime under $2^{32}$ (storing which would require several hundred megabytes of memory), the relative speedup would grow by a factor of $\frac{\ln \sqrt{n^2}}{\ln \sqrt n} = 2 \cdot \frac{1/2}{1/2} \cdot \frac{\ln n}{\ln n} = 2$. + +All variants of trial division, including this one, are bottlenecked by the speed of integer division, which can be [optimized](/hpc/arithmetic/division/) if we know the divisors in advance and allow for some additional precomputation. In our case, it is suitable to use [the Lemire division check](/hpc/arithmetic/division/#lemire-reduction): + +```c++ +// ...precomputation is the same as before, +// but we store the reciprocal instead of the prime number itself +u64 magic[6542]; +// for each prime i: +magic[n_primes++] = u64(-1) / i + 1; + +u64 find_factor(u64 n) { + for (u64 m : P.magic) + if (m * n < m) + return u64(-1) / m + 1; + return 1; +} +``` + +This makes the algorithm ~18x faster: we can now factorize **~350k** 30-bit numbers per second, which is actually the most efficient algorithm we have for this number range. While it can probably be optimized even further by performing these checks in parallel with [SIMD](/hpc/simd), we will stop there and try a different, asymptotically better approach. + ### Pollard's Rho Algorithm -The algorithm is probabilistic. This means that it may or may not work. You would also need to + + +Pollard's rho is a randomized $O(\sqrt[4]{n})$ integer factorization algorithm that makes use of the [birthday paradox](https://en.wikipedia.org/wiki/Birthday_problem): + +> One only needs to draw $d = \Theta(\sqrt{n})$ random numbers between $1$ and $n$ to get a collision with high probability. + +The reasoning behind it is that each of the $d$ added element has a $\frac{d}{n}$ chance of colliding with some other element, implying that the expected number of collisions is $\frac{d^2}{n}$. If $d$ is asymptotically smaller than $\sqrt n$, then this ratio grows to zero as $n \to \infty$, and to infinity otherwise. + +Consider some function $f(x)$ that takes a remainder $x \in [0, n)$ and maps it to some other remainder of $n$ in a way that seems random from the number theory point of view. Specifically, we will use $f(x) = x^2 + 1 \bmod n$, which is random enough for our purposes. + +Now, consider a graph where each number-vertex $x$ has an edge pointing to $f(x)$. Such graphs are called *functional*. In functional graphs, the "trajectory" of any element — the path we walk if we start from that element and keep following the edges — is a path that eventually loops around (because the set of vertices is limited, and at some point, we have to go to a vertex we have already visited). + +![The trajectory of an element resembles the greek letter ρ (rho), which is what the algorithm is named after](../img/rho.jpg) + +Consider a trajectory of some particular element $x_0$: + +$$ +x_0, \; f(x_0), \; f(f(x_0)), \; \ldots +$$ + +Let's make another sequence out of this one by reducing each element modulo $p$, the smallest prime divisor of $n$. -> В мультимножество нужно добавить $O(\sqrt{n})$ случайных чисел от 1 до $n$, чтобы какие-то два совпали. +**Lemma.** The expected length of the reduced sequence before it turns into a cycle is $O(\sqrt[4]{n})$. -## $\rho$-алгоритм Полларда +**Proof:** Since $p$ is the smallest divisor, $p \leq \sqrt n$. Each time we follow a new edge, we essentially generate a random number between $0$ and $p$ (we treat $f$ as a "deterministically-random" function). 
The birthday paradox states that we only need to generate $O(\sqrt p) = O(\sqrt[4]{n})$ numbers until we get a collision and thus enter a loop. -Если мы найдём цикл в такой последовательности — то есть такие $i$ и $j$, что $f^i(x_0) \equiv f^j(x_0) \pmod p$ — то мы сможем найти и какой-то делитель $n$, а именно $\gcd(|f^i(x_0) - f^j(x_0)|, n)$ — это число меньше $n$ и делится на $p$. +Since we don't know $p$, this mod-$p$ sequence is only imaginary, but if we find a cycle in it — that is, $i$ and $j$ such that $$ f^i(x_0) \equiv f^j(x_0) \pmod p $$ then we can also find $p$ itself as $$ p = \gcd(|f^i(x_0) - f^j(x_0)|, n) $$ +The algorithm itself just finds this cycle and $p$ using this GCD trick and Floyd's "[tortoise and hare](https://en.wikipedia.org/wiki/Cycle_detection#Floyd's_tortoise_and_hare)" algorithm: we maintain two pointers $i$ and $j = 2i$ and check that $$ \gcd(|f^i(x_0) - f^j(x_0)|, n) \neq 1 $$ which is equivalent to comparing $f^i(x_0)$ and $f^j(x_0)$ modulo $p$. Since $j$ (hare) is increasing at twice the rate of $i$ (tortoise), their difference is increasing by $1$ each iteration and eventually will become equal to (or a multiple of) the cycle length, with $i$ and $j$ pointing to the same elements. And as we proved half a page ago, reaching a cycle would only require $O(\sqrt[4]{n})$ iterations: ```c++ +u64 f(u64 x, u64 mod) { + return ((u128) x * x + 1) % mod; +} + +u64 diff(u64 a, u64 b) { + // a and b are unsigned and so is their difference, so we can't just call abs(a - b) + return a > b ? a - b : b - a; +} + +const u64 SEED = 42; + +u64 find_factor(u64 n) { + u64 x = SEED, y = SEED, g = 1; + while (g == 1) { + x = f(f(x, n), n); // advance x twice + y = f(y, n); // advance y once + g = gcd(diff(x, y), n); + } + return g; +} +``` + +While it processes only ~25k 30-bit integers per second — which is almost 15 times slower than checking each prime using a fast division trick — it dramatically outperforms every $\tilde{O}(\sqrt n)$ algorithm for 60-bit numbers, factorizing around 90 of them per second.
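For a quick sanity check, the Floyd-based `find_factor` above can be exercised on a known semiprime. The driver below is a hypothetical sketch, not part of the diff: it assumes the `u64` typedef, `f`, `diff`, `SEED`, and `find_factor` exactly as in the listing above, and it supplies the 64-bit `gcd` helper that the listing calls unqualified (which must be visible before it); the two primes are just an example.

```c++
#include <cstdio>

// hypothetical glue: the listing above calls an unqualified gcd(),
// so a 64-bit helper like this must be declared before it
u64 gcd(u64 a, u64 b) { return b == 0 ? a : gcd(b, a % b); }

int main() {
    u64 p = 1000000007, q = 998244353; // two well-known 30-bit primes
    u64 n = p * q;                     // a ~60-bit semiprime
    u64 d = find_factor(n);            // expected to return p or q
    printf("%llu = %llu * %llu\n", (unsigned long long) n,
           (unsigned long long) d, (unsigned long long) (n / d));
    return 0;
}
```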
+ +### Pollard-Brent Algorithm -Алгоритм по сути находит цикл в этой последовательности, используя для этого стандартный алгоритм («черепаха и заяц»): будем поддерживать два удаляющихся друг от друга указателя $i$ и $j$ ($i = 2j$) и проверять, что $f^i(x_0) \equiv f^j(x_0) \pmod p$, что эквивалентно проверке $\gcd(|f^i(x_0) - f^j(x_0)|, n) \not \in \{ 1, n \}$. +Floyd's cycle-finding algorithm has a problem in that it moves iterators more than necessary: at least half of the vertices are visited one additional time by the slower iterator. + +One way to solve it is to memorize the values $x_i$ that the faster iterator visits and, every two iterations, compute the GCD using the difference of $x_i$ and $x_{\lfloor i / 2 \rfloor}$. But it can also be done without extra memory using a different principle: the tortoise doesn't move on every iteration, but it gets reset to the value of the faster iterator when the iteration number becomes a power of two. This lets us save additional iterations while still using the same GCD trick to compare $x_i$ and $x_{2^{\lfloor \log_2 i \rfloor}}$ on each iteration: ```c++ -typedef long long ll; - -inline ll f(ll x) { return (x+1)*(x+1); } - -ll find_divisor(ll n, ll seed = 1) { - ll x = seed, y = seed; - ll divisor = 1; - while (divisor == 1 || divisor == n) { - // двигаем первый указатель на шаг - y = f(y) % n; - // а второй -- на два - x = f(f(x) % n) % n; - // пытаемся найти общий делитель - divisor = __gcd(abs(x-y), n); +u64 find_factor(u64 n) { + u64 x = SEED; + + for (int l = 256; l < (1 << 20); l *= 2) { + u64 y = x; + for (int i = 0; i < l; i++) { + x = f(x, n); + if (u64 g = gcd(diff(x, y), n); g != 1) + return g; + } } - return divisor; + + return 1; } ``` -Так как алгоритм рандомизированный, при полной реализации нужно учитывать разные детали. Например, что иногда делитель не находится (нужно запускать несколько раз), или что при попытке факторизовать простое число он будет работать за $O(\sqrt n)$ (нужно добавить отсечение по времени). +Note that we also set an upper limit on the number of iterations so that the algorithm finishes in a reasonable amount of time and returns `1` if $n$ turns out to be a prime. + +It actually does *not* improve performance and even makes the algorithm ~1.5x *slower*, which probably has something to do with the fact that $x$ is stale. It spends most of the time computing the GCD and not advancing the iterator — in fact, the time requirement of this algorithm is currently $O(\sqrt[4]{n} \log n)$ because of it. + +Instead of [optimizing the GCD itself](../gcd), we will optimize the number of its invocations. We can use the fact that if one of $a$ and $b$ contains factor $p$, then $a \cdot b \bmod n$ will also contain it, so instead of computing $\gcd(a, n)$ and $\gcd(b, n)$, we can compute $\gcd(a \cdot b \bmod n, n)$. This way, we can group the GCD calculations into groups of $M = O(\log n)$ and remove the $\log n$ factor from the asymptotic: ```c++ +const int M = 1024; + +u64 find_factor(u64 n) { + u64 x = SEED; + + for (int l = M; l < (1 << 20); l *= 2) { + u64 y = x, p = 1; + for (int i = 0; i < l; i += M) { + for (int j = 0; j < M; j++) { + x = f(x, n); + p = (u128) p * diff(x, y) % n; + } + if (u64 g = gcd(p, n); g != 1) + return g; + } + } + + return 1; +} +``` + +Now it performs 425 factorizations per second, bottlenecked by the speed of modulo. + +### Optimizing the Modulo + +The final step is to apply [Montgomery multiplication](/hpc/number-theory/montgomery/). Since the modulo is constant, we can perform all computations — advancing the iterator, multiplication, and even computing the GCD — in the Montgomery space where reduction is cheap: ```c++ +struct Montgomery { + u64 n, nr; + + Montgomery(u64 n) : n(n) { + nr = 1; + for (int i = 0; i < 6; i++) + nr *= 2 - n * nr; + } + + u64 reduce(u128 x) const { + u64 q = u64(x) * nr; + u64 m = ((u128) q * n) >> 64; + return (x >> 64) + n - m; + } + + u64 multiply(u64 x, u64 y) const { + return reduce((u128) x * y); + } +}; + +u64 f(u64 x, u64 a, const Montgomery &m) { + return m.multiply(x, x) + a; +} + +const int M = 1024; + +u64 find_factor(u64 n, u64 x0 = 2, u64 a = 1) { + Montgomery m(n); + u64 x = x0; + + for (int l = M; l < (1 << 20); l *= 2) { + u64 y = x, p = 1; + for (int i = 0; i < l; i += M) { + for (int j = 0; j < M; j++) { + x = f(x, a, m); + p = m.multiply(p, diff(x, y)); + } + if (u64 g = gcd(p, n); g != 1) + return g; + } + } + + return 1; +} +``` + +This implementation can process around 3k 60-bit integers per second, which is ~3x faster than what [PARI](https://pari.math.u-bordeaux.fr/) / [SageMath's `factor`](https://doc.sagemath.org/html/en/reference/structure/sage/structure/factorization.html) / `cat semiprimes.txt | time factor` achieve. + +### Further Improvements + +**Optimizations.** There is still a lot of potential for optimization in our implementation of Pollard's algorithm: + +- We could probably use a better cycle-finding algorithm, exploiting the fact that the graph is random. For example, there is little chance that we enter the loop within the first few iterations (the length of the cycle and the path we walk before entering it should be equal in expectation since before we loop around, we choose the vertex of the path we've walked independently), so we may just advance the iterator for some time before starting the trials with the GCD trick. +- Our current approach is bottlenecked by advancing the iterator (the latency of Montgomery multiplication is much higher than its reciprocal throughput), and while we are waiting for it to complete, we could perform more than just one trial using the previous values. +- If we run $p$ independent instances of the algorithm with different seeds in parallel and stop when one of them finds the answer, it would finish $\sqrt p$ times faster (the reasoning is similar to the Birthday paradox; try to prove it yourself). We don't have to use multiple cores for that: there is a lot of untapped [instruction-level parallelism](/hpc/pipelining/), so we could concurrently run two or three of the same operations on the same thread, or use [SIMD](/hpc/simd) instructions to perform 4 or 8 multiplications in parallel. + +I would not be surprised to see another 3x improvement and throughput of ~10k/sec. If you [implement](https://github.com/sslotin/amh-code/tree/main/factor) some of these ideas, please [let me know](http://sereja.me/). -### Brent's Method +**Errors.** Another aspect that we need to handle in a practical implementation is possible errors. Our current implementation has a 0.7% error rate for 60-bit integers, and it grows higher if the numbers are smaller. These errors come from three main sources: -Another idea is to accumulate the product and instead of calculating GCD on each step to calculate it every log n steps. +- A cycle simply not being found (the algorithm is inherently random, and there is no guarantee that it will be found).
In this case, we need to perform a primality test and optionally start again. +- The `p` variable becoming zero (because both $p$ and $q$ can get into the product). It becomes increasingly more likely as we decrease size of the inputs or increase the constant `M`. In this case, we need to either restart the process or (better) roll back the last $M$ iterations and perform the trials one by one. +- Overflows in the Montgomery multiplication. Our current implementation is pretty loose with them, and if $n$ is large, we need to add more `x > mod ? x - mod : x` kind of statements to deal with overflows. -This is exactly the type of problem when we need specific knowledge, because we have 64-bit modulo by not-compile-constants, and compiler can't really do much to optimize it. +**Larger numbers.** These issues become less important if we exclude small numbers and numbers with small prime factors using the algorithms we've implemented before. In general, the optimal approach should depend on the size of the numbers: -... +- Smaller than $2^{16}$: use a lookup table; +- Smaller than $2^{32}$: use a list of precomputed primes with a fast divisibility check; +- Smaller than $2^{64}$ or so: use Pollard's rho algorithm with Montgomery multiplication; +- Smaller than $10^{50}$: switch to [Lenstra elliptic curve factorization](https://en.wikipedia.org/wiki/Lenstra_elliptic-curve_factorization); +- Smaller than $10^{100}$: switch to [Quadratic Sieve](https://en.wikipedia.org/wiki/Quadratic_sieve); +- Larger than $10^{100}$: switch to [General Number Field Sieve](https://en.wikipedia.org/wiki/General_number_field_sieve). -## Further optimizations + -Существуют также [субэкспоненциальные](https://ru.wikipedia.org/wiki/%D0%A4%D0%B0%D0%BA%D1%82%D0%BE%D1%80%D0%B8%D0%B7%D0%B0%D1%86%D0%B8%D1%8F_%D1%86%D0%B5%D0%BB%D1%8B%D1%85_%D1%87%D0%B8%D1%81%D0%B5%D0%BB#%D0%A1%D1%83%D0%B1%D1%8D%D0%BA%D1%81%D0%BF%D0%BE%D0%BD%D0%B5%D0%BD%D1%86%D0%B8%D0%B0%D0%BB%D1%8C%D0%BD%D1%8B%D0%B5_%D0%B0%D0%BB%D0%B3%D0%BE%D1%80%D0%B8%D1%82%D0%BC%D1%8B), но не полиномиальные алгоритмы факторизации. Человечество [умеет](https://en.wikipedia.org/wiki/Integer_factorization_records) факторизовывать числа порядка $2^{200}$. +The last three approaches are very different from what we've been doing and require much more advanced number theory, and they deserve an article (or a full-length university course) of their own. diff --git a/content/english/hpc/algorithms/gcd.md b/content/english/hpc/algorithms/gcd.md index 59e55f10..6a4f8ca7 100644 --- a/content/english/hpc/algorithms/gcd.md +++ b/content/english/hpc/algorithms/gcd.md @@ -14,7 +14,7 @@ $$ \gcd(a, b) = \max_{g: \; g|a \, \land \, g | b} g $$ -You probably already know this algorithm from a CS textbook, but let me briefly remind it anyway. It is based on the following formula, assuming that $a > b$: +You probably already know this algorithm from a CS textbook, but I will summarize it here. It is based on the following formula, assuming that $a > b$: $$ \gcd(a, b) = \begin{cases} @@ -135,7 +135,7 @@ int gcd(int a, int b) { Let's run it, and… it sucks. The difference in speed compared to `std::gcd` is indeed 2x, but on the other side of the equation. This is mainly because of all the branching needed to differentiate between the cases. Let's start optimizing. -First, let's replace all divisions by 2 with divisions by whichever highest power of 2 we can. We can do it efficiently with `__builtin_ctz`, the "count trailing zeros" instruction available on modern CPUs. 
Whenever we are supposed to divide by 2 in the original algorithm, we will call this function instead, which will give us the exact amount to right-shift the number by. Assuming that the we are dealing with large random numbers, this is expected to decrease the number of iterations by almost a factor 2, because $1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \ldots \to 2$. +First, let's replace all divisions by 2 with divisions by whichever highest power of 2 we can. We can do it efficiently with `__builtin_ctz`, the "count trailing zeros" instruction available on modern CPUs. Whenever we are supposed to divide by 2 in the original algorithm, we will call this function instead, which will give us the exact number of bits to right-shift the number by. Assuming that we are dealing with large random numbers, this is expected to decrease the number of iterations by almost a factor 2, because $1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \ldots \to 2$. Second, we can notice that condition 2 can now only be true once — in the very beginning — because every other identity leaves at least one of the numbers odd. Therefore we can handle this case just once in the beginning and not consider it in the main loop. @@ -186,7 +186,7 @@ loop: Let's draw the dependency graph of this loop: -@@ + +![](../img/gcd-dependency1.png) Modern processors can execute many instructions in parallel, essentially meaning that the true "cost" of this computation is roughly the sum of latencies on its critical path. In this case, it is the total latency of `diff`, `abs`, `ctz`, and `shift`. -We can decrease this latency using the fact that we can actually calculate `ctz` using just `diff = a - b`, because a negative number divisible by $2^k$ still has $k$ zeros at the end. This lets us not wait for `max(diff, -diff)` to be computed first, resulting in a shorter graph like this: +We can decrease this latency using the fact that we can actually calculate `ctz` using just `diff = a - b`, because a [negative number](/hpc/arithmetic/integer/#signed-integers) divisible by $2^k$ still has $k$ zeros at the end of its binary representation. This lets us not wait for `max(diff, -diff)` to be computed first, resulting in a shorter graph like this: -@@ + +![](../img/gcd-dependency2.png) Hopefully you will be less confused when you think about how the final code will be executed: @@ -248,9 +252,9 @@ int gcd(int a, int b) { } ``` -It runs in 91ns — which is good enough to leave it there. +It runs in 91ns, which is good enough to leave it there. -If somebody wants to try to shove off a few more nanoseconds by re-writing assembly by hand or trying a lookup table to save a few last iterations, please [let me know](http://sereja.me/). +If somebody wants to try to shave off a few more nanoseconds by rewriting the assembly by hand or trying a lookup table to save a few last iterations, please [let me know](http://sereja.me/).
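For reference, the general shape of the `ctz`-based binary GCD that this diff discusses looks roughly like the sketch below. This is not the article's final 91ns version — only a minimal, self-contained illustration of the identities and the `__builtin_ctz` trick, with an illustrative function name:

```c++
#include <algorithm> // std::swap

// A simplified sketch (not the tuned version from the article): strip the
// common power of two once, keep `a` odd, and use __builtin_ctz to skip
// all intermediate divisions by 2 at once.
int binary_gcd(int a, int b) {
    if (a == 0) return b;
    if (b == 0) return a;
    int common = __builtin_ctz(a | b); // largest k such that 2^k divides both
    a >>= __builtin_ctz(a);            // make a odd
    do {
        b >>= __builtin_ctz(b);        // make b odd
        if (a > b)
            std::swap(a, b);
        b -= a;                        // difference of two odd numbers is even
    } while (b != 0);
    return a << common;
}
```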
### Acknowledgements

diff --git a/content/english/hpc/algorithms/img/column-major.jpg b/content/english/hpc/algorithms/img/column-major.jpg new file mode 100644 index 00000000..675d0b85 Binary files /dev/null and b/content/english/hpc/algorithms/img/column-major.jpg differ
diff --git a/content/english/hpc/algorithms/img/gcd-dependency1.png b/content/english/hpc/algorithms/img/gcd-dependency1.png new file mode 100644 index 00000000..4e58904c Binary files /dev/null and b/content/english/hpc/algorithms/img/gcd-dependency1.png differ
diff --git a/content/english/hpc/algorithms/img/gcd-dependency2.png b/content/english/hpc/algorithms/img/gcd-dependency2.png new file mode 100644 index 00000000..b045ada4 Binary files /dev/null and b/content/english/hpc/algorithms/img/gcd-dependency2.png differ
diff --git a/content/english/hpc/algorithms/img/mm-blas.svg b/content/english/hpc/algorithms/img/mm-blas.svg new file mode 100644 index 00000000..5027faef --- /dev/null +++ b/content/english/hpc/algorithms/img/mm-blas.svg @@ -0,0 +1,1570 @@ [new Matplotlib v3.5.1 SVG plot; markup omitted]
diff --git a/content/english/hpc/algorithms/img/mm-blocked-barplot.svg b/content/english/hpc/algorithms/img/mm-blocked-barplot.svg new file mode 100644 index 00000000..93334ac1 --- /dev/null +++ b/content/english/hpc/algorithms/img/mm-blocked-barplot.svg @@ -0,0 +1,1402 @@ [new Matplotlib v3.5.1 SVG plot; markup omitted]
diff --git a/content/english/hpc/algorithms/img/mm-blocked-plot.svg b/content/english/hpc/algorithms/img/mm-blocked-plot.svg new file mode 100644 index 00000000..87dda835 --- /dev/null +++ b/content/english/hpc/algorithms/img/mm-blocked-plot.svg @@ -0,0 +1,1474 @@ [new Matplotlib v3.5.1 SVG plot; markup omitted]
diff --git a/content/english/hpc/algorithms/img/mm-kernel-barplot.svg b/content/english/hpc/algorithms/img/mm-kernel-barplot.svg new file mode 100644 index 00000000..834d8b39 --- /dev/null +++ b/content/english/hpc/algorithms/img/mm-kernel-barplot.svg @@ -0,0 +1,1277 @@ [new Matplotlib v3.5.1 SVG plot; markup omitted]
diff --git a/content/english/hpc/algorithms/img/mm-kernel-plot.svg b/content/english/hpc/algorithms/img/mm-kernel-plot.svg new file mode 100644 index 00000000..99f9315a --- /dev/null +++ b/content/english/hpc/algorithms/img/mm-kernel-plot.svg @@ -0,0 +1,1385 @@ [new Matplotlib v3.5.1 SVG plot; markup omitted]
diff --git a/content/english/hpc/algorithms/img/mm-noalloc.svg b/content/english/hpc/algorithms/img/mm-noalloc.svg new file mode 100644 index 00000000..a4911ea0 --- /dev/null +++ b/content/english/hpc/algorithms/img/mm-noalloc.svg @@ -0,0 +1,1344 @@ [new Matplotlib v3.5.1 SVG plot; markup omitted]
diff --git a/content/english/hpc/algorithms/img/mm-vectorized-barplot.svg b/content/english/hpc/algorithms/img/mm-vectorized-barplot.svg new file mode 100644 index 00000000..610d8276 --- /dev/null +++ b/content/english/hpc/algorithms/img/mm-vectorized-barplot.svg @@ -0,0 +1,1140 @@ [new Matplotlib v3.5.1 SVG plot; markup omitted]
diff --git a/content/english/hpc/algorithms/img/mm-vectorized-plot.svg b/content/english/hpc/algorithms/img/mm-vectorized-plot.svg new file mode 100644 index 00000000..7374f73f --- /dev/null +++ b/content/english/hpc/algorithms/img/mm-vectorized-plot.svg @@ -0,0 +1,1379 @@ [new Matplotlib v3.5.1 SVG plot; markup omitted]
diff --git a/content/english/hpc/algorithms/img/rho.jpg b/content/english/hpc/algorithms/img/rho.jpg new file mode 100644 index 00000000..d7f01ad8 Binary files /dev/null and b/content/english/hpc/algorithms/img/rho.jpg
differ diff --git a/content/english/hpc/algorithms/matmul.md b/content/english/hpc/algorithms/matmul.md index be5bd07d..cf976045 100644 --- a/content/english/hpc/algorithms/matmul.md +++ b/content/english/hpc/algorithms/matmul.md @@ -1,426 +1,485 @@ --- title: Matrix Multiplication -weight: 4 -draft: true +weight: 20 --- + -## Case Study: Distance Product +In this case study, we will design and implement several algorithms for matrix multiplication. -(We are going to speedrun "[Programming Parallel Computers](http://ppc.cs.aalto.fi/ch2/)" course) +We start with the naive "for-for-for" algorithm and incrementally improve it, eventually arriving at a version that is 50 times faster and matches the performance of BLAS libraries while being under 40 lines of C. -Given a matrix $D$, we need to calculate its "min-plus matrix multiplication" defined as: +All implementations are compiled with GCC 13 and run on a [Zen 2](https://en.wikichip.org/wiki/amd/microarchitectures/zen_2) CPU clocked at 2GHz. -$(D \circ D)_{ij} = \min_k(D_{ik} + D_{kj})$ +## Baseline ----- +The result of multiplying an $l \times n$ matrix $A$ by an $n \times m$ matrix $B$ is defined as an $l \times m$ matrix $C$ such that: -Graph interpretation: -find shortest paths of length 2 between all vertices in a fully-connected weighted graph +$$ +C_{ij} = \sum_{k=1}^{n} A_{ik} \cdot B_{kj} +$$ -![](https://i.imgur.com/Zf4G7qj.png) +For simplicity, we will only consider *square* matrices, where $l = m = n$. ----- +To implement matrix multiplication, we can simply transfer this definition into code, but instead of two-dimensional arrays (aka matrices), we will be using one-dimensional arrays to be explicit about pointer arithmetic: -A cool thing about distance product is that if if we iterate the process and calculate: +```c++ +void matmul(const float *a, const float *b, float *c, int n) { + for (int i = 0; i < n; i++) + for (int j = 0; j < n; j++) + for (int k = 0; k < n; k++) + c[i * n + j] += a[i * n + k] * b[k * n + j]; +} +``` -$D_2 = D \circ D, \;\; -D_4 = D_2 \circ D_2, \;\; -D_8 = D_4 \circ D_4, \;\; -\ldots$ +For reasons that will become apparent later, we will only use matrix sizes that are multiples of $48$ for benchmarking, but the implementations remain correct for all others. We also use [32-bit floats](/hpc/arithmetic/ieee-754) specifically, although all implementations can be easily [generalized](#generalizations) to other data types and operations. -Then we can find all-pairs shortest distances in $O(\log n)$ steps +Compiled with `g++ -O3 -march=native -ffast-math -funroll-loops`, the naive approach multiplies two matrices of size $n = 1920 = 48 \times 40$ in ~16.7 seconds. To put it in perspective, this is approximately $\frac{1920^3}{16.7 \times 10^9} \approx 0.42$ useful operations per nanosecond (GFLOPS), or roughly 5 CPU cycles per multiplication, which doesn't look that good yet. -(but recall that there are [more direct ways](https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm) to solve it) +## Transposition ---- +In general, when optimizing an algorithm that processes large quantities of data — and $1920^2 \times 3 \times 4 \approx 42$ MB clearly is a large quantity as it can't fit into any of the [CPU caches](/hpc/cpu-cache) — one should always start with memory before optimizing arithmetic, as it is much more likely to be the bottleneck. -## V0: Baseline +The field $C_{ij}$ can be thought of as the dot product of row $i$ of matrix $A$ and column $j$ of matrix $B$. 
As we increment `k` in the inner loop above, we are reading the matrix `a` sequentially, but we are jumping over $n$ elements as we iterate over a column of `b`, which is [not as fast](/hpc/cpu-cache/aos-soa) as sequential iteration. -Implement the definition of what we need to do, but using arrays instead of matrices: +One [well-known](/hpc/external-memory/oblivious/#matrix-multiplication) optimization that tackles this problem is to store matrix $B$ in *column-major* order — or, alternatively, to *transpose* it before the matrix multiplication. This requires $O(n^2)$ additional operations but ensures sequential reads in the innermost loop: -```cpp -const float infty = std::numeric_limits::infinity(); + + +```c++ +void matmul(const float *a, const float *_b, float *c, int n) { + float *b = new float[n * n]; + + for (int i = 0; i < n; i++) + for (int j = 0; j < n; j++) + b[i * n + j] = _b[j * n + i]; + + for (int i = 0; i < n; i++) + for (int j = 0; j < n; j++) + for (int k = 0; k < n; k++) + c[i * n + j] += a[i * n + k] * b[j * n + k]; // <- note the indices } ``` -Compile with `g++ -O3 -march=native -std=c++17` +This code runs in ~12.4s, or about 30% faster. -On our Intel Core i5-6500 ("Skylake", 4 cores, 3.6 GHz) with $n=4000$ it runs for 99s, -which amounts to ~1.3B useful floating point operations per second +As we will see in a bit, there are more important benefits to transposing it than just the sequential memory reads. ---- +## Vectorization -## Theoretical Performance +Now that all we do is just sequentially read the elements of `a` and `b`, multiply them, and add the result to an accumulator variable, we can use [SIMD](/hpc/simd/) instructions to speed it all up. It is pretty straightforward to implement using [GCC vector types](/hpc/simd/intrinsics/#gcc-vector-extensions) — we can [memory-align](/hpc/cpu-cache/alignment/) matrix rows, pad them with zeros, and then compute the multiply-sum as we would normally compute any other [reduction](/hpc/simd/reduction/): -$$ -\underbrace{4}_{CPUs} \cdot \underbrace{8}_{SIMD} \cdot \underbrace{2}_{1/thr} \cdot \underbrace{3.6 \cdot 10^9}_{cycles/sec} = 230.4 \; GFLOPS \;\; (2.3 \cdot 10^{11}) -$$ +```c++ +// a vector of 256 / 32 = 8 floats +typedef float vec __attribute__ (( vector_size(32) )); -RAM bandwidth: 34.1 GB/s (or ~10 bytes per cycle) - +// a helper function that allocates n vectors and initializes them with zeros +vec* alloc(int n) { + vec* ptr = (vec*) std::aligned_alloc(32, 32 * n); + memset(ptr, 0, 32 * n); + return ptr; +} ---- +void matmul(const float *_a, const float *_b, float *c, int n) { + int nB = (n + 7) / 8; // number of 8-element vectors in a row (rounded up) + + vec *a = alloc(n * nB); + vec *b = alloc(n * nB); + + // move both matrices to the aligned region + for (int i = 0; i < n; i++) { + for (int j = 0; j < n; j++) { + a[i * nB + j / 8][j % 8] = _a[i * n + j]; + b[i * nB + j / 8][j % 8] = _b[j * n + i]; // <- b is still transposed + } + } + + for (int i = 0; i < n; i++) { + for (int j = 0; j < n; j++) { + vec s{}; // initialize the accumulator with zeros + + // vertical summation + for (int k = 0; k < nB; k++) + s += a[i * nB + k] * b[j * nB + k]; + + // horizontal summation + for (int k = 0; k < 8; k++) + c[i * n + j] += s[k]; + } + } -## OpenMP + std::free(a); + std::free(b); +} +``` -* We have 4 cores, so why don't we use them? 
-* There are low-level ways of creating threads, but they involve a lot of code -* We will use a high-level interface called OpenMP -* (We will talk about multithreading in much more detail on the next lecture) +The performance for $n = 1920$ is now around 2.3 GFLOPS — or another ~4 times higher compared to the transposed but not vectorized version. -![](https://www.researchgate.net/profile/Mario_Storti/publication/231168223/figure/fig2/AS:393334787985424@1470789729707/The-master-thread-creates-a-team-of-parallel-threads.png =400x) +![](../img/mm-vectorized-barplot.svg) ----- +This optimization looks neither too complex nor specific to matrix multiplication. Why can't the compiler [auto-vectorizee](/hpc/simd/auto-vectorization/) the inner loop by itself? -## Multithreading Made Easy +It actually can; the only thing preventing that is the possibility that `c` overlaps with either `a` or `b`. To rule it out, you can communicate to the compiler that you guarantee `c` is not [aliased](/hpc/compilation/contracts/#memory-aliasing) with anything by adding the `__restrict__` keyword to it: -All you need to know for now is the `#pragma omp parallel for` directive + -```cpp -#pragma omp parallel for -for (int i = 0; i < 10; ++i) { - do_stuff(i); +```c++ +void matmul(const float *a, const float *_b, float * __restrict__ c, int n) { + // ... } ``` -It splits iterations of a loop among multiple threads +Both manually and auto-vectorized implementations perform roughly the same. -There are many ways to control scheduling, -but we'll just leave defaults because our use case is simple - + -## Warning: Data Races +## Memory efficiency -This only works when all iterations can safely be executed simultaneously -It's not always easy to determine, but for now following rules of thumb are enough: +What is interesting is that the implementation efficiency depends on the problem size. -* There must not be any shared data element that is read by X and written by Y -* There must not be any shared data element that is written by X and written by Y +At first, the performance (defined as the number of useful operations per second) increases as the overhead of the loop management and the horizontal reduction decreases. Then, at around $n=256$, it starts smoothly decreasing as the matrices stop fitting into the [cache](/hpc/cpu-cache/) ($2 \times 256^2 \times 4 = 512$ KB is the size of the L2 cache), and the performance becomes bottlenecked by the [memory bandwidth](/hpc/cpu-cache/bandwidth/). -E. g. sum can't be parallelized this way, as threads would modify a shared variable - +![](../img/mm-vectorized-plot.svg) ---- +It is also interesting that the naive implementation is mostly on par with the non-vectorized transposed version — and even slightly better because it doesn't need to perform a transposition. 
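In these plots, performance means the number of useful multiplications ($n^3$) divided by the wall-clock running time. Here is a minimal sketch of such a measurement, assuming `matmul` is any one of the implementations above (the actual benchmarks may differ in details such as warm-up and averaging):

```c++
#include <chrono>
#include <cstdio>
#include <cstdlib>

// assumed: any of the matmul implementations from this article
void matmul(const float *a, const float *b, float *c, int n);

int main() {
    const int n = 1920;

    float *a = new float[n * n];
    float *b = new float[n * n];
    float *c = new float[n * n](); // zero-initialized

    for (int i = 0; i < n * n; i++) {
        a[i] = float(rand()) / RAND_MAX;
        b[i] = float(rand()) / RAND_MAX;
    }

    auto start = std::chrono::steady_clock::now();
    matmul(a, b, c, n);
    auto end = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(end - start).count();

    // n^3 useful multiplications per call
    printf("%.2f GFLOPS\n", double(n) * n * n / seconds / 1e9);

    return 0;
}
```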
-## Parallel Baseline - -OpenMP is included in compilers: just add `-fopenmp` flag and that's it - -```cpp -void step(float* r, const float* d, int n) { - #pragma omp parallel for - for (int i = 0; i < n; ++i) { - for (int j = 0; j < n; ++j) { - float v = infty; - for (int k = 0; k < n; ++k) { - float x = d[n*i + k]; - float y = d[n*k + j]; - float z = x + y; - v = std::min(v, z); - } - r[n*i + j] = v; - } - } -} -``` +One might think that there would be some general performance gain from doing sequential reads since we are fetching fewer cache lines, but this is not the case: fetching the first column of `b` indeed takes more time, but the next 15 column reads will be in the same cache lines as the first one, so they will be cached anyway — unless the matrix is so large that it can't even fit `n * cache_line_size` bytes into the cache, which is not the case for any practical matrix sizes. -Runs ~4x times faster, as it should +Instead, the performance deteriorates on only a few specific matrix sizes due to the effects of [cache associativity](/hpc/cpu-cache/associativity/): when $n$ is a multiple of a large power of two, we are fetching the addresses of `b` that all likely map to the same cache line, which reduces the effective cache size. This explains the 30% performance dip for $n = 1920 = 2^7 \times 3 \times 5$, and you can see an even more noticeable one for $1536 = 2^9 \times 3$: it is roughly 3 times slower than for $n=1535$. ---- +So, counterintuitively, transposing the matrix doesn't help with caching — and in the naive scalar implementation, we are not really bottlenecked by the memory bandwidth anyway. But our vectorized implementation certainly is, so let's work on its I/O efficiency. -## Memory Bottleneck +## Register reuse -![](https://i.imgur.com/z4d6aez.png =450x) +Using a Python-like notation to refer to submatrices, to compute the cell $C[x][y]$, we need to calculate the dot product of $A[x][:]$ and $B[:][y]$, which requires fetching $2n$ elements, even if we store $B$ in column-major order. -(It is slower on macOS because of smaller page sizes) + ----- +To compute $C[x:x+2][y:y+2]$, a $2 \times 2$ submatrix of $C$, we would need two rows from $A$ and two columns from $B$, namely $A[x:x+2][:]$ and $B[:][y:y+2]$, containing $4n$ elements in total, to update *four* elements instead of *one* — which is $\frac{2n / 1}{4n / 4} = 2$ times better in terms of I/O efficiency. -## Virtual Memory + -## V1: Linear Reading +To avoid fetching data more than once, we need to iterate over these rows and columns in parallel and calculate all $2 \times 2$ possible combinations of products. 
Here is a proof of concept: -Just transpose it, as we did with matrices +```c++ +void kernel_2x2(int x, int y) { + int c00 = 0, c01 = 0, c10 = 0, c11 = 0; -```cpp -void step(float* r, const float* d, int n) { - std::vector t(n*n); - #pragma omp parallel for - for (int i = 0; i < n; ++i) { - for (int j = 0; j < n; ++j) { - t[n*j + i] = d[n*i + j]; - } - } + for (int k = 0; k < n; k++) { + // read rows + int a0 = a[x][k]; + int a1 = a[x + 1][k]; - #pragma omp parallel for - for (int i = 0; i < n; ++i) { - for (int j = 0; j < n; ++j) { - float v = std::numeric_limits::infinity(); - for (int k = 0; k < n; ++k) { - float x = d[n*i + k]; - float y = t[n*j + k]; - float z = x + y; - v = std::min(v, z); - } - r[n*i + j] = v; - } + // read columns + int b0 = b[k][y]; + int b1 = b[k][y + 1]; + + // update all combinations + c00 += a0 * b0; + c01 += a0 * b1; + c10 += a1 * b0; + c11 += a1 * b1; } + + // write the results to C + c[x][y] = c00; + c[x][y + 1] = c01; + c[x + 1][y] = c10; + c[x + 1][y + 1] = c11; } ``` ----- +We can now simply call this kernel on all 2x2 submatrices of $C$, but we won't bother evaluating it: although this algorithm is better in terms of I/O operations, it would still not beat our SIMD-based implementation. Instead, we will extend this approach and develop a similar *vectorized* kernel right away. -![](https://i.imgur.com/UwxcEG7.png =600x) + ---- +## Designing the kernel -## V2: Instruction-Level Parallelism +Instead of designing a kernel that computes an $h \times w$ submatrix of $C$ from scratch, we will declare a function that *updates* it using columns from $l$ to $r$ of $A$ and rows from $l$ to $r$ of $B$. For now, this seems like an over-generalization, but this function interface will prove useful later. -We can apply the same trick as we did with array sum earlier, so that instead of: + -```cpp -v0 = min(v0, z0); -v1 = min(v1, z1); -v0 = min(v0, z2); -v1 = min(v1, z3); -v0 = min(v0, z4); -... -v = min(v0, v1); -``` +To determine $h$ and $w$, we have several performance considerations: ----- +- In general, to compute an $h \times w$ submatrix, we need to fetch $2 \cdot n \cdot (h + w)$ elements. To optimize the I/O efficiency, we want the $\frac{h \cdot w}{h + w}$ ratio to be high, which is achieved with large and square-ish submatrices. +- We want to use the [FMA](https://en.wikipedia.org/wiki/FMA_instruction_set) ("fused multiply-add") instruction available on all modern x86 architectures. As you can guess from the name, it performs the `c += a * b` operation — which is the core of a dot product — on 8-element vectors in one go, which saves us from executing vector multiplication and addition separately. +- To achieve better utilization of this instruction, we want to make use of [instruction-level parallelism](/hpc/pipelining/). On Zen 2, the `fma` instruction has a latency of 5 and a throughput of 2, meaning that we need to concurrently execute at least $5 \times 2 = 10$ of them to saturate its execution ports. +- We want to avoid register spill (move data to and from registers more than necessary), and we only have $16$ logical vector registers that we can use as accumulators (minus those that we need to hold temporary values). -![](https://i.imgur.com/ihMC6z2.png) +For these reasons, we settle on a $6 \times 16$ kernel. This way, we process $96$ elements at once that are stored in $6 \times 2 = 12$ vector registers. 
To update them efficiently, we use the following procedure: -Our memory layout looks like this now + - #pragma omp parallel for - for (int j = 0; j < n; ++j) { - for (int i = 0; i < n; ++i) { - d[nab*j + i] = d_[n*j + i]; - t[nab*j + i] = d_[n*i + j]; - } - } +```c++ +// update 6x16 submatrix C[x:x+6][y:y+16] +// using A[x:x+6][l:r] and B[l:r][y:y+16] +void kernel(float *a, vec *b, vec *c, int x, int y, int l, int r, int n) { + vec t[6][2]{}; // will be zero-filled and stored in ymm registers - #pragma omp parallel for - for (int i = 0; i < n; ++i) { - for (int j = 0; j < n; ++j) { - // vv[0] = result for k = 0, 4, 8, ... - // vv[1] = result for k = 1, 5, 9, ... - // vv[2] = result for k = 2, 6, 10, ... - // vv[3] = result for k = 3, 7, 11, ... - float vv[nb]; - for (int kb = 0; kb < nb; ++kb) { - vv[kb] = infty; - } - for (int ka = 0; ka < na; ++ka) { - for (int kb = 0; kb < nb; ++kb) { - float x = d[nab*i + ka * nb + kb]; - float y = t[nab*j + ka * nb + kb]; - float z = x + y; - vv[kb] = std::min(vv[kb], z); - } - } - // v = result for k = 0, 1, 2, ... - float v = infty; - for (int kb = 0; kb < nb; ++kb) { - v = std::min(vv[kb], v); - } - r[n*i + j] = v; + for (int k = l; k < r; k++) { + for (int i = 0; i < 6; i++) { + // broadcast a[x + i][k] into a register + vec alpha = vec{} + a[(x + i) * n + k]; // converts to a broadcast + // multiply b[k][y:y+16] by it and update t[i][0] and t[i][1] + for (int j = 0; j < 2; j++) + t[i][j] += alpha * b[(k * n + y) / 8 + j]; // converts to an fma } } + + // write the results back to C + for (int i = 0; i < 6; i++) + for (int j = 0; j < 2; j++) + c[((x + i) * n + y) / 8 + j] += t[i][j]; } ``` ----- +We need `t` so that the compiler stores these elements in vector registers. We could just update their final destinations in `c`, but, unfortunately, the compiler re-writes them back to memory, causing a slowdown (wrapping everything in `__restrict__` keywords doesn't help). -![](https://i.imgur.com/5uHVRL4.png =600x) +After unrolling these loops and hoisting `b` out of the `i` loop (`b[(k * n + y) / 8 + j]` does not depend on `i` and can be loaded once and reused in all 6 iterations), the compiler generates something more similar to this: ---- - -## V3: Vectorization + -![](https://i.imgur.com/EG0WjHl.png =400x) +```c++ +for (int k = l; k < r; k++) { + __m256 b0 = _mm256_load_ps((__m256*) &b[k * n + y]; + __m256 b1 = _mm256_load_ps((__m256*) &b[k * n + y + 8]; + + __m256 a0 = _mm256_broadcast_ps((__m128*) &a[x * n + k]); + t00 = _mm256_fmadd_ps(a0, b0, t00); + t01 = _mm256_fmadd_ps(a0, b1, t01); ----- + __m256 a1 = _mm256_broadcast_ps((__m128*) &a[(x + 1) * n + k]); + t10 = _mm256_fmadd_ps(a1, b0, t10); + t11 = _mm256_fmadd_ps(a1, b1, t11); -```cpp -static inline float8_t min8(float8_t x, float8_t y) { - return x < y ? x : y; + // ... } +``` -void step(float* r, const float* d_, int n) { - // elements per vector - constexpr int nb = 8; - // vectors per input row - int na = (n + nb - 1) / nb; - - // input data, padded, converted to vectors - float8_t* vd = float8_alloc(n*na); - // input data, transposed, padded, converted to vectors - float8_t* vt = float8_alloc(n*na); - - #pragma omp parallel for - for (int j = 0; j < n; ++j) { - for (int ka = 0; ka < na; ++ka) { - for (int kb = 0; kb < nb; ++kb) { - int i = ka * nb + kb; - vd[na*j + ka][kb] = i < n ? d_[n*j + i] : infty; - vt[na*j + ka][kb] = i < n ? 
d_[n*i + j] : infty; - } - } - } +We are using $12+3=15$ vector registers and a total of $6 \times 3 + 2 = 20$ instructions to perform $16 \times 6 = 96$ updates. Assuming that there are no other bottleneks, we should be hitting the throughput of `_mm256_fmadd_ps`. - #pragma omp parallel for - for (int i = 0; i < n; ++i) { - for (int j = 0; j < n; ++j) { - float8_t vv = f8infty; - for (int ka = 0; ka < na; ++ka) { - float8_t x = vd[na*i + ka]; - float8_t y = vt[na*j + ka]; - float8_t z = x + y; - vv = min8(vv, z); - } - r[n*i + j] = hmin8(vv); - } +Note that this kernel is architecture-specific. If we didn't have `fma`, or if its throughput/latency were different, or if the SIMD width was 128 or 512 bits, we would have made different design choices. Multi-platform BLAS implementations ship [many kernels](https://github.com/xianyi/OpenBLAS/tree/develop/kernel), each written in assembly by hand and optimized for a particular architecture. + +The rest of the implementation is straightforward. Similar to the previous vectorized implementation, we just move the matrices to memory-aligned arrays and call the kernel instead of the innermost loop: + +```c++ +void matmul(const float *_a, const float *_b, float *_c, int n) { + // to simplify the implementation, we pad the height and width + // so that they are divisible by 6 and 16 respectively + int nx = (n + 5) / 6 * 6; + int ny = (n + 15) / 16 * 16; + + float *a = alloc(nx * ny); + float *b = alloc(nx * ny); + float *c = alloc(nx * ny); + + for (int i = 0; i < n; i++) { + memcpy(&a[i * ny], &_a[i * n], 4 * n); + memcpy(&b[i * ny], &_b[i * n], 4 * n); // we don't need to transpose b this time } - std::free(vt); - std::free(vd); + for (int x = 0; x < nx; x += 6) + for (int y = 0; y < ny; y += 16) + kernel(a, (vec*) b, (vec*) c, x, y, 0, n, ny); + + for (int i = 0; i < n; i++) + memcpy(&_c[i * n], &c[i * ny], 4 * n); + + std::free(a); + std::free(b); + std::free(c); } ``` ----- +This improves the benchmark performance, but only by ~40%: -![](https://i.imgur.com/R3OvLKO.png =600x) +![](../img/mm-kernel-barplot.svg) ---- +The speedup is much higher (2-3x) on smaller arrays, indicating that there is still a memory bandwidth problem: -## V4: Register Reuse - -* At this point we are actually bottlenecked by memory -* It turns out that calculating one $r_{ij}$ at a time is not optimal -* We can reuse data that we read into registers to update other fields - ----- - -![](https://i.imgur.com/ljvD0ba.png =400x) - ----- - -```cpp -for (int ka = 0; ka < na; ++ka) { - float8_t y0 = vt[na*(jc * nd + 0) + ka]; - float8_t y1 = vt[na*(jc * nd + 1) + ka]; - float8_t y2 = vt[na*(jc * nd + 2) + ka]; - float8_t x0 = vd[na*(ic * nd + 0) + ka]; - float8_t x1 = vd[na*(ic * nd + 1) + ka]; - float8_t x2 = vd[na*(ic * nd + 2) + ka]; - vv[0][0] = min8(vv[0][0], x0 + y0); - vv[0][1] = min8(vv[0][1], x0 + y1); - vv[0][2] = min8(vv[0][2], x0 + y2); - vv[1][0] = min8(vv[1][0], x1 + y0); - vv[1][1] = min8(vv[1][1], x1 + y1); - vv[1][2] = min8(vv[1][2], x1 + y2); - vv[2][0] = min8(vv[2][0], x2 + y0); - vv[2][1] = min8(vv[2][1], x2 + y1); - vv[2][2] = min8(vv[2][2], x2 + y2); -} +![](../img/mm-kernel-plot.svg) + +Now, if you've read the section on [cache-oblivious algorithms](/hpc/external-memory/oblivious/), you know that one universal solution to these types of things is to split all matrices into four parts, perform eight recursive block matrix multiplications, and carefully combine the results together. 
This solution is okay in practice, but there is some [overhead to recursion](/hpc/architecture/functions/), and it also doesn't allow us to fine-tune the algorithm, so instead, we will follow a different, simpler approach. + +## Blocking + +The *cache-aware* alternative to the divide-and-conquer trick is *cache blocking*: splitting the data into blocks that can fit into the cache and processing them one by one. If we have more than one layer of cache, we can do hierarchical blocking: we first select a block of data that fits into the L3 cache, then we split it into blocks that fit into the L2 cache, and so on. This approach requires knowing the cache sizes in advance, but it is usually easier to implement and also faster in practice. + +Cache blocking is less trivial to do with matrices than with arrays, but the general idea is this: + +- Select a submatrix of $B$ that fits into the L3 cache (say, a subset of its columns). +- Select a submatrix of $A$ that fits into the L2 cache (say, a subset of its rows). +- Select a submatrix of the previously selected submatrix of $B$ (a subset of its rows) that fits into the L1 cache. +- Update the relevant submatrix of $C$ using the kernel. + +Here is a good [visualization](https://jukkasuomela.fi/cache-blocking-demo/) by Jukka Suomela (it features many different approaches; you are interested in the last one). + +Note that the decision to start this process with matrix $B$ is not arbitrary. During the kernel execution, we are reading the elements of $A$ much slower than the elements of $B$: we fetch and broadcast just one element of $A$ and then multiply it with $16$ elements of $B$. Therefore, we want $B$ to be in the L1 cache while $A$ can stay in the L2 cache and not the other way around. + +This sounds complicated, but we can implement it with just three more outer `for` loops, which are collectively called *macro-kernel* (and the highly optimized low-level function that updates a 6x16 submatrix is called *micro-kernel*): + +```c++ +const int s3 = 64; // how many columns of B to select +const int s2 = 120; // how many rows of A to select +const int s1 = 240; // how many rows of B to select + +for (int i3 = 0; i3 < ny; i3 += s3) + // now we are working with b[:][i3:i3+s3] + for (int i2 = 0; i2 < nx; i2 += s2) + // now we are working with a[i2:i2+s2][:] + for (int i1 = 0; i1 < ny; i1 += s1) + // now we are working with b[i1:i1+s1][i3:i3+s3] + // and we need to update c[i2:i2+s2][i3:i3+s3] with [l:r] = [i1:i1+s1] + for (int x = i2; x < std::min(i2 + s2, nx); x += 6) + for (int y = i3; y < std::min(i3 + s3, ny); y += 16) + kernel(a, (vec*) b, (vec*) c, x, y, i1, std::min(i1 + s1, n), ny); ``` -Ugly, but worth it +Cache blocking completely removes the memory bottleneck: ----- +![](../img/mm-blocked-barplot.svg) -![](https://i.imgur.com/GZvIt8J.png =600x) +The performance is no longer (significantly) affected by the problem size: ---- +![](../img/mm-blocked-plot.svg) -## V5: More Register Reuse +Notice that the dip at $1536$ is still there: cache associativity still affects the performance. To mitigate this, we can adjust the step constants or insert holes into the layout, but we will not bother doing that for now. -![](https://i.imgur.com/amUznoQ.png =400x) +## Optimization ----- +To approach closer to the performance limit, we need a few more optimizations: -![](https://i.imgur.com/24nBJ1Y.png =600x) +- Remove memory allocation and operate directly on the arrays that are passed to the function. 
Note that we don't need to do anything with `a` as we are reading just one element at a time, and we can use an [unaligned](/hpc/simd/moving/#aligned-loads-and-stores) `store` for `c` as we only use it rarely, so our only concern is reading `b`. +- Get rid of the `std::min` so that the size parameters are (mostly) constant and can be embedded into the machine code by the compiler (which also lets it [unroll](/hpc/architecture/loops/) the micro-kernel loop more efficiently and avoid runtime checks). +- Rewrite the micro-kernel by hand using 12 vector variables (the compiler seems to struggle with keeping them in registers and writes them first to a temporary memory location and only then to $C$). ---- +These optimizations are straightforward but quite tedious to implement, so we are not going to list [the code](https://github.com/sslotin/amh-code/blob/main/matmul/v5-unrolled.cc) here in the article. It also requires some more work to effectively support "weird" matrix sizes, which is why we only run benchmarks for sizes that are multiple of $48 = \frac{6 \cdot 16}{\gcd(6, 16)}$. -## V6: Software Prefetching + -## V7: Temporal Cache Locality +These individually small improvements compound and result in another 50% improvement: -![](https://i.imgur.com/29vTLKJ.png) +![](../img/mm-noalloc.svg) ----- +We are actually not that far from the theoretical performance limit — which can be calculated as the SIMD width times the `fma` instruction throughput times the clock frequency: -### Z-Curve +$$ +\underbrace{8}_{SIMD} \cdot \underbrace{2}_{thr.} \cdot \underbrace{2 \cdot 10^9}_{cycles/sec} = 32 \; GFLOPS \;\; (3.2 \cdot 10^{10}) +$$ -![](https://i.imgur.com/0optLZ3.png) +It is more representative to compare against some practical library, such as [OpenBLAS](https://www.openblas.net/). The laziest way to do it is to simply [invoke matrix multiplication from NumPy](/hpc/complexity/languages/#blas). There may be some minor overhead due to Python, but it ends up reaching 80% of the theoretical limit, which seems plausible (a 20% overhead is okay: matrix multiplication is not the only thing that CPUs are made for). ----- +![](../img/mm-blas.svg) -![](https://i.imgur.com/U3GaO5b.png) +We've reached ~93% of BLAS performance and ~75% of the theoretical performance limit, which is really great for what is essentially just 40 lines of C. ---- +Interestingly, the whole thing can be rolled into just one deeply nested `for` loop with a BLAS level of performance (assuming that we're in 2050 and using GCC version 35, which finally stopped screwing up with register spilling): + +```c++ +for (int i3 = 0; i3 < n; i3 += s3) + for (int i2 = 0; i2 < n; i2 += s2) + for (int i1 = 0; i1 < n; i1 += s1) + for (int x = i2; x < i2 + s2; x += 6) + for (int y = i3; y < i3 + s3; y += 16) + for (int k = i1; k < i1 + s1; k++) + for (int i = 0; i < 6; i++) + for (int j = 0; j < 2; j++) + c[x * n / 8 + i * n / 8 + y / 8 + j] + += (vec{} + a[x * n + i * n + k]) + * b[n / 8 * k + y / 8 + j]; +``` + +There is also an approach that performs asymptotically fewer arithmetic operations — [the Strassen algorithm](/hpc/external-memory/oblivious/#strassen-algorithm) — but it has a large constant factor, and it is only efficient for [very large matrices](https://arxiv.org/pdf/1605.01078.pdf) ($n > 4000$), where we typically have to use either multiprocessing or some approximate dimensionality-reducing methods anyway. 
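To make that trade-off concrete, here is what one level of Strassen's recursion looks like (this is the standard textbook formulation, applied to $\frac{n}{2} \times \frac{n}{2}$ blocks): seven block products are computed instead of eight,

$$
\begin{aligned}
M_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\
M_2 &= (A_{21} + A_{22}) \cdot B_{11} \\
M_3 &= A_{11} \cdot (B_{12} - B_{22}) \\
M_4 &= A_{22} \cdot (B_{21} - B_{11}) \\
M_5 &= (A_{11} + A_{12}) \cdot B_{22} \\
M_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
M_7 &= (A_{12} - A_{22})(B_{21} + B_{22})
\end{aligned}
$$

and then combined into the result:

$$
\begin{aligned}
C_{11} &= M_1 + M_4 - M_5 + M_7 \\
C_{12} &= M_3 + M_5 \\
C_{21} &= M_2 + M_4 \\
C_{22} &= M_1 - M_2 + M_3 + M_6
\end{aligned}
$$

Applying this recursively yields $O(n^{\log_2 7}) \approx O(n^{2.81})$ multiplications, but the extra block additions and the worse memory locality are exactly where the large constant factor comes from.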
+ +## Generalizations + +FMA also supports 64-bit floating-point numbers, but it does not support integers: you need to perform addition and multiplication separately, which results in decreased performance. If you can guarantee that all intermediate results can be represented exactly as 32- or 64-bit floating-point numbers (which is [often the case](/hpc/arithmetic/errors/)), it may be faster to just convert them to and from floats. + +This approach can be also applied to some similar-looking computations. One example is the "min-plus matrix multiplication" defined as: + +$$ +(A \circ B)_{ij} = \min_{1 \le k \le n} (A_{ik} + B_{kj}) +$$ + +It is also known as the "distance product" due to its graph interpretation: when applied to itself $(D \circ D)$, the result is the matrix of shortest paths of length two between all pairs of vertices in a fully-connected weighted graph specified by the edge weight matrix $D$. + +A cool thing about the distance product is that if we iterate the process and calculate + +$$ +D_2 = D \circ D \\ +D_4 = D_2 \circ D_2 \\ +D_8 = D_4 \circ D_4 \\ +\ldots +$$ + +…we can find all-pairs shortest paths in $O(\log n)$ steps: + +```c++ +for (int l = 0; l < logn; l++) + for (int i = 0; i < n; i++) + for (int j = 0; j < n; j++) + for (int k = 0; k < n; k++) + d[i][j] = min(d[i][j], d[i][k] + d[k][j]); +``` + +This requires $O(n^3 \log n)$ operations. If we do these two-edge relaxations in a particular order, we can do it with just one pass, which is known as the [Floyd-Warshall algorithm](https://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm): + +```c++ +for (int k = 0; k < n; k++) + for (int i = 0; i < n; i++) + for (int j = 0; j < n; j++) + d[i][j] = min(d[i][j], d[i][k] + d[k][j]); +``` + +Interestingly, similarly vectorizing the distance product and executing it $O(\log n)$ times ([or possibly fewer](https://arxiv.org/pdf/1904.01210.pdf)) in $O(n^3 \log n)$ total operations is faster than naively executing the Floyd-Warshall algorithm in $O(n^3)$ operations, although not by a lot. + +As an exercise, try to speed up this "for-for-for" computation. It is harder to do than in the matrix multiplication case because now there is a logical dependency between the iterations, and you need to perform updates in a particular order, but it is still possible to design [a similar kernel and a block iteration order](https://github.com/sslotin/amh-code/blob/main/floyd/blocked.cc) that achieves a 30-50x total speedup. + +## Acknowledgements -## Summary +The final algorithm was originally designed by Kazushige Goto, and it is the basis of GotoBLAS and OpenBLAS. The author himself describes it in more detail in "[Anatomy of High-Performance Matrix Multiplication](https://www.cs.utexas.edu/~flame/pubs/GotoTOMS_revision.pdf)". -* Deal with memory problems first (make sure data fits L3 cache) -* SIMD can get you ~10x speedup -* ILP can get you 2-3x speedup -* Multi-core parallelism can get you $NUM_CORES speedup - (and it can be just one `#pragma omp parallel for` away) +The exposition style is inspired by the "[Programming Parallel Computers](http://ppc.cs.aalto.fi/)" course by Jukka Suomela, which features a [similar case study](http://ppc.cs.aalto.fi/ch2/) on speeding up the distance product. 
diff --git a/content/english/hpc/algorithms/parsing.md b/content/english/hpc/algorithms/parsing.md deleted file mode 100644 index c189e66a..00000000 --- a/content/english/hpc/algorithms/parsing.md +++ /dev/null @@ -1,5 +0,0 @@ ---- -title: Parsing with SIMD -weight: 5 -draft: true ---- diff --git a/content/english/hpc/algorithms/prefix.md b/content/english/hpc/algorithms/prefix.md index 5e31570d..43bfd560 100644 --- a/content/english/hpc/algorithms/prefix.md +++ b/content/english/hpc/algorithms/prefix.md @@ -61,7 +61,7 @@ for (int l = 0; l < logn; l++) We can prove that this algorithm works by induction: if on $k$-th iteration every element $a_i$ is equal to the sum of the $(i - 2^k, i]$ segment of the original array, then after adding $a_{i - 2^k}$ to it, it will be equal to the sum of $(i - 2^{k+1}, i]$. After $O(\log n)$ iterations, the array will turn into its prefix sum. -To implement it in SIMD, we could use [permutations](/hpc/simd/shuffles) to place $i$-th element against $(i-2^k)$-th, but they are too slow. Instead, we will use the `sll` ("shift lanes left") instruction that does exactly that and also replaces the unmatched elements with zeros: +To implement it in SIMD, we could use [permutations](/hpc/simd/shuffling) to place $i$-th element against $(i-2^k)$-th, but they are too slow. Instead, we will use the `sll` ("shift lanes left") instruction that does exactly that and also replaces the unmatched elements with zeros: ```c++ typedef __m128i v4i; @@ -76,7 +76,7 @@ v4i prefix(v4i x) { // x = 1, 3, 5, 7 // + 0, 0, 1, 3 // = 1, 3, 6, 10 - return s; + return x; } ``` @@ -91,7 +91,7 @@ v8i prefix(v8i x) { x = _mm256_add_epi32(x, _mm256_slli_si256(x, 8)); x = _mm256_add_epi32(x, _mm256_slli_si256(x, 16)); // <- this does nothing // x = 1, 3, 6, 10, 5, 11, 18, 26 - return s; + return x; } ``` @@ -146,7 +146,7 @@ Another interesting data point: if we only execute the `prefix` phase, the perfo ### Blocking -So, we have a memory bandwidth problem for large arrays. We can avoid re-fetching the entire array from the RAM if we split it into blocks that fit in the cache and process them separately. All we need to pass to the next block is the sum of the previous ones, so we can design a `local_prefix` function with an interface similar to `accumulate`: +So, we have a memory bandwidth problem for large arrays. We can avoid re-fetching the entire array from RAM if we split it into blocks that fit in the cache and process them separately. All we need to pass to the next block is the sum of the previous ones, so we can design a `local_prefix` function with an interface similar to `accumulate`: ```c++ const int B = 4096; // <- ideally should be slightly less or equal to the L1 cache diff --git a/content/english/hpc/algorithms/reading-integers.md b/content/english/hpc/algorithms/reading-integers.md new file mode 100644 index 00000000..de9da4e9 --- /dev/null +++ b/content/english/hpc/algorithms/reading-integers.md @@ -0,0 +1,59 @@ +--- +title: Reading Decimal Integers +weight: 10 +draft: true +--- + +I wrote a new integer parsing algorithm that is ~35x faster than scanf. + +(No, this is not an April Fools' joke — although it does sound ridiculous.) + +Zen 2 @ 2GHz. The compiler is Clang 13. + +Ridiculous. 
+ +### Iostream + +### Scanf + +### Syncronization + +### Getchar + +### Buffering + +### SIMD + +http://0x80.pl/notesen/2014-10-12-parsing-decimal-numbers-part-1-swar.html + + +### Serial + +### Transpose-based approach + +### Instruction-level parallelism + + +### Modifications + +ILP benefits would not be that huge. + +One huge asterisk. We get the integers, and we can even do other parsing algorithms on them. + +1.75 cycles per byte. + +AVX-512 both due to larger SIMD lane size and dedicated operations for filtering. + +It accounts for ~2% of all time, but it can be optimized by using special procedures. Pad buffer with any digits. + +### Future work + +Next time, we will be *writing* integers. + +You can create a string searcing algorithm by computing hashes in rabin-karp algorithm — although it does not seem to be possible to make an *exact* algorithm for that. + +## Acknowledgements + +http://0x80.pl/articles/simd-parsing-int-sequences.html + +https://stackoverflow.com/questions/25622745/transpose-an-8x8-float-using-avx-avx2/25627536#25627536 diff --git a/content/english/hpc/architecture/assembly.md b/content/english/hpc/architecture/assembly.md index 20a018c7..de94e4cf 100644 --- a/content/english/hpc/architecture/assembly.md +++ b/content/english/hpc/architecture/assembly.md @@ -19,7 +19,7 @@ Jumping right into it, here is how you add two numbers (`*c = *a + *b`) in Arm a ldr w0, [x0] ; load 4 bytes from wherever x0 points into w0 ldr w1, [x1] ; load 4 bytes from wherever x1 points into w1 add w0, w0, w1 ; add w0 with w1 and save the result to w0 -str w0, [x2] ; write contents of w0 to wherever x2 points/ +str w0, [x2] ; write contents of w0 to wherever x2 points ``` Here is the same operation in x86 assembly: @@ -33,7 +33,7 @@ mov DWORD PTR [rdx], eax ; write contents of eax to wherever rdx points Assembly is very simple in the sense that it doesn't have many syntactical constructions compared to high-level programming languages. From what you can observe from the examples above: -- A program is a sequence of instructions, each written as its name followed by a variable amount of operands. +- A program is a sequence of instructions, each written as its name followed by a variable number of operands. - The `[reg]` syntax is used for "dereferencing" a pointer stored in a register, and on x86 you need to prefix it with size information (`DWORD` here means 32 bit). - The `;` sign is used for line comments, similar to `#` and `//` in other languages. @@ -49,15 +49,15 @@ Since there are far more differences between the architectures than just this on For historical reasons, instruction mnemonics in most assembly languages are very terse. Back when people used to write assembly by hand and repeatedly wrote the same set of common instructions, one less character to type was one step away from insanity. -For example, `mov` is for "store/load a word", `inc` is for "increment by 1", `mul` is for "multiply", and `idiv` is for "integer division". You can look up the description of an instruction by its name in [one of x86 references](https://www.felixcloutier.com/x86/), but most instructions do what you'd think they do. +For example, `mov` is for "store/load a word," `inc` is for "increment by 1," `mul` is for "multiply," and `idiv` is for "integer division." You can look up the description of an instruction by its name in [one of x86 references](https://www.felixcloutier.com/x86/), but most instructions do what you'd think they do. 
Most instructions write their result into the first operand, which can also be involved in the computation like in the `add eax, [rdi]` example we saw before. Operands can be either registers, constant values, or memory locations. -**Registers** are named `rax`, `rbx`, `rcx`, `rdx`, `rdi`, `rsi`, `rbp`, `rsp`, and `r8`-`r15` for a total of 16 of them. The "letter" ones are named like that for historical reasons: `rax` is "accumulator", `rcx` is "counter", `rdx` is "data" and so on, but, of course, they don't have to be used only for that. +**Registers** are named `rax`, `rbx`, `rcx`, `rdx`, `rdi`, `rsi`, `rbp`, `rsp`, and `r8`-`r15` for a total of 16 of them. The "letter" ones are named like that for historical reasons: `rax` is "accumulator," `rcx` is "counter," `rdx` is "data" and so on — but, of course, they don't have to be used only for that. -There are also 32-, 16-bit and 8-bit registers that have similar names (`rax` → `eax` → `ax` → `al`). They are not fully separate but *aliased*: the first 32 bits of `rax` are `eax`, the first 16 bits of `eax` are `ax`, and so on. This is made to save die space while maintaining compatibility, and it is also the reason why basic type casts in compiled programming languages are usually free. +There are also 32-, 16-bit and 8-bit registers that have similar names (`rax` → `eax` → `ax` → `al`). They are not fully separate but *aliased*: the lowest 32 bits of `rax` are `eax`, the lowest 16 bits of `eax` are `ax`, and so on. This is made to save die space while maintaining compatibility, and it is also the reason why basic type casts in compiled programming languages are usually free. -These are just the *general-purpose* registers that you can, with [some exceptions](../functions), use however you like in most instructions. There is also a separate set of registers for [floating-point arithmetic](/hpc/arithmetic/float), a bunch of very wide registers used in [vector extensions](/hpc/simd), and a few special ones that are needed for [control flow](../jumps), but we'll get there in time. +These are just the *general-purpose* registers that you can, with [some exceptions](../functions), use however you like in most instructions. There is also a separate set of registers for [floating-point arithmetic](/hpc/arithmetic/float), a bunch of very wide registers used in [vector extensions](/hpc/simd), and a few special ones that are needed for [control flow](../loops), but we'll get there in time. **Constants** are just integer or floating-point values: `42`, `0x2a`, `3.14`, `6.02e23`. They are more commonly called *immediate values* because they are embedded right into the machine code. Because it may considerably increase the complexity of the instruction encoding, some instructions don't support immediate values or allow just a fixed subset of them. In some cases, you have to load a constant value into a register and then use it instead of an immediate value. @@ -117,20 +117,18 @@ There are actually multiple *assemblers* (the programs that produce machine code These syntaxes are also sometimes called *GAS* and *NASM* respectively, by the names of the two primary assemblers that use them (*GNU Assembler* and *Netwide Assembler*). -We used Intel syntax in this chapter and will continue to preferably use it for the rest of the book. For comparison, here is what the summation loop looks like in AT&T asm: +We used Intel syntax in this chapter and will continue to preferably use it for the rest of the book. 
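To illustrate that last point: most x86 arithmetic instructions only encode immediate operands of up to 32 bits (sign-extended to 64), so adding a full 64-bit constant requires placing it in a register first. A small sketch, using an arbitrary constant:

```c++
#include <cstdint>

// "add rdi, 0x123456789abcdef0" cannot be encoded,
// so the compiler emits something along the lines of:
//
//     mov rax, 0x123456789abcdef0   ; the 64-bit immediate form of mov
//     add rax, rdi
//     ret
//
uint64_t add_big_constant(uint64_t x) {
    return x + 0x123456789abcdef0;
}
```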
For comparison, here is how the same `*c = *a + *b` example looks like in AT&T asm: ```asm -loop: - addl (%rax), %edx - addq $4, %rax - cmpq %rcx, %rax - jne loop +movl (%rsi), %eax +addl (%rdi), %eax +movl %eax, (%rdx) ``` The key differences can be summarized as follows: 1. The *last* operand is used to specify the destination. -2. Register names and constants need to be prefixed by `%` and `$` respectively. +2. Registers and constants need to be prefixed by `%` and `$` respectively (e.g., `addl $1, %rdx` increments `rdx`). 3. Memory addressing looks like this: `displacement(%base, %index, scale)`. 4. Both `;` and `#` can be used for line comments, and also `/* */` can be used for block comments. diff --git a/content/english/hpc/architecture/functions.md b/content/english/hpc/architecture/functions.md index ec8631f0..3f98a381 100644 --- a/content/english/hpc/architecture/functions.md +++ b/content/english/hpc/architecture/functions.md @@ -1,6 +1,7 @@ --- title: Functions and Recursion weight: 3 +published: true --- To "call a function" in assembly, you need to [jump](../loops) to its beginning and then jump back. But then two important problems arise: @@ -15,9 +16,9 @@ Both of these concerns can be solved by having a dedicated location in memory wh The hardware stack works the same way software stacks do and is similarly implemented as just two pointers: - The *base pointer* marks the start of the stack and is conventionally stored in `rbp`. -- The *stack pointer* marks the last element on the stack and is conventionally stored in `rsp`. +- The *stack pointer* marks the last element of the stack and is conventionally stored in `rsp`. -When you need to call a function, you push all your local variables onto the stack (which you can also do in other circumstances, e. g. when you run out of registers), push the current instruction pointer, and then jump to the beginning of the function. When exiting from a function, you look at the pointer stored on top of the stack, jump there, and then carefully read all the variables stored on the stack back into their registers. +When you need to call a function, you push all your local variables onto the stack (which you can also do in other circumstances; e.g., when you run out of registers), push the current instruction pointer, and then jump to the beginning of the function. When exiting from a function, you look at the pointer stored on top of the stack, jump there, and then carefully read all the variables stored on the stack back into their registers. -By convention, a function should take its arguments in `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9` (and the rest in the stack if that wasn't enough), put the return value into `rax`, and then return. Thus, `square`, being a simple one-argument function, can be implemented like this: +By convention, a function should take its arguments in `rdi`, `rsi`, `rdx`, `rcx`, `r8`, `r9` (and the rest in the stack if those weren't enough), put the return value into `rax`, and then return. Thus, `square`, being a simple one-argument function, can be implemented like this: ```nasm square: ; x = edi, ret = eax @@ -189,7 +190,7 @@ distance: ret ``` -This is better, but we are still implicitly accessing stack memory: you need to push and pop the instruction pointer on each function call. In simple cases like this, we can *inline* function calls by stitching callee's code into the caller and resolving conflicts over registers. 
In our example: +This is better, but we are still implicitly accessing stack memory: you need to push and pop the instruction pointer on each function call. In simple cases like this, we can *inline* function calls by stitching the callee's code into the caller and resolving conflicts over registers. In our example: ```nasm distance: @@ -229,7 +230,7 @@ Equivalent assembly: ```nasm ; n = edi, ret = eax factorial: - test edi, edi ; test if a value if zero + test edi, edi ; test if a value is zero jne nonzero ; (the machine code of "cmp rax, 0" would be one byte longer) mov eax, 1 ; return 1 ret diff --git a/content/english/hpc/architecture/indirect.md b/content/english/hpc/architecture/indirect.md index ce6e86b8..1bd96c06 100644 --- a/content/english/hpc/architecture/indirect.md +++ b/content/english/hpc/architecture/indirect.md @@ -102,11 +102,11 @@ There are many ways to implement this behavior, but C++ does it using a *virtual For all concrete implementations of `Animal`, compiler pads all their methods (that is, their instruction sequences) so that they have the exact same length for all classes (by inserting some [filler instructions](../layout) after `ret`) and then just writes them sequentially somewhere in the instruction memory. Then it adds a *run-time type information* field to the structure (that is, to all its instances), which is essentially just the offset in the memory region that points to the right implementation of the virtual methods of the class. -During a virtual method call, that offset field is fetched from the instance of a structure, and a normal function call is made with it, using the fact that all methods and other fields of every derived class have exactly the same offsets. +With a virtual method call, that offset field is fetched from the instance of a structure and a normal function call is made with it, using the fact that all methods and other fields of every derived class have exactly the same offsets. Of course, this adds some overhead: -- You may need to spend another 15 cycles or so for the same pipeline flushing reasons as for [branch misprediction](../pipelining). +- You may need to spend another 15 cycles or so for the same pipeline flushing reasons as for [branch misprediction](/hpc/pipelining). - The compiler most likely won't be able to inline the function call itself. - Class size increases by a couple of bytes or so (this is implementation-specific). - The binary size itself increases a little bit. diff --git a/content/english/hpc/architecture/isa.md b/content/english/hpc/architecture/isa.md index d109b359..b902f69c 100644 --- a/content/english/hpc/architecture/isa.md +++ b/content/english/hpc/architecture/isa.md @@ -14,7 +14,7 @@ Abstractions help us in reducing all this complexity down to a single *interface Hardware engineers love abstractions too. An abstraction of a CPU is called an *instruction set architecture* (ISA), and it defines how a computer should work from a programmer's perspective. Similar to software interfaces, it gives computer engineers the ability to improve on existing CPU designs while also giving its users — us, programmers — the confidence that things that worked before won't break on newer chips. -An ISA essentially defines how the hardware should interpret the machine language. Apart from instructions and their binary encodings, ISA importantly defines counts, sizes, and purposes of registers, the memory model, and the input/output model. 
Similar to software interfaces, ISAs can be extended too: in fact, they are often updated, mostly in a backward-compatible way, to add new and more specialized instructions that can improve performance. +An ISA essentially defines how the hardware should interpret the machine language. Apart from instructions and their binary encodings, an ISA also defines the counts, sizes, and purposes of registers, the memory model, and the input/output model. Similar to software interfaces, ISAs can be extended too: in fact, they are often updated, mostly in a backward-compatible way, to add new and more specialized instructions that can improve performance. ### RISC vs CISC @@ -23,7 +23,7 @@ Historically, there have been many competing ISAs in use. But unlike [character - **Arm** chips, which are used in almost all mobile devices, as well as other computer-like devices such as TVs, smart fridges, microwaves, [car autopilots](https://en.wikipedia.org/wiki/Tesla_Autopilot), and so on. They are designed by a British company of the same name, as well as a number of electronics manufacturers including Apple and Samsung. - **x86**[^x86] chips, which are used in almost all servers and desktops, with a few notable exceptions such as Apple's M1 MacBooks, AWS's Graviton processors, and the current [world's fastest supercomputer](https://en.wikipedia.org/wiki/Fugaku_(supercomputer)), all of which use Arm-based CPUs. They are designed by a duopoly of Intel and AMD. -[^x86]: Modern 64-bit versions of x86 are known as "AMD64", "Intel 64", or by the more vendor-neutral names of "x86-64" or just "x64". A similar 64-bit extension of Arm is called "AArch64" or "ARM64". In this book, we will just use plain "x86" and "Arm" implying the 64-bit versions. +[^x86]: Modern 64-bit versions of x86 are known as "AMD64," "Intel 64," or by the more vendor-neutral names of "x86-64" or just "x64." A similar 64-bit extension of Arm is called "AArch64" or "ARM64." In this book, we will just use plain "x86" and "Arm" implying the 64-bit versions. The main difference between them is that of architectural complexity, which is more of a design philosophy rather than some strictly defined property: diff --git a/content/english/hpc/architecture/layout.md b/content/english/hpc/architecture/layout.md index 1ab39c82..df414512 100644 --- a/content/english/hpc/architecture/layout.md +++ b/content/english/hpc/architecture/layout.md @@ -1,6 +1,7 @@ --- title: Machine Code Layout weight: 10 +published: true --- Computer engineers like to mentally split the [pipeline of a CPU](/hpc/pipelining) into two parts: the *front-end*, where instructions are fetched from memory and decoded, and the *back-end*, where they are scheduled and finally executed. Typically, the performance is bottlenecked by the execution stage, and for this reason, most of our efforts in this book are going to be spent towards optimizing around the back-end. @@ -15,7 +16,7 @@ During the **fetch** stage, the CPU simply loads a fixed-size chunk of bytes fro -Next comes the **decode** stage: the CPU looks at this chunk of bytes, discards everything that comes before the instruction pointer, and splits the rest of them into instructions. Machine instructions are encoded using a variable amount of bytes: something simple and very common like `inc rax` takes one byte, while some obscure instruction with encoded constants and behavior-modifying prefixes may take up to 15. 
So, from a 32-byte block, a variable number of instructions may be decoded, but no more than a certain machine-dependant limit called the *decode width*. On my CPU (a [Zen 2](https://en.wikichip.org/wiki/amd/microarchitectures/zen_2)), the decode width is 4, which means that on each cycle, up to 4 instructions can be decoded and passed to the next stage. +Next comes the **decode** stage: the CPU looks at this chunk of bytes, discards everything that comes before the instruction pointer, and splits the rest of them into instructions. Machine instructions are encoded using a variable number of bytes: something simple and very common like `inc rax` takes one byte, while some obscure instruction with encoded constants and behavior-modifying prefixes may take up to 15. So, from a 32-byte block, a variable number of instructions may be decoded, but no more than a certain machine-dependent limit called the *decode width*. On my CPU (a [Zen 2](https://en.wikichip.org/wiki/amd/microarchitectures/zen_2)), the decode width is 4, which means that on each cycle, up to 4 instructions can be decoded and passed to the next stage. The stages work in a pipelined fashion: if the CPU can tell (or [predict](/hpc/pipelining/branching/)) which instruction block it needs next, then the fetch stage doesn't wait for the last instruction in the current block to be decoded and loads the next one right away. @@ -29,7 +30,7 @@ Loop Stream Detector (LSD) ### Code Alignment -Other things being equal, compilers typically prefer instructions with shorter machine code, because this way more instructions can fit in a single 32B fetch block, and also because it reduces the size of the binary. But sometimes the reverse advice applies, caused by the fact that the fetched instructions blocks have to be aligned. +Other things being equal, compilers typically prefer instructions with shorter machine code, because this way more instructions can fit in a single 32B fetch block, and also because it reduces the size of the binary. But sometimes the reverse is prefereable, due to the fact that the fetched instructions' blocks must be aligned. Imagine that you need to execute an instruction sequence that starts on the last byte of a 32B-aligned block. You may be able to execute the first instruction without additional delay, but for the subsequent ones, you have to wait for one additional cycle to do another instruction fetch. If the code block was aligned on a 32B boundary, then up to 4 instructions could be decoded and then executed concurrently (unless they are extra long or interdependent). @@ -45,15 +46,15 @@ In GCC, you can use `-falign-labels=n` flag to specify a particular alignment po The instructions are stored and fetched using largely the same [memory system](/hpc/cpu-cache) as for the data, except maybe the lower layers of cache are replaced with a separate *instruction cache* (because you wouldn't want a random data read to kick out the code that processes it). -The instruction cache is crucial in situations when you either +The instruction cache is crucial in situations when you either: - don't know what instructions you are going to execute next, and need to fetch the next block with [low latency](/hpc/cpu-cache/latency), -- or executing a long sequence of verbose-but-quick-to-process instructions, and need [high bandwidth](/hpc/cpu-cache/bandwidth). +- or are executing a long sequence of verbose-but-quick-to-process instructions, and need [high bandwidth](/hpc/cpu-cache/bandwidth). 
The memory system can therefore become the bottleneck for programs with large machine code. This consideration limits the applicability of the optimization techniques we've previously discussed: - [Inlining functions](../functions) is not always optimal, because it reduces code sharing and increases the binary size, requiring more instruction cache. -- [Unrolling loops](../loops) is only beneficial up to some extent, even if the number of loops is known during compile-time: at some point, the CPU would have to fetch both instructions and data from the main memory, in which case it will likely be bottlenecked by the memory bandwidth. +- [Unrolling loops](../loops) is only beneficial up to some extent, even if the number of iterations is known during compile time: at some point, the CPU would have to fetch both instructions and data from the main memory, in which case it will likely be bottlenecked by the memory bandwidth. - Huge [code alignments](#code-alignment) increase the binary size, again requiring more instruction cache. Spending one more cycle on fetch is a minor penalty compared to missing the cache and waiting for the instructions to be fetched from the main memory. Another aspect is that placing frequently used instruction sequences on the same [cache lines](/hpc/cpu-cache/cache-lines) and [memory pages](/hpc/cpu-cache/paging) improves [cache locality](/hpc/external-memory/locality). To improve instruction cache utilization, you should group hot code with hot code and cold code with cold code, and remove dead (unused) code if possible. If you want to explore this idea further, check out Facebook's [Binary Optimization and Layout Tool](https://engineering.fb.com/2018/06/19/data-infrastructure/accelerate-large-scale-applications-with-bolt/), which was recently [merged](https://github.com/llvm/llvm-project/commit/4c106cfdf7cf7eec861ad3983a3dd9a9e8f3a8ae) into LLVM. @@ -126,7 +127,7 @@ normal: ret swap: xchg edi, esi - jump normal + jmp normal ``` This technique is quite handy when handling exceptions cases in general, and in high-level code, you can give the compiler a [hint](/hpc/compilation/situational) that a certain branch is more likely than the other: @@ -152,7 +153,7 @@ length: ret ``` -This is a very important issue, and we will spend [much of the next chapter](/hpc/pipelining/branching) discussing it in more detail. +Eliminating branches is an important topic, and we will spend [much of the next chapter](/hpc/pipelining/branching) discussing it in more detail. @@ -47,7 +47,7 @@ Since running machine code in an interpreter doesn't make sense, this makes a to - Compiled languages with a runtime, such as Java, C#, or Erlang (and languages that work on their VMs, such as Scala, F#, or Elixir). - Compiled native languages, such as C, Go, or Rust. -There is no "right" way of executing computer programs: each approach has its own gains and drawbacks. Interpreters and virtual machines provide flexibility and enable some nice high-level programming features such as dynamic typing, run-time code alteration, and automatic memory management, but this comes with some unavoidable performance trade-offs, which we will now talk about. +There is no "right" way of executing computer programs: each approach has its own gains and drawbacks. 
Interpreters and virtual machines provide flexibility and enable some nice high-level programming features such as dynamic typing, run time code alteration, and automatic memory management, but these come with some unavoidable performance trade-offs, which we will now talk about. ### Interpreted languages @@ -94,7 +94,7 @@ This is not surprising if you consider the things that Python needs to do to fig - looks up its type, figures out that it's a `float`, and fetches the method implementing `*` operator; - does the same things for `b` and `c` and finally add-assigns the result to `c[i][j]`. -Granted, the interpreters of widely-used languages such as Python are well-optimized, and they can skip through some of these steps on repeated execution of the same code. But still, some quite significant overhead is unavoidable due to the language design. If we get rid of all this type checking and pointer chasing, perhaps we can get cycles per multiplication ratio closer to 1, or whatever the "cost" of native multiplication is? +Granted, the interpreters of widely used languages such as Python are well-optimized, and they can skip through some of these steps on repeated execution of the same code. But still, some quite significant overhead is unavoidable due to the language design. If we get rid of all this type checking and pointer chasing, perhaps we can get cycles per multiplication ratio closer to 1, or whatever the "cost" of native multiplication is? ### Managed Languages @@ -204,7 +204,7 @@ print(duration) Now it takes ~0.12 seconds: a ~5x speedup over the auto-vectorized C version and ~5250x speedup over our initial Python implementation! -You don't typically see such dramatic improvements. For now, we are not ready to tell you exactly how this is achieved. Implementations of dense matrix multiplication in OpenBLAS are typically [5000 lines of handwritten assembly](https://github.com/xianyi/OpenBLAS/blob/develop/kernel/x86_64/dgemm_kernel_16x2_haswell.S) tailored separately for *each* architecture. In later chapters, we will explain all the relevant techniques one by one, and then return to this example and develop our own BLAS-level implementation using just under 40 lines of C. +You don't typically see such dramatic improvements. For now, we are not ready to tell you exactly how this is achieved. Implementations of dense matrix multiplication in OpenBLAS are typically [5000 lines of handwritten assembly](https://github.com/xianyi/OpenBLAS/blob/develop/kernel/x86_64/dgemm_kernel_16x2_haswell.S) tailored separately for *each* architecture. In later chapters, we will explain all the relevant techniques one by one, and then [return](/hpc/algorithms/matmul) to this example and develop our own BLAS-level implementation using just under 40 lines of C. ### Takeaway diff --git a/content/english/hpc/complexity/levels.md b/content/english/hpc/complexity/levels.md index d0757754..9a792917 100644 --- a/content/english/hpc/complexity/levels.md +++ b/content/english/hpc/complexity/levels.md @@ -30,13 +30,55 @@ You get especially frustrated if you had a competitive programming experience. Y Programmers can be put in several "levels" in terms of their software optimization abilities: -0. "Newbie". Those who don't think about performance at all. They usually write in high-level languages, sometimes in declarative / functional languages. Most "programmers" stay there (and there is nothing wrong with it). -1. "Undergraduate student". 
Those who know about Big O notation and are familiar with basic data structures and approaches. LeetCode and CodeForces folks are there. This is also the requirement in getting into big companies — they have a lot of in-house software, large scale, and they are looking for people in the long term, so asking things like programming language. -2. "Graduate student". Those who know that not all operations are created equal; know other cost models such as external memory model (B-tree, external sorting), word model (bitset,) or parallel computing, but still in theory. -3. "Professional developer". Those who know actual timings of these operations. Aware that branch mispredictions are costly, memory is split into cache lines. Knows some basic SIMD techniques. -4. "Performance engineer". Know exactly what happens inside their hardware. Know the difference between latency and bandwidth, know about ports. Knows how to use SIMD and the rest of instruction set effectively. Can read assembly and use profilers. -5. "Intel employee". Knows microarchitecture-specific details. This is outside of the purview of normal engineers. +0. *Newbie*. Those who don't think about performance at all. They usually write in high-level languages, sometimes in declarative / functional languages. Most "programmers" stay there (and there is nothing wrong with it). +1. *Undergraduate student*. Those who know about Big O notation and are familiar with basic data structures and approaches. LeetCode and CodeForces folks are there. This is also roughly what is required to get into big tech companies: they have a lot of large-scale in-house software and hire for the long term, so they test for these fundamentals rather than for knowledge of any particular programming language. +2. *Graduate student*. Those who know that not all operations are created equal and are familiar with other cost models, such as the external memory model (B-trees, external sorting), the word-level model (bitsets), or parallel computing, but still only in theory. +3. *Professional developer*. Those who know the actual timings of these operations, are aware that branch mispredictions are costly and that memory is split into cache lines, and know some basic SIMD techniques. +4. *Performance engineer*. Those who know exactly what happens inside their hardware: the difference between latency and bandwidth, the execution ports, how to use SIMD and the rest of the instruction set effectively. They can read assembly and use profilers. +5. *Intel employee*. Knows microarchitecture-specific details. This is outside of the purview of normal engineers. In this book, we expect that the average reader is somewhere around stage 1, and hopefully by the end of it will get to 4. You should also go through these levels when designing algorithms. First get it working in the first place, then select a bunch of reasonably asymptotically optimal algorithm. Then think about how they are going to work in terms of their memory operations or ability to execute in parallel (even if you consider single-threaded programs, there is still going to be plenty of parallelism inside a core, so this model is extremely ), and then proceed toward actual implementation. Avoid premature optimization, as Knuth once said. + +--- + +For most web services, efficiency doesn't matter, but *latency* does. + +Increasing efficiency is usually not how these services are improved nowadays. + +A pageview usually generates somewhere on the order of 0.1 to 1 cent of revenue. This is a typical rate at which you monetize user attention.
Say, if I simply installed AdSense, I'd be getting something like that — depending on where most of my readers are from and how many of them are using an ad blocker. + +At the same time, a server with a dedicated core and 1GB of RAM (which is an absurdly large amount of resources for a simple web service) costs around a millionth of a dollar per second when amortized. You could fetch 100 photos with that. + +Amazon once ran an experiment where they A/B tested their service with artificial delays and found out that an extra 100ms of latency noticeably decreased revenue. The same holds for most other services: if, say, Twitter breaks your "flow," you are likely to start thinking about something else and leave. If the delay at Google is more than a few seconds, people will just think that Google isn't working and quit. + +Latency can usually be reduced by parallelizing and scaling the system out, which is why distributed systems focus so much on scalability. This part of the book is concerned with improving the *efficiency* of algorithms, which lowers latency as a by-product. + +However, there are still use cases where there is a direct trade-off between quality and the cost of servers: + +- Search is hierarchical. There are usually many layers of more accurate but slower models. The more documents you rank on each layer, the better the final quality. +- Games. They are more enjoyable at a larger scale, but the required computational power grows with it. This includes game AI. +- AI workloads — those that involve large quantities of data, such as language models. Heavier models require more compute, and the bottleneck is usually not the amount of data but the efficiency of the computation. + +There are also inherently sequential algorithms and cases where the resources are constrained: Ctrl+F'ing a large PDF is painful, and so is factoring a large number. + +## Estimating the Impact + +Sometimes the optimization needs to happen in the calling layer. + +simdjson speeds up JSON parsing, but it may be better to not use JSON in the first place. + +Protobuf or other flat binary formats are often a better fit in such cases. + +There is also a chicken-and-egg problem: people don't use an approach much because it is currently too slow to be feasible. + +Optimization also has its own costs: implementation time, bugs, and maintainability. It is perfectly fine that most software in the world is inefficient. + +What does it mean to be a better programmer? Faster programs? Faster development? Fewer bugs? It is a combination of all of those. + +Implementing compiler optimizations or databases is an example of a high-leverage activity because their performance acts as a tax on everything else — which is why you see more people writing books on these particular topics than on software optimization in general. + +--- + +Factorization is kind of useless by itself, but it helps with understanding how to optimize number-theoretic computations in general. The same goes for sorting and binary trees: the specific problems matter less than the general, transferable techniques they teach. diff --git a/content/english/hpc/cpu-cache/_index.md b/content/english/hpc/cpu-cache/_index.md index 484a39dc..ef1bbd6f 100644 --- a/content/english/hpc/cpu-cache/_index.md +++ b/content/english/hpc/cpu-cache/_index.md @@ -5,7 +5,7 @@ weight: 9 In the [previous chapter](../external-memory), we studied computer memory from a theoretical standpoint, using the [external memory model](../external-memory/model) to estimate the performance of memory-bound algorithms.
-While it is more or less accurate for computations involving HDDs and network storage, where in-memory arithmetic is negligibly fast compared to the external I/O operations, it is too imprecise for lower levels in the cache hierarchy, where the costs of these operations become comparable. +While the external memory model is more or less accurate for computations involving HDDs and network storage, where cost of arithmetic operations on in-memory values is negligible compared to external I/O operations, it is too imprecise for lower levels in the cache hierarchy, where the costs of these operations become comparable. To perform more fine-grained optimization of in-memory algorithms, we have to start taking into account the many specific details of the CPU cache system. And instead of studying loads of boring Intel documents with dry specs and theoretically achievable limits, we will estimate these parameters experimentally by running numerous small benchmark programs with access patterns that resemble the ones that often occur in practical code. @@ -34,7 +34,7 @@ Although the CPU can be clocked at 4.1GHz in boost mode, we will perform most ex --> -Due to difficulties in [refraining the compiler from cheating](/hpc/profiling/noise/), the code snippets in this article are slightly simplified for exposition purposes. Check the [code repository](https://github.com/sslotin/amh-code/tree/main/cpu-cache) if you want to reproduce them yourself. +Due to difficulties in [preventing the compiler from optimizing away unused values](/hpc/profiling/noise/), the code snippets in this article are slightly simplified for exposition purposes. Check the [code repository](https://github.com/sslotin/amh-code/tree/main/cpu-cache) if you want to reproduce them yourself. ### Acknowledgements diff --git a/content/english/hpc/cpu-cache/alignment.md b/content/english/hpc/cpu-cache/alignment.md index 32c54b6d..e9c5f4d3 100644 --- a/content/english/hpc/cpu-cache/alignment.md +++ b/content/english/hpc/cpu-cache/alignment.md @@ -33,7 +33,7 @@ struct alignas(64) Data { }; ``` -Whenever an instance of `Data` is allocated, it will be at the beginning of a cache line. The downside is that the effective size of the structure will be rounded up to the nearest multiple of 64 bytes. This has to be done so that, e. g. when allocating an array of `Data`, not just the first element is properly aligned. +Whenever an instance of `Data` is allocated, it will be at the beginning of a cache line. The downside is that the effective size of the structure will be rounded up to the nearest multiple of 64 bytes. This has to be done so that, e.g., when allocating an array of `Data`, not just the first element is properly aligned. ### Structure Alignment @@ -77,7 +77,7 @@ This potentially wastes space but saves a lot of CPU cycles. This trade-off is m ### Optimizing Member Order -Padding is only inserted before a not-yet-aligned member or at the end of the structure. By changing the ordering of members in a structure, it is possible to change the required amount of padding bytes and the total size of the structure. +Padding is only inserted before a not-yet-aligned member or at the end of the structure. By changing the ordering of members in a structure, it is possible to change the required number of padding bytes and the total size of the structure. 
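To make the padding effect concrete, here is a small self-contained sketch (the `Bad` and `Good` structs are hypothetical, not the article's running example) showing how member order alone changes the size reported by `sizeof` on a typical 64-bit platform:

```c++
#include <cstdint>
#include <cstdio>

struct Bad {       // 1 + 7 (padding) + 8 + 4 + 4 (tail padding) = 24 bytes
    uint8_t a;
    uint64_t b;
    uint32_t c;
};

struct Good {      // 8 + 4 + 1 + 3 (tail padding) = 16 bytes
    uint64_t b;
    uint32_t c;
    uint8_t a;
};

int main() {
    printf("%zu %zu\n", sizeof(Bad), sizeof(Good)); // typically prints "24 16"
    return 0;
}
```

The exact numbers depend on the ABI, but the greedy largest-to-smallest ordering described next removes all internal padding here.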
In the previous example, we could reorder the structure members like this: @@ -94,7 +94,7 @@ Now, each of them is aligned without any padding, and the size of the structure As a rule of thumb, place your type definitions from largest data types to smallest — this greedy algorithm is guaranteed to work unless you have some weird non-power-of-two type sizes such as the [10-byte](/hpc/arithmetic/ieee-754#float-formats) `long double`[^extended]. -[^extended]: The 80-bit `long double` takes *at least* 10 bytes, but the exact format is up to the compiler — e. g. it may pad it to 12 or 16 bytes to minimize alignment issues (64-bit GCC and Clang use 16 bytes by default; you can override this by specifying one of `-mlong-double-64/80/128` or `-m96/128bit-long-double` [options](https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html)). +[^extended]: The 80-bit `long double` takes *at least* 10 bytes, but the exact format is up to the compiler — for example, it may pad it to 12 or 16 bytes to minimize alignment issues (64-bit GCC and Clang use 16 bytes by default; you can override this by specifying one of `-mlong-double-64/80/128` or `-m96/128bit-long-double` [options](https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html)). + +## B− Tree + +Instead of making small incremental improvements like we usually do in other case studies, in this article, we will implement just one data structure that we name *B− tree*, which is based on the [B+ tree](../s-tree/#b-tree-layout-1), with a few minor differences: + +- Nodes in the B− tree do not store pointers or any metadata except for the pointers to internal node children (while the B+ tree leaf nodes store a pointer to the next leaf node). This lets us perfectly place the keys in the leaf nodes on cache lines. +- We define key $i$ to be the *maximum* key in the subtree of the child $i$ instead of the *minimum* key in the subtree of the child $(i + 1)$. This lets us not fetch any other nodes after we reach a leaf (in the B+ tree, all keys in the leaf node may be less than the search key, so we need to go to the next leaf node to fetch its first element). + +We also use a node size of $B=32$, which is smaller than typical. The reason why it is not $16$, which was [optimal for the S+ tree](../s-tree/#modifications-and-further-optimizations), is because we have the additional overhead associated with fetching the pointer, and the benefit of reducing the tree height by ~20% outweighs the cost of processing twice the elements per node, and also because it improves the running time of the `insert` query that needs to perform a costly node split every $\frac{B}{2}$ insertions on average. + + + +### Memory Layout + +Although this is probably not the best approach in terms of software engineering, we will simply store the entire tree in a large pre-allocated array, without discriminating between leaves and internal nodes: + +```c++ +const int R = 1e8; +alignas(64) int tree[R]; +``` + +We also pre-fill this array with infinities to simplify the implementation: + +```c++ +for (int i = 0; i < R; i++) + tree[i] = INT_MAX; +``` + +(In general, it is technically cheating to compare against `std::set` or other structures that use `new` under the hood, but memory allocation and initialization are not the bottlenecks here, so this does not significantly affect the evaluation.) 
+ +Both nodes types store their keys sequentially in sorted order and are identified by the index of its first key in the array: + +- A leaf node has up to $(B - 1)$ keys but is padded to $B$ elements with infinities. +- An internal node has up to $(B - 2)$ keys padded to $B$ elements and up to $(B - 1)$ indices of its child nodes, also padded to $B$ elements. + +These design decisions are not arbitrary: + +- The padding ensures that leaf nodes occupy exactly 2 cache lines and internal nodes occupy exactly 4 cache lines. +- We specifically use [indices instead of pointers](/hpc/cpu-cache/pointers/) to save cache space and make moving them with SIMD faster. + (We will use "pointer" and "index" interchangeably from now on.) +- We store indices right after the keys even though they are stored in separate cache lines because [we have reasons](/hpc/cpu-cache/aos-soa/). +- We intentionally "waste" one array cell in leaf nodes and $2+1=3$ cells in internal nodes because we need it to store temporary results during a node split. + +Initially, we only have one empty leaf node as the root: + +```c++ +const int B = 32; + +int root = 0; // where the keys of the root start +int n_tree = B; // number of allocated array cells +int H = 1; // current tree height +``` + +To "allocate" a new node, we simply increase `n_tree` by $B$ if it is a leaf node or by $2 B$ if it is an internal node. + +Since new nodes can only be created by splitting a full node, each node except for the root will be at least half full. This implies that we need between 4 and 8 bytes per integer element (the internal nodes will contribute $\frac{1}{16}$-th or so to that number), the former being the case when the inserts are sequential, and the latter being the case when the input is adversarial. When the queries are uniformly distributed, the nodes are ~75% full on average, projecting to ~5.2 bytes per element. + +B-trees are very memory-efficient compared to the pointer-based binary trees. For example, `std::set` needs at least three pointers (the left child, the right child, and the parent), alone costing $3 \times 8 = 24$ bytes, plus at least another $8$ bytes to store the key and the meta-information due to [structure padding](/hpc/cpu-cache/alignment/). + +### Searching + +It is a very common scenario when >90% of operations are lookups, and even if this is not the case, every other tree operation typically begins with locating a key anyway, so we will start with implementing and optimizing the searches. + +When we implemented [S-trees](../s-tree/#optimization), we ended up storing the keys in permuted order due to the intricacies of how the blending/packs instructions work. For the *dynamic tree* problem, storing the keys in permuted order would make inserts much harder to implement, so we will change the approach instead. + +An alternative way to think about finding the would-be position of the element `x` in a sorted array is not "the index of the first element that is not less than `x`" but "the number of elements that are less than `x`." This observation generates the following idea: compare the keys against `x`, aggregate the vector masks into a 32-bit mask (where each bit can correspond to any element as long as the mapping is bijective), and then call `popcnt` on it, returning the number of elements less than `x`. 
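Before vectorizing it, here is a scalar sketch of the same counting idea, assuming the $B = 32$ node layout described above; it is only a reference, not the code we will actually use:

```c++
// node points to the 32 keys of one node (sorted and padded with INT_MAX)
int rank_scalar(int x, const int *node) {
    unsigned mask = 0;
    for (int i = 0; i < 32; i++)
        mask |= unsigned(node[i] < x) << i; // bit i is set iff key i is less than x
    return __builtin_popcount(mask);        // the number of keys less than x
}
```

The vectorized version below does exactly this, except that the comparisons and the mask aggregation are done 8 elements at a time.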
+ +This trick lets us perform the local search efficiently and without requiring any shuffling: + +```c++ +typedef __m256i reg; + +reg cmp(reg x, int *node) { + reg y = _mm256_load_si256((reg*) node); + return _mm256_cmpgt_epi32(x, y); +} + +// returns how many keys are less than x +unsigned rank32(reg x, int *node) { + reg m1 = cmp(x, node); + reg m2 = cmp(x, node + 8); + reg m3 = cmp(x, node + 16); + reg m4 = cmp(x, node + 24); + + // take lower 16 bits from m1/m3 and higher 16 bits from m2/m4 + m1 = _mm256_blend_epi16(m1, m2, 0b01010101); + m3 = _mm256_blend_epi16(m3, m4, 0b01010101); + m1 = _mm256_packs_epi16(m1, m3); // can also use blendv here, but packs is simpler + + unsigned mask = _mm256_movemask_epi8(m1); + return __builtin_popcount(mask); +} +``` + +Note that, because of this procedure, we have to pad the "key area" with infinities, which prevents us from storing metadata in the vacated cells (unless we are also willing to spend a few cycles to mask it out when loading a SIMD lane). + +Now, to implement `lower_bound`, we can descend the tree just like we did in the S+ tree, but fetching the pointer after we compute the child number: + +```c++ +int lower_bound(int _x) { + unsigned k = root; + reg x = _mm256_set1_epi32(_x); + + for (int h = 0; h < H - 1; h++) { + unsigned i = rank32(x, &tree[k]); + k = tree[k + B + i]; + } + + unsigned i = rank32(x, &tree[k]); + + return tree[k + i]; +} +``` + +Implementing search is easy, and it doesn't introduce much overhead. The hard part is implementing insertion. + +### Insertion + +On the one side, correctly implementing insertion takes a lot of code, but on the other, most of that code is executed very infrequently, so we don't have to care about its performance that much. Most often, all we need to do is to reach the leaf node (which we've already figured out how to do) and then insert a new key into it, moving some suffix of the keys one position to the right. Occasionally, we also need to split the node and/or update some ancestors, but this is relatively rare, so let's focus on the most common execution path first. + +To insert a key into an array of $(B - 1)$ sorted elements, we can load them in vector registers and then [mask-store](/hpc/simd/masking) them one position to the right using a [precomputed](/hpc/compilation/precalc/) mask that tells which elements need to be written for a given `i`: + +```c++ +struct Precalc { + alignas(64) int mask[B][B]; + + constexpr Precalc() : mask{} { + for (int i = 0; i < B; i++) + for (int j = i; j < B - 1; j++) + // everything from i to B - 2 inclusive needs to be moved + mask[i][j] = -1; + } +}; + +constexpr Precalc P; + +void insert(int *node, int i, int x) { + // need to iterate right-to-left to not overwrite the first element of the next lane + for (int j = B - 8; j >= 0; j -= 8) { + // load the keys + reg t = _mm256_load_si256((reg*) &node[j]); + // load the corresponding mask + reg mask = _mm256_load_si256((reg*) &P.mask[i][j]); + // mask-write them one position to the right + _mm256_maskstore_epi32(&node[j + 1], mask, t); + } + node[i] = x; // finally, write the element itself +} +``` + +This [constexpr magic](/hpc/compilation/precalc/) is the only C++ feature we use. + +There are other ways to do it, some possibly more efficient, but we are going to stop there for now. 
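For reference, one of those other ways is a plain scalar shift with `memmove`; this is a sketch assuming the same `B = 32` node layout, simpler but without SIMD:

```c++
#include <cstring>

constexpr int B = 32; // node size, as defined earlier

void insert_scalar(int *node, int i, int x) {
    // shift keys i..(B - 2) one position to the right (the last cell is padding)
    std::memmove(&node[i + 1], &node[i], (B - 1 - i) * sizeof(int));
    node[i] = x;
}
```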
+ +When we split a node, we need to move half of the keys to another node, so let's write another primitive that does it: + +```c++ +// move the second half of a node and fill it with infinities +void move(int *from, int *to) { + const reg infs = _mm256_set1_epi32(INT_MAX); + for (int i = 0; i < B / 2; i += 8) { + reg t = _mm256_load_si256((reg*) &from[B / 2 + i]); + _mm256_store_si256((reg*) &to[i], t); + _mm256_store_si256((reg*) &from[B / 2 + i], infs); + } +} +``` + +With these two vector functions implemented, we can now very carefully implement insertion: + +```c++ +void insert(int _x) { + // the beginning of the procedure is the same as in lower_bound, + // except that we save the path in case we need to update some of our ancestors + unsigned sk[10], si[10]; // k and i on each iteration + // ^------^ We assume that the tree height does not exceed 10 + // (which would require at least 16^10 elements) + + unsigned k = root; + reg x = _mm256_set1_epi32(_x); + + for (int h = 0; h < H - 1; h++) { + unsigned i = rank32(x, &tree[k]); + + // optionally update the key i right away + tree[k + i] = (_x > tree[k + i] ? _x : tree[k + i]); + sk[h] = k, si[h] = i; // and save the path + + k = tree[k + B + i]; + } + + unsigned i = rank32(x, &tree[k]); + + // we can start computing the is-full check before insertion completes + bool filled = (tree[k + B - 2] != INT_MAX); + + insert(tree + k, i, _x); + + if (filled) { + // the node needs to be split, so we create a new leaf node + move(tree + k, tree + n_tree); + + int v = tree[k + B / 2 - 1]; // new key to be inserted + int p = n_tree; // pointer to the newly created node + + n_tree += B; + + for (int h = H - 2; h >= 0; h--) { + // ascend and repeat until we reach the root or find a the node is not split + k = sk[h], i = si[h]; + + filled = (tree[k + B - 3] != INT_MAX); + + // the node already has a correct key (the right one) + // and a correct pointer (the left one) + insert(tree + k, i, v); + insert(tree + k + B, i + 1, p); + + if (!filled) + return; // we're done + + // create a new internal node + move(tree + k, tree + n_tree); // move keys + move(tree + k + B, tree + n_tree + B); // move pointers + + v = tree[k + B / 2 - 1]; + tree[k + B / 2 - 1] = INT_MAX; + + p = n_tree; + n_tree += 2 * B; + } + + // if reach here, this means we've reached the root, + // and it was split into two, so we need a new root + tree[n_tree] = v; + + tree[n_tree + B] = root; + tree[n_tree + B + 1] = p; + + root = n_tree; + n_tree += 2 * B; + H++; + } +} +``` + +There are many inefficiencies, but, luckily, the body of `if (filled)` is executed very infrequently — approximately every $\frac{B}{2}$ insertions — and the insertion performance is not really our top priority, so we will just leave it there. + +## Evaluation + +We have only implemented `insert` and `lower_bound`, so this is what we will measure. + +We want the evaluation to take a reasonable time, so our benchmark is a loop that alternates between two steps: + +- Increase the structure size from $1.17^k$ to $1.17^{k+1}$ using individual `insert`s and measure the time it took. +- Perform $10^6$ random `lower_bound` queries and measure the time it took. + +We start at the size $10^4$ and end at $10^7$, for around $50$ data points in total. We generate the data for both query types uniformly in the $[0, 2^{30})$ range and independently between the stages. 
Since the data generation process allows for repeated keys, we compared against `std::multiset` and `absl::btree_multiset`[^absl], although we still refer to them as `std::set` and `absl::btree` for brevity. We also enable [hugepages](/hpc/cpu-cache/paging) on the system level for all three runs. + +[^absl]: If you also think that only comparing with Abseil's B-tree is not convincing enough, [feel free](https://github.com/sslotin/amh-code/tree/main/b-tree) to add your favorite search tree to the benchmark. + + + +The performance of the B− tree matches what we originally predicted — at least for the lookups: + +![](../img/btree-absolute.svg) + +The relative speedup varies with the structure size — 7-18x/3-8x over STL and 3-7x/1.5-2x over Abseil: + +![](../img/btree-relative.svg) + +Insertions are only 1.5-2 faster than for `absl::btree`, which uses scalar code to do everything. My best guess why insertions are *that* slow is due to data dependency: since the tree nodes may change, the CPU can't start processing the next query before the previous one finishes (the [true latency](../s-tree/#comparison-with-stdlower_bound) of both queries is roughly equal and ~3x of the reciprocal throughput of `lower_bound`). + +![](../img/btree-absl.svg) + +When the structure size is small, the [reciprocal throughput](../s-tree/#comparison-with-stdlower_bound) of `lower_bound` increases in discrete steps: it starts with 3.5ns when there is only the root to visit, then grows to 6.5ns (two nodes), and then to 12ns (three nodes), and then hits the L2 cache (not shown on the graphs) and starts increasing more smoothly, but still with noticeable spikes when the tree height increases. + +Interestingly, B− tree outperforms `absl::btree` even when it only stores a single key: it takes around 5ns stalling on [branch misprediction](/hpc/pipelining/branching/), while (the search in) the B− tree is entirely branchless. + +### Possible Optimizations + +In our previous endeavors in data structure optimization, it helped a lot to make as many variables as possible compile-time constants: the compiler can hardcode these constants into the machine code, simplify the arithmetic, unroll all the loops, and do many other nice things for us. + +This would not be a problem at all if our tree were of constant height, but it is not. It is *largely* constant, though: the height rarely changes, and in fact, under the constraints of the benchmark, the maximum height was only 6. + +What we can do is pre-compile the `insert` and `lower_bound` functions for several different compile-time constant heights and switch between them as the tree grows. The idiomatic C++ way is to use virtual functions, but I prefer to be explicit and use raw function pointers like this: + +```c++ +void (*insert_ptr)(int); +int (*lower_bound_ptr)(int); + +void insert(int x) { + insert_ptr(x); +} + +int lower_bound(int x) { + return lower_bound_ptr(x); +} +``` + +We now define template functions that have the tree height as a parameter, and in the grow-tree block inside the `insert` function, we change the pointers as the tree grows: + +```c++ +template +void insert_impl(int _x) { + // ... +} + +template +void insert_impl(int _x) { + // ... + if (/* tree grows */) { + // ... 
+ insert_ptr = &insert_impl; + lower_bound_ptr = &lower_bound_impl; + } +} + +template <> +void insert_impl<10>(int x) { + std::cerr << "This depth was not supposed to be reached" << std::endl; + exit(1); +} +``` + + + +I tried but could not get any performance improvement with this, but I still have high hope for this approach because the compiler can (theoretically) remove `sk` and `si`, completely removing any temporary storage and only reading and computing everything once, greatly optimizing the `insert` procedure. + +Insertion can also probably be optimized by using a larger block size as node splits would become rare, but this comes at the cost of slower lookups. We could also try different node sizes for different layers: leaves should probably be larger than the internal nodes. + +**Another idea** is to move extra keys on insert to a sibling node, delaying the node split as long as possible. + +One such particular modification is known as the B* tree. It moves the last key to the next node if the current one is full, and when both nodes become full, it jointly splits both of them, producing three nodes that are ⅔ full. This reduces the memory overhead (the nodes will be ⅚ full on average) and increases the fanout factor, reducing the height, which helps all operations. + +This technique can even be extended to, say, three-to-four splits, although further generalization would come at the cost of a slower `insert`. + +**And yet another idea** is to get rid of (some) pointers. For example, for large trees, we can probably afford a small [S+ tree](../s-tree) for $16 \cdot 17$ or so elements as the root, which we rebuild from scratch on each infrequent occasion when it changes. You can't extend it to the whole tree, unfortunately: I believe there is a paper somewhere saying that we can't turn a dynamic structure fully implicit without also having to do $\Omega(\sqrt n)$ operations per query. + +We could also try some non-tree data structures, such as the [skip list](https://en.wikipedia.org/wiki/Skip_list). There has even been a [successful attempt to vectorize it](https://doublequan.github.io/) — although the speedup was not that impressive. I have low hope that skip-list, in particular, can be improved, although it may achieve a higher total throughput in the concurrent setting. + +### Other Operations + +To *delete* a key, we can similarly locate and remove it from a node with the same mask-store trick. After that, if the node is at least half-full, we're done. Otherwise, we try to borrow a key from the next sibling. If the sibling has more than $\frac{B}{2}$ keys, we append its first key and shift its keys one to the left. Otherwise, both the current node and the next node have less than $\frac{B}{2}$ keys, so we can merge them, after which we go to the parent and iteratively delete a key there. + +Another thing we may want to implement is *iteration*. Bulk-loading each key from `l` to `r` is a very common pattern — for example, in `SELECT abc ORDER BY xyz` type of queries in databases — and B+ trees usually store pointers to the next node in the data layer to allow for this type of rapid iteration. In B− trees, as we're using a much smaller node size, we can experience [pointer chasing](/hpc/cpu-cache/latency/) problems if we do this. Going to the parent and reading all its $B$ pointers is probably faster as it negates this problem. 
Therefore, a stack of ancestors (the `sk` and `si` arrays we used in `insert`) can serve as an iterator and may even be better than separately storing pointers in nodes. + +We can easily implement almost everything that `std::set` does, but the B− tree, like any other B-tree, is very unlikely to become a drop-in replacement to `std::set` due to the requirement of pointer stability: a pointer to an element should remain valid unless the element is deleted, which is hard to achieve when we split and merge nodes all the time. This is a major problem not only for search trees but most data structures in general: having both pointer stability and high performance at the same time is next to impossible. + + + +## Acknowledgements + +Thanks to [Danila Kutenin](https://danlark.org/) from Google for meaningful discussions of applicability and the usage of B-trees in Abseil. + + diff --git a/content/english/hpc/data-structures/binary-search.md b/content/english/hpc/data-structures/binary-search.md index 61aec502..6426ddde 100644 --- a/content/english/hpc/data-structures/binary-search.md +++ b/content/english/hpc/data-structures/binary-search.md @@ -1,15 +1,18 @@ --- title: Binary Search weight: 1 +published: true --- + + While improving the speed of user-facing applications is the end goal of performance engineering, people don't really get excited over 5-10% improvements in some databases. Yes, this is what software engineers are paid for, but these types of optimizations tend to be too intricate and system-specific to be readily generalized to other software. Instead, the most fascinating showcases of performance engineering are multifold optimizations of textbook algorithms: the kinds that everybody knows and deemed so simple that it would never even occur to try to optimize them in the first place. These optimizations are simple and instructive and can very much be adopted elsewhere. And they are surprisingly not as rare as you'd think. -In this article, we focus on such fundamental algorithm — *binary search* — and implement two of its variants that are, depending on the problem size, up to 4x faster than `std::lower_bound`, while being under just 15 lines of code. +In this section, we focus on one such fundamental algorithm — *binary search* — and implement two of its variants that are, depending on the problem size, up to 4x faster than `std::lower_bound`, while being under just 15 lines of code. The first algorithm achieves that by removing [branches](/hpc/pipelining/branching), and the second also optimizes the memory layout to achieve better [cache system](/hpc/cpu-cache) performance. This technically disqualifies it from being a drop-in replacement for `std::lower_bound` as it needs to permute the elements of the array before it can start answering queries — but I can't recall a lot of scenarios where you obtain a sorted array but can't afford to spend linear time on preprocessing. @@ -20,7 +23,7 @@ The first algorithm achieves that by removing [branches](/hpc/pipelining/branchi --> -The usual disclaimer: the CPU is a [Zen 2](https://www.7-cpu.com/cpu/Zen2.html), the RAM is a [DDR4-2666](http://localhost:1313/hpc/cpu-cache/), and the compiler we will be using by default is Clang 10. The performance on your machine may be different, so I highly encourage to [go and test it](https://godbolt.org/z/14rd5Pnve) for yourself. 
+The usual disclaimer: the CPU is a [Zen 2](https://www.7-cpu.com/cpu/Zen2.html), the RAM is a [DDR4-2666](/hpc/cpu-cache/), and the compiler we will be using by default is Clang 10. The performance on your machine may be different, so I highly encourage to [go and test it](https://godbolt.org/z/14rd5Pnve) for yourself. + +Note that this loop is not always equivalent to the standard binary search. Since it always rounds *up* the size of the search interval, it accesses slightly different elements and may perform one comparison more than needed. Apart from simplifying computations on each iteration, it also makes the number of iterations constant if the array size is constant, removing branch mispredictions completely. + +As typical for predication, this trick is very fragile to compiler optimizations — depending on the compiler and how the function is invoked, it may still leave a branch or generate suboptimal code. It works fine on Clang 10, yielding a 2.5-3x improvement on small arrays: -As typical for predication, this trick is very fragile to compiler optimizations. It doesn't make a difference on Clang — for some reason, it replaces the ternary operator with a branch anyway — but it works fine on GCC (9.3), yielding a 2.5-3x improvement on small arrays: + ![](../img/search-branchless.svg) @@ -162,20 +202,22 @@ int lower_bound(int x) { int *base = t, len = n; while (len > 1) { int half = len / 2; - __builtin_prefetch(&base[(len - half) / 2]); - __builtin_prefetch(&base[half + (len - half) / 2]); - base = (base[half] < x ? &base[half] : base); len -= half; + __builtin_prefetch(&base[len / 2 - 1]); + __builtin_prefetch(&base[half + len / 2 - 1]); + base += (base[half - 1] < x) * half; } - return *(base + (*base < x)); + return *base; } ``` + + With prefetching, the performance on large arrays becomes roughly the same: ![](../img/search-branchless-prefetch.svg) -The graph still grows faster as the branchy version also prefetches "grandchildren", "grand-grandchildren", and so on — although the usefulness of each new speculative read diminishes exponentially as the prediction is less and less likely to be correct. +The graph still grows faster as the branchy version also prefetches "grandchildren," "great-grandchildren," and so on — although the usefulness of each new speculative read diminishes exponentially as the prediction is less and less likely to be correct. In the branchless version, we could also fetch ahead by more than one layer, but the number of fetches we'd need also grows exponentially. Instead, we will try a different approach to optimize memory operations. @@ -248,7 +290,7 @@ Apart from being compact, it has some nice properties, like that all even-number Here is how this layout looks when applied to binary search: -![](../img/eytzinger.png) +![Note that the tree is slightly imbalanced (because of the last layer is continuous)](../img/eytzinger.png) When searching in this layout, we just need to start from the first element of the array, and then on each iteration jump to either $2 k$ or $(2k + 1)$, depending on how the comparison went: @@ -278,15 +320,17 @@ void eytzinger(int k = 1) { } ``` -This function takes the current node number `k`, recursively writes out all elements to the left of the middle of the search interval, writes out the current element we'd compare against, and then recursively writes out all the elements on the right. 
It seems a bit complicated, but to convince ourselves that it works, we only need three observations: +This function takes the current node number `k`, recursively writes out all elements to the left of the middle of the search interval, writes out the current element we'd compare against, and then recursively writes out all the elements on the right. It seems a bit complicated, but to convince yourself that it works, you only need three observations: - It writes exactly `n` elements as we enter the body of `if` for each `k` from `1` to `n` just once. - It writes out sequential elements from the original array as it increments the `i` pointer each time. -- By the time we write the element at node `k`, we have already written all the elements to its left (exactly `i`). +- By the time we write the element at node `k`, we will have already written all the elements to its left (exactly `i`). + +Despite being recursive, it is actually quite fast as all the memory reads are sequential, and the memory writes are only in $O(\log n)$ different memory blocks at a time. Maintaining the permutation is both logically and computationally harder to maintain though: adding an element to a sorted array only requires shifting a suffix of its elements one position to the right, while Eytzinger array practically needs to be rebuilt from scratch. -Despite being recursive, it is actually quite fast as all the memory reads are sequential, and the memory writes are only in $O(\log n)$ different memory blocks at a time. +Note that this traversal and the resulting permutation are not exactly equivalent to the "tree" of vanilla binary search: for example, the left child subtree may be larger than the right child subtree — up to twice as large — but it doesn't matter much since both approaches result in the same $\lceil \log_2 n \rceil$ tree depth. -Note that the Eytzinger array is one-indexed — this will be important for performance later. You can put in the zeroth element the value that you want to be returned in the case when the lower bound doesn't exist (similar to `a.end()` for `std::lower_bound`). +Also note that the Eytzinger array is one-indexed — this will be important for performance later. You can put in the zeroth element the value that you want to be returned in the case when the lower bound doesn't exist (similar to `a.end()` for `std::lower_bound`). ### Search Implementation @@ -298,22 +342,35 @@ while (k <= n) k = 2 * k + (t[k] < x); ``` -The only problem arises when we need to restore the index of the resulting element, as $k$ may end up not pointing to a leaf node. Here is an example of how that can happen: +The only problem arises when we need to restore the index of the resulting element, as $k$ does not directly point to it. Consider this example (its corresponding tree is listed above): -``` - array: 1 2 3 4 5 6 7 8 -eytzinger: 4 2 5 1 6 3 7 8 -1st range: --------------- k := 1 -2nd range: ------- k := 2*k (=2) -3rd range: --- k := 2*k + 1 (=5) -4th range: - k := 2*k + 1 (=11) -``` + + +
+    array:  0 1 2 3 4 5 6 7 8 9
+eytzinger:  6 3 7 1 5 8 9 0 2 4
+1st range:  ------------?------  k := 2*k     = 2   (6 ≥ 3)
+2nd range:  ------?------        k := 2*k     = 4   (3 ≥ 3)
+3rd range:  --?----              k := 2*k + 1 = 9   (1 < 3)
+4th range:      ?--              k := 2*k + 1 = 19  (2 < 3)
+5th range:        !
+
+ + -Here we query the array of $[1, …, 8]$ for the lower bound of $x=4$. We compare it against $4$, $2$, and $5$, go left-right-right, and end up with $k = 11$, which isn't even a valid array index. +Here we query the array of $[0, …, 9]$ for the lower bound of $x=3$. We compare it against $6$, $3$, $1$, and $2$, go left-left-right-right, and end up with $k = 19$, which isn't even a valid array index. -The trick is to notice that, unless the answer is the last element of the array, we compare $x$ against it at some point, and after we've learned that it is not less than $x$, we start comparing $x$ against elements to the left, and all these comparisons evaluate true (i. e. leading to the right). Therefore, to restore the answer, we just need to "cancel" some number of right turns. +The trick is to notice that, unless the answer is the last element of the array, we compare $x$ against it at some point, and after we've learned that it is not less than $x$, we go left exactly once and then keep going right until we reach a leaf (because we will only be comparing $x$ against lesser elements). Therefore, to restore the answer, we just need to "cancel" some number of right turns and then one more. -This can be done in an elegant way by observing that the right turns are recorded in the binary representation of $k$ as 1-bits, and so we just need to find the number of trailing ones in the binary representation and right-shift $k$ by exactly that amount. To do this, we can invert the number (`~k`) and call the "find first set" instruction: +This can be done in an elegant way by observing that the right turns are recorded in the binary representation of $k$ as 1-bits, and so we just need to find the number of trailing 1s in the binary representation and right-shift $k$ by exactly that number of bits plus one. To do this, we can invert the number (`~k`) and call the "find first set" instruction: ```c++ int lower_bound(int x) { @@ -359,9 +416,9 @@ This observation extends to the grand-children of node $k$ — they are also sto \end{aligned} --> -Their cache line can also be fetched with one instruction. Interesting… what if we continue this, and instead of fetching direct children, we fetch ahead as many descendants as we can cramp into one cache line? That would be $\frac{64}{4} = 16$ elements, our grand-grand-grandchildren with indices from $16k$ to $(16k + 15)$. +Their cache line can also be fetched with one instruction. Interesting… what if we continue this, and instead of fetching direct children, we fetch ahead as many descendants as we can cramp into one cache line? That would be $\frac{64}{4} = 16$ elements, our great-great-grandchildren with indices from $16k$ to $(16k + 15)$. -Now, if we prefetch just one of these 16 elements, we will probably only get some but not all of them, as they may cross a cache line boundary. We can prefetch the first *and* the last element, but to get away with just one memory request, we need to notice that the index of the first element, $16k$, is divisible by $16$, so its memory address will be the base address of the array plus something divisible by $16 \cdot 4 = 64$, the cache line size. If the array were to begin on a cache line, then these $16$ grand-gran-grandchildren elements will be guaranteed to be on a single cache line, which is just what we needed. +Now, if we prefetch just one of these 16 elements, we will probably only get some but not all of them, as they may cross a cache line boundary. 
We can prefetch the first *and* the last element, but to get away with just one memory request, we need to notice that the index of the first element, $16k$, is divisible by $16$, so its memory address will be the base address of the array plus something divisible by $16 \cdot 4 = 64$, the cache line size. If the array were to begin on a cache line, then these $16$ great-great-grandchildren elements will be guaranteed to be on a single cache line, which is just what we needed. Therefore, we only need to [align](/hpc/cpu-cache/alignment) the array: @@ -399,7 +456,7 @@ Also, note that the last few prefetch requests are actually not needed, and in f This prefetching technique allows us to read up to four elements ahead, but it doesn't really come for free — we are effectively trading off excess memory [bandwidth](/hpc/cpu-cache/bandwidth) for reduced [latency](/hpc/cpu-cache/latency). If you run more than one instance at a time on separate hardware threads or just any other memory-intensive computation in the background, it will significantly [affect](/hpc/cpu-cache/sharing) the benchmark performance. -But we can do better. Instead of fetching four cache lines at a time, we could fetch four times *fewer* cache lines. And in the [next article](../s-tree), we will explore the approach. +But we can do better. Instead of fetching four cache lines at a time, we could fetch four times *fewer* cache lines. And in the [next section](../s-tree), we will explore the approach. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/content/english/hpc/data-structures/img/btree-absolute.svg b/content/english/hpc/data-structures/img/btree-absolute.svg new file mode 100644 index 00000000..6709908f --- /dev/null +++ b/content/english/hpc/data-structures/img/btree-absolute.svg @@ -0,0 +1,1430 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
diff --git a/content/english/hpc/data-structures/img/btree-relative.svg b/content/english/hpc/data-structures/img/btree-relative.svg
new file mode 100644
index 00000000..e40210ff
--- /dev/null
+++ b/content/english/hpc/data-structures/img/btree-relative.svg
@@ -0,0 +1,1505 @@
+ [new SVG figure; vector image data omitted]
diff --git a/content/english/hpc/data-structures/img/eytzinger.png b/content/english/hpc/data-structures/img/eytzinger.png
index 97237c73..901efdd2 100644
Binary files a/content/english/hpc/data-structures/img/eytzinger.png and b/content/english/hpc/data-structures/img/eytzinger.png differ
diff --git a/content/english/hpc/data-structures/img/eytzinger_old.png b/content/english/hpc/data-structures/img/eytzinger_old.png
new file mode 100644
index 00000000..97237c73
Binary files /dev/null and b/content/english/hpc/data-structures/img/eytzinger_old.png differ
diff --git a/content/english/hpc/data-structures/img/src/eytzinger.svg b/content/english/hpc/data-structures/img/src/eytzinger.svg
new file mode 100644
index 00000000..da565f0d
--- /dev/null
+++ b/content/english/hpc/data-structures/img/src/eytzinger.svg
@@ -0,0 +1,454 @@
+ [source SVG of the Eytzinger layout figure (node labels 0-9); vector markup omitted]
diff --git a/content/english/hpc/data-structures/s-tree.md b/content/english/hpc/data-structures/s-tree.md
index 216ba4bb..875f72ec 100644
--- a/content/english/hpc/data-structures/s-tree.md
+++ b/content/english/hpc/data-structures/s-tree.md
@@ -3,9 +3,9 @@ title: Static B-Trees
weight: 2
---

-This article is a follow-up to the [previous one](../binary-search), where we optimized binary search by the means of removing branching and improving the memory layout. Here, we will also be searching over sorted arrays, but this time we are not limited to fetching and comparing only one element at a time.
+This section is a follow-up to the [previous one](../binary-search), where we optimized binary search by the means of removing branching and improving the memory layout. Here, we will also be searching in sorted arrays, but this time we are not limited to fetching and comparing only one element at a time.
-In this article, we generalize the techniques we developed for binary search to *static B-trees* and accelerate them further using [SIMD instructions](/hpc/simd). In particular, we develop two new implicit data structures: +In this section, we generalize the techniques we developed for binary search to *static B-trees* and accelerate them further using [SIMD instructions](/hpc/simd). In particular, we develop two new implicit data structures: - The [first](#b-tree-layout) is based on the memory layout of a B-tree, and, depending on the array size, it is up to 8x faster than `std::lower_bound` while using the same space as the array and only requiring a permutation of its elements. - The [second](#b-tree-layout-1) is based on the memory layout of a B+ tree, and it is up to 15x faster than `std::lower_bound` while using just 6-7% more memory — or 6-7% **of** the memory if we can keep the original sorted array. @@ -102,7 +102,19 @@ int i = __builtin_ffs(mask) - 1; // now i is the number of the correct child node ``` -Unfortunately, the compilers are not smart enough yet to auto-vectorize this code, so we need to manually vectorize it with intrinsics: +Unfortunately, the compilers are not smart enough to [auto-vectorize](/hpc/simd/auto-vectorization/) this code yet, so we have to optimize it manually. In AVX2, we can load 8 elements, compare them against the search key, producing a [vector mask](/hpc/simd/masking/), and then extract the scalar mask from it with `movemask`. Here is a minimized illustrated example of what we want to do: + +```center + y = 4 17 65 103 + x = 42 42 42 42 + y ≥ x = 00000000 00000000 11111111 11111111 + ├┬┬┬─────┴────────┴────────┘ +movemask = 0011 + ┌─┘ + ffs = 3 +``` + +Since we are limited to processing 8 elements at a time (half our block / cache line size), we have to split the elements into two groups and then combine the two 8-bit masks. To do this, it will be slightly easier to swap the condition for `x > y` and compute the inverted mask instead: ```c++ typedef __m256i reg; @@ -114,7 +126,7 @@ int cmp(reg x_vec, int* y_ptr) { } ``` -This function works for 8-element vectors, which is half our block / cache line size. To process the entire block, we need to call it twice and then combine the masks: +Now, to process the entire block, we need to call it twice and combine the masks: ```c++ int mask = ~( @@ -123,7 +135,7 @@ int mask = ~( ); ``` -Now, to descend down the tree, we use `ffs` on that mask to get the correct child number and just call the `go` function we defined earlier: +To descend down the tree, we use `ffs` on that mask to get the correct child number and just call the `go` function we defined earlier: ```c++ int i = __builtin_ffs(mask) - 1; @@ -301,7 +313,7 @@ It doesn't feel very satisfying so far, but we will reuse these optimization ide There are two main problems with the current implementation: - The `update` procedure is quite costly, especially considering that it is very likely going to be useless: 16 out of 17 times, we can just fetch the result from the last block. -- We do a non-constant number of iterations, causing branch prediction problems similar to how it did for the [Eytzinger binary search](/binary-search/#removing-the-last-branch); you can also see it on the graph this time, but the latency bumps have a period of $2^4$. 
+- We do a non-constant number of iterations, causing branch prediction problems similar to how it did for the [Eytzinger binary search](../binary-search/#removing-the-last-branch); you can also see it on the graph this time, but the latency bumps have a period of $2^4$.

To address these problems, we need to change the layout a little bit.

@@ -325,7 +337,7 @@ The disadvantage is that this layout is not *succinct*: we need some additional

### Implicit B+ Tree

-To be more explicit with pointer arithmetic, we will store the entire tree in a single one-dimensional array. To minimize index computations during run-time, we will store each layer sequentially in this array and use compile-time computed offsets to address them: the keys of the node number `k` on layer `h` start with `btree[offset(h) + k * B]`, and its `i`-th child will at `btree[offset(h - 1) + (k * (B + 1) + i) * B]`.
+To be more explicit with pointer arithmetic, we will store the entire tree in a single one-dimensional array. To minimize index computations during run time, we will store each layer sequentially in this array and use compile-time computed offsets to address them: the keys of the node number `k` on layer `h` start with `btree[offset(h) + k * B]`, and its `i`-th child will be at `btree[offset(h - 1) + (k * (B + 1) + i) * B]`.

To implement all that, we need slightly more `constexpr` functions:

@@ -335,7 +347,7 @@ constexpr int blocks(int n) {
    return (n + B - 1) / B;
}

-// number of keys on the layer pervious to one with n element
+// number of keys on the layer previous to one with n keys
constexpr int prev_keys(int n) {
    return (blocks(n) + B) / (B + 1) * B;
}
@@ -345,7 +357,7 @@ constexpr int height(int n) {
    return (n <= B ? 1 : height(prev_keys(n)) + 1);
}

-// where the layer h starts (0 is the largest)
+// where the layer h starts (layer 0 is the largest)
constexpr int offset(int h) {
    int k = 0, n = N;
    while (h--) {
@@ -467,7 +479,7 @@ A lot of the performance boost of the S+ tree comes from removing branching and

-Although nobody except maybe the HFT people cares about real latency, and everybody actually measures throughput even when using the word "latency", this nuance is still something to take into account when predicting the possible speedup in user applications.
+Although nobody except maybe the HFT people cares about real latency, and everybody actually measures throughput even when using the word "latency," this nuance is still something to take into account when predicting the possible speedup in user applications.

### Modifications and Further Optimizations

@@ -548,6 +560,7 @@ Other possible minor optimizations include:

- Rewriting the whole thing in assembly, as the compiler seems to struggle with pointer arithmetic.
- Using [blending](/hpc/simd/masking) instead of `packs`: you can odd-even shuffle node keys (`[1 3 5 7] [2 4 6 8]`), compare against the search key, and then blend the low 16 bits of the first register mask with the high 16 bits of the second. Blending is slightly faster on many architectures, and it may also help to alternate between packing and blending as they use different subsets of ports. (Thanks to Const-me from HackerNews for [suggesting](https://news.ycombinator.com/item?id=30381912) it.)
- Using [popcount](/hpc/simd/shuffling/#shuffles-and-popcount) instead of `tzcnt`: the index `i` is equal to the number of keys less than `x`, so we can compare `x` against all keys, combine the vector mask any way we want, call `maskmov`, and then calculate the number of set bits with `popcnt`.
This removes the need to store the keys in any particular order, which lets us skip the permutation step and also use this procedure on the last layer as well. +- Defining the key $i$ as the *maximum* key in the subtree of child $i$ instead of the *minimum* key in the subtree of child $(i + 1)$. The correctness doesn't change, but this guarantees that the result will be stored in the last node we access (and not in the first element of the next neighbor node), which lets us fetch slightly fewer cache lines. Note that the current implementation is specific to AVX2 and may require some non-trivial changes to adapt to other platforms. It would be interesting to port it for Intel CPUs with AVX-512 and Arm CPUs with 128-bit NEON, which may require some [trickery](https://github.com/WebAssembly/simd/issues/131) to work. @@ -583,7 +596,7 @@ My next priorities is to adapt it to segment trees, which I know how to do, and Of course, this comparison is not fair, as implementing a dynamic search tree is a more high-dimensional problem. -We'd also need to implement the update operation, which will not be that efficient, and for which we'd need to sacrifice the fanout factor. But it still seems possible to implement a 10-20x faster `std::set` and a 3-5x faster `absl::btree_set`, depending on how you define "faster" — and this is one of the things we'll attempt to do next. +We'd also need to implement the update operation, which will not be that efficient, and for which we'd need to sacrifice the fanout factor. But it still seems possible to implement a 10-20x faster `std::set` and a 3-5x faster `absl::btree_set`, depending on how you define "faster" — and this is one of the things we'll [attempt to do next](../b-tree). @@ -249,7 +250,7 @@ Apart from requiring much less memory, which is good for fitting into the CPU ca To improve the performance further, we can: -- manually optimize the index arithmetic (e. g. noticing that we need to multiply `v` by `2` either way), +- manually optimize the index arithmetic (e.g., noticing that we need to multiply `v` by `2` either way), - replace division by two with an explicit binary shift (because [compilers aren't always able to do it themselves](/hpc/compilation/contracts/#arithmetic)), - and, most importantly, get rid of [recursion](/hpc/architecture/functions) and make the implementation fully iterative. @@ -329,7 +330,7 @@ int sum(int l, int r) { int s = 0; while (l <= r) { if ( l & 1) s += t[l++]; // l is a right child: add it and move to a cousin - if (~r & 1) s += t[r--]; // r is a light child: add it and move to a cousin + if (~r & 1) s += t[r--]; // r is a left child: add it and move to a cousin l >>= 1, r >>= 1; } return s; @@ -530,7 +531,7 @@ Repeatedly adding the lowest set bit to `k` makes it "more even" and lifts it to ![A path for an update query in a Fenwick tree](../img/fenwick-update.png) -Now, if we leave all the code as it is, it works correctly even when $n$ is not a power of two. In this case, the Fenwick tree is not equivalent to a segment tree fo size $n$ but to a *forest* of up to $O(\log n)$ segment trees of power-of-two sizes — or to a single segment tree padded with zeros to a large power of two, if you like to think this way. In either case, all procedures remain working correctly as they never touch anything outside the $[1, n]$ range. +Now, if we leave all the code as it is, it works correctly even when $n$ is not a power of two. 
In this case, the Fenwick tree is not equivalent to a segment tree of size $n$ but to a *forest* of up to $O(\log n)$ segment trees of power-of-two sizes — or to a single segment tree padded with zeros to a large power of two, if you like to think this way. In either case, all procedures still work correctly as they never touch anything outside the $[1, n]$ range. @@ -592,8 +593,8 @@ constexpr int height(int n) { constexpr int offset(int h) { int s = 0, n = N; while (h--) { - s += (n + B - 1) / B * B; - n /= B; + n = (n + B - 1) / B; + s += n * B; } return s; } @@ -602,14 +603,14 @@ constexpr int H = height(N); alignas(64) int t[offset(H)]; // an array for storing nodes ``` -This way we effectively reduce the height of the tree by approximately $\frac{\log_B n}{\log_2 n} = \log_2 B$ times ($\sim4$ times if $B = 16$), but it becomes non-trivial to implement in-node operations efficiently. For our problem, we have two main options: +This way, we effectively reduce the height of the tree by approximately $\frac{\log_B n}{\log_2 n} = \log_2 B$ times ($\sim4$ times if $B = 16$), but it becomes non-trivial to implement in-node operations efficiently. For our problem, we have two main options: 1. We could store $B$ *sums* in each node (for each of its $B$ children). 2. We could store $B$ *prefix sums* in each node (the $i$-th being the sum of the first $(i + 1)$ children). If we go with the first option, the `add` query would be largely the same as in the bottom-up segment tree, but the `sum` query would need to add up to $B$ scalars in each node it visits. And if we go with the second option, the `sum` query would be trivial, but the `add` query would need to add `x` to some suffix on each node it visits. -In either case, one operation will perform $O(\log_B n)$ operations, touching just one scalar in each node, while the other will perform $O(B \cdot \log_B n)$ operations, touching up to $B$ scalars in each node. However, it is 21st century, and we can use [SIMD](/hpc/simd) to accelerate the slower operation. Since there are no fast [horizontal reductions](/hpc/simd/reduction) in SIMD instruction sets, but it is easy to add a vector to a vector, we will choose the second approach and store prefix sums in each node. +In either case, one operation would perform $O(\log_B n)$ operations, touching just one scalar in each node, while the other would perform $O(B \cdot \log_B n)$ operations, touching up to $B$ scalars in each node. We can, however, use [SIMD](/hpc/simd) to accelerate the slower operation, and since there are no fast [horizontal reductions](/hpc/simd/reduction) in SIMD instruction sets, but it is easy to add a vector to a vector, we will choose the second approach and store prefix sums in each node. This makes the `sum` query extremely fast and easy to implement: @@ -622,7 +623,7 @@ int sum(int k) { } ``` -The `add` query is more complicated and slower. We need to add a number to only a suffix of a node, and we can do this by [masking out](/hpc/simd/masking) the positions that need not be modified. +The `add` query is more complicated and slower. We need to add a number only to a suffix of a node, and we can do this by [masking out](/hpc/simd/masking) the positions that should not be modified. 
We can pre-calculate a $B \times B$ array corresponding to $B$ such masks that tell, for each of $B$ positions within a node, whether a certain prefix sum value needs to be updated or not: @@ -724,7 +725,7 @@ This makes both queries much slower — especially the reduction — but this sh **Minimum** is a nice exception where the update query can be made slightly faster if the new value of the element is less than the current one: we can skip the horizontal reduction part and just update $\log_B n$ nodes using a scalar procedure. -This works very fast when we mostly have such updates, which is the case e. g. for the sparse-graph Dijkstra algorithm when we have more edges than vertices. For this problem, the wide segment tree can serve as an efficient fixed-universe min-heap. +This works very fast when we mostly have such updates, which is the case, e.g., for the sparse-graph Dijkstra algorithm when we have more edges than vertices. For this problem, the wide segment tree can serve as an efficient fixed-universe min-heap. **Lazy propagation** can be done by storing a separate array for the delayed operations in a node. To propagate the updates, we need to go top to bottom (which can be done by simply reversing the direction of the `for` loop and using `k >> (h * b)` to calculate the `h`-th ancestor), [broadcast](/hpc/simd/moving/#broadcast) and reset the delayed operation value stored in the parent of the current node, and apply it to all values stored in the current node with SIMD. diff --git a/content/english/hpc/external-memory/_index.md b/content/english/hpc/external-memory/_index.md index d7c1612c..0af587b3 100644 --- a/content/english/hpc/external-memory/_index.md +++ b/content/english/hpc/external-memory/_index.md @@ -19,7 +19,7 @@ When you fetch anything from memory, the request goes through an incredibly comp --> -When you fetch anything from memory, there is always some non-zero latency before the data arrives. Moreover, the request doesn't go directly to its ultimate storage location, but it first goes through an incredibly complex system of address translation units and caching layers designed to both help in memory management and reduce the latency. +When you fetch anything from memory, there is always some latency before the data arrives. Moreover, the request doesn't go directly to its ultimate storage location, but it first goes through a complex system of address translation units and caching layers designed to both help in memory management and reduce latency. Therefore, the only correct answer to this question is "it depends" — primarily on where the operands are stored: @@ -27,7 +27,7 @@ Therefore, the only correct answer to this question is "it depends" — primaril - If it was accessed recently, it is probably *cached* and will take less than that to fetch, depending on how long ago it was accessed — it could be ~50 cycles for the slowest layer of cache and around 4-5 cycles for the fastest. - But it could also be stored on some type of *external memory* such as a hard drive, and in this case, it will take around 5ms, or roughly $10^7$ cycles (!) to access it. -Such high variance of memory performance is caused by the fact that memory hardware doesn't follow the same [laws of silicon scaling](/hpc/complexity/hardware) as CPU chips do. Memory is still improving through other means, but if 50 years ago memory timings were roughly on the same scale with the instruction latencies, nowadays they lag far behind. 
+Such a high variance of memory performance is caused by the fact that memory hardware doesn't follow the same [laws of silicon scaling](/hpc/complexity/hardware) as CPU chips do. Memory is still improving through other means, but if 50 years ago memory timings were roughly on the same scale with the instruction latencies, nowadays they lag far behind. ![](img/memory-vs-compute.png) @@ -41,7 +41,7 @@ It becomes ever more important to optimize Modern computers grow ever more powerful, but their memory systems can't quite pick up with the increase in computing power, because they don't follow the same [laws of silicon scaling](/hpc/complexity/hardware) as CPU chips do. -If a CPU core has a frequency of 3 GHz, it roughly means that it is capable of executing up to $3 \cdot 10^9$ operations per second, depending on what constitutes an "operation". This is the baseline: on modern architectures, it can be increased by techniques such as SIMD and instruction-level parallelism up to $10^{11}$ operations per second, if the computation allows it. +If a CPU core has a frequency of 3 GHz, it roughly means that it is capable of executing up to $3 \cdot 10^9$ operations per second, depending on what constitutes an "operation." This is the baseline: on modern architectures, it can be increased by techniques such as SIMD and instruction-level parallelism up to $10^{11}$ operations per second, if the computation allows it. But for many algorithms, the CPU is not the bottleneck. Before trying to optimize performance above that baseline, we need to learn not to drop below it, and the number one reason for this is memory. diff --git a/content/english/hpc/external-memory/hierarchy.md b/content/english/hpc/external-memory/hierarchy.md index 35670da9..26dfc144 100644 --- a/content/english/hpc/external-memory/hierarchy.md +++ b/content/english/hpc/external-memory/hierarchy.md @@ -40,8 +40,8 @@ Everything up to the RAM level is called *volatile memory* because it does not p From fastest to slowest: -- **CPU registers**, which are the zero-time access data cells CPU uses to store all its intermediate values, can also be thought of as a memory type. There is only a limited number of them (e. g. 16 "general purpose" ones), and in some cases, you may want to use all of them for performance reasons. -- **CPU caches.** Modern CPUs have multiple layers of cache (L1, L2, often L3, and rarely even L4). The lowest layer is shared between cores and is usually scaled with their number (e. g. a 10-core CPU should have around 10M of L3 cache). +- **CPU registers**, which are the zero-time access data cells CPU uses to store all its intermediate values, can also be thought of as a memory type. There is only a limited number of them (e.g., just 16 "general purpose" ones), and in some cases, you may want to use all of them for performance reasons. +- **CPU caches.** Modern CPUs have multiple layers of cache (L1, L2, often L3, and rarely even L4). The lowest layer is shared between cores and is usually scaled with their number (e.g., a 10-core CPU should have around 10M of L3 cache). - **Random access memory,** which is the first scalable type of memory: nowadays you can rent machines with half a terabyte of RAM on the public clouds. This is the one where most of your working data is supposed to be stored. The CPU cache system has an important concept of a *cache line*, which is the basic unit of data transfer between the CPU and the RAM. 
The size of a cache line is 64 bytes on most architectures, meaning that all main memory is divided into blocks of 64 bytes, and whenever you request (read or write) a single byte, you are also fetching all its 63 cache line neighbors whether you want them or not.

@@ -58,7 +58,7 @@ There are other caches inside CPUs that are used for something other than data.

### Non-Volatile Memory

-While the data cells in CPU caches and the RAM only gently store just a few electrons (that periodically leak and need to be periodically refreshed), the data cells in *non-volatile memory* types store hundreds of them. This lets the data to be persisted for prolonged periods of time without power but comes at the cost of performance and durability — because when you have more electrons, you also have more opportunities for them colliding with silicon atoms.
+While the data cells in CPU caches and the RAM only gently store just a few electrons (that periodically leak and need to be periodically refreshed), the data cells in *non-volatile memory* types store hundreds of them. This lets the data persist for prolonged periods of time without power but comes at the cost of performance and durability — because when you have more electrons, you also have more opportunities for them to collide with silicon atoms.

diff --git a/content/english/hpc/external-memory/list-ranking.md b/content/english/hpc/external-memory/list-ranking.md
index 07b33c71..6d7c0053 100644
--- a/content/english/hpc/external-memory/list-ranking.md
+++ b/content/english/hpc/external-memory/list-ranking.md
@@ -50,11 +50,11 @@ List ranking is especially useful in graph algorithms.

For example, we can obtain the Euler tour of a tree in external memory by constructing a linked list from the tree that corresponds to its Euler tour and then applying the list ranking algorithm — the rank of each node will be the same as its index $tin_v$ in the Euler tour. To construct this list, we need to:

-- split each undirected tree edge into two directed ones;
-- duplicate the parent node for each up-edge (because list nodes can only have one incoming edge, but we visit some tree vertices multiple times);
-- route each such node either to the "next sibling", if it has one, or otherwise to its own parent;
+- split each undirected edge into two directed ones;
+- duplicate the parent node for each up-edge (because list nodes can only have one incoming edge, but we visit some vertices multiple times);
+- route each such node either to the "next sibling," if it has one, or otherwise to its own parent;
- and then finally break the resulting cycle at the root.

This general technique is called *tree contraction*, and it serves as the basis for a large number of tree algorithms.

-Exactly the same approach can be applied to parallel algorithms, and we will convert that much more deeply in part 2.
diff --git a/content/english/hpc/external-memory/locality.md b/content/english/hpc/external-memory/locality.md index 8607506d..e61cb5a3 100644 --- a/content/english/hpc/external-memory/locality.md +++ b/content/english/hpc/external-memory/locality.md @@ -23,7 +23,7 @@ In this article, we continue designing algorithms for the external memory model In this context, we can talk about the degree of cache reuse primarily in two ways: -- *Temporal locality* refers to the repeated access of the same data within a relatively small time duration, such that the data likely remains cached between the requests. +- *Temporal locality* refers to the repeated access of the same data within a relatively small time period, such that the data likely remains cached between the requests. - *Spatial locality* refers to the use of elements relatively close to each other in terms of their memory locations, such that they are likely fetched in the same memory block. In other words, temporal locality is when it is likely that this same memory location will soon be requested again, while spatial locality is when it is likely that a nearby location will be requested right after. @@ -34,8 +34,8 @@ In this section, we will do some case studies to show how these high-level conce Consider a divide-and-conquer algorithm such as merge sorting. There are two approaches to implementing it: -- We can implement it recursively, or "depth-first", the way it is normally implemented: sort the left half, sort the right half and then merge the results. -- We can implement it iteratively, or "breadth-first": do the lowest "layer" first, looping through the entire dataset and comparing odd elements with even elements, then merge the first two elements with the second two elements, the third two elements with the fourth two elements and so on. +- We can implement it recursively, or "depth-first," the way it is normally implemented: sort the left half, sort the right half and then merge the results. +- We can implement it iteratively, or "breadth-first:" do the lowest "layer" first, looping through the entire dataset and comparing odd elements with even elements, then merge the first two elements with the second two elements, the third two elements with the fourth two elements and so on. It seems like the second approach is more cumbersome, but faster — because recursion is always slow, right? @@ -47,44 +47,51 @@ In practice, there is still some overhead associated with the recursion, and for ### Dynamic Programming -Similar reasoning can be applied to the implementations of dynamic programming algorithms but leading to the reverse result. Consider the classic knapsack problem, where we got $n$ items with integer costs $c_i$, and we need to pick a subset of items with the maximum total cost that does not exceed a given constant $w$. +Similar reasoning can be applied to the implementations of dynamic programming algorithms but leading to the reverse result. Consider the classic *knapsack problem:* given $N$ items with positive integer costs $c_i$, pick a subset of items with the maximum total cost that does not exceed a given constant $W$. -The way to solve it is to introduce the *state* $f[i, k]$, which corresponds to the maximum total cost not exceeding $k$ that can be achieved having already considered and excluded the first $i$ items. The state can be updated in $O(1)$ time per entry if consider either taking or not taking the $i$-th item and using further states of the dynamic to compute the optimal decision for each state. 
+The way to solve it is to introduce the *state* $f[n, w]$, which corresponds to the maximum total cost not exceeding $w$ that can be achieved using only the first $n$ items. These values can be computed in $O(1)$ time per entry if we consider either taking or not taking the $n$-th item and use the previous states of the dynamic to make the optimal decision.

-Python has a handy `lru_cache` decorator, which can be used for implementing it with memoized recursion:
+Python has a handy `lru_cache` decorator which can be used for implementing it with memoized recursion:

```python
@lru_cache
-def f(i, k):
-    if i == n or k == 0:
+def f(n, w):
+    # check if we have no items to choose
+    if n == 0:
        return 0
-    if w[i] > k:
-        return f(i + 1, k)
-    return max(f(i + 1, k), c[i] + f(i + 1, k - w[i]))
+
+    # check if we can't pick the last item (note zero-based indexing)
+    if c[n - 1] > w:
+        return f(n - 1, w)
+
+    # otherwise, we can either pick the last item or not
+    return max(f(n - 1, w), c[n - 1] + f(n - 1, w - c[n - 1]))
```

-When computing $f[n, w]$, the recursion may visit up to $O(n \cdot w)$ different states, which is asymptotically efficient, but rather slow in reality. Even after nullifying the overhead of Python recursion and all the hash table queries required for the LRU cache to work, it would still be slow because it does random I/O throughout most of the execution.
+When computing $f[N, W]$, the recursion may visit up to $O(N \cdot W)$ different states, which is asymptotically efficient, but rather slow in reality. Even after nullifying the overhead of Python recursion and all the [hash table queries](../policies/#implementing-caching) required for the LRU cache to work, it would still be slow because it does random I/O throughout most of the execution.

What we can do instead is to create a two-dimensional array for the dynamic and replace the recursion with a nice nested loop like this:

```cpp
-int f[N + 1][W + 1];
+int f[N + 1][W + 1] = {0}; // this zero-fills the array

-for (int i = n - 1; i >= 0; i++)
-    for (int k = 0; k <= W; k++)
-        f[i][k] = w[i] > k ? f[i + 1][k] : max(f[i + 1][k], c[i] + f[i + 1][k - w[i]]);
+for (int n = 1; n <= N; n++)
+    for (int w = 0; w <= W; w++)
+        f[n][w] = c[n - 1] > w ?
+                  f[n - 1][w] :
+                  max(f[n - 1][w], c[n - 1] + f[n - 1][w - c[n - 1]]);
```

-Notice that we are only using the previous layer of the dynamic to calculate the next one. This means that if we can store one layer in the cache, we would only need to write $O(\frac{n \cdot w}{B})$ blocks in external memory.
+Notice that we are only using the previous layer of the dynamic to calculate the next one. This means that if we can store one layer in the cache, we would only need to write $O(\frac{N \cdot W}{B})$ blocks in external memory.

-Moreover, if we only need the answer, we don't actually have to store the whole 2d array but only the last layer. This lets us use just $O(w)$ memory by maintaining a single array of $w$ values. To simplify the code, we can slightly change the dynamic to store a binary value: whether it is possible to get the sum of exactly $k$ using the items that we have already considered. This dynamic is even faster to compute:
+Moreover, if we only need the answer, we don't actually have to store the whole 2d array but only the last layer. This lets us use just $O(W)$ memory by maintaining a single array of $W$ values.
To simplify the code, we can slightly change the dynamic to store a binary value: whether it is possible to get the sum of exactly $w$ using the items that we have already considered. This dynamic is even faster to compute: ```cpp -bool f[W + 1] = {}; // this zero-fills the array +bool f[W + 1] = {0}; f[0] = 1; -for (int i = 0; i < n; i++) - for (int x = W - a[i]; x >= 0; x--) - f[x + a[i]] |= f[x]; +for (int n = 0; n < N; n++) + for (int x = W - c[n]; x >= 0; x--) + f[x + c[n]] |= f[x]; ``` As a side note, now that it only uses simple bitwise operations, it can be optimized further by using a bitset: @@ -92,8 +99,8 @@ As a side note, now that it only uses simple bitwise operations, it can be optim ```cpp std::bitset b; b[0] = 1; -for (int i = 0; i < n; i++) - b |= b << c[i]; +for (int n = 0; n < N; n++) + b |= b << c[n]; ``` Surprisingly, there is still some room for improvement, and we will come back to this problem later. @@ -129,7 +136,7 @@ $$ t[k][i] = \min(t[k-1][i], t[k-1][i+2^{k-1}]) $$ -Now, there are two design choices to make: whether the log-size $k$ should be the first or the second dimension, and whether to iterate over $k$ and then $i$ or the other way around. This means that there are of $2×2=4$ ways to build it, and here is the optimal one: +Now, there are two design choices to make: whether the log-size $k$ should be the first or the second dimension, and whether to iterate over $k$ and then $i$ or the other way around. This means that there are $2×2=4$ ways to build it, and here is the optimal one: ```cpp int mn[logn][maxn]; @@ -167,7 +174,7 @@ The AoS layout is usually preferred for data structures, but SoA still has good This difference in design is important in data processing applications. For example, databases can be either *row-* or *column-oriented* (also called *columnar*): -- *Row-oriented* storage formats are used when you need to search for a limited amount of objects in a large dataset and fetch all or most of their fields. Examples: PostgreSQL, MongoDB. +- *Row-oriented* storage formats are used when you need to search for a limited number of objects in a large dataset and/or fetch all or most of their fields. Examples: PostgreSQL, MongoDB. - *Columnar* storage formats are used for big data processing and analytics, where you need to scan through everything anyway to calculate certain statistics. Examples: ClickHouse, Hbase. Columnar formats have the additional advantage that you can only read the fields that you need, as different fields are stored in separate external memory regions. diff --git a/content/english/hpc/external-memory/model.md b/content/english/hpc/external-memory/model.md index 35cba4ea..9ab86eba 100644 --- a/content/english/hpc/external-memory/model.md +++ b/content/english/hpc/external-memory/model.md @@ -18,7 +18,7 @@ Similar in spirit, in the *external memory model*, we simply ignore every operat In this model, we measure the performance of an algorithm in terms of its high-level *I/O operations*, or *IOPS* — that is, the total number of blocks read or written to external memory during execution. -We will mostly focus on the case where the internal memory is RAM and external memory is SSD or HDD, although the underlying analysis techniques that we will develop are applicable to any layer in the cache hierarchy. Under these settings, reasonable block size $B$ is about 1MB, internal memory size $M$ is usually a few gigabytes, and $N$ is up to a few terabytes. 
+We will mostly focus on the case where the internal memory is RAM and the external memory is SSD or HDD, although the underlying analysis techniques that we will develop are applicable to any layer in the cache hierarchy. Under these settings, reasonable block size $B$ is about 1MB, internal memory size $M$ is usually a few gigabytes, and $N$ is up to a few terabytes. ### Array Scan diff --git a/content/english/hpc/external-memory/oblivious.md b/content/english/hpc/external-memory/oblivious.md index 5e4650b2..93c4f2fc 100644 --- a/content/english/hpc/external-memory/oblivious.md +++ b/content/english/hpc/external-memory/oblivious.md @@ -118,7 +118,7 @@ It seems like we can't do better, but it turns out we can. ### Algorithm -Cache-oblivious matrix multiplication relies on essentially the same trick as the transposition. We need to divide the data until it fits into lowest cache (i. e. $N^2 \leq M$). For matrix multiplication, this equates to using this formula: +Cache-oblivious matrix multiplication relies on essentially the same trick as the transposition. We need to divide the data until it fits into lowest cache (i.e., $N^2 \leq M$). For matrix multiplication, this equates to using this formula: $$ \begin{pmatrix} @@ -198,7 +198,7 @@ $$ T(N) = O\left(\frac{(\sqrt{M})^2}{B} \cdot \left(\frac{N}{\sqrt M}\right)^3\right) = O\left(\frac{N^3}{B\sqrt{M}}\right) $$ -This is better than just $O(\frac{N^3}{B})$ and by quite a lot. +This is better than just $O(\frac{N^3}{B})$, and by quite a lot. ### Strassen Algorithm @@ -237,7 +237,7 @@ $$ You can verify these formulas with simple substitution if you feel like it. -As far as I know, none of the mainstream optimized linear algebra libraries use the Strassen algorithm, although there are some prototype implementations that are efficient for matrices larger than 4000 or so. +As far as I know, none of the mainstream optimized linear algebra libraries use the Strassen algorithm, although there are [some prototype implementations](https://arxiv.org/pdf/1605.01078.pdf) that are efficient for matrices larger than 2000 or so. This technique can and actually has been extended multiple times to reduce the asymptotic even further by considering more submatrix products. As of 2020, current world record is $O(n^{2.3728596})$. Whether you can multiply matrices in $O(n^2)$ or at least $O(n^2 \log^k n)$ time is an open problem. diff --git a/content/english/hpc/external-memory/policies.md b/content/english/hpc/external-memory/policies.md index 1ff0e724..4cb36bdd 100644 --- a/content/english/hpc/external-memory/policies.md +++ b/content/english/hpc/external-memory/policies.md @@ -33,7 +33,7 @@ $$ The main idea of the proof is to consider the worst case scenario. For LRU it would be the repeating series of $\frac{M}{B}$ distinct blocks: each block is new and so LRU has 100% cache misses. Meanwhile, $OPT_{M/2}$ would be able to cache half of them (but not more, because it only has half the memory). Thus $LRU_M$ needs to fetch double the number of blocks that $OPT_{M/2}$ does, which is basically what is expressed in the inequality, and anything better for $LRU$ would only weaken it. -![Dimmed are the blocks cached by OPT (but note cached by LRU)](../img/opt.png) +![Dimmed are the blocks cached by OPT (but not cached by LRU)](../img/opt.png) This is a very relieving result. 
It means that, at least in terms of asymptotic I/O complexity, you can just assume that the eviction policy is either LRU or OPT — whichever is easier for you — do complexity analysis with it, and the result you get will normally transfer to any other reasonable cache replacement policy. diff --git a/content/english/hpc/external-memory/sorting.md b/content/english/hpc/external-memory/sorting.md index 6ac13ae0..299da78f 100644 --- a/content/english/hpc/external-memory/sorting.md +++ b/content/english/hpc/external-memory/sorting.md @@ -1,6 +1,7 @@ --- title: External Sorting weight: 4 +published: true --- Now, let's try to design some actually useful algorithms for the new [external memory model](../model). Our goal in this section is to slowly build up more complex things and eventually get to *external sorting* and its interesting applications. @@ -33,17 +34,17 @@ So far the examples have been simple, and their analysis doesn't differ too much In the standard RAM model, the asymptotic complexity would be multiplied $k$, since we would need to perform $O(k)$ comparisons to fill each next element. But in the external memory model, since everything we do in-memory doesn't cost us anything, its asymptotic complexity would not change as long as we can fit $(k+1)$ full blocks in memory, that is, if $k = O(\frac{M}{B})$. -Remember [the $M \gg B$ assumption](../model) when we introduced the computational model? If we have $M \geq B^{1+ε}$ for $\epsilon > 0$, then we can fit any sub-polynomial amount of blocks in memory, certainly including $O(\frac{M}{B})$. This condition is called *tall cache assumption*, and it is usually required in many other external memory algorithms. +Remember [the $M \gg B$ assumption](../model) when we introduced the computational model? If we have $M \geq B^{1+ε}$ for $\epsilon > 0$, then we can fit any sub-polynomial number of blocks in memory, certainly including $O(\frac{M}{B})$. This condition is called *tall cache assumption*, and it is usually required in many other external memory algorithms. ### Merge Sorting -The "normal" complexity of the standard mergesort algorithm is $O(N \log_2 N)$: on each of its $O(\log_2 N)$ "layers", the algorithms need to go through all $N$ elements in total and merge them in linear time. +The "normal" complexity of the standard mergesort algorithm is $O(N \log_2 N)$: on each of its $O(\log_2 N)$ "layers," the algorithms need to go through all $N$ elements in total and merge them in linear time. -In the external memory model, when we read a block of size $M$, we can sort its elements "for free", since they are already in memory. This way we can split the arrays into $O(\frac{N}{M})$ blocks of consecutive elements and sort them separately as the base step, and only then merge them. +In the external memory model, when we read a block of size $M$, we can sort its elements "for free," since they are already in memory. This way we can split the arrays into $O(\frac{N}{M})$ blocks of consecutive elements and sort them separately as the base step, and only then merge them. ![](../img/k-way.png) -This effectively means that, in terms of IO operations, the first $O(\log M)$ layers of mergesort are free, and there are only $O(\log_2 \frac{N}{B})$ non-zero-cost layers, each mergeable in $O(\frac{N}{B})$ IOPS in total. 
This brings total I/O complexity to
+This effectively means that, in terms of I/O operations, the first $O(\log M)$ layers of mergesort are free, and there are only $O(\log_2 \frac{N}{M})$ non-zero-cost layers, each mergeable in $O(\frac{N}{B})$ IOPS in total. This brings total I/O complexity to

$$
O\left(\frac{N}{B} \log_2 \frac{N}{M}\right)
$$

@@ -57,7 +58,7 @@ Half of a page ago we have learned that in the external memory model, we can mer

Let's sort each block of size $M$ in-memory just as we did before, but during each merge stage, we will split sorted blocks not just in pairs to be merged, but take as many blocks we can fit into our memory during a $k$-way merge. This way the height of the merge tree would be greatly reduced, while each layer would still be done in $O(\frac{N}{B})$ IOPS.

-How many sorted arrays can we merge at once? Exactly $k = \frac{M}{B}$, since we need memory for one block for each array. Since the total amount of layers will be reduced to $\log_{\frac{M}{B}} \frac{N}{M}$, the total complexity will be reduced to
+How many sorted arrays can we merge at once? Exactly $k = \frac{M}{B}$, since we need memory for one block for each array. Since the total number of layers will be reduced to $\log_{\frac{M}{B}} \frac{N}{M}$, the total complexity will be reduced to

$$
SORT(N) \stackrel{\text{def}}{=} O\left(\frac{N}{B} \log_{\frac{M}{B}} \frac{N}{M} \right)
$$

@@ -106,15 +107,28 @@ fclose(input);

What is left now is to merge them together. The bandwidth of modern HDDs can be quite high, and there may be a lot of parts to merge, so the I/O efficiency of this stage is not our only concern: we also need a faster way to merge $k$ arrays than by finding minima with $O(k)$ comparisons. We can do that in $O(\log k)$ time per element if we maintain a min-heap for these $k$ elements, in a manner almost identical to heapsort.

-Here is how to implement it. First, we need to initialize some variables:
+Here is how to implement it. First, we are going to need a heap (`priority_queue` in C++):

-```cpp
+```c++
+struct Pointer {
+    int key, part; // the element itself and the number of its part
+
+    bool operator<(const Pointer& other) const {
+        return key > other.key; // std::priority_queue is a max-heap by default
+    }
+};
+
+std::priority_queue<Pointer> q;
+```
+
+Then, we need to allocate and fill the buffers:
+
+```c++
const int nparts = parts.size();
-std::priority_queue< std::pair<int, int> > q; // the heap itself (element + part number)
-auto buffers = new int[nparts][B];            // buffers for each part
-int *l = new int[nparts],                     // # of already processed buffer elements
-    *r = new int[nparts];                     // buffer size (in case it isn't full)
+auto buffers = new int[nparts][B]; // buffers for each part
+int *l = new int[nparts],          // # of already processed buffer elements
+    *r = new int[nparts];          // buffer size (in case it isn't full)

// now we fill the buffer for each part and add their elements to the heap
for (int part = 0; part < nparts; part++) {
diff --git a/content/english/hpc/external-memory/virtual.md b/content/english/hpc/external-memory/virtual.md
index 6535283d..92bb454c 100644
--- a/content/english/hpc/external-memory/virtual.md
+++ b/content/english/hpc/external-memory/virtual.md
@@ -19,7 +19,7 @@ Virtual memory gives each process the impression that it fully controls a contig

To achieve this, the memory address space is divided into *pages* (typically 4KB in size), which are the base units of memory that the programs can request from the operating system.
The memory system maintains a special hardware data structure called the *page table*, which contains the mappings of virtual page addresses to the physical ones. When a process accesses data using its virtual memory address, the memory system calculates its page number (by right-shifting it by $12$ if $4096=2^{12}$ is the page size), looks up in the page table that its physical address is, and forwards the read or write request to where that data is actually stored. -Since the address translation needs to be done for each memory request, and the number of memory pages itself may be large (e. g. 16G RAM / 4K page size = 4M pages), address translation poses a difficult problem in itself. One way to speed it up is to use a special cache for the page table itself called *translation lookaside buffer* (TLB), and the other is to [increase the page size](/hpc/cpu-cache/paging) so that the total number of memory pages is made smaller at the cost of reduced granularity. +Since the address translation needs to be done for each memory request, and the number of memory pages itself may be large (e.g., 16G RAM / 4K page size = 4M pages), address translation poses a difficult problem in itself. One way to speed it up is to use a special cache for the page table itself called *translation lookaside buffer* (TLB), and the other is to [increase the page size](/hpc/cpu-cache/paging) so that the total number of memory pages is made smaller at the cost of reduced granularity. diff --git a/content/english/hpc/number-theory/cryptography.md b/content/english/hpc/number-theory/cryptography.md index 87f58124..0b8c6b76 100644 --- a/content/english/hpc/number-theory/cryptography.md +++ b/content/english/hpc/number-theory/cryptography.md @@ -1,6 +1,6 @@ --- title: Cryptography -weight: 6 +weight: 7 draft: true --- @@ -22,15 +22,15 @@ To calculate $d$ and restore the message, the attacker would need to repeat step When doing actual communication, people first exchange their public keys (in any, possibly unsecure way) and then use it to encrypt messages. -This is what web browsers do when establishing connection "https". You can also do it by hand with GPG. +This is what web browsers do when establishing connection "https." You can also do it by hand with GPG. ### Man-in-the-middle There is an issue when establishing initial communication that the attacker could replace it and control the communication. -Between your browser and a bank. "Hey this is a message from a bank". +Between your browser and a bank. "Hey this is a message from a bank." -Trust networks. E. g. everyone can trust Google or whoever makes the device or operating system. +Trust networks. E.g., everyone can trust Google or whoever makes the device or operating system. 
## Symmetric Cryptography

diff --git a/content/english/hpc/number-theory/error-correction.md b/content/english/hpc/number-theory/error-correction.md
index 91f1f472..e8774ed8 100644
--- a/content/english/hpc/number-theory/error-correction.md
+++ b/content/english/hpc/number-theory/error-correction.md
@@ -1,6 +1,6 @@
---
title: Error Correction
-weight: 4
+weight: 6
draft: true
---

diff --git a/content/english/hpc/number-theory/euclid-extended.md b/content/english/hpc/number-theory/euclid-extended.md
new file mode 100644
index 00000000..a37c1b29
--- /dev/null
+++ b/content/english/hpc/number-theory/euclid-extended.md
@@ -0,0 +1,100 @@
+---
+title: Extended Euclidean Algorithm
+weight: 3
+---
+
+[Fermat’s theorem](../modular/#fermats-theorem) allows us to calculate modular multiplicative inverses through [binary exponentiation](../exponentiation/) in $O(\log n)$ operations, but it only works with prime moduli. There is a generalization of it, [Euler's theorem](https://en.wikipedia.org/wiki/Euler%27s_theorem), stating that if $m$ and $a$ are coprime, then
+
+$$
+a^{\phi(m)} \equiv 1 \pmod m
+$$
+
+where $\phi(m)$ is [Euler's totient function](https://en.wikipedia.org/wiki/Euler%27s_totient_function) defined as the number of positive integers $x < m$ that are coprime with $m$. In the special case when $m$ is prime, all the $m - 1$ nonzero residues are coprime with it and $\phi(m) = m - 1$, yielding Fermat's theorem.
+
+This lets us calculate the inverse of $a$ as $a^{\phi(m) - 1}$ if we know $\phi(m)$, but in turn, calculating it is not so fast: you usually need to obtain the [factorization](/hpc/algorithms/factorization/) of $m$ to do it. There is a more general method that works by modifying the [Euclidean algorithm](/hpc/algorithms/gcd/).
+
+### Algorithm
+
+The *extended Euclidean algorithm*, apart from finding $g = \gcd(a, b)$, also finds integers $x$ and $y$ such that
+
+$$
+a \cdot x + b \cdot y = g
+$$
+
+which solves the problem of finding the modular inverse if we substitute $b$ with $m$ and $g$ with $1$:
+
+$$
+a^{-1} \cdot a + k \cdot m = 1
+$$
+
+Note that, if $a$ is not coprime with $m$, there is no solution since no integer combination of $a$ and $m$ can yield anything that is not a multiple of their greatest common divisor.
+
+The algorithm is also recursive: it calculates the coefficients $x'$ and $y'$ for $\gcd(b, a \bmod b)$ and restores the solution for the original number pair. If we have a solution $(x', y')$ for the pair $(b, a \bmod b)$
+
+$$
+b \cdot x' + (a \bmod b) \cdot y' = g
+$$
+
+then, to get the solution for the initial input, we can rewrite the expression $(a \bmod b)$ as $(a - \lfloor \frac{a}{b} \rfloor \cdot b)$ and substitute it into the aforementioned equation:
+
+$$
+b \cdot x' + (a - \Big \lfloor \frac{a}{b} \Big \rfloor \cdot b) \cdot y' = g
+$$
+
+Now we rearrange the terms, grouping by $a$ and $b$, to get
+
+$$
+a \cdot \underbrace{y'}_x + b \cdot \underbrace{(x' - \Big \lfloor \frac{a}{b} \Big \rfloor \cdot y')}_y = g
+$$
+
+Comparing it with the initial expression, we infer that we can just use the coefficients of $a$ and $b$ for the initial $x$ and $y$.
+
+### Implementation
+
+We implement the algorithm as a recursive function.
Since its output is not one but three integers, we pass the coefficients to it by reference: + +```c++ +int gcd(int a, int b, int &x, int &y) { + if (a == 0) { + x = 0; + y = 1; + return b; + } + int x1, y1; + int d = gcd(b % a, a, x1, y1); + x = y1 - (b / a) * x1; + y = x1; + return d; +} +``` + +To calculate the inverse, we simply pass $a$ and $m$ and return the $x$ coefficient the algorithm finds. Since we pass two positive numbers, one of the coefficient will be positive and the other one is negative (which one depends on whether the number of iterations is odd or even), so we need to optionally check if $x$ is negative and add $m$ to get a correct residue: + +```c++ +int inverse(int a) { + int x, y; + gcd(a, M, x, y); + if (x < 0) + x += M; + return x; +} +``` + +It works in ~160ns — 10ns faster than inverting numbers with [binary exponentiation](../exponentiation). To optimize it further, we can similarly turn it iterative ­— which takes 135ns: + +```c++ +int inverse(int a) { + int b = M, x = 1, y = 0; + while (a != 1) { + y -= b / a * x; + b %= a; + swap(a, b); + swap(x, y); + } + return x < 0 ? x + M : x; +} +``` + +Note that, unlike binary exponentiation, the running time depends on the value of $a$. For example, for this particular value of $m$ ($10^9 + 7$), the worst input happens to be 564400443, for which the algorithm performs 37 iterations and takes 250ns. + +**Exercise**. Try to adapt the same technique for the [binary GCD](/hpc/algorithms/gcd/#binary-gcd) (it won't give performance speedup though unless you are better than me at optimization). diff --git a/content/english/hpc/number-theory/exponentiation.md b/content/english/hpc/number-theory/exponentiation.md new file mode 100644 index 00000000..8806257d --- /dev/null +++ b/content/english/hpc/number-theory/exponentiation.md @@ -0,0 +1,109 @@ +--- +title: Binary Exponentiation +weight: 2 +--- + +In modular arithmetic (and computational algebra in general), you often need to raise a number to the $n$-th power — to do [modular division](../modular/#modular-division), perform [primality tests](../modular/#fermats-theorem), or compute some combinatorial values — ­and you usually want to spend fewer than $\Theta(n)$ operations calculating it. + +*Binary exponentiation*, also known as *exponentiation by squaring*, is a method that allows for computation of the $n$-th power using $O(\log n)$ multiplications, relying on the following observation: + +$$ +\begin{aligned} + a^{2k} &= (a^k)^2 +\\ a^{2k + 1} &= (a^k)^2 \cdot a +\end{aligned} +$$ + +To compute $a^n$, we can recursively compute $a^{\lfloor n / 2 \rfloor}$, square it, and then optionally multiply by $a$ if $n$ is odd, corresponding to the following recurrence: + +$$ +a^n = f(a, n) = \begin{cases} + 1, && n = 0 +\\ f(a, \frac{n}{2})^2, && 2 \mid n +\\ f(a, n - 1) \cdot a, && 2 \nmid n +\end{cases} +$$ + +Since $n$ is at least halved every two recursive transitions, the depth of this recurrence and the total number of multiplications will be at most $O(\log n)$. 
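+
+For example, for $n = 13$, the recurrence unfolds into
+
+$$
+a^{13} = (a^{6})^2 \cdot a = ((a^3)^2)^2 \cdot a = ((a^2 \cdot a)^2)^2 \cdot a
+$$
+
+which takes just five modular multiplications (three squarings and two extra multiplications by $a$) instead of the twelve that naive repeated multiplication would need.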
+
+### Recursive Implementation
+
+As we already have a recurrence, it is natural to implement the algorithm as a case-matching recursive function:
+
+```c++
+const int M = 1e9 + 7; // modulo
+typedef unsigned long long u64;
+
+u64 binpow(u64 a, u64 n) {
+    if (n == 0)
+        return 1;
+    if (n % 2 == 1)
+        return binpow(a, n - 1) * a % M;
+    else {
+        u64 b = binpow(a, n / 2);
+        return b * b % M;
+    }
+}
+```
+
+In our benchmark, we use $n = m - 2$ so that we compute the [multiplicative inverse](../modular/#modular-division) of $a$ modulo $m$:
+
+```c++
+u64 inverse(u64 a) {
+    return binpow(a, M - 2);
+}
+```
+
+We use $m = 10^9+7$, which is a modulo value commonly used in competitive programming to calculate checksums in combinatorial problems — because it is prime (allowing inverse via binary exponentiation), sufficiently large, not overflowing `int` in addition, not overflowing `long long` in multiplication, and easy to type as `1e9 + 7`.
+
+Since we use it as a compile-time constant in the code, the compiler can optimize the modulo by [replacing it with multiplication](/hpc/arithmetic/division/) (even if it is not a compile-time constant, it is still cheaper to compute the magic constants by hand once and use them for fast reduction).
+
+The execution path — and consequently the running time — depends on the value of $n$. For this particular $n$, the baseline implementation takes around 330ns per call. As recursion introduces some [overhead](/hpc/architecture/functions/), it makes sense to unroll the implementation into an iterative procedure.
+
+### Iterative Implementation
+
+The result of $a^n$ can be represented as the product of $a$ to some powers of two — those that correspond to 1s in the binary representation of $n$. For example, if $n = 42 = 32 + 8 + 2$, then
+
+$$
+a^{42} = a^{32+8+2} = a^{32} \cdot a^8 \cdot a^2
+$$
+
+To calculate this product, we can iterate over the bits of $n$ maintaining two variables: the value of $a^{2^k}$ and the current product after considering the $k$ lowest bits of $n$. On each step, we multiply the current product by $a^{2^k}$ if the $k$-th bit of $n$ is set, and, in either case, square $a^{2^k}$ to get $a^{2^k \cdot 2} = a^{2^{k+1}}$ that will be used on the next iteration.
+
+```c++
+u64 binpow(u64 a, u64 n) {
+    u64 r = 1;
+
+    while (n) {
+        if (n & 1)
+            r = r * a % M;
+        a = a * a % M;
+        n >>= 1;
+    }
+
+    return r;
+}
+```
+
+The iterative implementation takes about 180ns per call. The heavy calculations are the same; the improvement mainly comes from the reduced dependency chain: `a = a * a % M` needs to finish before the loop can proceed, and it can now execute concurrently with `r = r * a % M`.
+
+The performance also benefits from $n$ being a constant, [making all branches predictable](/hpc/pipelining/branching/) and letting the scheduler know what needs to be executed in advance. The compiler, however, does not take advantage of it and does not unroll the `while(n) n >>= 1` loop. We can rewrite it as a `for` loop that performs a constant 30 iterations:
+
+```c++
+u64 inverse(u64 a) {
+    u64 r = 1;
+
+    #pragma GCC unroll(30)
+    for (int l = 0; l < 30; l++) {
+        if ( (M - 2) >> l & 1 )
+            r = r * a % M;
+        a = a * a % M;
+    }
+
+    return r;
+}
+```
+
+This forces the compiler to generate only the instructions we need, shaving off another 10ns and making the total running time ~170ns.
+
+Note that the performance depends not only on the binary length of $n$, but also on the number of binary 1s.
If $n$ is $2^{30}$, it takes around 20ns less as we don't have to to perform any off-path multiplications. diff --git a/content/english/hpc/number-theory/finite.md b/content/english/hpc/number-theory/finite.md index fbef0015..cae2f2ef 100644 --- a/content/english/hpc/number-theory/finite.md +++ b/content/english/hpc/number-theory/finite.md @@ -1,6 +1,6 @@ --- title: Finite Fields -weight: 3 +weight: 5 draft: true --- diff --git a/content/english/hpc/number-theory/hashing.md b/content/english/hpc/number-theory/hashing.md index 0484d173..294573a1 100644 --- a/content/english/hpc/number-theory/hashing.md +++ b/content/english/hpc/number-theory/hashing.md @@ -12,7 +12,7 @@ Hash function is any function that is: * Computed fast — at least in linear time, that is. * Has a limited image — say, 64-bit values. -* "Deterministically-random": if it takes $n$ different values, then the probability of collision of two random hashes is $\frac{1}{n}$ and can't be predicted well without knowing the hash function. +* "Deterministically-random:" if it takes $n$ different values, then the probability of collision of two random hashes is $\frac{1}{n}$ and can't be predicted well without knowing the hash function. One good test is that can't create a collision in any better time than by birthday paradox. Square root of the hash space. diff --git a/content/english/hpc/number-theory/img/clock.gif b/content/english/hpc/number-theory/img/clock.gif new file mode 100644 index 00000000..0d0c6555 Binary files /dev/null and b/content/english/hpc/number-theory/img/clock.gif differ diff --git a/content/english/hpc/number-theory/inverse.md b/content/english/hpc/number-theory/inverse.md deleted file mode 100644 index dbfe1676..00000000 --- a/content/english/hpc/number-theory/inverse.md +++ /dev/null @@ -1,187 +0,0 @@ ---- -title: Modular Inverse -weight: 1 ---- - -```c++ -mint inv() const { - uint t = x; - uint res = 1; - while (t != 1) { - uint z = mod / t; - res = (ull) res * (mod - z) % mod; - t = mod - t * z; - } - return res; -} -``` - -In this section, we are going to discuss some preliminaries before discussing more advanced topics. - -In computers, we use the 1st of January, 1970 as the start of the "Unix era", and all time computations are usually done relative to that timestamp. - -We humans also keep track of time relative to some point in the past, which usually has a political or religious significance. At the moment of writing, approximately 63882260594 seconds have passed since 0 AD. - -But for daily tasks, we do not really need that information. Depending on the situation, the relevant part may be that it is 2 pm right now and it's time to go to dinner, or that it's Thursday and so Subway's sub of the day is an Italian BMT. What we do is instead of using a timestamp we use its remainder, which contains just the information we need. And the beautiful thing about it is that remainders are small and cyclic. Think the hour clock: after 12 there comes 1 again, so the number is always small. - -![](../img/clock.gif) - -It is much easier to deal with 1- or 2-digit numbers than 11-digit ones. If we encode each day of the weak starting with Monday from 0 to 6 inclusive, Thursday is going to get number 3. But what day of the week is it going to be in one year? We need to add 365 to it and then reduce modulo 7. It is convenient that `365 % 7` is 1, so we will know that it's Friday unless it is a leap year (in which case it will be Saturday). 
- -Modular arithmetic studies the way these sets of remainders behave, and it has fundamental applications in number theory, cryptography and data compression. - - -Consider the following problem: our "week" now consists of $m$ days, and we cycle through it with a steps of $a > 0$. How many distinct days there will be? - -Let's assume that the first day is always Monday. At some point the sequence of day is going to cycle. The days will be representable as $k a \mod m$, so we need to find the first $k$ such as $k a$ is divisible by $m$. In the case of $m=7$, $m$ is prime, so the cycle length will be 7 exactly for any $a$. - -Now, if $m$ is not prime, but it is still coprime with $a$. For $ka$ to be divisible by $m$, $k$ needs to be divisible by $m$. In general, the answer is $\frac{m}{gcd(a, m)}$. For example, if the week is 10 days long, if the starting number is even, then it will cycle through all even numbers, and if the number is 5, then it will only cycle between 0 and 5. Otherwise it will go through all 10 remainders. - -### Fermat's Theorem - -Now, consider what happens if instead of adding a number $a$, we repeatedly multiply by it, that is, write numbers in the form $a^n \mod m$. Since these are all finite numbers there is going to be a cycle, but what will its length be? If $p$ is prime, it turns out, all of them. - -**Theorem.** $a^p \equiv a \pmod p$ for all $a$ that are not multiple of $p$. - -**Proof**. Let $P(x_1, x_2, \ldots, x_n) = \frac{k}{\prod (x_i!)}$ be the *multinomial coefficient*, that is, the number of times the element $a_1^{x_1} a_2^{x_2} \ldots a_n^{x_n}$ would appear after the expansion of $(a_1 + a_2 + \ldots + a_n)^k$. Then - -$$ -\begin{aligned} -a^p &= (\underbrace{1+1+\ldots+1+1}_\text{$a$ times})^p & -\\\ &= \sum_{x_1+x_2+\ldots+x_a = p} P(x_1, x_2, \ldots, x_a) & \text{(by defenition)} -\\\ &= \sum_{x_1+x_2+\ldots+x_a = p} \frac{p!}{x_1! x_2! \ldots x_a!} & \text{(which terms will not be divisible by $p$?)} -\\\ &\equiv P(p, 0, \ldots, 0) + \ldots + P(0, 0, \ldots, p) & \text{(everything else will be canceled)} -\\\ &= a -\end{aligned} -$$ - -and then dividing by $a$ gives us the Fermat's theorem. - -Note that this is only true for prime $p$. Euler's theorem handles the case of arbitary $m$, and states that - -$$ -a^{\phi(m)} \equiv 1 \pmod m -$$ - -where $\phi(m)$ is called Euler's totient function and is equal to the number of residues of $m$ that is coprime with it. In particular case of when $m$ is prime, $\phi(p) = p - 1$ and we get Fermat's theorem, which is just a special case. - -### Primality Testing - -These theorems have a lot of applications. One of them is checking whether a number $n$ is prime or not faster than factoring it. You can pick any base $a$ at random and try to raise it to power $a^{p-1}$ modulo $n$ and check if it is $1$. Such base is called *witness*. - -Such probabilistic tests are therefore returning either "no" or "maybe". It may be the case that it just happened to be equal to $1$ but in fact $n$ is composite, in which case you need to repeat the test until you are okay with the false positive probability. Moreover, there exist carmichael numbers, which are composite numbers $n$ that satisfy $a^n \equiv 1 \pmod n$ for all $a$. These numbers are rare, but still [exist](https://oeis.org/A002997). - -Unless the input is provided by an adversary, the mistake probability will be low. 
This test is adequate for finding large primes: there are roughly $\frac{n}{\ln n}$ primes among the first $n$ numbers, which is another fact that we are not going to prove. These primes are distributed more or less evenly, so one can just pick a random number and check numbers in sequence, and after checking $O(\ln n)$ numbers one will probably be found. - -### Binary Exponentiation - -To perform the Fermat test, we need to raise a number to power $n-1$, preferrably using less than $n-2$ modular multiplications. We can use the fact that multiplication is associative: - -$$ -\begin{aligned} - a^{2k} &= (a^k)^2 -\\ a^{2k + 1} &= (a^k)^2 \cdot a -\end{aligned} -$$ - -We essentially group it like this: - -$$ -a^8 = (aaaa) \cdot (aaaa) = ((aa)(aa))((aa)(aa)) -$$ - -This allows using only $O(\log n)$ operations (or, more specifically, at most $2 \cdot \log_2 n$ modular multiplications). - -```c++ -int binpow(int a, int n) { - int res = 1; - while (n) { - if (n & 1) - res = res * a % mod; - a = a * a % mod; - n >>= 1; - } - return res; -} -``` - -This helps if `n` or `mod` is a constant. - -### Modular Division - -"Normal" operations also apply to residues: +, -, *. But there is an issue with division, because we can't just bluntly divide two numbers: $\frac{8}{2} = 4$, но $\frac{8 \\% 5 = 3}{2 \\% 5 = 2} \neq 4$. - -To perform division, we need to find an element that will behave itself like the reciprocal $\frac{1}{a} = a^{-1}$, and instead of "division" multiply by it. This element is called a *modular inverse*. - -If the modulo is a prime number, then the solution is $a^{-1} \equiv a^{p-2}$, which follows directly from Fermat's theorem by dividing the equivalence by $a$: - -$$ -a^p \equiv a \implies a^{p-1} \equiv 1 \implies a^{p-2} \equiv a^{-1} -$$ - -This means that $a^{p-2}$ "behaves" like $a^{-1}$ which is what we need. - -You can calculate $a^{p-2}$ in $O(\log p)$ time using binary exponentiation: - -```c++ -int inv(int x) { - return binpow(x, mod - 2); -} -``` - -If the modulo is not prime, then we can still get by calculating $\phi(m)$ and invoking Euler's theorem. But calculating $\phi(m)$ is as difficult as factoring it, which is not fast. There is a more general method. - -### Extended Euclidean Algorithm - -*Extended Euclidean algorithm* apart from finding $g = \gcd(a, b)$ also finds integers $x$ and $y$ such that - -$$ -a \cdot x + b \cdot y = g -$$ - -which solves the problem of finding modular inverse if we substitute $b$ with $m$ and $g$ with $1$: - -$$ -a^{-1} \cdot a + k \cdot m = 1 -$$ - -Note that if $a$ is not coprime with $m$, then there will be no solution. We can still find *some* element, but it will not work for any dividend. - -The algorithm is also recursive. It makes a recursive call, calculates the coefficients $x'$ and $y'$ for $\gcd(b, a \bmod b)$, and restores the general solution. 
If we have a solution $(x', y')$ for pair $(b, a \bmod b)$: - -$$ -b \cdot x' + (a \bmod b) \cdot y' = g -$$ - -To get the solution for the initial input, rewrite the expression $(a \bmod b)$ as $(a - \lfloor \frac{a}{b} \rfloor \cdot b)$ and subsitute it into the aforementioned equality: - -$$ -b \cdot x' + (a - \Big \lfloor \frac{a}{b} \Big \rfloor \cdot b) \cdot y' = g -$$ - -Now let's rearrange the terms (grouping by $a$ and $b$) to get - -$$ -a \cdot \underbrace{y'}_x + b \cdot \underbrace{(x' - \Big \lfloor \frac{a}{b} \Big \rfloor \cdot y')}_y = g -$$ - -Comparing it with initial expression, we infer that we can just use coefficients by $a$ and $b$ for the initial $x$ and $y$. - -```c++ -int gcd(int a, int b, int &x, int &y) { - if (a == 0) { - x = 0; - y = 1; - return b; - } - int x1, y1; - int d = gcd(b % a, a, x1, y1); - x = y1 - (b / a) * x1; - y = x1; - return d; -} -``` - -Another application is the exact division modulo $2^k$. - -**Exercise**. Try to adapt the technique for binary GCD. diff --git a/content/english/hpc/number-theory/modular.md b/content/english/hpc/number-theory/modular.md new file mode 100644 index 00000000..3d05e2f9 --- /dev/null +++ b/content/english/hpc/number-theory/modular.md @@ -0,0 +1,140 @@ +--- +title: Modular Arithmetic +weight: 1 +--- + + + +Computers usually store time as the number of seconds that have passed since the 1st of January, 1970 — the start of the "Unix era" — and use these timestamps in all computations that have to do with time. + +We humans also keep track of time relative to some point in the past, which usually has a political or religious significance. For example, at the moment of writing, approximately 63882260594 seconds have passed since 1 AD — [6th century Eastern Roman monks' best estimate](https://en.wikipedia.org/wiki/Anno_Domini) of the day Jesus Christ was born. + +But unlike computers, we do not always need *all* that information. Depending on the task at hand, the relevant part may be that it's 2 pm right now, and it's time to go to dinner; or that it's Thursday, and so Subway's sub of the day is an Italian BMT. Instead of the whole timestamp, we use its *remainder* containing just the information we need: it is much easier to deal with 1- or 2-digit numbers than 11-digit ones. + +**Problem.** Today is Thursday. What day of the week will be exactly in a year? + +If we enumerate each day of the week, starting with Monday, from $0$ to $6$ inclusive, Thursday gets number $3$. To find out what day it is going to be in a year from now, we need to add $365$ to it and then reduce modulo $7$. Conveniently, $365 \bmod 7 = 1$, so we know that it will be Friday unless it is a leap year (in which case it will be Saturday). + +### Residues + +**Definition.** Two integers $a$ and $b$ are said to be *congruent* modulo $m$ if $m$ divides their difference: + +$$ +m \mid (a - b) \; \Longleftrightarrow \; a \equiv b \pmod m +$$ + +For example, the 42nd day of the year is the same weekday as the 161st since $(161 - 42) = 119 = 17 \times 7$. + +Congruence modulo $m$ is an equivalence relation that splits all integers into equivalence classes called *residues*. Each residue class modulo $m$ may be represented by any one of its members — although we commonly use the smallest nonnegative integer of that class (equal to the remainder $x \bmod m$ for all nonnegative $x$). + + + +*Modular arithmetic* studies these sets of residues, which are fundamental for number theory. 
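To make the residue arithmetic concrete, here is a minimal sketch (the `mod` helper is ours; unlike the plain `%` operator in C++, it always returns the smallest nonnegative representative, even for negative inputs):

```c++
#include <cstdio>

// smallest nonnegative representative of x modulo m
int mod(long long x, int m) {
    return int(((x % m) + m) % m);
}

int main() {
    printf("%d\n", mod(3 + 365, 7));      // 4 -> Friday, one year after a Thursday
    printf("%d\n", (161 - 42) % 7 == 0);  // 1 -> the 42nd and the 161st days are congruent mod 7
}
```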
+ +**Problem.** Our "week" now consists of $m$ days, and our year consists of $a$ days (no leap years). How many distinct days of the week there will be among one, two, three and so on whole years from now? + +For simplicity, assume that today is Monday, so that the initial day number $d_0$ is zero, and after each year, it changes to + +$$ +d_{k + 1} = (d_k + a) \bmod m +$$ + +After $k$ years, it will be + +$$ +d_k = k \cdot a \bmod m +$$ + +Since there are only $m$ days in a week, at some point, it will be Monday again, and the sequence of day numbers is going to cycle. The number of distinct days is the length of this cycle, so we need to find the smallest $k$ such that + +$$ +k \cdot a \equiv 0 \pmod m +$$ + +First of all, if $a \equiv 0$, it will be eternal Monday. Now, assuming the non-trivial case of $a \not \equiv 0$: + +- For a seven-day week, $m = 7$ is prime. There is no $k$ smaller than $m$ such that $k \cdot a$ is divisible by $m$ because $m$ can not be decomposed in such a product by the definition of primality. So, if $m$ is prime, we will cycle through all of $m$ weekdays. +- If $m$ is not prime, but $a$ is *coprime* with it (that is, $a$ and $m$ do not have common divisors), then the answer is still $m$ for the same reason: the divisors of $a$ do not help in zeroing out the product any faster. +- If $a$ and $m$ share some divisors, then it is only possible to get residues that are also divisible by them. For example, if the week is $m = 10$ days long, and the year has $a = 42$ or any other even number of days, then we will cycle through all even day numbers, and if the number of days is a multiple of $5$, then we will only oscillate between $0$ and $5$. Otherwise, we will go through all the $10$ remainders. + +Therefore, in general, the answer is $\frac{m}{\gcd(a, m)}$, where $\gcd(a, m)$ is the [greatest common divisor](/hpc/algorithms/gcd/) of $a$ and $m$. + +### Fermat's Theorem + +Now, consider what happens if, instead of adding a number $a$, we repeatedly multiply by it, writing out a sequence of + +$$ +d_n = a^n \bmod m +$$ + +Again, since there is a finite number of residues, there is going to be a cycle. But what will its length be? Turns out, if $m$ is prime, it will span all $(m - 1)$ non-zero residues. + +**Theorem.** For any $a$ and a prime $p$: + +$$ +a^p \equiv a \pmod p +$$ + +**Proof**. Let $P(x_1, x_2, \ldots, x_n) = \frac{k}{\prod (x_i!)}$ be the *multinomial coefficient:* the number of times the element $a_1^{x_1} a_2^{x_2} \ldots a_n^{x_n}$ appears after the expansion of $(a_1 + a_2 + \ldots + a_n)^k$. Then: + +$$ +\begin{aligned} +a^p &= (\underbrace{1+1+\ldots+1+1}_\text{$a$ times})^p & +\\\ &= \sum_{x_1+x_2+\ldots+x_a = p} P(x_1, x_2, \ldots, x_a) & \text{(by definition)} +\\\ &= \sum_{x_1+x_2+\ldots+x_a = p} \frac{p!}{x_1! x_2! \ldots x_a!} & \text{(which terms will not be divisible by $p$?)} +\\\ &\equiv P(p, 0, \ldots, 0) + \ldots + P(0, 0, \ldots, p) & \text{(everything else will be canceled)} +\\\ &= a +\end{aligned} +$$ + +Note that this is only true for prime $p$. We can use this fact to test whether a given number is prime faster than by factoring it: we can pick a number $a$ at random, calculate $a^{p} \bmod p$, and check whether it is equal to $a$ or not. 
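In code, the check might look like this (a sketch of our own: the `binpow` helper, the number of trials, and the assumption that $p$ fits into 32 bits so that the products do not overflow 64-bit integers are all illustrative choices):

```c++
#include <cstdlib>

typedef unsigned long long u64;

// a^n mod p (assumes p < 2^32 so that the products fit in 64 bits)
u64 binpow(u64 a, u64 n, u64 p) {
    u64 r = 1;
    while (n) {
        if (n & 1)
            r = r * a % p;
        a = a * a % p;
        n >>= 1;
    }
    return r;
}

// "false" means definitely composite, "true" means probably prime
bool probably_prime(u64 p, int trials = 10) {
    if (p < 4)
        return p == 2 || p == 3;
    for (int i = 0; i < trials; i++) {
        u64 a = 2 + rand() % (p - 3); // a random base in [2, p - 2]
        if (binpow(a, p, p) != a)
            return false;
    }
    return true;
}
```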
+ +This is called *Fermat primality test*, and it is probabilistic — only returning either "no" or "maybe" — since it may be that $a^p$ just happened to be equal to $a$ despite $p$ being composite, in which case you need to repeat the test with a different random $a$ until you are satisfied with the false positive probability. + +Primality tests are commonly used to generate large primes (for cryptographic purposes). There are roughly $\frac{n}{\ln n}$ primes among the first $n$ numbers (a fact that we are not going to prove), and they are distributed more or less evenly. One can just pick a random number from the required range, perform a primality check, and repeat until a prime is found, performing $O(\ln n)$ trials on average. + +An extremely bad input to the Fermat test is the [Carmichael numbers](https://en.wikipedia.org/wiki/Carmichael_number), which are composite numbers $n$ that satisfy $a^{n-1} \equiv 1 \pmod n$ for all relatively prime $a$. But these are [rare](https://oeis.org/A002997), and the chance of randomly bumping into it is low. + +### Modular Division + +Implementing most "normal" arithmetic operations with residues is straightforward. You only need to take care of integer overflows and remember to take modulo: + +```c++ +c = (a + b) % m; +c = (a - b + m) % m; +c = a * b % m; +``` + +But there is an issue with division: we can't just bluntly divide two residues. For example, $\frac{8}{2} = 4$, but + +$$ +\frac{8 \bmod 5}{2 \bmod 5} = \frac{3}{2} \neq 4 +$$ + +To perform modular division, we need to find an element that "acts" like the reciprocal $\frac{1}{a} = a^{-1}$ and multiply by it. This element is called a *modular multiplicative inverse*, and Fermat's theorem can help us find it when the modulo $p$ is a prime. When we divide its equivalence twice by $a$, we get: + +$$ +a^p \equiv a \implies a^{p-1} \equiv 1 \implies a^{p-2} \equiv a^{-1} +$$ + +Therefore, $a^{p-2}$ is like $a^{-1}$ for the purposes of multiplication, which is what we need from a modular inverse of $a$. diff --git a/content/english/hpc/number-theory/montgomery.md b/content/english/hpc/number-theory/montgomery.md index e784dfaf..0eeef0b0 100644 --- a/content/english/hpc/number-theory/montgomery.md +++ b/content/english/hpc/number-theory/montgomery.md @@ -1,102 +1,208 @@ --- title: Montgomery Multiplication -weight: 2 +weight: 4 +published: true --- -When we talked about [integers](../integer) in general, we discussed how to perform division and modulo by multiplication, and, unsurprisingly, in modular arithmetic 90% of its time is spent calculating modulo. Apart from using the general tricks described in the previous article, there is another method specifically for modular arithmetic, called *Montgomery multiplication*. +Unsurprisingly, a large fraction of computation in [modular arithmetic](../modular) is often spent on calculating the modulo operation, which is as slow as [general integer division](/hpc/arithmetic/division/) and typically takes 15-20 cycles, depending on the operand size. -As all other fast reduction methods, it doesn't come for free. It works only in *Montgomery space*, so we need to transform our numbers in and out of it before doing the multiplications. This means that on top of doing some compile-time computations, we would also need to do some operations before the multiplication. 
+The best way to deal this nuisance is to avoid modulo operation altogether, delaying or replacing it with [predication](/hpc/pipelining/branchless), which can be done, for example, when calculating modular sums: -For the space we need a positive integer $r \ge n$ coprime to $n$. In practice we always choose $r$ to be $2^m$ (with $m$ usually being equal 32 or 64), since multiplications, divisions and modulo $r$ operations can then be efficiently implemented using shifts and bitwise operations. Therefore $n$ needs to be an odd number so that every power of $2$ will be coprime to $n$. And if it is not, we can make it odd (?). +```cpp +const int M = 1e9 + 7; -The representative $\bar x$ of a number $x$ in the Montgomery space is defined as +// input: array of n integers in the [0, M) range +// output: sum modulo M +int slow_sum(int *a, int n) { + int s = 0; + for (int i = 0; i < n; i++) + s = (s + a[i]) % M; + return s; +} + +int fast_sum(int *a, int n) { + int s = 0; + for (int i = 0; i < n; i++) { + s += a[i]; // s < 2 * M + s = (s >= M ? s - M : s); // will be replaced with cmov + } + return s; +} + +int faster_sum(int *a, int n) { + long long s = 0; // 64-bit integer to handle overflow + for (int i = 0; i < n; i++) + s += a[i]; // will be vectorized + return s % M; +} +``` + +However, sometimes you only have a chain of modular multiplications, and there is no good way to eel out of computing the remainder of the division — other than with the [integer division tricks](../hpc/arithmetic/division/) requiring a constant modulo and some precomputation. + +But there is another technique designed specifically for modular arithmetic, called *Montgomery multiplication*. + +### Montgomery Space + +Montgomery multiplication works by first transforming the multipliers into *Montgomery space*, where modular multiplication can be performed cheaply, and then transforming them back when their actual values are needed. Unlike general integer division methods, Montgomery multiplication is not efficient for performing just one modular reduction and only becomes worthwhile when there is a chain of modular operations. + +The space is defined by the modulo $n$ and a positive integer $r \ge n$ coprime to $n$. The algorithm involves modulo and division by $r$, so in practice, $r$ is chosen to be $2^{32}$ or $2^{64}$, so that these operations can be done with a right-shift and a bitwise AND respectively. + + + +**Definition.** The *representative* $\bar x$ of a number $x$ in the Montgomery space is defined as $$ \bar{x} = x \cdot r \bmod n $$ -Note that the transformation is actually such a multiplication that we want to optimize, so it is still an expensive operation. However, we will only need to transform a number into the space once, perform as many operations as we want efficiently in that space and at the end transform the final result back, which should be profitable if we are doing lots of operations modulo $n$. +Computing this transformation involves a multiplication and a modulo — an expensive operation that we wanted to optimize away in the first place — which is why we only use this method when the overhead of transforming numbers to and from the Montgomery space is worth it and not for general modular multiplication. 
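To make the definition concrete, here is a tiny worked example with toy numbers of our own (not the values used elsewhere in this article): take $n = 7$ and $r = 2^4 = 16$. The representative of $x = 3$ is

$$
\bar{x} = 3 \cdot 16 \bmod 7 = 48 \bmod 7 = 6
$$

and transforming it back, $6 \cdot r^{-1} \bmod 7 = 6 \cdot 4 \bmod 7 = 3$ (here $r^{-1} = 4$ because $16 \cdot 4 = 64 \equiv 1 \pmod 7$), recovers the original value.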
+ + + +Inside the Montgomery space, addition, substraction, and checking for equality is performed as usual: + +$$ +x \cdot r + y \cdot r \equiv (x + y) \cdot r \bmod n +$$ -Inside the Montgomery space addition, substraction and checking for equality is performed as usual ($x \cdot r + y \cdot r \equiv (x + y) \cdot r \bmod n$). However, this is not the case for multiplication. Denoting multiplication in Montgomery space as $*$ and normal multiplication as $\cdot$, we expect the result to be: +However, this is not the case for multiplication. Denoting multiplication in the Montgomery space as $*$ and the "normal" multiplication as $\cdot$, we expect the result to be: $$ \bar{x} * \bar{y} = \overline{x \cdot y} = (x \cdot y) \cdot r \bmod n $$ -But the normal multiplication will give us: +But the normal multiplication in the Montgomery space yields: $$ \bar{x} \cdot \bar{y} = (x \cdot y) \cdot r \cdot r \bmod n $$ -Therefore the multiplication in the Montgomery space is defined as +Therefore, the multiplication in the Montgomery space is defined as $$ \bar{x} * \bar{y} = \bar{x} \cdot \bar{y} \cdot r^{-1} \bmod n $$ -This means that whenever we multiply two numbers, after the multiplication we need to *reduce* them. Therefore, we need to have an efficient way of calculating $x \cdot r^{-1} \bmod n$. +This means that, after we normally multiply two numbers in the Montgomery space, we need to *reduce* the result by multiplying it by $r^{-1}$ and taking the modulo — and there is an efficent way to do this particular operation. ### Montgomery reduction -Assume that $r=2^{64}$, the modulo $n$ is 64-bit and the number $x$ we need to reduce (multiply by $r^{-1}$) is 128-bit (the product of two 64-bit numbers). +Assume that $r=2^{32}$, the modulo $n$ is 32-bit, and the number $x$ we need to reduce is 64-bit (the product of two 32-bit numbers). Our goal is to calculate $y = x \cdot r^{-1} \bmod n$. -Because $\gcd(n, r) = 1$, we know that there are two numbers $r^{-1}$ and $n'$ in the $[0, n)$ range such that +Since $r$ is coprime with $n$, we know that there are two numbers $r^{-1}$ and $n^\prime$ in the $[0, n)$ range such that $$ -r \cdot r^{-1} + n \cdot n' = 1 +r \cdot r^{-1} + n \cdot n^\prime = 1 $$ -and both $r^{-1}$ and $n'$ can be computed using the extended Euclidean algorithm. +and both $r^{-1}$ and $n^\prime$ can be computed, e.g., using the [extended Euclidean algorithm](../euclid-extended). -Using this identity we can express $r \cdot r^{-1}$ as $(-n \cdot n' + 1)$ and write $x \cdot r^{-1}$ as +Using this identity, we can express $r \cdot r^{-1}$ as $(1 - n \cdot n^\prime)$ and write $x \cdot r^{-1}$ as $$ \begin{aligned} x \cdot r^{-1} &= x \cdot r \cdot r^{-1} / r -\\ &= x \cdot (-n \cdot n^{\prime} + 1) / r -\\ &= (-x \cdot n \cdot n^{\prime} + x) / r -\\ &\equiv (-x \cdot n \cdot n^{\prime} + l \cdot r \cdot n + x) / r \bmod n -\\ &\equiv ((-x \cdot n^{\prime} + l \cdot r) \cdot n + x) / r \bmod n +\\ &= x \cdot (1 - n \cdot n^{\prime}) / r +\\ &= (x - x \cdot n \cdot n^{\prime} ) / r +\\ &\equiv (x - x \cdot n \cdot n^{\prime} + k \cdot r \cdot n) / r &\pmod n &\;\;\text{(for any integer $k$)} +\\ &\equiv (x - (x \cdot n^{\prime} - k \cdot r) \cdot n) / r &\pmod n \end{aligned} $$ -The equivalences hold for any integer $l$. This means that we can add or subtract an arbitrary multiple of $r$ to $x \cdot n'$, or in other words, we can compute $q = x \cdot n'$ modulo $r$. 
+Now, if we choose $k$ to be $\lfloor x \cdot n^\prime / r \rfloor$ (the upper 64 bits of the $x \cdot n^\prime$ product), then $(x \cdot n^{\prime} - k \cdot r)$ will simply be equal to $x \cdot n^{\prime} \bmod r$ (the lower 32 bits of $x \cdot n^\prime$), implying:
+
+$$
+x \cdot r^{-1} \equiv (x - (x \cdot n^{\prime} \bmod r) \cdot n) / r
+$$
+
+The algorithm itself just evaluates this formula, performing two multiplications to calculate $q = x \cdot n^{\prime} \bmod r$ and $m = q \cdot n$, and then subtracts $m$ from $x$ and right-shifts the result to divide it by $r$.
+
+The only remaining thing to handle is that the result may not be in the $[0, n)$ range; but since
+
+$$
+x < n \cdot n < r \cdot n \implies x / r < n
+$$
+
+and
+
+$$
+m = q \cdot n < r \cdot n \implies m / r < n
+$$
+
+it is guaranteed that
+
+$$
+-n < (x - m) / r < n
+$$
+
+Therefore, we can simply check if the result is negative and, in that case, add $n$ to it, giving the following algorithm:
-This gives us the following algorithm to compute $x \cdot r^{-1} \bmod n$:
+```c++
+typedef __uint32_t u32;
+typedef __uint64_t u64;
-```python
-def reduce(x):
-    q = (x % r) * nr % r
-    a = (x - q * n) / r
-    if a < 0:
-        a += n
-    return a
+const u32 n = 1e9 + 7, nr = inverse(n, 1ull << 32);
+
+u32 reduce(u64 x) {
+    u32 q = u32(x) * nr;      // q = x * n' mod r
+    u64 m = (u64) q * n;      // m = q * n
+    u32 y = (x - m) >> 32;    // y = (x - m) / r
+    return x < m ? y + n : y; // if y < 0, add n to bring it into the [0, n) range
+}
```
-Since $x < n \cdot n < r \cdot n$ (as $x$ is a product of multiplicatio) and $q \cdot n < r \cdot n$, we know that $-n < (x - q \cdot n) / r < n$. Therefore the final modulo operation can be implemented using a single bound check and addition.
+This last check is relatively cheap, but it is still on the critical path. If we are fine with the result being in the $[0, 2 \cdot n - 2]$ range instead of $[0, n)$, we can remove it and add $n$ to the result unconditionally:
+
+```c++
+u32 reduce(u64 x) {
+    u32 q = u32(x) * nr;
+    u64 m = (u64) q * n;
+    u32 y = (x - m) >> 32;
+    return y + n;
+}
+```
+
+We can also move the `>> 32` operation one step earlier in the computation graph and compute $\lfloor x / r \rfloor - \lfloor m / r \rfloor$ instead of $(x - m) / r$. This is correct because the lower 32 bits of $x$ and $m$ are equal anyway since
+
+$$
+m \equiv x \cdot n^\prime \cdot n \equiv x \pmod r
+$$
+
+But why would we voluntarily choose to perform two right-shifts instead of just one? This is beneficial because for `((u64) q * n) >> 32` we need to do a 32-by-32 multiplication and take the upper 32 bits of the result (which the x86 `mul` instruction [already writes](/hpc/arithmetic/integer/#128-bit-integers) in a separate register, so it doesn't cost anything), and the other right-shift `x >> 32` is not on the critical path.
+ +```c++ +u32 reduce(u64 x) { + u32 q = u32(x) * nr; + u32 m = ((u64) q * n) >> 32; + return (x >> 32) + n - m; +} +``` -Here is an equivalent C implementation for 64-bit integers: +One of the main advantages of Montgomery multiplication over other modular reduction methods is that it doesn't require very large data types: it only needs a $r \times r$ multiplication that extracts the lower and higher $r$ bits of the result, which [has special support](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#ig_expand=7395,7392,7269,4868,7269,7269,1820,1835,6385,5051,4909,4918,5051,7269,6423,7410,150,2138,1829,1944,3009,1029,7077,519,5183,4462,4490,1944,5055,5012,5055&techs=AVX,AVX2&text=mul) on most hardware also makes it easily generalizable to [SIMD](../hpc/simd/) and larger data types: ```c++ -u64 reduce(u128 x) { +typedef __uint128_t u128; + +u64 reduce(u128 x) const { u64 q = u64(x) * nr; u64 m = ((u128) q * n) >> 64; - u64 xhi = (x >> 64); - if (xhi >= m) - return (xhi - m); - else - return (xhi - m) + n; + return (x >> 64) + n - m; } ``` -We also need to implement calculating calculating the inverse of $n$ (`nr`) and transformation of numbers in and our of Montgomery space. Before providing complete implementation, let's discuss how to do that smarter, although they are just done once. +Note that a 128-by-64 modulo is not possible with general integer division tricks: the compiler [falls back](https://godbolt.org/z/fbEE4v4qr) to calling a slow [long arithmetic library function](https://github.com/llvm-mirror/compiler-rt/blob/69445f095c22aac2388f939bedebf224a6efcdaf/lib/builtins/udivmodti4.c#L22) to support it. + +### Faster Inverse and Transform -To transfer a number back from the Montgomery space we can just use Montgomery reduction. +Montgomery multiplication itself is fast, but it requires some precomputation: -### Fast inverse +- inverting $n$ modulo $r$ to compute $n^\prime$, +- transforming a number *to* the Montgomery space, +- transforming a number *from* the Montgomery space. -For computing the inverse $n' = n^{-1} \bmod r$ more efficiently, we can use the following trick inspired from the Newton's method: +The last operation is already efficiently performed with the `reduce` procedure we just implemented, but the first two can be slightly optimized. + +**Computing the inverse** $n^\prime = n^{-1} \bmod r$ can be done faster than with the extended Euclidean algorithm by taking advantage of the fact that $r$ is a power of two and using the following identity: $$ a \cdot x \equiv 1 \bmod 2^k @@ -106,7 +212,7 @@ a \cdot x \cdot (2 - a \cdot x) 1 \bmod 2^{2k} $$ -This can be proven this way: +Proof: $$ \begin{aligned} @@ -119,47 +225,69 @@ a \cdot x \cdot (2 - a \cdot x) \end{aligned} $$ -This means we can start with $x = 1$ as the inverse of $a$ modulo $2^1$, apply the trick a few times and in each iteration we double the number of correct bits of $x$. - -### Fast transformation +We can start with $x = 1$ as the inverse of $a$ modulo $2^1$ and apply this identity exactly $\log_2 r$ times, each time doubling the number of bits in the inverse — somewhat reminiscent of [the Newton's method](../hpc/arithmetic/newton/). 
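As a standalone sketch (for $r = 2^{32}$; the function name is ours, and the same five-iteration update appears in the constructor of the complete implementation below):

```c++
typedef __uint32_t u32;

// n' = n^(-1) mod 2^32 for an odd n
u32 inverse32(u32 n) {
    u32 x = 1;                  // x is the inverse of n modulo 2^1
    for (int i = 0; i < 5; i++) // 2 -> 4 -> 8 -> 16 -> 32 correct bits
        x = x * (2 - n * x);    // unsigned arithmetic wraps around 2^32, which is exactly what we need
    return x;
}
```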
-Although we can just multiply a number by $r$ and compute one modulo the usual way, there is a faster way that makes use of the following relation:
+**Transforming** a number into the Montgomery space can be done by multiplying it by $r$ and computing modulo [the usual way](/hpc/arithmetic/division/), but we can also take advantage of this relation:
 
 $$
 \bar{x} = x \cdot r \bmod n = x * r^2
 $$
 
-Transforming a number into the space is just a multiplication inside the space of the number with $r^2$. Therefore we can precompute $r^2 \bmod n$ and just perform a multiplication and reduction instead.
+Transforming a number into the space is just a multiplication by $r^2$. Therefore, we can precompute $r^2 \bmod n$ and perform a multiplication and reduction instead — which may or may not be actually faster because multiplying a number by $r=2^{k}$ can be implemented with a left-shift, while multiplication by $r^2 \bmod n$ can not.
 
 ### Complete Implementation
 
+It is convenient to wrap everything into a single `constexpr` structure:
+
 ```c++
-// TODO fix me and prettify me
-struct montgomery {
-    u64 n, nr;
+struct Montgomery {
+    u32 n, nr;
     
-    montgomery(u64 n) : n(n) {
-        nr = 1;
-        for (int i = 0; i < 6; i++)
+    constexpr Montgomery(u32 n) : n(n), nr(1) {
+        // log2(32) = 5 iterations are enough for a 32-bit inverse
+        for (int i = 0; i < 5; i++)
             nr *= 2 - n * nr;
     }
 
-    u64 reduce(u128 x) {
-        u64 q = u64(x) * nr;
-        u64 m = ((u128) q * n) >> 64;
-        u64 xhi = (x >> 64);
-        if (xhi >= m)
-            return (xhi - m);
-        else
-            return (xhi - m) + n;
+    u32 reduce(u64 x) const {
+        u32 q = u32(x) * nr;
+        u32 m = ((u64) q * n) >> 32;
+        return (x >> 32) + n - m;
+        // returns a number in the [0, 2 * n - 2] range
+        // (add a "x < n ? x : x - n" type of check if you need a proper modulo)
     }
 
-    u64 mult(u64 x, u64 y) {
-        return reduce((u128) x * y);
+    u32 multiply(u32 x, u32 y) const {
+        return reduce((u64) x * y);
     }
 
-    u64 transform(u64 x) {
-        return (u128(x) << 64) % n;
+    u32 transform(u32 x) const {
+        return (u64(x) << 32) % n;
+        // can also be implemented as multiply(x, r^2 mod n)
     }
 };
 ```
+
+To test its performance, we can plug Montgomery multiplication into the [binary exponentiation](../exponentiation/):
+
+```c++
+constexpr Montgomery space(M);
+
+int inverse(int _a) {
+    u64 a = space.transform(_a);
+    u64 r = space.transform(1);
+    
+    #pragma GCC unroll(30)
+    for (int l = 0; l < 30; l++) {
+        if ( (M - 2) >> l & 1 )
+            r = space.multiply(r, a);
+        a = space.multiply(a, a);
+    }
+
+    return space.reduce(r);
+}
+```
+
+While vanilla binary exponentiation with a compiler-generated fast modulo trick requires ~170ns per `inverse` call, this implementation takes ~166ns, going down to ~158ns if we omit `transform` and `reduce` (a reasonable use case is for `inverse` to be used as a subprocedure in a bigger modular computation). This is a small improvement, but Montgomery multiplication becomes much more advantageous for SIMD applications and larger data types.
+
+**Exercise.** Implement efficient *modular* [matrix multiplication](/hpc/algorithms/matmul).
diff --git a/content/english/hpc/parallel/concurrency/fibers.md b/content/english/hpc/parallel/concurrency/fibers.md
index 2ec2806c..cce7b860 100644
--- a/content/english/hpc/parallel/concurrency/fibers.md
+++ b/content/english/hpc/parallel/concurrency/fibers.md
@@ -28,4 +28,4 @@ func main() {
 The way they work is that the language maintains a group of threads ready to pick up from where they left. This is called N:M scheduling.
 
-Similar runtimes exist for other languages, e. g. for C++ and Rust.
+Similar runtimes exist for other languages, e.g., for C++ and Rust. diff --git a/content/english/hpc/parallel/gpu/_index.en.md b/content/english/hpc/parallel/gpu/_index.en.md index aafb7ba1..ac2a4aa9 100644 --- a/content/english/hpc/parallel/gpu/_index.en.md +++ b/content/english/hpc/parallel/gpu/_index.en.md @@ -73,7 +73,7 @@ CUDA is available for many languages. Nice documentation can be found here: https://documen.tician.de/pycuda/index.html -If you are on Colab, go to Runtime -> Change runtime type -> Hardware accelerator and set it to "GPU". +If you are on Colab, go to Runtime -> Change runtime type -> Hardware accelerator and set it to "GPU." ```python @@ -167,7 +167,7 @@ There is also `drv.InOut` function, which makes it available for both reading an Most of the operations here are memory operations, so measuring performance here is useless. Don't worry, we will get to more complex examples soon enough. -GPUs have very specific operations. However, in case of NVIDIA GPUs managing it is quite simple: the cards have *compute capabilities* (1.0, 1.1, 1.2, 1.3, 2.0, etc.) and all features added at capability $x$ is also available at later versions. These can be checked at run-time or compile-time. +GPUs have very specific operations. However, in case of NVIDIA GPUs managing it is quite simple: the cards have *compute capabilities* (1.0, 1.1, 1.2, 1.3, 2.0, etc.) and all features added at capability $x$ is also available at later versions. These can be checked at run time or compile time. You can check differences in this Wikipedia article: https://en.wikipedia.org/wiki/CUDA#Version_features_and_specifications @@ -195,7 +195,7 @@ Some tasks, especially in cryptography, cannot be parallelized. But some can. ## Summing arrays in $O(\log n)$ time -Assume we want to perform some associative (i. e. $A*(B*C) = (A*B)*C$) operation on an array of $n$ elements. Say, sum it up. +Assume we want to perform some associative (i.e., $A*(B*C) = (A*B)*C$) operation on an array of $n$ elements. Say, sum it up. Normally, we would do that with a simple loop: @@ -418,7 +418,7 @@ Intrinsics for that. Now, a lot of value comes from cryptocurrency and deep learning. The latter relies on two specific operations: matrix multiplications for linear layers and convolutions for convolutional layers used in computer vision. -First, they introduced "multiply-accumulate" operation (e. g. `x += y * z`) per 1 GPU clock cycle. +First, they introduced "multiply-accumulate" operation (e.g., `x += y * z`) per 1 GPU clock cycle. Google uses Tensor Processing Units. Nobody really knows how they work (proprietary hardware that they rent, not sell). @@ -431,7 +431,7 @@ Well, you don't really need anything more precise than that for deep learning an It is called mixed precision because input matrices are fp16 but multiplication result and accumulator are fp32 matrices. -Probably, the proper name would be "4x4 matrix cores", however NVIDIA marketing team decided to use "tensor cores". +Probably, the proper name would be "4x4 matrix cores," however NVIDIA marketing team decided to use "tensor cores." So, see, this is not exactly fair comparison. 
diff --git a/content/english/hpc/pipelining/_index.md b/content/english/hpc/pipelining/_index.md index 3d7d49b5..aab72d79 100644 --- a/content/english/hpc/pipelining/_index.md +++ b/content/english/hpc/pipelining/_index.md @@ -5,7 +5,7 @@ weight: 3 When programmers hear the word *parallelism*, they mostly think about *multi-core parallelism*, the practice of explicitly splitting a computation into semi-independent *threads* that work together to solve a common problem. -This type of parallelism is mainly about reducing *latency* and achieving *scalability*, but not about improving *efficiency*. You can solve a problem ten times as big with a parallel algorithm, but it would take at least ten times as many computational resources. Although parallel hardware is becoming [ever more abundant](/hpc/complexity/hardware), and parallel algorithm design is becoming an increasingly more important area, for now, we will consider the use of more than one CPU core cheating. +This type of parallelism is mainly about reducing *latency* and achieving *scalability*, but not about improving *efficiency*. You can solve a problem ten times as big with a parallel algorithm, but it would take at least ten times as many computational resources. Although parallel hardware is becoming [ever more abundant](/hpc/complexity/hardware) and parallel algorithm design is becoming an increasingly important area, for now, we will limit ourselves to considering only a single CPU core. But there are other types of parallelism, already existing inside a CPU core, that you can use *for free*. @@ -19,9 +19,9 @@ Parallelism helps in reducing *latency*. It is important, but for now, our main Sharing computations is an art in itself, but for now, we want to learn how to use resources that we already have more efficiently. -While multi-core parallelism is "cheating", many form of parallelism exist "for free". +While multi-core parallelism is "cheating," many form of parallelism exist "for free." -Adapting algorithms for parallel hardware is important for achieving *scalability*. In the first part of this book, we will consider this technique "cheating". We only do optimizations that are truly free, and preferably don't take away resources from other processes that might be running concurrently. +Adapting algorithms for parallel hardware is important for achieving *scalability*. In the first part of this book, we will consider this technique "cheating." We only do optimizations that are truly free, and preferably don't take away resources from other processes that might be running concurrently. --> @@ -42,16 +42,16 @@ Pipelining does not reduce *actual* latency but functionally makes it seem like Having this in mind, hardware manufacturers prefer to use *cycles per instruction* (CPI) instead of something like "average instruction latency" as the main performance indicator for CPU designs. It is a [pretty good metric](/hpc/profiling/benchmarking) for algorithm designs too, if we only consider *useful* instructions. -CPI of a perfectly pipelined processor should tend to one, but it can actually be even lower if we make each stage of the pipeline "wider" by duplicating it, so that more than one instruction can be processed at a time. Because the cache and most of the ALU can be shared, this ends up being cheaper than adding a fully separate core. Such architectures, capable of executing more than one instruction per cycle, are called *superscalar*, and most modern CPUs are. 
+The CPI of a perfectly pipelined processor should tend to one, but it can actually be even lower if we make each stage of the pipeline "wider" by duplicating it, so that more than one instruction can be processed at a time. Because the cache and most of the ALU can be shared, this ends up being cheaper than adding a fully separate core. Such architectures, capable of executing more than one instruction per cycle, are called *superscalar*, and most modern CPUs are. -You can only take advantage of superscalar processing if the stream of instructions contains groups of logically independent operations that can be processed separately. The instructions don't always arrive in the most convenient order, so, when possible, modern CPUs can execute them *out-of-order* to improve overall utilization and minimize pipeline stalls. How this magic works is a topic for a more advanced discussion, but for now, you can assume that the CPU maintains a buffer of pending instructions up to some distance in the future, and executes them as soon as the values of its operands are computed and there is an execution unit available. +You can only take advantage of superscalar processing if the stream of instructions contains groups of logically independent operations that can be processed separately. The instructions don't always arrive in the most convenient order, so, when possible, modern CPUs can execute them *out of order* to improve overall utilization and minimize pipeline stalls. How this magic works is a topic for a more advanced discussion, but for now, you can assume that the CPU maintains a buffer of pending instructions up to some distance in the future, and executes them as soon as the values of its operands are computed and there is an execution unit available. ### An Education Analogy Consider how our education system works: 1. Topics are taught to groups of students instead of individuals as broadcasting the same things to everyone at once is more efficient. -2. An intake of students is split into groups lead by different teachers; assignments and other course materials are shared between groups. +2. An intake of students is split into groups led by different teachers; assignments and other course materials are shared between groups. 3. Each year the same course is taught to a new intake so that the teachers are kept busy. These innovations greatly increase the *throughput* of the whole system, although the *latency* (time to graduation for a particular student) remains unchanged (and maybe increases a little bit because personalized tutoring is more effective). @@ -62,7 +62,7 @@ You can find many analogies with modern CPUs: 2. There are multiple execution units that can process these instructions simultaneously while sharing other CPU facilities (usually 2-4 execution units). 3. Instructions are processed in pipelined fashion (saving roughly the same number of cycles as the number of years between kindergarten and PhD). - + In addition to that, several other aspects also match: diff --git a/content/english/hpc/pipelining/branching.md b/content/english/hpc/pipelining/branching.md index 849e75a0..08d7887d 100644 --- a/content/english/hpc/pipelining/branching.md +++ b/content/english/hpc/pipelining/branching.md @@ -45,17 +45,17 @@ body: jmp counter ``` -Our goal is to simulate a completely unpredictable branch, and we successfully achieve it: the code takes ~14 CPU cycles per element. 
For a very rough estimate of what it is supposed to be, we can assume that the branches alternate between "<" and ">=", and the pipeline is mispredicted every other iteration. Then, every two iterations: +Our goal is to simulate a completely unpredictable branch, and we successfully achieve it: the code takes ~14 CPU cycles per element. For a very rough estimate of what it is supposed to be, we can assume that the branches alternate between `<` and `>=`, and the pipeline is mispredicted every other iteration. Then, every two iterations: -- We discard the pipeline, which is 19 cycles deep on Zen 2 (i. e. it has 19 stages, each taking one cycle). +- We discard the pipeline, which is 19 cycles deep on Zen 2 (i.e., it has 19 stages, each taking one cycle). - We need a memory fetch and a comparison, which costs ~5 cycles. We can check the conditions of even and odd iterations concurrently, so let's assume we only pay it once per 2 iterations. -- In the case of the "<" branch, we need another ~4 cycles to add `a[i]` to a volatile (memory-stored) variable `s`. +- In the case of the `<` branch, we need another ~4 cycles to add `a[i]` to a volatile (memory-stored) variable `s`. Therefore, on average, we need to spend $(4 + 5 + 19) / 2 = 14$ cycles per element, matching what we measured. ### Branch Prediction -We can replace the hardcoded `50` with a tweakable parameter `P` that effectively sets the probability of the "<" branch: +We can replace the hardcoded `50` with a tweakable parameter `P` that effectively sets the probability of the `<` branch: ```c++ for (int i = 0; i < N; i++) @@ -69,7 +69,7 @@ Now, if we benchmark it for different values of `P`, we get an interesting-looki Its peak is at 50-55%, as expected: branch misprediction is the most expensive thing here. This graph is asymmetrical: it takes just ~1 cycle to only check conditions that are never satisfied (`P = 0`), and ~7 cycles for the sum if the branch is always taken (`P = 100`). -This graph is not unimodal: there is another local minimum at around 85-90%. We spend ~6.15 cycles per element there or about 10-15% faster than when we always take the branch, accounting for the fact that we need to perform fewer additions. Branch misprediction stops affecting the performance at this point because when it happens, not the whole instruction buffer is discarded, but only the operations that were speculatively scheduled. Essentially, that 10-15% mispredict rate is the equilibrium point where we can see far enough in the pipeline not to stall but still save 10-15% on taking the cheaper ">=" branch. +This graph is not unimodal: there is another local minimum at around 85-90%. We spend ~6.15 cycles per element there or about 10-15% faster than when we always take the branch, accounting for the fact that we need to perform fewer additions. Branch misprediction stops affecting the performance at this point because when it happens, not the whole instruction buffer is discarded, but only the operations that were speculatively scheduled. Essentially, that 10-15% mispredict rate is the equilibrium point where we can see far enough in the pipeline not to stall but still save 10-15% on taking the cheaper `>=` branch. Note that it costs almost nothing to check for a condition that never or almost never occurs. This is why programmers use runtime exceptions and base case checks so profusely: if they are indeed rare, they don't really cost anything. 
@@ -86,9 +86,9 @@ for (int i = 0; i < N; i++) std::sort(a, a + n); ``` -We are still processing the same elements, but in a different order, and instead of 14 cycles, it now runs in a little bit more than 4, which is exactly the average of the cost of the pure "<" and ">=" branches. +We are still processing the same elements, but in a different order, and instead of 14 cycles, it now runs in a little bit more than 4, which is exactly the average of the cost of the pure `<` and `>=` branches. -The branch predictor can pick up on much more complicated patterns than just "always left, then always right" or "left-right-left-right". If we just decrease the size of the array $N$ to 1000 (without sorting it), then the branch predictor memorizes the entire sequence of comparisons, and the benchmark again measures at around 4 cycles — in fact, even slightly fewer than in the sorted array case, because in the former case branch predictor needs to spend some time flicking between the "always yes" and "always no" states. +The branch predictor can pick up on much more complicated patterns than just "always left, then always right" or "left-right-left-right." If we just decrease the size of the array $N$ to 1000 (without sorting it), then the branch predictor memorizes the entire sequence of comparisons, and the benchmark again measures at around 4 cycles — in fact, even slightly fewer than in the sorted array case, because in the former case branch predictor needs to spend some time flicking between the "always yes" and "always no" states. ### Hinting Likeliness of Branches diff --git a/content/english/hpc/pipelining/branchless.md b/content/english/hpc/pipelining/branchless.md index 62f0aa2f..31bd5a39 100644 --- a/content/english/hpc/pipelining/branchless.md +++ b/content/english/hpc/pipelining/branchless.md @@ -28,11 +28,11 @@ for (int i = 0; i < N; i++) s += (a[i] < 50) * a[i]; ``` -Suddenly, the loop now takes ~7 cycles per element instead of the original ~14. Also, the performance remains constant if we change `50` to some other threshold, so it doesn't depend on the branch probability. +The loop now takes ~7 cycles per element instead of the original ~14. Also, the performance remains constant if we change `50` to some other threshold, so it doesn't depend on the branch probability. But wait… shouldn't there still be a branch? How does `(a[i] < 50)` map to assembly? -There are no boolean types in assembly, nor any instructions that yield either one or zero based on the result of the comparison, but we can compute it indirectly like this: `(a[i] - 50) >> 31`. This trick relies on the [binary representation of integers](/hpc/arithmetic/integer), specifically on the fact that if the expression `a[i] - 50` is negative (implying `a[i] < 50`), then the highest bit of the result will be set to one, which we can then extract using a right-shift. +There are no Boolean types in assembly, nor any instructions that yield either one or zero based on the result of the comparison, but we can compute it indirectly like this: `(a[i] - 50) >> 31`. This trick relies on the [binary representation of integers](/hpc/arithmetic/integer), specifically on the fact that if the expression `a[i] - 50` is negative (implying `a[i] < 50`), then the highest bit of the result will be set to one, which we can then extract using a right-shift. 
```nasm mov ebx, eax ; t = x @@ -41,7 +41,7 @@ sar ebx, 31 ; t >>= 31 imul eax, ebx ; x *= t ``` -Another, more complicated way to implement this whole sequence is to convert this sign bit into a mask and then use bitwise `and` instead of multiplication: `((a[i] - 50) >> 1 - 1) & a`. This makes the whole sequence one cycle faster, considering that unlike other instructions, `imul` takes 3 cycles: +Another, more complicated way to implement this whole sequence is to convert this sign bit into a mask and then use bitwise `and` instead of multiplication: `((a[i] - 50) >> 31 - 1) & a[i]`. This makes the whole sequence one cycle faster, considering that, unlike other instructions, `imul` takes 3 cycles: ```nasm mov ebx, eax ; t = x @@ -91,9 +91,9 @@ $$ This way you can eliminate branching, but this comes at the cost of evaluating *both* branches and the `cmov` itself. Because evaluating the ">=" branch costs nothing, the performance is exactly equal to [the "always yes" case](../branching/#branch-prediction) in the branchy version. -### When It Is Beneficial +### When Predication Is Beneficial -Using predication eliminates [a structural hazard](../hazard) but introduces a data hazard. There is still a pipeline stall, but it is a cheaper one: you only need to wait for `cmov` to be resolved and not flush the entire pipeline in case of a mispredict. +Using predication eliminates [a control hazard](../hazards) but introduces a data hazard. There is still a pipeline stall, but it is a cheaper one: you only need to wait for `cmov` to be resolved and not flush the entire pipeline in case of a mispredict. However, there are many situations when it is more efficient to leave branchy code as it is. This is the case when the cost of computing *both* branches instead of just *one* outweighs the penalty for the potential branch mispredictions. @@ -101,9 +101,9 @@ In our example, the branchy code wins when the branch can be predicted with a pr ![](../img/branchy-vs-branchless.svg) -This 75% threshold is commonly used by the compilers as a heuristic for determining whether to use the `cmov` or not. Unfortunately, this probability is usually unknown at the compile-time, so it needs to be provided in one of several ways: +This 75% threshold is commonly used by the compilers as a heuristic for determining whether to use the `cmov` or not. Unfortunately, this probability is usually unknown at the compile time, so it needs to be provided in one of several ways: -- We can use [profile-guided optimization](/hpc/compilation/pgo) which will decide for itself whether to use predication or not. +- We can use [profile-guided optimization](/hpc/compilation/situational/#profile-guided-optimization) which will decide for itself whether to use predication or not. - We can use [likeliness attributes](../branching#hinting-likeliness-of-branches) and [compiler-specific intrinsics](/hpc/compilation/situational) to hint at the likeliness of branches: `__builtin_expect_with_probability` in GCC and `__builtin_unpredictable` in Clang. - We can rewrite branchy code using the ternary operator or various arithmetic tricks, which acts as sort of an implicit contract between programmers and compilers: if the programmer wrote the code this way, then it was probably meant to be branchless. 
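For illustration, the hints from the list above might be applied to our running example like this (a sketch reusing `a`, `s`, and `N` from before; the builtin is the GCC one named in the second bullet, and the 0.75 probability is an arbitrary placeholder):

```c++
for (int i = 0; i < N; i++)
    if (__builtin_expect_with_probability(a[i] < 50, 1, 0.75))
        s += a[i];
```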
@@ -180,11 +180,11 @@ int abs(int a) { ### Larger Examples -**Strings.** Oversimplifying things, an `std::string` is comprised of a pointer to a null-terminated char array (also known as "C-string") allocated somewhere on the heap and one integer containing the string size. +**Strings.** Oversimplifying things, an `std::string` is comprised of a pointer to a null-terminated `char` array (also known as a "C-string") allocated somewhere on the heap and one integer containing the string size. -A very common value for strings is the empty string — which is also its default value. You also need to handle them somehow, and the idiomatic thing to do is to assign `nullptr` as the pointer and `0` as the string size, and then check if the pointer is null or if the size is zero at the beginning of every procedure involving strings. +A common value for a string is the empty string — which is also its default value. You also need to handle them somehow, and the idiomatic approach is to assign `nullptr` as the pointer and `0` as the string size, and then check if the pointer is null or if the size is zero at the beginning of every procedure involving strings. -However, this requires a separate branch, which is costly unless most strings are empty. What we can do to get rid of it is to allocate a "zero C-string", which is just a zero byte allocated somewhere, and then simply point all empty strings there. Now all string operations with empty strings have to read this useless zero byte, but this is still much cheaper than a branch misprediction. +However, this requires a separate branch, which is costly (unless the majority of strings are either empty or non-empty). To remove the check and thus also the branch, we can allocate a "zero C-string," which is just a zero byte allocated somewhere, and then simply point all empty strings there. Now all string operations with empty strings have to read this useless zero byte, but this is still much cheaper than a branch misprediction. **Binary search.** The standard binary search [can be implemented](/hpc/data-structures/binary-search) without branches, and on small arrays (that fit into cache) it works ~4x faster than the branchy `std::lower_bound`: @@ -193,10 +193,10 @@ int lower_bound(int x) { int *base = t, len = n; while (len > 1) { int half = len / 2; - base = (base[half] < x ? &base[half] : base); + base += (base[half - 1] < x) * half; // will be replaced with a "cmov" len -= half; } - return *(base + (*base < x)); + return *base; } ``` @@ -216,9 +216,9 @@ That there are no substantial reasons why compilers can't do this on their own, --> -**Data-parallel programming.** Branchless programming is very important for [SIMD](/hpc/simd) applications, including GPU programming, because they don't have branching in the first place. +**Data-parallel programming.** Branchless programming is very important for [SIMD](/hpc/simd) applications because they don't have branching in the first place. -In our array sum example, if you remove the `volatile` type qualifier from the accumulator, the compiler becomes able to [vectorize](/hpc/simd/auto-vectorization) the loop: +In our array sum example, removing the `volatile` type qualifier from the accumulator allows the compiler to [vectorize](/hpc/simd/auto-vectorization) the loop: ```c++ /* volatile */ int s = 0; @@ -230,7 +230,7 @@ for (int i = 0; i < N; i++) It now works in ~0.3 per element, which is mainly [bottlenecked by the memory](/hpc/cpu-cache/bandwidth). 
-The compiler is usually able to vectorize any loop that doesn't have branches or dependencies between the iterations — and some specific deviations from that, such as [reductions](/hpc/simd/reduction) or simple loops that contain just one if-without-else. Vectorization of anything more complex is a very nontrivial problem, which may involve various techniques such as [masking](/hpc/simd/masking) and [in-register permutations](/hpc/simd/shuffling). +The compiler is usually able to vectorize any loop that doesn't have branches or dependencies between the iterations — and some specific small deviations from that, such as [reductions](/hpc/simd/reduction) or simple loops that contain just one if-without-else. Vectorization of anything more complex is a very nontrivial problem, which may involve various techniques such as [masking](/hpc/simd/masking) and [in-register permutations](/hpc/simd/shuffling). -Interleaving the stages of execution is a general idea in digital electronics, and it is applied not only in the main CPU pipeline, but also on the level of separate instructions and [memory](/hpc/cpu-cache/mlp). Most execution units have their own little pipelines, and can take another instruction just one or two cycles after the previous one. If a certain instruction is frequently used, it makes sense to duplicate its execution unit also, and also place frequently jointly used instructions on the same execution unit: e. g. not using the same for arithmetic and memory operation. +Interleaving the stages of execution is a general idea in digital electronics, and it is applied not only in the main CPU pipeline, but also on the level of separate instructions and [memory](/hpc/cpu-cache/mlp). Most execution units have their own little pipelines, and can take another instruction just one or two cycles after the previous one. If a certain instruction is frequently used, it makes sense to duplicate its execution unit also, and also place frequently jointly used instructions on the same execution unit: e.g., not using the same for arithmetic and memory operation. ### Microcode @@ -22,9 +22,9 @@ While complex instruction sets had the benefit, with superscalar processors you Instructions are microcoded. -uOps ("micro-ops", the first letter is meant to be greek letter mu as in us (microsecond), but nobody cares enough to type it). +uOps ("micro-ops," the first letter is meant to be greek letter mu as in us (microsecond), but nobody cares enough to type it). -Each architecture has its own set of "ports", each capable of executing its own set of instructions (uOps, to be more exact). +Each architecture has its own set of "ports," each capable of executing its own set of instructions (uOps, to be more exact). But still, when you use it, it appears and feels like a single instruction. How does CPU achieve that? diff --git a/content/english/hpc/pipelining/tables.md b/content/english/hpc/pipelining/tables.md index d18d99c6..ad90c400 100644 --- a/content/english/hpc/pipelining/tables.md +++ b/content/english/hpc/pipelining/tables.md @@ -14,7 +14,7 @@ In this context, it makes sense to use two different "[costs](/hpc/complexity)" -You can get latency and throughput numbers for a specific architecture from special documents called [instruction tables](https://www.agner.org/optimize/instruction_tables.pdf). 
Here are some samples values for my Zen 2 (all specified for 32-bit operands, if there is any difference): +You can get latency and throughput numbers for a specific architecture from special documents called [instruction tables](https://www.agner.org/optimize/instruction_tables.pdf). Here are some sample values for my Zen 2 (all specified for 32-bit operands, if there is any difference): | Instruction | Latency | RThroughput | |-------------|---------|:------------| @@ -30,11 +30,11 @@ You can get latency and throughput numbers for a specific architecture from spec Some comments: -- Because our minds are so used to the cost model where "more" means "worse", people mostly use *reciprocals* of throughput instead of throughput. +- Because our minds are so used to the cost model where "more" means "worse," people mostly use *reciprocals* of throughput instead of throughput. - If a certain instruction is especially frequent, its execution unit could be duplicated to increase its throughput — possibly to even more than one, but not higher than the [decode width](/hpc/architecture/layout). - Some instructions have a latency of 0. This means that these instruction are used to control the scheduler and don't reach the execution stage. They still have non-zero reciprocal throughput because the [CPU front-end](/hpc/architecture/layout) still needs to process them. -- Most instructions are pipelined, and if they have the reciprocal throughput of $n$, this usually means that their execution unit can take another instruction after $n$ cycles (and if it is below 1, this means that there are multiple execution units, all capable of taking another instruction on the next cycle). One notable exception is the [integer division](/hpc/arithmetic/division): it is either very poorly pipelined or not pipelined at all. -- Some instructions have variable latency, depending on not only the size, but also the values of the operands. For memory operations (including fused ones like `add`), latency is usually specified for the best case (an L1 cache hit). +- Most instructions are pipelined, and if they have the reciprocal throughput of $n$, this usually means that their execution unit can take another instruction after $n$ cycles (and if it is below 1, this means that there are multiple execution units, all capable of taking another instruction on the next cycle). One notable exception is [integer division](/hpc/arithmetic/division): it is either very poorly pipelined or not pipelined at all. +- Some instructions have variable latency, depending on not only the size, but also the values of the operands. For memory operations (including fused ones like `add`), the latency is usually specified for the best case (an L1 cache hit). There are many more important little details, but this mental model will suffice for now. diff --git a/content/english/hpc/pipelining/throughput.md b/content/english/hpc/pipelining/throughput.md index ffb6b762..0b596404 100644 --- a/content/english/hpc/pipelining/throughput.md +++ b/content/english/hpc/pipelining/throughput.md @@ -6,7 +6,7 @@ weight: 4 Optimizing for *latency* is usually quite different from optimizing for *throughput*: - When optimizing data structure queries or small one-time or branchy algorithms, you need to [look up the latencies](../tables) of its instructions, mentally construct the execution graph of the computation, and then try to reorganize it so that the critical path is shorter. 
-- When optimizing hot loops and large-dataset algorithms, you need to look up the throughputs of its instructions, count how many times each one is used per iteration, determine which of them is the bottleneck, and then try to restructure the loop so that it is used less often. +- When optimizing hot loops and large-dataset algorithms, you need to look up the throughputs of their instructions, count how many times each one is used per iteration, determine which of them is the bottleneck, and then try to restructure the loop so that it is used less often. The last advice only works for *data-parallel* loops, where each iteration is fully independent of the previous one. When there is some interdependency between consecutive iterations, there may potentially be a pipeline stall caused by a [data hazard](../hazards) as the next iteration is waiting for the previous one to complete. @@ -21,7 +21,7 @@ for (int i = 0; i < n; i++) s += a[i]; ``` -Let's assume for a moment that the compiler doesn't [vectorize](/hpc/simd) this loop, [the memory bandwidth](/hpc/memory/bandwidth) isn't a concern, and that the loop is [unrolled](/hpc/architecture/loops) so that we don't pay any additional cost associated with maintaining the loop variables. In this case, the computation becomes very simple: +Let's assume for a moment that the compiler doesn't [vectorize](/hpc/simd) this loop, [the memory bandwidth](/hpc/cpu-cache/bandwidth) isn't a concern, and that the loop is [unrolled](/hpc/architecture/loops) so that we don't pay any additional cost associated with maintaining the loop variables. In this case, the computation becomes very simple: ```c++ int s = 0; @@ -64,7 +64,7 @@ If an instruction has a latency of $x$ and a throughput of $y$, then you would n This technique is mostly used with [SIMD](/hpc/simd) and not in scalar code. You can [generalize](/hpc/simd/reduction) the code above and compute sums and other reductions faster than the compiler. -In general, when optimizing loops, you usually have just one or a few *execution ports* that you want to utilize to their fullest, and you engineer the rest of the loop around them. As different instructions may use different sets of ports, it is not always clear which one is going to be the overused. In situations like this, [machine code analyzers](/hpc/profiling/mca) can be very helpful for finding bottlenecks of small assembly loops. +In general, when optimizing loops, you usually have just one or a few *execution ports* that you want to utilize to their fullest, and you engineer the rest of the loop around them. As different instructions may use different sets of ports, it is not always clear which one is going to be overused. In situations like this, [machine code analyzers](/hpc/profiling/mca) can be very helpful for finding the bottlenecks of small assembly loops. + *Instrumentation* is an overcomplicated term that means inserting timers and other tracking code into programs. The simplest example is using the `time` utility in Unix-like systems to measure the duration of execution for the whole program. More generally, we want to know *which parts* of the program need optimization. 
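To round off the throughput discussion before we turn to profiling: as a concrete instance of the "you need $x \cdot y$ parallel instances" rule, the scalar array sum can be split into two independent chains of additions, so that a new addition can be issued while the previous one is still completing. This is only a sketch, and it assumes the compiler does not already vectorize or re-associate the loop on its own:

```c++
// Two independent accumulators hide the 1-cycle latency of scalar addition:
// the adds to s0 and s1 do not depend on each other and can overlap.
int sum_interleaved(const int *a, int n) {
    int s0 = 0, s1 = 0;
    int i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)        // odd-length arrays leave one element over
        s0 += a[i];
    return s0 + s1;
}
```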
There are tools shipped with compilers and IDEs that can time designated functions automatically, but it is more robust to do it by hand using any methods of interacting with time that the language provides: @@ -77,4 +80,4 @@ void query() { This way we can remove the need to sample a new random number on each invocation, only resetting the counter when we choose to calculate statistics. -Techniques like that are frequently by library algorithm developers inside large projects to collect profiling data without affecting the performance of the end program too much. +Techniques like that are frequently used by library algorithm developers inside large projects to collect profiling data without affecting the performance of the end program too much. diff --git a/content/english/hpc/profiling/mca.md b/content/english/hpc/profiling/mca.md index 4634ba25..99cfe2ed 100644 --- a/content/english/hpc/profiling/mca.md +++ b/content/english/hpc/profiling/mca.md @@ -40,7 +40,7 @@ First, it outputs general information about the loop and the hardware: - It "ran" the loop 100 times, executing 400 instructions in total in 108 cycles, which is the same as executing $\frac{400}{108} \approx 3.7$ [instructions per cycle](/hpc/complexity/hardware) on average (IPC). - The CPU is theoretically capable of executing up to 6 instructions per cycle ([dispatch width](/hpc/architecture/layout)). - Each cycle in theory can be executed in 0.8 cycles on average ([block reciprocal throughput](/hpc/pipelining/tables)). -- The "uOps" here are the micro-operations that CPU splits each instruction into (e. g. fused load-add is composed of two uOps). +- The "uOps" here are the micro-operations that the CPU splits each instruction into (e.g., fused load-add is composed of two uOps). Then it proceeds to give information about each individual instruction: diff --git a/content/english/hpc/profiling/noise.md b/content/english/hpc/profiling/noise.md index c530c160..b1b186ae 100644 --- a/content/english/hpc/profiling/noise.md +++ b/content/english/hpc/profiling/noise.md @@ -1,6 +1,7 @@ --- title: Getting Accurate Results weight: 10 +published: true --- It is not an uncommon for there to be two library algorithm implementations, each maintaining its own benchmarking code, and each claiming to be faster than the other. This confuses everyone involved, especially the users, who have to somehow choose between the two. @@ -11,7 +12,7 @@ Situations like these are usually not caused by fraudulent actions by their auth There are many things that can introduce bias into benchmarks. -**Differing datasets.** There are many algorithms whose performance somehow depends on the dataset distribution. In order to define, for example, what the fastest sorting, shortest path, or binary search algorithms are, you have to fixing the dataset on which the algorithm is run. +**Differing datasets.** There are many algorithms whose performance somehow depends on the dataset distribution. In order to define, for example, what the fastest sorting, shortest path, or binary search algorithms are, you have to fix the dataset on which the algorithm is run. This sometimes applies even to algorithms that process a single piece of input. For example, it is not a good idea to feed GCD implementations sequential numbers because it makes branches very predictable: @@ -87,7 +88,7 @@ for (int i = 0; i < N; i++) checksum ^= lower_bound(checksum ^ q[i]); ``` -It usually makes the most difference in algorithms with possible pipeline stall issues, e. g. 
when comparing branchy and branch-free algorithms. +It usually makes the most difference in algorithms with possible pipeline stall issues, e.g., when comparing branchy and branch-free algorithms. **Cold cache.** Another source of bias is the *cold cache effect*, when memory reads initially take longer time because the required data is not in cache yet. @@ -111,7 +112,7 @@ for (int i = 0; i < N; i++) checksum ^= lower_bound(q[i]); ``` -It is also sometimes convenient to combine the warm-up run with answer validation, it if is more complicated than just computing some sort of checksum. +It is also sometimes convenient to combine the warm-up run with answer validation, if it is more complicated than just computing some sort of checksum. **Over-optimization.** Sometimes the benchmark is outright erroneous because the compiler just optimized the benchmarked code away. To prevent the compiler from cutting corners, you need to add checksums and either print them somewhere or add the `volatile` qualifier, which also prevents any sort of interleaving of loop iterations. @@ -127,10 +128,10 @@ https://github.com/sosy-lab/benchexec The issues we've described produce *bias* in measurements: they consistently give advantage to one algorithm over the other. There are other types of possible problems with benchmarking that result in either unpredictable skews or just completely random noise, thus increasing *variance*. -These type of issues are caused by side effects and some sort of external noise, mostly due to noisy neighbors and CPU frequency scaling: +These types of issues are caused by side effects and some sort of external noise, mostly due to noisy neighbors and CPU frequency scaling: - If you benchmark a compute-bound algorithm, measure its performance in cycles using `perf stat`: this way it will be independent of clock frequency, fluctuations of which is usually the main source of noise. -- Otherwise, set core frequency to the what you expect it to be and make sure nothing interferes with it. On Linux you can do it with `cpupower` (e. g. `sudo cpupower frequency-set -g powersave` to put it to minimum or `sudo cpupower frequency-set -g ondemand` to enable turbo boost). I use a [convenient GNOME shell extension](https://extensions.gnome.org/extension/1082/cpufreq/) that has a separate button to do it. +- Otherwise, set core frequency to what you expect it to be and make sure nothing interferes with it. On Linux you can do it with `cpupower` (e.g., `sudo cpupower frequency-set -g powersave` to put it to minimum or `sudo cpupower frequency-set -g ondemand` to enable turbo boost). I use a [convenient GNOME shell extension](https://extensions.gnome.org/extension/1082/cpufreq/) that has a separate button to do it. - If applicable, turn hyper-threading off and attach jobs to specific cores. Make sure no other jobs are running on the system, turn off networking and try not to fiddle with the mouse. You can't remove noises and biases completely. Even a program's name can affect its speed: the executable's name ends up in an environment variable, environment variables end up on the call stack, and so the length of the name affects stack alignment, which can result in data accesses slowing down due to crossing cache line or memory page boundaries. 
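Putting several of these recommendations together, a bare-bones benchmarking harness might look something like this. It is only a sketch: it assumes that `lower_bound`, the query array `q`, and `N` are defined as in the snippets above, and for compute-bound code you would measure cycles with `perf stat` instead of wall-clock time:

```c++
#include <chrono>
#include <cstdio>

// A warm-up pass that also brings the data into cache, a checksum that keeps
// the compiler from optimizing the calls away, and timing amortized over N queries.
void run_benchmark() {
    int checksum = 0;

    for (int i = 0; i < N; i++)       // warm-up run (not timed)
        checksum ^= lower_bound(q[i]);

    auto start = std::chrono::steady_clock::now();

    for (int i = 0; i < N; i++)       // measured run
        checksum ^= lower_bound(q[i]);

    auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
                  std::chrono::steady_clock::now() - start).count();

    printf("%.2f ns per query (checksum: %d)\n", (double) ns / N, checksum);
}
```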
diff --git a/content/english/hpc/profiling/simulation.md b/content/english/hpc/profiling/simulation.md index 2f6c6dc6..75401b8a 100644 --- a/content/english/hpc/profiling/simulation.md +++ b/content/english/hpc/profiling/simulation.md @@ -50,7 +50,7 @@ Mispred rate: 22.0% ( 22.5% + 0.0% ) We've fed Cachegrind exactly the same example code as in [the previous section](../events): we create an array of a million random integers, sort it, and then perform a million binary searches on it. Cachegrind shows roughly the same numbers as perf does, except that that perf's measured numbers of memory reads and branches are slightly inflated due to [speculative execution](/hpc/pipelining): they really happen in hardware and thus increment hardware counters, but are discarded and don't affect actual performance, and thus ignored in the simulation. -Cachegrind only models the first (`D1` for data, `I1` for instructions) and the last (`LL`, unified) levels of cache, the characteristics of which are inferred from the system. It doesn't limit you in any way as you can also set them from the command line, e. g. to model the L2 cache: `--LL=,,`. +Cachegrind only models the first (`D1` for data, `I1` for instructions) and the last (`LL`, unified) levels of cache, the characteristics of which are inferred from the system. It doesn't limit you in any way as you can also set them from the command line, e g., to model the L2 cache: `--LL=,,`. It seems like it only slowed down our program so far and hasn't provided us any information that `perf stat` couldn't. To get more out of it than just the summary info, we can inspect a special file with profiling info, which it dumps by default in the same directory named as `cachegrind.out.`. It is human-readable, but is expected to be read via the `cg_annotate` command: diff --git a/content/english/hpc/simd/_index.md b/content/english/hpc/simd/_index.md index 988e83e8..50f6e3ed 100644 --- a/content/english/hpc/simd/_index.md +++ b/content/english/hpc/simd/_index.md @@ -29,7 +29,7 @@ Now, let's add the following magic directive in the very beginning: When compiled and run in the same environment, it finishes in 1.24 seconds. This is almost twice as fast, and we didn't change a single line of code or the optimization level. -What happened here is we provided a little bit of info about the computer on which this code is supposed to be run. Specifically, we told the compiler that the target CPU supports an extension to the x86 instruction set called "AVX2". AVX2 is one of the many so-called "SIMD extensions" for x86. These extensions include instructions that operate on special registers capable of holding 128, 256, or even 512 bits of data using the "single instruction, multiple data" (SIMD) approach. Instead of working with a single scalar value, SIMD instructions divide the data in registers into blocks of 8, 16, 32, or 64 bits and perform the same operation on them in parallel, yielding a proportional increase in performance[^power]. +What happened here is we provided a little bit of info about the computer on which this code is supposed to be run. Specifically, we told the compiler that the target CPU supports an extension to the x86 instruction set called "AVX2." AVX2 is one of the many so-called "SIMD extensions" for x86. These extensions include instructions that operate on special registers capable of holding 128, 256, or even 512 bits of data using the "single instruction, multiple data" (SIMD) approach. 
Instead of working with a single scalar value, SIMD instructions divide the data in registers into blocks of 8, 16, 32, or 64 bits and perform the same operation on them in parallel, yielding a proportional increase in performance[^power]. [^power]: On some CPUs, especially heavy SIMD instructions consume more energy and thus [require downclocking](https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/) to balance off the total power consumption, so the real-time speedup is not always proportional. @@ -43,6 +43,6 @@ In particular, AVX2 has instructions for working with 256-bit registers, while b ![](img/intel-extensions.webp) -Compilers often do a good job rewriting simple loops with SIMD instructions, like in the case above. This optimization is called [auto-vectorization](auto-vectorization), and it is the preferred way to use SIMD. +Compilers often do a good job rewriting simple loops with SIMD instructions, like in the case above. This optimization is called [auto-vectorization](auto-vectorization), and it is the most popular way of using SIMD. The problem is that it only works with certain types of loops, and even then it often yields suboptimal results. To understand its limitations, we need to get our hands dirty and explore this technology on a lower level, which is what we are going to do in this chapter. diff --git a/content/english/hpc/simd/auto-vectorization.md b/content/english/hpc/simd/auto-vectorization.md index 5fc568c3..b7b8a45f 100644 --- a/content/english/hpc/simd/auto-vectorization.md +++ b/content/english/hpc/simd/auto-vectorization.md @@ -1,15 +1,17 @@ --- -title: Auto-Vectorization +title: Auto-Vectorization and SPMD weight: 10 --- -SIMD-parallelism is most often used for *embarrassingly parallel* computations: the kinds where all you do is apply some elementwise function to all elements of an array and write it back somewhere else. In this setting, you don't even need to know how SIMD works: the compiler is perfectly capable of optimizing such loops by itself — you just need to be aware that such optimization exists and that it usually yields a 5-10x speedup. +SIMD parallelism is most often used for *embarrassingly parallel* computations: the kinds where all you do is apply some elementwise function to all elements of an array and write it back somewhere else. In this setting, you don't even need to know how SIMD works: the compiler is perfectly capable of optimizing such loops by itself — you just need to be aware that such optimization exists and that it usually yields a 5-10x speedup. -Doing nothing and relying on auto-vectorization is actually the preferred way of using SIMD. Whenever you can, you should always stick with the scalar code for its simplicity and maintainability. But often even the loops that seem straightforward to vectorize are not optimized because of some technical nuances. [As in many other cases](/hpc/compilation/contracts), the compiler may need some additional input from the programmer as he may know a bit more about the problem than what can be inferred from static analysis. +Doing nothing and relying on auto-vectorization is actually the most popular way of using SIMD. In fact, in many cases, it even advised to stick with the plain scalar code for its simplicity and maintainability. + +But often even the loops that seem straightforward to vectorize are not optimized because of some technical nuances. 
[As in many other cases](/hpc/compilation/contracts), the compiler may need some additional input from the programmer as he may know a bit more about the problem than what can be inferred from static analysis. ### Potential Problems -Consider the "a + b" example: +Consider the "a + b" example we [started with](../intrinsics/#simd-intrinsics): ```c++ void sum(int *a, int *b, int *c, int n) { @@ -47,8 +49,18 @@ for (int i = 0; i < n; i++) To help the compiler eliminate this corner case, we can use the `alignas` specifier on static arrays and the `std::assume_aligned` function to mark pointers aligned. -**Checking if vectorization happened.** In either case, it is useful to check if the compiler vectorized the loop the way you intended. You can either [compiling it to assembly](/hpc/compilation/stages) and look for blocks for instructions that start with a "v" or add the `-fopt-info-vec-optimized` compiler flag so that the compiler indicates where auto-vectorization is happening and what SIMD width is being used. If you swap `optimized` for `missed` or `all`, you may also get some reasoning behind why it is not happening in other places. +**Checking if vectorization happened.** In either case, it is useful to check if the compiler vectorized the loop the way you intended. You can either [compile it to assembly](/hpc/compilation/stages) and look for blocks of instructions that start with a "v" or add the `-fopt-info-vec-optimized` compiler flag so that the compiler indicates where auto-vectorization is happening and what SIMD width is being used. If you swap `optimized` for `missed` or `all`, you may also get some reasoning behind why it is not happening in other places. ---- +There are [many other ways](https://software.intel.com/sites/default/files/m/4/8/8/2/a/31848-CompilerAutovectorizationGuide.pdf) of telling the compiler exactly what we mean, but in especially complex cases — e.g., when there are a lot of branches or function calls inside the loop — it is easier to go one level of abstraction down and vectorize manually. + +### SPMD + +There is a neat compromise between auto-vectorization and the manual use of SIMD intrinsics: "single program, multiple data" (SPMD). This is a model of computation in which the programmer writes what appears to be a regular serial program, but that is actually executed in parallel on the hardware. + +The programming experience is largely the same, and there is still the fundamental limitation in that the computation must be data-parallel, but SPMD ensures that the vectorization will happen regardless of the compiler and the target CPU architecture. It also allows for the computation to be automatically parallelized across multiple cores and, in some cases, even offloaded to other types of parallel hardware. + +There is support for SPMD in some modern languages ([Julia](https://docs.julialang.org/en/v1/base/base/#Base.SimdLoop.@simd)), multiprocessing APIs ([OpenMP](https://www.openmp.org/spec-html/5.0/openmpsu42.html)), and specialized compilers (Intel [ISPC](https://ispc.github.io/)), but it has seen the most success in the context of GPU programming where both problems and hardware are massively parallel.
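For a taste of what the SPMD style looks like from C++, here is a hedged sketch using OpenMP's `simd` directive (one of the options listed above; compile with `-fopenmp` or `-fopenmp-simd`). The loop body is ordinary scalar code describing what happens to a single element, and the directive asks the compiler to map it onto SIMD lanes:

```c++
// SPMD-flavored code: the body is written element-by-element,
// and the pragma turns the loop into a data-parallel one.
void saxpy(float a, const float *x, float *y, int n) {
    #pragma omp simd
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```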
+ +We will cover this model of computation in much more depth in Part 2 -There are [many other ways](https://software.intel.com/sites/default/files/m/4/8/8/2/a/31848-CompilerAutovectorizationGuide.pdf) of telling the compiler what we meant exactly, but in especially complex cases — when inside the loop there are a lot of branches or some functions are called — it is easier to go down to the intrinsics level and write it yourself. + diff --git a/content/english/hpc/simd/intrinsics.md b/content/english/hpc/simd/intrinsics.md index 0b2b8d32..4e9c6804 100644 --- a/content/english/hpc/simd/intrinsics.md +++ b/content/english/hpc/simd/intrinsics.md @@ -95,7 +95,7 @@ for (int i = 0; i < 100; i += 4) { The main challenge of using SIMD is getting the data into contiguous fixed-sized blocks suitable for loading into registers. In the code above, we may in general have a problem if the length of the array is not divisible by the block size. There are two common solutions to this: -1. We can "overshoot" by iterating over the last incomplete segment either way. To make sure we don't segfault by trying to read from or write to a memory region we don't own, we need to pad the arrays to the nearest block size (typically with some "neutral" element, e. g. zero). +1. We can "overshoot" by iterating over the last incomplete segment either way. To make sure we don't segfault by trying to read from or write to a memory region we don't own, we need to pad the arrays to the nearest block size (typically with some "neutral" element, e.g., zero). 2. Make one iteration less and write a little loop in the end that calculates the remainder normally (with scalar operations). Humans prefer #1 because it is simpler and results in less code, and compilers prefer #2 because they don't really have another legal option. @@ -135,13 +135,13 @@ Also, some of the intrinsics don't map to a single instruction but a short seque ### GCC Vector Extensions -If you feel like the design of C intrinsics is terrible, you are not alone. are all generated by cats walking on keyboards. I've spent hundreds of hours writing SIMD code and reading the Intel Intrinsics Guide, and I still can't remember whether I need to type `_mm256` or `__m256`. +If you feel like the design of C intrinsics is terrible, you are not alone. I've spent hundreds of hours writing SIMD code and reading the Intel Intrinsics Guide, and I still can't remember whether I need to type `_mm256` or `__m256`. Intrinsics are not only hard to use but also neither portable nor maintainable. In good software, you don't want to maintain different procedures for each CPU: you want to implement it just once, in an architecture-agnostic way. @@ -156,7 +156,7 @@ typedef int v8si __attribute__ (( vector_size(32) )); Unfortunately, this is not a part of the C or C++ standard, so different compilers use different syntax for that. -There is somewhat of a naming convention, which is to include size and type of elements into the name of the type: in the example above, we defined a "vector of 8 signed integers". But you may choose any name you want, like `vec`, `reg` or whatever. The only thing you don't want to do is to name it `vector` because of how much confusion there would be because of `std::vector`. +There is somewhat of a naming convention, which is to include size and type of elements into the name of the type: in the example above, we defined a "vector of 8 signed integers." But you may choose any name you want, like `vec`, `reg` or whatever. 
The only thing you don't want to do is to name it `vector` because of how much confusion there would be because of `std::vector`. The main advantage of using these types is that for many operations you can use normal C++ operators instead of looking up the relevant intrinsic. @@ -185,4 +185,13 @@ for (int i = 0; i < 100/4; i++) c[i] = a[i] + b[i]; ``` -As you can see, vector extensions are much cleaner compared to the nightmare we have with intrinsic functions. But some things that we may want to do are just not expressible with native C++ constructs, so we will still need intrinsics. Luckily, this is not an exclusive choice, because vector types support zero-cost conversion to the `_mm` types and back. We will, however, try to avoid doing so as much as possible and stick to vector extensions when we can. +As you can see, vector extensions are much cleaner compared to the nightmare we have with intrinsic functions. Their downside is that some things that we may want to do are just not expressible with native C++ constructs, so we will still need intrinsics for them. Luckily, this is not an exclusive choice, because vector types support zero-cost conversion to the `_mm` types and back: + +```c++ +v8f x; +int mask = _mm256_movemask_ps((__m256) x); +``` + +There are also many third-party libraries for different languages that provide a similar capability to write portable SIMD code, also implement some common operations, and just in general are nicer to use than both intrinsics and built-in vector types. Notable examples for C++ are [Highway](https://github.com/google/highway), [Expressive Vector Engine](https://github.com/jfalcou/eve), [Vector Class Library](https://github.com/vectorclass/version2), and [xsimd](https://github.com/xtensor-stack/xsimd). + +Using a well-established SIMD library is recommended as it greatly improves the developer experience. In this book, however, we will try to keep close to the hardware and mostly use intrinsics directly, occasionally switching to the vector extensions for simplicity when we can. diff --git a/content/english/hpc/simd/masking.md index 332597c1..dbe71575 100644 --- a/content/english/hpc/simd/masking.md +++ b/content/english/hpc/simd/masking.md @@ -67,7 +67,7 @@ for (int i = 0; i < N; i += 8) { } ``` -This loop performs slightly faster because on this particular CPU, the vector `and` take one cycle less than `blend`. +This loop performs slightly faster because on this particular CPU, the vector `and` takes one cycle less than `blend`. Several other instructions support masks as inputs, most notably: diff --git a/content/english/hpc/simd/moving.md index e2cf3035..72cbbd33 100644 --- a/content/english/hpc/simd/moving.md +++ b/content/english/hpc/simd/moving.md @@ -1,5 +1,5 @@ --- -title: Loading and Writing Data +title: Moving Data aliases: [/hpc/simd/vectorization] weight: 2 --- @@ -13,7 +13,7 @@ While using the elementwise instructions is easy, the largest challenge with SIMD ### Aligned Loads and Stores -Operations of reading and writing the contents of a SIMD register into memory have two versions each: `load` / `loadu` and `store` / `storeu`. The letter "u" here stands for "unaligned".
The difference is that the former ones only work correctly when the read / written block fits inside a single [cache line](/hpc/cpu-cache/cache-lines) (and crash otherwise), while the latter work either way, but with a slight performance penalty if the block crosses a cache line. +Operations of reading and writing the contents of a SIMD register into memory have two versions each: `load` / `loadu` and `store` / `storeu`. The letter "u" here stands for "unaligned." The difference is that the former ones only work correctly when the read / written block fits inside a single [cache line](/hpc/cpu-cache/cache-lines) (and crash otherwise), while the latter work either way, but with a slight performance penalty if the block crosses a cache line. Sometimes, especially when the "inner" operation is very lightweight, the performance difference becomes significant (at least because you need to fetch two cache lines instead of one). As an extreme example, this way of adding two arrays together: @@ -39,7 +39,7 @@ for (int i = 0; i < n; i += 8) { In the first version, assuming that arrays `a`, `b` and `c` are all 64-byte *aligned* (the addresses of their first elements are divisible by 64, and so they start at the beginning of a cache line), roughly half of reads and writes will be "bad" because they cross a cache line boundary. -Note that the performance difference is caused by the cache system and not by the instructions themselves. On most modern architectures, the `loadu` / `storeu` intrinsics should be equally as fast as `load` / `store` given that in both cases the blocks only span one cache line. The advantage of the latter is that they can act as free run-time assertions that all reads and writes are aligned. +Note that the performance difference is caused by the cache system and not by the instructions themselves. On most modern architectures, the `loadu` / `storeu` intrinsics should be equally as fast as `load` / `store` given that in both cases the blocks only span one cache line. The advantage of the latter is that they can act as free run time assertions that all reads and writes are aligned. This makes it important to properly [align](/hpc/cpu-cache/alignment) arrays and other data on allocation, and it is also one of the reasons why compilers can't always [auto-vectorize](../auto-vectorization) efficiently. For most purposes, we only need to guarantee that any 32-byte SIMD block will not cross a cache line boundary, and we can specify this alignment with the `alignas` specifier: diff --git a/content/english/hpc/simd/reduction.md b/content/english/hpc/simd/reduction.md index 28fb4d9c..89678103 100644 --- a/content/english/hpc/simd/reduction.md +++ b/content/english/hpc/simd/reduction.md @@ -1,9 +1,9 @@ --- -title: Sums and Other Reductions +title: Reductions weight: 3 --- -*Reduction* (also known as *folding* in functional programming) is the action of computing the value of some associative and commutative operation (i.e. $(a \circ b) \circ c = a \circ (b \circ c)$ and $a \circ b = b \circ a$) over a range of arbitrary elements. +*Reduction* (also known as *folding* in functional programming) is the action of computing the value of some associative and commutative operation (i.e., $(a \circ b) \circ c = a \circ (b \circ c)$ and $a \circ b = b \circ a$) over a range of arbitrary elements. 
The simplest example of reduction is calculating the sum of an array: @@ -46,58 +46,64 @@ int sum_simd(v8si *a, int n) { } ``` -You can use this approach for for other reductions, such as for finding the minimum or the xor-sum of an array. - -### Horizontal Summation - -The last part, where we sum up the 8 accumulators stored in a vector register into a single scalar to get the total sum, is called "horizontal summation". - -Although extracting and adding every scalar one by one only takes a constant number of cycles, it can be computed slightly faster using a [special instruction](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX,AVX2&text=_mm256_hadd_epi32&expand=2941) that adds together pairs of adjacent elements in a register. - -![Horizontal summation in SSE/AVX. Note how the output is stored: the (a b a b) interleaving is common for reducing operations](../img/hsum.png) - -Since it is a very specific operation, it can only be done with SIMD intrinsics — although the compiler probably emits roughly the same procedure for the scalar code anyway: - -```c++ -int hsum(__m256i x) { - __m128i l = _mm256_extracti128_si256(x, 0); - __m128i h = _mm256_extracti128_si256(x, 1); - l = _mm_add_epi32(l, h); - l = _mm_hadd_epi32(l, l); - return _mm_extract_epi32(l, 0) + _mm_extract_epi32(l, 1); -} -``` - -There are [other similar instructions](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=AVX,AVX2&ig_expand=3037,3009,5135,4870,4870,4872,4875,833,879,874,849,848,6715,4845&text=horizontal), e. g. for integer multiplication or calculating absolute differences between adjacent elements (used in image processing). - -There is also one specific instruction, `_mm_minpos_epu16`, that calculates the horizontal minimum and its index among eight 16-bit integers. This is the only horizontal reduction that works in one go: all others are computed in multiple steps. +You can use this approach for other reductions, such as for finding the minimum or the xor-sum of an array. ### Instruction-Level Parallelism -Our implementation matches what the compiler produces automatically, but it is actually [suboptimal](/hpc/pipelining/throughput): when we use just one accumulator, we have to wait one cycle between the loop iterations for vector addition to complete, while its throughput is 2 on this microarchitecture. +Our implementation matches what the compiler produces automatically, but it is actually suboptimal: when we use just one accumulator, [we have to wait](/hpc/pipelining/throughput) one cycle between the loop iterations for a vector addition to complete, while the [throughput](/hpc/pipelining/tables/) of the corresponding instruction is 2 on this microarchitecture. If we again divide the array in $B \geq 2$ parts and use a *separate* accumulator for each, we can saturate the throughput of vector addition and increase the performance twofold: ```c++ -const int B = 2; +const int B = 2; // how many vector accumulators to use int sum_simd(v8si *a, int n) { v8si b[B] = {0}; - for (int i = 0; i < n / 8; i += B) + for (int i = 0; i + (B - 1) < n / 8; i += B) for (int j = 0; j < B; j++) b[j] += a[i + j]; - + + // sum all vector accumulators into one for (int i = 1; i < B; i++) b[0] += b[i]; int s = 0; + // sum 8 scalar accumulators into one for (int i = 0; i < 8; i++) s += b[0][i]; + // add the remainder of a + for (int i = n / (8 * B) * (8 * B); i < n; i++) + s += ((int*) a)[i]; + + return s; } ``` -If you have more than 2 relevant execution ports, you can increase `B` accordingly.
But the n-fold performance increase will only apply to arrays that fit L1 cache — [memory bandwidth](/hpc/cpu-cache/bandwidth) will be the bottleneck for anything larger. +If you have more than 2 relevant execution ports, you can increase the `B` constant accordingly, but the $n$-fold performance increase will only apply to arrays that fit into L1 cache — [memory bandwidth](/hpc/cpu-cache/bandwidth) will be the bottleneck for anything larger. + +### Horizontal Summation + +The part where we sum up the 8 accumulators stored in a vector register into a single scalar to get the total sum is called "horizontal summation." + +Although extracting and adding every scalar one by one only takes a constant number of cycles, it can be computed slightly faster using a [special instruction](https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX,AVX2&text=_mm256_hadd_epi32&expand=2941) that adds together pairs of adjacent elements in a register. + +![Horizontal summation in SSE/AVX. Note how the output is stored: the (a b a b) interleaving is common for reducing operations](../img/hsum.png) + +Since it is a very specific operation, it can only be done with SIMD intrinsics — although the compiler probably emits roughly the same procedure for the scalar code anyway: + +```c++ +int hsum(__m256i x) { + __m128i l = _mm256_extracti128_si256(x, 0); + __m128i h = _mm256_extracti128_si256(x, 1); + l = _mm_add_epi32(l, h); + l = _mm_hadd_epi32(l, l); + return _mm_extract_epi32(l, 0) + _mm_extract_epi32(l, 1); +} +``` + +There are [other similar instructions](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=AVX,AVX2&ig_expand=3037,3009,5135,4870,4870,4872,4875,833,879,874,849,848,6715,4845&text=horizontal), e.g., for integer multiplication or calculating absolute differences between adjacent elements (used in image processing). + +There is also one specific instruction, `_mm_minpos_epu16`, that calculates the horizontal minimum and its index among eight 16-bit integers. This is the only horizontal reduction that works in one go: all others are computed in multiple steps. diff --git a/content/english/hpc/simd/shuffling.md b/content/english/hpc/simd/shuffling.md index f2a2cd15..6ff3b749 100644 --- a/content/english/hpc/simd/shuffling.md +++ b/content/english/hpc/simd/shuffling.md @@ -175,7 +175,7 @@ The general idea of our algorithm is as follows: - use this mask to index a lookup table that returns a permutation moving the elements that satisfy the predicate to the beginning of the vector (in their original order); - use the `_mm256_permutevar8x32_epi32` intrinsic to permute the values; - write the whole permuted vector to the buffer — it may have some trailing garbage, but its prefix is correct; -- calculate the population count of the scalar mask and move the buffer pointer by that amount. +- calculate the population count of the scalar mask and move the buffer pointer by that number. First, we need to precompute the permutations: @@ -225,7 +225,9 @@ The vectorized version takes some work to implement, but it is 6-7x faster than ![](../img/filter.svg) -This operation is considerably faster on AVX-512: it has a special "[compress](_mm512_mask_compress_epi32)" instruction that takes a vector of data and a mask and writes its unmasked elements contiguously. It makes a huge difference in algorithms that rely on various filtering subroutines. 
+The loop performance is still relatively low — taking 4 CPU cycles per iteration — because, on this particular CPU (Zen 2), `movemask`, `permute`, and `store` have low throughput and all have to go through the same execution port (P2). On most other x86 CPUs, you can expect it to be ~2x faster. + +Filtering can also be implemented considerably faster on AVX-512: it has a special "[compress](https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#ig_expand=7395,7392,7269,4868,7269,7269,1820,1835,6385,5051,4909,4918,5051,7269,6423,7410,150,2138,1829,1944,3009,1029,7077,519,5183,4462,4490,1944,1395&text=_mm512_mask_compress_epi32)" instruction that takes a vector of data and a mask and writes its unmasked elements contiguously. It makes a huge difference in algorithms that rely on various filtering subroutines, such as quicksort. +- Clock speed is volatile, so counting cycles is more useful for analytical purposes + +---- + +![](https://external-preview.redd.it/6PIp0RLbdWFGFUOT6tFuufpMlplgWdnXWOmjuqkpMMU.jpg?auto=webp&s=9bed495f3dbb994d7cdda33cc114aba1cebd30e2 =400x) + +http://ithare.com/infographics-operation-costs-in-cpu-clock-cycles/ + +---- + +### Asymptotic complexity + +![](https://en.algorithmica.org/hpc/complexity/img/complexity.jpg =400x) + +For sufficiently large $n$, we only care about asymptotic complexity: $O(n) = O(1000 \cdot n)$ + +$\implies$ The costs of basic ops don't matter since they don't affect complexity + +But can we handle "sufficiently large" $n$? + +--- + +When complexity theory was developed, computers were different + +![](https://upload.wikimedia.org/wikipedia/commons/thumb/4/4e/Eniac.jpg/640px-Eniac.jpg =500x) + +Bulky, costly, and fundamentally slow (due to speed of light) + +---- + +![](https://researchresearch-news-wordpress-media-live.s3.eu-west-1.amazonaws.com/2022/02/microchip_fingertip-738x443.jpg =500x) + +Micro-scale circuits allow signals to propagate faster + +---- + + + +
+ +
+ +![](https://en.algorithmica.org/hpc/complexity/img/lithography.png =450x) + +
+ +
+ +Microchips are "printed" on a slice of silicon using a process called [photolithography](https://en.wikipedia.org/wiki/Photolithography): + +1. grow and slice a [very pure silicon crystal](https://en.wikipedia.org/wiki/Wafer_(electronics)) +2. cover it with a layer of [photoresist](https://en.wikipedia.org/wiki/Photoresist) +3. hit it with photons in a set pattern +4. chemically [etch](https://en.wikipedia.org/wiki/Etching_(microfabrication)) the exposed parts +5. remove the remaining photoresist + +(…plus another 40-50 steps over several months to complete the rest of the CPU) + +
+ +
+ +---- + +The development of microchips and photolithography enabled: + +- higher clock rates +- the ability to scale the production +- **much** lower material and power usage (= lower cost) + +---- + +![](https://upload.wikimedia.org/wikipedia/commons/4/49/MOS_6502AD_4585_top.jpg =500x) + +MOS Technology 6502 (1975), Atari 2600 (1977), Apple II (1977), Commodore 64 (1982) + +---- + +Also a clear path to improvement: just make lenses stronger and chips smaller + +**Moore’s law:** transistor count doubles every two years. + +---- + +**Dennard scaling:** reducing die dimensions by 30% + +- doubles the transistor density ($0.7^2 \approx 0.5$) +- increases the clock speed by 40% ($\frac{1}{0.7} \approx 1.4$) +- leaves the overall *power density* the same + (we have a mechanical limit on how much heat can be dissipated) + +$\implies$ Each new "generation" should have roughly the same total cost, but 40% higher clock and twice as many transistors + +(which can be used, e.g., to add new instructions or increase the word size) + +---- + +Around 2005, Dennard scaling stopped — due to *leakage* issues: + +- transistors became very small +- $\implies$ their magnetic fields started to interfere with the neighboring circuitry +- $\implies$ unnecessary heating and occasional bit flipping +- $\implies$ have to increase voltage to fix it +- $\implies$ have to reduce clock frequency to balance off power consumption + +---- + +![](https://en.algorithmica.org/hpc/complexity/img/dennard.ppm =600x) + +A limit on the clock speed + +--- + +Clock rates have plateaued, but we still have more transistors to use: + +- **Pipelining:** overlapping the execution of sequential instructions to keep different parts of the CPU busy +- **Out-of-order execution:** no waiting for the previous instructions to complete +- **Superscalar processing:** adding duplicates of execution units +- **Caching:** adding layers of faster memory on the chip to speed up RAM access +- **SIMD:** adding instructions that handle a block of 128, 256, or 512 bits of data +- **Parallel computing:** adding multiple identical cores on a chip +- **Distributed computing:** multiple chips in a motherboard or multiple computers +- **FPGAs** and **ASICs:** using custom hardware to solve a specific problem + +---- + +![](https://en.algorithmica.org/hpc/complexity/img/die-shot.jpg =500x) + +For modern computers, the “let’s count all operations” approach for predicting algorithm performance is off by several orders of magnitude + +--- + +### Matrix multiplication + +```python +import random + +n = 1024 + +a = [[random.random() + for row in range(n)] + for col in range(n)] + +b = [[random.random() + for row in range(n)] + for col in range(n)] + +c = [[0 + for row in range(n)] + for col in range(n)] + +for i in range(n): + for j in range(n): + for k in range(n): + c[i][j] += a[i][k] * b[k][j] +``` + +630 seconds or 10.5 minutes to multiply two $1024 \times 1024$ matrices in plain Python + +~880 cycles per multiplication + +---- + +```java +import java.util.Random; + +public class Matmul { + static int n = 1024; + static double[][] a = new double[n][n]; + static double[][] b = new double[n][n]; + static double[][] c = new double[n][n]; + + public static void main(String[] args) { + Random rand = new Random(); + + for (int i = 0; i < n; i++) { + for (int j = 0; j < n; j++) { + a[i][j] = rand.nextDouble(); + b[i][j] = rand.nextDouble(); + c[i][j] = 0; + } + } + + for (int i = 0; i < n; i++) + for (int j = 0; j < n; j++) + for (int k = 0; k < n; k++) + c[i][j] += a[i][k] * b[k][j]; + } +} +``` + +Java needs
10 seconds, 63 times faster + +~13 cycles per multiplication + +---- + +```c +#define n 1024 +double a[n][n], b[n][n], c[n][n]; + +int main() { + for (int i = 0; i < n; i++) { + for (int j = 0; j < n; j++) { + a[i][j] = (double) rand() / RAND_MAX; + b[i][j] = (double) rand() / RAND_MAX; + } + } + + for (int i = 0; i < n; i++) + for (int j = 0; j < n; j++) + for (int k = 0; k < n; k++) + c[i][j] += a[i][k] * b[k][j]; + + return 0; +} +``` + +`GCC -O3` needs 9 seconds, but if we include `-march=native` and `-ffast-math`, the compiler vectorizes the code, and it drops down to 0.6s. + +---- + +```python +import time +import numpy as np + +n = 1024 + +a = np.random.rand(n, n) +b = np.random.rand(n, n) + +start = time.time() + +c = np.dot(a, b) + +duration = time.time() - start +print(duration) +``` + +BLAS needs ~0.12 seconds +(~5x over auto-vectorized C and ~5250x over plain Python) diff --git a/content/english/hpc/slides/_index.md b/content/english/hpc/slides/_index.md new file mode 100644 index 00000000..794e67a6 --- /dev/null +++ b/content/english/hpc/slides/_index.md @@ -0,0 +1,10 @@ +--- +title: Slides +ignoreIndexing: true +weight: 1000 +draft: true +--- + +This is an attempt to make a university course out of the book. + +Work in progress. diff --git a/content/english/hpc/stats.md b/content/english/hpc/stats.md index 2961f4d5..15d81e39 100644 --- a/content/english/hpc/stats.md +++ b/content/english/hpc/stats.md @@ -18,7 +18,7 @@ A **random variable** is any variable whose value depends on an outcome of a ran 2. $\forall x \in X, 0 \leq P \leq 1$. 3. $\sum_{x \in X} P(x) = 1$. -For example, consider a random variable $X$ with $k$ discrete states (e. g. the result of a die toss). We can place a *uniform distribution* on $X$ — that is, make each of its states equally likely — by setting its probability distribution to: +For example, consider a random variable $X$ with $k$ discrete states (e.g., the result of a die toss). We can place a *uniform distribution* on $X$ — that is, make each of its states equally likely — by setting its probability distribution to: $$ P(x=x_i) = \frac{1}{k} @@ -121,7 +121,7 @@ The last transition is true because it is a sum of harmonic series. ### Order Statistics -There is a slight modification of quicksort called quickselect that allows finding the $k$-th smallest element in $O(n)$ time, which is useful when we need to quickly compute order statistics, e. g. medians or 75-th quantiles. +There is a slight modification of quicksort called quickselect that allows finding the $k$-th smallest element in $O(n)$ time, which is useful when we need to quickly compute order statistics; e.g., medians or 75-th quantiles. 1. Select a random element $p$ from the array. 2. Partition the array into two arrays $L$ and $R$ using the predicate $a_i > p$. @@ -193,7 +193,7 @@ f(n, m) &= 1 \times (1-\frac{1}{m}) \times (1-\frac{2}{m}) \times ... \times (1- \end{aligned} $$ -This product shrinks pretty quickly with $n$, but it is not clear what value of $m$ is needed to be "safe". Turns out, if $n = O(\sqrt m)$, the probability of collision tends to zero, and anything asymptotically larger guarantees a collision. One can show this with calculus, but we will choose the probability theory way. +This product shrinks pretty quickly with $n$, but it is not clear what value of $m$ is needed to be "safe." Turns out, if $n = O(\sqrt m)$, the probability of collision tends to zero, and anything asymptotically larger guarantees a collision. 
One can show this with calculus, but we will choose the probability theory way. Let's go back to the idea of counting pairs of birthdays and introduce $\frac{n \cdot (n-1)}{2}$ indicators $I_{ij}$ — one for each pair $(i, j)$ of persons — each being equal to $1$ if the birthdays match. The probability and expectation of each indicator is $\frac{1}{m}$. diff --git a/content/russian/cs/algebra/binpow.md b/content/russian/cs/algebra/binpow.md index 5c7d2d43..4126061d 100644 --- a/content/russian/cs/algebra/binpow.md +++ b/content/russian/cs/algebra/binpow.md @@ -6,7 +6,7 @@ authors: weight: -10 --- -*Бинарное возведение в степень* — приём, позволяющий возводить любое число в $n$-ую степень за $O(\log n)$ умножений (вместо n умножений при обычном подходе). +*Бинарное возведение в степень* — приём, позволяющий возводить любое число в $n$-ую степень за $O(\log n)$ умножений (вместо $n$ умножений при обычном подходе). ## Основная идея diff --git a/content/russian/cs/algebra/matmul.md b/content/russian/cs/algebra/matmul.md index bc5ca593..8a633bea 100644 --- a/content/russian/cs/algebra/matmul.md +++ b/content/russian/cs/algebra/matmul.md @@ -188,7 +188,7 @@ matrix binpow(matrix a, int p) { Эту технику можно применить и к другим динамикам, где нужно посчитать количество способов что-то сделать — иногда очень неочевидными способами. -Например, можно решить такую задачу: найти количество строк длины $k \approx 10^{18}$, не содержащих данные маленькие запрещённые подстроки. Для этого нужно построить граф «легальных» переходов в [Ахо-Корасике](/cs/automata/aho-corasick), возвести его матрицу смежности в $k$-тую степень и просуммировать в нём первую строчку. +Например, можно решить такую задачу: найти количество строк длины $k \approx 10^{18}$, не содержащих данные маленькие запрещённые подстроки. Для этого нужно построить граф «легальных» переходов в [Ахо-Корасике](/cs/string-structures/aho-corasick), возвести его матрицу смежности в $k$-тую степень и просуммировать в нём первую строчку. В некоторых изощрённых случаях в матричном умножении вместо умножения и сложения нужно использовать другие операции, которые ведут себя как умножение и сложение. Пример задачи: «найти путь от $s$ до $t$ с минимальным весом ребра, использующий ровно $k$ переходов»; здесь нужно возводить в $(k-1)$-ую степень матрицу весов графа, и вместо и сложения, и умножения использовать минимум из двух весов. diff --git a/content/russian/cs/basic-structures/iterators.md b/content/russian/cs/basic-structures/iterators.md index b2d8269f..c048e0b6 100644 --- a/content/russian/cs/basic-structures/iterators.md +++ b/content/russian/cs/basic-structures/iterators.md @@ -71,7 +71,7 @@ for (int x : c) ### Алгоритмы из STL -Например, итераторы `std::vector` относятся к `random_access_iterator`, и если вызвать функцию `lower_bound` из стандартной библиотеки, то она произведет [бинарный поиск](../../ordered-search/binary-search) по элементам (предполагая, что они отсортированы в порядке неубывания): +Например, итераторы `std::vector` относятся к `random_access_iterator`, и если вызвать функцию `lower_bound` из стандартной библиотеки, то она произведет [бинарный поиск](/cs/interactive/binary-search/) по элементам (предполагая, что они отсортированы в порядке неубывания): ```cpp vector a = {1, 2, 3, 5, 8, 13}; @@ -93,4 +93,4 @@ array a = {4, 2, 1, 3}; cout << *min_element(a.begin(), a.end()) << endl; ``` -Подробнее про разные полезные алгоритмы STL можно прочитать в [ликбезе по C++](../../programming/cpp). 
+ diff --git a/content/russian/cs/decomposition/scanline.md b/content/russian/cs/decomposition/scanline.md index 6ea7e2e7..3bc99afd 100644 --- a/content/russian/cs/decomposition/scanline.md +++ b/content/russian/cs/decomposition/scanline.md @@ -1,14 +1,15 @@ --- title: Сканирующая прямая authors: -- Сергей Слотин + - Сергей Слотин prerequisites: -- /cs/range-queries -- /cs/segment-tree + - /cs/range-queries + - /cs/segment-tree weight: 1 +published: true --- -Метод сканирующей прямой (англ. *scanline*) заключается в сортировке точек или каких-то абстрактных *событий* (англ. *event*) и последующему проходу по ним. +Метод сканирующей прямой (англ. *scanline*) заключается в сортировке точек на координатной прямой либо каких-то абстрактных «событий» по какому-то признаку и последующему проходу по ним. Он часто используется для решения задач на структуры данных, когда все запросы известны заранее, а также в геометрии для нахождения объединений фигур. @@ -22,7 +23,7 @@ weight: 1 Это решение можно улучшить. Отсортируем интересные точки по возрастанию координаты и пройдем по ним слева направо, поддерживая количество отрезков `cnt`, которые покрывают данную точку. Если в данной точке начинается отрезок, то надо увеличить `cnt` на единицу, а если заканчивается, то уменьшить. После этого пробуем обновить ответ на задачу текущим значением `cnt`. -Как такое писать: нужно представить интересные точки в виде структур с полями «координата» и «тип» (начало / конец) и отсортировать со своим компаратором. Удобно начало отрезка обозначать +1, а конец -1, чтобы просто прибавлять к `cnt` это значение и на разбирать случае. +Как такое писать: нужно представить интересные точки в виде структур с полями «координата» и «тип» (начало / конец) и отсортировать со своим компаратором. Удобно начало отрезка обозначать +1, а конец -1, чтобы просто прибавлять к `cnt` это значение и не разбивать на случаи. Единственный нюанс — если координаты двух точек совпали, чтобы получить правильный ответ, сначала надо рассмотреть все начала отрезков, а только потом концы (чтобы при обновлении ответа в этой координате учлись и правые, и левые граничные отрезки). @@ -62,15 +63,15 @@ int scanline(vector> segments) { **Задача.** Дан набор из $n$ отрезков на прямой, заданных координатами начал и концов $[l_i, r_i]$. Требуется найти суммарную длину их объединения. -Как и в прошлой задаче, отсортируем интересные точки и при проходе будем поддерживать число отрезков, покрывающих данную точку. Если оно больше 0, то отрезок который мы прошли с прошлой рассмотренной точки принадлежит объединению, и его длину нужно прибавить к ответу: +Как и в прошлой задаче, отсортируем все интересные точки и при проходе будем поддерживать число отрезков, покрывающих текущую точку. 
Если оно больше 0, то отрезок, который мы прошли с прошлой рассмотренной точки, принадлежит объединению, и его длину нужно прибавить к ответу: ```cpp int cnt = 0, res = 0, prev = -inf; for (event e : events) { - cnt += e.type; if (prev != -inf && cnt > 0) - res += prev - e.x; + res += e.x - prev; // весь отрезок [prev, e.x] покрыт cnt отрезками + cnt += e.type; prev = e.x; } ``` diff --git a/content/russian/cs/factorization/eratosthenes.md b/content/russian/cs/factorization/eratosthenes.md index 02e72c0e..acf47749 100644 --- a/content/russian/cs/factorization/eratosthenes.md +++ b/content/russian/cs/factorization/eratosthenes.md @@ -12,10 +12,10 @@ published: true Основная идея соответствует названию алгоритма: запишем ряд чисел $1, 2,\ldots, n$, а затем будем вычеркивать -* сначала числа, делящиеся на $2$, кроме самого числа $2$, -* потом числа, делящиеся на $3$, кроме самого числа $3$, -* с числами, делящимися на $4$, ничего делать не будем — мы их уже вычёркивали, -* потом продолжим вычеркивать числа, делящиеся на $5$, кроме самого числа $5$, +- сначала числа, делящиеся на $2$, кроме самого числа $2$, +- потом числа, делящиеся на $3$, кроме самого числа $3$, +- с числами, делящимися на $4$, ничего делать не будем — мы их уже вычёркивали, +- потом продолжим вычеркивать числа, делящиеся на $5$, кроме самого числа $5$, …и так далее. @@ -23,10 +23,10 @@ published: true ```c++ vector sieve(int n) { - vector is_prime(n+1, true); + vector is_prime(n + 1, true); for (int i = 2; i <= n; i++) if (is_prime[i]) - for (int j = 2*i; j <= n; j += i) + for (int j = 2 * i; j <= n; j += i) is_prime[j] = false; return is_prime; } @@ -49,7 +49,6 @@ $$ У исходного алгоритма асимптотика должна быть ещё лучше. Чтобы найти её точнее, нам понадобятся два факта про простые числа: 1. Простых чисел от $1$ до $n$ примерно $\frac{n}{\ln n}$ . - 2. Простые числа распределены без больших «разрывов» и «скоплений», то есть $k$-тое простое число примерно равно $k \ln k$. Мы можем упрощённо считать, что число $k$ является простым с «вероятностью» $\frac{1}{\ln n}$. Тогда, время работы алгоритма можно более точнее оценить как @@ -65,11 +64,11 @@ $$ ## Линейное решето -Основная проблема решета Эратосфена состоит в том, что некоторые числа мы будем помечать как составные несколько раз — а именно столько раз, сколько у них различных простых делителей. Чтобы достичь линейного времени работы, нам нужно придумать способ, как рассматривать все составные числа ровно один раз. +Основная проблема решета Эратосфена состоит в том, что некоторые числа мы будем помечать как составные несколько раз — столько, сколько у них различных простых делителей. Чтобы достичь линейного времени работы, нам нужно придумать способ, как рассматривать все составные числа ровно один раз. Обозначим за $d(k)$ минимальный простой делитель числа $k$ и заметим следующий факт: у составного числа $k$ есть единственное представление $k = d(k) \cdot r$, и при этом у числа $r$ нет простых делителей меньше $d(k)$. -Идея оптимизации состоит в том, чтобы перебирать этот $r$, и для каждого перебирать только нужные множители — а именно все от $2$ до $d(r)$ включительно. +Идея оптимизации состоит в том, чтобы перебирать этот $r$, и для каждого перебирать только нужные множители — а именно, все от $2$ до $d(r)$ включительно. 
### Алгоритм diff --git a/content/russian/cs/geometry-basic/polygons.md b/content/russian/cs/geometry-basic/polygons.md index 7537e591..e0a3c5e7 100644 --- a/content/russian/cs/geometry-basic/polygons.md +++ b/content/russian/cs/geometry-basic/polygons.md @@ -80,7 +80,7 @@ $$ В более общем случае есть два популярных подхода, оба за $O(n)$. -Первый заключается в подсчете углов. Пройдемся по всем вершинам в порядке обхода и будем последовательно рассматривать углы с вершиной в точке $P$ и лучами, проходящими через соседние вершины многоугольника. Если просуммировать эти ориентированные углы, то получится какая-то величина $\theta$. Если точка $P$ лежит внутри многоугольника, то $\theta = \pm 2 \theta$, иначе $\theta = 0$. +Первый заключается в подсчете углов. Пройдемся по всем вершинам в порядке обхода и будем последовательно рассматривать углы с вершиной в точке $P$ и лучами, проходящими через соседние вершины многоугольника. Если просуммировать эти ориентированные углы, то получится какая-то величина $\theta$. Если точка $P$ лежит внутри многоугольника, то $\theta = \pm 2 \pi$, иначе $\theta = 0$. Второй заключается в подсчете, сколько раз луч, выпущенный из $P$, пересекает ребра многоугольника. diff --git a/content/russian/cs/geometry-basic/products.md b/content/russian/cs/geometry-basic/products.md index a4e1a3d5..488dbca6 100644 --- a/content/russian/cs/geometry-basic/products.md +++ b/content/russian/cs/geometry-basic/products.md @@ -1,6 +1,7 @@ --- title: Скалярное и векторное произведение weight: 2 +published: true --- Помимо очевидных сложения, вычитания и умножения на константу, у векторов можно ввести и свои особенные операции, которые нам упростят жизнь. @@ -40,9 +41,9 @@ $$ a \times b = |a| \cdot |b| \cdot \sin \theta = x_a y_b - y_a x_b $$ -Так же, как и со скалярным произведением, доказательство координатной формулы оставляется упражнением читателю. Если кто-то захочет это сделать: это следует из линейности обоих произведений (что в свою очередь тоже нужно доказать) и разложения и разложения по базисным векторам $\overline{(0, 1)}$ и $\overline{(1, 0)}$. +Так же, как и со скалярным произведением, доказательство координатной формулы оставляется упражнением читателю. Если кто-то захочет это сделать: это следует из линейности обоих произведений (что в свою очередь тоже нужно доказать) и разложения по базисным векторам $\overline{(0, 1)}$ и $\overline{(1, 0)}$. -Геометрически, это ориентированный объем параллелограмма, натянутого на вектора $a$ и $b$: +Геометрически, это ориентированная площадь параллелограмма, натянутого на вектора $a$ и $b$: ![](../img/cross.jpg) @@ -65,7 +66,7 @@ int operator^(r a, r b) { return a.x*b.y - b.x*a.y; } Скалярное и векторное произведения тесно связаны с углами между векторами и могут использоваться для подсчета величин вроде ориентированных углов и площадей, которые обычно используются для разных проверок. -Когда они уже реализованы, использовать произведения гораздо проще, чем опираться на алгебру. Например, можно легко угол между двумя векторами, подставив в знакомый нам `atan2` векторное и скалярное произведение: +Когда они уже реализованы, использовать произведения гораздо проще, чем опираться на алгебру. 
Например, можно легко вычислить угол между двумя векторами, подставив в знакомый нам `atan2` векторное и скалярное произведение: ```c++ double angle(r a, r b) { diff --git a/content/russian/cs/geometry-basic/vectors.md b/content/russian/cs/geometry-basic/vectors.md index 05051396..ee1a052a 100644 --- a/content/russian/cs/geometry-basic/vectors.md +++ b/content/russian/cs/geometry-basic/vectors.md @@ -1,6 +1,7 @@ --- -title: Точки и векторы +title: Точки и вектора weight: 1 +published: true --- Отрезок, для которого указано, какой из его концов считается началом, а какой концом, называется *вектором*. Вектор на плоскости можно задать двумя числами — его координатами по горизонтали и вертикали. diff --git a/content/russian/cs/graph-traversals/connectivity.md b/content/russian/cs/graph-traversals/connectivity.md index 45ceec28..17628308 100644 --- a/content/russian/cs/graph-traversals/connectivity.md +++ b/content/russian/cs/graph-traversals/connectivity.md @@ -31,7 +31,7 @@ void dfs(int v, int num) { int num = 0; for (int v = 0; v < n; v++) if (!component[v]) - dfs(v, num++); + dfs(v, ++num); ``` После этого переменная `num` будет хранить число компонент связности, а массив `component` — номер компоненты для каждой вершины, который, например, можно использовать, чтобы быстро проверять, существует ли путь между заданной парой вершин. diff --git a/content/russian/cs/graph-traversals/cycle.md b/content/russian/cs/graph-traversals/cycle.md index 5347e9cd..7a274da1 100644 --- a/content/russian/cs/graph-traversals/cycle.md +++ b/content/russian/cs/graph-traversals/cycle.md @@ -60,6 +60,7 @@ int dfs(int v, int p = -1) { } } } + return -1; } ``` diff --git a/content/russian/cs/interactive/answer-search.md b/content/russian/cs/interactive/answer-search.md index 28e4b4bc..0b38ce24 100644 --- a/content/russian/cs/interactive/answer-search.md +++ b/content/russian/cs/interactive/answer-search.md @@ -66,7 +66,7 @@ int solve() { Здесь, в отличие от предыдущей задачи, кажется, существует прямое решение с формулой. Но вместо того, чтобы о нем думать, можно просто свести задачу к обратной. Давайте подумаем, как по числу минут $t$ (ответу) понять, сколько листов напечатается за это время? Очень легко: $$ -\lfloor\frac{t}{x}\rfloor + \lfloor\frac{t}{y}\rfloor +\left \lfloor \frac{t}{x} \right \rfloor + \left \lfloor \frac{t}{y} \right \rfloor $$ -Ясно, что за $0$ минут $n$ листов распечатать нельзя, а за $xn$ минут один только первый принтер успеет напечатать $n$ листов. Поэтому $0$ и $xn$ — это подходящие изначальные границы для бинарного поиска. +Ясно, что за $0$ минут $n$ листов распечатать нельзя, а за $x \cdot n$ минут один только первый принтер успеет напечатать $n$ листов. Поэтому $0$ и $xn$ — это подходящие изначальные границы для бинарного поиска. diff --git a/content/russian/cs/layer-optimizations/_index.md b/content/russian/cs/layer-optimizations/_index.md index 492473b5..2456aa4c 100644 --- a/content/russian/cs/layer-optimizations/_index.md +++ b/content/russian/cs/layer-optimizations/_index.md @@ -10,10 +10,7 @@ date: 2021-08-29 **Задача.** Даны $n$ точек на прямой, отсортированные по своей координате $x_i$. Нужно найти $m$ отрезков, покрывающих все точки, минимизировав при этом сумму квадратов их длин. -**Базовое решение** — это следующая динамика: - -- $f[i, j]$ = минимальная стоимость покрытия $i$ первых точек, используя не более $j$ отрезков. 
-- Переход — перебор всех возможных последних отрезков, то есть +**Базовое решение** — определить состояние динамики $f[i, j]$ как минимальную стоимость покрытия $i$ первых точек используя не более $j$ отрезков. Пересчитывать её можно перебором всех возможных последних отрезков: $$ f[i, j] = \min_{k < i} \{f[k, j-1] + (x_{i-1}-x_k)^2 \} @@ -30,7 +27,7 @@ int cost(int i, int j) { } for (int i = 0; i <= m; i++) - f[0][k] = 0; // если нам не нужно ничего покрывать, то всё и так хорошо + f[0][i] = 0; // если нам не нужно ничего покрывать, то всё и так хорошо // все остальные f предполагаем равными бесконечности for (int i = 1; i <= n; i++) diff --git a/content/russian/cs/layer-optimizations/divide-and-conquer.md b/content/russian/cs/layer-optimizations/divide-and-conquer.md index a7731f49..c5e218db 100644 --- a/content/russian/cs/layer-optimizations/divide-and-conquer.md +++ b/content/russian/cs/layer-optimizations/divide-and-conquer.md @@ -8,44 +8,43 @@ published: true *Эта статья — одна из [серии](../). Рекомендуется сначала прочитать все предыдущие.* -Посмотрим на формулу пересчета динамики для базового решения: +Посмотрим на формулу пересчета динамики из базового решения: $$ f[i, j] = \min_{k < i} \{f[k, j-1] + (x_{i-1}-x_k)^2 \} $$ -Обозначим за $opt[i, j]$ оптимальный $k$ для данного состояния — то есть от выражения выше. Для однозначности, если оптимальный индекс не один, то выберем среди них самый правый. +Обозначим за $opt[i, j]$ оптимальный $k$ для данного состояния — то есть аргминимум от выражения выше. Для однозначности, если оптимальный индекс не один, то выберем среди них самый правый. -Конкретно в задаче покрытия точек отрезками, можно заметить следующее: +Конкретно в задаче покрытия точек отрезками можно заметить следующее: $$ -opt[i, j] \leq opt[i, j+1] +opt[i + 1, j] \leq opt[i, j] $$ -Интуиция такая: если у нас появился дополнительный отрезок, то последний отрезок нам не выгодно делать больше, а скорее наоборот его нужно «сжать». +Интуация такая: если нам нужно покрыть больший префикс точек, то начало последнего отрезка точно не будет раньше. -### Идея +### Алгоритм -Пусть мы уже знаем $opt[i, l]$ и $opt[i, r]$ и хотим посчитать $opt[i, j]$ для какого-то $j$ между $l$ и $r$. Тогда, воспользовавшись неравенством выше, мы можем сузить отрезок поиска оптимального индекса для $j$ со всего отрезка $[0, i-1]$ до $[opt[i, l], opt[i, r]]$. +Пусть мы уже знаем $opt[l, k]$ и $opt[r, k]$ и хотим посчитать $opt[i, k]$ для какого-то $i$ между $l$ и $r$. Тогда, воспользовавшись неравенством выше, мы можем сузить отрезок поиска оптимального индекса для $i$ со всего отрезка $[0, i - 1]$ до $[opt[l, k], opt[r, k]]$. -Будем делать следующее: заведем рекурсивную функцию, которая считает динамики для отрезка $[l, r]$, зная, что их $opt$ лежат между $l'$ и $r'$. Эта функция просто берет середину отрезка $[l, r]$ и линейным проходом считает ответ для неё, а затем рекурсивно запускается от половин, передавая в качестве границ $[l', opt]$ и $[opt, r']$ соответственно. - -### Реализация - -Один $k$-тый слой целиком пересчитывается из $(k-1)$-го следующим образом: +Будем делать следующее: заведем рекурсивную функцию, которая считает динамики для отрезка $[l, r]$ на $k$-том слое, зная, что их $opt$ лежат между $l'$ и $r'$. 
Эта функция просто берет середину отрезка $[l, r]$ и линейным проходом считает ответ для неё, а затем рекурсивно запускается от половин, передавая в качестве границ $[l', opt]$ и $[opt, r']$ соответственно: ```c++ +// [ l, r] -- какие динамики на k-том слое посчитать +// [_l, _r] -- где могут быть их ответы void solve(int l, int r, int _l, int _r, int k) { if (l > r) return; // отрезок пустой -- выходим int opt = _l, t = (l + r) / 2; + // считаем ответ для f[t][k] for (int i = _l; i <= min(_r, t); i++) { int val = f[i + 1][k - 1] + cost(i, t - 1); if (val < f[t][k]) f[t][k] = val, opt = i; } - solve(l, t - 1, _l, opt, k); - solve(t + 1, r, opt, _r, k); + solve(l, t - 1, _l, opt, k); + solve(t + 1, r, opt, _r, k); } ``` @@ -56,8 +55,6 @@ for (int k = 1; k <= m; k++) solve(0, n - 1, 0, n - 1, k); ``` -### Асимптотика - Так как отрезок $[l, r]$ на каждом вызове уменьшается примерно в два раза, глубина рекурсии будет $O(\log n)$. Так как отрезки поиска для всех элементов на одном «уровне» могут пересекаться разве что только по границам, то суммарно на каждом уровне поиск проверит $O(n)$ различных индексов. Соответственно, пересчет всего слоя займет $O(n \log n)$ операций вместо $O(n^2)$ в базовом решении. -Таким образом, мы улучшили асимптотику до $O(n m \log n)$. +Таким образом, мы улучшили асимптотику до $O(n \cdot m \cdot \log n)$. diff --git a/content/russian/cs/layer-optimizations/knuth.md b/content/russian/cs/layer-optimizations/knuth.md index 5c49dbe6..8a184d2d 100644 --- a/content/russian/cs/layer-optimizations/knuth.md +++ b/content/russian/cs/layer-optimizations/knuth.md @@ -9,13 +9,13 @@ prerequisites: Предыдущий метод оптимизации опирался на тот факт, что $opt[i, j] \leq opt[i, j + 1]$. -Асимптотику можно ещё улучшить, заметив, что $opt$ монотонен ещё и по первому параметру: +Асимптотику можно ещё улучшить, заметив, что $opt$ монотонен также и по второму параметру: $$ -opt[i-1, j] \leq opt[i, j] \leq opt[i, j+1] +opt[i - 1, j] \leq opt[i, j] \leq opt[i, j + 1] $$ -В задаче про покрытие отрезками это выполняется примерно по той же причине: если нам нужно покрывать меньше точек, то новый оптимальный последний отрезок будет начинаться не позже старого. +В задаче про покрытие отрезками это выполняется примерно по той же причине: если нам доступно больше отрезков, то последний отрезок в оптимальном решении точно не будет длиннее, чем раньше. ### Алгоритм diff --git a/content/russian/cs/matching/matching-problems.md b/content/russian/cs/matching/matching-problems.md index cedfe69d..cd14e54e 100644 --- a/content/russian/cs/matching/matching-problems.md +++ b/content/russian/cs/matching/matching-problems.md @@ -81,6 +81,6 @@ $$ Пусть у вершин левой доли есть какие-то веса, и нам нужно набрать максимальное паросочетание минимального веса. -Выясняется, что можно просто отсортировать вершины левой доли по весу и пытаться в таком порядке добавлять их в паросочетание стандартным алгоритмом Куна. Для доказательства этого факта читатель может прочитать про [жадный алгоритм Радо-Эдмондса](/cs/greedy/matroid), частным случаем которого является такая модификация алгоритма Куна. +Выясняется, что можно просто отсортировать вершины левой доли по весу и пытаться в таком порядке добавлять их в паросочетание стандартным алгоритмом Куна. Для доказательства этого факта читатель может прочитать про [жадный алгоритм Радо-Эдмондса](/cs/combinatorial-optimization/matroid), частным случаем которого является такая модификация алгоритма Куна. 
Аналогичную задачу, но когда у *ребер* есть веса, проще всего решать сведением к нахождению [потока минимальной стоимости](/cs/flows/mincost-maxflow). diff --git a/content/russian/cs/modular/reciprocal.md b/content/russian/cs/modular/reciprocal.md index 5d0e34e9..7b966de3 100644 --- a/content/russian/cs/modular/reciprocal.md +++ b/content/russian/cs/modular/reciprocal.md @@ -99,7 +99,7 @@ $$ ax + my = 1 \iff ax \equiv 1 \iff x \equiv a^{-1} \pmod m $$ int inv(int a, int m) { if (a == 1) return 1; - return (1 - inv(m % a, a) * m) / a + m; + return (1 - 1ll * inv(m % a, a) * m) / a + m; } ``` diff --git a/content/russian/cs/numerical/newton.md b/content/russian/cs/numerical/newton.md index 248e1b4e..5426cff5 100644 --- a/content/russian/cs/numerical/newton.md +++ b/content/russian/cs/numerical/newton.md @@ -66,9 +66,9 @@ double sqrt(double n) { Запустим метод Ньютона для поиска квадратного корня $2$, начиная с $x_0 = 1$, и посмотрим, сколько первых цифр оказались правильными после каждой итерации: -
-1
-1.5
+
+1.0000000000000000000000000000000000000000000000000000000000000
+1.5000000000000000000000000000000000000000000000000000000000000
 1.4166666666666666666666666666666666666666666666666666666666675
 1.4142156862745098039215686274509803921568627450980392156862745
 1.4142135623746899106262955788901349101165596221157440445849057
diff --git a/content/russian/cs/persistent/persistent-array.md b/content/russian/cs/persistent/persistent-array.md
index e476c355..018c287a 100644
--- a/content/russian/cs/persistent/persistent-array.md
+++ b/content/russian/cs/persistent/persistent-array.md
@@ -2,8 +2,9 @@
 title: Структуры с откатами
 weight: 1
 authors:
-- Сергей Слотин
-date: 2021-09-12
+  - Сергей Слотин
+date: {}
+published: true
 ---
 
 Состояние любой структуры как-то лежит в памяти: в каких-то массивах, или в более общем случае, по каким-то определенным адресам в памяти. Для простоты, пусть у нас есть некоторый массив $a$ размера $n$, и нам нужно обрабатывать запросы присвоения и чтения, а также иногда откатывать изменения обратно.
@@ -20,7 +21,7 @@ int a[N];
 stack< pair<int, int> > s;
 
 void change(int k, int x) {
-    l.push({k, a[k]});
+    s.push({k, a[k]});
     a[k] = x;
 }
 
@@ -84,7 +85,7 @@ void rollback() {
 
 ```cpp
 int t = 0;
-vector versions[N];
+vector< pair<int, int> > versions[N];
 
 void change(int k, int x) {
     versions[k].push_back({t++, x});
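Для полноты — набросок (не из статьи): как по массиву `versions[k]` ответить на запрос «чему равнялся элемент $k$ на момент времени $T$» бинарным поиском по версиям. Предполагается, что для каждого индекса заранее добавлена начальная версия `{0, начальное значение}`:

```c++
int get(int k, int T) {
    auto &v = versions[k]; // версии отсортированы по времени создания
    int l = 0, r = (int) v.size() - 1;
    while (l < r) {
        int m = (l + r + 1) / 2;
        if (v[m].first <= T)
            l = m;      // версия m создана не позже T
        else
            r = m - 1;
    }
    return v[l].second; // последняя версия со временем <= T
}
```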
diff --git a/content/russian/cs/programming/bayans.md b/content/russian/cs/programming/bayans.md
index 7d8d773b..d7b42267 100644
--- a/content/russian/cs/programming/bayans.md
+++ b/content/russian/cs/programming/bayans.md
@@ -4,11 +4,12 @@ weight: 100
 authors:
 - Сергей Слотин
 created: 2017-2019
+date: 2022-07-17
 ---
 
 Везде, где не указано — время работы $O(n)$, а если есть конкретные числа, то TL 1 секунда.
 
-Задачи идут в порядке вспоминания, то есть в весьма рандомном.
+Задачи идут в порядке вспоминания/придумывания, то есть в весьма рандомном.
 
 ## Попугаи
 
@@ -121,12 +122,24 @@ int lower_bound(int x) {
 
 ## Нулевая сумма
 
-Дано  мультимножество из $n$ целых чисел. Найдите любое его подмножество, сумма чисел которого делится на $n$.
+Дано мультимножество из $n$ целых чисел. Найдите любое его непустое подмножество, сумма чисел которого делится на $n$.
 
 ## Мета-задача
 
 В задаче дана произвольная строка, по которой известным только авторам способом генерируется ответ yes/no. В задаче 100 тестов. У вас есть 20 попыток. В качестве фидбэка вам доступны вердикты на каждом тесте. Вердикта всего два: OK (ответ совпал) и WA. Попытки поделить на ноль, выделить терабайт памяти и подобное тоже считаются как WA. «Решите» задачу.
 
+## Мета-задача 2
+
+Условие как в «Мета-задаче», но сообщается только число пройденных тестов.
+
+100 тестов, 70 попыток.
+
+## Мета-задача 3
+
+Условие как в «Мета-задаче», но сообщается только номер первого не пройденного теста.
+
+10 тестов, 100 попыток.
+
 ## Ниточка
 
 В плоскую доску вбили $n$ гвоздей радиуса $r$, причём так, что соответствующие точки на плоскости образуют вершины выпуклого многоугольника. На эти гвозди натянули ниточку, причём ниточка «огибает» по кругу гвозди. Найдите длину ниточки, то есть периметр этого многоугольника с учётом закругления.
@@ -302,3 +315,56 @@ def query(y):
 ```
 
 Ваша задача — отгадать число, используя не более 10000 попыток.
+
+## Коммивояжер
+
+Даны $3 \cdot 10^5$ точек на плоскости. Выберите среди них любое подмножество из 500 точек и решите для него задачу коммивояжера: найдите минимальный по длине цикл, проходящий через все эти точки.
+
+## Анаграммы
+
+Найдите в строке $s$ первую подстроку, являющуюся анаграммой (перестановкой символов) строки $t$ за $O(n)$.
+
+## Функциональный граф
+
+Дан ориентированный граф из $n < 10^5$ вершин, в котором из каждой вершины ведет ровно одно ребро. Требуется ответить на $q < 10^5$ запросов «в какую вершину мы попадем, если начнем в вершине $v_i$ и сделаем $k_i < 10^{18}$ переходов» за время $O(q + n)$.
+
+## Асинхронная шляпа
+
+Серёжа и его $(n - 1)$ друзей решили поиграть в «шляпу», в которой один игрок должен за ограниченное время объяснить как можно больше слов, чтобы его партнер их отгадал.
+
+Каждый игрок должен пообщаться с любым другим по разу; обычно игра проводится так:
+
+- 1-й игрок объясняет в течение минуты слова 2-му,
+- 2-й игрок объясняет слова 3-му,
+- ...,
+- $n$-й игрок объясняет слова 1-му,
+- 1-й игрок объясняет слова 3-му,
+- 2-й игрок объясняет слова 4-му…
+
+…и так далее, пока $(n-1)$-й игрок не закончит объяснять слова $(n-2)$-ому.
+
+Если друзей собралось много, то игра может занять приличное время. Серёжу интересует, какое минимальное время она может длиться, если разрешить парам участников общаться между собой одновременно и в любом порядке.
+
+Для данного $n \le 500$, найдите минимальное количество времени $k$ и соответствующее ему расписание.
+
+## Random coffee
+
+В компании, в которой вы работаете, устроено неизвестное число людей — от одного до бесконечности с равной вероятностью. Для борьбы с одиночеством, каждый сотрудник участвует в «random coffee»: каждую неделю вы встречаетесь со случайным человеком из компании, чтобы попить кофе и обсудить что угодно.
+
+Вы участвовали в random coffee $n$ раз и пообщались с $k$ разными людьми (с некоторыми — более одного раза). Какое наиболее вероятное число человек работает в компании?
+
+## Мафия
+
+В «мафию» играют 13 человек, из которых 10 мирных и 3 мафии. Все роли розданы с помощью стандартной колоды игральных карт: заранее выбрали и перемешали 10 красных и 3 чёрные карты, кто вытянул черную — мафия. Все карты различны и известны всем. Игра начинается с дневного голосования.
+
+Как мирным гарантированно победить?
+
+
+
+
diff --git a/content/russian/cs/programming/stress-test.md b/content/russian/cs/programming/stress-test.md
index b20c77b6..c67d1237 100644
--- a/content/russian/cs/programming/stress-test.md
+++ b/content/russian/cs/programming/stress-test.md
@@ -151,12 +151,12 @@ _, f1, f2, gen, iters = sys.argv
 
 for i in range(int(iters)):
     print('Test', i + 1)
-    os.popen('python3 %s > test.txt' % gen)
-    v1 = os.popen('./%s < test.txt' % f1).read()
-    v2 = os.popen('./%s < test.txt' % f2).read()
+    os.system(f'python3 {gen} > test.txt')
+    v1 = os.popen(f'./{f1} < test.txt').read()
+    v2 = os.popen(f'./{f2} < test.txt').read()
     if v1 != v2:
         print("Failed test:")
-        print(open("text.txt").read())
+        print(open("test.txt").read())
         print(f'Output of {f1}:')
         print(v1)
         print(f'Output of {f2}:')
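Запускается такой скрипт (назовём его условно `stress.py`) примерно так: `python3 stress.py solution brute gen.py 100`, где `solution` и `brute` — два скомпилированных решения, `gen.py` — генератор тестов, а последний аргумент — число итераций; имена файлов здесь условные.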
diff --git a/content/russian/cs/range-queries/fenwick.md b/content/russian/cs/range-queries/fenwick.md
index f07a1ed4..9e37fc8d 100644
--- a/content/russian/cs/range-queries/fenwick.md
+++ b/content/russian/cs/range-queries/fenwick.md
@@ -84,7 +84,7 @@ int sum (int r1, int r2) {
     int res = 0;
     for (int i = r1; i > 0; i -= i & -i)
         for (int j = r2; j > 0; j -= j & -j)
-            ans += t[i][j];
+            res += t[i][j];
     return res;
 }
 ```
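Для полноты — набросок (не из статьи): сумма по произвольному прямоугольнику собирается из четырёх таких «префиксных» сумм включением-исключением (имя функции условное, индексация с единицы, как в дереве Фенвика):

```c++
// сумма на прямоугольнике [x1, x2] x [y1, y2]
int rect_sum(int x1, int y1, int x2, int y2) {
    return sum(x2, y2) - sum(x1 - 1, y2)
         - sum(x2, y1 - 1) + sum(x1 - 1, y1 - 1);
}
```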
diff --git a/content/russian/cs/range-queries/img/prefix-sum.png b/content/russian/cs/range-queries/img/prefix-sum.png
new file mode 100644
index 00000000..4e00190a
Binary files /dev/null and b/content/russian/cs/range-queries/img/prefix-sum.png differ
diff --git a/content/russian/cs/range-queries/prefix-sum.md b/content/russian/cs/range-queries/prefix-sum.md
index 861200a1..f4e02570 100644
--- a/content/russian/cs/range-queries/prefix-sum.md
+++ b/content/russian/cs/range-queries/prefix-sum.md
@@ -52,13 +52,15 @@ $$
 
 Для ответа на запрос поиска суммы на произвольном полуинтервале нужно просто вычесть друг из друга две предподсчитанные префиксные суммы.
 
-@@
+
+
+![](../img/prefix-sum.png)
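Для наглядности — короткий набросок (не из статьи; константа и имена условные):

```c++
const int N = 1e5;
int a[N], s[N + 1]; // s[i] — сумма первых i элементов, s[0] = 0

void build(int n) {
    for (int i = 0; i < n; i++)
        s[i + 1] = s[i] + a[i];
}

int sum(int l, int r) { // сумма на полуинтервале [l, r)
    return s[r] - s[l];
}
```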
 
 ### Другие операции
 
diff --git a/content/russian/cs/range-queries/sqrt-structures.md b/content/russian/cs/range-queries/sqrt-structures.md
index bac0da16..25fe3b5e 100644
--- a/content/russian/cs/range-queries/sqrt-structures.md
+++ b/content/russian/cs/range-queries/sqrt-structures.md
@@ -1,10 +1,10 @@
 ---
 title: Корневые структуры
 authors:
-- Сергей Слотин
-- Иван Сафонов
+  - Сергей Слотин
+  - Иван Сафонов
 weight: 6
-date: 2021-09-13
+date: 2022-08-16
 ---
 
 Корневые оптимизации можно использовать много для чего, в частности в контексте структур данных.
@@ -23,16 +23,15 @@ date: 2021-09-13
 ```c++
 // c это и количество блоков, и также их размер; оно должно быть чуть больше корня
 const int maxn = 1e5, c = 330;
-int a[maxn], b[c];
-int add[c];
+int a[maxn], b[c], add[c];
 
 for (int i = 0; i < n; i++)
     b[i / c] += a[i];
 ```
 
-Заведем также массив `add` размера $\sqrt n$, который будем использовать для отложенной операции прибавления на блоке. Будем считать, что реальное значение $i$-го элемента равно `a[i] + add[i / c]`.
+Заведем также массив `add` размера $\sqrt n$, который будем использовать для отложенной операции прибавления на блоке: будем считать, что реальное значение $i$-го элемента равно `a[i] + add[i / c]`.
 
-Теперь мы можем отвечать на запросы первого типа за $O(\sqrt n)$ на запрос:
+Теперь мы можем отвечать на запросы первого типа за $O(\sqrt n)$ операций на запрос:
 
 1. Для всех блоков, лежащих целиком внутри запроса, просто возьмём уже посчитанные суммы и сложим.
 2. Для блоков, пересекающихся с запросом только частично (их максимум два — правый и левый), проитерируемся по нужным элементам и поштучно прибавим к ответу.
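Для наглядности — набросок такого запроса (не из статьи; предполагается, что `b` хранит суммы блоков без учёта отложенных прибавлений `add`, то есть реальное значение элемента равно `a[i] + add[i / c]`):

```c++
int query(int l, int r) { // сумма на отрезке [l, r]
    int res = 0;
    while (l <= r) {
        if (l % c == 0 && l + c - 1 <= r) {
            // блок целиком внутри запроса
            res += b[l / c] + add[l / c] * c;
            l += c;
        } else {
            // граничный блок — поэлементно
            res += a[l] + add[l / c];
            l++;
        }
    }
    return res;
}
```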
@@ -68,6 +67,7 @@ void upd(int l, int r, int x) {
             l += c;
         }
         else {
+            b[l / c] += x;
             a[l] += x;
             l++;
         }
@@ -111,8 +111,8 @@ vector< vector<int> > blocks;
 // возвращает индекс блока и индекс элемента внутри блока
 pair<int, int> find_block(int pos) {
     int idx = 0;
-    while (blocks[idx].size() >= pos)
-        pos -= blocks[idx--].size();
+    while (blocks[idx].size() <= pos)
+        pos -= blocks[idx++].size();
     return {idx, pos};
 }
 ```
diff --git a/content/russian/cs/sequences/_index.md b/content/russian/cs/sequences/_index.md
index d02ed49b..6888831d 100644
--- a/content/russian/cs/sequences/_index.md
+++ b/content/russian/cs/sequences/_index.md
@@ -1,7 +1,6 @@
 ---
 title: Последовательности
 weight: 4
-draft: true
 ---
 
-В этой главе рассматриваются некоторые алгоритмы на неотсортированных последовательностях.
+В этой главе рассматриваются алгоритмы для неотсортированных последовательностей.
diff --git a/content/russian/cs/sequences/compression.md b/content/russian/cs/sequences/compression.md
index 332011b3..5b469fec 100644
--- a/content/russian/cs/sequences/compression.md
+++ b/content/russian/cs/sequences/compression.md
@@ -3,46 +3,64 @@ title: Сжатие координат
 authors:
 - Сергей Слотин
 weight: -1
-draft: true
+date: 2022-04-20
 ---
 
+Часто бывает полезно преобразовать последовательность чисел либо каких-то других объектов в промежуток последовательных целых чисел — например, чтобы использовать её элементы как индексы в массиве либо какой-нибудь другой структуре.
 
-## Сжатие координат
-Это общая идея, которая может оказаться полезной. Пусть, есть $n$ чисел $a_1,\ldots,a_n$. Хотим, преобразовать $a_i$ так, чтобы равные остались равными, разные остались разными, но все они были от 0 до $n-1$. Для этого надо отсортировать числа, удалить повторяющиеся и заменить каждое $a_i$ на его индекс в отсортированном массиве.
+Эта задача эквивалентна нумерации элементов множества, что можно сделать за $O(n)$ через хеш-таблицу:
 
+```c++
+vector<int> compress(vector<int> a) {
+    unordered_map<int, int> m;
 
-```
-int a[n], all[n];
-for (int i = 0; i < n; ++i) {
-    cin >> a[i];
-    all[i] = a[i];
+    for (int &x : a) {
+        if (m.count(x))
+            x = m[x];
+        else
+            m[x] = m.size();
+    }
+
+    return a;
 }
-sort(all, all + n);
-m = unique(all, all + n) - all; // теперь m - число различных координат
-for (int i = 0; i < n; ++i)
-    a[i] = lower_bound(all, all + m, x[i]) - all;
 ```
 
-```cpp
+Элементам будут присвоены номера в порядке их первого вхождения в последовательность. Если нужно сохранить *порядок*, присвоив меньшим элементам меньшие номера, то задача становится чуть сложнее, и её можно решить разными способами.
+
+Как вариант, можно отсортировать массив, а затем два раза пройтись по нему с хэш-таблицей — в первый раз заполняя её, а во второй раз сжимая сам массив:
+
+```c++
 vector<int> compress(vector<int> a) {
+    vector<int> b = a;
+    sort(b.begin(), b.end());
+
     unordered_map<int, int> m;
-    for (int x : a)
-        if (m.count(x))
+
+    for (int x : b)
+        if (!m.count(x))
             m[x] = m.size();
+
     for (int &x : a)
         x = m[x];
+
     return a;
 }
 ```
 
+Также можно выкинуть из отсортированного массива дупликаты (за линейное время), а затем использовать его для нахождения индекса каждого элемента исходного массива бинарным поиском:
 
-```cpp
+```c++
 vector<int> compress(vector<int> a) {
     vector<int> b = a;
+
     sort(b.begin(), b.end());
     b.erase(unique(b.begin(), b.end()), b.end());
+
     for (int &x : a)
         x = int(lower_bound(b.begin(), b.end(), x) - b.begin());
+
     return a;
 }
 ```
+
+Оба подхода работают за $O(n \log n)$. Используйте тот, который больше нравится.
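Небольшой пример использования (добавлен для иллюстрации):

```c++
vector<int> a = {30, -5, 30, 7};
vector<int> b = compress(a);
// обе версии с сортировкой вернут {2, 0, 2, 1},
// версия без сортировки — {0, 1, 0, 2}
```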
diff --git a/content/russian/cs/sequences/inversions.md b/content/russian/cs/sequences/inversions.md
index f18d1f4a..2fbec7d9 100644
--- a/content/russian/cs/sequences/inversions.md
+++ b/content/russian/cs/sequences/inversions.md
@@ -4,13 +4,18 @@ title: Число инверсий
 weight: 5
 authors:
 - Сергей Слотин
+draft: true
 ---
 
-Пусть у нас есть некоторая перестановка $p$ (какая-то последовательность чисел от $1$ до $n$, где все числа встречаются ровно один раз). *Инверсией* называется пара индексов $i$ и $j$ такая, что $i < j$ и $p_i > p_j$. Требуется найти количество инверсий в данной перестановке.
+**Определение.** *Инверсией* в перестановке $p$ называется пара индексов $i$ и $j$ такая, что $i < j$ и $p_i > p_j$.
 
-## Наивный алгоритм
+Например:
 
-Эта задача легко решается за $O(n^2)$ обычным перебором всех пар индексов и проверкой каждого на инверсию:
+- в перестановке $[1, 2, 3]$ инверсий нет,
+- в $[1, 3, 2]$ одна инверсия ($3 \leftrightarrow 2$),
+- в $[3, 2, 1]$ три инверсии ($3 \leftrightarrow 2$, $3 \leftrightarrow 1$ и $2 \leftrightarrow 1$).
+
+В этой статье мы рассмотрим, как находить количество инверсий в перестановке. Эта задача легко решается за $O(n^2)$ обычным перебором всех пар индексов и проверкой каждого на инверсию:
 
 ```cpp
 int count_inversions(int *p, int n) {
@@ -23,6 +28,8 @@ int count_inversions(int *p, int n) {
 }
 ```
 
+Решить её быстрее сложнее.
+
 ## Сортировкой слиянием
 
 Внезапно эту задачу можно решить сортировкой слиянием, слегка модифицировав её.
diff --git a/content/russian/cs/sequences/quickselect.md b/content/russian/cs/sequences/quickselect.md
index b1606bbd..7e83a267 100644
--- a/content/russian/cs/sequences/quickselect.md
+++ b/content/russian/cs/sequences/quickselect.md
@@ -1,12 +1,12 @@
 ---
-# TODO: реализация
 title: Порядковые статистики
 weight: 4
+draft: true
 ---
 
 Если в [начале предыдущей главы](/cs/interactive/binary-search) мы искали число элементов массива, меньших $x$ — также известное как индекс этого элемента в отсортированном массиве — то теперь нас интересует обратная задача: узнать, какой элемент $k$-тый по возрастанию.
 
-Если массив уже отсортирован, то задача тривиальная — просто берем $k$-тый элемент. Иначе мы его можем отсортировать, но на это потребуется $O(n \log n)$ операций — и мы знаем, что используя только сравнения быстрее не получится.
+Если массив уже отсортирован, то задача тривиальная: просто берем $k$-тый элемент. Иначе мы его можем отсортировать, но на это потребуется $O(n \log n)$ операций — и мы знаем, что если мы используем только сравнения, быстрее не получится.
 
 Есть другой подход — мы можем модифицировать алгоритм быстрой сортировки.
 
@@ -26,4 +26,17 @@ weight: 4
 
 Подумав над тем, что размер отрезка каждый раз убывает приблизительно в 2 раза, над ограниченностью суммы $n + \frac{n}{2} + \frac{n}{4} + \ldots = 2 \cdot n$, и немного помахав руками, получаем, что алгоритм работает за $O(n)$. 
 
+
+
 В C++ этот алгоритм уже реализован и доступен как `nth_element`.
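Для иллюстрации — минимальный пример вызова (набросок):

```c++
vector<int> a = {5, 1, 4, 2, 3};
nth_element(a.begin(), a.begin() + 2, a.end());
cout << a[2]; // выведет 3 — третий по возрастанию элемент
```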
diff --git a/content/russian/cs/set-structures/dsu.md b/content/russian/cs/set-structures/dsu.md
index 6c9a4d80..ee437a43 100644
--- a/content/russian/cs/set-structures/dsu.md
+++ b/content/russian/cs/set-structures/dsu.md
@@ -66,7 +66,7 @@ int leader(int v) {
 
 Следующие две эвристики похожи по смыслу и стараются оптимизировать высоту дерева, выбирая оптимальный корень для переподвешивания.
 
-**Ранговая эвристика**. Будем хранить для каждой вершины её *ранг* — высоту её поддереа. При объединении деревьев будем делать корнем нового дерева ту вершину, у которой ранг больше, и пересчитывать ранги (ранг у лидера должен увеличиться на единицу, если он совпадал с рангом другой вершины). Эта эвристика оптимизирует высоту дерева напрямую.
+**Ранговая эвристика**. Будем хранить для каждой вершины её *ранг* — высоту её поддерева. При объединении деревьев будем делать корнем нового дерева ту вершину, у которой ранг больше, и пересчитывать ранги (ранг у лидера должен увеличиться на единицу, если он совпадал с рангом другой вершины). Эта эвристика оптимизирует высоту дерева напрямую.
 
 ```cpp
 void unite(int a, int b) {
diff --git a/content/russian/cs/sorting/bubble.md b/content/russian/cs/sorting/bubble.md
index 2d9af9b5..38fa5c8a 100644
--- a/content/russian/cs/sorting/bubble.md
+++ b/content/russian/cs/sorting/bubble.md
@@ -1,9 +1,10 @@
 ---
 title: Сортировка пузырьком
 weight: 1
+published: true
 ---
 
-Наш первый подход будет заключаться в следующем: обозначим за $n$ длину массива и $n$ раз пройдёмся раз пройдемся по нему слева направо, меняя два соседних элемента, если первый больше второго.
+Наш первый подход будет заключаться в следующем: обозначим за $n$ длину массива и $n$ раз пройдёмся по нему слева направо, меняя два соседних элемента, если первый больше второго.
 
 Каждую итерацию максимальный элемент «всплывает» как пузырек к концу массива — отсюда и название.
 
diff --git a/content/russian/cs/sorting/quicksort.md b/content/russian/cs/sorting/quicksort.md
index f3a6a5d6..e6494cd3 100644
--- a/content/russian/cs/sorting/quicksort.md
+++ b/content/russian/cs/sorting/quicksort.md
@@ -7,13 +7,18 @@ draft: true
 Быстрая сортировка заключается в том, что на каждом шаге мы находим опорный элемент, все элементы, которые меньше его кидаем в левую часть, остальные в правую, а затем рекурсивно спускаемся в обе части.
 
 ```cpp
+// partition — функция, разбивающая элементы
+// на меньшие и больше/равные опорного элемента a[p]
+// и возвращающая границу разбиения
+// (возможная реализация — см. набросок ниже)
+int partition(int l, int r, int p) {
+
+}
+
 void quicksort(int l, int r){
     if (l < r){
         int index = (l + r) / 2; /* index - индекс опорного элемента для 
         начала сделаем его равным середине отрезка*/
-        index = divide(l, r, index); /* divide - функция разбивающие элементы 
-        на меньшие и больше/равные a[index], 
-        при этом функция возвращает границу разбиения*/
+        index = partition(l, r, index);
         quicksort(l, index);
         quicksort(index + 1, r);
     }
@@ -25,8 +30,6 @@ void quicksort(int l, int r){
 
 Существуют несколько выходов из этой ситуации :
 
-2. Давайте если быстрая сортировка работает долго, то запустим любую другую сортировку за $NlogN$.
-
-3. Давайте делить массив не на две, а на три части(меньше, равны, больше).
-
-4. Чтобы избавиться от проблемы с максимумом/минимумом в середине, давайте **брать случайный элемент**.
+1. Давайте, если быстрая сортировка работает долго, запустим любую другую сортировку за $N \log N$.
+2. Давайте делить массив не на две, а на три части (меньше, равны, больше).
+3. Чтобы избавиться от проблемы с максимумом/минимумом в середине, давайте **брать случайный элемент**.
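Возможная реализация заглушки `partition` из кода выше — разбиение по схеме Хоара (набросок; предполагается глобальный массив `a`, как в остальном коде статьи, и то, что в качестве опорного передаётся не правая граница — в статье передаётся середина отрезка):

```c++
int partition(int l, int r, int p) {
    int pivot = a[p];
    int i = l - 1, j = r + 1;
    while (true) {
        do i++; while (a[i] < pivot); // ищем слева элемент >= опорного
        do j--; while (a[j] > pivot); // ищем справа элемент <= опорного
        if (i >= j)
            return j; // a[l..j] <= pivot <= a[j+1..r]
        swap(a[i], a[j]);
    }
}
```

Такое разбиение согласуется с рекурсией `quicksort(l, index); quicksort(index + 1, r);` из статьи.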
diff --git a/content/russian/cs/sorting/selection.md b/content/russian/cs/sorting/selection.md
index b47f2320..30854b5f 100644
--- a/content/russian/cs/sorting/selection.md
+++ b/content/russian/cs/sorting/selection.md
@@ -1,6 +1,7 @@
 ---
 title: Сортировка выбором
 weight: 2
+published: true
 ---
 
 Похожим методом является **сортировка выбором** (минимума или максимума).
@@ -10,7 +11,7 @@ weight: 2
 ```cpp
 void selection_sort(int *a, int n) {
     for (int k = 0; k < n - 1; k++)
-        for (j = k + 1; j < n; j++)
+        for (int j = k + 1; j < n; j++)
             if (a[k] > a[j])
                 swap(a[j], a[k]);
 }
diff --git a/content/russian/cs/spanning-trees/kruskal.md b/content/russian/cs/spanning-trees/kruskal.md
index ddb9cabf..1f4c98a4 100644
--- a/content/russian/cs/spanning-trees/kruskal.md
+++ b/content/russian/cs/spanning-trees/kruskal.md
@@ -34,4 +34,4 @@ for (auto [a, b, w] : edges) {
 }
 ```
 
-Раз остовные деревья являются частным случаем [матроида](/cs/greedy/matroid), то алгоритм Краскала является частным случаем алгоритма Радо-Эдмондса.
+Раз остовные деревья являются частным случаем [матроида](/cs/combinatorial-optimization/matroid), то алгоритм Краскала является частным случаем алгоритма Радо-Эдмондса.
diff --git a/content/russian/cs/spanning-trees/prim.md b/content/russian/cs/spanning-trees/prim.md
index d9a00c6e..ff250c70 100644
--- a/content/russian/cs/spanning-trees/prim.md
+++ b/content/russian/cs/spanning-trees/prim.md
@@ -2,7 +2,8 @@
 title: Алгоритм Прима
 weight: 2
 prerequisites:
-- safe-edge
+  - safe-edge
+published: true
 ---
 
 Лемма о безопасном ребре говорит, что мы можем строить минимальный остов постепенно, добавляя по одному ребра, про которые мы точно знаем, что они минимальные для соединения какого-то разреза.
@@ -47,7 +48,7 @@ min_edge[0] = 0;
 
 for (int i = 0; i < n; i++) {
     int v = -1;
-    for (int u = 0; u < n; j++)
+    for (int u = 0; u < n; u++)
         if (!used[u] && (v == -1 || min_edge[u] < min_edge[v]))
             v = u;
 
diff --git a/content/russian/cs/spanning-trees/safe-edge.md b/content/russian/cs/spanning-trees/safe-edge.md
index cc7138c9..19f97006 100644
--- a/content/russian/cs/spanning-trees/safe-edge.md
+++ b/content/russian/cs/spanning-trees/safe-edge.md
@@ -24,4 +24,4 @@ weight: 1
 - Если веса всех рёбер различны, то остов будет уникален.
 - Минимальный остов является также и остовом с минимальным произведением весов рёбер (замените веса всех рёбер на их логарифмы).
 - Минимальный остов является также и остовом с минимальным весом самого тяжелого ребра.
-- Остовные деревья — частный случай [матроидов](/cs/greedy/matroid).
+- Остовные деревья — частный случай [матроидов](/cs/combinatorial-optimization/matroid).
diff --git a/content/russian/cs/string-searching/manacher.md b/content/russian/cs/string-searching/manacher.md
index 8954b653..16d32ccb 100644
--- a/content/russian/cs/string-searching/manacher.md
+++ b/content/russian/cs/string-searching/manacher.md
@@ -32,7 +32,7 @@ vector pal_array(string s) {
 
 Тот же пример $s = aa\dots a$ показывает, что данная реализация работает за $O(n^2)$.
 
-Для оптимизации применим идею, знакомую из алгоритма [z-функции](string-searching): при инициализации $t_i$ будем пользоваться уже посчитанными $t$. А именно, будем поддерживать $(l, r)$ — интервал, соответствующий самому правому из найденных подпалиндромов. Тогда мы можем сказать, что часть наибольшего палиндрома с центром в $s_i$, которая лежит внутри $s_{l:r}$, имеет радиус хотя бы $\min(r-i, \; t_{l+r-i})$. Первая величина равна длине, дальше которой произошел бы выход за пределы $s_{l:r}$, а вторая — значению радиуса в позиции, зеркальной относительно центра палиндрома $s_{l:r}$.
+Для оптимизации применим идею, знакомую из алгоритма [z-функции](/cs/string-searching/z-function/): при инициализации $t_i$ будем пользоваться уже посчитанными $t$. А именно, будем поддерживать $(l, r)$ — интервал, соответствующий самому правому из найденных подпалиндромов. Тогда мы можем сказать, что часть наибольшего палиндрома с центром в $s_i$, которая лежит внутри $s_{l:r}$, имеет радиус хотя бы $\min(r-i, \; t_{l+r-i})$. Первая величина равна длине, дальше которой произошел бы выход за пределы $s_{l:r}$, а вторая — значению радиуса в позиции, зеркальной относительно центра палиндрома $s_{l:r}$.
 
 ```c++
 
diff --git a/content/russian/cs/string-structures/aho-corasick.md b/content/russian/cs/string-structures/aho-corasick.md
index 369f5171..2ca1da65 100644
--- a/content/russian/cs/string-structures/aho-corasick.md
+++ b/content/russian/cs/string-structures/aho-corasick.md
@@ -1,10 +1,11 @@
 ---
 title: Алгоритм Ахо-Корасик
 authors:
-- Сергей Слотин
+  - Сергей Слотин
 weight: 2
 prerequisites:
-- trie
+  - trie
+published: true
 ---
 
 Представим, что мы работаем журналистами в некотором авторитарном государстве, контролирующем СМИ, и в котором время от времени издаются законы, запрещающие упоминать определенные политические события или использовать определенные слова. Как эффективно реализовать подобную цензуру программно?
@@ -36,7 +37,7 @@ prerequisites:
 
 **Определение.** *Суффиксная ссылка* $l(v)$ ведёт в вершину $u \neq v$, которая соответствует наидлиннейшему принимаемому бором суффиксу $v$.
 
-**Определение.** *Автоматный переход* $\delta(v, c)$ ведёт в вершину, соответствующую минимальному принимаемому бором суффиксу строки $v + c$.
+**Определение.** *Автоматный переход* $\delta(v, c)$ ведёт в вершину, соответствующую максимальному принимаемому бором суффиксу строки $v + c$.
 
 **Наблюдение.** Если переход и так существует в боре (будем называть такой переход *прямым*), то автоматный переход будет вести туда же.
 
diff --git a/content/russian/cs/string-structures/palindromic-tree.md b/content/russian/cs/string-structures/palindromic-tree.md
index 3d70c76b..9b57534a 100644
--- a/content/russian/cs/string-structures/palindromic-tree.md
+++ b/content/russian/cs/string-structures/palindromic-tree.md
@@ -19,7 +19,7 @@ weight: 3
 
 Будем поддерживать наибольший суффикс-палиндром. Когда мы будем дописывать очередной символ $c$, нужно найти наибольший суффикс этого палиндрома, который может быть дополнен символом $c$ — это и будет новый наидлиннейший суффикс-палиндром.
 
-Для этого поступим аналогично [алгоритму Ахо-Корасик](aho-corasick): будем поддерживать для каждого палиндрома суффиксную ссылку $l(v)$, ведущую из $v$ в её наибольший суффикс-палиндром. При добавлении очередного символа, будем подниматься по суффиксным ссылкам, пока не найдём вершину, из которой можно совершить нужный переход.
+Для этого поступим аналогично [алгоритму Ахо-Корасик](../aho-corasick): будем поддерживать для каждого палиндрома суффиксную ссылку $l(v)$, ведущую из $v$ в её наибольший суффикс-палиндром. При добавлении очередного символа, будем подниматься по суффиксным ссылкам, пока не найдём вершину, из которой можно совершить нужный переход.
 
 Если в подходящей вершине этого перехода не существовало, то нужно создать новую вершину, и для неё тоже понадобится своя суффиксная ссылка. Чтобы найти её, будем продолжать подниматься по суффиксным ссылкам предыдущего суффикс-палиндрома, пока не найдём второе такое место, которое мы можем дополнить символом $c$.
 
diff --git a/content/russian/cs/string-structures/suffix-array.md b/content/russian/cs/string-structures/suffix-array.md
index 80d2b129..a7b90768 100644
--- a/content/russian/cs/string-structures/suffix-array.md
+++ b/content/russian/cs/string-structures/suffix-array.md
@@ -22,7 +22,7 @@ weight: 100
 
 ![Сортировка всех суффиксов строки «mississippi$»](../img/sa-sort.png)
 
-**Где это может быть полезно.** Пусть вы хотите основать ещё один поисковик, и чтобы получить финансирование, вам нужно сделать хоть что-то минимально работающее — хотя бы просто научиться искать по ключевому слову документы, включающие его, а также позиции их вхождения (в 90-е это был бы уже довольно сильный MVP). Простыми алгоритмами — [полиномиальными хешами](/cs/hashing), [z- и префикс-функцией](/cs/string-searching) и даже [Ахо-Корасиком](/cs/automata/aho-corasick) — это сделать быстро нельзя, потому что на каждый раз нужно проходиться по всем данным, а суффиксными структурами — можно.
+**Где это может быть полезно.** Пусть вы хотите основать ещё один поисковик, и чтобы получить финансирование, вам нужно сделать хоть что-то минимально работающее — хотя бы просто научиться искать по ключевому слову документы, включающие его, а также позиции их вхождения (в 90-е это был бы уже довольно сильный MVP). Простыми алгоритмами — [полиномиальными хешами](/cs/hashing), [z- и префикс-функцией](/cs/string-searching) и даже [Ахо-Корасиком](../aho-corasick) — это сделать быстро нельзя, потому что на каждый раз нужно проходиться по всем данным, а суффиксными структурами — можно.
 
 В случае с суффиксным массивом можно сделать следующее: сконкатенировать все строки-документы с каким-нибудь внеалфавитным разделителем (`$`), построить по ним суффиксный массив, а дальше для каждого запроса искать бинарным поиском первый суффикс в суффиксном массиве, который меньше искомого слова, а также последний, который меньше. Все суффиксы между этими двумя будут включать искомую строку как префикс.
 
@@ -132,11 +132,11 @@ vector suffix_array(vector &s) {
 
 Тогда есть мотивация посчитать массив `lcp$` в котором окажутся наибольшие общие префиксы соседних суффиксов, а после как-нибудь считать минимумы на отрезках в этом массиве (например, с помощью [разреженной таблицы](/cs/range-queries/sparse-table)).
 
-Осталось придумать способ быстро посчитать массив `lcp`. Можно воспользоваться идеей из построения суффиксного массива за $O(n \log^2 n)$: с помощью [хешей](hashing) и бинпоиска находить `lcp` для каждой пары соседей. Такой метод работает за $O(n \log n)$, но является не самым удобным и популярным.
+Осталось придумать способ быстро посчитать массив `lcp`. Можно воспользоваться идеей из построения суффиксного массива за $O(n \log^2 n)$: с помощью [хешей](/cs/hashing/polynomial/) и бинпоиска находить `lcp` для каждой пары соседей. Такой метод работает за $O(n \log n)$, но является не самым удобным и популярным.
 
 ### Алгоритм Касаи, Аримуры, Арикавы, Ли, Парка
 
-Алгоритм в реальности называется как угодно, но не исходным способом (*алгоритм Касаи*, *алгоритм пяти корейцев*, и т. д.). Используется для подсчета $lcp$ за линейное время. Автору алгоритм кажется чем-то похожим на [z-функцию](string-searching) по своей идее.
+Алгоритм в реальности называется как угодно, но не исходным способом (*алгоритм Касаи*, *алгоритм пяти корейцев*, и т. д.). Используется для подсчета $lcp$ за линейное время. Автору алгоритм кажется чем-то похожим на [z-функцию](/cs/string-searching/z-function) по своей идее.
 
 **Утверждение.** Пусть мы уже построили суфмасс и посчитали $lcp[i]$. Тогда:
 
diff --git a/content/russian/cs/tree-structures/treap.md b/content/russian/cs/tree-structures/treap.md
index dd3417dd..ad11c794 100644
--- a/content/russian/cs/tree-structures/treap.md
+++ b/content/russian/cs/tree-structures/treap.md
@@ -100,7 +100,7 @@ $$
 
 Примечательно, что ожидаемая глубина вершин зависит от их позиции: вершина из середины должна быть примерно в два раза глубже, чем крайняя.
 
-**Упражнение.** Выведите по аналогии с этим рассуждением асимптотику [quicksort](/cs/sorting/quicksort).
+**Упражнение.** Выведите по аналогии с этим рассуждением асимптотику quicksort.
 
 ## Реализация
 
@@ -199,7 +199,7 @@ struct Node {
 Вместо того, чтобы модифицировать и `merge`, и `split` под наши хотелки, напишем вспомогательную функцию `upd`, которую будем вызывать при обновлении детей вершины:
 
 ```c++
-void sum(Node* v) { return v ? v->sum : 0; }
+int sum(Node* v) { return v ? v->sum : 0; }
 // обращаться по пустому указателю нельзя -- выдаст ошибку
 
 void upd(Node* v) { v->sum = sum(v->l) + sum(v->r) + v->val; }
diff --git a/netlify.toml b/netlify.toml
index 1b5ed16e..fb612037 100644
--- a/netlify.toml
+++ b/netlify.toml
@@ -2,7 +2,7 @@
 command = "hugo --gc --minify"
 
 [context.production.environment]
-HUGO_VERSION = "0.87.0"
+HUGO_VERSION = "0.96.0"
 HUGO_ENV = "production"
 HUGO_ENABLEGITINFO = "true"
 
@@ -10,20 +10,20 @@ HUGO_ENABLEGITINFO = "true"
 command = "hugo --gc --minify --enableGitInfo"
 
 [context.split1.environment]
-HUGO_VERSION = "0.87.0"
+HUGO_VERSION = "0.96.0"
 HUGO_ENV = "production"
 
 [context.deploy-preview]
 command = "hugo --gc --minify --buildFuture -b $DEPLOY_PRIME_URL"
 
 [context.deploy-preview.environment]
-HUGO_VERSION = "0.87.0"
+HUGO_VERSION = "0.96.0"
 
 [context.branch-deploy]
 command = "hugo --gc --minify -b $DEPLOY_PRIME_URL"
 
 [context.branch-deploy.environment]
-HUGO_VERSION = "0.87.0"
+HUGO_VERSION = "0.96.0"
 
 [context.next.environment]
 HUGO_ENABLEGITINFO = "true"
diff --git a/scripts/check-links.sh b/scripts/check-links.sh
new file mode 100644
index 00000000..9f87cefd
--- /dev/null
+++ b/scripts/check-links.sh
@@ -0,0 +1,2 @@
+# hugo serve
+wget --spider -r -nd -nv http://localhost:1313/
diff --git a/scripts/list-files.sh b/scripts/list-files.sh
new file mode 100644
index 00000000..47259b5c
--- /dev/null
+++ b/scripts/list-files.sh
@@ -0,0 +1 @@
+find ./ -type f -name "*.md" -exec wc {} +
diff --git a/themes/algorithmica/assets/dark.sass b/themes/algorithmica/assets/dark.sass
index c26997ba..b5a53b28 100644
--- a/themes/algorithmica/assets/dark.sass
+++ b/themes/algorithmica/assets/dark.sass
@@ -1,24 +1,22 @@
-$font-color: rgb(206, 177, 150)
-$background: black
-$borders: 1px solid #d4ae8d
+$font-color: #DDD
+$background: #222
+$borders: 1px solid rgb(57, 57, 57)
 
-$code-background: #222
-$code-border: 1px solid #333
-$quote-line-color: 0.25em #d4ae8d solid
+$code-background: #333
+$code-border: 1px solid #444
+$quote-line-color: 0.25em #444 solid
 
-$dimmed: #cea163
-$section-headers: #c77d0f
-$headers-color: rgb(200, 160, 130)
+$dimmed: rgb(179, 179, 179)
+$section-headers: rgb(239, 239, 239)
+$headers-color: rgb(239, 239, 239)
 $scrollbar1: #444
 $scrollbar2: #555
 $scrollbar3: #666
 
-$link-color: #ac7625
-$link-hover-color: #eb9a20
+$link-color: #80acd3
+$link-hover-color: #5490c5
 
 @import style.sass
 
 img
-  //filter: invert(100%) sepia(100%) saturate(0%) hue-rotate(288deg) brightness(102%) contrast(102%)
-  filter: invert(100%) sepia(20%) saturate(36.4%) hue-rotate(29deg) brightness(85%)
-  
\ No newline at end of file
+  filter: invert(85%) sepia(20%) saturate(100%) hue-rotate(29deg) brightness(85%)
diff --git a/themes/algorithmica/assets/style.sass b/themes/algorithmica/assets/style.sass
index fe3ebaeb..00a420cf 100644
--- a/themes/algorithmica/assets/style.sass
+++ b/themes/algorithmica/assets/style.sass
@@ -157,6 +157,11 @@ body
       &::before
         content: counter(chapter-counter) "." counter(section-counter) ". "
         font-weight: bold
+  
+  .draft, .draft a
+    color: $dimmed
+
+    
 
 #wrapper
   width: 100%
@@ -182,10 +187,10 @@ menu
   display: flex
   font-family: $font-headings
   
-  height: 30px
+  height: 26px
   background-color: $background
   justify-content: space-between
-  padding: 12px
+  padding: 14px
   margin: 0
   text-align: center
 
@@ -217,7 +222,37 @@ menu
     .title
       opacity: 1
       transition: opacity 0.1s
-    
+
+#search
+  display: none
+  font-family: $font-interface
+
+  input
+    width: 100%
+    padding: 6px
+
+    color: $font-color
+
+    background: $code-background
+    border: $code-border
+
+    &:focus
+      outline: 1px solid $dimmed
+
+  #search-count
+    margin-top: 8px
+    color: $dimmed
+  
+  #search-results
+    margin-top: 6px
+    border-bottom: $borders
+
+    li
+      list-style: none
+      margin: 12px 6px
+
+    p
+      margin-top: 0
 
 /*
   .github
@@ -460,7 +495,13 @@ pre
   padding-left: 8px
   font-size: 0.85em
   text-align: left
-  
+
+pre.center-pre
+  text-align: center
+  font-size: 1em
+  background: none
+  border: none
+
 .highlight
   margin: 0px
 
diff --git a/themes/algorithmica/i18n/en.toml b/themes/algorithmica/i18n/en.toml
index d58a7924..6fa12340 100644
--- a/themes/algorithmica/i18n/en.toml
+++ b/themes/algorithmica/i18n/en.toml
@@ -15,6 +15,15 @@ other = "updated"
 [sections]
 other = "sections"
 
+[search]
+other = "Search this book…"
+
+[searchCountPrefix]
+other = "Found"
+
+[searchCountSuffix]
+other = "pages"
+
 [prerequisites]
 other = "prerequisites"
 
@@ -22,7 +31,7 @@ other = "prerequisites"
 other = "translations"
 
 [copyright1]
-other = "Copyright 2021 Sergey Slotin"
+other = "Copyright 2021–2022 Sergey Slotin"
 
 [copyright2]
 other = " " # Content is distributed under CC BY-NC
diff --git a/themes/algorithmica/i18n/ru.toml b/themes/algorithmica/i18n/ru.toml
index a25a0c27..08d47b66 100644
--- a/themes/algorithmica/i18n/ru.toml
+++ b/themes/algorithmica/i18n/ru.toml
@@ -21,6 +21,15 @@ other = "обновлено"
 [sections]
 other = "статьи раздела"
 
+[search]
+other = "Поиск по сайту…"
+
+[searchCountPrefix]
+other = "Найдено"
+
+[searchCountSuffix]
+other = "страниц"
+
 [prerequisites]
 other = "пререквизиты"
 
@@ -28,7 +37,7 @@ other = "пререквизиты"
 other = "переводы"
 
 [copyright1]
-other = "Copyleft 2017–2021 Тинькофф Образование" # {{ .Count / . }}
+other = "Copyleft 2017–2022 Algorithmica.org" # {{ .Count / . }}
 
 [copyright2]
 other = "Материалы распространяются под CC BY-SA"
diff --git a/themes/algorithmica/layouts/_default/_markup/render-codeblock-center.html b/themes/algorithmica/layouts/_default/_markup/render-codeblock-center.html
new file mode 100644
index 00000000..d263bb5a
--- /dev/null
+++ b/themes/algorithmica/layouts/_default/_markup/render-codeblock-center.html
@@ -0,0 +1,3 @@
+[markup not recoverable]
+{{.Inner}}
+[markup not recoverable]
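For context: Hugo dispatches fenced code blocks to render hooks by their language tag, so this new template handles blocks tagged `center`; the markup stripped from it presumably wraps `{{.Inner}}` in a `<pre class="center-pre">` element, matching the `pre.center-pre` rules added to style.sass above. A hypothetical block in an article's Markdown would look like:

    ```center
    n → n/2 → n/4 → … → 1
    ```
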
diff --git a/themes/algorithmica/layouts/_default/baseof.html b/themes/algorithmica/layouts/_default/baseof.html
index f9056521..dbe71ede 100644
--- a/themes/algorithmica/layouts/_default/baseof.html
+++ b/themes/algorithmica/layouts/_default/baseof.html
@@ -6,6 +6,7 @@
     {{- partial "buttons.html" . -}}
+    {{ partial "search.html" . }}
     {{- partial "header.html" . -}}
     {{- block "main" . }}{{- end }}
diff --git a/themes/algorithmica/layouts/_default/list.searchindex.json b/themes/algorithmica/layouts/_default/list.searchindex.json
new file mode 100644
index 00000000..6310c263
--- /dev/null
+++ b/themes/algorithmica/layouts/_default/list.searchindex.json
@@ -0,0 +1,5 @@
+{{- $.Scratch.Add "searchindex" slice -}}
+{{- range $index, $element := .Site.Pages -}}
+    {{- $.Scratch.Add "searchindex" (dict "id" $index "title" $element.Title "path" $element.RelPermalink "content" $element.Plain) -}}
+{{- end -}}
+{{- $.Scratch.Get "searchindex" | jsonify -}}
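The `list.searchindex.json` template above flattens every page of the site into an array of `{id, title, path, content}` records; `toggleSearch()` in head.html below fetches it from `/searchindex.json` and feeds it to Lunr. Its output looks roughly like this (titles, paths, and text are illustrative):

    [
      { "id": 0, "title": "Algorithmica", "path": "/", "content": "..." },
      { "id": 1, "title": "Binary Search", "path": "/hpc/data-structures/binary-search/", "content": "In this article, we ..." }
    ]
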
diff --git a/themes/algorithmica/layouts/partials/buttons.html b/themes/algorithmica/layouts/partials/buttons.html
index ce9d5728..265b63d9 100644
--- a/themes/algorithmica/layouts/partials/buttons.html
+++ b/themes/algorithmica/layouts/partials/buttons.html
@@ -3,16 +3,21 @@
 {{ with .File }}{{ $path = .Path }}{{ end }}
     {{.Title}}
+    [five added lines of markup not recoverable]
@@ -20,7 +25,9 @@
-    [removed markup not recoverable]
+    [added markup not recoverable]
diff --git a/themes/algorithmica/layouts/partials/head.html b/themes/algorithmica/layouts/partials/head.html
index f87a8873..c5013dba 100644
--- a/themes/algorithmica/layouts/partials/head.html
+++ b/themes/algorithmica/layouts/partials/head.html
@@ -10,6 +10,11 @@
+    [five added lines not recoverable — presumably the <script> includes for Lunr and its language plugins]
 {{ $dark := resources.Get "dark.sass" | toCSS | minify | fingerprint }}
@@ -18,22 +23,101 @@
     console.log("Toggling sidebar visibility")
     var sidebar = document.getElementById('sidebar')
     var wrapper = document.getElementById('wrapper')
-    if (sidebar.classList.contains('sidebar-toggled') || window.getComputedStyle(sidebar).display == 'block') {
+    if (sidebar.classList.contains('sidebar-toggled') || window.getComputedStyle(sidebar).display == 'block') {
       sidebar.classList.toggle('sidebar-hidden')
       wrapper.classList.toggle('sidebar-hidden')
     }
     sidebar.classList.add('sidebar-toggled')
     wrapper.classList.add('sidebar-toggled')
   }
+
   function switchTheme(theme) {
     console.log("Changing theme:", theme)
     document.getElementById('theme').href = (theme == 'dark' ? "{{ $dark.RelPermalink }}" : "")
     document.getElementById('syntax-theme').href = (theme == 'dark' ? '/syntax-dark.css' : '/syntax.css')
     localStorage.setItem('theme', theme)
   }
+
+  async function toggleSearch() {
+    console.log("Toggling search")
+
+    var searchDiv = document.getElementById('search')
+    if (window.getComputedStyle(searchDiv).display == 'none') {
+      searchDiv.style.display = 'block'
+      window.scrollTo({ top: 0 });
+      document.getElementById('search-bar').focus()
+    } else {
+      searchDiv.style.display = 'none'
+    }
+
+    if (!index) {
+      console.log("Fetching index")
+      const response = await fetch('/searchindex.json')
+      const pages = await response.json()
+      index = lunr(function() {
+        this.use(lunr.multiLanguage('en', 'ru'))
+        this.field('title', {
+          boost: 5
+        })
+        this.field('content', {
+          boost: 1
+        })
+        pages.forEach(function(doc) {
+          this.add(doc)
+          articles.push(doc)
+        }, this)
+      })
+      console.log("Ready to search")
+    }
+  }
+
+  var articles = []
+  var index = undefined
+
+  function search() {
+    var query = document.getElementById('search-bar').value
+    var resultsDiv = document.getElementById('search-results')
+    var countDiv = document.getElementById('search-count')
+
+    if (query == '') {
+      resultsDiv.innerHTML = ''
+      countDiv.innerHTML = ''
+      return
+    }
+
+    var results = index.search(query)
+
+    countDiv.innerHTML = '{{ T "searchCountPrefix" }} ' + results.length + ' {{ T "searchCountSuffix" }}'
+
+    let resultList = ''
+
+    for (const n in results) {
+      const item = articles[results[n].ref]
+      resultList += '<li><a href="' + item.path + '">' + item.title + '</a><p>'
+
+      const text = item.content
+
+      const contextLimit = 80
+
+      if (text.includes(query)) {
+        const start = text.indexOf(query)
+        if (start > contextLimit)
+          resultList += '…'
+        resultList += text.substring(start - contextLimit, start)
+                    + '<b>' + query + '</b>' + text.substring(start + query.length, start + query.length + contextLimit)
+      } else {
+        resultList += text.substring(0, contextLimit * 2)
+      }
+      resultList += '…</p></li>'
+    }
+
+    resultsDiv.innerHTML = resultList
+  }
+
   if (localStorage.getItem('theme') == 'dark') {
     switchTheme('dark')
   }
+
   window.addEventListener('load', function() {
     var el = document.getElementById("active-element")
     //console.log(el)
@@ -46,6 +130,7 @@
       toggleSidebar()
     }*/
   })
+
   window.addEventListener('scroll', function() {
     var menu = document.getElementById('menu')
     if (window.scrollY < 120) {
@@ -56,8 +141,10 @@
       menu.classList.add('scrolled')
     }
   })
+
   window.addEventListener('keydown', function(e) {
     if (e.altKey) { return }
+    if (document.activeElement.tagName == 'INPUT') { return }
     if (e.key == 'ArrowLeft') {
       document.getElementById('prev-article').click()
     } else if (e.key == 'ArrowRight') {
diff --git a/themes/algorithmica/layouts/partials/search.html b/themes/algorithmica/layouts/partials/search.html
new file mode 100644
index 00000000..ee853dfa
--- /dev/null
+++ b/themes/algorithmica/layouts/partials/search.html
@@ -0,0 +1,6 @@
+    [six lines of markup not recoverable]
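The six lines of search.html are not recoverable (nor are the HTML fragments inside `search()`'s string literals above, which are shown as presumed). Judging by the element ids used in head.html and the `#search` rules added to style.sass, the partial is presumably along these lines, with a button — likely among the stripped markup in buttons.html — invoking `toggleSearch()`; attribute details are guesses:

    <div id="search">
        <input id="search-bar" type="text" placeholder='{{ T "search" }}' oninput="search()">
        <div id="search-count"></div>
        <ul id="search-results"></ul>
    </div>
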
diff --git a/themes/algorithmica/layouts/partials/sidebar.html b/themes/algorithmica/layouts/partials/sidebar.html
index 2276957a..652a1f1b 100644
--- a/themes/algorithmica/layouts/partials/sidebar.html
+++ b/themes/algorithmica/layouts/partials/sidebar.html
@@ -24,13 +24,13 @@
       {{ if isset .Params "part" }}
         {{.Params.Part}}
       {{ end }}
-      [removed markup not recoverable]
+      [added markup not recoverable]
       {{ .Title }}
       {{ if .IsSection }}
         {{ range .Pages }}
-          [removed markup not recoverable]
+          [added markup not recoverable]
           {{ .Title }}
         {{ end }}
diff --git a/themes/algorithmica/static/scripts/lunr.multi.min.js b/themes/algorithmica/static/scripts/lunr.multi.min.js
new file mode 100644
index 00000000..6f417304
--- /dev/null
+++ b/themes/algorithmica/static/scripts/lunr.multi.min.js
@@ -0,0 +1 @@
+[one minified line, not reproduced: a vendored lunr-languages bundle that appears to define lunr.multiLanguage, Russian language support (lunr.ru), and the lunr.stemmerSupport/trimmerSupport helpers]
\ No newline at end of file