Skip to content

Commit d6bc8e8

Browse files
committed
corrections
1 parent 04e1d77 commit d6bc8e8

File tree

1 file changed

+6
-3
lines changed

1 file changed

+6
-3
lines changed

content/english/hpc/data-structures/b-tree.md

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ In the [previous article](../s-tree), we designed and implemented *static* B-tre
77

88
In this article, we follow up on that proposition and design a minimally functional search tree for integer keys, [achieving](#evaluation) up to 18x/8x speedup over `std::set` and up to 7x/2x speedup over [`absl::btree`](https://abseil.io/blog/20190812-btree) for `lower_bound` and `insert` queries, respectively, with yet ample room for improvement.
99

10-
The memory overhead of the structure is around 30%, and the [final implementation](https://github.com/sslotin/amh-code/blob/main/b-tree/btree-final.cc) is under 150 lines of C.
10+
The memory overhead of the structure is around 30%, and the final implementation is [under 150 lines of C++](https://github.com/sslotin/amh-code/blob/main/b-tree/btree-final.cc).
1111

1212
<!--
1313
@@ -146,7 +146,7 @@ Implementing search is easy, and it doesn't introduce much overhead. The hard pa
146146

147147
On the one side, correctly implementing insertion takes a lot of code, but on the other, most of that code is executed very infrequently, so we don't have to care about its performance that much. Most often, all we need to do is to reach the leaf node (which we've already figured out how to do) and then insert a new key into it, moving some suffix of the keys one position to the right. Occasionally, we also need to split the node and/or update some ancestors, but this is relatively rare, so let's focus on the most common execution path first.
148148

149-
To insert a key into an array of $(B - 1)$ sorted elements, we can load them in vector registers and then [mask-store](/hpc/simd/masking) them one position to the right using a precomputed mask that tells which elements need to be written for a given `i`:
149+
To insert a key into an array of $(B - 1)$ sorted elements, we can load them in vector registers and then [mask-store](/hpc/simd/masking) them one position to the right using a [precomputed](/hpc/compilation/precalc/) mask that tells which elements need to be written for a given `i`:
150150

151151
```c++
152152
struct Precalc {
@@ -176,6 +176,8 @@ void insert(int *node, int i, int x) {
176176
}
177177
```
178178
179+
This [constexpr magic](/hpc/compilation/precalc/) is the only C++ feature we use.
180+
179181
There are other ways to do it, some possibly more efficient, but we are going to stop there for now.
180182
181183
When we split a node, we need to move half of the keys to another node, so let's write another primitive that does it:
@@ -281,8 +283,9 @@ We want the evaluation to take a reasonable time, so our benchmark is a loop tha
281283
- Increase the structure size from $1.17^k$ to $1.17^{k+1}$ using individual `insert`s and measure the time it took.
282284
- Perform $10^6$ random `lower_bound` queries and measure the time it took.
283285
284-
We start at the size $10^4$ and end at $10^7$, for around $50$ data points in total. We generate the data for both query types uniformly in the $[0, 2^{30})$ range and independently between the stages. Since the data generation process allows for repeated keys, we compared against `std::multiset` and `absl::btree_multiset`, although we still refer to them as `std::set` and `absl::btree` for brevity. We also enable [hugepages](/hpc/cpu-cache/paging) on the system level for all three runs.
286+
We start at the size $10^4$ and end at $10^7$, for around $50$ data points in total. We generate the data for both query types uniformly in the $[0, 2^{30})$ range and independently between the stages. Since the data generation process allows for repeated keys, we compared against `std::multiset` and `absl::btree_multiset`[^absl], although we still refer to them as `std::set` and `absl::btree` for brevity. We also enable [hugepages](/hpc/cpu-cache/paging) on the system level for all three runs.
285287
288+
[^absl]: If you also think that only comparing with Abseil's B-tree is not convincing enough, [feel free](https://github.com/sslotin/amh-code/tree/main/b-tree) to add your favorite search tree to the benchmark.
286289
287290
<!--
288291

0 commit comments

Comments
 (0)