You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
@@ -15,7 +15,7 @@ Unlike other case studies of this book, in this one you will actually learn an a
15
15
16
16
### Benchmark
17
17
18
-
For all methods, we will implement `find_factor` function that takes a positive integer $n$ and returns either its smallest divisor (or `1` if the number is prime):
18
+
For all methods, we will implement `find_factor` function that takes a positive integer $n$ and returns any of its non-trivial divisors (or `1` if the number is prime):
19
19
20
20
```c++
21
21
// I don't feel like typing "unsigned long long" each time
@@ -45,35 +45,30 @@ Since after each removed factor the problem becomes considerably smaller and sim
45
45
46
46
For many factorization algorithms, including those presented in this article, the running time scales with the least prime factor. Therefore, to provide worst-case input, we use *semiprimes:* products of two prime numbers $p \le q$ that are on the same order of magnitude. To generate a $k$-bit semiprime, we generate two random $\lfloor k / 2 \rfloor$-bit primes.
47
47
48
-
Since some of the algorithms are inherently randomized, we also tolerate a small (<1%) percentage of errors, although they can be reduced to almost zero without significant performance penalties.
48
+
Since some of the algorithms are inherently randomized, we also tolerate a small (<1%) percentage of false negative errors (when `find_factor` returns `1` despite number $n$ being composite), although this rate can be reduced to almost zero without significant performance penalties.
49
49
50
50
### Trial division
51
51
52
-
Trial division was first described by Fibonacci in 1202. Although it was probably known to animals. Perhaps some animals can factor? The scientific priority probably belongs to dinosaurs or ancient fish trying to divvy stuff up.
52
+
<!--
53
53
54
-
I tried finding references to who invented trial division, but probably it was known to animals long before to split into equal parts.
54
+
Trial division was first described by Fibonacci in 1202. Although it was probably known to animals. Perhaps some animals can factor? The scientific priority probably belongs to dinosaurs or ancient fish trying to divvy stuff up.
55
55
56
56
0.056024
57
-
2043.968140
57
+
58
+
-->
59
+
60
+
The most basic approach is to try every number less than $n$ as a divosor:
58
61
59
62
```c++
60
63
u64find_factor(u64 n) {
61
-
for (u64 d = 2; d * d <= n; d++)
64
+
for (u64 d = 2; d < n; d++)
62
65
if (n % d == 0)
63
66
return d;
64
67
return 1;
65
68
}
66
69
```
67
70
68
-
This is the most basic algorithm to find a prime factorization.
69
-
70
-
We divide by each possible divisor $d$.
71
-
We can notice, that it is impossible that all prime factors of a composite number $n$ are bigger than $\sqrt{n}$.
72
-
Therefore, we only need to test the divisors $2 \le d \le \sqrt{n}$, which gives us the prime factorization in $O(\sqrt{n})$.
73
-
74
-
The smallest divisor has to be a prime number.
75
-
We remove the factor from the number, and repeat the process.
76
-
If we cannot find any divisor in the range $[2; \sqrt{n}]$, then the number itself has to be prime.
71
+
One simple optimization is to notice that it is enough to only check divisors that do not exceed $\sqrt n$. This works because if $n$ is divided by $d > \sqrt n$, then it is also divided by $\frac{n}{d} < \sqrt n$, so we can don't have to check it separately.
77
72
78
73
```c++
79
74
u64 find_factor(u64 n) {
@@ -84,13 +79,43 @@ u64 find_factor(u64 n) {
84
79
}
85
80
```
86
81
82
+
In our benchmark, $n$ is a semiprime, and we always find the lesser divisor, so both $O(n)$ and $O(\sqrt n)$ implementations perform the same and are able to factorize ~2k 30-bit numbers per second, while taking whole ~20 seconds to factorize a single 60-bit number.
83
+
84
+
### Lookup Table
85
+
86
+
Nowadays, you can type `factor 57` in your Linux terminal or Google search bar to get the factorization of any number. But before computers were invented, it was more practical to use *factorization tables:* special books containing factorizations of the first $N$ numbers.
87
+
88
+
We can also use this approach to compute these lookup tables [during compile time](/hpc/compilation/precalc/). To save space, it is convenient to only store the smallest divisor of a number, requiring just one byte for a 16-bit integer:
89
+
90
+
```c++
91
+
template <int N = (1<<16)>
92
+
structPrecalc {
93
+
unsigned char divisor[N];
94
+
95
+
constexpr Precalc() : divisor{} {
96
+
for (int i = 0; i < N; i++)
97
+
divisor[i] = 1;
98
+
for (int i = 2; i * i < N; i++)
99
+
if (divisor[i] == 1)
100
+
for (int k = i * i; k < N; k += i)
101
+
divisor[k] = i;
102
+
}
103
+
};
104
+
105
+
constexpr Precalc P{};
106
+
107
+
u64find_factor(u64 n) {
108
+
return P.divisor[n];
109
+
}
110
+
```
111
+
112
+
This approach can process 3M 16-bit integers per second, although it [probably gets slower](../hpc/cpu-cache/bandwidth/) for larger numbers. While it requires just a few milliseconds and 64KB of memory to calculate and store the divisors of the first $2^{16}$ numbers, it does not scale well for larger inputs.
113
+
87
114
### Wheel factorization
88
115
89
-
This is an optimization of the trial division.
90
-
The idea is the following.
91
-
Once we know that the number is not divisible by 2, we don't need to check every other even number.
92
-
This leaves us with only $50\%$ of the numbers to check.
93
-
After checking 2, we can simply start with 3 and skip every other number.
116
+
To save paper space, pre-computer era factorization tables typically excluded numbers divisible by 2 and 5: in decimal numeral system, you can quickly determine whether a number is divisible by 2 or 5 (by looking at its last digit) and keep dividing the number $n$ by 2 or 5 while it is possible, eventually arriving to some entry in the factorization table. This makes the factorization table just ½ × ⅘ = 0.4 its original size.
117
+
118
+
We can apply a similar trick to trial division, first checking if the number is divisible by $2$, and then only check for odd divisors:
94
119
95
120
```c++
96
121
u64 find_factor(u64 n) {
@@ -103,24 +128,27 @@ u64 find_factor(u64 n) {
103
128
}
104
129
```
105
130
106
-
This method can be extended.
107
-
If the number is not divisible by 3, we can also ignore all other multiples of 3 in the future computations.
108
-
So we only need to check the numbers $5, 7, 11, 13, 17, 19, 23, \dots$.
109
-
We can observe a pattern of these remaining numbers.
110
-
We need to check all numbers with $d \bmod 6 = 1$ and $d \bmod 6 = 5$.
111
-
So this leaves us with only $33.3\%$ percent of the numbers to check.
112
-
We can implement this by checking the primes 2 and 3 first, and then start checking with 5 and alternatively skip 1 or 3 numbers.
131
+
With 50% fewer divisions to do, this algorithm works twice as fast, but it can be extended. If the number is not divisible by $3$, we can also ignore all multiples of $3$, and the same goes for all other divisors.
132
+
133
+
The problem is, as we increase the number of primes to exclude, it becomes less straightforward to iterate only over the numbers not divisible by them as they follow an irregular pattern — unless the number of primes is small. For example, if we consider $2$, $3$, and $5$, then, among the first $90$ numbers, we only need to check:
134
+
135
+
```center
136
+
(1,) 7, 11, 13, 17, 19, 23, 29,
137
+
31, 37, 41, 43, 47, 49, 53, 59,
138
+
61, 67, 71, 73, 77, 79, 83, 89…
139
+
```
140
+
141
+
You can notice a pattern: the sequence repeats itself every $30$ numbers because remainder modulo $2 \times 3 \times 5 = 30$ is all we need to determine whether a number is divisible by $2$, $3$, or $5$. This means that we only need to check $8$ specific numbers in every $30$, proportionally improving the performance:
113
142
114
143
```c++
115
144
u64find_factor(u64 n) {
116
145
for (u64 d : {2, 3, 5})
117
146
if (n % d == 0)
118
147
return d;
119
-
u64 increments[] = {0, 4, 6, 10, 12, 16, 22, 24};
120
-
u64 sum = 30;
121
-
for (u64 d = 7; d * d <= n; d += sum) {
122
-
for (u64 k = 0; k < 8; k++) {
123
-
u64 x = d + increments[k];
148
+
u64 offsets[] = {0, 4, 6, 10, 12, 16, 22, 24};
149
+
for (u64 d = 7; d * d <= n; d += 30) {
150
+
for (u64 offset : offsets) {
151
+
u64 x = d + offset;
124
152
if (n % x == 0)
125
153
return x;
126
154
}
@@ -129,38 +157,80 @@ u64 find_factor(u64 n) {
129
157
}
130
158
```
131
159
132
-
We can extend this even further.
133
-
Here is an implementation for the prime number 2, 3 and 5.
134
-
It's convenient to use an array to store how much we have to skip.
160
+
As expected, it works $\frac{30}{8} = 3.75$ times faster than the naive trial division, processing about 7.6k 30-bit numbers per second. The performance can be improved by considering more primes, but the returns are diminishing: adding a new prime $p$ reduces the number of iterations by $\frac{1}{p}$, but increases the size of the skip-list by a factor of $p$, requiring proportionally more memory.
135
161
136
-
### Lookup table
162
+
### Precomputed Primes
137
163
138
-
We will choose to store smallest factors of first $2^16$ — because this way they all fit in just one byte, so we are sort of saving on memory here.
164
+
If we keep increasing the number of primes we exclude in wheel factorization, we eventually exclude all composite numbers and only check for prime factors. In this case, we don't need this array of offsets, but we need to precompute primes, which we can do during compile time like this:
139
165
140
166
```c++
141
-
template<int N = (1<<16)>
167
+
const int N = (1 << 16);
168
+
142
169
struct Precalc {
143
-
char divisor[N];
170
+
u16 primes[6542]; // # of primes under N=2^16
144
171
145
-
constexpr Precalc() : divisor{} {
146
-
for (int i = 0; i < N; i++)
147
-
divisor[i] = 1;
148
-
for (int i = 2; i * i < N; i++)
149
-
if (divisor[i] == 1)
150
-
for (int k = i * i; k < N; k += i)
151
-
divisor[k] = i;
172
+
constexpr Precalc() : primes{} {
173
+
bool marked[N] = {};
174
+
int n_primes = 0;
175
+
176
+
for (int i = 2; i < N; i++) {
177
+
if (!marked[i]) {
178
+
primes[n_primes++] = i;
179
+
for (int j = 2 * i; j < N; j += i)
180
+
marked[j] = true;
181
+
}
182
+
}
152
183
}
153
184
};
154
185
155
-
constexpr Precalc precalc{};
186
+
constexpr Precalc P{};
187
+
188
+
u64 find_factor(u64 n) {
189
+
for (u16 p : P.primes)
190
+
if (n % p == 0)
191
+
return p;
192
+
return 1;
193
+
}
194
+
```
195
+
196
+
This approach lets us process almost 20k 30-bit integers per second, but it does not work for larger (64-bit) numbers unless they have small ($< 2^{16}$) factors. Note that this is actually an asymptotic optimization: there are $O(\frac{n}{\ln n})$ primes among the first $n$ numbers, so this algorithm performs $O(\frac{\sqrt n}{\ln \sqrt n})$ operations, while wheel factorization only eliminates a large but fixed fraction of divisors. If we extend it to 64-bit numbers and precompute every prime under $2^{32}$ (storing which would require several hundred megabytes of memory), the relative speedup would grow by a factor of $\frac{\ln \sqrt{n^2}}{\ln \sqrt n} = 2 \cdot \frac{1/2}{1/2} \cdot \frac{\ln n}{\ln n} = 2$.
197
+
198
+
All variants of trial division, including this one, are bottlenecked by the speed of integer division, which can be [optimized](../hpc/arithmetic/division/) if we know the divisors in advice and allow for some precomputation:
199
+
200
+
```c++
201
+
// ...precomputation is the same as before,
202
+
// but we store the reciprocal instead of the prime number itself
203
+
u64 magic[6542];
204
+
// for each prime i:
205
+
magic[n_primes++] = u64(-1) / i + 1;
156
206
157
207
u64find_factor(u64 n) {
158
-
return precalc.divisor[n];
208
+
for (u64 m : P.magic)
209
+
if (m * n < m)
210
+
return u64(-1) / m + 1;
211
+
return 1;
159
212
}
160
213
```
161
214
215
+
This makes the algorithm ~18x faster: we can now process ~350k 30-bit numbers per second. This is actually the most efficient algorithm we have
216
+
217
+
218
+
$\tilde{O}(\sqrt n)$ territory
219
+
162
220
### Pollard's Rho Algorithm
163
221
222
+
---
223
+
224
+
```c++
225
+
u64 find_factor(u64 n) {
226
+
while (true) {
227
+
if (u64 g = gcd(randint(2, n - 1), n); g != 1)
228
+
return g;
229
+
}
230
+
}
231
+
```
232
+
233
+
164
234
The algorithm is probabilistic. This means that it may or may not work. You would also need to
165
235
166
236
Ро-алгоритм Полларда — рандомизированный алгоритм факторизации целых чисел, работающий за время $O(n^\frac{1}{4})$ и основывающийся не следствии из парадокса дней рождений:
@@ -232,99 +302,6 @@ If you have limited time, you should probably compute as much forward as possibl
232
302
233
303
How to optimize for the *average* case is unclear.
234
304
235
-
0.087907
236
-
3964.321045
237
-
238
-
```c++
239
-
u64find_factor(u64 n) {
240
-
if (n % 2 == 0)
241
-
return 2;
242
-
for (u64 d = 3; d * d <= n; d += 2)
243
-
if (n % d == 0)
244
-
return d;
245
-
return 1;
246
-
}
247
-
```
248
-
249
-
0.199740
250
-
7615.217773
251
-
252
-
```c++
253
-
u64 find_factor(u64 n) {
254
-
for (u64 d : {2, 3, 5})
255
-
if (n % d == 0)
256
-
return d;
257
-
u64 increments[] = {0, 4, 6, 10, 12, 16, 22, 24};
258
-
for (u64 d = 7; d * d <= n; d += 30) {
259
-
for (u64 k = 0; k < 8; k++) {
260
-
u64 x = d + increments[k];
261
-
if (n % x == 0)
262
-
return x;
263
-
}
264
-
}
265
-
return 1;
266
-
}
267
-
```
268
-
269
-
19430.058594
270
-
271
-
```c++
272
-
constint N = (1 << 16);
273
-
274
-
structPrecalc {
275
-
u16 primes[6542]; // # of primes under N=2^16
276
-
277
-
constexpr Precalc() : primes{} {
278
-
bool marked[N] = {};
279
-
int n_primes = 0;
280
-
281
-
for (int i = 2; i < N; i++) {
282
-
if (!marked[i]) {
283
-
primes[n_primes++] = i;
284
-
for (int j = 2 * i; j < N; j += i)
285
-
marked[j] = true;
286
-
}
287
-
}
288
-
}
289
-
};
290
-
291
-
constexpr Precalc P{};
292
-
293
-
u64find_factor(u64 n) {
294
-
for (u16 p : P.primes)
295
-
if (n % p == 0)
296
-
return p;
297
-
return 1;
298
-
}
299
-
```
300
-
301
-
352997.656250
302
-
303
-
```c++
304
-
u64 magic[6542];
305
-
magic[n_primes++] = u64(-1) / i + 1;
306
-
307
-
u64 find_factor(u64 n) {
308
-
for (u64 m : P.magic)
309
-
if (m * n < m)
310
-
return u64(-1) / m + 1;
311
-
return 1;
312
-
}
313
-
```
314
-
315
-
Except that it is contant, so the speedup should be twice as much.
0 commit comments