Commit 69290a5

alternative cmov
1 parent 8ea89b4 commit 69290a5

1 file changed: +12 -1 lines changed

content/english/hpc/pipelining/branchless.md

Lines changed: 12 additions & 1 deletion
@@ -40,7 +40,18 @@ sar ebx, 31 ; t >>= 31
 mul eax, ebx ; x *= t
 ```

-But the compiler actually produced something different. Instead of going with this arithmetic trick, it used a special `cmov` ("conditional move") instruction that assigns a value based on a condition (which is computed and checked using the flags register, the same way as for jumps):
+Another way to implement this whole sequence is to turn the sign bit into a mask and then use bitwise `and` instead of multiplication: `((a[i] - 50) >> 31) & a[i]`. The arithmetic shift already copies the sign bit into all 32 bits, so `t` is either all ones or zero. This shortens the dependency chain, since `mul` takes 3 cycles while `and`, like the other instructions here, takes one:
+
+```nasm
+mov ebx, eax ; t = x
+sub ebx, 50 ; t -= 50
+sar ebx, 31 ; t >>= 31
+; mul eax, ebx ; x *= t
+; t is already a mask: -1 (all ones) if x < 50, else 0
+and eax, ebx ; x &= t
+```
+
+But the compiler actually chose something different. Instead of going with this arithmetic trick, it used a special `cmov` ("conditional move") instruction that assigns a value based on a condition (which is computed and checked using the flags register, the same way as for jumps):

 ```nasm
 mov ebx, 0 ; cmov doesn't support immediate values, so we need a zero register
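
The mask idea from the added paragraph can be checked outside of assembly. Below is a minimal C sketch, not part of the commit. The function names `keep_below_50_branchy`, `keep_below_50_masked`, and `check_equivalence` are made up for illustration. It assumes that `x - 50` does not overflow and that `>>` on a negative `int32_t` is an arithmetic shift, which the C standard leaves implementation-defined but which GCC, Clang, and MSVC provide on x86.

```c
#include <assert.h>
#include <stdint.h>

/* Reference version with a branch: keep x only if it is below 50. */
int32_t keep_below_50_branchy(int32_t x) {
    return x < 50 ? x : 0;
}

/* Branchless version (hypothetical helper, mirrors the sub/sar/and idea).
   Assumption: >> on a negative int32_t is an arithmetic shift, and
   x - 50 does not overflow. */
int32_t keep_below_50_masked(int32_t x) {
    int32_t mask = (x - 50) >> 31; /* -1 (all ones) if x < 50, else 0 */
    return x & mask;               /* x if x < 50, else 0 */
}

/* Compare both versions over a small range around the threshold. */
void check_equivalence(void) {
    for (int32_t x = -1000; x <= 1000; x++)
        assert(keep_below_50_masked(x) == keep_below_50_branchy(x));
}
```

Calling `check_equivalence()` from a test harness exercises both sides of the threshold, including negative inputs. Whether a compiler turns the masked version into `sar`/`and`, into `cmov`, or into something else depends on the compiler and flags, so inspecting the generated assembly is the only way to know.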
