Minimum implementations
April 4, 2017
I really enjoy reading the Bit Twiddling
Hacks page. It has lots of
great bit-sized (pun intended) algorithm implementations optimized for
performance (or at least number of operations). Combine this with Compiler
Explorer, and you’ve got hours of assembly language fun. For
example, I thought it might be interesting to understand how different
implementations of the min
algorithm generate assembly code.
Let’s look at three different implementations, all from this section of the page. You can find the code generated by GCC 6.3 here.
No branches, please
Naturally, you would expect the min
algorithm to require at least one branch,
since it is essentially an “if” check. But with some fun bitwise operations, this
branch can be avoided.
GCC produces this assembly (with an optimized compile), for x64:
1
2
3
4
5
6
7
8
9
no_branch_min(int, int):
xor eax, eax
cmp edi, esi
setl al
xor edi, esi
neg eax
and eax, edi
xor eax, esi
ret
Let’s step through this code line by line to understand how the algorithm works. I’ll assume the arguments are the values 2 and 3.
- Line 1: It looks like the arguments are passed in the
edi
andesi
registers.
eax:? edi:0x00000002 esi:0x00000003 EFLAGS - SF:? OF:?
- Line 2: Clear the
eax
register (it will be used later).
eax:0x00000000 edi:0x00000002 esi:0x00000003 EFLAGS - SF:? OF:?
- Line 3: The
cmp
instruction subtracts the two arguments, and sets status flags, but doesn’t change any other register values. So2 - 3 = -1
; the sign flag is set and the overflow flag is not.
eax:0x00000000 edi:0x00000002 esi:0x00000003 EFLAGS - SF:1 OF:0
- Line 4: The
setl
instruction sets its destination operand, in this case, theal
register (the lower 16 bytes ofeax
) to a value of 1 ifSF != OF
. That is the case.
eax:0x00000001 edi:0x00000002 esi:0x00000003
- Line 5: The
xor
instruction performs an exclusive or, and puts the result in the first register.
eax:0x00000001 edi:0x00000001 esi:0x00000003
- Line 6: The
neg
instruction takes the two’s complement of its argument.
eax:0xffffffff edi:0x00000001 esi:0x00000003
- Line 7: The logical
and
instruction next stores its result ineax
. Again, it is zero.
eax:0x00000001 edi:0x00000001 esi:0x00000003
- Line 8: The xor instruction of 1 and 3 results in 2, which is stored in
eax
and returned.
eax:0x00000002 edi:0x00000001 esi:0x00000003
So as advertised, this algorithm does not take any branches, and indeed returns the minimum (at least of 2 and 3). How does it compare to the minimum algorithm in the standard library though?
Using std::min
I had to do a double-take after looking at the code generated for std::min
. Here
it is, in all of its tiny glory:
1
2
3
4
cmp esi, edi
mov eax, edi
cmovle eax, esi
ret
That’s right, GCC compiled this minimum implementation into a real minimum implementation, just four instructions! Let’s understand what they are doing, again with the inputs 2 and 3.
- Line 1: The arguments are passed in the
edi
andesi
registers. Notice that thiscmp
instruction is different from the first case, here it does3 - 2 = 1
, so no status flags are set.
eax:? edi:0x00000002 esi:0x00000003 EFLAGS - SF:0 OF:0
- Line 2: Via
mov
, the value of 2 is copied fromedi
toeax
. This really just sets upeax
to be the return value of the function, it doesn’t have an impact on the algorithm.
eax:0x00000002 edi:0x00000002 esi:0x00000003 EFLAGS - SF:0 OF:0
- Line 3: Finally, the meat of the algorithm is in
cmovle
. This instruction does a conditional move, replacingeax
withesi
only if the previouscmp
found that the left operand was less than or equal to the right operand. In other words, if the zero flag is set or the sign flag is not equal to the overflow flag. Neither condition is true in this case, so the value of 2 remains ineax
and is returned.
eax:0x00000002 edi:0x00000002 esi:0x00000003 EFLAGS - SF:0 OF:0
Even in this case, there are no branches! The compiler has replaced the branch by a conditional instruction.
Let’s be naive
Suppose instead that I roll my own implementation of min
, using a very clear
branch:
This code must have a branching instruction, right? Wrong!
1
2
3
4
cmp edi, esi
mov eax, esi
cmovle eax, edi
ret
The compiler has provided the same implementation as std::min
. It inferred what
I wanted to do and wrote a better implementation than I could write myself.
Time to double check
Look at again at the entire assembly code. In the
pane for compiler #2, I’ve used GCC 6.3 as well, but this time without any
optimizations. Although the three different implementations generate more assembly
code than the optimized compile, notice that they are still different from on
another. The no_branch_min
code does indeed have no branches, while
standard_min
and naive_min
both have one conditional jump and one unconditional
jump instruction.
So it is true that the no_branch_min
implementation guarantees no branches, but
a good optimizing compiler will generate better code for the more
human-readable std::min
and even the naive minimum implementation.
Let’s face it, optimizing C++ compilers are simply amazing.