We'll take a look at the same very simple ``max`` function as in the previous section.
.. literalinclude:: listings/simple_if_max.cpp
   :language: c++
Translated to LLVM IR:
.. literalinclude:: listings/simple_if_max_cleaned.ll
   :language: llvm
We can see that the function allocates space on the stack with ``alloca`` [2], where the bigger value is stored. In one branch ``%a`` is stored, while in the other branch ``%b`` is stored to the stack-allocated memory. However, we want to avoid memory load/store operations and use registers instead whenever possible. So we would like to write something like this:
.. code-block:: llvm

   define i32 @max(i32 %a, i32 %b) {
   entry:
     %0 = icmp sgt i32 %a, %b
     br i1 %0, label %btrue, label %bfalse
   btrue:
     %retval = %a
     br label %end
   bfalse:
     %retval = %b
     br label %end
   end:
     ret i32 %retval
   }
This is not valid LLVM IR, because it violates the static single assignment form (SSA, [1]) of LLVM IR. SSA form requires that every variable is assigned exactly once. SSA form enables and simplifies a vast number of compiler optimizations, and is the de facto standard for intermediate representations in compilers of imperative programming languages.
Now how would one implement the above code in proper SSA-form LLVM IR? The answer is the magic ``phi`` instruction [3]. The ``phi`` instruction is named after the φ function used in the theory of SSA. This function magically chooses the right value, depending on the control flow. In LLVM you have to manually specify the name of the value and the predecessor basic block.
.. code-block:: llvm

   end:
     %retval = phi i32 [%a, %btrue], [%b, %bfalse]
Here we instruct the ``phi`` instruction to choose ``%a`` if the previous basic block was ``%btrue``. If the previous basic block was ``%bfalse``, then ``%b`` will be used. The value is then assigned to a new variable ``%retval``. Here you can see the full code listing:
.. literalinclude:: listings/simple_if_max_phi.ll
   :language: llvm
Let's have a look at how the ``@max`` function now maps to actual machine code. We'll examine what kind of assembly code is generated by the compiler back end. In this case we'll look at the code generated for x86 64-bit, compiled with different optimization levels. We'll start with a non-optimizing back end (``llc -O0 -filetype=asm``). We will get something like this assembly:
.. code-block:: asm

   max:                              # @max
   # %bb.0:                          # %entry
           cmpl    %esi, %edi        # %edi = %a, %esi = %b
           jle     .LBB0_2
   # %bb.1:                          # %btrue
           movl    %edi, -4(%rsp)    # mov src, dst
           jmp     .LBB0_3
   .LBB0_2:                          # %bfalse
           movl    %esi, -4(%rsp)    # mov src, dst
           jmp     .LBB0_3
   .LBB0_3:                          # %end
           movl    -4(%rsp), %eax    # return value in eax
           retq
The parameters ``%a`` and ``%b`` are passed in ``%edi`` and ``%esi``, respectively. We can see that the compiler back end generated code that uses the stack to store the bigger value. So the code generated by the compiler back end is not what we had in mind when we wrote the LLVM IR. The reason is that the back end must implement the ``phi`` instruction with real machine instructions, usually by assigning to one common register or storing to one common stack memory location; without optimization it falls back to the stack slot. However, if we enable a little more optimization in the back end (i.e., ``llc -O1``), we get a more optimized version:
.. code-block:: asm

   max:                              # @max
   # %bb.0:                          # %entry
           cmpl    %esi, %edi
           jg      .LBB0_2
   # %bb.1:                          # %bfalse
           movl    %esi, %edi
   .LBB0_2:                          # %end
           movl    %edi, %eax
           retq
Here the ``phi`` function is implemented using the ``%edi`` register. In one branch ``%edi`` already contains the desired value, so nothing happens. In the other branch ``%esi`` is copied to ``%edi``. At the ``%end`` basic block, ``%edi`` contains the desired value from both branches. This is more like what we had in mind. We can see that optimization is something that needs to be applied throughout the whole compilation pipeline.
| [1] Wikipedia: Static single assignment form
| [2] LangRef: ``alloca``
| [3] LangRef: ``phi``