Call stack
To understand how Co’s call stack works, let’s explore a simple program:
fun twice(x i32) i32 {
x * 2
}
fun add(a, b, c i32) (i32, bool) {
b = twice(b)
(a + b + c, true)
}
fun main {
add(10, 20, 30)
}
Before we dig in, let’s define some terminology:
- FP: Frame pointer — arguments and locals (aka “BP”)
- IP: Instruction pointer — jumps and branches (aka “PC”)
- SB: Static base pointer — globals
- SP: Stack pointer — top of stack
This example assumes an intel-like architecture with 64-bit (8 byte) addressing.
When our main
function is called, a return address has been placed on top
of the stack, the stack pointer (SP) has been adjusted to point to the top
of the stack, and the instruction pointer (IP) is pointing at the first
instruction of our main function:
SP
is the stack pointer—a register value that is used to refer to the
current address in the stack memory. We use this value to address nearby
values in the current stack frame.
The main
function first grows the stack to make room for
frame pointer address, call arguments and return values:
SUBQ $32, SP # grow stack
MOVQ FP, 24(SP) # store frame-pointer in stack memory at SP+24
LEAQ 24(SP), FP # load new frame pointer addr into FP (excl return address)
In these illustrations, the number next to
SP
indicates an offset fromSP
, for instanceSP +4
means “SP plus 4 bytes” and is written as4(SP)
in code.
Since stack memory is pre-allocated, “growing” the stack really just means
to change the value of SP
.
As the stack grows, the stack pointer (SP) decreases in value.
Next, main
writes the arguments to the add
function we’re about to call:
MOVL $10, (SP) # store arg 1 for "add" on stack at 0xFFDF (= SP +0)
MOVL $20, 4(SP) # store arg 2 for "add" on stack at 0xFFE3 (= SP +4)
MOVL $30, 8(SP) # store arg 3 for "add" on stack at 0xFFE7 (= SP +8)
Why didn’t we decrement SP to include 4 bytes extra bytes for Arg 1?
Intel-like architectures automatically decrements SP by the address size before it writes the return address to SP, thus we have 8 bytes of space in the stack after SP.
CALL add(SB) # call the "add" function
The CALL
instruction performs the following:
- Increment IP (instruction pointer)
- Subtract 8 from SP (8 = address size)
- Copy IP to address stored at SP — this effectively writes the return addressat SP; the program-memory address just after our CALL instruction (value of IP.)
- Set IP to address given with CALL operand (
add(SB)
)
CALL can be substituted by the following code:
SUBQ $8 SP # grow stack to accomodate return address
LEAQ 16(IP) (SP) # store return address on stack at SP+0
MOVQ add(SB) IP # load destination address into IP
We’re now in the add
function which does the same thing that the main
function did in it’s “prologue”, namely to grow the stack and save its
frame pointer:
SUBQ $24, SP # grow stack
MOVQ FP, 16(SP) # store frame-pointer in stack memory at SP+16
LEAQ 16(SP), FP # load new frame pointer addr into FP (excl return address)
MOVL b+36(SP), R1 # load add's 2nd arg into register R1
MOVL R1, (SP) # store arg 1 for "twice" on stack at 0xFFBF (= SP +0)
CALL twice(SB) # call the "twice" function
MOVL 8(SP), R1
SHLL $1, R1 # shift left ("x * 2" optimized to "x << 1")
MOVL R1, 16(SP)
RET # return to caller
The RET
instruction performs the following:
- Copy the value of SP to IP so that execution continues after the CALL instruction in the calling function.
- Add 8 to SP (8 = address size) to effectively “pop” the return address off the stack.
RET can be substituted by the following code:
ADDQ $8 SP # shrink stack beyond return address
LEAQ -8(SP) IP # load return address into IP
On architectures without LEA, we would have to temporarily store the value of SP into another register while shrinking the stack:
MOVQ (SP) R1
ADDQ $8 SP
MOVQ SP IP
After RET, execution continues in the calling function.
We can now use the result from the call to “twice” to complete our calculations in the “add” function:
MOVL 32(SP), R1 # load argument a into R1
ADDL 8(SP), R1 # add return value of "twice" and b into R1
MOVL 40(SP), R2 # load argument c into R2
ADDL R2, R1 # add result of previous addition and c into R1
Next we write the return values:
MOVL R1, 44(SP) # store result of addition to stack at SP+44 (result 1)
MOVB $1, 48(SP) # store constant 1 to stack at SP+48 (result 2)
It’s time to return to “main” from the “add” function:
MOVQ 16(SP), FP # restore frame pointer
ADDQ $24, SP # shrink stack back to where it was when we entered "add"
RET # return to caller
And that’s it.
There’s a lot of room for hardware-dependent optimizations, like passing arguments and return values in registers instead of on the stack.
Another thing we probably will do is to get rid of the frame pointer (FP) and free up that register for general use. FP is handy for manual assembly but is not needed as we address the stack using only SP.
Code listing
twice function
twice(SB) 0, 16
MOVL 8(SP), R1 # load argument x into R1
SHLL $1, R1 # shift left ("x * 2" optimized to "x << 1")
MOVL R1, 16(SP) # store result in R1 to stack at SP+16
RET # return to caller
add function
add(SB), 24, 24
SUBQ $24, SP # grow stack for call args + retvals + retaddr
MOVQ FP, 16(SP) # store current frame pointer at SP+16
LEAQ 16(SP), FP # compute & store new frame pointer addr
MOVL 36(SP), R1 # load add's 2nd arg b into register R1
MOVL R1, (SP) # push arg 1
CALL twice(SB) # call "twice" function
MOVL 32(SP), R1 # load argument a into R1
ADDL 8(SP), R1 # add return value of "twice" and b into R1
MOVL 40(SP), R2 # load argument c into R2
ADDL R2, R1 # add result of previous addition and c into R1
# store return values (for the current function)
MOVL R1, 44(SP) # store result of addition to stack at SP+44 (result 1)
MOVB $1, 48(SP) # store constant 1 to stack at SP+48 (result 2)
# restore stack
MOVQ 16(SP), FP # restore frame pointer (unwind by one stack frame)
ADDQ $24, SP # shrink stack (balances growth by SUBQ $24, SP)
RET # return to caller
main function
main(SB), 32, 0
SUBQ $32, SP # grow stack
MOVQ FP, 24(SP) # store frame-pointer in stack memory at SP+24
LEAQ 24(SP), FP # load new frame pointer addr into FP
MOVL $10, (SP) # store arg 1 for "add" on stack at 0xFFDF (= SP +0)
MOVL $20, 4(SP) # store arg 2 for "add" on stack at 0xFFE3 (= SP +4)
MOVL $30, 8(SP) # store arg 3 for "add" on stack at 0xFFE7 (= SP +8)
CALL add(SB) # call the "add" function
# restore stack
MOVQ 24(SP), FP # unwind stack pointer by one stack frame
ADDQ $32, SP # shrink stack (balances growth by SUBQ $32, SP)
RET # return to caller
Note: At function definitions like add(SB), 24, 24
, the numbers means
“frame-size, args-size”. This states that the function has a 24-byte frame
and is called with 24 bytes of argument, which live on the caller’s frame.
CALL emulation
SUBQ $8 SP # grow stack to accomodate return address
LEAQ 16(IP) (SP) # store return address on stack at SP+0
MOVQ add(SB) IP # load destination address into IP
Note: 16(IP)
assumes the sum of the current instruction and the following
MOVQ instruction is a total of 16 bytes.
RET emulation
ADDQ $8 SP # shrink stack beyond return address
LEAQ -8(SP) IP # load return address into IP
Notes
LEA stands for Load Effective Address (seen as LEAQ
in the example code).
LEA, the only instruction on x86 that performs memory addressing calculations but doesn’t actually address memory. LEA accepts a standard memory addressing operand, but does nothing more than store the calculated memory offset in the specified register, which may be any general purpose register.
What does that give us? Two things that ADD doesn’t provide:
- the ability to perform addition with either two or three operands, and
- the ability to store the result in any register; not just one of the source operands