Calling Conventions

OK so this is the beginning of Mastery Task C. If you don't yet feel comfortable with B I think this will help more than hurt actually. So here's the idea:

When you learned to code you were told "Don't Repeat Yourself", which is to say: if you do an action in code frequently, you should wrap it up in a reusable function.

OK so think about that sentence in the land of mindless x86 weird machines.

You will create some code that will live somewhere and call it from any other place you'd like.

But so far we have just seen one way to move the instruction pointer and that is slowly walking one step at a time. So how EXACTLY do function calls work?

Well let's explore that.

This story has the following parts:

  1. CALL .blah; is PUSH rip; JMP .blah;
  2. At the start of a function is: PUSH rbp; MOV rbp, rsp;
  3. LEAVE; RET; is MOV rsp, rbp; POP rbp; POP rip;
  4. The first 6 Parameters in 64-bit are in 6 registers: RDI, RSI, RDX, RCX, R8, R9
  5. In 32-bit, all the params live on the stack just above the return address; in 64-bit, so do any params past the sixth. (Push in reverse order: push arg3; push arg2; push arg1; call .blah;)

The story version of this is at pwnwizard.com

GOAL: Understand this diagram

Here's a little web app for visualizing these things as burgers: Simple Stack Visualizer.

the call instruction

CALL .thefunction does two things:

  1. push rip (kinda)
  2. jmp .thefunction

Huh? Why? OK. Imagine this:

You are going to do a series of instructions:

  1. instruction 1
  2. instruction 2
  3. call somefunction
  4. instruction 3
  5. instruction 4

The job of the processor is to run instruction 1 then 2, then jump to the other function and when that function is done to come back and do instruction 3 then instruction 4.

push rip is really push where_to_go_after_the_function_is_done. It is writing down an address in the stack so it can come back to instruction 3! It has to store that somewhere and the stack makes good sense.

jmp .thefunction will set the instruction pointer to the address of the utility function, somewhere far far away. But don't worry, you wrote down your return address so you won't get lost.
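Here's a tiny C sketch of that idea. It is a toy model, not real CPU code: the Machine struct, push, and call are made-up names for illustration, and the stack is just an array indexed by rsp.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* A toy machine: an instruction pointer, a stack pointer, and a stack. */
typedef struct {
    uint64_t rip;                  /* address of the current instruction */
    int      rsp;                  /* index of the top of the stack */
    uint64_t stack[STACK_WORDS];
} Machine;

static void push(Machine *m, uint64_t value) {
    assert(m->rsp > 0);            /* toy guard: don't run off the array */
    m->rsp -= 1;                   /* the stack grows toward lower addresses */
    m->stack[m->rsp] = value;
}

/* CALL target = push the address of the instruction AFTER the call, then jump.
   In this toy, rip points AT the call, so we add the call's length ourselves.
   On a real CPU rip has already advanced, which is the "kinda" in "push rip". */
static void call(Machine *m, uint64_t target, uint64_t call_length) {
    push(m, m->rip + call_length); /* write down where to come back to */
    m->rip = target;               /* jmp target */
}
```

With rip at 0x1000 and a 5-byte call to 0x4000, the top of the stack ends up holding 0x1005 (the "instruction 3" address) and rip ends up at 0x4000.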

Simple Fact: all functions start with PUSH rbp; MOV rbp, rsp;

We're already kind of comfortable with this. It "creates" the stack frame for new local variables.

What happens with these two commands?

PUSH rbp;

This simply stores the OLD base-pointer so you can recover your previous "state". It is long-term storage of your previous context. ANY TIME RBP CHANGES YOU HAVE MADE A NEW "SCOPE".

MOV rbp, rsp;

This starts all over. It replaces the old base pointer (saved on the stack for recovery) with the current stack pointer. This begins the process of having a fresh home/world/context/dimension for the new function to live in.

LOCAL VARIABLES ALWAYS LIVE BETWEEN rsp AND rbp
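The two prologue steps can be sketched in C on a toy stack. The Machine struct and function names are hypothetical, and the extra "reserve room for locals" step (SUB rsp, n in real code) is an addition here: it is what compilers typically emit right after the two instructions above.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

typedef struct {
    int      rbp;                  /* base pointer, as an index into stack[] */
    int      rsp;                  /* stack pointer, as an index into stack[] */
    uint64_t stack[STACK_WORDS];
} Machine;

static void push(Machine *m, uint64_t value) {
    assert(m->rsp > 0);            /* toy guard: don't run off the array */
    m->rsp -= 1;
    m->stack[m->rsp] = value;
}

/* PUSH rbp; MOV rbp, rsp; then (assumed extra step) SUB rsp, local_words. */
static void prologue(Machine *m, int local_words) {
    push(m, (uint64_t)m->rbp);     /* save the caller's base pointer */
    m->rbp = m->rsp;               /* the new frame starts here */
    assert(m->rsp >= local_words);
    m->rsp -= local_words;         /* locals now live between rsp and rbp */
}
```

Starting from rbp = 14 and rsp = 12, a prologue with two local slots leaves the old rbp (14) saved at slot 11, rbp pointing at slot 11, and rsp at slot 9.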

The first Trick: what happens at leave; ret;

Your coding experience so far has trained you to think of the return statement as your method for getting a value out of a function. But think about that word: RETURN. It actually has a different and more important meaning. To GO BACK TO WHERE YOU CAME FROM.

The ret instruction actually has NOTHING TO DO with "return values"... that's what rax/eax is for.

ret is actually better understood as: pop rip. That is, it just reads the next instruction address FROM THE STACK.

In fact the last two instructions of any function tend to be:

  1. leave
  2. ret

These two instructions literally do the following:

  1. mov rsp, rbp
  2. pop rbp
  3. pop rip

NOTE: THIS IS THE EXACT INVERSE OF THE PREVIOUS STEPS. If you put on socks then shoes you must take off shoes then socks.

mov rsp, rbp; puts the stack pointer back to where it was right after the PUSH rbp at the start of the function, throwing away the local variables.

pop rbp; changes our context to whatever the calling function had as its context (the old base pointer is read from the stack).

pop rip; sets the instruction pointer back to the address that was stored in the stack which is the next thing to do AFTER the function.
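Those three steps, sketched in C on a toy stack (the Machine struct and the function names are hypothetical, and registers hold array indexes instead of real addresses):

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

typedef struct {
    uint64_t rip;                  /* next instruction address */
    uint64_t rbp;                  /* index into stack[] */
    uint64_t rsp;                  /* index into stack[] */
    uint64_t stack[STACK_WORDS];
} Machine;

static uint64_t pop(Machine *m) {
    assert(m->rsp < STACK_WORDS);  /* toy guard: nothing left to pop */
    uint64_t value = m->stack[m->rsp];
    m->rsp += 1;                   /* popping moves rsp back up */
    return value;
}

/* LEAVE is MOV rsp, rbp; POP rbp. */
static void leave(Machine *m) {
    m->rsp = m->rbp;               /* drop the locals */
    m->rbp = pop(m);               /* restore the caller's base pointer */
}

/* RET is POP rip. */
static void ret(Machine *m) {
    m->rip = pop(m);               /* go back to where you came from */
}
```

If slot 13 holds a saved rbp of 15 and slot 14 holds the return address 0x1005, then leave followed by ret restores rbp to 15, sets rip to 0x1005, and leaves rsp at slot 15, exactly where it was before the call.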

OK Let's trace it all

Let's write a program that calls a function and watch what happens at each step to:

  1. the stack
  2. the instruction pointer
  3. the base pointer
  4. the stack pointer

Step through this binary: callme
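If you want something of your own to step through, here is a hypothetical stand-in. The real source of callme isn't shown here, so the function bodies and the name run_callme are assumptions for illustration only.

```c
/* Hypothetical stand-in for the callme binary; the real source is not shown. */

/* noinline (a GCC/Clang attribute) keeps the compiler from folding the call
   away, so the disassembly contains a real CALL instruction to step into. */
__attribute__((noinline))
int callme(int x) {
    int local = x + 1;             /* a local; find it relative to rbp in the disassembly */
    return local * 2;              /* the result travels back in eax, not via ret */
}

/* The caller: the CALL to callme happens in here, and execution resumes at
   the instruction right after it once callme runs RET. */
int run_callme(void) {
    int before = 3;
    int result = callme(before);
    return result + 1;
}
```

Add a main that calls run_callme(), build it with gcc -O0, and in gdb use stepi together with info registers rip rbp rsp and x/8gx $rsp to watch the four things in the list above change at each step.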

Parameters

Of course when you call a function you want to pass arguments too, right?

So let's look at how that works in the 2 different paradigms:

The first 6 parameters in a 64-bit program

This is actually pretty easy.

If your function needs arguments it knows to find them in the registers

  1. RDI holds arg1
  2. RSI holds arg2
  3. RDX holds arg3
  4. RCX holds arg4
  5. R8 holds arg5
  6. R9 holds arg6

That's it. I don't even really memorize these, just remember that it's registers and google/wikipedia until I find the list. OK maybe I know the first 3 by memory, RDI, RSI, RDX isn't so hard to just know.

If you have more than 6 arguments then they are stored in the stack just like the 32-bit case (coming next).
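A small C function makes this easy to check in a debugger. This sketch assumes the System V AMD64 convention (the one Linux uses, and the one the register list above describes); weigh8 is a made-up name, and the stack offsets in the comments assume the usual PUSH rbp; MOV rbp, rsp prologue.

```c
/* Where each argument arrives at function entry (System V AMD64 assumed). */
__attribute__((noinline))          /* GCC/Clang attribute: keep a real CALL */
long weigh8(long a1,               /* RDI */
            long a2,               /* RSI */
            long a3,               /* RDX */
            long a4,               /* RCX */
            long a5,               /* R8  */
            long a6,               /* R9  */
            long a7,               /* stack: [rsp + 8] at entry, [rbp + 16] after the prologue */
            long a8)               /* stack: [rsp + 16] at entry, [rbp + 24] after the prologue */
{
    /* Different weights, so mixing up any two arguments changes the answer. */
    return 1 * a1 + 2 * a2 + 3 * a3 + 4 * a4
         + 5 * a5 + 6 * a6 + 7 * a7 + 8 * a8;
}
```

Calling weigh8(1, 2, 3, 4, 5, 6, 7, 8) returns 204. Break on the function in gdb and compare the registers and the stack against the comments.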

Step through this 64-bit binary: addme

Parameters in the 32-bit case

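In the 32-bit case (or in 64-bit for arguments past the sixth) the function checks the stack. In 32-bit, where each stack slot is 4 bytes:

  1. arg1 is at [ebp + 8] (just past the saved ebp at [ebp] and the return address at [ebp + 4])
  2. arg2 is at [ebp + 12] (or just after arg1)
  3. etc.

In 64-bit the slots are 8 bytes, so the seventh argument is at [rbp + 16], the eighth at [rbp + 24], and so on.

So if you imagine the stack, the arguments live TO THE RIGHT OF the return address (that is, at higher addresses).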

In practice that means that the calling convention looks like this:

  1. push arg3
  2. push arg2
  3. push arg1
  4. call .somefunc
  5. pop ebx (or some other temp location)
  6. pop ebx (or whatever register)
  7. pop ebx (just cleaning the stack up; compilers usually do all three at once with add esp, 12)
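The push order and the resulting offsets can be checked with a toy 32-bit stack in C. Everything here (Machine32, push32, load32, call_with_args, stack_arg, and the 0xdeadbeef return address) is hypothetical scaffolding. In 32-bit code with the usual prologue, the return address sits at [ebp + 4] and the first argument at [ebp + 8].

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64

typedef struct {
    uint32_t esp;                  /* byte offset into the toy stack */
    uint32_t ebp;                  /* byte offset into the toy stack */
    uint32_t mem[STACK_BYTES / 4]; /* 4-byte stack slots */
} Machine32;

static void push32(Machine32 *m, uint32_t value) {
    assert(m->esp >= 4);           /* toy guard: don't run off the array */
    m->esp -= 4;
    m->mem[m->esp / 4] = value;
}

static uint32_t load32(const Machine32 *m, uint32_t addr) {
    assert(addr % 4 == 0 && addr < STACK_BYTES);
    return m->mem[addr / 4];
}

/* Caller pushes the args in reverse, CALL pushes the return address,
   then the callee runs its prologue (push ebp; mov ebp, esp). */
static void call_with_args(Machine32 *m, uint32_t a1, uint32_t a2, uint32_t a3) {
    push32(m, a3);
    push32(m, a2);
    push32(m, a1);
    push32(m, 0xdeadbeef);         /* stand-in for the return address */
    push32(m, m->ebp);             /* prologue: push ebp */
    m->ebp = m->esp;               /* prologue: mov ebp, esp */
}

/* Argument n (counting from 1) lives at [ebp + 4 + 4*n]. */
static uint32_t stack_arg(const Machine32 *m, uint32_t n) {
    return load32(m, m->ebp + 4 + 4 * n);
}
```

After call_with_args(&m, 10, 20, 30), stack_arg(&m, 1) is 10, stack_arg(&m, 2) is 20, and stack_arg(&m, 3) is 30. Pushing in reverse order is what puts arg1 closest to the return address.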

Let's play

Now to make 32-bit binaries you'll want to do some installing:

Make a 32-bit binary: install the 32-bit support libraries with apt-get install gcc-multilib, then add the -m32 flag to your compilation: gcc -m32 -o bin_name source_code.c

OK let's make programs with parameters that call each other and trace what happens and make predictions and such.

Step through this 32-bit binary: addme32

Some Other Conventions: command line, syscalls, and sigint

I won't dive into these just yet but you should just be aware that a couple other calling conventions exist out there for other situations.

Command Line Arguments

These are passed in the stack but offset by a little, so inspect and test. This is just for main.

syscalls

I'm going to do a whole lecture or two on syscalls, but for now you can summarize it like this: anything that is operating system/kernel level you do through a syscall. That is, access to the filesystem, making new memory segments, starting/stopping processes, etc.

The syscall is its own instruction that uses the rax register to dictate what type of syscall you're doing, and the other registers hold the other params.

In the 32-bit case it's the interrupt instruction int 0x80 with a similar structure: eax holds the syscall number and other registers hold the params.
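You can poke at this from C without writing any assembly. This is a minimal sketch assuming Linux with glibc, whose syscall() wrapper takes the syscall number first and the params after it; raw_getpid is a made-up name.

```c
#define _GNU_SOURCE                /* exposes syscall() in <unistd.h> on glibc */
#include <assert.h>
#include <sys/syscall.h>           /* SYS_getpid and the other syscall numbers */
#include <unistd.h>

/* Ask the kernel for our process ID by syscall number instead of going
   through the usual getpid() library function. On x86-64 Linux the wrapper
   puts the number in rax and executes the syscall instruction. */
long raw_getpid(void) {
    long pid = syscall(SYS_getpid);
    assert(pid > 0);               /* getpid cannot fail, so this always holds */
    return pid;
}
```

raw_getpid() should give the same number as the ordinary getpid(). Step into it with gdb to see the number land in rax just before the syscall instruction.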

sigint

When a signal is delivered, the kernel stores the entire register set on the stack so the program can resume after the signal handler (or what have you) has run. We will exploit this later (it's called SROP).