OK so this is the beginning of Mastery Task C. If you don't yet feel comfortable with B I think this will help more than hurt actually. So here's the idea:
When you learned to code you were told "Don't Repeat Yourself", which is to say: if you do an action in code frequently, you should wrap it up in a reusable function.
OK so think about that sentence in the land of mindless x86 weird machines.
You will create some code that will live somewhere and call it from any other place you'd like.
But so far we have just seen one way to move the instruction pointer and that is slowly walking one step at a time. So how EXACTLY do function calls work?
Well let's explore that.
This story has the following parts:
PUSH rip; JMP .blah;
PUSH rbp; MOV rbp, rsp;
MOV rsp, rbp; POP rbp; POP rip;
RDI, RSI, RDX, RCX, R8, R9
The story version of this is at pwnwizard.com
Here's a little web app for visualizing these things as burgers: Simple Stack Visualizer.
call instruction
CALL .thefunction does two things:
push rip (kinda)
jmp .thefunction
Huh? Why? OK. Imagine this:
You are going to do a series of instructions:
The job of the processor is to run instruction 1 then 2, then jump to the other function and when that function is done to come back and do instruction 3 then instruction 4.
push rip is really push where_to_go_after_the_function_is_done.
It is writing down an address in the stack so it can come back to instruction 3!
It has to store that somewhere and the stack makes good sense.
jmp .thefunc will set the instruction pointer to the address of the utility function, somewhere far far away.
But don't worry you wrote down your return address so you won't get lost.
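If you want to see that written-down address from C, GCC and Clang offer a builtin, `__builtin_return_address(0)`, that hands back the return address of the current function. Here is a minimal sketch, assuming one of those compilers; the names `the_function` and `caller` are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical utility function. noinline keeps the compiler from pasting
   its body into the caller, so a real call instruction is emitted. */
__attribute__((noinline)) void *the_function(void)
{
    /* GCC/Clang builtin: the address this function will return to,
       i.e. the value that call wrote down on the stack for us (on x86). */
    return __builtin_return_address(0);
}

/* Hypothetical caller: performs the call, then reports what was saved. */
void *caller(void)
{
    void *saved = the_function();   /* call the_function */
    printf("the_function will return to %p\n", saved);
    return saved;
}
```

To poke at it, call `caller()` from a `main`, compile with `gcc -O0 -g`, and in gdb use `break *the_function` to stop on the very first instruction. At that point `x/gx $rsp` should show the same address the program prints.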
PUSH rbp; MOV rbp, rsp;
We're already kind of comfortable with this. It "creates" the stack frame for new local variables.
What happens with these two commands?
push rbp: This simply stores the OLD base pointer so you can recover your previous "state". It is long-term storage of your previous context. ANY TIME RBP CHANGES YOU HAVE MADE A NEW "SCOPE".
mov rbp, rsp: This starts all over. It replaces the old base pointer (now saved on the stack for recovery) with the current stack pointer. This begins the process of having a fresh home/world/context/dimension for the new function to live in.
LOCAL VARIABLES ALWAYS LIVE BETWEEN rsp AND rbp
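You can check this from C too. This is a rough sketch, assuming GCC or Clang: `__builtin_frame_address(0)` returns the frame address of the current function, which on x86-64 builds that keep a frame pointer is the value in rbp. The function name `show_frame` is made up.

```c
#include <stdio.h>

/* Hypothetical function used only to look at its own stack frame. */
__attribute__((noinline)) void *show_frame(void)
{
    /* volatile forces these locals to really live in memory (on the stack). */
    volatile int a = 1;
    volatile int b = 2;

    /* GCC/Clang builtin: the frame address of the current function. */
    void *frame = __builtin_frame_address(0);

    printf("frame (rbp) = %p\n", frame);
    printf("&a          = %p\n", (void *)&a);
    printf("&b          = %p\n", (void *)&b);
    return frame;
}
```

On a typical `gcc -O0` x86-64 build the two local addresses should print a little below the frame address. Treat that as something to verify on your own machine rather than a guarantee, since optimization levels and other architectures lay frames out differently.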
leave; ret;
Your coding experience so far has trained you to think of the instruction return as your method for getting a value out of a program. But think about that word: RETURN.
It actually has a different and more important meaning. To GO BACK TO WHERE YOU CAME FROM.
The ret instruction actually has NOTHING TO DO with "return values"... that's what rax/eax is for.
ret is actually better understood as: pop rip. That is, it just reads the next instruction address FROM THE STACK.
In fact the last two instructions of any function tend to be:
leave
ret
These two instructions literally do the following:
mov rsp, rbp
pop rbp
pop rip
NOTE: THIS IS THE EXACT INVERSE OF THE PREVIOUS STEPS. If you put on socks then shoes you must take off shoes then socks.
mov rsp, rbp; puts the stack pointer back to where it was at the start of the function.
pop rbp; changes our context to whatever the calling function had as its context (the old base pointer is read from the stack).
pop rip; sets the instruction pointer back to the address that was stored in the stack which is the next thing to do AFTER the function.
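The whole socks-and-shoes round trip can be acted out with a toy model in C. This is NOT real x86, just three fake registers and a tiny array standing in for the stack. Every name and address here (`ToyCpu`, `round_trip`, `0x1005`, and so on) is invented for the sketch.

```c
#include <stdint.h>
#include <stdio.h>

#define STACK_SLOTS 16               /* toy stack of 8-byte slots */

typedef struct {
    uint64_t rip, rsp, rbp;          /* rsp and rbp are slot indexes here */
    uint64_t stack[STACK_SLOTS];
} ToyCpu;

static void push(ToyCpu *c, uint64_t v) { c->stack[--c->rsp] = v; }
static uint64_t pop(ToyCpu *c)          { return c->stack[c->rsp++]; }

/* call target: push the address of the NEXT instruction, then jump. */
static void toy_call(ToyCpu *c, uint64_t target, uint64_t next_insn)
{
    push(c, next_insn);
    c->rip = target;
}

/* push rbp; mov rbp, rsp */
static void toy_prologue(ToyCpu *c) { push(c, c->rbp); c->rbp = c->rsp; }

/* leave is mov rsp, rbp; pop rbp */
static void toy_leave(ToyCpu *c) { c->rsp = c->rbp; c->rbp = pop(c); }

/* ret is pop rip */
static void toy_ret(ToyCpu *c) { c->rip = pop(c); }

/* Runs one full call/return round trip and returns the final state. */
ToyCpu round_trip(void)
{
    ToyCpu c = { .rip = 0x1000, .rsp = STACK_SLOTS, .rbp = STACK_SLOTS };

    toy_call(&c, 0x2000, 0x1005);    /* made-up addresses */
    toy_prologue(&c);
    c.rsp -= 2;                      /* like "sub rsp, 16": room for two locals */
    toy_leave(&c);
    toy_ret(&c);

    printf("rip=%#llx rsp=%llu rbp=%llu\n",
           (unsigned long long)c.rip,
           (unsigned long long)c.rsp,
           (unsigned long long)c.rbp);
    return c;
}
```

After the round trip, rsp and rbp are back at their starting values and rip holds the saved return address, which is the point of undoing the steps in reverse order.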
Let's write a program that calls a function and watch what happens at each step to:
Step through this binary: callme
Of course when you call a function you want to pass arguments too, right?
So let's look at how that works in the two different paradigms, 64-bit and 32-bit.
The 64-bit case is actually pretty easy.
If your function needs arguments it knows to find them in the registers RDI, RSI, RDX, RCX, R8, R9, in that order.
That's it. I don't even really memorize these, just remember that it's registers and google/wikipedia until I find the list. OK maybe I know the first 3 by memory, RDI, RSI, RDX isn't so hard to just know.
If you have more than 6 arguments then they are stored in the stack just like the 32-bit case (coming next).
Step through this 64-bit binary: addme
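Here is a small function you could compile and disassemble to watch the register list in action. It is a sketch that assumes the System V AMD64 convention used on 64-bit Linux (Windows uses a different register list); `add8` and `call_add8` are made-up names, not the source of the addme binary.

```c
#include <stdio.h>

/* Hypothetical 8-argument function. Under the System V AMD64 convention
   the first six integer arguments travel in registers, the rest on the stack:
     a -> rdi, b -> rsi, c -> rdx, d -> rcx, e -> r8, f -> r9,
     g and h -> stack (just above the return address). */
__attribute__((noinline))
long add8(long a, long b, long c, long d, long e, long f, long g, long h)
{
    return a + b + c + d + e + f + g + h;
}

/* Hypothetical caller, so there is a call site to disassemble. */
long call_add8(void)
{
    long total = add8(1, 2, 3, 4, 5, 6, 7, 8);
    printf("total = %ld\n", total);
    return total;
}
```

In the disassembly of `call_add8` at `-O0` you should be able to spot the six register loads plus two values going to the stack before the call.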
In the 32-bit case (or 64-bit for arguments over 6) it checks the stack:
So if you imagine the stack, the arguments live TO THE RIGHT OF the return address.
In practice that means that the calling convention looks like this:
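To make the stack picture concrete, here is a toy model in C of the common 32-bit cdecl convention that gcc uses on Linux. The caller pushes arguments right to left, call pushes the return address, and the callee finds its arguments above that at [ebp+8], [ebp+12], and so on. This is a simulation with a fake stack array, not real machine code, and every name and address in it is invented.

```c
#include <stdint.h>
#include <stdio.h>

#define SLOTS 16                 /* toy stack of 4-byte slots, not real memory */

static uint32_t stack[SLOTS];
static uint32_t esp = SLOTS;     /* slot index; the stack grows toward 0 */
static uint32_t ebp = SLOTS;

static void push(uint32_t v) { stack[--esp] = v; }

/* Callee side: after "push ebp; mov ebp, esp" the layout is
     [ebp+0] saved ebp, [ebp+4] return address,
     [ebp+8] first argument, [ebp+12] second argument.
   One slot stands for 4 bytes, so +8 bytes is +2 slots. */
static uint32_t toy_add(void)
{
    push(ebp);                   /* push ebp */
    ebp = esp;                   /* mov ebp, esp */
    uint32_t first  = stack[ebp + 2];
    uint32_t second = stack[ebp + 3];
    esp = ebp;                   /* leave: mov esp, ebp */
    ebp = stack[esp++];          /*        pop ebp */
    esp++;                       /* ret: pop the return address */
    return first + second;       /* a real function would leave this in eax */
}

/* Caller side: push arguments right to left, then "call". */
uint32_t call_toy_add(uint32_t a, uint32_t b)
{
    push(b);                     /* last argument goes first */
    push(a);
    push(0x08048123);            /* made-up return address pushed by call */
    uint32_t result = toy_add();
    esp += 2;                    /* caller removes its two arguments */
    printf("%u + %u = %u\n", (unsigned)a, (unsigned)b, (unsigned)result);
    return result;
}
```

Note that the caller cleans up its own arguments after the call returns, which is why esp ends up exactly where it started.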
Now to make 32-bit binaries you'll want to do some installing:
To make a 32-bit binary, first install gcc-multilib: apt-get install gcc-multilib
Then add the -m32 flag to your compilation: gcc -m32 -o bin_name source_code.c
OK let's make programs with parameters that call each other and trace what happens and make predictions and such.
Step through this 32-bit binary: addme32
I won't dive into these just yet but you should just be aware that a couple other calling conventions exist out there for other situations.
These are passed in the stack but offset by a little, so inspect and test. This is just for main.
I'm going to do a whole lecture or two on syscalls, but for now you can summarize it like this: anything that is operating system/kernel level you do through a syscall. That is, access to the filesystem, making new memory segments, starting/stopping processes, etc.
The syscall is its own instruction that uses the rax register to dictate what type of syscall you're doing, and the other registers hold the other params.
In the 32-bit case it's the interrupt instruction int 0x80, then a similar structure.
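As a small taste before the full lectures, here is a sketch that asks the kernel to write to the terminal through the generic `syscall()` wrapper from glibc instead of the usual `write()` function. It assumes Linux; `raw_write` is a made-up helper name.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_write */
#include <unistd.h>        /* syscall() */

/* Hypothetical helper: write len bytes of msg to stdout (fd 1).
   On x86-64 Linux the wrapper ends up with the syscall number in rax and
   the arguments in rdi, rsi, rdx before executing the syscall instruction. */
long raw_write(const char *msg, size_t len)
{
    return syscall(SYS_write, 1, msg, len);
}
```

The return value is the number of bytes written, or -1 on error, so `raw_write("hi\n", 3)` should come back as 3 when stdout is open.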
When there is an interrupt sometimes the entire register set is stored on the stack to let you resume after calling the error handler or what have you. We will exploit this later (it's called SROP).