Note! This tutorial is not being maintained. Why not try this PPC tutorial blog instead?

Introduction

I got fed up with the lack of usable PowerPC tutorials online, particularly as they pertain to development under OS X. This is largely a tutorial for myself (and, by consequence, for people like myself). What I assume of the reader is that you already know (a) assembly (albeit under different architectures); and (b) how to develop under Unix. If you don't know these two things, this will be of very little use to you. Myself, I'm fairly comfortable with 386, PDP-11, and SPARC assembly (with a small smattering of SIMD in the form of MMX and 3DNow!), so this is going to focus mainly on differences between those and PowerPC. I have a G5 iMac running Mac OS X 10.4.3, which is the main development environment.

A note on syntax: because I'm doing development under OS X, I'm using the native assembler there. The syntax there appears to differ from the syntax used by IBM or by gcc (e.g., under PPC Linux).

This is a work in progress.

The register set up

The PowerPC has 32 general purpose registers, each either 32 bits or 64 bits in size (depending on which chip you're using). Note that 32-bit PowerPC and 64-bit PowerPC share the same instruction set, and 32-bit code will run natively, unmodified, on a 64-bit chip. 32-bit code is 64-bit code. Of course, you will run into some difficulties trying to load a 64-bit immediate into a register on a 32-bit machine :)

Anyway, the registers are labelled r0 through r31. The common PowerPC ABI with respect to registers is as follows:

Register   Classification   Notes
r0         local            commonly used to hold the old link register when building the stack frame
r1         dedicated        stack pointer
r2         dedicated        table of contents pointer
r3         local            commonly used as the return value of a function, and also as the first argument passed in
r4–r10     local            commonly used to pass arguments 2 through 8 into a function
r11–r12    local
r13–r31    global
lr         dedicated        link register; cannot be used as a general register. Use mflr (move from link register) or mtlr (move to link register) to get at it, e.g., mtlr r0
cr         dedicated        condition register
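
As a rough illustration of the table above, here is how the integer arguments of a C function would travel under this convention. The function sum8 is a made-up example, and the register comments simply restate the table; they are not the output of any particular compiler.

```c
/* Hypothetical eight-argument function. The comments show the register
   each value would occupy according to the ABI table above. */
int sum8(int a,  /* r3  (argument 1) */
         int b,  /* r4  (argument 2) */
         int c,  /* r5  (argument 3) */
         int d,  /* r6  (argument 4) */
         int e,  /* r7  (argument 5) */
         int f,  /* r8  (argument 6) */
         int g,  /* r9  (argument 7) */
         int h)  /* r10 (argument 8) */
{
    /* The result would be handed back to the caller in r3. */
    return a + b + c + d + e + f + g + h;
}
```

Note that r3 does double duty: it carries the first argument in and the return value out.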

The link register can be thought of as the old instruction pointer for our purposes.

A very simple program

.globl _main
_main:
        li r3, 5        // return 5
        blr

No magic here. r3 holds our return value, which we set via li (load immediate). blr (branch to link register) just returns from the function. The only oddity is that main starts with an underscore. On OS X the C compiler prefixes global symbol names with an underscore, so the C function main is the assembly symbol _main. Weird.

Try it out and you should get something like this:

$ cc -o foo foo.S
$ ./foo
$ echo $?
5

A simple program

bar:
	mflr r0		// set up the stack frame
	stw r0, 8(r1)
	stwu r1, -16(r1)
	addi r3, r3, 3	// add 3 to the argument and return it
	addi r1, r1, 16	// destroy the stack frame
	lwz r0, 8(r1)
	mtlr r0
	blr		// return

.globl _main
_main:
	mflr r0		// set up the stack frame
	stw r0, 8(r1)
	stwu r1, -16(r1)
	lis r3, hi16(847318093)	// load big number into r3
	ori r3, r3, lo16(847318093)
	bl bar		// call stuff
	addi r1, r1, 16	// destroy the stack frame
	lwz r0, 8(r1)
	mtlr r0
	blr		// return

So we have two functions, bar and _main. We're demonstrating three concepts here: calling a function, dealing with stack frames, and loading immediates.

So let's walk through bar. The first thing we do is set up the stack frame. This involves getting the current link register into a register we can actually work with (r0 in our case). We then take that link register value and dump it onto the stack (offset 8 from the current stack pointer). We then dump the old stack pointer onto the stack and grow the stack down to accommodate it. The stwu instruction (store word with update) stores the first operand at the effective address given by the offset/register second operand, and then updates the register in the second operand to that effective address. Thus, in one instruction we store the old stack pointer and grow the stack.
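
To make the store-then-update behaviour of stwu concrete, here is a small C model of the three-instruction prologue. The Machine struct, the 256-byte memory, and all of the helper names are invented for illustration. Words are stored in host byte order, which will not necessarily match a real big-endian PowerPC.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical toy machine: a small byte-addressed memory plus r1. */
typedef struct {
    uint8_t  mem[256];
    uint32_t r1;        /* stack pointer, used as an offset into mem */
} Machine;

/* Helper (not an instruction): read back a 32-bit word from memory. */
uint32_t load_word(const Machine *m, uint32_t addr) {
    uint32_t word;
    memcpy(&word, &m->mem[addr], sizeof word);
    return word;
}

/* Model of "stw rS, d(r1)": store rS at r1 + d, leave r1 alone. */
void stw_r1(Machine *m, uint32_t rs, int32_t d) {
    uint32_t ea = m->r1 + (uint32_t)d;   /* effective address */
    memcpy(&m->mem[ea], &rs, sizeof rs);
}

/* Model of "stwu r1, d(r1)": store the OLD r1 at r1 + d,
   then set r1 to that same effective address. */
void stwu_r1(Machine *m, int32_t d) {
    uint32_t ea = m->r1 + (uint32_t)d;
    memcpy(&m->mem[ea], &m->r1, sizeof m->r1);
    m->r1 = ea;
}

/* The prologue used by bar and _main, one call per instruction. */
void prologue(Machine *m, uint32_t lr) {
    uint32_t r0 = lr;       /* mflr r0          */
    stw_r1(m, r0, 8);       /* stw  r0, 8(r1)   */
    stwu_r1(m, -16);        /* stwu r1, -16(r1) */
}
```

Starting with r1 at 128, calling prologue leaves r1 at 112, with the old stack pointer (128) stored at address 112 and the saved link register at address 136.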

The addition instruction is hopefully obvious (addi stands for add immediate). After that we shrink the stack, and then load the old link register back in and return. The lwz instruction (load word and zero) loads a word (32-bit value), and if we are using 64-bit registers, zeroes out the highest 32 bits.
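
The "and zero" part of lwz can be sketched in C as follows. This is only a model of what happens to a 64-bit register, with function names borrowed from the mnemonics. The lwa function models the sign-extending load available on 64-bit PowerPC; it is not otherwise covered here and is included purely for contrast.

```c
#include <stdint.h>

/* Model of lwz into a 64-bit register: the 32-bit word fills the
   low half and the high 32 bits are cleared. */
uint64_t lwz(uint32_t word) {
    return (uint64_t)word;
}

/* For contrast, a model of a sign-extending load: the high 32 bits
   are copies of the word's top bit. */
uint64_t lwa(uint32_t word) {
    if (word & 0x80000000u) {
        return 0xFFFFFFFF00000000ull | word;
    }
    return (uint64_t)word;
}
```

For a word such as 0xFFFFFFFF the two differ: lwz gives 0x00000000FFFFFFFF, while the sign-extending version gives 0xFFFFFFFFFFFFFFFF.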

The main function at this point is hopefully largely obvious, with the exception of the lis and ori instructions. Since PowerPC instructions (even on 64-bit PowerPCs) are all 32 bits in length, loading an arbitrary 32-bit immediate into a register takes at least two instructions. And, in fact, we can do it in exactly two. lis (load immediate shifted) takes the 16-bit operand and stuffs it into the highest 16 bits of the destination register (hence the "shifted"), clearing the lowest 16 bits. The ori (or immediate) instruction then bitwise ORs the lowest 16 bits onto that. Of note here are the assembler constructs hi16 and lo16, which evaluate to the highest 16 bits or lowest 16 bits of a 32-bit constant, respectively.
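
Here is a minimal C sketch of that two-instruction load, assuming a 32-bit register. The functions hi16 and lo16 are C stand-ins for the assembler constructs, and lis, ori, and load_imm32 are made-up names that model the instructions rather than anything the assembler provides.

```c
#include <stdint.h>

/* C stand-ins for the assembler's hi16() and lo16(). */
uint16_t hi16(uint32_t value) { return (uint16_t)(value >> 16); }
uint16_t lo16(uint32_t value) { return (uint16_t)(value & 0xFFFFu); }

/* Model of "lis rD, imm" on a 32-bit register: the immediate lands
   in the upper 16 bits and the lower 16 bits are zero. */
uint32_t lis(uint16_t imm) {
    return (uint32_t)imm << 16;
}

/* Model of "ori rD, rS, imm": OR the zero-extended 16-bit
   immediate into the register. */
uint32_t ori(uint32_t reg, uint16_t imm) {
    return reg | imm;
}

/* The lis/ori pair from _main, written as one function. */
uint32_t load_imm32(uint32_t value) {
    uint32_t r3 = lis(hi16(value));   /* lis r3, hi16(value)     */
    return ori(r3, lo16(value));      /* ori r3, r3, lo16(value) */
}
```

Because ori zero-extends its immediate, the OR never disturbs the upper half that lis set, so the original constant comes back intact. Both halves have to be taken from the same constant for that to hold.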