Code generation outline

This is how I approached code generation. Again, I used an extension of the Visitor class to do the generation, called CodeGen. I also make use of several variables I keep in the CodeGen class including a pointer to the global symbol table, a pointer to the current symbol table, a local variable offset, a current variable number, and a current label number. If you still have the machine organization book (shame on you if you don't!), then Appendix A is a good place to look for a review of MIPS. .data denotes the beginning of data allocation for variables and .text denotes the beginning of executable instructions. You can have several data and/or text segments. As described in class, I use the following function in CodeGen to write an instruction out.

void emitCode(String lab,String inst,String arg1, 
              String arg2, String arg3, 
              String comment) 
  // each arg must be present but it may be an empty string
  // lab is the label for the statement 
  // inst is the instruction name
  // arg1 is the first argument
  // arg2 is the second argument
  // arg3 is the third argument
  // comment is a comment at the end of the instruction

 

So if I call:

emitCode("i",".word","0","","","declare i as global var");

then it prints:

i: .word 0  # declare i as global var

I also have the following functions to push onto stack and pop off into register reg:

void pushOnStackW(String reg) {
  emitCode("","sw",reg,"-4($sp)","","");
  emitCode("","addi","$sp","$sp","-4","");
}
 
void popOffStackW(String reg) {
  emitCode("","lw",reg,"0($sp)","","");
  emitCode("","addi","$sp","$sp","4","");

You will also need to be able to push and pop bytes. In this case, use sb/lb, but still adjust the sp by 4 (otherwise we get alignment errors). You also need versions that push and pop from/to from floating point registers.

At end of code generation, call:

void genFinalCode() {
  /* generate final code stuff */
  emitCode("","add","$0","$0","$0"," no op"); 
}

xspim is available on cerebro (if you're running xwin32). You start it by typing xspim. To load a file, press the "load" button and type the file name. In order to receive any credit for the code generation, your compiler output files must at least load into SPIM. You can also run PC-Spim in the computer labs (but not the CS lab unless some kind soul installs it).

I call the code generator initially by creating a CodeGen visitor and calling apply on the top-level pointer to the abstract syntax tree. I do not even bother to call the code generator if there are any errors in the other phases. So when I do code generation, I know that I have an error-free abstract syntax tree. I added an int variable to the info for variables in the symbol tables called offset to hold the offset for variables from the $fp. As a general rule, whenever an expression value is calculated, we'll put the result on the top of the stack.

Here's an outline of the work done for each type of node in the code generator.

We only encounter these at the top level if they are global variables. The vardecl nodes within compound statements will be handled as part of handling the compound statement node. Therefore, we want to store the variable defined in these top-level nodes in static space. By the way, do not use the variable name j. It causes an error in SPIM. Here's the outline:

1. Set currentTable to globalTable
2. Look up the variable name in the table. You should get back a reference to the info for that variable so that you can alter the info in the table. Set the offset to -1 for that variable so we can recognize that it is a global variable.
3. write out the following code:
   .data   
   varname:  .word 0  # if int 
   .text
 
   .data   
   varname:  .byte 0  # if char
   .align 2 
   .text
 
   .data
   varname:  .float 0.0  # if float
   .text
 
   .data
   varname:  .space size  # if array. replace size with the array 
                          # size (*4 if float or int because space is in bytes) 
   .align 2
   .text
4. Call apply on the sibling if it exists.

These are allocated in the static area also. If we were being efficient, we wouldn't duplicate them (so if 7 was used as a NUMLIT 10 times, we'd only store it once and the others would refer to this). But we'll sacrifice efficiency for clarity though this is an extra credit opportunity if you want to try only storing them once. Also, we need to give each of these a unique name. I wrote the following function to construct the unique name (obviously it's not unique if your program uses V0, etc.). The varNum variable would need to be defined CodeGen object.

String genNextName() {
  String name;
   name = “V”+varNum;
  varNum++;
  return name;
}

By the way, you will have a similar function for generating unique labels.

Here's what I do for const nodes:

1. generate a name using the function above
2. emit the following code:
   .data
   name: .word  value   # this will differ depending on what type of const
   .text
   la $t0, name # load address of variable into $t0
   lw $t0, 0($t0)   # load the value into $t0 (may be lb if char)
3. Then emit the code to put contents of $t0 on top of stack   
arg[0] (bottom of stack - higher address)
arg[1]
...
arg[n]
old $ra (has return address)
old $fp
local var[0]
local var[1]  (top of stack - lower address)
...

The $sp points to the top element and the $fp points to the old $fp. So, for example, to access arg[n] in the function, we would load the word at 8($fp). Remember that a word is 4 bytes and addresses are in bytes so each of the slots above is 4 addresses. To access local var[0], we'd use -4($fp).

So in processing the function declaration, we have to put the offsets for the parameters/args into the symbol table so we can find them later. With SPIM, by the way, main is required to have a void return and no arguments.

Here are the steps:

1. Set the currentTable to the table for the function body
2. Set lvoffset to -4 (because local variables within this function
   start at -4 off the $fp)
3. Emit function label
4. If the function is main, go ahead and do steps 2-4 in the steps
   specified below for the function call.
5. Set poffset (a local var) to (number of params+1)*4  
   (this will be the offset for the first parameter)
6. Go through the parameter list.  For each one, look up name
   in currentTable and store poffset as offset for that name in the symbol table.
   Subtract 4 from poffset.
7. call apply on the function body
8. Put the following code after the body:
   addi $sp,$fp,0  # loads fp into sp
   lw $fp,0($sp)   # loads old fp into fp
   addi $sp,$sp,4  # pops old fp off stack
  # if this is main, do step 6 from call code below  
  jr $ra          # returns from function
9. Close the current symbol table by setting the current table 
   to point to the node's table's parent 
10. Call apply on the sibling if it exists.

One other note - if the function is main, you need to put:

.globl main

before the label for the function.

Here are the steps:

1. Go through the args and call apply on each. Each is an expression
   and so the final result for each will end up on the stack (assuming
   you handle expressions correctly in their cases). Count the args. 
2. Emit code to $ra on the stack 
3. Emit code to put $fp on the stack
4. Emit code: add $fp, $sp, $zero
   to put the $sp into $fp
5. emit code: jal nameoffunction
6. emit code to pop $ra off of stack (this will happen after return)
7. Emit code: addi $sp, $sp, 4*argcount
8. Emit code to push $v0 on stack even if the function is void 
   (this is the return value from the function call) 

Here's the outline for if:

1. If there is an else clause, generate a label for it and store in
   a variable for later use
2. Generate a label for the end of the if and store in
   a variable for later use
3. Call apply on the child representing the test.
4. Emit code to pop the test result off of the stack
 
If there is an else clause
5. Emit code to compare the test result to $zero and branch to the 
   else label if they are equal
6. Call apply on the child representing the then clause.
7. Emit code to jump to the end if label.
8. Emit the else label
9. Call apply on the child representing the else clause.
10. Emit the end if label
 
If there is no else clause
5. Emit code to compare the test result to $zero and branch to the 
   end label if they are equal
6. Call apply on the child representing the then clause. 
7. Emit the end if label

In either case, call apply on the sibling as the last step. The other selection/iteration statements have a similar structure.

There are various types of variables you have to deal with. I'll show you the int version. Don't forget to use lb for characters and in character arrays, use 1 for each element instead of 4.

All of these start by looking up name in symbol table.

5.                        add $t2,$t2,$t2 # $t2 has 2*original $t2
6.                        add $t2,$t2,$t2 # $t2 has 4*original $t2
7.                        add $t2,$t1,$t2

We'll talk about these in class.

- Note that local arrays will be considered to be upside down in the stack.

1.      Set spoffset=0

2.      If there is a symbol table associated with this node, set the current table to it.

3.      For each local variable, we need to allocate space on the stack:

4.      Call apply on the compound statement body

5.      Add -1*spoffset to lvoffset to reset lvoffset to its previous value

6.      Emit the code to reset the stack pointer to its value before any local variables were allocated by adding spoffset*-1.

7.      If there is a symbol table associated with this node, close it by setting the current table point to the node's table's parent

8.      Call apply on sibling

0.      Call apply on expression

1.      Emit code to pop result of the stack

2.      Call apply on sibling

0.      If there is a return value, call apply on that expression node

1.      If there is a return value, Emit code to pop the top of the stack into v0.

2.      do step 8 from the instructions for fundecl

3.      Call apply on sibling

Input-output

You need to use system functions to do I/O. We'll do this by creating some function declarations.

Second, look at p. A.49 of Appendix A. This is a table for the sycalls for input and output. For each of the function declarations, you should generate code for the function at the beginning of your code. So, you'll have 6 sections at the beginning of your code, each labeled with one of the names above. For the input int function, emit code to:

  1. put the system call code (5) into $v0
  2. do a syscall
  3. do step 8 from the instructions for fundecl

The result will be put into $v0 by the system call. This will be put onto the top of the stack as part of the function call. For the input float function, emit code to:

  1. put the system call code (6) into $v0
  2. do a syscall
  3. emit code to put contents of $f0 into $v0. You'll have to use the stack as a temporary storage area because you can't transfer director from a floating point register to a normal register.
  4. do step 8 from the instructions for fundecl

For the input char function:

  1. create a new variable Vx (using genNextName) and emit:
2.               .data
3.           Vx: .byte 0
4.               .byte 0
5.               .align 2
6.               .text

Then emit code to do the following:

  1. put the system call code (8) into $v0
  2. put address of Vx into a0
  3. put 2 into a1
  4. do a syscall
  5. la $t0,Vx
  6. lb 0($t0) into v0.
  7. do step 8 from the instructions for fundecl

For the output int function, emit code to:

  1. put the system call code (1) into $v0
  2. put the first argument (8($fp)) into $a0
  3. do a syscall
  4. do step 8 from the instructions for fundecl

For the output float function, emit code:

  1. put the system call code (2) into $v0
  2. put the first argument into $f12
  3. do a syscall
  4. do step 8 from the instructions for fundecl

For the output char function:

  1. put the system call code (4) into $v0
  2. store the address of the variable from inputChar above into $a0
  3. store the byte in the first argument into 0($a0)
  4. store 0 into 1($a0)
  5. do a syscall
  6. do step 8 from the instructions for fundecl

The function calls will be handled just like regular function calls.