CSCI 310 - Project 1, Spring 2008

Due Date: Monday, January 28, 2008, 11:59 pm

This project has several goals:  

The basic ideas here will reappear as you work on your compiler over the course of the semester.

What you will be doing

We're going to give you the data classes for a very simple "straight line" programming language. That is, there are no loops or conditionals. Using the classes directly you can construct a tree that describes a program.

Given a particular tree that represents a straight line program, we'll ask you to do a few different tasks with the tree. The tasks themselves are not so difficult, but the fact that you need to do more than one is analogous to the multiple passes over the tree that you will need to do in your compiler. Those multiple passes motivate our use of what is known as the Visitor Pattern.

The Visitor Pattern

Given a structure of objects, such as a tree of expressions found in a parse tree, there may be several different operations we want to perform with and on those objects. In a compiler, for example, we may use the tree to do type checking, interpretation, and code generation. We may also simply want to print the tree to see our code.

The structure may be formed of objects from many different classes, for example, a class for each type of statement and each type of expression. One way to implement each of these operations is to modify the code for every class to include the necessary code of the operation you want performed. However, this causes difficulties for at least two reasons. First, it's a pain to modify lots of classes. Second, it turns out that the code describing the expressions and statement data is often automatically generated -- so modifying that code is a rather volatile activity because a change to the language wipes out your code changes when the data structures are regenerated.

The Visitor Pattern is a method of encapsulating the operation within a single class. We add a single method to each data structure class, typically called apply, which takes a single parameter of type Visitor. The apply method calls a method called caseFoo(this) from the visitor (where Foo is the name of the data structure). Since the apply method doesn't need to know any particulars of the Visitor's operation, it can be automatically generated in cases where the data structure code is automatically generated.

Example

Here's a simple example involving only integers and plus expressions. We begin by setting up the base class for our expressions to require the implementation of the apply method.

public interface Expression 
{
    public void apply(Visitor v);
}

Each of our two expressions is simple to implement:

public class IntExpression implements Expression
{
    private int value;
    public IntExpression(int v) { value = v; }
    public void apply(Visitor v) {  v.caseInt(this); }
    public int getValue() { return value;}
}
 
public class PlusExpression implements Expression
{
    private Expression left;
    private Expression right;
    public PlusExpression(Expression l, Expression r) { left = l; right = r;}
    public void apply(Visitor v) {  v.casePlus(this); }
    public Expression getLeft() { return left;}
    public Expression getRight() { return right;}
}

Our example visitor will simply print out the expression. Notice that since a plus expression could be a recursive structure (1 + 2 + 3 for example) that the apply is called on each child to make the visit recursive.

public class Visitor
{
    public void caseInt(IntExpression intExp)
    {
        System.out.print(intExp.getValue());
    }
    public void casePlus(PlusExpression plusExp)
    {
        plusExp.getLeft().apply(this);
        System.out.print("+");
        plusExp.getRight().apply(this);
    }
}

Again, please note that the point here is to push all the pain of implementation into the visitor class with minimal effort (the apply method) in the data structure.

What happens if you want to write a second visitor that does something different? Simply extend the original Visitor and override the methods! Here is a simple example for our little plus expression language:

public class AddVisitor extends Visitor
{
    private int sum;
    public AddVisitor() { sum = 0;}
    public void caseInt(IntExpression intExp)
    {
        sum += intExp.getValue();
    }
    public void casePlus(PlusExpression plusExp)
    {
        plusExp.getLeft().apply(this);
        plusExp.getRight().apply(this);
    }
    public int getSum() { return sum;}
}

(If you want to test all of these, they're here.)

Your tasks for project 1

Download the chap1.tar files (also all under the chap1/  directory here). The files under the abstractsyntax directory implement a simple little language. Your tasks:

  1. Implement a visitor that prints out the program. (Notice that TestProgram.java constructs the program via constructors, so your printer will be a nice way to see what sort of code we have.)
  2. Implement a visitor that determines the maximum number of arguments to any Print statement. [hashtables are necessary for this one, so read about them below.]
  3. Implement a visitor that interprets the program. This is more complicated, of course! I suggest the following approach because this will lead you on a path that is useful later in the "real" compiler.
    1. The first interpretation issue deals with assignments. You will need to implement a symbol table to keep track of the values. A hash table is the easiest way to do this. For example, you can create a visitor that has a hash table called SymbolTable. This language has no declaration, so the first time you encounter the identifier you insert it into the table. (Ideally you only do this via an assignment, and note it as an error otherwise, *but* it is okay for this assignment to insert it into the table and give it a value of 0 the first time you see it.)
    2. You'll need to evaluate expressions as well. How to do this? Believe it or not, another hash table is the trick, but rather than indexing into it with an identifier, index into it with a node of the tree. In java, any object can be used to index a Hashtable. So for example, in our plus expression language, we could have done the following for our visitor:
import java.util.*;
public class AddVisitorHash extends Visitor
{
    private Hashtable values;
    public AddVisitorHash() { values = new Hashtable();}
    public void caseInt(IntExpression intExp)
    {
      values.put(intExp, new Integer(intExp.getValue()));
    }
 
    public void casePlus(PlusExpression plusExp)
    {
      plusExp.getLeft().apply(this);
      plusExp.getRight().apply(this);
      Integer leftSummand = (Integer) values.get(plusExp.getLeft());
      Integer rightSummand = (Integer) values.get(plusExp.getRight());
      values.put(plusExp, new Integer(leftSummand.intValue()
                                     +rightSummand.intValue()));
    }
    public int getValue(Expression e) { return ((Integer) values.get(e)).intValue();}
}
    1. When you interpret an expression list, one way to deal with the interpretation is to store an array of values as its result (in the hashtable mentioned in b). Then when you need to interpret the print statement or an assignment that has an ESEQ, you can easily pull out the necessary value.

Finally, you'll probably (hah) want to know what each of these things in the language is:

As an example,
b := (a:= 2, a*3); print (b);
is a program that would print out 6.

What to include and how to turn this in

 

Along with your code, you should write and provide test cases that demonstrate your code works. (i.e. do not simply try this on the TestProgram).

When you use handin, you should use tar to create an archive of the entire directory rather than handing in individual files. You should use the following command to hand-in your project:

handin 310 project1 file1.tar

 

where file1.tar is the name of your tar file. With your files you should include test case(s) called test1.java, test2.java, etc. (based on TestProgram.java) that fully exhibit your code’s capabilities. The test cases should be commented to indicate what they are testing.  You may use the same test cases on all 3 programs. You should also include a short writeup in a file called readme that describes the current status of the project.

 

This assignment was originally developed by Gary Lewandowski.