CSCI 310 Spring 2008 Project 2: The Lexical Analyzer

Due date: February 4, 2008, 11:59 pm

This project begins your implementation of your own compiler. You will use a tool called sablecc that takes a specification of tokens and turns it into lexical analyzer code.

We’ll use a language called C2008, a superset of C- (which is itself a subset of C). See added description here.

If you are using cerebro as your primary platform, then sablecc is already installed. However, you can also download it from sablecc.org (get the latest beta).

Format of the sable input file

example.sable is an example of a lexical description in the appropriate sable format, taken from program 2.10 in the text.

format of the file

At this point you have essentially three sections in the file:

  • Helpers: defines things that are not tokens themselves, but are used to build tokens. In the example you'll see that digits are defined here. Carriage return and line feed are also defined by their ASCII values. These are useful for eating whitespace and comments.
  • Tokens: defines the actual tokens, that is, the regular expressions we care about noticing in the text. Notice that we define comments even though we don't really want to pass them on.
  • Ignored Tokens: lists the tokens that are matched but do not get passed on to whichever function is using the analyzer.
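
To make the division of labor between the three sections concrete, here is a minimal sketch in plain Java using java.util.regex. It is not what sablecc generates and it does not use sablecc syntax; the class name SectionsSketch, the token names, and the patterns are all made up for illustration. It also assumes the matching rule is "longest match wins, and on a tie the token declared first wins", which you should confirm against the sablecc documentation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: a hand-written stand-in, not sablecc output.
class SectionsSketch {
    // "Helpers": reusable fragments that are not tokens themselves.
    static final String DIGIT = "[0-9]";
    static final String LETTER = "[a-zA-Z]";
    static final String CR = "\\x0D"; // carriage return, ASCII 13
    static final String LF = "\\x0A"; // line feed, ASCII 10

    // "Tokens": named patterns built from the helpers, kept in declaration order.
    static final Map<String, Pattern> TOKENS = new LinkedHashMap<>();
    static {
        TOKENS.put("blank", Pattern.compile("(?: |\\t|" + CR + "|" + LF + ")+"));
        TOKENS.put("comment", Pattern.compile("/\\*(?:[^*]|\\*+[^*/])*\\*+/"));
        TOKENS.put("number", Pattern.compile(DIGIT + "+"));
        TOKENS.put("id", Pattern.compile(LETTER + "(?:" + LETTER + "|" + DIGIT + ")*"));
        TOKENS.put("plus", Pattern.compile("\\+"));
    }

    // "Ignored Tokens": still matched, but never handed to the caller.
    static final Set<String> IGNORED = Set.of("blank", "comment");

    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> e : TOKENS.entrySet()) {
                Matcher m = e.getValue().matcher(input);
                m.region(pos, input.length());
                // Assumed rule: longest match wins; a tie keeps the earlier token.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = e.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("unknown token at offset " + pos);
            }
            if (!IGNORED.contains(bestName)) {
                out.add(bestName + ":" + input.substring(pos, bestEnd));
            }
            pos = bestEnd;
        }
        return out;
    }

    public static void main(String[] args) {
        // Whitespace and the comment are consumed but do not appear in the output.
        System.out.println(tokenize("x1 + 42 /* note */"));
    }
}
```

Notice that the comment still needs a pattern of its own: the lexer has to recognize it in order to skip it, which is why comments are defined as tokens even though they are never passed on.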

building the lexer

After you have created your lexical description, you can build the lexer by simply running sablecc on the file holding the description. For example, running sablecc example.sable created the files here.

At this point the only directories created that you may care about are lexer/ and node/. lexer/ holds the code that runs the lexical analyzer and node/ holds the data nodes that will hold the specific tokens.

using the lexer

MainLexer.java is an example using the lexical analyzer. You can put this in the directory that holds your lexical description and compile it with javac. Then, if you run it on a C2008 program, it will print out the tokens!

So, your task

Build a lexical description for the C2008 grammar. Write test data to show that it recognizes all the appropriate tokens (and doesn't recognize non-tokens).
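
One possible way to organize the test data is to pair each input file with a listing of the tokens you expect and compare that listing against what your driver prints. The sketch below shows only the comparison step. It assumes your MainLexer prints one token name per line; the class name TokenExpectation and the token names TInt, TId, and TSemicolon are hypothetical placeholders, so substitute whatever names your own lexer produces.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for checking lexer output against an expected listing.
class TokenExpectation {
    /** Returns -1 when the two listings agree, else the index of the first difference. */
    static int firstMismatch(List<String> expected, List<String> actual) {
        int common = Math.min(expected.size(), actual.size());
        for (int i = 0; i < common; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return i;
            }
        }
        // Same prefix: they agree only if neither listing has extra tokens.
        return expected.size() == actual.size() ? -1 : common;
    }

    public static void main(String[] args) {
        // Made-up listings for the input "int x;"; in practice, read these from files.
        List<String> expected = Arrays.asList("TInt", "TId", "TSemicolon");
        List<String> actual = Arrays.asList("TInt", "TId", "TSemicolon");
        int at = firstMismatch(expected, actual);
        System.out.println(at < 0 ? "PASS" : "FAIL at token " + at);
    }
}
```

For the non-token cases, the expected result is an error from the lexer rather than a token listing, so those inputs are best kept in separate test files.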

Put your MainLexer.java and c2008.sable in your project2 directory with your test cases. You’ll hand in your project2 directory as a tar file:

handin 310 project2 project2.tar

Thanks to Gary for the original project