Project 34 (ecs10)

April 14, 2010

This project has intermediate milestones that are relevant to your grade.

Objective

Build a compiler for Jack -- a modern, object-based, Java-like language. The compiler construction spans two projects: 10 (Syntax Analysis) and 11 (Code Generation). In this project we build a syntax analyzer that parses programs according to the Jack grammar, producing an XML file that reflects the program's structure. In the next project, the XML code will be replaced with executable machine code.

Resources

The main tool in this project is the programming language in which you will implement the syntax analyzer. You will also need the TextComparer utility (available in your tools directory), which allows comparing the output files generated by your analyzer to the compare files supplied with the project. You may also want to inspect the generated and supplied output files using an XML viewer (any standard Web browser should do the job). All the test files necessary for this project are available in project 10.zip.

Contract

Write the syntax analyzer program in two stages: tokenizing and parsing. Use it to parse all the .jack files supplied below. For each source .jack file, your analyzer should generate an .xml output file. The generated files should be identical to the .xml compare-files supplied.

Test Programs

A natural way to test your analyzer it is to have it parse some representative Jack programs. You will use two such test programs: Square Dance and Array Test. The former includes all the features of the Jack language except for array processing, which appears in the latter. A simpler version of the Square Dance program is also supplied and explained below.

Square Dance:  A simple interactive game, described in Project 9. The game implementation is organized in three classes:

Source class files

Description

Tokenizer output

Parser output

Main.jack

Initializes and starts a new "square moving session."

MainT.xml

Main.xml

Square.jack

Implements an animated square. A square object has a screen location and size properties, and methods for drawing, erasing, moving, and size changing.

SquareT.xml

Square.xml

SquareGame.jack

Runs the program according to the game rules.

SquareGameT.xml

SquareGame.xml

Note: The three source Jack files comprising the Square Dance game are identical to those stored in the projects/09/Square directory.  For completeness, an identical copy of these files is also available in the projects/10/Square directory.

Expressionless Square Dance: This is a version of Square Dance in which each expression in the original source code was replaced with a single identifier (some variable name in scope). This version of the program was designed for one purpose only: unit-testing the analyzer's ability to parse everything except for expressions. Note that the replacement of expressions with variables has resulted in a nonsensical program that cannot be compiled by the supplied Jack Compiler. Still, it follows all the Jack grammar rules. The expressionless files have the same names as those of the original files, but they are located in a separate directory (projects/10/ExpressionlessSquare):

Source class files

Tokenizer output

Parser output

Main.jack 

MainT.xml

Main.xml

Square.jack

SquareT.xml

Square.xml

SquareGame.jack

SquareGameT.xml

SquareGame.xml

Array test: A single-class Jack program that computes the average of a user-supplied sequence of integers using array notation and array manipulation:

Source class files

Tokenizer output

Parser output

Main.jack 

MainT.xml

Main.xml

Experimenting with the test programs: If you want, you can compile the Square Dance and Test Array programs using the supplied Jack compiler, then use the supplied VM Emulator to run the compiled code. These activities are completely irrelevant to this project, but they serve to highlight the fact that the test programs are not &quout;just&quout; plain text (although this is perhaps the best way to think about them in the context of this project).

Recommended Steps

First, read ECS chapter 10. This pdf may also be insightful.

Within your src directory, create a subdir named project10. Extract the contents of project10.zip into it.

Add, commit and push the new files.

cd src/project10
git add *
git commit -m "Adding project10 files, swish!"
git push

 

Stage I: Tokenizer

 

First, implement the JackTokenizer module specified in the chapter 10. When applied to a text file containing Jack code, the tokenizer should produce a list of tokens, each printed in a separate line along with its classification: symbol, keyword, identifier, integer constant, or string constant. The classification should be recorded using XML tags.  Here is an example:

 

Source Code

Tokenizer output

if (x < 153) {let city = ”Paris”;}

<tokens>

  <keyword> if </keyword>

  <symbol> ( </symbol>

  <identifier> x </identifier>

  <symbol> &lt; </symbol>

  <integerConstant> 153 </integerConstant>

  <symbol> ) </symbol>

  <symbol> { </symbol>

  <keyword> let </keyword>

  <identifier> city </identifier>

  <symbol> = </symbol>

  <stringConstant> Paris </stringConstant>

  <symbol> ; </symbol>

  <symbol> } </symbol>

</tokens>

Note that in the case of string constants, the tokenizer throws away the double quote characters. That’s OK.

The tokenizer’s output has two “features” dictated by XML conventions. First, an XML file must be enclosed in some begin and end tags, and that’s why the  <tokens> and </tokens> tags were added to the output. Second, four of the symbols used in the Jack language (<, >, ", &) are also used for XML markup, and thus they cannot appear as data in XML files. To solve the problem, we require the tokenizer to output these tokens as &lt;, &gt;, &quot;, and &amp;, respectively. For example, in order for the text “<symbol> < </symbol>” to be displayed properly in a web browser, the source XML should be written as “<symbol> &lt; </symbol>”.

Testing Your Tokenizer:

 

Stage II: Parser

Next, implement the CompilationEngine module specified in chapter 10. Write each method of the engine, as specified in the API, and make sure that it emits the correct XML output. We recommend to start by writing a compilation engine that handles everything except expressions, and test it on the expressionless Square Dance program only. Next, extend the parser to handle expressions as well, and proceed to test it on the Square Dance and Array Test programs.

Testing Your Parser:

  1. Apply your analyzer to the supplied test programs, then use the supplied TextComparer utility to compare the generated output to the supplied .xml compare files.

  2. Since the output files generated by your analyzer will have the same names and extensions as those of the supplied compare files, we suggest putting them in separate directories.

  3. Note that the indentation of the XML output is only for readability. Web browsers and the supplied TextComparer utility ignore white space.

Grading Criteria (1100 pts)

Due Date Points Description Grading Criteria
April 14 @ 4:00PM 100 Choose partners You must send me an email that tells me:
  • Who you are partnering with (or if you're working alone)
  • Whose repository you will be using (so I can add you as a collaborator)
April 16 @ 4:00PM 50 Project setup You must add, commit and push the project10.zip files which should reside in your project10 directory.
April 19 @ 4:00PM 350 Stage I: Tokenizing You must commit & push a compiler that generates proper XxxT.xml.
April 21 @ 5:00PM 400 Stage II: Parsing (expressionless) You must commit & push a compiler that generates proper Xxx.xml. Your compiler should handle everything except expressions, tested on the expressionless Square Dance code only.
April 23 @ 5:00PM 200 Stage II: Parsing (complete) You must commit & push a compiler that generates proper Xxx.xml. It should handle expressions, tested on Square Dance and Array Test programs.

Your repository must show a history of work. You should be committing at least every time you complete a milestone (per the ones above or your own intermediate milestones). A repository log showing one commit with a message like "project done" is not acceptable.

The final due date is 5PM on Friday, April 23.