This project has intermediate milestones that are relevant to your grade.
Build a compiler for Jack -- a modern, object-based, Java-like language. The compiler construction spans two projects: 10 (Syntax Analysis) and 11 (Code Generation). In this project we build a syntax analyzer that parses programs according to the Jack grammar, producing an XML file that reflects the program's structure. In the next project, the XML code will be replaced with executable machine code.
The main tool in this project is the programming language in which you will implement the syntax analyzer. You will also need the TextComparer
utility (available in your tools directory), which allows comparing the output files generated by your analyzer to the compare files supplied with the project. You may also want to inspect the generated and supplied output files using an XML viewer (any standard Web browser should do the job). All the test files necessary for this project are available in project 10.zip.
Write the syntax analyzer program in two stages: tokenizing and parsing. Use it to parse all the .jack
files supplied below. For each source .jack
file, your analyzer should generate an .xml
output file. The generated files should be identical to the .xml
compare-files supplied.
A natural way to test your analyzer it is to have it parse some representative Jack programs. You will use two such test programs: Square Dance and Array Test. The former includes all the features of the Jack language except for array processing, which appears in the latter. A simpler version of the Square Dance program is also supplied and explained below.
Square Dance: A simple interactive game, described in Project 9. The game implementation is organized in three classes:
Source class files |
Description |
Tokenizer output |
Parser output |
Initializes and starts a new "square moving session." |
|||
Implements an animated square. A square object has a screen location and size properties, and methods for drawing, erasing, moving, and size changing. |
|||
Runs the program according to the game rules. |
Note: The three source Jack files comprising the Square Dance game are identical to those stored in the projects/09/Square directory. For completeness, an identical copy of these files is also available in the projects/10/Square directory.
Expressionless Square Dance: This
is a version of Square Dance in which each expression in the original source
code was replaced with a single identifier (some variable name in scope). This
version of the program was designed for one purpose only: unit-testing the
analyzer's ability to parse everything except for expressions. Note that the
replacement of expressions with variables has resulted in a nonsensical program that
cannot be compiled by the supplied Jack Compiler. Still, it follows all the Jack
grammar rules. The
expressionless files have the same names as those of the original files, but
they are located in a separate directory (projects/10/ExpressionlessSquare
Source class files |
Tokenizer output |
Parser output |
Array
test: A
single-class Jack program that computes the average of a user-supplied sequence
of integers using array notation and array manipulation:
Source class files |
Tokenizer output |
Parser output |
Experimenting
with the test programs: If you want, you can compile the Square Dance and
Test Array programs using the supplied Jack compiler, then use the
supplied VM
Emulator to run the compiled code. These activities are completely
irrelevant to this project, but they serve to highlight the fact that the test
programs are not &quout;just&quout; plain text (although this is perhaps the best way to think
about them in the context of this project).
First, read ECS chapter 10. This pdf may also be insightful.
Within your src
directory, create a subdir named project10
. Extract the contents of project10.zip into it.
Add, commit and push the new files.
Stage I: Tokenizer
First, implement the JackTokenizer module specified in the chapter 10. When applied to a text
file containing Jack code, the tokenizer should produce a list of tokens, each
printed in a separate line along with its classification: symbol, keyword,
identifier, integer constant, or string constant. The
classification should be recorded using XML tags. Here is an example:
Source Code |
Tokenizer output |
if (x < 153) {let city = ”Paris”;} |
<tokens>
<keyword> if </keyword>
<symbol> ( </symbol>
<identifier> x </identifier>
<symbol> < </symbol>
<integerConstant> 153 </integerConstant>
<symbol> ) </symbol>
<symbol> { </symbol>
<keyword> let </keyword>
<identifier> city </identifier>
<symbol> = </symbol>
<stringConstant> Paris </stringConstant>
<symbol> ; </symbol>
<symbol> } </symbol> </tokens> |
Note that in the case of string constants, the tokenizer throws away the double quote characters. That’s OK.
The tokenizer’s output has two “features” dictated by XML conventions. First, an XML file must be enclosed in some begin and end tags, and that’s why the <tokens> and </tokens> tags were added to the output. Second, four of the symbols used in the Jack language (<, >, ", &) are also used for XML markup, and thus they cannot appear as data in XML files. To solve the problem, we require the tokenizer to output these tokens as <, >, ", and &, respectively. For example, in order for the text “<symbol> < </symbol>” to be displayed properly in a web browser, the source XML should be written as “<symbol> < </symbol>”.
Testing
Your Tokenizer:
Test your tokenizer on the Square Dance and Test Array programs. There is no need to test it on the expressionless version of the former.
For each source file Xxx.jack,
have your tokenizer give the output file the name XxxT.xml.
Apply your tokenizer to every class file in the test programs, then use the
supplied TextComparer utility to compare the
generated output to the supplied .xml
compare files.
Since the output files generated by your tokenizer will have the same names and extensions as those of the supplied compare files, we suggest putting them in separate directories.
Stage II: Parser
Next,
implement the CompilationEngine module specified in chapter 10. Write
each method of the engine, as specified in the API, and make sure that it emits
the correct XML output. We recommend to start by writing a compilation engine
that handles everything except expressions, and test it on the expressionless
Square Dance program only. Next, extend the parser to handle expressions as
well, and proceed to test it on the Square Dance and Array Test
programs.
Testing
Your Parser:
Apply your analyzer to the supplied test programs, then use the supplied TextComparer utility to compare the generated output to the supplied .xml compare files.
Since the output files generated by your analyzer
will have the same names and extensions as those of the supplied compare
files, we suggest putting them in separate directories.
Note that the indentation of the XML output is only for readability. Web browsers and the supplied TextComparer utility ignore white space.
Due Date | Points | Description | Grading Criteria |
---|---|---|---|
April 14 @ 4:00PM | 100 | Choose partners | You must send me an email that tells me:
|
April 16 @ 4:00PM | 50 | Project setup | You must add, commit and push the project10.zip files which should reside in your project10 directory. |
April 19 @ 4:00PM | 350 | Stage I: Tokenizing | You must commit & push a compiler that generates proper XxxT.xml . |
April 21 @ 5:00PM | 400 | Stage II: Parsing (expressionless) | You must commit & push a compiler that generates proper Xxx.xml . Your compiler should handle everything except expressions, tested on the expressionless Square Dance code only. |
April 23 @ 5:00PM | 200 | Stage II: Parsing (complete) | You must commit & push a compiler that generates proper Xxx.xml . It should handle expressions, tested on Square Dance and Array Test programs. |
Your repository must show a history of work. You should be committing at least every time you complete a milestone (per the ones above or your own intermediate milestones). A repository log showing one commit with a message like "project done" is not acceptable.