Step 2: A Sequential Representation

To apply the TCS Alignment Toolbox, we first have to convert the student solutions into a sequential format. Fortunately, this is fairly intuitive for Java programs: we take the abstract syntax tree (AST) of each program, as returned by the Oracle Java Compiler API. For this demo, we have already extracted the syntax trees and stored them in a simple text format in the fog directory.
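The demo ships the extracted trees ready-made, and the exact tooling behind the text export is not described here. As a minimal sketch, assuming a full JDK (not just a JRE) is installed, one way to obtain a syntax tree through the compiler API is: `javax.tools.ToolProvider.getSystemJavaCompiler()` gives access to javac, `com.sun.source.util.JavacTask.parse()` returns the trees, and a `com.sun.source.util.TreeScanner` walks them. The class names `AstKinds` and `StringSource` and the method `preOrderKinds` are hypothetical, and the sketch only lists node kinds; it does not reproduce the file format in the fog directory.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.Tree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class AstKinds {

    /** Hypothetical in-memory source file, so nothing has to be written to disk. */
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className + ".java"), JavaFileObject.Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    /** Parses the source (syntax only, no type checking) and returns the kind of every node, parents before children. */
    public static List<Tree.Kind> preOrderKinds(String className, String code) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system Java compiler found; a JDK is required");
        }
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, null, null, List.of(new StringSource(className, code)));
        List<Tree.Kind> kinds = new ArrayList<>();
        TreeScanner<Void, Void> recorder = new TreeScanner<Void, Void>() {
            @Override
            public Void scan(Tree tree, Void unused) {
                if (tree != null) {
                    kinds.add(tree.getKind()); // record the node before descending into its children
                }
                return super.scan(tree, unused);
            }
        };
        try {
            for (CompilationUnitTree unit : task.parse()) {
                recorder.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return kinds;
    }

    public static void main(String[] args) {
        String code = "class Demo { void m(int b, int c) { int a = f(b, c); } }";
        System.out.println(preOrderKinds("Demo", code));
    }
}
```

Because only `parse()` is called, the undeclared method `f` in the example does not cause an error; a real pipeline would likely also run attribution to resolve names and types.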

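To transform these trees into a sequence, we list the nodes in prefix order (also called pre-order): every node appears before its children, and siblings appear from left to right. This order largely follows the order in which the code is written. For example, the statement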

int a = f(b, c);

is transformed to:

  1. a node representing the variable declaration for int a,
  2. a node representing the method call to f, and
  3. nodes representing the variable references to b and c.
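The prefix order for this example can be sketched with a hypothetical, simplified `Node` record whose tree mirrors the three-item list above rather than the full compiler tree (a full javac tree would additionally contain, for example, a node for the type `int`). The names `PrefixOrder`, `Node`, `preOrder` and `sequence` are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class PrefixOrder {

    /** Hypothetical, simplified syntax-tree node: a label plus its ordered children. */
    public record Node(String label, List<Node> children) {
        public Node(String label, Node... children) {
            this(label, List.of(children));
        }
    }

    /** Appends the node itself first, then its children from left to right (prefix order). */
    static void preOrder(Node node, List<String> out) {
        out.add(node.label());
        for (Node child : node.children()) {
            preOrder(child, out);
        }
    }

    public static List<String> sequence(Node root) {
        List<String> out = new ArrayList<>();
        preOrder(root, out);
        return out;
    }

    public static void main(String[] args) {
        // int a = f(b, c);  as a simplified tree: declaration -> call -> two references
        Node tree = new Node("variable declaration: int a",
                new Node("method call: f",
                        new Node("variable reference: b"),
                        new Node("variable reference: c")));
        System.out.println(sequence(tree));
    }
}
```

Running `main` prints the four labels in the same order as the list above: the declaration, then the call, then the references to `b` and `c`.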

We record several features of every node in the syntax tree:

  1. The overall type of the node (such as variable for a variable declaration),
  2. the position of the node in the original code,
  3. the index of the parent node in the syntax tree,
  4. the scope the node lies in,
  5. the name, if this node declares a variable, method or class,
  6. the class name of the variable's type, if this node declares a variable,
  7. the class name of the return type, if this node declares a method, and
  8. the names of methods, classes and variables that are referenced but are not part of this program.
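The features above can be stored in one small record per sequence element. The sketch below is hypothetical: it keeps only five of the eight features (type, position, parent index, scope, name), encodes the position as a character offset into the statement, uses -1 as the parent index of the root, and leaves the name `null` unless the node declares something; the toolbox's actual encoding may differ. Because nodes are emitted in prefix order, a node's parent has always been appended already, so the parent index is known at the moment the node is added.

```java
import java.util.ArrayList;
import java.util.List;

public class NodeFeatures {

    /** Hypothetical input tree node; name is null unless the node declares something. */
    public record TreeNode(String type, int position, String scope, String name, List<TreeNode> children) {}

    /** Hypothetical sequence element: the same features plus the index of the parent node. */
    public record SeqNode(String type, int position, int parentIndex, String scope, String name) {}

    /** Flattens the tree in prefix order; the root gets parent index -1. */
    public static List<SeqNode> flatten(TreeNode root) {
        List<SeqNode> out = new ArrayList<>();
        visit(root, -1, out);
        return out;
    }

    private static void visit(TreeNode node, int parentIndex, List<SeqNode> out) {
        int ownIndex = out.size(); // index this node receives in the sequence
        out.add(new SeqNode(node.type(), node.position(), parentIndex, node.scope(), node.name()));
        for (TreeNode child : node.children()) {
            visit(child, ownIndex, out); // children refer back to this node
        }
    }

    public static void main(String[] args) {
        // int a = f(b, c);  inside a hypothetical method m; positions are character offsets
        TreeNode b = new TreeNode("reference", 10, "m", null, List.of());
        TreeNode c = new TreeNode("reference", 13, "m", null, List.of());
        TreeNode call = new TreeNode("call", 8, "m", null, List.of(b, c));
        TreeNode decl = new TreeNode("variable", 0, "m", "a", List.of(call));
        for (SeqNode n : flatten(decl)) {
            System.out.println(n);
        }
    }
}
```

For the example statement, the resulting parent indices are -1, 0, 1 and 1: the call points to the declaration, and both references point to the call.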

On the right-hand side we show the sequence representations for each of the student solutions in our dataset.