Of course, the alignment of programs is much more complex. We are not handling sequences of single symbols, but sequences of feature vectors, whose features differ in nature.
As mentioned before, an alignment algorithm introduces gaps into both input sequences such that similar elements of both sequences are aligned. To do that, we need a dissimilarity measure that works on sequence elements. In our case, we can define such a metric as the weighted sum of feature-specific metrics.
More formally: Consider two nodes $x$ and $y$. We have $K$ different features and denote the $k$th feature of node $x$ as $x_k$. Then we define the distance $d$ between the nodes as:
$$d(x, y) := \sum_{k=1}^{K} w_k \cdot c_k(x_k, y_k)$$
where $w_k$ is a weight between 0 and 1, such that all weights add up to 1, and $c_k$ is the distance between the $k$th feature values of both nodes.
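The weighted sum above can be sketched in a few lines of Java. This is only a minimal illustration, not the implementation behind this page: the class name NodeDistance, the two features (node type and variable name) and the 0/1 feature distance are assumptions chosen for the example.

```java
import java.util.List;
import java.util.function.BiFunction;

class NodeDistance {
    /** Assumed feature distance: 0 if both feature values are equal, 1 otherwise. */
    static double discrete(String a, String b) {
        return a.equals(b) ? 0.0 : 1.0;
    }

    /**
     * Weighted sum d(x, y) = sum over k of w_k * c_k(x_k, y_k).
     * x and y hold one value per feature; the weights are expected to lie
     * in [0, 1] and to add up to 1 (this sketch does not check that).
     */
    static double distance(String[] x, String[] y, double[] weights,
                           List<BiFunction<String, String, Double>> featureDistances) {
        double d = 0.0;
        for (int k = 0; k < weights.length; k++) {
            d += weights[k] * featureDistances.get(k).apply(x[k], y[k]);
        }
        return d;
    }

    public static void main(String[] args) {
        // Hypothetical feature vectors: [node type, variable name].
        String[] x = {"variable", "myVar"};
        String[] y = {"variable", "yourVar"};
        List<BiFunction<String, String, Double>> c =
                List.of(NodeDistance::discrete, NodeDistance::discrete);

        // Equal weights: the differing names contribute half of the maximum distance.
        System.out.println(distance(x, y, new double[] {0.5, 0.5}, c)); // prints 0.5
        // All weight on the node type: the names are ignored.
        System.out.println(distance(x, y, new double[] {1.0, 0.0}, c)); // prints 0.0
    }
}
```

Passing the feature distances as a list keeps each feature free to use its own measure, for example a string edit distance for names instead of the 0/1 distance assumed here.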
To construct a fitting alignment distance, you have several design choices: which features to compare, how to weight them, and which feature-specific distance to use for each of them.
To illustrate this point, we have set up an example on the right-hand side, where we align two very simple Java code snippets:
int myVar = 4;
and
int yourVar;
yourVar = 4;
Both programs have the same function: They initialize an integer variable with the value 4. However, they do so in different ways: The first program does it all in one step, while the second program declares the variable first and sets it later. This has consequences regarding our sequential representation: The sequence representing the first program consists just of a variable node and a literal node (representing the 4). For the second program we have three nodes, a variable, an assignment and a literal. Also, the names of the declared variables are different.
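To make the effect of the extra node concrete, the following sketch aligns the two node-type sequences described above with a standard edit-distance-style dynamic program. It is an assumption-laden illustration rather than the algorithm used by this page: the class name AlignmentSketch, the 0/1 distance on node types and the gap cost of 1 are all chosen for the example.

```java
class AlignmentSketch {
    /**
     * Alignment distance between two sequences of node types.
     * Aligning two nodes costs 0 if their types are equal and 1 otherwise;
     * aligning a node with a gap costs gapCost.
     */
    static double align(String[] a, String[] b, double gapCost) {
        double[][] dist = new double[a.length + 1][b.length + 1];
        // Aligning a prefix with the empty sequence needs one gap per node.
        for (int i = 1; i <= a.length; i++) dist[i][0] = i * gapCost;
        for (int j = 1; j <= b.length; j++) dist[0][j] = j * gapCost;

        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                // Option 1: align a[i-1] with b[j-1].
                double pair = dist[i - 1][j - 1] + (a[i - 1].equals(b[j - 1]) ? 0.0 : 1.0);
                // Option 2: align a[i-1] with a gap in b.
                double gapInB = dist[i - 1][j] + gapCost;
                // Option 3: align b[j-1] with a gap in a.
                double gapInA = dist[i][j - 1] + gapCost;
                dist[i][j] = Math.min(pair, Math.min(gapInB, gapInA));
            }
        }
        return dist[a.length][b.length];
    }

    public static void main(String[] args) {
        String[] first = {"variable", "literal"};                // int myVar = 4;
        String[] second = {"variable", "assignment", "literal"}; // int yourVar; yourVar = 4;
        // One gap opposite the assignment node is enough to align the rest exactly.
        System.out.println(align(first, second, 1.0)); // prints 1.0
    }
}
```

With these assumed costs, the only difference the alignment has to pay for is the assignment node, which ends up opposite a gap in the first sequence.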
You can play around with the alignment of those two snippets on the right-hand side.
Note that all of these choices change the overall alignment metric. In particular, note that the initial parameters tend to overestimate the differences between the two programs. But if you choose different parameters, you can focus the "attention" of the alignment on structural similarities between the two programs, such that the distance gets very small.
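As a rough worked illustration of this effect, assume two hypothetical features, node type and variable name, each compared with a 0/1 distance and weighted by $w_{\text{type}}$ and $w_{\text{name}}$, plus a cost $g$ for aligning a node with a gap. These parameters and values are assumptions made for the calculation, not the settings of the interactive example. Aligning variable with variable and literal with literal, and leaving the assignment node opposite a gap, then gives the alignment distance $D$:

```latex
\begin{gather*}
D = \underbrace{w_{\text{name}} \cdot 1}_{\text{variable names differ}}
  + \underbrace{g}_{\text{assignment vs. gap}}
  + \underbrace{0}_{\text{literals match}} \\
w_{\text{type}} = w_{\text{name}} = 0.5,\; g = 1: \quad D = 0.5 + 1 + 0 = 1.5 \\
w_{\text{type}} = 1,\; w_{\text{name}} = 0,\; g = 0.1: \quad D = 0 + 0.1 + 0 = 0.1
\end{gather*}
```

Shifting all weight to the node type and making gaps cheap removes the penalty for the different variable names and nearly removes the penalty for the extra assignment node, so only the shared structure of the two programs remains visible in the distance.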