Step 6: Metric Learning

The initial parametrization of the alignment metric on programs is not perfect. For example, the codePosition feature is very susceptible to stylistic noise: if we insert a comment into our code, the position of every following node shifts, even though the program semantics are unchanged. The scope feature, on the other hand, captures the structure of the program much better.

However, as we have seen before, it is difficult to determine the optimal parametrization by hand. Inferring optimal metric parameters from data automatically is the topic of metric learning. In this particular setting, we want to infer an optimal weighting of the different features, such that programs with the same underlying strategy are close together and programs with a different underlying strategy are as far apart as possible.
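
To make the role of the feature weighting concrete, here is a minimal sketch in Python. It is not the demo's actual implementation: the function node_distance, the dictionary representation of nodes, and the two hand-picked weightings are assumptions made for illustration. Only the feature names codePosition and scope come from the text above.

```python
def node_distance(a, b, weights):
    """Weighted mismatch between two syntax-tree nodes.

    Hypothetical helper: each feature contributes its weight
    if the two nodes disagree on it, and nothing otherwise.
    """
    return sum(w * float(a[f] != b[f]) for f, w in weights.items())


# The same node before and after a comment was inserted above it:
# only the position changes, the scope stays the same.
original = {"codePosition": 3, "scope": "for-loop"}
shifted = {"codePosition": 4, "scope": "for-loop"}

# Two hand-picked weightings (assumed values, not learned ones).
uniform = {"codePosition": 0.5, "scope": 0.5}
scope_only = {"codePosition": 0.0, "scope": 1.0}

dist_uniform = node_distance(original, shifted, uniform)
dist_scope_only = node_distance(original, shifted, scope_only)
```

Under the uniform weighting the inserted comment makes the two nodes look different, while a weighting that ignores codePosition treats them as identical. Metric learning aims to find such a weighting from data instead of by hand.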

For this demo, we apply simple gradient descent on the Large Margin Nearest Neighbor (LMNN) cost function (see Weinberger et al. (2009)). If you click the Learn button on the right-hand side, learning starts and you can observe how the weighting develops during learning.
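
The idea behind this learning step can be sketched in Python with NumPy as follows. This is a simplified illustration, not the code behind the Learn button. It assumes that the distance between two programs is a weighted sum of fixed per-feature distances (the real alignment may change as the weights change), that weights are kept non-negative by clipping, and that the step size, margin, and toy data are arbitrary choices. All function and variable names are hypothetical.

```python
import numpy as np


def lmnn_cost_and_grad(w, D, labels, targets, margin=1.0):
    """LMNN-style cost and gradient for a feature weighting w.

    D has shape (n, n, F): D[i, j, f] is the distance between
    programs i and j according to feature f alone.
    targets is a list of (i, j) pairs with the same label.
    """
    d = D @ w  # (n, n) weighted distances
    cost, grad = 0.0, np.zeros_like(w)
    for i, j in targets:
        # Pull term: target neighbors should be close.
        cost += d[i, j]
        grad += D[i, j]
        # Push term: differently labeled programs should be
        # farther away than the target neighbor by a margin.
        for l in np.flatnonzero(labels != labels[i]):
            slack = margin + d[i, j] - d[i, l]
            if slack > 0:
                cost += slack
                grad += D[i, j] - D[i, l]
    return cost, grad


def learn_weights(D, labels, k=1, lr=0.01, steps=100):
    """Plain gradient descent on the cost above, starting from uniform weights."""
    n, _, num_features = D.shape
    w = np.full(num_features, 1.0 / num_features)
    d0 = D @ w
    # Target neighbors: the k closest same-label programs under the initial metric.
    targets = []
    for i in range(n):
        same = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        nearest = same[np.argsort(d0[i, same])[:k]]
        targets.extend((i, int(j)) for j in nearest)
    for _ in range(steps):
        _, grad = lmnn_cost_and_grad(w, D, labels, targets)
        w = np.maximum(w - lr * grad, 0.0)  # keep weights non-negative
    return w, targets


# Toy data (made up): feature 0 behaves like a noisy position,
# feature 1 separates the two strategies.
labels = np.array([0, 0, 1, 1])
features = np.array([[0.0, 0.0], [3.0, 0.0], [1.0, 1.0], [4.0, 1.0]])
D = np.abs(features[:, None, :] - features[None, :, :])

w_init = np.full(2, 0.5)
w_learned, targets = learn_weights(D, labels)
cost_before, _ = lmnn_cost_and_grad(w_init, D, labels, targets)
cost_after, _ = lmnn_cost_and_grad(w_learned, D, labels, targets)
```

On this toy data the weight of the noisy feature shrinks while the weight of the separating feature grows, which mirrors the development of the weighting you can observe in the demo.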