Step 5: Alignment Visualization

The interface of this demo is not well suited to inspecting the alignments on a whole dataset. There are several general ways to visualize alignments and distances:

A basic option is a heat map of the distance matrix, where each entry (i,j) contains the distance between the i-th and the j-th sequence. If the sequences are sorted such that members of the same cluster are adjacent, clusters become apparent as patches along the diagonal of the heat map: distances inside a cluster are relatively low, while distances between clusters are relatively high.
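The sorting step can be sketched as follows. This is a minimal illustration in plain Python, not part of the TCS Alignment Toolbox: the function names, the cluster labels, and the toy distance values are all made up for this example.

```python
from statistics import mean

def sort_by_cluster(dist, labels):
    """Reorder a square distance matrix so members of a cluster are adjacent."""
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    sorted_dist = [[dist[i][j] for j in order] for i in order]
    return sorted_dist, [labels[i] for i in order]

def block_means(dist, labels):
    """Mean distance within and between clusters (diagonal excluded)."""
    within, between = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            (within if labels[i] == labels[j] else between).append(dist[i][j])
    return mean(within), mean(between)

# Toy data (hypothetical): sequences 0 and 2 form cluster "a", 1 and 3 cluster "b".
labels = ["a", "b", "a", "b"]
dist = [
    [0.0, 0.9, 0.1, 0.8],
    [0.9, 0.0, 0.7, 0.2],
    [0.1, 0.7, 0.0, 0.9],
    [0.8, 0.2, 0.9, 0.0],
]
sorted_dist, sorted_labels = sort_by_cluster(dist, labels)
within, between = block_means(sorted_dist, sorted_labels)
```

After sorting, the low within-cluster distances sit in square blocks along the diagonal, which is what shows up as patches once the matrix is drawn as a heat map.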

Many dimensionality reduction methods use distances in the high-dimensional space as the basis for a low-dimensional embedding of the data. If one restricts the embedding to two or three dimensions, a picture of the dataset can be obtained. Notable methods include Multidimensional Scaling (MDS) and t-distributed Stochastic Neighbor Embedding (t-SNE).
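To illustrate how an embedding can be derived from distances alone, here is a minimal sketch of classical (Torgerson) MDS using NumPy. It is one simple variant of MDS, chosen for brevity, and is not taken from the toolbox; the function name classical_mds and the toy data are assumptions made for this example.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Embed points in n_components dimensions from a symmetric distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_components]
    # Alignment distances need not be Euclidean, so clip negative eigenvalues.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Toy check: the four corners of a unit square.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
embedding = classical_mds(dist)
recovered = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
```

For truly Euclidean input such as this toy example, the pairwise distances of the embedding match the input distances up to rotation and reflection of the points. For alignment distances the embedding is in general only an approximation.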

On the right-hand side we provide a button to generate an HTML visualization of the dataset. The visualization includes a heat map, where each entry can be clicked to inspect the detailed alignment between the two corresponding sequences. The TCS Alignment Toolbox provides the means to compute such a visualization.

At present, the TCS Alignment Toolbox does not contain dimensionality reduction algorithms. However, the only input such methods require is a distance matrix, which can easily be computed using the toolbox. On the right-hand side we show a t-SNE embedding of our test dataset.
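The shape of that interface can be sketched in a few lines of Python. This is not the toolbox's API: the Levenshtein edit distance below is only a stand-in for an alignment distance, and the names edit_distance and distance_matrix are hypothetical. The sketch also assumes that the distance is symmetric and that each sequence has distance zero to itself.

```python
def edit_distance(a, b):
    """Levenshtein distance; a stand-in for an alignment distance."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # replace x by y, or match
            ))
        prev = curr
    return prev[-1]

def distance_matrix(sequences, distance):
    """Pairwise distance matrix; assumes a symmetric distance with zero diagonal."""
    n = len(sequences)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Compute each pair once and mirror it across the diagonal.
            matrix[i][j] = matrix[j][i] = distance(sequences[i], sequences[j])
    return matrix

sequences = ["kitten", "sitting", "mitten"]
matrix = distance_matrix(sequences, edit_distance)
```

A matrix of this form can then be handed to any embedding method that accepts precomputed distances.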