Module 1 of 4

The Shape of Numbers

How do we measure the evolutionary distance between two structures?

To understand this without getting lost in complex 3D protein folds, this playground uses simple 2D geometric shapes as analogs. The mathematical principles, however, are identical.

In structural biology, the standard metric for comparison is RMSD (Root Mean Square Deviation). Mathematically, calculating the minimum RMSD between two sets of corresponding points is a Procrustes problem: you translate and rotate one structure to optimally match the other, then measure the difference that remains.

Try it yourself

Use the slider below to inject Gaussian noise (mimicking evolutionary drift) into the shape's coordinates.
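What the slider does can be sketched in a few lines of NumPy. This is a minimal illustration, not the playground's actual code: the function names `make_circle` and `add_gaussian_noise`, the point count, and the noise level `sigma=0.1` are all assumptions chosen for the example.

```python
import numpy as np

def make_circle(n_points=32, radius=1.0):
    """Return n_points evenly spaced on a circle as an (n, 2) array."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_points, endpoint=False)
    return radius * np.column_stack([np.cos(angles), np.sin(angles)])

def add_gaussian_noise(coords, sigma, rng):
    """Add independent zero-mean Gaussian noise (std = sigma) to every coordinate."""
    return coords + rng.normal(loc=0.0, scale=sigma, size=coords.shape)

# Hypothetical settings: 32 points, noise standard deviation 0.1.
rng = np.random.default_rng(seed=0)
circle = make_circle()
noisy = add_gaussian_noise(circle, sigma=0.1, rng=rng)
```

Each point is displaced independently in x and y, so a larger `sigma` plays the role of a slider pushed further to the right.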

The "Procrustes" Distance

Mathematical Definition

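The Procrustes Distance ($d$) displayed on the right is essentially the RMSD: it is the residual between shape $A$ and shape $B$ that remains after optimal superposition, that is, after choosing the rotation $\mathbf{R}$ and translation $\mathbf{t}$ that minimize the summed squared differences between corresponding points $\mathbf{a}_i$ and $\mathbf{b}_i$. Dividing $d$ by $\sqrt{n}$ gives the RMSD in its usual per-point form.

$$ d(A, B) = \min_{\mathbf{R}, \mathbf{t}} \sqrt{ \sum_{i=1}^{n} \| \mathbf{a}_i - (\mathbf{b}_i \mathbf{R} + \mathbf{t}) \|^2 } $$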
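One common way to find the optimal rotation and translation is the SVD-based Kabsch approach. The sketch below assumes that technique; the playground may compute its value differently. The function name `procrustes_distance` and the small test shape are hypothetical. Points are stored as rows, matching the row-vector convention $\mathbf{b}_i \mathbf{R} + \mathbf{t}$ in the formula above.

```python
import numpy as np

def procrustes_distance(A, B):
    """Superpose B onto A by rotation + translation (no scaling, no reflection).

    A, B: (n, dim) arrays whose rows are corresponding points.
    Returns (d, R, t) where d = sqrt(sum_i ||a_i - (b_i R + t)||^2).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    a_mean = A.mean(axis=0)
    b_mean = B.mean(axis=0)
    A0 = A - a_mean                      # centering handles the translation
    B0 = B - b_mean
    H = B0.T @ A0                        # cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1).
    sign = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0
    D = np.diag([1.0] * (H.shape[0] - 1) + [sign])
    R = U @ D @ Vt
    t = a_mean - b_mean @ R
    residual = A - (B @ R + t)
    d = np.sqrt(np.sum(residual ** 2))
    return d, R, t

# Hypothetical example: B is A rotated by 30 degrees and shifted.
theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), np.sin(theta)],
                [-np.sin(theta), np.cos(theta)]])
A = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 3.0]])
B = A @ rot + np.array([5.0, -2.0])

d, R, t = procrustes_distance(A, B)
rmsd = d / np.sqrt(len(A))               # per-point (root mean square) form
```

Because `B` here is an exact rigid copy of `A`, the distance comes out at zero up to floating-point error. Restricting `R` to proper rotations is a deliberate choice: a mirror image is treated as a different shape rather than being flipped back into place.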

Note on other metrics: While RMSD is a direct distance, other common scores like TM-score or Q-score measure similarity on a scale from 0 to 1, where 1 means identical. To build trees, we convert these into distances by taking the complement (e.g., $d = 1 - \text{TM-score}$), so that identical structures get distance 0. The underlying geometric concept remains the same.
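The conversion from a similarity score to a distance is a one-liner, sketched here for a whole pairwise matrix. The helper name `similarity_to_distance` is hypothetical, and the scores in `toy_similarity` are made-up illustrative values, not real TM-scores.

```python
import numpy as np

def similarity_to_distance(sim):
    """Convert similarity scores in [0, 1] to distances via d = 1 - s."""
    sim = np.asarray(sim, dtype=float)
    if np.any(sim < 0.0) or np.any(sim > 1.0):
        raise ValueError("similarity scores must lie in [0, 1]")
    return 1.0 - sim

# Made-up pairwise similarities for three structures (illustration only).
toy_similarity = np.array([
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.4],
    [0.3, 0.4, 1.0],
])
toy_distance = similarity_to_distance(toy_similarity)
```

The diagonal of `toy_distance` is zero (each structure is identical to itself), and the resulting matrix is the kind of input a distance-based tree-building method expects.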

As you increase the noise, watch how the distance grows. At a certain point, the "signal" of the original circle is overwhelmed by noise.