Module 3 of 4
Building the Tree
From Geometry to Matrix to Tree.
Here we see the full pipeline of distance-based phylogenetics.
- Geometry (Top): The raw physical reality of the structures.
- Matrix (Bottom Left): The pairwise distances ($d_{ij}$) derived from geometry.
- Tree (Bottom Right): The simplified evolutionary graph.
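The Geometry to Matrix step can be sketched in a few lines. This is a minimal illustration, assuming the points live in a 2D plane and that $d_{ij}$ is plain Euclidean distance; the coordinates and the `distance_matrix` helper are hypothetical, and the lab itself may use a different measure.

```python
import math

# Hypothetical 2D coordinates for the four draggable points (illustrative values only).
points = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 0.0), "D": (3.0, -4.0)}

def distance_matrix(points):
    """Return (labels, matrix), where matrix[i][j] is the Euclidean distance d_ij."""
    labels = sorted(points)
    matrix = [[math.dist(points[a], points[b]) for b in labels] for a in labels]
    return labels, matrix

labels, d = distance_matrix(points)
for name, row in zip(labels, d):
    print(name, [round(x, 2) for x in row])
```

The matrix is symmetric with zeros on the diagonal, so only the six off-diagonal pairs carry information for four points.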
Interactive Lab
Drag the points (A, B, C, D) in the top plot.
The ◆ Black Diamond on the tree marks the Midpoint Root.
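Midpoint rooting places the root halfway along the longest leaf-to-leaf path. The sketch below shows one way to locate that point, assuming the tree is given as a list of weighted edges with non-negative lengths; the function name `midpoint_root`, the node names, and the branch lengths are hypothetical and are not taken from the lab's own code.

```python
from collections import defaultdict

def midpoint_root(edges):
    """Locate the midpoint of the longest leaf-to-leaf path in an unrooted tree.

    edges: list of (node_u, node_v, length) with non-negative lengths.
    Returns (u, v, offset): the root sits on edge u-v, `offset` away from u.
    """
    adj = defaultdict(dict)
    for u, v, w in edges:
        adj[u][v] = w
        adj[v][u] = w

    def farthest(start):
        # Depth-first walk that tracks the path to each node (a tree has no cycles).
        best_path, best_dist = [start], 0.0
        stack = [(start, None, 0.0, [start])]
        while stack:
            node, parent, dist, path = stack.pop()
            if dist > best_dist:
                best_path, best_dist = path, dist
            for nxt, w in adj[node].items():
                if nxt != parent:
                    stack.append((nxt, node, dist + w, path + [nxt]))
        return best_path, best_dist

    # Two sweeps find the longest path: the farthest node from any start is one end of it.
    path, _ = farthest(next(iter(adj)))
    path, total = farthest(path[-1])

    # Walk from one end until half of the total length has been covered.
    remaining = total / 2
    for u, v in zip(path, path[1:]):
        if remaining <= adj[u][v]:
            return u, v, remaining
        remaining -= adj[u][v]

# Hypothetical tree: A and B hang off internal node U0, C and D off U1.
edges = [("U0", "A", 1.0), ("U0", "B", 5.0), ("U1", "C", 2.0),
         ("U1", "D", 3.0), ("U0", "U1", 1.0)]
print(midpoint_root(edges))  # ('B', 'U0', 4.5)
```

Here the longest path runs from B to D (length 9), so the root lands 4.5 units from B, on B's own long branch. This is why dragging one point far away moves the diamond.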
Neighbor-Joining (NJ)
Unlike UPGMA (which assumes a constant molecular clock), NJ allows for varying rates of evolution.
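To make the contrast concrete, here is a minimal sketch of the standard Saitou-Nei neighbor-joining procedure in plain Python. The distance values are hypothetical: they were built from a tree in which B sits on a long branch, so the smallest distance (A to C) joins two taxa that are not true neighbors. The function name and the internal node names `U0`, `U1` are assumptions made for illustration.

```python
from itertools import combinations

def neighbor_joining(labels, matrix):
    """Return the unrooted NJ tree as a list of (node, node, branch_length) edges."""
    # Dict-of-dicts copy so clusters can be added and removed by name.
    d = {a: {b: matrix[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    edges, next_id = [], 0
    while len(d) > 2:
        n = len(d)
        r = {a: sum(d[a].values()) for a in d}  # row sums (diagonal is zero)
        # NJ criterion: join the pair minimizing Q_ij = (n - 2) * d_ij - r_i - r_j.
        i, j = min(combinations(d, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        u = f"U{next_id}"
        next_id += 1
        # Branch lengths from the new internal node u to i and j may differ.
        li = 0.5 * d[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        edges += [(u, i, li), (u, j, lj)]
        # Distances from u to every remaining node.
        new_row = {k: 0.5 * (d[i][k] + d[j][k] - d[i][j])
                   for k in d if k not in (i, j)}
        for k in (i, j):
            del d[k]
        for row in d.values():
            del row[i]
            del row[j]
        for k, v in new_row.items():
            d[k][u] = v
        d[u] = dict(new_row)
        d[u][u] = 0.0
    a, b = d  # two nodes remain; connect them with one final edge
    edges.append((a, b, d[a][b]))
    return edges

# Hypothetical distances for a tree where B evolves faster than A.
labels = ["A", "B", "C", "D"]
dist = [[0, 5, 4, 5],
        [5, 0, 7, 8],
        [4, 7, 0, 5],
        [5, 8, 5, 0]]
for edge in neighbor_joining(labels, dist):
    print(edge)
```

With these values the sketch joins A and B first and assigns them unequal branch lengths (1 and 4), whereas a method that simply merges the closest pair would start with A and C.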
Common Sources of Error (and how to create them by dragging points)
Distance-based trees are only as good as the distance matrix. Below are a few classic failure modes you can
intentionally create in “Shape Space” to see how and why trees go wrong.
- Weak signal / no clear clusters
Setup: Arrange A, B, C, D roughly as the corners of a square (the four sides are equal and the diagonals are only about 1.4 times longer, so no pair stands out as closest).
What you’ll see: the distance matrix loses block structure, and small moves can flip the NJ joins.
- Long-branch attraction (a fast-evolving outlier)
Setup: Drag A far away from the other three points (A becomes an extreme outlier).
What you’ll see: branch lengths stretch and internal joins become unstable; the midpoint root shifts.
In real datasets this is known as long-branch attraction: a rapidly evolving taxon can be incorrectly “pulled” toward another long branch, and so into the wrong group.
- Star-like topology (insufficient resolution)
Setup: Place A, B, C, D all at similar distance from each other (near-equal off-diagonal values).
What you’ll see: the tree becomes effectively unresolved; multiple topologies fit nearly equally well.
- Near-ties in the NJ criterion (multiple “best” joins)
Setup: Make A close to B and also close to C (a “triangle” where AB ≈ AC), while D is moderately far.
What you’ll see: tiny drags cause a different first join (A–B vs A–C), even though the matrix looks similar.
- Hidden structure vs. projected geometry (model mismatch)
Setup: Create two tight clusters (A,B) and (C,D), then slowly slide the clusters closer until they overlap.
What you’ll see: the matrix still contains subtle differences, but the tree may “snap” between plausible
explanations. In real structural phylogenetics, the choice of distance measure (for example, distances derived
from TM-score, RMSD, or Q-score) can change what the algorithm considers “close”.
- Noise sensitivity (small measurement error)
Setup: Put A and B extremely close, and C and D extremely close, then drag one point by a tiny amount.
What you’ll see: the heatmap changes only slightly, but the inferred topology can still flip if there is a near-tie.
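The near-tie and noise-sensitivity cases above can be reproduced numerically. For exactly four taxa, the NJ criterion $Q_{ij} = 2d_{ij} - r_i - r_j$ simplifies to $d_{ij} + d_{kl}$ minus a constant, so the first join is the pairing with the smallest sum of within-pair distances. The sketch below relies on that shortcut; the coordinates, the Euclidean distance, and the `first_join` helper are assumptions made for illustration.

```python
import math

def first_join(points):
    """Return (winning split, all pair sums) for exactly four named 2D points."""
    a, b, c, d = sorted(points)

    def dist(x, y):
        return math.dist(points[x], points[y])

    # Each split pairs two taxa and leaves the other two as the opposite pair.
    sums = {
        f"{a}{b}|{c}{d}": dist(a, b) + dist(c, d),
        f"{a}{c}|{b}{d}": dist(a, c) + dist(b, d),
        f"{a}{d}|{b}{c}": dist(a, d) + dist(b, c),
    }
    return min(sums, key=sums.get), sums

# A sits exactly between B and C (AB = AC); D is moderately far away.
base = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (-1.0, 0.0), "D": (0.0, 3.0)}

for dx in (-0.01, 0.01):
    moved = dict(base, D=(dx, 3.0))  # drag D sideways by a tiny amount
    winner, sums = first_join(moved)
    print(f"D.x = {dx:+.2f}: winning split {winner}, "
          f"pair sums {[round(s, 4) for s in sums.values()]}")
```

Moving D by 0.01 in either direction changes the two competing sums only in the third decimal place, yet it decides whether A is joined with B or with C.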