p(Zn): A Background-Calibrated Classifier for Zn/Mn Binding Pocket Determination

Abstract. Distinguishing zinc-binding from manganese-binding sites remains challenging because many metalloproteins share similar folds while differing in subtle local coordination chemistry. We define p(Zn) as the probability that a metal-binding pocket is zinc-like rather than manganese-like, conditioned on structural and chemical descriptors of the local environment.

To estimate p(Zn), we train a supervised predictor on a background population of curated metal sites drawn from experimentally determined structures. For each metal site, features are extracted from the first coordination shell (ligand identity and geometry) and the surrounding second shell (local residue environment), capturing both direct coordination and contextual constraints. The resulting model assigns each site a continuous score that quantifies how Zn-like versus Mn-like its pocket is, enabling robust ranking and comparison across proteins, conformations, and experimental conditions.
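The training step described above can be sketched as a small logistic-regression classifier. This is a minimal illustration, not the authors' model: the four descriptors (counts of Cys, His, and carboxylate ligands plus coordination number), the toy site values, and the names `TOY_SITES`, `train_logistic`, and `p_zn` are all assumptions made for the example, and second-shell features are omitted for brevity.

```python
import math

# Hypothetical first-shell descriptors per site:
# (n_cys, n_his, n_carboxylate, coordination_number).
# The values are invented for illustration and come from no real dataset.
# Label 1 = Zn-bound, label 0 = Mn-bound; the toy set is balanced (4 vs 4).
TOY_SITES = [
    ((4, 0, 0, 4), 1),
    ((2, 2, 0, 4), 1),
    ((0, 3, 1, 4), 1),
    ((1, 3, 0, 4), 1),
    ((0, 1, 3, 6), 0),
    ((0, 2, 2, 6), 0),
    ((0, 0, 4, 6), 0),
    ((0, 1, 2, 5), 0),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def p_zn(features, weights, bias):
    """Return the modelled probability that a site is Zn-like."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train_logistic(sites, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(sites[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in sites:
            err = p_zn(features, weights, bias) - label
            for j, x in enumerate(features):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / len(sites) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(sites)
    return weights, bias


weights, bias = train_logistic(TOY_SITES)
for features, label in TOY_SITES:
    score = p_zn(features, weights, bias)
    print(features, "label:", "Zn" if label else "Mn", f"p(Zn)={score:.2f}")
```

Any probabilistic classifier could stand in for the logistic model here; the point is only that each site maps to a continuous score in [0, 1] rather than a hard label.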

Figure: The biophysical landscape of metal selectivity (Mn and Zn basins with example site scores)
A balanced training dataset was created from curated Mn-bound and Zn-bound metal sites. Each site is assigned a class label (Mn or Zn), and the model learns a continuous score p(Zn), interpreted as the probability that a pocket is Zn-like given its structural and chemical features (with p(Mn) = 1 − p(Zn)). The two shaded basins show the distribution of p(Zn) over the training data: Mn-bound sites concentrate near low p(Zn) and Zn-bound sites near high p(Zn), with a central "promiscuous valley" where pocket identity is less separable. The overlaid dots illustrate how sites from a new protein structure would score: points near the Mn basin indicate Mn-like pockets, points near the Zn basin indicate Zn-like pockets, and points in the valley suggest promiscuous or ambiguous identity. The vertical displacement is only for visual separation and does not encode an additional measured quantity.
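The reading of the figure can be made concrete with a short sketch that maps a p(Zn) score to one of the three regions. The cutoffs 0.3 and 0.7, the function name `pocket_identity`, and the example scores are hypothetical choices for illustration; the text does not specify where the basins end and the valley begins.

```python
def pocket_identity(p_zn, mn_cutoff=0.3, zn_cutoff=0.7):
    """Map a p(Zn) score to a coarse region label.

    The default cutoffs are illustrative assumptions, not values from the text.
    """
    if not 0.0 <= p_zn <= 1.0:
        raise ValueError("p(Zn) must lie in [0, 1]")
    if p_zn <= mn_cutoff:
        return "Mn-like"
    if p_zn >= zn_cutoff:
        return "Zn-like"
    return "ambiguous"


# Hypothetical scores for three pockets from a new structure.
example_scores = {"site_A": 0.05, "site_B": 0.52, "site_C": 0.93}

for name, score in example_scores.items():
    p_mn = 1.0 - score  # complementary probability, p(Mn) = 1 - p(Zn)
    print(f"{name}: p(Zn)={score:.2f} p(Mn)={p_mn:.2f} -> {pocket_identity(score)}")
```

In practice the cutoffs would be chosen from the score distributions of the training data, for example where the two basins begin to overlap.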