SGGly: Structure-Guided N-Linked Glycosylation Prediction Using a Pretrained Large Language Model and Deep Neural Networks

Xiaotong Gu, Yunzhuo Zhou, Yoochan Myung, David B. Ascher

Abstract: N-linked glycosylation is crucial for protein function and stability, yet accurate identification of glycosylated sites remains challenging because site occupancy depends on both sequence motifs and structural context. Many existing computational approaches focus on motif-centred sequence windows and provide limited support for whole-protein inspection of candidate sites. We introduce SGGly, a web server for structure-guided analysis of candidate N-linked glycosylation sites across full-length proteins. SGGly integrates full-protein sequence embeddings from the transformer-based ProtBERT language model with sequon-aware and structure-derived residue descriptors to generate residue-level candidate-site predictions. The server returns these predictions as downloadable files together with an interactive 3D visualisation, so that users can inspect and prioritise candidate sites directly in their structural context. Evaluated under strict publication-supported and independent benchmark settings, SGGly demonstrated strong generalisability and competitive performance against existing methods. SGGly provides a practical resource for whole-protein glycosylation candidate mapping, structural inspection, and follow-up hypothesis generation to support experimental design and interpretation of glycoproteomic observations. The website is freely accessible to all users, with no login required.
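
As an illustration of what a sequon-aware descriptor could start from, the sketch below scans a full-length sequence for the canonical N-linked sequon N-X-S/T (X any residue except proline) and reports the 1-based position of each candidate asparagine. This is a minimal sketch under stated assumptions, not SGGly's implementation: the abstract does not specify the sequon definition used, and the function name `find_sequons` and the example sequence are hypothetical.

```python
import re

# Canonical N-linked sequon: Asn, then any residue except Pro, then Ser or Thr.
# Assumption: the abstract does not state which sequon definition SGGly uses.
# The lookahead keeps each match one residue wide, so overlapping sequons
# (e.g. the two asparagines in "NNSS") are both reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Hypothetical toy sequence, chosen only to exercise the pattern.
example = "MKNGSAANPTLLNNSSQ"
print(find_sequons(example))  # [3, 13, 14]
```

The asparagine at position 8 is skipped because it is followed by proline. Matching the sequon only marks a candidate site; as the abstract notes, occupancy also depends on structural context, which a pattern scan cannot capture.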

[Figure: SGGly workflow]