Structome-TM
Origin of Structome
Structome began as an initiative to gather the evolutionary signal of relatedness between proteins sharing a structural neighborhood. The first work in this area utilized the Q-score metric to rapidly assemble datasets, leveraging the structural similarity between proteins. As the research evolved, alternative metrics such as the TM-score were explored to provide complementary insights into structural comparisons. Using "1 - TM-score" as a distance measure, phylogenetic trees can be constructed to illustrate evolutionary relationships inferred from protein structures.
To emphasize the reliance of this resource on the TM-score metric and distinguish it from other tools in the Structome suite, this resource is named Structome-TM.
What is Structome-TM?
Structome-TM is a web resource for determining evolutionary history using structural relatedness. Using the TM-score metric as a distance measure, it employs neighbor-joining (NJ) methods to infer evolutionary relationships between protein structures. Structome-TM is part of the Structome suite, designed to provide high-quality, structure-informed phylogenetic analyses.
How was the resource built?
RCSB protein SEQRES data were filtered to retain proteins longer than 50 amino acids and clustered at 90% sequence identity using USEARCH, resulting in 69,138 clusters. A centroid was obtained for each cluster, and pairwise structural comparisons among the 69,138 centroids were carried out to determine the TM-score for each pair of protein structures.
What can users achieve?
Structome-TM primarily serves as a tool to explore structural neighborhoods and infer phylogenetic relationships. A search can start from a sequence, a PDB ID, or an uploaded structure, and the selected hits can then be passed on to tree generation and visualization:
- Sequence-based Search: Submit a protein sequence (1-200 residues) to perform an on-the-fly structure prediction using ESMFold. The predicted model is then searched against the entire PDB. The results table includes a sortable "Yes/No" column indicating whether a given hit is also a representative Structome-TM centroid, which links broad sequence searches to the curated structural dataset.
- Structure-based Search: Start with a query PDB ID to retrieve hits from the pre-computed set of Structome-TM centroids. A subset of these structurally similar proteins can be selected for further analysis.
- Custom Structure Search: Upload your own single-chain protein structure (>50 residues) in either PDB or CIF format to initiate a search.
- Tree Generation: The subset selected from any of the search methods is passed to the tree generation module, which constructs a neighbor-joining (NJ) phylogenetic tree. The distance matrix for the tree is populated with "1 - TM-score" values, representing structural dissimilarity between proteins.
- Interactive Visualization and Export: Explore evolutionary relationships interactively and access detailed logs and annotations. Final trees can be downloaded in Newick format for publication or further analysis.
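The tree generation step described above can be illustrated with a small, self-contained sketch. It converts pairwise TM-scores into "1 - TM-score" distances and runs a textbook neighbor-joining procedure to produce a Newick string. This is not the server's implementation: the function names `tm_to_distance` and `neighbor_joining` are hypothetical, the TM-scores in the example are made up, and the sketch assumes a single TM-score per pair (how the resource handles the dependence of TM-score on the normalizing length is not stated here).

```python
from itertools import combinations


def tm_to_distance(tm_scores):
    """Turn {(a, b): TM-score} into a symmetric table with d(a, b) = 1 - TM-score."""
    dist = {}
    for (a, b), tm in tm_scores.items():
        d = 1.0 - tm
        dist.setdefault(a, {a: 0.0})[b] = d
        dist.setdefault(b, {b: 0.0})[a] = d
    return dist


def neighbor_joining(dist):
    """Textbook NJ on a complete distance table; returns an unrooted Newick string."""
    if len(dist) < 3:
        raise ValueError("need at least three structures")
    d = {a: dict(row) for a, row in dist.items()}  # work on a copy
    while len(d) > 3:
        n = len(d)
        total = {a: sum(row.values()) for a, row in d.items()}
        # Join the pair with the smallest Q value.
        a, b = min(
            combinations(d, 2),
            key=lambda p: (n - 2) * d[p[0]][p[1]] - total[p[0]] - total[p[1]],
        )
        dab = d[a][b]
        len_a = 0.5 * dab + (total[a] - total[b]) / (2 * (n - 2))
        len_b = dab - len_a
        node = f"({a}:{len_a:.4f},{b}:{len_b:.4f})"
        # Distance from the new internal node to every remaining node.
        row = {c: 0.5 * (d[a][c] + d[b][c] - dab) for c in d if c not in (a, b)}
        del d[a], d[b]
        for c in d:
            del d[c][a], d[c][b]
            d[c][node] = row[c]
        row[node] = 0.0
        d[node] = row
    # Resolve the last three nodes as a trifurcation.
    x, y, z = d
    lx = 0.5 * (d[x][y] + d[x][z] - d[y][z])
    ly = d[x][y] - lx
    lz = d[x][z] - lx
    return f"({x}:{lx:.4f},{y}:{ly:.4f},{z}:{lz:.4f});"


# Made-up TM-scores for four hypothetical chains (illustration only).
example_scores = {
    ("A", "B"): 0.9, ("A", "C"): 0.4, ("A", "D"): 0.4,
    ("B", "C"): 0.4, ("B", "D"): 0.4, ("C", "D"): 0.9,
}
print(neighbor_joining(tm_to_distance(example_scores)))
```

Neighbor-joining can produce negative branch lengths on real data; the sketch leaves them as computed rather than correcting them.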
Search Results
What are the columns in the table?
The search results table includes the following columns:
- CATH_Description: Description of the protein domain from the CATH database.
- CATH_ID: Identifier for the protein domain in the CATH database.
- ECOD_Description: Description of the protein domain from the ECOD database.
- ECOD_ID: Identifier for the protein domain in the ECOD database.
- Hit: Protein structure identifier (PDB ID with chain).
- Prot_Length: Length of the protein sequence.
- TM-score: The similarity score between the query and each result.
- RCSB_Description: Description of the protein structure from the RCSB database.
- is_centroid: A "Yes/No" marker indicating if a hit from a sequence-based search is also a representative centroid in the core Structome-TM dataset. This column is only present in the results from a sequence search.
- SCOP_Description: Description of the protein domain from the SCOP database.
- SCOP_ID: Identifier for the protein domain in the SCOP database.
- Tax_ID: Taxonomy identifier for the species.
- Tax_Name: Scientific name of the species.
- nAlign: Number of aligned residues between query and hit.
- percentID: Percentage identity between query and hit.
What is the TM-score distribution plot?
The TM-score distribution plot visualizes the similarity scores (TM-scores) of the hits. It provides a quick overview of the structural similarity of proteins in the neighborhood to the query structure, helping users identify closely related proteins.
What do row clicks do?
Clicking on a row in the search results table does two things:
- It displays the other members of the cluster. Each hit listed in the table is a centroid representing a cluster, and a row click shows a smaller table of the other members of that cluster. If the sizes of the members vary substantially, a histogram of the sizes is also shown.
- It updates the Mol* structural viewer to show the structural overlap of the query protein and the centroid in the clicked row.
How to add structures to the tree generation module's text area?
To add structures to the tree generation module:
- Check the box next to each row.
- The selected proteins will be automatically added to the text area of the tree generation module.
- The tree generation module then submits a job. On successful completion, it renders a tree with interactive leaves, a table listing all members of the taxa space, and an area that is populated with taxon details when a tree leaf is clicked.
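The same selection can be prepared programmatically. The sketch below is one possible way to pick hits from a results table and build the semicolon-separated string accepted by the tree API's structures parameter. It assumes the results are available as a list of dictionaries keyed by the column names Hit and TM-score; the helper name `select_for_tree`, the example scores, and the default cutoff of 0.5 (a TM-score often taken as a rough indication of a shared fold) are illustrative choices, not part of Structome-TM.

```python
def select_for_tree(hits, min_tm=0.5, limit=50):
    """Pick the best-scoring hits and join them for the `structures` parameter.

    `hits` is assumed to be a list of dicts with 'Hit' and 'TM-score' keys,
    mirroring the columns of the search results table.
    """
    kept = [h for h in hits if float(h["TM-score"]) >= min_tm]
    kept.sort(key=lambda h: float(h["TM-score"]), reverse=True)

    chains = []
    for h in kept:
        if h["Hit"] not in chains:  # drop duplicate identifiers
            chains.append(h["Hit"])
    chains = chains[:limit]  # the tree API accepts at most 50 chains

    if len(chains) < 3:
        raise ValueError("tree generation needs at least 3 structures")
    return ";".join(chains)


# Example with made-up scores for the chains used elsewhere in this document.
example_hits = [
    {"Hit": "1a4f_A", "TM-score": 1.00},
    {"Hit": "4g1b_A", "TM-score": 0.62},
    {"Hit": "1bab_A", "TM-score": 0.81},
]
print(select_for_tree(example_hits))
```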
API
Structome-TM provides a RESTful API for programmatic access to its core functions.
Search for Structural Neighbours
Retrieve the pre-computed structural neighborhood for a given PDB chain.
- Method: GET
- Endpoint: /api/result
- Parameter: pdb_chain (e.g., 1a4f_A)
Example Request:
curl "https://biosig.lab.uq.edu.au/structome_tm/api/result?pdb_chain=1a4f_A"
Response:
Returns a JSON object containing details about the query and a list of structural neighbors with their TM-scores and annotations.
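For scripted use, the same request can be made from Python with the standard library. This is a minimal sketch: the helper names `result_url` and `fetch_neighbours` are hypothetical, and because the exact keys of the returned JSON are not listed here, the sketch returns the parsed object as-is without assuming any field names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://biosig.lab.uq.edu.au/structome_tm"


def result_url(pdb_chain):
    """Build the /api/result URL for a PDB chain such as '1a4f_A'."""
    return f"{BASE_URL}/api/result?{urlencode({'pdb_chain': pdb_chain})}"


def fetch_neighbours(pdb_chain, timeout=30):
    """GET the pre-computed neighborhood and return the parsed JSON object."""
    with urlopen(result_url(pdb_chain), timeout=timeout) as resp:
        return json.load(resp)


# Usage (performs a network request):
# data = fetch_neighbours("1a4f_A")
# print(json.dumps(data, indent=2)[:500])
```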
Submit a Job for Tree Generation
Submit a list of 3 to 50 PDB chains to generate a neighbor-joining tree.
- Method: GET
- Endpoint: /api/tree
- Parameter: structures (A semicolon-separated string of PDB chains, e.g., 1a4f_A;4g1b_A;1bab_A)
Example Request:
curl "https://biosig.lab.uq.edu.au/structome_tm/api/tree?structures=1a4f_A;4g1b_A;1bab_A"
Response:
If successful, returns a JSON object with a unique job_id for tracking.
{
  "status": "success",
  "job_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef"
}
Check Tree Generation Status
Check the status of a previously submitted tree generation job using its job_id.
- Method: GET
- Endpoint: /api/tree/check_status
- Parameter: job_id (The UUID from the tree submission endpoint)
Example Request:
curl "https://biosig.lab.uq.edu.au/structome_tm/api/tree/check_status?job_id=a1b2c3d4-e5f6-7890-1234-567890abcdef"
Response:
Returns a JSON object with the job status. The status will be pending, in_progress, or complete. If complete, the response will also include the final tree in Newick format.
{
  "status": "complete",
  "tree": "((1a4f_A:0.1,4g1b_A:0.2):0.05,1bab_A:0.3);"
}
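The submit and check_status endpoints can be combined into a simple polling loop. The sketch below uses only the fields shown in the example responses (status, job_id, tree) and the status values pending, in_progress, and complete. The helper names, the polling interval, and the retry limit are assumptions made for illustration; error responses are not documented here, so any unexpected status is simply raised as an error.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://biosig.lab.uq.edu.au/structome_tm"


def tree_url(chains):
    """Build the /api/tree URL from a list of 3 to 50 PDB chains."""
    if not 3 <= len(chains) <= 50:
        raise ValueError("the API accepts 3 to 50 PDB chains")
    # safe=";" keeps the semicolons literal, as in the curl example.
    query = urlencode({"structures": ";".join(chains)}, safe=";")
    return f"{BASE_URL}/api/tree?{query}"


def status_url(job_id):
    """Build the /api/tree/check_status URL for a job_id."""
    return f"{BASE_URL}/api/tree/check_status?{urlencode({'job_id': job_id})}"


def get_json(url, timeout=30):
    """GET a URL and parse the response body as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def build_tree(chains, poll_seconds=5, max_polls=120, fetch=get_json):
    """Submit a tree job, poll until it completes, and return the Newick string."""
    job = fetch(tree_url(chains))
    if job.get("status") != "success":
        raise RuntimeError(f"submission failed: {job}")
    for _ in range(max_polls):
        state = fetch(status_url(job["job_id"]))
        if state.get("status") == "complete":
            return state["tree"]
        if state.get("status") not in ("pending", "in_progress"):
            raise RuntimeError(f"unexpected job status: {state}")
        time.sleep(poll_seconds)
    raise TimeoutError("tree job did not complete in time")


# Usage (performs network requests):
# newick = build_tree(["1a4f_A", "4g1b_A", "1bab_A"])
# print(newick)
```

The `fetch` argument exists so the loop can be exercised without network access, for example by passing a stub that returns canned responses.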
Access
Structome-TM is a web app available at https://biosig.lab.uq.edu.au/structome_tm/
Citation
If you find this resource helpful, please cite:
- Structome-TM
Be sure to check for the latest version of the article.
Contact
For any issues with the server or functionality, please contact:
Ashar Malik
Email: ashar.malik@uq.edu.au