The transformative power of transformers in protein structure prediction

Transformer neural networks have revolutionized structural biology by predicting protein structures with unprecedented accuracy. Here, we report the predictive modeling performance of state-of-the-art transformer-based protein structure prediction methods on 69 protein targets from the recently concluded 15th Critical Assessment of Structure Prediction (CASP15) challenge. Our study shows the power of transformers in protein structure modeling and highlights areas for future improvement.

OmegaFold. OmegaFold (5) was installed into a Python virtual environment via the pip-installable repository from https://github.com/HeliXonProtein/OmegaFold downloaded on September 6, 2022. The pretrained pLM, OmegaPLM, with 670 million parameters was used. Like ESMFold, OmegaFold has no database dependencies. We used the default 10 rounds of recycling, generating a single prediction for each target, which was used as the predicted structural model.

Evaluation metrics and performance assessment
To evaluate the accuracy of the protein structural models predicted by AlphaFold2, RoseTTAFold, ESMFold, and OmegaFold, we used a set of widely adopted metrics, including GDT-TS (6), GDC-SC (7), TM-score (8), and lDDT (9). The experimental structures used as references were obtained from the CASP15 website at https://predictioncenter.org/download_area/CASP15/targets/. For evaluating the domain-level prediction accuracy, we used the official domain definitions from CASP15 available at https://predictioncenter.org/casp15/domains_summary.cgi. Below, we provide a brief description of the evaluation metrics and performance assessment.

GDT-TS.
Global distance test total score (GDT-TS) (6) measures the global structural similarity between two protein structures. Specifically, the score quantifies the percentage of residues (using the Cα atoms in particular) in a structural model that fall within specific distance thresholds of the corresponding residues in the reference structure, after optimal structural superposition as implemented in the local/global alignment (LGA) program (6). Four distance thresholds of 1Å, 2Å, 4Å, and 8Å are used for GDT-TS calculation as follows:

$$\text{GDT-TS} = \frac{P_1 + P_2 + P_4 + P_8}{4} \qquad [1]$$

where $P_d$ represents the percentage of residues in the model that are superimposed with the corresponding residues of the reference structure within the distance threshold of $d$ Å. The GDT-TS score ranges from 0 to 100, with higher scores indicating greater structural similarity between the model and the reference structure.
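As a minimal illustration of the averaging in the GDT-TS definition, the sketch below computes the score from a list of per-residue Cα distances. It assumes the distances come from a single, already computed superposition; LGA itself searches for the superposition that maximizes the residue count at each threshold, so this is a simplification rather than a substitute for LGA. The function name `gdt_ts` and the example distances are hypothetical.

```python
def gdt_ts(distances, n_reference=None, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT-TS from per-residue C-alpha distances in angstroms.

    Assumption: the distances come from one fixed superposition, whereas
    LGA optimizes the superposition separately for each threshold.
    n_reference lets unmodeled reference residues count as misses.
    """
    n = n_reference if n_reference is not None else len(distances)
    if n <= 0:
        raise ValueError("the reference structure must contain residues")
    # P_d: percentage of reference residues within d angstroms
    percentages = [
        100.0 * sum(1 for d in distances if d <= t) / n for t in thresholds
    ]
    return sum(percentages) / len(thresholds)


# Hypothetical distances for an 8-residue target
example_distances = [0.4, 0.9, 1.5, 3.0, 3.9, 7.0, 9.5, 12.0]
print(gdt_ts(example_distances))  # P_1=25, P_2=37.5, P_4=62.5, P_8=75
```

Passing `n_reference` explicitly keeps the score normalized by the reference length when the model covers only part of the target.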

GDC-SC.
Global distance calculation for side-chains (GDC-SC) (7) is a side-chain specific superposition-based global accuracy metric implemented in the LGA program (6) for measuring the accuracy of the side-chain positioning in a protein structural model. GDC-SC uses residue-specific representative side-chain atoms during the comparison with the reference structure, considering 10 different distance thresholds ranging from 0.5Å to 5Å with a step size of 0.5Å as follows:

$$\text{GDC-SC} = \frac{2 \sum_{i=1}^{k} (k + 1 - i)\, P_i}{k\,(k + 1)} \qquad [2]$$

where $k$ = 10 and $P_i$ is the percentage of residues whose representative side-chain atom lies within $0.5i$ Å of its position in the reference structure. The GDC-SC score ranges from 0 to 100, and a higher score indicates better accuracy of the side-chain positioning.
For each of the 69 full-length targets from CASP15, we evaluated the global backbone and side-chain accuracies of the structural models predicted by AlphaFold2, RoseTTAFold, ESMFold, and OmegaFold by comparing the predictions against the experimental structures using the LGA program and calculating the GDT-TS and GDC-SC scores.
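The weighting behind GDC-SC can be sketched in a few lines. The code below assumes the commonly quoted LGA form, in which the percentage at the i-th threshold (0.5i Å) receives weight k + 1 - i and the weighted sum is normalized by k(k + 1)/2, with k = 10. As with GDT-TS, it takes distances from a single fixed superposition, and the function names and example values are hypothetical.

```python
def side_chain_percentages(distances, k=10, step=0.5):
    """Percentage of residues whose representative side-chain atom lies
    within step*i angstroms of the reference, for i = 1..k."""
    n = len(distances)
    if n == 0:
        raise ValueError("at least one residue is required")
    return [
        100.0 * sum(1 for d in distances if d <= step * i) / n
        for i in range(1, k + 1)
    ]


def gdc_sc(percentages):
    """Combine P_1..P_k into GDC-SC (assumed weighting: tighter
    thresholds count more; result lies between 0 and 100)."""
    k = len(percentages)
    if k == 0:
        raise ValueError("at least one threshold is required")
    weighted = sum((k + 1 - i) * p for i, p in enumerate(percentages, start=1))
    return 2.0 * weighted / (k * (k + 1))


# Hypothetical side-chain atom distances for a 4-residue fragment
example = [0.3, 1.2, 2.6, 6.0]
print(round(gdc_sc(side_chain_percentages(example)), 2))
```

Because the tightest thresholds carry the largest weights, a model must place side chains within roughly 1 to 2 Å to score well, which is why GDC-SC is typically much lower than GDT-TS for the same model.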
TM-score. Template modeling score (TM-score) (8) is a structural alignment-based evaluation metric to calculate the topological similarity of two protein structures, calculated as follows:

$$\text{TM-score} = \max\left[\frac{1}{L_{ref}} \sum_{i=1}^{L_{ali}} \frac{1}{1 + \left(d_i / d_0\right)^2}\right] \qquad [3]$$

where $L_{ref}$ represents the length of the reference protein, $L_{ali}$ represents the length of the aligned residues in the predicted structural model, $d_i$ is the distance between the $i$th pair of aligned residues, and $d_0$ is a scale to normalize the TM-score in a way that the magnitude of the average TM-score for random protein pairs is independent of the size of the proteins. $\max$ denotes the maximum value after optimal structural superposition between the predicted structural model and the reference protein.
Ranging from 0 to 1, TM-score provides a measure of the topological similarity of a protein structural model to a reference structure, with higher scores indicating better similarity. In particular, TM-score > 0.5 indicates that the predicted structural model has a similar overall topology to the reference structure (10). We calculated the TM-scores of the structural models predicted by AlphaFold2, RoseTTAFold, ESMFold, and OmegaFold for the 69 CASP15 targets by comparing them against the experimental structures. For each method, we then determined the number of targets with a correctly predicted overall topology by identifying the predicted structural models with TM-score > 0.5.
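The sum inside the TM-score and the TM-score > 0.5 topology criterion can be illustrated as follows. This sketch evaluates the score for one fixed superposition only, whereas the actual TM-score takes the maximum over superpositions. It uses the standard length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8 Å, which the text above does not spell out; the 0.5 Å floor for short chains and all function names are assumptions made for illustration.

```python
def tm_d0(l_ref):
    """Length-dependent distance scale d0 in angstroms.

    Assumption: a floor of 0.5 angstroms keeps d0 positive for short chains.
    """
    if l_ref <= 15:
        return 0.5
    return max(0.5, 1.24 * (l_ref - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed_superposition(distances, l_ref):
    """TM-score sum for one superposition.

    distances: d_i for each aligned residue pair; l_ref: reference length.
    The real TM-score maximizes this quantity over superpositions.
    """
    d0 = tm_d0(l_ref)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_ref


def fraction_correct_topology(tm_scores, cutoff=0.5):
    """Fraction of targets whose model exceeds the topology cutoff."""
    if not tm_scores:
        raise ValueError("at least one score is required")
    return sum(1 for s in tm_scores if s > cutoff) / len(tm_scores)


# Hypothetical per-target TM-scores for one method
print(fraction_correct_topology([0.31, 0.52, 0.78, 0.91]))
```

Normalizing by the reference length rather than the aligned length means that unmodeled or unaligned residues lower the score instead of being ignored.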
lDDT. The local distance difference test (lDDT) (9) measures the accuracy of a predicted structural model against the reference structure in a superposition-free way by comparing the local distances (L) between atom pairs in the structural model with those in the reference structure, considering only atom pairs that lie within a certain inclusion radius (default 15Å) in the reference. A local distance is considered 'preserved' if it deviates from the reference distance by less than a given threshold (typically, 0.5Å, 1Å, 2Å, and 4Å), and lDDT calculates the fraction of preserved distances as a measure of the accuracy of the predicted structural model. The final lDDT score is obtained by averaging the fractions of preserved local distances across all distance thresholds.
We calculated the global lDDT scores to evaluate the structural models predicted by AlphaFold2, RoseTTAFold, ESMFold, and OmegaFold against the experimental structures. The global lDDT score ranges from 0 to 1, with higher scores indicating higher predictive modeling accuracy.
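To make the superposition-free nature of lDDT concrete, the following sketch computes a simplified, Cα-only global score from two coordinate lists. It is an illustration under stated assumptions, not the reference implementation: the published lDDT works on all atoms, skips pairs within the same residue, and handles chemically equivalent atoms, none of which is modeled here. The function name `lddt_ca` and the toy coordinates are hypothetical.

```python
import math


def lddt_ca(model, reference, radius=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified C-alpha-only lDDT in the range 0 to 1.

    model, reference: equally long lists of (x, y, z) tuples, one per
    residue, in the same residue order.
    """
    if len(model) != len(reference):
        raise ValueError("model and reference must have the same length")
    preserved = [0] * len(thresholds)
    total = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= radius:
                continue  # only distances that are local in the reference
            total += 1
            diff = abs(d_ref - math.dist(model[i], model[j]))
            for t_idx, t in enumerate(thresholds):
                if diff < t:
                    preserved[t_idx] += 1
    if total == 0:
        raise ValueError("no atom pairs within the inclusion radius")
    # Average the preserved fraction over all thresholds
    return sum(count / total for count in preserved) / len(thresholds)


reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 10.0, y + 5.0, z - 2.0) for x, y, z in reference]
print(lddt_ca(shifted, reference))  # rigid shifts leave distances unchanged
```

Since only internal distances enter the calculation, translating or rotating the model leaves the score unchanged, which is what makes the metric independent of superposition.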
Assessment of multi-domain proteins. To investigate the domain-level predictive modeling accuracy, we analyzed the subset of multi-domain targets out of 69 full-length targets from CASP15, resulting in a total of 45 domains. We measured the domain-level GDT-TS for each predicted structural model with respect to the corresponding 45 experimental domains and subsequently computed the weighted-sum GDT-TS as follows:

$$\text{weighted-sum GDT-TS} = \frac{\sum_{i=1}^{n} L_{D_i} \cdot \text{GDT-TS}(D_i)}{\sum_{i=1}^{n} L_{D_i}} \qquad [4]$$

where $L_{D_i}$ is the length of domain $D_i$, $\text{GDT-TS}(D_i)$ is the GDT-TS score for domain $D_i$, and $n$ represents the number of domains in a full-length target. Using all the multi-domain targets, we analyzed the Grishin plots (11) for AlphaFold2, RoseTTAFold, ESMFold, and OmegaFold to investigate how the full-length target GDT-TS compares to the weighted-sum GDT-TS.
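The length-weighted combination of domain scores can be sketched as below. The code assumes that the weighted sum is normalized by the total domain length, so that it stays on the same 0 to 100 scale as the full-length GDT-TS; the function name and the two-domain example are hypothetical.

```python
def weighted_sum_gdt_ts(domains):
    """Length-weighted GDT-TS over the domains of one target.

    domains: list of (domain_length, domain_gdt_ts) pairs.
    Assumption: normalization by the summed domain lengths.
    """
    total_length = sum(length for length, _ in domains)
    if total_length <= 0:
        raise ValueError("domain lengths must sum to a positive value")
    return sum(length * score for length, score in domains) / total_length


# Hypothetical two-domain target: (length, domain-level GDT-TS)
domains = [(120, 90.0), (80, 60.0)]
full_length_gdt_ts = 55.0  # hypothetical whole-chain score
weighted = weighted_sum_gdt_ts(domains)
print(weighted, weighted - full_length_gdt_ts)
```

On a Grishin plot, each target contributes one point with the full-length GDT-TS on one axis and the weighted-sum GDT-TS on the other. A weighted sum well above the full-length score, as in this toy example, suggests that the individual domains are modeled well but their relative orientation is not.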

Ramachandran plot analysis.
In addition to measuring the accuracy of the predicted structural models by comparing against the experimental structures, we analyzed the Ramachandran plot (12), which captures the joint distribution of the backbone torsion angle pairs ϕ and ψ. A Ramachandran plot offers a standard local quality check of protein structures and is often used by crystallographers to detect possible angular outliers, since not all values of ϕ and ψ are equally frequent and many combinations are never observed due to steric constraints. We examined how closely the Ramachandran plots of predictions from AlphaFold2, RoseTTAFold, ESMFold, and OmegaFold matched the Ramachandran plots of the corresponding experimental structures.
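The ϕ and ψ angles that populate a Ramachandran plot are dihedral angles over four consecutive backbone atoms: C(i-1), N(i), Cα(i), C(i) for ϕ, and N(i), Cα(i), C(i), N(i+1) for ψ. The sketch below shows one common way to compute a signed dihedral from Cartesian coordinates. It is a generic geometric illustration with hypothetical function names and toy coordinates, and does not reproduce the tooling used in this study.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone torsions (phi, psi) of one residue from its backbone atoms
    and the flanking C (previous residue) and N (next residue)."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy coordinates: the four points span a right-angle twist
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Collecting the (ϕ, ψ) pair of every residue with both neighbors present, for a model and for its experimental structure, gives the two point distributions that are compared.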

MolProbity analysis.
To further investigate the quality of the predicted structural models at atomic detail, we used the MolProbity score (13), which is a composite evaluation metric that analyzes the steric clashes between atoms in a protein structure and the percentages of Ramachandran outliers, side-chain rotamer outliers, and Cβ outliers. The MolProbity score provides a quantitative measure of the estimated deviation of a structural model from its ideal crystallographic resolution, with lower scores indicating higher quality. We analyzed the MolProbity scores for predicted structural models from AlphaFold2, RoseTTAFold, ESMFold, and OmegaFold.