A structural biology community assessment of AlphaFold2 applications

Akdel, Mehmet; Pires, Douglas E. V.; Pardo, Eduard Porta; Jänes, Jürgen; Zalevsky, Arthur O.; Mészáros, Bálint; Bryant, Patrick; Good, Lydia L.; Laskowski, Roman A.; Pozzati, Gabriele; Shenoy, Aditi; Zhu, Wensi; Kundrotas, Petras; Serra, Victoria Ruiz; Rodrigues, Carlos H. M.; Dunham, Alistair S.; Burke, David; Borkakoti, Neera; Velankar, Sameer; Frost, Adam; Basquin, Jérôme; Lindorff-Larsen, Kresten; Bateman, Alex; Kajava, Andrey V.; Valencia, Alfonso; Ovchinnikov, Sergey; Durairaj, Janani; Ascher, David B.; Thornton, Janet M.; Davey, Norman E.; Stein, Amelie; Elofsson, Arne; Croll, Tristan I.; Beltrao, Pedro

doi:10.1038/s41594-022-00849-w

Article
Open access
Published: 07 November 2022

Nature Structural & Molecular Biology volume 29, pages 1056–1067 (2022)

Abstract

Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure predictions have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications equally well compared with experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research.

Main

Proteins are the key molecules of the cell that are involved in all cellular processes. The three-dimensional (3D) shape of a protein provides critical information that can, among many things, be used to study protein interactions, functions and the impact of missense variation. Although tremendous progress has been made in experimental approaches to determining protein structures, the experimentally determined structures of ~100,000 proteins¹ represent a very small fraction of the size and diversity of the universe of proteins. Protein structure prediction has been a fundamental challenge in bioinformatics for decades, and accurate predictions could accelerate our understanding of protein structure–function relationships, with vast impacts on the study of life. Since the first blind assessment of prediction methods, much progress has been made, including improvements in extracting pair-wise and higher order residue distance constraints from multiple sequence alignments^2,3,4,5, and an understanding of how this information is eventually encoded into a predicted 3D structure^6,7,8. These developments have been reviewed recently⁹ and can be characterized by the increasing use of neural-network models in key aspects of the challenge of predicting protein structures from their primary sequence. Along with advances in computational methods, there have been large expansions of protein sequence and structure databases^10,11, which have served as key resources for input and training of sophisticated prediction methods. These advances have led to the recent leap in performance demonstrated by DeepMind at CASP14 (ref. ¹²). AF2 has been shown to be able to predict the structure of protein domains with an accuracy matching that of experimental methods. Both the method and a database of 365,198 protein models have been released¹³, enabling the scientific community to better understand the accomplishments, abilities and limitations of AF2.

The accuracy of AF2 has been independently evaluated in blind assessments. Yet many questions remain regarding the extent to which these approaches extend our coverage of structural biology, and the limitations of the AF2 method or structures derived from AF2 for applications in biology. Regarding coverage, previous attempts to generate ‘proteome-wide’ structural models include those based on homology models, such as the SWISS-MODEL Repository (SMR)¹⁴, and more recently, the modeling of known protein domains in the Pfam database¹⁵ using trRosetta¹⁶. These represent prior benchmarks of large target coverage that can be a useful comparison for AF2’s performance.

With regard to the application of AF2 structures, it is noteworthy that the platform provides metrics of uncertainty¹² that have been shown to reflect confidence in the structural assignment—potentially linked to protein disorder—and uncertainty for pair-wise residue distances. It is therefore important to assess whether AF2 structures and confidence metrics can be successfully integrated into and applied to critical structural biology tasks, such as functional classification, variant effects, binding site prediction and modeling into new experimental data (obtained, for example, from cryogenic electron microscopy (cryo-EM)). In addition to the prediction of individual protein structures, it has been shown recently that contact predictions can be used to simultaneously fold and dock proteins¹⁷, and early reports have indicated that AF2 can predict the structure of complexes^18,19,20, which it was not initially trained to handle.

Here, we provide an evaluation and practical examples of applications of AF2 predictions across a large number of diverse structural biology challenges.

Results

Added structural coverage by AlphaFold2 predictions of model proteomes

The AF2 database has released predictions of the canonical protein isoforms for 21 model species, covering nearly every residue in 365,198 proteins. This represents around twice the number of experimental structures and six times the number of unique proteins in the Protein Data Bank (PDB). It is important to assess the extent to which AF2 predictions extend the structural coverage beyond previous proteome-wide structural predictions. We compared the structures of 11 model species that were included in both the SMR and AF2 databases and that had an average additional coverage of 44% of residues by AF2 (Fig. 1a, residues). However, not all of AF2’s residue predictions have high confidence. For residues that are not present in the SMR, we observed that an average of 49.4% are predicted with confidence by AF2 (predicted local distance difference test score (pLDDT) > 70) (Fig. 1a, AF residue confidence). With a more stringent cut-off (pLDDT > 90), AF2 predicts, on average, 25% of residues with very high confidence. In summary, an average of around 25% of the residues of the proteomes of the 11 model species are covered by AF2 with novel (not present in SMR) and confident (pLDDT > 70) predictions.
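The coverage figures above combine two per-residue properties: whether a residue is already covered by a homology model, and whether its pLDDT exceeds a threshold. A minimal sketch of that bookkeeping is given below. The function name, the boolean `in_smr` mask and the toy values are assumptions made for illustration; this is not the code used in the study.

```python
def novel_confident_fraction(plddt, in_smr, threshold=70.0):
    """Fraction of all residues that lack an SMR model and have pLDDT above threshold.

    plddt  : per-residue pLDDT scores (0-100)
    in_smr : per-residue booleans, True if the residue is covered by a homology model
    """
    if len(plddt) != len(in_smr):
        raise ValueError("plddt and in_smr must have the same length")
    if not plddt:
        return 0.0
    hits = sum(
        1 for score, covered in zip(plddt, in_smr)
        if not covered and score > threshold
    )
    return hits / len(plddt)


# Toy protein of 8 residues, the first 4 already covered by a homology model.
toy_plddt = [95.0, 88.0, 72.0, 40.0, 91.0, 65.0, 78.0, 30.0]
toy_in_smr = [True, True, True, True, False, False, False, False]

print(novel_confident_fraction(toy_plddt, toy_in_smr))        # confident cut-off (> 70)
print(novel_confident_fraction(toy_plddt, toy_in_smr, 90.0))  # very high confidence (> 90)
```

Averaging this fraction over all proteins of a proteome, weighted by length, would give a proteome-level estimate of the kind reported in Fig. 1a.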

**Fig. 1: Additional coverage provided by AF2-predicted models.**

We then compared AF2 predictions with those derived for Pfam protein domains¹⁵ using trRosetta¹⁶. As there is only one trRosetta representative structure per domain family, we selected one species, human, and compared 3,035 AF2 models of 1,464 different Pfam domain families with the representative trRosetta model. These two approaches generally agree, with around 50% of AF2 domain structures having a root-mean-square deviation (r.m.s.d.) < 2 Å from the generic trRosetta model (Supplementary Fig. 1a). We observed a correlation between the estimated accuracy of the AF2 model (pLDDT) and the r.m.s.d. from the trRosetta model (Fig. 1b and Supplementary Fig. 1b,c). AF2 models with an r.m.s.d. below 2 Å from the trRosetta model have, on average, more than 90% of their residues with a pLDDT above 70 (Fig. 1b). We also examined the variability of domain structure for 273 domain families with 3 or more instances in the human proteome (Supplementary Fig. 2), and observed that 70% of domain instances are within one s.d. of the mean r.m.s.d. for their domain family. Together, these results indicate that, for at least 50% of human Pfam domains, the trRosetta Pfam model was already likely to be accurate.

We assessed the confidence and length of AF2 contiguous regions that are not covered in SMR to identify regions that may correspond to novel structures of folded domains, rather than short termini or interdomain linkers. The distribution of median confidence scores of a fragment versus fragment length shows an enrichment for high-confidence predictions with a length of 100–500 residues (Fig. 1c and Supplementary Fig. 3), consistent with the size of a typical protein domain²¹. This relation can be observed for all species, except Staphylococcus aureus (Supplementary Fig. 3). We identified, across the 11 species, 18,429 contiguous regions that are ‘domain like’ (with a length of 100–500 residues) with confident predictions (pLDDT > 70) that have no model in SMR. The human regions are provided in Supplementary Table 1.
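The selection of 'domain-like' regions can be illustrated with a short sketch: scan for contiguous runs of residues without SMR coverage, then keep runs of 100–500 residues whose median pLDDT exceeds 70. Using the median as the per-fragment confidence follows the description above; the function name, the data layout and the half-open index convention are assumptions for illustration only.

```python
from statistics import median


def domain_like_regions(plddt, in_smr, min_len=100, max_len=500, min_median=70.0):
    """Return (start, end, median_plddt) for uncovered runs that look domain-like.

    Indices are 0-based and `end` is exclusive. A run is a maximal stretch of
    consecutive residues for which in_smr is False.
    """
    regions = []
    start = None
    # A trailing True acts as a sentinel so that a run reaching the C terminus is closed.
    for i, covered in enumerate(list(in_smr) + [True]):
        if not covered and start is None:
            start = i
        elif covered and start is not None:
            length = i - start
            med = median(plddt[start:i])
            if min_len <= length <= max_len and med > min_median:
                regions.append((start, i, med))
            start = None
    return regions


# Toy protein: a confident 120-residue uncovered stretch, a covered stretch,
# and a short uncovered C-terminal tail that is too short to qualify.
toy_plddt = [90.0] * 120 + [40.0] * 50 + [85.0] * 30
toy_in_smr = [False] * 120 + [True] * 50 + [False] * 30
print(domain_like_regions(toy_plddt, toy_in_smr))
```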

Around half the residues in AF2 predictions of the 11 model species are of low confidence, many of which may correspond to regions without a well-defined structure in isolation. It has been shown that regions with low pLDDT are often intrinsically disordered proteins or regions (IDPs/IDRs)¹³. We benchmarked AF2-derived metrics against IUPred2 (ref. ²²), a commonly used disorder predictor (Fig. 1c), using regions annotated for order/disorder (Supplementary Table 2). In addition to using pLDDT, we tested the relative solvent accessible surface area (SASA) of each residue and smoothed versions of these metrics (Fig. 1d and Supplementary Fig. 4). pLDDT and window averages of pLDDT or SASA outperformed IUPred2, indicating that AF2’s low-confidence predictions are enriched for IDRs. To facilitate the study of human IDRs, we provide these predictions for human proteins in Supplementary Dataset 1 and in ProViz²³: http://slim.icr.ac.uk/projects/alphafold?page=alphafold_proviz_homepage.
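A window-averaged pLDDT score of the kind benchmarked above can be sketched as follows. The text does not state the window size or the exact transformation, so the centered window, its default width and the mapping `1 - mean/100` (higher means more likely disordered) are assumptions chosen for illustration.

```python
def smoothed_disorder_score(plddt, window=5):
    """Per-residue disorder score from a centered moving average of pLDDT.

    Returns 1 - (window mean of pLDDT) / 100, so low-confidence stretches score
    close to 1. Windows are truncated at the chain termini.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    scores = []
    for i in range(len(plddt)):
        lo = max(0, i - half)
        hi = min(len(plddt), i + half + 1)
        mean = sum(plddt[lo:hi]) / (hi - lo)
        scores.append(1.0 - mean / 100.0)
    return scores


# Toy chain: an ordered half followed by a disordered half.
toy_scores = smoothed_disorder_score([100.0, 100.0, 100.0, 0.0, 0.0, 0.0], window=3)
print(toy_scores)
```

Smoothing reduces the influence of isolated low- or high-confidence residues, which is one plausible reason why window averages could outperform raw per-residue values.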

Characterization of structural elements in AlphaFold2’s predicted models across 21 proteomes

The AF2 database is likely to contain structural elements that may not have been extensively seen in experimental structures. Owing to the presence of low-confidence regions in the AF2 proteins, we first split each prediction into smaller high-confidence units (see Methods). We then performed a global comparison of structural elements between the 365,198 proteins in the AF2 database and 104,323 proteins from the CASP12 dataset in the PDB. We applied the Geometricus algorithm²⁴ to obtain a description of protein structures as a collection of discrete and comparable shape-mers, analogous to k-mers in protein sequences. We then obtained a matrix of such shape-mer counts for all proteins, which we clustered using non-negative matrix factorization (NMF) (see Methods). The clustering identified 250 groups of proteins, dubbed ‘topics’ (Supplementary Dataset 2), with characteristic combinations of shape-mers. These characteristic shape-mers could include small structural elements, such as repeats, the specific arrangements of ion-binding sites or larger structural elements that could define specific folds. For visualization, we performed a t-distributed stochastic neighbor embedding (t-SNE) dimensionality reduction in which proteins composed of similar shape-mers are expected to group together (Fig. 2). In line with this, the shape-mer representation of AF2 proteins can predict the corresponding PDB protein entries with high accuracy (area under the receiver operating characteristic curve of 0.95 using the cosine similarity of the shape-mer vector). Additionally, the 20 most common superfamilies, predicted from sequence, tend to be placed together.
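The cosine similarity between shape-mer count vectors, used above to match AF2 proteins to PDB entries, can be sketched in a few lines. Here each protein is represented as a sparse count of shape-mer identifiers; the integer identifiers and toy counts are hypothetical, and the sketch does not use or reproduce the Geometricus API.

```python
import math
from collections import Counter


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two sparse shape-mer count vectors."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[k] * counts_b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical shape-mer counts for three proteins.
protein_a = Counter({1: 3, 2: 4})
protein_b = Counter({1: 3, 2: 4})   # identical composition to protein_a
protein_c = Counter({3: 2})         # no shape-mers in common with protein_a

print(cosine_similarity(protein_a, protein_b))
print(cosine_similarity(protein_a, protein_c))
```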

**Fig. 2: The space of characteristic structural elements in AF2 structural models for 21 species.**

Out of 250 total groups, we selected 5 examples that were almost exclusively (>90%) composed of structures derived from AF2, as well as 1 example with >80% AF2 structures with a particularly interesting novel predicted structural element. We illustrated these with a representative structure in Figure 2. Examples include 4,192 proteins annotated as G-protein-coupled olfactory or odorant receptors (Pfam PF13853), 97% of which are mammalian (Fig. 2a, Topic 88, and Supplementary Fig. 5a); a group of primarily (94%) plant proteins, annotated as PCMP-H and PCMP-E subfamilies of the pentatricopeptide repeat (PPR) superfamily (Fig. 2b, Topic 60, and Supplementary Fig. 5b); a group of heterogeneous structures that were mostly (>75%) annotated as ATP or ion binding (Fig. 2c, Topic 150, and Supplementary Fig. 5c); groups of proteins with leucine-rich repeats (Fig. 2d, Topic 16, and Supplementary Fig. 5d); some proteins with uncommon, regular patterns (Fig. 2e, Topic 188, and Supplementary Fig. 5e); and long α-helical constructs (Fig. 2f, Topic Helix, Supplementary Fig. 5f). For the PCMP-H and PCMP-E subfamilies (Fig. 2b), there are no known experimental structures mapped. AF2 predictions could help elucidate the structural peculiarities of these subfamilies, including the mechanism of RNA recognition and binding for PCMP-H and PCMP-E proteins.

Studying examples from Mycobacterium tuberculosis in Topic 188 led us to identify an interesting structure for a tandem repeat. Tandem repeat proteins with repetitive units of 6–10 residues predominantly have beta-solenoid structures²⁵. Analyzing the AF2 results, we found a novel beta-solenoid structure predicted for a large family of pentapeptide repeats²⁶, found in the mycobacterial PPE proteins (Pfam: PF01469) (Fig. 2e and Supplementary Fig. 6). This structure represents a beta-solenoid, with the shortest possible coil of ten residues (two pentapeptide repeats) (Supplementary Fig. 6b). Although such a beta-solenoid has not yet been resolved, our evaluation of the quality of the atomic structure (stereochemistry and contacts) suggests that the AF2 model is highly probable. Thus, AF2 may have allowed us to answer the question of what is the shortest length of repeat that forms a beta-solenoid.

Finally, we also considered protein groups consisting primarily of PDB proteins to study why AF2 proteins are absent from them. In some cases, this seemed to be due to the limited number of species and proteins covered by the current AF2 database. Topics 209 and 113 consist of immune response proteins, such as immunoglobulins and T-cell receptors, mainly from the PDB. As many of these antibodies are under intense study, there are many more PDB structures (based on multiple individuals and antibody-drug research) than the actual number of such proteins in the respective UniProt proteomes. Topic 38 consists of short fragments of PDB structures, with an average length of 63 residues—there are no AF2 proteins, because AlphaFold models the entire structure instead of returning fragments.

Application of AlphaFold2 models for structure-based variant effect prediction

A protein structure facilitates the generation of hypotheses regarding the impact of missense mutations. Conversely, an agreement between the expected and observed impacts of mutations provides confidence in the accuracy of a structural model. We obtained two independent compilations of experimentally measured impacts of protein mutations on protein function: (1) a compilation of measured changes in stability upon mutations^27,28; and (2) a compilation of deep mutational scanning (DMS) experiments^29,30 measuring the outcome of any possible single point mutation on most protein positions.

The DMS data were available for 33 proteins with 117,135 mutations; we obtained experimentally derived models for 31 of the proteins and AF2 models for all 33. We then used three structure-based variant effect predictors (FoldX³¹, Rosetta³² and DynaMut2 (ref. ³³)) to compare the DMS measurements with predicted impacts. Although the correlation estimates between the experimental and predicted impacts of mutations varied across the proteins, those derived from the AF2 models consistently matched or were better than those derived from experimental models (Fig. 3a,b and Supplementary Fig. 7). Regions with confidence scores lower than 50 result in lower concordance (Fig. 3a), but restriction to protein regions without an experimental model can still lead to correlations that are comparable to those observed in experimental structures (Fig. 3b). Because low AF2 confidence scores are enriched for intrinsically disordered protein regions, it is possible that the poor correlation in low-confidence regions is in part owing to higher tolerance to protein mutations. In line with this, we observed an average higher tolerance to mutations in low-confidence regions (Fig. 3c).

**Fig. 3: Comparing structure-based prediction of impact of protein missense mutations using experimental and AF2-derived models.**

The compilation of measured impacts of mutations on protein stability contains information for 2,648 single-point missense mutations over 121 distinct proteins. We compared the accuracy of structure-based prediction of stability changes using AF2 structures, experimental structures and homology models using different sequence identity cut-offs (Fig. 3d and Supplementary Fig. 8; see Methods). Across 11 well-established methods (Fig. 3d and Supplementary Fig. 8), the predictions of stability changes based on AF2 models were comparable to those of experimental structures. Homology-model-based predictions tended to show substantial decreases in performance for templates below 40% sequence identity.

We investigated, as an example, the human Sphingolipid delta(4)-desaturase (DEGS1), a 323-residue protein associated with leukodystrophy, for which no structure or model was available. All but the terminal residues are predicted by AF2 with high confidence. The presumed catalytic core is discussed further below. Here we focus on disease-associated missense variants. p.A280V has been shown to lead to loss of protein stability³⁴ and has a predicted Gibbs free energy change (ΔΔG) of 3.7 kcal/mol. Two additional pathogenic variants have ΔΔG values of >1.5 kcal/mol, pointing towards loss of stability being the mechanism of pathogenicity; the benign variants do not substantially affect protein stability, as expected (Fig. 3e). The likely pathogenic variant p.R133W is not predicted to affect stability, and hence likely has a different mechanism underlying disease. This is in line with previous findings that core variant changes in particular lead to loss of stability, whereas surface variants are more likely to act through other mechanisms³⁰.

Functional characterization of AF2 models by pocket and structural motif prediction

High-confidence proteome-wide structural predictions open the door for a large expansion of predicted protein pockets^35,36. However, the full protein models produced by AF2 have to be considered carefully given their potential errors, such as the likely incorrect placement of protein segments of low confidence or the low confidence in interdomain orientations. To investigate whether these issues may result in the formation of spurious pockets, we predicted pockets on a set of 225 proteins with known binding sites defined using bound (holo) structures for which the corresponding unbound (apo) structures are available³⁷.

Pockets identified from structures have a wider size range than do ground-truth binding sites (Fig. 4a). This is also true for pockets predicted from AF2 structures, including a small number of particularly large pockets (Fig. 4a). We divided AF2 pocket predictions into high-quality (mean pLDDT > 90) and low-quality (mean pLDDT ≤ 90) subsets (Fig. 4b,c) on the basis of the mean pLDDT of pocket-associated residues. Low-quality pockets are larger on average, and include particularly large pockets (Fig. 4a, bottom). We then asked whether mean pLDDT could be useful as a general metric of prediction confidence by quantifying the overlap between known and predicted pockets (Fig. 4b and Supplementary Fig. 9). We did not observe a difference between the performance of high-quality AF2 pockets and pockets identified from experimental structures. In contrast, low-confidence pockets generally did not overlap with known sites. Although there may be bias because high-confidence AF2 regions are more likely to have relevant deposited templates, we suggest that the mean pLDDT of predicted pockets can be used as an additional criterion for pocket selection in AF2 structures.
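The pocket-level confidence filter described above amounts to averaging pLDDT over the residues lining each predicted pocket and applying the cut-off of 90. A minimal sketch is shown below; the pocket names, the residue-index representation and the function name are hypothetical, and no particular pocket-detection tool is assumed.

```python
def classify_pockets(pockets, plddt, cutoff=90.0):
    """Label each pocket by the mean pLDDT of its residues.

    pockets : dict mapping a pocket name to a list of 0-based residue indices
    plddt   : per-residue pLDDT scores for the same protein
    Returns a dict mapping pocket name to (mean_plddt, "high" or "low").
    """
    result = {}
    for name, residues in pockets.items():
        if not residues:
            continue  # a pocket without assigned residues cannot be scored
        mean = sum(plddt[i] for i in residues) / len(residues)
        label = "high" if mean > cutoff else "low"
        result[name] = (mean, label)
    return result


# Toy protein with one pocket in a confident region and one in a low-confidence region.
toy_plddt = [95.0, 97.0, 93.0, 60.0, 50.0, 70.0]
toy_pockets = {"pocket_1": [0, 1, 2], "pocket_2": [3, 4, 5]}
print(classify_pockets(toy_pockets, toy_plddt))
```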

**Fig. 4: Pocket detection and function prediction.**

Conserved local conformations of specific residues can be used to identify important functions, such as enzyme activity, ion or ligand binding beyond global sequence and fold similarities³⁸. To showcase the potential of this application for AF2 models in the future, we focused on 912 human proteins with no experimental or homology models available. We found that the prediction score of the highest ranked pocket enriched the set for proteins with previous annotations for enzymatic activity (Fig. 4c and Supplementary Table 3). Discarding pockets with a low mean pLDDT led to slightly improved enrichment. As a specific example, we focused on the human sphingolipid delta(4)-desaturase (EC 1.14.19.17, DEGS1, UniProt Accession O15121, pocket score rank 57 of 912), which has a high confidence level (average pLDDT = 96.31) and for which there are no previous structural data. A sequence search of the 323-residue protein against all existing entries in the PDB shows that the best sequence match is 23.5%, with PDB entry 1VHB (Bacterial dimeric hemoglobin, 9115439), indicating the lack of any structural models from homology. A scan of 400 auto-generated 3-residue templates from the AF2-predicted structure against representative structures in the PDB (reverse template comparison³⁸) yielded a possible 3-residue template match: PDB entry 4ZYO (EC 1.14.19.1, human stearoyl-CoA desaturase³⁹, Fig. 4d). A close-up of the metal-binding center (Fig. 4e) of DEGS1 and 4ZYO (overall sequence homology, 12.1%) superimposed via the 3-residue templates (Fig. 4d) clearly indicates the potential dimetal catalytic center for DEGS1. The histidine-coordinating metal center of DEGS1, together with data on the bound substrate of 4ZYO, provides a foundation for modeling studies that could impact the pharmacology of DEGS1 by exploring the details of its catalytic mechanism.

AlphaFold2-based prediction of protein complex structures

Since the first development of direct coupling analysis algorithms, co-evolutionary-information-based methods have been used to predict protein-protein interactions⁴⁰. It has been recently reported that several deep-learning-based methods, such as trRosetta¹⁶ and Raptor-X⁴¹, can predict the structure of protein complexes. To examine the capacity of AF2 to predict protein complex structures, we tested the ability of AF2 to fold and ‘dock’ two benchmark sets—a set of proteins known to form oligomers⁴² and the Dockground 4.3 heterodimeric benchmark⁴³.

For oligomerization, we obtained sets of proteins known either not to oligomerize or to form oligomers, including dimers, trimers or tetramers. We then made AF2 predictions for each protein, attempting to predict either a monomer or an oligomeric form (see Methods). Across the set of predictions, higher scores were given to models corresponding to the correct oligomerization state, and 71 out of 87 (82%) predicted top-scoring models corresponded to the correct state (Fig. 5a and Supplementary Table 4). Generally, the multimeric state scores are well separated from the monomeric state scores (Fig. 5b). In 28/30 examples, AF2 was able to correctly predict monomeric proteins as monomers, 29/35 dimers as dimers, 7/9 trimers as trimers and 7/13 tetramers as tetramers. Notably, although the failure rate is high for tetramer state predictions, the predicted structure for the corresponding state was actually correct for 5/6 failures. Examples of failure modes for dimers and a tetramer are shown in Figure 5c,d. We noted that, for some cases of failed tetramer predictions, we could obtain higher confidence of the tetramer predictions by increasing the number of recycles.
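The per-state counts quoted above can be tallied to confirm the overall figure of 71 out of 87 and to compare accuracy across oligomeric states. The short script below only re-derives numbers already given in the text; the dictionary layout is an illustrative choice.

```python
# (correct predictions, total cases) per oligomeric state, as quoted in the text.
results = {
    "monomer": (28, 30),
    "dimer": (29, 35),
    "trimer": (7, 9),
    "tetramer": (7, 13),
}

correct = sum(c for c, _ in results.values())
total = sum(t for _, t in results.values())
overall_percent = round(100 * correct / total)

# Per-state accuracy as a rounded percentage.
per_state_percent = {state: round(100 * c / t) for state, (c, t) in results.items()}

print(f"overall: {correct}/{total} ({overall_percent}%)")
for state, pct in per_state_percent.items():
    print(f"{state}: {pct}%")
```

The tally makes the trend explicit: accuracy is highest for monomers and lowest for tetramers, consistent with the higher failure rate noted for tetramer state predictions.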

**Fig. 5: Using AF2 to predict homo-oligomeric assemblies and their oligomeric state.**

We next examined the Dockground 4.3 heterodimeric benchmark set⁴³. We predicted complex structures using the DeepMind default dataset and the small Big Fantastic Database (BFD). This method does not include any ‘pairing’ of interacting chains, as was used in earlier fold-and-dock approaches. The docking quality was evaluated using DockQ^44,45. Only one model for each target was made, and a maximum of three recycles were allowed. As shown in Figure 5e, the performance is far superior to that of traditional docking methods, with 31% of protein complex models predicted correctly, compared with 7% using GRAMM, a standard shape-complementarity docking method⁴⁴.

Finally, we studied examples of complexes containing IDPs/IDRs that adopt a stable structure upon binding. IDRs often bind through short linear motifs (SLiMs), recognizing folded domains driven by a few residues. The longer IDRs can contain arrays of SLiMs and can also form stable structures upon binding to other IDRs without a structured template. We selected 14 cases of complexes involving IDRs with known structures and analyzed their distinguishing features compared with the experimental complex (Fig. 5f contains selected examples and Supplementary Figs. 10 and 11 show all examples). In general, AF2 performs well at predicting SLiMs that fit into a well-defined binding pocket driven by hydrophobic interactions, such as the SUMO interacting motif of RanBP2. Longer IDRs, which frequently contain tandem motifs, are often challenging, especially if they have a symmetric structure. For the RelA–CBP interaction, AF2 correctly finds the binding groove, but fits the IDR in a reverse orientation. AF2 also performs well on complexes in which IDRs are part of a multi-IDR single folding unit, such as the E2F1–DP1–Rb trimer; however, building complexes for proteins with highly unusual residue compositions, such as collagen triple helices, often fails. We provide a detailed description of the 14 examples in Supplementary Figures 10 and 11 and Supplementary Table 5 and detail the factors that enable or hinder successful predictions.

Evaluation of AlphaFold2 models for use in experimental model building

The accuracy of AF2 predictions provides opportunities for their use in experimental model building: (1) AF2 models could be used for molecular replacement or docking into cryo-EM density, experimental phasing and/or ab initio model building; and (2) they could be used as reference points to improve existing low-resolution structures. These use cases will typically involve the use of conformational restraints, for example to maintain the local geometry of domains while flexibly fitting a large multi-domain model, or to restrain the local geometry of an existing model to an AF2-derived reference to highlight and correct likely sites of error. It is critical to use restraint schemes designed to avoid forcing the model into conformations that clearly disagree with the data. Typically, this is achieved through some form of top-out restraint, for which the applied bias drops off at large deviations from the target. Here, we take advantage of the fact that AF2 models typically include very strong predictions of their own local uncertainty to adjust per-restraint weighting of the adaptive restraints recently implemented in ISOLDE⁴⁶ (see Methods). For the two case studies discussed below, a comparison of validation statistics for the original and revised models is provided in Supplementary Table 6.
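The idea of a top-out restraint with confidence-dependent weighting can be illustrated with a generic robust loss. The sketch below uses a Geman-McClure-style energy, which is harmonic near the target and flattens at large deviations so that the restoring force fades away, together with a linear mapping from pLDDT to a spring constant. Both the functional form and the weighting scheme are assumptions chosen for illustration; they are not the adaptive restraint implemented in ISOLDE.

```python
def restraint_weight(plddt, base_k=1.0, floor=50.0):
    """Hypothetical weighting: 0 at or below `floor`, rising linearly to base_k at pLDDT 100."""
    if plddt <= floor:
        return 0.0
    return base_k * (plddt - floor) / (100.0 - floor)


def topout_energy(deviation, k, cutoff):
    """Energy that is ~0.5*k*d^2 for small d and saturates at 0.5*k*cutoff^2 for large d."""
    d2, c2 = deviation * deviation, cutoff * cutoff
    return 0.5 * k * d2 * c2 / (c2 + d2)


def topout_force(deviation, k, cutoff):
    """Negative derivative of topout_energy: ~ -k*d near the target, tending to 0 far from it."""
    c2 = cutoff * cutoff
    scale = c2 / (c2 + deviation * deviation)
    return -k * deviation * scale * scale


# A restraint from a confident residue pulls harder than one from a marginal residue,
# and a grossly violated restraint exerts almost no pull at all.
k_confident = restraint_weight(95.0)
k_marginal = restraint_weight(60.0)
for d in (0.1, 1.0, 10.0):
    print(d, topout_force(d, k_confident, cutoff=1.0), topout_force(d, k_marginal, cutoff=1.0))
```

The design choice is that large disagreements between model and reference are more likely to reflect genuine conformational differences or prediction errors than noise, so the restraint should release rather than distort the model.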

As an example of the improvement of existing structures, we used the eukaryotic translation initiation factor (eIF) 2B bound to substrate eIF2 (6O85)^47,48. The eIF2B complex is a decamer comprising two copies each of five unique chains. It displays allosteric communication between physically distant substrate-, ligand- and inhibitor-binding sites. eIF2 is a heterotrimer of three unique chains. We analyzed a 0.4-MDa co-complex enzyme-active state captured by cryo-EM at an overall resolution of 3 Å (ref. ⁴⁹). Rigid-body alignment of AF2 models to their corresponding experimental chains (Fig. 6a) showed overall excellent agreement, with the largest deviations corresponding to correctly folded domains with flexible connections to their neighbors. Other mismatched smaller regions corresponded to either register errors in the original model or flexible loops and tails. Each chain was restrained to its corresponding AF2 model using ISOLDE’s reference-model distance and torsion restraints, with each distance restraint adjusted according to pLDDT. Future work will explore the use of the predicted aligned error (PAE) matrix for this purpose, and weighting of torsion restraints according to pLDDT. Simple energy minimization and equilibration of the restrained model at 20 K corrected the majority of local geometry issues (for example, Fig. 6b,c); a high-confidence prediction for the C-terminal domain of chains I and J allowed us to add this into previously untraceable low-resolution density (Fig. 6d, left of the dashed line). We emphasize that detailed manual inspection remains necessary to find and correct larger errors in the experimental model, sites of disagreement arising from conformational variability and sites where high-confidence predictions are in fact incorrect. An example of the latter is the side chain of Trp A111, which, despite its high confidence (pLDDT = 86.1), was modeled incorrectly by AF2 (Fig. 6f).

**Fig. 6: Application of AF2 predictions to modeling into cryo-EM or crystallographic data.**

To explore the use of AF2 structures for solving and refining new structures, and to map out suitable workflows, we attempted to recapitulate the recent 3.3-Å crystal structure of the Saccharomyces cerevisiae Nse5/6 complex (7OGG)⁵⁰. This was not included in the AF2 training set, and no existing structures have ≥30% identity to either chain. Originally solved using selenomethionine experimental phasing, the combination of low-resolution and anisotropy (ΔB = 80 Å²) meant that, although the core of the complex was confidently and correctly modeled, only 583 out of 850 total residues were definitively modeled by the authors, with a further 65 residues traced as unknown sequence and one peripheral 27-residue helix modeled out of register. For testing purposes, we discarded this model and used the AF2 predictions for molecular replacement (MR). MR requires very close correspondence between atom positions in the search model and in the crystal; separation into individual rigid domains and trimming of flexible loops is a necessity. We used the PAE matrix to extract a single rigid core from each chain (see Methods) and performed MR in Phaser⁵¹, leading to a clear solution with translation function Z-score (TFZ) = 28.2 and log-likelihood gain (LLG) = 884 (see Methods).

Currently, a refined MR solution is typically used as the starting point for some combination of automatic and manual building of missing portions into the density. In many cases, however, it appears that AF2 predictions will support a more ‘top-down’ approach, in which all residues predicted with at least moderate confidence are present in the initial model. To explore this, we trimmed the predicted chains to exclude residues with pLDDT ≤ 50 and aligned the result to the MR solution, setting the occupancies of all atoms not used for MR to zero. This was used as the starting point for rebuilding in ISOLDE; here, zero-occupancy atoms do not contribute to structure factor calculations or bulk solvent masking, but still take part in molecular interactions and are attracted into the map. The model was subjected to three rounds of end-to-end inspection and rebuilding interspersed with refinement with phenix.refine⁵². In the initial round, zero-occupancy residues fitting the map were reinstated to full occupancy, and residues that seemed to be truly unresolved were deleted; a small number of these were re-introduced in subsequent rounds. The total time spent was approximately one working day; the final model (Fig. 6f–h) increased the number of modeled, identified residues from 600 to 818, slightly improved overall geometry and reduced the R_free from 0.317 to 0.295. With few exceptions (primarily at heterodimer and symmetry interfaces), rebuilding was limited to minor side chain adjustments.
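The preparation of the 'top-down' starting model can be sketched as two steps: discard residues with pLDDT ≤ 50, then assign zero occupancy to every remaining residue that was not part of the MR search model. The residue records, field names and function name below are hypothetical and are not tied to any particular structure-handling library.

```python
def prepare_rebuild_model(residues, used_in_mr, plddt_cutoff=50.0):
    """Trim low-confidence residues and zero the occupancy of residues outside the MR core.

    residues   : list of dicts with at least the keys "id" and "plddt"
    used_in_mr : set of residue ids that were part of the MR search model
    Returns a new list of residue dicts with an added "occupancy" key.
    """
    kept = []
    for res in residues:
        if res["plddt"] <= plddt_cutoff:
            continue  # excluded from the starting model entirely
        occupancy = 1.0 if res["id"] in used_in_mr else 0.0
        kept.append({**res, "occupancy": occupancy})
    return kept


# Toy chain: residue 1 was in the MR core, residue 2 is confident but outside it,
# residues 3 and 4 fall at or below the pLDDT cut-off.
toy_residues = [
    {"id": 1, "plddt": 95.0},
    {"id": 2, "plddt": 60.0},
    {"id": 3, "plddt": 50.0},
    {"id": 4, "plddt": 30.0},
]
toy_model = prepare_rebuild_model(toy_residues, used_in_mr={1})
print(toy_model)
```

Zero-occupancy residues remain in the model as candidates: during rebuilding they can be promoted to full occupancy if they fit the map, or deleted if they appear truly unresolved.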

Discussion

We estimate that AF2 may add, on average, around 25% of confidently predicted residues to a given proteome, although this will vary depending on how much experimental and previous computational approaches have already covered. However, even for residues that can be modeled by distant homology, it is possible that the AF2 models are more accurate, increasing their usefulness. Here, the precise accuracy estimates at the residue level are extremely useful. In addition, low AF2 prediction scores are enriched for protein disorder, suggesting that regions of low-confidence predictions can be hypothesized to be disordered segments. However, we note that disordered protein regions are often defined as regions that are not solved by X-ray crystallography. As AF2 is trained primarily on X-ray data, the relation between disorder and predicted confidence could be a by-product of using this definition of disorder. For comparison, we used IUPred2, an easy-to-deploy and commonly used dedicated protein-disorder predictor, but there are other dedicated approaches that outperform IUPred2 (ref. ⁵³).

The AlphaFold database was initially released with >300,000 proteins modeled with a more recent expansion to over 200 million proteins with predicted structures, sampling the universe of protein sequences and structures. As we show here, even on a relatively modest set, we can identify what are likely to correspond to rarely explored combinations of structural elements. As an example, we have identified to our knowledge the shortest length of repeat that forms a beta-solenoid to date. Among other areas, the expansion of high-confidence predictions will allow prioritization of experimental structure determination of novel folds; the large-scale prediction of protein function from structure; the identification of novel enzymes; and the study of the evolution of protein structure and function.

We assessed the application of AF2 predictions in diverse structural biology challenges, including variant effect prediction, pocket detection and model building into experimental data. In line with the reported high accuracy of the models, we found that AF2-predicted structures, on average, tend to give results that are as good as those derived from experimental structures. However, although AF2 returns full protein predictions, these can often contain protein segments that are placed with uncertainty. This uncertainty can lead to incorrect estimations or identification of structural similarity, pockets, variant effects or poor model building. Importantly, in all cases, we find that it is critical to take into account the confidence metrics provided, and that these should be incorporated into the corresponding workflows. For model building based on experimental data, we noted examples of cases for which details were incorrect in regions where AF2 has high confidence, which underlines the need for detailed manual inspection. AF2 will not do away with experimental studies, but the combination of experimental data collection plus artificial intelligence is likely to be a growing trend.

For variant effect prediction, AF2 was used to predict the structure, and the impacts of mutations were predicted with other tools that can use these or experimental structures. A different approach could have been to use AF2 to predict the structure of the reference and mutated proteins and to compare these structures to evaluate the impact of the mutations. However, some reports have indicated that AF2 does not appear to be well suited to predict the structures of mutated proteins⁵⁴,⁵⁵. Additionally, the prediction of such large numbers of mutated structures would have been computationally time consuming.

Finally, we explored the application of AlphaFold2 for prediction of complex structures and found that it outperforms standard docking approaches without even requiring starting protein structures. We have expanded on this analysis in a companion manuscript²⁰ and have used this approach to predict complexes of human protein interactions on a large scale⁵⁶. It has already been shown that other distance- and contact-prediction methods trained to predict intra-chain contacts can be used, unmodified, to predict inter-chain contacts, for both homomeric and heteromeric complexes. Therefore, we were not surprised to see that it is possible to use AF2 to fold and dock heterodimeric complexes. However, it was unexpected that it was possible to use non-matched pairs of alignments for different proteins to predict complex structures, indicating that AF2 goes beyond using coevolution to predict these structures.

There are many areas for further benchmarking and improvements of AF2-related approaches. Although the capacity of AF2 to predict the structures of difficult targets has been demonstrated, the extent to which AF2 generalizes to truly never-before-seen folds remains to be understood. The generation of a test set with folds that are not represented in the AF2 training set may not be a trivial task. Other open areas for this field of research include the prediction of: structures of mutated proteins⁵⁵; conformational diversity and dynamics; structure in the absence of an MSA⁵⁷; and structures of proteins in complex with other biomolecules such as DNA, RNA or metabolites. Another such area is the explicit modeling of biophysical parameters.

In summary, we find that AF2 models, when their uncertainty is taken into account, can be applied to existing structural biology challenges, and their quality is near that of experimental models. The application of AF2 to a large representation of the protein universe and expansion to the prediction of protein complexes will have a transformative impact in life sciences.

Methods

Coverage comparison between the SWISS-MODEL repository and AlphaFold2 databases

The SWISS-MODEL Repository (SMR) and AF2 databases were accessed on 24 July 2021. Reference proteomes for 11 species shared between AF2 and SMR were downloaded from the UniProt release 2021_03. Only structures corresponding to entries from the reference proteomes were used for the analysis. Numpy⁵⁸, Pandas⁵⁹, Prody⁶⁰ and Matplotlib⁶¹ Python libraries were used for the analysis and the visualization. Structure counters for protein domains were extracted from the corresponding InterPro entries⁶². Code and data are available online (https://github.com/aozalevsky/alphafold2_vs_swissmodel/).

Comparison between human RoseTTAFold Pfam domain and AlphaFold2 structures

We used the 17,006 human proteins that were defined as the principal isoform for their corresponding gene according to APPRIS and whose sequences were the same in ENSEMBL and UniProt. We used Pfamscan to identify PFAM domains in the 17,006 protein sequences. The database of PFAM-A models was downloaded on 29 June 2021 and created on 19 March 2021. We kept only those PFAM domains identified with an e-value below 1 × 10⁻⁸. AF2 models for human proteins were downloaded on 23 July 2021 from https://alphafold.ebi.ac.uk. We extracted the sequences and compared them with the ENSEMBL protein sequences used for the structural analysis. For comparison purposes, all the analyses and results presented here are based on the subset of 17,006 protein sequences for which the ENSEMBL and AlphaFold protein sequences were identical. We also extracted pLDDT values for each residue from the AlphaFold models, as these are stored as if they were the B-factor of the protein coordinates file. The RoseTTAFold models were downloaded from the EBI website on 27 July 2021, and the r.m.s.d. between models from both methods was calculated using the function struct.aln from the R package bio3d. All statistical analyses were done using R 4.0.2. Graphical plots were created with the packages ggplot2 (ref. ⁶³), patchwork and reshape2. Molecular graphics and analyses were performed with UCSF Chimera⁶⁴, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco.
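
As an illustration of the pLDDT extraction described above, the following is a minimal sketch that reads per-residue pLDDT from the B-factor column of PDB-format text. The function name and the example records (coordinates and scores) are made up for illustration and are not part of the original analysis, which was carried out in R; only the fixed-column layout of PDB ATOM records is assumed.

```python
def read_ca_plddt(pdb_text):
    """Return {(chain, residue_number): pLDDT} taken from the CA atoms."""
    plddt = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB layout (1-based): atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66. AlphaFold models store
        # pLDDT in the B-factor field.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            plddt[(chain, resnum)] = float(line[60:66])
    return plddt


# Hypothetical ATOM records built with a format string so that the
# columns are guaranteed to line up; values are invented.
_ATOM = "ATOM  %5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f"
example = "\n".join([
    _ATOM % (1, " N  ", " ", "MET", "A", 1, " ", 0.0, 0.0, 0.0, 1.0, 91.25),
    _ATOM % (2, " CA ", " ", "MET", "A", 1, " ", 1.5, 0.0, 0.0, 1.0, 91.25),
    _ATOM % (9, " CA ", " ", "GLY", "A", 2, " ", 3.0, 1.0, 0.0, 1.0, 42.50),
])
print(read_ca_plddt(example))
```

Because AF2 assigns one pLDDT value per residue, reading it from the Cα atom alone is sufficient in this sketch.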

Disorder prediction

Benchmarking data for ordered and disordered protein regions were taken from the benchmark set of IUPred2 (ref. ²²) and were filtered for proteins for which AF2-predicted structures are available in the AlphaFold database. Relative SASA was calculated by determining the absolute SASA using DSSP and then comparing it to the SASA calculated in a GGXGG conformation. Receiver operating characteristic (ROC) curves plotting the true positive rate as the function of the false positive rate were calculated on a per-residue level. Areas under the ROC curves are single-number measures of the overall predictive performance in the range of 0.5 (for random predictions) to 1.0 (for perfect predictions).
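
The per-residue ROC analysis can be sketched as follows. This is a minimal illustration, not the code used in the study: it computes the area under the ROC curve through the rank-sum (Mann-Whitney) identity, and it assumes that 100 − pLDDT is used as the disorder score. The function name and the toy values are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the rank-sum identity.

    scores: higher means more likely positive (here: disordered).
    labels: 1 for positive (disordered), 0 for negative (ordered).
    Tied scores receive averaged ranks.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both ordered and disordered residues are required")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Toy example: invented pLDDT values and disorder labels for four residues.
plddt = [95.0, 90.0, 40.0, 30.0]
disordered = [0, 0, 1, 1]
disorder_score = [100.0 - p for p in plddt]  # assumption: low pLDDT = disorder
print(roc_auc(disorder_score, disordered))
```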

Exploration of structural space covered by the AlphaFold database compared with the Protein Data Bank

We use the 365,198 proteins from the current AlphaFold database (AF) and 104,323 proteins from PDB in 2016 (until CASP12) with a 100% sequence identity threshold, removing duplicates. Owing to the presence of low-confidence regions in the AF proteins, we first performed trimming to split each AF protein into smaller high-confidence units as follows: a one-dimensional Gaussian filter with a standard deviation σ of 5 is applied to the sequence of pLDDT scores extracted from the Cα atoms. The resulting scores are used to split the protein into continuous segments of residues with smoothed pLDDT scores > 70. Segments with fewer than 50 residues are discarded. This removed 68,890 AF proteins with too few high-confidence residues for accurate structural comparison.
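
The trimming step above can be sketched in a few lines. This is a minimal stdlib-only illustration rather than the published code (which is linked below): the Gaussian kernel is truncated at four standard deviations and renormalized at the chain ends, both of which are assumptions, so values near the termini may differ slightly from those produced by a library filter with a different boundary mode. Function names are hypothetical.

```python
import math


def gaussian_smooth(values, sigma=5.0, truncate=4.0):
    """One-dimensional Gaussian smoothing with edge renormalization."""
    radius = int(truncate * sigma + 0.5)
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (d / sigma) ** 2) for d in offsets]
    n = len(values)
    smoothed = []
    for i in range(n):
        total = 0.0
        norm = 0.0
        for d, w in zip(offsets, weights):
            j = i + d
            if 0 <= j < n:  # ignore kernel positions outside the chain
                total += w * values[j]
                norm += w
        smoothed.append(total / norm)
    return smoothed


def high_confidence_segments(plddt, sigma=5.0, cutoff=70.0, min_length=50):
    """Return half-open (start, end) index ranges of continuous residues
    whose smoothed pLDDT exceeds the cutoff, dropping short segments."""
    smoothed = gaussian_smooth(plddt, sigma)
    segments = []
    start = None
    for i, score in enumerate(smoothed):
        if score > cutoff:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    if start is not None and len(smoothed) - start >= min_length:
        segments.append((start, len(smoothed)))
    return segments


# Invented profile: a confident domain, a low-confidence linker, a short tail.
profile = [90.0] * 60 + [30.0] * 40 + [90.0] * 20
print(high_confidence_segments(profile))
```

A protein for which this function returns an empty list would correspond to one of the AF proteins removed for having too few high-confidence residues.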

For each AF protein segment, and for each PDB protein, rotation invariant moments O3, O4, O5 and F were calculated for the Cα atoms using a k-mer-based approach (with k = 16) and radius-based approach (r = 10 Å) using Geometricus²⁴. These were then converted into shape-mers using a resolution of four for the k-mer based approach and six for the radius-based approach. Shape-mers were counted across the whole protein for a PDB protein and across all splits for an AF protein to give the shape-mer count vectors. We then created a term frequency inverse document frequency (TFIDF) matrix for all PDB and AF proteins, in which the terms are shape-mers and each protein is equivalent to a document. We performed topic modeling using non-negative matrix factorization (NMF), which attempts to factorize a matrix of size n × m into matrices W of size n × p, and H of size p × m. We interpret this as finding p topics (here set to 250), each of which consists of a weighted combination of the m shape-mers (defined by H). Each of the n proteins can then be seen as a weighted combination of these p topics (defined by W).
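
To make the factorization concrete, the sketch below factorizes a small non-negative matrix V (n × m) into W (n × p) and H (p × m) using the classic multiplicative update rules for the Frobenius objective. This is an illustrative stand-in only: the solver, initialization and iteration count used in the study are not stated here, the toy matrix is invented, and all function names are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5


def nmf(v, p, n_iter=200, seed=0, eps=1e-9):
    """Factorize V (n x m) into W (n x p) and H (p x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(p)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(p)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(p)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(p)]
             for i in range(n)]
    return w, h


# Toy "protein x shape-mer" matrix with two obvious topics (invented values).
V = [[1, 1, 0, 0], [2, 2, 0, 0], [0, 0, 3, 3], [0, 0, 1, 1]]
W, H = nmf(V, p=2)
print(round(frobenius_error(V, W, H), 3))
```

In this picture, each row of W gives the topic weights of one protein, and each row of H gives the shape-mer weights of one topic.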

For topic analysis, we assigned proteins to each topic using knee detection with a weight cut-off⁶⁵. For visualization, we performed t-SNE dimensionality reduction on the W matrix returned by NMF. Topic-specific scores were obtained for each residue within a shape-mer by multiplying the corresponding topic weight for the shape-mer (from H) with an RBF kernel score of the Euclidean distance between the residue and the central residue of the shape-mer. These were aggregated across all shape-mers within a protein to obtain the topic-specific residue scores for the protein.

Code and scripts can be found at: https://github.com/TurtleTools/alphafold-structural-space

Structure-based variant effect predictions using experimental and predicted structures

A subset of experimentally characterized mutations was curated from ThermoMutDB²⁷, comprising 2,648 single point missense mutations across 132 unique globular proteins. The experimentally measured effect of the mutations on protein stability was represented as the change in folding free energy (ΔΔG, in kcal/mol) between wild type and mutant. Experimental structures were obtained from the PDB¹. Homology models were generated using Modeller⁶⁶ with the most complete available template within each identity threshold range (20% ± 5%, 30% ± 5%, until 90% ± 5%). AF2 models were generated locally. These mutations were analyzed by computational predictive tools, including the sequence-based predictors I-Mutant⁶⁷, SAAFEC-SEQ⁶⁸ and MUpro⁶⁹, and the structure-based predictors mCSM-stability⁷⁰, DUET⁷¹, SDM⁷², DynaMut⁷³, MAESTRO⁷⁴, ENCoM⁷⁵, DynaMut2 (ref. ³³) and FoldX³¹. For each method, the performance and concordance between the experimental and predicted ΔΔG are determined and presented in the corresponding figures as Pearson’s correlation values. A larger set of experimentally determined impact of missense mutations was derived from a compilation of Deep Mutational Scanning (DMS) experiments²⁹ comprising 117,135 total mutations in 33 proteins. These were compared against predictions made with DynaMut2, FoldX and Rosetta³⁰,³² (see also Supplementary Methods).
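
The concordance measure can be illustrated with a short sketch of Pearson's correlation between experimental and predicted ΔΔG values. The function name and the numbers are invented for illustration and do not come from ThermoMutDB or from any of the predictors listed above.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ΔΔG values (kcal/mol) for five mutations.
experimental = [-1.2, 0.3, -2.5, 0.0, -0.8]
predicted = [-0.9, 0.1, -1.8, -0.2, -1.1]
print(round(pearson_r(experimental, predicted), 3))
```

Running the same function once per structure source (experimental, homology model, AF2 model) would give the kind of per-method comparison described above.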

Pocket and structural motif prediction

We downloaded structures from the AlphaFold Protein Structure Database⁷⁶ except for analyses of the LBSp dataset³⁷. In the latter case, we used locally modeled structures, as many LBSp structures are from species not included in the public database. We detected pockets and calculated overlap metrics (F-score, Matthews correlation coefficient (MCC)) using AutoSite⁷⁷ from ADFRsuite version 1.0, and OpenBabel⁷⁸ was used to prepare PDBQT input for AutoSite (obabel -h -xr --partialcharge gasteiger). For enzyme activity predictions, we selected AlphaFold models without corresponding entries in SWISS-MODEL 2021-11-30, and kept 921 structures with mean pLDDT ≥ 70 and 100–500 residues. We considered 170 proteins as having known enzymatic activity if there was an EC number and/or a catalytic activity annotation in the corresponding UniProt records.
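
The two overlap metrics can be sketched as below. This illustration assumes that overlap is scored at the residue level, with a predicted pocket and a reference binding site each given as a set of residue numbers; the level at which overlap was actually computed is not specified above, and the function name and example sets are hypothetical.

```python
import math


def overlap_metrics(predicted, reference, all_residues):
    """Return (F-score, MCC) for a predicted versus a reference residue set.

    predicted, reference: sets of residue numbers, both subsets of all_residues.
    all_residues: set of every residue number in the protein.
    """
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    tn = len(all_residues) - tp - fp - fn
    f_denominator = 2 * tp + fp + fn
    f_score = 2 * tp / f_denominator if f_denominator else 0.0
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return f_score, mcc


# Invented example: a 10-residue protein with partially overlapping sites.
f_score, mcc = overlap_metrics({1, 2, 3}, {2, 3, 4}, set(range(1, 11)))
print(round(f_score, 3), round(mcc, 3))
```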

Oligomerization state prediction

To test the ability of AF2 to predict the oligomeric state of homo-oligomeric assemblies, we downloaded the dataset from Ponstingl et al.⁴². Since the PDB files were not provided, the dataset was filtered to entries for which the oligomeric state was in agreement with PISA annotation. Because AF2’s training was done only on single chains, we reasoned that examples, even if they overlap with the training set, could be used to evaluate AF2’s oligomeric state prediction capabilities. For each PDB entry, the sequence of chain A was extracted, and a multiple sequence alignment was generated using the automated MMseqs2 webserver through ColabFold. For homo-oligomeric prediction, each MSA was copied, padded with gaps to the total length reflecting the number of copies in the assembly and concatenated. These concatenated alignments were fed into AF2. No templates were used. All five ptm-fine tuned model parameters were used. To test the robustness of AF2’s five model parameters to predict homo-oligomeric structures, we use the worst of the predicted TMscores for each state.
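
One possible reading of the MSA copy-and-pad step is sketched below. The row layout is an assumption rather than a description of the exact implementation: the query row is repeated once per copy, and every other aligned row is placed in one block of the concatenated alignment with gap characters filling the remaining blocks. The function name and the toy alignment are hypothetical.

```python
def pad_msa_for_homooligomer(msa, copies):
    """Build a concatenated alignment for a homo-oligomer.

    msa: list of equal-length aligned sequences; msa[0] is the query.
    copies: number of chains in the assembly.
    """
    length = len(msa[0])
    if any(len(row) != length for row in msa):
        raise ValueError("all MSA rows must have the same length")
    # First row: the query repeated once per chain.
    rows = [msa[0] * copies]
    # Remaining rows: each homolog appears once per block, gap-padded
    # to the total length (assumed layout).
    for k in range(copies):
        left = "-" * (length * k)
        right = "-" * (length * (copies - k - 1))
        rows.extend(left + row + right for row in msa[1:])
    return rows


for row in pad_msa_for_homooligomer(["ACD", "A-D"], copies=2):
    print(row)
```

With this layout no homolog row spans more than one chain, so the alignment carries no paired inter-chain signal beyond the repeated query.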

Fold-and-dock prediction of heterodimeric protein complexes

We used 219 heterodimeric complexes from Dockground benchmark 4 (ref. ⁷⁹). This set contains unbound forms of heterodimeric protein chains, which share at least 97% sequence identity with the bound forms. The dataset consists of 54% eukaryotic proteins, 38% bacterial proteins and 8% from mixed kingdoms, for example one bacterial protein interacting with one eukaryotic protein. To evaluate performance, one model for each pair was generated with AF2 (using default parameters, except that model_2 was used, providing a complementary set of results to those derived in ref. ²⁰). To enable docking, we changed only the residue number so that both chains are treated as a long chain with a 200-residue gap, as in ref. ⁸⁰. We compared AF2 predictions with models docked with GRAMM. The GRAMM models were ranked using the AACE18 scoring function⁸¹. Docking quality was estimated with DockQ⁴⁵.
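
The residue-numbering change used to join the two chains can be sketched as follows. The sketch assumes that the residue index of every residue in the second chain is simply shifted by the gap size; the exact offset convention is an assumption, and the function name is hypothetical.

```python
def chain_break_residue_index(len_a, len_b, gap=200):
    """Residue indices for two chains treated as one chain with a gap.

    Chain A keeps indices 0 .. len_a - 1. Chain B continues the numbering
    but is shifted by `gap`, so the model sees a large sequence separation
    between the last residue of A and the first residue of B.
    """
    index_a = list(range(len_a))
    index_b = [len_a + gap + i for i in range(len_b)]
    return index_a + index_b


print(chain_break_residue_index(3, 2))
```

Only the positional indices change in this scheme; the concatenated sequence itself contains no linker residues.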

Complex structure predictions for disordered proteins

Predictions were run using the sequences defined in the PDB files (not including modified residues and other molecules). Predictions were done using the Google Colab notebooks by S. Ovchinnikov; homooligomers were predicted using the notebook accessible at https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb, and heterooligomers were predicted using the dedicated notebook, accessible at https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2_complexes.ipynb. In the case of dimers, the default settings were used. In case of higher order oligomers, one chain was used on its own (usually the IDR if there is only one), and the rest of the chains were concatenated using a long linker (either several ‘U’s or several repeats of ‘SG’s).

Evaluation of AlphaFold2 models for use in experimental model building

AF2 models were used as an aid to rebuilding the existing 6O85 in ISOLDE, with a preliminary implementation of pLDDT-based weighting of its existing adaptive distance restraints⁴⁶. Initial fetching and alignment of the relevant AF2 models for each chain used a tool available in pre-release versions of ChimeraX 1.3, allowing command-based fetching of predictions from the AlphaFold EBI server by UniProt ID. For existing models fetched from the wwPDB, the UniProt ID for each chain is automatically parsed from the mmCIF metadata, and each fetched prediction is aligned and renamed to match the target chain. ISOLDE’s reference-model distance restraint scheme has four adjustable parameters controlling the restraint potential: kappa (the overall strength); wellHalfWidth (the range over which the restraining force is linearly related to distance); tolerance (the width of a flat-bottom, that is, zero-force, region close to the target); and fallOff (the rate at which the potential tapers at large distances). With the exception of kappa, each of these terms is expressed as a function of the reference interatomic distance: for a given restraint, the final harmonic well width, tolerance and fall-off all increase with increasing reference distance. For the purpose of this study, we added terms to further adjust kappa, tolerance and fallOff according to the pLDDT of the lowest-confidence atom in each restrained pair; all restraints where at least one reference atom had a pLDDT < 50 were disabled. For each chain in the complex, the working model was restrained against the AlphaFold reference using the ‘isolde restrain distances’ command with the above modifications enabled but otherwise standard settings. Backbone and side chain torsions were also restrained against the reference model using the ‘isolde restrain torsions’ command with default arguments.
After energy minimization and equilibration, the model was inspected and, where necessary, interactively remodeled; where reference model restraints clearly disagreed with the model, they were selectively released. Where the AF2 models included previously-unmodeled residues supported by the density, they were merged into the working model. The final model was refined with phenix.real_space_refine⁵² using settings defined by the ‘isolde write phenixRsr’ command.

For the recapitulation of 7OGG, the AF2 predictions for its two chains were fetched in ChimeraX, as above. The rigid core of each chain was extracted using a community clustering approach based on the PAE matrix; source code for this is available at https://github.com/tristanic/pae_to_domains. After setting B-factors to a constant value of 50, these were used to generate a fresh molecular replacement (MR) solution using Phaser⁵¹. The original, complete AF2 models were aligned to the MR result in ChimeraX, and occupancies for atoms that were not part of the MR models were set to zero. The result was used as the starting model for rebuilding in ISOLDE. After it was initially settled into the map with distance and torsion restraints applied, the model was inspected and rebuilt end-to-end. During this initial rebuilding, zero-occupancy atoms with clear correspondence to density were reinstated to full occupancy, while residues with no associated density were deleted. Where there was clear disagreement with the map (primarily at the heterodimer interface), the initial distance and torsion restraints were selectively released in favor of interactive remodeling. The resulting model was refined with phenix.refine⁵², using settings defined by the ‘isolde write phenixRefine’ command. In a second and third round of interactive rebuilding in ISOLDE (during which the distance and torsion restraints were fully released) interspersed with phenix.refine, a small number of residues deleted in the first step were re-introduced.
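
The idea of extracting a rigid core from the PAE matrix can be illustrated with a deliberately simplified sketch. It does not reproduce the community clustering in the linked repository: instead it links residue pairs whose PAE is at or below a cutoff in both directions and returns the largest connected component. The cutoff value, the function name and the toy matrix are assumptions made for illustration.

```python
def largest_low_pae_cluster(pae, cutoff=5.0):
    """Return sorted residue indices of the largest group of residues
    connected through pairs with PAE <= cutoff in both directions."""
    n = len(pae)
    seen = [False] * n
    best = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search over the thresholded PAE graph.
        stack = [start]
        seen[start] = True
        component = []
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if not seen[j] and max(pae[i][j], pae[j][i]) <= cutoff:
                    seen[j] = True
                    stack.append(j)
        if len(component) > len(best):
            best = component
    return sorted(best)


# Invented 4-residue PAE matrix (in Å): residues 0-2 are mutually
# well-positioned, residue 3 is uncertain relative to the rest.
toy_pae = [
    [0.5, 2.0, 3.0, 20.0],
    [2.0, 0.5, 2.5, 22.0],
    [3.0, 2.5, 0.5, 18.0],
    [20.0, 22.0, 18.0, 0.5],
]
print(largest_low_pae_cluster(toy_pae))
```

A connected-component search is more permissive than community clustering, because a single chain of low-PAE pairs is enough to merge two domains, so it should be read only as a first approximation of the approach described above.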

In both the above cases, the final coordinates have been shared with the original authors.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Contiguous protein regions of human high-confidence structural predictions with no previous structural predictions by homology models in the SWISS-MODEL Repository are available in Supplementary Table 1 and in Github: https://github.com/aozalevsky/alphafold2_vs_swissmodel. The benchmark dataset used for testing of disorder predictions metrics is available in Supplementary Table 2, and predicted disordered regions for human proteins are available in Supplementary Dataset 1 and are integrated into ProViz²³ at http://slim.icr.ac.uk/projects/alphafold?page=alphafold_proviz_homepage. The grouping of proteins by structural similarity using the NMF analysis of structural fragments is available as Supplementary Dataset 2, and the pocket prediction scores for 912 human proteins with no previous experimental or predicted structural models are available in Supplementary Table 3.

Code availability

Coverage comparison between SMR and AF2: https://github.com/aozalevsky/alphafold2_vs_swissmodel/. Exploration of structural space: https://github.com/TurtleTools/alphafold-structural-space. Pocket predictions: https://github.com/jurgjn/af2_pockets. Protein complexes: https://gitlab.com/ElofssonLab/FoldDock, https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb. Model building: https://github.com/tristanic/pae_to_domains.

References

Burley, S. K. et al. RCSB Protein Data Bank: powerful new tools for exploring 3D structures of biological macromolecules for basic and applied research and education in fundamental biology, biomedicine, biotechnology, bioengineering and energy sciences. Nucleic Acids Res. 49, D437–D451 (2021).
Thomas, J., Ramakrishnan, N. & Bailey-Kellogg, C. Graphical models of residue coupling in protein families. IEEE/ACM Trans. Comput. Biol. Bioinform. 5, 183–197 (2008).
Dunn, S. D., Wahl, L. M. & Gloor, G. B. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics 24, 333–340 (2008).
Bartlett, G. J. & Taylor, W. R. Using scores derived from statistical coupling analysis to distinguish correct and incorrect folds in de-novo protein structure prediction. Proteins 71, 950–959 (2008).
Wang, S., Sun, S., Li, Z., Zhang, R. & Xu, J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Comput. Biol. 13, e1005324 (2017).
Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710 (2020).
Xu, J. Distance-based protein folding powered by deep learning. Proc. Natl Acad. Sci. USA 116, 16856–16865 (2019).
AlQuraishi, M. End-to-end differentiable learning of protein structure. Cell Syst. 8, 292–301 (2019).
AlQuraishi, M. Machine learning in protein structure prediction. Curr. Opin. Chem. Biol. 65, 1–8 (2021).
Godzik, A. Metagenomics and the protein universe. Curr. Opin. Struct. Biol. 21, 398–403 (2011).
wwPDB Consortium. Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Res. 47, D520–D528 (2019).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596, 590–596 (2021).
Bienert, S. et al. The SWISS-MODEL Repository—new features and functionality. Nucleic Acids Res. 45, D313–D319 (2017).
Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
Yang, J. et al. Improved protein structure prediction using predicted interresidue orientations. Proc. Natl Acad. Sci. USA 117, 1496–1503 (2020).
Pozzati, G. et al. Limits and potential of combined folding and docking. Bioinformatics 38, 954–961 (2021).
Ko, J. & Lee, J. Can AlphaFold2 predict protein-peptide complex structures accurately? Preprint at bioRxiv https://doi.org/10.1101/2021.07.27.453972 (2021).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Bryant, P., Pozzati, G. & Elofsson, A. Improved prediction of protein-protein interactions using AlphaFold2. Nat. Commun. 13, 1265 (2022).
Wheelan, S. J., Marchler-Bauer, A. & Bryant, S. H. Domain size distributions can predict domain boundaries. Bioinformatics 16, 613–618 (2000).
Mészáros, B., Erdos, G. & Dosztányi, Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 46, W329–W337 (2018).
Jehl, P., Manguy, J., Shields, D. C., Higgins, D. G. & Davey, N. E. ProViz—a web-based visualization tool to investigate the functional and evolutionary features of protein sequences. Nucleic Acids Res. 44, W11–W15 (2016).
Durairaj, J., Akdel, M., de Ridder, D. & van Dijk, A. D. J. Geometricus represents protein structures as shape-mers derived from moment invariants. Bioinformatics 36, i718–i725 (2020).
Kajava, A. V. & Steven, A. C. Beta-rolls, beta-helices, and other beta-solenoid proteins. Adv. Protein Chem. 73, 55–96 (2006).
Bateman, A., Murzin, A. G. & Teichmann, S. A. Structure and distribution of pentapeptide repeats in bacteria. Protein Sci. 7, 1477–1480 (1998).
Xavier, J. S. et al. ThermoMutDB: a thermodynamic database for missense mutations. Nucleic Acids Res. 49, D475–D479 (2021).
Nikam, R., Kulandaisamy, A., Harini, K., Sharma, D. & Gromiha, M. M. ProThermDB: thermodynamic database for proteins and mutants revisited after 15 years. Nucleic Acids Res. 49, D420–D424 (2021).
Dunham, A. S. & Beltrao, P. Exploring amino acid functions in a deep mutational landscape. Mol. Syst. Biol. 17, e10305 (2021).
Høie, M. H., Cagiada, M., Beck Frederiksen, A. H., Stein, A. & Lindorff-Larsen, K. Predicting and interpreting large-scale mutagenesis data using analyses of protein stability and conservation. Cell Rep. 38, 110207 (2022).
Delgado, J., Radusky, L. G., Cianferoni, D. & Serrano, L. FoldX 5.0: working with RNA, small molecules and a new graphical interface. Bioinformatics 35, 4168–4169 (2019).
Park, H. et al. Simultaneous optimization of biomolecular energy functions on features from small molecules and macromolecules. J. Chem. Theory Comput. 12, 6201–6212 (2016).
Rodrigues, C. H. M., Pires, D. E. V. & Ascher, D. B. DynaMut2: assessing changes in stability and flexibility upon single and multiple point missense mutations. Protein Sci. 30, 60–69 (2021).
Karsai, G. et al. DEGS1-associated aberrant sphingolipid metabolism impairs nervous system function in humans. J. Clin. Invest. 129, 1229–1239 (2019).
Bhagavat, R., Sankar, S., Srinivasan, N. & Chandra, N. An augmented pocketome: detection and analysis of small-molecule binding pockets in proteins of known 3D structure. Structure 26, 499–512 (2018).
Kana, O. & Brylinski, M. Elucidating the druggability of the human proteome with eFindSite. J. Comput. Aided Mol. Des. 33, 509–519 (2019).
Clark, J. J., Orban, Z. J. & Carlson, H. A. Predicting binding sites from unbound versus bound protein structures. Sci. Rep. 10, 15856 (2020).
Laskowski, R. A., Watson, J. D. & Thornton, J. M. Protein function prediction using local 3D templates. J. Mol. Biol. 351, 614–626 (2005).
Wang, H. et al. Crystal structure of human stearoyl-coenzyme A desaturase in complex with substrate. Nat. Struct. Mol. Biol. 22, 581–585 (2015).
Weigt, M., White, R. A., Szurmant, H., Hoch, J. A. & Hwa, T. Identification of direct residue contacts in protein-protein interaction by message passing. Proc. Natl Acad. Sci. USA 106, 67–72 (2009).
Jing, X., Zeng, H., Wang, S. & Xu, J. A web-based protocol for interprotein contact prediction by deep learning. Methods Mol. Biol. 2074, 67–80 (2020).
Ponstingl, H., Henrick, K. & Thornton, J. M. Discriminating between homodimeric and monomeric proteins in the crystalline state. Proteins 41, 47–57 (2000).
Kundrotas, P. J., Kotthoff, I., Choi, S. W., Copeland, M. M. & Vakser, I. A. Dockground tool for development and benchmarking of protein docking procedures. Methods Mol. Biol. 2165, 289–300 (2020).
Tovchigrechko, A., Wells, C. A. & Vakser, I. A. Docking of protein models. Protein Sci. 11, 1888–1896 (2002).
Basu, S. & Wallner, B. DockQ: a quality measure for protein-protein docking models. PLoS ONE 11, e0161879 (2016).
Croll, T. I. ISOLDE: a physically realistic environment for model building into low-resolution electron-density maps. Acta Crystallogr D. Struct. Biol. 74, 519–530 (2018).
Schoof, M. et al. eIF2B conformation and assembly state regulate the integrated stress response. eLife 10, e65703 (2021).
Zyryanova, A. F. et al. ISRIB blunts the integrated stress response by allosterically antagonising the inhibitory effect of phosphorylated eIF2 on eIF2B. Mol. Cell 81, 88–103 (2021).
Kenner, L. R. et al. eIF2B-catalyzed nucleotide exchange and phosphoregulation by the integrated stress response. Science 364, 491–495 (2019).
Taschner, M. et al. Nse5/6 inhibits the Smc5/6 ATPase and modulates DNA substrate binding. EMBO J. 40, e107807 (2021).
McCoy, A. J. et al. Phaser crystallographic software. J. Appl. Crystallogr. 40, 658–674 (2007).
Liebschner, D. et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr D. Struct. Biol. 75, 861–877 (2019).
Necci, M., Piovesan, D., CAID Predictors, DisProt Curators & Tosatto, S. C. E. Critical assessment of protein intrinsic disorder prediction. Nat. Methods 18, 472–481 (2021).
Pak, M. A. et al. Using AlphaFold to predict the impact of single mutations on protein stability and function. Preprint at bioRxiv https://doi.org/10.1101/2021.09.19.460937 (2021).
Buel, G. R. & Walters, K. J. Can AlphaFold2 predict the impact of missense mutations on structure? Nat. Struct. Mol. Biol. 29, 1–2 (2022).
Burke, D. F. et al. Towards a structurally resolved human protein interaction network. Preprint at bioRxiv https://doi.org/10.1101/2021.11.08.467664 (2021).
Chowdhury, R. et al. Single-sequence protein structure prediction using a language model and deep learning. Nat. Biotechnol. https://doi.org/10.1038/s41587-022-01432-w (2022); publisher correction https://doi.org/10.1038/s41587-022-01556-z (2022).
Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
Reback, J. et al. pandas-dev/pandas: Pandas 1.3.3. (Zenodo, 2021); https://doi.org/10.5281/zenodo.5501881
Bakan, A., Meireles, L. M. & Bahar, I. ProDy: protein dynamics inferred from theory and experiments. Bioinformatics 27, 1575–1577 (2011).
Caswell, T. A. et al. matplotlib/matplotlib: REL: v3.5.0b1. (Zenodo, 2021); https://doi.org/10.5281/zenodo.5242609
Blum, M. et al. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 49, D344–D354 (2021).
Wilkinson, L. ggplot2: elegant graphics for data analysis by WICKHAM, H. Biometrics 67, 678–679 (2011).
Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
Satopaa, V., Albrecht, J., Irwin, D. & Raghavan, B. Finding a "kneedle" in a haystack: detecting knee points in system behavior. In Proc. 31st International Conference on Distributed Computing Systems Workshops 166–171 (IEEE Computer Society, 2011).
Sali, A. & Blundell, T. L. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815 (1993).
Capriotti, E., Fariselli, P. & Casadio, R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res. 33, W306–W310 (2005).
Li, G., Panday, S. K. & Alexov, E. SAAFEC-SEQ: a sequence-based method for predicting the effect of single point mutations on protein thermodynamic stability. Int. J. Mol. Sci. 22, 606 (2021).
Cheng, J., Randall, A. & Baldi, P. Prediction of protein stability changes for single-site mutations using support vector machines. Proteins 62, 1125–1132 (2006).
Pires, D. E. V., Ascher, D. B. & Blundell, T. L. mCSM: predicting the effects of mutations in proteins using graph-based signatures. Bioinformatics 30, 335–342 (2014).
Pires, D. E. V., Ascher, D. B. & Blundell, T. L. DUET: a server for predicting effects of mutations on protein stability using an integrated computational approach. Nucleic Acids Res. 42, W314–W319 (2014).
Worth, C. L., Preissner, R. & Blundell, T. L. SDM—a server for predicting effects of mutations on protein stability and malfunction. Nucleic Acids Res. 39, W215–W222 (2011).
Rodrigues, C. H., Pires, D. E. & Ascher, D. B. DynaMut: predicting the impact of mutations on protein conformation, flexibility and stability. Nucleic Acids Res. 46, W350–W355 (2018).
Laimer, J., Hofer, H., Fritz, M., Wegenkittl, S. & Lackner, P. MAESTRO—multi agent stability prediction upon point mutations. BMC Bioinf. 16, 116 (2015).
Frappier, V., Chartier, M. & Najmanovich, R. J. ENCoM server: exploring protein conformational space and the effect of mutations on protein function and stability. Nucleic Acids Res. 43, W395–W400 (2015).
Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).
Ravindranath, P. A. & Sanner, M. F. AutoSite: an automated approach for pseudo-ligands prediction-from ligand-binding sites identification to predicting key ligand atoms. Bioinformatics 32, 3142–3149 (2016).
O’Boyle, N. M. et al. Open Babel: An open chemical toolbox. J. Cheminform. 3, 33 (2011).
Kundrotas, P. J. et al. Dockground: a comprehensive data resource for modeling of protein complexes. Protein Sci. 27, 172–181 (2018).
Mirdita, M. et al. ColabFold—making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Anishchenko, I., Kundrotas, P. J. & Vakser, I. A. Contact potential for structure prediction of proteins and protein complexes from Potts model. Biophys. J. 115, 809–821 (2018).


Acknowledgements

B. M. has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 842490 (MIMIC). A. Stein received funding from the Lundbeck Foundation (R272-2017-4528) and Novo Nordisk Foundation (NNF18OC0033950). D. B. A. received funding from the National Health and Medical Research Council of Australia (GNT1174405) and the Victorian Government’s Operational Infrastructure Support Program. K. L.-L. received funding from the Novo Nordisk Foundation (NNF18OC0033950). D. E. V. P. received funding from an Oracle Research Grant. A. O. Z. received funding from the Russian Science Foundation (RSF), grant no. 20-14-00121. A. E., A. Shenoy, G. P., P. Bryant and W. Z. received funding from the Swedish Research Council for Natural Science, grant no. VR-2016-06301, the Knut and Alice Wallenberg Foundation, the Swedish National Infrastructure for Computing, grant nos. SNIC 2021/6-197 and Berzelius-2021-29, and the Swedish e-Science Research Center. T. I. C. received funding from the Wellcome Trust. E. P. P. has received funding from PID2019-107043RA-I00 and RYC2019-026415-I from the Spanish Science Ministry. A. V. has been supported by RTI2018-096653-B-I00. S. O. is supported by NIH Grants DP5OD026389 and R21AI156595, NSF Grant MCB2032259 and the Simons Foundation 735929LPI. P. Beltrao is supported by the Helmut Horten Stiftung and the ETH Zurich Foundation.

Author information

These authors contributed equally: Mehmet Akdel, Douglas E. V. Pires, Eduard Porta Pardo, Jürgen Jänes, Arthur O. Zalevsky, Bálint Mészáros, Patrick Bryant, Lydia L. Good, Roman A. Laskowski.

Authors and Affiliations

Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
Mehmet Akdel
School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
Douglas E. V. Pires & Carlos H. M. Rodrigues
Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain
Eduard Porta Pardo
Barcelona Supercomputing Center (BSC), Barcelona, Spain
Eduard Porta Pardo, Victoria Ruiz Serra & Alfonso Valencia
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
Jürgen Jänes, Roman A. Laskowski, Alistair S. Dunham, David Burke, Neera Borkakoti, Sameer Velankar, Alex Bateman, Janet M. Thornton & Pedro Beltrao
Shemyakin–Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russian Federation
Arthur O. Zalevsky
European Molecular Biology Laboratory, Heidelberg, Germany
Bálint Mészáros
Department of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
Patrick Bryant, Gabriele Pozzati, Aditi Shenoy, Wensi Zhu, Petras Kundrotas & Arne Elofsson
Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
Lydia L. Good, Kresten Lindorff-Larsen & Amelie Stein
Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
Adam Frost
Department of Structural Cell Biology, Max Planck Institute of Biochemistry, Martinsried, Germany
Jérôme Basquin
Université de Montpellier, Centre de Recherche en Biologie Cellulaire de Montpellier (CRBM) CNRS, Montpellier, France
Andrey V. Kajava
Faculty of Arts and Sciences, Division of Science, Harvard University, Cambridge, MA, USA
Sergey Ovchinnikov
Biozentrum, University of Basel, Basel, Switzerland
Janani Durairaj
School of Chemistry and Molecular Biology, University of Queensland, Brisbane, Queensland, Australia
David B. Ascher
Institute of Cancer Research, London, UK
Norman E. Davey
Cambridge Institute for Medical Research, Department of Haematology, The University of Cambridge, Cambridge, UK
Tristan I. Croll
Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
Pedro Beltrao

Authors

Mehmet Akdel
Douglas E. V. Pires
Eduard Porta Pardo
Jürgen Jänes
Arthur O. Zalevsky
Bálint Mészáros
Patrick Bryant
Lydia L. Good
Roman A. Laskowski
Gabriele Pozzati
Aditi Shenoy
Wensi Zhu
Petras Kundrotas
Victoria Ruiz Serra
Carlos H. M. Rodrigues
Alistair S. Dunham
David Burke
Neera Borkakoti
Sameer Velankar
Adam Frost
Jérôme Basquin
Kresten Lindorff-Larsen
Alex Bateman
Andrey V. Kajava
Alfonso Valencia
Sergey Ovchinnikov
Janani Durairaj
David B. Ascher
Janet M. Thornton
Norman E. Davey
Amelie Stein
Arne Elofsson
Tristan I. Croll
Pedro Beltrao

Contributions

E. P. P., A. V. and V. R. S. performed the AF2 and RoseTTAFold comparison. A. O. Z. performed the AF2 and SWISS-MODEL Repository comparison. J. D., M. A., A. B. and A. V. K. performed the characterization of structural elements in the AlphaFold database. N. E. D. and B. M. performed the disordered region and disordered complex structure prediction. T. I. C. performed the experimental model building studies with assistance from J. B. and A. F. S. O. performed the homo-oligomeric state predictions. J. J. and P. Beltrao performed the pocket prediction analysis. A. E., A. Shenoy, G. P., P. Bryant, W. Z. and P. K. performed the protein–protein interaction modelling. J. M. T., N. B. and R. A. L. performed the template search study. A. S. D., P. Beltrao, A. Stein, K. L.-L., L. L. G., C. H. M. R., D. B. A. and D. E. V. P. performed the variant effect prediction analyses. D. B. provided technical assistance with running AF2 predictions. A. F. and S. V. contributed suggestions for the analysis and revised the text. P. Beltrao, A. V., A. Stein, K. L.-L., N. E. D., T. I. C., S. O., A. E., J. M. T., J. D. and D. B. A. supervised the work. All authors contributed to the writing of the manuscript.

Corresponding authors

Correspondence to Alfonso Valencia, Sergey Ovchinnikov, Janani Durairaj, David B. Ascher, Janet M. Thornton, Norman E. Davey, Amelie Stein, Arne Elofsson, Tristan I. Croll or Pedro Beltrao.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Structural and Molecular Biology thanks Alexander Schug, Charlotte Deane, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Sara Osman was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figures 1–11 and Supplementary Methods.

Reporting Summary.

Peer Review File.

Supplementary Data 1

Results for the inference of intrinsically disordered regions in human proteins using metrics derived from the AlphaFold2 predicted structures.

Supplementary Data 2

Grouping of proteins into topics by the Non-negative Matrix Factorization analysis.

Supplementary Tables

Supplementary Tables 1–5.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Akdel, M., Pires, D.E.V., Pardo, E.P. et al. A structural biology community assessment of AlphaFold2 applications. Nat Struct Mol Biol 29, 1056–1067 (2022). https://doi.org/10.1038/s41594-022-00849-w


Received: 25 October 2021
Accepted: 20 September 2022
Published: 07 November 2022
Issue Date: November 2022
DOI: https://doi.org/10.1038/s41594-022-00849-w
