Assessment of semiempirical enthalpy of formation in solution as an effective energy function to discriminate native‐like structures in protein decoy sets

In this work, we tested the PM6, PM6‐DH+, PM6‐D3, and PM7 enthalpies of formation in aqueous solution as scoring functions across 33 decoy sets to discriminate native structures or good models in a decoy set. In each set these semiempirical quantum chemistry methods were compared according to enthalpic and geometric criteria. Enthalpically, we compared the methods according to how much lower was the enthalpy of each native, when compared with the mean enthalpy of its set. Geometrically, we compared the methods according to the fraction of native contacts (Q), which is a measure of geometric closeness between an arbitrary structure and the native. For each set and method, the Q of the best decoy was compared with the Q0, which is the Q of the decoy closest to the native in the set. It was shown that the PM7 method is able to assign larger energy differences between the native structure and the decoys in a set, arguably because of a better description of dispersion interactions, however PM6‐DH+ was slightly better than the rest at selecting geometrically good models in the absence of a native structure in the set. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.


Introduction
Protein tertiary structure prediction is an obstacle in molecular biology that has challenged researchers in chemistry, biology, and physics for over half a century. [1][2][3] Traditional approaches have attempted to find suitable methods for conformational sampling and effective energy functions that can correctly identify native-like conformations.
Even though the large size of the search space and the complexity of the energy landscape make extensive sampling and global energy optimization essentially impossible to use physics-based effective energies for most biomolecular systems, these energies represent a very important tool for scoring and refining protein models produced by heuristic methods, such as Rosetta, [4,5] TASSER, [6][7][8] 3D-SHOTGUN , [9] and so forth.
The methods above were shown to produce sets of models that contain relatively accurate native-like conformations among nonnative decoy conformations but they are usually unable to reliably discriminate the native conformations among a set of decoys. These decoy sets are created using a variety of sampling algorithms. [8,[10][11][12][13] Because protein structure prediction is usually based on the Anfinsen thermodynamic hypothesis, [14] a crucial property for an effective energy function is its ability to recognize when a conformation is at a free energy minimum. This means that a good test for an effective energy function is to score a large number of closely related structures to recognize native-like conformations, a purpose for which the decoy sets produced by the methods cited above are widely used. [15][16][17][18] The correct assignment of an effective energy to a conformation requires a good physical description of the stability of the chain, and nonbonded interactions play a decisive role in the conformational stability of proteins, particularly hydrogen bonds [2] and hydrophobic interactions. Despite the fact that a number of force fields have been parametrized to describe these interactions with a good margin of accuracy, the correct description of these types of interactions can only be done through the recovery of a large portion of electron correlation using large basis sets and high-level quantum chemistry methods.
To avoid such costly computational solutions, ad hoc corrections in the energetics of dispersion interactions were proposed by Grimme [19,20] for DFT methods, followed closely by McNamara and Hillier [21] and more recently by Anikin [22] for semiempirical quantum chemical methods. Similar corrections for hydrogen bonds also followed, using specific methodologies. [23][24][25] The accuracy of semiempirical methods corrected for hydrogen bonds and dispersion interactions has been reported to be comparable to DFT corrected for dispersion and up to three times faster. [26] The corrections to noncovalent interactions in semiempirical methods has been thoroughly tested in biological systems. [27][28][29] In particular, the PM6 method [30] and its family of derived methods have been widely used to model biological and other complex systems on several occasions. [28,[31][32][33][34] The PM6 method has also been used to model the geometries of proteins with a good degree of success. [35] Recently, PM6-DH1 has also been highlighted by Yilmazer et al. as a fast and accurate alternative to a full ab initio computation of the interactions between proteins and their ligands. [36] Maia and coworkers used the PM6-DH1 method [24] to assess the performance and accuracy of MOPAC2012 GPUimplementation [37] using the software to make implicit solvent calculations for the enthalpies of formation of three decoy sets taken from the I-TASSER decoy set II [10] assembly. In these calculations, the native structures in each set had a markedly different enthalpy of formation in relation to the rest of the conformations in the set, indicating the possibility of using the semiempirical enthalpies of formation as an effective energy function to distinguish native-like structures from the rest of the decoy set.
This marked difference in the enthalpy of the native structure when compared to the rest led us to the hypothesis that perhaps the surface of the enthalpy of formation in solution around the native structure might be unusually steep.
We decided to test whether the enthalpies of formation in solution, calculated using the PM6/PM7 family of semiempirical Hamiltonians, can be used to find native-like conformations in a set of decoys. We also want to investigate the effect that corrections to noncovalent interactions to the final enthalpy of formation have in that scoring capacity, if any.
Using enthalpies of formation as basis for scoring decoy sets, Wollacott and Merz [38] compared the results of DG estimations using the molecular mechanics AMBER, CHARMM, and OPLS force fields as well as the semiempirical PM3 and AM1 Hamiltonians. Some of the decoy sets used in their work are also present in our assembly.
In their work, Wollacott and Merz applied the semiempirical enthalpies of formation from the AM1 and PM3 Hamiltonians to the DivCon score, shown in eq. (1), along with the polarization energy, obtained from solving the finite difference Poisson equation and the attractive Lennard-Jones term taken from the AMBER parm94 force field.
DG5aDH f 1bDG solv 1cRLJ 6 (1) In this equation, the term DG is the total free energy of the system. DG solv is the free energy of solvation, DH f is the enthalpy of formation as calculated by a Hamiltonian (either AM1 or PM3 in Wollacott's and Merz's work) and the term RLJ 6 represents the attractive Lennard-Jones term. Since this estimation mixes terms calculated at different levels of theory, the weights a, b, and c are introduced and assigned in a way to maximize the Z-Score (a measure of distance from the structure's score to the mean score of the population) difference between the native structure and the decoys. Prior to their scoring, the structures in the sets were minimized using the AMBER force field. The metrics used in their assessment included the Z-Score gap and overall ranking. In their work, they showed that the DivCon score using the AM1 and PM3 semiempirical Hamiltonians is consistently better at scoring the native structure than the AMBER parm94 molecular mechanics force field. Furthermore, the PM3 Hamiltonian was shown to be slightly better than the AM1 Hamiltonian.
In their analysis, Wollacot and Merz compared how good the final free energy estimations, as calculated by eq. (1) (where DH f is either PM3 or AM1 Hamiltonian), were at ranking the native structures. They also showed the results for the enthalpies of formation in gas phase, which were much worse. However, Maia and his collaborators [37] showed that the enthalpies of formation in solution on their own are able to detect the native structure in sets of decoys.
Merz and Wollacot's work also points to the fact that in their sets of decoys, although the native structure usually had a good score in terms of heat of formation, they all scored poorly in respect to solvation, when compared to the decoys. This trend is also seen in the best scoring decoys. All native structures in their work also had a favorable Lennard-Jones term, suggesting that tight packing might compensate for the reduction in the free energy of solvation. Overall, both AMBER and DivScore demonstrate poor discrimination of the decoys by score as a function of RMSD. However, the work called our attention to the fact that since the protein decoys have been created computationally using various force fields, scoring the decoy structures relative to the unmodified crystal structure introduces bias. The decoys may have geometries that are better suited to the force field, improving their score relative to the native structure.
A few years later, Scheraga et al. released a paper in which they studied a different way of using decoys to test a scoring function. Instead of minimizing the decoys, they relaxed the structures with short Monte Carlo runs. [39] They tested three energy functions, all of which were a combination of an all-atom force field and a solvation term: ECEPP05 combined with a solvent-accessible surface area model with the parameters optimized with a set of protein decoys (ECEPP05/SA); ECEPP/3 combined with a different solvent-accessible surface area model (ECEPP3/OONS) and ECEPP05 combined with an implicit solvent model based on a solution of the Poisson equation with an optimized Fast Adaptive Multigrid Boundary Element (FAMBEpH) method (ECEPP05/FAMBEpH). Scheraga and collaborators [39] then compared these three functions for their ability to rank the native structures on 45 of all the proteins in the Rosetta decoy sets after a Monte Carlo with Minimization (MCM) run and a local energy minimization. They compared this with the scoring method using only local minimization. To assess the reliability of the methods they compared the scores of the 5 and 10 structures with the lowest C a RMSD from the native structure to assess the method's ability to adequately rank near-native structures.
In their analysis, they showed that the ECEPP05/SA combination was particularly good at ranking the native structure appropriately, most likely because its solvation model was so very simple. Nevertheless, the ECEPP05/FAMBEpH combination was the overall best combination, trailing behind the ECEP P05/SA combination in terms of practicality simply because its solvation term is more complicated to compute. In all cases, the discriminative power increased when some form of simulation was used to relax the structures found in the decoy set prior to scoring.
Lazaridis and Karplus also studied ways of estimating effective energy functions for protein conformational studies. [40] In their work, they compared the ability of a CHARMM potential to discriminate between the native structure and its decoys with and without a Gaussian estimation for the Gibbs free energy variation due to the solvation, DG solv .
The publication of the PM7 method [41] and its new parametrization scheme for dispersion interactions as well as the work of Merz et al. and that of Lazaridis and Karplus motivated our revisitation of the decoy sets, this time using the PM7 Hamiltonian and the COSMO solvation model. The use of the PM3 method in Wollacott and Merz's work, a somewhat old method in the PMx family of semiempirical Hamiltonians, and the absence of dispersion corrections in their work motivated us to build on their steps, assessing the ability of the enthalpy of formation of the PM7 method in solution to discriminate native-like structures from a set of decoy models.
The idea was to do a different approach on the notion of Merz and Wollacot's work. [38] Instead of exploring the mixing of molecular mechanics and semiempirical methods as a way to estimate free energy, we decided to use only the enthalpies of formation in aqueous solution calculated by semiempirical quantum chemistry methods.
The reason is that not only are the semiempirical enthalpies of formation fully quantum mechanical descriptions of the system, they are also quite fast to compute. Furthermore, their versions corrected for noncovalent interactions have been shown to be comparably accurate to DFT ab initio calculations. [26] Because Maia and his collaborators used the PM6-DH1 method in the original work on GPU implementation [37] and the PM6-D3 [42] correction scheme is an improvement over the DH1, we decided to apply both the PM6-DH1 and PM6-D3 along with PM7 methods to the entire assemble of decoy sets, instead of only the three sets presented in that paper.
To compare the PM7, PM7, PM6-DH1 and PM6-D3 enthalpies of formation in solution as effective energy functions to find native structures in the decoy sets, we needed a kind of "control" energy function that lacked any corrections whatsoever. This energy was the PM6 method, which itself has no corrections for dispersion or hydrogen bond interactions. Because the trend we observed was in the enthalpies of formation, we decided not to estimate the free energy but rather to determine whether the enthalpy of formation in solution could be used to separate the native-like structure from those which are not native-like and furthermore to determine how the dispersion corrections affected accuracy.

Methodology
The decoy set was downloaded from the Zhang Lab site, decoy set type II, [10] generated by the I-TASSER Algorithm [8] . This assembly contains 56 sets but not all were used. For some sets, the native structures was missing entire side chains or atoms in the amino-terminus or had other structural imperfections and were therefore abandoned.
For the remaining 33 sets (Table 1), the UCSF Chimera [43] software was used to add hydrogen atoms to the native structures. The protonation states of Histidines were determined by Chimera, which took into consideration steric limitations and any possible formation of hydrogen bonds. Subsequently, valences and disulfide bridges were verified using the information in the PDB file headers using the SCIGRESS software. [44] Hydrogen-only optimizations were carried out using MOPAC 2012 [37] separately for each method. Because only the hydrogen atoms were optimized, a convergence threshold for the gradient norm was set to 1.0 kcal mol 21 Å 21 . For multichain proteins, only chain A was used.
The molecules were grouped based on their single-point heats of formation in aqueous solution. In total, we carried out 61,280 semiempirical quantum chemistry calculations across 33 decoy sets using the MOPAC [35,37] software and the COSMO [45] implict solvent field with a relative permittivity of 78.4 and an effective solvent molecule radius of 1.3 Å .
The calculations were run using the linear-scaling technique MOZYME. [46] For the hydrogen optimization in the native structures, the MOZYME cutoff radius was 10 Å , and for the single-point calculations, it was set to 9 Å . After the calculations, decoys that showed large positive enthalpies of formation were excluded from further analysis because it was verified that the SCF (Self-Consistent Field) did not converge for such cases. Table 1 summarizes the overall characteristics of the sets considered in this work.
It has been known for a long time that the native ensemble of a protein is composed of structures that are close together on a range of about 2 Å . [47] It is also been known for a long time that the entropy of the native state can be approximated by contributions from the small number of native conformations in relation to the vast amount of unfolded conformations. [48] This means that the native state must be located at a very deep minimum in the energy hypersurface and thus there is a significant energy gap between a native structure and an average non-native structure.
Because of this, one can expect a good score function to have this property. To assess the efficacy of semiempirical methods as scoring functions, we tested whether they satisfy these energy gap criteria.
We assumed that a random ensemble of conformations have a Gaussian distribution of enthalpies. Therefore, they belong to a population of structures with similar geometric and energetic properties. This is not an unreasonable assumption, since the sampling algorithm usually explores a small area of the phase space. However, certain conformations have extremely different enthalpy values. For these cases, we assumed that these specific conformations are outliers that differ from the population.
High-enthalpy conformations occur systematically and have no physical meaning. However, the presence of these outliers can strongly impact the ensemble average because they often have extreme values. Thus, the first step is to delete all highenthalpy conformations. To do this, a limit of two standard deviations above the mean was established and the conformations with a heat of formation that exceeds this limit are FULL PAPER WWW.C-CHEM.ORG removed from the ensemble. Then, the mean is recalculated and a new limit is set, and the process is repeated until there are no more high-enthalpy conformations.
After this iterative process concludes due to lack of outliers in the remaining set, a new three standard deviation limit is set below the mean. Conformations with enthalpy values lower than this limit are considered native-like. Notably, the lower limit (3s) is wider than the upper limit (2s), because while the upper limit excludes conformations without physical interpretation, the lower limit indicate native-like conformations and is more stringent.
The complete algorithm is shown in Figure 1. This assessment allows us to verify whether a method satisfies the energy gap property that an energy function must have to correctly describe the energy hypersurface. The methods that managed to assign such markedly lower scores to the native structure were considered better.
We named the assessment Algorithm 3SIEVE (3Sigma Iterative Enthalpy natiVe-likeness Exposer). This methodology was applied for all methods and decoy sets. The process is shown step-by-step for the 1TFI decoy set in Figure 2.

Results and Discussion
To assess the quality of the Hamiltonians as scoring functions, we tested the methods in two ways: (1) their capacity to detect the native structure in the set and (2) the overall quality of the models selected in the absence of the native structure.
To assess the method's overall ability to rank the native structures correctly we used the Z-score shown in eq. (2), which was also used in Wollacot and Merzs work. [38] In this equation, H is the mean enthalpy of the population, s is the standard deviation, and H is the enthalpy of a particular conformation in the set.
ZðHÞ 5H2 H s Here we show Z(H native ) for all sets and methods to see how far the enthalpy of the native structure is from the mean enthalpy. This is a very general quality measure for a score function because as the native structure gets further from the mean structure, it becomes increasingly more likely that it is the structure with lowest enthalpy. These data are summarized in Table 2.
As shown in Table 2, the PM7 method had a better overall performance. On average, this method put the native structure around 4.5 standard deviations below the population mean, making it easy to see. Some problematic cases were observed, like the sets with the PDB codes 1G1C, 1MLA, and 1R69, in which the native structure was put less than 2 standard deviations below the mean. The PM6 and PM6-DH1 methods were overall worse than the PM7 method with instances where the methods put the native inside the population altogether, as exemplified by the sets with PDB codes 1B4B, 1KJS, and 1CEW. In these sets, both of these methods placed the native structure less than one standard deviation below the mean enthalpy. PM6-D3 was not as good as PM7 but better than the other two because it did not place a native structure at less than one standard deviation below the mean enthalpy for any sets, being comparable to PM7 in this aspect.
In Table 2, we can observe the difference between the enthalpies of the native structure and the best ranking decoy. It is easy to distinguish the sets for which the methods failed to correctly rank the native structure, since for such sets the difference is positive. The PM7 method only failed to detect the native structure for the 1KJS set, as did the other methods. However, only a single decoy had its enthalpy lower than the native structure in the ranking of the PM7 method, whereas the PM6 method had 101 decoys with lower enthalpy than the native and the PM6-DH1 had 51. The PM6 method failed to find the native in the sets with PDB codes 1CEW, 1DCJ, 1TFI, 1SRO, and 1KJS. The PM6-DH1 method was unsuccessful only in the sets with PDB codes 1CEW and 1KJS as was PM6-D3, but by a smaller margin.
We compared the methods according to the difference in the heat of formation attributed to the native structure in a set and the line that is three standard deviations below the mean enthalpy of the set, which is the cutoff line for 3SIEVE assessment. These data are shown in Table 3 and summarized in Figure 3.
In Figure 3, the dots that are above the line represent the native structures that were not detected to be native-like, under 3SIEVE. According to 3SIEVE, some methods did not give the native structure a low enough score to set it apart from the rest of the population but these were also native structures that were incorrectly ranked by the scoring function in the first place.
For the sets where the native conformation ranks worse than a decoy, such decoys were not detected as native-like either, even on removal of the native conformation from the set. This means that even though they had a better score, the decoys were not native-like. For the sets where the native structure was properly detected, the PM7 method only detected the 1AOY set to have a decoy as native-like on the removal of the native structure.
This is an indication that for sets where the native structure was not properly discriminated, the energy function collected all the structures around the mean, either because the sampling process was very localized around a known native conformation or the scoring function itself failed to discriminate geometrically dissimilar structures from one another.
The case of the 1KJS set is intriguing. This protein is not large, but it does have three native disulfide bonds as indicated in its PDB record. Some of these bonds are not present in a portion of the decoys. The decoy that had a lower heat of formation than the native conformation according to the PM7 method is part of the fraction that lacks the disulfide bonds.
Initially, it was suspected that the energetics of the disulfide bonds may have been wrongly parameterized. This happened previously with the PM6 method (as mentioned in the PM7 Ref. [41]), and perhaps this problem was not correctly tackled by the new PM7 scheme. However, this hypothesis was quickly refuted because the 1GPT protein has four disulfide bonds and no method failed to detect the native structure for that protein.
As seen in Figure 4, the decoy is a lot more "spread out" than the native, particularly around the amino-terminus ("B" in Fig. 4) and the carboxyl-terminus ("A" in Fig. 4), where one would expect less constrained and faster torsional degrees of freedom to be located. Therefore, we suspected that the structural rigidity provided by disulfide bond formation may interfere with the entropic term of the system, such that the native structure could be enthalpically unfavorable but favorable when considering free energy as a whole. However, that assumption also predicts that such issues would be even more exacerbated in the 1GPT structure, which is not the case. In order further analyze the matter we decided to evaluate the contributions for dispersion interactions (E disp ), solvation (E solv ),hydrogen bonding (E h 2 bond ), and the total energy (E tot ), as outputted by MOPAC.  We have computed each of these terms for the native structure and its best decoy, and subtracted the first from the last to generate the variations DE tot , DE disp , DE solv , and DE h2bond . We did that for the 1KJS and additionally to the sets 1BM8, 1AF7, 1CEW, and 1GJX. These results are summarized in Table 4.
Comparisons across the five decoy sets show that for 1KJS the total energy E tot of the native structure is over 2000 kcal/ mol greater than that of its best decoy. Although the difference in total energy is a significant term, it is noticeable from the table that it is not necessarily very large when compared to the other terms. The 1KJS is clearly a specific case where that happens.
Furthermore, although solvation energy of the best decoy for the 1KJS set is nearly 100 kcal/mol lower than that of the native structure, we also notice that this alone is not enough to make a decoy overall better than the native. We see that the best decoy for the 1AF7 set has solvation energy lower than its native's by over 240 kcal/mol, even so, the native has an enthalpy of formation more than 800 kcal/mol lower. We also notice from Table 2 that the native of 1AF7 set has an enthalpy of formation over 800 kcal/mol lower than its best decoy's, whereas for 1CEW that difference is just a bit over 230 kcal/mol. Nevertheless, their solvation, hydrogen bond, and dispersion terms are very alike, which shows the effect of the total energy in the ranking of the decoys.
The total energy is a sum of the electronic energy, which is always negative, and the core-core repulsion energy, which is always positive. In more packed structures, the core-core repulsion energy is expected to be larger. Native structures are more packed than their decoys. In the case of the 1KJS, this can be seen in Figure 4 but that is readily verified because for every decoy set the COSMO cavity for the native structures have a smaller volume than the cavity for each of their respective decoys, also the native structure has a core-core repulsion energy greater than its best decoy.
More densely packed conformations could in principle be stabilized by the other terms such as noncovalent interactions and solvation, which is what usually happens given that every native structure has a larger core-core repulsion energy than its decoys. In the case of the 1KJS however, it is pretty clear that according to PM7 and COSMO, the best decoy is better solvated in water and its total electronic energy is just more favorable than any noncovalent interactions in the native structure. It is also noticeable that decoy solvation alone should not make it automatically better. Either the native has an electronic energy that is too small or a core-core repulsion that is too large.
Decomposing the total energy we verified that the native structure electronic energy is over 460 thousand eV more negative than its best decoy, but its core-core repulsion energy is nearly 460 thousand eV more positive. Since we know that the total energy is not being increased in favor of the best decoy by the electronic energy, which favors the native structure in this case, we have to assume that the main cause for the large total energy in favor of the best decoy comes from the corecore repulsion in the native structure.
The overall success of PM7 in the detection of native structures when compared to the other Hamiltonians with noncovalent corrections is interesting but it might be fruit of some fortuitous error-cancelation when it comes to intramolecular noncovalent interactions, since it has been shown that PM7 is worse than PM6-D3 when compared with experimental free energy variations of interaction between molecules. [49] Table 2. The differences for each method and decoy set between the native structure and the best scoring decoy expressed in kcal/mol, and the Z-score of the native function in the set. The calculations were done for the whole set, prior to 3SIEVE application. Z-score DDH f .

FULL PAPER
WWW.C-CHEM.ORG

Geometric considerations
The detection of the native structure is an interesting feature for a score function to have. However, during a sampling procedure very rarely (if ever) will the sampling algorithm hit the native structure square on. In a realistic setting, an additional feature that a good scoring function should have is the capacity to discriminate geometrically good decoys from geometrically bad ones. The geometric assessment of the decoy quality can be done based on the fraction of the native contacts Q(X), which is reported to be an important geometrical descriptor of protein structure. [50,51] The calculation of this descriptor is done according to eq. (3).

QðXÞ5
1 jSj ði;jÞ 2 s In this equation, X is a conformation. S is the set of all heavy atom pairs (i, j) belonging to residues R i and R j such that these are more than three residues from each other in the chain and    the distance between atoms i and j in the native conformation is less than r 0 ij 5 4.5 Å . The distance between atom i and j in conformation X is denoted r ij (X), |S| denotes the number of elements in set S and b and k are numerical parameters with the values 5 and 1.8, respectively. [50] We used the shorthand notation Q(method) to denote the fraction of native contacts for the best ranking decoy conformation according to a method. Also in each set there must be a decoy with the greatest fraction of native contacts, the value of that fraction is the Q 0 of the set. The calculation of the fraction of native contacts was made using the suite of scripts pdbTools. [52] The graphs in Figure 5 shows the correlations between the semiempirical methods Q(PM7), Q(PM6), Q(PM6-DH1), and Q(PM6-D3) and Q 0 for all sets.
Correlation coefficients for all three methods are very similar and in all of them they were slightly larger than 0.8. This is indicative that the best decoys in each set have approximately the same fraction of native contacts as the ones closest to the native structure, which suggests that all three methods manage to find comparably good conformations.
In the absence of the native structure, 3SIEVE did not detect any decoy which it considered native-like, with the exception of a single decoy in the 1AOY set using the PM7 method. An interesting feature of that particular decoy is that it was 3SIEVE-detected only by the PM7 method but it is also considered the best decoy by the other methods.
Further analysis of the decoy shows that its fraction of native contacts is equal to 0.49, which is unusually high for the decoys tested. Furthermore the decoy is also geometrically close to its native structure, as shown in Figure 6, which is also not necessarily the case for the rest of the decoys studied.
One other such decoy with a high fraction of native contacts is the best 1KVI decoy for the PM7, where the same fraction of native contacts was found, but since the population of decoys for 1KVI has less variation, none of them managed to stray too far from the population, triggering the detection by 3SIEVE.
In addition, we found difficulties that the sampling method [6][7][8] might have had when it sampled the conformational space, because in most cases we have that Q 0 < 0.5, Figure 5. The correlation between the fraction of native contacts of the decoy with the best fraction of native contacts in each set (Q 0 ) and the fraction of the native contacts of the decoy with the lowest energy according to each method. Adjusted R 2 correlation coefficients can be seen for the PM6, PM6-DH1, PM6-D3, and PM7 methods in the upper left corner of subgraphs a, b, and c, respectively. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.] FULL PAPER WWW.C-CHEM.ORG meaning that the best decoy in each set ordinarily does not satisfy even half of the native contacts.

Conclusions
In this work, we showed that semiempirical Hamiltonians can be good scoring functions for protein structures. We have compared four semiempirical methods within the same family of Hamiltonians from the PM6/PM7 family of Hamiltonians to assess the possibility of using their enthalpies of formation as score functions for proteins.
Enthalpically, the PM7 method was found to be superior to the other three methods, not only in identifying the native structure among the decoys in all but one set, but also in statistically distinguishing the native conformation from the rest of the decoys, simulating the deep well in which the thermodynamic hypothesis predicts the native state to be on the free energy surface. Which is a characteristic of a good score function.
When considering only the decoys, however, all the methods manage to find geometrically sound decoys out the set of conformations, with the PM6-DH1 method faring marginally better than the others. This is shown by the high correlation between the fraction of native contacts of the best decoys according to the methods and the decoys closest to the native. We should also remark that, as a general rule, no decoy in any set had more than half of its native contacts satisfied and the majority had even less. The sole exception was the 1EGX set, for which Q 0 5 0.57.
Given that the PM7 method had scoring success compared to the PM6 and PM6-DH1 methods but apparently has a better description of the features one would expect in a good scoring landscape, which enabled it to distinguish native structures from the rest, we concluded that the PM7 method is overall a better score function than the PM6, PM6-D3, and PM6-DH1 methods.
Given the increase in computational power that has occurred over the years, the number of structures that can be adequately treated by semiempirical scoring functions will become increasingly larger. It is our view that this computational power will enable the use of quantum descriptors to deal with large numbers of decoys.