ProCS15: a DFT-based Chemical Shift Predictor for Backbone and Cβ Atoms in Proteins

Distributed under Creative Commons CC-BY 4.0

We present ProCS15: a program that computes the isotropic chemical shielding values of backbone and Cβ atoms given a protein structure in less than a second. ProCS15 is based on around 2.35 million OPBE/6-31G(d,p)//PM6 calculations on tripeptides and small structural models of hydrogen bonding. The ProCS15-predicted chemical shielding values are compared to experimentally measured chemical shifts for Ubiquitin and the third IgG-binding domain of Protein G through linear regression and yield RMSD values of up to 2.2, 0.7, and 4.8 ppm for carbon, hydrogen, and nitrogen atoms. These RMSD values are very similar to the corresponding RMSD values computed using OPBE/6-31G(d,p) for the entire structure of each protein. These maximum RMSD values can be reduced by using NMR-derived structural ensembles of Ubiquitin. For example, for the largest ensemble the largest RMSD values are 1.7, 0.5, and 3.5 ppm for carbon, hydrogen, and nitrogen. The corresponding RMSD values predicted by several empirical chemical shift predictors range between 0.7–1.1, 0.2–0.4, and 1.8–2.8 ppm for carbon, hydrogen, and nitrogen atoms, respectively.


INTRODUCTION
Chemical shifts hold valuable structural information that is increasingly used in the determination and refinement of protein structures and dynamics (Mulder & Filatov, 2010; Raman et al., 2010; Lange et al., 2012; Bratholm et al., 2015; Robustelli et al., 2010) with the aid of empirical shift predictors such as CamShift (Kohlhoff et al., 2009), Sparta+ (Shen & Bax, 2010), ShiftX2 (Han et al., 2011), PPM One (Li & Brüschweiler, 2015) and shAIC (Nielsen, Eghbalnia & Nielsen, 2012). These methods are typically based on approximate physical models with adjustable parameters that are optimized by minimizing the discrepancy between experimental and predicted chemical shifts computed using protein structures derived from X-ray crystallography. The agreement with experiment is quite remarkable, with RMSD values around 1, 0.3, and 2 ppm for carbon, hydrogen, and nitrogen atoms. Chemical shift predictions based on quantum mechanical (QM) calculations (mostly density functional theory, DFT) are becoming increasingly feasible for small proteins (He, Wang & Merz, 2009; Zhu, He & Zhang, 2012; Zhu, Zhang & He, 2013; Exner et al., 2012; Sumowski et al., 2014; Swails et al., 2015), and Vila, Scheraga and co-workers have gone on to develop a DFT-based chemical shift predictor for Cα and Cβ atoms called CheShift-2 (Martin et al., 2013). Generally, these QM-based methods yield chemical shifts that deviate significantly more from experiment than the empirical methods, with RMSD values that generally are at least twice as large. However, many of these studies have also shown that the empirical methods are less sensitive to the details of the protein geometry and that QM-based chemical shift predictors may be more suitable for protein refinement (Parker, Houk & Jensen, 2006; Sumowski et al., 2014; Vila et al., 2009; Christensen et al., 2013).
Some of us recently showed (Christensen et al., 2013) that protein refinement using a DFT-based backbone amide proton chemical shift predictor (ProCS) yielded more accurate hydrogen-bond geometries and $^{3h}J_{NC'}$ coupling constants involving backbone amide groups than corresponding refinement with CamShift. Furthermore, the ProCS predictions based on the structurally refined ensemble yielded amide proton chemical shift predictions that were at least as accurate as CamShift. This suggests that the larger RMSD observed for QM-based chemical shift predictions may, at least in part, be due to relatively small errors in the protein structures used for the predictions, and not a deficiency in the choice of DFT functional and basis set. However, in order to test whether this is true in general we need to include the effect of more than one type of chemical shift in the structural refinement. In this study we extend ProCS to the prediction of chemical shifts of backbone and Cβ atoms in a new method we call ProCS15. We describe the underlying theory, which is significantly different from the previous, amide proton-only, version of ProCS (hence the new name), and test the accuracy relative to full DFT calculations as well as experiment for Ubiquitin and the third IgG-binding domain of Protein G (GB3). We also compare the accuracy to CheShift-2 and other commonly used empirical chemical shift predictors using both single structures and NMR-derived ensembles for Ubiquitin.

THEORY
ProCS15 computes the chemical shift of an atom in residue $i$ by

$$\delta^{i} = b - a\,\sigma^{i} \tag{1}$$

where $a$ and $b$ are empirically determined parameters as discussed further below and $\sigma^{i}$ is the isotropic chemical shielding of an atom in residue $i$. $\sigma^{i}$ is computed from the protein structure using the following equation (some of these terms only contribute for certain atom types as described below)

$$\sigma^{i} = \sigma^{i}_{BB} + \Delta\sigma^{i-1}_{BB} + \Delta\sigma^{i+1}_{BB} + \Delta\sigma^{i}_{HB} + \Delta\sigma^{i}_{H\alpha B} + \Delta\sigma^{i}_{RC} + \Delta\sigma^{i}_{w} \tag{2}$$

Here $\sigma^{i}_{BB}$ is the chemical shielding computed for an Ac-AXA-NMe tripeptide (AXA for short, Fig. 1), where X is residue $i$, for a given combination of $\phi$, $\psi$, and $\chi_1, \chi_2, \ldots$ values as described further in the "Backbone scans" subsection. $\Delta\sigma^{i-1}_{BB}$ is the change in chemical shielding of an atom in residue $i$ due to the presence of the side chain of residue $i-1$. It is computed as

$$\Delta\sigma^{i-1}_{BB} = \sigma^{i-1}_{BB} - \sigma_{A} \tag{3}$$

Here $\sigma^{i-1}_{BB}$ is the chemical shielding computed for an AXA tripeptide where X is residue $i-1$, and $\sigma_{A}$ is from the corresponding calculation on the AAA tripeptide but using $\phi_{std} = -120^\circ$ and $\psi_{std} = 140^\circ$ for all $\phi$ and $\psi$ angles. For example, if residue $i$ is a Ser and residue $i-1$ is a Val, then the effect of the Val side chain on the Cβ chemical shielding of the Ser residue is computed as the difference in the chemical shielding of the Cβ atom in the C-terminal Ala residue computed for an AVA and an AAA tripeptide. This approach assumes that the effect of the $i-1$ side chain on the chemical shielding values of the atoms in residue $i$ is independent of the $\phi^{i}$ and $\psi^{i}$ angles and the nature of residue $i$. $\Delta\sigma^{i+1}_{BB}$ is the corresponding change in chemical shielding of an atom in residue $i$ due to the presence of the side chain of residue $i+1$.
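The neighbor correction and the additive sum described above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping: the function names and all numerical values are hypothetical and are not taken from the ProCS15 code or data.

```python
from typing import Iterable


def neighbor_correction(sigma_axa: float, sigma_aaa_std: float) -> float:
    """Side-chain effect of a neighboring residue X on an atom in residue i.

    sigma_axa:     shielding (ppm) of the atom in the terminal Ala of the AXA tripeptide
    sigma_aaa_std: shielding (ppm) of the same atom in AAA at the standard phi/psi angles
    """
    return sigma_axa - sigma_aaa_std


def total_shielding(sigma_bb: float, corrections: Iterable[float]) -> float:
    """Additive model: backbone term plus all correction terms (ppm)."""
    return sigma_bb + sum(corrections)


# Invented numbers, for illustration only: Cb of a Ser preceded by a Val.
delta_prev = neighbor_correction(sigma_axa=174.2, sigma_aaa_std=175.0)
sigma_total = total_shielding(170.0, [delta_prev, 0.3])
```

Keeping each correction as a separate number makes it easy to switch individual terms on or off per atom type, which is how the terms are later benchmarked.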
$\Delta\sigma^{i}_{HB}$ in Eq. (2) is the effect of hydrogen bonding to the amide H ($\Delta\sigma^{i}_{1^\circ HB}$) and O ($\Delta\sigma^{i}_{2^\circ HB}$) atoms of residue $i$ on the chemical shielding of the backbone atoms (this term is zero for Cβ)

$$\Delta\sigma^{i}_{HB} = \Delta\sigma^{i}_{1^\circ HB}(r_{HO}, \theta, \rho) + \Delta\sigma^{i}_{2^\circ HB}(r_{OH}, \theta_{O}, \rho_{O}) \tag{4}$$

$\Delta\sigma^{i}_{1^\circ HB}$ is computed using the structural models shown in Fig. 2 as the change in chemical shielding of the backbone atoms in N-methylacetamide relative to that of the free monomer, computed at the OPBE/6-31G(d,p)//PM6 level of theory for a variety of orientations (see the subsection "Hydrogen bond scans" for more information) while the internal monomer geometries are kept fixed. For Hα the chemical shielding is taken as the average of the three hydrogen atoms on the N-methyl group. Note that the carbonyl carbon formally belongs to residue $i-1$. $\Delta\sigma^{i}_{2^\circ HB}$ is included only when another amide or amine group is hydrogen bonded to the amide oxygen and is computed as the change in the chemical shielding of the top amide group in Fig. 2A. For Hα the chemical shielding is taken as the average of the three hydrogen atoms on the methyl group of the acetamide. Note that in this case the amide nitrogen and hydrogen formally belong to residue $i+1$ and that $r_{HO}$, $\theta$, and $\rho$ are defined relative to the carbonyl oxygen of residue $i$ rather than the amide proton as for $\Delta\sigma^{i}_{1^\circ HB}$. $r_{HO}$, $\theta$, and $\rho$ are therefore labeled $r_{OH}$, $\theta_{O}$, and $\rho_{O}$ in Eq. (4).

$\Delta\sigma^{i}_{H\alpha B}$ is the effect of hydrogen bonding to the Hα and amide O atoms of residue $i$ on the chemical shielding of the backbone atoms and Cβ and has two contributions

$$\Delta\sigma^{i}_{H\alpha B} = \Delta\sigma^{i}_{1^\circ H\alpha B}(r_{H\alpha O}, \theta, \rho) + \Delta\sigma^{i}_{2^\circ H\alpha B}(r_{OH\alpha}, \theta_{O}, \rho_{O}) \tag{5}$$

$\Delta\sigma^{i}_{1^\circ H\alpha B}$ is computed using the structural models shown in Fig. 3 as the change in chemical shielding of the backbone and Cβ atoms in Ac-A-NMe relative to that of the free monomer, computed at the OPBE/6-31G(d,p)//PM6 level of theory for a variety of orientations (see the subsection "Hydrogen bond scans" for more information) while the internal monomer geometries are kept fixed. $\Delta\sigma^{i}_{2^\circ H\alpha B}$ is computed as the change in the chemical shielding of the top amide group in Fig. 3A. For Hα the chemical shielding is taken as the average of the three hydrogen atoms on the methyl group of the acetamide. Note in this case that the amide nitrogen and hydrogen formally belong to residue $i+1$ and that $r_{H\alpha O}$, $\theta$, and $\rho$ are defined relative to the carbonyl oxygen of residue $i$ rather than Hα as for $\Delta\sigma^{i}_{1^\circ H\alpha B}$. $r_{H\alpha O}$, $\theta$, and $\rho$ are therefore labeled $r_{OH\alpha}$, $\theta_{O}$, and $\rho_{O}$ in Eq. (5).

$\Delta\sigma^{i}_{RC}$ is the effect of ring current on the chemical shielding. Usually this is only significant for proton shifts and is thus only calculated for the Hα and amide protons.
The ring current is calculated by a simple point-dipole model equation

$$\Delta\sigma^{i}_{RC} = i\,B\,\frac{1 - 3\cos^{2}\theta}{|\mathbf{r}|^{3}} \tag{6}$$

The model depends on the parameter $i$ (here the side-chain-specific ring-current intensity relative to benzene, not the residue index), $B$, which is a constant in the model, and the vector $\mathbf{r}$, which is the vector from the proton to the center of the aromatic ring. $\theta$ is the angle between $\mathbf{r}$ and the vector normal to the aromatic ring system. The cut-off for calculating ring current is 8 Å in ProCS15 and the values for $i$ and $B$ are taken from Christensen, Sauer & Jensen (2011). $\Delta\sigma^{i}_{w}$ is the change in chemical shielding of an amide proton due to a hydrogen bond to a water molecule. While the backbone terms of ProCS15 are parameterized based on DFT calculations with the polarizable continuum model of solvation, this model does not account for explicit solvent effects, and this term is included for amide protons that do not form hydrogen bonds to other atoms in the protein structure. $\Delta\sigma^{i}_{w}$ is 2.07 ppm based on DFT calculations on an N-methylacetamide-water complex.
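As a rough illustration of the ring-current term, the following Python sketch evaluates a point-dipole contribution for one proton and one aromatic ring. It assumes the common point-dipole form, intensity × B × (1 − 3cos²θ)/|r|³, together with the 8 Å cut-off quoted above; the function name is hypothetical, and the intensity and B values used in the example are placeholders rather than the published parameters.

```python
import math
from typing import Sequence

CUTOFF_ANGSTROM = 8.0  # cut-off distance quoted in the text


def ring_current_shielding(
    proton: Sequence[float],
    ring_center: Sequence[float],
    ring_normal: Sequence[float],
    intensity: float,
    b_const: float,
) -> float:
    """Point-dipole ring-current contribution for one proton/ring pair."""
    # Vector from the proton to the center of the aromatic ring.
    r = [c - p for p, c in zip(proton, ring_center)]
    dist = math.sqrt(sum(x * x for x in r))
    if dist == 0.0 or dist > CUTOFF_ANGSTROM:
        return 0.0
    normal_len = math.sqrt(sum(x * x for x in ring_normal))
    # Cosine of the angle between r and the ring normal.
    cos_theta = sum(a * b for a, b in zip(r, ring_normal)) / (dist * normal_len)
    return intensity * b_const * (1.0 - 3.0 * cos_theta ** 2) / dist ** 3


# Placeholder parameters (intensity = 1, B = 1), ring in the xy-plane at the origin.
above_ring = ring_current_shielding((0, 0, 2), (0, 0, 0), (0, 0, 1), 1.0, 1.0)
in_plane = ring_current_shielding((2, 0, 0), (0, 0, 0), (0, 0, 1), 1.0, 1.0)
too_far = ring_current_shielding((0, 0, 9), (0, 0, 0), (0, 0, 1), 1.0, 1.0)
```

The geometric factor changes sign between a proton above the ring plane and one in the plane; the sign and size of the actual contribution depend on the intensity and B parameters.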

Backbone scans
The capped AXA tripeptides used to compute the first three terms of Eq. (2) were constructed using the FragBuilder Python module (Christensen, Hamelryck & Jensen, 2014), which was also used to make different conformations. The acidic and basic amino acids are all modeled in their charged state, including Histidine. This will be the correct charged state for most ionizable residues in most proteins. However, for any ionizable residues that are in their neutral state this approximation can introduce large errors. For example, the Cβ chemical shifts of Asp and His change by 3.0 and 2.4 ppm due to protonation state changes in small peptides, while the N chemical shifts change by 1.5 and 1.8 ppm (Platzer, Okon & McIntosh, 2014). This issue will be addressed in future studies. Only Cysteine is modeled and not the disulfide-bonded Cysteine. For each tripeptide a scan of the central residue's backbone and side chain dihedral angles φ, ψ, χ1, χ2, χ3, χ4 was carried out. The ω dihedral angle was fixed at 180°. The φ/ψ backbone angles of the N- and C-terminal alanine residues were fixed at −140° and 120°, corresponding to typical β-sheet residue backbone angles. The scans were done with a 20° grid spacing. For the alanine AAA tripeptide this resulted in 361 conformations from a φ/ψ scan. For amino acid types with more than two side chain angles this approach would result in far too many samples. Instead we used BASILISK (Harder et al., 2010), which allows us to sample from the continuous space of the side chain torsion degrees of freedom. 1,000 conformations were generated for each φ/ψ backbone pair spaced by 20°. See Table S1 in Supplementary Materials for an overview of the number of conformations sampled for each residue. The geometry of each conformation was optimized with PM6 (Stewart, 2007) with the backbone and side chain torsion angles frozen.
The GIAO NMR calculations were done at the OPBE/6-31G(d,p) level of theory (Zhang et al., 2006) using the CPCM continuum solvation model (Barone & Cossi, 1998) with a dielectric constant of 78. The rationale for using 78 is that the bulk solvent effects will have the largest effect for charged side chains, which are usually located on the surface of the protein. Both the optimization and the NMR calculation were done with the Gaussian 09 program (Frisch et al., 2014). In total the ProCS15 backbone terms are based on ∼2.35 million DFT calculations.
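The size of the alanine φ/ψ scan can be checked with a short Python sketch. It assumes that the 20° grid runs from −180° to 180° with both end points included, which is the choice that reproduces the 361 conformations quoted above; the function name is hypothetical.

```python
from itertools import product
from typing import List, Tuple


def backbone_grid(step_deg: int = 20) -> List[Tuple[int, int]]:
    """All (phi, psi) pairs on a regular grid from -180 to 180 degrees inclusive."""
    angles = range(-180, 181, step_deg)
    return list(product(angles, angles))


grid = backbone_grid(20)
n_conformations = len(grid)  # 19 x 19 grid points
```

Note that −180° and 180° describe the same dihedral angle, so including both end points duplicates the periodic boundary; this is convenient for interpolation near the edges of the grid.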
Several structures failed in the optimization stage or had to be discarded due to steric clashes in the NMR calculation, and the missing chemical shielding values were found by interpolation. For amino acids with no or one side chain angle cubic interpolation was used, and for 2-4 side chain angles nearest neighbor interpolation. For amino acids with no side chain angles the data is interpolated to a grid with 1° grid spacing, for amino acids with one side chain angle to a grid with 5° spacing, and for the rest of the amino acids to a grid with 20° spacing. The interpolation is done with the Python package SciPy (Jones, Oliphant & Peterson, 2001). The grids are saved in the .npy compressed file format from the NumPy Python package. In the compressed state on the hard disk the data size is ∼17 GB and when loaded into random access memory (RAM) ∼25 GB.
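To illustrate the idea of filling missing grid points by nearest-neighbor interpolation, the sketch below uses only the Python standard library as a stand-in for the SciPy routines mentioned above. The function names and grid values are hypothetical, and the periodic treatment of the angles is an assumption of this sketch.

```python
from typing import Dict, Optional, Tuple

Angles = Tuple[float, ...]


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees (periodic)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def fill_missing_nearest(grid: Dict[Angles, Optional[float]]) -> Dict[Angles, float]:
    """Replace None entries by the value of the nearest grid point with data."""
    known = {k: v for k, v in grid.items() if v is not None}
    filled = dict(known)
    for key, value in grid.items():
        if value is None:
            nearest = min(
                known,
                key=lambda k: sum(angular_difference(x, y) ** 2 for x, y in zip(k, key)),
            )
            filled[key] = known[nearest]
    return filled


# Invented shielding values (ppm) on a tiny phi/psi grid with one failed point.
example = {(-180.0, 0.0): 1.0, (160.0, 0.0): None, (0.0, 0.0): 5.0}
filled = fill_missing_nearest(example)
```

In this example the missing point at φ = 160° takes its value from φ = −180°, which is only 20° away once periodicity is taken into account.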
Much of the variation in some of the chemical shifts comes from the nature of the side chain itself and the side chains before and after it in the sequence, which can lead to inflated r values. To separate the contributions of the sequence and the structure we subtract the measured sequence-corrected random coil values (Tamiola, Acar & Mulder, 2010) from all predicted and experimental values. Note that this does not affect the computed RMSD values.

Choice of functional and basis set
When it comes to prediction of chemical shifts in proteins the most widely used functional appears to be B3LYP (Becke, 1993). For example, Zhu, He & Zhang (2012) used B3LYP/6-31G(d,p) to compute hydrogen and carbon chemical shifts for small proteins that correlate well with experimental measurements, with r values typically ≥0.98 when solvent effects are taken into account. Exner, Möller, and co-workers (2012) obtained similar results using B3LYP/6-31G(d) and even observed a correlation of 0.81 for the notoriously difficult amide N by averaging over several snapshots. Finally, Vila, Baldoni & Scheraga (2009) did a systematic study of the effect of 10 functionals on Cα chemical shifts in Ubiquitin and found very little difference in performance, with all r and RMSD values in the range 0.902-0.908 and 2.12-2.30 ppm. Interestingly, this study included functionals such as OPBE that are computationally less demanding than B3LYP. Vila, Scheraga and co-workers (2009) subsequently observed that Cα chemical shifts computed using smaller basis sets such as 6-31G correlate extremely well with the chemical shifts computed using larger basis sets such as 6-311+G(2d,p). We therefore decided to use the 6-31G(d,p) basis set for our calculations together with the computationally efficient OPBE functional.

Benchmarking ProCS15 against full QM calculations
Equation (2) is parameterized using OPBE/6-31G(d,p)//PM6 calculations, so we compare ProCS15 against full OPBE/6-31G(d,p)//PM6-D3H+ calculations on Ubiquitin (1UBQ) and GB3 (2OED) to test for errors introduced by the inherent additivity assumptions and the structural simplifications in the model systems used for the DFT calculations. We use PM6-D3H+ for the geometry optimization, rather than PM6, to get a better description of hydrogen bonding and other intermolecular interactions. However, bond lengths and angles, and their effect on chemical shifts, will be virtually identical to PM6. The results are summarized in Table 1. The first row, marked "all", summarizes results for ProCS15 if all but the last term of Eq. (2) are included. The last term corrects for the explicit solvent effects and is thus not relevant when comparing to DFT calculations. In the case of Cα none of the terms have a large effect on the chemical shielding. In the case of GB3 the results improve slightly if $\Delta\sigma^{i}_{1^\circ HB}$ is removed, and removing $\Delta\sigma^{i}_{1^\circ H\alpha B}$ improves the results slightly for both proteins. Accordingly these two terms are removed from ProCS15, while all other terms are kept (note the ring current is only included for hydrogen atoms).

Table 2. The terms in Eq. (2) that are included in ProCS15 for a given atom type are marked with an "x".

The terms in Eq. (2) used in ProCS15 for each atom type can be found in Table 2, and the RMSD and r values obtained using this combination of terms are given in the row labeled "ProCS15" in Table 1. The RMSD values for the carbon atoms range from 1.6 to 2.5 ppm and are very similar for both proteins. The r values range between 0.60 and 0.84, with the r value being consistently highest for Cα. For the hydrogen atoms the RMSD and r values range from 0.6 to 0.8 ppm and 0.82 to 0.85, respectively. Finally, for N the RMSD values are 4.3-4.5 ppm, while the r values are in the range 0.74-0.78.
In the case of GB3 the RMSD (r) value for Cβ can be reduced (increased) to 1.8 ppm (0.71) by removing a single outlier identified by the Generalized Extreme Studentized Deviate Test (Rosner, 1983). The outlier is Ala20, for which ProCS15 and DFT predict a Cβ chemical shielding value of 176.8 and 167.4 ppm, respectively. Inspection of the structure shows that the Cβ atom is only 3.1 Å from the N atom of Ala26, an interaction not taken into account in the parameterization of ProCS15.
Similarly (also for GB3), the RMSD (r) value for $H^N$ can be reduced (increased) to 0.6 ppm (0.91) by removing a single outlier identified by the Generalized Extreme Studentized Deviate Test. The outlier is Gln2, for which ProCS15 and DFT predict an $H^N$ chemical shielding value of 24.2 and 20.1 ppm, respectively. Inspection of the structure shows that the $H^N$ atom is within 1.77 Å of the OE1 atom of the Gln2 side chain and within 2.54 Å of an Hε atom of the Met1 side chain. While these interactions should be included in the $\sigma^{i}_{BB}$ and $\Delta\sigma^{i-1}_{BB}$ terms, respectively, it is possible that the latter interaction is not found in the scan due to the choice of $\phi_{std}$ and $\psi_{std}$ described above. This residue is also identified as an outlier for N, and removing it reduces (increases) the RMSD (r) value to 4.1 ppm (0.81).

Table 3 shows the comparison of QM, ProCS15 and several common chemical shift predictors to experimental values. The first two rows use the OPBE/6-31G(d,p) and ProCS15 chemical shielding predictions used to construct Table 1 and therefore use the PM6-D3H+-optimized structures of Ubiquitin and GB3. However, most future use of ProCS15 will be based on structures optimized with force fields, so the predictions in the remaining rows are made using structures optimized with the CHARMM22/CMAP force field. The ProCS15 predictions based on the CHARMM22/CMAP-optimized structures include the $\Delta\sigma_{w}$ term (cf. Eq. (2)). The a and b factors in Eq. (1) are found by linear regression to the experimental values for each atom type. In order to offer a fair comparison, RMSD values are computed after a linear fit to the experiment for all methods.
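The "RMSD after a linear fit" used throughout the comparison can be sketched as follows. This is a minimal least-squares illustration with invented numbers, not the authors' analysis script; it fits shift ≈ slope × shielding + intercept, and how slope and intercept map onto the a and b of Eq. (1) depends on the sign convention used there.

```python
import math
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmsd_after_fit(shieldings: Sequence[float], shifts: Sequence[float]) -> float:
    """RMSD (ppm) between experimental shifts and linearly scaled shieldings."""
    slope, intercept = fit_line(shieldings, shifts)
    residuals = [d - (slope * s + intercept) for s, d in zip(shieldings, shifts)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


# Invented data: shieldings and shifts that are exactly anti-correlated.
slope, intercept = fit_line([100.0, 110.0, 120.0], [80.0, 70.0, 60.0])
perfect = rmsd_after_fit([100.0, 110.0, 120.0], [80.0, 70.0, 60.0])
noisy = rmsd_after_fit([100.0, 110.0, 120.0], [80.0, 71.0, 60.0])
```

Because the fit is done separately for each atom type and method, systematic offsets and scaling errors are removed before the RMSD is computed, so the RMSD mainly reflects the scatter around the fitted line.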

Comparison to experimental chemical shifts using single structures
The OPBE/6-31G(d,p)//PM6-D3H+ calculations reproduce the experimental chemical shifts to within 2.8 ppm for carbon atoms, 0.6 ppm for hydrogen atoms and 4.6 ppm for nitrogen. The results are similar to those observed by other researchers using other functionals. For example, Zhu and co-workers (2012) used B3LYP/6-31G(d,p)//AMBER (and a locally dense 6-31++G(d,p)/4-31G(d) basis set for C′) and an implicit solvent model to reproduce chemical shift values to within 3.3 ppm for carbon atoms, 0.4 ppm for hydrogen atoms and 8.4 ppm for nitrogen. In this study the RMSD for hydrogen atoms was computed for Hα and $H^N$ combined. In a later study (Zhu, Zhang & He, 2013), the same researchers reproduced the chemical shifts of amide protons in GB3 to within 0.5 ppm using a locally dense 6-31++G(d,p)/4-31G(d) basis set and a variety of functionals including OPBE. Similarly, Exner and co-workers (2012) used B3LYP/6-31G(d)//AMBER and an implicit solvent model to reproduce the $H^N$ chemical shifts of the HA2 Domain to within 0.5 ppm using a single structure and 0.3 ppm using several MD snapshots.
While ProCS15 does not reproduce the DFT results perfectly, as discussed above, the first two rows of Table 3 show that ProCS15 can reproduce experimental chemical shifts with an overall accuracy that is similar to full DFT chemical shielding calculations for Ubiquitin and GB3. The RMSD values predicted with ProCS15 for carbon atoms are 0.1-0.6 ppm lower compared to the DFT results, while the RMSD values for hydrogen and nitrogen atoms are 0.0-0.1 ppm and 0.2-0.4 ppm higher. It is therefore not clear that much is necessarily gained by adding additional terms to ProCS15 without also increasing the underlying level of theory used to compute these terms. For example, it is known that using a larger basis set can significantly improve the prediction of C′ chemical shifts (Vila et al., 2014; Zhu, He & Zhang, 2012).
Using structures optimized with CHARMM22/CMAP instead of PM6-D3H+ to predict chemical shifts with ProCS15 also does not seem to lead to overall worse agreement with experiment. In fact, the results tend to improve slightly (up to 0.5 ppm) for heavy atoms as judged by the RMSD values. Comparison of ProCS15 to CheShift-2, which has also been parameterized against DFT calculations, shows fairly similar accuracy for Cα and slightly worse accuracy for Cβ. The latter observation is perhaps due to the fact that CheShift-2 uses a different (empirically corrected) reference for each residue type. However, this is also the case for Cα, for which ProCS15 predictions give a lower RMSD value.
Comparison of ProCS15 to the empirical methods (CamShift through ShiftX2) generally shows considerably lower RMSD values for the empirical predictions for all atom types, except Hα for GB3 where the accuracy is mostly comparable. The r values are also considerably higher for the empirical methods than for ProCS15 for Cα and, especially, Cβ, while they are comparable for the remaining atoms.
As mentioned in the introduction, the higher RMSD values generally observed for the DFT-based methods compared to the empirical methods are expected. The important issue in the context of structural refinement against measured chemical shifts is whether the DFT-based methods are more sensitive to relatively small differences in structure. While a thorough investigation of this complex issue for ProCS15 will be the subject of future studies, we look at the effect of using different structural ensembles on the accuracy next. Table 4 lists the RMSD and r values computed for Ubiquitin using the X-ray structure 1UBQ and five NMR-derived structural ensembles with between 10 and 640 structures. For ProCS15 the average chemical shift is obtained by computing the average chemical shielding for each nucleus followed by the linear regression fit to experimental chemical shift values (cf. Eq. (1)) to obtain the predicted average chemical shifts. The procedure is the same for the remaining methods except that chemical shifts are used instead of chemical shieldings.
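The ensemble-averaging step described above, averaging the shielding of each nucleus over all structures before any fit to experiment, might look like the following sketch. The data layout (one dictionary per structure, keyed by a nucleus label) and the function name are assumptions made for illustration, and the numbers are invented.

```python
from typing import Dict, List


def ensemble_average(per_structure: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the shielding of each nucleus over all structures in an ensemble.

    Assumes every structure provides a value for every nucleus in the first one.
    """
    n_structures = len(per_structure)
    nuclei = per_structure[0].keys()
    return {
        nucleus: sum(structure[nucleus] for structure in per_structure) / n_structures
        for nucleus in nuclei
    }


# Invented shieldings (ppm) for two nuclei in a three-structure ensemble.
ensemble = [
    {"MET1:CA": 120.0, "GLN2:CA": 118.0},
    {"MET1:CA": 122.0, "GLN2:CA": 119.0},
    {"MET1:CA": 121.0, "GLN2:CA": 120.0},
]
averaged = ensemble_average(ensemble)
```

The averaged shieldings would then be passed to the linear regression against experimental shifts, so that a single pair of fit parameters is obtained per atom type for the whole ensemble.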

Comparison to experimental chemical shifts using NMR-derived ensembles
For ProCS15 the use of ensemble structures lowers the RMSD values for all atom types, with decreases in the range 0.1-0.7 ppm for heavy atoms and 0.1 ppm for hydrogen atoms. Similar improvements are observed for Cα and Cβ for CheShift-2, except that the improvement in RMSD for Cβ (0.5 ppm) is larger compared to ProCS15 (0.3 ppm). These improvements are expected if the NMR-derived ensembles are a more accurate representation of the protein structure in solution than the single X-ray structure (Arnautova et al., 2009; Vila et al., 2010). Indeed, all but one of the ensembles used here were generated specifically to be a more realistic representation of the protein ensemble in solution. The exception is 1D3Z, which is a traditional NMR structural model where the conformational diversity is mainly an expression of a lack of structural constraints.
Improvements are also observed for CamShift, with RMSD decreases of 0.3-1.7 and 0.2 ppm for heavy and hydrogen atoms, respectively. In the case of PPM One, Sparta+, and shAIC modest (up to 0.3 ppm) RMSD decreases are observed for some ensembles but not others and, on average, the RMSD is roughly equally likely to remain unchanged or increase slightly. Finally, for ShiftX2 the RMSD consistently increases (by up to 0.7 ppm) on going from the X-ray structure to the ensembles, with the exception of Cα where the RMSD is lowered by 0.1 ppm. We note that the RMSD values predicted with CamShift using the crystal structure are significantly larger than when using the CHARMM22/CMAP structure (presumably because hydrogen atoms were placed in accordance with the CHARMM22 topology file in the CamShift training set) and that the reduction in RMSD on going to ensembles is at most 0.3 ppm relative to these values. So, it appears that the use of ensemble structures does not lead to a significant increase in accuracy compared to using a single structure for any of the empirical methods, in contrast to ProCS15 and CheShift-2.
The observations are consistent with earlier observations (Parker, Houk & Jensen, 2006;Sumowski et al., 2014;Vila, Baldoni & Scheraga, 2009;Christensen et al., 2013) that the empirical NMR prediction methods tend to be significantly less sensitive to changes in protein structure compared to DFT-based chemical shift predictors or chemical shifts computed using QM methods.

SUMMARY AND OUTLOOK
Table 4. Comparison of chemical shifts predicted using various methods to experimental values measured for Ubiquitin corrected for random coil effects. The RMSD values are computed after linear regression. The predictions are done using a single X-ray structure (1UBQ) and five NMR-derived ensembles of varying size (indicated in parentheses for 1UBQ) without further refinement of the structure.

In this paper we present ProCS15: a program that computes the isotropic chemical shielding values of backbone atoms and Cβ given a protein structure in less than a second. ProCS15 accounts for the effect of backbone and side chain dihedral angles of a residue and the two neighboring residues, hydrogen bonding to the backbone amide group and Hα as well as ring-current effects (Christensen, Sauer & Jensen, 2011) on the hydrogen atoms, and assumes that these effects are additive. The backbone, side chain and hydrogen bonding terms are based on ∼2.35 million OPBE/6-31G(d,p)//PM6 calculations on tripeptides and small structural models of hydrogen bonding. ProCS15 reproduces the chemical shielding values computed using PCM/OPBE/6-31G(d,p)//PM6-D3H+ for Ubiquitin and GB3 with RMSD values (after linear regression) of up to 2.5 ppm for carbon atoms, 0.8 ppm for hydrogen atoms, and 4.5 ppm for nitrogen. These deviations, which presumably result from the assumption of additivity and the simplified model systems, do not appear to preclude equal or better accuracy in comparison to experiment, because the accuracies of the chemical shifts computed using ProCS15 (based on linear regression of the chemical shifts, cf. Eq. (1)) are very similar to the corresponding DFT calculations using single Ubiquitin and GB3 structures. The largest RMSD values observed for carbon, hydrogen, and nitrogen are, respectively, 2.2 (2.8) ppm, 0.7 (0.6) ppm, and 4.7 (4.6) ppm for ProCS15 (PCM/OPBE/6-31G(d,p)).
These accuracies are very similar to DFT-based predictions made by other researchers (e.g., Zhu, He & Zhang, 2012; Zhu, Zhang & He, 2013; Exner et al., 2012) as well as CheShift-2 (Martin et al., 2013), which is another DFT-based chemical shift predictor for Cα and Cβ atoms. The RMSD values computed using ProCS15 for Ubiquitin can be reduced by as much as 0.7, 0.1, and 0.5 ppm for carbon, hydrogen, and nitrogen by using NMR-derived structural ensembles. A similar increase in accuracy is also observed for CheShift-2 (for Cα and Cβ), while for empirical chemical shift predictors the increase in accuracy is at most 0.3 ppm.

The latter observation is another indication that empirical chemical shift predictors are less sensitive to small structural changes, which may make them less suitable for chemical shift-guided refinement of protein structure compared to DFT-based predictors. Christensen and co-workers (2013) have already demonstrated that this is the case for amide hydrogen bonding geometries using a previous incarnation of ProCS limited to amide proton chemical shift predictions and we are now planning similar refinement studies using all backbone atoms and Cβ chemical shifts.
ProCS15 is freely available at github.com/jensengroup/procs15 and all structures and DFT calculations, including the full NMR shielding tensors, are available at erda.dk/public/archives/YXJjaGl2ZS1TYk40VXo=/published-archive.html.