Low-mass molecular dynamics simulation for configurational sampling enhancement: More evidence and theoretical explanation

It has been reported recently that classical, isothermal–isobaric molecular dynamics (NTP MD) simulations at a time step of 1.00 fs of the standard-mass time (Δt=1.00 fssmt) and a temperature of ≤340 K using uniformly reduced atomic masses by tenfold offers better configurational sampling than standard-mass NTP MD simulations at the same time step. However, it has long been reported that atomic masses can also be increased to improve configurational sampling because higher atomic masses permit the use of a longer time step. It is worth investigating whether standard-mass NTP MD simulations at Δt=2.00 or 3.16 fssmt can offer better or comparable configurational sampling than low-mass NTP MD simulations at Δt=1.00 fssmt. This article reports folding simulations of two β-hairpins showing that the configurational sampling efficiency of NTP MD simulations using atomic masses uniformly reduced by tenfold at Δt=1.00 fssmt is statistically equivalent to and better than those using standard masses at Δt=3.16 and 2.00 fssmt, respectively. The results confirm that, relative to those using standard masses at routine Δt=2.00 fssmt, the low-mass NTP MD simulations at Δt=1.00 fssmt are a simple and generic technique to enhance configurational sampling at temperatures of ≤340 K.


Introduction
It has been reported recently that use of uniformly reduced atomic masses by tenfold (hereafter abbreviated as low masses) can enhance configurational sampling in classical, isothermalisobaric molecular dynamics (NTP MD) simulations at a time step (Δt) of 1.00 fs of the standard-mass time (fs smt ) [1]. The reported sampling enhancement was determined by the smaller number of time steps required for the low-mass simulations to capture the folding of β-hairpin CLN025 [2] than that of the standard-mass NTP MD simulations performed at the same time step [1]. The folding of CLN025 in the low-mass simulations [1] was verified by the native-state conformation of CLN025 that was independently determined by an NMR spectroscopic study [2].
It has long been reported that increasing atomic masses can improve configurational sampling because higher atomic masses can reduce the fastest motions of the system in a molecular dynamics (MD) simulation to lengthen the Δt of the simulation and can gain larger momenta . Downscaling the solvent mass can reportedly enhance the configurational sampling of oligopeptides in MD simulations through reduction of solvent viscosity [27,28]. Differentially downscaling the solvent and sidechain masses also reportedly improved configurational sampling of a nonapeptide in MD simulations through reduction of solvent viscosity and adiabatic decoupling of motions of solvent, backbone, and side-chain [29].
Further, it has been reported that scaling the total mass by a factor of 10 for an MD simulation scales the time of the new system by a factor of 10 [15,30]. In theory, as explained in Section 2.1, low-mass MD simulations at Δt¼1.00 fs smt are equivalent to standard-mass MD simulations at Δt¼ 10 fs smt . In practice, it is reasonable to expect that low-mass NTP MD simulations at standard-mass NTP MD simulations at Δt42.00 fs smt can actually be performed without the use of the hydrogen mass repartitioning scheme, (ii) whether the configuration sampling efficiency of standard-mass NTP MD simulations at Δt¼3.16 fs smt is statistically equivalent to that of low-mass NTP MD simulations at Δt¼1.00 fs smt , and (iii) whether low-mass NTP MD simulations at Δt¼1.00 fs smt can statistically offer better configurational sampling than standard-mass NTP MD simulations at Δt¼2.00 fs smt .
This article reports a comparative study of the three aims using 160 unique, independent, all-atom, and classical NTP MD simulations-each of which was performed with the SHAKE algorithm for 500 Â 10 6 time steps-to determine relative configurational sampling efficiencies of using mass scaling factors of 1.0 and 0.1 and time steps of 1.00, 2.00, 3.16, and 3.50 fs smt . CLN025 and chignolin [31] were used as model systems for NTP MD simulations of miniprotein folding. To investigate the configurational sampling efficiency in a forcefield independent manner, two forcefields were used in this study-an up-to-date general-purpose AMBER forcefield FF14SB [32] (for either explicit or implicit solvation) and a special-purpose AMBER forcefield FF12MC [33] (for explicit solvation only).

Equivalence of mass scaling and time-step scaling for sampling enhancement
Scaling the total mass by a factor of λ for an MD simulation scales the time of the new system by a factor of λ [15,30]. The reason is because the sole purpose to scale total mass is to improve configurational sampling. Therefore, the units of distance [l] and energy [m]([l]/[t]) 2 of low-mass simulations are kept identical to those of standard-mass simulations. This is so that the structure and energy of the low-mass simulations can be compared to those of the standard-mass simulations in order to determine relative configurational sampling efficiencies. Let superscripts lmt and smt denote the times for the low-mass and standard-mass systems, respectively.
Then It is worth noting that a conventional MD simulation program takes a time step in the standard-mass time rather than the lowmass time. Therefore, low-mass MD simulations at Δt¼1.00 fs smt (viz., 10 fs of the low-mass time) are theoretically equivalent to standard-mass MD simulations at Δt¼ 10 fs smt (viz., 10 fs of the standard-mass time), if both standard-mass and low-mass simulations are carried out for the same number of time steps and if there are no precision issues in performing these simulations.

Molecular dynamics simulations to autonomously fold chignolin and CLN025
Chignolin or CLN025 in a fully extended backbone conformation solvated with the TIP3P water [34] with surrounding counter ions and NaCl molecules was energy-minimized for 100 cycles of steepest-descent minimization followed by 900 cycles of conjugategradient minimization to remove close van der Waals contacts using SANDER of AMBER 11 (University of California, San Francisco). The energy-minimized system was then heated from 0 to 277, 300 or 340 K at a rate of 10 K/ps under constant temperature and constant volume, and finally simulated in 20 unique, independent, allatom, and classical NTP MD simulations using PMEMD of AMBER 11 with a periodic boundary condition at 277, 300, or 340 K and 1 atm employing isotropic molecule-based scaling. The fully extended backbone conformations (viz., anti-parallel β-strand conformations) of chignolin and CLN025 were generated by MacPyMOL Version 1.5.0 (Schrödinger LLC, Portland, OR). The numbers of TIP3P waters and surrounding ions, initial solvation box size, ionizable residues, and computers used for the NTP MD simulations are provided in Table S1. The 20 unique seed numbers for initial velocities of Simulations 1-20 were taken from Ref. [35]. All simulations used (i) a dielectric constant of 1.0, (ii) the Berendsen coupling algorithm [36], (iii) the Particle Mesh Ewald method to calculate long-range electrostatic interactions [37], (iv) Δt¼1.00, 2.00, 3.16, or 3.50 fs smt , (v) the SHAKE-bond-length constraints applied to all the bonds involving hydrogen, (vi) a protocol to save the image closest to the middle of the "primary box" to the restart and trajectory files, (vii) a formatted restart file, (viii) the revised alkali and halide ions parameters [38], (ix) a cutoff of 8.0 Å for nonbonded interactions, and (x) default values of all other inputs of the PMEMD module. Instantaneous conformations of each simulation were saved at every 10 5 time steps. The forcefield parameter file of FF12MC is provided in Appendix A.  Fig. 1D]). Similarly, the reported NTP MD simulations also showed that chignolin could fold to the native β-hairpin with Tyr2 and Trp9 on the same side of the hairpin [31] and to native-like β-hairpins with Tyr2 on one side of the hairpin and Trp9 on the other [33].
The lowest Cα and Cβ root mean square deviation (CαβRMSD) between one of the native-like β-hairpins and the corresponding NMR structure of CLN025 is 2.08 Å, whereas the corresponding Cα root mean square deviation is 1.33 Å. The CαβRMSD between one of the native-like β-hairpins and the NMR structure of chignolin is 1.99 Å, but the corresponding Cα root mean square deviation is 1.58 Å. In this study CαβRMSDs and Cα root mean square deviations were calculated using PTRAJ of AmberTools 1.5 with root mean square fit all α and β carbon atoms to the corresponding ones in the β-hairpin NMR structure without mass weighing. To distinguish the native β-hairpins from the native-like ones, in this study conformations of chignolin and CLN025 with CαβRMSDs of r1.96 Å relative to their NMR structures were considered to be at the native or folded state. The time series of CαβRMSD from native conformations for chignolin and CLN025 revealed that these βhairpins could fold into conformations with CαβRMSDs of $ 1.5 Å (Fig. S1). However, the CαβRMSD cutoff for the native state was set at 1.96 Å rather than 1.50 Å because the CαβRMSD between the NMR and crystal structures of CLN025 is 1.95 Å. Otherwise, the use of a CαβRMSD cutoff of r1.50 Å would preclude conformations determined by crystallographic analysis that are commonly considered at the native state.  1) and (2) of Ref. [33], respectively, wherein N is the number of all simulations, P i is the individual native state population of the ith simulation, and P is the aggregated native state population.

Folding time estimation using survival analysis
The folding time (τ f ) or the reciprocal of the folding rate (k f ) of a β-hairpin was estimated by the mean time of the β-hairpin to fold from a fully extended backbone conformation to the native conformation in 20 unique and independent NTP MD simulations, using survival analysis methods [39] from the R survival package (Therneau T.M., A Package for Survival Analysis in S, 2015, Version 2.38-3, http://CRAN.R-project.org/package¼ survival). The CαβRMSD cutoff of r1.96 Å was used to identify conformations at the native state. For each simulation with instantaneous conformations saved at every 10 5 time steps, the first time instant at which CαβRMSD reached r1.96 Å was recorded as an individual folding time (Fig. S1). If a set of 20 full simulations (each performed for 500 million time steps) all captured a folding event, this set was used to calculate the mean time-to-folding using a two-step procedure. The first step used the Kaplan-Meier estimator [40,41] with the Surv() function in the R survival package. The second used parametric survival functions-that mostly fell within the 95% confidence bounds of the Kaplan-Meier estimator-with the Surreg() function. If the mean of the Kaplan-Meier estimator was identical to the mean derived from a parametric survival function, then this survival function was used to calculate the time-course of the mean time-to-folding of the same set of simulations. If half or more than half of 20 shortened simulations (each performed for o 500 million time steps) did not capture a folding event, the τ f of these simulations was discarded because of their overly large 95% confidence interval.

Results and discussion
3.1. Equal sampling of simulations using low-mass at 1.00 fs smt and standard-mass at 3.16 fs smt It was reported previously that CLN025 did not fold from a fully extended backbone conformation to its native conformation in 10 unique, independent, all-atom, and classical 500-million-timestep NTP MD simulations at Δt¼1.00 fs smt using FF14SB at 277 K and 1 atm [1]. When 20 such simulations were performed under the same simulation conditions, except that the forcefield was changed from FF14SB to FF14SBlm, CLN025 folded in 13 of the 20 simulations ( Fig. S1A) with an aggregated native state population with SE of 17 76% (Table 1). FF14SBlm has all the parameters of FF14SB, except that atomic masses are reduced uniformly by tenfold. When the FF14SBlm simulations were repeated under the same simulation conditions, except that FF14SBlm and Δt were changed to FF14SB and 3.16 fs smt , respectively, CLN025 folded in 16 of the 20 simulations (Fig. S1B), with an aggregated native state population including SE of 2676% (Table 1). Plotting the aggregated native state population as a function of the number of time steps shows no significant separation between the curves of the simulations using FF14SBlm at Δt¼1.00 fs smt and those using FF14SB at Δt¼3.16 fs smt (Fig. 1), according to the unpaired t-test two-tailed P value of 0.2465 for the two curves. Both sets of simulations using FF14SB and FF14SBlm did not converge well, according to (i) the large SDs relative to the means for FF14SB and FF14SBlm listed in Table 1 and (ii) the result that some simulations failed to capture a folding event ( Fig. S1A and B). Therefore, the τ f s of CLN025 were not calculated for the two sets of simulations. Nevertheless, the aggregated native state populations do show that the configurational sampling of the NTP MD simulations of CLN025 using FF14SBlm at Δt¼1.00 fs smt is statistically equivalent to that using FF14SB at Δt¼3.16 fs smt .
Next, 20 unique, independent, all-atom, and classical NTP MD simulations of chignolin at 300 K and 1 atm were carried out for 500 Â 10 6 time steps under two conditions. One used FF12MC with Δt¼1.00 fs smt , and the other used FF12MCstdm with Δt¼3.16 fs smt . FF12MCstdm has all the parameters of FF12MC, except that all atomic masses are changed to standard values. Consistent with the report that FF12MC can reduce the number of time steps of all-atom and classical NTP MD simulation required to capture of the folding of chignolin [33], in this study chignolin folded at 300 K and 1 atm from a fully extended backbone conformation to its native conformation in all 20 NTP MD simulations at Δt¼1.00 fs smt using FF12MC (Fig. S1C) or at Δt¼3.16 fs smt using FF12MCstdm (Fig. S1D), respectively. The aggregated native state populations with SEs of the simulations using FF12MC and FF12MCstdm are converged to 357 3% and 37 72%, respectively (Tables 2A and 2B). The convergences of the two sets of simulations are further supported by the small SDs relative to the means of the populations (Tables 2A and 2B). Using the Kaplan-Meier estimator [40,41] (see Section 2.4), the τ f s of chignolin were estimated to 79 ns smt (95% confidence interval of 51-123 ns smt ) for FF12MC and 72 ns smt (95% confidence interval of 47-112 ns smt ) for FF12MCstdm (Tables 2A and 2B). As indicated by the unpaired t-test two-tailed P value of 0.4788   ( Fig. 1), there is no significant separation between the two time series of aggregated native state population for chignolin at 300 K using FF12MC with Δt¼1.00 fs smt and FF12MCstdm with Δt¼3.16 fs smt , respectively. The τ f of chignolin of the FF12MC simulations is statistically equivalent to that of the FF12MCstdm simulations in terms of both mean and 95% confidence interval. These results confirm that the configurational sampling of the NTP MD simulations of chignolin using FF12MC at Δt¼1.00 fs smt is statistically equivalent to that using FF12MCstdm at Δt¼3.16 fs smt . As to the first two aims described in Section 1, these results suggest that standard-mass NTP MD simulations at Δt¼3.16 fs smt can actually be performed without employing the hydrogen mass repartitioning scheme. The results also suggest that, regardless of which forcefield is used, configuration sampling efficiency of standard-mass NTP MD simulations at Δt¼3.16 fs smt is statistically equivalent to that of low-mass NTP MD simulations at Δt¼1.00 fs smt .
3.2. Better sampling of simulations using low-mass at 1.00 fs smt than standard-mass at 2.00 fs smt Repeating the above simulations of CLN025 at 277 K using FF14SB at Δt¼2.00 fs smt showed that CLN025 folded in ten of the 20 simulations (Fig. S1E) with an aggregated native state population including SE of 9 74% (Table 1). As indicated by the unpaired t-test two-tailed P value of 0.0239 (Fig. 1), there is a significant separation between the two time series of aggregated native state population for CLN025 at 277 K using FF14SBlm and FF14SB. These series are upward and have not reached plateau (Fig. 1). In addition, the SDs are larger than the means of the FF14SBlm and FF14SB simulations. These results indicate that both sets of simulations are not well converged. Nevertheless, the unpaired ttest two-tailed P value of 0.0239 indicates that the configurational sampling of the MD simulations of CLN025 using FF14SBlm at Δt¼1.00 fs smt is significantly better than that using FF14SB at Δt¼2.00 fs smt .
Repeating the above simulations of chignolin at 300 K using FF12MCstdm at Δt¼2.00 fs smt revealed that chignolin folded in 19 of the 20 simulations (Fig. S1F), with an aggregated native state population including SE of 3573% (Table 2C) and a τ f of 147 ns smt (95% confidence interval of 94-230 ns smt ) that was estimated using the exponential survival function (see Section 2.4). As indicated by the unpaired t-test two-tailed P value of 0.0299 (Fig. 1), there is a significant separation between the two time series of aggregated native state population for chignolin at 300 K using FF12MC and FF12MCstdm. Both the mean and 95% confidence interval for the τ f of chignolin estimated from the FF12MCstdm simulations at Δt¼2.00 fs smt are nearly twice those obtained from the FF12MC simulations at Δt¼1.00 fs smt (Tables 2A and 2C). These results confirm that the configurational sampling of the NTP MD simulations of chignolin using FF12MC at Δt¼1.00 fs smt is significantly better than that using FF12MCstdm at Δt¼2.00 fs smt . As to the last aim described in Section 1, the results suggest that the low-mass NTP MD simulations at Δt¼1.00 fs smt offer significantly better configurational sampling efficiency than the standard-mass NTP MD simulations at Δt¼2.00 fs smt .

Low-mass NTP MD simulation for configuration sampling enhancement
According to the survival analysis of the folding simulations, the τ f of chignolin estimated from the Kaplan-Meier estimator [40,41] using the simulation data of FF12MC at Δt¼3.16 fs lmt (viz., Δt¼1.00 fs smt ) was identical to the one obtained from the exponential survival function. This is evident from the linear relationship between simulation time and natural logarithm of the nonnative state population for chignolin shown in Fig. 2. The τ f of chignolin estimated from the Kaplan-Meier estimator using the data of FF12MCstdm at Δt¼3.16 fs smt was also the same as the one obtained from the exponential model (Fig. 2). These observations of the exponential decay of the nonnative state population over simulation time indicate that the folding of chignolin observed in the NTP MD simulations followed a simple two-state kinetics scheme of Eq. (1), wherein D and N denote the nonnative and native conformations, respectively. These results are in excellent agreement with the reported two-state folding kinetics deduced from experimental studies of chignolin, a ten-residue β-hairpin [31]. The results also agree with the generalization that a miniprotein (with residues of o 100) folds according to a two-state kinetics scheme [42]. This implies that the folding rate (k f ) of such a miniprotein follows the first-order rate law, namely, Eq. (2) [43]. Most importantly, the results demonstrate that the folding simulations of chignolin using Δt¼3.16 fs lmt and Δt¼3.16 fs smt are realistic.
By contrast, the τ f of chignolin simulated with FF12MCstdm at Δt¼2.00 fs smt could not be estimated from the Kaplan-Meier estimator because one of the 20 simulations failed to capture a folding event (Table 2C); the relationship between simulation time and natural logarithm of the nonnative population of chignolin simulated with FF12MCstdm at Δt¼2.00 fs lmt (r 2 ¼0.8848, Fig. 2) is not as linear as those with FF12MC at Δt¼3.16 fs lmt and  16 4 128 FF12MCstdm at Δt¼3.16 fs smt (r 2 ¼ 0.9387 and 0.9039, respectively; Fig. 2A and B). Consequently, the τ f of chignolin simulated with FF12MCstdm at Δt¼2.00 fs smt has to be estimated using the exponential survival function and has a very large 95% confidence interval relative to those with Δt¼3.16 fs lmt and Δt¼3.16 fs smt ( years of low-mass NTP MD simulations of both regular and large proteins with biological relevance [44][45][46][47][48][49][50][51][52][53][54][55][56] have already been done by this author to confirm that various proteins can be simulated successfully at Δt¼1.00 fs smt and Tr340 K. Lastly, it should be pointed out that uniformly reducing atomic masses of the original system does not change the physical composition of the new system. The kinetic properties derived from low-mass NTP MD simulations can be scaled back to the properties of the original system by a factor of 10 . For these reasons, low-mass NTP MD simulations at Δt¼1.00 fs smt are presently preferred over standard-mass NTP MD simulations at Δt¼3.16 fs smt . Such low-mass NTP MD simulations can be used as a simple and generic technique to enhance configurational sampling for both kinetic and thermodynamic investigations of proteins at T r 340 K.

Conclusions
The present work demonstrates that the configurational sampling efficiency of low-mass NTP MD simulations of β-hairpin folding at Δt¼1.00 fs smt is statistically equivalent to and higher than those using standard masses at Δt¼3.16 and 2.00 fs smt , respectively. This work also shows that, without employing the hydrogen mass repartitioning scheme, standard-mass NTP MD simulations at Δt¼3.16 fs smt and T ¼277, 300, and 340 K can be performed to capture the autonomous β-hairpin folding. While further studies may show that with sufficient computational precision standard-mass NTP MD simulations at Δt¼3.16 fs smt might be simpler and equally effective for configurational sampling enhancement at T r340 K, the present results together with the previous results reported by this author confirm that low-mass NTP MD simulations are a simple and generic technique to enhance configurational sampling for both kinetic and thermodynamic investigations of proteins at T r340 K.