Short-Time Infrequent Metadynamics for Improved Kinetics Inference

Infrequent Metadynamics is a popular method to obtain the rates of long time-scale processes from accelerated simulations. The inference procedure is based on rescaling the first-passage times of the Metadynamics trajectories using a bias-dependent acceleration factor. While useful in many cases, it is limited to Poisson kinetics, and a reliable estimation of the unbiased rate requires slow bias deposition and prior knowledge of efficient collective variables. Here, we propose an improved inference scheme, which is based on two key observations: (1) the time-independent rate of Poisson processes can be estimated using short trajectories only. (2) Short trajectories experience minimal bias, and their rescaled first-passage times follow the unbiased distribution even for relatively high deposition rates and suboptimal collective variables. Therefore, by basing the inference procedure on short time scales, we obtain an improved trade-off between speedup and accuracy at no additional computational cost, especially when employing suboptimal collective variables. We demonstrate the improved inference scheme for a model system and two molecular systems.

Molecular dynamics (MD) simulations are widely used to study complex systems at the microscopic level.2][3][4] Therefore, long timescale processes, such as protein folding or crystal nucleation, are almost never studied using brute-force, long simulations. 5Instead, enhanced sampling methods are usually employed.
Here, we focus on infrequent MetaD (iMetaD), a method to extract the unbiased kinetics from accelerated MetaD simulations.In iMetaD, several biased trajectories are initiated and stopped after a first-passage criterion is fulfilled.The first-passage time (FPT) of each trajectory is then rescaled by an acceleration factor that depends on the external bias deposited along the trajectory. 1 The method assumes that the underlying process obeys Poisson kinetics, and the unbiased rate is estimated by fitting the rescaled FPTs to an exponential distribution. 2e key assumption of iMetaD is that no bias is deposited near the transition state. 1,5is assumption fails for high bias deposition rates or suboptimal CVs, that lead to hysteresis and bias over-deposition. 2,5,226][27][28][29][30][31] Thus, to improve the inference, one is usually forced to limit the bias deposition rate but this also reduces the acceleration, resulting in a tradeoff between speedup and accuracy. 4,11,22e reliability of iMetaD is usually assessed through a procedure suggested by Salvalagio et.al. 2,5 A Kolmogorov-Smirnov (KS) test 32 is performed to accept or reject the hypothesis that the rescaled FPTs are taken from an exponential distribution.The results are considered reliable for a p-value greater than 0.05, and the unbiased mean FPT (MFPT) is then estimated as the mean of the exponential fit to the rescaled FPT distribution.Trajectories with hysteresis or over-deposition contribute unrealistically long rescaled FPTs, leading to distributions that are broader than exponential and failure of the KS test.In this paper, we propose an improved inference scheme which deals with this prevalent problem, extending the range of applicability of iMetaD.
The scheme relies on two key observations: 1) Since exponential distributions are characterized by a single parameter -their time-independent kinetic rate -short simulations, showing a single transition each, are sufficient to estimate the full distribution reliably.2) The rescaling procedure of iMetaD is more reliable for short trajectories, experiencing minimal bias.This can be seen from the rescaled FPT distribution, which often follows the unbiased distribution at short times even when using high bias deposition rates or suboptimal CVs.
The improved scheme is inspired by our previous work, combining MetaD with stochastic resetting (SR). 11Previously, we showed that SR provides enriched the sampling of short timescales, leading to an improved tradeoff between speedup and accuracy.Interestingly, these observations are not limited to simulations with SR.
We next show how to exploit our observations to build an improved kinetics inference scheme for iMetaD simulations.We refer to this scheme as Short Time iMetaD (ST-iMetaD).
ST-iMetaD extends the applicability of iMetaD even to high bias deposition rates and suboptimal collective variables, reducing the prediction errors by orders of magnitude in comparison to the standard procedure.We demonstrate its advantages in three systems of increasing complexity: the two-dimensional Wolfe-Quapp potential, alanine dipeptide in vaccum, and the unfolding of the chignolin mini-protein in water.
We present ST-iMetaD through the example of the Wolfe-Quapp potential.It is a two states model, previously used to study the performance of suboptimal CVs. 22,23Its exact form is given in the SI, as well as all simulations details.We first performed brute-force, standard MD simulations, to obtain the unbiased FPT distribution, and found the MFPT to be ∼ 110 ns.A KS test confirmed that the unbiased distribution is exponential (a p-value of 0.81).
Next, we performed iMetaD simulations with a good CV and slow bias deposition rate of 10 ns −1 , updating the bias every 10 5 timesteps.The quality of the CV was proved using committor analysis 33,34 (see SI for details).With this choice of parameters, we expect the underlying assumptions of iMetaD to be valid, and the original inference scheme to be accurate.Indeed, the MFPT estimated through the standared inference procedure is 121 ns, in good agreement with the true value.A p-value of 0.31 confirms the reliability of the results.The left panel of Figure 1(a) shows the cumulative distribution functions (CDF) P (τ ≤ t) for both the unbiased FPTs (blue solid line) and the rescaled FPTs (green dashed line).An exponential fit to the CDF of the rescaled FPTs is given in an orange dash-dotted line.We find a good agreement between all three curves, showing that the standard inference procedure is adequate in this case.
We then performed iMetaD simulations using the same CV but with a higher bias deposition rate of 1000 ns −1 (every 1000 timesteps), which is expected to give poor inference due to hysteresis.The obtained MFPT, 1031 ns, overestimates the true value by an order of magnitude, and the p-value of the KS test drops to 5 × 10 −7 , indicating that the results are unreliable.The middle panel of Figure 1(a) again shows the CDF for the unbiased and rescaled FPTs, as well as the exponential fit to the rescaled CDF.In this case, we find that the rescaled CDF is not exponential and clearly deviates from the unbiased CDF.As a result, the exponential fit results in a wrong estimate of the rate and MFPT.Nevertheless, we find that the unbiased and rescaled CDFs are in very close agreement at short times.This is the first key observation of this work: short trajectories experience minimal bias and thus their rescaled CDF reflects the correct statistics for small FPTs even at high bias deposition rates.Results for unbiased FTPS (blue solid lines), rescaled FPTs (dashed green lines), exponential fits to the rescaled CDF in the entire range (orange dash-dotted lines), and linear fits to the survival functions at t < t * (pink dotted lines).Results are shown for iMetaD simulations using a good CV and bias deposition rate of 10 ns −1 (left), a good CV and bias deposition rate of 1000 ns −1 (middle) and a suboptimal CV and bias deposition rate of 200 ns −1 (right).The black dashed lines mark t = t * .A similar phenomenon is observed when employing suboptimal CVs.We select a moderate bias deposition rate of 200 ns −1 , and intentionally reduce the quality of the CV by rotating it at an angle θ = 56 • relative to the good CV.The right panel of Figure 1(a) presents the resulting CDF profiles.Once again, the rescaled CDF is far from exponential (p-value of 6 × 10 −6 ), and the MFPT is overestimated (∼ 670 ns).However, even though the rescaled CDF deviates from the unbiased CDF at long times, they match closely at short times.
When employing iMetaD, it is common practice to present the rescaled CDF profile, which is used for the KS test. 2,4,5However, for the rest of this paper, it would be more convenient to examine the survival function, 1 − P (τ ≤ t), since its logarithm decays linearly for exponential distributions.Figure 1(b) gives log (1 − P (τ ≤ t)) for the unbiased FPTs (blue solid lines) and for the rescaled FPTs (green dashed lines) of the simulations presented in Figure 1(a).The unbiased survival function decays linearly, as expected.When the assumptions of iMetaD hold (left panel), the rescaled survival function closely follows the unbiased one.When they break (middle and right panels), the rescaled survival function matches the unbiased one up to some finite time t = t * , and decays slower at t > t * .As a result, the exponential fits to the rescaled data (orange dash-dotted lines) decay much slower than the unbiased curves.This explains the overestimated MFPT values.
We improve the inference by fitting a linear function S(t) = −kt to the rescaled survival function at t ≤ t * only (dotted pink lines in Figure 1(b)).We then estimate the MFPT as k −1 .Notice that we only fit a single parameter k, as the survival function must fulfill S(t = 0) = 0 due to normalization.In all three cases, we find that these short-time fits are closer to the unbiased survival function, and therefore lead to an improved estimate of the MFPT, as we show below.First, we explain how to choose an adequate value of t * .
We use the Pearson correlation coefficient R, which quantifies the quality of the linear fit to the survival function.Practically, we perform multiple fittings, with different choices of t * , and select the fit resulting in the highest value of R 2 .Our results show that this approach correctly identifies reasonable values of t * , such that the rescaled survival functions match the unbiased ones at t ≤ t * .Specifically, for a good CV and a low bias deposition rate, where the results are reliable, we find t * ∼ 24 ns.For the same CV and a high bias deposition rate, and for a poor CV and a moderate rate, we obtain lower values, t * ∼ 10 ns and t * ∼ 8 ns, respectively (black dashed lines in Figure 1  To summarize our method, the main modification to the original inference scheme is that instead of fitting an exponential distribution to all of the data, as customary, we limit the analysis to short timescales.We perform a series of linear fits to the logarithm of the survival function at times t ≤ t * , with different choices of t * .The parameter k of the best fit is taken as the kinetic rate, and the MFPT is estimated as k −1 .This enables accurate estimations of the MFPT, even with frequent bias deposition or a suboptimal CV. We first demonstrate the advantages of ST-iMetaD using the Wolfe-Quapp potential, showing that we can use higher bias deposition rates, providing higher speedups, with minimal penalty in the inference accuracy.Figure 2(a) shows the estimated MFPT as a function of the speedup, using a good CV and different bias deposition rates in the range of 10 to 1000 ns −1 .When employing standard iMetaD (orange circles), the estimated MFPT increases with speedup, reaching a value that is an order of magnitude larger than the true one at high speedups.On the other hand, when employing ST-iMetaD (pink squares), we obtain estimations within 40% of the true value, for all speedups.
Our scheme also improves the inference from simulations performed with suboptimal CVs.For a fixed bias deposition rate of 200 ns −1 , we gradually reduce the quality of the CV by rotating it with respect to the good CV. Figure 2(b) shows that the estimated MFPT increases as the quality of the CV decreases, for both inference schemes.However, the deviation from the true value is much smaller for ST-iMetaD.The errors remain within an order of magnitude of the true value, in comparison to more than two orders of magnitude for standard iMetaD, even for very poor CVs.We note that with ST-iMetaD, the predicted MFPT may also be slightly underestimated, but remains close to the true value.
We next apply ST-iMetaD in two molecular systems, starting with the well-studied example of alanine dipeptide in vacuum.Alanine dipeptide has two stable conformers, C 7eq and C 7ax , and is usually described by two dihedral angles, ϕ and ψ, with ϕ serving as a good CV and ψ as a suboptimal one 1,2,18,22 (see Ref. 1 for definitions of conformers and angles).
Transitions from the C 7eq conformer to the C 7ax conformer have an estimated MFPT of ∼ 3.5 µs (see SI for more details).We performed MetaD simulations with bias deposition rates in the range of 20 to 1000 ns −1 , and either the ϕ or ψ angle as CV.Full simulations details are given in the SI.
Figure 3(a) shows the estimated MFPT as a funcion of the speedup for simulations biasing the ϕ angle, through the original iMetaD scheme (orange circles) and ST-iMetaD (pink squares).The unbiased MFPT is given in a dashed blue line.The two schemes provide similar, very accurate estimations, even with frequent bias deposition.This confirms that ST-iMetaD is consistent with standard iMetaD, when the latter is reliable.
On the other hand, when ψ, a suboptimal CV, is employed we find a major difference between the schemes, as demonstrated in Figure 3(b).With standard iMetaD (orange circles), the estimated MFPT rapidly increases with speedup (notice the logarithmic scale).It varies from a value of 15.8 µs at a bias deposition rate of 20 ns −1 up to 9930 µs at a rate of 1000 ns −1 , an error of more than three orders of magnitude.However, with ST-iMetaD, we obtain estimations that fall in a much smaller range, 2.0 − 13.6 µs, within an order of magnitude from the true value.We validate the underlying assumptions of ST-iMetaD by examining the survival functions.Panels (c) and (d) of Figure 3 show the survival functions for the unbiased FPT distribution (solid blue lines), the rescaled FPT distributions (dashed green lines), the fit of iMetaD (dash-dotted orange lines) and the fit of ST-iMetaD (pink dotted lines).Results are shown for simulations with a moderate bias deposition rate of 50 ns −1 , which is standard for iMetaD. 1,2The MFPT estimations using this rate are highlighted with black rectangles in panels (a) and (b).Using the ϕ angle as CV, the rescaled survival function decays at a similar rate as the ubniased one (panel (c)).We determine t * = 6.3 µs using the procedure described before, and the two fits coincide.Using the ψ angle as CV, the rescaled survival function quickly deviates from the ubniased one (panel (d)).Nevertheless, they match at short times, as visible in the inset, which zooms-in on t ≤ 0.5 µs.We determine a value of t * = 184 ns, and reduce the error in the predicted MFPT by a factor of ∼ 6.5.
We close this paper with a more complex example -the unfolding of chignolin in explicit water (simulations of 5889 atoms).7][38] A linear combination of six interatomic contacts, optimized via harmonic linear discriminant analysis (HLDA) by Mendels et.al., 25 serves as a good CV.The radius of gyration (Rg) and the C-alpha rootmean-square deviation from a folded configuration (RMSD) serve as examples of suboptimal CVs.All simulation details are provided in the SI.
Figure 4(a) gives the free-energy surfaces (FES) along all CVs, obtained from umbrella sampling simulations (see SI for details) using the weighted histogram analysis method (WHAM). 39The values of the CVs at the initial folded configuration are marked with black stars.We define the first-passage criterion as reaching a value < 0.8 nm for the HLDA-based CV (dashed black line).This process has an estimated MFPT of ∼ 376 ns (see SI for details).
We note, however, that the dynamics for reaching a stable unfolded state (HLDA< 0.2 nm) leads to a MFPT that is longer by an order of magnitude. 22However, since the underlying assumptions of iMetaD are valid for escaping a single energy well, we limit our discussion to overcoming the first energy barrier along the HLDA-based CV.In addition, we note that the suboptimal CVs do not have a second minima in the FES for the stable unfolded state.Moreover, they do not fully distinguish between the unfolded and folded states.Therefore, even we biasing the suboptimal CVs, we use the value of the HLDA-based CV to determine the FPTs.The dotted lines in the middle and right panels of Figure 4   with bias deposition rates in the range of 1 to 50 ns −1 .We observe similar trends as in the previous examples.In all cases, we find that ST-iMetaD leads to a better tradeoff between speedup and accuracy.Most notably, it is able to predict the MFPT rather successfully even for deposition rates where the standard approach leads to large errors.
To summarize, we present ST-iMetaD -an improved inference scheme for iMetaD simulations.We find that the rescaled FPT distribution provides the correct short-time statistics, even for high bias deposition rates and suboptimal CVs.By focusing on these timescales, the time-independent rate of Poisson processes can be estimated reliably, resulting in a better tradeoff between speedup and accuracy in predicting the unbiased MFPTs.
The benefits of ST-iMetaD are demonstrated for the Wolfe-Quapp potential and two molecular systems, an isolated alanine dipeptide molecule and chignolin in explicit water.It reduces the prediction errors by orders of magnitude, especially for simulations with frequent bias deposition or suboptimal CVs.As a result, our method significantly extends the range of applicability of iMetaD, though it will eventually also break for unrealisticly high deposition rates or exceedingly bad CVs.The ST-iMetaD scheme can be applied in post-processing of existing iMetaD data, with no additional cost in comparison to the standard approach.
Since it leads to either improved or comparable accuracy as iMetaD, by construction, we hope it would be widely adopted by the community.Furthermore, the inference scheme is not limited to iMetaD, and could be applied to any enhanced sampling approach based on the iMetaD rescaling scheme, such as OPES-flooding 22 or Variational Flooding. 40,41

Figure 1 :
Figure 1: (a) CDF profiles and (b) survival functions for simulations of the Wolfe-Quapp potential.Results for unbiased FTPS (blue solid lines), rescaled FPTs (dashed green lines), exponential fits to the rescaled CDF in the entire range (orange dash-dotted lines), and linear fits to the survival functions at t < t * (pink dotted lines).Results are shown for iMetaD simulations using a good CV and bias deposition rate of 10 ns −1 (left), a good CV and bias deposition rate of 1000 ns −1 (middle) and a suboptimal CV and bias deposition rate of 200 ns −1 (right).The black dashed lines mark t = t * . (b)).

Figure 2 :
Figure 2: (a) Estimated MFPTs as a function of the speedup for the Wolfe-Quapp potential using a good CV and different bias deposition rates from 10 to 1000 ns −1 .(b) Estimated MFPTs for a bias deposition rate of 200 ns −1 , and different choices of CV.The blue lines mark the unbiased MFPT value.We employed either standard iMetaD (orange circles) or ST-iMetaD (pink squares).The black ellipses highlight overlapping points in the two panels.

Figure 3 :
Figure 3: Upper row: Estimation of the MFPT as a function of speesup, for the C 7eq − C 7ax conformer transition of alanine dipeptide in vacuum.Simulations using either (a) the ϕ angle or (b) the ψ angle as CV, with the originial inference scheme (orange circles) or the improved one (pink squares).The black rectangles highlight result obtained with bias deposition rate of 50 ns −1 .Lower row: Survival functions log (1 − P (τ ≤ t)) for unbiased simulations (blue solid lines) and rescaled iMetaD simulations (green dashed lines), biasing either the (c) ϕ angle or (d) ψ angle, with bias deposition rate of 50 ns −1 .Additional lines show the fit obtained using the original scheme (orange, dash-dotted) and the improved scheme (pink, dotted).The black dashed lines mark t = t * .The inset in panel (d) zooms-in on t < 0.5 µs.
(a), show the average values of those CVs when the first-passage criterion is fulfilled.

Figure 4 :
Figure 4: (a) Free-energy of solvated chignolin along the HLDA-based CV (left), the radius of gyration (middle) and the C-alpha RMSD (right).The vertical dashed line marks the first-passage criterion.The dotted vertical lines mark the average radius of gyration and Calpha RMSD at first-passage events.(b) Estimated MFPTs obtained with standard iMetaD (orange circles) and ST-iMetaD (pink squares), using the HLDA-based CV (left), the radius of gyration (middle) and the C-alpha RMSD (right).

Figure 4 (
Figure 4(b) shows the estimated MFPT as a function of speedup, using the different CVs,