ANALYSIS OF THE ACCELERATED WEIGHTED ENSEMBLE METHODOLOGY

The main issue addressed in this note is the study of an algorithm to accelerate the computation of kinetic rates in the context of molecular dynamics (MD). It is based on parallel simulations of short-time trajectories, and its main components are: a decomposition of phase space into macro-states or cells; a resampling procedure that ensures that the number of parallel replicas (MD simulations) in each macro-state remains constant; and the use of multiple populations (colored replicas) to compute several rates (e.g., forward and backward rates) in one simulation. The method enhances the sampling of macro-states associated with transition states, which in traditional MD are likely to be depleted even after short-time simulations. By allowing parallel replicas to carry different probabilistic weights, the number of replicas within each macro-state can be kept constant without introducing any bias, and the empirical variance of the estimated reaction rate, defined as a probability flux, is expected to be reduced. This note is a first attempt towards a better mathematical and numerical understanding of this method. It first provides a mathematical formalization of the notion of colors. Then, the link between this algorithm and several closely related methods proposed in the literature within the past few years is discussed. Lastly, numerical results illustrating the efficiency of the method are provided.

1. Introduction. The computation of reaction rates in the context of molecular dynamics is a well-studied problem that has given rise to a considerable number of publications in the last few years (see [4] for a selective bibliography). As a consequence, there exists a wide range of methods devoted to such computations. In this note, we concentrate on the analysis of one of them, the Accelerated Weighted Ensemble (AWE), whose principle is made precise below.
1.1. Definitions of the reaction rate. From a mathematical viewpoint, there are two ways of defining the phenomenological reaction rate in the context of molecular dynamics. In the general case, the dynamics of the underlying system is modeled by a stochastic process X := (X_t)_{t∈T} on a probability space (Ω, P) with state space E. The notation T refers to the set of time indices. The process X is associated with an underlying linear operator L whose definition depends on those of T and E.
A first definition of the reaction rate can be formulated in terms of eigenvalues and eigenvectors of the operator L. Under appropriate assumptions on the process X, the reaction rate can be related to the second eigenvalue of L, and subsets of E corresponding to metastable sets of the dynamics (for instance, reactant and product states) can be identified. This approach is very general and provides an intrinsic characterization of metastable sets. When the state space E is high-dimensional, however, it is too costly to be applied directly.
A second definition of the reaction rate relies on computing statistics over trajectories of the process X. The rate is defined as the flux of replicas originating from a reactant state A through the boundary of a product state B (both subsets of E) or, equivalently, as the average number of transitions from A to B achieved by a single replica of the system per unit of time. Such a definition suffers from major drawbacks as well. Indeed, when one is interested in reactions occurring at very slow rates, it requires simulations of long-time trajectories which, even when feasible, result in estimators with unacceptably large variances.
1.2. Partition of the state space. To cope with our inability to generate sufficiently long trajectories with good statistics, numerous computational methods have been developed. They aim at computing the reaction rate from short-time local simulations that can be run in parallel and usually involve an underlying coarse partition of the state space into N_S so-called macro-states (S_k)_{1≤k≤N_S}. Once again, we refer to [4] for a comprehensive survey. Among these methods, the class of Markov State Models (see [13, 12, 11]) is perhaps the most popular and illustrative example. Its principle is summarized next. First, one builds an approximate transition matrix P(τ) = (P_ij)_{1≤i,j≤N_S} from short-time simulations of the original system started from the equilibrium initial distribution:

P_ij(τ) = P(X_τ ∈ S_j | X_0 ∈ S_i, X_0 ∼ ρ),

where τ is referred to as the lag time and ρ is the equilibrium distribution associated with the process X. Next, denoting by {µ_k(τ)}_{1≤k≤N_S}, µ_k > µ_{k+1}, the ordered eigenvalues of the matrix P(τ), the reaction rate is defined by

ν = −ln µ_2(τ) / τ.

The estimated reaction rate is thus defined as the exact reaction rate associated with the coarse Markov chain with transition matrix P(τ). A rigorous analysis of the error for such models can be found in [10]. We recall that it embodies two sources of error:
• the assumption that the reduced dynamics is representative of the underlying microscopic dynamics; this is the case only for a very special choice of macro-states (see [14] for a related discussion);
• the parameter τ, whose value needs to be carefully chosen. In general, small values of τ result in a bias due to memory effects (see [12, 10] for example), whereas large values result in an increase of the variance (see [4]).
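Assuming a trajectory already discretized into macro-state indices, the MSM construction above can be sketched as follows. The function name and the two-state usage below are illustrative choices, not taken from [13, 12, 11]:

```python
import numpy as np

def msm_rate(cells, n_states, lag, dt):
    """Estimate the slowest rate from a cell-index trajectory via an MSM.

    `cells` is an integer array giving the macro-state index of the
    trajectory at each saved step; `lag` is the lag time in steps, so
    the physical lag time is tau = lag * dt.
    """
    # Accumulate transition counts at the given lag.
    C = np.zeros((n_states, n_states))
    for i, j in zip(cells[:-lag], cells[lag:]):
        C[i, j] += 1.0
    # Row-normalize the count matrix to obtain the transition matrix P(tau).
    P = C / C.sum(axis=1, keepdims=True)
    # The rate is read off the second-largest eigenvalue modulus.
    mu = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return -np.log(mu[1]) / (lag * dt)
```

For a two-state chain with transition matrix [[0.9, 0.1], [0.1, 0.9]], whose second eigenvalue is 0.8, the estimate approaches −ln(0.8)/τ as the trajectory grows.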
1.3. Weighted Ensemble (WE) methodology. The WE methodology was first introduced in [9]. It is also a way of substituting parallel short-time simulations for long-time ones. However, it does so in a manner that avoids the sources of error mentioned above. In [9] the WE methodology was first applied to systems out of equilibrium. The authors consider a set of replicas whose initial positions lie within the reactant state A. These replicas are then run independently. Each time a replica touches B, it is killed and a new replica is issued in A according to an appropriate probability distribution. The reaction rate is defined as the flux of particles entering B per unit of time.
The main feature of the method is the introduction of a resampling procedure that occurs at fixed times t_k = kτ, where τ is referred to as the resampling time. This step enhances the sampling of macro-states with large free energy that would otherwise naturally tend to be depleted. This is achieved by allowing replicas to carry different probabilistic weights and by defining splitting and merging rules for depleted and overpopulated macro-states respectively, whereby the number of replicas per cell is kept approximately constant after some relaxation time. The method was then extended and applied to systems in a steady state (see [1, 2]). This extension is referred to as the Accelerated Weighted Ensemble (AWE). The key feature of AWE (compared to WE) is that the initial weights of macro-states are computed as approximations of the steady-state weights. This speeds up the convergence by reducing the initial bias.

1.4. A new version of AWE.
A modified version of the AWE methodology was introduced in [4]. This algorithm builds on both WE and AWE and combines two features: the notion of colors and a novel resampling algorithm. The resampling procedure differs from the one involved in WE and AWE; it is optimal in the sense that it generates replicas with constant weights inside each macro-state [4]. The present article is a first step towards a better understanding of this method from both a theoretical and a numerical viewpoint. The main contributions are as follows:
• A more rigorous mathematical definition of the procedure, along with a discussion of the connection between AWE and committor functions.
• A discussion of the connection between AWE and three other related techniques: Markov State Models, Non-Equilibrium Umbrella Sampling, and the Tilting Algorithm.
• Numerical benchmarks on alanine dipeptide to illustrate the technique.
This note is organized as follows. Section 2 is devoted to the analysis of the notion of colors on which the method relies. It contains a mathematical formalization of this concept as well as a discussion of the convergence of what is identified as an extended system. Section 3 focuses on the resampling procedure and provides numerical insight into the influence of key parameters. The link with other methods from the literature is also discussed. Section 4 illustrates the efficiency of the method on a small peptide.

2. The notion of colors. In the steady-state setting considered here, a replica reaching B is immediately moved back to A anew, according to some appropriate distribution. In itself, the idea of using colors is not new. It is implicitly contained in the definition of the reaction rate in terms of path averages as found in [8]. It is also related to the decomposition of steady-state distributions into forward and backward populations as exposed in [3]. In this section we provide an alternative mathematical formalization of this concept. For the sake of clarity, we restrict ourselves to the case of the over-damped
Langevin dynamics

dX_t = −∇V(X_t) dt + √(2β^{−1}) dW_t,

with a potential function V that is sufficiently smooth, where (W_t)_{t≥0} is a standard Brownian motion and β is the inverse temperature. The presentation, however, extends to alternative choices of T, E and L.
2.1. New dynamics. The main idea of our approach consists in introducing a process (I_t)_{t≥0} such that, when the replica is blue (resp. red) at time t, I_t = −1 (resp. I_t = 1). We will therefore consider Fokker-Planck equations corresponding to the extended process (Y_t)_{t≥0} := (X_t, I_t)_{t≥0}. To provide colors to the walkers, we introduce an initial distribution µ̃_0(dx, i) of the extended process on R^d × {−1, 1}. We also define the initial distribution of positions, namely µ_0(dx) = Σ_i µ̃_0(dx, i).
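A minimal one-dimensional sketch of the extended process (X_t, I_t) may help fix ideas. The double-well potential, the sets A = {x < −1} and B = {x > 1}, and all numerical values below are illustrative choices, not taken from the text:

```python
import numpy as np

def colored_langevin(n_steps, dt=1e-3, beta=3.0, seed=0):
    """Euler-Maruyama sketch of the extended process (X_t, I_t) in 1D.

    Double-well potential V(x) = (x^2 - 1)^2; the reactant set A and
    product set B are taken as x < -1 and x > 1 (illustrative choices).
    A walker becomes blue (I = -1) on entering A and red (I = +1) on
    entering B; its color is frozen everywhere else.
    """
    rng = np.random.default_rng(seed)
    grad_V = lambda x: 4.0 * x * (x**2 - 1.0)
    x, color = -1.2, -1  # start inside A, hence blue
    traj = []
    for _ in range(n_steps):
        # Overdamped Langevin step: drift -grad V plus thermal noise.
        x += -grad_V(x) * dt + np.sqrt(2.0 * dt / beta) * rng.normal()
        if x < -1.0:
            color = -1   # touching A resets the color to blue
        elif x > 1.0:
            color = +1   # touching B resets the color to red
        traj.append((x, color))
    return traj
```

The pair (x, color) recorded at each step is exactly a discrete sample of Y_t = (X_t, I_t).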
Then, we denote by ψ_{µ̃_0}(t, x, i) the probability density function associated with this extended process at time t with initial distribution µ̃_0 (we drop the subscript µ̃_0 in what follows). The function ψ(t, x, i) is related to the probability density function associated with the process X, namely

ψ(t, x) = ψ(t, x, −1) + ψ(t, x, 1).

The function ψ(t, x, i) is the solution to the following system of partial differential equations:

∂_t ψ(·, i) − div(β^{−1}∇ψ(·, i) + ∇V ψ(·, i)) = 0 on R^d \ (A ∪ B), for i ∈ {−1, 1},
ψ(t, ·, −1) = ψ(t, ·), ψ(t, ·, 1) = 0 on ∂A,
ψ(t, ·, 1) = ψ(t, ·), ψ(t, ·, −1) = 0 on ∂B,      (3)

where the aggregate function ψ is the solution to ∂_t ψ − div(β^{−1}∇ψ + ∇V ψ) = 0 on R^d, with initial condition ψ(0, ·) = µ_0 (the aggregate initial measure on positions). This set of equations can be interpreted in the following way. There are basically two functions ψ(·, −1) and ψ(·, 1), one for each color. At the boundaries of A and B they must match the aggregate distribution ψ: these are the second and third conditions of system (3) above.
2.2. Decomposition into color probabilities. Let us now rewrite

ψ(t, x, −1) = q(t, x) ψ(t, x),   ψ(t, x, 1) = (1 − q(t, x)) ψ(t, x),

where q(t, x) is the probability that a replica located at x at time t has the color blue (−1). Inserting these expressions into (3), we obtain

∂_t q − β^{−1}Δq − (2β^{−1}∇ ln ψ + ∇V) · ∇q = 0.   (4)

Solving system (3) is thus equivalent to solving the coupled system composed of the classical Fokker-Planck equation for the aggregate probability density ψ and of equation (4), the latter describing the dynamics of the colors.
2.3. New stationary distributions. It is worth noting that the process Y is ergodic as well. As a consequence, there exists a function ψ_∞ such that, for all (x, i) ∈ R^d × {−1, 1}, we have ψ(t, x, i) → ψ_∞(x, i) =: ρ̃(x, i) when t → +∞. Obviously, we have, for all x,

ρ̃(x, −1) + ρ̃(x, 1) = ρ(x),

where ρ denotes the invariant measure associated with the process X. The extended invariant measure ρ̃ is in fact a solution of the stationary counterpart of system (3), denoted (6) below, where the aggregate function ρ is a solution to div(β^{−1}∇ρ + ∇V ρ) = 0 on R^d. Applying the same trick as above to ρ̃, that is, rewriting

ρ̃(x, −1) = q_∞(x) ρ(x),

we can recast system (6) into a new system, the second equation of which writes

β^{−1}Δq_∞ − ∇V · ∇q_∞ = 0 on R^d \ (A ∪ B),   q_∞ = 1 on ∂A,   q_∞ = 0 on ∂B.

This equation for q_∞(x) is exactly the one satisfied by the backward committor function q^− (see [8]). The forward committor q^+ is defined as the probability to reach B before A, whereas the backward committor q^− is defined as the probability that the system last came from A. We have q^− = 1 − q^+ for reversible systems.

2.4. Convergence of the conditional distribution. One important feature of this framework is that we can decouple the convergence of ψ(t, x, 1) + ψ(t, x, −1) = ψ(t, x) to ρ(x) from the convergence of q(t, x) (that is, the probability of being blue at location x) to q_∞(x). More precisely, let us assume that we initially distribute replicas according to ρ(x) and then choose, by some procedure, whether each replica is red or blue. With this approach we immediately have ψ(t, x) = ρ(x) for all t. The convergence is therefore governed by q(t, x), which now satisfies equation (4) with ψ(t, x) replaced by ρ(x). As a result, the convergence is governed by the second eigenvalue of the operator L set on a different domain, namely R^d \ (A ∪ B), with "holes" around A and B.
The eigenvector corresponding to this eigenvalue is associated with a significantly less metastable set than the one associated with the second eigenvalue of the original operator. Therefore the convergence (relaxation time) is much faster than for ψ(t, x).
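In one dimension, the stationary equation for q_∞ can be solved in closed form by quadrature, which gives a quick way to visualize the backward committor. The potential, the interval and the function name below are illustrative choices:

```python
import numpy as np

def backward_committor_1d(V, beta, x_a, x_b, n=2001):
    """Backward committor q_inf on (x_a, x_b) for 1D overdamped Langevin:
    beta^{-1} q'' - V'(x) q' = 0, with q(x_a) = 1 and q(x_b) = 0.

    Closed-form solution by quadrature: q(x) is proportional to the
    integral of exp(beta V(s)) from x to x_b.
    """
    xs = np.linspace(x_a, x_b, n)
    w = np.exp(beta * V(xs))
    # Cumulative trapezoidal integral of w from x_a up to each grid point.
    cum = np.concatenate(([0.0], np.cumsum((w[1:] + w[:-1]) / 2.0 * np.diff(xs))))
    return xs, 1.0 - cum / cum[-1]
```

For a symmetric double well, q_∞ decreases monotonically from 1 at ∂A to 0 at ∂B and equals 1/2 at the barrier top, as expected.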

3. A new resampling procedure. We now present the main steps of the resampling algorithm [4]. First, positions, colors and weights of replicas are initialized.
At time t = 0, N_C replicas are created in each macro-state, with initial positions generated according to ρ_k(dx) := ρ(dx)/ρ(S_k), namely the equilibrium conditional distribution. Then, as explained above, we need to specify how the M walkers are assigned a color at time t = 0. Obviously, the ones in A and B are blue and red respectively. The difficulty concerns the replicas whose initial positions lie in R^d \ (A ∪ B). Since the number of replicas of both colors in each cell is to be maintained constant, the problem amounts to determining which weight should be given to red and blue walkers respectively. A possible choice [2] consists in using steady-state weights of macro-states derived from either 1) a Markov State Model, or 2) the transition matrix computed using some initial sampling data with AWE.
From either transition matrix, we can compute an approximate steady-state weight distribution for all cells and use these weights to determine the colors of the replicas. At the end, each cell contains N_C/2 red (resp. blue) walkers with equal weights, proportional to the conditional probability that a walker belonging to that cell is red (resp. blue). At this stage, we have a collection of M = N_S N_C independent weighted and colored particles, represented by the triples (X^k_0, I^k_0, w^k_0)_{1≤k≤M} of positions, colors and weights. The trajectories of these replicas are then run independently until time t = τ. At this time, the resampling procedure is applied to both blue and red replicas. The number of blue (resp. red) replicas N^q_{b,τ}(ω) (resp. N^q_{r,τ}(ω)) in each macro-state S_q is computed, as well as the corresponding total weight

w^q_{b,τ}(ω) = Σ_{k : X^k_τ(ω) ∈ S_q, I^k_τ(ω) = −1} w^k_τ(ω),

where b stands for blue and is associated with I^k_τ(ω) = −1. Then the algorithm proceeds as follows. In each cell, a target weight for blue replicas is computed: for all 1 ≤ q ≤ N_S,

ŵ^q_b(ω) = w^q_{b,τ}(ω) / (N_C/2).

Then, supposing that X^k_τ ∈ S_q, we create on average w^k_τ(ω)/ŵ^q_b(ω) new replicas with equal weight ŵ^q_b(ω), with initial positions X^k_τ (see [4] for details). This is achieved by applying either a splitting or a merging procedure. Note that this procedure differs from [9]: in [9] the weights of the particles after resampling are not constant in each cell but exhibit fluctuations, whereas here all particles in cell S_q carry the same probabilistic weight ŵ^q_b(ω). These new replicas are then run independently until t = 2τ, and so on. Along the simulation, the estimated reaction rate is defined as the average probability flux of blue walkers per unit of time.
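The split/merge step inside one cell can be sketched as follows. This is a generic systematic-resampling implementation that preserves total weight and outputs equally weighted replicas; the optimal scheme of [4] differs in its details but shares this invariant:

```python
import numpy as np

def resample_cell(weights, n_target, rng):
    """Split/merge the replicas of one color in one cell so that the
    n_target survivors all carry the same weight (total weight preserved).

    Returns the indices of the parent replicas to copy and the common
    new weight. Minimal sketch using systematic resampling.
    """
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    # Systematic resampling: n_target equally spaced points in [0, total).
    positions = (rng.random() + np.arange(n_target)) * total / n_target
    cumsum = np.cumsum(weights)
    parents = np.searchsorted(cumsum, positions)
    return parents, total / n_target
```

A replica with weight larger than the target is selected several times (splitting); light replicas may be selected zero times (merging), yet the expected number of copies of replica k remains proportional to its weight, which is what keeps the procedure unbiased.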

3.1. Relation to other methods. The AWE algorithm is closely related to other methods. We have already mentioned MSM. In that case, the main difference is that AWE is able to converge to the exact answer; that is, it does not rely on the Markovian approximation. This comes at the cost of a greater computational time in general, since the statistics (i.e., the macro-state weights) need to be converged globally. The two methods become identical if one only performs a small number of resampling steps, in situations where the statistics are independent of the initial distribution of walkers and the Markovian approximation of MSM is accurate.
When τ = ∆t (a single short time step of the MD integrator), AWE becomes similar to non-equilibrium umbrella sampling (NEUS) [16, 5, 6, 7] and the tilting algorithm (TA) [15]. In both NEUS and TA, when a replica leaves a macro-state, a new one is immediately created. This is done by recording the locations where replicas leave macro-states, and then choosing randomly, with some appropriate probability, from this dataset of "re-entry" points. This allows a correct, unbiased calculation. It is typically done in an out-of-equilibrium setting where, for example, replicas in region A have a weight of 1 and absorbing boundary conditions (weight of 0) are used for B. The two methods differ in the way the probabilistic weights of the macro-states are obtained.
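The re-entry bookkeeping that NEUS and TA require, and that AWE avoids, can be sketched as follows. The class and method names are hypothetical, for illustration only:

```python
import numpy as np

class ReentryBank:
    """Minimal sketch of the NEUS/TA bookkeeping: exit points are stored
    per macro-state, and re-entry positions are drawn from them with
    probability proportional to their recorded weights.
    """

    def __init__(self):
        self.points = {}  # macro-state index -> list of (position, weight)

    def record_exit(self, cell, position, weight):
        # Remember where (and with what weight) a replica left this cell.
        self.points.setdefault(cell, []).append((position, weight))

    def sample_reentry(self, cell, rng):
        # Draw a stored exit point with probability proportional to weight.
        entries = self.points[cell]
        w = np.array([e[1] for e in entries])
        k = rng.choice(len(entries), p=w / w.sum())
        return entries[k][0]
```

In AWE this database is unnecessary: the same effect is obtained by the weight adjustments of the split/merge procedure.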
When τ = ∆t, AWE performs a resampling at every step. For most steps, no replica leaves or enters a macro-state, and therefore nothing is done. However, when a replica enters or leaves, the resampling modifies the replicas. This aspect is similar to NEUS and TA. The procedure itself, however, is different, as illustrated in Figure 1.
Let us start with the case where a replica exits a macro-state. In that case, the resampling algorithm randomly chooses a replica inside the macro-state and splits it. This corresponds, in effect, to a process in which a replica is allowed to jump randomly to the inside of the macro-state when it reaches the boundary. Note how this differs from NEUS and TA, where a new replica is always chosen randomly on the boundary. Clearly, if our process were limited to this, it would lead to a bias whereby boundary states would not be sampled appropriately (the probability density on the boundary would be lower than it should be).
Consider now the case where a replica enters a macro-state. In that case, the resampling algorithm chooses between two options: either a replica inside the macro-state is killed (which corresponds, in effect, to a replica inside the macro-state jumping to the boundary), or the entering replica is killed and the probabilistic weights of the remaining replicas are adjusted. This process corrects for the bias mentioned in the previous paragraph, so that the sampling becomes unbiased and follows the correct steady-state probability.
AWE is simpler to implement than NEUS and TA since it does not require storing a database of re-entry points (the positions of replicas when they exit a macro-state). The correct sampling is obtained simply by adjusting the probabilistic weights of the replicas through the merging/splitting algorithm.

3.2. Influence of the parameters. The algorithm described above mainly depends on two parameters: the coarse partition of the state space E and the resampling time τ. To our knowledge, there exists no comprehensive analysis of the influence of these parameters in the context of weighted ensemble methodologies. To what extent are they likely to impact the efficiency of the method? Let us focus on the parameter τ. We consider a toy model broadly investigated in the literature. The system is driven by the over-damped Langevin dynamics defined above with potential function V(x, y) = (5/2)(x^2 − 1)^2 + 5y^2, and Neumann boundary conditions. The value of the parameter β is fixed to β = 2.6. The dynamics is discretized using an Euler-Maruyama scheme with time step ∆t = 0.25. At each time, the approximate reaction rate is computed as an average over a window of width 100∆t. Figure 2 shows different approximations of the reaction rate for several values of τ.

As Figure 2 shows, the choice of τ impacts the efficiency of the method, measured in terms of confidence-interval width. Note, indeed, that when τ → +∞ no improvement can be expected, since the method then coincides with brute-force simulation. More interestingly, when τ → 0, the gain seems to reach a plateau. The value of τ at which this phenomenon occurs depends both on ∆t and on the underlying partition. Recall also the connection between AWE and the related methods discussed above in the limit case τ = ∆t.

4. Numerical test. In this section, the algorithm described in Section 3 is applied to alanine dipeptide, a small peptide consisting of 22 atoms. Its free energy landscape is represented on a two-dimensional plane spanned by two dihedral angles denoted by φ and ψ (Fig.
3). The partition of state space is made of 144 cells, corresponding, in terms of φ and ψ, to squares of width 30 degrees. The slowest reaction within this system is the transition from configuration C_7eq to configuration C_7ax, both identified on Figure 3 as collections of macro-states. These configurations correspond to the subsets A and B mentioned above. Resampling was applied every τ = 10 ps. The constant target number of walkers per cell is N_C = 30. Between resampling steps, the dynamics of alanine dipeptide is simulated by a discrete approximation of Langevin dynamics at temperature T = 300 K, as implemented in Gromacs 4.5.3, with the Amber96 force field and the OBC Generalized Born implicit solvent.

4.1. Convergence of fluxes. The convergence of the method is illustrated on Figure 4. On each subfigure, the blue curve corresponds to the estimation of the reaction rate obtained with AWE. The red curve refers to a reference value computed from long-time brute-force simulations (more precisely, 100 independent trajectories on a 100 ns-long time interval).

4.2. Comparison with Markov State Models. Another means of checking the validity of the approach is to compare with Markov State Model (MSM) predictions. In this case, we chose small macro-states such that MSM produces reliable predictions and the error due to non-Markovianity is small. We computed the transition matrix using the following formulas (L is the total length of the simulation). Denoting by χ_i the characteristic function of macro-state i, by (n, s) replica s at step n, by r^k_{n,s} the position of replica r^0_{n,s} at step n − k, and by w_{n,s} the weight of a replica, the weighted count matrix is

C_ij(kτ) = Σ_{n≤L} Σ_s w_{n,s} χ_i(r^k_{n,s}) χ_j(r^0_{n,s}).

Then the transition matrix with lag time kτ is obtained by normalizing the rows of C(kτ). The rate is then given by the second eigenvalue µ_2 as −ln µ_2/τ. On Figure 5, the rates are compared when N_S = 144. In both cases, the rate converges to 9.6/ns. The values are close to each other, and match approximately the reference flux from C_7ax to C_7eq (estimated from brute-force
simulations, independently of any MSM), namely 5.88/ns. Note that −ln µ_2/τ is not expected to match the flux exactly, since this requires µ_3/µ_2 ≪ 1. For AWE, at each resampling step, we computed the transition matrix using the AWE walkers and estimated the second eigenvalue µ_2. The convergence corresponds to the relaxation of the cell weights towards the steady state. It shows that the rate converges after about 900τ.
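The weighted transition-matrix estimate used for this cross-check can be sketched as follows. The function names are illustrative, and the normalization choice (never-visited cells left as zero rows) is an assumption of this sketch:

```python
import numpy as np

def weighted_transition_matrix(pairs, weights, n_states):
    """Transition matrix estimated from weighted walker data: each entry
    accumulates the weight of walkers going from cell i (at the earlier
    time) to cell j (one lag later), then rows are normalized.
    """
    C = np.zeros((n_states, n_states))
    for (i, j), w in zip(pairs, weights):
        C[i, j] += w
    row = C.sum(axis=1, keepdims=True)
    row[row == 0.0] = 1.0  # leave never-visited rows as zero rows
    return C / row

def rate_from_matrix(T, lag_time):
    """Slowest rate as -ln(mu_2)/lag_time, mu_2 the second eigenvalue."""
    mu = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
    return -np.log(mu[1]) / lag_time
```

Compared with the count-based MSM estimate of Section 1.2, the only difference is that each observed transition contributes its walker weight rather than a unit count, which is what makes the estimate consistent with the AWE resampling.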

Conclusion.
In this note, elements of analysis of the version of AWE introduced in [4] have been given. First, a mathematical definition of the concept of colors has been provided. Then, the influence of the frequency at which resampling occurs has been examined, as well as the connection between AWE methodologies and alternative methods. Finally, the good behavior of the algorithm has been illustrated on a simple but realistic case.

Figure 1. Special case of the general AWE algorithm with a resampling at every step. This figure illustrates how the resampling process leads to replicas jumping in (replica entering the macro-state) and out (replica exiting the macro-state) at the boundary. Panel labels: jump to ∂C_i from C_i; jump to C_i from ∂C_i.

Figure 2. Estimations of fluxes and related confidence intervals for several values of the parameter τ.

Figure 3. Ramachandran plot of alanine dipeptide with a 30-degree-wide grid partition. Cool colors stand for regions with high free energy.

Figure 4. Flux from C_7eq to C_7ax (left) and backward reaction rate (right). The error bars correspond to one standard deviation.

Figure 5. Rate of the slowest process estimated by −ln µ_2/τ using MSM and AWE. MSM is assumed to provide the reference value in this case.