An efficient time-stepping scheme for ab initio molecular dynamics simulations

In ab initio molecular dynamics simulations of real-world problems, the simple Verlet method is still widely used for integrating the equations of motion, while more efficient algorithms are routinely used in classical molecular dynamics. We show that if the Verlet method is used in conjunction with pre- and postprocessing, the accuracy of the time integration is significantly improved with only a small computational overhead. The validity of the processed Verlet method is demonstrated in several examples including ab initio molecular dynamics simulations of liquid water. The structural properties obtained from the processed Verlet method are found to be sufficiently accurate even for large time steps close to the stability limit. This approach results in a 2x performance gain over the standard Verlet method for a given accuracy.

In ab initio molecular dynamics simulations of real-world problems, the simple Verlet method is still widely used for integrating the equations of motion, while more efficient algorithms are routinely used in classical molecular dynamics. We show that if the Verlet method is used in conjunction with pre-and postprocessing, the accuracy of the time integration is significantly improved with only a small computational overhead.

The validity of the processed Verlet method is demonstrated in several examples
including ab initio molecular dynamics simulations of liquid water. The structural properties obtained from the processed Verlet method are found to be sufficiently accurate even for large time steps close to the stability limit. This approach results in a 2× performance gain over the standard Verlet method for a given accuracy.

I. INTRODUCTION
Classical and ab initio molecular dynamics (AIMD) simulations are among the most common methods for theoretical studies of complex systems at the atomistic level. In the former approach, the interatomic forces are usually given explicitly by a sum of pair interactions in analytic form, while the latter approach requires quantum-mechanical calculations to evaluate the forces 1,2 . AIMD is gaining popularity in the last decades, because the accuracy of AIMD is generally higher than the classical approach. On the other hand, AIMD is orders of magnitude more expensive, which poses a serious obstacle to its wide use in industry.
In both approaches, the equations of motion for the atoms are typically integrated numerically using the symplectic integrators. This is because these integrators possess the volume-preserving property in phase space, thus leading to the long-term stability of the simulations [3][4][5] . In classical molecular dynamics simulations, symplectic integrators are often implemented in the form of a multiple time step algorithm 6 . This algorithm allows us to use large time steps for computationally expensive long-range interactions, while inexpensive short-range ones are integrated with small time steps. In contrast, single time step symplectic integrators, such as the Verlet method [6][7][8] , are most commonly used in AIMD simulations, because the interatomic forces from ab initio calculations cannot be divided into short-and long-range components exactly.
Theoretical analysis reveals that the time step h must satisfy h < T /π in the Verlet method 9 , where T is the period of the fastest motion in the system. In many of the classical and ab initio studies on real-world problems, however, the time step satisfies h < T /15, which is significantly smaller than T /π. This is mainly because the use of a too large time step leads to artifacts such as large fluctuations in the total energy and violation of the equipartition theorem 10 . Nevertheless, it is empirically known that molecular dynamics simulations using large time steps often yield surprisingly accurate results [11][12][13] . Researchers are making constant efforts to understand and exploit the robustness of various time integrators at large time steps [14][15][16][17][18][19][20][21][22][23] .
In this paper, we investigate a simple extension of symplectic integrators called the processing technique. This algorithm was originally developed by mathematicians several decades ago [24][25][26][27] , but has received little attention to date in spite of its solid theoretical background. As will be shown below, the processing technique allows us to increase the size of time steps by a factor of two at no extra cost, while keeping the accuracy, and thus significantly extends the applicability of AIMD simulations 28 .
The rest of the paper is organized as follows. In Sec.II, we present the basic theory of the processed integrator, and estimate the computational costs associated with processing.
Numerical examples demonstrate the effectiveness of this algorithm for realistic simulations in Sec.III. In Sec.IV, we discuss several possible extensions of the algorithm, and present our conclusions.

A. Verlet method
The classical Hamiltonian for a system of N atoms is given by where q and p are vectors of dimension 3N, representing atomic positions and momenta, M is the mass matrix, and U(q) is the potential energy. Then, the time evolution of any function A(q, p) is given by where the Liouville operator L H is defined by 6 In particular, q(t) and p(t) satisfy the equations of motion, A formal solution of these equations is given by 6 It is easy to prove that the total energy H(q, p) is conserved along the trajectory (q(t), p(t)), i.e., d dt H(q, p) = {H, H} = 0.
In general, Eq.(6) cannot be calculated analytically, and thus must be evaluated numerically.
To this end, L H is first divided into two parts; the kinetic term and the potential term where the force is defined by f = −∂U/∂q. Then, using the Suzuki-Trotter decomposition, we obtain where h is the time step, and W h is given by If the O(h 3 ) terms are neglected, the time evolution of (q, p) can be written as where the subscript denotes the time-step number, i.e., (q n , p n ) = (q(nh), p(nh)). The right-hand side of Eq.(12) can be calculated explicitly 6 , which leads to the (velocity) Verlet method: This integrator is both symplectic and time-reversible. These properties are crucial for longterm stability of the simulations 6 . Moreover, only one force evaluation is required per step in the Verlet method.
It is also possible to use higher-order integration schemes [39][40][41][42] , which require multiple force evaluations per step. These integrators can significantly improve the accuracy for small time steps. However, the Verlet method is more robust at large time steps 41,42 which are commonly used in AIMD to minimize the computational cost. Therefore, the Verlet method is still the method of choice for AIMD. In the following, we focus on the Verlet method, and assume the use of the microcanonical ensemble unless otherwise noted. Extensions to more general cases will be discussed in Sec.IV.

B. Shadow Hamiltonian
When the Verlet method is used to follow the time evolution of the system, the Hamiltonian is no longer a constant of motion, because Eq. (7) holds only approximately. However, any symplectic integrator, including the Verlet method, is known to possess a conserved quantity called the shadow (or modified) Hamiltonian H S (q, p) 43,44 . While the explicit form of H S is not known in general, a series expansion in powers of h is valid under mild assumptions 43,44 : In particular, the lowest-order term H (2) corresponding to the Verlet method is given by 43,44 where the Hessian matrix H is defined by H ij = ∂ 2 U/∂q i ∂q j , and the Hessian-vector product (HM −1 p) can be calculated according to Appendix A. Higher-order terms are also available in the literature 45 .
In Fig.1 [49][50][51][52] . However, as will be shown below, we can go a step further and take advantage of the shadow Hamiltonian to construct an integrator which provides a more accurate trajectory for a given time step.

C. Processed integrator
If we invert Eq.(16), we obtain which implies that the fluctuations of the total energy are dominated by H (2) . Here we show how to construct an accurate integrator by direct optimization of H (2) through the introduction of pre-and postprocessing [24][25][26][27] . This approach allows us to minimize the effect of time-step size, while the computational cost per time step remains the same. In this approach, the time evolution of (q n , p n ) is calculated in three steps: 1. Preprocessing is defined by a symplectic transformation of the form where and H χ is the auxiliary Hamiltonian to be described later. Q n and P n are intermediate variables with no physical meaning.
2. Time integration is performed as follows: where W h is a symplectic approximation to exp(ihL H ), given, e.g., by Eq.(11).
3. Postprocessing is the inverse of preprocessing: The entire propagator Ψ can be expressed as which also preserves the symplectic structure. Then, we can easily show that When an initial set of (q 0 , p 0 ) is given, we first calculate (Q 0 , P 0 ) according to Eq.(19).
The flow of the algorithm is shown in Fig.2.
At this point, we discuss the choice of H χ which appears in the definition of pre-and postprocessing. Assuming that the Verlet method is used for time integration, we adopt the where λ is an arbitrary constant 54 . Then, the lowest-order term of the shadow Hamiltonian corresponding to the processed Verlet method of Eq.(23) is given by 44,45,53 The value of λ should be chosen to minimize the fluctuations of H (2) (λ). To this end, we assume that U(q) is a quadratic function of q, where U 0 is a constant matrix. Then, using f = −U 0 q and H = U 0 , we can prove that Thus, if we choose λ = 1/16, dH (2) /dt = 0 will hold, which is considered the optimal choice in terms of total energy conservation 44,45,53 . In the following, the same value of λ will be used for more general potential functions, including AIMD simulations.

D. Computational cost
By construction, the processed integrator presented in the previous section improves the conservation of the total energy for a given time step, or, alternatively, allows the use of a larger time step for a given accuracy. However, this advantage would be lost if the overhead of processing were significant. Here we compare the computational costs of the Verlet method with and without the processing.
When the standard Verlet method is used to integrate the equations of motion, we obtain (q n , p n ) and the corresponding values of H(q n , p n ) = K(p n ) + U(q n ) at the expense of one force evaluation per step. At first glance, it may appear a significantly more complicated task to calculate the values of (q n , p n ) and H(q n , p n ) at each step of the processed Verlet method. As will be shown below, however, these values can be obtained without additional effort if the postprocessing is carefully implemented.
We first note that the cost of the time-stepping procedure, Eq. (21), is equal to the underlying integrator, i.e., one force evaluation (f(Q n )). Moreover, the preprocessing, Eq. (19), needs to be performed only once to calculate (Q 0 , P 0 ) at the beginning of the simulation, and is canceled out by the postprocessing in subsequent steps. Therefore, the computational cost of preprocessing can be safely ignored. The numerical implementation of the preprocessing is presented in Appendix B. We also note that when we start from random initial conditions, preprocessing may even be omitted altogether 44 .
At variance with preprocessing, the postprocessing is required much more frequently, and thus the numerical integration of Eq. (22) is prohibitive. Instead, we rely on a truncated series expansion in powers of h 53 : While this procedure preserves the symplectic structure only approximately, the errors in (q n , p n ) do not accumulate with n, as is evident from Fig.2. Therefore, the use of Eqs. (29) and (30) does not affect the long-term stability of the integrator. Moreover, the error terms of O(h 4 ) are smaller than those of the Verlet method. We also note that the cost of evaluating Eq. (29) is negligible, since f(Q n ) has already been calculated in the time-stepping procedure.
Therefore, the values of q n are available at no extra cost, which may be used to calculate various properties such as the radial distribution functions.
In contrast, a Hessian-vector product is required to calculate p n using Eq.(30). The cost of this procedure is comparable to one force evaluation, as explained in Appendix A, which offsets the gain from the use of a larger time step. A simple solution to this problem is to use an alternative expression based on the finite-difference approximation 53 : which has larger O(h 4 ) errors, but may be calculated at negligible cost. As will be shown in Sec.III, Eq.(31) is sufficiently accurate for a posteriori analysis of the trajectories obtained from microcanonical simulations.
In order to obtain the values of U(q n ) at each time step with minimal overhead, we propose to use an expansion of the form This method is similar in spirit to the work of Zhang 55 . In what follows, Eqs. (31) and (32) will be referred to as the cheap approximations.
In summary, the processed Verlet method allows us to generate a trajectory (q n , p n , and H(q n , p n )) at the expense of only one force evaluation per step if Eqs. (29), (31), and (32) are used. The accuracy of this method is compared with that of the original method in Sec.III.

A. Harmonic oscillator
We first explore the basic properties of the processed Verlet method using a onedimensional harmonic oscillator which has been studied extensively in the past 56,57 . The Hamiltonian is given by where m and ω denote the mass and frequency, respectively. Then, from Eq. (25), we obtain The corresponding preprocessing can be written as Similarly, the postprocessing is given by Moreover, the Verlet integrator is given by with Now the entire propagator Ψ 1D can be written as with which also preserves the symplectic structure 6 . The exact shadow Hamiltonian corresponding to Ψ 1D is also available: with If we choose λ = 1/16, the O(h 2 ) terms of β vanish, regardless of the value of ω, thus minimizing the violation of energy equipartition arising from the use of a finite time step.
On the other hand, if λ = 0 is used, Eq.(41) reduces to the well-known formula for the standard Verlet method 22,56,57 . In Fig.3(a), we compare the trajectories in phase space with and without the processing. The trajectory of the processed Verlet method is nearly indistinguishable from the exact solution.
We now turn to the dynamical behavior of (q n , p n ) generated by the processed Verlet method. The eigenvalues of Ψ 1D can be written as exp (±ihω), where the modified frequencỹ ω is defined byω This result is the same as that for the standard Verlet integrator 56,57 . In particular,ω does not depend on λ, which implies that the use of processing does not improve the dynamics 44 .
This is also evident from Fig.3(b), where the errors in the amplitude are significantly reduced, while the phase errors remain uncorrected. We note in passing that Eq.(43) imposes a limit on the maximum size of h, as mentioned in Sec.I. Sinceω is a real number, h must satisfy hω < 2, or, equivalently, where the period T is defined by T = 2π/ω. We also note that the use of processing has no effect on the maximum stability limit, which is dominated by the underlying integrator, i.e., the Verlet method. All simulations were started from the same initial conditions (q 0 , p 0 ), which were obtained after equilibration, and lasted for 1.2 ns in the microcanonical ensemble. Preprocessing was performed numerically exactly, while the postprocessing was approximated as discussed in Sec.II D.
We first investigate the accuracy of numerical integration with and without the processing.
In Fig.4, we show the fluctuations of the total energy, average potential energies, and average temperatures, each as a function of time step. The fluctuations were calculated as the standard deviation from a linear fit to the total energy. The drift was negligibly small in all runs. Figure 4 suggests that the numerical accuracy of the Verlet method using a time step of h is comparable to that of the processed Verlet method using 2h. Moreover, the effect of cheap approximation is found to be small for both atomic momentum (Eq.(31)) and potential energy (Eq.(32)). Therefore, the processed Verlet method is more efficient than the original method by about a factor of two, in agreement with previous studies 53 .
Now we compare the structural properties obtained from each run. In Table I, where we set r max = 7.5Å, and the reference values (g ref OO , g ref OH , and g ref HH ) are the results for h = 0.3 fs without processing. The residual errors shown in Fig.5(a) suggest that although the processed Verlet method is more accurate than the standard Verlet method, the improvement is relatively small, being comparable to a time step reduction of only 0.3 fs. To make this point more explicit, we compare g OO (r) for h = 2.1 fs with and without the processing in Fig.5(b). This figure indicates that the intermolecular structure of this system can be described with reasonable accuracy using large time steps near the stability limit, even if no processing is applied. This is mainly because the intermolecular structure of liquid water is relatively insensitive to the intramolecular vibrations, as demonstrated by the success of rigid water models 61,62 .
We now turn to the dynamical properties of this system. To this end, we have calculated the power spectra and self-diffusion coefficients, as shown in Figs.6 and 7. The power spectrum I(ω) is defined by where v i (t) is the velocity of atom i at time t, and the angle brackets denote the autocorrelation function. The self-diffusion coefficient D self is given by 64 As already mentioned in Sec.III A, there is no reason to expect that the dynamical properties are improved by the use of processing. Our results for I(ω) and D self are consistent with this observation. In particular, the high-frequency part of the spectra shown in Fig.6(a) exhibits a substantial blue shift at large time steps regardless of the use of processing, while the low-frequency part (below 1000 cm −1 ) remains the same in all cases. We have found, however, that the peak positions of the spectra in the limit of zero time step can be accurately estimated by a simple correction formula, which corresponds to the inverse of Eq.(43). Here ω 0 denotes the frequency at zero time step. We compare the power spectra with and without the correction in Fig.6(b), where the corrected results show excellent agreement with each other. We also note that Eq. (48) is valid whether or not the processing is applied. Therefore, this procedure is also useful for estimating the correct peak positions in conventional molecular dynamics simulations. As is evident from Fig.7, the self-diffusion coefficient remains nearly constant at small time steps (h < 1 fs), and grows slowly with h at larger time steps in both methods. These results are consistent with the claim that the use of processing has no apparent effect on the dynamical properties 44 .

C. Liquid water: an ab initio study
In principle, the processing technique should be equally valid for AIMD simulations.
However, we are not aware of any previous work in this direction. Here we study the effect of processing on the performance of AIMD simulations for liquid water. Liquid water at The initial conditions for (q, p) were adjusted to give the same total energy as for the experimental masses, and after re-equilibration, data were collected for 30 ps.
In Fig.8, we show the fluctuations of the total energy for all runs. Although the results show some scatter, the processed Verlet method is approximately twice as efficient as the Verlet method, in agreement with the classical case. An additional gain is obtained by the mass scaling method 80,81 . In particular, the accuracy of the processed Verlet method using h = 2.0 fs and the optimized masses is comparable to that of the standard Verlet method using h = 0.6 fs. Moreover, the use of cheap approximations results in only small changes in accuracy. Figure 9 shows the time evolution of the total and potential energies with and without the processing.
We now compare the average lengths of covalent O-H bonds from AIMD simulations in Table II  However, all runs give very similar results for the first peak at 2.8Å. This is not surprising considering the small errors in the classical case shown in Fig.5.

IV. DISCUSSION AND CONCLUSIONS
Thus far, we have focused on the implementation of the processed Verlet method in the microcanonical ensemble. However, the use of a thermostat is often desired in real applications to generate the canonical ensemble 82 . Unfortunately, many of the standard thermostats require the values of instantaneous temperature [6][7][8] , and thus we need to calculate p n on-the-fly using Eq.(30) at each time step. The performance gain from the processed Verlet method is offset by the computational overhead of this procedure.
One possible solution to this problem is to use the stochastic thermostat originally developed by Heyes 7,83 . In this approach, the temperature needs to be evaluated and updated less frequently, say every 10 steps, and during each interval, the equations of motion are integrated in the microcanonical ensemble. Therefore, the cost of calculating the instantaneous temperature is reduced to an acceptable level. The temperature is updated by scaling all atomic momenta by a common factor γ ≈ 1. After the update, we do not need to preprocess the new momenta explicitly thanks to Eq.(B4). The overhead is further reduced if Eq. (31) is used for calculating p n , because the update is often skipped in the Heyes thermostat 83 .
Preliminary results suggest that the canonical ensemble can actually be generated with an overhead of 5-10 %.
Another possible extension of the present algorithm is the higher-order integrator which was first introduced by Rowlands 84 , and later reformulated by López-Marcos et al. 45,53 . This integrator also consists of three steps, as already mentioned in Sec.II C. The time integration step corresponding to Eq.(21) is given by with α = 1/12 85 . The auxiliary Hamiltonian of Eq.(25) should also be extended to include O(h 3 ) terms 45,53 . In order to evaluate the performance of this algorithm, several test calculations have been carried out. The results indicate significant improvement of accuracy for a given time step compared to the processed Verlet method. Moreover, the maximum stability limit is increased by a factor of ≈ 1.5. On the other hand, the computational cost per time step is about twice as expensive as that of the processed Verlet method, because we need to calculate a Hessian-vector product at each step. Therefore, the increase in time-step size is insufficient to compensate for the overhead, and thus the processed Verlet method should be preferred in terms of total performance. If, however, a highly accurate trajectory is required, this approach would be a viable option.
In summary, the processed Verlet method is about twice as efficient as the standard

Appendix A: Hessian-Vector Product
Evaluation of a Hessian-vector product (z = Hc) for any given vector c is required, e.g., in Eqs. (17), (30), and (49). Each element of z can be written as where q denotes the current atomic positions. When U(q) is given by a sum of pair interactions, z can be calculated analytically, and its computational cost is comparable to that of f(q) 45,53 . A more complicated procedure is required for AIMD, because the explicit form of U(q) is unknown. Let us assume that U(q), f(q), and the corresponding ground-state orbitals Φ(q) have already been calculated. Then, z can be evaluated in either of the two ways: by the density functional perturbation theory 90,91 or by the finite-difference method.
In the density functional perturbation theory, we first obtain the change of Φ(q) induced by an infinitesimal displacement of atoms along c, denoted by (∂Φ/∂c), from the iterative solution of the Sternheimer equation [90][91][92] . Then, z can be calculated analytically using Φ(q) and (∂Φ/∂c) 92 . The computational cost of this procedure is comparable to that of calculating f(q) 92 . We note, however, that the numerical implementation of the density functional perturbation theory requires significant programming efforts.
Alternatively, if we calculate the forces at q ± ǫ 0 c, z may be evaluated by the finitedifference method: where ǫ 0 is a small positive constant. At first glance, the second approach may look twice as expensive as the first. However, the calculation of f(q − ǫ 0 c) is much cheaper than that of f(q+ǫ 0 c), because an extremely good initial guess for Φ(q−ǫ 0 c) is available from Eq.(C4) 93 .
Therefore, the total cost of this approach is also comparable to one force evaluation. For simplicity, we have adopted the finite-difference approach throughout this study.
In either case, the directional derivative, is obtained as a by-product, which may be utilized to enhance the initial guess, as explained in Appendix C.
The Hessian-vector product appearing in Eq.(B3) can be calculated according to Appendix A. These equations can be integrated numerically by any standard method 94 . A fully converged solution (Q 0 , P 0 ) = (q(h), p(h)) is typically obtained at the expense of 10-20 force evaluations.
It is worth noting that when Eq.(B1) holds, so does for any constant γ, because Eqs.(B2) and (B3) are separable. This property is useful for scaling the atomic momenta when the temperature is controlled by a thermostat.

Appendix C: Initial Guess
The computational cost of AIMD simulations is dominated by an iterative procedure for calculating the ground-state orbitals Φ n = Φ(q n ) for each n 1,2 . Therefore, it is important to start from a good initial guess for Φ n to minimize the computational effort. To this end, we present several possible approximations to Φ n+1 , denoted by Φ init n+1 . For simplicity, we limit ourselves to non-metallic systems here, and (q, p) should be replaced by (Q, P) when necessary. Assuming that q n±1 = q n ± hv n + h 2 2 a n (C1) with v n = M −1 p n and a n = M −1 f n , we can show that where ∂Φ/∂v and ∂Φ/∂a are given by Eq. (A3). The simplest choice, is sometimes useful, but far from satisfactory 95 . A more reasonable initial guess is given by which is robust and more efficient. Similarly, higher-order extrapolation formulae can be derived by using Φ n−2 , Φ n−3 , ... 95 . While these formulae give better performance at small time steps, instabilities occur at large time steps 95 . Therefore, we usually use Eq.(C4) in our AIMD simulations.
If, however, the derivatives of Φ are available, we can derive another set of formulas which works well for large time steps. For instance, when the postprocessing for atomic momenta is performed on-the-fly using Eq.(30), we need to calculate Hv n explicitly. Then, we can eliminate the O(h 2 ) term by making use of (∂Φ/∂v) n which is obtained as a by-product: Alternatively, when Eqs. (49)(50)(51) are used to integrate the equations of motion, we need to calculate Ha n at each time step. Then, we can exploit (∂Φ/∂a) n to estimate the initial guess with reduced O(h 2 ) errors: When these advanced extrapolation schemes are used, the number of iterations required for electronic structure calculations is reduced by 10-30 % compared to Eq.(C4). A similar approach may also be useful for geometry optimization problems 92 .      Same notation as in Fig.4.