Assessment and optimization of the fast inertial relaxation engine (FIRE) for energy minimization in atomistic simulations and its implementation in LAMMPS

In atomistic simulations, pseudo-dynamical relaxation schemes often exhibit better performance and accuracy in finding local minima than line-search-based descent algorithms like steepest descent or conjugate gradient. Here, an improved version of the fast inertial relaxation engine ( FIRE ) and its implementation within the open-source atomistic simulation code LAMMPS is presented. It is shown that the correct choice of time integration scheme and minimization parameters is crucial for the performance of FIRE .


Introduction
Numerical optimization [1,2] is of utmost importance in almost every field of science and engineering. It is routinely used in atomistic simulations in condensed matter physics, physical chemistry, biochemistry, and materials science. There, the optimized quantity is usually the potential energy E(x), for a given interatomic interaction model [3]. Minimizing E(x) with respect to the atomic coordinates x yields 0 K equilibrium structures and energies, e.g., of defects. Minimum energy configurations can, furthermore, be used as initial states for subsequent molecular dynamics (md) simulations or normal-mode analyses [4]. Energy minimization is also used to determine the stability of structures under load. Two typical examples are the computation of the Peierls stress required for dislocation glide [5], and the determination of the critical stress intensity factor required for crack propagation [6]. Other uses of energy minimization methods in atomistic simulations include the search for transition states, e.g. by the nudged-elastic-band (neb) method [7], or the detection of transitions in accelerated md methods like parallel-replica dynamics or hyperdynamics [8].
fire is often used in atomistic simulations of mechanical properties of metals and alloys [6,18], ceramics [19], polymers [20], carbon allotropes [21], amorphous materials [22] and granular media [23], as well as in simulations related to catalysis [24] or docking [25]. The strict adherence to force minimization in fire makes it ideally suitable for critical point analysis in translational invariant systems like for the determination of the Peierls stress of a dislocation [5,17], where line-search-based descent algorithms often fail. Furthermore, fire has been shown to be a convenient algorithm for mapping basins of attraction, as it avoids unusual pathologies like disconnected basins of attraction that can appear, e.g., using the l-bfgs method [26]. fire was also shown to be a fast and computationally efficient minimizer for neb [7], as well as for the activation-relaxation technique (art) [27].
Here, we study the influence of the numerical integration scheme and the choice of parameters set (mixing coefficient, initial timestep, maximum timestep, etc.) on the efficiency of fire for different scenarios. We furthermore suggest a modification of the fire algorithm to improve its efficiency and describe our implementation of this modified version fire 2.0 in the atomistic simulation code lammps [9].

The algorithms
2.1. fire Consider a system of N particles with coordinates x ≡ (x 1 , x 2 , . . . , x 3N ) and mass m. The potential energy E(x) depends only on the relative positions of the particles and can thus be envisioned as a (3N − 6)-dimensional surface or "landscape". The principle of fire is to perform dynamics which allow only for downhill motion on this landscape, with the accelerationv Here, t denotes time, v(t) the velocity of the particles (v(t) ≡ẋ(t)), F(x(t)) the force acting on them, i.e., the gradient of the potential energy (F(x(t)) = −∇E(x(t))), and γ(t) a scalar function of time. Boldface quantities denote vectors, hats indicate unit vectors, and | . . . | is the Euclidean norm of the enclosed vector. The first term on the right hand side in equation 1 represents regular Newtonian dynamics. The effect of the second term is to reduce the angle between v(t) and F(x(t)), which is the direction of steepest descent at x(t). Uphill motion is avoided by computing the power P (t) = F(x(t)) · v(t) and setting the velocity to zero whenever P (t) ≤ 0. It was shown that combining equation 1 with an adaptive time stepping scheme yields a simple and competitive optimization algorithm [17]. In practice, equation 1 is implemented by "mixing" v(t) and F(x(t)), using an adaptive mixing factor α(t) ∈ [0, 1]. The algorithm can then be written as proposed in Algorithm 1.
if P (t) > 0 then 9: if N P >0 > N delay then 12: ∆t ← min(∆tf inc , ∆t max ) Calculate x(t + ∆t), v(t + ∆t), F(x(t + ∆t)), E(x(t + ∆t)) MD integration 22: t ← t + ∆t 23: if converged then In Ref. [17], it was suggested that fire can be used in conjunction with any common md integrator. However, fire implements a variable time-stepping scheme to speed up the descent. Therefore, the integrator must be robust against a change of timestep during integration. For example, a simple Euler explicit integration scheme is not suitable. Symplectic schemes like Euler semi-implicit (also called symplectic Euler ), Leapfrog or Velocity Verlet are more robust against varying timesteps [28][29][30]. Similarly, the recent work by Shuang et al. highlighted the importance of a suitable integration scheme for fire [31]. The choice of an adequate integrator for fire 2.0 will be presented and discussed in this manuscript.
An important principle of fire [17] is to set the velocity to zero as soon as P (t) is not positive anymore, that is P (t) <= 0. However, that is numerically impossible, leading to overshooting. Due to discrete time integration, the system will have already gone uphill before P (t) < 0 is detected. One could correct overshoot by moving backwards for one entire step ∆t and then re-starting the motion at time t − ∆t. This will undo the uphill motion as expected, but could keep the trajectory too far from where P (t) = 0. A less aggressive correction is to move backward for half a timestep (0.5∆t).
The algorithm of fire 2.0 can be written as proposed in Algorithm 2, with the modification from Algorithm 1 highlighted in blue. 1 .

Time integration scheme
Historically, fire has been developed for the md code imd [32], which implements a Leapfrog integrator for both dynamics and quenched-dynamics simulations. Thus, the published algorithm implicitly used Leapfrog, and the effect of this choice on fire has not been addressed yet. In the md code lammps [9], fire doesn't use the same md integrator that is used for regular dynamics (Velocity Verlet), but a dedicated integrator. In the current implementation (12 Dec 2018) this is the Explicit Euler method. Explicit Euler integration is not commonly used in classical md, where the requirement for energy conservation over long time periods suggests symplectic integrators [30,33]. To investigate the influence of the integrator, we implemented Euler Explicit (Algorithm 3), Euler Semi-implicit (Algorithm 4) and Velocity Verlet (Algorithm 6) methods. See Appendix B for the source code.
In addition, we also considered the Leapfrog (Algorithm 5) integration scheme which differs from Euler semi-implicit only in the initialization of velocities. Since the velocities are reset to zero at the beginning of the pseudo-dynamic run and also periodically during the run in fire 2.0, it turns out that both integrators are almost identical, as also confirmed by preliminary simulations. Therefore, the Leapfrog integrator is not considered for assessing fire 2.0 in this manuscript 2 .

Correcting uphill motion
This correction is indicated in Algorithm 2, and referred to as halfstepback in the lammps implementation.
1 Note that the time t of v(t) and x(t) in algorithm 2 correspond to an MD integration with the Euler and Velocity Verlet methods. It has to be slightly adapted for the Leapfrog integration method, since the evaluation of v and x are not synchronized. 2 The source code of the Leapfrog integrator is present in the implementation of fire 2.0 in lammps for testing purposes only (See Appendix B). It is accessible by using the keyword leapfrog for the argument integrator, see Tab. 1 Algorithm 2 fire 2.0 1: Initialize x(t) and F(x(t)) 2: v(t) ← 0 3: α ← α start 4: ∆t ← ∆t start 5: N P >0 ← 0 6: N P ≤0 ← 0 7: for i ← 1, N max do 8: if P (t) > 0 then 10: if N P >0 > N delay then 13: ∆t ← min(∆tf inc , ∆t max ) 14: Correct uphill motion 29:

Adjustments for improved stability
The first adjustment consists of delaying the increase of ∆t and decrease α(t) for a few steps after P (t) becomes negative. The second adjustment is to perform the mixing of velocity and force vectors (v → (1 − α)v + αF(x)|v|) just before the last part of the time integration scheme, instead of at the beginning of the step. Note that this modification has no effect if fire is used together with the Euler explicit integrator.

Additional stopping criteria
An additional stopping criteria has been implemented in fire 2.0 in order to avoid unnecessary looping, when it appears that further relaxation is impossible (stopping return value MAXVDOTF in lammps). This could happen when the system is stuck in a narrow valley, bouncing back and forth from the walls but never reaching the bottom. The criterion is the number of consecutive iterations with P (t) < 0. Minimization is stopped if this number exceeds a threshold (vdfmax in the lammps implementation).
We would like to comment on the force-based stopping criterion. While threshold defined for the minimization is usually not mentioned in the literature, the exact definition of the threshold is strongly related to the code. lammps uses the f2norm criterion that corresponds to the Euclidean norm of the 3 × N force vector. Other codes might use less strict criteria, like the maximum force component acting on any atom, or the maximum force component per degree of freedom of the system. On overall and to compare the different degrees of relaxation that can be achieved, it is important to note that the f2norm criterion considered by lammps can be several order of magnitude stricter than the others. This has to be considered when comparing systems relaxed with different codes and the exact criterion should be reported in publications. integer (2000) Exit after vdfmax consecutive iterations with P (t) < 0 halfstepback yes, no (yes) yes activates the inertia correction initialdelay yes, no (yes) yes activates the initial delay in modifying ∆t and α These commands instruct lammps to perform energy minimization until f2norm falls below 10 −6 eV/Å or 10,000 force evaluations have been reached. Velocity Verlet integration is used and the maximum timestep is 0.012 ps.
5. Assessing fire 2.0 for typical applications in material science

Typical optimization problems in material science
To assess the implementation of fire 2.0 in lammps, we use eight test cases (See section 5.3) that address the following common problems in material science: -Simultaneous relaxation of long range strain fields and short range disturbances (cases 1, 3 and 4).
-Relaxation of electrostatic interactions with short range rearrangements and atoms of different mass (case 2).
-Relaxation of short and long range stress fields with a strongly directional atomic bonds (case 5).
-Relaxation of a long range stress field of relatively low magnitude (case 6).
neb calculations, i.e. simulatenous energy minimization of an ensemble of systems with modified forces (cases 7 and 8). In case 7, the converged solution is closer to the initial guess than in case 8.

The force fields
The aforementioned tests rely on four different classes of force fields (ff), which are described in the following and summarized in Tab. 2.
The Embedded Atom Method (eam) potential [34,35] is a widely used ff in atomistic simulations of materials in general and of metals in particular [36][37][38][39][40][41][42]. It is thus the primary ff of the test cases. The eam is a function of a two-body term and an "embedding energy", which is a functional of the local electron density. The latter is calculated based on contributions from radially symmetric electron density functions of atoms in the environment. Here, eam is used for simulating Au and Al.
The Modified Embedded Atom Method (meam) potential [43][44][45][46] is suitable to assess the behavior of fire 2.0 with 3-body interactions potentials suitable for complex alloys or covalent material [47][48][49][50]. In meam, an angular term is added to the energy functional of eam, making it more suitable for complex materials. Here, meam is used to model Mg and the complex intermetallics Mg 17 Al 12 and Mg 2 Ca.
The ff by van Beest, Kramer and van Santen (bks) [58] is chosen to assess the performance of fire 2.0 with long range interactions, in particular electrostatic interactions solved in the reciprocal space [59,60]. Here, we use it to model silicate glass, an ionic material that includes long range coulombic interactions.

Simulation setups
In cases 1-6, the goal is to find a minimum energy configuration starting from some initial state of a system. In cases 7-8 the goal is to find a minimum energy path between two states of the system by neb. The test cases are described in the following and a summary is provided in Tab. 2. The atomic configurations are illustrated in Fig. 1. 1. Relaxation of a dislocation in Al: An edge dislocation [63] is inserted in an Al cylinder by displacing the atoms according to the anisotropic-elastic solution [64]. The cylinder has 25,340 atoms and a radius of 5.  Fig. 1(2)). The melt is obtained by md from an α-quartz crystalline structure. Since this configuration is initially far from a 0 K energy minimum, the maximum atomic displacement per step (dmax in lammps) had to be set to 0.001Å instead of 0.1Å (default value). This case uses the bks potential [58]. The long range coulombic interactions is calculated by a standard Ewald summation with an accuracy of 10 −5 and a direct/reciprocal space cutoff of 1 nm. 3. Relaxation of bulk Au with a nano-porous gyroid structure: The structure has 613,035 atoms and is contained within a box of 44.6 × 42.6 × 15.7 nm 3 with full pbc. This case exhibits a particularly high surface over bulk ratio (21.4% of the atoms belong to surfaces) with complex curvatures, see Fig. 1(3). The ff is of the EAM type [66]. 4. Relaxation of a Au nano-pillar with a nano-porous gyroid structure: This case is similar to case 3, but without pbc. The structure consists of 457,424 atoms and has a cylindrical shape with radius 42.6 nm and height 15.7 nm, see Fig. 1(4). It was cut out of the sample 3. 25.6% of atoms are surface atoms. Due to the absence of periodicity, only surface atoms (white) are visible in Fig. 1(4).    pbc are applied in all directions (See Fig. 1(8)). neb simulations are performed with 18 intermediate configurations as described elsewhere [50]. The ff is the meam from Kim et al. [49].

Results and discussion
The computationally most expensive task in atomistic simulations is typically the calculation of the interatomic forces, therefore the number of force evaluations is used for comparing minimizer performances. Except otherwise mentioned, the threshold f2norm used in this work is = 10 −8 eV/Å. The evolution of f2norm as a function of the number of force evaluations is shown in Fig. 1. Tab. 2 indicates the increase in performance obtained by fire 2.0 versus cg and fire. The performance in optimizing a configuration is determined by the ratio of the number of forces evaluations required by cg or fire to reach the threshold, over the number of forces evaluations required by fire 2.0. A comparison with l-bfgs is outside the scope of this work which is based on lammps, where l-bfgs is not included. A recent comparison between a fire-based algorithm and l-bfgs was reported by Shuang et al. [31].
5.4.1. cg vs fire 2.0 fire 2.0 performs better than cg in the two simple cases 5 and 1, with a ratio of 1.1× and 1.2×, respectively. The relaxation of the long range ff in the case 2 is not possible using cg, which terminates with the lammps's stopping criterion linesearch alpha is zero at comparatively large f2norm. Generally, this occurs when no minimum can be found by line search, for example when the backtracking algorithm backtracks all the way to the initial point. A similar behavior is seen in test case 6: cg fails to reduce the forces sufficiently. Note that the output configuration is clearly different to the one obtained with fire 2.0, see the insets in Fig. 2 (6). Similarly to fire, cg predicts that the dislocation remains in the Mg matrix far away from the precipitate, whereas fire 2.0 predicts that the dislocation moves towards the precipitate.
In test cases 3 and 4 (nano-porous Au) cg fails to reach the strict f2norm threshold of 10 −8 eV/Å, see Figs. 2(3) and 2(4). Line search fails when f2norm is below 10 −6 eV/Å. To quantify the performance of fire 2.0, we thus compare the number of force evaluations requires to reach the lowest f2norm achieved by cg. Here, fire 2.0 performs better than cg in problem 3 (bulk), but worse in 4 (free boundaries).

5.4.2.
fire vs fire 2.0 fire 2.0 performs better than fire as implemented in lammps in all the test cases. The smallest speedups of 1.8× and 2.9× are seen in the neb calculations of problems 7 and 8, respectively. A larger speedup of more than 3× is obtained in case 2, where the bks potential is used. Note that it is particularly difficult to relax the long-range coulombic interaction, so that the desired force stopping criterion was not reached. The relaxation with fire 2.0 stopped when f2norm reached a plateau, see Fig. 2(2). In this plateau region, fire 2.0 detected repeated attempts at uphill motion (P (t) < 0), and so minimization was terminated with return value MAXVDOTF. A speedup can still be defined by comparing the number of force evaluations at which fire reaches a f2norm similar to the one at the end of the fire 2.0 minimization. Much larger speedups of 10× and 30× are obtained in the cases 5 and 1, respectively. Finally, in the cases 3, 4 and 6 convergence of fire is so slow that the desired f2norm threshold is not reached.  (2) Silica melt.
(4) Nano-porous gold, pillar.  (1) integrator, on the other hand, performs slightly better than the others in problems 1, 3 and 5, but not in problems 2 and 4. In particular, the case 2 ( Fig. 2(2)) has stability issues. As for the cases 1 to 6, the neb cases 7 and 8 show a similar poor performance as fire while using fire 2.0 with an Euler Explicit scheme. By switching to Euler Implicit integration, as before, fire 2.0 typically outperforms fire. The Velocity Verlet integrator exhibits mixed behavior: it performs better than the other integrators in problem 7 but not in problem 8, the latter having stability issues.

fire 2.0: Influence of individual parameters
We have investigated the influence of the parameters α start and ∆t max on the performance of fire 2.0. Since the observed trends do not depend on the problem, the computationally less expensive problem 5 has been chosen for this parameter study. Note that α start and ∆t max are controlled by the lammps parameters alpha0 and tmax, respectively.
The performance as a function of α start for different choice of ∆t max (∆t = 1ps) are shown on Fig. 3(1). As seen on Fig. 3(1), best values lie in a range from 0.10 to 0.25. Increasing α start does not improve performance. For ∆t max > 6ps, it even leads to dramatic performance reduction. Generally, large values of tmax will benefit from lower values of alpha0, around 0.10 -0.15.
The performances as a function of ∆t max for different choice of ∆t (α start = 0.15) is shown on Fig. 3(2). The maximum value for the timestep ∆t max is controlled by the coefficient tmax applied on the timestep ∆t. That is ∆t max = tmax * timestep. As seen on Fig. 3(2), the performances largely depend on the correlated choice of the timestep and tmax. The optimum ∆t for running dynamics simulation in such system being 1 fs, it appears that choosing ∆t at least 4 times bigger is not relevant and leads to poor performances. In this case, fire 2.0 shows good performance for ∆t max < 12 fs, which correspond to tmax from 6 to 12, depending on the timestep. Generally, one can consider reducing tmax to improve the stability of the minimization. 5.4.5. fire 2.0: Nudged elastic band method fire 2.0 is 1.8 and 2.9 times faster than fire in the cases 7 and 8, respectively, see Tab. 2. Note that case 8, where the relative performance of fire 2.0 is better, is also the more complex case (complex mechanism and more images). Finally, a comparison of fire 2.0 and cg is not possible in these cases, because neb calculations in lammps require damped dynamics minimizers.

fire 2.0: On the usage of preconditioners
For geometrical optimization of atomistic configurations, preconditioners are known to largely enhanced the efficiency of the algorithms by considering known characteristics of the system, like the local atomic neighborhood [71]. For more details on preconditioners and how to determine them, the reader should refer to the recent work of Packwood et al. [72]. Preconditioners are especially efficient on large systems and could then reduce the difference we observe between cg and fire 2.0. With a similar goal as the preconditioner, that is the reduction of degrees of freedom to be optimized, we also investigated the influence of a pre-relaxation with a different minimizer on the performance of fire 2.0. This pre-relaxation was performed with the quickmin minimizer for 100 iterations, as implemented in lammps [7]. In all the problems but one, we did not observe any gain. Only the case 2 with long range atomic interactions evidence a significant advantage of performing this pre-relaxation, with a speedup close to 30%. That also improved the stability of fire 2.0 with Velocity-Verlet for the same problem.

General aspects
fire 2.0 minimizes faster than fire and can potentially reach lower residual forces. In the case of neb simulations, the performance gain increases with the complexity of the setup. When comparing to cg, fire 2.0 shows better performance except in case 4, the non-periodic nanoporous Au structure (Fig. 2(4)). Recall that the system was created by cutting bulk nanoporous Au (case 3) and removing pbc. The structure thus undergoes a sudden global shrinkage at the beginning of the minimization, which can easily be optimized by cg. In contrast, pseudo-dynamics relaxators like fire and fire 2.0 are sensitive to such scaling by generating a shock wave that has to be damped during optimization and thus may hamper minimization. In the bulk case (case 3), where there is no such global dynamic effect, fire 2.0 performs better than cg, which also indicates that the algorithm remains robust with a large amount of free surfaces. On all other systems, fire 2.0 shows various levels performance increase in comparison to cg, between 20% and 3000%. In addition, cg is not able to minimize the forces in some cases, due either to the long range stress field (case 6) or long range atomic interactions (case 2).
cg sometimes terminates prematurely (at a high level of residual force), because line search fails. Similarly, fire or fire 2.0 could terminate prematurely if convergence is slow and the chosen maximum number of force evaluations is too low. The resulting structure is then insufficiently optimized. Here, this was seen in case 6 ( Fig. 2(6)), where fire and cg yield a different dislocation position than fire 2.0. The latter is less susceptible to premature termination, because it does not suffer from line search problems and typically minimizes with fewer number of force evaluations. As a general statement, we note that reaching low f2norm values is crucial and analyzing an insufficiently relaxed structure could lead to wrong interpretations. This is especially important in statics and quasi-statics calculations of critical stresses. As a good practice, we suggest to always indicate the exact f2norm value alongside results in published work.
Among all the parameters that affect the behavior and performance of fire 2.0, the time integration scheme is the most important. Overall, as presented in the results above, Euler Implicit integrator provides robust minimizations at the cost of a slightly reduce performance. Hence we recommend the usage of fire 2.0 with an Euler Implicit integrator. Similarly, the very recent work of Shuang et al. [31] also recommended to couple the fire approach with a semi-implicit Euler integrator.
More generally, Tab. 1 list the parametrization of fire 2.0 accessible by the command min modify as implemented in lammps and the associated default values we recommend to use. More specifically, tmax can be reduced to improve the stability but should range from 2 to 12, and alpha0 should range from 0.10 to 0.25. In any case, we recommend to set the simulation timestep (command timestep in lammps) to the same value as in MD at low temperature.

Summary
In this work we describe fire 2.0, an optimized version of the fire minimization algorithm within the lammps molecular dynamics simulator, and add important details to the canonical publication [17]. The choice of time integration scheme has appeared to be crucial for fire and is now clearly discussed. A non-symplectic scheme like Euler explicit should not be used. We have shown the clear advantages of fire 2.0 versus fire and versus conjugate gradient through several examples in materials science: fire 2.0 is significantly faster than fire or conjugate gradient and can result in lower energy structures not found by other algorithms.
We intend fire 2.0 to entirely replace fire, the present work being a complement of the original publication [17]. Ultimately, this work intends to provide insights on performing more accurate and more efficient forces minimization of atomistic systems.
Acknowledgments J.G is thankful for the financial support by the German Research Foundation (DFG) through the priority program SPP 1594 "Topological Engineering of Ultra-Strong Glasses". F.H acknowledges financial support by the DFG through projects C3 (atomistic simulations) of SFB/Transregio 103 (Single Crystal Superalloys). Z.X. acknowledges financial support by the German Science Foundation (DFG) via the research training group GRK 1896 "In Situ Microscopy with Electrons, X-rays and Scanning Probes". W.N. acknowledges financial supports by the European Union, within the starting grant ShapingRoughness (757343) and the advanced grant PreCoMet (339081). EB gratefully acknowledges the funding from European Research Council (ERC) through the project "microKIc" (grant agreement No. 725483). A.V., A.P. and E.B. acknowledges the support of the Cluster of Excellence Engineering of Advanced Materials (EAM). A.P. and E.B. acknowledges the support of the Central Institute of Scientific Computing (ZISC). Computing resources were provided by the Regionales RechenZentrum Erlangen (RRZE) and by RWTH Aachen University under project rwth0297 and rwth0407. The authors gratefully thank Jim Lutsko, University Libre de Bruxelles, for helpful discussions on this manuscript.

Data availability
The source code of the implementation of fire 2.0 in lammps is freely available online, as described in Appendix B. The raw data required to reproduce the findings presented in this paper cannot be shared at this time as the data also forms part of an ongoing study.