Classical interaction potentials for diverse materials from ab initio data: a review of potfit

Force matching is an established technique to generate effective potentials for molecular dynamics simulations from first-principles data. This method has been implemented in the open source code potfit. Here, we present a review of the method and describe the main features of the code. Particular emphasis is placed on the features added since the initial release: interactions represented by analytical functions, differential evolution as optimization method, and a greatly extended set of interaction models. Beyond the initially present pair and embedded-atom method potentials, potfit can now also optimize angular dependent potentials, charge and dipolar interactions, and electron-temperature-dependent potentials. We demonstrate the functionality of these interaction models using three example systems: phonons in type I clathrates, fracture of {\alpha}-alumina, and laser-irradiated silicon.


Introduction
The enormous increase of available computational resources in the past decades has powered the success of molecular dynamics (MD), i.e., the computer simulation of the trajectories of atoms (and sometimes molecules) by numerically integrating Newton's equations of motion, in materials simulation. Molecular dynamics studies can routinely comprise millions of time steps for millions of atoms, and up to several trillions in a recent proof-of-concept [1]. Such large numbers of particles and integration steps require an efficient way to evaluate the interactions of a system. Even with the vast advances in first-principles methods like density functional theory (DFT), this can only be provided with classical effective interaction potentials or force fields. These represent the energy (and as its gradient the internal forces) of a system as a function of only the positions of the atoms, eliminating all electronic degrees of freedom. The calculation is further simplified by the fact that in general only two-, threeor rarely four-body terms are used to specify the system energy. These calculations are also efficiently parallelized, which is a prerequisite for such large systems. Typical applications that require millions of atoms are simulations of fracture [2], dislocations [3] or laser ablation [4].
In contrast to ab initio methods, which are, at least in principle, literally parameter-free, interaction models contain parameters, which the simulator must choose appropriately. Their values are used to calculate the energies and forces, which determine the trajectories of the particles and thus all properties resulting from a MD simulation. Also, these parameters are in general specific not only to an element, but to the complete set of atomic species involved in a simulation-or even a single chemical composition [5]. A researcher wishing to undertake MD simulations in a certain system can peruse the literature in search of the appropriate parameter set. There are also a number of repositories that host potentials, e.g. the OpenKIM project ‡ [6,7] or NIST's Interatomic Potentials Repository Project § [5]. At the time of writing, those two repositories contain interaction potentials for many of the elements, but only for around 50 binary and 13 (!) ternary systems. In contrast, the AFLOWLIB project [8], a first-principles high-throughput database of materials, currently contains information on over 500 binary compounds alone. Thus, one of the major limitations of molecular dynamics is the availability of suitable interaction potentials. Consequently, there is a sustained demand to determine effective potentials for more and more alloy systems -and for tools to assist researchers in doing this.
The "traditional" approach to this problem was to fit the potential to experimental data or ab initio calculated parameters like lattice constants, bulk moduli and elastic constants. However, for complex systems such data might be scarce, unreliable or even unavailable, which could drive the users to resort to model potentials with little physical justification. Here, the force matching method [9] offers a way on: It uses a large database of ab initio calculated reference data, like forces and energies, to optimize a potential, defined as a set of independent parameters. This is based on the idea, that if a potential yields correct energies and forces acting on the individual atoms, it can also produce the correct collective dynamics. The dynamics in turn are responsible for measured properties such as phase transitions, free energy differences or elastic properties.
The open source program potfit [10,11] has been developed as a highly flexible implementation of the force matching method, in both the potential forms used and the optimization schemes to arrive at the final parameters. Since the original publication, the functionality of the code has been considerably expanded beyond the pair and embedded atom method (EAM) potentials presented there. While the initial set of interaction models cover many "simple" metals (i.e., without any directional bonding), angular-dependent potentials (ADP) [12] can reproduce certain covalent aspects in a metallic bonding environment. Polarizable potentials of the Tangney-Scandolo (TS) type [13] extend potfit to force-match oxides. Electron temperature-dependent potentials [14,15] are significant wherever the electronic temperature of the system varies strongly; this is most significant for simulation of laser-matter interactions.
In this article, we provide an overview of the force matching method and then review the potfit code in section 2, with particular emphasis on the features introduced since the original publications: Differential evolution as optimization method (section 2.2.1), analytic potential representation (section 2.3.1), and the new potential models mentioned above (section 2.3.2). In section 3, we present three applications that make use of the new components: Phonons in clathrates (section 3.1), fracture of α-alumina (section 3.2) and laser irradiation of silicon (section 3.3).

Methods and models
2.1. Force matching 2.1.1. Interaction potentials from first principles The force matching method was first described around 20 years ago by Ercolessi and Adams [9]. It makes use of the ever increasing availability and predictive power of first-principles simulations, which-while by far not fast enough for a straight ab initio simulation-can be used to parameterize an interaction potential. To this end, it directly uses microscopic quantities from first principles simulations, like internal forces acting on an individual atom, as reference data in the parameterization process. This allows the determination of effective potentials even in cases, where there is insufficient experimental data to fit a sufficiently complex interaction model. The main motivation behind the force matching method is that a potential that yields forces correctly will reproduce correct dynamics, which then in turn leads to correct macroscopic quantities determined from MD simulations.
Fitting a potential is an optimization process. Force matching uses a weighted sum of squares Z, defined as the deviations from the ab initio reference data, as the target function for the optimization. A set of parameters, called α, is adjusted to reproduce the reference data as accurate as possible. The target function Z(α) is defined as with and The deviations Z D are made up from the individual contributions S, like forces, energies and stresses, multiplied by weighting factors u i . An additional term, Z C , can be used to put constraints with weights w r on the optimization process, e.g. fixing gauge degrees of freedom. The quantities with a superscript 0 are ab initio calculated values. Since the inaugural publication, force matching has been used to parameterize countless interaction potentials. The method has also been implemented in a number of codes that are now available under an open-source licence. Potfit [11] was first presented in 2005 at the 9th International Conference for Quasicrystals [10] as a method to determine potentials for quasicrystal-related structures, where experimental data is too scarce and unreliable to fit potentials. As the main focus of this review, it will be discussed in more detail in section 2.1.2 below. ForceFit [16] was presented in 2010. It provides a graphical user interface (GUI) to the force matching method and can fit analytic potentials with a particular emphasis on covalent interactions. In 2013, ForceBalance [17] was introduced to parameterize force fields with a wide variety of functional forms. It integrates the force matching method with an iterative improvement of reference structure selection and uses a local minimizer to optimize the parameters. The selection of force field types (i.e., functional forms) integrated in ForceFit and ForceBalance demonstrate that these codes were designed for applications in organic chemistry, particularly aqueous solutions, whereas potfit caters primarily to solid-state systems. For a current review of the merits and shortcomings of the force matching method, see [18].
The standard force matching method is a sequential multiscale method, where first-principles simulations and classical MD simulations are performed separately, with information from one used to parameterize the other. In contrast, hybrid quantum/classical (QM/MM) simulations (see e.g. [19] for a review) run both simulations at the same time, either by coupling the methods spatially (e.g. [20]), or by obtaining classical potential correction on the fly (e.g. [21]). The clear advantage of sequential simulations is that it allows to use dedicated tools for each multiscale layer. These ab initio and MD programs are wellestablished and highly optimized to their respective task.
There is no standard routine to select reference structures for force matching, however, there are empirical approaches that have proven successful in the past. The general guideline is that the training data set should represent the local environments that will appear in MD simulations with the new potential reasonably well. One way to achieve that is to use snapshots of a MD simulation, either ab initio-MD or using an intermediate potential (or a combination of both [22]). This procedure can be iterated, i.e., the classical MD snapshots are updated with configurations from a trajectory determined with the next generation potential. The reference data (forces, stresses and energies) is then calculated in these snapshots using first principles methods. This data set can be complemented by configurations to cover specific situations (e.g. strained structures). For some special-purpose potentials, the trajectory snapshots can be forgone entirely in favour of specifically targeted reference structures. For example, a force-matched potential used to optimize complex ground state configurations would primarily contain low-temperature structures representing relevant structural motives [23].
The selection of reference configurations also affects a central issue of force-matched potentials: transferability. Can a potential be used under conditions that were not represented in the reference data? For example, potentials for alloys that are based on DFT calculations of one particular composition (and do not use the pure constituents in the database) can not be expected to be used far from this composition. This implies that force matching allows the creation of interactions that are tailored to a very particular situation. On the other hand, a wider training set will allow for more flexible potentials. In any case, a potential should be validated against information not directly used in the training data, but relevant for the desired application. This can either be forces, stresses and energies from a test set, or quantities derived from more involved calculations like elastic constants.
Following a rigorous testing regime will also help identify cases of overfitting, which may occur when there are too many adjustable parameters. Then transferability may be lost completely, as the optimal solution may reproduce only the reference data at the expense of slightly different configurations. Monitoring the success of the potential on a test data set will spot this behaviour and allow countermeasures. It should however be kept in mind that even the best classical potential will most certainly have some limitations; the move from first-principles to an effective interaction incurs a loss of generality. For this reason, creators and users of effective potentials must be aware of the capabilities and deficiencies of the interaction model. As a consequence, all published force-matched potentials should be accompanied by detailed information on the selection of reference configurations used for generating it.

Force matching with potfit
The potfit package [10,11] is a flexible open source implementation of the force matching method. It was originally designed to optimize interpolated pair and EAM [24] potentials for model systems and metals, but has since been considerably expanded, which will be described below. The code is parallelized over reference configurations (calculating the forces in each of them is completely independent of the others) using the Message Passing Interface. While potfit was designed to cooperate closely with the MD code IMD [25,26] and shares a number of force calculation routines and file formats with that code, there are output routines for the popular LAMMPS package [27]. Currently, potfit can determine potentials for a wide range of solid state systems, both in tabulated and in analytic form, which is a unique trait to force matching codes currently available. For further details about the original program, including theoretical background and implementation details, the reader is referred to [28].
There are two independent parts in potfit, a force routine and an optimization algorithm. While the latter is responsible for the adjustment of the parameter set α, the former calculates the quantities S(α) using a specific interaction model. This separation of target function evaluation and parameter modification makes potfit flexible; it is a comparatively simple process to add either an additional force calculation routine or a minimization method. In the following, we will describe those two main parts of the potfit program, with a particular emphasis on new developments.

Optimization
The optimization is responsible for adjusting the parameter set α to minimize the target function Z(α) (1). To this end, there are three distinct optimization algorithms implemented in potfit. Powell's algorithm [29] is a gradient-free conjugate-gradient-like optimization method that takes advantage of the particular form of the target function as a sum of squares. This algorithm is efficient in the number of force calculations required to reach a local minimum in parameter space. In contrast, a method based on simulated annealing [30] is used to sample parts of parameter space in a Monte-Carlo based fashion. The method outlined by Corana et al. [31] is designed to operate on continuous variables and was successfully used particularly with tabulated potentials; a significant share of the work referenced in section 2.3 below made use of this facility. The implementation of these two methods in potfit has been described in the original publication [11]. Differential evolution [32] is a variant of the genetic algorithm (GA) [33] approach, where a population of potentials is evolved to an optimal solution. This method proved to be especially helpful for the optimization of analytic potentials (cf. section 2.3.1) and is described in more detail below. Simulated annealing and differential evolution are non-deterministic methods that unlike Powell's algorithm can leave the basin of attraction of a local minimum in favour of a potentially better solution to the optimization problem.

Differential evolution
The genetic algorithm implemented in potfit is called differential evolution (DE). It is a vector based stochastic optimization method, which has been introduced by [32]. It uses a population P: where N p denotes the number of population vectors, g counts the generations and D is the dimensionality of the problem. The vector x i in (4) is equivalent to a set of parameters α from (1). DE consists of three parts: mutation, crossover and selection. For reasons of parallelization, DE works with two concurrent arrays for the population. One holds the current and one the subsequent generation of parameters. After an entire evolution step has been performed, the next generation is copied to the array holding the current one. Afterwards the next step is started unless some abortion criteria have been met.
Mutation In the first part of the DE optimization loop a mutation vector v i,g is created from the current population by combining several individuals x i,g . There are different approaches on how to perform the combination, the basic version of DE uses the following: The indices r 0 , r 1 and r 2 are mutually exclusive integer random numbers and F a constant ∈ (0, 1]. Each mutation vector is composed of a random vector x r0,g plus a weighted difference of two other random vectors x r1,g and x r2,g .
Crossover To get a diversity enhancement in the population, a crossover step is introduced. It mixes the previously generated mutation vectors v i,g with the population vectors x i,g in order to generate trial vectors u i,g . Generally just a binary choice is made, which is defined as The jth component of the mutant vector is accepted for the trial vector with a probability of C r . If this is not the case, the component of the target vector is retained. rand j [0, 1) is a uniformly created random number and C r is called crossover rate. In order to prevent the case u i,g = x i,g at least one component of the mutant vector v i,g is always accepted. A general recommendation of C r ∈ [0.8, 1.0] is given in [32].
Selection The last part of the DE optimization loop is a simple one-to-one selection. Each trial vector u i,g competes against the corresponding target vector x i,g . The one which yields the lower target function Z survives into the next generation g + 1: After the selection has been performed the algorithm checks, if any of the abortion criteria are met. If that is the case, the optimization is complete, otherwise another optimization loop will be performed. A user defined critical threshold, with a default value of 10 −6 , is the main abortion criterion. After each loop, the difference of the target vectors with the smallest and largest target function is calculated. As long as this difference is greater than the critical threshold of 10 −6 the optimization is continued. That means, the algorithm will continue, until all target vectors have retracted to a very narrow area in parameter space.

Interaction models
In its original implementation, potfit could optimize pair and embedded atom method (EAM) [24] potentials. This functionality has been used extensively to create effective potentials for a wide range of mostly metallic materials: ¶  [41], Mo [42].
To extend the functionality of potfit to different material classes, more potential models have been since been added to the program code. This also motivated a further addition to the code that allows to directly optimize the parameters of a certain functional form (a so-called analytic potential), as not all of the new interaction models can be adequately represented in the tabulated form required by the original potfit implementation. Both these developments are described below.
2.3.1. Potential representation In an analytic potential, the functions creating the potential (e.g. a pair interaction φ (r i j )) are defined by the parameters of a fixed functional form (e.g. φ (r i j ) = A exp(−λ r), with parameters A and λ ). In contrast, tabulated potential functions are represented by their values at (not necessarily equidistant) sampling points. The latter method avoids a model bias introduced by the selection of functional form, but typically requires many more parameters. Also, in some cases, it may be possible to attach a physical interpretation to potential parameter (e.g. charge q i in Coulomb potential q i q j r −1 i j ). Additionally, tabulated potentials require information in the reference data to support all sampling points. This may be an issue with minority constituents of compound systems, where the respective pair distribution functions may have large gaps. For analytic potentials, this is less an issue, as the values of the parameters affect a potential function globally.
To extend the functionality of potfit, analytic potentials have been implemented. Several predefined functions can be used to define a potential; additional functions can be implemented very easily. A cutoff function Ψ(r) is provided to make the potentials and their derivatives vanish at the cutoff distance r c . This functionality has first been used to determine EAM potentials for quasicrystal approximants in the Al-Pd-Mn [46] system.

ADP potentials
Based on the EAM model, an angular dependent potential (ADP) [12] was introduced to account for directional forces. Like EAM it is composed of purely pairwise contributions and does not contain three-body terms explicitly. In an orthogonal Cartesian system the potential energy is given as where indices i and j run over all atoms and superscripts α, β = 1, 2, 3 refer to Cartesian directions x, y and z. The first two terms in (9), pair potential φ and embedding function F, are regular EAM terms. An artificial electron density, n i = ∑ j =i ρ j (r i j ), is calculated as the sum of contributions from all neighbouring atoms using the transfer function ρ j of atom j.
The terms three to five in (9) are responsible for the indirect directional dependence of the potential. The vectors µ i and tensors λ i are functions of two additional pairwise potentials u(r) and w(r), while ν i is the trace of the λ i -tensor. Angular dependent potentials can be thought of as a kind of multipole expansion. The terms µ i and λ i are measures of the dipole and quadrupole distortion of the local environment of atom i. For a perfect cubic symmetry they vanish, otherwise they increase the energy.

Coulomb and dipole interactions
Ionic solids can in general not be described adequately with short-ranged pair or EAM potentials. The Coulomb interaction between charged particles falls of as r −1 and there is no safe radius (i.e., converging to the true result) where the interaction can be cut off [52] -which would be required in standard MD codes. Thus, a force matching code must also treat Coulomb interactions differently from shortranged pair potentials. Additionally, it was shown that certain properties in various oxides cannot be reproduced correctly from pure pair potentials [53]. There, the polarizable oxygen model by Tangney and Scandolo (TS) [13] provides a much better description at limited computational cost. In the TS model, the oxygen atoms are assigned a polarizability α, leading to a dipole moment p, which is determined selfconsistently from the surrounding charges and dipoles, modified for short-range interactions. The energy of the system is composed from the Coulomb interactions between charges and dipoles, and a short-ranged Morse-Stretch type interaction.
In densely charged systems like ionic melts or solids, the long-range interactions between charges can be summed up using the linear scaling Wolf method [52]. The same summation method can also be applied to the dipolar interactions in the TS method [54], leading to a polarizable potential that scales linear in the number of particles. This model has also been implemented in potfit [55], and was used to generate effective interaction potentials for SiO 2 , MgO [55], Al 2 O 3 [56] and yttria-stabilized zirconia (YSZ) [57]. The Wolf summation is applied both in force matching and in MD simulation, which implies that the resulting potentials are consistent regarding the treatment of long-range interactions.

Tersoff MOD potential and temperature dependent interactions
One of the most commonly used empirical interatomic potentials in covalent materials is that developed by Tersoff [58]. In the original Tersoff interaction, the total potential energy V is modelled as a sum of pairlike repulsive V R and attractive V A interactions with environment-dependent coefficient b: The modified angular-dependent term and cutoff function were introduced in the modified Tersoff (MOD) [59] potential to improve the melting temperature value. Under strong laser irradiation, the antibonding states of covalent materials become occupied at the expense of bonding states. This corresponds to a non-thermal occupation of the electronic bands. The electronic states thermalize over a few fs, albeit at an electronic temperature T e T l significantly above the lattice temperature T l . + As a consequence, the + A non-thermal occupation of the electronic states is challenging to capture in standard DFT, which is why we assume a thermal occupation even for the first few timesteps.
potential energy surface and also the interatomic interactions change quasi instantaneously. The resulting interatomic forces can induce nonthermal processes in the lattice such as melting or phase transformation. This cannot be captured by standard potentials, as the forces acting on atoms in a structure representative of T l depend on the electronic temperature T e . To take these effects into account, some or all of the potential parameters can be made explicitly temperature dependent, e.g. A = A(T e ). Reference configurations are created using finite-temperature density functional theory (FTDFT) [60] calculations. A set of parameters is then obtained for each temperature, where the temperature-independent parameters are kept fixed, and those dependent on the electronic temperature are allowed to vary smoothly in T e according to a sixth-order polynomial, i.e., The temperature-dependent modified Tersoff potential is called MOD*. Starikov et al. [39] used a similar approach to determine an electron-temperature dependent EAM potential for gold. In a MD simulation, the time evolution and coupling between electron gas and lattice is then accounted for in a two-temperature model (TTM) [61]. There the evolution of the electronic temperature is modelled in a continuum description, while the phononic system is represented by the point masses of the MD simulation.

Validation and sample applications
The new features have been validated against calculations with the MD code IMD. Here we present a review of recent results obtained with the newly implemented potential models.

Angular dependent potentials for type I clathrates
To test the fitting of angular dependent potentials, semiconductor clathrate systems have been chosen. Over the last few years they have become very promising candidates for thermoelectric applications, due to their vastly different transport coefficients for heat and electric current. Type I clathrate structures are formed by two types of polyhedra, which consist of either Ge or Si atoms in our testcases. Both are shown in Fig. 1. The smaller cages form a bcc lattice while the larger cages fill the remaining gaps. More details on the clathrate structures are given in [62]. A second element, Ba, can be inserted into these cages, bringing the number of atoms per unit cell up to 54. These central atoms exhibit special phononic excitations, so called rattling modes. Investigating the influence of these modes on macroscopic quantities requires simulations with thousands of atoms over several pico-or even nanoseconds. Using ab initio methods is not feasible for such simulations. Molecular dynamics, however, using effective potentials, can give some insight into the influence of the rattling modes. ADP potentials have been fitted for the empty Ge cages as well as for Si cages filled with Ba atoms. To this end, several static as well as dynamic ab initio simulations have been performed to generate suitable reference configurations. In total there were 6937 datapoints, i.e., forces, energies and stresses, for the Ge potential and 11 393 for the BaSi potential. Details of the selection and calculation of DFT values can be found in [63].
The clathrate structure can be well described with these potentials. Static simulations showed a good agreement of the lattice constant with the ab initio reference value as well as an experimental value. The ADP potentials as well as ab initio simulations yield a value of 10.53 Å for the Ge 46 system, the experimental value is 10.55 Å. For the Ba 8 Si 46 system the differences are a little larger, with 10.18 Å for the ADP potential, 10.24 Å for the ab initio calculation and 10.33 Å for the experiment.
The designated application for the clathrate potentials are lattice dynamics. They require a precise description of the phonons in the system. To test the quality of the potentials the phonon density of states has been calculated with the effective potential and DFT methods, for comparison.
To calculate the phonon density of states, there are different approaches for ab initio and MD calculations, which exploit the advantages of the respective calculation method. The ab initio results have been calculated with the phono.py [64] package, using the finite displacements method and the dynamical matrix approach. For the MD simulations, the autocorrelation function of the particle trajectories has been used.
The results for the Ge 46 clathrate system are shown in Fig. 2(a). The overall agreement of the ADP and ab initio data is good. For the ADP potentials the only major difference is the shift of the lowest peak. The upper peak at 8.2 THz for Ge as well as the minor peaks in the intermediate frequency range are well reproduced. For lower frequencies there are minor discrepancies, which are acceptable for effective potentials. The oscillations of the ADP data for small frequencies can be attributed to anharmonicities in the potential, which are not present in the ab initio calculations.
For the density of states of the Ba 8 Si 46 system, shown in Fig. 2(b), the agreement for low frequencies is better than for high frequencies. The three peaks at 2, 4, and 5 THz agree with only little deviations. For frequencies above 6 THz the values of the ADP potential are constantly shifted about 2 THz to the right. The reason for this is unclear. High frequencies correspond to the vibrations of the framework; they are also present in the empty cage structure Si 46 . Adding guest atoms creates phonon modes at low frequencies.
These calculations clearly show that the lattice dynamics of clathrate systems can be reproduced well by ADP potentials. Further work has been performed by determining the dynamical structure factor and the lattice thermal conductivity of these systems using the effective potentials. All results have been published in [63].

Crack propagation in α-alumina
Wolf-summated TS potentials were generated to study fracture in α-alumina (Al 2 O 3 ) [56]. A reference database containing 67 structures with in total 24 120 atoms was used, with atomic positions from three sources: (i) strained structures (up to 20 % strain along three different directions) at 0 K, (ii) three distinct free surfaces in various terminations 0 K, and (iii) snapshots from ab initio MD trajectories at temperatures up to 2000 K. The resulting potential was validated to reproduce elastic constants and surface free energies reasonably well.
Mode I cracks were then simulated in orthorhombic unit cells containing around 80 000 atoms, using an elliptical seed crack. A detailed description of the simulation conditions can be found in Ref. [56]. The results are as expected strongly dependent on cleavage plane and to a lesser extent on propagation direction. Cracks inserted in a {1120} plane propagate in both [0110] and [0001] directions, with considerable disorder at the crack surface and atomic bridges across the crack. In the dense {0001} planes, cracks do not propagate in either [2110] or [0110] direction. At higher energy release rates however, the (0001)[0110] crack diverts into a {1012} plane, which is one of the preferred cleavage planes. Cracks inserted a {1010} plane moving in the [0001] direction deviate slightly from the seed plane, propagating partially in a {1012} plane. Both propagation directions in this plane leave behind atomic bridges across the crack surfaces.
In these simulations, additional insight can be gained from studying the induced dipole moment of the polarizable oxygen atoms (Al atoms are assumed to possess no polarizability and consequently no dipole moment). Two effects contribute to aligning the dipole moments: charged crack surfaces and material strain. Charged surfaces occur depending on the cleavage plane, e.g. the oxygen-terminated {1120} surface show a dipole orientation normal to the crack surface. More interestingly, polarization can also be induced by straining the material. While bulk α-Al 2 O 3 is symmetric under inversion and thus shows no piezoelectricity (which simulations using the new potential reproduced), the crack breaks the inversion symmetry and shows macroscopic alignment of dipoles in inhomogeneously strained volumes of the simulation cell.
To study the degree of dipolar ordering, a scalar quantity called fractional anisotropy (FA) [65] was calculated from the induced dipole moments in a simulation snapshot. The FA represents how well aligned dipoles are over a neighbourhood corresponding to the potential cutoff radius (here: 10 Å), with 0 (1) corresponding to complete disorder (perfect order). Two such visualizations can be found in figure 3, for two different cleavage planes. There, the ordering of dipole moments serve as a probe of the local strain field, which serves to align neighbouring dipoles.
It should be noted, that the dipole moment calculated from a force-matched potential is primarily an empirical quantity, calculated from a polarizability chosen to optimally reproduce reference forces. Dipole moments were not used as reference quantities. However, the choice the arrow glyphs further emphasizes set D1 (cf. table 1) of liquid silica at ewly-developed force field used in the nuously throughout the trajectory and n of dipoles occurs. The non-polar sily spheres. To make the visualization ly filtered out. n a -for every oxygen atom and the l of Tangney and Scandolo (TS) [34]. l energy and the forces on each parime step during simulation. The TS utions: a short-range pair potential (cf. e.g. [34]) and a long-range part ostatic interactions between charges atoms. While U MS is a well-known, tions, U EL is a recent addition to the died in detail [13] and thus is in the decisions. static interactions where dipoles conve to be calculated. As the dipole ctric field of the surrounding charges ative solution has to be found in each ch, a dipole moment p p p n i of atom i in ced part p p p IND i from the electric field and a short-range part p p p SR i from the ds on the dipole moments of the pretions between charges U qq , between n a charge and a dipole U pq , the total by he polarizability of oxygen atoms is that we eventually get from the MD each time step. Each atom i prothe vector p p p i of its dipole moment tion is always performed on a threemost data sets used in this work use roximately 207 ⇥ 137 ⇥ 24Å), and time steps used by the visualization simulation uses a periodic boundary ther two axes have open boundaries. size in the z-direction would only inthe simulation without changing the Fig. 4. A combined visualization using all of our visual metaphors. FA shows the strain in the right area and at the border regions as well as at the surface of the initially inserted crack (left side). The iso surface (FA = 0.81) further emphasizes these regions and especially helps creating context between the atoms' glyphs and the FA scalar field representation. The glyphs, however, are also necessary as the crack (center of the image) is hardly perceivable in the FA field, due to the atom chains mentioned in section 4.3.1. The combination of all three visualizations allows for efficient and effective observation of the data, providing overview (through FA and iso surfaces) as well as detail information (through the glyphs).

Visualization
Point-based glyph representations are nowadays established as a stateof-the-art visualization paradigm for MD data sets, except for special applications from some research fields for which abstract visual metaphors have been derived from the particle data sets, e.g. cartoon or surface representations for proteins in bio-chemistry. For atoms without specifically oriented attributes, spheres are sufficient visualization primitives. However, for the work at hand the orientation of the dipole moments of oxygen atoms needs to be shown and emphasized to aid the analysis of the data. Additionally, the information of each atom has to be related to its surrounding atoms, as the dipole moment is induced by neighboring charges and dipole moments. We therefore use three visual metaphors to represent the data. First, we use arrow glyphs to depict the placement and orientation of the oxygen atoms, while optionally showing non-polar atoms as simple spheres (cf. fig. 3) -an established visualization that domain experts are familiar with. The orientation is furthermore mapped to the color of the glyhps (cf. sec. 3.2.1), which allows for visually identifying large areas of strongly correlated orientations. Second, we derive a scalar field of FA from the orientation of the atoms that quantitatively captures the correlation in a confined area. Slices of this scalar field can be added to the visualization for an extremely easy identification of areas of very high and very low correlation in the orientation of the dipoles (cf. fig. 4). Third, iso surfaces of the FA scalar field are applied to enclose these regions of strongly correlated orientations, and facilitate tracking changes in size and position of correlated areas over time (cf. fig. 11).
The visualization at hand is implemented using an existing visualization system developed at our research group, which focuses on the display of particle data sets, i.e. MegaMol [26]. The system already provides us with modules for ray casting spheres and arrows on the GPU. For this application, we added four new modules: the first one is for extracting the scalar field from the raw simulation data. Its function is described in section 3.2.2. A second one is used to cache the results of the former. The third one renders the scalar field, and the last one is for modulating the color of the glyphs and filtering the glyphs based on the value of a scalar field, which allows for showing the FA values on the glyphs themselves. While the analysis using FA works and yields valuable results, as shown in this paper, other methods of information aggregation might provide further insight. An example would be spatially clustering the dipole moments using a classification scheme not working on individual dipoles, but taking into account local neighborhoods (cf. fig. 8). We want to experiment with such approaches as a future work. of potential model strongly implies that the vector orientations and magnitudes of this quantity can indeed be interpreted as dipole moments. Additionally, the FA visualized in figure 3 can be used as a generic order parameter reflecting on the local environment detached from a specific interpretation.

Temperature dependent potential for Si
We generated an electron-temperature dependent MOD* potential for silicon from FTDFT reference data calculated with VASP [66,67] using local density approximation and projector augmented waves [68]. Using information from 44 configurations at electronic temperatures between 0 K and 25 000 K with in total 352 atoms, we fit sixth order polynomials to the A, B, λ 2 and δ parameters, while keeping the remainder independent of temperature. To test the resulting potential, we examined the electron temperature dependence of elastic constants and lattice constant. This requires a non-equilibrium state of the electrons and ionic lattice, such as they occur after rapid material heating via strong laser fields.
According to the first-principles calculations used for MOD* fitting, the simple cubic crystal structure of silicon becomes energetically the most stable structure at high electronic temperatures above 17 500 K. However, this transition cannot be observed directly in a MD simulation, because the electrons and lattice typically equilibrate after few picoseconds, which reduces the electronic temperature. Alternatively, the stability of a structure can be assessed by its mechanical properties using the well-known Born stability criteria for the elastic constants C i j . In the particular case of a cubic crystal structure, the convexity of the free energy leads to the relations C 11 + 2C 12 > 0, C 44 > 0, C 11 −C 12 > 0. (18) Elastic constants C 11 , C 12 , C 44 and bulk moduli B for diamond silicon obtained at different electronic temperatures using MOD* potential are shown in figure 4 (left). For comparison, the elastic constants C 11 , C 12 , C 44 for silicon at room temperature of 167 GPa, 65 GPa and 81 GPa respectively are slightly underestimated by the MOD* potential. The elastic constants and consequently the bulk modulus soften slowly initially and more rapidly with increasing carrier temperature, eventually vanishing and leading to structural transformation at electronic temperatures above 25 000 K. The softening of interatomic forces is a direct consequence of electronic excitations to conduction bands, which break covalent bonds and induce more metallic behaviour in semiconductors and insulators after strong laser irradiation.
In the second test case we evaluated the cohesive energy of the bulk diamond silicon for a wide range of electronic temperatures and lattice constants. The results are shown in figure  4 (right). The room temperature lattice constant for silicon of 5.43 Å, which corresponds to an equilibrium bond length of 2.35 Å, and equilibrium cohesive energy of 5.43 eV are well reproduced by MOD* potential. With increasing electronic temperature we can observe initially slow and then rapid increase of the lattice constant, represented by the dashed line in the contour plot. That expansion can be explained with a decrease of shared electronic number density and consequent increase of the repulsive forces between ions.
These initial tests show that the MOD* potential can capture essential effects of varying electronic temperature, which makes it suitable to simulate the effects of laser irradiation on silicon. It is our aim to use the MOD* potential to study laser ablation of a Si film; this is ongoing work and will be presented at a later time.

Conclusion
In this review article we demonstrate new features of the established force-matching code potfit. The results span a wide range of solid state materials, from ionic solids to metals and covalently bound materials of varying degree of complexity. This demonstrates the flexibility and versatility of the program in determining interaction potentials for solid state systems. By providing interfaces to standard DFT codes on the one hand, and several MD codes on the other, potfit is an essential part of the sequential multiscale materials modelling stack. In this way, it extends the length and time scales accessible to simulation for materials and conditions which up to now were limited to DFT methods.
One downside of effective interaction potentials is the lack of rigorous uncertainty quantification in the generation process: It is difficult to quantify the confidence of a certain property determined through MD simulation a priori. Only after the fact, an error can be assigned by comparison to experimental values. For predictive modelling applications however, this is insufficient; the point is to yield reliable results including errorbars before an experiment is performed. While there already exist approaches to use Bayesian techniques to estimate precision of potentials [69], it would be a worthwhile goal to equip potfit with the tools to provide not only an effective potential, but also a measure of the reliability of that result.