High-Dimensional Potential Energy Surfaces for Molecular Simulations

An overview of computational methods to describe high-dimensional potential energy surfaces suitable for atomistic simulations is given. Particular emphasis is put on accuracy, computability, transferability and extensibility of the methods discussed. They include empirical force fields, representations based on reproducing kernels, using permutationally invariant polynomials, and neural network-learned representations and combinations thereof. Future directions and potential improvements are discussed primarily from a practical, application-oriented perspective.


Introduction
The dynamics of molecular (i.e. chemical, biological and physical) processes is governed by the underlying intermolecular interactions. These processes can span a wide range of temporal and spatial scales and make a characterization and the understanding of elementary processes at an atomistic scale a formidable task. 1 Examples for such processes are chemical reactions or functional motions in proteins. For typical organic reactions the time scales are on the order of seconds whereas the actual chemical step (i.e. bond breaking or bond formation) occurs on the femtosecond time scale. In other words, during ∼ 10 15 vibrational periods energy is redistributed in the system until sufficient energy has accumulated along the preferred "progression coordinate" for the reaction to occur. 2 Similarly, the biological process of "allostery" couples two (or multiple) spatially separated binding sites of a protein which is used to regulate the affinity of certain substrates to a protein, thereby controlling metabolism. 3 According to the conventional view of allostery, a conformational change of the protein (that might however be very small 4 ) is the source of a signal, but other mechanisms have been proposed as well which are based exclusively on structural dynamics. 5 Here, binding of a ligand at a so-called allosteric site increases (or decreases) the affinity for a substrate at a distant active site, and the process can span multiple time and spatial scales to the extent of the size of the protein itself. Hence, an allosteric protein can be viewed as a "transistor", and complicated feedback networks of many such switches ultimately make up a living cell. 6 As a third example, freezing and phase transitions in water are entirely governed by intermolecular interactions. Describing them at sufficient detail has been found extremely challenging and a complete understanding of the phase diagram or the structural dynamics of liquid water is still not available.
All the above situations require means to compute the total energy of the system computationally efficiently and accurately. The most accurate method is to solve the electronic Schrödinger equation for every configuration x of the system for which energies and forces are required. However, there are certain limitations which are due to the computational approach per se, e.g. the speed and efficiency of the method or due to practical aspects of quantum chemistry such as accounting for the basis set superposition error, the convergence of the Hartree-Fock wavefunction to the desired electronic state for arbitrary geometries, or the choice of a suitable active space irrespective of molecular geometry for problems with multi-reference character, to name a few. Improvements and future avenues for making QMbased approaches even more broadly applicable have been recently discussed. 7 For problems that require extensive conformational sampling or sufficient statistics purely QM-based dynamics approaches are still impractical.
A promising use of QM-based methods are mixed quantum mechanics/molecular mechanics (QM/MM) treatments which are particularly popular for biophysical and biochemical applications. 8 Here, the system is decomposed into a "reactive region" which is treated with a quantum chemical (or semiempirical) method and an environment described by an empirical force field. Such a decomposition considerably speeds up simulations such that even free energy simulations in multiple dimensions can be computed. 9 One of the current open questions in such QM/MM simulations is that of the size of the QM region required for converged results which was recently considered for Catechol O-Methyltransferase. 10 Other possibilities to provide energies for molecular systems are based on empirical energy expressions, fits of reference energies to reference data from quantum chemical calculations, representations of the energies by kernels or by using neural networks. These methods are the topic of the present perspective as they have shown to provide means to follow the dynamics of molecular systems over long time scales or to allow statistically significant sampling of the process of interest.
First, explicit representations of energy functions are discussed. This usually requires one to choose a functional form of the model function. Next, machine learned potential energy surfaces are discussed. In a second part, topical applications of these methods are presented.

Explicit Representations
Empirical force fields are one of the most seasoned concepts to represent the total energy of a molecular system given the coordinates x of all atoms. The general expression for an empirical FF includes bonded E bonded and nonbonded E nonbonded terms. One example for such a predefined functional form are permutationally invariant polynomials (PIPs) which have been applied to molecules with 4 to 10 atoms and to investigate diverse physico-chemical problems. 16 Using PIPs, the permutational symmetry arising in many molecular systems is explicitly built into the construction of the parametrized form of the PES. The monomials are of the form y ij = exp (−r ij /a) where the r ij are atom-atom separations and a is a range parameter. The total potential is then expanded into multinomials, i.e. products of monomials with suitable expansion coefficients. For an A 2 B molecule the symmetrized basis is y a 12 (y b 13 y c 23 + y b 23 y c 13 ) which explicitly obeys permutational symmetry.
A library for constructing the necessary polynomial basis has been made publicly available. 17 One application of PIPs includes the dissociation reaction of CH + 5 to CH + 3 + H 2 for which more than 36000 energies 18 were fitted with an accuracy of 78.1 cm −1 . With this PES the branching ratio to form HD and H 2 for CH 4 D + and CH + 5 , respectively, was determined. Also, the infrared spectra of various isotopes were computed with this PES. 19 Other applications concern a fitted energy function for water dimer, 20 which became the basis for the WHBB force field for liquid water 21 and that for acetaldehyde. 22 For acetaldehyde roughly 135'000 energies at the CCSD(T)/cc-pVTZ level of theory were fitted to 2655 terms with order 5 in the polynomial basis and 9953 terms with order 6 in the polynomial basis. For the relevant stationary states in that study the difference between the reference calculations and the fit ranges from 2 to 4.5 kcal/mol. However, the overall RMSD for all fitted points has not been reported. 22 With this PES the fragment population for dissociation into CH 3 + HCO and CH 4 + CO was investigated.
Another fruitful approach are double many body expansions. 23 These decompose the total energy of a molecular system first into one-and several many-body terms and then represent each of them as a sum of short-and long-range contributions. 23 This yields, for example, an RMSD of 0.99 kcal/mol for 3701 fitted points from electronic structure calculations at the MRCI level of theory for CNO. 24 As a comparison, another recent investigation of the same system 25 using a reproducing kernel Hilbert space (RKHS, see further below) representation yielded an RMSD of 0.38, 0.48 and 0.47 kcal/mol for the 2 A , 2 A and 4 A electronic states using more than 10000 ab initio points for each surface.
Local interpolation has also been shown to provide a meaningful approach. One such approach is Shepard interpolation which represents the PES as a weighted sum of force fields, expanded around several reference geometries. 26,27 Also, recently several computational resources have been made available to construct fully-dimensional PESs for polyatomic molecules such as Autosurf 28 or a repository to automatically construct PIPs.

Machine Learned PESs
Machine learning (ML) methods have become increasingly popular in recent years in order to construct PESs, or estimate other properties of unknown compounds or structures. [29][30][31][32] Such approaches give computers the ability to learn patterns in data without being explicitly programmed. 33 For PES construction, suitable reference data are e.g. energy, forces, or both, usually obtained from ab initio methods. Contrary to the explicit representations discussed in section 2, ML-based PESs are non-parametric and not limited to a predetermined functional form.
Most ML methods used for PES construction are either kernel-based or rely on artificial neural networks (ANNs). Both variants take advantage of the fact that many nonlinear problems (such as predicting energy from nuclear positions) can be linearised by mapping the inputs to a (often higher-dimensional) feature space (see Fig. 1). 34 Kernel-based methods utilize the x (3) x (1) x (2)  x (3) x (1) x (2)  x (3) x (1) x (2) x (3) Figure 1: A: The blue and red points with coordinates (x (1) , x (2) ) are linearly inseparable. B: By defining a suitable mapping from the input space (x (1) ,x (2) ) to a higher-dimensional feature space (x (1) ,x (2) ;,x (3) ), blue and red points become linearly separable by a plane at x (3) = 0.5 (grey).
kernel trick, 35-37 which allows to operate in an implicit feature space without explicitly computing the coordinates of data in that space (see section 3.1 for more details). ML methods based on ANNs rely on "neuron layers", which map their inputs to feature spaces by linear transformations with learnable parameters, followed by a nonlinearity (called activation function). Often, many such layers are stacked on top of each other to build increasingly complex feature spaces (see section 3.2). In the following, both variants are discussed in more detail.

Reproducing Kernel Representations
Starting from a data set {(y i ; kernel regression aims to estimate unknown values y * for inputs x * . For a PES, y is typically the total interaction energy and x is a representation of chemical structure (i.e. a vector of internal coordinates, a molecular descriptor like the Coulomb matrix, 29 descriptors for atomic environments, e.g. symmetry functions, 38 SOAP 39 or FCHL, 40,41 or a representation of crystal structure [42][43][44] ). The representer theorem 45 for a functional relation y = f (x) states that f (x) can always be approximated as a linear combination where α i are coefficients and K(x, x ) is a kernel function. A function K(x, x ) is a reproducing kernel of a Hilbert space H if the inner product φ(x), φ(x ) of H can be expressed as K(x, x ). 46 Here, φ is a mapping from the input space R D to the Hilbert space H, i.e.
φ : R D → H. Many different kernel functions are possible, for example the polynomial kernel where ·, · denotes the dot product and d is the degree of the polynomial, or the Gaussian kernel given by are popular choices. Here, γ is a hyperparameter determining the width of the Gaussian and · denotes the L 2 -norm. It is also possible to include knowledge about the long range behaviour of the physical interactions into the kernel function itself 47 and the consequences of such choices on the long-and short-range behaviour of the inter-and extrapolation have been investigated in quite some detail. 48 The mapping φ associated with the polynomial kernel (Eq. 3) depends on the dimensionality of the inputs x and the chosen degree d of the kernel. For example, for d = 2 and twodimensional input vectors, the mapping is given by φ : Hilbert space H associated with the kernel function is three-dimensional. For a Gaussian kernel, the associated Hilbert space H even is ∞-dimensional. This can easily be seen if Eq. 4 is rewritten as then the Taylor expansion of the third factor e 2γ x,x = ∞ d=0 1 d! (2γ x, x ) d reveals that the Gaussian kernel is equivalent to an infinite sum over polynomial kernels (scaled by constant terms). It is important to point out that in order to apply Eq. 2, the mapping φ has never to be calculated explicitly (or even known at all) and it is therefore possible to operate in the (high-dimensional) space H implicitly. This is often referred to as the kernel trick . [35][36][37] The coefficients α i (Eq. 2) can be determined such that f (x i ) = y i for all inputs x i in the dataset, i.e. 49,50 and y = [y 1 · · · y N ] T is a vector containing the N observations y i in the data set. Since the kernel matrix is symmetric and positive-definite by construction, the efficient Cholesky decomposition 51 can be used to solve Eq. 6. Once the coefficients α i have been determined, unknown values y * at arbitrary positions x * can be estimated as y * = f (x * ) using Eq. 2.
In practice however, the solution of Eq. 6 is only possible if the kernel matrix K is wellconditioned. Fortunately, in case K is ill-conditioned, a regularized solution can be obtained for example by Tikhonov regularization. 52 This amounts to adding a small positive constant λ to the diagonal of K, such that is solved instead of Eq. 6 when determining the coefficients α i (here, I is the identity matrix).
For non-zero λ however, f (x i ) = y i and Eq. 2 reproduces the known values in the data set only approximately. Therefore, this method of determining the coefficients can also be used to prevent over-fitting and is known as kernel ridge regression (KRR). 53 KRR is closely related to Gaussian process regression (GPR). 54 In GPR, it is assumed that in the data set are generated by a Gaussian process, i.e.
drawn from a multivariate Gaussian distribution with zero mean, and covariance K(x, x ).
Note that a mean of zero can always be assumed without loss of generality since two multivariate Gaussian distributions with equal covariance matrix can always be transformed into each other by addition of a constant term. Further, every observation y i is considered to be related to x i through an underlying function f (x) and some observational noise (e.g. due to uncertainties in measuring y i ) With these assumptions, it is now possible to determine the conditional probability p(y * |y), i.e. answer the question "given the data y = [y 1 · · · y N ] T , how likely is it to observe the value y * for an input x * ?". Since it was assumed that the data was drawn from a multivariate Gaussian distribution, it is possible to write where K is the kernel matrix (see Eq. 6) and K * = [K(x * , x 1 ) · · · K(x * , x N )]. Then, the best (most likely) estimate for y * is the mean of this distribution Thus, estimating y * with GPR (Eq. 10) is mathematically equivalent to estimating y * with KRR (compare to Eqs. 2 and 7). However, while in KRR, λ is only a hyperparameter related to regularization, in GPR, λ is directly related to the magnitude of the assumed observational noise (see Eq. 8). Further, the predictive variance, which can also be derived from Eq. 9, can be useful to estimate the uncertainty of a prediction y * , i.e. how confident the model is that its prediction is correct. Since KRR and GPR are so similar, they are both referred to as reproducing kernel representations in this work.

Artificial Neural Networks
The basic building blocks of artificial neural networks (NNs) 55-61 are so-called "dense (neuron) layers", which transform input vectors x ∈ R n in linearly to output vectors y ∈ R nout where the weights W ∈ R nout×n in and biases b ∈ R nout are parameters, and n in and n out denote the dimensionality of inputs and outputs, respectively. A single dense layer can therefore only represent linear relations. To model non-linear relationships between inputs and outputs, at least two dense layers need to be combined with a non-linear function σ (called activation function), i.e.
Such an arrangement (Eqs. 13 and 14) has been proven to be a general function approximator, meaning that any mapping between input x and output y can be approximated to arbitrary precision, provided that the dimensionality of the so-called "hidden layer" h is large enough. 62,63 As such, NNs are a natural choice for representing a PES, i.e. a mapping from chemical structure to energy (for PES construction, the output y usually is one-dimensional and represents the energy).
While shallow NNs with a single hidden layer (see above) are in principle sufficient to solve any learning task, in practice, deep NNs with multiple hidden layers are exponentially more parameter-efficient. 64 In a deep NN, l hidden layers are stacked on top of each other, . . .  For larger systems, it is common practice to decompose the total energy of a chemical system into atomic contributions, which are predicted by a single NN (or one for each element).
This approach, known as high-dimensional neural network (HDNN) 73 and first proposed by Behler and Parrinello, relies on the chemically intuitive assumption that the contribution of an atom to the total energy depends mainly on its local environment.

Applications
In are compared with ab initio energies in all three channels for the 2 A PES in Figure 3. The contour plots shown in Figure 3 shows the topology of the 2 A PES for all three channels.
The overall good agreement between the ab initio and analytical energies in all the channels and for all the electronic states suggests the high quality of the PESs.  Another prototypical example of a chemical reaction is the S N 2 mechanism. In a recent comparative study, 104 three reactive PESs for the [Cl-CH 3 -Br] − system were constructed: Two of these PESs rely on empirical force fields, either combined with the MS-ARMD or the MS-VALBOND 105 approach to construct the global reactive PES, whereas the third is NN-based. While all methods are able to fit the ab initio reference data with R 2 > 0.99, the NN-based PES achieves mean absolute and root mean squared deviations that are an order of magnitude lower than the other methods when using the same number of reference data. When increasing the size of the reference data set, the prediction errors made by the NN-based PES are even up to three orders of magnitude lower than for the force field-based PESs. However, at the same time, evaluating the NN-based PES is about three orders of magnitude slower. 104

Reactions in the Condensed Phase
For reactions in the condensed phase, two different situations are considered in the following. In one of them, ligands bind to a substrate anchored within a protein, such as for small diatomic ligands binding to the heme-group in globins. In the other, the substrate is chemically transformed as is the case for the Claisen rearrangement from chorismate to prephenate.
Ligand (Re-)Binding in Globins: Computationally, the structural dynamics accompanying NO-rebinding to Myoglobin has recently been investigated with the aim to assign the transient, metastable structures relevant for rebinding of the ligand on different time scales. 106 For this, reactive MD simulations using MS-ARMD simulations were run involving the bound 2 A and the unbound 4 A states which are also probed experimentally. The energy for each of the states was represented as a reproducing kernel 47,106,107 for the subspace of important system coordinates (the heme(Fe)-NO separation and angle, and the doming coordinate of the heme-Fe) combined with an empirical force field for all remaining degrees of freedom. Such an approach is inspired by a decomposition of the system into a region that is modelled with high accuracy (typically a "quantum region") and an environment (the "molecular mechanics" part).
With a system parametrized in this fashion, extensive reactive MD simulations were run. 106 The kinetics for ligand rebinding is nonexponential with time scales of 10 and 100 ps. These are consistent with time scales measured from optical, infrared, and X-ray absorption experiments and previous computational work. [108][109][110][111][112][113][114][115][116][117][118][119] The influence of the iron-out-of-plane (Fe-oop or "doming") coordinate on the rebinding reaction, as predicted by experiment, 111 was directly established. The two time scales (10 and 100 ps) are associated with two structurally different states of the His64 side chain -one "out" (state A) and one "in" (state B) -which control ligand access and rebinding dynamics. Such an unequivocal assignment was not possible from experiment. 120 In addition, the simulations provide an explanation why an energetically feasible state for NO-binding to heme is typically not found in Mb: Although the bound Fe-ON state is a local minimum on the potential energy surface, the energy of this state on the unbound 4 A manifold is lower and, hence, the bound 2 A Fe-ON can not be spectroscopically characterized. The simulations finally clarify that the XAS experiments are unable to distinguish between structures with photodissociated NO "close to" or "far away" from the heme-Fe in the active site as had been proposed. 114 In this fashion, validation of experimental results by the MD simulations and in-depth analysis of the configurations driving the dynamics on the different time scales (10 ps and 100 ps) allowed to identify the structural origins of the conformational dynamics at a molecular level.
It is expected that further combined experimental and computational studies of this kind will provide the necessary insight to link energetics, structures and dynamics in complex systems.

Reactions in Solution:
The Claisen rearrangement 121 is an important [3,3]-sigmatropic rearrangement for high stereoselective 122 C-C bond formation. 123 The text book example of a Claisen rearrangement is the reaction of allyl-vinyl ether (AVE) to pent-4-enal. 124 In polar solvent the stabilization of the transition state (TS) relative to the reaction in vacuum is the origin of the catalytic effect. [125][126][127] This has motivated numerous studies on enzymatic Claisen rearrangements in particular [128][129][130][131][132][133][134][135][136][137][138] and reactions with related substrates. [139][140][141][142] Compared to the reaction in aqueous solution the enzymatic catalysis of the Claisen rearrangement reaction in chorismate mutase (CM) leads to a rate acceleration by ∼ 10 6 due to stabilisation of the TS. 143 A reactive force field based on MS-ARMD was parametrized for AVE and used unchanged for AVE-(CO 2 ) 2 and chorismate to study their Claisen rearrangements in the gas phase, in water and in the chorismate mutase from Bacillus subtilis (BsCM). Using free energy simulations it is found that in going from AVE and AVE-(CO 2 ) 2 to chorismate and using the same reactive PES the rate slows down when going from water to the protein as the environment.
However, for the largest substrate (chorismate) they correctly find that the protein accelerates the reaction. Considering the changes of +4.6 (AVE), +2.9 (AVE-(CO 2 ) 2 ) and −4.4 (chorismate) kcal/mol in the activation free energies and correlating them with the actual chemical modifications suggests that both, electrostatic stabilization (AVE→AVE-(CO 2 ) 2 ) and entropic contributions (AVE-(CO 2 ) 2 → chorismate, through the rigidification and larger size of chorismate) lead to the rate enhancement observed for chorismate in CM.
As for the reaction itself it is found that for all substrates considered the O-C bond breaks prior to C-C bond formation. This agrees with kinetic isotope experiments according to which C-O cleavage always precedes C-C bond formation. 144 For the nonenzymatic thermal rearrangement of chorismate to prephenate the measured kinetic isotope effects 144,145 indicate that at the TS the C-O bond is about 40 % broken but little or no C-C bond is formed, consistent with an analysis based on "More O'Ferrall-Jencks" (MOFJ) diagrams. 146,147 The analysis of the TS position in the active site of BsCM reveals that the lack of catalytic effect on AVE is due to its loose positioning, insufficient interaction with and TS stabilization by the active site of the enzyme. Major contributions to localizing the substrate in the active site of BsCM originate from the CO − 2 groups. This together with the probability distributions in the reactant, TS and product states suggest that entropic factors must also be considered when interpreting differences between the systems, specifically (but not only) in the protein environment.

Energy predictions
The systematic exploration of chemical space is a possible way to find as of yet unknown compounds with useful properties, e.g. for medical applications. For example, the GDB-17 database 148 enumerates 166 billion small organic molecules that are potential drug candidates. However, running ab initio calculations to determine the properties of billions of molecules is computationally infeasible. Machine-learned PESs were shown to reach accuracies on par with hybrid DFT methods 149 and thus can serve as an efficient alternative to predict e.g. stabilization energy or equilibrium structures.
In order to be able to compare different approaches, benchmark datasets are used to assess the accuracy of ML methods. One of the most popular benchmarks for this purpose is QM9. 150

Outlook and Conclusions
This section puts the methods discussed in the present overview into perspective and discusses future extensions, and their advantages and disadvantages.
As discussed, RKHS has been applied to generate accurate representations of PES for different triatomic systems (3D) to study either reactive collisions or vibrational spectroscopy.
The RKHS procedure can also be applied to construct higher dimensional PESs. As an example, an RKHS representation of the 6D PES for N 4 is discussed. Previously, a global PES was constructed for N 4 using PIPs from 16435 CASPT2/maug-cc-pVTZ energies 153,154 which are also used here. For constructing the RKHS, a total of 16046 ab initio energies up to 1200 kcal/mol were used. The full PES is expanded in a many body expansion, i (r j , r k , r l ) + V (4) i (r j , r k , r l , r m ) (16) where the first term is the sum of four 1-body energies, the second term is the sum of six 2-body interaction energies, the third term is the sum of four 3-body interaction energies and the last term is the 4-body interaction energy. The first term is set to a constant value which is the energy of total dissociation of N 4 to four N atoms. Each 2-body term can be expressed by a 1D reproducing kernel polynomial and corresponding RKHS PESs can be constructed from the diatomic N 2 potential. The last two terms can be expressed by a product of three and six 1D reproducing kernel polynomials. In this work, the sum of the last two terms are calculated using RKHS interpolation of the E (3+4) energies. The sum of 3 and 4-body interaction energies (E (3+4) ) is calculated as For all the cases the 1D kernel function (k n,m ) with smoothness n = 2 and asymptotic decay m = 6 is used for the radial dimensions, which is expressed as where, x > and x < are the larger and smaller values of x and x , respectively, and the kernel smoothly decays to zero at long range. Symmetry of the system can also be implemented within this approach by considering all possible combinations for the 3 and 4-body interaction energies. Here, it is worth to be mentioned that interpolating the 3-body and 4-body terms separately should provide more accurate energies, which is however not possible in this case as the triatomic energies are not available. The root mean square errors, mean absolute errors are computed for the training data set and tabulated in Table 1. The correlation between the reference ab initio energies and RKHS interpolated energies are plotted in Figure 5 with an R 2 = 0.9981. A few dissociation curves for the N 2 are plotted in Figure 6 for different configurations of the other N 2 diatom. The ab initio energies shown in Figure 6 are not included in the RKHS training grid and show that a RKHS can successfully reproduce the overall shape and values of the unknown ab initio potential. Energy (kcal/mol) Figure 6: Comparison between the test ab initio data (symbols) and RKHS interpolated energies (solid lines) for the dissociation curves N3-N4 (along r B ) for N 2 +N 2 system with N1-N2 fixed at r A . The angle between r A and r B is defined as Although techniques such as RKHS or permutationally invariant polynomials can provide accurate representations, their extensions to higher dimensions remains a challenge. Recently, the use of PIPs was demonstrated for the PES of N-methyl acetamide which is an important step in this direction. 155 Additionally, the (s)GDML approach 156,157 has been used to construct PESs for several small organic molecules, such as ethanol, malondialdehyde and aspirin. 158 Another challenge is to reduce the number of points required to define such a PES.
Efforts in this direction have recently shown that with as few as 300 reference points the PES for scattering calculations in OH+H 2 can be described from a fit based on Gaussian processes together with Bayesian optimization. 159 Nevertheless, such high-accuracy representations of PESs for extended systems will remain a challenge for both, the number of high-quality reference calculations required and the type of inter-(and extra-)polation used to represent them.
Another important aspect of accurate studies of the energetics and dynamics of molecular systems concerns the observation, that "chemistry" is often local. As an example, the details of a chemical bond -its equilibrium separation and its strength -can depend sensitively on the local environment which may play an important role in applications such as infrared spectroscopy. As an example, singly methylated malonaldehyde is considered. Depending on the position of the proton, see Figure 7   Capturing such effects within a NN-trained global PES using PhysNet is more convenient.
As an example, the situation in singly-methylated malonaldehyde (acetoacetaldehyde, AAA) is considered, see Figure 7. There are two CO motifs each of which can carry the transferring hydrogen atom at the oxygen atom. Depending on whether the hydrogen atom is on the OC-CH 3 or OC-C side the chemical nature of the CO bond changes. This also influences the frequencies of the CO stretch vibrations. Figure 8 reports the infrared spectrum from normal modes from MP2/6-311G(d,p) calculations and from an NN trained on energies at the same level of theory. As is shown, the normal modes from the electronic structure calculations from the MP2/6-311G(d,p) for the two isomers (top and bottom panels) differ appreciably in the range of the amide-I stretches. Above 1600 cm −1 the harmonic frequencies occur at 1644 and 1692 cm −1 for isomer AAA1 and at 1658 and 1696 cm −1 for isomer AAA2. The NN (middle two panels) is successful in capturing the higher frequency (at 1689 and 1695 cm −1 for the two isomers, respectively) whereas for the lower frequency the two modes occur at 1635 and 1634 cm −1 . Additional modes involving CO stretch vibrations occur between 1400 and 1500 cm −1 . Figure 8 shows clear differences for the patterns for AAA1 and AAA2 which are correctly captured by the NN.
In a conventional force field all these frequencies would be nearly overlapping as the force field parameters for a CO bond does usually not depend on whether a hydrogen is bonded to it or not. In order to capture such an effect, the force field parameters for the CO bond would need to depend on the bonding pattern of the molecule along the dynamics trajectory.
Encoding such detail into a conventional force field is difficult and NN-trained PESs offer a natural way to do so.
Another benefit yet to be explored that NN-trained PESs such as PhysNet offer is the possibility to have fluctuating point charges for a molecule without the need to explicitly parametrize the dependence on the geometry. Modeling such effects within an empirical force field is challenging. 161 A final challenge for high-dimensional PESs is including the chemical environment, such as the effect of a solvent. Immersing a chemically reacting system into an environment leads Figure 8: The infrared spectrum of methylated malonaldehyde in the region of the CO stretch region. The bands at higher frequency (above 1600 cm −1 are due to C=O bonds whereas those between 1400 and 1500 cm −1 involve a partial double bond for the CO stretch. The top and bottom panels are for normal modes from MP2/6-31G(d,p) calculations and the two middle panels from normal modes on the trained NN using PhysNet.
to pronounced changes. As an example, double proton transfer in formic acid dimer in the gas phase and in solution is considered. The parametrization used here was adapted to yield the correct infrared spectrum in the gas phase. 162 Recent high-resolution work has confirmed that the barrier of 7.3 kcal/mol for the gas-phase PES is compatible with the tunneling splitting observed in microwave studies. 163 Such a barrier height makes spontaneous transitions rare. Hence, umbrella sampling simulations were combined with the molecular mechanics with proton transfer (MMPT) force field to determine the free energy barrier for DPT in the gas phase and in solution. As a comparison, the simulations were also carried out by using the Density-Functional Tight-Binding (DFTB) 164,165 method for the FAD. In both cases the solvent was water represented as the TIP3P model. 166 The free energy barrier in the gas phase is ∆G = 5.4 kcal/mol which increases to 7.5 kcal/mol in water, see Fig. 9. With DFTB3 the barrier height in solution is similar (7.3 kcal/mol) to that with the MMPT parametrization. In all cases, FAD undergoes a concerted double proton transfer to interconvert between two equivalent forms resulting in a symmetric potential. The nature of the transition state was verified by running 5000 structures from the umbrella sampling simulations at the TS, starting with zero velocity, and propagating them for 1 ps in an N V E ensemble. The fraction of reactants and products obtained are 0.54 and 0.46, indicating that the configurations sampled in the umbrella sampling simulations indeed correspond to a transition state and lie midway between reactants and products and are equally likely to relax into either stable state.
From these simulations it is also possible to determine the time to product or reactant which is reported in the the inset of Figure 9. The most probable time is ∼ 5 fs with a wide distribution extending out to to 20 fs. This is typical for a waiting time distribution and indicates that multiple degrees of freedom are involved.  Figure 9: Free energy as a function of reaction coordinate for proton transfer in gaseous and water-solvated FAD. The blue and red curves show the free energy for FAD in the gas and solution phase respectively using the MMPT force field. The energy profile in black is obtained for FAD in solution through DFTB treatment. In all cases, for the umbrella sampling procedure, 17 umbrella windows are located at 0.1Å intervals and trajectories are propagated for 50 ps. The probability distribution from different umbrellas are recombined using the weighted histogram analysis method (WHAM). 167 Figure 10: Solvent distribution around FAD for the transition state ensemble from 5000 transition states sampled from umbrella sampling simulations.
The methods discussed in the present work have all their advantages and shortcomings.
Depending on the application at hand the methods provide different efficiencies and accuracies and are more or less straightforward to apply. In the following, the three approaches discussed here are compared by looking at them from different perspectives.
• For small gas phase systems such as tri-and tetraatomics, RKHSs, PIPs and NN-based force fields are powerful methods for accurate investigations of their reactive dynamics.
Empirical force fields are clearly not intended and suitable for this.
• For medium-sized molecules (up to ∼ 10 atoms) in the gas phase, reactive MD methods, such as EVB 168 (not explicitly discussed here or multi state reactive MD), NNs, or suitably parametrized force fields (polarizable or non-polarizable) including multipoles are viable representations. PIPs or RKHSs will eventually become cumbersome to parametrize and computationally expensive to evaluate.
• Systems with ∼ 10 atoms in solution can be described by refined FFs and reactive MD simulations. NNs, such as Physnet, would be a very attractive possibility, as • Finally, for macromolecules in solution, such as proteins, either refined reactive FFs or a combination of RKHS and a FF has shown to provide meaningful ways to extend quantitative, reactive simjlations to condensed phase systems. Extending such approaches, akin to mixed QM/MM simulations but treating the reactive part with a NN, may provide even better accuracy.
Multidimensional PESs are a powerful way to run high-quality atomistic simulations for gasand condensed phase systems. Recent progress concerns the accurate, routine representation of PESs based on RKHSs or PIPs. As an exciting alternative, NN-based PESs have also become available. Despite this progress, extension of these techniques to simulations in solution and multiple dimensions remain a challenge. Attractive future possibilities are simulations which capture the changes in local chemistry or in the atomic charges without the need to explicitly parametrize them as a function of geometry. This is possible with approaches as those used in PhysNet.