Reactive Dynamics and Spectroscopy of Hydrogen Transfer from Neural Network-Based Reactive Potential Energy Surfaces

The in silico exploration of chemical, physical and biological systems requires accurate and efficient energy functions to follow their nuclear dynamics at a molecular and atomistic level. Recently, machine learning tools gained a lot of attention in the field of molecular sciences and simulations and are increasingly used to investigate the dynamics of such systems. Among the various approaches, artificial neural networks (NNs) are one promising tool to learn a representation of potential energy surfaces. This is done by formulating the problem as a mapping from a set of atomic positions $\mathbf{x}$ and nuclear charges $Z_i$ to a potential energy $V(\mathbf{x})$. Here, a fully-dimensional, reactive neural network representation for malonaldehyde (MA), acetoacetaldehyde (AAA) and acetylacetone (AcAc) is learned. It is used to run finite-temperature molecular dynamics simulations, and to determine the infrared spectra and the hydrogen transfer rates for the three molecules. The finite-temperature infrared spectrum for MA based on the NN learned on MP2 reference data provides a realistic representation of the low-frequency modes and the H-transfer band whereas the CH vibrations are somewhat too high in frequency. For AAA it is demonstrated that the IR spectroscopy is sensitive to the position of the transferring hydrogen at either the OCH- or OCCH$_3$ end of the molecule. For the hydrogen transfer rates it is demonstrated that the O-O vibration is a gating mode and largely determines the rate at which the hydrogen is transferred between the donor and acceptor. Finally, possibilities to further improve such NN-based potential energy surfaces are explored. They include the transferability of an NN-learned energy function across chemical species (here methylation) and transfer learning from a lower level of reference data (MP2) to a higher level of theory (pair natural orbital-LCCSD(T)).


Introduction
Rapid progress of computer technology has given science new opportunities. With the possibility to carry out simulations in the broadest sense, the conventional approach to research consisting of experimental and theoretical/mathematical methods has been considerably extended. This is particularly true for the molecular sciences for which realistic, atomically-resolved simulations have become possible in the past two decades. Such "in silico" approaches now constitute an integral part of the toolbox of molecular scientists to characterize, formulate and test hypotheses and make predictions about complex systems. 1,2 A necessary requirement for carrying out such atomistic simulations -both, molecular dynamics (MD) and Monte Carlo (MC) -is the availability of a means to determine the total energy of the system for given positions x of all the atoms. For MD simulations, the forces are required as well. The most rigorous approach would be to recompute the total energy for every new configuration using electronic structure methods, i.e. solving the electronic Schrödinger equation at the highest level of theory and with the largest basis set that is computationally affordable. However, this is impractical even for small systems due to a number of reasons, in particular if a statistically significant number of trajectories is required. First, the computing time for high-level quantum methods is appreciable. Second, technical aspects such as accounting for basis set superposition errors or including multi reference effects for highly distorted geometries are difficult. Finally, for chemically demanding systems, e.g. those containing metal ions, it may even be difficult to converge the Hartree-Fock wavefunction to the desired state or converging it at all for arbitrary geometries. In such cases, human intervention is required which is not desirable.
As an alternative, the total energy can be computed for a number of conformations a priori on a grid. Then, the potential energy surface (PES) needs to be represented in a way that can be evaluated for arbitrary geometries. This can be done by either resorting to a fit of a parametrized function or by representing the PES, e.g. using reproducing kernel Hilbert space theory. 3,4 Alternatively, machine learning (ML) methods have recently emerged as a possibility to represent the energetics of molecules and their intermolecular interactions. 5 Here, the energies are obtained from learning a representation of the potential energy surface (PES) of a system, which connects a set of nuclear charges and atomic positions to the energy. 6 A suitable tool for "learning" molecular energies are so-called artificial neural networks (ANNs, henceforth NNs), which were shown to be general function approximators. 7 In the present work, an NN based on the PhysNet 6 architecture is used to generate and explore fully-dimensional, reactive PESs for three β-diketones, see Figure 1. The structures all have an enol-and a keto-form, of which the enol form is more stable in the gas phase. 8 Malonaldehyde (MA, propandial ) is a well-studied system (see below), both experimentally and by computation for hydrogen transfer (HT) which is one of the ubiquitous reaction mechanisms in chemistry. 9 The other two systems, acetoacetaldehyde (AAA, 3-oxobutanal ) and acetylacetone (AcAc, pentan-2,4-dion) examine the influence of methyl-substituents on the intramolecular HT for both, symmetric (AcAc) and asymmetric (AAA) substitution.
For MA, the infrared (IR) spectrum [10][11][12][13] and the ground state tunneling splitting [14][15][16] for hydrogen transfer have been determined experimentally. Calculations at different levels of theory have also been carried out to assign these spectra and to reproduce the splitting. In general, this requires fully dimensional dynamics simulations [17][18][19][20][21][22] performed on high level PESs. 23 AcAc is structurally related to MA (see Figure 1) through substitution of the symmetrical H-atoms by methyl groups. Given that the methyl torsion can couple to the O-O stretch and hence to the hydrogen motion along the H-bond, it constitutes a more challenging problem than HT in MA. In AAA, only one of the hydrogen atoms is replaced by methyl.
Contrary to MA, less information about activation barriers, structures and possible tunneling splittings in AcAc is available. Even the question whether its ground state assumes an asymmetric (C s ) or a symmetric (C 2v ) structure is still debated. [24][25][26][27][28][29] The ground state structure from neutron crystallography predicts a C s symmetry 24 whereas electron diffraction experiments suggest either a C s 25,26 or a C 2v structure. 28 The most recent study performed with ultrafast electron diffraction concluded that the lowest energy form of AcAc has C s symmetry. 27 In general, electronic structure calculations find an asymmetric minimum energy structure with C s symmetry for AcAc on the PESs excluding zero-point corrections, [30][31][32][33][34][35] while correcting for zero point vibrational energy leads to a slight preference of a C 2v struc-  Conversely, high-resolution rotational spectra of AcAc and its singly substituted 13 C-isotopologues in the frequency ranges 2-26.5 GHz (∼0.1-1 cm −1 ) and 60-80 GHz (∼2-3 cm −1 ) lead to a structure with C 2v symmetry 29 possibly due to zero point vibration of the proton in systems with strong hydrogen bonds. [36][37][38] The double well potential in AcAc due to HT from one oxygen to another is responsible for the tunneling splitting. 39,40 Several IR spectroscopic studies of AcAc are available in the vapor phase. [41][42][43][44] The OHstretching transition is very broad, however, it is typically assigned in the region from 2750 cm −1 to 2800 cm −1 . 41,44 The only near infrared (NIR) investigation of AcAc was performed in the liquid phase in 1929 as part of a study regarding carbonyl overtones. 45 Other than a tentative assignment of a carbonyl overtone at 1.91 µm, no further transitions for AcAc were assigned. More recently, the vapor phase IR and NIR spectra of AcAc were reported. 46 The experimental spectra were interpreted by using atomistic simulations combined with reactive force fields based on molecular mechanics with proton transfer (MMPT). 47 To the best of our knowledge no spectroscopic data for AAA is available in the open literature.
In the present work fully-dimensional, reactive PESs for MA, AAA, and AcAc are constructed by training a single NN-representation using electronic structure data. The PESs are used in finite-temperature MD simulations to determine the IR spectra of the three compounds and the (classical) rates for HT. Machine learning of the PESs is also carried out in two further, complementary ways. First, the generalizability of such an NN-learned PES is tested by learning the AcAc PES based on reference data only from MA, AAA, and the underlying "amons". 48 Second, transfer learning 49 (TL) of an NN trained on lower-level ab initio data to a higher level of theory is attempted.
The work is structured as follows. First, the methods are discussed. Then, the results from finite-temperature MD simulations for the three systems are presented and discussed in the context of previous experimental and computational work. This is followed by generalizations of the NN-learned PES and the results for TL. Finally, conclusions are drawn and an outlook is given.

Neural Network PESs Based on PhysNet
All PESs used in this work are constructed by fitting the parameters of the PhysNet 6 NN architecture to extensive reference ab initio data. The NN architecture, the data set gener-ation process and the training (fitting) procedure are described below.
Neural network architecture: PhysNet 6 is a high-dimensional NN 50 of the "message-passing" 51 type. It predicts atomic energy contributions and partial charges from feature vectors encoding information about the local chemical environment of each atom. 52 The feature vectors are constructed iteratively from the nuclear charges Z i and Cartesian coordinates r i of all atoms i by passing "messages" between atoms within a cut-off distance r cut (here r cut = 10Å). Longrange electrostatics and dispersion interactions are included explicitly. The total potential energy E of a system is represented as where E i and q i are the predicted atomic energy contributions and partial charges of atom i, r ij is the distance between atoms i and j, k e is Coulomb's constant and E D3 is the D3 dispersion correction. 53 To avoid numerical instabilities, the Coulomb term is damped for small distances r ij (not shown in Eq. 1 for simplicity, see Ref. 6 for details) and charge neutrality of the molecule is explicitly enforced, see Ref. 6. The functional form for the atom embedding is invariant with respect to translation, rotation and permutation of atoms sharing the same nuclear charge Z. 6 The forces F i acting on each atom i, required for MD studies, can be obtained analytically by reverse mode automatic differentiation. 54 For computing the infrared (IR) spectra based on the MD simulations, the total molecular dipole moment is computed from the (fluctuating) partial charges q i (Eq. 2).
During the NN training the parameters are fitted to reference ab initio energies, forces and dipole moments using the procedure described in Ref. 6. The parameters for the D3 disper-sion correction were initialized to standard values for the Hartree-Fock method 55 and then refined during training. Minimization was carried out using "adaptive moment estimation" (Adam) 56 which combines momentum and root mean squared propagation.
Data set generation: The quality of the reference data set is crucial for generating an accurate and robust PES. Here, the general procedure followed Ref. 6 again. The data set for the present work is based on calculations at the MP2/aug-cc-pVTZ 57 level of theory. For a given geometry, the energy, forces and dipole moments are calculated using the MOLPRO software package. 58 The data set not only contained the structures of MA, AAA and AcAc, but included a total of 49 structures and substructures, given in Figure S1, based on the "amon" approach. 48 An initial ensemble of 49 000 molecular geometries was generated by running Langevin dynamics at 1000 K with a time-step of ∆t = 0.1 fs using the Atomic Simulation Environment (ASE) 59 and the semi-empirical PM7 method 60 as implemented in MOPAC. 61 For all these structures, reference data at the MP2/aug-cc-pVTZ level of theory was computed and initial NNs were trained on it. Based on the initial training, the data set was augmented using adaptive sampling. 62,63 For this, Langevin dynamics are run with two NNs and the initial data set is extended with ab initio data if the energies predicted by the two NNs differed by more than 0.5 kcal/mol. Two rounds of adaptive sampling were carried out.
The final data set used to train the NNs contained energies, forces and molecular dipole moments for 71 208 structures including all three molecules and their amons. Choosing the MP2/aug-cc-pVTZ level of theory was motivated by the fact that it still provides good accuracy but is also computationally feasible for data sets of size ∼ 10 5 . As the number of required training points is a priori unknown when training a new NN the level of theory for the reference calculations should be chosen with circumspection. This is why a higher level method, such as CCSD(T), was not considered from the outset but rather used in TL.
Transfer Learning to a Higher Level of Theory: In order to further improve the quality of the entire PES, TL 49 was used. In TL, instead of starting the training procedure of an NN from scratch, an NN trained on related data is used to initialize the parameters, which usually leads to a better model with less data. This approach can be useful when high-level reference data is scarce or expensive to generate, whereas data for "pre-training" (in the present case based on a lower level of electronic structure theory) the NN is readily available. 49 As such, TL is a valuable tool to bypass the high computational cost of modern electronic structure calculations: First, an NN is pre-trained on a large number of low level ab initio data and then re-trained with a smaller number of high level ab initio data to slightly adjust its parameters. 64 Here, the model trained on the MP2 data is the base model and the pair natural orbital structures, composed of 1000 structures per molecule in the reference set, see Figure S1.
Then, the data set is split into a training set of 44 100 and a test set of 4900 structures. TL is carried out with different, randomly chosen structures in the training set with sizes of 100, 1000, 5000, 15 000, 25 000 and 40 000 structures and all parameters of the NN were allowed to change. Since the reference data set only contains energies, the loss function only depends on those and the dependency on the forces and dipole moments was removed by setting the corresponding weights in the loss term to zero. For comparison, models with the same six training sets were also trained from scratch.

Molecular Dynamics Simulations
Molecular dynamics simulations are run using the NN-trained PESs to determine the IR spectrum and the rate for HT. All MD simulations, unless stated otherwise, were run as follows. Starting from an initial structure, random momenta drawn from a Boltzmann distribution at 300 K are used as initial conditions. The system is then equilibrated for 50 ps in the N V T ensemble followed by 50 ps in the N V E ensemble. This was followed by MD simulation in the N V E ensemble for a total of 1 ns, with a time step of 0.5 fs. For each of the molecules (MA, AAA, and AcAc) a total of 1000 trajectories are run. Every second snapshot along the trajectory is recorded for analysis.
IR spectra are calculated from the dipole-dipole autocorrelation function of the MD simulations. For every configuration the molecular dipole moment µ(t) is calculated following Eq. 2. From this, the dipole moment autocorrelation function and a Blackman filter is employed to minimize noise. Finally, the IR-spectrum is Boltzmann averaged to yield Here, ω is the frequency,h is reduced Planck's constant, T is the temperature, and k B is the Boltzmann constant.
HT rates are calculated from analyzing "hazards" which are deduced from the residence times t k of the k-th transition. 67,68 Here, the residence time is defined as the time interval the transferring hydrogen remains bound to one oxygen atom before changing to the other oxygen. This can be measured by defining a cutoff O-H separation, r c . Whenever r > r c , one transfer is considered completed and the transition time t k is recorded. The Hazard for the k-th transition 67 is determined from the n residence times, t 1 ,t 2 ,...,t n , arranged in ascending order. From this, the rate can then be deduced from the slope of the hazard plot (H k versus t k ). 67

Results and Discussion
First, the quality of the fully-dimensional, reactive PESs for the three systems is discussed. This is followed by the analysis of the computed IR spectra and the HT rates. Then, possibilities to exploit chemical principles to reduce the number of required reference structures and that for quality improvement based on TL are examined.

Quality of the PESs
The , and AcAc (red). The solid part of the lines corresponds to translation of the hydrogen along the reaction coordinate q = r O1H − r O2H , whereas the dotted part is for the rotation of a methyl group by 60 • (see insets). The short horizontal lines indicate the barrier heights at higher levels of theory. 69,70 The barrier heights on the NN PESs trained on the data from MP2/augcc-pVTZ calculations for MA, AAA, and AcAc are 2.79, 2.59, and 2.17 kcal/mol, compared with higher-level calculations for MA (4.10 kcal/mol at frozen-core CCSD(T)/(aug-)cc-pVTZ level whereby only the basis set on the oxygen atoms was augmented), 69  Compared with high-level calculations at the CCSD(T)/(aug-)cc-pVTZ level of theory, the barrier heights for MA 69 and AcAc 70 are underestimated by somewhat more than 1 kcal/mol, though. Further improvements of the barriers from the NNs can be achieved through TL, as will be discussed further below. For AAA a 2D-projection of the PES along the HT coordinate q = r O1H − r O2H (x−axis) and the methyl rotation (y−axis) is shown in Figure 3. For visualization purposes, q, Φ and Θ are extracted from a trajectory used for the computation of the IR spectrum and the first 1 ps is shown in Figure 4A as well as in a 3D representation in Figure 4B. Black dots represent early and white dots represent late points during the trajectory. The hydrogen atom is transferred twice and the left-hand side methyl rotates due to the lower rotational barrier for q > 0 (see Fig. 4A). When interpreting the trajectory it should, however, be taken into account that during the MD simulations both methyl groups are free to rotate whereas for the PES scan the right-hand methyl is frozen.
The 3D trajectory illustrates that both methyl rotations (along the y− and z−axis) and HT (along the x−axis) occur during the course of an equilibrium dynamics and that they are coupled as the barrier for methyl rotation depends on whether the oxygen atom two bonds away from the methyl group carries the hydrogen or not. Next, MEP and minimum dynamical path (MDP) for MA are compared. The MDP is defined as the path followed by a trajectory that passes the exact transition state with zero excess energy. 71 The MEP for MA is symmetrical (see Figure 5) with a transition state at

Infrared Spectroscopy
All IR spectra shown in this work are calculated from NN-MD simulations. The IR spectrum for MA, AAA and AcAc are discussed in the following.
Malonaldehyde (MA) The calculated IR spectrum from MD simulations at 300 K along with the normal modes from the NN is shown in Figure 6.  Figure 6: IR spectrum of MA at 300 K averaged over 1000 independent runs, each 1 ns in length (top panel). The normal modes determined from the NN PES are given in the lower panel (red) together with normal modes determined from MP2 reference calculations (green) and the center frequencies from experiment (black). 10 HT leads to broadening of the spectrum between 2000 and 3000 cm −1 . 70 The experimental spectrum shows a peak at 650 cm −1 which is not covered by the theoretical normal modes, however, its presence can be guessed in the IR-spectrum. The peak at 650 cm −1 is attributed to a ring bend, but the frequency was not clearly assigned to a fundamental vibration. 70 Moreover, the position of the ν CH is at lower frequencies than predicted by the theoretical values.

Hydrogen Transfer Rates
HT rates were calculated from a Hazard analysis. 67,68 For MA the analysis was first carried out with r c = 1.23Å which is slightly longer than the O-H separation in the transition state which is 1.20Å. The Hazards H k are reported in Figure 9 with the residence time distribution p(τ ) for the first 1.5 ps shown in the inset. The periodicity of p(τ ) is ∼ 0.14 ps which corresponds to a frequency of 7.14/ps or a vibration of 240 cm −1 . This is close to the frequency of the O-O stretching vibration at 243 cm −1 and suggests that the O-O motion is gating HT. A similar situation was found in protonated ammonia dimer for which the N-N vibration was also found to be the frequency to which HT is coupled. 68 The fast reaction rate for HT in MA is t 1 = 1.81 crossings/ps and the slow process has a time constant t 2 = 0.0076 crossings/ps. The short time scale corresponds to "ballistic" transfer whereas the long time scale is a "dwell" time scale caused by partial equilibration of the hydrogen in one of the two potential wells. In order to test whether the findings depend on the choice of r c , the same data was also analyzed using r c = 1.1Å (see Figure S3). With this, the fast rate reduces slightly from t 1 = 1.81 to t 1 = 1.60 crossings/ps. This is explained by the exclusion of a set of HTs which immediately re-transfer without leaving the TS region completely. The slow reaction rate was not influenced by the change in r c .
On the other hand, t 2 depends quite sensitively on the range over which the data is fitted for long residence times. Values ranging from 0.006 to 0.008 crossings/ps are obtained depending on the range over which the fit is carried out. Within transition state theory (and using a frequency factor of 10 12 s −1 ), 74 a transition rate of 0.0076 ps −1 corresponds to a barrier height of E b = 2.91 kcal/mol, consistent with 2.79 kcal/mol from the NN PES, see Figure 2. In conclusion, the fast time scale for all three systems is between 1.5 and 1.8 crossings/ps whereas the long time scale t 2 differs appreciably and ranges from 0.008 crossings/ps to 0.036 crossings/ps, commensurate with a difference in the barrier height by almost a factor of two between MA and AcAc.
An independent test for the applicability of a Hazard analysis is to consider the total number of hazards as a starting population and to follow its decay using first order kinetics. Then, the residence time is the time for a "state" to decay. For MA such an analysis yields 1.69 crossings/ps compared with 1.81 crossings/ps from the Hazard analysis and confirms its validity. In all these considerations it is important to note that this approach using classical trajectories is expected to give a lower limit for the rate because quantum effects, such as tunneling or zero-point energy are not included and typically increase the rate.

Generalizability of the NN
High-quality PESs are often difficult to obtain due to the computational cost in calculating a sufficiently large number of ab initio reference energies at a sufficiently high level of theory.
It would be desirable to minimize the number of such high-level calculations that need to be carried out while retaining the quality of the high-level calculations in the represented PES.
Two possibilities are explored in the following. The first examines whether it is possible to infer the PES for a larger chemical compound from information about a smaller one (transferability). The second one attempts to learn a high-level PES from a fully-dimensional PES at a lower level of theory together with information about selected points of the higher-level PES (TL).
To assess how the NN generalizes on unseen structures, an NN PES for AcAc (two CH 3 ) is determined by learning on a training set with structures for MA (no CH 3 ), AAA (one CH 3 ) and the amons only. This model is called NN* in the following. The performance of NN* is reported in Figure 10 where    Evidently, the TL models outperform the models trained from scratch. Their performance is superior for small data set sizes, because the training already starts with a good guess for the weights and biases. For larger sizes, the errors of models trained from scratch converge towards the TL models, however, even for the largest training set size the TL models reach higher accuracy. These results suggest that pre-training on lower level data leads to higheraccuracy models when TL is applied. Barrier heights based on the MP2 optimized geometries are given in Table S1 for completeness. TL with different numbers of randomly chosen structures (between 100 and 40 000) from the training set shows that between 15 000 (for AcAc) and 25 000 (for MA and AAA)

Transfer Learning to a Higher Level of Theory
structures are required to obtain barrier heights within fractions of one kcal/mol of the PNO-LCCSD(T)-F12 values. This is primarily due to the fact that the structures for TL were randomly chosen and that it is not guaranteed that the training set contains sufficient information about the property (here TS energy) of interest. If particular features of a PES need to be improved, better choices will be possible, see further below.    Table 3. For m 15000 this increases to 6.42 kcal/mol (range from 6.00 to 6.87 kcal/mol). Hence, the larger training set size did not lead to an improved energy barrier. For  In summary, TL from a fully-trained NN at the MP2 level of theory by using additional information of the higher PNO-LCCSD(T)-F12 level of theory yields improved PESs for the three systems considered at the level of mean absolute and root mean squared error when using randomly selected reference data, see Figure 12. However, when evaluating particular local features of the TL-NNs, such as the barrier height for hydrogen transfer or the relative stabilization of two structural isomers, the results are not particularly accurate and may even lack convergence towards the correct value with increasing size of the training set. Including specific information about the property of interest was shown to considerably improve this. This is reminiscent of the "morphing potential" approach 76 which aims at reshaping a fullydimensional PES calculated at a lower level of theory by means of a generalized coordinate transformation and reference data at a higher level of theory.

Conclusion and Outlook
The In summary, a comprehensive NN model for MD studies of the spectroscopy and HT dynamics for MA, AAA, and AcAc was trained at the MP2 level of theory. The findings further suggest that TL will provide an efficient route forward for high-level, fully dimensional and reactive models to investigate the dynamics of chemical systems. Also, extending chemical space is expected to be possible but more systematic studies on this aspect are required.   Figure S4: Hazard plot and residence times (inset) for a total of 1 µs simulation for AAA structure at 300 K (only every fifth data point is shown). About 60 % of the events concern residence times τ < 0.6 ps and the longest residence time is ≈ 960 ps. A linear regression for the fast and slow process yield rates of t 1 = 1.557 crossings/ps and t 2 = 0.015 crossings/ps. The inset illustrates the histogram of the residence times with a period of ≈ 0.135 ps, corresponding to a frequency of 247 cm −1 .  Figure S5: Hazard plot and residence times (inset) for a total of 1 µs simulation for AcAc structure at 300 K (only every fifth data point is shown). About 60 % of the events concern residence times τ < 0.6 ps and the longest residence time is ≈ 260 ps. A linear regression for the fast and slow process yield rates of t 1 = 1.45 crossings/ps and t 2 = 0.036 crossings/ps. The inset illustrates the histogram of the residence times with a period of ≈ 0.15 ps, corresponding to a frequency of 222 cm −1 .