Combining SchNet and SHARC: The SchNarc machine learning approach for excited-state dynamics

In recent years, deep learning has become a part of our everyday life and is revolutionizing quantum chemistry as well. In this work, we show how deep learning can be used to advance the research field of photochemistry by learning all important properties for photodynamics simulations. The properties are multiple energies, forces, nonadiabatic couplings and spin-orbit couplings. The nonadiabatic couplings are learned in a phase-free manner as derivatives of a virtually constructed property by the deep learning model, which guarantees rotational covariance. Additionally, an approximation for nonadiabatic couplings is introduced, based on the potentials, their gradients and Hessians. As deep-learning method, we employ SchNet extended for multiple electronic states. In combination with the molecular dynamics program SHARC, our approach termed SchNarc is tested on a model system and two realistic polyatomic molecules and paves the way towards efficient photodynamics simulations of complex systems.

In recent years, deep learning has become a part of our everyday life and is revolutionizing quantum chemistry as well. In this work, we show how deep learning can be used to advance the research field of photochemistry by learning all important properties for photodynamics simulations. The properties are multiple energies, forces, nonadiabatic couplings and spin-orbit couplings. The nonadiabatic couplings are learned in a phasefree manner as derivatives of a virtually constructed property by the deep learning model, which guarantees rotational covariance. Additionally, an approximation for nonadiabatic couplings is introduced, based on the potentials, their gradients and Hessians. As deep-learning method, we employ SchNet extended for multiple electronic states. In combination with the molecular dynamics program SHARC, our approach termed SchNarc is tested on a model system and two realistic polyatomic molecules and paves the way towards efficient photodynamics simulations of complex systems.
Excited-state dynamics simulations are powerful tools to predict, understand and explain photo-induced processes, especially in combination with experimental studies. Examples of photo-induced processes range from photosynthesis, DNA photodamage as the starting point of skin cancer, to processes that enable our vision [1][2][3][4][5]. As they are part of our everyday lives, their understanding can help to unravel fundamental processes of nature and to advance several research fields, such as photovoltaics [6,7], photocatalysis [8] or photosensitive drug design [9].
Such nonadiabatic MLMD simulations are in many senses analog to excited-state ab initio molecular dynamics simulations. The only difference is that the costly electronic structure calculations are mostly replaced by a ML model, providing quantum properties like the PESs and the corresponding forces. The nuclei are assumed to move classically on those PESs. This mixed quantum-classical dynamics approach allows for a very fast on-the-fly evaluation of the necessary properties at the geometries visited during the dynamics simulations.
In order to account for nonadiabatic effects, i.e., transitions from one state to another, further approximations have to be introduced [34]. One method, which is frequently used to account for such transitions, is the surface-hopping method originally developed by Tully [35]. A popular extension for this method including not only nonadiabatic couplings (NACs) but also other couplings, e.g., spin-orbit couplings (SOCs), is the SHARC (surface hopping including arbitrary couplings) approach [36][37][38]. Importantly, NACs, also called derivative couplings, are used to determine the hopping directions and probabilities between states of same spin multiplicity [36,37,[39][40][41]. The NAC vector (denoted as C NAC ) between two states, i and j, can be computed as [39,42,43] where the second-order derivatives are neglected. As a further difficulty, NACs are often missing from quantum chemistry implementations and are thus often approximated [44][45][46][47][48][49][50][51][52]. SOCs (denoted as C SOC ) are present between states of different spin multiplicity, and determine the rate of intersystem crossing. They are obtained as off-diagonal elements of the Hamiltonian matrix in standard electronic-structure calculations [37,53].
Most of the recent studies involving ML dynamics deal with ground-state MD simulations, see e.g. Refs. , where one of the most promising ML models is SchNet [79,80], a deep continuous-filter convolutional-layer neural network.
Only a small but quickly increasing number of studies deal with the treatment of excited states and their properties using ML [15, 16, 16-28, 31, 33, 81]. An arising difficulty compared to ground state energies and properties is that not only one, but several PESs as well as the couplings between them have to be taken into account. Additionally, the learning of couplings proves challenging due to the fact that properties resulting from electronic wave functions of two different states, Ψ i and Ψ j , have their sign dependent on the phase of the wave functions [32,33,82,83]. Since the wave function phase is not uniquely defined in quantum chemistry calculations, random phase jumps occur, leading to sign jumps of the coupling values along a reaction path. Hence, the couplings can not be learned directly as obtained from a quantum chemistry calculation. An option is to use a phase correction algorithm to pre-process data and remove these random phase jumps. Assuming that the effect of the Berry phase remains minor on the training set, smooth properties are obtained that are learnable by ML models [32]. However, this approach is expensive and many quantum chemistry reference computations are necessary to generate the training set. In cases of large poly-atomic molecules with many close lying energetic states, this approach might even be infeasible.
The aim of this letter is to provide a framework to carry out efficient excited-state MLMD simulations and to combine two popular methods for this purpose: the SHARC approach for photodynamics with states of different multiplicity and SchNet to efficiently and accurately fit potential energies and other molecular properties. We call this combination the SchNarc approach and adapted SchNet for the treatment of excited state potentials, their forces and couplings for this purpose. The SchNarc approach can overcome the current limitations of existing MLMD simulations for excited states by allowing (i) a phase-free training to omit the costly pre-processing of raw quantum chemistry data and, to treat (ii) rotationally covariant NACs, which can either be trained, or (iii) alternatively be approximated from only ML potentials, their gradients, and Hessians, and to treat (iv) SOCs.
To validate the phase-free training, a new loss function termed phase-less loss function is developed and tested for the methylenimmonium cation, CH 2 NH + 2 , of which we take a phase corrected training set from Ref. [32]. Using the same level of theory (MR-CISD(6,4)/aug-cc-pVDZ) with the program COLUMBUS [84], the training set is recomputed without applying phase correction to train ML models also on raw data obtained directly from quantum chemistry programs. The ML models are trained on energies, gradients, and NACs for three singlet states using 3,000 data points. This molecular system features singlet-only dynamics and ultrafast transitions.
The phase-less loss is based on the standard L 2 loss, but here, the squared error of the predicted properties is computed 2 N S −1 -times with N S being the total number states. The value of each property, L P (i.e. L SOC and L N AC ), that enters the loss function is the minimum function of all possible squared errors ε k P : for vectorial and non-vectorial properties, respectively. The error ε k P for a specific phase is computed as the mean squared error of a property P from quantum chemistry (index QC) and machine learning (index M L). The property P couples different states, indicated by i and j. Since the wave function of each of the states can have an arbitrary phase, the property P ij that couples state i and j has to be multiplied with a product of the phases for these states, p i · p j . The phases for all states together form a vector p with entries of either +1 and -1. Which of the 2 N S −1 possible combinations for p is chosen, is indicated by the index k, also defined in eq. (3). The relative signs within one vector remain and must be predicted correctly for successful training.
The overall loss function used in this work is a combination of such phase-less loss functions and mean squared errors obtained for all properties with a trade-off factor to account for their relative magnitude (a detailed description of the implementation is given in the SI). This loss function removes the influence of the arbitrary phase during the learning process of a ML model and further reduces the computational costs for the training set generation.
Results are given in Figure 1, which shows the population schemes of CH 2 NH + 2 obtained after excitation to the second excited singlet state, S 2 . The SchNarc models are all trained on energies, forces, and NACs for the three considered singlet states. The populations obtained from SchNarc models (panels A and C) trained on a data set that is not phase corrected, i.e. it contains couplings that can randomly switch their sign, are compared to populations obtained from models trained on phase corrected data (panels B and D). As can be seen, the L 2 loss function, as used in the upper plots, leads to an accurate ML model to reproduce ultrafast transitions only in the case of phase corrected data (panel C), whereas this loss can not be used when trained on raw quantum chemistry data (panel A). In comparison, a SchNarc simulation with a ML model that applies the phase-less loss function is successful in reproducing the populations for both training sets. In those simulations, the NACs are multiplied with the corresponding energy gaps, i.e., C NAC ij = C NAC ij · ∆E ij , to get rid of singularities [21,33]. These smooth couplings C NAC ij are not directly learned, but rather constructed as the derivative of a virtual property, analogously to forces that are predicted as derivatives of an energy-ML model. The virtual property is the multi-dimensional anti-derivative of the rightmost expression in equation (1), Ψ i | ∂H el R | Ψ j (a derivation is given in the SI). Compared to previous ML models for NACs [29,30,32,33], where NACs are learned and predicted as direct outputs or even single values, this approach provides rotational and translational covariance, which has recently been achieved in a similar way for the electronic friction tensor [85].
However, even without the need of pre-processing the training set, the costly computations of NAC vectors for the training set generation remain. Approximations of NACs exist and often involve the computation of the squared energy-gap Hessian [11,[86][87][88][89][90]. Their use in dynamics simulations is rather impracticable with quantum chemistry methods, especially in the case of complex systems, due to the expenses of computing second-order derivatives.
Here, we take advantage of the efficiency for second-order derivative computation from ML models with respect to atomic coordinates to obtain the Hessians of the fitted PESs: with R being the atomic coordinates of a molecular system. Note that Hessians are also employed in quantum dynamics simulations [91,92], which might open further applications for our implementation. The squared energy-gap Hessian can be further obtained as the sum of two symmetric dyads, that define the branching space [90]. Hence, this Hessian can be employed to obtain the symmetric dyad of the smooth NACs via [11,93]: After singular value decomposition, the hopping direction can be computed as the eigenvector, v ij , of the largest non-zero eigenvalue [90,93,94] with the corresponding eigenvalue, λ ij , as the squared magnitude of the ML smooth coupling, C NAC ij . The final approximated NAC vectors, C aNAC ij , between two states are then: The approximated NAC vectors can be employed in the vicinity of a conical intersection, otherwise the output becomes too noisy. For the latter reason, we define thresholds of 0.5 eV and 1.0 eV for the energy gaps to compute approximated NACs between coupled singlet-singlet states and triplet-triplet states, respectively. It is worth mentioning that the ML models slightly overestimate the energy gaps [32], since, in contrast to quantum chemistry PESs, the ML PESs are smooth everywhere. In Ref. [94], approximated NACs were applied for a 1D system and their usefulness in combination with ML was already anticipated.
We turn this idea into reality and show ML excited-state dynamics with approximated NACs for a linear vibronic coupling (LVC) model [10,95] of sulfur dioxide, SO 2 , which we use as a reference system to assess the quality of approximated NACs and also include triplet states to treat SOCs. The LVC model of sulfur dioxide, SO 2 , [95] contains 3 singlet states and 3 triplet states, with symmetry allowed NACs between the first and second excited singlet states as well as the first and third triplet states, and is used to train two ML models: One, where the NACs are learned and another, where the NACs are approximated according to equation (7). The ML models for dynamics simulations with approximated (trained) NACs are built up from energies, gradients, and SOCs (plus NACs) using 5,000 (20,000 for singlet only or 200,000 for singlet and triplet states) randomly selected data points from training sets consisting of 280,200 data points.
In addition, we also apply the NAC approximation to two systems, where the data is obtained directly with ab initio methods: the methylenimmonium cation, CH 2 NH + 2 , as presented before, and thioformaldehyde, CSH 2 . CSH 2 , is used, because it shows slow population transfer [96], in contrast to the fast population transfer in CH 2 NH + 2 . The training set is built up of 4,703 data points with two singlet states and two triplet states after initial sampling of normal modes and adaptive sampling with simple multi-layer feed-forward neural networks as done in Ref. [32] for CH 2 NH + 2 . The program MOL-PRO [97] is used for these calculations with CASSCF(6,5)/def2-SVP. The dynamics simulations are carried out after excitation to the first excited singlet state for 3000 fs. A detailed analysis on the reference computations as well as information on the timing of the Hessian evaluation is given in the SI in sections S2 and S3.1 respectively. None of the data points from the ab initio MD simulations, to which we compare our SchNarc models, are included in the training sets and, thus, the dynamics simulations can be seen as a direct test.
Results for the LVC model are depicted in Fig. 2. The potential energy curves along the asymmetric stretching mode of the singlet states (left plot) and the triplet states (right plot) are shown along with the norm of the respective NAC vectors. As can be seen, the shape as well as the height of the peak of the norm of trained NACs (dashed lines) and approximated NACs (dotted lines) are comparable to those of the LVC model (continuous lines). The approximated NACs approach zero faster than the trained NACs, which is due to the applied threshold that is set for the energy gap to compute NACs. Noticeably, the learned NACs between the triplet states show a decrease, when the corresponding triplet energies are close to each other. This might be an effect due to the Berry phase, that can not be captured with our approach leading to artifacts in some regions.
Importantly, the NACs can be predicted accurately with ML in most of the regions around the conical intersections. We first consider a singlet-only model in order to support this assumption with dynamics simulations, see Fig. 3. LVC populations show minor population transfer between the second excited singlet state and the first excited singlet state, which can be reproduced with both SchNarc models (upper panels).
Also when including triplet states, the SchNarc models can reproduce the dynamics (lower panels). Here, population is mostly transferred from the first excited singlet state to the triplet states. Note that populations in surface hopping are often only accurate to within 10%, such that we judge the deviations of the ML populations from the LVC reference as small (see Ref. 33 for examples, where dynamics is not reproduced by ML although potentials seemingly are).
The application of the NAC approximation is further tested on more realistic systems with properties obtained from quantum chemistry data including fast as well as slow population transfer. CH 2 NH + 2 serves as a testsystem for the former case, where ultrafast transitions from the second excited singlet state back to the ground state take place after excitation within 100 femtoseconds [32,98], which is only possible to repro-   duce with accurate NACs [33]. In contrast, the CSH 2 molecule serves as a testsystem for slow populations transfer and shows intersystem crossing strongly dependent on the accuracy of the underlying potentials [96]. Inaccurate ML models would thus be unable to reproduce the reference dynamics. The populations of both systems are given in Fig. 4. The reference populations (left panels) are compared to SchNarc simulations with ML models trained on energies and gradients (panel B) as well as SOCs (panel D) for CH 2 NH + 2 and CSH 2 , respectively. As can be seen, the fast and the slow population transfer can be reproduced accurately, which proves the validity of our ML approach.
In summary, the SchNarc framework combines the SHARC [37] approach for surface hopping and the SchNet [80] approach for ML. The training of ML models is facilitated by using the phase-less loss and the NAC approximation, avoiding quantum chemical NAC calculations at all. Thus, photodynamics simulations are possible solely based on ML PESs, their derivatives and SOCs. Furthermore, this method allows for a very efficient computation of the Hessians of all the excited states at each time step. Hence, SchNarc allows for efficient nonadiabatic dynamics simulations of excited states and light-induced processes including internal conversion and intersystem crossing.

Data availability
The data sets are partly available [32] and will be partly made available on github.com/schnarc. The molecular geometries and corresponding properties are saved in a database format provided by the atomic simulation environment [99].

Code availability
All code developed in this work will be made available on github.com/schnarc.

Supplementary Information 1 SchNet for excited states
As a machine learning (ML) model, the deep continuous-filter convolutional neural network SchNet, that is described in detail in Ref. [79,80] is used and adapted for excited states to train excited-state energies, forces, spin-orbit couplings (SOCs), and nonadiabatic couplings (NACs).
The molecular descriptor is constructed by SchNet [79] that treats atoms in their chemical and structural environment. A cutoff is defined to specify the environment that is included for the description of an atom. Hence the molecular properties are obtained as atom-wise contributions. A continuous-filter convolutional layer and several additional interaction layers define and optimize the atom representations. These representations are mapped to different properties via fully connected layers with shifted softplus activation functions. These prediction blocks, which use a common descriptor network, are separately designed for energies, SOCs, and NACs, whereas the forces are derived with respect to atomic coordinates from outputs of the ML model for energies. The loss function is a combined loss function of all the properties. A trade-off is defined to weigh the properties according to their magnitude. The properties that should be learned have to be specified along with the corresponding trade-off in an additional input file.

Standard loss function, L 2
The overall L 2 loss function as implemented in SchNet for excited states, reads: where t E , t F , t SOC , and t N AC define the trade-offs for the properties E (energies), F (forces), C SOC (SOCs), and C NAC (NACs), respectively. Corresponding labels with an index "QC" refer to the reference value and with an index "ML" to the the SchNet predictions.

Phase-less loss function, L ph
In order to train on inconsistent SOCs and NACs with respect to their sign, we have developed a phase-less loss-function. This is based on the L 2 loss, but here, the squared error of the predicted properties, P , is computed more often, i.e. 2 N S −1 -times with N S being the total number states. The value, L P , that enters the overall loss function, L ph , is the minimum function of all possible squared errors, ε k P , for a given property, P : for vectorial and non-vectorial properties, respectively. The error ε k P for a specific phase is computed as the mean squared error of a property P from quantum chemistry (index QC) and machine learning (index M L). The property P couples different states, indicated by i and j. Since the wave function of each of the states can have an arbitrary phase, the property P ij that couples state i and j has to be multiplied with a product of the phases for these states, p i · p j . The phases for all states together form a vector p with entries of either +1 and -1. Which of the 2 N S −1 possible combinations for p is chosen, is indicated by the index k, also defined in eq. (9). Since we are free to choose one of the phases, we set the phase of the first state always to +1. The relative signs within one vector remain and must be predicted correctly for successful training.
The overall loss function used in this work is a combination of such phase-less errors and mean squared errors obtained for all properties with a trade-off factor to account for their relative magnitude (as already specified in equation 8): (11) This cost function removes the influence of the arbitrary phase during the learning process of a ML model and further reduces the computational costs for the training set generation.
We tested on several alternatives, such as a loss function that additionally includes the norm of a vector, variations of a minimum function and another type of phasefree loss function, that can be used if only one type of coupling, i.e. SOCs or NACs, or dipole moments are trained. This error is also implemented in SchNet for excited states and the error of a property that couples state i and j, ε k P is computed as follows: As can be seen, the error is computed twice -once assuming a correct phase of predicted properties and once a phase switch, that are both combined subsequently in case dim(P) ≥ 3: The value, L P , that enters the loss function is then either a combination of both possibilities for vectorial properties or a minimum function of the two possible errors, ε k,± P . This variation gives comparably accurate results and also leads to a phase-free training. Experiments have shown, that -e.g. in the case of the SO 2 moleculemore data points, but shorter training is necessary. This alternative error can be more favorable in cases, where only one coupling type is needed and many states are involved, since computation of all possible combinations can be omitted. All other tested variations turned out to be less successful in learning the shape of couplings.

Machine learning models
The model parameters for each molecular system are given in Table 1 including the number of data points and states trained. The errors for the remaining test set (i.e. the data points not used for training and validation) are listed as mean absolute values resulting from all states. For the CH 2 NH + 2 and CSH 2 models, the ML predictions reach chemical accuracy and in some cases the error is even below 0.043 eV (1 kcal/mol). If not stated otherwise, 256 features with 3 hidden layers are used for each model. The batch size ranges from 20 to 50 and is set in order to comply with the maximum allowed memory of a used GPU. The learning rate is set to 0.0001 and is reduced by a factor of 0.8 down to a value of 0.000001 with a patience of 15 steps. The maximum number of epochs is set to 5000. The trade-off for each property is defined, so that the mean squared errors in the first few epochs is equally large for all properties. Table 1: Parameters for the trained SchNet models for excited states. For SO 2 , CSH 2 , and CH 2 NH + 2 200-5,000, 200 and 100 data points are used for validation, respectively, hence at least 50,000, 503, and 900 data points are used for testing. The MAEs and RMSEs are reported in eV, eV/Å, and a.u. for energies, gradients and all type of couplings, respectively. If not mentioned otherwise, the data sets are not phase corrected. 2 Training sets and reference computations

Training set generation
SchNet models for excited states are trained on a linear vibronic coupling model (LVC) of SO 2 [10,95], the methylenimmonium cation, CH 2 NH + 2 and thioformaldehyde, CSH 2 . The molecular geometries and corresponding properties are saved in a database format provided by the atomic simulation environment [99]. No data points of the dynamics simulations to which we compare SchNarc models are included in the training sets. The phase corrected training set for the methylenimmonium cation, CH 2 NH + 2 , is taken from Ref. [32] and consists of 4000 data points. The geometries of this training set were recomputed with the same level of theory, MR-CISD(6,4)/aug-cc-pVDZ, but without applying any pre-processing, such as phase correction, in order to provide a non-phase corrected training set. The program suite COLUMBUS [84] was used for this purpose, resulting in 3998 converged single point calculations. For the dynamics simulations with SchNarc, 1000 trajectories (resulting from 20,000 initial conditions sampled from a Wigner distribution [100]) are propagated for 100 fs using a time step of 0.5 fs.
The training set for thioformaldehyde, CSH 2 , is generated in the same way as it is done in Ref. [32] for CH 2 NH + 2 . Initial configurations are sampled via scans of different reaction coordinates, such as normal modes. Additionally, adaptive sampling for excited states is carried out using two simple multi-layer feed-forward neural networks. At a number of 4855 data points, the networks seem to be converged and dynamics simulations up to 3 ps can be reproduced. The training set consists of 4703 data points, where samples showing a smaller energy gap than 0.01 H between triplettriplet states are sorted out due to problematic data points in those regions. Without these points the NNs converge much better, in about half of the time. The RMSE of forces is slightly larger, the rest of the errors are comparable. The level of theory is CASSCF(6,5)/def2-SVP and 2 singlet and 2 triplet states are included. Quantum chemistry calculations are carried out using Molpro [97]. In addition to energies, forces, SOCs, and NACs, the permanent and transition dipole moments are included in the training set. For the dynamics simulations of CSH 2 , 40,000 initial conditions are sampled from a Wigner distribution [100] and excited to the first excited singlet state (2.0-2.5 eV). 100 trajectories are propagated with the reference method for 3 ps with a time step of 0.5 fs. The resulting populations are compared to 959 trajectories obtained from SchNarc.
For the SO 2 molecule, we refer to dynamics simulations with the reference method to generate the training set, which is based on a "one-shot" LVC model [95]. 10,000 initial conditions are sampled from a Wigner distribution [100] and excited between 0 and 10 eV. Surface hopping molecular dynamics simulations are carried out with SHARC after excitation of 1200 initially sampled geometries. They are propagated with NAC vectors for 700 fs with a time step of 0.5 fs. The first 200 trajectories are taken for the training set generation. This procedure is done twice -once only singlet states are considered and once singlet and triplet states are taken into account, resulting in 280,200 data points for each training set. The remaining 1000 trajectories serve for comparison to SchNarc dynamics. Due to symmetry, the SO 2 model contains NACs only between the S 1 state and the S 2 state as well as between the T 1 state and the T 3 state. We considered this restriction for the SchNarc computations by setting the other couplings to zero.
It is worth mentioning that SO2 needs more data points for training than CSH 2 and CH 2 NH + 2 , since for the latter molecules, adaptive sampling was applied and for SO 2 we used data directly from dynamics simulations with the LVC model. The dynamics with the LVC model are extremely fast and hence for the training set generation the usual adaptive sampling approach [32,60,101] is far more costly and time-intensive. Since reference computations are even cheaper than ML predictions, it is not our goal to provide a perfect training set with a minimum number of data points for this model, but rather to provide an easy-to-use but yet challenging test system to validate our method.

Surface hopping molecular dynamics
In order to compute nonadiabatic molecular dynamics simulations, the SHARC [36][37][38] method, an extension of Tully's fewest switches algorithm [35], is applied. This mixed quantum-classical approach allows for on-the-fly computation of the PESs with electronic structure methods, on which the nuclei move according to Newton's second equation of motion. In order to account for nonadiabatic transitions between states of same spin multiplicity, instantaneous switches from one state to the others are allowed in regions of high hopping probability. After every simulation, the trajectories are analyzed using the SHARC diagnostic tools to check improper behaviour, such as energy fluctuations. A few reference trajectories are sorted out in each case -mainly due to improper convergence of quantum chemistry calculations in critical regions of the potential energy surfaces. Decoherence correction is applied [102] and the hopping probabilities are computed from SOCs and NACs from electronic structure calculations in case of reference dynamics or from ML models in case of SchNarc dynamics [38]. The velocities are corrected along the direction of the NAC vectors in each simulation and for dynamics with quantum chemistry, the phase is tracked along an independent trajectory.

Nonadiabatic couplings
As mentioned in the main text, NACs are either approximated or derived from a virtual property built by SchNarc. In order to define the virtual property, which SchNarc builds internally, we start by the derivative of the electronic Hamiltonian, H el , with respect to the atomic coordinates of a molecule, R: Since the adiabatic wavefunctions are eigenfunctions of H el (r, R), we can reformulate equation (15): By using the relation, Applying the Hellmann-Feynman theorem [103], we obtain the diagonal elements as the gradients, and obtain the NAC terms as properties that are inversely proportional to the corresponding energy gap of two adiabatic electronic states: The virtual property that SchNarc is generating is then the multi-dimensional antiderivative of the latter expression in equation 19, Ψ i | ∂H el R | Ψ j . Noticeably, due to the Berry phase [12,104,105] the NAC vector field is not conservative [11] and a line integral remains path dependent. Hence this approach does not include the effects of the Berry phase, which is also neglected in approaches such as the Zhu-Nakamura approximation [51,52] that does not contain a phase at all, or the phase correction algorithm [32,82]. The mixed ML-classical dynamics are thus assumed to be mostly unaffected [32,82], which might not be the case in quantum dynamics simulations.

NAC approximation and timing
The approximation of NAC vectors, as explained in the main text and adapted from Refs. [11,93,94], relies on the approximation of the Hessian from energy potentials between two states. It is especially powerful for ML models trained on quantum chemistry methods, where implementations of NAC vectors are largely missing, such as linear-response methods and here especially between the first excited state and the ground state, with the ADC(2) method being a prominent example [106]. Such an ML approach could further be used to pave the way towards efficient Hessian computations for all the states treated in quantum dynamics simulations using the variational multiconfigurational Gaussian method, where the direct Hessian computation often remains the time limiting step [91,107].
The used NAC approximation is valid in the vicinity of a conical intersection and hence relies on a threshold to define the energy gap, for which the approximation is applied. In order to avoid additional computations, the Hessians are thus only computed if one of the energy differences between all possible singlet-singlet or triplettriplet potentials is within the given threshold (as default we set 0.5 eV and 1.0 eV for singlet-singlet and triplet-triplet gaps). Since the gaps of the PESs are overestimated in case of CH 2 NH + 2 [32], this threshold is increased by 30%. This means, that based on this pre-defined threshold, SchNarc decides whether NAC vectors are computed or not. In all other cases, the NAC vectors are set to zero. It is advisable to check the used thresholds for certain cases and adapt them, where necessary. It is worth mentioning, that this approach is limited to same-symmetry electronic states, which are, nevertheless, the most probable avoided state crossings in case of real, polyatomic systems [94].

Timing
For the thioformaldehyde molecule, the evaluation of the 4 Hessians takes approximately 2 seconds, for the methylenimmonium cation, the computation of 3 Hessians takes 3-4 seconds, both on a CPU. A test computation of a 24 atom molecule was further carried out with SchNarc, which showed that the evaluation of 1 Hessian took around 45 sec on a CPU, which could be reduced to 14 sec on a GPU. In future work, we thus seek to adapt the code in order to compute only the relevant Hessians, i.e., those of close-lying states with respect to the active state during a dynamics simulation. To this aim, we seek to give the information of the active state to the SchNarc model, which is not yet implemented for our pySHARC [32,95] wrapper (python wrapper for the SHARC code, which avoids heavy file I/O). Hence the Hessians would only be computed for the states that are close enough to the active state and the dynamics simulations are then still very efficient compared to pure quantum chemistry dynamics simulations. A comparison of the timings of 100 time steps for a dynamics simulation with the current SchNarc implementations and SHARC is given in Table 2.
As can be seen, the LVC dynamics using the pySHARC wrapper are very cheap and only serve as a test system. It is however clearly visible that the training of NACs in this case needs way more data points and hence it takes longer to train the models. The dynamics of the CH 2 NH + 2 molecule using the MR-CISD method are expensive and ML can substantially decrease the simulation time. It can be seen that the Hessian computation with ML becomes more expensive, the larger the molecule becomes or the more states are involved in a simulation.