Machine learning and excited-state molecular dynamics

Machine learning is employed at an increasing rate in the research field of quantum chemistry. While the majority of approaches target the investigation of chemical systems in their electronic ground state, the inclusion of light into the processes leads to electronically excited states and gives rise to several new challenges. Here, we survey recent advances for excited-state dynamics based on machine learning. In doing so, we highlight successes, pitfalls, challenges and future avenues for machine learning approaches for light-induced molecular processes.

While experimental techniques require large and costly setups, theoretical simulations require highperformance computing facilities due to expensive electronic structure computations. Especially NAMD simulations are seriously limited by the underlying quantum chemical calculations, making long and experimentally relevant simulation times inaccessible with conventional ab initio methods. The larger the molecule becomes, the more electronically excited states are involved in reactions and the more complex their interactions become. This leads to non-linearly increasing costs of quantum chemical calculations and a compromise between accuracy and computational efficiency is indispensable. Relying on such expensive ab initio potentials, only a couple of picoseconds can be simulated and the exploration of rare reaction channels is restricted due to bad statistics [17,43,47].
Technically, the nuclear part and the electronic part of the calculations can be separated to a large extent. First, the electronic problem is solved leading to potential energies for the nuclei. Afterwards, the nuclei move on these potentials classically or quantum chemically [6,8,[48][49][50][51][52]. These two subsequent steps can be carried out in every time step (on-the-fly), if classical trajectories are employed. Alternatively, the two steps are separated as much as possible by precomputing the potential energy surfaces (PESs) and then using these precomputed PESs in the subsequent nuclear dynamics.
Experimental observables and macroscopic properties can be obtained in followup computations or analysis runs. Machine learning (ML) can accelerate the overall simulation process on different levels and at several points.
A broad classification of how to use ML models to replace different parts of quantum chemistry to make simulations more efficient is given in Fig. 1 [53].
The probably most fundamental way is to use Figure 1: A broad classification of how to use machine learning models to replace different parts of quantum chemistry [53]. Simulations can be enhanced by providing (1) a wavefunction from a machine learning model, (2) a force field by fitting energies and forces or (3) other properties, such as energy gaps or reaction rates by learning the final output of a dynamics simulation directly.machine learning model and overview of the parts of a quantum chemical simulation.
ML to solve the Schrödinger equation. This has been done for the ground state by representing the molecular wavefunction on a grid, in a molecular orbital basis or in a Monte-Carlo approach [54][55][56][57][58][59][60][61][62] and has recently also been applied for the excited states of a one-dimensional model [63]. ML can also be used to reconstruct the wavefunction from near-field spectra [64] or to bypass the Kohn-Sham equation in density functional theory (DFT) [65]. The external potential, functional, electronic density or local density of states can be learned [53,[65][66][67][68][69][70][71][72]. Very recently, Ceriotti and co-workers further introduced a smooth atomic density by defining an abstract chemical environment [73]. By having access to the molecular wavefunction or the electron density, the secondary output, which are energies and forces for the ground state and additionally couplings for the excited states, can be derived efficiently with ML. The coefficients of the ML wavefunctions or the density can further be used as an input for quantum chemical simulations, reducing the number of SCF iterations substantially [59].
Instead of learning the quantum chemistry of systems, the so-called "secondary outputs" [53] can also be mapped directly to a molecular structure, giving rise to so-called ML force fields. By training an ML model on ab initio data, the accuracy of quantum chemistry can be combined with the efficiency of conventional force fields for molecular dynamics (MD) simulations in the ground state . For the excited states, only a couple of studies are available [106,[110][111][112][113][114][115][116][117][118][119][120][121][122][123][124][125][126][127][128][129]. Nevertheless, the first NAMD simulation with ML dates back to the year 2008, where the scattering of O 2 on Al(III) was studied in a mixed quantum-classical approach considering singlet-triplet transitions [111,130].
Having access to the excited state energies, "tertiary properties", such as UV spectra [131], band gaps [132][133][134], HOMO-LUMO gaps or vertical excitation energies [135][136][137][138] of molecules can be derived. Again, this tertiary output can also be fit in a direct fashion, which has been done for instance for a light-harvesting system by learning the excitation energy transfer properties [139] or the output of NAMD simulations to find out about the relations of molecular structures and dynamic properties [140]. Moreover, ML has been successfully applied for the inverse design of molecules and materials featuring specific properties, such as defined HOMO-LUMO gaps or catalytic activities. Examples range from the inverse design of photonic materials, to (photo-)catalysts, solar cells or (photo-active) drugs, to name only a few applications [107,[141][142][143][144][145][146][147][148][149][150].
Despite the opportunities of ML for the development of groundbreaking new methodologies, current techniques are often limited to certain molecules or specific problems. Methods exist, that extrapolate throughout chemical compound space, see e.g. Refs. [59,131,[151][152][153][154][155][156][157][158][159][160], but usually models fail to go beyond energies and related properties, such as forces, atomization or excitation energies. Further, it is challenging to predict compounds consisting of atom types strongly different from those inside of the training set. Especially the fitting of the excited-state PESs poses another obstacle, let alone the transferability of excited-state PESs: Not only are ML models restricted to certain molecules or materials [106,[110][111][112][113][114][115][116][117][118][119][120][121][122][123][124][125][127][128][129], more often the different energetic states are fit independently from each other with separately trained ML models. As it is clear that the PESs of molecules are not independent of each other, it might also be unsurprising that learning them simultaneously is advantageous for various applications, such as NAMD or spectra predictions. Only a few studies exist that include more energetic states in one ML model and even less treat related properties, such as the vectorial dipole moments or couplings between different PESs in one ML model [85,118,131,154,161,162].
However, in our view, the "holy grail" of ML for photochemistry is an ML model that provides all relevant energetic states, forces and properties at once, using derivatives where possible, rather than learning the properties independently. At the very best, this model should be transferable throughout chemical compound space [163] and could be used for molecules of any composition. Given the fact that ML models for the electron wavefunctions of different excited states (or related models within the DFT framework) do not exist for polyatomic systems, this dream has not yet come alive. Hence, we will focus this perspective on ML models that learn the secondary outputs, i.e., excited state PESs, corresponding forces, and nonadiabatic and spin-orbit couplings (NACs and SOCs, respectively) between them. We note that our discussion holds for different spin-multiplicities, although most studies focus on singlet states only. We try to address the recent achievements in the fields of photochemistry using ML and discuss the current challenges and future perspectives to get a step further to a transferable ML model for excited states that treats all properties on the same footing.
We start by discussing the generation of a training set for the treatment of excited state PESs, corresponding forces and couplings and focus on their use in NAMD simulations. Especially, we aim to clarify the differences between excited-state and groundstate properties. We therefore describe the NACs and SOCs that couple different electronic states and highlight their importance for NAMD simulations. Subsequently, state-of-the-art ML models for excitedstate PESs are considered along with the challenge of modelling a manifold of energetic states.

Generating a training set for excited states
The basis of any successful ML model is a comprehensive and accurate training set that contains the molecular geometries in combination with the corresponding properties that need to be predicted. For the appli-cation of ML within NAMD simulations, the training set should contain a molecular geometry and the energies of all energetic states, corresponding forces, and couplings between the states. It is computed with the quantum chemistry method, whose accuracy one wants to obtain. The choice of the quantum chemistry method is a problem on its own and often requires expert knowledge [18,164]. Simply said, ML can be seen as efficient interpolation between data points with the accuracy of the reference method.
Before we go into detail on how to efficiently create a training set for excited states, we will first discuss the differences to ground state potentials and properties that need to be considered. A major drawback is the fact that a manifold of excited states and thus also the properties between them have to be accounted for. These are NACs between states of same spin multiplicity and SOCs between states of different spin multiplicity as well as transition dipole moments. The fitting of such properties is problematic due to the arbitrary phase of the wavefunction [118,161]. Therefore, an additional pre-processing might be necessary. Either a diabatization [106,[115][116][117][119][120][121][122][123][124][125][126][127], a so-called phase correction [118], or a special learning algorithm [165] renders data learnable. The latter two are described in the following while further details on the former are given in section 3.2.

Making excited-state data learnable
Compared to energies and forces, NACs, SOCs as well as transition dipole moments result from the wavefunctions of two different electronic states. Due to the non-unique definition of the wavefunction itself, i.e., the fact that multiplication of the electronic wavefunction with a phase factor still gives a valid eigenfunction of the electronic Hamiltonian, leads to an arbitrary phase, which is initiated randomly in a quantum chemical calculation. Consequently, also the sign of the couplings, C ij , can be positive or negative. Here, i and j denote the indices of the involved states. The resulting inconsistencies in the coupling hypersurfaces make it challenging to find a good relation between an ML model, which is per definition a smooth function [166], and those discontinuous raw outputs.
This problem can be illustrated with molecular orbitals of the methylenimmonium cation ( Fig. 2 reproduced from Ref. [118]). Panel A shows the molecular geometries, which are given as an input to a quantum chemical program. Two molecular orbitals are shown as placeholders for the wavefunctions of two electronic states, the S 1 and S 2 states. The color of the orbitals can be either blue or red and changes arbitrarily throughout the reaction coordinate. In the same way, also the overall wavefunction (which is difficult to plot) for the respective state changes its phase arbitrarily. As a consequence, also the sign of the couplings, where the product of the two wavefunctions' signs enters, may change randomly (see panel B). Figure 2: A set of quantum chemical calculations of the methylenimmonium cation, CH 2 NH + 2 , along the C-N bond. The molecular conformations, which are the input of a quantum chemical calculation, are given in panel A, the orbitals of the computed two electronic states, the S 1 and S 2 state, as well as the corresponding off-diagonal matrix elements between those two states, S 1 |Ĥ | S 2 , are given in panel B. As can be seen, the sign of those values arbitrarily switches. Those sign jumps can be removed by applying a phase correction algorithm. Results are given for those elements in panel C. Reproduced from Ref. [118] under CC-BY, https://creativecommons.org/licenses/by/3.0/.
In order to allow for a meaningful ML description of elements resulting from two different electronic states, a data pre-processing is helpful. The former process is termed phase correction [118,167] and is practicable to remove almost all such inconsistencies in the configurational space of the training set. This phase correction makes the use of conventional training algorithms possible.
To carry out phase correction, a wavefunction overlap computation [168], S = Ψ α | Ψ β , has to be carried out between the wavefunction Ψ β at every geometry β inside the training set and the wavefunction Ψ α at a reference geometry α. The phase thus has to be tracked from a pre-defined reference geometry. It often happens that two geometries are dissimilar to each other, so that interpolation between them is necessary, making this process generally more expensive. So-called intruder states give rise to additional problems, since they are so high in energy at the reference geometry that they usually would not be included in the initial calculation. However, they enter the lower energy region at another geometry visited during the NAMD simulations and thus need to be considered in the current calculation step. Hence, they should have been included from the beginning for the phase correction algorithm to work. As a solution to this problem, many electronic states need to be computed from the start. In some cases, where many energetic states lie close to each other and where the photochemistry is complex, phase correction might even be infeasible. The problem of intruder states was also identified by Robertsson et al. and is well explained for a diabatization procedure in Ref. [169]. For a more detailed discussion on phase correction, the reader is referred to Ref.s [118,167,168,170]. Nevertheless, as given in panel C of Fig. 2, smooth curves are obtained if phase correction is carried out correctly and these phase-corrected properties can be learned with conventional ML models.
Similarly, a small set of data can be corrected manually and afterwards a cluster growing algorithm can be applied [125,171].
This algorithm uses Gaussian process regression to continuously add data points to the initially phase-corrected data set. This approach has been employed recently to obtain diabatic transition dipole moments [119]. However, in systems containing many degrees of freedom and many electronic states, a manual correction of the sign of couplings is tedious and the approach has only been applied to small systems, yet [119,126].
In contrast to the phase-correction procedures described above, an ML-based internal phase-correction during training renders the learning of raw quantum chemical data possible and does not require any preprocessing. However, it requires a modification of the training process itself [165]. In a recent study, we applied such a phase-free training using the deep continuous-filter convolutional-layer NN SchNet [156,172] that we adapted for the treatment of excited states. In contrast to conventional algorithms, where the hyperparameters of the network are optimized to minimize the L 1 or L 2 loss function, here a phase-less loss function is applied. The latter allows the ML model to possess a different phase (or sign) for the learned property than the reference data. Since ML models intrinsically yield smooth curves, the algorithm will then automatically choose a phase for every data point such that smooth coupling curves are produced. This freedom of choice is achieved by calculating the errors between the ML value and all possible phase variations of the reference value and using only the smallest of these errors. The possibilities for phase conventions scales with 2 N S −1 , where N S is the number of considered states. Since the error is computed more often than in conventional ML training, the phase-less loss training becomes more expensive, when including more electronic states. Mathematically, instead of computing the mean squared error, ε L2 , between reference couplings, C QC ij , and predicted couplings, the phase-free error, ε ph , is computed as the minimum of 2 N S −1 computed errors: p k i and p k j are phase factors, giving rise to the sign of state i and j resulting in a phase vector with index k. This adaption of the loss function can remove the influence of any phase during the training process, making the use of raw data possible. A variation of this approach with reduced cost is possible if only one property, i.e. NACs or SOCs, are trained for NAMD simulations. A detailed discussion can be found in Ref. [165].
It is worth mentioning that, besides the arbitrary phase of the wavefunction, also the Berry phase (or geometric phase) [173] exists. Effects due to the Berry phase can not be accounted for with phase correction. Nevertheless, most often in mixed quantum classical NAMD simulations, the Berry phase can be neglected. As a drawback, some effects, such as interference of nuclear wavefunctions might not be described correctly with such methods, and thus prevents the application of the corresponding ML properties if those effects are important. In some other dynamics methods and reactions, the Berry phase plays a crucial role and can lead to path-dependent transition probabilities close to conical intersections. This effect is important in quantum dynamics simulations, where problems can be circumvented by using diabatic potentials, which will be described in section 3.2.
The advantage of multi-reference methods compared to single reference methods is that photo-dissociation, which is likely to occur in many molecules after their excitation by light, can be treated accurately. In contrast, single-reference methods fail to do so in many cases. However, multi-reference methods are seriously limited by their computational costs [7,10], calling for an efficient and meaningful training set generation. Therefore, the training set should be as small as possible, but should cover the relevant conformational space of a molecule that is required for accurate NAMD simulations [183]. Accordingly, many recent training sets for MD simulations are built by a so-called "active-learning" [77,184] or iterative/adaptive sampling scheme [85,185] that will be described in the following and can be adapted for excited states [118]. From our point of view, it is most favorable to start by computing a small initial training set and to expand it via such an adaptive sampling scheme [77,85,118,[184][185][186].

Initial training set
If only static calculations are targeted, data bases can be generated efficiently by starting from already existing data sets.
As an example, Schwilk et al. [182] constructed a large data set of 13k carbene structures by randomly choosing 4,000 geometries from the QM9 [187] data set (consisting of 130k organic molecular structures).
Hydrogen-atoms were abstracted and singlet and triplet states were optimized. The MR-CI method was subsequently used to compute the energies of the singlet and triplet state and a data set of 13,000 different carbene structures, called QMspin, was obtained, opening avenues to investigate important intermediate geometries critical for organic reaction networks.
As a starting point for all following training-set generation schemes aiming to investigate the temporal evolution of a system, the equilibrium geometry of a molecule can be computed and taken as a reference. The initial training set can then be built up by sampling conformations close to this molecular configuration. In general, every sampling method is possible. Since the normal modes of a molecule are generally important for dynamics, scans along these coordinates can be used to sample different conformations. In cases of small molecules with few degrees of freedom, this process might be a good starting guess for an initial training set [118]. It also makes sense to optimize critical points like excitedstate minima, conical intersections and state crossings and to include data along such optimization runs into the training set. The same is advisable for larger systems, but in addition, some other approaches like Wigner sampling [188] or sampling via MD simulations [189,190] can be considered. To name a few approaches, umbrella sampling [191], trajectoryguided sampling [192], enhanced sampling [193] or metadynamics [194], using a cheap electronic structure method like the semi-empirical tight-binding based quantum chemistry method GFN2-xTB [195]), can be employed.
Further, if literature or chemical intuition indicate that certain reactions, like dissociation, take place after photo-excitation, it is also favorable to include those reaction coordinates right from the beginning. The initial training set can easily comprise on the order of 1,000 data points, which might seem like a lot but is reasonable given the large number of data points in commonly used training sets [112][113][114]161]. The quality of the initial ML potentials can be assessed by carrying out short scans along different reaction coordinates, such as combinations of normal modes. As soon as the initial training set is large enough, the training set expansion via an adaptive sampling scheme can be started.

Adaptive sampling for excited-states
ML models fail to predict regions with scarce training data, i.e., their extrapolation capabilities are faint [196]. Since such regions are likely visited during a dynamics simulation, the initial training set then needs to be expanded. A quality control is needed to detect whether unknown conformational regions of the molecule are visited, such that the corresponding structures afterwards can be added to the training set.
This concept was introduced already in 1992 as query by committee [197] and has been used in chemistry in the so-called GROW algorithm of Collins and coworkers [128,129] as well as in the iterative sampling of Behler [185]. The latter is nowadays well adapted for the ground state [77,85,184,186] and was recently modified for the excited states [118]. The scheme is described in more detail in the following.
To apply the procedure of adaptive sampling, at least two ML models have to be trained independently, e.g., with slightly different hyperparameters or starting weights. An overview of this procedure with two neural networks (NNs) is given as an example in Fig. 3. At every time step during an MD simulation, the predictions Y p M of at least two ML models, M , for a property p (e.g., a potential energy) are compared to each other. To this end, the standard deviation of these predictions with respect to the mean of each property, Y P , is computed according to Figure 3: Overview of the adaptive sampling scheme for excited states, reproduced from Ref. [118] under CC-BY. Adaptive sampling is illustrated exemplarily with two NNs for the methylenimmonium cation, CH 2 NH + 2 . As a starting point, ML models are trained on the initial training set and ML-based dynamics are executed. At each time step, the predictions of the two deep NNs (NN1 and NN2) are compared to each other for energies (E) and gradients (G), nonadiabatic couplings (NACs), spin-orbit couplings (SOCs) and dipole moments (µ). If the difference between the ML models overcomes a pre-defined, adaptive threshold, the geometry visited at this time step is re-computed with quantum chemistry, added to the training set after phase correction and the ML models are re-trained. Subsequently, a new dynamics cycle is started and this process is repeated until the ML models are deemed to be converged.
This standard deviation is compared to a pre-defined threshold, ε p , for each trained property.
If the standard deviation stays below the threshold, the mean of each property, Y P , is forwarded to the MD program to propagate the nuclei. If the threshold is exceeded, the ML prediction is assumed to stem from an undersampled or unknown region of the PESs and is deemed untrustworthy. This conformation has to be included into the training set to guarantee accurate ML PESs. Thus, a quantum chemical reference computation is carried out, the data point is added to the training set and the ML models are re-trained to execute ML-NAMD simulations on longer time scales. It is sensible to choose a large threshold, ε p , in the beginning and adaptively make it smaller as the robustness of the ML models increases, giving rise to the name adaptive sampling [85].
Adaptive sampling for excited states differs from adaptive sampling in the ground state in the number of properties that are considered. As illustrated in Fig. 3, not only the energies must be accurately predicted, but also the couplings and, if necessary, dipole moments. Since more states are considered, an average standard deviation is taken as the mean of the standard deviations of each state in case of energies and gradients, and as the mean of the standard deviations of each pair of states for couplings or dipole moments, A separate threshold is set for each of these averaged quantities.
If any of the quantities is predicted inaccurately, the data point is recomputed with quantum chemistry, phase corrected and added to the training set.
In order to make this process more efficient, not only one MD simulation, but an ensemble of trajectories can be computed. The ML models are only retrained after each of the independent trajectories has reached an untrustworthy region of the PES and after each of the reference calculations has been finished and included in the training set. This makes the parallelization of many trajectories possible [85,118,185,198].
The adaptive sampling scheme should be carried out until the relevant conformational space for photodynamics is sampled sufficiently. However, using more than one ML model for production runs is still favorable. One of us and coworkers observed that the error of predictions decreases with the number of ML models used [85,198]. We have seen the same trend in a recent study for NAMD simulations. With the adaptive sampling scheme for excited states, we generated a training set of 4,000 data points of the methylenimmonium cation, CH 2 NH + 2 , to carry out long time-scale NAMD simulations with NNs [118].

Additional sampling techniques for excited states
Further training sets for NAMD simulations were generated for one-dimensional systems as well as polyatomic molecules. For example, Chen et al. have computed 90,000 data points via Born-Oppenheimer MD simulations and NAMD simulations, where emphasis was placed on the inclusion of geometries after a transition from one state to another took place [114]. Deep NNs were trained on energies and gradients of this training set to accurately reproduce NAMD simulations.
Hu et al. [112] used a very similar approach and obtained around 200,000 data points of 6aminopyrimidine from Born-Oppenheimer MD simulations. They further carried out NAMD simulations with surface hopping [199,200], where transitions from one state to another were allowed via so-called hops. The geometries that were visited shortly before a hop took place were used as a starting guess to optimize conical intersections, i.e. critical points of the PESs, where two states become degenerate. Those data points were included to comprehensively sample the regions around a conical intersection. However, the ML models were not accurate enough for NAMD simulations solely based on ML potentials and the authors had to resort to quantum chemistry calculations in critical regions of the PESs.
Dral et al. [113] generated a training set for a onedimensional two-state spin-boson model consisting of 10,000 data points with a grid-based method. The training data selection was then based on the structure, rather than on the energy of the molecules. For each data point, a molecular descriptor was computed and the distances of the descriptors were compared. Data points for the training set were chosen to sample the relevant space sufficiently [113,201]. Compared to random sampling, this method allowed a reduction of training set sizes up to 90 %, which was shown for static calculations of the methyl chloride molecule [201,202]. A similar structure-based sampling scheme was proposed by Ceriotti et al. [203].
Additionally, a maximum and minimum value can be computed for each representation of a molecule inside the training set. Every new structure that is obtained throughout an MD run can be compared to those values to get a measure of reliability of ML predictions. If the configuration does not lie within the known region, it can be added to the training set [184,204]. Very recently, an active learning approach has been proposed to construct PESs without the need of running MD trajectories. The difference between two NN potentials was computed and points were iteratively added at the maxima of this difference surface (or, as phrased in the study, at the minima of the negative difference) [205].
It becomes evident from the diversity of approaches and training set sizes that a general guide on how to compute the training set and how large it should be for NAMD simulations can not be given. It is rather a matter of the efficiency that should be achieved and the computational costs that can be justified. Further, the training set strongly depends on the molecule un-der investigation. Especially its size, flexibility and the complexity of the light-induced dynamics play an important role. A molecule, whose photodynamics can be described as a two-state problem, such as in a simplified case of ethylene [206,207], or a molecule, which is rigid, where dynamics mostly lead towards one reaction channel, possibly requires less data points than molecules that exhibit several different reaction channels after photo-excitation.

Beyond Born-Oppenheimer dynamics
With an accurate training set for excited states at hand, NAMD simulations can be enhanced with ML models in order to enable the dynamics on time scales otherwise unfeasible. The most accurate way to study the dynamics of a molecule would be the full quantum mechanical treatment, which is, however, expensive and limited to a few atoms, even if ML PESs are applied [49, 115-117, 119-125, 127, 208-211]. A mixed quantum classical treatment is thus often preferred, where the motion of the nuclei are treated classically on one of the ML PESs. The mixed quantum classical MD simulation can then be interpreted as a mixed MLMD simulation [165]. The Born-Oppenheimer approximation allows to separate the nuclear from the electronic degrees of freedom. However, this approximation is not valid in the vicinity of avoided state crossings of PESs (or conical intersections, as mentioned before), which play an important role in excited-state dynamics.
In these critical regions of the PESs, ultrafast rearrangement of the motions of the electrons and the nuclei takes place due to strong couplings. As already mentioned, the relevant coupling elements are NACs and SOCs. The NACs (denoted as C NAC ) are vectorial properties and can be computed as [48,212,213] neglecting the second order derivatives. Thus, in the vicinity of a conical intersection, the couplings become very large, whereas they are almost vanishing elsewhere. The singularities that arise when two states are degenerate do not only pose an obstacle to quantum chemistry, but consequently also to PESs fitted with ML [112][113][114]118]. NACs are nevertheless important properties to determine the direction and probability of internal conversion -a transition from one state to another, where the spin multiplicity does not change [8,48,51,52,214]. In contrast, the SOCs (denoted as C SOC ) are complex-valued properties that determine the rate of intersystem crossing, i.e., the transitions from one state to another, where spin multiplicity does change. In standard quantum chemistry programs, SOCs are given as the off-diagonal elements of the Hamiltonian matrix [8,52,215]:

Fitting diabatic potentials
The numerical difficulties that arise due to discontinuous PESs and singularities of couplings at conical intersections can be circumvented by the use of diabatic potentials instead of adiabatic ones [116,[216][217][218][219][220]. In the diabatic basis, the coupling elements are smooth properties and the arbitrary phase of the wavefunction does not have an impact. This favors the use of diabiatic PESs. Since the output of a quantum chemistry program is generally given in the adiabatic basis, a quasi-diabatization procedure is necessary. Strictly speaking, a diabatization procedure is not possible because e.g. an infinite number of states is needed for an accurate representation. If using a finite number of states, the term quasi-diabatic is employed. For simplicity, we still use the notation of diabatic potentials for quasi-diabatic potentials. Those have been generated with different methods [221] and for small molecules up to date. Examples are the propagation diabatization [222], diabatization by localization [223] or by ansatz [115,224], diabatization based on couplings or other properties [225][226][227][228], configuration uniformity [229], block-diagonalization [230], CI vectors [169] or (partly) on ML [115,116,224,[231][232][233][234].
Since several years, (modified) Shepard interpolation is used to fit diabatic potentials [129,[235][236][237][238] and also least squares fitting was applied to study the photo dissociation of molecules, such as NH 3 and phenol [239,240]. In a series, Guo, Yarkony and co-workers developed invariant polynomial NNs [116,124,[231][232][233][234] to address the excited-state dynamics of NH 3 and H 2 O by fitting diabatic potential energy matrix elements. Absorption spectra as well as branching ratios could be obtained with high accuracy. The same authors further fit the diabatic 1,2 1 A dipole moment surfaces of NH 3 , which can only be fitted accurately if the topography of the PESs is reproduced correctly, validating the previously fitted diabatic potentials [119].
Habershon, Richings and co-workers used Gaussian process regression (in their notation equal to kernel-ridge regression) to fit diabatic potentials to execute on-the-fly dynamics of the butatrien cation with variational Gaussian wavepackets [127]. In another study, they applied an on-the-fly MCTDH scheme (DD-MCTDH) and carried out 4-mode/2-state simulations of pyrazine [120]. By improving the ML approach with a systematic tensor decomposition of kernel ridge regression, the study of 12-mode/2-state dynamics of pyrazine was rendered possible. This achievement remains a huge improvement over current MCTDH simulations in terms of accuracy and efficiency [241].
For the improvement of the diabatization by ansatz procedure, Williams et al. [115] applied NNs and enabled the fitting of the electronic low lying states of NO 3 . The improvement of the diabatization procedure itself is desirable [116,117], since the generation of meaningful diabatic potentials is often a tedious task and restricts their use tremendously. Up to date, no rigorous diabatization procedure exists that allows the diabatization of adiabatic potentials of polyatomic systems by non-experts in this field [116,119]. Especially challenging for larger and more complex systems is the number of electronic states within a certain energy range that have to be considered for successful diabatization. An increasing computational effort to provide all relevant electronic states is the result, making diabatization further challenging [169]. Often, more extensive approximations [115,169,242], e.g. the linear vibronic coupling model [243] are applied. We refer the reader to Ref.s [50,217,[243][244][245] for more details on such approaches.
Despite the advantages of diabatic potentials, due to the before-mentioned drawbacks and the fact that the direct output of a quantum chemical calculation is given in the adiabatic basis, on-the-fly NAMD in the adiabatic representation is often the method of choice for large polyatomic systems, which will be discussed in the following.

Fitting adiabatic potentials
In order to execute NAMD simulations in the adiabatic basis, approximations have to be introduced to account for nonadiabatic transitions between different PESs.
A good trade-off between accuracy and efficiency can be achieved with the surface hopping methodology [199,200], which is often applied in ML-based NAMD simulations [112][113][114]118].
In surface hopping, the transitions, or so-called hops, between different states, are computed stochastically and a manifold of trajectories needs to be taken into account to analyze different reaction channels and branching ratios [8,51,246] Several algorithms [52,[246][247][248] are frequently used to compute the hopping probability as well as its direction, with Tully's fewest switching algorithm being among the most popular ones [199,200]. There, the couplings between adjacent states determine the hopping probability [51]. Other frequently applied algorithms to compute hopping probabilities are the Landau-Zener [249,250] and the Zhu-Nakamura approximations [247,248,251,252]. Those approximations solely rely on the PESs and omit the computation of wavefunction coefficients and couplings. Other flavors to account for transitions exist, which have, however, not been used in ML based NAMD studies yet. We thus refer the reader to Ref.s [48,51,52,170,199,246,[251][252][253][254][255][256] for further information.

NAMD simulations with ML energies and forces
Based on the fewest switches algorithm, one of the first ML NAMD simulation is carried out by Carbogno et al. [130], where the scattering of O 2 at Al(III) is studied [111,130,257]. A set of 3768 carefully selected data points [258] allowed for interpolation of the PESs with NNs [257]. In a first attempt, the authors include a spin-unpolarized singlet PES and a spin-polarized triplet state. Strictly speaking, the output of a quantum chemistry simulation for a singlet and triplet state is spin-diabatic [52,215] and NAMD simulations ideally should carry out a diagonalization to obtain the spin-adiabatic PESs [52,215]. The authors took advantage of the adiabatic spin-polarized PES [259] to compute the absolute value of couplings [130]. In this way, the transitions between the states could be approximated using surface hopping omitting the computation of wavefunction dynamics. This study was extended by two-state NAMD simulations of different multiple PESs arising from different spin configurations. Findings suggested a high probability of singlet-to-triplet conversion during scattering experiments with a non-zero probability even at low coupling values [111,130,257].
Other studies using the Zhu-Nakamura method [247,248,251,252] to account for nonadiabatic transitions are discussed below. This approximation is based solely on energies and neglects the phase of the wavefunction. As a drawback, PESs are always assumed to couple to each other, when they are close in energy. This holds true for many cases, but one must be aware that strongly and weakly coupled PESs can not be distinguished.
Hu et al. [112] for example trained separate kernel ridge regression models to fit three singlet states of 6-aminopyrimidine. For learning, they used 65,316 data points comprising the molecular structures and energies of 6-aminopyrimidine with gradients not fitted, but computed afterwards. The data points were obtained from Born-Oppenheimer simulations, which were further clustered into sub-groups, from which the training points were selected randomly. As mentioned before, hopping geometries obtained from reference NAMD simulations were taken to find minimum conical intersections and the latter were also included in the training set. In contrast, Chen et al. [114] trained two deep NNs on 90,000 data points of two singlet states of CH 2 NH. Again, data points were obtained from Born-Oppenheimer MD simulations and NAMD simulations starting from hopping geometries.
In both studies, the NAMD simulations of the reference method could be successfully reproduced.
Instead of approximating the hopping probability, the NACs can also be approximated from PESs, gradients and Hessians [208,[260][261][262][263][264]. We made use of this relation and the fact that ML Hessians can be computed efficiently, and carried out NAMD simulations with the surface hopping method for sulfurdioxide, thioformaldehyde and the methylenimmonium cation [165].

NAMD simulations with ML energies, forces, and couplings
In addition to energies and forces, SOCs need to be fitted with ML models when states of different spin multiplicities become relevant. Furthermore, when approximative schemes for the computation of hopping probabilities fail, the ML models need to learn NACs. One of the first studies, where NACs were fitted, used 1,000 and 10,000 data points to train kernel-ridge regression models to reproduce NAMD simulations of a one-dimensional system. However, especially in critical regions of the PESs, the ML models could not replace quantum chemical calculations and so 13-16% electronic structure calculations were required during an NAMD simulation [113]. The authors highlighted this as a drawback, because efficient simulations should be performed purely with ML and should not rely on intermediate quantum chemical calculations. Moreover, each entry of the NAC vectors was fitted by a separate kernel-ridge regression model, which turned out to be insufficiently accurate.
As indicated before, we also aimed for reproducing NAMD simulations with ML. We employed multilayer feed-forward NNs trained on 4,000 data points of 3 singlet states of CH 2 NH + 2 [118]. Short reference NAMD simulations based on electronic structure calculations could be reproduced. With the ML NAMD, long simulation times on the order of a nanosecond were successfully reached. Significantly different from previous ML NAMD approaches is the smaller size of the training set required to reproduce NAMD simulations. Further, a multi-output ML model was used to fit all NAC vectors between different states of same spin multiplicity at once. We term such models multi-state models. Per definition, kernelridge regression, and similar approaches such as linear regression, are single-state models. In order to make multi-state predictions of such models possible, the energetic state has to be encoded explicitly by using for example an additional state kernel. This procedure enables to model several states simultaneously. We studied the use of multi-state descriptors with the QML toolkit [265] for kernel-ridge regression models and showed that a multi-state description is generally superior to a single-state description in terms of accuracy [161].
Lastly, we want to comment on the NACs as vectorial properties.
It should be clarified that approaches relating a molecular input directly to NAC values do not provide rotational covariance. This drawback is independent of a single-state treatment, i.e., the use of a separate ML model for each coupling value, or a multi-state treatment, where all values are represented in one ML model. Very recently, Zhang et al. applied a symmetry-adapted high-dimensional NN [266] and treated the couplings as derivatives of NN representations. In this case, electronic friction was modelled via ML and applied for MD simulations of molecules at metal surfaces to treat the electronnuclei coupling in a rotationally covariant manner. For the NAC vectors, we applied a similar strategy (similar also to force-only training for potentials), and implemented them as derivatives of virtual properties (i.e., non-existent in quantum chemistry) built by a deep NN [165].

Choosing the right descriptor
Many of the aforementioned studies use kernel ridge regression models or NNs in combination with distance-based descriptors [99, 112, 113, ?-114] such as the matrix of inverse distances or the Coulomb matrix [76].
It is worth mentioning that the accuracy of the ML PESs also depends on the type of descriptor. Molecular descriptors that represent atoms in their chemical and structural environment are often superior to those who treat complete molecules [101,154,155]. The symmetry functions of Behler [267,268], their weighted counterparts [175,269] or the FCHL (Faber-Christensen-Huang-Lilienfeld) representation [101,155] work very well for NNs and the latter also for kernel-ridge regression and additionally provide permutation invariance.
Further improvement can be provided by message passing neural networks [270]. Compared to handcrafted molecular descriptors, the representation of molecules can be seen as a part of a deep NN and, thus, is generated automatically. For each training set, an accurate descriptor is intrinsically designed, which accounts for the chemical and structural environment of a molecule.
For the excited states, the SchNarc [165] approach offers this type of descriptor.

Conclusion and outlook
To conclude, ML methods are powerful and can be used to speed up current MD approaches for the excited states. They have been successfully applied to circumvent existing problems due to the expenses of the underlying electronic structure methods.
While the fitting of diabatic potentials is generally more favorable, those methods are limited by the challenges that arise in finding meaningful diabatic potentials. Up to date, diabatization procedures are tedious and often not feasible for large and complex systems. ML models have been successfully applied to improve these processes [115][116][117], but methods to treat large and complex polyatomic systems in the diabatic basis are still lacking.
To investigate the photodynamics of polyatomic molecules, mixed quantum-classical MD simulations in the adiabatic basis thus often remain the method of choice. One advantage is that the direct output of a quantum chemical calculation is given in the adiabatic basis and so the obtained potential energies and forces can be directly fitted with an ML model. By applying approximations for the computation of transition probabilities from one state to another, the photodynamics can be studied efficiently with ML [112,114,165]. When approaches aim for additionally fitting the coupling values between different electronic states, inconsistencies in the data need to be considered carefully. Those have to be either removed from the training set or the training process itself has to be adapted in order to achieve successful training. Both approaches have been applied and were used for NAMD simulations [113,118,165].
The current challenges that have to be tackled when replacing quantum chemical calculations in photodynamics simulations of large and complex polyatomic molecules are the efficient training set generation and the accurate fitting of a manifold of energetic states, forces, and couplings between them. Many recent ML approaches required several thousand of data points for small polyatomic molecules and energies, forces, and couplings were often trained in separate ML models, leading to unsatisfactory accuracy. The development of an ML model that can treat all properties for photodynamics simulations at once is clearly desirable. While current studies struggle with fitting the excited states of one molecule, the transferability of ML potentials for the excited states is far from being achieved. Studies that try to fit more molecules than just one could give first insights into the possibility of extrapolating throughout chemical compound space with ML also for the excited states. support, also in the frame of the research platform ViRAPID.