Machine Learning Seams of Conical Intersection: A Characteristic Polynomial Approach

The machine learning of potential energy surfaces (PESs) has undergone rapid progress in recent years. The vast majority of this work, however, has been focused on the learning of ground state PESs. To reliably extend machine learning protocols to excited state PESs, the occurrence of seams of conical intersections between adiabatic electronic states must be correctly accounted for. This introduces a serious problem, for at such points, the adiabatic potentials are not differentiable to any order, complicating the application of standard machine learning methods. We show that this issue may be overcome by instead learning the coordinate-dependent coefficients of the characteristic polynomial of a simple decomposition of the potential matrix. We demonstrate that, through this approach, quantitatively accurate machine learning models of seams of conical intersection may be constructed.


Kernel ridge regression calculations SOAP descriptor
To efficiently predict structure-property relations with machine learning (ML) we need to encode the structural information in an ML-friendly way.The main requirements of any descriptor can be summarized as follow: 1. Invariant to spatial translations 2. Invariant to rotation of the coordinate system 3. Invariant with permutations of atomic indices 4. Complete and unique: the structure and descriptor representation has a bijective mapping and each property corresponds to a unique descriptor 5. Continuous: 'small' changes in the structure result in a 'small' change in the descriptor

Non-redundant and computationally cheap to construct
The smoothed overlap of atomic positions (SOAP) descriptor 2 satisfies all these criteria and is the descriptor of choice in this work.The SOAP descriptor encodes local atomic environments through a sum of Gaussian-type densities placed at a number of local centres, and subsequently expanded in terms of Gaussian radial and spherical harmonic angular basis functions.These local environments are expressed as power spectrum and can be placed arbitrarily in real space, but they are often, and in this work, placed on each atomic centre in the molecule.These power spectra satisfy the above requirements of a descriptor, the proof and derivations can be found in (Bartok 2013). 2 A partial power spectrum placed on a single centre is given by where n is the indexes radial basis indices, and l and m index the angular spherical harmonic basis functions.The coefficient c Z nlm is defined through the following inner product: ) where ρ Z in Equation S3 is a species-dependent pseudo-atomic density built from a Gaussian controlled by the width parameter σ, and the Y lm are spherical harmonics.The decay parameter α n in the Gaussian radial function g n is chosen such that each term in g n decays to a threshold value of 10 −3 at a cutoff radius r cut .A full power spectrum that represents a local environment is taken as the concatenation of the partial power spectrum p Z 1 Z 2 nn ′ l for all unique pairs of atomic species.A global SOAP descriptor P with elements is then built from a sum over the local power spectra centered at each site.The size of the SOAP descriptor P therefore does not depend directly on the size of the molecule, but rather the dimension of the angular basis {Y lm } and number of atomic species included in Equation S1.The length of the resulting power spectrum can be calculated as where S n is the number of atomic species included.
The SOAP descriptor is specified by four parameters: r cut , n max , l max , σ.The parameters n max and l max specifies the angular expansion of the spherical harmonic in Equation S2, σ controls the width of the Gaussian-type pseudo-density in Equation S3, and r cut implicitly changes the decay parameter in Equation S4, which controls the maximum reach of the radial basis.Computations of the SOAP descriptor were carried out using the DScribe Python package. 3,4

Kernel functions
All results presented in the main text were computed using the anisotropic Radial Basis Function (RBF) (i.e., squared exponential) kernel, where d(P i , P j ) is the Euclidean distance between SOAP feature vectors P i and P j .The RBF kernel function is infinitely differentiable and well suited to the description of the smooth ω(R) and c Z i (R) functions.On the other hand, the non-differentiable adiabatic PESs E i (R) forming a conical intersection could reasonably be assumed to be better described using a non-differentiable kernel.To investigate this, additional calculations were performed using the Matérn-1/2 kernel

Optimization of the SOAP and kernel hyperparameters
The SOAP parameters r cut , n max , l max , σ and the kernel hyperparameter length scale l are optimized simultaneously using a Genetic Algorithm (GA) search.A GA was used as we require the simultaneous optimization of a mixed set of continuous and discrete variables, which makes gradient-based optimization methods difficult.The GA search algorithm follows a generic procedure of creating a population, selecting top candidates based on chosen metric and random selection, then allowing combination and random mutation which form a new set of population.The process is repeated until an optimal solution is found.First, an initial population is generated by randomly sampling from the bounded set of parameter values: The GA search is carried out with pymoo 5

Effect of the choice of kernel
We here consider the effect of using a non-differentiable kernel on the ability of the directenergy KRR models to correctly reproduce a conical intersection.For this purpose, we show in Figures S1 and S2 the direct-energy model PESs for the ethylene example obtained using the Matérn-1/2 kernel.As can clearly be seen, even when using a kernel belonging to the same differentiability class as the adiabatic PESs, the direct-energy models still fail to reproduce the intersection of the PESs, instead furnishing an avoided crossing.Proof of Equation 5We start by expanding out the right hand side of Equation 5: Expanding out Equation S9 and multiplying out the summation we obtain n i<j Taking the first term on the right-hand side of Equation S10, expanding it, and simplifying the double ordered summation to a single summation, we obtain We now make use of the trace property of a Kronecker product where the last equality in Equation S12 holds because the diagonal splitting matrix Z is by construction traceless.We now note that the trance tensor product of an n × n diagonal matrix Z with itself can be written as a sum of the form Setting Equation S13 to zero from the result of Equation S12 gives Substituting Equation S14 into the right-hand side of Equation S11, we obtain the rela- Equation S15 can now be substituted into Equation S10. and after simplifying we get the result n i<j The sum term on the right-hand side of Equation S16 is precisely the definition of the Thus, the desired result is obtained: Ability of the ω-CP models to extrapolate We here consider the ability of the ω-CP KRR models to extrapolate well beyond the sub-

ML computational detail
Below are tables for C2H4, NH3 and CH4 containing all relevant ML parameters.r, n, l and σ are SOAP parameters and ls corresponds to the an-isotropic RBF kernel lengthscale.s y Ab initio 0.0 0.0 0.213 0.125 -0.130 -0.046 N=200 ω-cp 2.4 8.9 0.216 0.125 -0.130 -0.041 N=500 ω-cp 0.9 1.6 0.216 0.127 -0.130 -0.045 N=1500 ω-cp 0.3 0.5 0.214 0.126 -0.130 -0.046 Once a population (paramter set) is generated, a metric is used to score each individual (sets) in the population, here taken as the root mean squared displacement (RMSD) of KRR model predictions relative to a test set.The training/test sets were generated by randomly splitting the data set into 15%/85% subsets.Once scored by their RMSDs, survival of the fittest is applied, and from the remaining survivors, crossover and mutation operations are performed to generate a new population (set of paramters) of the same length, and the whole process is repeated until convergence.Simulated Binary Crossover (SBX) and Polynomial Mutation (PM) algorithms were used for the crossover and mutation procedure, respectively.
volume of nuclear configuration space spanned by the geometries present in the training sets used to construct them.For reference, a superposition of all training set geometries for the ethylene and ammonia cation examples are shown in FigureS3.The tight nature of the LHC sampling used is clearly seen.Even so, both models are found to remain accurate out to relatively large displacements along the branching space coordinates x and y.To demonstrate this, we consider the geometries corresponding to R CI + 0.4x and R CI + 0.4y, points at which the ω-CP model PESs are still accurate for both systems (as can be discerned from Figure5in the main text).These geometries are shown in FigureS4alongside the MECI geometries R CI .As can be seen, these geometries lie outside of the span of the set of displacements of the training set geometries from the MECI geometries.Especially stark is the R CI + 0.4x ethylene geometry, which corresponds to a different structural isomer to the geometries in the training set.Also notable is the ability of the ammonia cation ω-CP model to accurately describe the PESs at near-dissociated geometries.We thus conclude that, in stark contrast to the direct-energy KRR models, the ω-CP KRR models are able to extrapolate rather far beyond their training set data.This, in turn, is a result of the long length scales l on which the extremely smooth ω(R) and c Z i (R) surfaces vary.

Figure S3 :
Figure S3: Superposition of the 10000 training set geometries used in the construction of each of the ethylene and ammonia models.

Figure S4 :
Figure S4: MECI geometries, R CI , and displacements of length 0.4 Å along the branching space directions x and y.

Table 4 :
Relevant parameter for C2H4 KRR model

Table 6 :
Relevant parameter for CH4+ KRR modelTable7shows a comparison between the ω-cp learned branching space parameters with the ab-initio values at reduced training sizes.Parameters shown in Table1of main text uses N=3000.

Table 7 :
Dependence of branching space topography parameters with respect to training set size (N).N=200, N=500 and N=1500 is shown.Reasonable accuracy is obtained at N=500 and approaches ab-initio values at N=1500