Fitting Potential Energy Surfaces by Learning the Charge Density Matrix with Permutationally Invariant Polynomials

The electronic energy in the Hartree–Fock (HF) theory is the trace of the product of the charge density matrix (CDM) with the one-electron and two-electron matrices represented in an atomic orbital basis, where the two-electron matrix is also a function of the same CDM. In this work, we examine a formalism of analytic representation of a generic molecular potential energy surface (PES) as a sum of a linearly parameterized HF and a correction term, the latter formally representing the electron correlation energy, also linearly parameterized, by expressing the elements of CDM using permutationally invariant polynomials (PIPs). We show on a variety of numerical examples, ranging from exemplary two-electron systems HeH+ and H3+ to the more challenging cases of methanium (CH5+) fragmentation and high-energy tautomerization of formamide to formimidic acid that such a formulation requires significantly fewer, 10–20% of PIPs, to accomplish the same accuracy of the fit as the conventional representation at practically the same computational cost.


S1. Gaussian matrix elements
This section provides analytic expressions for the Gaussian integrals seen in Eqs.2-5 of the main text used in the present calculations.Since we limited our model to the s-density with a single function per atomic center, only the integrals involving s functions will be given explicitly.

S2. Asymptotic limits of density matrix elements
The scaling function that appears in Eq. 8 (main text) is defined as a sum of products of long-and short-range one-dimensional damping functions that act on the non-constant part of the density element   , The long-range damping with   → ∞ for all b while   → 0 makes sure that an isolated atom a has exactly   electrons.The short-range damping is introduced mainly to control blow-up in the   behavior at short   distances but may additionally be thought of as an enforcement condition for wavefunction collapse in the united atom limit.Namely, for a diatomic molecule ab the following must hold for   = 0 regardless of the linear coefficients   atom and   pair , which is guaranteed by Eqs. 8, 10 in the main text, and Eq.S8.

S3. Beyond the core Hamiltonian model
The exact Hartree-Fock energy is given in Eq. 7 (main text) and is reproduced here for clarity but with an altered notation, where we have explicitly distinguished the density matrix appearing in the two-electron contribution to the energy:   .With  =  the equation is the true HF energy.However, for a linear class of least squares solutions of the unknown coefficients   atom and   pair , we cannot express  in terms of these coefficients but approximate it using some reasonable models.Here we will examine two such models based on the following approximation, Although it is a rough approximation, it is symmetric and precisely conserves the overall charge.
Additionally, it assumes that there can be no more than three-center integrals, leading to the twoelectron/two-center model by retaining only    and    integrals (2e2c), and the full twoelectron/three-center model (2e3c).
In Table S1, we review calculations based on these two models along with the Core Hamiltonian Model (CHM) for H3 + .As can be seen, the 2-electron/3-center models do not perform better than the CHM for a range of selected PIP basis sets.However, it is possible to detect a convergence of the error to the same limit with an increasing PIP basis.This leads to two main conclusions, (i) the present linear approximation given by Eq.S11 is not useful and in fact, impractical given the need to calculate the two-and three-center integrals, and (ii) to take full advantage of the exact HF energy expression, it may be necessary to solve for the linear coefficients   atom and   pair using non-linear regression methods.The latter point is being considered in our ongoing investigations.
Table S1.Comparison of trained RMSE (cm -1 ) of the CHM model and two-electron models involving 2-center (2e2c) and 3-center (2e3c) integrals using H3 + fits on the training set generated with a CCSD/cc-pVDZ trajectory propagated at a 7335 cm -1 total energy using 3000 points.The  Table S3.Training and testing RMSEs (in cm -1 ) of CH5 + fit trained on 13605 configurations and tested on 52400 configurations, calculated at the B3LYP/cc-pVTZ level of theory.

S6. Isomerization landscape and set pruning for H2NCHO
For a set of trial potential energy distributions {fi(V)} derived from microcanonical trajectories of different total energies numbered i = 1, 2, ... each of length N time steps, we seek such coefficients in the "pruned" distribution where  is a constant to be determined by the g(V) normalization requirement and where M (<< N) is the size of the pruned set.
where the overlap matrix is calculated as using quadrature.The condition S13 ensures that g(V) is as close to a uniform, or a top-hat, distribution as possible by assigning approximately equal weights to the configurations with the potential energies in [0, Vmax].Carrying out differentiation, the solution for the best coefficients is For practical purposes, we need the frequency of pruned points (sampling intervals) in each trial set i.This frequency is given by the reciprocal of the coefficient   =   −1 .

Figure S1 .
Figure S1.Energy decomposition of the CHM(4) model for the HeH + potential energy curve calculated at CCSD/cc-pVDZ level of theory.E(HF) is the component of the Core Hamiltonian energy with the nuclear repulsion energy, and E is the conventional PIP based correction as seen in Eq. 12 (main text).

Figure S2 .Figure S3 .
Figure S2.Fourier transforms of the potential energy elements of H3 + as functions of time of a

Figure S4 .
Figure S4.Five outlier "giraffe" configurations of CH5 + that appear in the test set but are largely missing from the train set.Only configuration (d) can be loosely identified in the training set.The accompanying numbers are bond distances in Å.

Figure S5 .
Figure S5.Key stationary points on the formamide and its H-transfer isomer, formimidic acid surfaces provide a representation of the training and testing sets for the reported fits.The numbers next to each structure are electronic energies in cm -1 relative to MIN1 and calculated at the level of B3LYP/aug-cc-pVDZ/GD3.Structures MIN5 and TS25 are largely absent from both sets due to an insufficiently extensive sampling.

Figure S6 .
Figure S6.Potential energy distributions of the training and testing sets of H2NCHO calculated The normalized s function on center A with coordinate   is The distance between two centers A and B is   = |  −   |.The one-electron Coulomb integral involving two basis functions centered at A and B nuclear centers coupled over the nuclear center C with charge   is