Tensor Hypercontraction Form of the Perturbative Triples Energy in Coupled-Cluster Theory

We present the working equations for a reduced-scaling method of evaluating the perturbative triples (T) energy in coupled-cluster theory, through the tensor hypercontraction (THC) of the triples amplitudes (tijkabc). Through our method, we can reduce the scaling of the (T) energy from the traditional to a more modest . We also discuss implementation details to aid future research, development, and software realization of this method. Additionally, we show that this method yields submillihartree (mEh) differences from CCSD(T) when evaluating absolute energies and sub-0.1 kcal/mol energy differences when evaluating relative energies. Finally, we demonstrate that this method converges to the true CCSD(T) energy through the systematic increasing of the rank or eigenvalue tolerance of the orthogonal projector, as well as exhibiting sublinear to linear error growth with respect to system size.


I. INTRODUCTION
Coupled-cluster (CC) theory [1,2] is one of the most important advances of modern quantum chemistry, allowing for a polynomial-time evaluation of the electronic energies and wavefunction of a molecule, as a size-extensive alternative to truncated configuration interaction (CI) methods [3,4].Truncated CC methods also avoid the intractable super-exponential scaling of full configuration interaction (FCI), yielding reasonable and chemically accurate relative energies compared to both the FCI limit and to experimental results, especially in the context of CCSD(T), also known as the "gold standard" method in computational quantum chemistry [5].The tractability and accuracy of CC methods make the development of efficient CC methods crucial for the future of quantum chemistry, as evaluation of accurate energies and wavefunctions is made possible for larger and more complex systems through hardware advances such as massively parallel computing [6][7][8][9][10][11][12][13][14][15][16][17][18][19][20][21][22] and GPUs [23][24][25][26][27][28][29].
Because of this, DFT and MP2 can be run on system tens or even hundreds of times the size of a system typically evaluated with CC methods [34,35].To close the gap between CC and less reliable electron correlation methods, it is useful to devise approximation schemes to CC which reduce the scaling, but also allow a means to systematically control the error compared to the non-approximated CC method.One such approach involves local-correlation [36][37][38][39][40][41][42][43][44][45][46][47], such as used in the DLPNO methods [48,49].With large enough molecules, these methods achieve asymptotic linear-scaling.
Another approach is the rank reduction of the coupled-cluster amplitudes [50], using orthogonal projectors that transform the single and double cluster-amplitudes into a smaller basis Because of the orthogonal nature of the projectors, getting the full amplitudes from the rank-reduced form is trivial As shown by Parrish and co-workers, the size of the V and W indices, also known as the projector rank, can be made directly proportional to the system size, while maintaining a set relative error from the absolute energy of a molecule [50].More recently, Hohenstein et.al. have shown how to create a tensor hypercontracted (THC) form of the t ab ij amplitudes, through the CANCENCOMP/PARAFAC (CP) decomposition [51] of the orthogonal projectors [52].
Parrish and Hohenstein have also shown that, in the context of CCSD, the size of the X index can be made proportional to the system size to maintain a set relative error.Rankreduction methods have also been applied to coupled-cluster theories involving higher levels of excitation, recently by Lesiuk with the SVD-CCSDT method [53], where the concept of orthogonal projectors is used to approximate the triples amplitude in CCSDT theory In the following sections, we will combine the concepts of orthogonal projectors and THC to develop working equations for a reduced-scaling variant of the non-iterative perturbative triples correction to the CCSD energy [5].Recently, Lesiuk derived an O(N 6 ) approach to the (T) energy with orthogonal projectors which he calls RR-CCSD(T) [54].In the current paper, we will improve upon the work of Lesiuk's approach utilizing tensor hypercontraction.
Similar to how the THC-CCSD method [52] improves upon the RR-CCSD method [50,54], our new approach, which we name THC-CCSD(T), will commensurately enhance RR-CCSD(T), reducing the scaling of Lesiuk's from O(N 6 ) to O(N 5 ).For consistency, we will use many of the same formalisms as Lesiuk [53] and Hohenstein [52].

A. Notation
We will use the following conventions to describe the indices appearing in this work: • i, j, k, l: Occupied molecular orbitals, which ranges from 1 to n occ .
• a, b, c, d: Virtual molecular orbitals, which ranges from 1 to n virt .
• P, Q: Auxiliary indices of density-fitted/Cholesky-decomposed ERIs, which ranges from 1 to n aux .
• w, v: Laplace denominator weight indices, which ranges from 1 to n w .
• U, V, W : Rank-reduced dimension of the doubles orthogonal projector, which ranges from 1 to n proj .
• A, B, C: Rank-reduced dimension of the triples orthogonal projector, which ranges from 1 to n proj .
• X, Y, Z: CP-decomposition rank of the triples orthogonal projector, which ranges from 1 to n proj .
The relative sizes of the indices are as follows: Note that n w does not grow with increasing molecular system size, and therefore, run-time analysis of intermediates with w, v indices will only treat the Laplace index as a prefactor.
The frozen-core approximation was used in all post-Hartree-Fock computations in this work; i.e., the 1s electrons are not correlated for all first-row atoms.The occupied space n occ always refers to the number of correlated occupied orbitals.Einstein summation convention is used throughout -all indices appearing on the right-hand side but not on the left-hand side of an expression are summed over.

B. Perturbative Triples Correction to CCSD
CCSD is often not sufficient to obtain "chemically-reliable" theoretical predictions, and it has been shown that only after triple excitations are considered that relative energies of under 1 kcal/mol can be regularly achieved [55][56][57][58][59][60].However, an explicit treatment of all triples has a very high cost of O(N 8 ).Therefore, the triples amplitudes are often determined in a perturbative manner, based on the work of Raghavachari and co-workers [5].In their formalism, the perturbative triples correction to the CCSD energy is defined as where T 1 , T 2 , and T 3 are known as the "cluster operators" and, in second-quantization formalism, are defined as E ai represents the singlet, spin-adapted excitation operator, and is defined as where the barred creation/annihilation operators refer to the beta spin orbitals and nonbarred refer to the alpha spin orbitals.
The accuracy of the (T) method stems from a highly favorable error cancellation between T and E ST .In restricted, single-reference, closed-shell coupled cluster theory, one can write the equation for the (T ) correction as [61] where and Following the formalism of Lesiuk [53], we define P L and P S , or the "long" and "short" permutation operations as The perturbative triples amplitude (t abc ijk ), is defined as Using the perturbative triples amplitude, as well as the permutational symmetry of the Laplace denominator, one can rewrite Equation 17as: We will use this equation when deriving the formulas for the THC-CCSD(T) energy.
The cost of evaluating expression 23 scales as O(N 6 ).However, the cost of evaluating expression 18 scales as O(N 7 ), leading to an overall unfavorable O(N 7 ) scaling of the CCSD(T) method.

C. Orthogonal Projectors
One crucial step of rank-reduced coupled cluster methods is the the formation of the orthogonal projectors to reduce the dimensionality of the amplitudes, as given in equations 1-4 and 8.There are a variety of methods that can be used to compute orthogonal projectors.
One such method for the CCSD doubles amplitude is to form them from the definition of the MP2 t ab ij amplitudes [50].
Using density-fitting (DF) [62,63], also known as resolution-of-the-identity (RI), or Cholesky Decomposition (CD) [64], the set of electron-repulsion integrals (ERIs) in the molecular orbital (MO) basis (ia|jb) can be written as follows [65]: The energy denominator can be factored with a constant-sized index w (with growing molecular system size) through the Laplace denominator approach [66] Combining these techniques, and the following intermediates, as defined by Parrish and co-workers [50], allows us to diagonalize M and form the MP2 projector (U V ia ) as Note that the size of the index V can be truncated based on the magnitude of the corresponding eigenvalue τ V .Even though the diagonalization of M is technically cubic-scaling, the size of the w index can provide a large prefactor.In the case of larger molecules, the size of the V index is often much smaller than the size of the [Qw] index, and thus truncated diagonalization approaches like the one given in reference 67 may be used.Overall, this approach scales O(N 4 ).Similarly, projectors can be derived from MP3, albeit the equations are more complex [50,54], For triples amplitudes, we present two approaches devised by Lesiuk.In his SVD-CCSDT algorithm [53], he took guess t abc ijk amplitudes, such as from CC3, and applied either a TUCKER-3 decomposition (scaling O(N 8 )) or an iterative SVD approach (scaling O(N 6 )), yielding the form of equation 8.
In his RR-CCSD(T) paper, Lesiuk devised an O(N 5 ) scheme to compute projectors from the form of the perturbative triples amplitudes (Equation 22), in a variant of HO-OI (Higher Order-Orthogonal Iteration) [54].The steps of the algorithm are as follows: • Start with the a guess of the triples projector V A ia .This can be done naively by setting V A ia = U A ia from the doubles amplitudes.
• Evaluate t ia,BC from the current guess of the triples amplitudes, where By using the explicit expression for t abc ijk and W abc ijk , this can be evaluated in O(N 5 ).The working equations are presented in reference 54.
• Compute the SVD of t ia,BC , and take the largest n proj left singular vectors as the next ia .This can be done in O(N 5 ) time using a modified variant of truncated SVD, given in reference 67.In this algorithm, we save the singular values of this step (σ A ), when we perform the CP decomposition of the triples projector.Pseudocode for this will be presented in Section IV.
• Iterate until convergence.Convergence is defined when the difference between the Frobenius norm of the rank-reduced triples amplitudes t ABC , defined as between two successive iterations, falls below 10 −5 .
Since the source of the orthogonal projectors is not relevant to the scope of this paper, we will only present results from computations utilizing the MP2 projector for the doubles amplitudes, and Lesiuk's HO-OI approach for the perturbative triples amplitudes.

D. Tensor Hypercontraction (THC)
Tensor hypercontraction (THC) can be viewed as a "double approximation," where two auxiliary indices are introduced to fit a high-dimensional tensor instead of just one.The THC form of electron repulsion integrals is defined as [68]: This can be derived from the CP decomposition of B Q ia , (Equation 25) Similarly, the THC form of coupled-cluster amplitudes can be derived from the tensor hypercontraction of the orthogonal projectors, given by, in the case of the doubles projector: [52] U For the triples projector, it assumes a very similar form, A PARAFAC/CANDENCOMP (CP) decomposition approach on V A ia may be used.This approach is not dependent on the source of the projectors, and any of the projector building approaches from Section C may be used.Here we use the variant of CP decomposition, first introduced by Hohenstein et.al. for the doubles projector [52], where the eigenvalues of the doubles projector are in the CP decomposition, into the alternating least-squares (ALS) iterations.
In our algorithm, for the decomposition of the triples amplitude, instead of using the eigenvalues of the doubles projector, we use the singular values of the t ia,BC intermediate (σ A ).The functional to minimize is hence: And the update rule for each intermediate is given as Note that the update rule for θ is the same as in traditional CP decomposition.
Since a CP decomposition does not exactly recreate the original projector, the projectors lose their orthogonal property [52].Therefore, we have to re-create the projectors after the CP decomposition: The t abc ijk amplitudes can now be rewritten as, from equation 8: Recently, Hohenstein et.al. have devised an algorithm that takes advantage of the THC form of the t ab ij amplitudes to develop an O(N 4 ) scaling implementation of CCSD [52].In the next section, we will show how to extend this to the (T) correction with the THC form of the t abc ijk amplitudes.

III. DERIVATION OF WORKING EQUATIONS
We first define a couple of intermediates.From Lesiuk [54], we define: Next, we define the following chain of intermediates from contracting the polyadic vectors (z X i and z X a ) of the triples projector with the the doubles projector, the DF/RI or CD decomposed ERIs, the D intermediate from equation 47, as well as the T 1 amplitudes.
We then take Equation 23, Equation 19, Equation 45, and the previously defined intermediates, to arrive at the THC form of the triples energy correction:

Contractions (Equations 56 -57)
Contractions (Equations 58 -62) for V in [0, n proj ) do parallelize ), GEMM, built on the fly to save storage ), GEMM, built on the fly to save storage end for

Contractions (Equations 63 -65)
for V in [0, n proj ) do parallelize ), GEMM, built on the fly to save storage for Y in [0, n proj ) do The code is implemented in a developmental plugin version of the Psi4 Quantum Chemistry code [69], following the completion of an exact CCSD computation.Tensor contractions are performed with the help of the EinsumsInCpp software (public on GitHub).The compressed doubles amplitudes T V W used to build the triples projector are formed by transforming the exact CCSD amplitudes from the preceding computation by the MP2 projector amplitudes.This method is designed to be fully compatible and used with Hohenstein's THC-CCSD method [52].Future studies of using THC-CCSD(T) in conjunction with THC-CCSD is encouraged.
V. RESULTS

A. Conformation Energies
We first evaluate our new THC-CCSD(T) method on the CYCONF [70] [71] data set, a set containing 11 different conformations of gaseous cysteine, with 10 corresponding conformation energies, relative to the lowest conformer.We evaluate conformation energies for each of the 10 conformations in CCSD, CCSD(T), and THC-CCSD(T), and for each system, and we use the exact CCSD(T) conformation energy as the reference.We do this using the cc-pVDZ and jun-cc-pVDZ Dunning correlation-consistent basis sets [72][73][74][75].The basis set jun-cc-pVDZ consists of diffuse functions added to all heavy atoms, except for the basis functions with the highest angular momentum.For the THC-CCSD(T) computations, we set the eigenvalue tolerance of the MP2 projector to be 10 −4 .In other words, the ranks (n proj ) of the doubles and triples projectors are determined from how many eigenvalues of the MP2 t ab ij amplitudes are greater than 10 −4 , defined as τ from Equation 29 in our work.For these computations, n proj is around 400, compared to the max possible rank of 2205 (n occ n virt ) in the cc-pVDZ basis, yielding a compression ratio of around 18%.Similarly, in the jun-cc-pVDZ basis, the ratio is 440/2793, which is around 16%.
The summary statistics are presented in Table I, and the results for each individual conformation are presented in Figure 1.In the table, for the THC-CCSD(T) algorithms, the eigenvalue tolerance is given in parentheses.To summarize the findings, THC-CCSD(T) consistently gives lower errors compared to CCSD, for both basis sets, and the errors are on the order of less than 0.1 kcal/mol.It is further encouraging to note that the absolute energy errors for these sets of computations hover around 0.3 − 0.4 kcal/mol, such that the evaluation of relative energies benefits from favorable error cancellation.The error also does not significantly grow with the addition of diffuse functions, from cc-pVDZ to jun-cc-pVDZ.

B. Potential Energy Surface
We perform next, a potential energy surface scan on the benzene-HCN dimer system (compound 19 from the on S22 data set [76]), with the hydrogen atom of HCN pointing towards the π-bonds in the benzene.We measured the energy of the system at five different inter-atomic distances, relative to the equilibrium geometry, ranging from 0.9 to 2.0 times the equilibrium geometry length, with the geometries coming from the S22x5 data set [77].
In Figure 2, we plot the shape of the potential energy surface of the THC-CCSD(T) method at an eigenvalue tolerance of 10 −4 , as well as using predetermined projector ranks of 400 and 500.For all systems, an eigenvalue tolerance of 10 −4 corresponds to a projector rank between 420-430.All THC-CCSD(T) computations better capture the potential energy surface than the reference CCSD computations, with the computations with the predetermined projector ranks better capturing the shape of the surface than the one with a set eigenvalue tolerance.The THC-CCSD(T) potential energy surface with n proj set to 500 exactly matches the CCSD(T) potential energy surface, for practical purposes, with a max error of 0.027 kcal/mol, and a RMSE of 0.014 kcal/mol.The shape of the potential energy surface, for each method, is shown in Figure 2, while the error statistics are presented in Table II.
The errors are especially encouraging for the case of n proj set to 500, as the absolute energy error of each system compared to CCSD(T) hover around 0.4 kcal/mol.

C. Rank Convergence
Next, to demonstrate the convergence of the THC-CCSD(T) method, compared to the exact CCSD(T) energy, we ran a series of computations of the water dimer from the S22 set [76], at eigenvalue tolerances from 10 −3 to 10 −11 .An eigenvalue tolerance of 10 −11 corresponds to no rank compression for this system.The errors with respect to eigenvalue tolerance and compression ranks are plotted in Figure 3, and it is encouraging to see the errors decrease smoothly to the true CCSD(T) energy, within the DF/RI approximation of the ERIs.We attribute the "kink" in the graph from 10 −4 to 10 −6 as an artifact of the CP decomposition of the triples projector, with the CP error increasing slightly between the projector ranks of 122 -156, before going back down.This artifact is well known on studies of the CP decomposition algorithm [51], where medium CP decomposition ranks suffer larger losses in accuracy compared to small or large ranks.Further studies and work are encouraged to look for ways to mitigate this phenomenon in the context of decomposing CC amplitudes.

D. Scaling
To establish the O(N 5 ) scaling of the THC-CCSD(T) method, it must be shown that the projector rank, or n proj must scale linearly with respect to system size.Hohenstein and Parrish have previously established the linear scaling of n proj for doubles amplitudes in their previous work [50,52].However, to verify this in our algorithm, we must show that the error does not grow more than linearly with linear increases in system size.Below, we present THC-CCSD(T) computations on systematically larger systems of waterclusters and linear alkanes, from 1-8 heavy atoms, in the cc-pVDZ and jun-cc-pVDZ basis sets, evaluated at an eigenvalue tolerance of 10 −4 .As shown in Figures 4-7, sub-linear to linear error growth are shown, with respect to projector rank and system size, with virtually no loss in accuracy from cc-pVDZ to jun-cc-pVDZ in both systems.

VI. CONCLUSIONS
In this paper, we present the working equations for the THC-CCSD(T) method, a O(N 5 ) scaling approximation to CCSD(T), that allows for systematic control of errors.In our pilot implementation, we show the errors are controllable to the point of maintaining chemical accuracy of less than 0.1 kcal/mol for relative energies, and 1 mEh for absolute energies, while maintaining size extensivity.We also showed that the method yields continuous potential energy surfaces that closely matches the CCSD(T) surfaces with sufficient projector rank.In the future, we hope to consider ways to improve the errors of the method at a given eigenvalue tolerance, such as through using other sources for the orthogonal projector.We would also like to look into alternative approaches to the THC factorization of orthogonal projectors.
Though a CP decomposition is generally applicable, and relatively easy to implement, it does not assume any underlying form about the amplitudes.One avenue is the extension of the quadrature-based approach of Parrish, Hohenstein, Martinez, and Sherrill with Least-Squares Tensor Hypercontraction (LS-THC) to the triples amplitudes [78,79].

FIG. 1 .
FIG. 1. Errors in conformation energies for CCSD and THC-CCSD(T) evaluated on the CYCONF data set, compared to the exact CCSD(T) reference, evaluated in with the cc-pVDZ and jun-cc-pVDZ basis sets, with a 10 −4 eigenvalue tolerance.

FIG. 3 .
FIG.3.The convergence of the absolute energy of a water dimer system (S22), with respect to eigenvalue tolerance and rank.
67 aid future research and development, we present pseudocode for some of the algorithms we use for the optimal contraction of intermediate terms to evaluate the THC-CCSD(T) energy.We first present our non-iterative SVD algorithm to factorize the t ia,BC intermediate, inspired by the truncated SVD and diagonalization algorithms given in Ref.67.In Algorithm 1, we present a non-iterative truncated SVD algorithm to avoid the O(N 6 ) scaling of a traditional SVD of the t ia,BC intermediate.In Algorithms 2-4, we present suggested contraction orders, as well as tensor slicings, for each term of the THC-CCSD(T) energy expression.We try to make the contractions such that highly-efficient level 3 BLAS matrix multiplication calls are utilized as much as possible.For each step of each algorithm, the runtime is given, and if a level 3 BLAS matrix multiplication call is possible, then the term QV jb intermediates.It may be possible to reduce the memory cost in future implementations of this method, but that is beyond the scope of this paper.Truncated SVD algorithm for t ia,BC Equations 56 and 57 correspond to the first term in equation 23, equations 58 through 62 the second term, and equations 63 to 65 the third term.All of the contractions can be determined in O(N 5 ) time or less.IV.IMPLEMENTATION DETAILS (GEMM) is added.Additionally, the D QV XY intermediate is never fully built to help with memory costs.The runtime of this algorithm is O(N 5 ), with O(N 4 ) storage costs, the only quartic memory requirements involve the storage of the t ia,BC and D

TABLE I .
Errors in conformation energy compared to the exact CCSD(T) reference (kcal/mol).The number in parenthesis is the eigenvalue tolerance used to determine projector rank.