Automatic Generation of Accurate and Cost-Efficient Auxiliary Basis Sets

We have recently discussed an algorithm to automatically generate auxiliary basis sets (ABSs) of the standard form for density fitting (DF) or resolution-of-the-identity (RI) calculations in a given atomic orbital basis set (OBS) of any form, such as Gaussian-type orbitals, Slater-type orbitals, or numerical atomic orbitals [J. Chem. Theory Comput.2021, 17, 6886]. In this work, we study two ways to reduce the cost of such automatically generated ABSs without sacrificing their accuracy. We contract the ABS with a singular value decomposition proposed by Kállay [J. Chem. Phys.2014, 141, 244113], used here in a somewhat different setting. We also drop the high-angular momentum functions from the ABS, as they are unnecessary for global fitting methods. Studying the effect of these two types of truncations on Hartree–Fock (HF) and second-order Møller–Plesset perturbation theory (MP2) calculations on a chemically diverse set of first- and second-row molecules within the RI/DF approach, we show that accurate total and atomization energies can be achieved by a combination of the two approaches with significant reductions in the size of the ABS. While the original approach yields ABSs whose number of functions NbfABS scales with the number of functions in the OBS, NOBSbf, as NABSbf = γNOBSbf with the prefactor , the reduction schemes of this work afford results of essentially the same quality as the original unpruned and uncontracted ABS with γ ≈ 5–6, while an accuracy that may suffice for routine applications is achievable with a further reduced ABS with γ ≈ 3–4. The observed errors are similar at HF and MP2 levels of theory, suggesting that the generated ABSs are highly transferable and can also be applied to model challenging properties with high-level methods.


Introduction
The density fitting [1][2][3][4][5] (DF) also known as the the resolution of the identity 6 (RI) technique has become one of the mainstays of quantum chemistry.The DF/RI approach is widely available in many program packages as an optional technique to accelerate self-consistent field (SCF) as well as post-Hartree-Fock calculations.It is used by default in some packages, such as Psi4 7 and Orca 8 , while other packages like BAGEL 9 and Demon2K 10 go even further by relying solely on the use of DF/RI for all calculations, thus foregoing traditional routines based on the exact electron repulsion integrals (ERIs).
Assuming that the basis-functions are realvalued, these ERIs are typically written in the Mulliken notation as where µ, ν, σ, and ρ are indices of atomicorbital basis functions χ µ that together form the orbital basis set (OBS).In the DF/RI approach, the two-electron integrals of the OBS are approximated with the help of an auxiliary basis set (ABS) as 6 (µν|σρ) where A and B are functions in the ABS and (A|B) −1 denotes the A, B element of the inverse of the two-index Coulomb overlap matrix (A|B).Traditionally, ABSs are manually optimized for ground-state energy calculations for each OBS for each level of theory, 11,12 but some automatic algorithms for generating an ABS from the given OBS have also been suggested; [13][14][15][16][17][18] see ref. 12 for a recent review.While some of these automated schemes inherently assume the use of Gaussian basis sets, 14,15,17 we recently suggested an automated algorithm which can be used with any type of atomic-orbital basis set.In short, the central idea of this scheme is to generate the ABS by choosing it such that it spans all one-center products χ µ χ ν in the OBS to a given degree of precision.
Because the angular part is closed-the product of two spherical harmonics yielding a linear combination of spherical harmonics-the task can be reduced to studying radial functions that should be well-described by the sought-for ABS.As the one-center OBS products χ µ χ ν can couple to angular momentum |l µ −l ν | ≤ L ≤ l µ +l ν , where l µ and l ν are the angular momenta of the functions χ µ and χ ν , respectively, the algorithm of ref. 18 works by assembling a list of normalized product radial functions for all possible angular momenta L in the auxiliary basis set, N I being the normalization coefficient in eq. ( 3).These product functions R L I serve as candidate auxiliary basis functions for each shell L. The set of candidate radial functions can be prescreened by a pivoted Cholesky decomposition of the ERI tensor; 19 this yields a more compact auxiliary basis set at no loss in accuracy. 18ext, the algorithm chooses the set of auxiliary radial functions for each angular momentum L, R L I (r), by a pivoted Cholesky decomposition of the one-center Coulomb overlap matrix (R L I |R L J ), motivated by earlier work of Aquilante et al. 15 The decomposition is carried out to the given threshold τ .The resulting set of functions is capable of reproducing all the functions in the candidates pool {R L I } to within the used threshold τ , thus affording an optimal 20 black-box method to generate auxiliary basis sets.
The algorithm of ref. 18 is applicable to any atomic-orbital basis sets: the necessary equations to compute the one-center Coulomb overlap matrix for Gaussian-type orbitals (GTOs) as well as Slater-type orbitals (STOs) were given in ref. 18, while the necessary integrals for numerical atomic orbitals 21 (NAOs) can easily be computed by quadrature. 22s a side note, we have also successfully used an analogous technique to cure OBS overcompleteness in SCF calculations by solving the Roothaan equations 23 F C = SCE, where F is the Fock matrix, S is the overlap matrix, and C and E are the matrix of orbital expansion coefficients and the corresponding diagonal matrix of orbital energies, in a subbasis chosen by the pivoted Cholesky decomposition of S. 24,25 Choosing the basis functions by a pivoted Cholesky decomposition of the (Coulomb) overlap matrix-which is a so-called Gram matrix-is analogous to using an optimal Gram-Schmidt orthogonalization. 12ven if automatically generated ABSs tend to be large compared to their possible manually optimized counterparts, they can be extremely useful for practical applications.Although an automatically generated auxiliary basis set may be several times larger than a manually optimized conterpart, the cost of DF/RI methods scales only linearly with the size of the ABS.The automatic auxiliary basis set machinery enables the use of the DF/RI technique in all OBSs, offering significant speedups in calculations compared to the use of the exact TEIs, because the factorization of eq. ( 2) can be employed to formulate efficient implementations of many quantum chemical models.Therefore, DF/RI calculations with automatically generated basis sets tend to not only be considerably faster than calculations with exact ERIs, but also not orders of magnitude slower than DF/RI calculations employing hand-optimized ABSs-provided that such ABSs exist for the employed OBS and the studied property at the considered level of theory.
Our previous study 18 focused on showing that negligible errors in Hartree-Fock (HF) and second-order Møller-Plesset perturbation theory (MP2) total energies could be achieved with the technique proposed therein.The calculations in ref. 18 were carried out with GTO basis sets to allow the straightforward computation of exact values without the DF/RI approximation.
In this work, we will focus on improving the cost-efficiency of the automatically generated ABS.We will accomplish this along two routes.First, we will consider contracting the ABS employing a singular value decomposition (SVD).The use of this technique to reduce the cost of RI/DF methods was originally suggested by Kállay 26 within a different context: systemspecific truncations of fixed ABSs in polyatomic calculations.In this work, the contraction of the ABS serves to restore the energetic connection of the autogenerated basis to the twoelectron integrals, which was severed in the algorithm of ref. 18 that was laid out above.The contraction procedure is especially useful in the case of contracted Gaussian OBSs, as the resulting contracted ABSs turn out to have fewer auxiliary basis functions.The contraction procedure is also seen to be beneficial with uncontracted Gaussian basis sets, as it results in a reduced number of auxiliary basis functions.We expect the contraction procedure to be useful especially for NAOs, because the recontraction of NAOs can be carried out at no cost to integral evaluation.
Second, we will also consider two schemes to discard high-angular momentum (HAM) functions from the ABS: in addition to the scheme used by previous automated schemes (refs.14 and 17), we will also discuss a first-principles scheme, which has been previously discussed in the context of optimized ABSs. 27As we will demonstrate in this work, the HAM functions can be safely discarded in global fitting methods, as they have negligible effect on total and relative energies.
The layout of this work is the following.Next, in section 2, we will discuss the theory behind the present approach: the contraction of the ABS is discussed in section 2.1, while the pruning of HAM functions is discussed in section 2.2.The computational details of the benchmarking based on HF and MP2 calculations are given in section 3. The results of the study are discussed in section 4, and the study concludes in a brief summary and discussion in section 5. Atomic units are used throughout, unless specified otherwise.

Contracted sets
For clarity, we will reuse the notation of Kállay 26 , who writes eq. ( 2) in the form where I µν,ρσ = (µν|ρσ) is the ERI tensor, I µν,A = (µν|A) are the three-index integrals and V A,B = (A|B) are the two-index Coulomb overlap integrals.Equation (4) can be expressed in a more compact form in the orthogonalized ABS: defining orthogonalized three-index integrals, eq. ( 4) is given simply by Kállay 26 pointed out that the size of the auxiliary basis can be reduced by a SVD of J by throwing out vectors with suitably small singular values.The singular values can be computed as the eigenvalues of the symmetric matrix which has the dimension N aux ×N aux where N aux is the number of auxiliary basis functions.Be-cause the number of auxiliary functions typically scales as the number of orbital basis functions, the matrix W of eq. ( 7) can be diagonalized even in large calculations.
In this work, we use the above method to contract the atomic auxiliary basis.This merely requires forming W for each element, choosing the contraction based on the singular values, and repeating the procedure for all elements in the basis set.
Because the W matrix defined by eq. ( 7) is obtained by summing over all OBS products on the element, W turns out to have a special angular structure.The only significant values are found when the angular momenta of the bra and ket basis functions coincide, l ′ = l and m ′ = m, the other elements of W being numerically zero.As one could expect, all spatial directions are therefore averaged in W .
Because of this special structure, it suffices to compute, e.g., the m ′ = 0 = m subblock of the l = l ′ angular submatrix of W , which we denote as L, which contains information on the importance of the radial auxiliary functions of angular momentum l.
The eigendecomposition of L then yields where Λ is a diagonal matrix of eigenvalues and U is the corresponding matrix of eigenvectors.Now, we can choose a reduced ABS corresponding to some precision ϵ by including only those columns U i that correspond to eigenvalues λ i ≥ ϵ.Such an ABS has guaranteed accuracy, as the three-index integrals are reproduced to the precision specified by ϵ. 26 The contraction algorithm is thus the following: 1. Define the contraction threshold ϵ and read in the OBS and the ABS.
2. Loop over atoms, and the atomic angular momentum l in the ABS (a) Form the l = l ′ and m = 0 = m ′ angular subblock of W from eq. ( 7), denoted as L.
(b) Compute the eigenvectors U i with eigenvalues λ i of L.
(c) Include the eigenvectors U i in the auxiliary basis if their eigenvalues satisfy λ i ≥ ϵ .
To extract the contraction coefficients of a GTO auxiliary basis, a further step is necessary.As Gaussian basis functions are by convention tabulated in terms of primitives normalized in the overlap metric, while the coefficients of the eigenvectors C formed above correspond to Coulomb normalized primitives, one has to convert the contraction coefficients into the expected normalization.This is achieved by scaling the coefficient of the µ:th Gaussian primitive in the i:th auxiliary function, C µi , by the square root of the corresponding Gaussian exponent, C µi → C µi √ α µ , see the Appendix for a derivation.

Pruned Angular Momentum
By default, automatically generated basis sets arising from a large orbital basis set may contain functions of extremely HAM, as an orbital basis with maximum angular momentum l max OBS will give rise to orbital products with maximum angular momentum 2l max OBS .For instance, a polarized quintuple-ζ basis for the oxygen atom (l max OBS = 5) will lead to an auxiliary basis set with up to l = 10 functions.
Such functions are problematic in Gaussianbasis programs, as most programs do not implement integrals to such HAM.Moreover, since Gaussian integrals are typically computed in the cartesian Gaussian basis, computing integrals with HAM functions is extremely costly: the number of cartesian functions of angular momentum l is (l + 1)(l + 2)/2 compared to 2l + 1 for spherical functions.Furthermore, the HAM functions' contributions to total ground state energies tend to be negligible.This motivates removing them from the ABS altogether by a pruning procedure.
The question of what HAM functions can be removed is best approached by asking the opposite question: what are the functions which should never be pruned from the ABS?
Even in highly excited states, most of the total energy arises from orbitals that are occupied in the atomic ground state.The correct description of occupied orbitals being of key importance, the pruned auxiliary basis set should therefore reproduce (i) the highly energetic interactions of the atom's occupied orbitals with each other, as well as (ii) the interactions of the atom's occupied orbitals with its unoccupied orbitals, which are important for polarization and correlation effects in polyatomic systems.
Criterion (i) is automatically satisfied, if all products of occupied orbitals are correctly reproduced by the auxiliary basis set.Therefore, we demand that the maximum angular momentum of the functions kept in the auxiliary basis, l max keep , is at least two times the maximum angular momentum of occupied shells in the ground state of the atom l max occ , l max keep ≥ 2l max occ .Yang et al. 14 define l max occ as having the values 0, 1, 2, and 3 for Z ≤ 2, 3 ≤ Z ≤ 18, 19 ≤ Z ≤ 54, and Z ≥ 55, respectively, while Stoychev et al. 17 use the ranges Z ≤ 2, 3 ≤ Z ≤ 20, 21 ≤ Z ≤ 56, and Z ≥ 57, respectively.That is, while both Yang et al. 14 and Stoychev et al. 17 include p functions for the pre-d Li and Be atoms, the former likewise add d and f functions for the pre-d and pref alkali and alkaline atoms, respectively, while the latter do not.We will follow Yang et al. 14 and include d and f functions for K and Ca, and Cs and Ba, respectively.
Criterion (ii) for the minimal accuracy of the auxiliary basis set is satisfied, if the auxiliary basis is able to accurately describe products of an occupied atomic orbital with any other orbital.Denoting the maximum angular momentum of the orbital basis by l max OBS , the second condition is seen to be satisfied by l max keep ≥ l max occ + l max OBS .The two criteria must be satisfied simultaneously, which is why we choose where l inc is an adjustable parameter.As far as we are aware, this criterion has not been studied in the literature in the context of automatic generation of auxiliary basis sets, although a similar discussion can be found in the work of Weigend 27 discussing the optimization of auxiliary basis sets.(The implementation of the atomic Cholesky procedures of Aquilante et al. 13 and Aquilante et al. 15 in the OpenMolcas program 28 does include optional pruning of the angular momentum to a maximum defined by the auxiliary basis sets of Weigend 27 ; however, this procedure does not appear to be fully documented.) In contrast, the work of Yang et al. 14 on Coulomb fitting limited the maximum angular momentum by and employed l inc = 1 for all elements, while Stoychev et al. 17 similarly employed eq. ( 10) with l inc = 1 for elements up to Ar and 2 for other elements.
Our truncation scheme (eq.( 9)) is seen to have important differences from that used by Yang et al. 14 and Stoychev et al. 17 (eq.( 10)): eq. ( 9) includes the atomic angular momentum and therefore naturally employs a larger angular momentum cutoff for heavier elements.
We will compare the two schemes for pruning the angular momentum defined by eqs.( 9) and (10) in section 4.1, examining various values for l inc and comparing the results to ones obtained with the full, unpruned and uncontracted autogenerated auxiliary basis.

Computational Details
The computational details of this work are similar to those of ref. 18.As in the previous work, we will examine the accuracy of the scheme on the triple-ζ to quintuple-ζ nZaPa-NR basis sets of Ranasinghe and Petersson 29 , for which optimized auxiliary basis sets have not been reported in the literature.These OBSs were obtained from the Basis Set Exchange, 30 in whose Python backend we previously implemented 18 the AutoAux procedure of Stoychev et al. 17 , which will again be used as a state-of-the-art point of reference.In contrast to ref. 18, where all ABSs were generated for fully uncontracted OBSs, the ABSs used in this work are generated for contracted OBSs.
The full primitive ABSs were generated with ERKALE 31,32 following the procedure of ref. 18, employing the threshold τ = 10 −7 for the pivoted Cholesky decomposition.Unless specified otherwise, the orbital products were prescreened with a pivoted Cholesky decomposition of the ERI tensor, as recommended in ref. 18.
The contraction and reduction schemes presented above in section 2 were implemented in ERKALE 31,32 in this work, and they are freely and openly available on GitHub.The employed values for the contraction and reduction parameters, ϵ and l inc , respectively, are discussed in section 4.
Our previous work 18 studied HF and MP2 total energies in the non-multireference part of the W4-17 test set 33 and showed that their RI/DF errors can be made negligible by the use of suitably large primitive ABSs.As the ABSs generated by the previously suggested automated procedure are already known to be transferable, 18 and the contraction and reduction schemes of section 2 are based on mathematical principles, we will study here the chemically diverse subset of first-and second-row molecules from the database of Weigend and Ahlrichs 34 , which has been previously used to assess DF/RI auxiliary basis sets for Hartree-Fock. 27We limit the calculations to molecules containing at most nine atoms, which suffices to demonstrate the accuracy of the reduced basis sets.
As our test systems only include first-and second-row molecules, the same value of l inc is used for all elements in eq. ( 10) in our test calculations; note that Yang et al. 14 and Stoychev et al. 17 similarly use l inc = 1 for all elements up to argon.

Detailed Analysis in the 3ZaPa-NR Basis Set
We begin by analyzing how RI/DF errors in HF and MP2 total and atomization energies are affected by contracting the auxiliary basis set according to the procedure of section 2.1, when the 3ZaPa-NR OBS is used.Because the DF/RI error is an extensive quantity, we will examine errors in total energies per electron, and errors in atomization energies per atom.
The RI/DF errors of the studied test set are shown as a function of the contraction threshold in fig. 1 for HF and MP2 total and atomization energies.The RI/DF errors converges rapidly when the contraction threshold ϵ is decreased.We observe that already the contraction threshold ϵ = 10 −4 affords RI/DF errors in the order of a few µE h per electron in total energies, and a few cal/mol per atom in atomization energies, which are negligible compared to the truncation error of the OBS and the inherent errors in the employed levels of theory.
It is also important to note here that the HF and MP2 errors behave very similarly as a function of ϵ.This result is fully expected for the contraction based on a SVD of the threeindex integrals: because all integral classes are treated on the same footing, the error is not expected to depend on the examined level of theory.Interestingly, the RI/DF errors afforded by small values of ϵ are smaller than those obtained with the full parent auxiliary basis set.This difference can likely be attributed to the better conditioning of the contracted auxiliary basis set: unlike the parent ABS, the functions of the contracted ABS are orthonormal on each atom, leading to improved numerical stability in the RI/DF method when the contracted ABS is used in polyatomic calculations; this in turn reduces numerical errors.In addition, the contracted ABS also contains fewer functions, which likewise improves numerical stability.
We move on to studying the effect of pruning HAM auxiliary functions according to eqs. ( 9) and (10).The results of these procedures on HF and MP2 total and atomization energies are shown in fig. 2. The value l inc = 0 yields good results with eq. ( 9): although the MP2 total energy exhibits a small maximum absolute error of some 13 µE h per electron (which arises for H 2 ), the errors in HF total energies appear to be small and close to those of the full original ABS.
In contrast, the value l inc = 0 yields unacceptably large errors for eq.( 10): for instance, the MP2 total energy of the nitrogen atom is off by −223µE h per electron and the MP2 atomization energy of P 2 is off by −893 cal/mol per atom.
Because eq. ( 10) clearly exhibits the wrong physical behavior, for the rest of this work we will consider only eq. ( 9) for pruning the HAM functions.With this scheme, l inc = 0 already yields good accuracy, and the values with l inc = 1 are almost converged to the full uncontracted and unpruned ABS, which is reproduced by l inc = 2 for the presently considered triple-ζ 3ZaPa-NR basis.

Performance in Larger Basis Sets
We observe from the analysis of section 4.1 that contraction thresholds ϵ ≲ 10 −4 afford suitably accurate total and atomization energies, while pruning HAM functions with eq. ( 9) also works well.In this section, we examine the accuracy of the compound procedure, where the ABS is both contracted and pruned of HAM functions.
(a) Total energies, pruning with eq. ( 9) (b) Total energies, pruning with eq. ( 10) (c) Atomization energies, pruning with eq. ( 9) (d) Atomization energies, pruning with eq. ( 10) Figure 2: The effect of pruning high-angular-momentum functions according to eqs. ( 9) and ( 10) on the RI/DF errors HF (cyan) and MP2 (orange) total and atomization energies of the studied molecules in the 3ZaPa-NR basis set.The last entries in the violin plots show the error distribution for the full, unpruned and uncontracted parent auxiliary basis set.
Starting with the total energies shown in fig.3, we observe for all OBSs that the ABS contracted and pruned with eq. ( 9) with l inc = 1 affords negligible errors: as long as the contraction threshold ϵ ≲ 10 −5 , the errors in the HF and MP2 total energies are of the order of 1 µE h per electron or smaller.
We also observe that for a fixed value of l inc , the maximum RI/DF errors go down when the size of the OBS is increased.While l inc = 0 leads to errors in the total energy of the order of 13 µE h per electron with the triple-ζ 3ZaPa-NR OBS, the error decreases to O(4 µE h ) per electron in the quadruple-ζ 4ZaPa-NR OBS, and O(1 µE h ) per electron in the quintuple-ζ 5ZaPa-NR OBS.
Given that the increase in the size of the OBS is accompanied with an increase in the maximum angular momentum of the corresponding ABS (eq.( 9)), a likely explanation for the decreasing error with increasing OBS is that the one-center approximation that is the cornerstone of the DF/RI method 12 is becoming more accurate: energetically important twocenter products of OBS functions are captured to better and better accuracy, when the ABS becomes larger and larger.Increasing the cardinal number of the OBS adds higher polarization shells in the autogenerated ABS, but it also adds more radial functions to lower angular momenta in the ABS.
The corresponding results for atomization energies are shown in fig. 4. The errors in atomization energies of the order of cal/mol per atom are orders of magnitude smaller than the usual limit of chemical accuracy of 1 kcal/mol.We again note the increasing accuracy of the ABS with fixed l inc in increasing size of the OBS, which is clear in the small-ϵ data for l inc = 0. Excellent accuracy is achieved with l inc = 1 and ϵ ≲ 10 −5 in all OBSs.

Size Reductions
Having established that an excellent level of accuracy can be reached with the DF/RI approximation with the presently considered pruning and contraction technique for the automatically generated ABSs, we can proceed to discussing the associated reductions in the size of the ABS.As was already mentioned above, the computational cost of the DF/RI technique is linear in the size of the ABS; therefore, discussion on the cost of the DF/RI technique usually boils down to the examination of the ratio that is, the number of auxiliary functions divided by the number of orbital functions.
In the following, we will examine the accuracy and cost of automatically generated ABSs with the present method, considering simultaneous contraction with ϵ ∈ [10 −4 , 10 −5 , 10 −6 ] and pruning with l inc ∈ [0, 1], which yielded good results above in section 4.2.We will compare the results to those of the AutoAux procedure of Stoychev et al. 17  The accuracy of these ABSs is shown in fig. 5. From these data, we can identify reasonable combinations of the truncation parameters, which appear to be balanced in the contraction and pruning of HAM functions.The choice ϵ = 10 −4 and l inc = 0 yields a "small" ABS, while ϵ = 10 −5 and l inc = 1 leads to a "large" ABS, and ϵ = 10 −6 and l inc = 1 yields a "verylarge" ABS.
The AutoAux basis is much more accurate for HF than for MP2 in large OBSs, and the method does not offer a straightforward way to improve the accuracy of the generated ABS.In contrast, the "small", "large" and "verylarge" ABSs obtained with the above procedure provide a sequence of improving accuracy in all studied OBSs.With the exclusion of the "small" ABS in the 3ZaPa-NR OBS, they also afford similar accuracy at both HF and MP2 levels of theory.
The ratios γ defined by eq. ( 11) for the various ABSs are shown in fig.6, where the size of the full, unpruned and uncontracted autogenerated ABS is also shown for reference.The minimum and maximum values of γ are also given in table 1 for all OBSs and the corresponding ABSs.
As was already discussed in ref. 18, the full autogenerated auxiliary basis sets are large: we see from fig. 6 and table 1 that the ratio γ varies from 8 to 12 in the full ABS.However, we also The effect of contraction on the RI/DF errors in the HF (cyan) and MP2 (orange) total energies of the studied molecules in various basis sets, when the auxiliary basis is also pruned with eq. ( 9) with l inc = 0 or l inc = 1.The last entry in the 3ZaPa-NR and 4ZaPa-NR plots show the error distributions for the full, unpruned and uncontracted parent auxiliary basis sets.see that the contraction and pruning procedure of this work leads to significant reductions in the size of the ABSs.Already going from the full ABS to the "verylarge" ABS is seen to roughly halve the number of auxiliary functions, indicating potential for significant savings with negligible errors as shown by the related accuracy data in fig. 5.
A further reduction is achieved by going to the "large" ABS, which still affords consistently small errors (fig.5).The "small" ABS, in turn, exhibits γ in the range 3-4 for all OBSs, enabling quick calculations with somewhat larger errors, which may still be sufficiently small for practical applications.

Role of Primitive Auxiliary Basis
Having demonstrated the accuracy of the reduction scheme, we examine the role of the primitive auxiliary basis fed into the contraction procedure.We will compare contracted ABSs formed with and without the prescreening of the orbital products, which was performed for the results above with a pivoted Cholesky decomposition of the ERI tensor as recommended in ref. 18.
Comparing the compositions of the contracted ABSs for the 3ZaPa-NR and 4ZaPa-NR OBSs shown in table 2, we observe that despite the large differences in the number of Gaussian primitives in the ABSs generated with and without the orbital product screening-which were already observed in ref. 18-in both cases the contracted ABSs turn out to have the same composition, that is, the same number of contracted functions.
However, the use of the prescreening technique is still highly attractive for GTO calculations.Because the integrals are evaluated in terms of the primitive GTOs, reducing the number of primitive functions is key to making the approach computationally efficient.
But, when the present contraction scheme is used with NAOs, it appears that prescreening the orbital products via a pivoted Cholesky decomposition of the ERI tensor is indeed unnecessary: the simpler scheme of simply form-ing all radial orbital products and recontracting them leads to the same compact ABS at the same computational cost.The key difference to GTOs is that NAO integrals are evaluated by quadrature, and the contraction can be carried out as a transformation of the NAOs' expansion coefficients; see ref. 22 for details.

Summary and Discussion
We have described an automated scheme for contracting auxiliary basis sets (ABSs) for a given orbital basis set (OBS).The scheme works with any type of atomic basis functions: in addition to the Gaussian-type orbitals (GTOs) considered in this work, the scheme can also be straightforwardly applied to other types of atomic basis functions, such as Slater-type orbitals (STOs) and numerical atomic orbitals (NAOs).Our procedure starts from a "full" ABS generated using the algorithm of ref. 18.This ABS is then contracted and pruned of high-angular momentum (HAM) functions to produce computationally efficient ABSs.
By default, the scheme of ref. 18 employs a pivoted Cholesky decomposition of the electron repulsion integral (ERI) tensor to choose the subset of orbital products from which the auxiliary basis functions are chosen with another pivoted Cholesky decomposition.We found in ref. 18 that prescreening the orbital products with a pivoted Cholesky of the ERIs significantly decreases the number of primitive GTOs in the resulting autogenerated auxiliary basis set, leading to better computational efficiency while maintaining the same level of accuracy.
Interestingly, we find that the contracted auxiliary basis set obtained with the singular value decomposition (SVD) of eq. ( 7) comes out similar even if the initial pivoted Cholesky decomposition of the ERIs is not performed; that is, when the full set of orbital products is employed to generate the auxiliary functions.The present scheme thus appears to reproduce a similar number of contracted auxiliary functions regardless of whether any initial screening of orbital products is done.
Although GTOs incur additional cost from  As NAO integrals are evaluated by quadrature, the linear transformation of the input auxiliary functions can be simply carried out to their expansion coefficients; the initialization with the pivoted Cholesky of the ERIs to choose the product functions is therefore not needed in the context of NAO calculations.We therefore believe that in addition to the application of the present method to GTO basis sets presented in this work, our scheme will be found to be extremely useful for NAO based calculations, as well.We have recently developed modern software for atomic electronic structure calculations employing high-order numerical methods that allow rapid convergence to the complete basis set (CBS) limit with few numerical basis functions, 21,22,[36][37][38] and hope to apply these techniques to NAO based calculations in the near future, where the schemes of ref. 18  and this work will be of critical importance.

Figure 1 :
Figure 1: The effect of contraction on the RI/DF errors in the HF (cyan) and MP2 (orange) total and atomization energies of the studied molecules in the 3ZaPa-NR basis set.The last entry in the violin plot shows the error distribution for the full, unpruned and uncontracted parent auxiliary basis set. .

1 Figure 3 :
Figure3: The effect of contraction on the RI/DF errors in the HF (cyan) and MP2 (orange) total energies of the studied molecules in various basis sets, when the auxiliary basis is also pruned with eq.(9) with l inc = 0 or l inc = 1.The last entry in the 3ZaPa-NR and 4ZaPa-NR plots show the error distributions for the full, unpruned and uncontracted parent auxiliary basis sets.

1 Figure 4 :
Figure4: The effect of contraction on the RI/DF errors in the HF (cyan) and MP2 (orange) atomization energies of the studied molecules in various basis sets, when the auxiliary basis is also pruned with eq.(9) with l inc = 0 or l inc = 1.The last entry in the 3ZaPa-NR and 4ZaPa-NR plots show the error distributions for the full, unpruned and uncontracted parent auxiliary basis sets.

Figure 5 :
Figure 5: Summary of the performance of the various automated schemes for automated auxiliary basis set construction across various orbital basis sets: the AutoAux procedure of Stoychev et al.17 , as well as contracted and pruned basis sets with the parameters ϵ = 10 −x and l inc = y specified with the shorthand (x, y).The last entry in the 3ZaPa-NR and 4ZaPa-NR plots show the error distributions for the full, unpruned and uncontracted parent auxiliary basis sets.

17
Figure 5: Summary of the performance of the various automated schemes for automated auxiliary basis set construction across various orbital basis sets: the AutoAux procedure of Stoychev et al.17 , as well as contracted and pruned basis sets with the parameters ϵ = 10 −x and l inc = y specified with the shorthand (x, y).The last entry in the 3ZaPa-NR and 4ZaPa-NR plots show the error distributions for the full, unpruned and uncontracted parent auxiliary basis sets.
Figure 5: Summary of the performance of the various automated schemes for automated auxiliary basis set construction across various orbital basis sets: the AutoAux procedure of Stoychev et al.17 , as well as contracted and pruned basis sets with the parameters ϵ = 10 −x and l inc = y specified with the shorthand (x, y).The last entry in the 3ZaPa-NR and 4ZaPa-NR plots show the error distributions for the full, unpruned and uncontracted parent auxiliary basis sets.

ZFigure 6 :
Figure 6: Sizes of autogenerated auxiliary basis for 3ZaPa-NR, 4ZaPa-NR, and 5ZaPa-NR for H-Ar.Reading in order from the left, the original primitive auxiliary basis set, generated by the procedure of ref.18 is shown in black.The verylarge (ϵ = 10 −6 and l inc = 1), large (ϵ = 10 −5 and l inc = 1), and small (ϵ = 10 −4 and l inc = 0 ) auxiliary sets are shown in blue, red, and orange, respectively.For comparison, the size of the ABS generated with the AutoAux procedure of Stoychev et al.17 is shown in magenta.

Table 2 :
Compositions of contracted ABSs for 3ZaPa-NR and 4ZaPa-NR OBSs with ϵ = 10 −5 .The labeling of the angular momentum follows the Gaussian'94 convention, which does not skip the letter J for l = 7 unlike most other conventions.