MP2-Based Correction Scheme to Approach the Limit of a Complete Pair Natural Orbitals Space in DLPNO-CCSD(T) Calculations

The domain-based local pair natural orbital (PNO) coupled-cluster DLPNO-CCSD(T) method has been proven to provide accurate single-point energies at a fraction of the cost of canonical CCSD(T) calculations. However, the desired “chemical accuracy” can only be obtained with a large PNO space and extended basis set. We present a simple yet accurate and efficient correction scheme based on a perturbative approach. Here, in addition to DLPNO-CCSD(T) energy, one calculates DLPNO-MP2 correlation energy with the same settings as in the preceding coupled-cluster calculation. In the next step, the canonical MP2 correlation energy is obtained in the same orbital basis. This can be efficiently performed for essentially all molecule sizes accessible with the DLPNO-CCSD(T) method. By taking the difference between the canonical MP2 and DLPNO-MP2 energies, we obtain a correction term that can be added to the DLPNO-CCSD(T) correlation energy. This way, one can obtain the total correlation energy close to the limit of the complete PNO space (cPNO). The presented approach allows us to significantly increase the accuracy of the DLPNO-CCSD(T) method for both closed- and open-shell systems. The latter are known to be especially challenging for locally correlated methods. Unlike the previously developed PNO extrapolation procedure by Altun, Neese, and Bistoni (J. Chem. Theory Comput.2020, 16, 6142−614932897712), this strategy allows us to get the DLPNO-CCSD(T) correlation energy at the cPNO limit in a cost-efficient way, resulting in a minimal overall increase in calculation time as compared to the uncorrected method.


INTRODUCTION
The struggle for higher accuracy is a never-ending battle in the field of computational studies. The right answer is in principle known and is called the full configuration interaction (FCI) method, but it is limited to very small molecules. In an attempt to build a hierarchy of cheaper methods approaching FCI accuracy, one typically starts with the Hartree−Fock (HF) method. Subsequently, a new correlated wave function is constructed by allowing certain excitations out of a HF determinant in order to recover FCI correlation energy. However, the cost of the calculations increases exponentially with the allowed order of excitations.
A simple remedy to make the calculations more accessible is to cut a part of the orbital space, leaving only the most important orbitals. While truncating canonical HF orbitals may lead to uncontrolled issues, natural orbitals (NOs) are better suited for this purpose. Such orbitals proposed by Loẅdin 1 provide a good basis for rapidly converging CI expansion. Moreover, the CI calculations can then be performed in such a truncated space spanned by strongly and weakly occupied NOs. However, since some untruncated CI calculations are essential to obtain good NOs, this approach is not well-suited for large systems. A more convenient pair-natural orbital (PNO) approximation (sometimes called pseudonatural orbitals) has been proposed and explored by Edmiston and Krauss, 2 by Meyer, 3,4 and later by Ahlrichs and co-workers. 5 At early stages, PNOs have been used with an independent electron pair approximation (IEPA), 4,6 coupled electron pair approximation (CEPA), or configuration interaction doubles (CID). 4,5 These early implementations have been able to produce a decent agreement with the experimental data. 7,8 However, the merit of the PNO approach only came after the development of more accurate methods such as the coupled cluster theory (CC). 9−12 This new combination turned very accurate and expensive calculations into robust wave function based alternatives to the density functional theory calculations with its approximated density functionals, where the accuracy is hard to control. 13,14 Nowadays, coupled cluster method with single and double excitations and perturbative treatment of triple excitations [CCSD(T)] is considered a "gold standard" since it can provide accuracy of less than 1 kcal/mol. 15, 16 Of course, such high accuracy can only be achieved in single-reference cases and at the complete basis set (CBS) limit. Therefore, real-life applications of CCSD(T) are rather limited to small molecules. The scope of applications can be significantly extended by exploring the locality of electron correlation, 17−20 as well as compression of the virtual orbital space via PNOs. Some prominent developments include works of Werner and Schuẗz, 20−27 Head-Gordon, 28,29 and Neese. 30−34 Present work focuses on the domain-based local pair natural orbital coupled-cluster (DLPNO-CCSD(T)) method developed by Neese's group.
DLPNO-CCSD(T) method combines the local correlation theories with the PNO machinery. 35 It relies on the fact that most PNOs are rather localized, and when combined with localized internal orbitals, it allows for a very compact expansion of the wave function. 34 The method has been implemented both for closed-and open-shell systems and has been shown to scale favorably with the system size. 35,36 Several groups have demonstrated that the method produces results of "chemical accuracy" level. 37−39 On the hand, Liakos et al. 38 have noted that the accuracy of the DLPNO-CCSD(T) is lower for open-shell systems. The latter have been found to be more demanding in terms of PNO space size. This is not surprising as already Meyer 4 had noted that certain electron delocalization inevitably occurs in open-shell molecules. Moreover, the realization of the triples (T) correction significantly influences the results. The noniterative, original scheme, denoted here as (T0), 33 does not produce proper spin-state energetics in open-shell systems. 40,41 This may be improved by employing iterative triples correction, denoted here as (T1). 42−44 Decreasing the truncation parameters for PNO occupation numbers provides a systematic way to converge the results to canonical CCSD(T) numbers. At the same time, however, the computational time increases along with memory consumption. Altun and co-workers 45 have observed that the DLPNO-CCSD(T) correlation energy converges exponentially with the increasing number of PNOs. Based on this observation, they have developed a two-point extrapolation procedure that greatly increases accuracy of the calculation. Still, the necessity to perform DLPNO-CCSD(T) calculations twice with different parameters makes this approach expensive in terms of computational time. As a modification of this procedure Drosou et al. 44 proposed to use lower truncation parameters for the extrapolation without significant decrease of accuracy but with improved efficiency. One should note that the entire procedure involves fitted parameter F that depends on the actual PNO truncation parameters used in the extrapolation.
It is interesting to note that the two-point extrapolation is a well-known technique to reach the complete basis set (CBS) limit in the wave function based calculations. 46,47 Another, more economic way to approximate the CBS of a given highlevel method, pioneered by Hobza and co-workers, 48−50 is to calculate the MP2 energy with the same basis set and then estimate the MP2-CBS energy. By taking the difference between the MP2-CBS and finite-basis MP2 correlation energies, one obtains a correction factor that can be added to the high-level method correlation energy calculated at the finite basis set. In the context of locally correlated methods, Wang and co-workers used this method to reach the CBS limit in their generalized energy-based fragmentation CCSD(T)-F12a method. 51 Inspired by this approach, we have designed an analogous correction scheme to approximate the complete PNO space of the DLPNO-CCSD(T). Here, after the DLPNO-CCSD(T) energy evaluation, one calculates the DLPNO-MP2 correlation energy with the same PNO settings. In the next step, the canonical MP2 correlation energy is obtained at the same basis set. This can be efficiently performed for essentially all molecule sizes accessible with the DLPNO-CCSD(T) method. By taking the difference between canonical MP2 (E corr MP2 ) and DLPNO-MP2 (E corr  ) correlation energies, one can obtain a correction term that can be added to the DLPNO-CCSD(T) correlation energy (E corr DLPNO-CCSD(T) ). This way, one can obtain total correlation energy close to the limit of the complete PNO space (denoted here as cDLPNO-CCSD(T), E corr cDLPNO-CCSD(T) ) at a given basis set without any fitted parameters: In this work, we first evaluate the approach against two subsets of the GMTKN55 database: 52 S22 and MB16-43. The former constitutes a classical test to check the accuracy of the quantum-chemical method to describe noncovalent interactions. 53 It contains small model complexes, DNA base pairs, and amino acid pairs. The MB16-43 contains an interesting set of decomposition reactions of nonexistent but virtually stable molecules. Most importantly, it involves open-shell systems and has been shown to be the most challenging for the DLPNO-CCSD(T) calculations reported to date. 38,45 Using computational timings obtained for the MB16-43, we also demonstrate the efficiency of the proposed scheme.
As our main application field of the DLPNO-CCSD(T) approach is transition metal catalyzed reactions, we checked our correction scheme against the modified MOBH35 set 54,55 that focuses on metal−organic barrier heights for relatively large closed-shell complexes. To explore the accuracy of our approach for open-shell transition metal-containing systems, we designed an additional benchmark. Here, we test the method against the canonical CCSD(T) for a set of potential energy surface scans for the six three-atomic systems, featuring a metal−carbon monoxide interaction in neutral and cationic forms. These are the smallest models for CO reactivity toward single-atom catalysts. 56−58 The size of these systems permits exploration of basis-set dependency of the PNO space incompleteness error and consequently the behavior of the proposed correction scheme with respect to the basis set. We discuss our results in light of previous investigations that look into basis set-dependency of the DLPNO-CCSD(T) results obtained with various PNO truncation parameters. 37,59

COMPUTATIONAL DETAILS
All calculations have been conducted with the use of ORCA 5.0.2 package. 60 Reference canonical coupled-cluster calculations have been performed with unrestricted reference (UHF) and quasi-restricted open-shell orbitals (QROs). 61 DLPNO-CCSD(T) correlation energies have been evaluated with the UHF reference as well, except for entries of MOBH35. In the latter case, due to efficiency reasons, restricted Hartree−Fock reference was employed with disabled full MP2 guess to stay consistent with UHF-based calculations. Single-point energies for S22 and MB16-43 test sets have been evaluated using the scalar relativistic second-order Douglas− Kroll−Hess Hamiltonian and the compatible aug-cc-pVDZ-DK basis set. 62 The resolution-of-identity (RI) 63 auxiliary basis set has been generated automatically with the AutoAUX feature. 64 This setup is the same as in the work of Altun and co-workers; 45 therefore results and timings are comparable. Ahlrichs's split-valence Def2-SVP, Def2-TZVP, Def2-TZVPP, and Def2-QZVPP basis sets 65 have been used for other calculations (see details in text) along with the corresponding auxiliary basis sets. 66,67 Def2 effective core potentials (ECPs) have been employed in all calculations involving late transition elements (Pd, Pt, Au). 68 We have also made use of the RI-MP2 approach for canonical MP2 calculations. 69 Recently, Bistoni and co-workers 45 have developed a DLPNO extrapolation procedure to which the presented scheme may be compared. Inspired by the two-point basis set extrapolation schemes, they proposed to approach the complete PNO space with extrapolated correlation energy values obtained in two calculations involving a different number of PNOs, controlled by the T cutPNO parameter. Although this approach greatly increases the accuracy of the calculation, the same can be said about the computational cost. The extrapolation from T cutPNO of 10 −x and 10 −y is denoted here as (x/y) extrapolation. Other truncation parameters are set according to the TightPNO keyword in the ORCA package (see Table 1). As found by Altun et al., 45 the increase of x and y brings the DLPNO-CCSD(T1) correlation energy to the proximity of the canonical CCSD(T) method.
In the proposed scheme, we have made use of two sets of the default DLPNO-CCSD(T) settings: NormalPNO and TightP-NO (Table 1). These have been applied consistently with both DLPNO-CCSD(T) and DLPNO-MP2 methods. More specifically, the DLPNO-CCSD(T) calculations have been performed by setting NormalPNO or TightPNO in the simple input line while the truncation parameters for the DLPNO-MP2 calculations have been defined explicitly in the %mp2 block. The Supporting Information contains an example compound job that can be easily used with the current ORCA version to obtain DLPNO-CCSD(T) energies corrected with the proposed scheme.
One should note that PNOs in open-shell DLPNO-CCSD(T) and DLPNO-MP2 calculations are constructed in a different way. In the former case, the high-spin open-shell variant of the N-electron valence perturbation theory formalism is used to define the initial guess wave function and consequently also the open-shell PNOs. 36 The DLPNO-MP2 follows the genuine UHF-MP2 implementation, i.e., without referencing to QROs. Here, the critical point is to obtain the PNO correction factor from the DLPNO-MP2 and the canonical MP2 with the same reference determinant and with the same MP2 implementation. In principle, one can use QROs generated in a separate step for UHF-MP2/UHF-DLPNO-MP2 calculations. However, we did not find this beneficial as the resulting interaction curves for the model Pt− CO system were virtually identical irrespective of the tested reference (see SI Figure S1). Additionally, to confirm the similarity of the PNO spaces in DLPNO-CCSD(T) and DLPNO-MP2 calculations, we checked the correlation between semilocal MP2 energy in the NEV-PNO space of the DLPNO-CCSD(T) calculations and the DLPNO-MP2 correlation energy. This came out to be linear (see SI Figure  S2), which means that both PNO spaces cover the same physical interactions.
The current work shows that even with different strategies employed for the PNO construction in the DLPNO-CCSD(T) and the DLPNO-MP2 methods, the final correction factor provides significant improvement to the uncorrected DLPNO-CCSD(T) method.
Reference CCSD(T) data for the S22 and MB16-43 sets has been taken from the work of Altun et al. 45 while the modified MOBH35 has been referenced to the CCSD(T)/def2-SVP values obtained by Semidalas and Martin 55 with the PSI4 code. Statistical errors have been evaluated using the following error measures: • r o o t -m e a n -s q u a r e e r r o r , where y i ref is a reference value for the ith entry estimated with a calculated value y i and N is the number of entries in the database.   All timings reported in the study have been obtained with dedicated computational nodes equipped with 18-core Intel Xeon Gold 6254 CPU running at 3.10 GHz, 128 Gb of RAM, and SSD storage. The calculations have been run with 6 cores and 5 Gb RAM memory available per core.

S22 Test Set.
In the first step, the correction proposed in eq 1 was tested against the S22 set composed of 22 interaction energies between small, closed-shell organic molecules. The calculations for this benchmark database constitute a kind of sanity test as any correction scheme designed to approach complete PNO space should improve the interaction energies of these dimers.
Statistical measures for the uncorrected (DLPNO) and the corrected (cDLPNO) calculations performed with Normal-PNO and TightPNO settings are presented in the Table 2 The perturbative triples have been evaluated either with the original noniterative scheme (T0) or with the iterative (T1) method. For comparison, we also present statistics for the recently reported extrapolation scheme by Altun et al. 45 We have observed clear accuracy improvement when the correction for the complete PNO space is accounted for. Interestingly, already with economic settings (NormalPNO and (T0) triples) the RMSE is below 1 kcal/mol. However, to bring MPE and MNE below 1 kcal/mol, it is important to use the TightPNO settings. Our approach works consistently well irrespective of perturbative triples treatment as expected for closed-shell molecules. Among the proposed setups, cDLPNO-CCSD(T) with TightPNO thresholds outperforms extrapolated results, especially in terms of error spread that drops below 0.6 kcal/mol in our case as compared to >2 kcal/mol for the extrapolated values. 45 3.2. MB16-43 Subset of the GMTKN55 Database. The MB16-43 has been the most challenging set in the recent benchmark studies of both the original 38 and extrapolated DLPNO-CCSD(T) methods. 45 This is the case because of the presence of open-shell species in the set that require more extended PNO space to recover the canonical correlation energy. Due to the size and nature of the molecules in the set (relatively large polyatomic species including molecules of closed-and open-shell character composed of atoms up to the third period), this database is also well suited for timing comparison.
According to Table 3, the proposed correction scheme allows for significant improvement of the obtained results. Here, the combination of TightPNO settings and iterative treatment of the triple excitations (T1) provides results very    Table 3). For the systems in the MB16-43 set, the (T1) iterative scheme for the perturbative triple excitation is clearly favored over noniterative original (T0) formulation. Interestingly, by keeping the (T1) option and loosening the PNO parameters to NormalPNO, one obtains cDLPNO errors smaller than for the extrapolation scheme (5/6) about 30% faster.

MOBH35 Database.
The original MOBH35 database of Iron and Janes 54 contains 35 single-step reactions (substrate, transition state, and product) of relatively large, closed-shell transition metal-containing systems. Following the analysis by Semidalas and Martin, 55 we decided to remove entries 8 and 9 due to severe and unbalanced static correlation that renders any single-reference coupled-cluster calculations unreliable. At the same time, the authors provide CCSD(T)/def2-SVP reference values for the entire MOBH35 set except entries 17− 20 that are too large for the computations. Thus, this study considers 29 organometallic reactions.
The overall performance of the proposed correction scheme is satisfying, providing accuracy similar to that of the previously proposed scheme at a significantly lower computational cost. For example, the combination of TightPNO settings and the (T1) approach provides results very close to the extrapolated (6/7) values (RMSE of 0.686 kcal/mol compared to 0.676 kcal/mol, respectively; Table 4).
Relatively large MPE and MNE for the corrected and uncorrected results highlight a deeper issue with the adapted default DLPNO-CCSD(T) method. First of all, the iterative T1 scheme does not account sufficiently for the dynamic correlation-induced orbital relaxation effects as has been demonstrated very recently by Altun et al. 72 Semidalas and Martin 55 have shown that the discrepancy of the DLPNO-CCSD(T1) with the canonical CCSD(T) can be related to the energy difference between (T1) and (T0) triples corrections. An empirical correction (denoted as DECIOR) has later been proposed 72 for MOBH35 as the DLPNO-CCSD(T1) error shows a roughly linear correlation with square of the norm of the single-amplitude vector, ∥t 1 ∥ 2 . Another source of errors in the DLPNO-CCSD(T) calculations is the treatment of semicore−valence correlations, which have been shown to be more demanding in terms of the T cutPNO . However, tightening of the truncation parameter for the PNOs involving semicore orbitals (e.g., 3s3p for 3d transition metals) is not automatic. It requires redefinition of the number of core electrons for relevant centers as well as adjustment of the energetic window for the frozen core treatment. In principle both corrections can be applied with the current scheme, but we prefer to keep the defaults that can be adjusted for particular applications.

Metal−CO Interactions.
As we are mainly interested in catalytic applications of the high-level ab initio methods, we have set up a model system that comprises a transition metal (M) and a carbon monoxide interacting via the carbon atom (C−O bond distance has been fixed to the experimental value of 1.128 Å). The metallic center acts as the simplest representation of the CO binding active site in the catalyst. In these calculations, we have varied the M−C distance in order to obtain interaction curves with various methods. Such triatomic systems are small enough to permit reference CCSD(T) calculations, and testing can be performed with basis sets of various sizes. This set will be termed M−CO neutral. Because positively charged Lewis centers are often more reactive toward small molecule activation, we also propose another set of singly charged triatomics from the M− CO set. This set will be denoted as M−CO cation. For both sets, the metal selection includes catalytically relevant M = Au, Ni, Cu, Pd and Pt atoms. The platinum and platinum ion have been tested with two spin multiplicities, 1/3 and 2/4, respectively. The summary of metallic centers along with their spin multiplicities used in the calculations is provided in Table 5.
Mean average errors (MAEs) of the interaction energies averaged over M−C distances of 1.6−2.6 Å are presented for neutral and cationic M−CO systems in Table 6 and Table 7, respectively. Such distances have been chosen to cover a range around the average M−C equilibrium bond length of the chosen transition metals. The tables provide results for basis sets of increasing size, from the compact def2-SVP up to extended def2-QZVPP. Interaction energies are calculated with respect to systems where M−C distance is set to a large value (10 Å).
In almost all cases, the proposed correction scheme reduced the DLPNO error by about 50%. The improvement is consistent for both the neutral and cationic species. One should note that the absolute errors for M−CO neutral are larger than those obtained for M−CO cationic. This is mainly due to the small energy gap between nd x (n + 1)s y and nd x−1 (n + 1)s y+1 electronic configurations of M atoms. For the same reason, low-spin Pt cationic systems show a small increase of the error when our correction is applied. We have noted that despite the same UHF reference, the MP2 and CCSD(T) unrelaxed densities display significantly different 5s occupations. In fact, this underlines the importance of the static correlation effects, which are not meant to be covered by the single-reference methods. Our correction shows expected behavior with respect to the choice of triples treatment (T0 or T1): error obtained with the cDLPNO-CCSD(T1) scheme is generally lower than the corresponding cDLPNO-CCSD(T0) error. Some departure from this behavior is observed for the cationic Pt−CO system, e.g., error for the cDLPNO-CCSD(T1) is larger than that for the cDLPNO-CCSD(T0) by about 0.5 kcal/mol with the def2-SVP basis set and TightPNO settings. Similar deviations were observed for some of the extrapolated values, especially those obtained from (5/6) extrapolation (e.g., neutral Pt−CO at a multiplicity 3 with def2-SVP basis set). Again, this could be traced back to similar issues observed in the MOBH35 data set. 55,72 We have found that the proposed scheme provides a welldefined increase in the accuracy with minor dependence on the basis set size (Figure 1). This finding is of special importance because of the recently published result stating the dependence of the DLPNO error on this parameter. 37 It is also evident that the use of (T1) instead of (T0) is mandatory to bring the calculations in the chemical accuracy regime (error <1 kcal/ mol). Our correction scheme brings cDLPNO-CCSD(T1) TightPNO and NormalPNO calculations below this threshold for the tested triatomics.

CONCLUSIONS
The presented MP2-based correction scheme provides an economic, yet accurate, way to account for the PNO space truncation in the DLPNO-CCSD(T) calculations. Importantly, the method is free of any fitted parameters. We have shown that the proposed correction allows us to minimize the impact of the T cutPNO choice on the accuracy of the results for both closed-and open-shell systems. Especially the latter can be thought of as a hard test since MP2 is known to fail often in this case.
Our approach displays remarkable robustness, but it cannot remove inherent DLPNO-CCSD(T) issues, like dynamic correlation-induced orbital relaxation effects. Special care should also be taken in cases where static correlation effects come into play, like near-degeneracies. In these cases, MP2 may describe a different electronic state than the DLPNO-CCSD(T) method and deteriorate results. For such systems, however, the applicability of single-reference methods should be carefully examined. In this study, we did not test any non-Hartree−Fock reference determinants, such as DFT orbitals. While these are not an issue with coupled cluster methods containing the singles operator, they become problematic at the MP2 level. The latter, in most implementations, assumes Brillouin's theorem to hold, i.e., singles are not computed. The current correction scheme can be extended to not only correct for the PNO space truncation but also estimate the DLPNO-CCSD(T) CBS limit with the MP2 method. In principle, DLPNO-MP2, canonical MP2, or even MP2-F12 74 may be used for the basis set extrapolation. We are now examining the accuracy and efficiency of various choices to develop cDLPNO-CCSD(T)/CBS(δMP2) model chemistry, and this will be the scope of the upcoming work. ■ ASSOCIATED CONTENT
M−CO energetics and error analyses (ZIP) Figures S1 and S2, example ORCA input file along with a compound job script, detailed energetics and error analyses for each benchmark set (XLSX)