From simple molecules to nanotubes. Reliable predictions of ionization potentials from the ΔMP2-SCS methods

The vertical ionization potentials for systems of various sizes, ranging from simple molecules, DNA/RNA bases, donor and acceptor organic molecular systems as well as nanotubes are calculated using the ΔMP2-SCS family of methods [Śmiga et al 2018 J. Chem. Theory Comput. 14 4780–90]. We have shown that for all investigated cases, the ΔMP2-SCS methods, being almost cost free single-step post-Hartree–Fock calculation, provide very accurate vertical ionization potentials comparable in quality with state-of-art outer valence Green’s function methods. Moreover, we show that a combination of the ΔMP2-SCS methods with the resolution of identity technique is effective and reliable an alternative to the semi-empirical density functional theory methods and Green’s function-based calculations of ionization potentials for large molecular systems such as silicon-based or organic-based solar cells, for which the IP-EOM-CCSD calculations are too expensive for routine calculations.


Introduction
A photo-voltaic effect is a physical process allowing to convert light into electricity in solar cell devices. The effect mostly occurs in the semiconducting materials characterized by small energy difference (the energy gap) between the valence and the conduction bands or between the highest occupied electron energy and lowest unoccupied electron energy levels, respectively.
One of the key parameters of solar cell devices is the power conversion efficiency (PCE) ratio [1] which gives the information about the ability to convert light into electricity. The PCE ratio is proportional to open-circuit voltage (V oc ) which, as was recently shown [2], is in turn governed by energy difference between the value of first electron attachment (EA) (or negative lowest unoccupied molecular orbital energy (LUMO) EA A = −ε A LUMO ) and first ionization potential (IP) (or negative highest occupied molecular orbital (HOMO) energy IP D = −ε D HOMO ) of the acceptor and donor material, respectively. However, according to reference [3], in order to identify potential candidates for photovoltaic materials, one basically needs only to know the value of their first IP.
The IP of a molecule is also a fundamental quantity in other contexts such as charge transfer and transport processes, photo-electron spectroscopy and especially molecular electronics, where together with EA attracts great interest in applied and theoretical research [1,[4][5][6]. Thus, the ability of accurate prediction (with the accuracy in range of 0.1-0.3 eV) of these two intrinsic parameters has become a key importance in the context of a new developments in these field [7][8][9][10][11][12][13].
To date, many useful approaches have been established allowing to calculate EAs and IPs with various accuracy and cost [11,14]. Unfortunately, the size of the molecules used e.g. in the construction of silicon-based or organic-based solar cells [15][16][17][18] limits the application of first-principle ab initio methods in modeling of such structures. The utilization of state-of-art IP/EA equation-of-motion coupled-cluster (IP/EA-EOM-CC) [19][20][21] or electron propagator (EP) theory  methods (providing high accuracy results) become especially difficult, particularly when one needs to perform calculations in larger basis sets. On the other hand, application of simple and relatively cheap density functional theory [48] (DFT) methods often do not lead to accurate predictions. For example reference [49] reports that standard DFT approximations fail in the accurate predictions of IPs for large molecular systems due to delocalization error. In contrast, the ab initio DFT functionals [50] can provide good predictions of IPs [51][52][53], but they share the same limitations as IP-EOM or EP approaches. Thus, the development of affordable and at the same time accurate methods for calculating IP/EA is still an important topic of research [11,13,[54][55][56][57].
Recently, we have proposed [11] extremely simple, practical, reliable and computationally not expensive (scaling as O(N 3 ) in post-Hartree-Fock step [58], where N is the system size) methods denoted as ΔMP2-SCS(IP) and ΔMP2-SOS(IP) for the calculation of vertical ionization potentials (VIP). Both methods are based on the Dyson equation [46] with the self-energy in the second-order, diagonal, frequency-independent approximation, namely the ΔMP2 [46,47,59] method, combined with spin-component-scaled (SCS) [60] decomposition of the second-order self-energy expression In the above, indices i, j indicate occupied orbitals, a, b indicate unoccupied ones (p, q indices will be used for any orbitals), while σ and τ are spin indices. The c SS and c OS parameters scaling the same spin (SS) and opposite spin (OS) parts of the second-order self-energy, respectively, have been optimized with respect to VIPs benchmark data set of fifty variously sized molecules (for more details see reference [11]). The optimized c SS and c OS parameter values, having strictly property (i.e. IP) based nature [61,62], define hereon the ΔMP2-SCS(IP) (c OS = 0.67 and c SS = 0.38) and ΔMP2-SOS(IP) (c OS = 1.0 and c SS = 0.0) methods. It was shown [11] that both can predict the values of VIP with accuracy comparable with the state-of-art EP methods such as outer valance Green's function (OVGF) [63].
We note that, similarly to the construction of other EP methods [64,65] (e.g. OVGF methods), the introduction of scaling parameters in equation (2) is justified based on the following empirical observations: (1) it is generally known that the first, non-vanishing, second-order self-energy Σ (2) pp term (ΔMP2/EP2 method) leads to a large overestimation of Koopmans orbital energies. Addition of the third order Σ (3) pp term contribution (ΔMP3/EP3 method), in turn, leads the opposite trend [64,66]; (2) the experimental IPs values quite often lay between second-order and third-order ΔMP/EP values. Instead of rescaling third-order term (like in OVGF methods), one can disregard computationally demanding Σ (3) pp term and scale only the second-order one [11,56,57,67]. As shown in figure 1 of reference [11] the rescaling of Σ (2) pp self-energy term leads to a significant reduction of overestimation in ΔMP2 method. Furthermore, the SCS decomposition allows to exploit mutual, almost linear relation between OS and SS self-energy terms what justify the construction of one parameter variant which is computationally more favorable.
The methods and results presented in reference [11] are encouraging, however, in order to demonstrate the real predictive power and robustness of ΔMP2-SCS(IP) and ΔMP2-SOS(IP) approaches, larger assessment is required. Therefore, in the present study, we have employed both ΔMP2-SCS methods for the calculation of VIP for variously sized systems ranging from simple molecules to large organic photo-voltaic molecules and single-wall carbon nanotubes (SWCNT) and have made the most stringent performance analysis of our new methods providing a thorough benchmark of them in comparison with other approaches.

Computational details
All EP method calculations (OVGF, P3, P3+, EP3, EP2) [66] have been performed with the Gaussian 16 package. [68] The IPs with low pole strength (PS) (lower than 0.85) have been ignored in subsequent analysis. The IP-EOM calculations have been performed with the NWChem software [69]. If not otherwise stated the ΔMP2 and the spin-resolved ΔMP2 methods [ΔMP2-SCS(IP) and ΔMP2-SOS(IP)] results have been obtained with our local version of the ACESII package [70]. In the case of large molecular systems reported in sections 3.3 and 3.4 all ΔMP2 calculations have been performed with locally modified version of PSI4 quantum chemistry packages [71] where the resolution of identity (RI) [72,73] was employed in the calculation of electron repulsion integrals. As was shown in references [74,75] the utilization of RI approximation allows to substantially reduce the computational bottleneck of EP methods, i.e. atomic-to molecular-orbital transformation scaling as O(N 5 ), without significant loss of accuracy. If not otherwise stated, all calculations have been performed using Dunning [76] cc-pVXZ family of basis sets (X = D, T). In the case of RI approximation, the auxiliary basis have been chosen as a counterpart of original basis set [77]. Likewise in other studies [11,54,58,78], the CBS results reported in tables 2-4 have been estimated using the two point inverse cubic formula [79]. For more details we refer the reader to section 3.

Vertical valence ionization potentials
In our previous study (see table 6 in reference [11]) we have shown that ΔMP2-SCS(IP) and ΔMP2-SOS(IP) methods are able to provide accurate results not only for the first IP but also for all valence IPs. However, the conclusion was based on a few simple molecules, and more broader assessment in this context should be made. In the recent study [9] Ranasinghe et al have presented a reliable benchmark data set of 1468 valence VIPs computed for 155 molecules using IP-EOM method with singles and doubles [19,80] (IP-EOM-CCSD) in cc-pVTZ basis set. The data set, consisting of two subsets denoted as CGB and NIST was compiled from molecules taken from references [81] and [82], respectively. We will refer to these names in the following.
In order to assess the methods on the same footing (lack of basis set error compensation), we have performed the ΔMP2-SCS(IP) and ΔMP2-SOS(IP) calculations using the same computational setup as in reference [9], namely cc-pVTZ basis sets and geometries. Additionally, for comparison, we also computed the same quantities for several EP approaches.
In table 1 we report MSE and MAE calculated with respect to IP-EOM-CCSD/cc-pVTZ data for all methods mentioned above. We remark, that in the evaluation of errors we have considered only the states where all methods yield well converged results (PS 0.85).
Inspection of the data reveals that in the case of valence IPs, the ΔMP2-SCS(IP) and ΔMP2-SOS(IP) methods yield MAEs comparable with much more sophisticated and numerically demanding OVGF approaches. In the case of CGB data set, the smallest error is observed for P3+ method (MAE = 0.16 eV). In the same time all OVGF methods give MAEs in the range of 0.19 to 0.27 eV. The bare EP2 and ΔMP2 methods (without any parametrization) overestimate significantly the IPs whereas the application of EP3 method leads to the opposite trend. Similar trends are observed for NIST subset. Here the best performance is provided by OVGF-N method yielding MAE = 0.17 eV. This is closely followed by P3 (MAE = 0.18 eV) and P3+ (MAE = 0.2 eV) results and other OVGF methods which give the MAEs in the range of 0.19 to 0.23 eV. We remark that ΔMP2-SCS(IP) and ΔMP2-SOS(IP) give, in this case, only slightly worse performance with the MAEs which are only 0.05 ∼ 0.07 eV larger than those for OVGF-N method. To Table 1. Mean signed error (MSE) and mean absolute error (MAE) (in eV) of IPs obtained from various EP methods calculated with respect to the reference IP-EOM-CCSD/cc-pVTZ data from reference [9]. For valence IPs we have only considered the states where PS is larger than 0.85 in EP calculations. For ΔMP2-SCS methods (in parenthesis) we also report the same errors for all states from reference [9]. The best result (in each category) is highlighted bold. conclude the valence IPs assessment, we report (in parenthesis) the MAEs computed for all states reported in reference [9] for ΔMP2-SCS(IP) and ΔMP2-SOS(IP) methods. This step allows to directly compare the performance of both IP-optimized ΔMP2-SCS methods with the results obtained for CGB subset gathered in table 2 in reference [9]. One can note that both methods yield MAEs which are almost twice as good as those obtained using QTP [83] family of DFT functionals, which were specially designed and parametrized to provide best performance for VIPs. Moreover we note that both methods outperform the CCSD results obtained from extended Koopmans theorem [84][85][86], which surprisingly gives, in this case, MAE = 0.8 eV, probably mainly due to the lack of inclusion of higher excitations [87]. For the first IPs (HOMO column in table 1) the performance of the methods is quite similar for both subsets. The ΔMP2-SOS(IP) yield for both sets best performance MAE of 0.16 eV and 0.15 eV for CBG and NIST, respectively. These are closely followed by ΔMP2-SCS(IP) and P3+ methods results giving only slightly larger MAE of about 0.01 ∼ 0.04 eV. Again the CGB HOMO IPs can be directly compared to those gathered in table 2 in reference [9]. One can note that both ΔMP2-SCS(IP) and ΔMP2-SOS(IP) outperform significantly IPs obtained using standard DFT methods and they are almost twice as good as IPs calculated by QTP [83] functionals.
Summarizing this part, we can conclude that the broad assessment conducted has proven that at a small fraction of computational cost (see table 1 in reference [58]), the ΔMP2-SCS(IP) and ΔMP2-SOS(IP) methods provide results which are inline with more advanced EP approaches such as OVGF or P3/P3+ for all valance VIPs and are much better than standard and more sophisticated DFT functionals [9].

Ionization potentials of nucleic acid bases
Another interesting application of the ΔMP2-SCS methods was performed for five DNA/RNA nucleobases (depicted in figure 1), namely, guanine, adenine, cytosine, thymine, and uracil. As discussed in references [90][91][92], the accurate determination of the ionization energies of these molecules plays an important role toward a better understanding of electrostatic interaction of nucleobases with other molecules, or e.g. phenomena related photo damage or repair of the genetic materials [92]. The initial DNA/RNA geometries have been taken from reference [88] and re-optimized at the B3LYP/6-31G(2df, p) [93][94][95] level of theory. To enable a broad comparison, also in this case, we have performed calculation for all methods listed in table 1 at the CBS limit (see section 2 for more details). Unfortunately, the direct comparison with experimental results is rather difficult due to large dispersion of data [91,92]. Thus as a reference we have Table 2. First two VIPs (in eV) of nucleic acid bases obtained for various methods at CBS limit. First column reports experimental IP values from reference [89] whereas the second one, the same values but including ZPE and geometry relaxation corrections. In the last two lines we report the MAE and MARE calculated with respect to IP-EOM-CCSD/CBS data. The best results are marked bold.

OVGF ΔMP2
Expt. a Expt. a adiabatic IP data measured by photo ionization mass-spectrometery in gas phase from reference [89]. b experimental data from reference [89] corrected for ZPE and geometry relaxation corrections calculated at B3LYP/6-31G(2df, p) C = Δ relax + ZPE c − ZPE n note, that same correction was also applied for second IP (see reference [54]). c IP-EOM-CCSD results, all electron are correlated.   [98][99][100]. b see reference [54] for the description of methods and results. The OVGF-N, P3, and P3+ cc-pVDZ results have been corrected using EP2 cc-pVDZ and cc-pVTZ results extrapolated to CBS.
utilized IP-EOM-CCSD [19,21,80] results extrapolated to CBS limit as in references [78,79] which yield quite good agreement with other IP-EOM calculations [90,96]. In table 2 we present first two VIPs for aforementioned DNA and RNA nucleobases together with MAE and mean average relative error (MARE) calculated with respect to reference IP-EOM-CCSD data. First one can note that IP-EOM-CCSD reference results (also from other studies [90,96]) are in good agreement with experimental data from reference [89] corrected for zero-point energy (ZPE) and geometry relaxation corrections calculated at B3LYP/6-31G(2df, p) level. This finding additionally confirms the quality of IP-EOM-CCSD data. Further inspection of table 2 reveals the excellent performance of ΔMP2-SOS(IP) and ΔMP2-SCS(IP) methods which yield in this case relatively small errors of MAE = 0.10 eV (MARE = 0.98%) and MAE = 0.11 eV (MARE = 1.15%), respectively. Second best result is provided by OVGF-C method with MAE of 0.13 eV and MARE = 1.33%. Similar good performance is observed for other OVGF and P3+ method which give MAE in the range of 0.17 to 0.22 eV. Surprisingly, the performance of the P3 method, in contrast to the results reported in section 3.1 and other work related to VIP of nucleobases [90,92], is rather poor and comparable with EP2 method, which gives quite similar MAE = 0.32 eV and MARE of 3.38%. The behavior of P3 method can be explained by investigation of MAE with respect to the basis set size, as was done in reference [58]. The larger basis set the bigger discrepancy from reference results are observed for P3 method [58]. For small basis set, on the other hand, the P3 performs relatively well and thus benefits from basis set deficiencies. This was actually the case in both previous studies [90,92] regarding DNA and RNA nucleobases. Therefore, to conclude, in order to obtain reliable results for a particular method one need also to test its convergence with respect to the basis set size.

Large photo voltaic molecules
In reference [11] we have employed the ΔMP2-SCS methods for calculation of IPs of 24 medium sized organic acceptor molecules [102] that are particularly interesting from the standpoint of their future application in organic photo voltaic devices [15][16][17][18]. Due to the small computational cost (especially the variant combined with RI technique) of the ΔMP2-SCS methods their potential applicability to identify this kind of systems belongs to one of the most important advantages of these approaches. Thus, in order to discuss the capabilities and limitations of the present methods, we make additional assessment for several large acceptors or donors organic molecules (listed in table 3 and depicted in figure 2), often used in organic electronics [54,97,98]. The geometries of all structures have been re-optimized at the B3LYP/6-31G(2df, p) [93][94][95] level of theory.
In table 3 we report first IPs values extrapolated to CBS limit, calculated using various EP and ΔMP2 methods. The core orbitals were kept frozen in all EP calculations. In the case of rubrene, C 60 , C 70 , and PCBM systems for all ΔMP2 methods we report two values. First is obtained with the same basis set as for EP methods (see table 3 for more details) and the other (in the parenthesis) is the value obtained at CBS limit.
Inspection of MAE and MARE (calculated with respect to the molecular experimental data) reveals that the best performance among EP methods is given by OVGF-C which yields MAE of 0.14 eV (MARE = 2.12%). It is closely followed by OVGF-N results with MAE of 0.16 (MARE = 2.41%) and later by P3+ method which yields MAE of 0.21 eV (MARE = 3.11%). The worst performance is given by P3, EP3, OVGF-A, and B methods whose MAEs range between 0.27 and 0.37 eV (MARE between 3.72% and 5.52%). Turning now attention to ΔMP2-methods results gives similar picture to those reported in sections 3.1 and 3.2 i.e. the ΔMP2-SCS methods improve upon the unscaled ΔMP2 variant which yield MAE of 0.60 eV (MARE = 8.48%). The ΔMP2-SOS(IP) and ΔMP2-SCS(IP) give here, in turn, MAE = 0.34 eV (MARE = 5.03%) and MAE = 0.29 eV (MARE = 4.28%), respectively, being quite similar to ones reported by OVGF-B/C and P3 methods.
Due to very small computational cost of ΔMP2 methods, we have been also able to calculate CBS results (reported in parenthesis) for rubrene, C 60 , C 70  . This is well known fact that for small basis sets (like employed here 6-31G and cc-pVDZ) the EP2 or ΔMP2 gives quite large errors with respect to the experimental data [104]. This feature is unfortunately inherited by ΔMP2-SCS methods what can be clearly seen in table 3.
For three systems where molecular gas phase experimental values were unavailable, namely DCV3T, DIP, and 6T, we report the bulk phase experimental data. One can immediately note the large difference between the bulk phase and computed IPs values. The latter have much better agreement with the molecular experimental results as indicated in reference [97]. The difference between gas and bulk phase experimental data are mostly dictated by the polarization effect of bulk material due to the molecular ion which remains in the solid after an electron is removed [105]. We note, that also in this case, the quality of results given by ΔMP2-SCS methods is not worse than other EP approaches such as OVGF-N and C.
One can note that table 3 does not report the EP2 IP values. This is due to the fact that the PS for most systems was found to be lower than 0.85 for many calculated first IPs. This finding is important in the context of recently published composite electron propagator (CEP) [54] method where authors propose focal point analysis [54,106] to the calculation of IPs. In the CEP method, the high-level EP calculations (e.g. OVGF, P3, etc) are performed using quite small basis sets and then improved using incremental Table 5. Comparison of IPs (in eV) obtained for several SWCNTs (C 20n H 20 , with n = 2-4) and computational methods in 6-31G basis set. The MAE and MARE are calculated with respect to IP-EOM-CCSD/6-31G data from reference [103]. Core electrons have been kept frozen in all calculations. The best results are marked bold.  corrections based on EP2 method results obtained for both smaller and larger basis sets. The approach was found to be extremely accurate in the IP predictions for organic photo-voltaic molecules. In table 4 we report the comparison of the ΔMP2-SCS(IP) and ΔMP2-SOS(IP) methods and the best CEP results reported in reference [54] for benzo[a]pyrene, C 60 and MgOEP. One can note that the quality of ΔMP2-SCS(IP) results is similar to the best CN/23 CEP method, being at the same time computationally much less demanding due to avoidance of performing high-level calculations. Thus, we conclude that also in this case the ΔMP2-SCS methods are robust and provide high-quality results.

Carbon nanotubes
Since their discovery, the SWCNTs became recognized as potential candidates for materials for organic solar cells devices [107], due to their various interesting electronic properties [108]. In several recent studies [103,109,110], the impact of the nanotube length on the IPs was studied using various theoretical approaches e.g. IP-EOM-CCSD [103] or several DFT methods [110]. Thus, in this section, as a final test, we perform a similar assessment of ΔMP2-SCS methods. To this end, we have considered SWCNTs of various lengths using geometries from reference [111] (C 20n H 20 , with n = 2-10). First, we investigate the accuracy of EP and ΔMP2 type methods in the prediction of the first two highest IPs. As a reference we utilized the IP-EOM-CCSD data taken from reference [103]. All calculations have been performed using the 6-31G basis set [95,101]  As was shown in references [103,110] the IP-EOM-CCSD SWCNTs IPs data are quite sensitive to the quality of basis set used in the calculations. For example the difference between 6-31G and aug-cc-pVDZ [76] C 40 H 20 results is roughly of 0.5 eV. Thus, in order to investigate basis set dependence of ΔMP2-SCS methods result in figure 3 we also report the value of first IP as a function of SWCNT length. As a reference, we report the IP-EOM-CCSD data taken from reference [110]. For comparison, we report also the ωB97XD results obtained without and with IP-tuned value of ω (denoted with asterisk-ωB97XD * ). Likewise in reference [110] all data have been obtained in may-cc-pVDZ [112] basis set. As we can see for n = 2-6 the latter data are in quite good agreement with the reference IP-EOM-CCSD values. Thus, for n = 7-10 (where IP-EOM-CCSD results are unavailable) the ωB97XD * IPs are used as a reference.
From figure 3 we clearly see that the ΔMP2-SCS(IP) and ΔMP2-SOS(IP) results present similar qualitative behavior to IP-EOM-CCSD data even for C 80 H 20 where ωB97XD * shows almost same IP values as for C 60 H 20 and C 100 H 20 SWCNTs. Furthermore, one can note that both ΔMP2 methods differ in results by 0.25 ∼ 0.40 eV from reference IP-EOM-CCSD values for n = 2-6. Similar behavior is observed for ωB97XD (without IP-tuning) method results what indicate that optimization of ω, which is not always straightforward task especially for complex molecular systems [113], is crucial for obtaining reliable IP values.
For larger carbon nanotubes (n = 7-10) the agreement of ΔMP2 results with ωB97XD * is more noticeable, especially for ΔMP2-SOS(IP) method. This assessment indicates that both ΔMP2-SCS approaches also, in this case, can provide good qualitative predictions despite their simplicity.

Conclusions
In this study we have performed the comprehensive assessment of the ΔMP2-SCS methods for large set of small and medium size molecules, DNA/RNA bases, donor and acceptor organic molecular systems, and finally SWCNTs.
The analysis of the results clearly shows that the ΔMP2-SCS family of methods are able to provide IPs comparable in accuracy with state-of-art EP approaches such as OVGF, P3+ or CEP methods with substantially lower computational cost (O(N 3 )), also outperforming the standard DFT methods.
Additionally the combination of our ΔMP2-SCS(IP) and ΔMP2-SOS(IP) methods with the RI technique, provides excellent results for a variety of large molecular systems what is really attractive from the standpoint of the search and development of new organovoltaic materials for solar cell devices. Therefore, the ΔMP2-SCS(IP) and ΔMP2-SOS(IP) methods can be used as a simple but reliable and practical method for routine calculation of accurate VIPs in broad applications.
Finally, we remark, that despite the fact, that the ΔMP2-SCS methods provide results which differ from the reference data i.e. the IP-EOM-CCSD and experimental ones of only 0.1-0.2 eV, there is a still room for improvement e.g. by utilization in equation (2) the system dependent c OS and c SS parameters [51,52,62,[114][115][116] which will be the object of our further study.