Hunting for Complex Organic Molecules in the Interstellar Medium: The Role of Accurate Low-Cost Theoretical Geometries and Rotational Constants

A new approach to computation at affordable cost of accurate geometrical structures and rotational constants for medium-sized molecules in the gas phase is further improved and applied to a large panel of interstellar complex organic molecules. The most distinctive feature of the new model is the effective inclusion of core–valence correlation and vibrational averaging effects in the framework of density functional theory (DFT). In particular, a double-hybrid functional in conjunction with a quadruple-ζ valence/triple-ζ polarization basis set is employed for geometry optimizations, whereas a cheaper hybrid functional in conjunction with a split-valence basis set is used for the evaluation of vibrational corrections. A thorough benchmark based on a wide range of prototypical systems shows that the new scheme approaches the accuracy of state-of-the-art wave function methods with the computational cost of the standard methods (DFT or MP2) routinely employed in the interpretation of microwave spectra. Since the whole computational workflow involves the postprocessing of the output of standard electronic structure codes by a new freely available web utility, the way is paved for the accurate yet not prohibitively expensive study of medium- to large-sized molecules also by nonspecialists.


■ INTRODUCTION
The characterization of molecular structures and properties in the gas phase plays a central role in different fields ranging from molecular astrophysics and astrochemistry to the impact of atmospheric chemistry on climate change, not to mention combustion processes.From another point of view, the disentanglement of intrinsic stereoelectronic and environmental effects in processes occurring in condensed phases requires preliminary knowledge of gas-phase results.In this framework, the pivotal role of rotational spectroscopy is related to its highly selective nature: the observed spectral patterns depend sensitively on three rotational constants that, in turn, are inversely related to the principal moments of inertia of the target molecule.Furthermore, the resolution of spectrometers, either in a laboratory or through a telescope, is on the order of parts per million, meaning that the assignment of a molecule based on the rotational spectrum is usually unambiguous.Thanks to the development of spectrometers coupling supersonic-jet expansion 1 and laser ablation, 2 rotational spectroscopy can be applied to molecules with up to about 50 atoms including thermolabile molecules with high melting points (like several biomolecule building blocks).However, as the dimensions of the molecule increase, the spectral congestion grows due to both intrinsic factors and the presence of different low-energy species separated by sufficiently high-energy barriers to avoid their interconversion under the experimental conditions. 3,4Quantum chemical (QC) computations play a fundamental role in the disentanglement of these complexities, but the accuracy requirements of rotational spectroscopy are very stringent.−11 However, the intrinsic errors of these approaches (0.5−1%) are too large to allow unbiased analyses without additional experimental pieces of information (e.g., quadrupole coupling constants in the presence of nuclei, in which they do not vanish). 12Improved results can be obtained correcting the density functional theory (DFT) geometrical parameters by bond-specific parameters 13 derived from a large database of accurate molecular structures 14,15 [the so-called linear regression (LR) approach], or with reference to suitable fragments whose accurate geometries are already known [the so-called template molecule (TM) approach]. 16Recently, the LR and TM approaches have been combined with remarkable success in the nano-LEGO model 14,15 (also called LEGO-brick 17,18 ).However, the use of a large number of parameters remains quite unsatisfactory and, above all, suitable fragments are not always available. 19,20Based on these premises, it is the purpose of this paper to show that much better results can be obtained by the recent Pisa composite scheme (PCS), 19,21−23 which improves the accuracy of current approaches for medium-sized molecules by nearly an order of magnitude, without any significant increase of computational cost, especially when core−valence (CV) correlation is accounted for by a simple one-parameter approach (PCS/Bonds).We will consider in particular polyaromatic hydrocarbons (PAHs), 24 together with some heteroaromatic derivatives (PANHs) 25 and CN-substituted species. 26The last class of compounds is especially significant in astrochemistry since the presence of the CN moiety strongly increases the vanishing (or nearly vanishing) dipole moment of PAHs, thus allowing their characterization by means of rotational spectroscopy. 27,28METHODS PCS/Bonds Model.The key feature of the reduced cost composite methods developed in our group during the past few years 29−32 is that starting from an accurate evaluation of valence energy (E V ) with a (partially augmented) triple-ζ basis set, 33,34 complete basis set (CBS) extrapolation and inclusion of CV correlation can be performed separately with good accuracy and negligible additional cost by means of second order Møller− Plesset perturbation theory (MP2). 35An equivalent partitioning of the different contributions starts, instead, from the valence energy computed at the MP2 level (E V2 ) and then adds post-MP2 valence contributions (ΔE V ) and MP2 CV correlation with where ae means all-electrons, fc means frozen-core, and CW3Z is the correlation-consistent cc-pwCVTZ basis set. 36This formulation permits the use of several computational models to estimate the ΔE V contribution and the use of different basis sets (possibly also CBS extrapolations) for evaluating both E V2 and ΔE V .Derivation of eq 1 wrt Cartesian coordinates leads to energy gradients that can be employed for geometry optimizations by composite methods. 37,38However, a much simpler approach (referred to as the geometry scheme 7,39 ) is obtained assuming that the additivity approximation can be directly applied to geometrical parameters (r) and only requires geometry optimizations at different levels of theory Several benchmarks have shown that gradient and geometry schemes provide nearly identical results, 40 possibly due to the smallness (a few mÅ) of the corrections to the starting geometrical parameters.Furthermore, the geometry scheme does not require any modification of standard electronic structure codes, is embarrassingly parallel, and is particularly effective since low-scaling methods or analytical implementations are widely available for MP2 (or double-hybrid density functionals) gradients.
For systems not involving strong dynamical correlation, the coupled-cluster ansatz including single, double, and (perturbatively) triple excitations (CCSD(T)) 41 can be confidently employed to evaluate ΔE V and Δr V . 31,32However, remarkable results are obtained at a much reduced cost by evaluating the Δr V contribution with the rev-DSD-PBEP86-D3(BJ) doublehybrid functional 42 (hereafter rDSD) in conjunction with the same basis sets employed for r V2 . 21The final PCS/DFT model is obtained neglecting the CBS extrapolation and employing the cc-pVTZ-F12 basis set 43 (hereafter 3F12), which involves a quadruple-ζ expansion of s, p valence orbitals in conjunction with a triple-ζ expansion of polarization functions.Under such circumstances, Δr V = r rDSD − r V2 and eq 3 becomes A further reduction of computational cost with negligible loss of accuracy is obtained by removing d functions on first-row atoms and replacing the two f functions on second-and thirdrow atoms by a single f function taken from the cc-pVTZ basis set. 33The resulting basis set (referred to as 3F12 − in the following) has dimensions comparable with those of the jun-cc-pVTZ basis set employed systematically in previous studies, 31 but it delivers results much closer to those of augmented quadruple-ζ basis sets 21 which are, in turn, practically at the CBS limit. 44From another point of view, extensive benchmarks confirmed that CV corrections are entirely negligible for valence and dihedral angles, 22 which are predicted accurately at the rDSD/3F12 − level.Improved bond lengths can then be obtained by the following recipe where the first corrective term accounts for CV correlation and the second one stands for possible small inaccuracies in the treatment of valence electrons.Bonded atoms can be easily identified employing covalent radii (r cov , taken for instance from ref 45) or, equivalently, Pauling bond orders In eq 6, r ij is the interatomic distance in Å and two atoms are considered bonded if P ij is larger than 0.3 (which corresponds to a distance 0.35 Å longer than the sum of the covalent radii).In a previous work, 22 we have shown that Δr CVB is a simple function of the product (n ij ) of the principal quantum numbers of the involved atoms.Here, a further step is taken by employing the same covalent radii already used in eq 6 in place of the rDSD/ 3F12 − bond lengths in order to obtain a method-independent estimate of the CV contributions (Δr CVB ) In the above equation, k = 0.0011 is the same as in ref 22, and vanishing corrections are correctly obtained for bonds between first-row atoms.In the case of the rDSD/3F12 − model, the Δr VB correction is needed only for multiple CC bonds, with the following simple expression having the correct behavior and remarkable numerical performance ) The Kronecker δ permits the inclusion of the corrective term only for specific bonds, and the precise values of the bond orders are not critical, provided that they are close to 1, 2, and 3 for simple, double, and triple bonds.In the following sections, we show that the PCS/Bonds model can deliver accurate results by a single geometry optimization at the rDSD/3F12 − level.
While Z-matrix nonredundant internal coordinates are usually employed to define (and correct) bond lengths and valence angles, 18 this approach is ill adapted to deal with intrinsic The Journal of Physical Chemistry A

The Journal of Physical Chemistry A
redundancies (e.g., cycles).As a consequence, a full set of redundant internal coordinates is preferable, which can be automatically derived when the bond pattern of the molecular system is defined. 47However, the displacements computed by eq 5 are finite, so that it is, in principle, impossible to modify all of the bond lengths while retaining the original values of the other internal coordinates.Furthermore, the transformation between (redundant) internal and Cartesian coordinates is nonlinear, making the coordinate conversion an iterative procedure.Contrary to the customary Newton procedure, which requires the generalized inverse of the Jacobian matrix B, 47 we resorted to a penalty function approach in which the following function is minimized by a gradient-descent algorithm where the index i labels the n B bonds (with bond length r i ), the index α labels the n A angles θ, and f is a scale factor that normalizes the weight of bond lengths and angles (a default value of 0.00115 is used).
In particular, the Cartesian coordinates at the i + 1st iteration are ( 1) where ∇ x S is the Cartesian gradient of S and η is the so-called learning-rate in the machine learning (ML) jargon, with a value of around 0.1 performing well in the present context.The components of the gradient are The derivatives of internal coordinates wrt Cartesian coordinates appearing in the previous equations are wellknown 48 and full convergence of the procedure (root-meansquare changes of Cartesian coordinates lower than 10 −6 ) is obtained without problems in all the studied cases.
The whole procedure has been implemented in a Website, which, starting from a rDSD/3F12 − geometry in Cartesian coordinates (xyz-or Gaussian log-file), interactively produces the PCS/Bonds structure, the corresponding equilibrium rotational constants, and a 2D representation of the molecular structure.

■ RESULTS AND DISCUSSION
Validation.In general terms, predicted rotational constants with an accuracy of 1% are not at all helpful for the analysis of samples containing several different species, whereas an accuracy of 0.01% would be extraordinarily valuable.The relationship between the accuracy of the rotational constants and that of the underlying geometrical parameters is not straightforward for polyatomic molecules.However, an accuracy between 0.0005 and 0.001 Å for bond lengths (in the range from 1 to 2 Å) and of about 0.1°for valence angles can be considered a reasonable guess for an accuracy of 0.1% on rotational constants. 49While this target is reachable by state-of-the-art composite wave function methods, 7,50,51 the 0.01% goal would require consideration of additional effects (e.g., diagonal Born− Oppenheimer approximation and finite nuclear models) and also places essentially insurmountable demands on the basis set selection and correlation treatments. 49Based on these premises, we will analyze the perspectives of a low-cost model able to approach the 0.1% accuracy target for large molecules.
From another point of view, the current practice of comparing experimental rotational constants with computed equilibrium values neglects the contribution of the interaction between vibrations and rotations.For semirigid molecules, this contribution is well approximated employing second-order vibrational perturbation theory (VPT2) 52,53 where α r i denotes the vibrational−rotation interaction constants, with i being the inertial axis (a, b, or c) and the sum running over all the r vibrational modes.
The magnitude of the vibrational contribution ΔB vib ranges typically between 0.1 and 1.0% of the corresponding rotational constant (i.e., above the target accuracy of our model), and its calculation by QC methods requires evaluation of the harmonic and (semidiagonal cubic) anharmonic force fields.The error in such computations is comparable to that of the force constants, which is of the order of 10% for hybrid density functionals with medium-size basis sets like the B3LYP-D3(BJ)/6-31+G* model employed in the present context.Therefore, the average error in the vibrational contribution to the rotational constants can be estimated to be about 0.05% of the total value of the ground vibrational state constants B 0 , which is within the accuracy target (0.1%) discussed above.
In principle, the small electronic correction to rotational constants should also be considered 52 where g ii is expressed in units of the nuclear magneton, m is the electron mass, and M p is the proton mass.Several studies have

The Journal of Physical Chemistry A
shown that ΔB el can be safely computed by hybrid density functionals using London orbitals. 16,22dditional parameters of particular relevance for rotational spectra are the nuclear quadrupole coupling constants, together with quartic and sextic distortion constants, with the latter two quantities depending on harmonic and anharmonic force fields, respectively. 53Since the accuracy of the rDSD functional has been recently validated for all these quantities 54−57 and also for vibrational transitions, 58,59 we will not analyze again these aspects.The same remark applies to the components of dipole moments, which determine the intensities of rotational transitions. 60efore considering cyclic and bicyclic compounds of astrochemical interest, we analyze the performance of the new PCS/bonds model for a panel of medium-sized compounds (Figure 1) with the aim of determining the role of the different approximations and the weight of the various contributions.
The results collected in Table 1 show that the accuracy of the rDSD/3F12 − model is reasonable, but the experimental groundstate rotational constants are systematically overestimated.However, the straightforward interpretation of this result in terms of underestimated bond lengths (related to the inverse proportionality between rotational constants and inertia moments) is misleading because it completely neglects the contribution of vibrational corrections.As a matter of fact, ΔB vib is always the largest corrective contribution, it is only partially counterbalanced by the positive contribution of the CV correlation, and it actually leads to a non-negligible reduction of the equilibrium value.The final PCS/Bonds results always fulfill the target accuracy of 0.1% without any underlying error compensation.In fact, the rigorous ΔB CV2 correction and its

The Journal of Physical Chemistry A
one-parameter ΔB CVB approximation are always very close, while the additional ΔB VB contribution vanishes in the absence of multiple CC bonds and corrects for the slight overestimation of conjugative effects typical of DFT methods.Moreover, the use of a double-hybrid functional in conjunction with a sufficiently large basis set permits one to avoid any bondspecific correction and, even more importantly, any correction to valence and dihedral angles.It is apparent that the ΔB el contribution is smaller than the target accuracy of our computational approach, at least for the molecules considered in the present study (see Table 1).As a consequence, in the following, this contribution will be neglected.
An intuitive picture of the performance of a given model is offered by a graphical representation of the results based on normal distributions defined by

The Journal of Physical Chemistry A
In eq 14, Δ av and Δ std are the relative unsigned mean error and standard deviation, respectively, whereas N c is a suitable normalization constant.The results shown in Figure 2 confirm the remarkable accuracy and robustness of the PCS/Bonds model.
While all the computed rotational constants are in remarkable agreement with experiment, it would be useful to have a direct comparison between experimental and computed geometrical parameters since, especially for medium-sized molecules, the final experimental outcome could be tuned by different factors.Unfortunately, accurate experimental structures are not available for the quite large PAHs, which are the main targets of the present study.However, the very recent determination of the semi-experimental equilibrium structure of 2-furonitrile 61 (see

The Journal of Physical Chemistry A
Figure 3) permits a detailed comparison between PCS/Bonds and experimental geometrical parameters for a heteroaromatic molecule containing the CN moiety.The results collected in Table 2 show that the accuracy of all the geometrical parameters is in the expected range and rivals that of the composite "Cheap" wave function method, 31,62 which requires orders-of-magnitude larger computer resources and, above all, is not applicable to large PAHs.
While the main targets of the present study are cyclic and bicyclic molecules, the search for other similar-sized linear and complex molecules in space is also important, especially in exoplanet studies.In principle, the errors of the PCS/Bonds model should be the same for linear and cyclic molecules, but the former class of molecules is usually more flexible than the latter one.This feature increases the difficulty of accurate structure evaluations due to the presence of large amplitude motions, which limit the precise location of energy minima and the accuracy of the perturbative evaluation of vibrational correc-   4) and compounds containing heteroatoms (Table 5).
The Journal of Physical Chemistry A tions. 57In order to address these limitations for a typical flexible organic molecule, we report in Table 3 the geometrical parameters and spectroscopic constants for the most stable conformer of glycolic acid (see Figure 3).Comparison with the accurate semi-experimental equilibrium structure 56 shows that the average error of PCS/Bonds bond lengths and valence angles are generally close to 1 mÅ and 0.1°, respectively.As expected, both the uncertainties on the semiexperimental parameters and the errors of their PCS/Bonds counterparts are larger than those of the semirigid 2-furonitrile molecule.However, the PCS/Bonds errors on rotational constants are significantly reduced with respect to their rDSD/3F12 − and CCSD(T)/ccpVTZ counterparts.
Cyclic and Bicyclic Compounds.It is well recognized that PAHs and their nitrogen-containing PANH counterparts are widespread throughout the universe. 26Unfortunately, PAHs are either nonpolar or weakly polar (i.e., have a dipole moment lower than 0.5 Debyes), this preventing not only their radioastronomical detection but also their laboratory characterization by rotational spectroscopy. 24,27,63In any case, ground-state rotational constants are available for several PAHs, and the corresponding experimental results are compared with their computed counterparts in Table 4, whereas the molecular structures are shown in Figure 4.
More recently, bicyclic compounds have been searched in the interstellar medium (ISM), 64 and the prototypical norbornadiene molecule is included in the same table.Noted is that while naphthalene, anthracene, phenanthrene, and pyrene only involve six-membered aromatic rings, the situation is different for azulene, indene, and acenaphthylene.This aspect is not marginal since the very effective Lego brick approach 18 mentioned in the Introduction cannot be applied to the last three molecules (and to norbornadiene) due to the lack of accurate structures for suitable fragments.The results collected in Table 4 show that the accuracy of the PCS/Bonds model is the same as that obtained for the validation set of molecules.
This finding is indeed remarkable since the dimensions of the considered PAHs are outside the range of application of state-ofthe-art QC methods.The only outlier is phenanthrene, for which the relative error increases to 0.4%.Since there is no reason why the PCS/bonds and Lego brick 18 approaches should perform well for anthracene and pyrene (and also for the CN derivative of phenanthrene, vide infra) and not for phenanthrene, we can only hypothesize that the experimental rotational constants are affected by some problem. 65he issues related to the vanishing (or very small) dipole moments of most PAHs can be overcome by resorting to the cyano-substituted derivatives (CNPAHs), whose large permanent dipole moments produce intense rotational spectra.The abundance of the CN radical in the ISM 66 and its extreme reactivity 27,63 suggests that CNPAHs are good proxies of PAHs and can be used to guess the presence of the corresponding PAHs in the ISM.In the same vein, PANHs and other heterocyclic compounds have sufficiently large dipole moments and have been detected in several regions of the ISM.Furthermore, CN-substituted heteroaromatic and bridged bicyclic compounds have been recently searched 64 in order to have an independent guess of the abundance ratio between CNPAHs and their parent PAHs.The structures of a panel of molecules belonging to all of these classes are shown in Figures 4  and 5, whereas the corresponding rotational constants are collected in Table 5.
In general terms, the performances of the PCS/Bonds model are again comparable to those found for the validation set, and this can be better appreciated from the Gaussian distributions shown in Figure 6.These results confirm that the PCS/Bonds approach and the companion Web site permit the unbiased computation of accurate equilibrium geometries and rotational constants for semirigid molecules containing a few dozen atoms.Above these dimensions, the computation of vibrational corrections becomes the bottleneck of the whole procedure, and more effective approaches are under development in this connection. 22However, as already mentioned, the most critical issue is related to the flexibility of most molecules of biological interest.While the conformational search of stable structures can profit from multilevel QC methods possibly driven by Machine Learning algorithms, 67 the underlying problem of large-amplitude motions limits the accuracy of any perturbative approach to vibrational corrections.Here, new ideas and implementations are surely needed.

■ CONCLUSIONS
The main target of the present study was the computation of accurate geometrical parameters and rotational constants in the framework of a general strategy that aims to accurately characterize structural and spectroscopic properties of mediumto large-sized molecules.The main outcome is that accurate geometrical parameters can be computed by the PCS/Bonds model without any additional cost starting from geometries optimized by the revDSD-PBEP86-D3(BJ) functional in conjunction with the 3F12 − basis set.Then, the vibrational corrections needed for a proper comparison with experimental ground-state rotational constants can be effectively obtained from B3LYP-D3(BJ)/6-31+G* semidiagonal cubic force fields.In summary, the way seems to be paved toward the systematic study of molecules of astrochemical and/or prebiotic interest in the gas phase at a reasonable cost by an accurate black-box procedure.

Figure 1 .
Figure 1.Structures of the molecules belonging to the validation set.

Figure 2 .
Figure 2. Error statistics for the semirigid molecules of the validation set.

Figure 4 .
Figure 4. Structures of the PAHs studied in this work together with some CN-substituted species.

Table 2 .
Computed and Experimental Rotational Constants (in MHz) of 2-Furonitrile CH bond lengths, together with H6C4C3, H8C7O5, H10C9C4, and H10C9C7 valence angles fixed at the Cheap computed values.c For the definition and expected accuracy of the Cheap model, see refs 17 and 62.
a From ref 61 with rotational constants truncated to the first decimal place.b d This work.

Table 3 .
Computed and Experimental Rotational Constants (in MHz) of Glycolic Acid The vibrational corrections (taken from ref 56) are 96.9, 45.8, and 30.4 MHz.d In conjunction with the 3F12 − basis set.
a Experimental ground-state rotational constants from ref 56 truncated to one decimal place.b In conjunction with the cc-pVTZ basis set.c

Table 4 .
Computed and Experimental Rotational Constants (in MHz) of Polycyclic Aromatic Molecules aIn parenthesis is the ΔCV2 contribution.b Not including phenanthrene.

Table 5 .
Computed and Experimental Rotational Constants (in MHz) of Polycyclic Heteroaromatic Molecules