Harnessing the power of curvilinear internal coordinates: from molecular structure prediction to vibrational spectroscopy

Several standard VPT2 codes employ Cartesian coordinates for the computation of rotational and vibrational spectroscopic parameters. However, curvilinear internal coordinates offer a number of advantages, provided that a general non-redundant set of coordinates can be built and employed in an unsupervised workflow. In the present paper I summarize the main results and perspectives of a general engine employing curvilinear internal coordinates and perturbation theory for the computation of rotational and vibrational spectroscopic parameters of large molecules beyond the conventional rigid rotor/harmonic oscillator model. Examples concerning biomolecule building blocks are discussed in detail in order to assess the performance of the proposed strategy.


Introduction
The quest for a reliable yet practical modeling of large molecular systems has always played a central role in the field of theoretical and computational chemistry, and this balance between accuracy and feasibility is likely to persist into the future [1]. The combination of complementary techniques is often the key to determining unambiguously the molecular structure and bond topology of a wide range of systems, from small molecules to large biomolecules. In particular, high-resolution spectroscopic studies both in the gas phase (microwave, MW) and in inert matrices (infrared, IR) allow an unbiased disentanglement of intrinsic stereo-electronic effects without any strong perturbing effect from the environment. However, the tuning of experimental spectra by different low-lying structures and the interplay of different contributions can make the interpretation of experimental data troublesome or even impossible without the aid of trustworthy in silico simulations. In fact, ongoing improvements of hardware, software, and, above all, of the underlying physical-mathematical models are allowing the analysis and interpretation of experimental data for molecular systems of increasing complexity. Despite the undisputed effectiveness of static structure-property correlations and of the fundamental rigid rotor/harmonic oscillator (RRHO) model, results that are directly comparable to experiment can only be achieved through more advanced models: (i) at the electronic level, by employing highly correlated methods; (ii) at the nuclear level, by incorporating anharmonic effects in the description of the nuclear motions. A strong limitation of the harmonic approximation is that real vibrations are intrinsically anharmonic, and as a consequence vibrational energies are systematically overestimated. A common empirical scheme for improving the agreement with experimental results rests on the application of scaling factors depending on the employed quantum chemical (QC) method and, possibly, on the range of investigated frequencies. However, those scaling factors, generally obtained from statistical analyses of fundamental transitions, do not improve intensities and are often not suitable for higher-quanta transitions.

The introduction of anharmonic effects requires a representation of potential energy surfaces (PESs) around stationary points beyond the quadratic approximation, introducing a series of higher-order terms involving one or more vibrational coordinates. In this context, a crucial point is the inclusion of couplings involving different vibrations, which have to be properly considered in order to obtain accurate estimates of line shapes, frequency shifts, etc. Computational strategies aimed at the study of vibrational properties of molecular systems are usually based on perturbative [2-9] or variational [10-12] approaches, or on their combinations [13,14]. Within variational methods, the energy is minimized starting from an expansion of the wave functions over a basis of known states. This class of methods includes the vibrational self-consistent-field (VSCF) [12,15-20] and vibrational configuration interaction (VCI) [21-23] approaches. While accurate variational approaches in conjunction with high-order force fields provide vibrational energies close to their experimental counterparts, their use is mainly limited to systems containing few atoms due to their unfavorable scaling with the size of the molecule [24]. With the aim of reducing the computational cost, several strategies have been devised, such as iterative subspace expansion algorithms [25-28] and protocols for the reduction of the VCI basis [29,30].

Over the last decades, several variational approaches have been proposed for the solution of the vibrational problem, not only relying on the Watson Hamiltonian but also exploiting different sets of nuclear coordinates. In particular, different approaches and sophisticated variational and semi-variational nuclear motion and dynamics frameworks (and codes) built upon the use of internal coordinates have been discussed by Carrington [31,32], Csaszar [33,34], Lauvergnat [35-37], Tennyson [38,39], and Yurchenko [40], among others. Variational approaches exploiting diffusion Monte Carlo methods in internal coordinates have also been investigated [41,42], as well as the use of optimized and localized coordinates [43,44], for example, in the vibrational coupled-cluster (VCC) framework [45]. Another option is the parametrization of the vibrational wavefunction in tensor formats such as canonical decomposition [46] and matrix product states. The resulting wavefunction representations can be optimized through the recent extension of the density matrix renormalization group (DMRG) [47,48] algorithm to the vibrational problem, the resulting method being referred to as vibrational DMRG (vDMRG) [49,50].
When comparing the various methods for including anharmonic effects [2, 3, 16-19, 21, 22, 51-53], those based on perturbation theory applied to the Watson Hamiltonian offer a remarkable balance between accuracy and computational cost, provided that resonance effects are properly taken into account. In particular, vibrational second-order perturbation theory (VPT2) [2,54], based on a fourth-order polynomial approximation of the potential energy in terms of normal coordinates, allows the study of medium-to-large molecular systems. Furthermore, the formulation of VPT2 based on the Van Vleck contact transformation method [55] enables the extension to a generalized model (GVPT2) [6]. Within this framework, a limited set of reduced-dimensionality Hamiltonians involving strongly interacting states is diagonalized, while the other states are corrected by second-order perturbative contributions. At the VPT2 level, the vibrational dependence of the rotational constants can also be expressed through a perturbative expansion, leading to the determination of specific vibro-rotational interaction constants and paving the way for the calculation of accurate molecular structures through the semi-experimental (SE) approach [56].
While VPT2 can be successfully applied in many cases, it is also characterized by some intrinsic drawbacks that prevent its use in some scenarios. As mentioned above, the first problem is related to the presence of resonances (both Fermi and Darling-Dennison), even though several computational strategies and robust schemes have been developed over the years [6,57-62], including for the treatment of vibrational intensities. Second, the theoretical framework largely depends on the symmetry of the molecule, since the presence of degenerate vibrations implies a distinct derivation of the equations [63,64]. As a consequence, different derivations and implementations of VPT2 are required for the treatment of systems belonging to different classes of symmetry. Recently, this issue was solved by the formulation of a unified model relying on the proper application of a posteriori transformations to the wavefunctions [65]. Last but not least, large amplitude motions (LAMs) are poorly described at the VPT2 level and, more generally, with a quartic force field. Unfortunately, variational approaches such as VCI, or more refined perturbative methods (such as VPT4 [66]), rapidly become prohibitive as the size of the molecular system increases, because of the more demanding anharmonic force field calculations and the increasing complexity of the vibrational calculation. On the other hand, low-dimensionality methods tailored for describing one or a limited number of LAMs [67-72] imply their separation from the rest of the vibrations, usually referred to as small amplitude motions (SAMs). As shown in a previous study [73], the definition of a suitable set of internal coordinates able to decouple these two classes of vibrations makes it possible to treat each normal mode through the most appropriate method. Starting from the available literature [74,75], the VPT2 framework in terms of curvilinear coordinates was developed and successfully applied to both semi-rigid and flexible systems, showing in the latter case not only that LAMs can be effectively decoupled from SAMs, but also that each LAM can be treated independently of the others. As a matter of fact, SAMs can be treated through the internal-based VPT2, while variational approaches can be employed for the calculation of the energy levels of the floppy degrees of freedom. When a single LAM is present, it can be effectively treated through the discrete variable representation (DVR) [76-83]. On the other hand, a higher number of LAMs can in principle be handled through the reaction-surface Hamiltonian (RSH) [84], the reaction-volume Hamiltonian (RVH) [85] or VCI-based methodologies.
In this paper, the versatility of internal coordinates and their application in both rotational and vibrational spectroscopy will be described. The paper is organized as follows. First, an overview of the Cartesian-based version of VPT2 will be provided, including the calculation of the vibrational contributions to the rotational constants required in the SE approach. Particular attention will be devoted to the use of different sets of internal coordinates in the nonlinear least-squares fit needed for the calculation of the SE equilibrium molecular geometry. Then, the focus will shift to the extension of VPT2 to the use of internal coordinates, from the calculation of the anharmonic energy levels to the definition of a robust scheme for the automatic identification of Fermi resonances. In this context, the importance of internal coordinates in reducing inter-mode couplings will be underlined through the application to systems of biological and technological interest.

Overview of the Cartesian-based VPT2
In this section, the unified Cartesian-based VPT2 framework for asymmetric, symmetric, linear, and spherical tops is described. In this theoretical model, all normal modes are treated independently of one another, regardless of the presence of vibrational degeneracies. When degeneracies are present, a series of linear transformations is employed a posteriori to determine the correct energies, wave functions and properties. The discussion is structured to give a broad overview of this methodology up to the derivation of the vibrational energy levels, also accounting for the presence of Fermi resonances.
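Let us consider a molecule with $N$ vibrational modes ($N$ being equal to $3N_a - 6$ for nonlinear molecules and to $3N_a - 5$ for linear ones, where $N_a$ is the number of atoms). The reference Hamiltonian is obtained by expanding the more general vibro-rotational Hamiltonian proposed by Watson [86] in terms of normal coordinates and then only considering the purely vibrational terms:

$$
\mathcal{H} = \frac{1}{2}\sum_{i}\omega_i\left(p_i^2 + q_i^2\right) + \frac{1}{6}\sum_{ijk} f_{ijk}\, q_i q_j q_k + \frac{1}{24}\sum_{ijkl} f_{ijkl}\, q_i q_j q_k q_l + \sum_{\tau} B^{eq}_{\tau} \sum_{ijkl} \zeta_{ij,\tau}\,\zeta_{kl,\tau} \sqrt{\frac{\omega_j \omega_l}{\omega_i \omega_k}}\; q_i p_j q_k p_l + U \tag{1}
$$

where $q_i$ is the $i$-th dimensionless normal coordinate, $p_i$ is its conjugate momentum, $\omega_i$ is the corresponding harmonic wavenumber (in cm$^{-1}$), $B^{eq}_{\tau}$ is the equilibrium rotational constant along the axis $\tau$ and $\boldsymbol{\zeta}_{\tau}$ is the anti-symmetric matrix of Coriolis couplings. The terms $f_{ijk}$ and $f_{ijkl}$ are, respectively, referred to as cubic and quartic force constants, defined by the following notation,

$$
f_{ijk} = \left(\frac{\partial^3 V}{\partial q_i\, \partial q_j\, \partial q_k}\right)_{eq}, \qquad f_{ijkl} = \left(\frac{\partial^4 V}{\partial q_i\, \partial q_j\, \partial q_k\, \partial q_l}\right)_{eq} \tag{2}
$$

where $V$ represents the potential energy, while $U$ is a mass-dependent term which does not contribute to the calculation of the transition energies and, for this reason, will not be considered from now on. Within the VPT2 framework, the vibrational Hamiltonian is expanded through a perturbative series up to the second order, the resulting Schrödinger equation being typically solved by means of two different approaches, namely the Rayleigh-Schrödinger (RSPT) and Van Vleck (CVPT) [55] perturbation theories. As a result, the VPT2 vibrational energy of a state $|v_R\rangle$ is

$$
E_R = \varepsilon_0 + \sum_{i} v_{R,i}\,\omega_i + \sum_{i}\sum_{j \geq i} \chi_{ij}\left[v_{R,i}\, v_{R,j} + \frac{1}{2}\left(v_{R,i} + v_{R,j}\right)\right] \tag{3}
$$

where $v_{R,i}$ is the number of quanta associated with the mode $i$ in the state $R$, and $\varepsilon_0$ is the anharmonic resonance-free zero-point vibrational energy (ZPVE) [9,87,88].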
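As a minimal numerical sketch of how the VPT2 term energy is assembled from harmonic wavenumbers and an anharmonic matrix, the Python function below evaluates the standard expression $E = \varepsilon_0 + \sum_i v_i \omega_i + \sum_{i \le j} \chi_{ij}[v_i v_j + (v_i + v_j)/2]$. The function name `vpt2_energy` and the two-mode numbers are hypothetical and chosen only for illustration; they are not taken from the systems discussed in this paper.

```python
import numpy as np

def vpt2_energy(quanta, omega, chi, eps0=0.0):
    """VPT2 term energy (cm^-1) of the state with the given vibrational quanta."""
    v = np.asarray(quanta, dtype=float)
    omega = np.asarray(omega, dtype=float)
    chi = np.asarray(chi, dtype=float)
    energy = eps0 + v @ omega
    n_modes = omega.size
    for i in range(n_modes):
        for j in range(i, n_modes):  # upper triangle only (j >= i)
            energy += chi[i, j] * (v[i] * v[j] + 0.5 * (v[i] + v[j]))
    return float(energy)

# Hypothetical two-mode system (all values in cm^-1, illustration only).
omega = [1000.0, 2000.0]
chi = [[-5.0, -2.0],
       [-2.0, -10.0]]

ground = vpt2_energy([0, 0], omega, chi)
fundamental_1 = vpt2_energy([1, 0], omega, chi) - ground  # omega_1 + 2*chi_11 + chi_12/2
overtone_1 = vpt2_energy([2, 0], omega, chi) - ground
```

With `eps0` left at zero the function returns energies relative to the anharmonic ground state, so transition energies are obtained as simple differences.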
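The matrix $\boldsymbol{\chi}$ contains the anharmonic corrections and is defined by the following expressions,

$$
16\,\chi_{ii} = f_{iiii} - \frac{5 f_{iii}^2}{3\omega_i} - \sum_{j \neq i} \frac{\left(8\omega_i^2 - 3\omega_j^2\right) f_{iij}^2}{\omega_j\left(4\omega_i^2 - \omega_j^2\right)} \tag{4}
$$

$$
4\,\chi_{ij} = f_{iijj} - \frac{2\omega_i f_{iij}^2}{4\omega_i^2 - \omega_j^2} - \frac{2\omega_j f_{ijj}^2}{4\omega_j^2 - \omega_i^2} - \frac{f_{iii} f_{ijj}}{\omega_i} - \frac{f_{jjj} f_{iij}}{\omega_j} + \sum_{k \neq i,j}\left[\frac{2\omega_k\left(\omega_i^2 + \omega_j^2 - \omega_k^2\right) f_{ijk}^2}{\left(\omega_i + \omega_j + \omega_k\right)\left(\omega_i - \omega_j - \omega_k\right)\left(\omega_i + \omega_j - \omega_k\right)\left(\omega_i - \omega_j + \omega_k\right)} - \frac{f_{iik} f_{jjk}}{\omega_k}\right] + \frac{4\left(\omega_i^2 + \omega_j^2\right)}{\omega_i \omega_j} \sum_{\tau} B^{eq}_{\tau}\, \zeta_{ij,\tau}^2 \tag{5}
$$

Inspection of Eqs. 4 and 5 shows that denominators approaching zero would lead to unphysical results. These conditions, collectively known as Fermi resonances (FRs), occur when $\omega_i \approx 2\omega_j$ or $\omega_i \approx \omega_j + \omega_k$. The first case is commonly referred to as type I, the second as type II [89]. Several strategies can be adopted to handle this problem. In the deperturbed VPT2 (DVPT2), a sequential screening of all potentially resonant terms is performed, and an analysis based on one or more criteria establishes which terms should be removed from the anharmonic calculation. Generally, this procedure is organized into two steps. The first one is the evaluation of the energetic proximity of the interacting states at the harmonic level,

$$
\left|\omega_k - \left(\omega_i + \omega_j\right)\right| \leq \Delta^{1-2} \tag{6}
$$

where $i$ and $j$ can be equal and $\Delta^{1-2}$ is a threshold defined a priori. Second, the overall weight of the term is estimated. Several strategies have been proposed for the latter [7,58,90]. In this work, the test proposed by Martin and co-workers [90] has been employed, leading to the following condition [91],

$$
\frac{f_{ijk}^4}{64\left(1 + \delta_{ij}\right)^2 \left|\omega_k - \left(\omega_i + \omega_j\right)\right|^3} \geq K^{1-2} \tag{7}
$$

where $K^{1-2}$ is a second threshold required in the procedure.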
In the DVPT2 scheme, each term fulfilling Eqs. 6 and 7 is labeled as resonant and removed from the anharmonic calculation. While this method avoids singularities in the calculation of the energy levels, it can lead to a truncated treatment, since the resonant terms are systematically neglected. In order to prevent this situation, the interaction terms related to FRs can be reintroduced in a subsequent variational step. For this purpose, a variational matrix $\mathbf{H}$ is built starting from the DVPT2 energies and the Van Vleck Hamiltonian interaction terms. A new set of energies is obtained by diagonalizing $\mathbf{H}$, the full procedure being referred to as GVPT2 F (hereafter, simply GVPT2).

Vibrational corrections
The rotational constants of a target molecule are among the most important parameters obtained from the study of rotational and vibro-rotational spectra, and they are inversely proportional to the principal moments of inertia. In general, the availability of this kind of data for different isotopic species allows the determination of structural parameters such as bond lengths and angles [92]. Nevertheless, since molecules are not rigid rotors and are subject to vibrational motion, the notion of a reference molecular geometry is far from simple. An important step forward was made by Pulay and co-workers [56] through the introduction of the SE approach for the determination of the equilibrium structure of a molecule. While more difficult to determine at the experimental level, this kind of structure properly accounts for vibrational effects and is independent of the isotopic species. Furthermore, it is directly comparable to quantum-mechanical data. Within the SE method, the structural parameters are obtained through a fit of the principal moments of inertia (or the corresponding rotational constants). Hence, the SE rotational constant $B^{SE}_{\tau}$ is obtained as follows [93],

$$
B^{SE}_{\tau} = B^{0,exp}_{\tau} - \left(\Delta B^{0}_{\tau}\right)^{QM} \tag{8}
$$

where $\tau = a, b, c$ is one of the principal inertial axes, while $B^{0,exp}_{\tau}$ and $(\Delta B^{0}_{\tau})^{QM}$ are, respectively, the experimental rotational constant and its correction along $\tau$. In general terms, $(\Delta B^{0}_{\tau})^{QM}$ has a double nature. It is composed of a vibrational contribution ($\Delta B^{vib}_{\tau}$) stemming from the VPT2 framework, and an electronic contribution ($\Delta B^{ele}_{\tau}$) [93-95] related to the effect of the electrons on the moments of inertia. In this context only the vibrational contribution will be analyzed, since the electronic term is often negligible and required only for highly accurate calculations.
The vibrational contribution can be derived from the contact-transformed Hamiltonian, leading to an explicit dependence of the rotational constants on the vibrational degrees of freedom [94],

$$
B^{v_R}_{\tau} = B^{eq}_{\tau} - \sum_{i} \alpha_{\tau,i}\left(v_{R,i} + \frac{1}{2}\right) \tag{9}
$$

where $B^{v_R}_{\tau}$ and $B^{eq}_{\tau}$ are the rotational constants associated with the vibrational state $|v_R\rangle$ and the corresponding equilibrium value, respectively, while $\alpha_{\tau,i}$ are the vibro-rotational interaction constants:

$$
\alpha_{\tau,i} = -\frac{2\left(B^{eq}_{\tau}\right)^2}{\omega_i}\left[\sum_{\eta}\frac{3\, a_{i,\tau\eta}^2}{4 I^{eq}_{\eta}} + \sum_{j \neq i} \zeta_{ij,\tau}^2\, \frac{3\omega_i^2 + \omega_j^2}{\omega_i^2 - \omega_j^2} + \pi\sqrt{\frac{c}{h}}\sum_{j}\frac{f_{iij}\, a_{j,\tau\tau}}{\omega_j^{3/2}}\right] \tag{10}
$$

where $a_{i,\tau\eta}$ are the derivatives of the effective moments of inertia tensor evaluated at the equilibrium geometry, $I^{eq}_{\eta}$ are the equilibrium principal moments of inertia, $c$ is the speed of light and $h$ is the Planck constant. When the vibrational ground state is considered, Eq. 9 can be rearranged to define the vibrational contribution $\Delta B^{vib}_{\tau} = B^{0}_{\tau} - B^{eq}_{\tau}$ as,

$$
\Delta B^{vib}_{\tau} = -\frac{1}{2}\sum_{i} \alpha_{\tau,i} \tag{11}
$$

As can be deduced from Eq. 10, the coefficients $\alpha_{\tau,i}$ are affected by resonances, occurring when $\omega_i \approx \omega_j$. However, in the calculation of the vibrational correction this issue is solved by recasting the Coriolis contribution as follows,

$$
\sum_{i}\sum_{j \neq i} \frac{\zeta_{ij,\tau}^2}{\omega_i}\, \frac{3\omega_i^2 + \omega_j^2}{\omega_i^2 - \omega_j^2} = -\sum_{i}\sum_{j > i} \zeta_{ij,\tau}^2\, \frac{\left(\omega_i - \omega_j\right)^2}{\omega_i\, \omega_j \left(\omega_i + \omega_j\right)} \tag{12}
$$

which no longer contains potentially vanishing denominators.

Once the set of SE rotational constants is assembled, a nonlinear least-squares fit is carried out in order to determine the molecular geometry, which is described by a proper set of nuclear coordinates. For this purpose, the theoretical framework has been implemented in the MSR (Molecular Structures Refinement) software [96-98], developed in our research group and designed for the determination of equilibrium structures through the SE approach. Over the last years, the MSR code has been employed for the characterization of numerous molecular systems, including sulfur-containing [96,99] and astrochemical systems [100], biological building blocks [97] and non-covalent complexes [101]. The program has been developed with the aim of being as general as possible, and is equipped with a series of features at different levels. First, a wide range of optimization algorithms (such as Gauss-Newton and Levenberg-Marquardt) has been included. Second, the so-called predicate observations [102-104] can be added to the set of reference data. Within this approach, the set of SE data is augmented with quantum-mechanical estimates of one or more parameters. This feature can be particularly advantageous when the number of isotopic species is not sufficiently large for a full structural characterization. Third, a detailed error analysis has been implemented, with the possibility to calculate, for example, the standard deviations of the individual parameters, the outliers and the condition number.
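The relations of this section lend themselves to a short numerical illustration. The sketch below, under stated assumptions, (i) builds a semi-experimental rotational constant from a ground-state constant and a set of vibro-rotational constants using $\Delta B^{vib} = -\tfrac{1}{2}\sum_i \alpha_i$, and (ii) checks numerically that the pairwise recast of the Coriolis sum, $-\sum_{i<j}\zeta_{ij}^2(\omega_i-\omega_j)^2/[\omega_i\omega_j(\omega_i+\omega_j)]$, equals the direct double sum, whose individual terms diverge when $\omega_i \approx \omega_j$. All function names and numbers are hypothetical and are not MSR routines or data.

```python
import numpy as np

def vibrational_correction(alpha):
    """Delta B_vib = B_0 - B_eq = -1/2 * sum_i alpha_i (same units as alpha)."""
    return -0.5 * float(np.sum(alpha))

def semi_experimental_constant(b0_exp, alpha, delta_b_ele=0.0):
    """B_SE = B_0(exp) - (Delta B_vib + Delta B_ele)."""
    return b0_exp - (vibrational_correction(alpha) + delta_b_ele)

def coriolis_sum_direct(omega, zeta):
    """Double sum over i != j; single terms blow up when omega_i ~ omega_j."""
    total = 0.0
    n = len(omega)
    for i in range(n):
        for j in range(n):
            if i != j:
                total += (zeta[i, j] ** 2 / omega[i]
                          * (3 * omega[i] ** 2 + omega[j] ** 2)
                          / (omega[i] ** 2 - omega[j] ** 2))
    return total

def coriolis_sum_recast(omega, zeta):
    """Equivalent pairwise form without vanishing denominators."""
    total = 0.0
    n = len(omega)
    for i in range(n):
        for j in range(i + 1, n):
            total -= (zeta[i, j] ** 2 * (omega[i] - omega[j]) ** 2
                      / (omega[i] * omega[j] * (omega[i] + omega[j])))
    return total

# Hypothetical data: B_0 and alpha in MHz, wavenumbers in cm^-1.
b_se = semi_experimental_constant(10000.0, [20.0, 30.0, -10.0])
omega = np.array([500.0, 1200.0, 3000.0])
zeta = np.array([[0.0, 0.3, -0.5],
                 [-0.3, 0.0, 0.2],
                 [0.5, -0.2, 0.0]])  # antisymmetric Coriolis matrix
direct = coriolis_sum_direct(omega, zeta)
recast = coriolis_sum_recast(omega, zeta)
```

The recast form is the one that remains well behaved for nearly degenerate pairs of modes, since the difference $\omega_i - \omega_j$ appears only in the numerator.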

Curvilinear coordinates formalism
As previously anticipated, the choice of a proper set of curvilinear internal coordinates plays a central role in vibro-rotational analyses. For the present discussion, let us consider a set of non-redundant internal coordinates $\mathbf{s}$. Since the latter are nonlinear functions of Cartesian coordinates, a commonly adopted strategy is a Taylor-series expansion around the reference structure,

$$
s_i = s_i^{eq} + \sum_{a} B_{ia}\left(x_a - x_a^{eq}\right) + \frac{1}{2}\sum_{ab} B'_{iab}\left(x_a - x_a^{eq}\right)\left(x_b - x_b^{eq}\right) + \dots \tag{13}
$$

where $\mathbf{x} = \{x_1, x_2, \dots, x_{3N_a}\}$ collects the nuclear Cartesian coordinates, $N_a$ is the number of atoms, and the Wilson $\mathbf{B}$ matrix and its Cartesian derivative $\mathbf{B}'$ have been introduced,

$$
B_{ia} = \left(\frac{\partial s_i}{\partial x_a}\right)_{eq}, \qquad B'_{iab} = \left(\frac{\partial^2 s_i}{\partial x_a\, \partial x_b}\right)_{eq} \tag{14}
$$

It is worth pointing out that Eq. 13 can be truncated at different levels depending on the area of study. In the present work, a first-order expansion is sufficient to obtain a set of internal coordinates for the structural refinement, while the Wilson $\mathbf{B}'$ tensor is required in the field of anharmonic vibrational spectroscopy.
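To make the first-order term of the expansion concrete, the following sketch builds the row of the Wilson B matrix associated with a single bond stretch, for which the derivatives with respect to the Cartesian coordinates are simply plus or minus the unit vector along the bond. The helper names and the water-like geometry (in Å) are hypothetical and serve only as an illustration.

```python
import numpy as np

def bond_length(x, a, b):
    """Distance between atoms a and b for a flat Cartesian vector x (3*N_a entries)."""
    xyz = np.asarray(x, dtype=float).reshape(-1, 3)
    return float(np.linalg.norm(xyz[a] - xyz[b]))

def bond_b_row(x, a, b):
    """Row of the Wilson B matrix for the a-b stretch: d r_ab / d x."""
    xyz = np.asarray(x, dtype=float).reshape(-1, 3)
    unit = (xyz[a] - xyz[b]) / np.linalg.norm(xyz[a] - xyz[b])
    row = np.zeros(xyz.size)
    row[3 * a:3 * a + 3] = unit    # derivative with respect to atom a
    row[3 * b:3 * b + 3] = -unit   # derivative with respect to atom b
    return row

# Hypothetical water-like geometry: O, H, H.
x_ref = np.array([0.0, 0.0, 0.0,
                  0.96, 0.0, 0.0,
                  -0.24, 0.93, 0.0])
row_oh = bond_b_row(x_ref, 0, 2)
```

Rows for valence and dihedral angles follow the same pattern with more involved analytic derivatives; a central finite difference of the coordinate value is a convenient way to validate any such row.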
Concerning the calculation of accurate molecular structures, Z-matrix internal coordinates (ZICs) represent the most common choice. ZICs are mostly based on chemical intuition; however, they are not uniquely defined. As a consequence, selecting a different set of ZICs can lead to a different outcome, which indicates a substantial user dependency. Moreover, poorly designed Z-matrices can prevent the optimization process from converging. In order to bypass this problem, a protocol based on molecular symmetry has been adopted. The symmetry is assumed to be maintained during the entire optimization procedure, and the formulation of suitable geometrical constraints applied to the internal coordinates is typically sufficient when ZICs are the reference set of coordinates. Although this approach has proven quite effective, the use of several dummy atoms may be necessary when studying complex chemical topologies, which would involve the addition of numerous degrees of freedom. From this perspective, a remarkable improvement can be achieved by incorporating the symmetry in the definition of the coordinates themselves. More specifically, the totally symmetric ($A_1$) coordinates selected from a non-redundant set of symmetry internal coordinates are used, while all the other ones are kept fixed at their guess values. In the MSR code, the delocalized internal coordinates (DICs) [105-109] are used as a reference set, since they are characterized by simple criteria for the identification of redundancies. In order to build DICs, the starting point is the definition of $N_r$ redundant internal coordinates, which usually correspond to the so-called primitive internal coordinates (PICs), given by the full list of all bond lengths, valence and dihedral angles. DICs are built as linear combinations of PICs, the transformation matrix $\mathbf{U}$ being generated by the eigenvectors of $\mathbf{B}\mathbf{B}^{\dagger}$ corresponding to non-null eigenvalues,

$$
\mathbf{B}\mathbf{B}^{\dagger}\left(\mathbf{U}\;\mathbf{R}\right) = \left(\mathbf{U}\;\mathbf{R}\right)\begin{pmatrix} \boldsymbol{\Lambda} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{pmatrix} \tag{15}
$$

where $\boldsymbol{\Lambda}$ is the diagonal matrix of the non-null eigenvalues, while $\mathbf{R}$ contains the redundancies.

The selection of $A_1$ coordinates is carried out through a multi-step procedure which is briefly outlined here, while a full description is reported in Ref. [97]. First, the guess structure is displaced along each DIC, and the resulting geometry is converted to its Cartesian counterpart. The latter is then subjected to all symmetry operations of the point group, and the corresponding displacement internal coordinate is marked as totally symmetric only if the geometry remains unchanged. Finally, the matrix product $\mathbf{B}\mathbf{B}^{\dagger}$ expressed in terms of DICs can be cast in a block-diagonal form, each block being related to an irreducible representation of the point group. Once all $A_1$ coordinates have been detected, their presence in the same block is checked as a further confirmation. The use of $A_1$ coordinates in the optimization presents two basic advantages. First, it provides a black-box definition of a non-redundant set of internal symmetry coordinates. Second, the generation of geometrical constraints is completely automated, so that even large molecular systems can be handled in a transparent way. With the aim of highlighting the effectiveness of this procedure, a complete study at different levels will be presented later. In particular, the stability of the fit with respect to the set of coordinates has been investigated. The accuracy of the proposed methodology will be shown through the analysis of systems of biological and astrochemical interest. Furthermore, our results will be compared with those of advanced, highly accurate protocols.
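The construction of DICs from a redundant set of primitives can be sketched in a few lines. The example below is a toy illustration, not the MSR implementation: it assumes a hypothetical triatomic with three bonds and three angles as primitives (six PICs for three internal degrees of freedom), builds the B matrix by central finite differences, and separates the eigenvectors of $\mathbf{B}\mathbf{B}^{\dagger}$ into the non-redundant block U and the redundant block R. The geometry, the step size and the eigenvalue tolerance are arbitrary choices.

```python
import numpy as np

def primitives(x):
    """Three bond lengths and three valence angles of a triatomic (redundant set)."""
    p = np.asarray(x, dtype=float).reshape(-1, 3)

    def angle(i, j, k):  # angle i-j-k with vertex j
        u, v = p[i] - p[j], p[k] - p[j]
        cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.arccos(np.clip(cos, -1.0, 1.0))

    bonds = [np.linalg.norm(p[i] - p[j]) for i, j in ((0, 1), (1, 2), (0, 2))]
    angles = [angle(1, 0, 2), angle(0, 1, 2), angle(0, 2, 1)]
    return np.array(bonds + angles)

def numerical_b_matrix(x, step=1e-5):
    """Wilson B matrix (N_r x 3N_a) by central finite differences."""
    x = np.asarray(x, dtype=float)
    columns = []
    for a in range(x.size):
        d = np.zeros_like(x)
        d[a] = step
        columns.append((primitives(x + d) - primitives(x - d)) / (2 * step))
    return np.array(columns).T

def delocalized_coordinates(b, tol=1e-8):
    """Split the eigenvectors of B B^T into non-redundant (U) and redundant (R) sets."""
    eigenvalues, vectors = np.linalg.eigh(b @ b.T)
    keep = eigenvalues > tol
    return vectors[:, keep], vectors[:, ~keep]

# Hypothetical scalene triangle (arbitrary length units).
x_ref = np.array([0.0, 0.0, 0.0,
                  1.0, 0.0, 0.0,
                  0.3, 0.9, 0.0])
b_prim = numerical_b_matrix(x_ref)
u_mat, r_mat = delocalized_coordinates(b_prim)
b_dic = u_mat.T @ b_prim  # B matrix expressed in terms of the DICs
```

For this nonlinear triatomic, U has three columns, matching the $3N_a - 6$ internal degrees of freedom, and the three columns of R span the redundancies.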

Internal-based VPT2 framework
In this section, the main aspects of the internal-based VPT2 framework are described, while interested readers are referred to Ref. [65] for a more detailed discussion.
In order to set up the expressions required for the calculation of VPT2 energies within the internal-based VPT2 framework, the first step is the definition of the vibrational Hamiltonian.At variance with the Cartesian-based formulation, the kinetic energy operator is not diagonal anymore, implying additional terms arising from the perturbative expansion.
Let us consider a set of $M$ internal coordinates $\mathbf{s} = \{s_1, s_2, \dots, s_M\}$ ($M \geq N$). The expression of the kinetic energy operator (KEO) in terms of $\mathbf{s}$ is well known and can be stated through the introduction of the Wilson $\mathbf{G}$ matrix,

$$
\mathbf{G} = \mathbf{B}\,\mathbf{M}^{-1}\mathbf{B}^{\dagger}
$$

where $\mathbf{M}$ is the diagonal matrix of atomic masses and $\mathbf{B}$ is built over the coordinates $\mathbf{s}$. By introducing $\tilde{g} = \det(\mathbf{G})$, the expression of the operator is [35,110,111]

$$
\mathcal{T} = -\frac{\hbar^2}{2}\sum_{ij}\frac{\partial}{\partial s_i}\, G_{ij}\, \frac{\partial}{\partial s_j} + V', \qquad V' = \frac{\hbar^2}{32}\sum_{ij}\left[G_{ij}\,\frac{\partial \ln\tilde{g}}{\partial s_i}\,\frac{\partial \ln\tilde{g}}{\partial s_j} - 4\,\frac{\partial}{\partial s_i}\left(G_{ij}\,\frac{\partial \ln\tilde{g}}{\partial s_j}\right)\right]
$$

where $\hbar$ is the reduced Planck constant and $V'$ is an inherently quantum-mechanical contribution, commonly known as the extra-potential term. Similarly to the Cartesian-based treatment, the unperturbed eigenstates stem from the harmonic theory of vibrations. Within the present framework, the harmonic frequencies and normal modes are obtained through the Wilson GF method [112]:

$$
\mathbf{G}\,\mathbf{F}\,\mathbf{L} = \mathbf{L}\,\boldsymbol{\Lambda}
$$

The diagonal elements of $\boldsymbol{\Lambda}$ are the squared angular frequencies, the matrix $\mathbf{L}$ contains the eigenvectors linking the normal coordinates $\mathbf{Q} = \{Q_1, Q_2, \dots, Q_N\}$ to the vector $\mathbf{s}$, while the Wilson $\mathbf{F}$ matrix is the Hessian of the potential energy in terms of the internal coordinates, and can be directly calculated from its Cartesian counterpart $\mathbf{H}_x$ (see Appendix A). Since the extra-potential term can generally be approximated with its equilibrium value, it can be neglected in the calculation of transition energies. Following the introduction of the customary dimensionless normal coordinates $\mathbf{q}$ and their conjugate momenta $\mathbf{p}$, the vibrational Hamiltonian $\mathcal{H}_v$ can be obtained,

$$
\mathcal{H}_v = \frac{1}{2}\sum_{ij} p_i\, g_{ij}\, p_j + V \tag{21}
$$

where $\mathbf{g}$ is the $\mathbf{G}$ matrix expressed in wavenumbers and $V$ is the potential energy operator. The anharmonic contributions due to the kinetic energy can be evaluated through the Taylor-series expansion of the $\mathbf{g}$ matrix,

$$
g_{ij} = g^{eq}_{ij} + \sum_{k} g_{ij,k}\, q_k + \frac{1}{2}\sum_{kl} g_{ij,kl}\, q_k q_l + \dots \tag{23}
$$

where $g^{eq}_{ij} = \omega_i \delta_{ij}$ and $g_{ij,k}$, $g_{ij,kl}$, ... represent the derivatives of the $\mathbf{g}$ matrix expressed in wavenumbers:

$$
g_{ij,k} = \left(\frac{\partial g_{ij}}{\partial q_k}\right)_{eq}, \qquad g_{ij,kl} = \left(\frac{\partial^2 g_{ij}}{\partial q_k\, \partial q_l}\right)_{eq}
$$

By inserting Eq. 23 into Eq. 21 and considering the expansion of the potential energy, the following expression is obtained:

$$
\mathcal{H}_v = \frac{1}{2}\sum_{i}\omega_i\left(p_i^2 + q_i^2\right) + \frac{1}{6}\sum_{ijk}\left(f_{ijk}\, q_i q_j q_k + 3\, g_{ij,k}\, p_i q_k p_j\right) + \frac{1}{24}\sum_{ijkl}\left(f_{ijkl}\, q_i q_j q_k q_l + 6\, g_{ij,kl}\, p_i q_k q_l p_j\right)
$$

Analogously to the treatment in terms of Cartesian-based normal coordinates (CNCs), the anharmonic energies can be obtained by either CVPT or RSPT. Furthermore, thanks to the invariance of the harmonic Hamiltonian, the second-quantization formalism can still be applied. The main difference with respect to the Cartesian-based framework is that each vibrational property $Q$ is composed of three contributions, namely a purely potential ($Q^{V}$), a purely kinetic ($Q^{T}$) and a cross term ($Q^{\times}$). Within the internal-based VPT2, the energy is provided by the same expression as Eq. 3, where the anharmonic matrix is expressed as the sum of the three contributions mentioned above:

$$
\boldsymbol{\chi} = \boldsymbol{\chi}^{V} + \boldsymbol{\chi}^{T} + \boldsymbol{\chi}^{\times}
$$

As expected, the purely potential term does not differ from its Cartesian counterpart except for the absence of the Coriolis term. The explicit expressions of all contributions are reported in the following, where a few misprints of Eqs. 35 and 37 of Ref. [73] have been corrected. By applying the partial fraction decomposition and introducing a set of auxiliary tensors, the elements of the $\boldsymbol{\chi}$ matrix can be rewritten in a more compact form, which is also required for the application of the DVPT2 scheme. One of the most interesting aspects of Eqs. 37 and 38 is their analogy with the Cartesian-based treatment from the algebraic point of view. In the first place, the internal-based elements of $\boldsymbol{\chi}$ show the same functional form as those introduced in Eqs. 4 and 5.
Consequently, the internal-based $\boldsymbol{\chi}$ matrix can be interpreted as a full-fledged generalization of the corresponding Cartesian counterpart, implying a remarkable simplification at the implementation level. In fact, the extension of any code implementing the Cartesian-based VPT2 becomes straightforward. Moreover, the diagnostic of Fermi resonances can be carried out through a direct extension of the Martin test. While the first step in the identification of FRs (see Eq. 6) does not require further changes, the second step (Eq. 39) becomes a straightforward generalization of Eq. 7, in which the cubic force constant $f_{ijk}$ is replaced by a generalized cubic term that also depends on the derivatives of the $\mathbf{g}$ matrix. It is evident that Eq. 7 can be generated from Eq. 39 by setting the derivatives of the $\mathbf{g}$ matrix to zero, since in that case the generalized cubic term reduces to $f_{ijk}$.
Once the set of FRs has been identified, the corresponding interaction elements of the contact-transformed Hamiltonian can be calculated, paving the way for the extension of the GVPT2 scheme to the use of internal coordinates. Let us underline that, thanks to the analogy between the Cartesian- and internal-based frameworks of VPT2 in terms of the analytical expressions of wave functions, vibrational energies and diagnostic of FRs, the extension of the internal-based VPT2 for asymmetric tops to the treatment of linear, symmetric and spherical tops can be carried out through the procedure described in Ref. [65].
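The harmonic step of this section, the Wilson GF eigenvalue problem $\mathbf{G}\mathbf{F}\mathbf{L} = \mathbf{L}\boldsymbol{\Lambda}$, can be sketched numerically as follows. Since the product GF is not symmetric, the sketch uses the equivalent symmetric problem for $\mathbf{G}^{1/2}\mathbf{F}\mathbf{G}^{1/2}$, which assumes a non-redundant coordinate set (positive-definite G). The function name and the 2x2 matrices are hypothetical, in arbitrary consistent units, and this is not the implementation used in this work.

```python
import numpy as np

def gf_analysis(g, f):
    """Solve G F L = L Lambda for symmetric, positive-definite G and symmetric F.

    Returns the eigenvalues (diagonal of Lambda, ascending) and the matrix L,
    normalized so that L L^T = G.
    """
    g = np.asarray(g, dtype=float)
    f = np.asarray(f, dtype=float)
    g_val, g_vec = np.linalg.eigh(g)
    g_half = g_vec @ np.diag(np.sqrt(g_val)) @ g_vec.T   # symmetric square root of G
    lam, c = np.linalg.eigh(g_half @ f @ g_half)         # symmetric eigenproblem
    return lam, g_half @ c

# Hypothetical two-coordinate example (arbitrary consistent units).
g_mat = np.array([[1.06, -0.02],
                  [-0.02, 1.10]])
f_mat = np.array([[0.50, 0.03],
                  [0.03, 0.40]])
lam, l_mat = gf_analysis(g_mat, f_mat)
```

With this normalization $\mathbf{L}^{\dagger}\mathbf{F}\mathbf{L} = \boldsymbol{\Lambda}$ as well, so the columns of `l_mat` play the role of the transformation between normal and internal coordinates.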

Implementation
In this section, the implementation of the algorithms employed in this work is briefly addressed, while a deeper discussion is reported elsewhere [65,97].

Internal-based VPT2 framework
A full calculation within the internal-based VPT2 framework is carried out in several steps and exploits an interplay of implementations within a recently devised standalone code and a development version of the quantum-chemistry package Gaussian. In particular, the standalone program performs the harmonic vibrational analysis, the generation of the single-point Gaussian input files required for the calculation of both potential and kinetic energy derivatives with respect to the normal coordinates by finite differences, and the calculation of such derivatives. On the other hand, the tasks carried out through the Gaussian package are the initial optimization, the calculation of Cartesian force constants at both equilibrium and out-of-equilibrium geometries, and the application of the VPT2 framework for the calculation of the anharmonic frequencies. In summary, the new program is mainly aimed at generating the anharmonic force field and the Wilson $\mathbf{g}$ derivatives, while the internal-based VPT2 framework has been implemented in the Gaussian package. Even though different quantum-chemistry programs can be employed for the generation of the derivatives required in the anharmonic treatment, in the present work the G16 [113] package has always been employed.
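The finite-difference step mentioned above can be illustrated with a short sketch in which cubic and semi-diagonal quartic force constants are obtained from Hessians evaluated at geometries displaced along each normal coordinate. To keep the example self-contained, the Hessian is supplied by a hypothetical quartic model potential rather than by a quantum-chemistry package; the step size, the function names and all numerical values are assumptions made for illustration only.

```python
import itertools
import numpy as np

def symmetrize(t):
    """Average a tensor over all index permutations (force constants are symmetric)."""
    perms = list(itertools.permutations(range(t.ndim)))
    return sum(np.transpose(t, p) for p in perms) / len(perms)

# Hypothetical model force field in dimensionless normal coordinates (cm^-1).
rng = np.random.default_rng(0)
OMEGA = np.array([500.0, 1200.0, 3000.0])
F3 = symmetrize(rng.normal(scale=50.0, size=(3, 3, 3)))
F4 = symmetrize(rng.normal(scale=10.0, size=(3, 3, 3, 3)))

def model_hessian(q):
    """Second derivatives of the model potential at displacement q."""
    return (np.diag(OMEGA)
            + np.einsum("ijk,i->jk", F3, q)
            + 0.5 * np.einsum("iljk,i,l->jk", F4, q, q))

def finite_difference_force_field(hessian, n_modes, step=0.01):
    """Cubic f_ijk and semi-diagonal quartic f_iijk from displaced Hessians."""
    h0 = hessian(np.zeros(n_modes))
    cubic = np.zeros((n_modes, n_modes, n_modes))
    quartic = np.zeros((n_modes, n_modes, n_modes))  # quartic[i, j, k] = f_iijk
    for i in range(n_modes):
        dq = np.zeros(n_modes)
        dq[i] = step
        h_plus, h_minus = hessian(dq), hessian(-dq)
        cubic[i] = (h_plus - h_minus) / (2.0 * step)
        quartic[i] = (h_plus + h_minus - 2.0 * h0) / step ** 2
    return cubic, quartic

cubic, quartic = finite_difference_force_field(model_hessian, 3)
```

Two displaced Hessians per mode give all cubic and semi-diagonal quartic constants, which is why the cost of this step grows linearly with the number of modes; the same differencing pattern would apply to the derivatives of the g matrix.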

Calculation of semi-experimental structures
As previously discussed, the main ingredients for the calculation of SE structures are the experimental rotational constants, vibrational corrections, weights and the guess geometry. In general, both the guess geometry and the vibrational corrections are evaluated through a quantum-chemistry package. As for the application of the VPT2 framework in internal coordinates, the G16 package has been used for this purpose. Once these preliminary operations have been carried out, the whole set of data is used to start the structural refinement by the MSR program, leading to the characterization of the molecular geometry.
Since the new code and MSR share a common set of libraries, they have recently been assembled in a single suite of programs for vibro-rotational analyses.
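As a minimal sketch of the structural refinement, the example below fits the bond length of a diatomic molecule to the rotational constants of two isotopologues with a weighted Gauss-Newton iteration. It is not the MSR algorithm: the data are synthetic (generated from an assumed bond length of 1.128 Å with CO-like reduced masses), the function names are hypothetical, and the conversion factor 16.857629 cm$^{-1}$ amu Å$^2$, approximately $h/(8\pi^2 c)$, is quoted as an assumption.

```python
import numpy as np

CONV = 16.857629  # cm^-1 amu Angstrom^2, approximately h / (8 pi^2 c) (assumed value)

def rotational_constants(r, mu):
    """B = CONV / (mu r^2) for a diatomic with reduced mass mu (amu) and bond length r (Angstrom)."""
    return CONV / (np.asarray(mu, dtype=float) * r ** 2)

def fit_bond_length(b_se, mu, r_guess, weights=None, max_iter=50, tol=1e-12):
    """Weighted Gauss-Newton fit of a single bond length to a set of rotational constants."""
    b_se = np.asarray(b_se, dtype=float)
    mu = np.asarray(mu, dtype=float)
    w = np.ones_like(b_se) if weights is None else np.asarray(weights, dtype=float)
    r = float(r_guess)
    for _ in range(max_iter):
        residual = b_se - rotational_constants(r, mu)
        jacobian = -2.0 * CONV / (mu * r ** 3)          # dB/dr for each isotopologue
        step = np.sum(w * jacobian * residual) / np.sum(w * jacobian ** 2)
        r += step
        if abs(step) < tol:
            break
    return r

# Synthetic data: two isotopologues with CO-like reduced masses (amu).
mu = np.array([6.8562, 7.1724])
b_synthetic = rotational_constants(1.128, mu)
r_fit = fit_bond_length(b_synthetic, mu, r_guess=1.20)
```

For a polyatomic molecule the scalar update becomes the solution of the normal equations built from the Jacobian of the rotational constants with respect to the chosen internal coordinates, which is where the choice of coordinates affects the conditioning of the fit.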

Computational details
Most of the available electronic structure methods implementing analytical second-order derivatives of the potential energy have been employed. The hybrid density functional B3PW91 [114] has been used in conjunction with the jul-cc-pVDZ (hereafter julDZ) basis set [115]. Furthermore, tight d functions have been included (julDZd) for the treatment of atoms belonging to the third period. The double-hybrid functional revDSD-PBEP86 [116] and second-order Møller-Plesset perturbation theory (MP2) [117] have been employed in conjunction with the jun-cc-pVTZ (hereafter junTZ) basis set [115]. The latter has been augmented with tight d functions (junTZd) for calculations involving third-period atoms. Empirical dispersion contributions have been systematically accounted for in density functional theory (DFT) computations by means of Grimme's D3 model with Becke-Johnson damping [118,119].

Results and discussion
Several systems have been analyzed in detail with the objective of validating the theoretical framework developed in the fields of both molecular structure prediction and vibrational spectroscopy. In this section, selected test cases will be discussed to highlight the main aspects characterizing the computational protocols developed so far.

Determination of SE equilibrium structures
As previously outlined, the choice of the set of coordinates represents one of the main steps needed to obtain reliable SE structures. While the effectiveness of ZICs for this type of calculation is now well recognized, the attention will be focused here on symmetry-based optimizations. With the aim of investigating different symmetries, the simplest Criegee intermediate (point group $C_s$) and thiophene ($C_{2v}$) have been selected as case studies. Furthermore, detailed analyses providing highly accurate SE structures are available in the literature for both systems. For this reason, they represent ideal test cases to validate the computational protocol discussed in the present work.
First, let us consider the Criegee intermediate (see Fig. 1), which is a planar system fully described by 7 degrees of freedom.
A detailed analysis has been performed by McCarthy and co-workers [120], in which the experimental rotational constants of nine isotopic species, combined with vibrational corrections at the CCSD(T)/ANO1 [121] level of theory, were employed to determine the SE structure. In the present work, the same set of experimental data has been used to compute both the effective ($r_0$) structure and, in conjunction with vibrational corrections at the rDSD/junTZ level, its SE counterpart. However, owing to the planarity of the system, only two of the three moments of inertia of each isotopologue are independent, so that 18 rotational constants have actually been included in the optimization. The fit has been performed employing both ZICs (see Section S1 of the Supplementary Information) and $A_1$-DICs as nuclear coordinates. Concerning the latter, a set of PICs has been generated from the molecular connectivity, followed by the calculation of DICs and the extraction of the totally symmetric coordinates. As expected for a planar $C_s$ system, the $A_1$-DICs correspond to those DICs which do not alter dihedral angles. Since $A_1$-DICs are much less chemically intuitive than ZICs, an error propagation has been applied to express both SE geometries in terms of the latter through a two-step procedure. First, the variance-covariance matrix expressed in terms of $A_1$-DICs ($\boldsymbol{\Sigma}_{A_1}$) has been converted to its Cartesian-based counterpart ($\boldsymbol{\Sigma}_x$) through the following expression,

$$\boldsymbol{\Sigma}_x = \mathbf{B}^{\dagger}_{A_1}\,\boldsymbol{\Sigma}_{A_1}\,\left(\mathbf{B}^{\dagger}_{A_1}\right)^{T}$$

where $\mathbf{B}^{\dagger}_{A_1}$ represents the pseudo-inverse of the Wilson B matrix expressed in terms of $A_1$-DICs and evaluated at the optimized geometry. Second, the $\boldsymbol{\Sigma}_x$ matrix has been converted to ZICs,

$$\boldsymbol{\Sigma}_Z = \mathbf{B}_Z\,\boldsymbol{\Sigma}_x\,\mathbf{B}_Z^{T} \qquad (41)$$

where $\mathbf{B}_Z$ is the Wilson B matrix in terms of ZICs at the optimized geometry. The new standard deviations and confidence intervals have been obtained starting from the square roots of the diagonal elements of $\boldsymbol{\Sigma}_Z$. The different sets of structural parameters are reported and compared with those proposed by McCarthy and co-workers in Table 1. Inspection of Table 1 shows that the SE geometries obtained through ZICs and $A_1$-DICs coincide exactly, both in structural parameters and in standard deviations, and, more importantly, that they are close to the reference values. Consequently, the symmetry-based approach reaches the same accuracy without requiring the setup of a user-defined Z-matrix.
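The two-step error propagation described above (from symmetry coordinates to Cartesian coordinates through the pseudo-inverse of the Wilson B matrix, and then to ZICs) can be sketched with a few lines of linear algebra. The example below is only a numerical illustration: the B matrices are random placeholders rather than matrices of a real molecule, the second set of coordinates is deliberately built as a linear combination of the first so that the expected result is known in closed form, and all names are hypothetical.

```python
import numpy as np

def propagate_covariance(cov_s, b_s, b_z):
    """Propagate a covariance matrix from coordinates s to coordinates z.

    cov_s : covariance in the fitted (e.g. symmetry-adapted) coordinates
    b_s   : Wilson B matrix of the fitted coordinates (n_s x 3N)
    b_z   : Wilson B matrix of the target coordinates (n_z x 3N)
    """
    b_s_pinv = np.linalg.pinv(b_s)           # (3N x n_s) pseudo-inverse
    cov_x = b_s_pinv @ cov_s @ b_s_pinv.T    # step 1: Cartesian covariance
    cov_z = b_z @ cov_x @ b_z.T              # step 2: covariance in target coordinates
    return cov_x, cov_z, np.sqrt(np.diag(cov_z))

# Placeholder data: 2 coordinates, 6 Cartesian components (not a real molecule).
rng = np.random.default_rng(0)
b_s = rng.normal(size=(2, 6))
transform = np.array([[1.0, 1.0], [1.0, -1.0]])   # z = transform @ s (assumed linear relation)
b_z = transform @ b_s
cov_s = np.array([[4.0e-6, 1.0e-6], [1.0e-6, 9.0e-6]])

cov_x, cov_z, std_z = propagate_covariance(cov_s, b_s, b_z)
expected_cov_z = transform @ cov_s @ transform.T
```

Because the target coordinates span the same space as the fitted ones in this toy case, the propagated matrix coincides with `expected_cov_z`, which provides a simple consistency check of the two-step procedure.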
A further validation of the proposed methodology has been performed by the structure determination of thiophene (see Fig. 2), a $C_{2v}$ heteroaromatic ring.
A first characterization of this system performed by our research group employed the experimental rotational constants of eight isotopologues combined with vibrational corrections based on the B2PLYP and B3LYP functionals in conjunction with the cc-pVTZ and SNSD basis sets, respectively [122]. More recently, a detailed experimental and theoretical investigation of thiophene has been carried out by Orr and co-workers [123]. In particular, the set of experimental data has been extended to include 26 isotopic species, which have been used together with vibrational and electronic corrections at the CCSD(T)/cc-pCVTZ level of theory for the structural refinement. In agreement with the calculations reported in Ref. [123], 24 isotopic species have been included in the present work. As for the Criegee intermediate, only two rotational constants have been considered for each isotopologue, leading to a set of 48 experimental rotational constants usable in the nonlinear regression. At variance with Ref. [123], only vibrational corrections at the B3P/julDZd level have been used to correct the experimental data, and the same level has been used to compute the guess structure for the optimization. A full comparison of the effective and SE structures with those proposed in Ref. [123] is reported in Table 2.
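Both fits rely on the fact that a rigid planar molecule has only two independent principal moments of inertia, because the out-of-plane moment equals the sum of the two in-plane ones ($I_c = I_a + I_b$). The short sketch below checks this relation numerically and reproduces the data counts quoted in the text (18 and 48). The planar geometry and the approximate masses are arbitrary placeholders, not the structures of the molecules studied here.

```python
import numpy as np

def principal_moments(masses, coords):
    """Principal moments of inertia (ascending) from masses and Cartesian coordinates."""
    masses = np.asarray(masses, dtype=float)
    coords = np.asarray(coords, dtype=float)
    r = coords - masses @ coords / masses.sum()   # centre-of-mass frame
    inertia = np.zeros((3, 3))
    for m, x in zip(masses, r):
        inertia += m * (np.dot(x, x) * np.eye(3) - np.outer(x, x))
    return np.linalg.eigvalsh(inertia)

def n_fit_data(n_isotopologues, planar):
    """Number of independent rotational constants entering the fit."""
    return (2 if planar else 3) * n_isotopologues

# Arbitrary planar arrangement of five atoms (all z coordinates are zero).
masses = [12.0, 15.995, 15.995, 1.008, 1.008]
coords = [[0.0, 0.0, 0.0], [1.2, 0.3, 0.0], [2.1, -0.6, 0.0],
          [-0.5, 0.9, 0.0], [-0.6, -0.8, 0.0]]
i_a, i_b, i_c = principal_moments(masses, coords)
planarity_defect = i_c - i_a - i_b                # vanishes for a rigid planar structure
```

For vibrationally averaged experimental constants the difference is not exactly zero (the inertial defect), which is one more reason for including only two constants per isotopologue.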
In analogy with the Criegee intermediate, both the SE equilibrium structure and the standard deviations are invariant under the change of coordinates. The search for totally symmetric coordinates provided the correct number of coordinates (8), and their use leads to the same structure obtained with the Z-matrix indicated in Ref. [123] and reported in Section S1 of the Supplementary Information. Last but not least, the SE structural parameters obtained in this work are in good agreement with those proposed by Orr and co-workers, despite the different nature of the corrections applied to the experimental rotational constants.

Anharmonic calculations in internal coordinates
With the aim of highlighting the advantages of an internal-based formulation of VPT2, our recently developed engine has been applied to both semi-rigid and flexible systems. In the former case, the choice of the coordinate set plays a more marginal role, even though it leads to a remarkable reduction of inter-mode couplings. This finding paves the way to the systematic use of cost-effective protocols for the calculation of vibrational properties. In fact, the reduction of couplings between different vibrations allows low-dimensionality anharmonic force fields to be used, thus alleviating the cost of the force-field calculation, which is the bottleneck of an anharmonic computation. This effect is even stronger when flexible systems are taken into account, especially for the couplings involving LAMs.
In the present work, 1,1-difluoroethylene (see Fig. 3) has been selected as a test case for semi-rigid systems.
The fundamental vibrational frequencies have been computed at the anharmonic level by employing all the VPT2 schemes discussed in the previous sections within both the internal- and Cartesian-based frameworks. A full comparison of the results with the experimental data is reported in Table 3.
As expected, the Cartesian- and internal-based calculations do not show any difference in the resulting transition frequencies. It is also worth specifying that Coriolis contributions have been properly considered in the calculations based on Cartesian coordinates. As a matter of fact, specific second-order terms in the expansion of the KEO yield contributions formally equivalent to the Coriolis terms. Although the GVPT2 results obtained with Cartesian and internal coordinates are numerically indistinguishable, the magnitude of the couplings is remarkably affected by the choice of the nuclear coordinates. This pattern has also been observed for terms which do not contribute directly to the calculation of anharmonic transition energies in this context, such as three-mode quartic force constants.
Inspection of Fig. 4 shows that the magnitude of inter-mode terms is clearly reduced when switching from the Cartesian-based representation of normal coordinates to that employing curvilinear coordinates. Furthermore, vibrational couplings are not transferred from the potential to the kinetic energy, since only six first-order derivatives of the Wilson matrix exceed 100 cm$^{-1}$, while all second-order derivatives are well below this value.
The AAT conformer of glycolic acid (see Fig. 5) has been selected as a flexible system presenting LAMs.
A recent detailed analysis of the conformers of glycolic acid [126] shows that most of them are characterized by the presence of this kind of vibrations, which are poorly described at the VPT2 level. In this work, the AAT conformer has been considered, since it is characterized by two distinct LAMs and thus represents an ideal system for studying also the inter-mode couplings between different LAMs. The simulations have been performed employing a dual-level method based on the substitution approach [73], where the anharmonic calculation at the B3P/julDZ level of theory has been refined by the inclusion of harmonic frequencies evaluated at the rDSD/junTZ level (hereafter rDSD/junTZ//B3P/julDZ). Furthermore, the reduced-dimensionality scheme has been used, so that anharmonic corrections have been applied to all vibrations but LAMs. The set of fundamental wavenumbers obtained within both the Cartesian- and internal-based GVPT2 frameworks is compared with the experimental counterparts in Table 4.
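The logic of a dual-level, reduced-dimensionality estimate can be conveyed by a deliberately simplified sketch. It uses an a posteriori additive correction, in which the anharmonic shift computed at the lower level is added to the higher-level harmonic wavenumber; this is an assumption made for illustration and is not necessarily equivalent to the substitution approach of Ref. [73], where the higher-level harmonic frequencies enter the VPT2 expressions directly. Modes flagged as LAMs receive no anharmonic correction. All names and numerical values are hypothetical.

```python
import numpy as np

def hybrid_fundamentals(omega_high, omega_low, nu_low, lam_modes=()):
    """Additive dual-level estimate of fundamental wavenumbers (cm^-1).

    omega_high : harmonic wavenumbers at the higher level of theory
    omega_low  : harmonic wavenumbers at the lower level of theory
    nu_low     : anharmonic fundamentals at the lower level of theory
    lam_modes  : zero-based indices of large amplitude motions, left uncorrected
    """
    omega_high = np.asarray(omega_high, dtype=float)
    shift = np.asarray(nu_low, dtype=float) - np.asarray(omega_low, dtype=float)
    for mode in lam_modes:
        shift[mode] = 0.0            # reduced dimensionality: no correction for LAMs
    return omega_high + shift

# Hypothetical two-mode example: one stretching mode and one LAM.
estimate = hybrid_fundamentals(omega_high=[3000.0, 100.0],
                               omega_low=[2990.0, 95.0],
                               nu_low=[2850.0, 60.0],
                               lam_modes=[1])
```

In this example the stretching mode inherits the lower-level shift of −140 cm$^{-1}$, whereas the LAM keeps its higher-level harmonic value.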
The data reported in Table 4 show that both sets of GVPT2 transition frequencies are in remarkable agreement with their experimental counterparts, with a mean absolute error (MAE) always lower than 10 cm$^{-1}$. Despite the similarity of the results for Cartesian and internal coordinates, the behavior of the inter-mode couplings is very different, especially when interaction terms involving LAMs are examined. In order to highlight this aspect, the contribution of LAMs has been investigated by comparing the quartic force constants involving modes 20 and 21, which are reported in Fig. 6.
It is apparent that both diagonal and off-diagonal terms are much larger in Cartesian coordinates (exceeding 80,000 cm$^{-1}$ for the 21,21,21,21 term), while the values are definitely more reasonable for the internal-based counterparts. Interestingly, this behavior also involves couplings between LAMs and stretching modes. As an example, the values of the semi-diagonal quartic force constants 20,20,1,1 and 21,21,1,1 expressed in CNCs are −3733 and −9347 cm$^{-1}$, respectively, while the corresponding internal-based counterparts are −27 and −62 cm$^{-1}$. Hence, the internal-based representation of vibrations not only improves the separation of LAMs from SAMs, but also lays the foundations for a multi-mode, one-dimensional treatment of LAMs.

Fig. 4 Comparison of the number of cubic (indices $ijk$, $i \neq j \neq k$) and quartic (indices $ijkk$, $i \neq j$) force constants of 1,1-difluoroethylene above a given threshold (in cm$^{-1}$) computed at the MP2/junTZ level with Cartesian or curvilinear coordinates
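The comparison of coupling magnitudes summarized in Fig. 4 amounts to counting how many inter-mode force constants exceed a given threshold in each coordinate representation. A minimal sketch of such a count is given below. The array layout (`cubic[i, j, k]` for the $ijk$ constant and `quartic[i, j, k]` for the $ijkk$ constant), the choice of counting each unordered index set once, and the toy values are assumptions made for illustration.

```python
import numpy as np
from itertools import combinations

def count_above_thresholds(cubic, quartic, thresholds):
    """Count inter-mode force constants whose magnitude exceeds each threshold.

    cubic[i, j, k]   : cubic constant ijk, counted for i < j < k (all indices distinct)
    quartic[i, j, k] : quartic constant ijkk, counted for i < j and any k
    Returns {threshold: (n_cubic, n_quartic)}.
    """
    n = cubic.shape[0]
    cubic_vals = np.array([abs(cubic[i, j, k])
                           for i, j, k in combinations(range(n), 3)])
    quartic_vals = np.array([abs(quartic[i, j, k])
                             for i, j in combinations(range(n), 2)
                             for k in range(n)])
    return {t: (int(np.sum(cubic_vals > t)), int(np.sum(quartic_vals > t)))
            for t in thresholds}

# Toy three-mode force field (values in cm^-1, purely illustrative).
cubic = np.zeros((3, 3, 3))
cubic[0, 1, 2] = 150.0
quartic = np.zeros((3, 3, 3))
quartic[0, 1, 2] = -250.0
quartic[0, 2, 0] = 40.0

counts = count_above_thresholds(cubic, quartic, thresholds=(10.0, 100.0, 200.0))
```

Running the same count on force fields expressed in Cartesian and in curvilinear normal coordinates gives the two curves that are compared at each threshold.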

Conclusion
In this work, I have presented a general engine for vibrational and rotational spectroscopy based on curvilinear internal coordinates.
Concerning the determination of semi-experimental molecular structures, the new methodology provides a transparent way for the a priori definition of the nuclear coordinates required in the optimization process and an unambiguous definition of geometrical constraints without loss of accuracy. The algorithm implemented in the MSR software enables SE nonlinear regression starting from a minimal set of input data without any intermediate, user-dependent step. The applications presented in this context confirmed our initial hypotheses and previous studies, paving the way for the study of larger systems.
The outcomes of a number of test cases demonstrate the reliability of the new GVPT2 engine based on curvilinear internal coordinates, which can effectively handle medium-to-large-size molecules. It is worth mentioning that this methodology is completely general for all electronic structure methods implementing analytical Cartesian force constants. Comparison with the standard Cartesian-based framework shows two main benefits. The first one is the ease of implementation starting from an existing code based on the Cartesian formulation of VPT2. The second one is a significant improvement in the separation of large amplitude motions, which facilitates their further treatment by more appropriate methodologies. In fact, a significant reduction in the inherent problems of VPT2 applied to this kind of systems has been verified at different levels. More sophisticated strategies for treating LAMs are currently under study in our research group, with the aim of studying molecular systems featuring a growing number of this type of vibrational motions.

Notes to Table 4: [127] (IR, Ar matrix), [128] (IR, N$_2$ matrix) and [129] (Raman, Ar matrix) [d] From Raman Ar matrix measurement (Ref. [129]) [e] From IR Ar matrix measurement (Ref. [127]) [f] Average of measurements of Refs. [127] (IR, Ar matrix) and [129] (Raman, Ar matrix)

Fig. 1 Molecular structure and atom labeling of the simplest Criegee intermediate

Notes to Table 1: [a] Equilibrium geometry at the rDSD/junTZ level [b] All fits have been performed on moments of inertia equally weighted [c] Vibrational corrections at the rDSD/junTZ level of theory [d] Vibrational corrections at the CCSD(T)/ANO1 level of theory [e] Mean standard deviation (u Å$^2$) [f] r = n − p is the number of degrees of freedom, n and p being the number of SE data and parameters, respectively

Fig. 2 Molecular structure and atom labeling of thiophene

Notes to Table 2: [a] Equilibrium geometry at the B3P/julDZd level [b] All fits have been performed on moments of inertia equally weighted [c] Vibrational corrections at the B3P/julDZd level of theory [d] Vibrational and electronic corrections at the CCSD(T)/cc-pCVTZ level of theory [e] Mean standard deviation (u Å$^2$) [f] r = n − p is the number of degrees of freedom, n and p being the number of SE data and parameters, respectively

Table 1
Equilibrium molecular structure of the simplest Criegee intermediate (distances in Å, angles in degrees)

Table 2
Equilibrium molecular structure of thiophene (distances in Å, angles in degrees)

Fig. 6 Comparison of the Cartesian (top panel) and curvilinear (bottom panel) quartic force constants of the AAT conformer of glycolic acid involving modes 20 and 21 at the rDSD/junTZ//B3P/julDZ level of theory