Electronic circular dichroism of proteins computed using a diabatisation scheme

ABSTRACT Circular dichroism (CD) spectroscopy is a powerful technique employed to study the structure of biomolecules. More accurate calculation of CD from first principles will aid both computational and experimental studies of protein structure and dynamics. We apply a diabatisation scheme to improve the description of nearest neighbour interactions between two electronic transitions (nπ* and πnbπ*) localised on each individual peptide bond (amide group) in a protein. These interactions are incorporated into DichroCalc, an exciton-based computational method to calculate CD, and yield improvements over the standard DichroCalc parameter set, particularly for calculation of CD for important secondary structural elements such as an α helix. GRAPHICAL ABSTRACT


Introduction
Calculations from first principles of the electronic circular dichroism (CD) spectroscopy of proteins are challenging. It is a topic that has attracted the interest of many researchers, including the late Nick Besley [1], to whose memory this Special Issue is dedicated. In this paper, we present a new theoretical development, but we begin by surveying some of the contributions of Besley to the field, which underpin the current study. The CD spectra of proteins in the far-ultraviolet (far-UV) have distinct signatures, reflecting the protein secondary structure, which arise from the regular, repeating arrays of backbone amide chromophores. Thus, CD is a widely used biophysical technique for characterising proteins in solution [2]. Often, statistical techniques are used to estimate the secondary structure content from the experimentally determined CD spectrum [3,4]. However, our focus is on the reverse, namely, computing the CD spectrum from the atomic coordinates of the protein.
CD spectra can be computed from protein structures in various ways, e.g. see [5][6][7][8][9][10], some of which have been reviewed elsewhere [11]. We focus here on the exciton framework, whereby an effective Hamiltonian is constructed through a consideration of localised chromophores within the protein [12,13]. The peptide bond is the most important contributor to the far-UV CD spectra of proteins. A simple effective Hamiltonian would consider the nπ * and π nb π * electronic transitions of the amide chromophore, which occur at about 222 and 190 nm, respectively. The π nb , n and π * molecular orbitals for an example diamide are depicted in Figure 1.
A desire to improve the quality of protein CD calculations that use the exciton framework motivated several studies of the electronic structure of small amides using ab initio multi-configurational methods, including complete active space self-consistent field (CASSCF) calculations on formamide [14] and N-methylacetamide [15] in the gas phase. Analogous calculations with various models of solvent showed subtle, but important, changes to the electronic transitions, including, for example, differences in the orientation of the π nb π * electric transition dipole moment compared to the gas phase [16,17]. This factor was influential in leading to an improvement in the accuracy of calculations of protein CD spectra based on parameterisation of the ab initio descriptions of the amide chromophore [18][19][20] compared to spectra calculated using parameters from semi-empirical calculations [21]. The computational methodology for protein CD calculations and the chromophore parameters are with an isosurface cut-off of 0.05. Orbitals from a state-averaged CASSCF calculation as described in reference 32. (a) π * for amide 1, (b) n for amide 1, (c) π nb for amide 1, (d) π * for amide 2, (e) n for amide 2 and (f) π nb for amide 2.
available via a web-interface [22], which allows the nonexpert simply to upload a Protein Data Bank file with the coordinates of a protein structure and to compute the CD spectrum. The theoretical background and related developments have been summarised elsewhere [11,23] and the approach has been used, sometimes in conjunction with molecular dynamics simulations, to study a diverse range of systems, e.g. glycosylated antimicrobial peptides [24], SARS-CoV-2 proteins [25] and chymotrypsin adsorbed on silica [26].
The above discussion has focused on two localised electronic transitions, the nπ * and π nb π * excitations, on an individual amide chromophore. Whilst the computational approach based on this approximation has been successful, additional insights have been gained through the study of cyclic [27][28][29] and linear diamides [30][31][32]. Electronic excitations with higher transition energies may be important [33] and in this region charge transfer transitions between orbitals on neighbouring peptide groups are also observed. Quantum chemical calculations at the CASSCF level on a linear diamide enabled the generation of parameters describing the charge transfer transitions [31]. Protein CD calculations incorporating these transitions gave an encouraging level of agreement with synchrotron radiation CD spectra [34] measured down to 170 nm. The CASSCF calculations on the diamide provide an additional opportunity, which is the subject of our study, namely, to consider the interaction between two covalently bound amide chromophores at the quantum chemical level using a procedure known as diabatisation.
The effective Hamiltonian matrix in the exciton framework comprises diagonal elements corresponding to the energies of monomer (diabatic) states and off-diagonal elements. The latter are inter-state interactions or couplings. Whilst the isolated monomer, i.e. N-methylacetamide, appears to be a good choice of diabatic state with the couplings computed using a Coulombic model, an alternative approach would be to apply a diabatisation scheme to a full ab initio calculation on a diamide model, whereby the eigenstates of the diamide are transformed to a diabatic basis. This method is capable of treating multiple excited states and also taking into account both short-and long-range effects of the excitonic couplings. Thus, an orthogonal transformation is used to connect the adiabatic excited states with a set of diabatic excited states. There is no single 'ideal' set of diabatic states [35], but several strategies can be adopted to define the adiabatic-to-diabatic transformation. The fragment excitation difference scheme uses the so-called excitation density [36]. A variant of this and some extensions are available in the EXAT software [37]. An alternative strategy derives the couplings from the energy splitting between the excited states and the extent of exciton delocalisation between the fragments [38]. Other approaches have maximised the overlap with a reference wavefunction [39] or the similarity with the dipole moment [40]. Diabatisation approaches have been applied to aggregates of anthracene and tetracene [41], to aggregates of propeller-shaped emitters [42] using the fromage software [43], dimers of ethylene and derivatives [44] and to various nucleic acid systems [45,46]. Herein, we apply a diabatisation scheme to compute the CD of some polypeptide structures corresponding to conformations of particular significance, including α-helices, β-strands, polyproline II helices and 3 10 -helices.
Proteins rich in β-strands can be separated into two distinct classes: β-I and β-II [47]. This classification can be made based on their CD spectra [47,48]. β-I proteins, e.g. concanavalin A, contain regular β strands and give rise to CD spectra with a positive band at 195 nm and a negative band at 215-220 nm. β-II proteins, e.g. elastase, yield CD spectra reminiscent of unordered polypeptides with a negative band around 198 nm [49,50]. It is thought that β-II proteins may adopt more polyproline II conformation [50], in addition to the possibility that they are more conformationally labile. Accurate computation of the CD spectrum of the polyproline II conformation would, thus, help provide useful insight into the spectra of β-II proteins. Another conformation of interest is the 3 10 helix, which, despite some structural similarity to the α helix, has a distinct CD spectrum [51].

Diabatisation
A diabatisation procedure was performed to generate more accurate inter-amide couplings between two electronic transitions (nπ * and π nb π * ) localised on each of the two amide monomers when in a dimeric system (a diamide). The procedure follows that described by Aragó and Troisi [41], which is based on orthogonal transformation of the adiabatic states (eigenstates of the total Hamiltonian) to diabatic (noninteracting) states for a two-level dimer system and generalisable to systems with an arbitrary number of interacting states (four in this study). For two monomers (amides) labelled 1 and 2 that combine, via a peptide bond, as a dimer (a diamide), consider four singlet excited states on the dimer and two singlet excited states on each of the monomers. The four adiabatic excited states on the dimer are the n 1 π 1 * and π nb1 π 1 * localised on amide 1 in the dimer and the n 2 π 2 * and π nb2 π 2 * localised on amide 2 in the dimer. The two excited states on each of the two isolated monomers are nπ * and π nb π * . The diabatic Hamiltonian H D is found by a transformation of the adiabatic Hamiltonian H A , thus: The unitary transformation matrix C relates the diabatic states (on the dimer) to the adiabatic states (on the dimer). It is desirable that the diabatic wave functions be as close as possible to the initial noninteracting wave functions. The unitary matrix C is computed by minimising the difference between the adiabatic transition dipole moments (TDMs) on the dimer and the isolated monomer TDMs in the coordinate frame of the dimer [41]. Computation of matrix C proceeds as follows [41]. The matrix M is formed from a product of matrices representing the adiabatic TDMs on the dimer, µ A , and the isolated TDMs on each of the monomers, µ ISO , Singular valued decomposition is carried out on matrix M to find the matrices U and V * . These are used to obtain C * ,best , where C * ,best is the transpose of C best which is defined as the best unitary transformation satisfying the following equation, where ||·|| denotes the Frobenius norm. The adiabatic Hamiltonian is diagonal, where the diagonal elements are the transition energies of the excited states in the dimeric system. The output of the diabatisation is the diabatic Hamiltonian with transition energies on the diagonal and the coupling between excited states in the diabatic representation in the offdiagonal matrix elements. The amide is the monomer and the diamide is the dimer. The monomer and the dimer state properties (TDMs) and adiabatic Hamiltonian are listed as matrices in Table S1. The input properties are from DichroCalc parameter sets for the amide N-methylacetamide (set NMA4FIT2) [17] and the diamide N-acetylglycine N -methylamide (13 sets for various ϕ and ψ dihedral angles of the diamide) [31,32]. The TDMs for the two monomers (from the above parameter set) are transformed onto the coordinate frame of the dimer prior to diabatisation. The 13 pairs of ϕ and ψ angles and the corresponding parameter sets for the diamide are listed in Table 1. The geometries were chosen to span the Ramanchandran plot. The diabatic Hamiltonian output from the diabatisation contains off-diagonal matrix elements that are the couplings between the states on each monomer in the system of the DichroCalc parameter set [22] 180 180 Polyproline II 2m # CTP2007A # Notation following Oakley et al. [32] * Set exclusive to Oakley and Hirst [31].
dimer, and it is these quantities that are used to modify the computation of CD (discussed below). The diabatic Hamiltonian matrices are listed in Table S1.

Model peptides
13 model idealised peptide structures, each comprising 20 L-alanine residues, with different repeating ϕ and ψ dihedral angles to describe differing proteinaceous secondary structure elements were constructed using the molecular editor Avogadro [52,53]. The 13 model peptides were considered based on the availability of DichroCalc parameter sets that describe intra-amide and inter-amide (charge-transfer) electronic transitions in a diamide (N-acetylglycine N -methylamide) of varying ϕ and ψ angles [31,32], for which the diabatisation procedure was carried out. The diamides are listed in Table 1.
The naming of diamides 2l and 2m follows the labelling sequence of Oakley et al. [32], who studied diamides 2a through to 2k ( Table 1). The three α-helical diamide structures 2j, 2k and 2l are, respectively, the Corey-Pauling-Branson helix, an idealised helix and the Barlow-Thornton helix [54]. Figure 2 shows a Ramachandran plot of the ϕ and ψ dihedral angles for the 13 structures. For clarity, we adopt a notation for each diamide and peptide with the ϕ and ψ dihedral angles in square brackets, e.g. for 2d the notation is 2d [−135°, 135°].

CD calculations
DichroCalc [22] was used to compute the CD spectra for the 13 model peptides. A modified version of the DichroCalc code was used to read a Hamiltonian matrix where, for each peptide, the nearest neighbour interamide interactions from the diabatisation procedure replaced the default off-diagonal matrix elements generated by DichroCalc. The theoretical approach utilised  Table 1). Inset is the structure of peptide 2k by DichroCalc, as well as alternative approaches to the computation of CD for biomolecules, has been described in detail in a recent review [11]. Briefly, the Dichro-Calc software implements the exciton framework, with the transition electron densities associated with excitations of biological chromophores represented by a set of point charges (monopoles) that reproduce the electrostatic potential arising from the electron density. The parameters (transition energies, electric and magnetic transition dipole moments, electric permanent dipole moments and monopole charge sets) for the amide backbone are derived from ab initio calculations (parameter set NMA4FIT2) [17] and describe the intra-amide nπ * transition (at 220 nm) and the π nb π * transition (at 193 nm). To obtain a CD spectrum, a rotational strength line spectrum, computed by DichroCalc, is convoluted with Gaussian functions of full-width half-maximum (FWHM) of typically, e.g. 9.0 or 12.5 nm. Throughout the rest of the paper, results using the un-modified Dichro-Calc Hamiltonian are termed 'ab initio' and the results obtained from modifying the off-diagonal elements of the DichroCalc Hamiltonian are termed 'diabatic'.  Table S1 displays, for each of the 13 diamides, the three input matrices for the diabatisation procedure: the monomer TDMs, the dimer TDMs and the adiabatic Hamiltonian of the dimer. Table S1 also shows the outputted diabatic Hamiltonian, where the off-diagonal matrix elements 1-2 and 3-4 are intra-amide coupling interactions (for n 1 π 1 * with π nb1 π 1 * and n 2 π 2 * with π nb2 π 2 * transitions) and where matrix elements 1-3, 1-4, 2-3 and 2-4 are inter-amide coupling interactions (for n 1 π 1 * with n 2 π 2 * , n 1 π 1 * with π nb2 π 2 * , π nb1 π 1 * with n 2 π 2 * and π nb1 π 1 * with π nb2 π 2 * transitions, respectively).

Results
The ab initio and the diabatic inter-amide interactions (off-diagonal elements of the Hamiltonian matrix) are shown in Table 2 Table 2. Inter-amide interactions. Off-diagonal elements of the Hamiltonian (cm −1 ). DichroCalc ab initio matrix elements in the first row and diabatic matrix elements in the second row for each diamide geometry. The ab initio parameters have nπ * and π nb π * transition energies at 220 nm (45455 cm −1 ) and 193 nm (51813 cm −1 ), respectively. Diamide (ϕ, ψ) n 1 π 1 * :n 2 π 2 * n 1 π 1 * :π nb2 π 2 * π nb1 π 1 * :n 2 π 2 * π nb1 π 1 * :π nb2 π 2 * with π nb2 π 2 * ) having the largest difference between the ab initio and diabatic Hamiltonians. The Coulombic coupling of two electronic transitions is related to the dot product of the transition dipole moments located on monomers A and B [55], V dd is the coupling between the two dipoles µ A and µ B , and R AB is the distance between the two monomers A and B. If the diabatic coupling is different from the ab initio coupling and the dot product of the transition dipole moments is similar in magnitude, then the coupling may have a short-range non-Coulombic interaction that is not described by the interaction of two dipole moments (or two sets of monopole charges) from a monomeric calculation. The diabatic coupling is from diabatisation of the adiabatic states on a complete dimer, including all the electrons in the system, using a basis set (for the diabatic states) of the monomer transition dipole moments. For cases where the ab initio and diabatic coupling differs and the magnitude of the dot product of the transition dipole moments differ, the diabatic states may contain a more accurate description of Coulombic interactions between the two transitions, due to the more accurate wave function of the adiabatic states, i.e. for the complete dimer. Table 3 Table S5.

Discussion α helices
The experimental spectrum for an α helix possesses an intense positive band peak at 190 nm and a double minimum band with the negative band peaks at 208 and 220 nm. The CD spectra computed for the α-helical peptides  Table 3. Dot products of the un-normalised transition dipole moments (D 2 ) for six of the 13 diamide structures (Table S1). Upper triangle of Hamiltonian H ij . Intra-amide interactions are 1-2 and 3-4. Inter-amide interactions are 1-3, 1-4, 2-3 and 2-4. Transition dipole moments are not normalised.

β strand
The experimental CD spectrum for a β strand features a positive band at around 195 nm and a negative band at around 215-220 nm. The 2d [−135°, 135°] peptide has a computed CD spectrum using the diabatic interamide interactions that does not possess this feature in the 215-220 nm region, whereas the CD spectrum using the ab initio inter-amide interactions displays a degree of the correct character in this region ( Figure 3) as well as a positive band at 186 nm.
Due to the arbitrariness of the phase of the wave functions, the signs of the excitonic coupling from the diabatisation are not well defined. A bimodal distribution (plus or minus) for the excitonic coupling is often observed (see, e.g. Figure 14 from Jiang et al. [56]). Thus, we usually select the signs of the diabatic coupling such that the computed diabatic CD spectra best agree with experimental and/or CD spectra computed using DichroCalc. To examine the differences between the two predicted spectra, the sign and/or magnitude of the three coupling interactions discussed above (1-4, 2-3 and 2-4; respectively, n 1 π 1 * with π nb2 π 2 * , π nb1 π 1 * with n 2 π 2 * and π nb1 π 1 * with π nb2 π 2 * ) in the diabatic Hamiltonian were varied so that the negative band at 215-220 nm could be observed in the computed CD spectra (Figure 4). Table 4 Table 2). The dot product of the un-normalised transition dipole moments for these couplings are, respectively, −0.032, −0.350, −0.342 and -2.037 D 2 for the monomer and -0.036, 0.407, −0.399 and 2.842 D 2 for the dimer (Table 3). Using the rationale outlined in the Methods section, H 14 , H 23 and H 24 may have non-Coulombic interactions in the diabatic representation, with H 24 having significant non-Coulombic interactions and additional Coulombic interactions in the diabatic representation. As there are little differences in the coupling values and dot products for the H 13 coupling, this is likely a Coulombic interaction.

helix
The two computed spectra for diamide 2i [−74°, −4°] both feature an intense couplet centred around 197 nm for the ab initio spectrum and centred around 190 nm for the diabatic spectrum (Figure 3), which has the relatively more intense couplet of the two spectra. For the ab initio spectrum, the couplet has a positive peak at 189 nm and a negative peak at 203 nm, which has a shoulder region around 220 nm. The couplet in the diabatic spectrum consists of a negative peak at 184 nm and a positive peak at 197 nm. This spectrum also features a relatively shallow and broad negative band around 218 nm (where the shoulder in the ab initio spectrum is situated).

Polyproline II
The ab initio spectrum for diamide 2m [−75°, 145°] features an intense couplet centred around 188 nm with a negative band at 181 nm and a positive band at 194 nm ( Figure 3). The spectrum also has a shallow negative band at 208 nm and a weak and broad positive band at 220 nm. In contrast, the diabatic spectrum possesses a positive band at 186 nm and two negative bands at 199 and 220 nm with the latter negative band being broader and shallower than the former.
The coupling values for H 13 , H 14 , H 23 and H 24 are, respectively, 31, −7, 16 and −971 cm −1 for the monomer and 81, 356, 145 and 29 cm −1 for the dimer ( Table 2). The dot product of the un-normalised TDMs for these couplings are, respectively, −0.002, 0.005, −0.619 and -2.865 D 2 for the monomer and 0.048, 0.543, −0.610 and −4.660 D 2 for the dimer ( Table 3). As noted for the β strand dimer (2d [−135°, 135°]), H 14 , H 23 and H 24 may have non-Coulombic interactions in the diabatic representation, with H 24 having significant non-Coulombic interactions and additional Coulombic interactions in the diabatic representation. As there are little differences in the coupling values and dot products for the H 13 coupling, this is likely a Coulombic interaction.
The experimental CD spectrum of a polyproline II structure (poly-L-proline) features a negative peak in the region around 200 nm and a positive peak at 220 nm [57]. The computed diabatic CD spectrum features a negative band at 199 nm, in agreement with experiment, and a negative band at 220 nm, in disagreement with experiment.

Future work and outlook
Future work will involve derivation of an empirical correction term (for inter-amide couplings) to incorporate into the DichroCalc NMA4FIT2 parameter set, based on the diabatisation work presented here. The empirical correction will follow the work of Scholes and Ghiggino [58], who defined a distance-based exponential term for short-range, non-Coulombic interactions between two chromophores, and Voityuk [55], who derived expressions to compute the electronic coupling based on vertical transition energies, transition dipole moments (of the system and of individual chromophores) and the relative orientation of the chromophores or individual transition dipole moments. The correction term will account for the short-range, non-Coulombic interactions between excited states on neighbouring amides, an interaction currently neglected in the NMA4FIT2 parameter set and which has been shown, in this study, to be a significant contributor toward computing more accurate CD spectra, e.g. the double minimum for an α-helical polypeptide.