A structural role for tryptophan in proteins, and the ubiquitous Trp Cδ1—H⋯O=C (backbone) hydrogen bond

A set of hydrogen bonds which occur in proteins between the Cδ1—H donor group of the tryptophan indole and main-chain carbonyl O atoms separated by 1–4 peptide units were characterized. These interactions stabilize unique and functionally important structural motifs and noncanonical side-chain conformations of tryptophan.


Introduction
Tryptophan (Trp) is the largest amino acid, with important functional roles in proteins.It is often found at proteinprotein interfaces, such as antibody-antigen interfaces, accounting for tight interactions and specificity (Samanta & Chakrabarti, 2001), and is ubiquitous in the ligand/substratebinding sites of, for example, lectins and various enzymes (Zhang et al., 2004;Spier & Lummis, 2000).It is also enriched on the surface of membrane proteins embedded in the lipid membrane, where its hydrophobic indole moiety interacts intimately with the lipid phase (Khemaissa et al., 2021).The multiple functions of Trp are contingent on the conformation that it adopts in the active site or at the interface.Consequently, understanding the nature of the forces stabilizing the discrete conformations of this amino acid is essential in structural biology and drug discovery.
The structure of Trp is defined by four dihedral angles (Fig. 1): the backbone Ramachandran ' and angles and the two side-chain dihedral angles � 1 and � 2 .The first, � 1 , is a rotameric angle with minimum energies at À 60 � (gÀ , or m), +60 � (g+, or p) and 180 � (trans, or t).In contrast, � 2 , which involves the sp 2 �-carbon, should in theory only assume values of À 90 � or +90 � .However, it was noted early on that a significant cohort of Trp residues in proteins exhibit an unfavourable m0 conformation (Lovell et al., 2000; we follow the notation introduced by Lovell and coworkers here, where the letter m, p and t is followed by the value of the � 2 angle).Recent results reaffirm that m0 constitutes �10% of the Trp conformers in proteins, while m95 and t-105 dominate the conformational space, with a combined frequency of 65.7% (Hameduh et al., 2023).The question that arises is what are the noncovalent interactions that are responsible for stabilizing the conformations of Trp, especially noncanonical conformations.In the first attempt to address this question, Petrella & Karplus (2004) studied 25 protein crystal structures determined at a resolution of 2.0 A ˚or higher.Based on observed stereochemistry and molecular-dynamics calculations, they concluded that C-H� � �O hydrogen bonds, including those with Trp C �1 -H as a donor, were involved in stabilizing the m0 conformation.In contrast, a subsequent more comprehensive study of nonredundant protein crystal structures determined to better than 2.5 A ˚resolution concluded that the C �1 -H group does not appear to impact the local stereochemistry, perhaps due to a low energy of the interactions (Nanda & Schmiedekamp, 2008).
The existence of hydrogen bonds in which a polarized C-H group can serve as a donor was initially invoked in 1937 to explain the physical properties of mixtures of chloroform with acetone (Glasstone, 1937).Subsequently, such hydrogen bonds have been independently postulated based on the stereochemistry of selected intermolecular interactions observed in the crystal structures of organic compounds (Sutor, 1962(Sutor, , 1963;;Taylor & Kennard, 1982).More recent spectroscopic (for example infrared and NMR) and computational studies provided detailed insights into the nature of this class of interactions (Hobza & Havlas, 2000;Joseph & Jemmis, 2007;Majerz & Olovsson, 2012;Driver et al., 2016;Shi & Min, 2023;Gilli et al., 1994;Gilli & Gilli, 2000;Isaacs et al., 1999Isaacs et al., , 2000;;Derewenda, 2023).As a result, the current definition of a hydrogen bond endorsed by IUPAC includes C-H groups as donors (Arunan et al., 2011).
Although generally regarded as being significantly weaker than canonical hydrogen bonds, the interaction energy of C-H� � �O bonds is enhanced if the C-H group is polarized by an adjacent electron-withdrawing moiety, such as nitrogen in heterocyclic compounds next to a methine group, i.e.
CH-. Biological macromolecules, i.e. proteins and nucleic acids, contain a number of such groups capable of forming C-H� � �O bonds.The occurrence and significance of these interactions have been the subject of several comprehensive reviews (Scheiner, 2006a;Gu et al., 1999;Horowitz & Trievel, 2012;Derewenda, 2023).In DNA and RNA, methine groups in nitrogen bases are involved in base-pairing and basepentose interactions (Beiranvand et al., 2021;Yurenko et al., 2011;Balaceanu et al., 2017).In proteins, the side chain of histidine contains highly polarized C "1 -H and C �2 -H groups (particularly in the protonated, i.e. imidazolium, state) which are often involved in hydrogen bonds (Steinert et al., 2022), including functionally important groups in the active sites of enzymes such as serine hydrolases (Derewenda et al., 1994).The main-chain C � -H group is another example of a polarized bond, despite the sp 3 hybridization of carbon, owing to the adjacent electron-withdrawing peptide linkages.These groups are directly involved in stabilizing the � secondary structure via C � -H� � �O C interstrand bonds in both parallel and antiparallel sheets (Derewenda et al., 1995;Scheiner, 2005Scheiner, , 2006bScheiner, , 2010)).
Tryptophan contains a polarized methine group within the indole moiety.The N "1 atom polarizes the adjacent C �1 -H bond, making it suitable to serve as a hydrogen-bond donor.Ab initio calculations showed the energy of such a hydrogen bond to a water molecule to be À 2.1 kcal mol À 1 , with a C� � �O distance of 3.35 A ˚ (Scheiner et al., 2002).Given the dramatic increase in the number of protein structures determined at high resolution, particularly during the Structural Genomics Initiative (Standley et al., 2022), we decided to revisit the question of the role of the Trp C �1 -H group in protein structures and its possible role in Trp side-chain stereochemistry.Using a subset of nonredundant protein structures from the PDB, with a conservative resolution cutoff of 1.5 A ˚, we discovered that C �1 -H groups have a high propensity to interact with main-chain carbonyl O atoms, specifically with those located nearby in the polypeptide chain.Our stereochemical analysis is consistent with the notion that these interactions have all of the properties of hydrogen bonds, and quantum-mechanical calculations of interaction energies corroborate this conclusion.The presence of hydrogen bonds involving the Trp C �1 -H group correlates with hitherto uncharacterized discrete structural motifs, with important implications for protein structure and function.

Data mining in the Protein Data Bank and stereochemical analysis
A subset of crystal structures determined to a resolution of 1.5 A ˚or better was extracted from the Protein Data Bank.Redundancy was reduced by using a maximum 95% aminoacid identity cutoff.This resulted in a database of 7911 structures.The vast majority did not contain H atoms; those The four conformational dihedral angles defining the structure of a tryptophan residue within a polypeptide.
that did had various C-H distances depending on the refinement program used.Notably, Phenix uses 0.93 A ˚, which is significantly shorter than the actual value of the C �1 -H distance in indole/tryptophan.It is well established from spectroscopy that the C-H distance shortens in the ethane/ ethene/ethyne series, from 1.099 to 1.091 and 1.070A ˚, respectively, although the difference may not stem from hybridization but from the coordination number of carbon (Vermeeren et al., 2021).Inspection of the crystal structures of multiple Trp derivatives in the Cambridge Structural Database shows a variation from 0.93 to 1.13 A ˚, a range of �20% (data not shown).The most accurate measurements of the C-H bonds in crystals are from neutron diffraction.They show that sp 3 and sp 2 C-H bonds shorten to 1.092 and 1.081A ˚, respectively (Lu et al., 2021).As PyMOL adds riding H atoms to C �1 of Trp at 1.09 A ˚, we used its algorithm to add them to all investigated structures, thus replacing the existing atoms.
This database was searched for any contacts between the H atoms of the C �1 -H and O atoms, with a d HO distance of 2.86 A ˚(sum of van der Walls radii) and a minimum � H of 110 � (recommended as a minimum hydrogen-bond angle by IUPAC).In this study, we relied on a set of van der Waals radii that differ from those introduced by Bondi (1964), which are still routinely used.A recent reassessment of the atomic values of van der Waals radii (Chernyshov et al., 2020) noted that Bondi's values consistently underestimate the position of the energy minima by 0.3-0.4A ˚. Using a new concept of line-ofsight and also taking chemical context into account, Chernyshov et al. (2020) provided a revised set of values.They suggest values of 1.21 A ˚for hydrogen in the context of C-H� � �X contacts (where X is not hydrogen) and 1.65 A ˚for an sp 2 oxygen in a neutral carbonyl group.The sum, 2.86 A ˚, is the value we use rather than 2.72 A ˚, which would reflect Bondi's values.Similarly, we note that the new sum of van der Waals radii for sp 2 carbon and sp 2 carbonyl oxygen is 3.56 A ˚rather than 3.22 A ˚, as previously inferred from Bondi's values.Importantly, the estimate of 3.56 A ˚is more in line with the observed C� � �O distances in C-H� � �O bonds, established theoretically as 3.35 A ˚for Trp C �1 -H� � �water (Scheiner et al., 2002) and experimentally as 3.34 A ˚between methine in theophylline and oxygen in formaldehyde (Southern & Bryce, 2022).
The resulting database of close contacts had another layer of redundancy due to the presence of noncrystallographic symmetry, which includes biologically relevant oligomers.To eliminate multiple observations of the same contact, we arbitrarily selected the median interaction from oligomeric structures.We assumed that at 1.5 A ˚resolution or higher, differences between monomers may be due to genuine differences in crystal packing, and so averaging would not be appropriate.However, as the shortest distances might be encumbered by errors, the median contact might be more representative.This final nonredundant data set was used for further calculations of stereochemistry.
The stereochemical analysis was also performed using the PyMOL scripting engine.For each contact identified, the exact distance between the H atom and the O atom was determined, as well as additional geometric parameters as described in Section 3. The database was then split into clusters depending on the number of amino acids between the donor and acceptor groups.The data arising were recorded in tabular form using Excel for each identified conformational cluster separately.All statistical analysis was then carried out in Excel.

Quantum-chemical calculations of interaction energies
Quantum-chemical calculations were performed via the density-functional approach (DFT) within the context of the M06-2X functional (Zhao & Truhlar, 2008), which has been shown to be an accurate means of treating hydrogen bonds and related noncovalent bonds (Kr ˇı ´z ˇ& R ˇeza ´c ˇ, 2022; Boese, 2015;Kozuch & Martin, 2013;Walker et al., 2013;Thanthiriwatte et al., 2011;Liao et al., 2003;Deible et al., 2014;Li et al., 2014;Mardirossian & Head-Gordon, 2013;Elm et al., 2013;Bhattacharyya et al., 2013).A polarized triple-� def2-TZVP basis set was chosen so as to afford a large and flexible set.The Gaussian 16 program (Frisch et al., 2016) was chosen as the specific means to conduct these computations.The interaction energy E int of each dyad was evaluated as the difference between the energy of the complex and the sum of the energies of the two constituent subunits.The counterpoise procedure (Boys & Bernardi, 1970) was applied to correct basis-set superposition error.

Identification of interactions involving Trp C �1 -H as the donor group
We generated a database of nonredundant protein crystal structures refined at a resolution of 1.5 A ˚or higher from the Protein Data Bank (Burley et al., 2022; see Section 2 for the definition of redundancy etc.).Next, we calculated the positions of riding H atoms in all structures with the Trp C �1 -H distance set to 1.09 A ˚. We then identified interactions involving Trp C �1 -H groups as donors and potential oxygen acceptors, i.e. waters, hydroxyl groups (Ser, Thr and Tyr), sidechain groups (Asx and Glx) and main-chain carbonyl O atoms, using a maximum distance cutoff for H� � �O (d HO ) of 2.86 A ånd a minimum C �1 -H� � �O angle (� H ) of 110 � (see Section 2 for an explanation of the cutoff criteria).
We obtained 17 012 close contacts, 5983 of which were with water O atoms.Another 1046 contacts involved Glu and Asp carboxylate groups and 1010 contacts were with side-chain hydroxyl groups of Ser, Thr and Tyr.A further 542 contacts involved side-chain carbonyl groups of Asn and Gln.Interestingly, nearly half of all contacts, i.e. 8431 (49.6%), were with backbone carbonyl O atoms, which are particularly strong acceptors owing to their partial negative charge.Given the preponderance of these interactions, we focused on this group of contacts and analysed the respective stereochemistry in order to assess their character and potential function.

The stereochemistry of the Trp C �1 -H� � �O C backbone contacts
In order to characterize the stereochemistry of interactions involving Trp C �1 -H groups, we first calculated the distribution of the donor-acceptor, or C� � �O, distances (d CO ), as well as the C-H� � �O angles (� H ), separately for all carbonyl O atoms as donors and for water O atoms (Figs. 2 and 3).The distribution of distances to carbonyl O atoms has a distinct maximum at 3.35 A ˚.In contrast, water O atoms were found further away on average, at 3.55 A ˚.The shortest distances in both cases were just below 3 A ˚. � H increases gradually for both types of interactions with the C� � �O distance.
It should be stressed that intramolecular steric constraints significantly impact the observed distance distributions.Nevertheless, we note interesting trends.The peak of the d HO distribution is shorter by 0.2 A ˚compared with the sum of the van der Waals radii of O and C atoms used in this study (i.e.3.56 A ˚; see Section 2), suggesting a cohesive interaction.The higher deviation from linearity than observed in canonical hydrogen bonds can be rationalized in terms of the van der Waals interactions between the donor C and acceptor O atom.Specifically, at shorter C� � �O distances the � H angle assumes more acute values, as the H atom is pushed out to avoid steric collision between H and O, which are further apart by at least 0.3 A ˚than the corresponding distance in canonical hydrogen The stereochemical parameters used in this study.d HO , d CO and � are given in a ˚ngstro ¨ms and all angles are given in degrees.

Table 1
Interaction energies (E int ) calculated for 3-methylindole and N-methylacetamide pairs based on the coordinates of specific interactions in protein structures.
The �d CO values are the changes in the C� � �O distance (d CO ) resulting from additional refinement; the final d HA values (i.e.hydrogen� � �acceptor distances) were obtained after the riding H atoms were replaced with those calculated by PyMOL.Other parameters are � H (the C-H� � �O angle), � O (the C O� � �H angle) and � (the elevation of H from the sp 2 plane).

PDB entry
Class Conformer  bonds, owing to the partly covalent character of the latter.
Overall, the stereochemistry is consistent with that expected for C-H� � �O hydrogen bonds in small organic molecules (Taylor & Kennard, 1982).
Next, we calculated a scatter plot of the two Trp side-chain dihedral angles, i.e. � 1 and � 2 , for all Trp C �1 -H� � � O C backbone contacts (Fig. 4).The purpose was to investigate whether the various structural motifs involve Trp side chains in canonical or strained conformations.Nine conformer clusters are observed.The results are intriguing: although lowenergy m105 and t-105 are the dominant clusters, as expected, not only is m0 strongly represented, but the unfavourable t0 has a nearly equal frequency, and some cases of p0 are also identifiable.
We then asked what the separation was for the observed pairs of interacting moieties along the polypeptide chain.Fig. 5 illustrates the relative register in the sequence between the donor Trp and the acceptor carbonyl group.Positive values indicate that acceptor O atoms are located downstream in the sequence, and negative values refer to oxygen acceptors that are located upstream, i.e. towards the amino-terminus.The most common interactions are those with peptide O atoms in nearby positions: +1, À 1, À 2, À 3 and À 4 (Fig. 5).Intrigued by this observation, we carried out additional stereochemical characterization for all contacts within each class (Fig. 2), including the C O� � �H angle (� O ) and the C � -C O� � �H dihedral angle (�), which allowed calculation of the elevation of the hydrogen from the sp 2 plane (�).Canonical hydrogen bonds demonstrate a strong preference for hydrogens to cluster with � O angles in the range 120-240 � and close to the sp 2 plane (i.e.low elevation; Murray-Rust & Glusker, 1984), and similar trends, albeit not as pronounced, have been reported for C-H� � �O bonds (Taylor & Kennard, 1982).We were interested in whether we could reproduce these trends in the present study.Finally, we calculated the Ramachandran angles for all of the Trp residues involved to identify possible correlations between local secondary structure and side-chain conformation.
All calculations up to this point were carried out using raw coordinates from the Protein Data Bank (except for the riding hydrogen positions, which were added independently).As we embarked on the detailed analysis of specific structures, we were concerned about inconsistencies inherent in the data sets in the PDB introduced by different protocols or refinement and different software.Specifically, we were concerned about the lack of inclusion of H atoms during refinement, the lack of coordinates in the file etc.To avoid bias, all structures described below were subjected to additional standardized refinement and addition of riding H atoms at correct, uniform positions using the PyMOL script.Details are described in the supporting information and Supplementary Table S1.

The C �1 -H ! O C (+1) class. In this class of interactions, the C �1 -H group of Trp points towards the
Distribution of side-chain dihedral angles for Trp residues involved in contacts with all main-chain carbonyl O atoms.Blue outlines indicate the most populous, low-energy clusters found in proteins, green shows energetically favourable but less common clusters and red represents theoretically unfavourable conformations.

Figure 5
A histogram showing the number of contacts between Trp C �1 -H as a donor and the ith main-chain carbonyl O atom as the acceptor.For example, À 2 denotes an acceptor located two peptide units upstream in the sequence.
carbonyl O atom of the next residue downstream in the sequence, reaching across a single peptide bond.This requires a favourable combination of four dihedral angles: two Ramachandran angles, in Trp and ' in the residue downstream, and both the � 1 and � 2 angles in the Trp side chain.There are three possible combinations, leading to only three specific conformational clusters out of the nine possible (Fig. 6).The most populous (361 structures) is a distinct, tight cluster corresponding to the rather rare (4.7% frequency) p90 conformer (average � 1 and � 2 of 64 � and 90 � , respectively).The Trp residue is invariably in the �-secondary structure and the C �1 -H approaches the acceptor O atom from the re face.The bond is close to linear (the average � H is 153 � ), but the angle on the acceptor is unfavourable (average � O of 107 � ; Fig. 7a), resulting in the hydrogen being located significantly outside the sp 2 plane of oxygen (average 2.2 A ˚).A number of such interactions result in very close d HO distances.
Both remaining clusters are in the trans conformation with � 1 close to 180 � .The first is identifiable as t0 (154 structures).In this cluster, C �1 -H also approaches the O atom from the re face (as defined by IUPAC), with H significantly out of the sp 2 plane, and the d HO distances are often short.The bond tends to be less linear than in p90, with an average � H of 137 � , and the angle on the acceptor (� O ) is unfavourable (average of 106 � ) (Fig. 7b), although the H atom is closer to the sp 2 plane (average � of 1.9 A ˚).
The second trans cluster is the canonical t-105 (113 structures), showing optimal C �1 -H� � �O bond stereochemistry.This is accomplished specifically when the downstream residue is proline (15 of the 30 shortest distances, including the five shortest distances) or alanine (nine of the 30 shortest distances).The reason is that the secondary structure of this residue needs to be of the collagen type, and both proline and alanine have a strong preference for this conformation (Berisio et al., 2002;Parchan ˇsky ´et al., 2013).The stereochemistry leads to a mean � O of 123 � , with the hydrogen on average only 0.6 A ˚out of the sp 2 plane, in an excellent position to interact with one of the free sp 2 electron pairs of oxygen (Fig. 7c).

The C �1 -H ! O C (À 1) class.
In this unique type of contact, the C �1 -H group of the indole ring points towards the preceding peptide, engaging in an interaction with the carbonyl O atom immediately upstream in the sequence.The vast majority in this group (694 structures) are in the unfavourable m0 conformation, initially identified by Lovell et al. (2000).Our observation rationalizes the high frequency of this conformer.The motif restricts the Ramachandran ' angle to a narrow range of À 90 � to À 135 � , while is allowed a broader range (Fig. 8).The average � 2 is À 3.2 � .Although the C-H� � �O interaction is close to linear (the average � H is 146.3 � ), the average � O is very unfavourable (86 � ) and the hydrogen is out of the amide plane by more than 2 A ˚on average.We note that such motifs often occur within a �-strand or at the end of one, resulting in a sharp turn.
A small minority of contacts in this class, i.e. 30 examples, are of the m105 type and almost all involve Trp residues in the � L region of the Ramachandran plot, with long d HO distances.Such stereochemistry suggests weak interactions.There are only three structures in the p-90 cluster.

The C �1 -H ! O C (À 2) class.
More conformational freedom is allowed in this class of contacts owing to the insertion of a residue between the acceptor and Trp.Although this motif is more diverse, the same three conformational clusters are observed as were seen in the previous class, albeit with very different frequencies (Fig. 9).By far the most common here is the canonical m105 conformation, with 698 structures.With very few exceptions, Trp is in the �-secondary conformation, with the hydrogen this time approaching from the si face and significantly outside the sp 2 plane.The average � H and � O angles are 137 � and 114 � , respectively.
The m0 cluster is represented by 190 structures.It is very close in conformational space to m105 because the m105 structures are shifted to lower � 2 , with an average value of 82 � , while the m0 cluster is also shifted to higher values of � 2 , with an average of 23 � .In both groups Trp is primarily found in extended, �-secondary conformations, although right-handed and left-handed helical structures are also observed.
There are 277 motifs that constitute the p-90 cluster.The secondary conformation of Trp is restricted to right-handed �-helices and �-structure only.Examples of each of the clusters are shown in Fig. 10.
Of note is the fact that many of the motifs in all three clusters resemble the classic type II �-turn.The conformation of Trp is such that the C �1 -H group mimics the peptide amide which would serve as a donor in a classical �-turn, adding just one atom to the turn (11 atoms instead of 10).Therefore, the direction of the hydrogen bond is preserved, with residue i donating the hydrogen bond to residue i À 2. Unlike the canonical �-turn, this structural feature does not reverse the direction of the polypeptide chain but creates kinks and turns of �110 � .

The C �1 -H ! O C (À 3) class.
In this class, two amino acids are inserted between the acceptor carbonyl group and Trp, adding additional degrees of freedom.Nevertheless, we observe the presence of the same three conformational clusters as was the case for the À 1 and À 2 classes, i.e. m105, m0 and p-90.The difference is that owing to weaker steric constraints, the m105 and m0 clusters are now distinctly separate and closer to the theoretical values for � 2 angles (averages of 98.5 � and À 3.6 � , respectively), and the frequencies are decidedly shifted towards the canonical, low-energy conformations.There are 361 structures in the m105 cluster

Figure 9
The À 2 class of interactions.A double scatter plot (Ramachandran '/ , blue; conformational, � 1 /� 2 , red) for Trp residues in all structural motifs identified in this class.and 290 in the p-90 cluster, with only 34 in the unfavourable m0 group (Fig. 11).
The m105 cluster contains motifs with Trp found in both � and � secondary structures.The average � H is 137.9 � , but � O is again unfavourable (average 113.8 � ).Except for a few outliers, the p-90 cluster is stereochemically tight, with a mean � 1 of 66 � and � 2 of À 89 � .The vast majority of the motifs contain Trp in an �-helical form, and the putative hydrogen bond has a more favourable geometry, with an � H of 138.13).The majority, i.e. �75%, of motifs in the m105 cluster contain Trp in the �-helical conformation, often at the C-terminus of an �-helix (Fig. 14), capping the i À 4 carbonyl with three-centre hydrogen bonds donated by the main-chain amide and the C �1 -H group.
The 80 motifs in the p-90 cluster (average � 2 of À 88 � ) contain primarily (85%) �-helical Trp, with a slightly more favourable average � H of 138 � .Most of these motifs also contain a three-centred hydrogen bond such that the amide group and C �1 -H cap the carbonyl O atom of residue i À 4. This is analogous to the recently documented capping of carbonyl O atoms within membrane helices by Thr and Ser hydroxyls, with a net gain of 127% in enthalpy compared with a single hydrogen bond (Brielle & Arkin, 2020).
The rare m0 motifs also contain Trp in both � and � secondary conformations.They tend to have an unfavourable angular stereochemistry, with an average � H of 128 � and � O of 141 � , and longer d HO distances.

The interaction energies of C-H� � �O C bonds
Whereas the stereochemical descriptors of close interatomic contacts provide useful information for the identification of hydrogen bonds, proximity per se does not imply a cohesive interaction or a structural function in the stabilization of a specific conformation.Historically, this was the argument used by Jerry Donohue in his criticism of June Sutor's proposal for the existence of C-H� � �O bonds based on crystallographic data (Schwalbe, 2012).To support his view, he quoted Ramachandran's opinion that H� � �O distances of
2.2 A ˚in proteins need not necessarily indicate the presence of a hydrogen bond (Ramachandran et al., 1963).It is in principle true that the presence of a hydrogen bond is only hypothesized based on stereochemistry, and its strength is somewhat speculatively inferred from parameters such as linearity (� H ) and hydrogen-acceptor distance (d HO ).However, current knowledge of the physical chemistry of the hydrogen bond makes it possible to predict its existence based on the nature of the participating groups and stereochemistry with a very high degree of confidence.The nature of the various structural motifs described above, harbouring close C �1 -H� � �O C interactions, is strongly suggestive of cohesive hydrogen bonds, but to assess the energies we turned to quantummechanical calculations.
It has been shown by one of us (Scheiner et al., 2002) that a water molecule binds as a hydrogen-bond acceptor to the C �1 -H of indole with an energy of À 2.1 kcal mol À 1 at a d CO distance of 3.35 A ˚.Because a peptide carbonyl is a stronger acceptor, we repeated this calculation for acetamide, representing an amide group, and indole as a model for Trp.The planes of the two molecules were perpendicular to avoid any steric repulsions, with a fully linear C-H� � �O C arrangement.Following the optimization of d HO (2.27A ˚), we obtained a value for the energy of the interaction (E int ) of À 2.6 kcal mol À 1 , which is consistent with a stronger bond.(For comparison, we also calculated the E int value for the interaction of a carbonyl O atom of acetamide with the aromatic C "2 -H group of indole; the result was À 1.05 kcal mol À 1 ).
The above calculations use a perfectly linear C O� � �H-C bond as a model system.The motifs found in actual protein structures are quite different from such ideal stereochemistry, and specifically many show � H and � O values that deviate significantly from linearity.We were interested in whether the energies of these interactions are still significant when compared with the reference system.To this end, we used eight representative cases from among those described above, with � H ranging from 135 � to 172 � , � O ranging from 94 � to 165 � and � ranging from 0.3 to 2.15 A ˚.In each case, we truncated the Trp moiety to 3-methylindole and the acceptor peptide to N-methylacetamide, added H atoms using the PyMOL script and calculated interaction energies (E int ; see Section 2).The results are shown in Table 1 and Fig. 15.
All interactions show cohesive E int values irrespective of stereochemistry.As expected, the weakest E int values were obtained for those interactions in which the H atom is located significantly out of the sp 2 plane of the acceptor O atom.It appears that the � H and � O angles are less of a factor: both can be as low as �130 � without a significant reduction in E int , as long as the hydrogen is within �0.8A ˚of the sp 2 plane.

Figure 13
The À 4 class of interactions.A double scatter plot (Ramachandran '/ , blue; conformational, � 1 /� 2 , red) for Trp residues in all structural motifs in this class.We also noted that many of the structural motifs that we investigated show d HO distances as short as �2.0A ˚, significantly shorter than the predicted optimal distance of �2.3A ˚. We wondered whether such short interactions, resulting from intramolecular constraints, might be less favourable.
We used PDB entry 3ts3 structure as a model case.We translated the 3-methylindole moiety along the H� � �O line and evaluated E int between 2.5 and 1.8 A ˚(Fig.16).We find that while E int reaches a maximum at 2.3 A ˚, the interaction is cohesive down to �1.85A ˚, which corresponds well to the shortest observed contacts in the crystal structures.There is little loss of energy when the bond is stretched to 2.5 A ˚, consistent with the primarily electrostatic nature of the interaction.

Conclusions
It is well established that main-chain/side-chain interactions mediated by hydrogen bonds are involved in specific conformational motifs, often capping secondary-structure elements such as helices and �-sheets (Eswar & Ramakrishnan, 2000;Krishna Deepak & Sankararamakrishnan, 2016).However, such motifs reported to date invariably involved canonical hydrogen bonds, i.

Figure 15
The stereochemistry of the C �1 -H� � �O C interactions for which energies of interaction have been calculated (Table 1) superposed on the Trp side chain.The PDB codes are shown for each carbonyl O atom.

Figure 16
The dependence of the energy of interaction (E int ) on the d HO distance for an N-methylacetamide and 3-methylindole pair derived from PDB entry 3ts3.The arrow shows the position on the energy curve corresponding to the actual d HO distance in the crystal structure, i.e. 2.02 A ˚.The black arrow indicates the line along which the 3-methylindole moiety was translated to obtain the curve of E int versus distance.
nearby in the sequence.The most common such interactions involve residues one peptide unit downstream, i.e. i +1, or 1-4 peptide units upstream, i.e. i À 1 to i À 4. Interestingly, Trp is found in these motifs in only six of the possible nine conformers, with the i + 1 class containing only p90, t0 and t-105 conformers, while the remaining four classes show Trp only in m105, m0 and p-90 conformations.The frequencies of the high-energy m0 and t0 conformers is increased significantly in those classes where the contacts are strongly restricted by short-range steric constraints, while m105, the most populous class found in proteins, is strongly enriched in the À 3 and À 4 classes.Our work helps to explain the relatively common occurrence of the m0 and t0 classes.It is important to note that the function of Trp residues is intimately contingent on their conformation.For example, Trp in transmembrane helices occurs most often in m0, t0 and p-90 conformations, all of which have been characterized in our study ( de Jesus & Allen, 2013).Of importance is our observation that in the À 3 and À 4 classes Trp is often engaged in capping the acceptor O atom with hydrogen bonds donated by both the amide and C �1 -H groups.
We also present evidence based on quantum-chemical calculations that the short C �1 -H� � �O C contacts revealed by structural data mining are in fact invariably cohesive interactions of the order of approximately half a canonical hydrogen bond, and less sensitive to specific stereochemistry, such as C-H� � �O and H� � �O C angles, than previously thought.The critical factor is the position of the H atom close to the sp 2 plane of the acceptor O atom.

Related literature
The following references are cited in the supporting information for this article: Adams et al. (2010), Emsley et al. (2010) and Kovalevskiy et al. (2018).

Figure 3
Figure 3 Left: a histogram of the number of interactions of Trp C �1 -H with backbone carbonyl O atoms as a function of the distance d CO (green bars) and a mean value of the � H angle in each group, corrected with cubic interpolation.Right: the same statistics for interactions with water molecules.

Figure 6 A
Figure 6 A double scatter plot (Ramachandran '/ , blue; conformational, � 1 /� 2 , red) for Trp residues in all structural motifs in the +1 class.The clusters are identified by type as shown in Fig. 4.
5 � and an � O of 135.4 � , with an average elevation of 0.6 A ˚on the si face.The small m0 cluster contains several motifs with Trp in �, � and left-handed helical secondary conformations.The d HO distances are longer in this cluster, with an average � H of 140.7 � and � O of 136.4 � Examples of a structural motif from each of the clusters are shown in Fig. 12. 3.2.5.The C �1 -H !O C (À 4) class.This is the most ubiquitous and the most diverse motif, owing to the flexibility generated by the insertion of three residues between the acceptor and donor amino acids.Nevertheless, perhaps surprisingly, only the same three conformational clusters are again present: m105, m0 and p-90.The canonical m105 conformer (average � 2 of 99 � ) is by far the most common, with nearly 1500 examples, compared with only 80 examples of p-90 and just 44 of m0 (Fig.

Figure 12
Figure 12 Examples of the three conformational Trp clusters in the +3 class.(a) m0 (PDB entry 5js4; only the C � atoms of non-Trp residues are shown for clarity), (b) m105 (PDB entry 6x8o), (c) p-90 (PDB entry 6qo9).Note that (b) and (c) contain three-centred hydrogen bonds from the Trp C �1 -H and amide groups to the i À 3 carbonyl reminiscent of a 3 10 -helical hydrogen-bonding pattern (canonical amide-to-carbonyl hydrogen bonds are shown as fine dashed lines).
e. those involving N and O atoms.Typical examples are Asx-turns, in which the side-chain carbonyl O atom of Asp or Asn engages the main-chain amide of the i + 2 residue, mimicking a �-turn (D'mello et al., 2022).Similarly, N �1 of the histidine imidazole has been shown to engage with the backbone amide groups (Krishna Deepak & Sankararamakrishnan, 2016).Interestingly, C �1 of Trp occupies a position isosteric to O � of Asx and N �1 of His, and because it is protonated it engages the carbonyl and not the amide groups of the main chain.Our study demonstrates that the C �1 -H group of a tryptophan residue plays an important role in stabilizing unique structural motifs by engaging as an hydrogen-bond donor with main-chain carbonyl O atoms

Figure 14
Figure 14 Examples of the three conformational Trp clusters in the +4 class.(a) m0 (PDB entry 7mzy; only the C � atoms of non-Trp residues are shown for clarity, (b) m105 (PDB entry 4gxw), (c) p-90 (PDB entry 3s92).Note that all motifs contain hydrogen bonds from the Trp C �1 -H and amide groups to the i À 4 carbonyl, capping it with a three-centred bond.