'A careful disorderliness' in biomolecular structure revealed by Raman optical activity

Following its first observation 50 years ago, Raman optical activity (ROA), which refers to a circular polarization dependence of Raman scattering from chiral molecules, has evolved into a powerful chiroptical spectroscopy for studying a large range of biomolecules in aqueous solution. Among other things, ROA provides information about motif and fold as well as secondary structure of proteins; structure of carbohydrates and nucleic acids; polypeptide and carbohydrate structure of intact glycoproteins; and protein and nucleic acid structure of intact viruses. Quantum chemical simulations of observed Raman optical activity spectra can provide complete three-dimensional structures of biomolecules, together with information about conformational dynamics. This article reviews how ROA has provided new insight into the structure of unfolded/disordered states and sequences, ranging from the complete disorder of the random coil to the more controlled type of disorder exemplified by poly L-proline II helix in proteins, high mannose glycan chains in glycoproteins and constrained dynamic states of nucleic acids. Possible roles for this 'careful disorderliness' in biomolecular function, misfunction and disease are discussed, especially amyloid fibril formation.

Introduction
It gives me great pleasure to contribute a paper to this special issue of Spectrochimica Acta Part A honouring the immense contributions Professor Laurence A. Nafie has made to the development and application of vibrational optical activity (VOA) spectroscopy. His authoritative book Vibrational Optical Activity [1] bears witness to his 'hands on' experience of, and expertise in, virtually every theoretical and experimental aspect of both infrared and Raman manifestations of the subject. I have known Larry since the early 1970s when we both embarked on VOA studies, Larry focusing initially on the vibrational circular dichroism (VCD) and myself on the Raman optical activity (ROA) approach. I have enjoyed and benefitted greatly from his work in the intervening years. Fig. 1 shows the author with Larry and Werner Hug, another early pioneer of VOA, in Switzerland in 1976.
The first observations of vibrational Raman optical activity (ROA), achieved in Cambridge in 1972 as a small difference in the intensity of Raman scattering in right- and left-circularly polarized incident light (incident circularly polarized, ICP, ROA) [2], constituted the first observations of the long-sought vibrational optical activity of molecular origin. This, together with the first observations of infrared vibrational circular dichroism (VCD) reported in 1974 [3], heralded an exciting new era in the study of chiral molecules. It took a further two decades for ROA instrumentation to be developed sufficiently for this small effect to be observed in the molecules of life, a watershed event being the first observations in peptides and proteins in 1990 [4]. Studies of a plethora of biomolecules including peptides, proteins, carbohydrates, glycoproteins, nucleic acids and even intact viruses quickly followed [5,6]. As well as giving information about the structure of primary folded states of proteins [7,8], ROA can provide incisive information about the structure and behavior of unfolded/disordered states and sequences [9,10]. This article provides a personal perspective, based mainly on the work of the author's group in his Glasgow laboratory, on the new insights ROA has provided into a 'careful disorderliness' that it has uniquely revealed in the states of biomolecules and how this may facilitate aspects not only of their normal function but also misfunction and disease.
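The 'small difference in the intensity' measured in an ICP ROA experiment is often expressed as a dimensionless ratio. The expression below is a sketch of that conventional circular intensity difference, given here as background rather than quoted from this article; the symbols I^R and I^L (scattered Raman intensities in right- and left-circularly polarized incident light) are notation assumed for illustration.

```latex
% Dimensionless circular intensity difference for ICP ROA (notation assumed):
% numerator   = ROA signal (difference spectrum)
% denominator = parent Raman signal (sum spectrum)
\Delta = \frac{I^{R} - I^{L}}{I^{R} + I^{L}}
```

The numerator corresponds to the ROA spectrum and the denominator to the parent Raman spectrum, which is why the two are conventionally displayed as a pair, as in the figures discussed in this article.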

A 'Eureka Moment': The amyloidogenic prefibrillar state of destabilized human lysozyme
Many protein misfolding diseases involve formation of amyloid fibrils, as in amyloidosis from mutant human lysozymes, neurodegenerative diseases such as Parkinson's and Alzheimer's due to the fibrillogenic propensities of α-synuclein and tau, respectively, and the prion encephalopathies such as scrapie, BSE and new variant CJD where amyloid fibril formation is triggered by exposure to the amyloid form of the prion protein. In addition, aggregation of serine protease inhibitors such as α1-antitrypsin is responsible for diseases such as emphysema and cirrhosis.
It has been suggested that poly L-proline II (PPII) helix is the 'killer conformation' in some of these amyloid diseases [11]. This was prompted by an ROA study of human lysozyme, the thermal denaturation behavior of which is subtly different from that of hen lysozyme. Below pH ~3.0 the thermal denaturation of human lysozyme is not a two-state process. Unlike hen lysozyme, it supports a partially folded state at elevated temperatures. Incubation at 57 °C and pH 2.0, under which conditions the partially folded state is the most highly populated, induces the formation of amyloid fibrils, while incubation at 70 °C and pH 2.0, under which conditions the fully denatured state is the most highly populated, leads to the formation of amorphous aggregates. ROA revealed that the α-domain of the prefibrillar state is destabilized, with the formation of PPII structure at the expense of α-helix [11]: as shown in Fig. 2, the strong positive α-helix bands at ~1340 and 1300 cm⁻¹ have been replaced by a strong positive band centred at ~1318 cm⁻¹ assigned to PPII-helix. The same study revealed that hen lysozyme, which has a much lower propensity to form amyloid fibrils than the human variant, is virtually native-like under the same conditions of low pH and elevated temperature.
Notice also that, although the parent Raman band at ~1551 cm⁻¹ assigned to the tryptophan residues is still clearly discernible in the destabilized protein, the associated ROA band (red) has disappeared. This highlights the remarkable sensitivity of VOA generally and ROA in particular, compared with standard 'achiral' spectroscopies, to the dynamic aspects of biomolecular structure, a sensitivity arising from cancellation of signals from conformers with 'opposite' chirality [5]; in this case the cancellation is due to conformational heterogeneity among the tryptophans in the destabilized protein.
Although originally defined for the conformation adopted by polymers of L-proline, the PPII-helix can be supported by amino acid sequences other than those based on L-proline and has been recognized as a common structural motif within the longer loops in the X-ray crystal structures of many proteins [12]. As illustrated in Fig. 3a, PPII-helix consists of a left-handed extended helical conformation with threefold rotational symmetry in which the ϕ,ψ angles of the constituent residues are restricted to values around −78°, +146°, corresponding to a region of the Ramachandran surface adjacent to the β-region (Fig. 3b). The extended nature of the PPII-helix precludes intrachain hydrogen bonds, the structure being stabilized instead by main chain hydrogen bonding to water molecules and side chains. PPII-helix attracts interest as a major conformational element of disordered polypeptides and unfolded proteins in aqueous solution. It can be distinguished from random coil in polypeptides using other optical spectroscopies [13], but these techniques have difficulty in identifying it when other conformational elements are present, as in proteins. However, it is readily identified even in proteins using ROA, the main signature being the strong positive ROA band in the extended amide III region at ~1318 cm⁻¹ mentioned above. This and other PPII ROA signatures have been validated by, inter alia, quantum-chemical calculations for model peptide structures which cluster in the PPII region of the Ramachandran surface and closely reproduce the experimental ROA spectrum of the model peptide XAO, known to adopt the PPII conformation [14].
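The statement that PPII residues are restricted to ϕ,ψ values around −78°, +146° can be illustrated with a short sketch that flags backbone angles lying near that point on the Ramachandran surface. The 30° circular window, and the function names `angular_difference`, `is_ppii` and `ppii_fraction`, are assumptions introduced purely for illustration; they are not criteria used in the ROA studies reviewed here.

```python
import math

# Canonical PPII backbone angles quoted in the text (degrees).
PPII_PHI, PPII_PSI = -78.0, 146.0


def angular_difference(a, b):
    """Smallest signed difference a - b between two angles, in [-180, 180) degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def is_ppii(phi, psi, tolerance=30.0):
    """True if (phi, psi) lies within `tolerance` degrees of the canonical PPII point.

    The 30-degree window is an illustrative assumption, not a value from the article.
    """
    d_phi = angular_difference(phi, PPII_PHI)
    d_psi = angular_difference(psi, PPII_PSI)
    return math.hypot(d_phi, d_psi) <= tolerance


def ppii_fraction(angles):
    """Fraction of (phi, psi) pairs classified as PPII by `is_ppii`."""
    angles = list(angles)
    if not angles:
        raise ValueError("no backbone angles supplied")
    return sum(is_ppii(phi, psi) for phi, psi in angles) / len(angles)
```

For example, `is_ppii(-75, 150)` is true under this assumed window, whereas typical α-helical angles such as (−60°, −45°) are rejected. Periodic wrapping is handled explicitly because ψ values near ±180° are close neighbours on the Ramachandran surface.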

Amyloid fibril formation
PPII-helix appears to be a favorable conformation for amyloid fibril formation for the following reasons. Elimination of water molecules between extended PPII chains having hydrated backbone C=O and N–H groups to form β-sheet hydrogen bonds is a favorable thermodynamic process (Fig. 3c). Since PPII chains are close in conformation to β-strands, they would be expected to readily undergo this type of aggregation with each other and with established β-sheet. Subsequent ROA studies of the natively unfolded brain proteins α-synuclein and tau, which have a propensity to form amyloid fibrils associated with Parkinson's and Alzheimer's disease, respectively, revealed that they consist largely of PPII helical sequences [15]. Their ROA spectra (Fig. 4) are very similar to that of the fibrillogenic destabilized state of human lysozyme in Fig. 2, although the latter has several additional bands associated with β-sheet. A later comprehensive study of α-synuclein in various states displays a similar spectrum to that in Fig. 4 for the intrinsically disordered state [16].
Although disorder of the PPII type may be essential for the formation of regular fibrils, the presence of a large amount of PPII structure does not necessarily impart a fibrillogenic character since not all PPII-rich non-regular protein structures form amyloid fibrils. For example, the natively unfolded casein milk proteins and the brain proteins β- and γ-synuclein show little propensity for amyloid fibril formation and are not associated with disease, yet their ROA spectra [15] are very similar to that of α-synuclein, which is highly amyloidogenic, showing they are similarly based mainly on PPII-helix. A more complete understanding of the fibrillogenic propensity of a particular sequence, PPII or otherwise, requires knowledge of the various physicochemical properties of the constituent residues. In particular, a combination of high net charge and low mean hydrophobicity, which tends to keep the chains apart, has been shown to be an important prerequisite for protein sequences to remain natively unfolded [17,18].
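The two sequence properties named above, net charge and mean hydrophobicity, can be estimated from a one-letter amino acid sequence. The following is a minimal sketch under stated assumptions: it uses the Kyte-Doolittle hydropathy scale rescaled to 0–1, counts Lys/Arg as +1 and Asp/Glu as −1 at neutral pH, and ignores histidine and the chain termini. These choices, and the function names, are illustrative and are not taken from refs. [17,18].

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_net_charge(sequence):
    """Absolute net charge per residue (assumption: K/R = +1, D/E = -1, all else 0)."""
    if not sequence:
        raise ValueError("empty sequence")
    positive = sum(1 for residue in sequence if residue in "KR")
    negative = sum(1 for residue in sequence if residue in "DE")
    return abs(positive - negative) / len(sequence)


def mean_hydropathy(sequence):
    """Mean Kyte-Doolittle hydropathy, rescaled so -4.5 maps to 0 and +4.5 maps to 1."""
    if not sequence:
        raise ValueError("empty sequence")
    # A KeyError here signals a non-standard residue code in the input.
    return sum((KYTE_DOOLITTLE[residue] + 4.5) / 9.0 for residue in sequence) / len(sequence)
```

On this simple picture, a sequence with a high value from `mean_net_charge` and a low value from `mean_hydropathy` would be the kind expected to stay natively unfolded; no numerical threshold is implied by this sketch.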

Max Perutz poses a key question
In his later career Max Perutz (who shared the 1962 Nobel Prize for Chemistry with John Kendrew for their determination of the X-ray crystal structures of hemoglobin and myoglobin) took an interest in protein misfolding and disease, which at the time was attracting much attention. In 1997 he posed an incisive question [19], paraphrased as follows: Why do some proteins assemble into ordered fibrils instead of just forming amorphous precipitates like other proteins when they are destabilized?
ROA studies have provided the following answer: In natively unfolded proteins the physicochemical properties of their constituent amino acids have been tuned by natural selection (high net charge, low hydrophobicity) to inhibit association of the constituent PPII-helical sequences into amyloid fibrils. Mutations can tip the balance, as in familial early onset Parkinson's and Alzheimer's disease, lysozyme amyloidosis, etc. But in natively folded proteins that have been destabilized, if newly exposed sequences are dominated by PPII-helix, they may not have been chosen by natural selection to resist association into amyloid fibrils (as in human lysozyme).

Natively unfolded proteins
ROA is proving valuable for the study of proteins that are unfolded in their native functional state. Such 'natively unfolded' or 'intrinsically disordered' proteins are now recognized as constituting a major structural class that has a variety of important functions [18,20,21]. ROA has helped to establish this new area of protein science [9]. A study of residual structure in disordered peptides and unfolded proteins carried out via multivariate analysis (nonlinear mapping, NLM) and ab initio simulation of ROA spectra revealed striking differences between the structural characteristics of natively unfolded proteins and proteins unfolded by denaturation [10]. The former tend to cluster in the mainly disordered/irregular region of the 2D NLM plot and contain a significant amount of the PPII conformation; the latter appear in other regions and contain significant amounts of β-sheet in the case of reduced proteins, and of α-helix in the case of acid molten globules. Multivariate analysis was also used to reinforce the visual interpretation of the ROA spectra measured during a study which found dramatic differences in the influence of Cu²⁺ and Mn²⁺ ions on prion protein folding, perhaps due to their distinct coordination and hydration characteristics: the former induced a mainly disordered or irregular structure, whereas the latter reinforced the α-helix content at the expense of PPII [22].
As well as the examples of the brain proteins α-synuclein and tau in Fig. 4, two other examples of the ROA spectra of natively unfolded proteins dominated by PPII-helix bands are represented by bovine β-casein [15] and wheat ω-gliadin [23] in Fig. 5. The reason that caseins, which constitute nearly 80% of bovine milk, exist in PPII-rich natively unfolded states could be that this facilitates digestion, since the open rheomorphic (meaning 'moving shape' [15]) structures allow rapid and extensive degradation to smaller peptides by proteolytic enzymes, thereby facilitating absorption of nutrients. The PPII-rich natively unfolded structures of cereal proteins like gliadins may serve an analogous purpose since they provide nutrition for seedlings. The manner in which hydration stabilizes the PPII helical conformation in a plastic adaptable rheomorphic fashion is especially important since it enables disordered sequences in folded and natively unfolded proteins to perform many essential functions [12,24,25]. In particular, the interconversion between the PPII element of disordered structure and α-helix appears to be a facile process which may be important in many order–disorder transitions. For example, it is primarily α-helix that melts into PPII structure when human lysozyme is destabilized (vide supra), some natively unfolded proteins are known to adopt an α-helical conformation when binding to partner functional proteins [20], and PPII converts to α-helix in the Mn²⁺-refolded prion protein (vide supra) [22]. Furthermore, hydrated PPII structure may facilitate protein folding since it pre-organizes the unfolded state, thereby lowering the entropy and reducing the conformational space to be searched [13].

Disordered sequences in natively folded proteins, especially viral coat proteins
It is emphasised that ROA has revealed very few examples of genuine random coil within disordered sequences in proteins, either natively folded or unfolded. The only clear examples to date are the Cu²⁺-refolded prion protein (vide supra) [22], and the high-mannose glycoproteins invertase and mucin whose ROA spectra contain no signals from PPII, α-helix or β-sheet secondary polypeptide structure, perhaps due to the high glycosylation forcing the polypeptide into a completely disordered state [26]. Otherwise, disordered sequences comprise mainly PPII structure.

L.D. Barron
ROA spectra [5,27] associated with the many long disordered loops connecting the β-domains and for which this structural element imparts the necessary flexibility to facilitate binding to antigens. Other examples of PPII-rich natively folded proteins revealed by ROA include subtilisin Carlsberg, which contains several long disordered loops in its α/β fold [6], and type III fish antifreeze protein [10] and the Bowman-Birk inhibitor [28], both of which have X-ray/NMR structures dominated by long disordered loops. Furthermore, the X-ray crystal structure of α-type human estrogen receptor reveals a PPII-helical sequence DAEPPILYSEY in the ligand-binding domain of the protein: the corresponding free peptide was shown by ROA and far-UV CD to be well-structured in PPII with a very similar ROA spectrum to those of α-synuclein and tau in Fig. 4 [29]. This last example provides unequivocal evidence for PPII structure with a clear functional role in a disordered protein loop (it antagonizes estradiol-induced transcription).
Fig. 6 displays the ROA spectra of two especially significant examples of folded proteins which show this typical PPII-helix band. The full-length ovine prion protein PrP 23-230, which contains a long N-terminal tail, shows a strong positive ROA band at ~1315 cm⁻¹ that disappears in the truncated protein PrP 90-230 in which most of the tail has been cut off [30], revealing that the N-terminal tail comprises mainly PPII structure. The coat proteins of the icosahedral satellite tobacco mosaic virus (STMV) contain three jelly roll β-sandwich domains linked by many long disordered loops which, as the strong positive ~1316 cm⁻¹ band in its ROA spectrum in Fig. 6 reveals, comprise large amounts of PPII structure. The ROA spectrum of the icosahedral cowpea mosaic virus (CPMV), the coat proteins of which have the same fold as in STMV, is very similar to that of STMV, again with the strong PPII band [31]. The ROA spectrum of the cylindrical tobacco rattle virus (TRV) [32] reveals that the coat proteins contain a large amount of PPII structure, much more than those of the cylindrical tobacco mosaic virus (TMV); this structure could serve, inter alia, to fill the extra volume required by the larger diameter of the TRV particles relative to those of TMV.
The importance of intrinsic disorder in viral proteins is being increasingly recognized, especially its ability to facilitate rapid changes at almost all stages of the viral life cycle, with many functions attributed to disordered regions. However, two extensive reviews on the subject [33,34] make no mention of PPII structure; instead they envisage intrinsically disordered proteins and sequences as dynamic ensembles having backbone Ramachandran angles that vary significantly over time. This is at odds with the ROA data which show well-defined characteristic viral protein bands like the strong positive ~1316 cm⁻¹ band in STMV (Fig. 6), the sharpness of which is strong evidence for a well-defined 'carefully disordered' PPII conformation rather than a dynamic ensemble. These reviews do, however, acknowledge that intrinsically disordered proteins and sequences have a significantly higher degree of hydration compared with folded globular proteins. This accords with the central role of hydration in stabilizing and conferring the 'careful disorderliness' on PPII-helical sequences (vide supra) and suppressing a more dynamic ensemble. The intense interest in the role of intrinsically disordered proteins and sequences in viral infection and treatment triggered by the COVID-19 pandemic [35] makes these considerations especially timely.

Fig. 6. Backscattered ICP Raman and ROA spectra of the full-length and truncated ovine prion protein (top and middle pair, adapted from [30]) and of satellite tobacco mosaic virus (bottom pair, adapted from [31]), all in aqueous buffer.

PPII structure and the immune response
Short PPII sequences may be important for the immune response. This idea arose from an ROA study of PPII structure in poly(L-lysine) dendrigrafts (DGLs, Fig. 7), which revealed that, although generation 1 supported predominantly the PPII conformation, the PPII content steadily decreased with increasing generation, with a concomitant increase in other backbone conformations [36], thereby generating truly disordered sequences. This behaviour may be due to increasing crowding of the lysine side chains, together with suppression of backbone hydration, with increasing branching. Suppression of the PPII content of DGLs with increasing branching could be associated with their nonimmunogenic properties. Disorder is known to be crucial in the immune response: short disordered peptides are good antigens, whereas long disordered regions and intrinsically disordered proteins initiate only weak immune responses or are completely nonimmunogenic [37]. This suggests that a high content of short well-defined PPII sequences may elicit a strong immune response.

Nucleic acids
Although nucleic acids do not appear to support a 'careful disorderliness' directly analogous to that supported by polypeptide sequences in proteins, ROA has nonetheless revealed evidence for constrained dynamic aspects of nucleic acid structure which may have relevance to biological function, as in the protein-nucleic acid interactions central to translation, transcription and replication. For example, ROA measurements on double-stranded poly(rA).poly(rU) between 20 and 45 °C revealed that, with increasing temperature but still below the melting temperature, the same average A-type conformation is conserved but the degree of structural mobility increases, and that this mobility is correlated through the bases, glycosidic links, sugar rings and phosphate groups of both strands [38]. It may be no coincidence that this premelting regime, where the polyribonucleotides display significant mobility but with the complete molecules still retaining their overall three-dimensional integrity (retaining an A-helical form), corresponds to physiological temperatures.
This study was extended by measuring the intensities of certain key ROA bands in both single-stranded poly(rA) and double-stranded poly(rA).poly(rU) over the temperature range 2–45 °C, which provided evidence for a previously undetected glass-like transition at ~15–18 °C embracing the entire biopolymer [39]. This could be associated with the freezing of motions between conformational substates, perhaps in conjunction with changes in water structure around the biopolymer.
In icosahedral viruses, only those parts of the RNA core in contact with the capsid inner wall are sufficiently ordered to reveal structures via X-ray crystallography [40]. In a remarkable experiment, the ROA spectrum of the core RNA in CPMV was isolated by subtracting the ROA spectrum of the empty protein capsid from that of the intact virus [31]. It took the form of a model A-type RNA single-stranded helix in aqueous solution, showing that the viral RNA adopts this conformation despite most being too mobile to diffract X-rays.

Glycan chains in glycoproteins
Like disordered sequences in proteins, glycan chains in glycoproteins are also disordered, albeit in a rather different way, to provide the flexibility necessary for optimum interactions with other biomolecules like antigens. Presumably this requires a type of 'careful disorderliness' analogous to that found in protein sequences rather than a complete random coil type of disorder. ROA studies have provided hints of this with regard to high-mannose N-linked glycan chains like the example in Fig. 8. First, a study of the intact glycoprotein α1-acid glycoprotein [41] revealed ROA band patterns very similar to those seen in the free disaccharide N,N′-diacetylchitobiose which provides the N-links to asparagine residues. This suggests that the N,N′-diacetylchitobiose cores of the glycan chains in the intact glycoprotein retain a relatively rigid conformation. Second, an ROA study of three mannobioses found in high-mannose glycan chains [42] revealed that, unlike other disaccharides, their ROA spectra show no bands characteristic of their glycosidic linkage types, suggesting significant conformational freedom around the glycosidic links. This conclusion was reinforced by accompanying molecular dynamics and density functional computations which revealed a high level of delocalization of the corresponding vibrational modes. This remarkable and unusual conformational freedom may have functional significance for mannose oligomers as the key constituents of glycoproteins, supporting a 'careful disorderliness' within the array of high mannose glycan chains attached to rigid N-linked N,N′-diacetylchitobiose stems, thereby providing a controlled
The spike proteins of the COVID-19 virus are, like those of other coronaviruses, glycoproteins in which the carbohydrates help mask viral proteins from the host's immune system. They could be good samples for future ROA studies since, unlike the standard techniques of structural biology, ROA would be able to characterize both the glycan and polypeptide parts of the intact spike proteins. An added bonus is that, as recently reported [43], several peptide sequences within the spike proteins are highly fibrillogenic, so it would be interesting to look for PPII signatures in their ROA spectra.

Conclusion
These results suggest that the dictum "there are some enterprises in which a careful disorderliness is the true method" (Herman Melville, Moby Dick), with PPII-helix the quintessential 'carefully disordered' conformation, seems to be just as applicable to biomolecular structure and function, and perhaps to the entire physicochemical basis of life, as it was to chasing whales around the globe in the nineteenth century!

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

• ROA is an incisive probe of unfolded/disordered states of biomolecules.
• Polyproline II helix is the quintessential 'carefully disordered' protein sequence.
• Association of polyproline II strands into β-sheet may generate amyloid fibrils.
• Long disordered sequences in viral coat proteins comprise mainly PPII structure.
• Mannose oligomers impart a 'careful disorderliness' on glycan chains in glycoproteins.

Fig. 2. Backscattered ICP Raman and ROA spectra of native human lysozyme (top pair) and of the destabilized prefibrillar intermediate (bottom pair). The α-helical domain (yellow bands) has melted into PPII structure (green band) in the destabilized protein, and the tryptophan band (red) has disappeared. Adapted from [11].

Fig. 3. (a) The extended left-handed poly-L-proline II helical conformation. (b) The Ramachandran potential energy surface. (c) Extended polypeptide chains with fully hydrated backbone C=O and N–H groups. In certain circumstances, chain association may occur via formation of β-sheet hydrogen bonds, each involving elimination of three water molecules (W) between the chains.

Fig. 8. A typical high mannose glycan chain attached to a rigid N-linked N,N′-diacetylchitobiose stem.