Proteins turn "Proteans" – The over 40-year delayed paradigm shift in structural biology: From "native proteins in uniquely defined configurations" to "intrinsically disordered proteins"

The current millennium brought about a revolutionary paradigm shift in molecular biology: many operative proteins, rather than being quasi-rigid polypeptide chains folded into unique configurations, as believed throughout most of the past century, are now known to be intrinsically disordered, dynamic, pleomorphic, and multifunctional structures with stochastic behaviors. Yet part of this knowledge, including suggestions about possible mechanisms and plenty of supporting evidence, was already available in the 1950s and 1960s, only to be nearly forgotten for over 40 years. Here, we review the main steps toward the classic notions about protein structure, as well as the neglected precedents of present views, discuss possible explanations for such long oblivion, and offer a sketch of the current panorama in this field.


Introduction
The development of protein theory and the elucidation of the functions and structures of nucleic acids are the two most relevant accomplishments in molecular biology, and many valuable reviews covering different periods of these advances are available for both. Surprisingly, however, little has been written about the origins and course of events behind the unexpected current change of paradigm regarding the structures of operative proteins, in which the classic model of a "native protein" with one fixed, distinctive conformation is rapidly being replaced by a new picture of pleomorphic, intrinsically disordered, and multifunctional polypeptide arrangements [1]. A recent chronology [2] and previous reports [3][4][5][6] deal with the findings behind present conceptions on this matter but, unfortunately, say little or nothing about key contributions by earlier major authors in the field.
Here, we briefly recall these epistemic processes throughout the twentieth century, including those neglected earlier findings and hypotheses whose timely consideration could have greatly accelerated development in structural molecular biology, followed by a summary of the main revolutionary advances in recent years. In a prior note on this unexpected conceptual shift, we asked about the reasons behind its surprising delay [7], and now we offer tentative answers. For detailed reviews of the notions about protein structure preceding the stages covered here, starting from the mid-nineteenth century, see previous studies [8,9].

In 1902, Franz Hofmeister [10] and Emil Fischer [11] simultaneously proposed that proteins consist of amino acids linked through peptide bonds. At that time, such chains were conceived as made up of just a small number of amino acids, with overall molecular weights of around 4,200 at most. But this simple picture changed dramatically in the 1920s, when the German organic chemist and future Nobel Laureate Hermann Staudinger proposed [12], and in the following years amply demonstrated against wide disapproval from most of his colleagues, the real existence of linear macromolecules consisting of large numbers of covalently linked monomers, in line with August Kekulé's original theory [13]. He already pointed out that proteins and other high-molecular-weight components of living matter were probably polymers of this sort [14, p. 3042].
Conceiving proteins as long chains of amino acids had thus become the predominant view by the early 1930s. Yet, as the noted Chinese biochemist Hsien Wu put it in 1931 [15], it was "a matter of common experience that natural, soluble proteins easily become insoluble under a variety of conditions," a phenomenon that became known as "denaturation." Following a thorough review of the available literature plus numerous experiments of his own on this structural change, Wu concluded that a "soluble protein is not an open chain but a compact structure" [15]. In addition, Mortimer Anson and Alfred Mirsky, after a series of studies with several proteins starting in 1925, showed that the denaturation of hemoglobin and a few other proteins could be reversed, with a return of their main properties [16]. Thus, investigating the actual shapes of proteins, i.e., the three-dimensional conformations of polypeptides (later known as tertiary structures), and their changes became a prime goal in protein research.
In one of the first attempts to use X-ray crystallography to inspect the structure of biomacromolecules, John Bernal and Dorothy Crowfoot inferred that pepsin molecules "are relatively dense globular bodies" [17]. William Astbury, who with his co-workers, also using X-ray techniques, had recently shown that the elasticity of hair keratin fibers "is based on a protein chain-system which, under the proper conditions, is capable of being stretched to twice or contracted to half its normal length" [18], commented on this conclusion in an immediately adjoining note: "the paradox that the pepsin molecule is both globular and also a real, or potential, polypeptide chain system … which is afterwards folded in some neat manner … is merely an elaboration of the intra-molecular folding that has been observed in the keratin transformation" [19].
Despite these facts, however, Mirsky and Linus Pauling next described "a native protein molecule (showing specific properties) [sic]" as "one polypeptide chain which continues without interruption throughout the molecule …, folded into a uniquely defined configuration … held by hydrogen bonds" [20]. Yet the following year Astbury returned to his suspicion that protein conformation is relative, reporting that "The tobacco mosaic virus is another protein which has properties both fibrous and globular, and, to cut a long story short, it looks now as if the original apparent distinction between the two types is beginning to disappear" [21]. Ten years later, he expressed his conception of changeability in protein shapes in his characteristic personal style: "The versatility and variability of the form of the proteins is so remarkable that they might equally well have been called 'proteans'" [22].
Nevertheless, Mirsky and Pauling's notion of morphologically distinct native proteins was strengthened shortly after, when Pauling, in collaboration with Robert Corey and Herman Branson, reported that polypeptide chains adopt α-helix and β-sheet configurations that are indeed held in specific positions by hydrogen bonds [23,24]. Next, Max Perutz, who had been a student under Bernal in the 1930s, arrived at similar conclusions from his lifelong work on the structure of hemoglobin, paying special attention to Pauling's ideas [25], while John Kendrew, working in Perutz's lab, determined a specific molecular structure for myoglobin as well [26]. Therefore, when Christian Anfinsen established in 1973 that protein structure is determined by its amino acid sequence [27], the idea of a uniquely folded configuration took firm hold.

Issues on the way
Still, the field was far from firmly consolidated, or even on the way to becoming so, in the early 1950s, because of some basic considerations. One of these was that the very peptide-bond linkage between amino acids, proposed simultaneously by Hofmeister [10] and Fischer [11] in 1902, had begun to generate doubts and alternative models in the 1930s [28,29]. Caution over this matter was shown even by Frederick Sanger who, in a review that appeared shortly after his milestone 1951 publications with Hans Tuppy on the amino acid sequence of insulin [30,31], the first protein to be completely sequenced, flatly stated that "while this peptide theory is almost certainly valid…, it should be remembered that it is still a hypothesis and has not been definitely proved" [32]. More than another decade would pass before the polypeptide structure of insulin was confirmed through its artificial synthesis [33,34].
A further concern about the "native protein molecule" with a "uniquely defined configuration" was that, from the start, this concept became commonly associated with functional specificity, an obvious explanation inherited from Fischer's famous analogy that enzyme and substrate "must fit one another like lock and key" for effective chemical interaction [35]. Yet, from at least the 1960s, data on a "broad specificity" and "considerable substrate ambiguity" of many enzymes began to appear [36]. Later, the term "prion" was coined as an elision of "proteinaceous infectious" [37] to designate strange proteinaceous particles of uncertain chemical structure known to cause degenerative diseases of the nervous system.

Parallel emergence of alternative views
In the meantime, other approaches to understanding the nature of large natural molecules such as polysaccharides, proteins, and nucleic acids had also been developing rapidly. Thus Staudinger, who from studies in the mid-1930s had realized that glucose residues may form single fibrils in cellulose, branched arrays in starch, and globular bodies in glycogen [38], became convinced, concurrently with Astbury [39], that most organic macromolecules could be classified into spheroidal and linear shapes ([40], pp. 231-233).
Staudinger took up this issue again in the mid-1950s when, in collaboration with his wife Magda, a plant physiologist, he published a 73-page chapter on the importance of macromolecular chemistry for protoplasm research [41]. There they provided a simple explanation for transitions between globular and unfolded molecular configurations, based on double-refraction studies of flowing solutions of proteins and other macromolecules carried out by their former student Rudolph Signer [42]: "small molecules in solution move as far apart as possible by diffusion. If these small molecules are now bound into a thread molecule, this thread molecule will take on an elongated form as possible due to the same tendency in solution. Conversely, however, the bi-molecular forces between the individual groups of a thread molecule cause an attraction, so that it folds more or less depending on the size of these intra-molecular forces" ([41], pp. 37-38). Hence, they noted, "a number of intermediate states are possible between solid and dissolved or liquid, all of which are related to the shape of the macromolecules," also pointing out that "the shape of the macromolecules determines essential properties that are required and exploited by the living substance" ([41], p. 40).
Still, the Staudingers were careful not to extend this conclusion to all biomacromolecules as yet, stating: "To what extent such phenomena play a role in the performance of proteins cannot be decided today. However, it may be assumed that linear macromolecular substances in the cell can, in addition to normal swelling phenomena [i.e., unfolding by mere interaction with the solvent], also be subjected to this peculiar type of swelling [i.e., unfolding countered by attraction between individual groups in the polymer] if necessary to increase or decrease their reactivity" ([41], p. 45).
This view of inconstant protein shapes was then studied thoroughly with optical rotatory dispersion methods by Yen Tsi Yang and Paul Doty, who tentatively surmised that "in solution, typical globular proteins have only 20-40% of amino acid residues in the helical configuration" [43], and by Bruno Jirgensons, who pointed out that "configurations other than rodlike helices and disordered, flexible coils should be considered," adding expressions characteristic of today's parlance in the field: "the major part of a macromolecule of a globular protein in aqueous solution is rather flexible and disordered" [44]. In a later article, picking up the lead from Staudinger and Astbury, Jirgensons expanded this broader view into a classification of proteins according to their conformations, which he found could have high, low, and zero α-helix contents [45].
Thus, not only was the notion of proteins as polypeptide chains rigidly folded into defined configurations already being questioned between the 1930s and the 1970s, but significant experimental results pointing to difficulties with that conception were also becoming available. So why could the classic model of "native proteins" survive almost intact for over six decades?

Neglect of alternative views
Disregard of the findings and views discussed in the previous section may have been due to several factors, beginning with World War II (1939-1945), which hampered scientific research across Europe and interrupted the publication of many journals, thus also affecting scientific progress in the United States and other countries. Staudinger's laboratories, for one, "were almost destroyed during an air raid on Freiburg" in late November 1944 ([40], p. 6), and after the war the already aging scientist opted for partial retirement. The importance of his contributions to a proper understanding of molecular biology was clearly perceived early on by the historian of modern biology Robert Olby [46,47], and much later, to a lesser degree, by other historians of this subject [28,29]. One possible reason why Staudinger's work on the special properties of macromolecules passed nearly unnoticed by biochemists and molecular biologists for several decades is that, although from the mid-1930s to the mid-1950s he insisted on the importance of macromolecular chemistry for cell biology, providing detailed explanations with figures, tables, and references in articles, a long book chapter [41], and even a full book [48], almost all of his more than 800 publications appeared in German, while English was already becoming the language of choice for most of the scientific literature on biological subjects.
Less understandable, however, is that Staudinger's views about biomacromolecules were not taken into account even after he was awarded the 1953 Nobel Prize in Chemistry, precisely "for his discoveries in the field of macromolecular chemistry." Furthermore, his official Nobel lecture contains multiple remarks on biomolecular topics, and he chose to conclude it with the following statement: "In the light of this new knowledge of macromolecular chemistry, the wonder of Life in its chemical aspect is revealed in the astounding abundance and masterly macromolecular architecture of living matter" [49].
Yet most of the other indications of the inadequacy of the standard "native protein" conviction appeared in English, usually in journals with large readerships among biologists. Thus, Jirgensons continued to study conformational transitions, reorganizations, and interactions of a number of proteins under different conditions up to the early 1980s, e.g., [50][51][52], receiving a fair number of citations for this work in the meantime; but his broad-view seminal paper on the subject [45] has just 31 citations so far, with none between 1990 and 2006 (i.e., it was cited again only after the new paradigm was already in full swing), and he is not cited at all in major reviews on the history of protein research [28,29,53].
Another distraction from these key precedents was the general belief that proteins, multifarious as they are, must be the bearers of genetic information, whereas nucleic acids were often dismissed as "'dull' and even 'idiotic' molecules" [54]. In fact, the first depiction of the actual polymeric structure of DNA, reported as early as 1935 by Phoebus Levene and R. Stuart Tipson [55], also remained almost forgotten, and their basically correct hypothesis about the polynucleotide nature of DNA was disparaged as a "scientific catastrophe" [56] and in other disdainful terms before it finally became the theoretical scaffolding on which current research in nucleic acids was built [57]. Thereafter, molecular biology became mostly concerned with DNA transcription, mRNA translation, and protein synthesis, all under the basic assumption that newly constructed polypeptides would eventually adopt their unique native structures, either by themselves as a result of purely thermodynamic factors or with the assistance of chaperones, and would then remain so until the end of their useful lifetimes.
Thus, like the central dogma of molecular biology before reverse transcription was discovered, the "native state protein" dogma remained firmly in place for the following decades, duly transmitted in textbooks and classrooms to new generations of molecular biologists. Fortunately, however, novel developments eventually began to occur rapidly in the understanding of the actual state of many operative proteins within cells.

A new vision develops
The whole picture started to change at the turn of this century with the realization that a large proportion of operative proteins are intrinsically disordered [2][3][4][5][6], as previously suggested by Staudinger in the 1950s [41] and concluded by Jirgensons and others in the 1950s and 1960s [43][44][45], although these key precedents still went unrecognized. This paradigm shift moved swiftly forward when it was learned that the absence of a specific three-dimensional conformation in long stretches of many proteins (other than short linkers and hanging tails at the ends of structured regions) is due to the very sequence of amino acids [58], which since Anfinsen [27] had been held to be the direct determinant of the classic native state. Moreover, such "intrinsically disordered proteins" (IDPs) may have, again as Jirgensons had reported [45], both ordered and intrinsically disordered regions [59,60] with varying degrees of flexibility [61]. An example of this condition is the molten globule state of globular proteins, described as a rather common intermediate between the native and unfolded states [62].
Furthermore, it was discovered that the three-dimensional structure of active proteins is far from static and in fact quite dynamic, each protein having specific characteristics of flexibility, amplitude, energy landscape, and timescale of its molecular motions [63][64][65]. It was also shown that some of them fold specifically upon binding to appropriate receptors [66], in many cases constituting interactive networks involved in intracellular signaling [67]. In addition, membrane-tethered regions of IDPs are now being found to cause phase separations within the cytoplasm [68], which in turn may affect biochemical activity in other processes [69]. On the other hand, the adverse side of such flexibility and mobility in proteins is misfolding, which can lead to the formation of complex amyloid aggregates that may cause intractable diseases of the nervous system such as Alzheimer's and Parkinson's [70].
Thus, increasing numbers of proteins are presently regarded as "proteans," just as Astbury envisioned them back in 1947 [22]. Such a capacity for frequent or constant molecular reshaping involves diverse intermediate configurations, which in turn may trigger further dynamic behaviors, as illustrated by the signaling cascades initiated by G-protein-coupled receptors [71], and it often involves transient high-energy states that are only now being elucidated [72]. It must also be realized that most of this occurs within a crowded scenario in which internal movements due to changing pressures and translocations take place, often simultaneously in opposite directions, as within the long and slender neurites of nerve cells, driven by kinesins, myosins, and other molecular motors pulling along microtubules or actin filaments for the transport of different organelles [73], or for partial retractions of the cytoskeleton [74]. Accordingly, the cytoplasm is being revealed as a truly stormy place [75], where operative proteins behave in stochastic ways [1], rather than the comparatively quiet and orderly setting previously envisaged; living cells, therefore, can no longer be regarded as disciplined machines where everything runs smoothly [76]. Moreover, even the common terminology in this field has become complicated, for previous distinctions between native, bioactive, condensed, unfolded, and denatured protein structures are no longer as clear as they used to be.
Further comprehension of what actually happens within the intracellular space will undoubtedly be helped by the continuing advance of computer simulations, first attempted in 1975 for studying protein folding [77], later used to determine secondary structures in proteins [78], and currently taking advantage of the novel AlphaFold and, presumably, other upcoming software [79]. In addition, an NMR spectroscopy-based method has recently proved successful in studying high-energy states in dynamic protein ensembles [72].
Still, given the above revelations in the current century, the epistemic questions today are how much further the theory of proteins would have advanced by now had the early views published by several insightful forerunners been taken into account in a timely manner, and, especially, what could be done to prevent such a loss of roughly 50 years from ever happening again.
Acknowledgements: The authors are especially indebted to Prof. Jeffrey Seeman for his comments and lively discussions on a former version of the manuscript, as well as to Selene Rangel, Marisela Mondragón and Alberto Zurita for their excellent help in providing some hard-to-find bibliography items.
Funding information: The authors state that no funding other than their institutional salaries was involved in this project.
Author contributions: Eugenio Frixione conceived and designed the study, organized the bibliography and wrote the text. Lourdes Ruiz-Zamarripa searched for and collected bibliographical sources, arranged them in chronological order, and pointed out overlooked items of interest.