The evolution of proteins from random amino acid sequences: II. Evidence from the statistical distributions of the lengths of modern protein sequences

Journal of Molecular Evolution

Abstract

This paper continues an examination of the hypothesis that modern proteins evolved from random heteropeptide sequences. In support of the hypothesis, White and Jacobs (1993, J Mol Evol 36:79–95) have shown that any sequence chosen randomly from a large collection of nonhomologous proteins has a 90% or better chance of having a lengthwise distribution of amino acids that is indistinguishable from the random expectation regardless of amino acid type. The goal of the present study was to investigate the possibility that the random-origin hypothesis could explain the lengths of modern protein sequences without invoking specific mechanisms such as gene duplication or exon splicing. The sets of sequences examined were taken from the 1989 PIR database and consisted of 1,792 “super-family” proteins selected to have little sequence identity, 623 E. coli sequences, and 398 human sequences. The length distributions of the proteins could be described with high significance by either of two closely related probability density functions: the gamma distribution with parameter 2 or the distribution for the sum of two independent exponential random variables. A simple theory for the distributions was developed which assumes that (1) protoprotein sequences had independent, exponentially distributed random lengths, (2) the length dependence of protein stability determined which of these protoproteins could fold into compact primitive proteins and thereby attain the potential for biochemical activity, (3) the useful protein sequences were preserved by the primitive genome, and (4) the resulting distribution of sequence lengths is reflected by modern proteins. The theory successfully predicts the two observed distributions, which can be distinguished by the functional form of the dependence of protein stability on length.
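The two density functions named above can be written down and compared numerically. The following is a minimal sketch, not the paper's fitting procedure. It assumes that "parameter 2" means a gamma shape parameter of 2 and that the second distribution allows the two exponentials to have different means. The function names and the mean of 150 residues are hypothetical, chosen only for illustration.

```python
import math
import random


def gamma2_pdf(x, scale):
    """Density of a gamma distribution with shape 2 and the given scale."""
    if x < 0:
        return 0.0
    return x / scale**2 * math.exp(-x / scale)


def two_exp_sum_pdf(x, mean_a, mean_b):
    """Density of the sum of two independent exponential variables.

    When the two means coincide, this reduces to the shape-2 gamma density.
    """
    if x < 0:
        return 0.0
    if math.isclose(mean_a, mean_b):
        return gamma2_pdf(x, mean_a)
    return (math.exp(-x / mean_a) - math.exp(-x / mean_b)) / (mean_a - mean_b)


def sample_lengths(n, mean_a, mean_b, seed=0):
    """Draw n lengths, each the sum of two independent exponential variables."""
    rng = random.Random(seed)
    return [
        rng.expovariate(1 / mean_a) + rng.expovariate(1 / mean_b)
        for _ in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical mean of 150 for each exponential component.
    lengths = sample_lengths(100_000, 150.0, 150.0)
    print("sample mean:", sum(lengths) / len(lengths))
    print("gamma(2) density at the mode:", gamma2_pdf(150.0, 150.0))
```

With equal means the two densities coincide, which is one way to read "closely related": the shape-2 gamma is the equal-mean special case of the two-exponential sum.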

The theory leads to three interesting conclusions. First, it predicts that a tetra-nucleotide was the signal for primitive translation termination. This prediction is entirely consistent with the observations of Brown et al. (1990a,b, Nucleic Acids Res 18:2079–2086 and 18:6339–6345), which show that tetra-nucleotides (stop codon plus following nucleotide) are the actual signals for termination of translation in both prokaryotes and eukaryotes. Second, the strong dependence of statistical length distributions on sequence-termination signaling codes implies that the evolution of stop codons and translation-termination processes was as important as gene splicing in early evolution. Third, because the theory is based upon a simple no-exon stochastic model, it provides a plausible alternative to a limited universe of exons from which all proteins evolved by gene duplication and exon splicing (Dorit et al. 1990, Science 250:1377–1382).
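The dependence of length statistics on the termination signal can be illustrated with a toy simulation. This sketch is not the paper's derivation. It assumes uniformly random nucleotides, the standard stop codons TAA, TAG and TGA, and a hypothetical rule in which a stop codon terminates translation only if the next nucleotide belongs to an allowed set. The function names and the choice of allowed nucleotides are illustrative assumptions, and the resulting lengths are not meant to match real proteins.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def random_codon(rng):
    """Return three uniformly random nucleotides as a string."""
    return "".join(rng.choice("ACGT") for _ in range(3))


def simulate_length(rng, fourth_nucleotides=None):
    """Count codons translated before termination in a random sequence.

    If fourth_nucleotides is None, any stop codon terminates (triplet signal).
    Otherwise a stop codon terminates only when the following nucleotide is
    in fourth_nucleotides (hypothetical tetra-nucleotide signal); a stop codon
    that is read through is counted as a translated codon.
    """
    length = 0
    codon = random_codon(rng)
    while True:
        next_codon = random_codon(rng)
        is_stop = codon in STOP_CODONS
        context_ok = fourth_nucleotides is None or next_codon[0] in fourth_nucleotides
        if is_stop and context_ok:
            return length
        length += 1
        codon = next_codon


def mean_length(n, fourth_nucleotides=None, seed=0):
    """Average simulated length over n random sequences."""
    rng = random.Random(seed)
    return sum(simulate_length(rng, fourth_nucleotides) for _ in range(n)) / n


if __name__ == "__main__":
    print("triplet signal:", mean_length(5000))
    print("tetra-nucleotide signal (A only):", mean_length(5000, {"A"}))
```

Under these assumptions the triplet case gives geometrically distributed lengths, the discrete counterpart of an exponential distribution, and a more specific signal lengthens the sequences because fewer positions terminate translation.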

References

  • Barker WC, George DG, Hunt LT, Garavelli JS (1991) The PIR protein sequence database. Nucleic Acids Res Suppl 19:2231–2236

  • Blake CCF (1983) Exons—present from the beginning? Nature 306:535–537

  • Bossi L, Roth JR (1980) The influence of codon context on genetic code translation. Nature 286:123–127

  • Brown CM, Stockwell PA, Trotman CNA, Tate WP (1990a) The signal for termination of protein synthesis in prokaryotes. Nucleic Acids Res 18:2079–2086

  • Brown CM, Stockwell PA, Trotman CNA, Tate WP (1990b) Sequence analysis suggests that tetra-nucleotides signal the termination of protein synthesis in eukaryotes. Nucleic Acids Res 18:6339–6345

  • Cavalier-Smith T (1985) Selfish DNA and the origin of introns. Nature 315:283–284

  • Chan HS, Dill KA (1990) Origins of structure in globular proteins. Proc Natl Acad Sci USA 87:6388–6392

  • Darnell JE (1978) Implications of RNA-RNA splicing in evolution of eukaryotic cells. Science 202:1257–1260

  • Dill KA (1985) Theory of the folding and stability of globular proteins. Biochemistry 24:1501–1509

  • Doolittle RF (1979) Protein evolution. In: Neurath H, Hill RL (eds) The proteins, vol IV. Academic Press, New York, pp 1–118

  • Doolittle RF (1991) Counting and discounting the universe of exons. Science 253:677–679

  • Doolittle WF (1978) Genes in pieces: were they ever together? Nature 272:581–582

  • Doolittle WF (1990) Understanding introns: origins and functions. In: Stone EM, Schwartz RJ (eds) Intervening sequences in evolution and development. Oxford University Press, New York, pp 43–62

  • Dorit RL, Schoenbach L, Gilbert W (1990) How big is the universe of exons? Science 250:1377–1382

  • Dorit RL, Gilbert W (1991) The limited universe of exons. Curr Opin Struct Biol 1:973–977

  • Eck RV, Dayhoff MO (1966) Evolution of the structure of ferredoxin based on living relics of primitive amino acid sequences. Science 152:363–366

  • Flory PJ (1953) Principles of polymer chemistry. Cornell University Press, Ithaca, NY, pp 1–672

  • Gilbert W (1978) Why genes in pieces? Nature 271:501

  • Hanyu N, Kuchino Y, Nishimura S (1986) Dramatic events in ciliate evolution: alteration of UAA and UAG termination codons to glutamine codons due to anticodon mutations in two Tetrahymena tRNAs(Gln). EMBO J 5:1307–1311

  • Hawkins JD (1988) A survey on intron and exon lengths. Nucleic Acids Res 16:9893–9908

  • Holland SK, Blake CCF (1990) Proteins, exons, and molecular evolution. In: Stone EM, Schwartz RJ (eds) Intervening sequences in evolution and development. Oxford University Press, New York, pp 10–42

  • Iranpour R, Chacon P (1991) Basic stochastic processes. Macmillan, New York, pp 1–258

  • Jukes TH (1982) Possible evolutionary steps in the genetic code. Biochem Biophys Res Comm 107:225–228

  • Jukes TH, Osawa S, Muto A, Lehman N (1987) Evolution of anticodons: variations in the genetic code. Cold Spring Harbor Sympos Quant Biol 52:769–776

  • Lau KF, Dill KA (1990) Theory for protein mutability and biogenesis. Proc Natl Acad Sci USA 87:638–642

  • McLachlan AD (1972) Repeating sequences and gene duplication in proteins. J Mol Biol 64:417–437

  • Monod J (1971) Chance and necessity. An essay on the natural philosophy of modern biology. Alfred A. Knopf, New York, pp 1–199

  • Naora H, Deacon NJ (1982) Relationship between total size of exons and introns in protein-coding genes of higher eukaryotes. Proc Natl Acad Sci USA 79:6196–6200

  • Nei M, Chakraborty R, Fuerst PA (1976) Infinite allele model with varying mutation rate. Proc Natl Acad Sci USA 73:4164–4168

  • Osawa S, Jukes TH (1988) Evolution of the genetic code as affected by anticodon content. Trends Genet 4:191–198

  • Patthy L (1991) Exons—original building blocks of proteins? BioEssays 13:187–192

  • Ross SM (1989) Introduction to probability models, 4th ed. Academic Press, San Diego, pp 1–544

  • Rossmann MG (1990) Introductory comments on the function of domains in protein structure. In: Stone EM, Schwartz RJ (eds) Intervening sequences in evolution and development. Oxford University Press, New York, pp 3–9

  • Senapathy P (1986) Origin of eukaryotic introns: a hypothesis, based on codon distribution statistics in genes, and its implications. Proc Natl Acad Sci USA 83:2133–2137

  • Senapathy P (1988) Possible evolution of splice-junction signals in eukaryotic genes from stop codons. Proc Natl Acad Sci USA 85:1129–1133

  • Shakhnovich EI, Gutin AM (1989) Formation of unique structure in polypeptide chains: theoretical investigation with the aid of a replica approach. Biophys Chem 34:187–199

  • Shakhnovich EI, Gutin AM (1990) Implications of thermodynamics of protein folding for evolution of primary sequences. Nature 346:773–775

  • Sharp PA (1985) On the origin of RNA splicing and introns. Cell 42:397–400

  • Smith MW (1988) Structure of vertebrate genes: a statistical analysis implicating selection. J Mol Evol 27:45–55

  • Sommer SS, Cohen JE (1980) The size distributions of proteins, mRNA, and nuclear RNA. J Mol Evol 15:37–57

  • Tate WP, Brown CM (1992) Translational termination: “stop” for protein synthesis or “pause” for regulation of gene expression? Biochemistry 31:2443–2450

  • Traut TW (1988) Do exons code for structural or functional units in proteins? Proc Natl Acad Sci USA 85:2944–2948

  • White SH (1992) The amino acid preferences of small proteins: implications for protein stability and evolution. J Mol Biol 227:991–995

  • White SH, Jacobs RE (1990) Statistical distribution of hydrophobic residues along the length of protein chains—implications for protein folding and evolution. Biophys J 57:911–921

  • White SH, Jacobs RE (1993) The evolution of proteins from random amino acid sequences I. Evidence from the lengthwise distribution of amino acids in modern protein sequences. J Mol Evol 36:79–95

Cite this article

White, S.H. The evolution of proteins from random amino acid sequences: II. Evidence from the statistical distributions of the lengths of modern protein sequences. J Mol Evol 38, 383–394 (1994). https://doi.org/10.1007/BF00163155

  • DOI: https://doi.org/10.1007/BF00163155
