Introns Form Compositional Clusters in Parallel with the Compositional Clusters of the Coding Sequences to Which they Pertain

Journal of Molecular Evolution

Abstract

This report studies the compositional properties of human gene sequences, evaluating similarities and differences among functionally distinct sectors of the gene independently of the reading frame. To retrieve the compositional information of DNA, we present a neighbor-base-dependent coding system in which the alphabet of 64 letters (DNA triplets) is compressed to an alphabet of 14 letters, here termed triplet composons. Triplets that contain the same set of distinct bases, in whatever order and number, form a triplet composon. The DNA sequence is read starting at any letter of the initial triplet and then moving, triplet to triplet, until the end of the sequence. The readings were made in an overlapping way along the length of the sequences. The analysis of the compositional content of the gene sequences, in terms of composon usage frequencies, shows that: (i) the compositional content of the sequences is far from that of random sequences, even in the case of non-protein-coding sequences; (ii) coding sequences can be classified as components of compositional clusters; and (iii) intron sequences in a cluster have the same composon usage frequencies, even though their base composition differs notably from that of their home coding sequences. A comparison of the composon usage frequencies between human and mouse homologous genes indicated that two clusters found in humans have no counterpart in mouse, whereas the other clusters are stable in both species with respect to their composon usage frequencies in both coding and noncoding sequences.
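The composon coding described above can be illustrated with a short Python sketch. It is not the authors' implementation. The function names (`composon`, `composon_usage`) and the labelling of each composon by its sorted distinct bases are hypothetical choices made here, and the "overlapping" reading is assumed to mean a sliding window that advances one base at a time. The count of 14 follows from the definition: 4 one-base sets, 6 two-base sets, and 4 three-base sets.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def composon(triplet):
    """Label a triplet by its set of distinct bases (hypothetical labelling)."""
    return "".join(sorted(set(triplet)))


# The 64 triplets collapse to 14 composons: 4 + 6 + 4 base sets of size 1, 2, 3.
ALL_COMPOSONS = sorted(
    {composon("".join(t)) for t in product(BASES, repeat=3)},
    key=lambda c: (len(c), c),
)


def composon_usage(seq):
    """Composon usage frequencies from an overlapping triplet reading.

    Assumption: the window advances one base at a time, so every reading
    frame contributes. Windows with symbols other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if set(triplet) <= set(BASES):
            counts[composon(triplet)] += 1
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in ALL_COMPOSONS}


if __name__ == "__main__":
    # Print the 14-component usage vector of a short toy sequence.
    for label, freq in composon_usage("ATGGCCAAGTTTGA").items():
        print(label, round(freq, 3))
```

Each sequence is reduced to a 14-component frequency vector, so coding and intron sequences of any length can be compared on the same footing; such vectors are the kind of input a clustering step would take.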

Abbreviations

SD: Standard deviation

CDS: Coding sequence

IS: Intron sequence

Acknowledgments

We thank Sara Fuertes for help with statistical data. This work was supported by COSTD20/003/00, CICYT Bio BIO2002-04049-C02-01, SAF 2004-03111, and ISCIII-RETIC RD06/0021/0008-FEDER programs. An institutional grant from Fundación Ramón Areces is also acknowledged.

Author information

Corresponding author

Correspondence to Carlos Alonso.

About this article

Cite this article

Fuertes, M.A., Pérez, J.M., Zuckerkandl, E. et al. Introns Form Compositional Clusters in Parallel with the Compositional Clusters of the Coding Sequences to Which they Pertain. J Mol Evol 72, 1–13 (2011). https://doi.org/10.1007/s00239-010-9411-6

  • DOI: https://doi.org/10.1007/s00239-010-9411-6