Abstract
This report examines the compositional properties of human gene sequences, evaluating similarities and differences among functionally distinct sectors of the gene independently of the reading frame. To retrieve the compositional information of DNA, we present a neighbor-base-dependent coding system in which the alphabet of 64 letters (DNA triplets) is compressed to an alphabet of 14 letters, here termed triplet composons. Triplets that contain the same set of distinct bases, in any order and number, form a triplet composon. The DNA sequence is read starting at any letter of the initial triplet and then moving, triplet-to-triplet, until the end of the sequence. The readings were made in an overlapping way along the length of the sequences. The analysis of the compositional content of the gene sequences, in terms of composon usage frequencies, shows that: (i) the compositional content of the sequences is far from that of random sequences, even in the case of non-protein-coding sequences; (ii) coding sequences can be classified as components of compositional clusters; and (iii) intron sequences in a cluster have the same composon usage frequencies, even though their base composition differs notably from that of their home coding sequences. A comparison of the composon usage frequencies between human and mouse homologous genes indicated that two clusters found in humans have no counterpart in mouse, whereas the other clusters are stable in both species with respect to their composon usage frequencies in both coding and noncoding sequences.
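The composon coding described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`composon`, `composon_usage`), the choice to label each composon by its sorted distinct bases, and the decision to skip triplets containing symbols other than A, C, G, T are all assumptions made for illustration. The overlapping reading is interpreted here as a window of three bases advanced one base at a time, which pools the three possible starting positions. Under this definition the 64 triplets fall into 4 one-base, 6 two-base, and 4 three-base composons, giving the 14 letters mentioned in the abstract.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def composon(triplet):
    """Label a triplet by its set of distinct bases (order and count ignored).

    The label format (sorted distinct bases) is an illustrative assumption.
    """
    return "".join(sorted(set(triplet)))


# Enumerate all 64 triplets and collapse them into their composons.
COMPOSONS = sorted(
    {composon("".join(t)) for t in product(BASES, repeat=3)},
    key=lambda c: (len(c), c),
)


def composon_usage(seq):
    """Return composon usage frequencies for a DNA sequence.

    Assumption: 'overlapping' reading is modelled as a sliding window of
    width 3 and step 1. Triplets with non-ACGT symbols are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - 2):
        triplet = seq[i:i + 3]
        if set(triplet) <= set(BASES):
            counts[composon(triplet)] += 1
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in COMPOSONS}


if __name__ == "__main__":
    usage = composon_usage("ATGGCCATTGTAATGGGCCGC")
    for label, freq in usage.items():
        print(f"{label:>3}  {freq:.3f}")
```

A vector of 14 frequencies per sequence, as returned by `composon_usage`, is the kind of input that a clustering procedure could then compare across coding and intron sequences.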
Abbreviations
- SD: Standard deviation
- CDS: Coding sequence
- IS: Intron sequence
Acknowledgments
We thank Sara Fuertes for help with statistical data. This work was supported by COSTD20/003/00, CICYT Bio BIO2002-04049-C02-01, SAF 2004-03111, and ISCIII-RETIC RD06/0021/0008-FEDER programs. An institutional grant from Fundación Ramón Areces is also acknowledged.
Cite this article
Fuertes, M.A., Pérez, J.M., Zuckerkandl, E. et al. Introns Form Compositional Clusters in Parallel with the Compositional Clusters of the Coding Sequences to Which they Pertain. J Mol Evol 72, 1–13 (2011). https://doi.org/10.1007/s00239-010-9411-6
DOI: https://doi.org/10.1007/s00239-010-9411-6