Classification of triplet periodicity in the DNA sequences of genes from KEGG databank

Frenkel, F. E.; Korotkov, E. V.

doi:10.1134/S0026893308040201

Mathematical and System Biology
Published: 10 August 2008

Molecular Biology, Volume 42, pages 629–640 (2008)

Abstract

In total, 472 288 regions of triplet periodicity were found in 578 868 genes from the KEGG databank (version 29) and classified. A new concept, the triplet periodicity class, and a measure of similarity between periodicity classes were introduced. Overall, 2520 classes were created, and they contained 94% of the triplet periodicity cases found. The same relationship between triplet periodicity and reading frame was observed for 92% of the triplet periodicity regions contained in the classes. The remaining triplet periodicity regions displayed a shift of the reading frame relative to the frame common to the majority of genes belonging to the same triplet periodicity class. Hypothetical amino acid sequences were deduced from the periodicity regions according to the reading frame characteristic of the given triplet periodicity class. BLAST analysis demonstrated that 2660 hypothetical amino acid sequences display statistically significant similarity to proteins from the UniProt databank. It was hypothesized that 8% of the triplet periodicity regions contained in the classes carry frameshift mutations. The triplet periodicity classes can be used to identify coding regions in genes and to search for frameshift mutations.
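The abstract does not give the formulas behind the periodicity measure or the class similarity measure, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes that triplet periodicity can be summarized by a 3×4 matrix of base counts per codon position, that mutual information between codon position and base serves as the periodicity score, and that a reading-frame shift shows up as a cyclic rotation of the matrix rows. The function names `triplet_matrix`, `mutual_information`, and `best_shift` are hypothetical.

```python
from math import log

BASES = "ACGT"

def triplet_matrix(seq, offset=0):
    """Count bases at each of the three triplet positions (3x4 matrix)."""
    counts = [[0] * 4 for _ in range(3)]
    for i, base in enumerate(seq.upper()):
        if base in BASES:  # skip ambiguous symbols such as N
            counts[(i - offset) % 3][BASES.index(base)] += 1
    return counts

def mutual_information(counts):
    """Mutual information (nats) between triplet position and base.

    Assumed here as the periodicity score; zero means no periodicity.
    """
    total = sum(sum(row) for row in counts)
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(counts[r][c] for r in range(3)) for c in range(4)]
    mi = 0.0
    for r in range(3):
        for c in range(4):
            n = counts[r][c]
            if n:
                mi += (n / total) * log(n * total / (row_sums[r] * col_sums[c]))
    return mi

def best_shift(counts, reference):
    """Cyclic row shift (0, 1 or 2) that brings `counts` closest to `reference`.

    Uses a squared-difference distance on raw counts, which assumes the
    two regions have comparable lengths.
    """
    def shifted(k):
        return [counts[(r + k) % 3] for r in range(3)]

    def distance(a, b):
        return sum((a[r][c] - b[r][c]) ** 2 for r in range(3) for c in range(4))

    return min(range(3), key=lambda k: distance(shifted(k), reference))

# A perfectly periodic toy region and the same region read one base later.
in_frame = triplet_matrix("ATGATGATG")
out_of_frame = triplet_matrix("TGATGATGA")
print(mutual_information(in_frame))
print(best_shift(out_of_frame, in_frame))
```

Under these assumptions, a nonzero result from `best_shift` for a region compared against the matrix of its class would flag a candidate reading-frame shift of the kind the abstract describes. A real analysis would also need a significance threshold for the periodicity score, which this sketch omits.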

Author information

Authors and Affiliations

Bioengineering Center, Russian Academy of Sciences, Moscow, 117312, Russia
F. E. Frenkel & E. V. Korotkov

Corresponding author

Correspondence to F. E. Frenkel.

Additional information

Original Russian Text © F.E. Frenkel, E.V. Korotkov, 2008, published in Molekulyarnaya Biologiya, 2008, Vol. 42, No. 4, pp. 707–720.

About this article

Cite this article

Frenkel, F.E., Korotkov, E.V. Classification of triplet periodicity in the DNA sequences of genes from KEGG databank. Mol Biol 42, 629–640 (2008). https://doi.org/10.1134/S0026893308040201

Received: 14 December 2007
Accepted: 04 March 2008
Published: 10 August 2008
Issue Date: August 2008
DOI: https://doi.org/10.1134/S0026893308040201
