Classification of triplet periodicity in the DNA sequences of genes from KEGG databank

  • Mathematical and System Biology
Molecular Biology

Abstract

In total, 472 288 regions of triplet periodicity were found in 578 868 genes from the KEGG databank (version 29) and classified. A new concept, the triplet periodicity class, and a measure of similarity between periodicity classes were introduced. Overall, 2520 classes were created, containing 94% of the triplet periodicity cases found. For 92% of the triplet periodicity regions contained in the classes, the correlation between triplet periodicity and reading frame was the same as for the other regions of the same class. The remaining triplet periodicity regions displayed a shift of the reading frame relative to that common to the majority of genes belonging to the same triplet periodicity class. Hypothetical amino acid sequences were deduced from the periodicity regions according to the reading frame characteristic of the given triplet periodicity class. BLAST analysis demonstrated that 2660 hypothetical amino acid sequences display statistically significant similarity to proteins from the UniProt databank. It is proposed that 8% of the triplet periodicity regions contained in the classes carry frameshift mutations. The triplet periodicity classes can be used to identify coding regions in genes and to search for frameshift mutations.
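To make the idea of a frame shift relative to a class concrete, the following is a minimal sketch and not the authors' method. It assumes that a periodicity region can be summarised by a 3 x 4 matrix of base counts per period position, that the strength of the periodicity can be scored by the mutual information between period position and base, and that a frame shift shows up as a cyclic rotation of the matrix rows relative to a class reference. The function names (`triplet_matrix`, `mutual_information`, `best_shift`) and the squared-difference distance are hypothetical choices made for illustration; the abstract does not specify the actual statistic or similarity measure.

```python
import math

BASES = "ACGT"


def triplet_matrix(seq):
    """Count each base at period positions 0, 1, 2 (a 3 x 4 matrix)."""
    counts = [[0] * 4 for _ in range(3)]
    for i, base in enumerate(seq.upper()):
        if base in BASES:  # skip ambiguous symbols such as N
            counts[i % 3][BASES.index(base)] += 1
    return counts


def mutual_information(counts):
    """Mutual information (in nats) between period position and base.

    Illustrative score only; zero means no triplet periodicity signal.
    """
    total = sum(sum(row) for row in counts)
    if total == 0:
        return 0.0
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(counts[r][c] for r in range(3)) for c in range(4)]
    mi = 0.0
    for r in range(3):
        for c in range(4):
            n = counts[r][c]
            if n:
                mi += (n / total) * math.log(n * total / (row_sums[r] * col_sums[c]))
    return mi


def frequencies(counts):
    """Normalise a count matrix so regions of different length are comparable."""
    total = sum(sum(row) for row in counts) or 1
    return [[n / total for n in row] for row in counts]


def best_shift(counts, reference):
    """Cyclic row shift (0, 1 or 2) that best aligns counts with reference.

    Assumed distance: sum of squared frequency differences. A non-zero
    result suggests the region is out of phase with the class reference.
    """
    f, g = frequencies(counts), frequencies(reference)

    def distance(shift):
        return sum(
            (f[(r + shift) % 3][c] - g[r][c]) ** 2
            for r in range(3)
            for c in range(4)
        )

    return min(range(3), key=distance)


if __name__ == "__main__":
    class_reference = triplet_matrix("ACG" * 50)   # stand-in for a class matrix
    in_phase = triplet_matrix("ACG" * 50)
    out_of_phase = triplet_matrix("CGA" * 50)      # same period, different phase
    print(mutual_information(in_phase))
    print(best_shift(in_phase, class_reference))
    print(best_shift(out_of_phase, class_reference))
```

In this toy case the perfectly periodic sequence scores log 3 nats, the in-phase region returns a shift of 0, and the out-of-phase region returns a non-zero shift. A real analysis would also need a significance threshold for the periodicity score, which is not attempted here.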

Author information

Corresponding author

Correspondence to F. E. Frenkel.

Additional information

Original Russian Text © F.E. Frenkel, E.V. Korotkov, 2008, published in Molekulyarnaya Biologiya, 2008, Vol. 42, No. 4, pp. 707–720.

About this article

Cite this article

Frenkel, F.E., Korotkov, E.V. Classification of triplet periodicity in the DNA sequences of genes from KEGG databank. Mol Biol 42, 629–640 (2008). https://doi.org/10.1134/S0026893308040201
