Abstract
Periodicity of DNA segments and sequences has been studied extensively over the past decades. One of the main problems is the use of mathematical techniques to identify protein-coding and non-coding regions inside genes. Periodicity plays an important role in the structure of DNA, as specific regions have been shown to exhibit periodic patterns. In this paper, we assume that a DNA sequence is described by a semi-Markov chain (SMC) whose discrete state space consists of the four nucleotides. Equations in closed analytic form are derived to characterize strong or weak d-periodic and quasiperiodic behaviour of the model in both the homogeneous and the non-homogeneous case. The model is applied to 3-base periodic sequences, which characterize the protein-coding regions of genes. The related probabilities and the corresponding indexes are provided, which describe the underlying periodic pattern. Finally, the theoretical results are illustrated with data from synthetic and real DNA sequences.
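The sketch below makes the modelling setup concrete. It simulates a discrete-time, homogeneous semi-Markov chain over the four nucleotides, using embedded transition probabilities plus sojourn-time distributions that depend on both the current and the next state, and then scores 3-base periodicity with a generic Fourier measure. The names `simulate_smc` and `period3_score`, the dictionary layout of the parameters, and the spectral score are illustrative assumptions. They are not the closed-form periodicity equations or indexes derived in the paper, and the non-homogeneous case is not covered.

```python
import cmath
import random

NUCLEOTIDES = "ACGT"


def simulate_smc(p, holding, start, length, seed=0):
    """Simulate a discrete-time, homogeneous semi-Markov chain on {A, C, G, T}.

    Hypothetical parameter layout (an assumption, not the paper's notation):
      p[i][j]       -- embedded-chain probability of jumping from nucleotide i to j
      holding[i][j] -- dict {m: prob}, sojourn time m >= 1 spent in i given the next state is j
    Each unit of sojourn time emits one copy of the current nucleotide.
    """
    rng = random.Random(seed)
    out = []
    state = start
    while len(out) < length:
        # Draw the next state from the embedded Markov chain.
        targets = list(p[state])
        nxt = rng.choices(targets, weights=[p[state][t] for t in targets])[0]
        # Draw how long the chain stays in `state` before moving to `nxt`.
        times = list(holding[state][nxt])
        m = rng.choices(times, weights=[holding[state][nxt][t] for t in times])[0]
        out.extend(state * m)
        state = nxt
    return "".join(out[:length])


def period3_score(seq):
    """Generic spectral 3-base periodicity score (not the paper's index).

    Sums the power at frequency 1/3 of the four nucleotide indicator sequences
    and divides by len(seq). If seq contains only A, C, G, T, Parseval's theorem
    makes len(seq) the mean power over all DFT frequencies, so values well above 1
    suggest a period-3 pattern.
    """
    phases = [cmath.exp(-2j * cmath.pi * r / 3) for r in range(3)]
    power = 0.0
    for b in NUCLEOTIDES:
        s = sum(phases[k % 3] for k, c in enumerate(seq) if c == b)
        power += abs(s) ** 2
    return power / len(seq)


# A deterministic cycle A -> T -> G -> A with unit sojourn times yields "ATGATG...".
cycle = {"A": {"T": 1.0}, "T": {"G": 1.0}, "G": {"A": 1.0}, "C": {"A": 1.0}}
unit = {i: {j: {1: 1.0} for j in cycle[i]} for i in cycle}
periodic = simulate_smc(cycle, unit, "A", 300)
print(periodic[:9])                        # ATGATGATG
print(round(period3_score(periodic), 6))   # 100.0

# Random jumps with short sojourn times (illustrative numbers only).
mixed = {i: {j: 1 / 3 for j in NUCLEOTIDES if j != i} for i in NUCLEOTIDES}
short = {i: {j: {1: 0.5, 2: 0.3, 3: 0.2} for j in mixed[i]} for i in mixed}
print(round(period3_score(simulate_smc(mixed, short, "A", 3000)), 2))
```

Because the sojourn time depends on the pair (current state, next state), a single chain can mix runs of repeated bases with rigid cycles. A fixed unit sojourn time reduces the model to an ordinary Markov chain.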
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kolias, P., Papadopoulou, A. (2022). Investigating Some Attributes of Periodicity in DNA Sequences via Semi-Markov Modelling. In: Malyarenko, A., Ni, Y., Rančić, M., Silvestrov, S. (eds) Stochastic Processes, Statistical Methods, and Engineering Mathematics. SPAS 2019. Springer Proceedings in Mathematics & Statistics, vol 408. Springer, Cham. https://doi.org/10.1007/978-3-031-17820-7_9
DOI: https://doi.org/10.1007/978-3-031-17820-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-17819-1
Online ISBN: 978-3-031-17820-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)