A markovian approach for the prediction of mouse isochores

Journal of Mathematical Biology

Abstract

Hidden Markov models (HMMs) are effective tools for detecting series of statistically homogeneous structures, but they are not well suited to analysing complex structures. For example, the duration of stay in a state of an HMM must follow a geometric law. Numerous other methodological difficulties are encountered when using HMMs to segregate genes from transposons or retroviruses, or to determine the isochore classes of genes. The aim of this paper is to analyse these methodological difficulties and to suggest new tools for the exploration of genome data. We show that HMMs can be used to analyse complex gene structures with bell-shaped length distributions by using convolutions of geometric distributions. To this end, we introduce macro-states to model the distributions of the lengths of the regions. Our study shows that a simple HMM can be used to model the isochore organisation of the mouse genome. This potential use of Markovian models as an aid to data exploration has been underestimated until now.
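
The duration argument in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes that a macro-state is a chain of identical sub-states visited in series, each with the same self-transition probability `p_stay`, so that the total time spent in the macro-state is a sum of independent geometric durations. The function names and the parameter values (0.7, 3 sub-states) are illustrative assumptions, not taken from the paper.

```python
def geometric_pmf(p_stay, max_len):
    """P(D = d), d = 1..max_len, for one HMM state with self-transition p_stay.

    Index 0 is kept (with probability 0) so that list index equals duration.
    """
    return [0.0] + [p_stay ** (d - 1) * (1.0 - p_stay)
                    for d in range(1, max_len + 1)]


def convolve(a, b, max_len):
    """Law of the sum of two independent durations, truncated at max_len."""
    out = [0.0] * (max_len + 1)
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b):
            if i + j > max_len:
                break
            out[i + j] += ai * bj
    return out


def macro_state_pmf(p_stay, n_substates, max_len):
    """Duration law of a macro-state made of n_substates states in series.

    Assumption (hypothetical layout): all sub-states share the same p_stay.
    """
    single = geometric_pmf(p_stay, max_len)
    total = single
    for _ in range(n_substates - 1):
        total = convolve(total, single, max_len)
    return total


def mode(pmf):
    """Most probable duration (index of the largest probability)."""
    return max(range(len(pmf)), key=pmf.__getitem__)


if __name__ == "__main__":
    one_state = macro_state_pmf(0.7, 1, 200)
    three_states = macro_state_pmf(0.7, 3, 200)
    # A single state always has its mode at the shortest duration;
    # the macro-state's mode moves away from the minimum length.
    print("mode, single state:", mode(one_state))
    print("mode, macro-state :", mode(three_states))
```

With a single state the most probable duration is always the shortest one, which is the geometric constraint mentioned above. Chaining several sub-states gives a negative-binomial-type law whose peak lies away from the minimum length, which is one way to obtain the bell-shaped length distributions the abstract refers to.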

Author information

Correspondence to Christelle Melodelima.

About this article

Cite this article

Melodelima, C., Gautier, C. & Piau, D. A markovian approach for the prediction of mouse isochores. J. Math. Biol. 55, 353–364 (2007). https://doi.org/10.1007/s00285-007-0087-5

  • DOI: https://doi.org/10.1007/s00285-007-0087-5
