Exact Distribution of the Distances between Any Occurrences of a Set of Words

Annals of the Institute of Statistical Mathematics

Abstract

The distribution of the distance between two (or more) successive occurrences of a specific word in a random sequence of letters is known under different models. In this paper, a more general problem is studied: the distribution of the distance between two (or more) successive occurrences of any word of a given set under a Markov model for the sequence. The generating function and a recurrence for obtaining the probabilities are given. These results are applied to study the distribution of the "CHI" motif in the genome sequence of Haemophilus influenzae.
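
To make the quantity in the abstract concrete, here is a minimal Python sketch of one way to compute P(D = d), where D is the distance from an occurrence of a given word to the next occurrence of any word of the set. It is not the authors' method (the paper works through a generating function). It is a plain forward recurrence over the last h - 1 letters read, built on several assumptions that are not taken from the paper: the sequence is a first-order homogeneous Markov chain given as a nested dict `trans[a][b]`, all words share one length h >= 2, overlapping occurrences count, and distance is measured between end positions. The function name `distance_distribution` is hypothetical.

```python
from typing import Dict, List, Sequence


def distance_distribution(
    words: Sequence[str],
    start_word: str,
    trans: Dict[str, Dict[str, float]],
    max_dist: int,
) -> List[float]:
    """Return [P(D=1), ..., P(D=max_dist)], where D is the distance (between
    end positions) from an occurrence of start_word to the next occurrence of
    any word in `words`.

    Illustrative assumptions: first-order Markov chain with transition
    probabilities trans[a][b]; every word has the same length h >= 2, so each
    later occurrence lies inside start_word plus the letters read after it.
    """
    h = len(start_word)
    if h < 2 or any(len(w) != h for w in words):
        raise ValueError("sketch assumes all words share one length h >= 2")
    targets = set(words)
    # mass[s] = P(no word of the set has occurred yet, last h-1 letters == s)
    mass: Dict[str, float] = {start_word[1:]: 1.0}
    dist: List[float] = []
    for _ in range(max_dist):
        hit = 0.0
        new_mass: Dict[str, float] = {}
        for state, p in mass.items():
            # The next letter depends only on the previous one (first order).
            for letter, q in trans[state[-1]].items():
                window = state + letter  # the h letters ending at this step
                if window in targets:
                    hit += p * q  # first occurrence ends exactly here
                else:
                    nxt = window[1:]  # slide the window, keep h-1 letters
                    new_mass[nxt] = new_mass.get(nxt, 0.0) + p * q
        dist.append(hit)
        mass = new_mass
    return dist


if __name__ == "__main__":
    # Two-letter alphabet with i.i.d. uniform letters, single word "aa".
    uniform = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.5, "b": 0.5}}
    print(distance_distribution(["aa"], "aa", uniform, 4))
    # [0.5, 0.0, 0.125, 0.0625]
```

In the example, D = 2 is impossible because "aa" cannot end at position 2 without already ending at position 1, and D = 4 requires the continuation "bbaa", with probability 1/16. The number of tracked states is at most |A|^(h-1); for an 8-letter DNA motif that is at most 4^7 = 16384, which is small enough for this direct recurrence. Results for small alphabets can be checked against brute-force enumeration of all continuations.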

References

  • Aki, S. and Hirano, K. (1993). Discrete distributions related to succession events in a two state Markov chain, Statistical Sciences and Data Analysis (eds. K. Matusita, M. L. Puri and T. Hayakawa), 467-474, VSP Publishers, Amsterdam.

  • Aki, S., Balakrishnan, N. and Mohanty, S. G. (1996). Sooner and later waiting time problems for success and failure runs in higher order Markov dependent trials, Ann. Inst. Statist. Math., 48(4), 773-787.

  • Breen, S., Waterman, M. S. and Zhang, N. (1985). Renewal theory for several patterns, J. Appl. Probab., 22, 228-234.

  • Chrysaphinou, O. and Papastavridis, S. (1990). The occurrence of sequence patterns in repeated dependent experiments, Theory Probab. Appl., 35(1), 145-152.

  • Dembo, A. and Karlin, S. (1992). Poisson approximations for r-scans, Ann. Appl. Probab., 2(2), 329-357.

  • Fu, C. J. (1996). Distribution of runs and patterns associated with a sequence of multi-state trials, Statist. Sinica, 6, 957-974.

  • Fu, C. J. and Koutras, M. V. (1994). Distribution theory of runs: A Markov chain approach, J. Amer. Statist. Assoc., 89(427), 1050-1058.

  • Karlin, S. and Macken, C. (1991). Some statistical problems in the assessment of inhomogeneities of DNA sequence data, J. Amer. Statist. Assoc., 86, 27-35.

  • Koutras, M. V. (1997). Waiting Times and Number of Appearances of Events in a Sequence of Discrete Random Variables, Advances in Combinatorial Methods and Applications to Probability and Statistics (ed. N. Balakrishnan), 363-384, Statistics for Industry and Technology Series, Birkhäuser, Boston.

  • Koutras, M. V. and Alexandrou, V. A. (1997). Sooner waiting time problems in a sequence of trinary trials, J. Appl. Probab., 34, 593-609.

  • Mori, T. F. (1991). On the waiting time till each of some given patterns occurs as a run, Probab. Theory Related Fields, 67, 313-323.

  • Robin, S. and Daudin, J.-J. (1999). Exact distribution of word occurrences in a random sequence of letters, J. Appl. Probab., 36, 179-193.

  • Uchida, M. and Aki, S. (1995). Sooner or later waiting time problems in a two-state Markov chain, Ann. Inst. Statist. Math., 47, 415-433.

About this article

Cite this article

Robin, S., Daudin, JJ. Exact Distribution of the Distances between Any Occurrences of a Set of Words. Annals of the Institute of Statistical Mathematics 53, 895–905 (2001). https://doi.org/10.1023/A:1014633825822

  • DOI: https://doi.org/10.1023/A:1014633825822