Abstract
The distribution of the distance between two (or more) successive occurrences of a specific word in a random sequence of letters is known under different models. In this paper, a more general problem is studied: the distribution of the distance between two (or more) successive occurrences of any word of a given set under a Markov model for the sequence. The generating function and a recurrence for obtaining the probabilities are given. These results are applied to study the distribution of the "CHI" motif in the genome sequence of Haemophilus influenzae.
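As a rough illustration of the quantity being studied, the sketch below simulates a sequence from a first-order Markov chain and collects the empirical distances between successive occurrences of any word in a set. It is not the paper's generating function or recurrence, only a brute-force empirical counterpart. The function names, the toy alphabet, and the convention of locating an occurrence by the index of its last letter are assumptions made for this example.

```python
import random


def simulate_markov(n, alphabet, transition, start, rng):
    """Draw a length-n sequence from a first-order Markov chain.

    transition[a][b] is the probability of letter b following letter a
    (hypothetical layout chosen for this sketch).
    """
    seq = [start]
    for _ in range(n - 1):
        probs = transition[seq[-1]]
        weights = [probs[b] for b in alphabet]
        seq.append(rng.choices(alphabet, weights=weights)[0])
    return "".join(seq)


def occurrence_ends(seq, words):
    """Sorted end positions (0-based index of the last letter) of every
    occurrence of any word in `words`, overlapping occurrences included.

    Assumption: two words ending at the same position count once.
    """
    ends = set()
    for w in words:
        i = seq.find(w)
        while i != -1:
            ends.add(i + len(w) - 1)
            i = seq.find(w, i + 1)  # step by one so overlaps are found
    return sorted(ends)


def successive_distances(seq, words, r=1):
    """Distances between an occurrence and the r-th occurrence after it."""
    ends = occurrence_ends(seq, words)
    return [ends[k + r] - ends[k] for k in range(len(ends) - r)]


if __name__ == "__main__":
    rng = random.Random(0)
    alphabet = ["A", "B"]
    # Toy transition matrix, not estimated from any genome.
    transition = {"A": {"A": 0.3, "B": 0.7}, "B": {"A": 0.6, "B": 0.4}}
    seq = simulate_markov(1000, alphabet, transition, "A", rng)
    print(successive_distances(seq, {"AB", "BA"})[:10])
```

A histogram of `successive_distances` over a long simulated sequence gives an empirical distribution that exact probabilities could be checked against; setting `r` above 1 covers the "two (or more) successive occurrences" case.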
About this article
Cite this article
Robin, S., Daudin, JJ. Exact Distribution of the Distances between Any Occurrences of a Set of Words. Annals of the Institute of Statistical Mathematics 53, 895–905 (2001). https://doi.org/10.1023/A:1014633825822
DOI: https://doi.org/10.1023/A:1014633825822