Application of Smirnov Words to Waiting Time Distributions of Runs

Consider infinite random words over a finite alphabet where the letters occur as an i.i.d. sequence according to some arbitrary distribution on the alphabet. The expectation and the variance of the waiting time for the first completed $h$-run of any letter (i.e., the first occurrence of $h$ consecutive equal letters) are computed. The expected waiting time for the completion of $h$-runs of $j$ arbitrary distinct letters is also given.


Introduction
In [7], the following paradox is presented: in measuring the regularity of a die, one may use the waiting times for runs of the same side of certain lengths. For example, if one throws a fair six-sided die, it takes 7 throws on average to see some number twice in succession and 43 throws to see some number three times in succession. Heuristically, one would expect that fewer throws are needed to produce such runs with a biased die. This suggests calling one die more regular than another if more throws are needed to obtain runs of one side of a certain length. The paradox is that there exist dice, say A and B, such that the mean waiting time for two equal sides in a row is longer for die A, while the mean waiting time for three equal sides in a row is longer for die B (an example has been given by Móri, see [7, p. 62]). The consequence of this paradox is that the mean waiting times for such runs cannot serve as a (sufficient) criterion for the regularity of a die (or of any random sequence of digits from a finite alphabet).
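The two averages quoted above (7 and 43 throws) follow from Theorem 1 below, but they are also easy to check empirically. The following Python sketch (illustrative only; the function name and setup are ours, not part of the paper) estimates both waiting times for a fair six-sided die by direct simulation.

```python
import random

def waiting_time_for_run(h, num_sides=6, rng=None):
    """Throw a fair die until some side appears h times in a row;
    return the number of throws used."""
    rng = rng or random
    last, run, throws = None, 0, 0
    while True:
        x = rng.randrange(num_sides)
        throws += 1
        run = run + 1 if x == last else 1
        last = x
        if run == h:
            return throws

rng = random.Random(1)
trials = 50_000
mean2 = sum(waiting_time_for_run(2, rng=rng) for _ in range(trials)) / trials
mean3 = sum(waiting_time_for_run(3, rng=rng) for _ in range(trials)) / trials
# the estimates are close to the exact values 7 and 43
```

With 50 000 trials the standard error of both estimates is far below the distance to the nearest integer, so the simulated means reproduce 7 and 43 reliably.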
This paradox motivated the computation of the first and second moments of the waiting times for so-called $h$-runs. In particular, the formula for the first moment of the waiting time for the first completed $h$-run of any digit, which was already given in [7], is proved without using the strong law of large numbers or any other limit theorem (see Theorem 1). Moreover, the variance of the waiting time for the first completed $h$-run is presented in the same theorem. We then compute the expected waiting time for the completion of $h$-runs of $j$ different letters in Theorem 2. In particular, for $j = r$ (the number of possible letters), we obtain results on the waiting time for a full collection of runs.
Our fundamental technique is the calculation of generating functions of such waiting times; our main trick is the combination of two very useful observations. Firstly, we make use of the very simple but crucial identity
$$\mathbb{E}(T) = \sum_{n\ge 0}\mathbb{P}\{T > n\}\tag{1}$$
for random variables $T$ taking values in the positive integers (see [1]), which has already been a powerful tool in the treatment of the coupon collector problem and the birthday paradox. Secondly, we use the generating function of Smirnov words (see [2]) to count words with a limited number of repetitions of single letters using an appropriate substitution.
We conclude the paper in Section 5 with an algorithmic approach for specific situations.

Preliminaries
We consider infinite words $X_1X_2\ldots$ over the alphabet $\mathcal{A} = \{1, \ldots, r\}$, where the random variables $X_i$ are i.i.d. with $\mathbb{P}\{X_i = k\} = p_k > 0$ for probabilities $p_1, \ldots, p_r$.
We say that a letter $\ell \in \mathcal{A}$ has an $h$-run in $X_1 \ldots X_n$ if there are $h$ consecutive occurrences of $\ell$ in the word $X_1 \ldots X_n$, or in other words, if the word $\ell^h = \ell\cdots\ell$ (with $h$ repetitions) is a factor of the word $X_1 \ldots X_n$.
We consider the random variable $B_j$ giving the first position $n$ such that there exist $j$ of the $r$ letters having an $h$-run in $X_1 \ldots X_n$. This is a random variable on the infinite product space consisting of all infinite words, endowed with the product measure.
On the other hand, we consider the random variable $Y_n$ counting the number of letters which had an $h$-run in $X_1 \ldots X_n$. This is a random variable on the finite product space consisting of all words of length $n$, again with its product measure.
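To make the definitions of $Y_n$ and $B_j$ concrete, here is a small illustrative Python sketch (the names are ours, not from the paper) evaluating both on a fixed word; note that $B_j > n$ holds exactly when $Y_n < j$.

```python
def run_letters(word, h):
    """Set of letters that have an h-run (h consecutive equal letters)
    somewhere in `word`; len(run_letters(word[:n], h)) is Y_n."""
    letters, run = set(), 0
    for k, x in enumerate(word):
        run = run + 1 if k > 0 and x == word[k - 1] else 1
        if run >= h:
            letters.add(x)
    return letters

def B(word, j, h):
    """First position n such that j letters have an h-run in word[:n]
    (the random variable B_j evaluated on one outcome)."""
    for n in range(1, len(word) + 1):
        if len(run_letters(word[:n], h)) >= j:   # i.e. Y_n >= j
            return n
    return None

w = "abbabbbaabbaaa"
assert B(w, 1, 2) == 3    # 'b' completes its first 2-run at position 3
assert B(w, 2, 2) == 9    # 'a' follows with a 2-run at positions 8-9
assert B(w, 1, 3) == 7    # the first 3-run ('bbb') ends at position 7
```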
By construction, we have
$$\mathbb{P}\{B_j > n\} = \mathbb{P}\{Y_n < j\}.\tag{2}$$
With the generating function
$$G_j(z) = \sum_{n\ge 0}\mathbb{P}\{Y_n < j\}\,z^n,\tag{3}$$
this amounts to $\mathbb{E}(B_j) = G_j(1)$ by (1). To compute the variance, we note the simple fact that
$$\mathbb{E}(B_j^2) = \sum_{n\ge 0}(2n+1)\,\mathbb{P}\{B_j > n\} = 2G_j'(1) + G_j(1),$$
where we used (1) and the definition of $G_j(z)$ given in (3). We conclude that
$$\mathbb{V}(B_j) = 2G_j'(1) + G_j(1) - G_j(1)^2.\tag{4}$$
A Smirnov word is defined to be any word which has no consecutive equal letters. The ordinary generating function of Smirnov words over the alphabet $\mathcal{A}$ is
$$S(v_1, \ldots, v_r) = \bigg(1 - \sum_{i=1}^r \frac{v_i}{1+v_i}\bigg)^{-1},\tag{5}$$
where $v_i$ counts the number of occurrences of the letter $i$, cf. Flajolet and Sedgewick [2, Example III.24].
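The two summation identities used above can be sanity-checked on any random variable with finitely many positive integer values; the following sketch (a toy example of ours, with a uniform distribution on $\{1,2,3\}$) verifies them with exact rational arithmetic.

```python
from fractions import Fraction

# toy distribution: P{T = t} for a random variable T on {1, 2, 3}
pmf = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

def tail(n):          # P{T > n}
    return sum(p for t, p in pmf.items() if t > n)

E  = sum(t * p for t, p in pmf.items())        # E(T)  = 2
E2 = sum(t * t * p for t, p in pmf.items())    # E(T^2) = 14/3

assert E  == sum(tail(n) for n in range(4))                 # identity (1)
assert E2 == sum((2 * n + 1) * tail(n) for n in range(4))   # second-moment variant
```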

Moments of the first h-run
In this section, we study the first occurrence of any $h$-run. In the framework of Section 2, this corresponds to the case $j = 1$ and the random variable $B_1$.
We prove the following result on the expectation and the variance of $B_1$:

Theorem 1. If $p_i < 1$ for $1 \le i \le r$, the expectation and the variance of the first occurrence of an $h$-run are
$$\mathbb{E}(B_1) = \frac{1}{\displaystyle\sum_{i=1}^r \frac{p_i^h(1-p_i)}{1-p_i^h}}\tag{6}$$
and
$$\mathbb{V}(B_1) = \frac{\displaystyle\sum_{i=1}^r \frac{p_i - (2h-1)\,p_i^h(1-p_i) - p_i^{2h}}{(1-p_i^h)^2}}{\bigg(\displaystyle\sum_{i=1}^r \frac{p_i^h(1-p_i)}{1-p_i^h}\bigg)^2}.\tag{7}$$

The result (6) on the expectation also appears (without proof) in [7, p. 62]. Each summand of the numerator of (7) is indeed non-negative, because this is equivalent to
$$1 + p_i + \cdots + p_i^{2h-2} \ge (2h-1)\,p_i^{h-1},$$
which is true by the inequality between the arithmetic and the geometric mean, applied to the $2h-1$ numbers $1, p_i, \ldots, p_i^{2h-2}$.
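Formulas (6) and (7) can be evaluated exactly with rational arithmetic. The sketch below (the function name is ours) reproduces the values 7 and 43 for the fair die quoted in the introduction, together with the corresponding variances.

```python
from fractions import Fraction

def EV_first_run(p, h):
    """Expectation and variance of B_1 according to (6) and (7),
    computed exactly for a rational distribution p."""
    S = sum(q**h * (1 - q) / (1 - q**h) for q in p)
    num = sum((q - (2*h - 1) * q**h * (1 - q) - q**(2*h)) / (1 - q**h)**2
              for q in p)
    return 1 / S, num / S**2

die = [Fraction(1, 6)] * 6
assert EV_first_run(die, 2) == (7, 30)      # two equal sides in a row
assert EV_first_run(die, 3) == (43, 1650)   # three equal sides in a row

coin = [Fraction(1, 2)] * 2
assert EV_first_run(coin, 2) == (3, 2)      # matches 1 + Geometric(1/2)
```

The last assertion is an independent check: for a fair coin and $h = 2$, the waiting time is $1$ plus a geometric variable with parameter $1/2$, which has mean $3$ and variance $2$.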
Proof of Theorem 1. In the case $j = 1$, (2) reads
$$\mathbb{P}\{B_1 > n\} = \mathbb{P}\{Y_n = 0\}, \qquad\text{thus}\qquad \mathbb{E}(B_1) = G_1(1) \text{ with } G_1(z) = \sum_{n\ge 0}\mathbb{P}\{Y_n = 0\}\,z^n.\tag{8}$$
We therefore have to determine the probability that a word of length $n$ does not have any $h$-run. Such words arise from a Smirnov word by replacing single letters by runs of the same letter of length in $\{1, \ldots, h-1\}$.
In terms of generating functions, this corresponds to replacing each $v_i$ by
$$\sum_{m=1}^{h-1}(p_iz)^m = \frac{p_iz\big(1-(p_iz)^{h-1}\big)}{1-p_iz}.$$
Here, $z$ marks the length of the word. We obtain
$$G_1(z) = \bigg(1 - \sum_{i=1}^r \frac{p_iz - (p_iz)^h}{1-(p_iz)^h}\bigg)^{-1}.$$
By (8), we are only interested in $z = 1$:
$$G_1(1) = \bigg(1 - \sum_{i=1}^r \frac{p_i - p_i^h}{1-p_i^h}\bigg)^{-1}.$$
Replacing the summand 1 in the denominator by $p_1 + \cdots + p_r$ yields (6). For the variance, we compute $G_1'(1)$ as
$$G_1'(1) = G_1(1)^2\,\sum_{i=1}^r p_i\,\frac{1 - h\,p_i^{h-1} + (h-1)\,p_i^h}{(1-p_i^h)^2}.$$
By (4), we obtain $\mathbb{V}(B_1) = 2G_1'(1) + G_1(1) - G_1(1)^2$. Together with (6), we obtain (7).
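The coefficients of $G_1(z)$ can be cross-checked against a direct dynamic-programming computation of $\mathbb{P}\{Y_n = 0\}$. The following sketch (ours) does this for a fair coin and $h = 2$, where the no-run probabilities are $2^{1-n}$ for $n \ge 1$ and the partial sums of (8) converge to $\mathbb{E}(B_1) = 3$.

```python
from fractions import Fraction

def prob_no_run(p, h, N):
    """P{no h-run in X_1 ... X_n} for n = 0, ..., N (assuming h >= 2),
    by dynamic programming over (last letter, trailing run length)."""
    r = len(p)
    # state[(i, m)] = P{word ends with an m-run of letter i, no h-run so far}
    state = {(i, 1): p[i] for i in range(r)}
    probs = [Fraction(1), sum(state.values())]
    for _ in range(2, N + 1):
        new = {}
        for (i, m), q in state.items():
            for k in range(r):
                m2 = m + 1 if k == i else 1
                if m2 < h:                      # drop states that complete a run
                    new[(k, m2)] = new.get((k, m2), Fraction(0)) + q * p[k]
        state = new
        probs.append(sum(state.values()))
    return probs

coin = [Fraction(1, 2)] * 2
probs = prob_no_run(coin, 2, 60)
assert probs[5] == Fraction(1, 16)              # 2^{1-5}, the alternating words
partial = sum(probs)                            # partial sum of G_1(1)
assert abs(float(partial) - 3) < 1e-6           # E(B_1) = 3 for the fair coin
```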

Expectation of the first occurrence of h-runs of j letters
In this section, we consider the first position where $j$ of the letters $1, \ldots, r$ had an $h$-run. In the terminology of Section 2, this corresponds to the random variable $B_j$.
We prove the following theorem on the expectation of $B_j$.

Theorem 2. For $1 \le i \le r$, set
$$\alpha_i = \frac{p_i\big(1-p_i^{h-1}\big)}{1-p_i}, \qquad \gamma_i = \frac{p_i}{1-p_i},\tag{9}$$
and let $A_i$ and $\Gamma_i$ be the substitution operators mapping the variable $v_i$ to $\alpha_i$ and $\gamma_i$, respectively.

Then the expectation of the first occurrence of $h$-runs of exactly $j$ letters is
$$\mathbb{E}(B_j) = \sum_{m=0}^{j-1}\,[y^m]\,\prod_{i=1}^r\big(y\,\Gamma_i + (1-y)\,A_i\big)\,S(v_1, \ldots, v_r),\tag{10}$$
where $S(v_1, \ldots, v_r)$ is defined in (5).
For $j = r$, i.e., the first occurrence of $h$-runs of all letters, (10) can be simplified:

Corollary 3. The expectation of the first occurrence of all $h$-runs is
$$\mathbb{E}(B_r) = \bigg(\prod_{i=1}^r \Gamma_i - \prod_{i=1}^r (\Gamma_i - A_i)\bigg)\,S(v_1, \ldots, v_r),\tag{11}$$
where $\Gamma_i$, $A_i$ and $S(v_1, \ldots, v_r)$ are defined in (9) and (5), respectively.
In the case of equidistributed letters, i.e., $p_i = 1/r$ for all $i$, we get the following simple expression.

Corollary 4. If $p_1 = \cdots = p_r = 1/r$, then the expectation of the first occurrence of all $h$-runs is
$$\mathbb{E}(B_r) = \frac{r\,(r^h - 1)}{r-1}\,H_r,$$
where $H_r$ denotes the $r$th harmonic number.
Proof of Theorem 2. As in Section 2, $Y_n$ is the number of letters that have at least one run of length $\ge h$ within $X_1 \ldots X_n$. Arbitrary words arise from Smirnov words by replacing single letters by runs of length at least 1 of the same letter. In terms of generating functions, this corresponds to substituting $v_i$ by
$$\beta_i(u_i, z) = \sum_{m=1}^{h-1}(p_iz)^m + u_i\sum_{m\ge h}(p_iz)^m = \frac{p_iz - (p_iz)^h}{1-p_iz} + u_i\,\frac{(p_iz)^h}{1-p_iz}.\tag{12}$$
As previously, $z$ counts the length of the word. The variable $u_i$ counts the number of occurrences of (non-extensible) $m$-runs of the letter $i$ with $m \ge h$. We now consider the probability generating function $F(u_1, \ldots, u_r; z) = S(\beta_1(u_1, z), \ldots, \beta_r(u_r, z))$. By inclusion–exclusion over the letters of a set $M$ which fail to have an $h$-run, the probability that exactly the letters of $M$ have an $h$-run in $X_1 \ldots X_n$ is a signed sum of coefficients of $F$ with each $u_i \in \{0, 1\}$. Inserting this and (12) in (2) yields
$$\mathbb{P}\{B_j > n\} = \sum_{M:\,|M| < j}\ \sum_{K \subseteq M}(-1)^{|K|}\,[z^n]\,F(u_1, \ldots, u_r; z)\Big|_{u_i = 0\ (i \in M^c \cup K),\ u_i = 1\ (i \in M \setminus K)}.$$
Summing over all $n \ge 0$ amounts to setting $z = 1$ as long as all summands are non-singular at $z = 1$. As $|M| < j$, at least one of the $u_i$ is zero, w.l.o.g. $u_1 = 0$. This implies that $[z^n]F(u_1, \ldots, u_r; z) \le [z^n]F(0, 1, \ldots, 1; z) < \rho^n$ for a suitable $0 < \rho < 1$ as the word $1^h$ is forbidden as a factor. Thus $F(u_1, \ldots, u_r; z)$ is regular at $z = 1$. Setting $z = 1$ turns the substitutions $u_i = 0$ and $u_i = 1$ into the operators $A_i$ and $\Gamma_i$, respectively, since $\beta_i(0, 1) = \alpha_i$ and $\beta_i(1, 1) = \gamma_i$; collecting the sets $M$ according to their cardinality $m$ yields (10).
Proof of Corollary 3. The polynomial $\prod_{i=1}^r(y\Gamma_i + (1-y)A_i)$ has degree $r$ in the variable $y$. Thus extracting all coefficients but the coefficient of $y^r$ amounts to substituting $y = 1$ and subtracting the coefficient of $y^r$, i.e.,
$$\sum_{m=0}^{r-1}[y^m]\prod_{i=1}^r\big(y\Gamma_i + (1-y)A_i\big) = \prod_{i=1}^r \Gamma_i - \prod_{i=1}^r(\Gamma_i - A_i).$$
Inserting this into (10) yields (11).
Proof of Corollary 4. Setting $p_i = 1/r$ yields
$$\alpha_i = \frac{1 - r^{1-h}}{r-1}, \qquad \gamma_i = \frac{1}{r-1};$$
consequently, applying any combination of the operators with $k \ge 1$ occurrences of $A_i$ (and $r-k$ occurrences of $\Gamma_i$) to $S$ yields $\frac{r(r^h-1)}{k(r-1)}$. Inserting this in (11) and collecting terms with $k$ occurrences of $A_i$ yields
$$\mathbb{E}(B_r) = \sum_{k=1}^r (-1)^{k+1}\binom{r}{k}\,\frac{r(r^h-1)}{k(r-1)} = \frac{r(r^h-1)}{r-1}\,H_r,$$
where we used the well-known identity $\sum_{k=1}^r \frac{(-1)^{k+1}}{k}\binom{r}{k} = H_r$, cf. for example [5].
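Expanding the operator product in (10) over the subsets $M$ and $K$ from the proof of Theorem 2 gives a finite inclusion–exclusion sum that is easy to evaluate exactly. The following sketch (ours) does so and cross-checks the special case $j = 1$ against (6) and the case $j = r$, $p_i = 1/r$ against Corollary 4.

```python
from fractions import Fraction
from itertools import combinations

def E_Bj(p, h, j):
    """E(B_j) via the inclusion-exclusion expansion of (10): sum over the
    set M of letters with a completed run (|M| < j) and K subseteq M,
    keeping the letters of M^c union K free of h-runs."""
    r = len(p)
    def D(T):   # 1/S when exactly the letters in T are kept free of h-runs
        return sum(p[i]**h * (1 - p[i]) / (1 - p[i]**h) for i in T)
    total = Fraction(0)
    for m in range(j):
        for M in combinations(range(r), m):
            rest = [i for i in range(r) if i not in M]
            for k in range(m + 1):
                for K in combinations(M, k):
                    total += Fraction(-1)**k / D(rest + list(K))
    return total

def H(r):   # r-th harmonic number
    return sum(Fraction(1, k) for k in range(1, r + 1))

coin = [Fraction(1, 2)] * 2
assert E_Bj(coin, 2, 1) == 3                      # j = 1 recovers (6)
for r, h in [(2, 2), (3, 2), (3, 3)]:             # j = r recovers Corollary 4
    p = [Fraction(1, r)] * r
    assert E_Bj(p, h, r) == Fraction(r * (r**h - 1), r - 1) * H(r)
```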
Remark 5. Let run lengths $h_1, \ldots, h_r$ be given and consider occurrences of $h_i$-runs for the letter $i$. If $B_j$ is the first position $n$ such that there are exactly $j$ letters which had "their" run in $X_1 \ldots X_n$, the results of Theorems 1 and 2 as well as Corollary 3 remain valid when all $p_i^h$ are replaced by $p_i^{h_i}$.

Algorithmic Aspects
For fixed $h$, the occurrence of an $h$-run of a letter $i$ can easily be detected by a transducer automaton which reads the word (where the letter $i$ occurs with probability $p_i$) and outputs 1 whenever the letter $i$ completes an $h$-run, see Figure 1 for the case $r = 2$, $h = 3$ and $i = 2$. The same can be done for the first occurrence of any $h$-run, see Figure 2 for $r = 2$ and $h = 3$.
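Since Figure 1 is not reproduced here, the following Python sketch models such a transducer in the simplest plausible way (our assumption: disjoint, non-overlapping completions are reported; the actual figure may differ in this detail). The state is the length of the current run of the watched letter.

```python
def run_completion_transducer(word, letter, h):
    """Finite-state sketch in the spirit of Figure 1: read `word` letter
    by letter and output 1 whenever `letter` completes an h-run,
    0 otherwise. The state is the trailing run length of `letter`."""
    state, out = 0, []
    for x in word:
        state = state + 1 if x == letter else 0
        if state == h:          # an h-run of `letter` is completed
            out.append(1)
            state = 0           # start counting the next (disjoint) run
        else:
            out.append(0)
    return out

# r = 2, h = 3, watching letter 'b' (the case of Figure 1)
assert run_completion_transducer("abbbbababbb", "b", 3) == \
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
```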
The first occurrence of $j$ runs of length $h$ can also be modelled by a transducer.
Using the finite state machine package [4] of the SageMath mathematics software [6], such transducers can easily be constructed. Accompanying this article, an extension of SageMath for computing the expectation and the variance of the first occurrence of a 1 in the output of a transducer is proposed for inclusion into SageMath [3].
Using this extension, the expectation and the variance of $B_1$ can be computed for fixed $r$ and $h$ as shown in Table 1.
The results coincide with those obtained in Theorem 1. For more examples, see the documentation of moments_waiting_time.
For $j > 1$, we did not compute $\mathbb{V}(B_j)$ in general. For fixed $r$ and $h$, it can be computed by this algorithmic approach.
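Without SageMath, the same fixed-$r$, fixed-$h$ computation can be sketched by solving the absorbing Markov chain on the states (current letter, trailing run length) exactly. The following plain-Python sketch (ours, not the cited extension [3]) recovers the values of Theorem 1 for $B_1$.

```python
from fractions import Fraction

def EV_via_chain(p, h):
    """E and V of B_1 from the absorbing Markov chain whose states are
    (current letter, trailing run length), solved exactly over Q."""
    r = len(p)
    states = [None] + [(i, m) for i in range(r) for m in range(1, h)]
    index = {s: n for n, s in enumerate(states)}

    def successors(s):
        # list of (probability, next state or 'absorbed')
        res = []
        for k in range(r):
            if s is not None and k == s[0]:
                m = s[1] + 1
                res.append((p[k], 'absorbed' if m == h else (k, m)))
            else:
                res.append((p[k], 'absorbed' if h == 1 else (k, 1)))
        return res

    n = len(states)
    def solve(rhs):
        # Gauss-Jordan elimination for (I - P) x = rhs over the rationals
        A = [[Fraction(0)] * n for _ in range(n)]
        b = list(rhs)
        for a, s in enumerate(states):
            A[a][a] = Fraction(1)
            for q, t in successors(s):
                if t != 'absorbed':
                    A[a][index[t]] -= q
        for c in range(n):
            piv = next(a for a in range(c, n) if A[a][c] != 0)
            A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
            for a in range(n):
                if a != c and A[a][c] != 0:
                    f = A[a][c] / A[c][c]
                    A[a] = [x - f * y for x, y in zip(A[a], A[c])]
                    b[a] -= f * b[c]
        return [b[a] / A[a][a] for a in range(n)]

    # first-step equations: E[T_s] = 1 + sum p_k E[T_s'],
    # E[T_s^2] = 1 + 2 sum p_k E[T_s'] + sum p_k E[T_s'^2]
    E = solve([Fraction(1)] * n)
    E2 = solve([1 + 2 * sum(q * (E[index[t]] if t != 'absorbed' else 0)
                            for q, t in successors(s))
                for s in states])
    return E[0], E2[0] - E[0] ** 2           # state 0 is the start state

die = [Fraction(1, 6)] * 6
assert EV_via_chain(die, 2) == (7, 30)       # agrees with Theorem 1
assert EV_via_chain([Fraction(1, 2)] * 2, 2) == (3, 2)
```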
Obviously, the SageMath method can be used for computing first occurrences of anything which is recognisable by a transducer. On the other hand, explicit results for general $r$ and $h$ such as our Theorems 1 and 2 cannot be obtained by that method.