Self-averaging sequences which fail to converge

We consider self-averaging sequences in which each term is a weighted average over previous terms. For several sequences of this kind it is known that they do not converge to a limit. These sequences share the property that the $n$th term is mainly based on terms around a fixed fraction of $n$. We give a probabilistic interpretation of such sequences and provide weak conditions under which non-convergence is natural to expect. Our methods are illustrated by an application to the group Russian roulette problem.


Introduction
Suppose $n \ge 2$ people want to select a loser by flipping coins: all of them flip their coin and those that flip heads are winners. The others continue flipping until there is a single loser. This problem and generalizations of it have been extensively studied, see [2,3,4,5,6,7,8,9,10,12,13]. If at some stage all remaining players flip heads before a loser has been selected, we say the process fails. It is known that the probability of failure does not converge as $n$ increases. This sequence of probabilities is what we call a self-averaging sequence. A similar problem is the shooting problem or group Russian roulette problem, as described by Winkler [15]. Here players do not flip coins, but fire a gun at another player. Again one could ask for the probability of ending with exactly one survivor. Analysis of this problem is harder, since survival of an individual depends on survival of the other players. Recently Van de Brug, Kager and Meester [14] rigorously showed that here, too, the sequence of probabilities does not converge, and they gave bounds for the liminf and limsup.
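The coin-flipping selection procedure is straightforward to simulate. The following minimal Python sketch (an illustration, not part of the original analysis) estimates the failure probability for a given group size $n$ by Monte Carlo:

```python
import random

def selection_fails(n, rng):
    """Run one loser selection: players flipping heads leave as winners;
    the process fails if at some stage all remaining players flip heads."""
    remaining = n
    while remaining > 1:
        tails = sum(1 for _ in range(remaining) if rng.random() < 0.5)
        if tails == 0:       # everyone flipped heads: nobody is left to lose
            return True
        remaining = tails    # tails-flippers continue to the next round
    return False             # a single loser has been selected

def failure_probability(n, trials=50000, seed=0):
    """Monte Carlo estimate of the failure probability for group size n."""
    rng = random.Random(seed)
    return sum(selection_fails(n, rng) for _ in range(trials)) / trials
```

For $n = 2$ the failure probability can be computed exactly: each round either fails (both heads, probability $1/4$), selects a loser (probability $1/2$), or repeats (probability $1/4$), so the failure probability is $(1/4)/(3/4) = 1/3$, which the simulation reproduces.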
In this paper, we put such problems into a mild probabilistic framework and explain why this phenomenon of non-convergence is not surprising. The fact that in each round about the same fraction $\alpha$ of the players survives is the key ingredient that produces oscillation instead of convergence. In the loser selection problem and the shooting problem the fluctuations around the fixed fraction are of order $\sqrt{n}$. We demonstrate that non-convergence of a self-averaging sequence is natural to expect under a much weaker condition: if the fluctuations are of order strictly less than $n$, the sequence should be expected not to converge. Our main theorem gives a way to bound the limit inferior and limit superior of a self-averaging sequence. Particular details of the problem do not really play a role in these bounds.
* Radboud University Nijmegen, The Netherlands. E-mail: e.cator@math.ru.nl, h.don@math.ru.nl
Oscillations on a log-periodic time scale occur in many other problems as well, for example in random walks on fractal lattices and in various branches of statistical mechanics [1,11]. To illustrate our general results, we apply our methods to the group Russian roulette problem and obtain quite sharp upper and lower bounds on the liminf and limsup of the sequence of probabilities. Non-convergence of the sequence follows immediately from these bounds.

Problem formulation and running example

General setting
We will consider bounded sequences $p(n)$, $n \ge 0$, which are defined as follows. The first term(s) are given as starting values, and each next term is obtained by taking some weighted average over previous terms. This is a deterministic definition, but nevertheless we will adopt a natural probabilistic interpretation: the weighted average can be seen as the expectation of some random variable. So we will study a sequence $p(n)$ which is given for $0 \le n \le n_0$ and satisfies
$$p(n) = \mathbb{E}[p(Y(n))], \qquad n > n_0, \qquad (2.1)$$
where $Y(n) \in \{0, 1, \ldots, n-1\}$ are random variables depending on $n$. For convenience we define $Y(n) = n$ for $n \le n_0$. Furthermore, we assume that the expectation of $Y(n)$ is close to a fixed fraction of $n$ and that the variance around this fraction is of order $n$ as well: there are constants $0 < \alpha < 1$ and $\beta, \gamma, \delta \ge 0$ such that
$$\bigl|\mathbb{E}[Y(n)] - \alpha n\bigr| \le \beta\sqrt{n} \quad \text{and} \quad \mathrm{Var}(Y(n)) \le \gamma n + \delta. \qquad (2.2)$$
Sections 3 and 4 deal with this general problem. Section 5 discusses a specific example: group Russian roulette, which is explained below. In Section 6 we show that the condition on the variance can be weakened even further.
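As a concrete toy instance of this setting (our illustration, not one of the source problems), take $Y(n) \sim \mathrm{Binomial}(n-1, \alpha)$: then $\mathbb{E}[Y(n)] = \alpha(n-1)$ is within $\alpha$ of $\alpha n$, and $\mathrm{Var}(Y(n)) = \alpha(1-\alpha)(n-1)$ is of order $n$. The sketch below computes such a self-averaging sequence directly from the recursion:

```python
from math import comb

def toy_sequence(n_max, alpha=0.5, starts=(0.0, 1.0)):
    """Compute p(n) = E[p(Y(n))] with Y(n) ~ Binomial(n-1, alpha).
    The tuple `starts` supplies the starting values p(0), ..., p(n0)."""
    p = list(starts)
    for n in range(len(starts), n_max + 1):
        # distribution of Y(n) on {0, ..., n-1}
        pmf = [comb(n - 1, k) * alpha**k * (1 - alpha)**(n - 1 - k)
               for k in range(n)]
        p.append(sum(w * p[k] for k, w in enumerate(pmf)))
    return p
```

Since each term is a convex combination of earlier terms, the sequence stays within the range of its starting values.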

Running example: group Russian roulette
To demonstrate our methods, we apply them to the group Russian roulette problem.
Suppose in a group of $n$ people, each is armed with a gun. They all uniformly at random select one of the others to shoot at and they all shoot simultaneously. The survivors continue playing this 'game' until either one or zero survivors are left. The probability that in the end no survivor is left is called $p(n)$. One characteristic of this problem is that in each round about the same fraction survives. Indeed, the probability for an individual to survive is $(1 - \frac{1}{n-1})^{n-1} \approx \frac{1}{e}$, so the expected number surviving the first round is about $n/e$. This problem was recently studied by Van de Brug, Kager and Meester [14], who showed that $\lim_{n\to\infty} p(n)$ does not exist. In the current paper we will show that this phenomenon is a natural thing to expect under the quite general conditions of (2.1) and (2.2).
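The game itself is easy to simulate. The following Monte Carlo sketch (an illustration only, not the rigorous computation of [14]) estimates $p(n)$, the probability that nobody survives:

```python
import random

def nobody_survives(n, rng):
    """Play one game of group Russian roulette; True iff zero players remain."""
    players = list(range(n))
    while len(players) >= 2:
        shot = set()
        for p in players:
            target = p
            while target == p:            # choose uniformly among the *others*
                target = rng.choice(players)
            shot.add(target)
        players = [p for p in players if p not in shot]
    return len(players) == 0

def estimate_p(n, trials=5000, seed=0):
    """Monte Carlo estimate of p(n)."""
    rng = random.Random(seed)
    return sum(nobody_survives(n, rng) for _ in range(trials)) / trials
```

Note that $p(2) = 1$: with two players each must shoot the other, so nobody survives.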

Analysis for fixed n: recursions in terms of expectation and variance
Fix $n$ and define a sequence of random variables by
$$X_0 = n, \qquad X_k = Y(X_{k-1}) \text{ for } k \ge 1. \qquad (3.1)$$
In the setting of group Russian roulette, this means that the starting population has size $n$ and that $X_k$ is the number of survivors after $k$ rounds of shooting. When we condition on $X_k$, the number of survivors after the $(k+1)$st round is expected to be close to $X_k/e$ and the variance is of order $X_k$ as well (the precise constants will be derived in Section 5). One might therefore expect that the number of survivors after $k$ rounds is about $X_0/e^k$. The next lemma shows that this guess is correct in the general case if (2.1) and (2.2) hold.

Lemma 3.1. Suppose we have a constant $X_0 > 0$, random variables $X_1, X_2, \ldots$ and constants $0 < \alpha < 1$ and $\beta \ge 0$ such that for $k \ge 0$
$$\bigl|\mathbb{E}[X_{k+1} \mid X_k] - \alpha X_k\bigr| \le \beta\sqrt{X_k}.$$
Then there exist constants $c_1$ and $c_2$, depending only on $\alpha$ and $\beta$, such that for all $k \ge 0$
$$\bigl|\mathbb{E}[X_k] - \alpha^k X_0\bigr| \le c_1\sqrt{\alpha^k X_0} + c_2.$$
Proof. For $k = 0$, the statement is trivial. For $k \ge 1$ we prove by induction a stronger statement with explicit constants, from which the lemma follows.

To get a further grip on the sequence $(X_k)_{k\ge 0}$, we also investigate the variance of the terms. It turns out that the variance also essentially scales down by a factor $\alpha$ in each round.

Lemma 3.2.
Suppose we have a constant $X_0 > 0$, random variables $X_1, X_2, \ldots$ and constants $0 < \alpha < 1$ and $\beta, \gamma, \delta \ge 0$ such that for $k \ge 0$
$$\bigl|\mathbb{E}[X_{k+1} \mid X_k] - \alpha X_k\bigr| \le \beta\sqrt{X_k} \qquad (3.5)$$
and
$$\mathrm{Var}(X_{k+1} \mid X_k) \le \gamma X_k + \delta. \qquad (3.6)$$
Then there exist constants $C$ and $D$, independent of $X_0$, such that for all $k \ge 0$
$$\mathrm{Var}(X_k) \le C\alpha^k X_0 + D. \qquad (3.7)$$
Proof. First we split the variance into two terms by the law of total variance:
$$\mathrm{Var}(X_{k+1}) = \mathbb{E}\bigl[\mathrm{Var}(X_{k+1} \mid X_k)\bigr] + \mathrm{Var}\bigl(\mathbb{E}[X_{k+1} \mid X_k]\bigr).$$
For the first term, we use (3.6) and Lemma 3.1. For the second term, we use the bound on $\mathbb{E}[X_{k+1} \mid X_k]$ that (3.5) implies. By elementary calculations, one can show that the resulting estimate holds whenever $c$ is positive and $K > c^2$; this yields (3.13) provided $K > \beta^2/\alpha^2$. Now fix $k$ and assume that (3.7) holds true for this $k$. Using the bounds (3.9) and (3.13), we obtain an upper bound on $\mathrm{Var}(X_{k+1})$, and we want the constants in between brackets in this bound to be smaller than $C$ and $D$ respectively.
This can only be true if $C$ and $D$ dominate the resulting expressions in $C$, $D$ and $K$. Since $\alpha < 1$, the corresponding restriction on $K$ is $K > \frac{\beta^2}{\alpha^2 - \alpha^3}$. If $K$ satisfies this inequality, then $K$ also exceeds $\beta^2/\alpha^2$ and (3.15) can be fulfilled by choosing $C$ and $D$ large enough. The minimal solutions for $C$ and $D$ are given by (3.17).
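For a chain with exact conditional moments, e.g. the toy chain $X_{k+1} \sim \mathrm{Binomial}(X_k, \alpha)$ (where $\mathbb{E}[X_{k+1} \mid X_k] = \alpha X_k$ and $\mathrm{Var}(X_{k+1} \mid X_k) = \alpha(1-\alpha)X_k$, so $\beta = \delta = 0$ and $\gamma = \alpha(1-\alpha)$), the mean and variance recursions of the two lemmas can be iterated in closed form. A sketch, assuming this toy chain:

```python
def moment_recursion(x0, alpha, k_max):
    """Iterate E[X_{k+1}] = alpha*E[X_k] and, by the law of total variance,
    Var(X_{k+1}) = alpha**2 * Var(X_k) + alpha*(1 - alpha)*E[X_k]."""
    m, v = float(x0), 0.0
    out = [(m, v)]
    for _ in range(k_max):
        v = alpha**2 * v + alpha * (1 - alpha) * m
        m = alpha * m
        out.append((m, v))
    return out
```

Here $\mathbb{E}[X_k] = \alpha^k X_0$ exactly, and one can check that $\mathrm{Var}(X_k) = \alpha^k X_0 (1 - \alpha^k) \le \alpha^k X_0$, matching the form $C\alpha^k X_0 + D$ of (3.7) with $C = 1$ and $D = 0$.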

Bounds for subsequences of p
The previous section focused on the random variables $X_k$ as defined in (3.1). Now we will use these results to study subsequences of $p$. Iterating (2.1), for all $k \ge 1$ we obtain
$$p(n) = \mathbb{E}[p(X_k)] \quad \text{with } X_0 = n, \qquad (4.1)$$
so $p(n)$ is a weighted average of the values $p(j)$ with weights $\mathbb{P}(X_k = j)$. We are interested in the limiting behavior as $n$ increases, so we will blow up $X_0$ by powers of $\alpha^{-1}$ and investigate the subsequence that emerges. Our main theorem is the following:
Theorem 4.1. Let $(p(n))_{n\in\mathbb{N}}$ be a sequence satisfying (2.1) and (2.2). Choose $x \in \mathbb{R}$, $x > 0$ arbitrary and let $(N_i)_{i\ge 0}$ be the increasing integer sequence defined by $N_i = [\alpha^{-i} x]$. Then the bounds (4.3) on the subsequence $(p(N_i))_{i\ge 0}$ hold, in which the weights $(q_k)_{k\in\mathbb{N}}$ are a positive decreasing sequence for which $\sum_{k=0}^{\infty} q_k = 1$.
Informally speaking, the idea of this theorem is that the values of $p(n)$ close to $n = x$ can be used to bound a whole subsequence of $p$. The problem setting suggests that $p(n)$ is roughly $f(\log n)$, where $f$ is some periodic function with period $\log(1/\alpha)$. The scale on which the "periodic" fluctuations occur in $p$ grows at the same speed as $n$. On the other hand, as $x$ increases the intervals $I_k$ grow like $\sqrt{Cx}$, which is less than the scale of the fluctuations. So if $x$ is large, we might expect $p(n)$ to vary only a little around $p([x])$ for $n \in I_k$. This would imply that the subsequence $p(N_i)$ stays close to $p([x])$. Taking $x$ in a local maximum of the sequence $p(n)$, we can use (4.3) to bound $\limsup_{n\to\infty} p(n)$ from below. Similarly, we will construct an upper bound for $\liminf_{n\to\infty} p(n)$.
Proof of Theorem 4.1. Let $X_k$ be defined as before by $X_k = Y(X_{k-1})$ for $k \ge 1$. We will consider these random variables for $X_0 = N_i$, $i \ge 0$. Define $Z_i$ to be the conditional random variable $X_i \mid (X_0 = N_i)$, and let $t_i = \mathbb{E}[Z_i]$ and $\tau_i^2 = \mathrm{Var}(Z_i)$. Then for all $i \ge 1$ these quantities can be controlled using Lemma 3.1 and Lemma 3.2, incorporating the rounding in the definition of $N_i$. Our main tool to control the subsequence $(p(N_i))_{i=1}^{\infty}$ will be the following version of Chebyshev's inequality:
$$\mathbb{P}\bigl(|X - \mu| \ge t\bigr) \le \frac{\sigma^2}{t^2},$$
where $X$ is a random variable with expectation $\mu$ and variance $\sigma^2$. Applying this to the random variables $Z_i$ bounds the probability that $Z_i$ deviates far from $t_i$. Now we are ready to bound $p(N_i) = \mathbb{E}[p(Z_i)]$ from both sides, where the minima are taken over $\mathbb{N}$. Now observe that the deviation terms are summable by (4.7), so the last sum $S$ in (4.9) converges, which implies (4.13).
The bounds $l(x)$ and $u(x)$ in Lemma 4.2 can be found by applying Theorem 4.1. We will use finitely many values of $p(n)$ with $n$ close to $[x]$ to approximate the sums in (4.3). The tails of these sums will be bounded by uniform bounds on $p(n)$.
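The index sequence of Theorem 4.1 is easy to generate. The sketch below (assuming $[\,\cdot\,]$ denotes rounding down) lists $N_i = [\alpha^{-i}x]$ for the group Russian roulette value $\alpha = 1/e$:

```python
from math import e, floor

def subsequence_indices(x, alpha, count):
    """Indices N_i = [alpha**(-i) * x] at which Theorem 4.1 bounds p."""
    return [floor(alpha ** (-i) * x) for i in range(count)]
```

Consecutive indices grow by a factor $\alpha^{-1}$, so on a logarithmic scale the subsequence samples $p$ at (roughly) equally spaced points, one per "period" of the expected oscillation.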

Non-convergence in group Russian roulette
In this section we will apply our methods to the group Russian roulette problem, as introduced in Section 2. We will see that it is quite straightforward to prove that the probability $p(n)$ of having no survivor in the end does not converge as the group size $n$ increases.
We start by checking that the group Russian roulette problem indeed fits into our general framework. For $i = 1, \ldots, n$, we define $I_i$ to be the indicator of the event that individual $i$ survives the first round, so that $Y(n) = \sum_{i=1}^{n} I_i$. We will calculate the expectation $\nu_n$ and variance $\tau_n^2$ of $Y(n)$ for $n \ge 2$.
ECP 22 (2017), paper 16.
Next, we calculate the second moment of $Y(n)$.
This gives the variance of $Y(n)$. It can be shown that the required bounds hold for all $n \ge 0$, which means that in (2.2) we can choose $\alpha = 1/e$ together with suitable constants $\beta$, $\gamma$ and $\delta$. Suppose we start with $n \ge 1$ people. Fix a subset of size $1 \le k \le n$ and denote the probability that exactly this subset is killed in the first round by $q_{n,k}$. Then $q_{n,1} = 0$, and for $2 \le k \le n$ an explicit expression for $q_{n,k}$ can be derived. The recursion for $p(n)$ is the following:
$$p(n) = \sum_{k} \binom{n}{k} q_{n,n-k}\, p(k). \qquad (5.2)$$
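The recursion can be sanity-checked numerically without handling the large binomial coefficients: estimate the first-round transition law $\mathbb{P}(n \to k)$ and the values $p(k)$ by simulation and compare the two sides of the recursion. This is a rough Monte Carlo sketch, not the rigorous computation of [14]:

```python
import random
from collections import Counter

def one_round(players, rng):
    """One simultaneous round of shooting; returns the surviving players."""
    shot = set()
    for p in players:
        target = p
        while target == p:
            target = rng.choice(players)
        shot.add(target)
    return [p for p in players if p not in shot]

def extinct(n, rng):
    """True iff a game started with n players ends with zero survivors."""
    players = list(range(n))
    while len(players) >= 2:
        players = one_round(players, rng)
    return len(players) == 0

def check_recursion(n, trials=3000, seed=0):
    """Compare p(n) with sum_k P(n -> k) p(k), both estimated by Monte Carlo."""
    rng = random.Random(seed)
    lhs = sum(extinct(n, rng) for _ in range(trials)) / trials
    law = Counter(len(one_round(list(range(n)), rng)) for _ in range(trials))
    p_hat = {k: sum(extinct(k, rng) for _ in range(trials)) / trials
             for k in law}
    rhs = sum(c / trials * p_hat[k] for k, c in law.items())
    return lhs, rhs
```

The two estimates agree up to Monte Carlo error, which is the content of the recursion: $p(n)$ is a weighted average of the values $p(k)$ over the first-round survivor distribution.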
Calculating this recursion requires careful handling of very large binomial coefficients and very small probabilities, avoiding accumulation of rounding errors. Therefore we gratefully make use of the values of $p(n)$ as calculated by Van de Brug, Kager and Meester [14], who rigorously computed the first couple of digits of $p(n)$. As a last ingredient, we calculate the constants $C$ and $D$ of Lemma 3.2 by equation (3.17). Note that there is still the free parameter $K$ in the expressions for these constants, which should satisfy $K > \frac{\beta^2}{\alpha^2 - \alpha^3} = \frac{4e}{e-1} \approx 6.33$. This constant can be used for fine-tuning of $C$ and $D$: increasing $K$ gives a smaller $C$ but a larger $D$. We will choose $K = 138$, because this appears to give the sharpest bounds in Lemma 4.2. For given $x$, the terms in the sum in (4.3) can now be calculated explicitly, since $t$ and $\tau$ are determined by constants already known. A numerical lower bound for $(p(N_i))_{i\ge 0}$ is then obtained by performing this calculation for the first terms in the sum and bounding the tail using the uniform bound $p(n) \ge 0$ for all $n$. An upper bound for $(p(N_i))_{i\ge 0}$ is calculated analogously.
As an illustration, we plotted l(x) and u(x) in Figure 1 for x ∈ [40, 40e], which is one 'period'. The sequence p(n) itself is only defined on integers, but l(x) and u(x) are functions of a continuous variable. The discontinuities in these bounds are caused by a shifting window over which minima and maxima are taken in (4.3).
To find bounds for the liminf and limsup of $p(n)$, we used the values of $p(n)$ as calculated by the recursion (5.2) in the range $0 \le n \le 6000$. This results in the following theorem: for group Russian roulette, $\liminf_{n\to\infty} p(n) \le 0.4714$ and $\limsup_{n\to\infty} p(n) \ge 0.5228$, so that $\lim_{n\to\infty} p(n)$ does not exist. Figure 2 illustrates this result. The blue curve gives values of $p(n)$. The red curves are the bounds $l(x)$ and $u(x)$. For a fixed value of $x$, we approximated the bounds of Theorem 4.1 by using all $p(n)$, $0 \le n \le 6000$, and bounding the tails of the sums by $0 \le p(n) \le 1$. So for fixed $x$, these curves give an interval containing all terms of the sequence $(p([\alpha^{-i}x]))_{i\ge 0}$. In particular, $\limsup_{n\to\infty} p(n)$ is bounded from below by the maximum of the lower red curve ($l(x) \approx 0.5228$, attained at $x \approx 2796$). Also each local maximum of the upper red curve is an upper bound for $\limsup_{n\to\infty} p(n)$, as is proved in Lemma 4.2. Similar statements hold for $\liminf_{n\to\infty} p(n)$ (the minimum of $u(x)$ is about 0.4714, attained at $x \approx 4609$). The two horizontal lines indicate a band which the values of $p(n)$ leave infinitely often on both sides. In [14], it was shown that $\liminf_{n\to\infty} p(n) \le 0.477487$ and $\limsup_{n\to\infty} p(n) \ge 0.515383$. So our bounds are an improvement over the results in [14], despite the fact that our method does not rely on particular details of the group Russian roulette problem.

Changing the order of the variance
In the setting of our problem, we assumed that the variance of $Y(n)$ is of order at most $n$, see (2.2). In fact, the phenomenon of non-convergence can even occur if the variance is of order $n^p$ with $p < 2$, as the following generalization of Lemma 3.2 shows. That the ideas still work is not really surprising, since for $p < 2$ the scale of the fluctuations in $Y(n)$ is still smaller than the scale of the periodic fluctuations in the sequence $(p(n))_{n\ge 0}$. If the power $p$ gets closer to 2, the constants get worse, but the whole idea of subsequences which might have different limits essentially does not change.
Assuming that the induction hypothesis (6.3) holds for some fixed $k$, we can further bound this term. Here we have used that $(\alpha^k X_0)^p \le (\alpha^k X_0)^2 + 1$ and $(x + y)^p \le 2x^p + 2y^p$ for $x, y \ge 0$. For the term $\mathrm{Var}(\mathbb{E}[X_{k+1} \mid X_k])$, we obtain the bound of (3.13), after which the induction hypothesis (6.3) gives the bound (6.5), valid whenever $K > \beta^2/\alpha^2$. Combining the bounds (6.4) and (6.5) leads to an upper bound (6.6) on $\mathrm{Var}(X_{k+1})$. To complete the proof, this needs to be smaller than $C\alpha^{p(k+1)} X_0^p + D$, which can be achieved by choosing $K > \frac{\beta^2}{\alpha^2 - \alpha^{4-p}}$. Now since $p < 2$ we can first choose such a $K$ and then choose $C$ and $D$ (both independent of $k$) large enough such that the upper bound in (6.6) is indeed smaller than $C\alpha^{p(k+1)} X_0^p + D$. This finishes the inductive proof.
With this lemma a statement analogous to Theorem 4.1 can be proved in the same way for the case when the variance of Y (n) is of order n p , p < 2.

Conclusions and remarks
We have studied sequences p(n) characterized by the property that each term is a weighted average over previous terms.
In several examples in the literature, such sequences do not converge to a limit, which at first sight might be surprising. The main purpose of this paper is to demonstrate that it is natural to expect non-convergence if the largest weights in the average $p(n)$ are given to values $p(k)$ for which $k$ is close to a fixed fraction of $n$. It turns out that non-convergence is predictable or even inevitable under fairly weak conditions. The intuition is that fluctuations in $p$ happen on a large scale, and if the averages are taken on a smaller scale, they cannot make the fluctuations vanish. Our methods are illustrated by proving non-convergence for the group Russian roulette problem.
Another question one could ask is whether $p(n)$ converges in the sense that there exists a periodic function $f: \mathbb{R} \to \mathbb{R}$ with period 1 such that
$$\lim_{x\to\infty} \bigl|p([\alpha^{-x}]) - f(x)\bigr| = 0. \qquad (7.1)$$
As is shown in [14], such a function exists in the case of group Russian roulette. However, the setting of (2.1) and (2.2) is not sufficient to prove such convergence, as one can demonstrate by example. This means that one would need stronger assumptions on the random variables $Y(n)$. We believe that for proving (7.1), a suitable requirement could be that the total variation distance between $Y(n)$ and $Y(n+1)$ goes to zero as $n$ increases.
However, proving this goes beyond the scope of the current paper.
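In the toy binomial setting used earlier (our illustration, not a result of this paper), the proposed condition does hold: for $Y(n) \sim \mathrm{Binomial}(n-1, \alpha)$ the total variation distance between $Y(n)$ and $Y(n+1)$ decreases to zero, at rate of order $1/\sqrt{n}$. A short numerical check:

```python
from math import comb

def binom_pmf(n, a):
    """Probability mass function of Binomial(n, a) as a list over {0, ..., n}."""
    return [comb(n, k) * a**k * (1 - a)**(n - k) for k in range(n + 1)]

def tv_distance(p, q):
    """Total variation distance between two pmfs on {0, 1, 2, ...}."""
    m = max(len(p), len(q))
    p = p + [0.0] * (m - len(p))
    q = q + [0.0] * (m - len(q))
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))

# distance between the laws of Y(n) and Y(n+1) for Y(n) ~ Binomial(n-1, 1/2)
dists = [tv_distance(binom_pmf(n - 1, 0.5), binom_pmf(n, 0.5))
         for n in (10, 40, 160)]
```

Since $\mathrm{Binomial}(n, 1/2)$ is the law of $\mathrm{Binomial}(n-1, 1/2)$ plus an independent fair coin, the distance is governed by the central pmf value, which decays like $1/\sqrt{n}$.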
As a final remark, we note that our methods also apply to a continuous setting, where $g: (0, \infty) \to \mathbb{R}$ is a bounded function and $g(x)$ is given for $x \le x_0$. In this case the recursion is of the form $g(x) = \mathbb{E}[g(N_x)]$, $x > x_0$, where $N_x$ is a random variable supported on $(0, x)$.