Exponential approximation for the nearly critical Galton-Watson process and occupation times of Markov chains

In this article we provide new applications for exponential approximation using the framework of Pek\"oz and R\"ollin (in press), which is based on Stein's method. We give error bounds for the nearly critical Galton-Watson process conditioned on non-extinction and for the occupation times of Markov chains; for the latter, in particular, we give a new exponential approximation rate for the number of revisits to the origin by a general two-dimensional random walk, a result known in the simple random walk case as the Erd\H{o}s-Taylor theorem.


INTRODUCTION
A new framework for estimating the error of the exponential approximation was recently developed in Peköz and Röllin (in press), where it was applied to geometric sums, Markov chain hitting times, and the critical Galton-Watson process conditioned on non-extinction. In this article we provide some generalizations of the approach of Peköz and Röllin (in press) and apply them to study Markov chain occupation times and a result of Erdős and Taylor (1960) for the number of visits to the origin by the two-dimensional random walk, as well as to obtain a rate for the result of Fahady, Quine, and Vere-Jones (1971) for the nearly critical Galton-Watson branching process conditioned on non-extinction.
The main result in Peköz and Röllin (in press) that we use is based on Stein's method (see e.g. Ross and Peköz (2007) for an introduction) and can be thought of as formalizing the intuitive notion that a random variable X has approximately an exponential distribution if X and X^e are close in distribution, where X^e has the equilibrium distribution with respect to X, characterized by

E f(X) − f(0) = E[X] E f'(X^e) for all Lipschitz f.

The equilibrium distribution appears in renewal theory as the time until the next renewal starting from steady state. A renewal process with exponential inter-renewal times has the exponential distribution as its equilibrium distribution, and so the above intuition is not surprising. Peköz and Röllin (in press) give bounds on the accuracy of the exponential approximation in terms of how closely X and X^e can be coupled together on the same probability space; one version of the result we will use below, for a nonnegative W with E W = 1, can be written as

sup_{z ≥ 0} |P[W > z] − e^{−z}| ≤ 12 E|W − W^e|.

Some heuristics for Stein's method can be understood using size-biased random variables. For a continuous random variable X with probability density function f(x), the size-biased random variable X^s has density x f(x)/E[X]. The size of the renewal interval containing a randomly chosen point, as well as the number of children in the family of a randomly chosen child, are examples of size-biased random variables; see Brown (2006) and Arratia and Goldstein (2010) for surveys and applications of size biasing.
Stein's method for the exponential distribution, as well as for some other nonnegative distributions, can be viewed in terms of size-biasing. For the Poisson approximation to a random variable X, the Stein-Chen method (see Barbour, Holst, and Janson (1992)) gives a bound on the error in terms of how closely X and X^s − 1 can be coupled together on the same probability space; these two have exactly the same distribution when X has a Poisson distribution. For approximation by a binomial distribution (see Peköz, Röllin, Čekanavičius, and Shwartz (2009)), we can obtain a bound in terms of how closely X^s − 1 and n − (n − X)^s can be coupled; these two have exactly the same distribution if X is binomial with parameters n and p. For the exponential distribution, we can obtain a bound on the error in terms of how closely X and U X^s can be coupled, where U is a uniform (0,1) random variable independent of all else; X^e has the same distribution as U X^s. This last approach is the one we use below for the nearly critical Galton-Watson process conditioned on non-extinction.
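The distributional identity X^e =d U X^s can be checked by simulation; below is a minimal seeded Monte Carlo sketch (the choice X ~ Exp(1) is ours, for illustration), using that size-biasing Exp(1) gives a Gamma(2, 1) variable and that the exponential distribution is a fixed point of the equilibrium transformation.

```python
import math
import random

# Monte Carlo sketch of X^e =d U * X^s for X ~ Exp(1) (illustrative choice).
# Size-biasing Exp(1) gives density x e^{-x}, i.e. Gamma(2, 1), which can be
# sampled as a sum of two independent Exp(1) variables.  Multiplying by an
# independent uniform U should give back Exp(1), since the exponential
# distribution is a fixed point of the equilibrium transformation.

random.seed(7)
n = 200_000
samples = []
for _ in range(n):
    x_s = random.expovariate(1.0) + random.expovariate(1.0)  # X^s ~ Gamma(2, 1)
    samples.append(random.random() * x_s)                     # U * X^s

mean = sum(samples) / n
tail = sum(1 for s in samples if s > 1.0) / n  # compare with P[Exp(1) > 1] = e^{-1}
print(mean, tail)
```

With the seed fixed, the empirical mean and tail probability agree with the Exp(1) values 1 and e^{−1} up to Monte Carlo error.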
The organization of this article is as follows. In Section 2 we give the notation, background and preliminaries. In Section 3 we consider the setting of a nearly critical Galton-Watson branching process conditioned on non-extinction. In Section 4 we study general dependent sums, occupation times for Markov chains and the number of times the origin is revisited by a general two-dimensional random walk.

PRELIMINARIES
We first define the probability metrics we use below. For two probability distributions F and G, define the Kolmogorov metric

d_K(F, G) = sup_x |F(x) − G(x)|.

If both distributions have finite expectation, define the Wasserstein metric

d_W(F, G) = sup_f |∫ f dF − ∫ f dG|,

where the supremum ranges over all functions f with Lipschitz constant at most 1. We can relate the two metrics via

d_K(P, Exp(1)) ≤ 1.74 √(d_W(P, Exp(1)));

see e.g. Gibbs and Su (2002).
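As a numerical illustration of how the two metrics compare (the pair Exp(1.2) versus Exp(1) is our toy choice, not from the paper), both distances can be computed directly and checked against the stated inequality:

```python
import math

# Compare d_K and d_W against Exp(1) for the toy choice P = Exp(rate),
# and check the relation d_K <= 1.74 * sqrt(d_W) numerically.

def kolmogorov_to_exp1(rate, grid=100_000, x_max=50.0):
    """d_K between Exp(rate) and Exp(1), via a grid search over the CDF gap."""
    return max(abs(math.exp(-x_max * k / grid) - math.exp(-rate * x_max * k / grid))
               for k in range(grid + 1))

def wasserstein_to_exp1(rate):
    """d_W between Exp(rate) and Exp(1); the two CDFs are ordered, so the
    distance is the absolute difference of the means."""
    return abs(1.0 / rate - 1.0)

rate = 1.2
d_k = kolmogorov_to_exp1(rate)
d_w = wasserstein_to_exp1(rate)
print(d_k, d_w, 1.74 * math.sqrt(d_w))
```

Here d_W = 1/6 exactly, while d_K ≈ 0.067, comfortably below 1.74 √(1/6) ≈ 0.71.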
Central to the approach in Peköz and Röllin (in press) is the equilibrium distribution from renewal theory, and we next give the definition we use.
Definition 2.1. Let X be a non-negative random variable with finite mean. We say that a random variable X^e has the equilibrium distribution w.r.t. X if for all Lipschitz-continuous f,

E f(X) − f(0) = E[X] E f'(X^e).   (2.1)

It is straightforward that this implies that X^e is absolutely continuous with density P[X > x]/E[X]. Indeed, for nonnegative X having finite first moment, define the distribution function

F^e(x) = (1/E[X]) ∫_0^x P[X > y] dy,   (2.2)

so that F^e is the distribution function of X^e, and our definition via (2.1) is consistent with that from renewal theory. The size-biased distribution will also be used below. We define it as follows.
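To see why the Lipschitz characterization matches the renewal-theoretic definition, one can write out the following short computation (our sketch, for differentiable Lipschitz f and nonnegative X with finite mean, taking X^e to have density P[X > t]/E[X]):

```latex
\mathbb{E} f(X) - f(0)
  = \mathbb{E}\int_0^X f'(t)\,dt
  = \int_0^\infty f'(t)\,\mathbb{P}[X > t]\,dt
  = \mathbb{E}[X]\int_0^\infty f'(t)\,\frac{\mathbb{P}[X > t]}{\mathbb{E}[X]}\,dt
  = \mathbb{E}[X]\,\mathbb{E} f'(X^e),
```

so that X^e having density P[X > t]/E[X], the equilibrium density from renewal theory, is exactly what makes the characterization hold.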
Definition 2.2. Let X be a non-negative random variable with finite mean. We say that a random variable X^s has the size-biased distribution w.r.t. X if

E[X f(X)] = E[X] E f(X^s)

for all f for which the expectations exist.
We next present the key result from Peköz and Röllin (in press) that we will use in the applications that follow.
Theorem 2.1 (Peköz and Röllin (in press), Theorem 2.1). Let W be a nonnegative random variable with E W = 1 and let W^e have the equilibrium distribution w.r.t. W. Then, for any β > 0,

d_K(L(W), Exp(1)) ≤ 12β + 2 P[|W^e − W| > β],   (2.3)

and, if in addition W has finite second moment,

d_W(L(W), Exp(1)) ≤ 2 E|W^e − W|.   (2.4)
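The second bound can be sanity-checked numerically on a toy example of our choosing, W ~ Uniform(0, 2) (which has mean 1), using the comonotone coupling of W and W^e as one admissible coupling:

```python
import math

# Toy numerical check of d_W(L(W), Exp(1)) <= 2 E|W^e - W| for
# W ~ Uniform(0, 2), which has mean 1 (our illustrative choice).
# Then W^e has density P[W > x] = 1 - x/2 on [0, 2], and the comonotone
# coupling W = 2U, W^e = 2(1 - sqrt(1 - U)) is one admissible coupling.

N = 200_000

# d_W as the integral over x of |F_W(x) - F_Exp(x)| (valid in one dimension).
d_w, x_max = 0.0, 40.0
for k in range(N):
    x = x_max * (k + 0.5) / N
    d_w += abs(min(x / 2.0, 1.0) - (1.0 - math.exp(-x))) * (x_max / N)

# E|W^e - W| under the comonotone coupling, via the inverse CDFs.
coupling = 0.0
for k in range(N):
    u = (k + 0.5) / N
    coupling += abs(2.0 * u - 2.0 * (1.0 - math.sqrt(1.0 - u))) / N

print(d_w, 2.0 * coupling)
```

The coupling cost works out to E|W^e − W| = 1/3 exactly, while the Wasserstein distance is about 0.32, so the bound holds (though a better coupling could tighten it).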

THE NEARLY CRITICAL GALTON-WATSON BRANCHING PROCESS
It was shown by Yaglom (1947) that the limiting distribution of a critical Galton-Watson branching process conditioned on non-extinction is exponential. A corresponding rate of convergence was first proved by Peköz and Röllin (in press). In the super- and sub-critical cases, the limiting distributions are very difficult to calculate and are only known explicitly in very special cases; see e.g. Bingham (1988). Fahady et al. (1971), however, were able to show that the limiting distribution of a nearly critical branching process conditioned on non-extinction converges to the exponential distribution as m → 1 over a general class of offspring distributions. The following theorem gives explicit error bounds for the exponential approximation for any finite n and any m ≠ 1. The bounds obtained confirm the exponential limit as m → 1 for the limiting distribution under the assumptions of a finite third moment for the offspring distribution and a non-zero probability of having 2 or more offspring.
Theorem 3.1. Consider a Galton-Watson branching process with offspring distribution L(Z_1) and mean m = E[Z_1], starting from a single particle at time zero, and let Z_n be the size of the nth generation.
In particular, for any n ≥ 1 and m > 1 we obtain (3.2), and, for any n ≥ 2 and 1/2 ≤ m < 1, we obtain (3.3). Note that as a consequence of (3.2) and (3.3), the limiting distribution of a nearly critical branching process conditioned on non-extinction converges to the exponential as m → 1 under the very mild condition that C does not grow too fast. This is in particular true if the third moment of the offspring distribution remains bounded from above and P[Z_1 ≥ 2] remains bounded away from zero as m → 1.
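A simulation sketch of this approximation (Poisson offspring and the specific values of m and n are our illustrative choices, not from the theorem): conditioned on non-extinction and normalized by its conditional mean, the generation size should be roughly Exp(1).

```python
import math
import random

# Simulation sketch of the nearly critical regime: Poisson(m) offspring
# (our choice) with m slightly above 1.  Conditioned on Z_n > 0 and
# normalized by its conditional mean, Z_n should be roughly Exp(1).

random.seed(1)

def poisson(lam):
    # Knuth's method; adequate for small lam.
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def generation_size(m, n):
    z = 1
    for _ in range(n):
        z = sum(poisson(m) for _ in range(z))
        if z == 0:
            return 0
    return z

m, n, reps = 1.01, 50, 30_000
survivors = [z for z in (generation_size(m, n) for _ in range(reps)) if z > 0]
mean = sum(survivors) / len(survivors)        # estimate of E[Z_n | Z_n > 0]
w = [z / mean for z in survivors]             # normalized to mean 1
tail = sum(1 for x in w if x > 1.0) / len(w)  # compare with e^{-1} ~ 0.368
print(len(survivors), mean, tail)
```

The empirical tail probability at 1 lands near e^{−1}, consistent with the exponential limit; the residual gap reflects both the finite n and the Monte Carlo error.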
Proof of Theorem 3.1. With some modifications, we follow the line of argument from Peköz and Röllin (in press), which is based on the size-biased branching tree of Lyons, Pemantle, and Peres (1995).
We assume that the particles in the tree are labeled and ordered. That is, if w and v are two particles in the same generation, then all offspring of w are to the left of the offspring of v, whenever w is to the left of v. We start in generation 0 with one particle v 0 and let it have a size-biased number of offspring. Then we pick one of the offspring of v 0 uniformly at random and label it v 1 . For each of the siblings of v 1 we continue with an independent Galton-Watson branching process with the original offspring distribution.
For v 1 we proceed as we did for v 0 , i.e., we give it a size-biased number of offspring, pick one uniformly at random, label it v 2 , and so on.
Denote by S_n the total number of particles in generation n. Denote by L_n and R_n, respectively, the number of particles to the left (excluding v_n) and to the right (including v_n) of v_n. Denote by S_{n,j} the number of particles in generation n that stem from any of the siblings of v_j (but not from v_j itself). Likewise, let L_{n,j} and R_{n,j}, respectively, be the number of particles in generation n that stem from the siblings to the left and to the right, respectively, of v_j. We have the relations

L_n = Σ_{j=1}^n L_{n,j},   R_n = 1 + Σ_{j=1}^n R_{n,j}.

Next, let R'_{n,j} be independent random variables such that L(R'_{n,j}) = L(R_{n,j} | L_{n,j} = 0), and, with A_{n,j} = {L_{n,j} = 0}, define

R*_n = 1 + Σ_{j=1}^n (R_{n,j} I[A_{n,j}] + R'_{n,j} I[A^c_{n,j}]).

Below are a few facts that we will subsequently use to give the proof of the theorem. In what follows, let σ² = Var Z_1 and γ = E|Z_1|³.
(i) The size-biased distribution of L(X) is the same as the size-biased distribution of L(X | X > 0); (ii) S_n has the size-biased distribution of L(Z_n); (iii) v_n is uniformly distributed among the particles of generation n; (iv) L(R*_n) = L(Z_n | Z_n > 0), so that with λ = 1/E[Z_n | Z_n > 0] we have E[λ R*_n] = 1.

For (i)-(iv) see Peköz and Röllin (in press). A direct computation using the independence of the subtrees proves (v). If X_j denotes the number of siblings of v_j, which has the size-biased distribution of L(Z_1) minus 1, a further computation proves (vii). Finally, (viii) follows from the Corollary on page 356 of Fujimagari (1980) (note that the result cited is for bounded offspring distributions, but it easily extends to the unbounded case).

Due to (iv) we can set W = λR*_n. Due to (i) and (ii), S_n has the size-biased distribution with respect to R*_n. Let U be a uniform random variable on [0, 1], independent of all else. Now, R_n − U is a continuous random variable taking values in [0, S_n] and, due to (iii), has distribution L(U S_n); hence we can set W^e = λ(R_n − U). Therefore, we can apply (2.3) and use (v)-(vii) to obtain a bound which proves (3.1).
Let now m > 1. Note that m^{2j}/(m^j − 1) is an increasing function of j, except possibly at the beginning, where it may be decreasing. Hence we can decompose the resulting bound, normalized by m(m^n − 1) log(m), as r_1 + r_2 + r_3 + r_4 + r_5.

Recall the estimate (3.4). This implies bounds on the first terms; using that (a + b)² ≤ 2a² + 2b², and recalling that (m − 1) ≤ m log(m), the terms r_1 through r_4 can be bounded. The last estimate, for r_5, follows since log(x)/x is clearly bounded by (1 + log(x))/x for x > 1, the latter is a decreasing function, and then by applying (3.4). Putting the estimates for r_1 through r_5 together proves (3.2).

Let now 0 < m < 1. As m^{2j}/(1 − m^j) is a decreasing function of j (to see this, note that x²/(1 − x) is increasing on 0 < x < 1), we can decompose the resulting bound, normalized by (1 − m^n) log(m), as r_1 + r_2 + r_3 + r_4 + r_5.
Putting all the estimates together proves (3.3).
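The size-biased branching tree used in the proof can also be explored by simulation; the sketch below (with critical Poisson(1) offspring, our illustrative choice) checks fact (ii) through the identity E S_n = E Z_n² / E Z_n = 1 + n σ².

```python
import math
import random

# Sketch of the size-biased tree of Lyons, Pemantle and Peres, checking
# fact (ii): S_n is the size bias of Z_n, so E S_n = E Z_n^2 / E Z_n.
# With critical Poisson(1) offspring (our illustrative choice) this is
# 1 + n * sigma^2 = n + 1, since E Z_n = 1 and Var Z_n = n.

random.seed(3)

def poisson(lam):
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def gw_generation(steps, lam=1.0):
    """Particles after `steps` generations of an ordinary GW tree from one particle."""
    z = 1
    for _ in range(steps):
        z = sum(poisson(lam) for _ in range(z))
        if z == 0:
            break
    return z

def spine_generation(n, lam=1.0):
    """S_n for the size-biased tree: each spine particle gets a size-biased
    Poisson number of children (= 1 + Poisson(lam)); one child continues the
    spine, the siblings grow ordinary GW trees."""
    total = 1  # the spine particle v_n itself
    for j in range(1, n + 1):
        siblings = (1 + poisson(lam)) - 1  # size-biased count minus the spine child
        total += sum(gw_generation(n - j, lam) for _ in range(siblings))
    return total

n, reps = 10, 20_000
est = sum(spine_generation(n) for _ in range(reps)) / reps
print(est)  # should be close to n + 1 = 11
```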
Lemma 3.2. Let a, b and c be real numbers bigger than 1 such that 1/a ≤ 1/b + 1/c ≤ 1. Then

(1 + log(a))/a ≤ (1 + log(b))/b + (1 + log(c))/c.

Proof. It is clear from the monotonicity of the logarithm function that for x, y > 0 we have

x log(x) + y log(y) ≤ (x + y) log(x + y).

Rewriting this inequality for x = 1/b and y = 1/c, we have

(1 + log(bc/(b + c))) (b + c)/(bc) ≤ (1 + log(b))/b + (1 + log(c))/c.

Noting that (1 + log(a))/a is a decreasing function for a ≥ 1, and that a ≥ bc/(b + c) ≥ 1 by assumption, the claim follows.

VISITS TO THE ORIGIN FOR A TWO-DIMENSIONAL SIMPLE RANDOM WALK
Exponential approximation results for sums of nonnegative random variables X_1, X_2, …, X_n satisfying the condition Var(E[X_i | X_1, …, X_{i−1}]) = 0 for all i were given in Peköz and Röllin (in press, Theorem 3.1), but not for more general dependent sums. Here we give an extension to sums of arbitrarily dependent nonnegative random variables, apply it to occupation times for Markov chains, and then illustrate it by obtaining a new exponential approximation rate for the number of times a general two-dimensional random walk revisits the origin.
Theorem 4.1. Let W = λ Σ_{i=1}^n X_i, where X_1, X_2, …, X_n are (possibly dependent) nonnegative random variables, and let λ = 1/E[Σ_{i=1}^n X_i]. Suppose that, for each i and x, W_i(x) is independent of all else and

L(W_i(x)) = L(λ Σ_{j=1}^{i−1} X_j | X_i = x).

Let I be independent of all else with P[I = i] = λ E[X_i], let U be a uniform random variable on (0, 1) independent of all else, and let X^s_i be independent of all else with the size-biased distribution with respect to X_i. Then

W^e = W_I(X^s_I) + λ U X^s_I

has the equilibrium distribution with respect to W.
Proof. Let S_m = λ Σ_{i=1}^m X_i. By first conditioning on I and U, and using (2.2) and L(S_{i−1}) = L(W_i(X_i)) in the third line, we obtain that W^e satisfies the defining property (2.1) of the equilibrium distribution.

Remark 4.1. The argument goes through in the same way under alternative definitions of W_i(x).

We next apply the above result to Markov chain occupation times. Our next result gives a bound on the error of the exponential approximation for the number of times a Markov chain revisits its starting state. More general asymptotic results of this type, but without explicit bounds on the error, go back to Darling and Kac (1957).

We next consider a general random walk on the two-dimensional integer lattice started at the origin. As a consequence of Lawler and Limic (2010, p. 24) we have the following lemma. We are now able to give a bound on the error of the exponential approximation for the number of times the random walk revisits the origin. This type of result, for simple random walk, goes back to Erdős and Taylor (1960).
Corollary 4.4. Let Z_n be an irreducible and aperiodic random walk on Z² with mean zero and finite third moment. Let R be the number of return visits to the origin by time n, and let W = λR, where λ = 1/E[R]. Then there is a constant C, independent of n, such that

d_W(L(W), Exp(1)) ≤ C / log n

for all n.
Proof. Let X_n = I{Z_n = 0} be the indicator of the event that the random walk is at the origin at time n. Lemma 4.3 gives λ ≤ C/log n, and thus the result follows from Corollary 4.2, where C may be different (but independent of n) in each instance used.

Remark 4.2. The result for the two-dimensional simple random walk

sup_{a < x < b} |P[W > x] − e^{−x}| ≤ C log log n / log n

for fixed a and b follows from Erdős and Taylor (1960, Eq. (3.10)), so the above corollary can be viewed as a complement and extension. Using the method of moments, Gärtner and Sun (2009, Theorem 1.1) give an argument for the analogous exponential limit theorem for general random walks, but without a rate of convergence.
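As a concluding illustration, the corollary's setup can be simulated for the simple random walk (our illustrative special case; the parameters are arbitrary). Since the rate is only C/log n, at moderate n the normalized return count matches the exponential only roughly.

```python
import random

# Simple random walk on Z^2 (illustrative special case of the corollary):
# count returns to the origin by time n and normalize by the empirical
# mean, mimicking W = lambda * R with lambda = 1 / E[R].

random.seed(11)
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def returns_to_origin(n):
    x = y = 0
    count = 0
    for _ in range(n):
        dx, dy = STEPS[random.randrange(4)]
        x, y = x + dx, y + dy
        if x == 0 and y == 0:
            count += 1
    return count

n, reps = 2000, 1500
r = [returns_to_origin(n) for _ in range(reps)]
mean_r = sum(r) / reps            # grows like (1/pi) * log n, so very slowly
w = [ri / mean_r for ri in r]     # normalized occupation count
tail = sum(1 for x in w if x > 1.0) / reps
print(mean_r, tail)
```

With n = 2000 the tail probability only loosely tracks e^{−1} ≈ 0.37, which is consistent with the logarithmically slow rate in the corollary.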