A REGENERATION PROOF OF THE CENTRAL LIMIT THEOREM FOR UNIFORMLY ERGODIC MARKOV CHAINS

Central limit theorems for functionals of general state space Markov chains are of crucial importance for the sensible implementation of Markov chain Monte Carlo algorithms, and are also of vital theoretical interest. Different approaches to proving results of this type under diverse assumptions have led to a large variety of CLT versions. However, due to the recent development of the regeneration theory of Markov chains, many classical CLTs can be reproved using this intuitive probabilistic approach, avoiding the technicalities of the original proofs. In this paper we provide a characterization of CLTs for ergodic Markov chains via regeneration and then use the result to solve the open problem posed in [17]. We then discuss the difference between the one-step and multiple-step small set conditions.


Introduction
Let $(X_n)_{n\ge 0}$ be a time homogeneous, ergodic Markov chain on a measurable space $(\mathcal X, \mathcal B(\mathcal X))$, with transition kernel $P$ and a unique stationary measure $\pi$ on $\mathcal X$. We remark that ergodicity means that $\lim_{n\to\infty}\|P^n(x,\cdot)-\pi\|_{tv}=0$ for all $x\in\mathcal X$, where $\|\cdot\|_{tv}$ denotes the total variation distance. The process $(X_n)_{n\ge 0}$ may start from any initial distribution $\pi_0$. Let $g$ be a real valued Borel function on $\mathcal X$, square integrable against the stationary measure $\pi$. We denote by $\bar g$ its centered version, namely $\bar g = g - \int g\,d\pi$, and for simplicity $S_n := \sum_{i=0}^{n-1}\bar g(X_i)$. We say that a $\sqrt n$-CLT holds for $(X_n)_{n\ge 0}$ and $g$ if $S_n/\sqrt n$ converges in distribution to $N(0,\sigma^2_g)$, where $\sigma^2_g < \infty$. First we aim to provide a general result, namely Theorem 4.1, that gives a necessary and sufficient condition for $\sqrt n$-CLTs for ergodic chains (which is a generalization of the well known Theorem 17.3.6 of [11]). Assume for a moment that there exists a true atom $\alpha \in \mathcal B(\mathcal X)$, i.e. a set $\alpha$ with $\pi(\alpha) > 0$ for which there exists a probability measure $\nu$ on $\mathcal B(\mathcal X)$ such that $P(x, A) = \nu(A)$ for all $x \in \alpha$. Let $\tau_\alpha$ be the first hitting time of $\alpha$. In this simplistic case we can rephrase our Theorem 4.1 as follows: a $\sqrt n$-CLT holds for $(X_n)_{n\ge 0}$ and $g$ if and only if $E_\alpha\big(\sum_{k=1}^{\tau_\alpha}\bar g(X_k)\big)^2 < \infty$. Furthermore we have the following formula for the variance: $\sigma^2_g = \pi(\alpha)\,E_\alpha\big(\sum_{k=1}^{\tau_\alpha}\bar g(X_k)\big)^2$.
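To make the atom variance formula concrete, here is a small numerical sketch (our own toy example, not from the paper). For a finite-state chain every single state is a true atom, so $\sigma^2_g = \pi(\alpha)\,E_\alpha\big(\sum_{k=1}^{\tau_\alpha}\bar g(X_k)\big)^2$ can be checked exactly against the standard autocovariance expression for the asymptotic variance. The two-state kernel, the function $g$, and the fundamental-matrix formula used for comparison are all assumptions of this sketch.

```python
import numpy as np

# Toy two-state chain (illustrative only).  Every state of a discrete
# chain is a true atom, so the regeneration variance formula can be
# checked exactly against the autocovariance formula
#   sigma_g^2 = <gbar, (2Z - I) gbar>_pi,   Z = (I - P + Pi)^{-1}.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi = np.array([4/7, 3/7])          # stationary distribution: pi P = pi
g = np.array([0.0, 1.0])
gbar = g - pi @ g                  # centered version of g

# Autocovariance / fundamental-matrix formula.
Pi = np.outer(np.ones(2), pi)
Z = np.linalg.inv(np.eye(2) - P + Pi)
sigma2_acf = pi @ (gbar * ((2 * Z - np.eye(2)) @ gbar))

# Regeneration formula with atom alpha = {0}:
#   sigma_g^2 = pi(alpha) * E_alpha (sum_{k=1}^{tau_alpha} gbar(X_k))^2.
# A tour from 0 either returns immediately (prob 0.7, sum = gbar(0)),
# or visits state 1 a Geometric(p = 0.4) number of times N, giving
# sum = N*gbar(1) + gbar(0).  Use E N = 1/p, E N^2 = (2-p)/p^2.
p = 0.4
EN, EN2 = 1/p, (2 - p) / p**2
ET2 = 0.7 * gbar[0]**2 + 0.3 * (EN2 * gbar[1]**2
                                + 2 * EN * gbar[1] * gbar[0] + gbar[0]**2)
sigma2_regen = pi[0] * ET2

print(sigma2_acf, sigma2_regen)    # both equal 156/343
```

Both expressions evaluate to $156/343 \approx 0.4548$, as the theory predicts.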
Central limit theorems of this type are crucial for assessing the quality of Markov chain Monte Carlo estimation (see [10] and [5]) and are also of independent theoretical interest. Thus a large body of work on CLTs for functionals of Markov chains exists, and a variety of results have been established under different assumptions and with different approaches (see [9] for a review). We briefly discuss the relation between two classical CLT formulations, for geometrically ergodic and for uniformly ergodic Markov chains. We say that a Markov chain $(X_n)_{n\ge 0}$ with transition kernel $P$ and stationary distribution $\pi$ is
• geometrically ergodic, if $\|P^n(x,\cdot)-\pi(\cdot)\|_{tv} \le M(x)\rho^n$ for some $\rho < 1$ and $M(x) < \infty$ $\pi$-a.e.,
• uniformly ergodic, if $\|P^n(x,\cdot)-\pi(\cdot)\|_{tv} \le M\rho^n$ for some $\rho < 1$ and $M < \infty$.
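The uniform geometric bound above can be observed numerically. The following sketch (our own toy chain, with the rate $\rho$ taken to be the second eigenvalue $0.3$ of the kernel, an assumption of the example) checks $\sup_x \|P^n(x,\cdot)-\pi\|_{tv} \le \rho^n$ for the first ten powers.

```python
import numpy as np

# Uniform ergodicity on a toy two-state chain (illustrative only):
# sup_x ||P^n(x,.) - pi||_tv <= M rho^n, with rho = |lambda_2| = 0.3
# the second-largest eigenvalue modulus, and M = 1 sufficing here.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi = np.array([4/7, 3/7])

def tv_from_pi(Pn):
    """Worst-case total variation distance to pi over starting states."""
    return max(0.5 * np.abs(row - pi).sum() for row in Pn)

rho = 0.3
Pn = np.eye(2)
for n in range(1, 11):
    Pn = Pn @ P
    assert tv_from_pi(Pn) <= rho**n + 1e-12   # geometric decay

print(tv_from_pi(Pn))   # tiny: the chain is uniformly ergodic
```

For a two-state chain the decay is exactly geometric at rate $\lambda_2$; for general uniformly ergodic chains one only gets the upper bound with some finite $M$.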
Recently, the following CLT, provided by [8], has been reproved in [17] using the intuitive regeneration approach, avoiding the technicalities of the original proof (however, see Section 6 for a commentary).

Roberts and Rosenthal posed an open problem: whether the following CLT version for uniformly ergodic Markov chains, due to [4], can also be reproved using direct regeneration arguments.
Theorem 1.4. If a Markov chain $(X_n)_{n\ge 0}$ with stationary distribution $\pi$ is uniformly ergodic, then a $\sqrt n$-CLT holds for $(X_n)_{n\ge 0}$ and $g$ whenever $\pi(g^2) < \infty$. Moreover
$$\sigma^2_g := \int_{\mathcal X}\bar g^2\,d\pi + 2\sum_{n=1}^{\infty}\int_{\mathcal X}\bar g\,P^n\bar g\,d\pi.$$
The aim of this paper is to prove Theorem 4.1 and to show how to derive from this general framework the regeneration proof of Theorem 1.4. The outline of the paper is as follows. In Section 2 we describe the regeneration construction; in Section 3 we provide some preliminary results, which may also be of independent interest. In Section 4 we detail the proof of Theorem 4.1, and we derive Theorem 1.4 as a corollary in Section 5. Section 6 comprises a discussion of some difficulties of the regeneration approach.

Small Sets and the Split Chain
We remark that ergodicity as defined by (1) is equivalent to Harris recurrence and aperiodicity (see Proposition 6.3 in [13]). One of the main features of Harris recurrent chains is that they are $\psi$-irreducible and admit the regeneration construction, discovered independently in [12] and [1], which is by now a well established technique. In particular such chains satisfy

Definition 2.1 (Minorization Condition). For some $\varepsilon > 0$, some $m \ge 1$, some $C \in \mathcal B^+(\mathcal X) := \{A \in \mathcal B(\mathcal X) : \psi(A) > 0\}$ and some probability measure $\nu_m$ with $\nu_m(C) = 1$, we have, for all $x \in C$,
$$P^m(x,\cdot) \ge \varepsilon\,\nu_m(\cdot).$$

The minorization condition (4) enables constructing the split chain for $(X_n)_{n\ge 0}$, which is the central object of the approach (see Section 17.3 of [11] for a detailed description). The minorization condition allows one to write $P^m$ as a mixture of two distributions:
$$P^m(x,\cdot) = \varepsilon I_C(x)\,\nu_m(\cdot) + \big(1 - \varepsilon I_C(x)\big)R(x,\cdot),$$
where $R(x,\cdot) := \big(1 - \varepsilon I_C(x)\big)^{-1}\big(P^m(x,\cdot) - \varepsilon I_C(x)\,\nu_m(\cdot)\big)$. Now let $(X_{nm}, Y_n)_{n\ge 0}$ be the split chain of the $m$-skeleton, i.e. let the random variable $Y_n \in \{0,1\}$ be the level of the split $m$-skeleton at time $nm$. The split chain $(X_{nm}, Y_n)_{n\ge 0}$ is a Markov chain that obeys a transition rule $\check P$, and $Y_n$ can be interpreted as a coin toss indicating whether $X_{(n+1)m}$, given $X_{nm} = x$, should be drawn from $\nu_m(\cdot)$ (with probability $\varepsilon I_C(x)$) or from $R(x,\cdot)$ (with probability $1 - \varepsilon I_C(x)$). One obtains the split chain $(X_k, Y_n)_{k\ge 0, n\ge 0}$ of the initial Markov chain $(X_n)_{n\ge 0}$ by defining appropriate conditional probabilities. Note that the marginal distribution of $(X_k)_{k\ge 0}$ in the split chain is that of the underlying Markov chain with transition kernel $P$. From the Bayes rule one obtains the conditional law of the intermediate steps, and the crucial observation due to Meyn and Tweedie, emphasized here as Lemma 2.2, follows.
whereas the hitting times $\tau_{\bar\alpha}(n)$ of the atom $\bar\alpha$ are defined analogously.
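The splitting described above can be simulated directly. The following sketch (our own toy two-state chain with $m = 1$ and $C = \mathcal X$, not an example from the paper) performs the Nummelin splitting: at each step a coin $Y_n \sim \text{Bernoulli}(\varepsilon)$ decides whether the next state is drawn from $\nu_m$ (a regeneration) or from the residual kernel $R$.

```python
import random

# Nummelin splitting for a toy two-state chain (illustrative only),
# with m = 1 and C = {0, 1} the whole space:
#   P(x, .) >= eps * nu(.)   with   nu(y) proportional to min_x P(x, y).
P = [[0.7, 0.3],
     [0.4, 0.6]]
eps = 0.4 + 0.3                      # sum of column minima
nu = [0.4 / eps, 0.3 / eps]
# residual kernel R(x, .) = (P(x, .) - eps*nu(.)) / (1 - eps)
R = [[(P[x][y] - eps * nu[y]) / (1 - eps) for y in (0, 1)] for x in (0, 1)]

def draw(dist):
    return 0 if random.random() < dist[0] else 1

def split_chain_step(x):
    """One transition of the split chain: returns (next state, level Y)."""
    y = 1 if random.random() < eps else 0        # regeneration coin toss
    x_next = draw(nu) if y == 1 else draw(R[x])
    return x_next, y

random.seed(0)
x, regen_times = 0, []
for n in range(10000):
    x, y = split_chain_step(x)
    if y == 1:
        regen_times.append(n)    # X_{n+1} ~ nu, independent of the past
# Since C is the whole space, a fraction eps of steps regenerate on average.
print(len(regen_times) / 10000)  # close to eps = 0.7
```

The marginal law of the simulated $(X_n)$ is that of the original kernel $P$, while the recorded regeneration times cut the path into the tours studied in the rest of the paper.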

Tools and Preliminary Results
In this section we analyze the sequence $s_i(\bar g)$, $i \ge 0$. The basic result we often refer to is Theorem 17.3.1 in [11], which states that $(s_i)_{i\ge 0}$ is a sequence of 1-dependent, identically distributed random variables with $\check E s_i = 0$. In our approach we use the decomposition $s_i = \underline s_i + \bar s_i$. A look into the proof of Lemma 3.3 later in this section clarifies that $\underline s_i$ and $\bar s_i$ are well defined.

Proof. First note that $\bar s_i$ is a function of $\{X_{(\sigma_{\bar\alpha}(i)+1)m}, X_{(\sigma_{\bar\alpha}(i)+1)m+1}, \ldots\}$ and that $Y_{\sigma_{\bar\alpha}(i)} = 1$; hence by Lemma 2.2 $\bar s_0, \bar s_1, \bar s_2, \ldots$ are identically distributed. Now focus on $\bar s_i$, $\bar s_{i+k}$ and $Y_{\sigma_{\bar\alpha}(i+k)}$ for some $k \ge 1$. Obviously $Y_{\sigma_{\bar\alpha}(i+k)} = 1$. Moreover $\bar s_i$ is a function of the pre-$\sigma_{\bar\alpha}(i+k)m$ process and $\bar s_{i+k}$ is a function of the post-$(\sigma_{\bar\alpha}(i+k)+1)m$ process. Thus $\bar s_i$ and $\bar s_{i+k}$ are independent, again by Lemma 2.2, and the probability of the joint event factorizes for $A_i, A_{i+k}$, Borel subsets of $\mathbb R$. By the same pre- and post-process reasoning we obtain the analogous factorization for $A_{i_1}, \ldots, A_{i_l}$, Borel subsets of $\mathbb R$, and the proof is complete by induction. Now we turn to prove the following lemma, which generalizes the conclusions drawn in [7] for uniformly ergodic Markov chains.
Lemma 3.2. Let the Markov chain $(X_n)_{n\ge 0}$ be recurrent (and $(X_{nm})_{n\ge 0}$ be recurrent) and let the minorization condition (4) hold with $\pi(C) > 0$. Then (13) holds, where $\pi_C(\cdot)$ is the probability measure proportional to $\pi$ truncated to $C$, that is $\pi_C(B) = \pi(B \cap C)/\pi(C)$.

Proof. The first equation in (13) is a straightforward consequence of the split chain construction. To prove the second one we use Theorem 10.0.1 of [11] for the split $m$-skeleton with $A = \bar\alpha$. Thus $\tau_A = \tau_{\bar\alpha}(1)$ and $\check\pi := \pi^*$ is the invariant measure for the split $m$-skeleton. Now take $B \in \mathcal B(\mathcal X)$ with $B \subseteq C$ and compute the resulting expression. This implies proportionality and the proof is complete.
and is a function of the random variable (14). By $\mu_i(\cdot)$ denote the distribution of (14) on $\mathcal X^m$. We will show that $\mu_i$ does not depend on $i$. From (8), (11) and the Bayes rule, for $x \in C$, we obtain (15). Lemma 3.2 together with (15) yields (16). Note that $\frac{\nu_m(dy)}{P^m(x,dy)}$ is just a Radon–Nikodym derivative, and thus (16) is a well defined measure on $\mathcal X^{m+1}$, say $\mu(\cdot)$. It remains to notice that $\mu_i(A) = \mu(A \times \mathcal X)$ for any Borel $A \subseteq \mathcal X^m$. Thus the $\mu_i$, $i \ge 0$, are identical and hence $\underline s_i$, $i \ge 0$, have the same distribution. Due to Lemma 2.2 we obtain that $\underline s_i$, $i \ge 0$, are 1-dependent. To prove $\check E_{\pi_0^*} \underline s_i^2 < \infty$, we first note that $\frac{\nu_m(dy)}{P^m(x,dy)} \le 1/\varepsilon$ and also $\pi_C(\cdot) \le \frac{1}{\pi(C)}\pi(\cdot)$. Hence the bound follows, where $\mu_{chain}$ is defined by $\pi(dx)P(x,dx_1)\cdots P(x_{m-2}, dx_{m-1})$. Thus the second moment is finite. We need a result which gives the connection between stochastic boundedness and the existence of the second moment of $s_i$. We state it in a general form.
Theorem 3.4. Let $(X_n)_{n\ge 0}$ be a sequence of independent identically distributed random variables and $S_n = \sum_{k=0}^{n-1} X_k$. Suppose that $(\tau_n)$ is a sequence of positive, integer valued random variables such that $\tau_n/n \to a \in (0,\infty)$ in probability as $n \to \infty$, and that the sequence $(n^{-1/2} S_{\tau_n})$ is stochastically bounded. Then $EX_0^2 < \infty$ and $EX_0 = 0$.
The proof of Theorem 3.4 is based on the following lemmas.
Proof. Let $(X'_i)$ be an independent copy of $(X_i)$ and $S'_k = \sum_{i=1}^{k} X'_i$. Moreover let $(\varepsilon_i)$ be a sequence of independent symmetric $\pm 1$ random variables, independent of $(X_i)$ and $(X'_i)$. For any reals $(a_i)$ we obtain the required lower bound by the Paley–Zygmund inequality; hence the claim holds for sufficiently large $n$ by the weak LLN.

Corollary 3.7. Let $c^2 < \operatorname{Var}(X_1)$; then the stated bound holds for sufficiently large $n$.

Proof. Let $t_0$ be as in Lemma 3.5 for $\delta = 1/16$. By Lemma 3.5 we obtain $t_0 \le c\sqrt n/4$ for large $n$, and the bound follows for sufficiently large $n$. Since $(n^{-1/2} S_{\tau_n})$ is stochastically bounded, we immediately obtain $\operatorname{Var}(X_1) < \infty$. If $EX_1 \ne 0$ then
$$\frac{S_{\tau_n}}{\sqrt n} = \frac{S_{\tau_n}}{\tau_n}\cdot\frac{\tau_n}{n}\cdot\sqrt n \to \infty \quad\text{in probability as } n \to \infty,$$
which contradicts stochastic boundedness; hence $EX_1 = 0$.
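The Paley–Zygmund inequality invoked in the proof above, $P(Z > \theta EZ) \ge (1-\theta)^2 (EZ)^2 / E Z^2$ for $Z \ge 0$ and $\theta \in [0,1]$, can be checked exactly on discrete distributions. The distributions below are our own toy data, chosen only to exercise the inequality (including a tight two-point case).

```python
# The Paley-Zygmund inequality:
#   P(Z > theta*E[Z]) >= (1 - theta)^2 * (E[Z])^2 / E[Z^2],   Z >= 0.
# Deterministic check on small discrete distributions, in exact arithmetic.
from fractions import Fraction as F

def paley_zygmund_holds(dist, theta):
    """dist: list of (value, prob) pairs with values >= 0."""
    EZ = sum(v * p for v, p in dist)
    EZ2 = sum(v * v * p for v, p in dist)
    lhs = sum(p for v, p in dist if v > theta * EZ)
    rhs = (1 - theta) ** 2 * EZ ** 2 / EZ2
    return lhs >= rhs

dists = [
    [(F(1), F(1, 2)), (F(3), F(1, 2))],
    [(F(0), F(3, 4)), (F(4), F(1, 4))],            # equality at theta = 0
    [(F(2), F(1, 3)), (F(5), F(1, 3)), (F(11), F(1, 3))],
]
for d in dists:
    for theta in (F(0), F(1, 4), F(1, 2), F(3, 4)):
        assert paley_zygmund_holds(d, theta)
print("Paley-Zygmund verified on all toy cases")
```

The second distribution shows that the bound is sharp: a two-point mass at $0$ attains equality at $\theta = 0$.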

A Characterization of √ n-CLTs
In this section we provide a generalization of Theorem 17.3.6 of [11]. We obtain an if and only if condition for the $\sqrt n$-CLT in terms of the finiteness of the second moment of a centered excursion from $\bar\alpha$.
Furthermore we have the following formula for the variance.

Proof. For $n \ge 0$ define $l_n := \max\{k \ge 1 : m(\sigma_{\bar\alpha}(k)+1) \le n\}$ and, for completeness, $l_n := 0$ if $m(\sigma_{\bar\alpha}(0)+1) > n$. First we are going to show (18). Thus we have to verify that the initial and final terms of the sum do not matter. First observe that by the Harris recurrence property of the chain $\sigma_{\bar\alpha}(0) < \infty$, $\check P_{\pi_0^*}$-a.s., and hence $\lim_{n\to\infty}\check P_{\pi_0^*}(m\sigma_{\bar\alpha}(0) \ge n) = 0$ and $\check P_{\pi_0^*}(\sigma_{\bar\alpha}(0) < \infty) = 1$. This yields (19). The second point is to provide a similar argument for the tail terms and to show (20). For $\varepsilon > 0$ we have the stated bound, where we use that $\bar\alpha$ is an atom for the split chain; we then deduce from the dominated convergence theorem that (20) holds. Obviously (19) and (20) yield (18). We turn to prove that condition (17) is sufficient for the CLT to hold. We will show that the random numbers $l_n$ can be replaced by their non-random equivalents. Namely, we apply the LLN (Theorem 17.3.2 in [11]) to ensure that $\lim_{n\to\infty} l_n/n$ exists. Let $n^* := \lfloor \pi^*(\bar\alpha) n m^{-1}\rfloor$, $\underline n := \lceil (1-\varepsilon)\pi^*(\bar\alpha) n m^{-1}\rceil$, $\bar n := \lfloor (1+\varepsilon)\pi^*(\bar\alpha) n m^{-1}\rfloor$.
Due to the LLN we know that for any $\varepsilon > 0$ there exists $n_0$ such that for all $n \ge n_0$ we have $\check P_{\pi_0^*}(\underline n \le l_n \le \bar n) \ge 1 - \varepsilon$. Consequently (22) holds. Since $(s_j)_{j\ge 0}$ are 1-dependent, $M_k := \sum_{j=0}^{k} s_j$ is not necessarily a martingale. Thus, to be able to apply the classical Kolmogorov inequality, we define $M^0_k = \sum_{j=0}^{\infty} s_{2j} I_{\{2j\le k\}}$ and $M^1_k = \sum_{j=0}^{\infty} s_{1+2j} I_{\{1+2j\le k\}}$, which are clearly square-integrable martingales (due to (17)). Hence the corresponding maximal inequality holds, where $C$ is a universal constant. In the same way we show that $\check P(\max_{n^*+1 \le l \le \bar n} |M_l - M_{n^*+1}| > \beta\sqrt n) \le C\varepsilon\beta^{-2}\check E_{\nu_m^*}(s_0^2)$; consequently, since $\varepsilon$ is arbitrary, we obtain (24). The last step is to provide an argument for the CLT for 1-dependent, identically distributed random variables. Namely, we have to prove (25). Observe that (19), (20), (24) and (25) imply Theorem 4.1. We fix $k \ge 2$ and define $\xi_j := s_{kj+1}(\bar g) + \cdots + s_{kj+k-1}(\bar g)$; consequently the $\xi_j$ are i.i.d. random variables and the sum splits accordingly. Obviously the last term converges to 0 in probability. Denoting $\sigma^2_s := \check E_{\nu_m^*}(s_0(\bar g))^2$, we use the classical CLT for i.i.d. random variables to see that the normalized sum converges to $N(0, \sigma^2_g)$ as $k \to \infty$. Since weak convergence is metrizable, we deduce from (26), (27) and (28) that (25) holds. The remaining part is to prove that (17) is also necessary for the CLT to hold. Note that if $\sum_{k=0}^{n}\bar g(X_k)/\sqrt n$ verifies the CLT, then $\sum_{j=0}^{l_n-1} s_j/\sqrt n$ is stochastically bounded by (18). We use the decomposition $s_i = \underline s_i + \bar s_i$, $i \ge 0$, introduced in Section 3. By Lemma 3.3 we know that $(\underline s_j)$ is a sequence of 1-dependent random variables with the same distribution and finite second moment. Thus from the first part of the proof we deduce that $\sum_{j=0}^{l_n-1}\underline s_j/\sqrt n$ verifies a CLT and thus is stochastically bounded. Consequently the remaining sequence $\sum_{j=0}^{l_n-1}\bar s_j/\sqrt n$ must also be stochastically bounded. Lemma 3.1 states that $(\bar s_j)_{j\ge 0}$ is a sequence of i.i.d. random variables, hence $\check E[\bar s_j^2] < \infty$ by Theorem 3.4. Also $l_n/n \to \pi^*(\bar\alpha) m^{-1}$ by (21). Applying the inequality $(a+b)^2 \le 2(a^2+b^2)$ we obtain the required bound.

Remark 4.2.
Note that in the case of $m = 1$ we have $\bar s_i \equiv 0$, and for Theorem 4.1 to hold it is enough to assume $\pi|g| < \infty$ instead of $\pi(g^2) < \infty$. In the case of $m > 1$, assuming only $\pi|g| < \infty$ and (17) still implies the $\sqrt n$-CLT, but the proof of the converse statement fails; in fact the converse statement does not hold (one can easily provide an appropriate counterexample).

Uniform Ergodicity
In view of Theorem 4.1 providing a regeneration proof of Theorem 1.4 amounts to establishing conditions (17) and checking the formula for the asymptotic variance. To this end we need some additional facts about small sets for uniformly ergodic Markov chains.
Theorem 5.1. If $(X_n)_{n\ge 0}$, a Markov chain on $(\mathcal X, \mathcal B(\mathcal X))$ with stationary distribution $\pi$, is uniformly ergodic, then $\mathcal X$ is $\nu_m$-small for some $m$ and some probability measure $\nu_m$.
Hence for uniformly ergodic chains (4) holds with $C = \mathcal X$, i.e. for all $x \in \mathcal X$. Theorem 5.1 is well known in the literature; in particular it results from Theorems 5.2.1 and 5.2.4 in [11] with their $\psi = \pi$. Theorem 5.1 implies that for uniformly ergodic Markov chains (5) can be rewritten as
$$P^m(x,\cdot) = \varepsilon\,\nu_m(\cdot) + (1-\varepsilon)\,R(x,\cdot).$$
The following mixture representation of $\pi$ will turn out to be very useful.
Lemma 5.2. If $(X_n)_{n\ge 0}$ is an ergodic Markov chain with transition kernel $P$ and (29) holds, then
$$\pi(\cdot) = \varepsilon\sum_{n=0}^{\infty}(1-\varepsilon)^n\,\nu_m R^n(\cdot).$$

Remark 5.3. This can easily be extended to a setting more general than that of uniformly ergodic chains. Related decompositions under various assumptions can be found e.g. in [14], [7] and [3], and are closely related to perfect sampling algorithms, such as coupling from the past (CFTP), introduced in [15].
Proof. First check that the measure in question is a probability measure.
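The mixture representation in Lemma 5.2 can be verified numerically. The sketch below (our own toy two-state chain with $m = 1$ and $C = \mathcal X$; the particular $\varepsilon$ and $\nu$ are assumptions of the example) sums the series $\varepsilon\sum_{n\ge 0}(1-\varepsilon)^n \nu R^n$ and compares it with the stationary distribution.

```python
import numpy as np

# Numerical check of the mixture representation of pi (Lemma 5.2),
#   pi = eps * sum_{n>=0} (1-eps)^n * nu R^n,
# on a toy two-state chain with m = 1 and C = X (illustrative only).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi = np.array([4/7, 3/7])            # stationary distribution
eps = 0.5
nu = np.array([0.6, 0.4])            # a valid minorizing measure:
assert (P >= eps * nu - 1e-12).all() # P(x, y) >= eps * nu(y) for all x, y

R = (P - eps * nu) / (1 - eps)       # residual kernel; rows sum to 1

mix = np.zeros(2)
term = nu.copy()                     # nu R^0
for n in range(200):                 # geometric series, fast convergence
    mix += eps * (1 - eps) ** n * term
    term = term @ R                  # nu R^{n+1}

print(mix)    # matches pi = [4/7, 3/7]
```

The fixed-point identity behind the lemma is visible here: $\pi = \varepsilon\nu + (1-\varepsilon)\pi R$, which the geometric series solves by iteration.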
Proof. (i) is a direct consequence of (30). To see (ii), note that $Y_n$ is a coin toss independent of $\{Y_0, \ldots, Y_{n-1}\}$ and of $X_{nm}$; this allows for $\pi^*$ instead of $\pi$ on the RHS of (ii). Moreover the evolution of $\{X_{nm+1}, X_{nm+2}, \ldots; Y_{n+1}, Y_{n+2}, \ldots\}$ depends only (and explicitly, by (8) and (9)) on $X_{nm}$ and $Y_n$. Now use (i).
Our object of interest is the expression in (31). Next we use Corollary 5.4 and then the inequality $2ab \le a^2 + b^2$ to bound the term $A$ in (31).
Since $\pi(C) = 1$, we have $\sigma^2_g = \varepsilon m^{-1}(I + J)$. Next we use Lemma 2.2 and $\check E_{\pi^*} Z_0(\bar g) = 0$ to drop the indicators, and since for $f : \mathcal X \to \mathbb R$ we also have $\check E_{\pi^*} f = E_\pi f$, the expectations reduce accordingly. Now, since all the integrals are taken with respect to the stationary measure, we can for a moment assume that the chain runs in stationarity from $-\infty$ rather than starting at time 0 with $X_0 \sim \pi$.

The Difference Between m = 1 and m > 1

Assume the small set condition (4) holds and consider the split chain defined by (8) and (9). The tours
$$\{X_{(\sigma(n)+1)m}, X_{(\sigma(n)+1)m+1}, \ldots, X_{(\sigma(n+1)+1)m-1}\}, \quad n = 0, 1, \ldots,$$
which start whenever $X_k \sim \nu_m$, are of crucial importance to the regeneration theory and are eagerly analyzed by researchers. In virtually every paper on the subject there is a claim that these objects are independent identically distributed random variables. This claim is usually considered obvious and no proof is provided. However, it is not true if $m > 1$.
Let $\nu_4(d) = \nu_4(e) = 1/2$ and $\varepsilon = 1/8$. Clearly $P^4(x,\cdot) \ge \varepsilon\,\nu_4(\cdot)$ for every $x \in \mathcal X$; hence we have established (4) with $C = \mathcal X$ and $m = 4$. Note that for this simplistic example each tour can start with $d$ or $e$. However, if it starts with $d$ or $e$, the previous tour must have ended with $b$ or $c$ respectively. This makes the tours dependent. Similar examples with general state space $\mathcal X$ and $C = \mathcal X$ can easily be provided. Hence Theorem 4.1 is critical to providing regeneration proofs of CLTs, and standard arguments that involve i.i.d. random variables are not valid.