A Sequential Empirical Central Limit Theorem for Multiple Mixing Processes with Application to B-Geometrically Ergodic Markov Chains

We investigate the convergence in distribution of sequential empirical processes of dependent data indexed by a class of functions F. Our technique is suitable for processes that satisfy a multiple mixing condition on a space of functions which differs from the class F. This situation occurs in the case of data arising from dynamical systems or Markov chains, for which the Perron-Frobenius or Markov operator, respectively, has a spectral gap on a restricted space. We provide applications to iterative Lipschitz models that contract on average.


Introduction
The asymptotic behaviour of empirical processes has been studied for more than 60 years. The first rigorous result was the empirical process central limit theorem for i.i.d. data, established by Donsker (1952). This theorem, conjectured by Doob (1949), made it possible to derive, via the continuous mapping theorem, the asymptotic distribution of a large number of test statistics and estimators that can be represented as functionals of the empirical process. Examples include the Kolmogorov-Smirnov goodness-of-fit test, the Cramér-von Mises $\omega^2$ criterion and, more generally, von Mises statistics. Ciesielski and Kesten (1962) were among the first to extend Donsker's empirical process CLT to weakly dependent data, studying the empirical distribution of remainders in the dyadic expansion of a random number $\omega \in [0,1]$. Billingsley (1968) proved the first general result for dependent data, namely an empirical process CLT for data that can be represented as functionals of a mixing process. For an overview of the literature on empirical processes of dependent data, see Dehling and Philipp (2002). Müller (1970), and independently Kiefer (1972), initiated the study of the sequential empirical process, defined as
$$U_n(x,t) := \frac{1}{\sqrt{n}}\sum_{i=1}^{[nt]}\big(1_{\{X_i \le x\}} - F(x)\big), \qquad x \in \mathbb{R},\ t \in [0,1],$$
where $F(x) = P(X_1 \le x)$. The process $U_n(x,t)$ is also known as the two-parameter empirical process. Kiefer and Müller showed that for i.i.d. data, the sequential empirical process converges in distribution to a mean-zero Gaussian process $K(x,t)$ with covariance structure
$$E\big(K(x,s)K(y,t)\big) = \min(s,t)\big(F(\min(x,y)) - F(x)F(y)\big).$$
The limit process $K(x,t)$ is called the Kiefer process, or Kiefer-Müller process. Komlós, Major, and Tusnády (1975), using a technique due to Csörgő and Révész (1975), established what are, to date, essentially the sharpest bounds for the error in the approximation of the sequential empirical process by the Kiefer process in the i.i.d. case. For an overview of this topic, see the book by Csörgő and Révész (1981) or the survey article by Gänssler and Stute (1979).
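As a quick sanity check of the Kiefer-Müller covariance formula, the following Python sketch simulates i.i.d. data, evaluates $U_n(x,t) = n^{-1/2}\sum_{i\le[nt]}(1_{\{X_i\le x\}} - F(x))$ and compares a Monte Carlo estimate of $E(U_n(x,s)U_n(y,t))$ with $\min(s,t)(F(\min(x,y)) - F(x)F(y))$. The choice of uniform data (so $F(x) = x$ on $[0,1]$), the sample sizes and all function names are illustrative assumptions, not taken from the text.

```python
import math
import random


def sequential_empirical(xs, cdf, x, t):
    """U_n(x, t) = n^{-1/2} * sum_{i <= [n t]} (1{X_i <= x} - F(x))."""
    n = len(xs)
    k = math.floor(n * t)  # lower Gauss bracket [n t]
    return sum((1.0 if xi <= x else 0.0) - cdf(x) for xi in xs[:k]) / math.sqrt(n)


def kiefer_cov(cdf, x, s, y, t):
    """Kiefer covariance min(s, t) * (F(min(x, y)) - F(x) F(y))."""
    return min(s, t) * (cdf(min(x, y)) - cdf(x) * cdf(y))


def mc_cov(n, reps, x, s, y, t, seed=0):
    """Monte Carlo estimate of E[U_n(x, s) U_n(y, t)] for i.i.d. Uniform(0, 1) data."""
    rng = random.Random(seed)
    cdf = lambda z: min(max(z, 0.0), 1.0)  # distribution function of Uniform(0, 1)
    total = 0.0
    for _ in range(reps):
        xs = [rng.random() for _ in range(n)]
        total += sequential_empirical(xs, cdf, x, s) * sequential_empirical(xs, cdf, y, t)
    return total / reps


if __name__ == "__main__":
    uniform_cdf = lambda z: min(max(z, 0.0), 1.0)
    print("exact:", kiefer_cov(uniform_cdf, 0.3, 0.5, 0.6, 1.0))
    print("Monte Carlo:", mc_cov(n=100, reps=4000, x=0.3, s=0.5, y=0.6, t=1.0))
```

For i.i.d. data the identity is exact at every finite $n$ up to the factor $[n\min(s,t)]/n$, so the only discrepancy in the printout is Monte Carlo noise.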
Many authors have studied extensions of the sequential empirical process CLT to dependent data, e.g. Berkes and Philipp (1977) and Philipp and Pinzur (1980) for strongly mixing processes and Berkes, Hörmann, and Schauer (2009) for S-mixing processes. Dehling and Taqqu (1989) determined the asymptotic distribution of the sequential empirical process in the case of long-range dependent data.
Recently, Dehling, Durieu, and Volný (2009) have developed a technique to prove empirical process CLTs for Markov chains and dynamical systems that do not necessarily satisfy any of the standard mixing conditions. The technique has been extended by Dehling and Durieu (2011), Durieu and Tusche (2012) and Dehling, Durieu, and Tusche (2012) to multivariate empirical processes and to empirical processes indexed by classes of functions.
Among the examples that could be treated by the new techniques are B-geometrically ergodic Markov chains, for which the empirical process CLT could be established. It is the goal of the present paper to extend these techniques to the sequential empirical process.
Sequential empirical process CLTs can be applied to study the asymptotic distribution of change-point tests based on the empirical distribution function. Suppose $(X_i)_{i\in\mathbb{N}}$ is a stochastic process with marginal distribution functions $\mu_1, \mu_2, \dots$. Given the observations $X_1, \dots, X_n$, we want to test the hypothesis $H_0$: "the process is stationary with marginal distribution $\mu$" against the alternative $H_A$: "there exists a $k^* \in \{1, \dots, n-1\}$ such that $(X_1, \dots, X_{k^*})$ and $(X_{k^*+1}, \dots, X_n)$ are both stationary with different marginal distributions". We propose the test statistic where $F_k$ denotes the empirical distribution function of the observations $X_1, \dots, X_k$ and $F_{k+1,n}$ denotes the empirical distribution function of $X_{k+1}, \dots, X_n$ (set $F_0 = F_{n+1,n} = 0$). In order to determine the asymptotic distribution of $T_n$, we study the $\ell^\infty(\mathbb{R}\times[0,1])$-valued process $R_n = (R_n(x,t))_{(x,t)\in\mathbb{R}\times[0,1]}$ given by As proved in the appendix (Theorem 7), assuming "convergence of the sequential empirical process", we obtain under the null hypothesis $H_0$ that where $K$ is the centred Gaussian process with covariance structure $\operatorname{Cov}(K(x,t), K(y,s))$ This process is also referred to as a Kiefer process. Applying the continuous mapping theorem to the supremum functional, we obtain the asymptotic distribution of the test statistic $T_n$ under the null hypothesis, that is Note that this result in fact remains true for general $\mathcal{F}$-indexed empirical processes (see Proposition 5).
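The display defining $T_n$ is not reproduced above, so the following Python sketch assumes one common CUSUM-type form, $T_n = \max_{1\le k<n}\sup_x \sqrt{n}\,\tfrac{k}{n}(1-\tfrac{k}{n})\,|F_k(x) - F_{k+1,n}(x)|$; this form is a hypothetical illustration, not necessarily the statistic of the paper. For this choice the weighted difference equals $U_n(x,k/n) - \tfrac{k}{n}U_n(x,1)$, because the centring $F(x)$ cancels; the code checks that identity and evaluates the statistic, taking the supremum over the sample points (the weighted difference is a step function in $x$ that vanishes below the sample minimum and above the maximum).

```python
import math


def ecdf(sample, x):
    """Empirical distribution function of `sample` at x (0 for an empty sample)."""
    return sum(1 for v in sample if v <= x) / len(sample) if sample else 0.0


def weighted_difference(xs, k, x):
    """Hypothetical R_n(x, k/n) = sqrt(n) (k/n)(1 - k/n) (F_k(x) - F_{k+1,n}(x))."""
    n = len(xs)
    return math.sqrt(n) * (k / n) * (1 - k / n) * (ecdf(xs[:k], x) - ecdf(xs[k:], x))


def centred_partial_sum(xs, k, x, F_x):
    """n^{-1/2} sum_{i <= k} (1{X_i <= x} - F(x)), i.e. U_n(x, t) with [n t] = k."""
    return sum((1.0 if v <= x else 0.0) - F_x for v in xs[:k]) / math.sqrt(len(xs))


def change_point_statistic(xs):
    """Max over 1 <= k < n and over x in the sample of |weighted_difference(xs, k, x)|."""
    n = len(xs)
    return max(abs(weighted_difference(xs, k, x)) for k in range(1, n) for x in xs)
```

Integer split points $k$ are used instead of $t = k/n$ so that $[nt]$ is never affected by floating-point rounding.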
The remainder of this paper is organized as follows. In Section 2 we present sequential empirical CLTs for B-geometrically ergodic Markov chains (Theorem 1) and for dynamical systems whose transfer operator has a spectral gap (Theorem 3). More abstract results, such as a sequential empirical CLT for multiple mixing random variables (Theorem 5), are stated in Section 3; these results are the foundation for the proofs of the theorems of Section 2. The proofs are given in Sections 4, 5 and 6. The asymptotic distribution of the test statistic $T_n$ (Proposition 5) is derived in the appendix.

Definitions and Notations
Let $(\mathcal{X}, \mathcal{A})$ be a measurable space. For a positive measure $\lambda$ on $\mathcal{X}$ and a $\lambda$-integrable complex-valued function $f$ on $\mathcal{X}$, we use the notation $\lambda f := \int_{\mathcal{X}} f\,d\lambda$. For $s \in [1,\infty)$, we denote by $L^s(\lambda)$ the Lebesgue space of $s$-th power integrable complex-valued functions on $\mathcal{X}$, equipped with the norm $\|f\|_s = \lambda(|f|^s)^{1/s}$. Further, we denote by $L^\infty(\lambda)$ the space of $\lambda$-essentially bounded measurable functions on $\mathcal{X}$ and by $\|\cdot\|_\infty$ the corresponding essential supremum norm. Note that these norms depend on the choice of the measure $\lambda$; throughout this paper, however, it will always be clear which measure we refer to.
Let $(X_i)_{i\in\mathbb{N}}$ be an $\mathcal{X}$-valued stationary stochastic process with marginal distribution $\mu$ and let $\mathcal{F}$ be a class of real-valued measurable functions on $\mathcal{X}$ which is uniformly bounded w.r.t. the $\|\cdot\|_\infty$-norm. For $n \in \mathbb{N}^*$, we define the map $F_n : \mathcal{F} \to \mathbb{R}$, induced by the empirical measure, by
$$F_n(f) := \frac{1}{n}\sum_{i=1}^{n} f(X_i).$$
The sequential empirical process of order $n$ of $(X_i)_{i\in\mathbb{N}}$ is then the $\mathcal{F}\times[0,1]$-indexed process $U_n := (U_n(f,t))_{(f,t)\in\mathcal{F}\times[0,1]}$ given by
$$U_n(f,t) := \frac{1}{\sqrt{n}}\sum_{i=1}^{[nt]}\big(f(X_i) - \mu f\big),$$
where $[\cdot]$ denotes the lower Gauss bracket, i.e. $[x] := \sup\{z \in \mathbb{Z} : z \le x\}$. For fixed $n \in \mathbb{N}^*$, we consider $U_n$ as a random element of the metric space $\ell^\infty(\mathcal{F}\times[0,1])$ of bounded real-valued functions on $\mathcal{F}\times[0,1]$, equipped with the supremum norm and the corresponding Borel $\sigma$-field. Since $\mathcal{F}\times[0,1]$ is uncountable, we cannot assume that $U_n$ is measurable, and thus standard techniques of weak convergence do not apply. We therefore use the theory of outer probability and expectation (see van der Vaart and Wellner (1996)).
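Let $E^*X$ denote the outer expectation of a possibly non-measurable random element $X$, let $U$ be measurable, and let $U_1, U_2, \dots$ be random elements with values in $\ell^\infty(\mathcal{F}\times[0,1])$. We say that $U_n$ converges in distribution to $U$ if $E^*h(U_n) \to E\,h(U)$ as $n \to \infty$ for every bounded continuous function $h : \ell^\infty(\mathcal{F}\times[0,1]) \to \mathbb{R}$. We say that the process $(X_i)_{i\in\mathbb{N}}$ satisfies a sequential empirical CLT if the process $U_n$ converges in distribution in $\ell^\infty(\mathcal{F}\times[0,1])$ to a tight centred Gaussian process.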
Empirical CLTs usually require some bound on the size of the indexing class $\mathcal{F}$. This size is usually measured by counting the number of certain sets, e.g. balls or brackets of a given $\|\cdot\|_s$-size, needed to cover $\mathcal{F}$ (cf. Ossiander (1987) and van der Vaart and Wellner (1996, p. 83 ff.)). In our setting, the good properties are only available for functions of a restricted class, which could be disjoint from the class $\mathcal{F}$. We thus need an adapted notion of bracketing numbers, which was introduced in Dehling, Durieu, and Tusche (2012).
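Definition. Let $(\mathcal{X}, \mathcal{A}, \mu)$ be a probability space. For two functions $l, u : \mathcal{X} \to \mathbb{R}$ such that $l(x) \le u(x)$ for all $x \in \mathcal{X}$, we define the bracket
$$[l,u] := \{f : \mathcal{X} \to \mathbb{R} \ :\ l(x) \le f(x) \le u(x) \text{ for all } x \in \mathcal{X}\}.$$
Let $\mathcal{G}$ be a subset of a normed real vector space $(C, \|\cdot\|_C)$ of measurable real-valued functions on $\mathcal{X}$. For given $\varepsilon > 0$, $A > 0$, and $s \in [1,\infty]$, we call $[l,u]$ an $(\varepsilon, A, \mathcal{G}, L^s(\mu))$-bracket if $l, u \in \mathcal{G}$, $\|u - l\|_s \le \varepsilon$, $\|l\|_C \le A$ and $\|u\|_C \le A$. For a class of real-valued functions $\mathcal{F}$ on $\mathcal{X}$, we define the bracketing number $N(\varepsilon, A, \mathcal{F}, \mathcal{G}, L^s(\mu))$ as the smallest number of $(\varepsilon, A, \mathcal{G}, L^s(\mu))$-brackets needed to cover $\mathcal{F}$.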
This notion of brackets allows one to control the number of brackets needed to cover $\mathcal{F}$ not only in terms of the rate at which the $L^s$-size of the brackets decreases, but also in terms of the rate at which the $\|\cdot\|_C$-size of the bracketing functions increases as the $L^s$-size goes to zero.
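To make the two rates concrete, take (as an illustrative assumption) $\mu$ = Lebesgue measure on $[0,1]$, $C$ = Lipschitz functions, and the indicator $f = 1_{[0,x]}$. Piecewise-linear ramps of width $\varepsilon$ give a bracket $[l,u] \ni f$ with $\|u - l\|_1 = \varepsilon$, while the Lipschitz constants of $l$ and $u$ equal $1/\varepsilon$: the $L^1$-size decreases like $\varepsilon$ while the $C$-size grows like $\varepsilon^{-1}$. The Python sketch below, with hypothetical function names, checks both numbers on a grid.

```python
def ramp_bracket(x, eps):
    """Lipschitz functions lower <= 1_{[0, x]} <= upper, each with slope 1/eps on its ramp."""
    def lower(y):
        # 1 on [0, x - eps], linear decrease to 0 at x, 0 afterwards
        return 0.0 if y >= x else min(1.0, (x - y) / eps)

    def upper(y):
        # 1 on [0, x], linear decrease to 0 at x + eps, 0 afterwards
        return 1.0 if y <= x else max(0.0, min(1.0, (x + eps - y) / eps))

    return lower, upper


def l1_distance(f, g, m=100_000):
    """Midpoint-rule approximation of the integral of |f - g| over [0, 1]."""
    h = 1.0 / m
    return sum(abs(f((j + 0.5) * h) - g((j + 0.5) * h)) for j in range(m)) * h


def lipschitz_constant(f, m=100_000):
    """Largest difference quotient of f over a uniform grid on [0, 1]."""
    h = 1.0 / m
    return max(abs(f((j + 1) * h) - f(j * h)) / h for j in range(m))


if __name__ == "__main__":
    lo, up = ramp_bracket(0.5, 0.1)
    print(l1_distance(lo, up), lipschitz_constant(lo), lipschitz_constant(up))
```

Covering all indicators $1_{[0,x]}$, $x \in [0,1]$, this way takes on the order of $1/\varepsilon$ such brackets with $C$-size of order $1/\varepsilon$, which is the kind of trade-off the bracketing numbers above are designed to record.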

B-geometrically ergodic Markov chains
In the following, let $(X_i)_{i\in\mathbb{N}}$ be a time-homogeneous Markov chain on a measurable state space $(\mathcal{X}, \mathcal{A})$ with transition probability $P$ and invariant measure $\nu$. We assume that the Markov chain starts with initial distribution $\nu$, i.e. that the distribution of $X_0$ is $\nu$; this makes $(X_i)_{i\in\mathbb{N}}$ a stationary sequence. We also denote by $P$ the associated Markov operator defined by
$$Pf(x) := \int_{\mathcal{X}} f(y)\,P(x,dy).$$
We assume that there exists a complex Banach space $(B, \|\cdot\|_B)$ of measurable functions from $\mathcal{X}$ to $\mathbb{C}$ such that $P$ is a bounded linear operator on $B$. We denote by $L(B)$ the space of bounded linear operators from $B$ to $B$. We will need the following properties of the space $B$:
(A) $1_{\mathcal{X}} \in B$; for all $f \in B$, $|f| \in B$ and $\bar f \in B$; and the Dirac measures $\delta_x$ are continuous on $B$.
Moreover, for some $m \in [1,\infty]$, Further, we consider processes such that the action of the corresponding Markov operator on $B$ satisfies
(C) $\|P^n f - (\nu f)\,1_{\mathcal{X}}\|_B \le \kappa\,\|f\|_B\,\theta^n$ for some $\kappa > 0$, $\theta \in [0,1)$, and all $f \in B$, $n \in \mathbb{N}$.
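For intuition about condition (C), consider the simplest possible case, a two-state chain (an illustrative assumption, not an example from the text), with $B$ the space of all functions on $\{0,1\}$ and the supremum norm. For $P = \begin{pmatrix}1-a & a\\ b & 1-b\end{pmatrix}$ with $0 < a+b < 2$, the invariant law is $\nu = (b, a)/(a+b)$, and every $\nu$-centred function is an eigenvector with eigenvalue $\lambda = 1 - a - b$. Hence $P^n f - (\nu f)1 = \lambda^n(f - (\nu f)1)$, so (C) holds with $\theta = |\lambda|$ and $\kappa = 2$. The pure-Python sketch below checks this identity.

```python
def mat_vec(P, f):
    """Markov operator on a finite state space: (P f)(x) = sum_y P[x][y] f(y)."""
    return [sum(P[x][y] * f[y] for y in range(len(f))) for x in range(len(P))]


def two_state_chain(a, b):
    """Transition matrix and invariant distribution of a two-state chain (0 < a + b < 2)."""
    P = [[1 - a, a], [b, 1 - b]]
    nu = [b / (a + b), a / (a + b)]
    return P, nu


def gap_error(P, nu, f, n):
    """sup_x |P^n f(x) - nu(f)|: the left-hand side of (C) in the supremum norm."""
    g = list(f)
    for _ in range(n):
        g = mat_vec(P, g)
    nu_f = sum(p * v for p, v in zip(nu, f))
    return max(abs(v - nu_f) for v in g)


if __name__ == "__main__":
    P, nu = two_state_chain(0.3, 0.1)  # lambda = 0.6
    print([round(gap_error(P, nu, [1.0, -2.0], n), 6) for n in range(6)])
```

The constant $\kappa = 2$ comes from $\|f - (\nu f)1\|_\infty \le \|f\|_\infty + |\nu f| \le 2\|f\|_\infty$.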
Remark 1. Note that condition (C) corresponds to a spectral gap property of $P$ acting on $B$, i.e. 1 is the only eigenvalue of modulus one, it is simple, and the rest of the spectrum is contained in a disk of radius strictly smaller than one. Further, in this case there exists a decomposition $P = \Pi + N$ of the linear operator $P$ in $L(B)$ such that $\Pi f = (\nu f)\,1_{\mathcal{X}}$ is a projection onto the eigenspace of 1, $N\circ\Pi = \Pi\circ N = 0$, and $\rho(N) := \lim_{n\to\infty}\|N^n\|_{L(B)}^{1/n} < 1$, where $\|\cdot\|_{L(B)}$ denotes the operator norm on $B$. For a function $f$ from $\mathcal{X}$ to $\mathbb{R}$, using Fourier kernels, we introduce the perturbed operators
$$P_{f,t}\varphi := P(e^{itf}\varphi), \qquad t \in \mathbb{R}.$$
In order to apply the Nagaev method (cf. Hennion and Hervé (2001)), we also need the following condition for some real vector space $C$ of functions from $\mathcal{X}$ to $\mathbb{R}$, which will be specified later in the applications.
(D) For all $f \in C$ and all $t$ in a neighbourhood of 0, we have $P_{f,t} \in L(B)$; further, $t \mapsto P_{f,t}$ is twice continuously differentiable, with derivatives given by $\frac{d^k}{dt^k}P_{f,t}\varphi = P\big((if)^k e^{itf}\varphi\big)$, $k = 1, 2$.
We will see that conditions (A)-(D) guarantee a sequential finite-dimensional CLT for functions in $C$ (see Proposition 4 in Section 6). Note that, in applications, $C$ will be chosen as a subset of $B$. To establish a tightness property of the empirical process, the following further condition on the space $B$ is useful.
(E) There exist C > 0 and ℓ ∈ N * such that, if f ∈ B and g ∈ B are bounded by 1, then Note that if B is a Banach algebra, condition (E) holds with ℓ = 2. If further C is a subset of B, then for every f ∈ C, the mapping t → P f,t is an entire function and therefore condition (D) is also satisfied.
To derive a CLT for an $\mathcal{F}$-indexed empirical process, we now have to specify the relation between the class $\mathcal{F}$ and the Banach space $B$ or, more precisely, between $\mathcal{F}$ and the vector space $C$ which satisfies condition (D). Note that, in the particular case where $\mathcal{F}$ is a subset of $C$, we can infer the finite-dimensional convergence of the process $(U_n)_{n\in\mathbb{N}}$ from (A)-(D). Tightness can then be established under an entropy condition on $\mathcal{F}$ that uses the usual bracketing number as defined in Ossiander (1987). In many examples, however, the functions of $\mathcal{F}$ do not belong to the space $B$. To overcome this difficulty, we have to measure how well the functions of $\mathcal{F}$ can be approximated by functions of $B$. We use the bracketing numbers introduced in the preceding section to control the size of $\mathcal{F}$ in terms of its approximability by the space $B$. Since $\mathcal{F}$ consists of real-valued functions, we concentrate on the real-valued functions in $B$. We denote by $B_{\mathbb{R}}$ the subset of $B$ consisting of real-valued functions; note that $(B_{\mathbb{R}}, \|\cdot\|_B)$ is a real Banach space. Our conditions on the Markov chain (in particular condition (C)) allow us to work with bracketing numbers for which the $B$-norm of the bracketing functions grows exponentially as the $\|\cdot\|_s$-size of the brackets goes to zero. This leads to the following entropy condition.
For some $s \in [1,\infty]$ and $\mathcal{G} \subset B_{\mathbb{R}}$,
(F) there exist $C > 0$, $r > -1$, and $\gamma > 1$ such that
Remark 2. Observe that for $r' \ge 0$, inequality (1) holds for all $r > 2r' - 1$ if Note further that the supremum appears in order to deal with the possible non-monotonicity of the bracketing number.
We can now state the sequential empirical central limit theorem, which is proved in Section 6.
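Theorem 1 (Sequential empirical CLT for B-geometrically ergodic Markov chains). Let $\mathcal{F}$ be a $\|\cdot\|_\infty$-bounded class of functions from $\mathcal{X}$ to $\mathbb{R}$. Assume that for some $m \in [1,\infty]$, the conditions (A), (B), (C), and (E) hold. If there is a $\|\cdot\|_\infty$-bounded subset $\mathcal{G} \subset B_{\mathbb{R}}$ such that (D) is satisfied for $C = \mathrm{Vect}_{\mathbb{R}}(\mathcal{G})$, the smallest real vector space containing $\mathcal{G}$, and if (F) is satisfied with $s = m/(m-1)$, then the sequential empirical process converges in distribution in $\ell^\infty(\mathcal{F}\times[0,1])$ to a centred Gaussian process $K$ with covariance structure given by
$$\operatorname{Cov}\big(K(f,t), K(g,u)\big) = \min(t,u)\Big(\sum_{k=0}^{\infty}\operatorname{Cov}\big(f(X_0), g(X_k)\big) + \sum_{k=1}^{\infty}\operatorname{Cov}\big(g(X_0), f(X_k)\big)\Big), \qquad f, g \in \mathcal{F},\ t, u \in [0,1]. \qquad (2)$$
Remark 3. A centred Gaussian process $K$ with covariance structure (2) is often referred to as a Kiefer process. Now, let us give an example by applying Theorem 1 to random iterative Lipschitz models.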
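In the two-state toy chain $P = \begin{pmatrix}0.7 & 0.3\\ 0.1 & 0.9\end{pmatrix}$, $\nu = (0.25, 0.75)$ (an illustrative assumption, not an example from the text), one has $\operatorname{Cov}(f(X_0), g(X_k)) = \lambda^k\operatorname{Cov}_\nu(f,g)$ with $\lambda = 0.6$. The long-run covariance $\min(t,u)\big(\sum_{k\ge0}\operatorname{Cov}(f(X_0), g(X_k)) + \sum_{k\ge1}\operatorname{Cov}(g(X_0), f(X_k))\big)$ of the Kiefer-type limit then reduces to $\min(t,u)\operatorname{Cov}_\nu(f,g)(1+\lambda)/(1-\lambda)$. The sketch below computes the lagged covariances from matrix powers, truncates both series, and can be compared with this closed form.

```python
def cov_lag(P, nu, f, g, k):
    """Cov(f(X_0), g(X_k)) for the stationary chain: nu(f * P^k g) - nu(f) nu(g)."""
    h = list(g)
    for _ in range(k):
        h = [sum(P[x][y] * h[y] for y in range(len(h))) for x in range(len(P))]
    nu_f = sum(p * v for p, v in zip(nu, f))
    nu_g = sum(p * v for p, v in zip(nu, g))
    return sum(p * fv * hv for p, fv, hv in zip(nu, f, h)) - nu_f * nu_g


def kiefer_cov_markov(P, nu, f, g, t, u, terms=200):
    """min(t, u) * (sum_{k>=0} Cov(f(X_0), g(X_k)) + sum_{k>=1} Cov(g(X_0), f(X_k))), truncated."""
    s = sum(cov_lag(P, nu, f, g, k) for k in range(terms))
    s += sum(cov_lag(P, nu, g, f, k) for k in range(1, terms))
    return min(t, u) * s
```

Truncating at 200 terms is harmless here because the terms decay like $0.6^k$; for a general chain the number of terms would have to be chosen according to the spectral gap.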

Iterative Lipschitz models that contract on average
In this section, we assume that $(\mathcal{X}, d)$ is a (not necessarily compact) metric space in which every closed ball is compact. Further, we assume that $\mathcal{X}$ is equipped with the Borel $\sigma$-algebra $\mathcal{B}(\mathcal{X})$. Let $\{T_i, i \ge 0\}$ be a family of Lipschitz maps from $\mathcal{X}$ to $\mathcal{X}$. We consider the Markov chain with state space $\mathcal{X}$ and transition probability $P$ given by
$$P(x, A) := \sum_{i\ge0} p_i(x)\,1_A(T_i x), \qquad x \in \mathcal{X},\ A \in \mathcal{B}(\mathcal{X}),$$
where the $p_i$ are Lipschitz functions from $\mathcal{X}$ to $[0,1]$ which satisfy $\sum_{i\ge0} p_i(x) = 1$ for all $x \in \mathcal{X}$. Thus, each step of the Markov chain corresponds to the application of one of the maps $T_i$, chosen randomly according to a probability distribution which depends on the current state of the chain. We assume that this model contracts on average, that is, there exists a $\rho \in (0,1)$ such that
$$\sum_{i\ge0} p_i(x)\,d(T_i x, T_i y) \le \rho\,d(x,y) \qquad \text{for all } x, y \in \mathcal{X}. \qquad (3)$$
Statistical properties of such models have been studied by Dubins and Freedman (1966), Hennion and Hervé (2001), Wu and Shao (2004), Hervé (2008), and Hervé and Pène (2010) in the case of constant functions $p_i$, and by Doeblin and Fortet (1937), Karlin (1953), Barnsley, Demko, Elton, and Geronimo (1988), Peigné (1993), Pollicott (2001), and Walkden (2007) in the case of variable functions $p_i$.
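A minimal simulation sketch of such a chain, under illustrative assumptions that are not taken from the text: $\mathcal{X} = [0,1]$ with $d(x,y) = |x-y|$, $T_0(x) = x/2$, $T_1(x) = (x+1)/2$ and constant weights $p_0 = p_1 = 1/2$. Here the weighted average $\sum_i p_i(x)\,|T_i x - T_i y| = |x-y|/2$, so the model contracts on average with $\rho = 1/2$ (reading the contraction condition as this weighted average of distances), and the invariant law is the uniform distribution on $[0,1]$ via binary expansions. The hypothetical `step` function accepts state-dependent weights $p_i(x)$, which is the general setting of this section.

```python
import random


def step(x, maps, weights, rng):
    """One transition: choose T_i with probability p_i(x) and return T_i(x)."""
    probs = [p(x) for p in weights]
    i = rng.choices(range(len(maps)), weights=probs)[0]
    return maps[i](x)


def simulate(x0, maps, weights, n, burn_in=1000, seed=1):
    """Run the chain for burn_in + n steps and return the last n states."""
    rng = random.Random(seed)
    x, path = x0, []
    for j in range(burn_in + n):
        x = step(x, maps, weights, rng)
        if j >= burn_in:
            path.append(x)
    return path


def average_contraction(x, y, maps, weights):
    """sum_i p_i(x) d(T_i x, T_i y) / d(x, y) for the metric d(x, y) = |x - y|."""
    return sum(p(x) * abs(T(x) - T(y)) for T, p in zip(maps, weights)) / abs(x - y)


# Illustrative model: two affine contractions of [0, 1], chosen with probability 1/2 each.
MAPS = [lambda x: x / 2, lambda x: (x + 1) / 2]
WEIGHTS = [lambda x: 0.5, lambda x: 0.5]
```

Replacing `WEIGHTS` by Lipschitz functions of the state gives the variable-$p_i$ models cited above; the invariant law is then generally no longer explicit.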
As in many of the cited papers, we need the following technical properties. For some fixed Moreover, assume that for all $x, y \in \mathcal{X}$, there exist sequences of integers $(i_n)_{n\ge1}$ and $(j_n)_{n\ge1}$ such that Note that conditions (4)-(6) are satisfied when the family of maps $T_i$ is finite, and (7) is satisfied when (3)-(6) hold and each $p_i$ is positive. See Peigné (1993) for a discussion of these assumptions. Under conditions (3)-(7), Peigné (1993) proved that the Markov chain has an attractive $P$-invariant probability measure $\nu$ with finite first moment. We define the stationary process $(X_i)_{i\ge0}$ on $\mathcal{X}$ as the Markov chain started with distribution $\nu$, that is, $X_0 \sim \nu$.
A central limit theorem for the empirical process associated with the Markov chain $(X_i)_{i\ge0}$ was proved by Durieu (2013) (see also Wu and Shao (2004) for constant functions $p_i$). The following theorem extends this result to sequential empirical processes.
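For $\alpha \in (0,1]$ and $\mathbb{K} = \mathbb{C}$ or $\mathbb{K} = \mathbb{R}$, we consider the space $H_\alpha(\mathcal{X}, \mathbb{K})$ of bounded $\alpha$-Hölder continuous functions on $\mathcal{X}$ with values in $\mathbb{K}$, equipped with the norm
$$\|f\|_\alpha := \|f\|_\infty + \sup_{x \ne y}\frac{|f(x) - f(y)|}{d(x,y)^\alpha}.$$
Theorem 2. Let (3)-(7) hold and consider a $\|\cdot\|_\infty$-bounded class of functions $\mathcal{F}$. Let $s \in (1,2)$ and let $\mathcal{G}$ be a $\|\cdot\|_\infty$-bounded subset of the space $H_\alpha(\mathcal{X}, \mathbb{R})$ for some $\alpha < (s-1)/s$ such that (F) holds. Then the $\mathcal{F}$-indexed sequential empirical process $(U_n(f,t))_{(f,t)\in\mathcal{F}\times[0,1]}$ associated with the process $(X_i)_{i\ge0}$ converges in distribution in the space $\ell^\infty(\mathcal{F}\times[0,1])$ to a centred Kiefer process with covariance given by (2).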
Proof. First, we introduce spaces of Lipschitz functions with weights that give the geometric ergodicity of the chain. For every $\alpha, \beta \in [0,1]$, let $H_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$ denote the space of continuous In particular, the space $H_\alpha(\mathcal{X}, \mathbb{C}) := H_{\alpha,0}(\mathcal{X}, \mathbb{C})$ is the space of bounded $\alpha$-Hölder functions from $\mathcal{X}$ to $\mathbb{C}$, and we have $\|\cdot\|_{\alpha,0} = \|\cdot\|_\alpha$. It is a subspace of $H_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$ for all $\beta > 0$. The following properties are straightforward and are given without proofs.
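It remains to verify condition (D). We consider the space $H_\alpha(\mathcal{X}, \mathbb{R})$ of bounded real-valued $\alpha$-Hölder functions on $\mathcal{X}$. Let $f$ be a function in this space and consider the perturbed operator defined by $P_{f,t}\varphi = P(e^{itf}\varphi)$. Using $|e^{ia} - e^{ib}| \le |a - b|$, we get that $e^{itf} \in H_\alpha(\mathcal{X}, \mathbb{C})$ for all $t \in \mathbb{R}$. Thus, for every $\varphi \in H_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$ and $t \in \mathbb{R}$, condition (iii) of Lemma 1 yields $e^{itf}\varphi \in H_{\alpha,\beta}(\mathcal{X}, \mathbb{C})$. Since $P \in L(H_{\alpha,\beta}(\mathcal{X}, \mathbb{C}))$, we infer that $P_{f,t} \in L(H_{\alpha,\beta}(\mathcal{X}, \mathbb{C}))$ for all $t \in \mathbb{R}$. Further, using again condition (iii) of Lemma 1, we see that $t \mapsto P_{f,t}$ is an analytic function from $\mathbb{R}$ to $L(H_{\alpha,\beta}(\mathcal{X}, \mathbb{C}))$, given by
$$P_{f,t}\varphi = \sum_{k\ge0} P\big((if)^k\varphi\big)\frac{t^k}{k!}.$$
We infer that (D) holds for the space $C = H_\alpha(\mathcal{X}, \mathbb{R})$.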
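On a finite state space the perturbed operator is simply $P$ composed with multiplication by $e^{itf}$, which makes the power-series formula easy to check numerically. The sketch below (pure Python with `cmath`; the doubly stochastic $3\times3$ matrix, $f$ and $\varphi$ are arbitrary illustrative choices) compares $P(e^{itf}\varphi)$ with the truncated series $\sum_{k\le K} P((if)^k\varphi)\,t^k/k!$. In the test it also checks that $\nu P_{f,t}1 = E\,e^{itf(X_1)}$ under stationarity, the identity behind the Nagaev method.

```python
import cmath
import math


def apply_P(P, phi):
    """Markov operator on a finite state space: (P phi)(x) = sum_y P[x][y] phi(y)."""
    return [sum(P[x][y] * phi[y] for y in range(len(phi))) for x in range(len(P))]


def perturbed(P, f, t, phi):
    """Perturbed operator P_{f,t} phi = P(e^{i t f} phi)."""
    return apply_P(P, [cmath.exp(1j * t * fy) * py for fy, py in zip(f, phi)])


def perturbed_series(P, f, t, phi, K=40):
    """Truncated power series sum_{k=0}^{K} P((i f)^k phi) t^k / k!."""
    total = [0j] * len(phi)
    for k in range(K + 1):
        term = apply_P(P, [((1j * fy) ** k) * py for fy, py in zip(f, phi)])
        coef = t ** k / math.factorial(k)
        total = [a + coef * b for a, b in zip(total, term)]
    return total
```

Because the matrix in the test is doubly stochastic, its invariant law is uniform, which keeps the stationarity check free of an eigenvector computation.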

Dynamical Systems with a Spectral Gap
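Let us mention that, as usual, the proof of Theorem 1 can be adapted to dynamical systems by using the Perron-Frobenius operator in place of the Markov operator. Let $(\mathcal{X}, \mathcal{A}, \mu)$ be a probability space and let $T$ be a measure-preserving transformation of $\mathcal{X}$, that is, $\mu(T^{-1}A) = \mu(A)$ for all $A \in \mathcal{A}$. Denote by $P$ the Perron-Frobenius operator of $T$, i.e. the operator on $L^1(\mu)$ characterized by $\int_{\mathcal{X}} (Pf)\,g\,d\mu = \int_{\mathcal{X}} f\,(g\circ T)\,d\mu$ for all $f \in L^1(\mu)$ and $g \in L^\infty(\mu)$. Further, for a function $f$ on $\mathcal{X}$, we define the perturbed operator by $P_{f,t}\varphi = P(e^{itf}\varphi)$. We have the following result, whose proof follows that of Theorem 1 and is left to the reader.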
Theorem 3 (Sequential empirical CLT for dynamical systems with a spectral gap). Let $\mathcal{F}$ be a $\|\cdot\|_\infty$-bounded class of functions from $\mathcal{X}$ to $\mathbb{R}$. Assume that there exist a Banach space $B$ and a real number $m \ge 1$ such that the conditions (A), (B), (C), and (E) hold with respect to the Perron-Frobenius operator, with $\nu$ replaced by $\mu$. If there exists a $\|\cdot\|_\infty$-bounded subset $\mathcal{G} \subset B_{\mathbb{R}}$ such that (D) holds for the space $C = \mathrm{Vect}_{\mathbb{R}}(\mathcal{G})$ and (F) holds for $s = m/(m-1)$, then the process to a centred Gaussian process $K$ with covariance structure given by
As a possible application, we can extend the empirical CLT proved by Collet, Martinez, and Schmitt (2004) for a class of expanding maps of the interval to a sequential empirical CLT. In the situation considered by Collet, Martinez, and Schmitt (2004), the spectral gap property can be established on the space of functions of bounded variation. Gouëzel (2009) gave examples of expanding maps of the interval for which the Perron-Frobenius operator does not act on the space of functions of bounded variation, but acts on the space of Lipschitz functions with a spectral gap property. These examples also satisfy the assumptions of our theorem, and thus sequential empirical CLTs can be proved for them. Note that the space of Lipschitz functions is a Banach algebra, and thus conditions (D) and (E) are automatically satisfied. Further, the usual class of indicator functions of intervals can be well approximated by Lipschitz functions, and condition (F) holds for this class; see also Section 2.5.

Indexing Classes of Functions
To conclude this section, we present some classes of functions for which an estimate of the bracketing number can be computed. These examples, which satisfy condition (F), come from the paper by Dehling, Durieu, and Tusche (2012).
For vectors . . , d}. Further, denote the modulus of continuity of a real-valued function $F$ defined on a subset of $\mathbb{R}^d$ by $w_F$. Recall that $w_F(t) := \sup\{|F(x) - F(y)| : |x - y| \le t\}$, where $|\cdot|$ denotes the Euclidean norm.
Proposition 1. For a metric space X , equipped with a probability measure µ, set B = H α (X , R) and G = {f ∈ B : 0 ≤ f ≤ 1}. We have the following statements about our entropy condition (F).
If µ has a bounded density w.r.t. the Lebesgue measure, then condition (F) is satisfied for all s ∈ [1, ∞].
(iii) In the situation of (ii), F can be replaced by as t → ∞ for some β ∈ (0, 1).
(iv) For an arbitrary metric space (X , ρ) and

Sequential Empirical CLTs for Multiple Mixing Processes
In this section, we present a more general result which can be applied in the setting of Section 2. In particular, the approach used here is useful when the indexing class F is disjoint from the space of functions on which we have good properties. In the more abstract setting, our technique requires two basic assumptions concerning the process (f (X i )) i∈N , where f : X −→ R belongs to some normed vector space (C, · C ) of functionals on X . We assume that for some · ∞ -bounded subset G ⊂ C the following two properties hold.
Assumption 2 (Moment bounds for G-observables). For fixed p ∈ N * , s ≥ 1, and monotone increasing functions Φ 1 , . . . , Φ p : R + −→ R + , we consider the 2p-th moment bound With these assumptions we can show the following abstract sequential empirical CLT.
Theorem 4. Let (X , A) be a measurable space, let (X i ) i∈N be a X -valued stationary process with marginal distribution µ, and let F be a uniformly bounded class of measurable functions on X . Suppose that for some normed vector space C of measurable functions on X , some subset G of C which is bounded in · ∞ -norm, p ∈ N * , s ≥ 1 and some monotone increasing functions Φ 1 , . . . , Φ p : R + −→ R + , Assumption 1 and Assumption 2 hold. Moreover, assume that there exist a constant r > −1 and a monotone increasing function Ψ : for some non-negative constants γ i such that then the sequential empirical process U n converges in distribution in ℓ ∞ (F × [0, 1]) to a tight Gaussian process K.
The proof can be found in Section 4.
Remark 4. Note that the entropy bounds presented in Proposition 1 are strong versions of entropy conditions of the type in Theorem 4.
In the general setting of Theorem 4, we cannot specify the covariance structure of the limit process. The next lemma shows that, under additional conditions, the limit process of $U_n$ is indeed a Kiefer process (cf. Remark 3).
Lemma 2. In the situation of Theorem 4, assume that (i) Assumption 1 holds with covariance matrix Σ given by Then the covariance structure of the limit process K is given by (2).
The proof is given in Section 5. Note that Proposition 4 in Section 6 shows that Assumption 1 can be established in the setting of Section 2.2. Assumption 2 has been verified for $p = 2$, $\Phi_1(x) = \log^3(x+1)$, and $\Phi_2(x) = \log^2(x+1)$ by Durieu (2008), who considers Markov chains and dynamical systems that satisfy a spectral gap property. In a later work, Dehling and Durieu (2011) generalized this result to arbitrary $p \in \mathbb{N}^*$ and $\Phi_i(x) = \log^{2p-i}(x+1)$. More generally, they show that for a process which satisfies the so-called multiple mixing condition w.r.t. $C$, for every $\|\cdot\|_\infty$-bounded $\mathcal{G} \subset C$ and every $p \in \mathbb{N}^*$, there is a $c > 0$ such that Assumption 2 holds with $\Phi_i(x) = c\log^{2p+(d_0-1)i}(x+1)$. The multiple mixing condition is defined as follows.
Definition (Multiple Mixing Processes). We say that a process $(X_i)_{i\in\mathbb{N}}$ is multiple mixing with respect to $C$ if there exist a $\theta \in (0,1)$ and an integer $d_0 \in \mathbb{N}$ such that for all $p \in \mathbb{N}^*$, there exist an integer $\ell$ and a multivariate polynomial $P$ of total degree at most $d_0$ such that holds for all $f \in C$ with $\mu f = 0$ and $\|f\|_\infty \le 1$, all integers $i_0 \le i_1 \le \dots \le i_p$ and all $q \in \{1, \dots, p\}$.
Note that in the setting of Section 2.2, this property with d 0 = 0 can be derived from the spectral gap property, see Lemma 5. For multiple mixing processes, we have the following version of Theorem 4.
Theorem 5 (Sequential empirical CLT for multiple mixing random variables). Let $(\mathcal{X}, \mathcal{A})$ be a measurable space, let $(X_i)_{i\in\mathbb{N}}$ be an $\mathcal{X}$-valued stationary process with marginal distribution $\mu$, and let $\mathcal{F}$ be a uniformly bounded class of measurable functions on $\mathcal{X}$. Suppose that for some $s \ge 1$, the process $(X_i)_{i\in\mathbb{N}}$ is multiple mixing w.r.t. a normed vector space $C$ of measurable functions on $\mathcal{X}$, where for every $p \in \mathbb{N}^*$ the multivariate polynomial $P$ in inequality (12) has total degree at most $d_0$. If, further, Assumption 1 and condition (F) hold for some $\|\cdot\|_\infty$-bounded subset $\mathcal{G}$ of $C$ and $\gamma > d_0 + 1$, then the sequential empirical process $U_n$ converges in distribution in $\ell^\infty(\mathcal{F}\times[0,1])$ to a tight Gaussian process $K$.
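If, further, the covariance matrix $\Sigma$ in Assumption 1 is given by and if there are constants $\theta \in (0,1)$ and $D > 0$ such that for all $f \in \mathcal{G} \cup (\mathcal{G} - \mathcal{G})$, all $\varphi \in L^s(\mu)$ and all $k \in \mathbb{N}$
$$\big|\operatorname{Cov}\big(\varphi(X_0), f(X_k)\big)\big| \le D\,\|\varphi\|_s\,\|f\|_C\,\theta^k,$$
then the covariance structure of the limit process $K$ is given by (2).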

Multiple Mixing of Lower Rate
Processes of a lower mixing rate have been studied by Durieu and Tusche (2012), who consider a multiple mixing condition on
Such a mixing type is given, e.g., for multidimensional causal functions of i.i.d. processes. A causal function of an i.i.d. process $(\xi_i)_{i\in\mathbb{Z}}$ is defined as a process $(X_i)_{i\in\mathbb{N}^*}$ given by $X_i = G(\xi_i, \xi_{i-1}, \dots)$, where $G : \mathcal{X}^{\mathbb{N}} \to \mathbb{R}^d$ is a measurable function. The physical measure of dependence $\delta_{i,m}$ (introduced by Wu (2005)) is defined by where $(\xi'_i)_{i\in\mathbb{Z}}$ is an independent copy of $(\xi_i)_{i\in\mathbb{Z}}$. Durieu and Tusche (2012) showed that a causal function of an i.i.d. process has the aforementioned mixing property with $\Theta(i) = \delta_{i,m}^\alpha$, where $m = s/(s-1)$.
As an example, consider the following MA process. Let $(\xi_i)_{i\in\mathbb{Z}}$ be an i.i.d. process in a normed vector space $(Y, \|\cdot\|_Y)$, let $(a_j)_{j\in\mathbb{N}}$ be a family of $\mathbb{R}^d$-valued linear functionals on $Y$, let $|a|_* := \sup\{|a(y)| : \|y\|_Y \le 1\}$, and define the process $(X_i)_{i\in\mathbb{N}}$ by $X_i = \sum_{j=1}^{\infty} a_j\xi_{i-j}$. In this case we have $\delta_{i,m} \le (2\|X_0\|_s)^\alpha\sum_{j=i}^{\infty}|a_j|_*$ and thus, assuming that $\|X_0\|_s$ and $\sum_{j=i}^{\infty}|a_j|_*$ are finite, $(X_i)_{i\in\mathbb{N}}$ has the multiple mixing property with $\Theta(i) = \sum_{j=i}^{\infty}|a_j|_*$. Note that, as a consequence of working with $\Phi_i = c\,\mathrm{id}^i$ in this setting, in order to satisfy condition (10) in Theorem 4 we cannot choose $\Psi = \exp(C\,\mathrm{id}^{1/\gamma})$, but need to work with a polynomial type of $\Psi$, which in turn requires stronger bounds on the bracketing numbers than in condition (F). These can be achieved mainly through stronger conditions on $\mu$ or restrictions on $\mathcal{F}$. Observe that the bracketing numbers presented in (ii) and (iii) of Proposition 1 are also available for polynomial $\Psi$ (cf. Dehling, Durieu, and Tusche (2012)) without further restrictions on $\mu$ or $\mathcal{F}$. Further, bracketing numbers for indicators of semi-infinite rectangles of the type $[-\infty, t]$, $t \in \mathbb{R}^d$, with polynomial $\Psi$ are implicitly given in Durieu and Tusche (2012), at the cost of stronger assumptions on the marginal distribution $\mu$.
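For scalar coefficients ($d = 1$ and $Y = \mathbb{R}$, so that $|a_j|_* = |a_j|$; an illustrative assumption), the rate $\Theta(i) = \sum_{j\ge i}|a_j|_*$ above is easy to evaluate. The sketch below computes it by truncation, which can be compared with the closed form $\Theta(i) = \varphi^i/(1-\varphi)$ for geometric coefficients $a_j = \varphi^j$ and with the bounds $1/i < \Theta(i) < 1/(i-1)$ for $a_j = j^{-2}$. It also evaluates a truncated version of the MA process; the function names are hypothetical.

```python
def theta(coeffs, i):
    """Tail sum Theta(i) = sum_{j >= i} |a_j|, where coeffs = [a_1, a_2, ...].

    The list is assumed long enough for the neglected tail to be negligible.
    """
    return sum(abs(a) for a in coeffs[i - 1:])


def ma_process(xi, coeffs):
    """Truncated MA process X_i = sum_{j=1}^{J} a_j xi_{i-j}, for all i with i - J >= 0."""
    J = len(coeffs)
    return [sum(coeffs[j - 1] * xi[i - j] for j in range(1, J + 1)) for i in range(J, len(xi))]
```

Geometric coefficients give an exponentially fast rate $\Theta$, while polynomially decaying coefficients give only a polynomial rate, which is the situation where polynomial functions $\Psi$ become necessary.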

Proof of Theorem 4
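The main idea of the proof is to introduce an approximation $U^{(q)}_n$ of the original process $U_n$ which is based on functions in $\mathcal{G}$ and thus can be controlled by Assumptions 1 and 2. The approximation is constructed as follows. For all $q \ge 1$, there exist two sets of $N_q := N(2^{-q}, \Psi(2^q), \mathcal{F}, \mathcal{G}, L^s(\mu))$ functions $\{g_{q,1}, \dots, g_{q,N_q}\} \subset \mathcal{G}$ and $\{g'_{q,1}, \dots, g'_{q,N_q}\} \subset \mathcal{G}$ such that
$$\|g'_{q,i} - g_{q,i}\|_s \le 2^{-q}, \qquad \|g_{q,i}\|_C \le \Psi(2^q), \qquad \|g'_{q,i}\|_C \le \Psi(2^q), \qquad i = 1, \dots, N_q,$$
and for all $f \in \mathcal{F}$ there exists some $i$ such that $g_{q,i} \le f \le g'_{q,i}$. Further, by (9),
To approximate the indexing function $f \in \mathcal{F}$, we construct a partition of $\mathcal{F}$ into $N_q$ subsets $\mathcal{F}_{q,i}$ such that $g_{q,i} \le f \le g'_{q,i}$ for each $f \in \mathcal{F}_{q,i}$. We use the notation $\pi_q f = g_{q,i^*}$ and $\pi'_q f = g'_{q,i^*}$, where $i^*$ is the uniquely defined integer such that $f \in \mathcal{F}_{q,i^*}$. To approximate the time parameter, we use the partition of $[0,1]$ into the subsets $T_{q,j}$, $j = 1, \dots, 2^q$, given by $T_{q,j} := [(j-1)2^{-q}, j2^{-q})$ for $j < 2^q$ and $T_{q,2^q} := [1 - 2^{-q}, 1]$. For $t \in [0,1]$ we define $\tau_q t := \max\{(j-1)2^{-q} : (j-1)2^{-q} \le t,\ j = 1, \dots, 2^q\}$ and $\tau'_q t := \tau_q t + 2^{-q}$. We extend the notation introduced in Section 2 to arbitrary $\mu$-integrable functions $f : \mathcal{X} \to \mathbb{R}$ by setting
$$F_n(f) := \frac{1}{n}\sum_{i=1}^{n} f(X_i)$$
and, for $t \in [0,1]$,
$$U_n(f,t) := \frac{1}{\sqrt{n}}\sum_{i=1}^{[nt]}\big(f(X_i) - \mu f\big).$$
For each $q \ge 1$, we introduce the approximating process
$$U^{(q)}_n(f,t) := U_n(\pi_q f, \tau_q t), \qquad (f,t) \in \mathcal{F}\times[0,1].$$
Note that this process is constant on each $\mathcal{F}_{q,i}\times T_{q,j}$. To relate the weak asymptotic behaviour of the original process $U_n$ to that of the approximating process $U^{(q)}_n$, we use an altered version of Theorem 4.2 in Billingsley (1968, p. 25):
Theorem 6. Let $X_n, X^{(q)}_n, X^{(q)}$, $q, n \ge 1$, be random elements with values in the Banach space $(\ell^\infty(\mathcal{F}\times[0,1]), \|\cdot\|_\infty)$ and suppose that $X^{(q)}$ is measurable and separable¹. If the conditions
$$X^{(q)}_n \rightsquigarrow X^{(q)} \ \text{ as } n \to \infty, \ \text{ for every } q \ge 1, \qquad (15)$$
$$\lim_{q\to\infty}\limsup_{n\to\infty} P^*\big(\|X^{(q)}_n - X_n\|_\infty > \varepsilon\big) = 0 \ \text{ for every } \varepsilon > 0, \qquad (16)$$
are satisfied, then there exists an $\ell^\infty(\mathcal{F}\times[0,1])$-valued, separable random variable $X$ such that $X^{(q)} \rightsquigarrow X$ as $q \to \infty$ and $X_n \rightsquigarrow X$ as $n \to \infty$.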
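The dyadic time grid used above can be sketched as follows. The helper names are hypothetical; the code implements the cell index $j$ of $T_{q,j}$ containing $t$ (with the last cell closed at 1), $\tau_q t = (j-1)2^{-q}$ and $\tau'_q t = \tau_q t + 2^{-q}$. Exact rational arithmetic (`fractions.Fraction`) avoids rounding problems at the dyadic endpoints.

```python
import math
from fractions import Fraction


def cell_index(q, t):
    """Index j in {1, ..., 2^q} of the cell T_{q,j} containing t in [0, 1]."""
    t = Fraction(t)
    if not 0 <= t <= 1:
        raise ValueError("t must lie in [0, 1]")
    return min(math.floor(t * 2 ** q) + 1, 2 ** q)  # T_{q,2^q} also contains t = 1


def tau(q, t):
    """tau_q t: the left endpoint (j - 1) 2^{-q} of the cell containing t."""
    return Fraction(cell_index(q, t) - 1, 2 ** q)


def tau_prime(q, t):
    """tau'_q t = tau_q t + 2^{-q}."""
    return tau(q, t) + Fraction(1, 2 ** q)
```

Since $U^{(q)}_n$ only depends on $t$ through $\tau_q t$, at most $2^q$ distinct time points enter, which is what turns the suprema in the proof into finite maxima.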
Proposition 3. Assume that Assumption 2 holds for some $p \in \mathbb{N}^*$, $s \ge 1$ and some monotone increasing functions $\Phi_1, \dots, \Phi_p : \mathbb{R}_+ \to \mathbb{R}_+$. Moreover, suppose there exist a constant $r > -1$ and a monotone increasing function $\Psi : \mathbb{R}_+ \to \mathbb{R}_+$ such that (9) holds. If (10) holds for some non-negative constants $\gamma_i$ satisfying (11), then for all $\varepsilon, \eta > 0$ there exists some $q_0$ such that for all $q \ge q_0$
$$\limsup_{n\to\infty} P^*\Big(\sup_{(f,t)\in\mathcal{F}\times[0,1]}\big|U^{(q)}_n(f,t) - U_n(f,t)\big| > \varepsilon\Big) < \eta.$$

¹ Since the objects we work with involve suprema over the non-separable space $\ell^\infty(\mathcal{F}\times[0,1])$, measurability cannot always be guaranteed. Thus we use the theory of outer probability as presented in van der Vaart and Wellner (1996). In this context we call any (not necessarily measurable) function on a probability space a random element, and we call it a random variable if it is also measurable. We denote the outer probability with respect to a probability measure $P$ by $P^*$. Furthermore, a random variable $X$ with values in some space $S$ is called separable if there exists some separable subset $S'$ of $S$ such that $P(X \in S') = 1$.
Proof of Theorem 4. We can now apply Theorem 6 with $X_n = U_n$, $X^{(q)}_n = U^{(q)}_n$ and $X^{(q)} = U^{(q)}$. By Proposition 2 the convergence (15) holds, while (16) is satisfied due to Proposition 3. Therefore $U_n$ converges in distribution to an $\ell^\infty(\mathcal{F}\times[0,1])$-valued, separable random variable $K$. Furthermore, we know that $U^{(q)}$ is a piecewise constant Gaussian process which converges in distribution to $K$. Thus $K$ is Gaussian, too. Since $\ell^\infty(\mathcal{F}\times[0,1])$ is complete, the tightness of $K$ follows from its separability (cf. Lemma 1.3.2 in van der Vaart and Wellner (1996)).
Proof of Proposition 2. Since by construction $\pi_q f \in \mathcal{G}$ for all $f \in \mathcal{F}$, Assumption 1 implies that the finite-dimensional vector $(U^{(q)}_n(f_1,t_1), \dots, U^{(q)}_n(f_k,t_k))$ converges in distribution to a multivariate normally distributed random vector $(U^{(q)}(f_1,t_1), \dots, U^{(q)}(f_k,t_k))$ for all fixed $k \in \mathbb{N}^*$, $f_1, \dots, f_k \in \mathcal{F}$, $t_1, \dots, t_k \in [0,1]$. All $U^{(q)}_n$, $n \in \mathbb{N}^*$, are constant on each $\mathcal{F}_{q,i}\times T_{q,j}$, $i = 1, \dots, N_q$, $j = 1, \dots, 2^q$. Therefore $U^{(q)}$ is constant on all $\mathcal{F}_{q,i}\times T_{q,j}$, too. Since these finitely many sets form a partition of $\mathcal{F}\times[0,1]$, the finite-dimensional convergence yields the convergence in distribution of the whole process $(U^{(q)}_n)_{n\in\mathbb{N}^*}$ to $U^{(q)}$.
Proof of Proposition 3. Let $\bar Z := Z - EZ$ denote the centring of a random variable $Z$ and observe that for any random variables q+k f, t), using that $\|\cdot\|_1 \le \|\cdot\|_s$ for $s \ge 1$ and applying (13), we obtain Moreover, for all $n \ge 2^{q+k}$ and $g \in \mathcal{G}$ where $M := \sup\{\|g\|_\infty : g \in \mathcal{G}\}$ is finite by assumption. Analogously to the processes $U^{(q)}_n$, we introduce the processes $U'^{(q)}_n$ given by An application of the triangle inequality, (17), and (18) yields Combining (19) with a telescoping sum argument, one obtains for any $K \ge 1$ To ensure $\varepsilon/4 \le (4M+1)\sqrt{n}\,2^{-(q+K)} \le \varepsilon/2$, choose $K = K_{n,q}$, given by
$$K_{n,q} := \Big[\log_2\Big(\frac{4(4M+1)\sqrt{n}}{2^q\,\varepsilon}\Big)\Big].$$
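The choice of $K_{n,q}$ can be checked numerically. Below, $K$ is computed in two ways: from the closed form $[\log_2(4(4M+1)\sqrt{n}/(2^q\varepsilon))]$ and as the smallest $K \ge 0$ with $(4M+1)\sqrt{n}\,2^{-(q+K)} \le \varepsilon/2$. In both cases the sandwich $\varepsilon/4 \le (4M+1)\sqrt{n}\,2^{-(q+K)} \le \varepsilon/2$ holds whenever $K \ge 1$. The values of $M$, $n$, $q$, $\varepsilon$ in the test are arbitrary illustrations.

```python
import math


def k_closed_form(M, n, q, eps):
    """K_{n,q} = [log2(4 (4M+1) sqrt(n) / (2^q eps))], with [.] the lower Gauss bracket."""
    return math.floor(math.log2(4 * (4 * M + 1) * math.sqrt(n) / (2 ** q * eps)))


def k_smallest(M, n, q, eps):
    """Smallest K >= 0 with (4M+1) sqrt(n) 2^{-(q+K)} <= eps / 2."""
    K = 0
    while (4 * M + 1) * math.sqrt(n) * 2.0 ** (-(q + K)) > eps / 2:
        K += 1
    return K


def sandwich_holds(M, n, q, eps, K):
    """Check eps/4 <= (4M+1) sqrt(n) 2^{-(q+K)} <= eps/2."""
    value = (4 * M + 1) * math.sqrt(n) * 2.0 ** (-(q + K))
    return eps / 4 <= value <= eps / 2
```

The two computations can differ by one exactly when $4(4M+1)\sqrt{n}/(2^q\varepsilon)$ is a power of two; both choices then satisfy the sandwich, with one of the inequalities attained.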
For each $i = 1, \dots, N_q$, $j = 1, \dots, 2^q$, inequality (20) implies […] Set $\varepsilon_k = \varepsilon/(4k(k+1))$. Then $\sum_{k=1}^{\infty} \varepsilon_k = \varepsilon/4$ and for all $i = 1, \dots, N_q$ we have […] Recall that $(\pi_{q+k}, \tau_{q+k})$, and thus $U_n^{(q+k)}$ and $U_n'^{(q+k)}$, are constant on each $\mathcal F_{q+k,i} \times T_{q+k,j}$, $i = 1, \dots, N_{q+k}$, $j = 1, \dots, 2^{q+k}$, so the suprema on the right-hand side of inequality (21) are in fact maxima over finitely many terms. Therefore the outer probabilities may be replaced by ordinary probabilities here. Now, for each $k \in \mathbb N^*$, choose a set $\mathcal F(k)$ of at most $N_{k-1} N_k$ functions in $\mathcal F$ such that $\mathcal F(k)$ contains at least one function in each non-empty
Thus the factor in front of the sum is uniformly bounded with respect to $n$. Using (14), we obtain for sufficiently small $\eta > 0$ […] which implies that the series in (31) goes to zero as $q \to \infty$.
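The weights $\varepsilon_k = \varepsilon/(4k(k+1))$ chosen in the proof of Proposition 3 sum to exactly $\varepsilon/4$: since $1/(k(k+1)) = 1/k - 1/(k+1)$, the partial sums telescope to $(\varepsilon/4)(1 - 1/(K+1))$. A minimal check with exact rational arithmetic (function names are ours):

```python
from fractions import Fraction

def eps_k(eps, k):
    """Weight eps_k = eps / (4 k (k+1)) attached to the k-th refinement level."""
    return eps / (4 * k * (k + 1))

def partial_sum(eps, K):
    """Exact partial sum eps_1 + ... + eps_K."""
    return sum((eps_k(eps, k) for k in range(1, K + 1)), Fraction(0))

eps = Fraction(1, 10)
for K in (1, 5, 50):
    # Telescoping: sum_{k<=K} 1/(k(k+1)) = 1 - 1/(K+1).
    assert partial_sum(eps, K) == eps / 4 * (1 - Fraction(1, K + 1))
print(float(partial_sum(eps, 10_000)), float(eps / 4))
```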

Proof of Lemma 2
For $f \in \mathcal F$, recall the definition of the approximating functions $\pi_q f$ in Section 4, and note that, as a consequence of the entropy condition in Theorem 4, for every $q \in \mathbb N^*$ […] Similarly, for all $g \in \mathcal F$ and $k \in \mathbb N^*$ there exists some $g_k \in \mathcal G$ satisfying […] Let $U^{(q)}$ denote the limit process given in Proposition 2. Condition (i) implies that for all $f, g \in \mathcal F$, $t, u \in [0,1]$ and $q \in \mathbb N^*$
$$\operatorname{Cov}\big(U^{(q)}(f,t), U^{(q)}(g,u)\big) = […] \sum_{k=0}^{\infty} \operatorname{Cov}\big(\pi_q f(X_0), \pi_q g(X_k)\big) + \sum_{k=1}^{\infty} \operatorname{Cov}\big(\pi_q g(X_0), \pi_q f(X_k)\big).$$
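The covariance of $U^{(q)}$ involves long-run covariances of the form $\sum_{k\ge0}\operatorname{Cov}(f(X_0),g(X_k)) + \sum_{k\ge1}\operatorname{Cov}(g(X_0),f(X_k))$. For intuition, the sketch below evaluates such a sum for a toy stationary Markov chain on three states (our own example, not one of the models of this paper), once by truncating the series and once in closed form through the fundamental matrix $Z = (I - P + \mathbf 1\nu^{\top})^{-1}$, which is valid for an irreducible aperiodic finite chain. The matrix `P`, the functions `f`, `g` and all helper names are hypothetical.

```python
import numpy as np

def stationary(P):
    """Stationary law nu (nu P = nu) of an irreducible finite transition matrix P."""
    w, v = np.linalg.eig(P.T)
    nu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return nu / nu.sum()

def long_run_cov(P, f, g, kmax=2000):
    """Truncated sum_{k>=0} Cov(f(X_0), g(X_k)) + sum_{k>=1} Cov(g(X_0), f(X_k))."""
    nu = stationary(P)
    fc, gc = f - nu @ f, g - nu @ g   # Cov(f(X_0), g(X_k)) = nu(fc * P^k gc)
    total = nu @ (fc * gc)            # k = 0 term of the first series
    pf, pg = fc.copy(), gc.copy()
    for _ in range(kmax):
        pf, pg = P @ pf, P @ pg       # now P^k fc and P^k gc
        total += nu @ (fc * pg) + nu @ (gc * pf)
    return total

def long_run_cov_exact(P, f, g):
    """Closed form nu(fc Z gc) + nu(gc Z fc) - nu(fc gc), Z = (I - P + 1 nu^T)^{-1}."""
    nu = stationary(P)
    fc, gc = f - nu @ f, g - nu @ g
    Z = np.linalg.inv(np.eye(len(nu)) - P + np.outer(np.ones(len(nu)), nu))
    return nu @ (fc * (Z @ gc)) + nu @ (gc * (Z @ fc)) - nu @ (fc * gc)

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])
f = np.array([1.0, -0.5, 2.0])
g = np.array([0.0, 1.0, 0.5])
print(long_run_cov(P, f, g), long_run_cov_exact(P, f, g))
```

For an i.i.d. sequence (all rows of `P` equal to $\nu$) every term with $k \ge 1$ vanishes, and the sum reduces to $\operatorname{Cov}_\nu(f, g)$.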
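Since the covariance functions of Gaussian processes that converge in distribution converge to the covariance function of the limit process, the covariance structure of the limit process $K$ of $U^{(q)}$ is given by
$$\operatorname{Cov}\big(K(f,t), K(g,u)\big) = \lim_{q\to\infty} \operatorname{Cov}\big(U^{(q)}(f,t), U^{(q)}(g,u)\big).$$
Thus it suffices to show that, as $q \to \infty$,
$$\sum_{k=0}^{\infty} \operatorname{Cov}\big(\pi_q f(X_0), \pi_q g(X_k)\big) - \sum_{k=0}^{\infty} \operatorname{Cov}\big(f(X_0), g(X_k)\big) \to 0 \qquad (36)$$
$$\sum_{k=1}^{\infty} \operatorname{Cov}\big(\pi_q g(X_0), \pi_q f(X_k)\big) - \sum_{k=1}^{\infty} \operatorname{Cov}\big(g(X_0), f(X_k)\big) \to 0.$$
By symmetry, both series can be treated the same way. Let $k(q) := 2^{q/\beta}$. We consider the series in line (36). By bilinearity of the covariance, we have
$$\begin{aligned}
\sum_{k=0}^{\infty} \Big(\operatorname{Cov}\big(\pi_q f(X_0), \pi_q g(X_k)\big) - \operatorname{Cov}\big(f(X_0), g(X_k)\big)\Big)
&= \sum_{k=0}^{k(q)} \operatorname{Cov}\big(\pi_q f(X_0) - f(X_0), \pi_q g(X_k)\big) \qquad (37)\\
&\quad + \sum_{k=k(q)+1}^{\infty} \operatorname{Cov}\big(\pi_q f(X_0) - f(X_0), \pi_q g(X_k)\big) \qquad (38)\\
&\quad + \sum_{k=0}^{\infty} \operatorname{Cov}\big(f(X_0), \pi_q g(X_k) - g(X_k)\big). \qquad (39)
\end{aligned}$$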
Let us treat the terms separately. Recall that both $\mathcal F$ and $\mathcal G$ are uniformly bounded in $\|\cdot\|_\infty$-norm. For the term in line (37), we know by Hölder's inequality, (32), and the fact that $\beta > 1$ that […] where again we write $x \ll y$ if there is a constant $C \in (0,\infty)$ depending only on global parameters such that $x \le Cy$. For the term in line (38), by (ii), (32), and (33) we obtain
$$\sum_{k=k(q)+1}^{\infty} \operatorname{Cov}\big(\pi_q f(X_0) - f(X_0), \pi_q g(X_k)\big) […]$$
where we used that $\Psi$ is increasing and condition (ii) in the last step. It only remains to show that the term in line (39) goes to zero as $q \to \infty$. We have […] First, consider the term in line (40). By (ii), (33), and (35) […] where we used that $\Psi$ is increasing and applied condition (ii) in the last line. To treat the term in line (41), we use Hölder's inequality and (34). We obtain […] since $\beta > 1$ and thus $\sum_{k=1}^{\infty} k^{-\beta} < \infty$, which completes the proof.
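The estimates for the terms (37) and (41) use $\beta > 1$. As an elementary illustration of that role (not a reproduction of the displayed bounds in the proof): for $\beta > 1$, comparison with an integral gives $\sum_{k>K} k^{-\beta} \le K^{1-\beta}/(\beta-1)$ for $K \ge 1$, so with the cut-off $k(q) = 2^{q/\beta}$ the tail beyond $k(q)$ is of order $2^{-q(\beta-1)/\beta}$ and vanishes as $q \to \infty$. The sketch rounds $k(q)$ down to an integer, which is our assumption; the value $\beta = 1.5$ is arbitrary.

```python
import math

def cutoff(q, beta):
    """k(q) = 2^(q/beta), rounded down to an integer (the rounding is an assumption)."""
    return math.floor(2 ** (q / beta))

def tail_sum(beta, K, kmax=10**5):
    """Truncated tail sum_{K < k <= kmax} k^(-beta); dropping k > kmax only lowers it."""
    return math.fsum(k ** (-beta) for k in range(K + 1, kmax + 1))

def tail_bound(beta, K):
    """Integral comparison for K >= 1: sum_{k>K} k^(-beta) <= K^(1-beta) / (beta - 1)."""
    return K ** (1 - beta) / (beta - 1)

beta = 1.5
for q in (4, 8, 12, 16):
    K = cutoff(q, beta)
    print(q, K, tail_sum(beta, K), tail_bound(beta, K))
```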

Proof of Theorem 1
Let $(X_i)$ and $(\mathcal B, \|\cdot\|)$ be the Markov chain and the Banach space introduced in Section 2.2.
To prove Theorem 1, we shall apply Theorem 5. We begin by showing that Assumption 1 holds with covariance structure (2). To this end, we partially follow the lines of the proof of Theorem A in Hennion and Hervé (2001). For a measurable real-valued function $f$ on $\mathbb X$ and a real number $t \in [0,1]$, we introduce the notation […] The following proposition gives a sequential finite-dimensional CLT under (A)–(D).

Proposition 4. Let $f_1, \dots, f_k$ be real-valued functions on $\mathbb X$ such that $\nu(|f_i|^2) < \infty$ and (D) holds for the space $\mathcal C = \operatorname{Vect}_{\mathbb R}(f_1, \dots, f_k)$, the smallest real vector space containing $f_1, \dots, f_k$. Then […] where $\mathcal N(0, \Sigma)$ is a normal distribution on $\mathbb R^k$ with mean $0$ and covariance matrix $\Sigma = (\Sigma_{i,j})_{1 \le i,j \le k}$. If furthermore $f_1, \dots, f_k \in L^s(\nu) \cap \mathcal B_{\mathbb R}$ with $s = m/(m-1)$, then the covariance matrix is given by […]

This proposition will show that Assumption 1 holds with covariance structure (2), since by assumption $\mathcal G$ consists only of bounded real-valued functions from $\mathcal B$.
Proof. First let $f$ be a function as in the statement of the proposition. By the Perturbation Theorem (see Theorem III.8 in Hennion and Hervé (2001)), there exist a neighbourhood $I_f$ of $0$ and constants $0 < \theta < \eta < 1$ such that for all $t \in I_f$ there exist operators $\Pi_{f,t}$ and $N_{f,t}$ and complex numbers $\lambda_{f,t}$ such that […] Moreover, $\lambda_{f,0} = 1$, $\Pi_{f,0} = \Pi$, $N_{f,0} = N$, and the maps $t \mapsto \lambda_{f,t}$, $t \mapsto \Pi_{f,t}$ and $t \mapsto N_{f,t}$ have continuous second derivatives on $I_f$. We thus have for all $n \ge 1$ […] Further, if $\nu(f) = 0$, then by Lemma IV.4' in Hennion and Hervé (2001) the Taylor expansion of $\lambda_{f,t}$ as $t \to 0$ is given by […] with […] These are the main ingredients to derive a CLT for the process $(f(X_i))_{i \ge 0}$. Here we want to show a finite-dimensional sequential CLT. Without loss of generality we treat the case $k = 2$. By the Cramér–Wold device, it is sufficient to prove the convergence of the real linear combinations $a_1 n^{-1/2} S_n(f_1, t_1) + a_2 n^{-1/2} S_n(f_2, t_2)$ of any square $\nu$-integrable functions $f_1, f_2 \in \mathcal C$ to a normal distribution. For $t_1 < t_2$, the preceding term is equal to $n^{-1/2} S_n(a_1 f_1 + a_2 f_2, t_1) + n^{-1/2} \sum_i a_2 f_2(X_i)$, where the last sum runs over the indices $i$ between $n t_1$ and $n t_2$. Since $\mathcal C$ is a real vector space, it is therefore sufficient to show the convergence of all sums of the form $n^{-1/2} S_n(f, g, s)$ with $S_n(f, g, s) = […]$, where $f, g \in \mathcal C$ and $s \in (0,1)$. So fix $f, g \in \mathcal C$, $s \in (0,1)$ and set $S_n = S_n(f, g, s)$. The following lemma gives an expression for the corresponding characteristic function.
To study the weak convergence of $S_n/\sqrt n$, we have to compute the limit as $n \to \infty$ of […], where $\sigma_f$ and $\sigma_g$ are given by (43). Further, since $\rho(N_{f,t}) < 1$ and $\rho(N_{g,t}) < 1$, we have $\|N_{f,t}^n\|_{\mathcal L(\mathcal B)} \to 0$ and $\|N_{g,t}^n\|_{\mathcal L(\mathcal B)} \to 0$ uniformly in $t \in I_f \cap I_g$ as $n \to \infty$. By continuity, we also have $\Pi_{f, t/\sqrt n} \mathbf 1_{\mathbb X} \to \mathbf 1_{\mathbb X}$ and $\Pi_{g, t/\sqrt n} \mathbf 1_{\mathbb X} \to \mathbf 1_{\mathbb X}$ as $n \to \infty$. We therefore obtain […]

Proof of Theorem 1. Now take $\mathcal C = \operatorname{Vect}_{\mathbb R}(\mathcal G)$. By Proposition 4, Assumption 1 is satisfied. In order to apply Theorem 5 to prove Theorem 1, it remains to show the multiple mixing property of $(X_i)_{i \in \mathbb N}$. The following lemma, which is essentially Lemma 3 in Dehling and Durieu (2011), gives this property with respect to the Banach space $\mathcal B$ containing $\mathcal C$.

Proof. Let $f \in \mathcal B$ be such that $\|f\|_\infty \le 1$ and set $s = m/(m-1)$. For all $p > q > 0$ and all $i_0 < i_1 < \dots < i_p$, we write
$$g = f\, P^{i_{q+2} - i_{q+1}}\Big(f\, P^{i_{q+3} - i_{q+2}}\big(\cdots f\, P^{i_p - i_{p-1}}(f)\cdots\big)\Big).$$
By (E), $g$ belongs to $\mathcal B$. Using Hölder's inequality, we obtain […] Using (B), (C), and $\|f\|_\infty \le 1$, we infer […] Now, since the spectral radius of $P \in \mathcal L(\mathcal B)$ is $1$, there exists a constant $c \ge 1$, which does not depend on $f$, such that $\|P^k f\|_{\mathcal B} \le c \|f\|_{\mathcal B}$ for all $k \in \mathbb N^*$. By (E), we obtain two constants $C > 0$ and $\ell \in \mathbb N^*$, depending only on $p$, such that $\|g\|_{\mathcal B} \le C \|f\|_{\mathcal B}^{\ell}$. This completes the proof of the lemma.
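The function $g$ in the proof of the mixing lemma is a nested product of the form $f P^{d_1}(f P^{d_2}(\cdots))$. Such nested functions arise because, for a stationary Markov chain, multiple correlations can be evaluated from the inside out:
$\mathbb E_\nu[f(X_{i_0}) \cdots f(X_{i_p})] = \nu\big(f P^{i_1 - i_0}(f P^{i_2 - i_1}(\cdots f P^{i_p - i_{p-1}} f))\big)$.
The sketch below checks this identity against brute-force summation over state sequences for a toy three-state chain of our own choosing; the chain, the function `f` and the helper names are hypothetical.

```python
import itertools
import numpy as np

def stationary(P):
    """Stationary law nu (nu P = nu) of an irreducible finite transition matrix P."""
    w, v = np.linalg.eig(P.T)
    nu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return nu / nu.sum()

def nested_moment(P, nu, f, times):
    """E_nu[f(X_{i_0}) ... f(X_{i_p})] computed from the inside out:
    h = f, then h <- f * P^(i_j - i_{j-1}) h for j = p, ..., 1, and finally nu(h)."""
    h = f.copy()
    for j in range(len(times) - 1, 0, -1):
        h = f * (np.linalg.matrix_power(P, times[j] - times[j - 1]) @ h)
    return nu @ h

def brute_force_moment(P, nu, f, times):
    """The same expectation by summing over all state sequences at the given times."""
    gaps = [np.linalg.matrix_power(P, times[j] - times[j - 1])
            for j in range(1, len(times))]
    total = 0.0
    for path in itertools.product(range(len(nu)), repeat=len(times)):
        weight = nu[path[0]] * f[path[0]]
        for j, Q in enumerate(gaps, start=1):
            weight *= Q[path[j - 1], path[j]] * f[path[j]]
        total += weight
    return total

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])
nu = stationary(P)
f = np.array([0.5, -1.0, 0.8])      # a bounded function with sup norm <= 1
times = [0, 2, 3, 7]                 # i_0 < i_1 < i_2 < i_3
print(nested_moment(P, nu, f, times), brute_force_moment(P, nu, f, times))
```

In this notation, the function $g$ of the proof is one of the intermediate functions `h` produced by the recursion.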
To conclude the proof of Theorem 1, observe that the extra assumptions in Theorem 5, which concern the covariance structure of the limit process, are satisfied due to Proposition 4 and Lemma 4.
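The spectral method behind Proposition 4 can be observed on a finite state space, where an irreducible aperiodic chain automatically has a spectral gap. For the Fourier-perturbed kernel $P_t(x,y) = P(x,y) e^{itf(y)}$ one has $\mathbb E_\nu[e^{itS_n}] = \nu(P_t^n \mathbf 1)$ with $S_n = f(X_1) + \dots + f(X_n)$ and $X_0 \sim \nu$, and for centred $f$ the leading eigenvalue behaves like $1 - \sigma^2 t^2/2$ near $t = 0$, with $\sigma^2 = \nu(f^2) + 2\sum_{k \ge 1} \nu(f P^k f)$. The sketch checks both facts numerically; the three-state chain, the function `f` and all helper names are illustrative assumptions, not objects from this paper.

```python
import numpy as np

def stationary(P):
    """Stationary law nu (nu P = nu) of an irreducible finite transition matrix P."""
    w, v = np.linalg.eig(P.T)
    nu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return nu / nu.sum()

def asymptotic_variance(P, nu, f, kmax=2000):
    """sigma^2 = nu(f^2) + 2 sum_{k>=1} nu(f P^k f) for centred f, truncated at kmax."""
    total = nu @ (f * f)
    h = f.copy()
    for _ in range(kmax):
        h = P @ h                     # h = P^k f
        total += 2.0 * (nu @ (f * h))
    return total

def perturbed(P, f, t):
    """Fourier-perturbed kernel P_t(x, y) = P(x, y) exp(i t f(y))."""
    return P * np.exp(1j * t * f)[np.newaxis, :]

def leading_eigenvalue(P, f, t):
    """Eigenvalue of P_t of largest modulus (close to 1 for small t)."""
    eig = np.linalg.eigvals(perturbed(P, f, t))
    return eig[np.argmax(np.abs(eig))]

def char_fn(P, nu, f, t, n):
    """E_nu[exp(i t S_n / sqrt(n))], S_n = f(X_1) + ... + f(X_n), as nu P_{t/sqrt(n)}^n 1."""
    Pn = np.linalg.matrix_power(perturbed(P, f, t / np.sqrt(n)), n)
    return nu @ Pn @ np.ones(len(f))

P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.6, 0.2],
              [0.0, 0.3, 0.7]])
nu = stationary(P)
f = np.array([1.0, -0.5, 2.0])
f = f - nu @ f                        # centre so that nu(f) = 0
sigma2 = asymptotic_variance(P, nu, f)

t = 1e-3                              # second-order behaviour of the leading eigenvalue
print((1 - leading_eigenvalue(P, f, t).real) / (t**2 / 2), sigma2)
print(char_fn(P, nu, f, 0.5, 10**8), np.exp(-sigma2 * 0.5**2 / 2))  # CLT limit
```

The first line compares the curvature of $\lambda_t$ at $0$ with $\sigma^2$; the second compares the characteristic function of $S_n/\sqrt n$ for large $n$ with that of $\mathcal N(0, \sigma^2)$.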
Theorem 7. Assume that $(X_i)_{i \in \mathbb N}$ satisfies the sequential empirical CLT with indexing class $\mathcal F$ and limit process $K$, that is, $U_n \rightsquigarrow K$ in $\ell^\infty(\mathcal F \times [0,1])$ as $n \to \infty$, where $K$ denotes a tight centred Gaussian process. Then […] in $\ell^\infty(\mathcal F \times [0,1])$ as $n \to \infty$.