Functional Erdős-Rényi law of large numbers for nonconventional sums under weak dependence

We obtain a functional Erdős-Rényi law of large numbers for "nonconventional" sums of the form $\Sigma_n=\sum_{m=1}^n F(X_m,X_{2m},\ldots,X_{\ell m})$ where $X_1,X_2,\ldots$ is a sequence of exponentially fast $\psi$-mixing random vectors and $F$ is a Borel vector function, extending in several directions our previous result concerning i.i.d. random variables $X_1,X_2,\ldots$.


Introduction
Let X_1, X_2, ... be a sequence of independent identically distributed (i.i.d.) random variables such that EX_1 = 0 and the moment generating function φ(t) = Ee^{tX_1} exists. Denote by I the Legendre transform of ln φ and set Σ_n = Σ_{m=1}^n X_m for n ≥ 1 and Σ_0 = 0. The Erdős-Rényi law of large numbers from [8] says that with probability one

lim_{n→∞} max_{0≤m≤n−[ln n/I(α)]} (Σ_{m+[ln n/I(α)]} − Σ_m) / [ln n/I(α)] = α (1.1)

for all α > 0 such that I(α) < ∞.
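The Legendre transform defining the rate function I can be evaluated numerically; a minimal sketch in Python (the helper `legendre`, the grid, and the standard normal choice, for which ln φ(t) = t²/2 and hence I(α) = α²/2 exactly, are ours, purely for illustration):

```python
import numpy as np

# Numerical Legendre transform I(alpha) = sup_t (t*alpha - ln phi(t)),
# approximated by a maximum over a finite grid of t values.
def legendre(log_mgf, alpha, t_grid):
    return float(np.max(t_grid * alpha - log_mgf(t_grid)))

log_mgf = lambda t: 0.5 * t * t        # ln E e^{tX} for X ~ N(0, 1)
t_grid = np.linspace(-10.0, 10.0, 20001)

# For X ~ N(0,1) the transform is I(alpha) = alpha^2/2 exactly.
for alpha in (0.5, 1.0, 2.0):
    assert abs(legendre(log_mgf, alpha, t_grid) - alpha**2 / 2) < 1e-6
```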
The nonconventional limit theorems initiated in [17] and partially motivated by nonconventional ergodic theorems (with the name coming from [10]) study asymptotic behaviors of sums of the form

Σ_n = Σ_{m=1}^n F(X_m, X_{2m}, ..., X_{ℓm}) (1.2)

(and more general ones) where F is a vector function satisfying certain conditions. The main features of such sums are nonstationarity and unboundedly long (and strong) dependence of their summands. In [18] we established (1.1) for sums (1.2) where F is a bounded Borel function and X_1, X_2, ... are independent identically distributed random variables. One of the main reasons for the independence assumption in [18] was the use of large deviations for nonconventional sums (1.2), which was established in [20] only for sums (1.2) with i.i.d. random variables X_1, X_2, ....

* Hebrew University of Jerusalem, Israel. E-mail: kifer@math.huji.ac.il
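The index structure of (1.2) can be sketched in code; a toy illustration (the choice ℓ = 3, the function F and the deterministic stand-in sequence are ours) showing how each summand couples coordinates of one and the same sequence at the far-apart sites m, 2m, 3m:

```python
# Sketch of the nonconventional sum Sigma_n = sum_{m=1}^n F(X_m, X_{2m}, X_{3m})
# for l = 3: every summand reuses terms of the single sequence X_1, X_2, ...
def nonconventional_sum(X, F, n, l=3):
    # X[m-1] stores X_m, so len(X) must be at least l*n.
    return sum(F(*[X[i * m - 1] for i in range(1, l + 1)]) for m in range(1, n + 1))

F = lambda x, y, z: x * y * z          # bounded toy choice of F
X = [(-1) ** m for m in range(1, 31)]  # deterministic stand-in for X_1, X_2, ...
# Here X_m X_{2m} X_{3m} = (-1)^{6m} = 1, so Sigma_n = n.
assert nonconventional_sum(X, F, 10) == 10
```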
In this paper we modify our method so that only the standard (conventional) large deviations are used, namely for sums of the form

T_n = Σ_{m=1}^n F(X^{(1)}_m, X^{(2)}_{2m}, ..., X^{(ℓ)}_{ℓm}),

where {X^{(i)}_m, m ≥ 1}, i = 1, 2, ..., ℓ, are independent copies of the sequence {X_m, m ≥ 1}. Now, when X_1, X_2, ... is a stationary weakly dependent sequence, the latter sum consists of stationary weakly dependent summands with similar properties, which allows applications to Markov chains satisfying the Doeblin condition and to some dynamical systems such as Axiom A diffeomorphisms, expanding transformations and topologically mixing subshifts of finite type (see [2]). We assume exponentially fast ψ-mixing of the sequence X_1, X_2, ..., which still leads to long and strongly dependent summands of Σ_n, but once we justify a transition to the sums T_n we arrive at exponentially fast ψ-mixing summands there. Observe that the Erdős-Rényi law for conventional (ℓ = 1) sums of exponentially fast ψ-mixing random variables was obtained in [6].
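By contrast with Σ_n, the sums T_n draw the i-th coordinate from its own independent copy of the sequence; a minimal sketch (again with the illustrative choice ℓ = 3 and a toy bounded F of ours):

```python
import random

# Sketch of T_n = sum_{m=1}^n F(X^(1)_m, X^(2)_{2m}, X^(3)_{3m}): coordinate i
# is read off an independent copy of the sequence, so the summands form a
# stationary weakly dependent (here even i.i.d.) sequence.
def T_n(copies, F, n):
    l = len(copies)
    return sum(F(*[copies[i][(i + 1) * m - 1] for i in range(l)])
               for m in range(1, n + 1))

rng = random.Random(0)
l, n = 3, 100
copies = [[rng.choice([-1, 1]) for _ in range(l * n)] for _ in range(l)]
F = lambda x, y, z: x * y * z
t = T_n(copies, F, n)
assert -n <= t <= n   # |F| <= 1, so |T_n| <= n
```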
In fact, we derive a functional form of the Erdős-Rényi law for nonconventional sums (1.2). The functional form was first introduced for (conventional) sums of i.i.d. random vectors in [1] and has never been considered before beyond that setup. This is a more general result, and as a corollary we derive from it the standard form of the Erdős-Rényi law for nonconventional sums. Moreover, unlike the original form of this law, its functional form allows us to consider a multidimensional version where X_1, X_2, ... are random vectors and F is a vector function.
The structure of this paper is as follows. In Section 2 we describe precisely our setup and results. In Section 3 we exhibit a lemma which is a version of Lemma 3.1 from [15] and which plays a crucial role here. In Sections 4 and 5 we derive the corresponding upper and lower bounds which yield the functional form of the Erdős-Rényi law for nonconventional sums. After that we show how this implies the standard form of this law. In the Appendix we describe applications to Markov chains and dynamical systems and then discuss some properties of rate functions of large deviations which are relevant to our proofs but hard to find in most of the books on large deviations.

Preliminaries and main results
Let X_1, X_2, ... be a ℘-dimensional stationary vector stochastic process on a probability space (Ω, F, P) and let F: R^{ℓ℘} → R^d be a bounded Borel vector function. Our setup includes also a family of σ-algebras F_{m,n} ⊂ F, −∞ ≤ m ≤ n ≤ ∞, such that F_{m,n} ⊂ F_{m₁,n₁} whenever m₁ ≤ m and n₁ ≥ n, which satisfies an exponentially fast ψ-mixing condition (see, for instance, [4]):

ψ(F_{−∞,k}, F_{k+n,∞}) ≤ e^{−κ₁ n} (2.1)

for some κ₁ > 0 and all k, n ≥ 0.
We assume also the centering condition

F̄ = ∫ F(x₁, ..., x_ℓ) dμ(x₁) ⋯ dμ(x_ℓ) = 0, (2.2)

where μ is the distribution of X_1, which is not actually a restriction since we can always take F − F̄ in place of F. In addition, we assume that either X_n is F_{n−m,n+m}-measurable for some m ∈ N independent of n, and then F is supposed to be only Borel measurable and bounded, or F is supposed to be bounded and Hölder continuous, i.e. for some D, ι > 0,

|F(x)| ≤ D and |F(x) − F(y)| ≤ D|x − y|^ι for all x, y ∈ R^{ℓ℘}, (2.3)

and then we need only the following approximation property

E|X_n − E(X_n | F_{n−m,n+m})| ≤ e^{−κ₃ m} (2.4)

for all n, m ∈ N and some κ₃ > 0 independent of n and m.
Define two sums

Σ_n = Σ_{m=1}^n F(X_m, X_{2m}, ..., X_{ℓm}) and T_n = Σ_{m=1}^n F(X^{(1)}_m, X^{(2)}_{2m}, ..., X^{(ℓ)}_{ℓm}),

where {X^{(i)}_k, k ≥ 1}, i = 1, ..., ℓ, are independent copies (in the sense of distributions) of the stationary process {X_k, k ≥ 1}. Assume that for any piecewise constant map γ: [0,1] → R^d the limit

lim_{n→∞} n^{−1} ln E exp(Σ_{m=1}^n (γ(m/n), F(X^{(1)}_m, X^{(2)}_{2m}, ..., X^{(ℓ)}_{ℓm})))

exists; in particular, taking γ ≡ α,

Π(α) = lim_{n→∞} n^{−1} ln E exp((α, T_n)) (2.5)

exists, and Π(α), α ∈ R^d, is a convex twice differentiable function such that ∇_α Π(α)|_{α=0} = 0 and the Hessian matrix ∇²_α Π(α)|_{α=0} is positively definite (where (·, ·) denotes the inner product). Let

I(β) = sup_{α ∈ R^d} ((α, β) − Π(α)) (2.6)

and for any γ: [0,1] → R^d set S(γ) = ∫₀¹ I(γ̇_s) ds if γ is absolutely continuous with γ_0 = 0, and S(γ) = ∞ otherwise. Set also Φ(a) = {γ ∈ C([0,1], R^d): γ_0 = 0, S(γ) ≤ a}. It follows from the existence and properties of the limit (2.5) (see, for instance, Section 7.4 in [9]) that n^{−1}T_n satisfies large deviations estimates in the form that for any a, δ, λ > 0 and every γ ∈ C([0,1], R^d), γ_0 = 0, there exists n_0 > 0 such that for n ≥ n_0,

P{ρ(n^{−1}T_n, γ) < δ} ≥ exp(−n(S(γ) + λ)) and P{ρ(n^{−1}T_n, Φ(a)) ≥ δ} ≤ exp(−n(a − λ)),

where ρ is the uniform metric on the space of curves. Since S is a lower semi-continuous functional, each Φ(a), a < ∞, is a closed set and, moreover, it is compact for any finite a. Indeed, |Π(α)| ≤ D|α| by (2.3), which implies by (2.6) that I(β) = ∞ provided |β| > D (take α = aβ/|β| in (2.6) and let a → ∞). Hence, |γ̇_s| ≤ D for Lebesgue almost all s ∈ [0,1] if γ ∈ Φ(a), and so the latter set is bounded and equicontinuous, which by the Arzelà-Ascoli theorem implies its compactness. Theorem 2.1 below asserts that for each c > 0, with probability one,

lim_{n→∞} H(W_n^c, Φ(1/c)) = 0, (2.9)

where W_n^c denotes the set of rescaled increment curves of Σ_n built over time windows of length [c ln n] and H(Γ₁, Γ₂) = inf{δ > 0: Γ₁ ⊂ Γ₂^δ, Γ₂ ⊂ Γ₁^δ} is the Hausdorff distance between sets of curves with respect to the uniform metric ρ, with Γ^δ = {γ: ρ(γ, Γ) < δ}. Observe that (X^{(1)}_k, X^{(2)}_{2k}, ..., X^{(ℓ)}_{ℓk}), k ≥ 1, is an ℓ℘-dimensional stationary process with properties similar to those of the process X_k, k ≥ 1, and so unlike Σ_n the sum T_n requires only a "conventional" treatment. Our main goal here will be to show how to replace in our proofs the handling of the sums Σ_n by the sums T_n.
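On piecewise linear curves the functional S reduces to a finite sum, since the derivative is constant on each piece; a small sketch (with the illustrative quadratic stand-in I(β) = β²/2 of ours, not the rate function of the paper):

```python
# Sketch: S(gamma) = \int_0^1 I(gamma'_s) ds for a piecewise linear curve
# gamma with gamma_0 = 0, using the toy choice I(beta) = beta^2 / 2.
def S(breakpoints, I):
    # breakpoints: list of (s_k, gamma(s_k)) with s_0 = 0, gamma(0) = 0, s_last = 1
    total = 0.0
    for (s0, g0), (s1, g1) in zip(breakpoints, breakpoints[1:]):
        slope = (g1 - g0) / (s1 - s0)      # gamma' is constant on each piece
        total += I(slope) * (s1 - s0)
    return total

I = lambda b: b * b / 2
gamma = [(0.0, 0.0), (0.5, 1.0), (1.0, 1.0)]   # slope 2 on [0, .5], 0 on [.5, 1]
val = S(gamma, I)                              # (2^2/2)*0.5 + 0*0.5 = 1.0
assert abs(val - 1.0) < 1e-12
in_level_set = val <= 1.0   # gamma belongs to the level set Phi(1) iff S(gamma) <= 1
```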
We will mainly discuss the proof for the case where F and X_n, n ≥ 1, satisfy the conditions (2.3) and (2.4), since the case when X_n is F_{n−m,n+m}-measurable and F is only a bounded Borel function is established by an obvious simplification of the proof, just by eliminating the steps connected to approximations of X_n by corresponding conditional expectations E(X_n | F_{n−m,n+m}).
Our method goes through also for more general sums Σ_n = Σ_{1≤m≤n} F(X_{q₁(m)}, X_{q₂(m)}, ..., X_{q_ℓ(m)}) where q_i(m) = im for i ≤ k ≤ ℓ while q_j(m), j = k+1, ..., ℓ, are nonlinear indexes as in [19]. For instance, we may take q_j(m) = m^j for j > k. In this situation, it turns out that we can replace such sums Σ_n by the sums T_n = Σ_{1≤m≤n} F(X^{(1)}_{q₁(m)}, X^{(2)}_{q₂(m)}, ..., X^{(ℓ)}_{q_ℓ(m)}); here (X^{(1)}_{q₁(m)}, ..., X^{(ℓ)}_{q_ℓ(m)}), m ≥ 1, is an ℓ℘-dimensional stationary process with properties similar to those of the process X_m, m ≥ 1, and we can deal with such sums T_n in the same way as in this paper.
Using dependence coefficients ϖ_{q,p}(G, H) = sup{‖E(g|G) − Eg‖_p : g is H-measurable and ‖g‖_q ≤ 1} for σ-algebras G, H ⊂ F (see [4]), it is possible to obtain a version of Lemma 3.1 below beyond ψ-mixing (where ψ(G, H) = ϖ_{1,∞}(G, H)), and so the proof of Theorem 2.1 can be extended under conditions weaker than ψ-mixing. Still, we do not give details here since our main examples (see Appendix), which should also satisfy appropriate large deviations, are ψ-mixing anyway. Our conditions are satisfied when, for instance, X_n, n ≥ 1, is a ℘-dimensional Markov chain with transition probabilities P(x, Γ) satisfying the strong Doeblin type condition

P(x, Γ) = ∫_Γ p(x, y) dν(y) with C^{−1} ≤ p(x, y) ≤ C

for some probability measure ν and a constant C > 0 independent of x and y. Then (X^{(1)}_n, X^{(2)}_{2n}, ..., X^{(ℓ)}_{ℓn}) is an ℓ℘-dimensional Markov chain satisfying, similarly to X_n, both exponentially fast ψ-mixing and the necessary large deviations estimates (see [4], [9] and [13]). Here we can take, for instance, the σ-algebras F_{m,n} generated by X_m, X_{m+1}, ..., X_n, and then F is supposed to be only bounded and Borel.
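The effect of a Doeblin type condition can be sketched on a two-state chain (the transition matrix below is an arbitrary illustrative choice of ours): the n-step laws approach the invariant measure at a geometric rate, which is the source of the exponentially fast mixing.

```python
import numpy as np

# Two-state chain with strictly positive transition probabilities (a trivial
# Doeblin-type minorization); its n-step laws converge geometrically to mu.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# invariant measure: mu P = mu, found as the Perron eigenvector of P^T
w, v = np.linalg.eig(P.T)
mu = np.real(v[:, np.argmax(np.real(w))])
mu = mu / mu.sum()                       # here mu = (4/7, 3/7)

Pn, prev = np.eye(2), None
for n in range(1, 20):
    Pn = Pn @ P
    dist = max(np.abs(Pn[x] - mu).sum() for x in range(2))  # variation distance
    if prev is not None:
        assert dist <= 0.5 * prev        # second eigenvalue is 0.3 < 0.5
    prev = dist
```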
On the dynamical systems side our conditions are satisfied, for instance, when X_n = g ∘ f^n where g is a Hölder continuous function and f is an Axiom A diffeomorphism on a hyperbolic set, an expanding transformation or a mixing subshift of finite type (see [2]).
In this case f: M → M and X_n(ω), ω ∈ M, is a stationary sequence on the probability space (M, F, P) where M is the corresponding phase space, F is the Borel σ-algebra and P is a Gibbs measure constructed by a Hölder continuous function. The function F here should satisfy (2.3), and the σ-algebras F_{m,n} are generated by cylinder sets in the subshift case or by corresponding Markov partitions in the Axiom A and expanding cases. The exponentially fast ψ-mixing for these transformations is obtained in [2], the required large deviations results can be found in [13] and [14], and the product system (f, f², ..., f^ℓ), which plays the role of independent copies in the sums T_n, has properties similar to the dynamical system f itself.

Basic estimates
We start with the following result which is a corollary of Lemma 3.1 from [15].
Proof. If k = 2 then Lemma 3.1 from [15] gives that

|E(h(Y₁, Y₂) | G) − g(Y₁)| ≤ ψ(G, H) sup|h|, where g(y) = Eh(y, Y₂),

for a G-measurable Y₁ and an H-measurable Y₂. Taking the expectation we obtain (3.1) for k = 2. Now let (3.1) hold true for all k ≤ j − 1 and any bounded Borel function of the corresponding number of arguments. In order to derive (3.1) for k = j we consider (Y₁, Y₂, ..., Y_{j−1}) as one random vector and Y_j as another. Then Lemma 3.1 from [15] yields the corresponding bound with g(y₁, y₂, ..., y_{j−1}) = Eh(y₁, y₂, ..., y_{j−1}, Y_j). Now, taking the expectation and applying the induction hypothesis to g we complete the proof. In the case when X_n is F_{n−m,n+m}-measurable for all n and a fixed m, we will be able to use Lemma 3.1 directly, which will enable us to replace the summands F(X_n, X_{2n}, ..., X_{ℓn}) by F(X^{(1)}_n, X^{(2)}_{2n}, ..., X^{(ℓ)}_{ℓn}). On the other hand, under (2.3) and (2.4) we will have, first, to replace the original random vectors X_m by their approximations X_{m,k} = E(X_m | F_{m−k,m+k}) and then use (2.3) to estimate the error.
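The k = 2 decoupling step can be illustrated numerically: for two discrete random variables, replacing the joint law by the product of the marginals changes an expectation Eh by at most ψ · sup|h|, where ψ is the ψ-mixing coefficient of the pair. A toy check (the joint law below is an arbitrary illustrative choice of ours):

```python
import itertools

# |E h(Y1, Y2) - E h(Y1, Y2')| <= psi * sup|h|, where Y2' is an independent
# copy of Y2 and psi = max |p(a,b)/(p1(a) p2(b)) - 1| is the psi-coefficient.
joint = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.15, (1, 1): 0.35}
p1 = {a: sum(joint[(a, b)] for b in (0, 1)) for a in (0, 1)}
p2 = {b: sum(joint[(a, b)] for a in (0, 1)) for b in (0, 1)}

psi = max(abs(joint[(a, b)] / (p1[a] * p2[b]) - 1) for a, b in joint)

def gap(h):
    dependent = sum(joint[(a, b)] * h(a, b) for a, b in joint)
    decoupled = sum(p1[a] * p2[b] * h(a, b) for a in (0, 1) for b in (0, 1))
    return abs(dependent - decoupled)

# check the bound over all {0,1}-valued test functions h on the four points
for vals in itertools.product((0.0, 1.0), repeat=4):
    h = lambda a, b, v=vals: v[2 * a + b]
    assert gap(h) <= psi * 1.0 + 1e-12
```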
Namely, for k̄ = (k₁, k₂, ..., k_n) set X_{j,k_j} = E(X_j | F_{j−k_j, j+k_j}). Let j(u) > 0, u ∈ [0,1], be a nondecreasing integer valued function. We will use that, by (2.3) and (2.4), the error of replacing X_j by X_{j,k_j} inside F decays exponentially fast in k_j. We observe that Lemma 3.1 applied to the summands of the form F(X_{j,k_j}, X_{2j,k_j}, ..., X_{ℓj,k_j}) does not yet yield the summands of the form F(X^{(i)}_{ij,k_j}, i = 1, ..., ℓ), where the X^{(i)}_{ij,k_j}, i = 1, ..., ℓ, are independent and have the same distributions as X_{ij,k_j}, i = 1, ..., ℓ, respectively. Thus an additional argument together with another use of (2.3) and (2.4) will be needed.

The upper bound
We will show first that with probability one

lim_{n→∞} sup_{γ ∈ W_n^c} ρ(γ, Φ(1/c)) = 0. (4.1)

This assertion means that with probability one all limit points as n → ∞ of curves from W_n^c belong to the compact set Φ(1/c).
Since there are no more than 6c ln n numbers m for which we will need this estimate, it will suit our purposes.
Next, we use stationarity of the sequence F(X^{(1)}_m, X^{(2)}_{2m}, ..., X^{(ℓ)}_{ℓm}), m ≥ 1. Taking into account that for k_n < r ≤ k_{n+1} the quantities in question for r are controlled by those for k_n and k_{n+1}, it follows that (4.24) remains true if k_n there is replaced by n, implying (4.1) since ε > 0 is arbitrary.

Proof of Corollary 2.2
Observe that (2.9) implies, in particular, that for any continuous (with respect to the metric ρ) function f on the space of curves [0,1] → R^d, with probability one,

lim_{n→∞} sup_{γ ∈ W_n^c} f(γ) = sup_{γ ∈ Φ(1/c)} f(γ).

Applications
The main applications in the discrete time case of Theorem 2.1 concern Markov chains and some classes of dynamical systems such as Axiom A diffeomorphisms, expanding transformations and topologically mixing subshifts of finite type. We will restrict ourselves to several main setups to which our results are applicable rather than trying to describe the most general situations. First, let X_n, n ≥ 0, be a time homogeneous Markov chain on R^℘ whose transition probability P(x, Γ) = P{X_1 ∈ Γ | X_0 = x} satisfies

κν(Γ) ≤ P(x, Γ) ≤ κ^{−1}ν(Γ) (7.1)

for some κ > 0, a probability measure ν on R^℘ and any Borel set Γ ⊂ R^℘. Then X_n, n ≥ 0, is exponentially fast ψ-mixing with respect to the family of σ-algebras F_{m,n} = σ{X_k, m ≤ k ≤ n} generated by the process (see, for instance, [12]). The strong Doeblin type condition (7.1) implies geometric ergodicity ‖P(n, x, ·) − μ‖ ≤ β^{−1}e^{−βn}, β > 0, where ‖·‖ is the variational norm, P(n, x, ·) is the n-step transition probability and μ is the unique invariant measure of {X_n, n ≥ 0}, which makes it a stationary process. In this situation (X^{(1)}_n, X^{(2)}_{2n}, ..., X^{(ℓ)}_{ℓn}), n ≥ 0, is the product Markov chain on R^{ℓ℘} satisfying a strong Doeblin condition similar to (7.1). The limit (2.5) exists here (see Lemma 4.3 in Ch. 7 of [9]) and exp(Π(α)) turns out to be the principal eigenvalue of the positive operator

Q_α u(x) = E_x(exp((α, F(X^{(1)}_1, X^{(2)}_2, ..., X^{(ℓ)}_ℓ))) u(X^{(1)}_1, X^{(2)}_2, ..., X^{(ℓ)}_ℓ))

(see [13] and references there), where E_x is the expectation conditioned on (X^{(1)}_0, X^{(2)}_0, ..., X^{(ℓ)}_0) = x. It is well known (see [21], [12], [11] and references there) that Π(α) is convex and differentiable in α. Furthermore, the Hessian matrix ∇²_α Π(α)|_{α=0} is positively definite if and only if for each α ∈ R^d, α ≠ 0, the limiting variance

σ²(α) = lim_{n→∞} n^{−1} E((α, T_n))² (7.2)

is positive, which holds true unless (α, F) admits a corresponding coboundary representation for some bounded Borel function g (see [12]).
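The principal eigenvalue description of exp(Π(α)) can be sketched for a finite-state chain in the conventional case ℓ = 1 (the chain and the centered function f below are illustrative choices of ours, not from the paper): Π(0) = 0, ∇Π(0) = 0, and Π is strictly convex near 0 when the limiting variance is positive.

```python
import numpy as np

# Sketch (conventional case l = 1): exp(Pi(alpha)) is the principal
# eigenvalue of the tilted matrix Q_alpha(x, y) = P(x, y) exp(alpha f(y)).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
mu = np.array([4/7, 3/7])          # invariant law of P
f = np.array([3.0, -4.0])          # centered: mu . f = 0

def Pi(alpha):
    Q = P * np.exp(alpha * f)[None, :]   # tilt each column y by exp(alpha f(y))
    return np.log(max(np.linalg.eigvals(Q).real))

assert abs(Pi(0.0)) < 1e-10                    # Pi(0) = ln 1 = 0
h = 1e-5
assert abs((Pi(h) - Pi(-h)) / (2 * h)) < 1e-3  # Pi'(0) = mu . f = 0
assert Pi(h) + Pi(-h) > 0                      # positive Hessian at alpha = 0
```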
In the discrete time dynamical systems case we consider X n (ω) = g • f n (ω), n ≥ 0 where g is a Hölder continuous vector function and f : Ω → Ω is a C 2 Axiom A diffeomorphism on a hyperbolic set or a topologically mixing subshift of finite type or a C 2 expanding transformation. Here X n , n ≥ 0 is considered as a stationary process on the probability space (Ω, F, P ) where Ω is the corresponding phase space, F is the Borel σ-algebra and P is a Gibbs measure constructed by a Hölder continuous function (see [2]). Then the exponentially fast ψ-mixing holds true (see [2]) with respect to the family of (finite) σ-algebras generated by cylinder sets in the symbolic setup of subshifts of finite type or with respect to the corresponding σ-algebras constructed via Markov partitions in the Axiom A and expanding cases.
Here the process (X^{(1)}_n(ω₁), X^{(2)}_{2n}(ω₂), ..., X^{(ℓ)}_{ℓn}(ω_ℓ)), n ≥ 0, is generated by the product dynamical system (f, f², ..., f^ℓ) together with the function G(ω₁, ω₂, ..., ω_ℓ) = (g(ω₁), g(ω₂), ..., g(ω_ℓ)). The above product dynamical system has properties similar to those of the original dynamical system f itself; in particular, it satisfies large deviations bounds with respect to Gibbs measures constructed by Hölder continuous functions, and exponentially fast ψ-mixing holds true as well. The existence of the limit (2.5) and its form follow from [14]. Here Π(α) turns out to be the topological pressure of the function (α, F) + φ where φ is the potential of the corresponding Gibbs measure (for the product system). The differentiability properties of Π(α) in α are well known and, again, the Hessian matrix ∇²_α Π(α)|_{α=0} is positively definite if and only if for each α ∈ R^d, α ≠ 0, the limiting variance (7.2) is positive, where the expectation should be taken with respect to the chosen Gibbs measure (see [22], [11], [13], [14] and references there). The latter holds true unless there exists a coboundary representation (α, F) = g ∘ f − g for some bounded Borel function g.
In the Erdős-Rényi law type results it is important to know where a rate function I(β) is finite. By the contraction principle this reduces to second level large deviations with a rate functional J(ν) for occupational measures

ζ_n = n^{−1} Σ_{k=1}^n δ_{X_k}, (7.7)

where δ_x denotes the unit mass at x (see [13]). Explicit formulas for J(ν) are known when X_k is a Markov chain whose transition probability satisfies (7.1) and when X_k = f^k x with f being an Axiom A diffeomorphism, expanding transformation or subshift of finite type. In the former case (see [7]),

J(ν) = − inf_{u>0, continuous} ∫ ln(Pu/u) dν, (7.8)

and in the latter case (see [13]),

J(ν) = −h_ν(f) − ∫ φ dν if ν is f-invariant and J(ν) = ∞ otherwise (7.9)

(with the topological pressure of φ normalized to be zero), where h_ν(f) is the Kolmogorov-Sinai entropy of f with respect to ν and φ is the potential of the corresponding Gibbs measure μ playing the role of the probability here. Necessary and sufficient conditions for finiteness of J(ν) in the Markov chain case are given in [7], while in the above dynamical systems cases J(ν) < ∞ for any f-invariant measure ν. If

Π(α) = lim_{n→∞} n^{−1} ln E exp(Σ_{j=1}^n (α, G(X_j))), (7.10)

where X_t is a stationary process as above and G ≢ 0 is a continuous vector function with EG(X_0) = 0, then by the contraction principle (see, for instance, [5]) the rate function I(β) given by (7.4) can be represented as

I(β) = inf{J(ν): ∫ G dν = β}, (7.11)

where the infimum is taken over the space P(M) of probability measures on M.
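Formula (7.8) can be sanity checked on a two-state chain (an illustrative example of ours): for the invariant measure π the infimum over u > 0 equals 0 and is attained at constant u, so J(π) = 0, consistent with the rate function vanishing at the invariant measure.

```python
import numpy as np

# Donsker-Varadhan form (7.8) on a two-state chain: the integrand
# \int ln(Pu/u) d pi is >= 0 for every u > 0 and vanishes at constant u,
# hence J(pi) = -inf_u \int ln(Pu/u) d pi = 0.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
pi = np.array([4/7, 3/7])            # invariant measure of P

def dv_integrand(r):
    u = np.array([1.0, r])           # u > 0, normalized so that u(0) = 1
    Pu = P @ u
    return float(pi @ np.log(Pu / u))

vals = [dv_integrand(r) for r in np.geomspace(0.01, 100.0, 2001)]
assert min(vals) >= -1e-9            # integrand is nonnegative for all u > 0
assert abs(dv_integrand(1.0)) < 1e-12  # equals 0 at constant u, so J(pi) = 0
```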