Moment bounds and concentration inequalities for slowly mixing dynamical systems

We obtain optimal moment bounds for Birkhoff sums, and optimal concentration inequalities, for a large class of slowly mixing dynamical systems, including those that admit anomalous diffusion in the form of a stable law or a central limit theorem with nonstandard scaling $(n\log n)^{1/2}$.


Statement of results
Consider a dynamical system $T$ on a space $X$, preserving an ergodic probability measure $\mu$. If $x$ is distributed according to $\mu$, the process $x, Tx, T^2x, \dots$ on $X^{\mathbb{N}}$ is stationary, with distribution $\mu \otimes \delta_{Tx} \otimes \delta_{T^2x} \otimes \cdots$ (equivalently, one considers a Markov chain on $X$, with stationary measure $\mu$, for which the transitions from $x$ to $Tx$ are deterministic). In particular, if $f$ is a real-valued function on $X$, the real process $f(x), f(Tx), \dots$ is also stationary. We would like to understand to what extent these processes behave like independent or weakly dependent processes: although they are purely deterministic once the starting point $x$ is fixed, one expects a random-like behaviour if the map $T$ is sufficiently chaotic and the observable $f$ is regular enough. In such a situation, the Birkhoff sums $S_n f = \sum_{i=0}^{n-1} f \circ T^i$ of Hölder continuous functions with zero average typically satisfy the central limit theorem, and grow like $\sqrt{n}$. On the other hand, the moments $\int |S_n f|^p \, d\mu$ may grow faster than $n^{p/2}$: it is possible that some subsets of $X$ with small measure give a dominating contribution to those moments. Estimating the precise growth rate is important from the point of view of large deviations. It turns out that this precise growth rate depends on finer characteristics of the system, and displays a transition at some critical exponent $p^*$ directly related to the lack of uniform hyperbolicity of the system.
The situation for uniformly expanding/hyperbolic (Axiom A) systems is easily described: all moments $\int |S_n f|^p \, d\mu$ grow like $n^{p/2}$ and moreover $\int |n^{-1/2} S_n f|^p \, d\mu$ converges to the $p$-th moment of the limiting Gaussian in the central limit theorem. [MT12b] showed that convergence of all moments holds also for nonuniformly expanding/hyperbolic diffeomorphisms modelled by Young towers with exponential tails [You98]. However, it follows from [MN08,MT12b] that the situation is quite different for systems modelled by Young towers with polynomial tails [You99].
In this paper, we give optimal bounds for all moments of Birkhoff sums (by optimal, we mean that we have upper and lower bounds of the same order of magnitude), in the situation of Young towers. Many real systems are quotients of such Young towers, hence our bounds apply to such systems, including notably intermittent maps of the interval [LSV99,PM80]. See for instance [MN08] for a discussion of such applications. Our techniques also give a generalization of moment inequalities, to concentration inequalities (see [CG12] for a discussion of numerous applications of such bounds). By the methods in [CG12,MT12b], all results described here pass over to the situation of invertible systems and flows; for brevity we present the results only for noninvertible discrete time dynamics.
We formulate our results in the abstract setting of Young towers. To illustrate this setting, let us start with a more concrete example, intermittent maps, i.e., maps of the interval which are uniformly expanding away from an indifferent fixed point (see [You99] for more details). For $\gamma \in (0, 1)$, consider for instance the corresponding Liverani-Saussol-Vaienti map $T_\gamma : [0, 1] \to [0, 1]$ given by
\[ T_\gamma(x) = \begin{cases} x(1 + 2^\gamma x^\gamma) & \text{if } 0 \le x < 1/2, \\ 2x - 1 & \text{if } 1/2 \le x \le 1. \end{cases} \]
The first return map to the subinterval $Y = [1/2, 1]$ is uniformly expanding and Markov. Define a new space $X = \{(x, i) : x \in [1/2, 1],\ i < \varphi(x)\}$ where $\varphi : Y \to \mathbb{N}^*$ is the first return time to $Y$, i.e., $\varphi(x) = \inf\{i > 0 : T_\gamma^i(x) \in Y\}$. On this new space, we define a dynamics by $T(x, i) = (x, i + 1)$ if $i + 1 < \varphi(x)$, and $T(x, \varphi(x) - 1) = (T_\gamma^{\varphi(x)}(x), 0)$. We think of $X$ as a tower, where the dynamics $T$ is trivial when one climbs up while it has a large expansion when one comes back to the bottom of the tower. The point of this construction is that the combinatorics of $T$ are simpler than those of the original map $T_\gamma$, while the essential features of $T$ and $T_\gamma$ are the same. More precisely, the two maps are semiconjugate: the projection $\pi : X \to [0, 1]$ given by $\pi(x, i) = T_\gamma^i(x)$ satisfies $T_\gamma \circ \pi = \pi \circ T$. Hence, results for the decay of correlations, or growth of moments, or concentration, for $T$ readily imply corresponding results for $T_\gamma$. This situation is not specific to the maps $T_\gamma$: many concrete maps can be modelled by Young towers in the same way (although the Young tower is usually not as explicit as in this particular example).
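The tower construction above can be explored numerically. The following is a minimal sketch, assuming the standard form of the Liverani-Saussol-Vaienti map ($x(1 + 2^\gamma x^\gamma)$ on $[0, 1/2)$ and $2x - 1$ on $[1/2, 1]$); the function names `lsv_map`, `first_return_time` and `birkhoff_sum` are illustrative and do not come from the paper.

```python
def lsv_map(x: float, gamma: float) -> float:
    """One step of the intermittent map on [0, 1] (standard LSV form, assumed here)."""
    if x < 0.5:
        # Branch with the indifferent fixed point at 0: x * (1 + 2^gamma * x^gamma).
        return x * (1.0 + (2.0 * x) ** gamma)
    # Uniformly expanding branch.
    return 2.0 * x - 1.0


def first_return_time(x: float, gamma: float, max_iter: int = 10**6) -> int:
    """Return phi(x) = inf{i > 0 : T^i(x) in Y} for a point x of Y = [1/2, 1]."""
    if not 0.5 <= x <= 1.0:
        raise ValueError("x must lie in Y = [1/2, 1]")
    y = x
    for i in range(1, max_iter + 1):
        y = lsv_map(y, gamma)
        if y >= 0.5:
            return i
    raise RuntimeError("no return to Y within max_iter iterations")


def birkhoff_sum(f, x: float, n: int, gamma: float) -> float:
    """Compute S_n f(x) = f(x) + f(Tx) + ... + f(T^(n-1) x) along one orbit."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = lsv_map(x, gamma)
    return total
```

Points of $Y$ that land close to $0$ after one step have a large return time, which is the mechanism behind the polynomial tails discussed in the text. Floating-point orbits are only a heuristic guide here, since rounding errors grow under the expanding branch.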
Let us give a more formal definition. A Young tower is a space $X$ endowed with a partition $\bigcup_{\alpha} \bigcup_{0 \le i < h_\alpha} \Delta_{\alpha,i}$ (where $\alpha$ belongs to some countable set, and the $h_\alpha$ are positive integers), a probability measure $\mu$ and a map $T$ preserving $\mu$. The dynamics $T$ maps $\Delta_{\alpha,i}$ bijectively to $\Delta_{\alpha,i+1}$ for $i + 1 < h_\alpha$, and $\Delta_{\alpha,h_\alpha-1}$ to $\Delta_0 = \bigcup_\alpha \Delta_{\alpha,0}$: the dynamics goes up while not at the top of the tower, and then comes back surjectively to the basis. The distance on $X$ is defined by $d(x, y) = \rho^{s(x,y)}$ where $\rho < 1$ is fixed and $s(x, y)$, the separation time, is the number of returns to the basis before the iterates of the points $x$ and $y$ are not in the same element of the partition. Finally, we require a technical distortion condition: denoting by $g(x)$ the inverse of the jacobian of $T$ for the measure $\mu$, we assume that $|\log g(x) - \log g(y)| \le C d(Tx, Ty)$ for all $x, y$ in the same partition element.
With the distance $d$, the map $T$ is an isometry while going up the tower, and expands by a factor $\rho^{-1} > 1$ when going back to the basis: it is non-uniformly expanding, the time to wait before seeing the expansion being large on points in $\Delta_{\alpha,0}$ with $h_\alpha$ large. In particular, denoting by $\varphi(x)$ the return time to the basis, the quantities
\[ \mathrm{tail}_n = \mu\{x \in \Delta_0 : \varphi(x) \ge n\}, \]
called the tails of the return time, dictate the statistical properties of the transformation $T$. By Kac's Formula, $\mathrm{tail}_n$ is summable since $\mu$ is finite by assumption. Various kinds of behaviour can happen for $\mathrm{tail}_n$. For instance, in the case of intermittent maps of parameter $\gamma \in (0, 1)$, one has $\mathrm{tail}_n \sim C/n^{1/\gamma}$. In general, if $\mathrm{tail}_n = O(n^{-q})$ for some $q > 1$, then Lipschitz functions mix at a speed $O(n^{-(q-1)})$ by [You99], and this speed is optimal, see [Sar02] and [Gou04b]. If $q > 2$, then $n^{-1/2} S_n f$ converges in distribution to a Gaussian, and the variance is nonzero provided $f$ is not a coboundary. (More generally, for convergence to a Gaussian it suffices that the return time function $\varphi$ is square-integrable, i.e., $n \, \mathrm{tail}_n$ is summable.) When $q \in (1, 2]$, more precise information is required on $\mathrm{tail}_n$, leading to the following result.
Theorem 1.1. Consider a Young tower with $\mathrm{tail}_n \sim Cn^{-q}$ for some $q > 1$. There is a sequence $a_n$, and a nonempty set $U$ in the space of Lipschitz functions $f : X \to \mathbb{R}$ with mean zero, such that the following holds. For each $f \in U$, there exists a nondegenerate law $Z$ such that $a_n^{-1} S_n f \xrightarrow{d} Z$. Moreover, $a_n$ and $Z$ are given as follows:
• $q > 2$: $a_n = n^{1/2}$, $Z$ is Gaussian.
• $q = 2$: $a_n = (n \log n)^{1/2}$, $Z$ is Gaussian.
• $q \in (1, 2)$: $a_n = n^{1/q}$, $Z$ is a stable law of index $q$.
The set U is rather big: it contains for instance all the functions that converge to a nonzero constant along points whose height in the tower tends to infinity.
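The three regimes of the limit theorem can be summarized in a small lookup. This is only an illustrative sketch: the helper names are hypothetical, and the relation $q = 1/\gamma$ for intermittent maps is taken from the tail asymptotics $\mathrm{tail}_n \sim C/n^{1/\gamma}$ quoted above.

```python
import math


def tail_exponent_from_gamma(gamma: float) -> float:
    """Tail exponent q = 1/gamma for an intermittent map of parameter gamma in (0, 1)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    return 1.0 / gamma


def normalizing_sequence(n: int, q: float) -> float:
    """Normalization a_n in the limit theorem, depending on the tail exponent q > 1."""
    if q <= 1.0:
        raise ValueError("the tail exponent q must be > 1")
    if q > 2.0:
        return math.sqrt(n)                # standard central limit scaling
    if q == 2.0:
        return math.sqrt(n * math.log(n))  # nonstandard Gaussian scaling
    return n ** (1.0 / q)                  # stable-law scaling


def limit_law(q: float) -> str:
    """Type of the limit Z of S_n f / a_n for f in the set U."""
    if q <= 1.0:
        raise ValueError("the tail exponent q must be > 1")
    return "Gaussian" if q >= 2.0 else f"stable law of index {q}"
```

For example, $\gamma = 1/2$ gives $q = 2$, the borderline case with normalization $(n \log n)^{1/2}$.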
Lower bounds for the growth of moments are well-known (see [MN08]) and can be summarized in the following proposition. We write $\|u\|$ for the Lipschitz norm of a function $u$, given by
\[ \|u\| = \sup_x |u(x)| + \sup_{x \ne y} \frac{|u(x) - u(y)|}{d(x, y)}, \]
where the supremum in the second term is restricted to those $x$ and $y$ that belong to the same partition element. Note that, changing the parameter $\rho$ in the definition of the distance, Hölder functions for the old distance become Lipschitz functions for the new one. Hence, all results that are stated in this paper for Lipschitz functions also apply to Hölder functions.
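Proposition 1.2. Consider a Young tower with $\mathrm{tail}_n \sim Cn^{-q}$ for some $q > 1$. Then, for all $p \in [1, \infty)$, there exists $c > 0$ such that for all $n \ge 1$
\[ \sup_{\|f\| \le 1,\ \int f \, d\mu = 0} \int |S_n f|^p \, d\mu \ge \begin{cases} c \max(n^{p/2}, n^{p-q+1}) & \text{if } q > 2, \\ c \max((n \log n)^{p/2}, n^{p-1}) & \text{if } q = 2, \\ c \max(n^{p/q}, n^{p-q+1}) & \text{if } q < 2. \end{cases} \]
The phase transition in these lower bounds happens at $p^* = 2q - 2$ for $q \ge 2$, and at $p^* = q$ for $q \le 2$. Before this threshold, the first lower bound (that corresponds to an average behavior over the whole space) is more important, while the second one (that corresponds to the Birkhoff sum being large on a small part of the space) is dominating afterwards.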
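Proof. For the lower bound $n^{p-q+1}$, we take $f$ that is equal to $1$ on $\bigcup_{h_\alpha \ge n} \bigcup_{i < h_\alpha} \Delta_{\alpha,i}$, and equal to another constant on the complement of this set, to make sure that $\int f \, d\mu = 0$. Then $S_n f = n$ on $\bigcup_{h_\alpha \ge 2n} \bigcup_{i < h_\alpha/2} \Delta_{\alpha,i}$, hence
\[ \int |S_n f|^p \, d\mu \ge n^p \sum_{h \ge 2n} \frac{h}{2} \, \mu(\varphi = h). \]
Using a discrete integration by parts and the assumption $\mu(\varphi \ge n) \sim Cn^{-q}$, one checks that this is $\ge c n^{p-q+1}$.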
For the other bound, we fix a mean zero Lipschitz function $f$ in the set $U$ constructed in Theorem 1.1. This theorem shows the existence of $a_n$ and $Z$ nondegenerate such that $a_n^{-1} S_n f \xrightarrow{d} Z$. Hence $a_n^{-p} \int |S_n f|^p \, d\mu$ is bounded from below and we get the lower bound $c a_n^p$ in all three cases.
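The competition between the two lower bounds in the proof, $a_n^p$ and $n^{p-q+1}$, can be tabulated by comparing polynomial growth exponents. The sketch below ignores logarithmic factors (so the case $q = 2$ is only captured up to powers of $\log n$); the function names are illustrative.

```python
def critical_exponent(q: float) -> float:
    """Threshold p* at which the two lower bounds exchange roles."""
    if q <= 1.0:
        raise ValueError("the tail exponent q must be > 1")
    return 2.0 * q - 2.0 if q >= 2.0 else q


def lower_bound_exponent(p: float, q: float) -> float:
    """Exponent e such that the moment of order p grows at least like n^e (logs ignored)."""
    if q <= 1.0 or p < 1.0:
        raise ValueError("need q > 1 and p >= 1")
    # Contribution of the limit theorem: a_n^p with a_n = n^(1/2) or n^(1/q).
    average_part = p / 2.0 if q >= 2.0 else p / q
    # Contribution of the small set where the Birkhoff sum equals n.
    small_set_part = p - q + 1.0
    return max(average_part, small_set_part)
```

For instance, with $q = 3$ the threshold is $p^* = 4$: below it the exponent is $p/2$, above it the exponent is $p - 2$, and both agree at $p = 4$.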
In the case $q < 2$ and $p = q$, the lower bound in the proposition is $\int |S_n f|^q \, d\mu \ge cn$. It is not sharp: for $f \in U$, $S_n f/n^{1/q}$ converges to a stable law $Z$ of index $q$, whose $q$-th moment is infinite, hence $\int |S_n f/n^{1/q}|^q \, d\mu$ tends to infinity. To get a better lower bound, one should study the speed of convergence of $S_n f/n^{1/q}$ to $Z$. We can do this under stronger assumptions on the tails (this is not surprising since it is well known that the speed of convergence to stable laws is related to regularity assumptions on the tails of the random variables):
Proposition 1.3. Consider a Young tower with $\mathrm{tail}_n = Cn^{-q} + O(n^{-q-\varepsilon})$ for some $q \in (1, 2)$ and some $\varepsilon > 0$. Then there exists $c > 0$ such that for all $n > 0$
\[ \sup_{\|f\| \le 1,\ \int f \, d\mu = 0} \int |S_n f|^q \, d\mu \ge c \, n \log n. \]
This lower bound is considerably more complicated to establish than the ones in Proposition 1.2. Since the arguments are rather different from the rest of the paper (essentially, they reduce to a proof of a Berry-Esseen like bound for $S_n f/n^{1/q}$), we defer the proof of the proposition to Appendix A. The assumptions of this proposition are for instance satisfied for the classical Pomeau-Manneville intermittent maps [LSV99,PM80]. (See for example [MT12a,Proposition 11.12].)
For $q = 2$, the bound $c \max((n \log n)^{p/2}, n^{p-q+1})$ is known to be optimal for all $p$, see Remarks 1.6 and 1.7 below. Also, for $q > 2$, the bound $c \max(n^{p/2}, n^{p-q+1})$ is known to be optimal for all $p \ne 2q - 2$. The remaining cases are much more subtle, and are solved for the first time in this paper. We note that for $q > 2$ and $p = 2q - 2$, [CG12] obtains an additional upper bound for the weak moment of $S_n f$, which implies for $p > 2q - 2$ the upper bound $Cn^{p-q+1}$, in accordance with the lower bound. Moreover, the very precise methods of [CG12] seemed to indicate that the upper bound for the weak moment at $p = 2q - 2$ was optimal, and that the discrepancy with the lower bound was due to a suboptimality of the (naive) lower bound. We prove below that this is not the case.
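Theorem 1.4. Consider a Young tower with $\mathrm{tail}_n = O(n^{-q})$ for some $q > 1$. Then, for all $p \in [1, \infty)$, there exists $C > 0$ such that for any Lipschitz function $f$ with $\|f\| \le 1$ and $\int f \, d\mu = 0$, for all $n \ge 0$,
\[ \int |S_n f|^p \, d\mu \le \begin{cases} C \max(n^{p/2}, n^{p-q+1}) & \text{if } q > 2, \\ C \max((n \log n)^{p/2}, n^{p-1}) & \text{if } q = 2, \\ C \max(n^{p/q}, n^{p-q+1}) & \text{if } q < 2 \text{ and } p \ne q, \\ C n \log n & \text{if } q < 2 \text{ and } p = q. \end{cases} \]
If $q < 2$, we have for all $t > 0$
\[ \mu\{x : |S_n f(x)| > t\} \le C n t^{-q} \tag{1.1} \]
and therefore $\|S_n f\|_{L^{q,w}} \le C n^{1/q}$.
Our upper bounds all match the corresponding lower bounds given in Propositions 1.2 and 1.3, and are therefore optimal.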
Note that, in the proofs, it is sufficient to understand what happens at the critical exponent $p^* = 2q - 2$ for $q > 2$: a control on the $L^{2q-2}$-norm for $q > 2$ readily implies the control for any $p \in [1, \infty)$ thanks to the trivial inequalities
\[ \|u\|_{L^p} \le \|u\|_{L^{p^*}} \ \text{ for } p \le p^*, \qquad \|u\|_{L^p}^p \le \|u\|_{L^\infty}^{p - p^*} \|u\|_{L^{p^*}}^{p^*} \ \text{ for } p \ge p^*. \tag{1.2} \]
In the same way, for $q < 2$, the control (1.1) on the weak $q$-th moment implies the corresponding $L^p$ controls for any $p \in [1, \infty)$ thanks to the equality
\[ \int |u|^p \, d\mu = \int_0^\infty p s^{p-1} \mu\{|u| > s\} \, ds. \tag{1.3} \]
This formula would also apply in the $q > 2$ case (combined with the control of $\mu\{|u| > s\}$ coming from the estimate at the exponent $p^*$ and the Markov inequality), but it gives worse constants than (1.2) in this case. On the other hand, for $q = 2$, the bound $\sqrt{n \log n}$ for the second moment does not give the desired upper bound for $p > 2$ (using the formulas (1.2) or (1.3), one only gets the upper bound $\int |S_n f|^p \, d\mu \le Cn^{p-1} \log n$, with an extra $\log n$).
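As a worked check of the reduction to the critical exponent, here is a sketch for $q > 2$ and $p > p^* = 2q - 2$. It assumes that the trivial inequality in question is the interpolation bound $\int |u|^p \le \|u\|_{L^\infty}^{p - p^*} \int |u|^{p^*}$, and uses $\|S_n f\|_{L^\infty} \le n$ (which holds since $\|f\| \le 1$) together with the critical bound $\int |S_n f|^{2q-2} \, d\mu \le C n^{q-1}$.

```latex
\begin{align*}
\int |S_n f|^p \, d\mu
  &\le \|S_n f\|_{L^\infty}^{\,p - (2q-2)} \int |S_n f|^{2q-2} \, d\mu \\
  &\le n^{\,p - 2q + 2} \cdot C n^{q-1} \\
  &= C n^{\,p - q + 1}.
\end{align*}
```

The same computation at $q = 2$, starting from $\int |S_n f|^2 \, d\mu \le C n \log n$, only yields $C n^{p-1} \log n$, which is why that case needs a separate argument for $p > 2$.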
As an immediate consequence of the bounds on moments at the critical exponent, we obtain convergence of moments for all lower exponents.
Corollary 1.5. Consider a Young tower with $\mathrm{tail}_n \sim Cn^{-q}$ for some $q > 1$. Suppose that $f$, $a_n$ and $Z$ are as in Theorem 1.1. Then $\int |a_n^{-1} S_n f|^p \, d\mu \to E(|Z|^p)$ for all $p < p^*$ where $p^* = 2q - 2$ for $q \ge 2$ and $p^* = q$ for $q \in (1, 2)$.
In particular, there exist nonzero constants $C = C_{p,q}$ such that • if $q = 2$, then $\int |S_n f|^p \, d\mu \sim C(n \log n)^{p/2}$ for all $p < 2$.
Proof. As in [MT12b], this is an immediate consequence of Theorem 1.1, together with the fact that $\int |a_n^{-1} S_n f|^{p'} \, d\mu$ is bounded for any $p' \in (p, p^*)$ as guaranteed by Theorem 1.4 (for $q \ge 2$, one can even take $p' = p^*$).
Remark 1.6. Previously, no results were available on convergence of moments for q < 2. The case q > 2 in Corollary 1.5 recovers a result of [MT12b] and the result for q = 2 was obtained by [BCD13] in the context of dispersing billiards with cusps. [BCD13] consider also the critical exponent p = 2 for dispersing billiards with cusps, and prove for this example that the limiting second moment is twice the moment of the limiting Gaussian: |(n log n) −1/2 S n f | 2 dµ → 2E(|Z| 2 ). This particular behaviour is due to the very specific geometric structure of the billiard.
Remark 1.7. Certain aspects of Theorem 1.4 and Corollary 1.5 do not require the full strength of the assumption that there is an underlying Young tower structure. We can consider the more general situation where $f$ is a mean zero observable lying in $L^\infty$ such that $\left| \int f \, g \circ T^n \, d\mu \right| \le C \|g\|_{L^\infty} n^{-(q-1)}$ for all $g \in L^\infty$, $n \ge 1$. (Such a condition is satisfied for $f$ Lipschitz when $X$ is a Young tower with $\mathrm{tail}_n = O(n^{-q})$.) In the case $q = 2$, this weaker condition is sufficient to recover all the moment estimates (and hence the convergence of moments for $p < 2$) described above. By [Mel09, Lemma 2.1], $\int |S_n f|^p \, d\mu \ll n^{p-1}$ for $p > 2$ and $\int |S_n f|^2 \, d\mu \ll n \log n$.
After we completed this article, we learned that, using techniques that are completely different from the ones we develop, Dedecker and Merlevède [DM14] also obtain the controls on moments given in Theorem 1.4, essentially under an assumption of the form f g • T n dµ ≤ C g L ∞ n −(q−1) . Their arguments (initially developed to control the behavior of the empirical measure) rely on general probabilistic inequalities for sums of random variables, and can apparently not give the concentration inequalities of Theorem 1.9 below.
Remark 1.8. Proposition 1.2 and Theorem 1.4 clarify certain results in the Physics literature. As in [MT12b], our results go over to flows, and apply in particular to infinite horizon planar periodic Lorentz gases. These can be viewed as suspension flows over Young towers with tail n ∼ Cn −2 so we are in the case q = 2. In particular, if r(t) denotes position at time t, then (t log t) −1/2 r(t) → d Z where Z is a nondegenerate Gaussian [SV07]. [AHO03] consider growth rate of moments for r(t), but neglecting logarithmic factors. Defining γ p = lim t→∞ log |r(t)| p dµ/ log t, they argue heuristically that γ p = max{p/2, p − 1} in accordance with our main results. [CESFZ08] conducted numerical simulations to verify the growth rates of the moments, including logarithmic factors, but based on the belief that |r(t)| p dµ scales like (n log n) p/2 for all p, whereas we have shown that this is correct only for p ≤ 2.
Two other examples of billiards that are modelled by Young towers with tail n ∼ Cn −2 are Bunimovich stadia (discrete and continuous time) [BG06] and billiards with cusps (discrete time) [BCD11,BCD13]. Again, our results apply to these situations with q = 2.
The above optimal upper bounds for moments, dealing with Birkhoff sums, can be extended to concentration estimates, for any (possibly non-linear) function of the point and its iterates. More precisely, consider a function K(x 0 , x 1 , . . . ) (depending on finitely or infinitely many coordinates) which is separately Lipschitz: for all i, there exists a constant Lip i (K) such that, for all x 0 , x 1 , . . . and The function K is defined on the spaceX = X N . This space carries a natural probability measure, describing the deterministic dynamics once the starting point is chosen at random according to µ, i.e.,μ : This is the average of K with respect to the natural measure of the system. We are interested in the deviation of Theorem 1.9. Consider a Young tower with tail n = O(n −q ) for some q > 1. Then, for all p ∈ [1, ∞), there exists C > 0 such that, for all separately Lipschitz function K, • if q > 2, • if q < 2, then for all t > 0 Note that |K − E(K)| is trivially bounded by i Lip i (K). Hence, when q > 2, it is sufficient to prove the estimates for p = 2q − 2, as the other ones follow using (1.2). In the same way, for q < 2, it suffices to prove the weak moment bound (1.4), thanks to (1.3). On the other hand, for q = 2, the inequality for p = 2 is not sufficient to obtain the result for p > 2.
There are logarithmic terms in some of the above bounds when $q \le 2$. This is not surprising, since such terms are already present in the simpler situation of Birkhoff sums, in Theorem 1.4. The precise form of these logarithmic terms may seem surprising at first sight, but it is in fact natural since such a bound has to be homogeneous: the logarithmic term should be invariant if one replaces $K$ with $\lambda K$, and therefore each $\mathrm{Lip}_i(K)$ with $\lambda \, \mathrm{Lip}_i(K)$. This would not be the case for the simpler bound $\log\bigl(\sum_i \mathrm{Lip}_i(K)\bigr)$. When $\mathrm{Lip}_i(K)$ does not depend on $i$, the bound $\log\bigl(\sum_i \mathrm{Lip}_i(K)\bigr) - \log\bigl(\sum_i \mathrm{Lip}_i(K)^q\bigr)^{1/q}$ reduces to $(1 - 1/q) \log n$, a constant multiple of $\log n$ as we may expect.
Compared to moment controls, concentration results for arbitrary functions $K$ have a lot more applications, especially when $K$ is non-linear. We refer the reader to [CG12, Section 7] for a description of such applications. Theorem 1.9 implies Theorem 1.4 (just take $K(x_0, \dots, x_{n-1}) = \sum_{i=0}^{n-1} f(x_i)$). However, the proof of Theorem 1.4 is considerably simpler, and motivates some techniques used in the proof of Theorem 1.9. Hence, we prove both theorems separately below. While some cases of Theorem 1.4 are already known (especially the case $q = 2$, see Remark 1.7), we nevertheless give again a full proof of these cases, for completeness and with the concentration case in mind.
The proofs of our results rely on two main tools: a dynamical one (very precise asymptotics of renewal sequences of operators) and a probabilistic one (inequalities for martingales, of Burkholder-Rosenthal and von Bahr-Esseen type). In addition, for the concentration inequalities, we require analytic tools such as maximal inequalities and interpolation results, since the Lipschitz constants Lip a (K) may vary considerably with a, which makes more usual inequalities too crude. All these tools are presented in Section 2. Theorem 1.4 is proved in Section 3, and Theorem 1.9 is proved in Section 4.

Preliminaries
2.1. Renewal sequences of operators. In this paragraph, we summarize the results on renewal sequences of operators that we need later on. They are proved in [Sar02,Gou04b,Gou04c].
Consider a Young tower $T : X \to X$. The associated transfer operator $L$, adjoint to the composition by $T$, is given by $Lu(x) = \sum_{Ty = x} g(y) u(y)$. Denoting by $g^{(n)}(x) = g(x) \cdots g(T^{n-1}x)$ the inverse of the jacobian of $T^n$, one has $L^n u(x) = \sum_{T^n y = x} g^{(n)}(y) u(y)$. Iterating the inequality $|\log g(x) - \log g(y)| \le C d(Tx, Ty)$ and using the uniform expansion when a trajectory returns to the basis, one has the following bounded distortion property: there exists $C > 0$ such that, for all $n$, for all points $x$ and $y$ in the same cylinder of length $n$ (i.e., for $i < n$, the points $T^i x$ and $T^i y$ are in the same partition element),
\[ |\log g^{(n)}(x) - \log g^{(n)}(y)| \le C d(T^n x, T^n y). \]
Among the trajectories of $T$, the only non-trivial behavior is related to the successive returns to the basis. Define a first return transfer operator at time $n$ by $R_n u(x) = \sum_{T^n y = x} g^{(n)}(y) u(y)$ where $x \in \Delta_0$ and the sum is over those preimages $y$ of $x$ that belong to $\Delta_0$ but $T^i y \notin \Delta_0$ for $1 \le i \le n - 1$. Since $R_n$ only involves preimages $y$ with $\varphi(y) = n$, its operator norm $\|R_n\|$ with respect to the Lipschitz norm satisfies $\|R_n\| \le C\mu(\varphi = n)$. In particular, $R_n$ is easy to understand.
Define a partial transfer operator T n = 1 ∆ 0 L n 1 ∆ 0 . It can be written as T n u(x) = T n y=x g (n) (y)u(y), where x and y all have to belong to ∆ 0 . Decomposing a trajectory from ∆ 0 to ∆ 0 into successive excursions, one gets Formally, this is equivalent to the equality T n z n = (I − R k z k ) −1 . This makes it possible to understand T n . Denote by Π the projection on constant functions on ∆ 0 , given by The following proposition is [Gou04c, Proposition 2.2.19 and Remark 2.4.8] in the specific case of polynomial growth rate (this proposition also holds for more exotic asymptotics such as O(n −q log n) -it follows that most results of our paper could be extended to such speeds).
In particular, $T_{n+1} - T_n$ is summable, hence $T_n$ converges. Its limit is $\mu(\Delta_0)\Pi$. Consider now a general function $u$ and a point $x \in \Delta_0$; we wish to describe $L^n u(x) = \sum_{T^n y = x} g^{(n)}(y) u(y)$. Splitting the trajectory of $y$ into a first part until the first entrance in $\Delta_0$, of length $b \ge 0$, and then a second part starting from $\Delta_0$ at time $b$ and coming back to $\Delta_0$ at time $n$, we obtain a decomposition
\[ 1_{\Delta_0} L^n = \sum_{\ell + b = n} T_\ell B_b. \tag{2.1} \]
The operator $B_b$ is given by $B_b u(x) = \sum_{T^b y = x} g^{(b)}(y) u(y)$, the sum being restricted to those preimages whose first entrance in $\Delta_0$ is at time $b$ (the projection in the basis of those points necessarily has $\varphi > b$). By bounded distortion, one gets
2.2. Weak $L^p$ spaces. If a function $u$ belongs to $L^p$ on a probability space, then $P(|u| > s) \le s^{-p} E(|u|^p)$ by Markov's inequality. On the other hand, this condition $P(|u| > s) = O(s^{-p})$ is not sufficient to belong to $L^p$. For instance, a stable law of index $p \in (1, 2)$ satisfies $P(|Z| > s) \sim cs^{-p}$, and it readily follows that it does not belong to $L^p$.
We say that a random variable $u$ belongs to weak $L^p$ if $P(|u| > s) = O(s^{-p})$. We write
\[ \|u\|_{L^{p,w}}^p = \sup_{s > 0} s^p P(|u| > s). \]
This is the analogue of the $L^p$ norm in this context. It satisfies $\|u\|_{L^{p,w}} \le \|u\|_{L^p}$. In general, $\|\cdot\|_{L^{p,w}}$ is not a norm (i.e., it does not satisfy the triangular inequality), however it is equivalent to a norm when $p > 1$ (see for instance [SW71, Paragraph V.3]). The weak $L^p$ space is a particular instance of Lorentz spaces, corresponding to the space $L^{p,\infty}$ in the standard notation.
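To make the weak $L^p$ quantity concrete, the sketch below evaluates it for the empirical distribution of a finite sample, assuming the usual definition $\sup_{s>0} s \, P(|u| > s)^{1/p}$. For a sample sorted in decreasing order $a_1 \ge a_2 \ge \dots \ge a_n$, this supremum equals $\max_k a_k (k/n)^{1/p}$. The function names are illustrative.

```python
def weak_lp_quasinorm(sample, p: float) -> float:
    """sup_s s * P(|u| > s)^(1/p) for the uniform measure on a finite sample."""
    values = sorted((abs(v) for v in sample), reverse=True)
    n = len(values)
    if n == 0:
        raise ValueError("sample must be nonempty")
    # Just below the (k+1)-th largest value a, at least k+1 points exceed s,
    # so the supremum over s is the maximum of a * ((k+1)/n)^(1/p).
    return max(a * ((k + 1) / n) ** (1.0 / p) for k, a in enumerate(values))


def lp_norm(sample, p: float) -> float:
    """Usual L^p norm for the uniform measure on a finite sample."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be nonempty")
    return (sum(abs(v) ** p for v in sample) / n) ** (1.0 / p)
```

By Markov's inequality the first quantity never exceeds the second, which can be checked on any sample.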
Apart from its natural appearance when considering stable laws, a major role of the weak L p space comes from interpolation theory. The following is a particular case of the Marcinkiewicz interpolation theorem, see for instance [SW71, Theorem V.2.4].
This result can for instance be used to prove the boundedness of the Hardy-Littlewood maximal function on any $L^p$ space, $1 < p \le \infty$, since boundedness from $L^1$ to $L^{1,w}$ and from $L^\infty$ to itself hold. We recall the statement in the case of $\mathbb{Z}$, since we need it later on. See for instance [SW71, Theorem II.3.7].
Theorem 2.3. To a sequence $(u_n)_{n \in \mathbb{Z}}$, associate the sequence $M u(n) = \sup_{h \ge 0} \frac{1}{2h+1} \sum_{i=n-h}^{n+h} |u_i|$. For all $p \in (1, +\infty]$, there exists a constant $C$ such that $\|M u\|_{\ell^p} \le C \|u\|_{\ell^p}$ for any sequence $u \in \ell^p$.
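The maximal function of Theorem 2.3 is easy to compute for a finitely supported sequence. The sketch below assumes the centred form $M u(n) = \sup_{h \ge 0} (2h+1)^{-1} \sum_{i=n-h}^{n+h} |u_i|$ and extends the input by zero outside its index range; the function name is illustrative.

```python
def maximal_function(u):
    """Centred discrete maximal function of a sequence supported on 0..len(u)-1.

    The sequence is extended by zero outside this range, and the result is
    returned only at the indices 0..len(u)-1 (M u is also nonzero outside).
    """
    n = len(u)
    # prefix[j] = |u_0| + ... + |u_{j-1}|, so window sums cost O(1).
    prefix = [0.0]
    for v in u:
        prefix.append(prefix[-1] + abs(v))
    result = []
    for i in range(n):
        best = 0.0
        # Once the window covers the whole support, enlarging it only lowers
        # the average, so radii h <= n - 1 are enough.
        for h in range(n):
            lo = max(0, i - h)
            hi = min(n - 1, i + h)
            window_sum = prefix[hi + 1] - prefix[lo]
            best = max(best, window_sum / (2 * h + 1))
        result.append(best)
    return result
```

For a single spike of height 3 at the centre of five sites, the values are `[0.6, 1.0, 3.0, 1.0, 0.6]`: the spike is spread out with a slow decay, which is typical of why $M$ is bounded on $\ell^p$ for $p > 1$ but not on $\ell^1$.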

2.3. Martingale inequalities.
Given a decreasing sequence of σ-algebras F 0 ⊃ F 1 ⊃ . . . on a probability space, a sequence of reverse martingale differences with respect to this filtration is a sequence of random variables D k such that E(D k | F k+1 ) = 0. This is a kind of one-sided independence condition. Moment inequalities, similar to classical inequalities for independent random variables, hold in this setting.
We will use the following Burkholder-Rosenthal inequality: Theorem 2.4. For any Q 2, there exists a constant C such that any sequence of reverse martingale differences satisfies As a consequence, The first statement is due to Burkholder [Bur73, Theorem 21.1]. The second (much weaker) statement readily follows, and is sufficient for our purposes. One interest of the second formulation is that the two terms look the same: in the applications we have in mind, we will control simultaneously E(D 2 k | F k+1 ) L ∞ and E(|D k | Q | F k+1 ) L ∞ . For Q ∈ (1, 2), the (easier) analogue of the above theorem is the inequality of von Bahr and Esseen [vBE65] stating that However, we will rather need a version of this inequality involving weak L Q norms (since the main part of Theorem 1.4 in the case q < 2 is the inequality (1.1), controlling the weak L q norm of S n f ). Such an inequality holds: Theorem 2.5. For any Q ∈ (1, 2), there exists a constant C such that any sequence of reverse martingale differences D k satisfies Proof. This is a consequence of existing results in the literature, as we now explain. First, the L Q,w -seminorm is not a norm, which can be a problem for the proof of inequalities involving an arbitrary number of terms. However, it is equivalent to a true norm, the Lorentz norm L Q,∞ (see [SW71, Paragraph V.3])), so this is not an issue. [Bra94,Theorem 7(1) on Page 39] proves that the space L Q,w = L Q,∞ satisfies the von Bahr-Esseen property of index Q, i.e., the inequality (2.4) holds whenever the D k are independent centered random variables.
Consider now a sequence of reverse martingale differences D k . Let (D k ) be independent random variables, such thatD k is distributed as D k . [ASW11, Theorem 6.1] shows that As the random variablesD k are independent, they satisfy (2.4) by [Bra94]. The same inequality follows for D k .
2.4. Miscellaneous. We use repeatedly the following classical lemma, which is readily proved by a discrete integration by parts.
Lemma 2.6. Let c h be a sequence of nonnegative real numbers with h>n We also use the following fact: If c n is a summable sequence of nonnegative real numbers and p 1, Indeed, this follows from the convexity of x → x p for c n = 1, and the general case follows.

Moment bounds
Our goal in this section is to prove Theorem 1.4. We therefore fix a Young tower with tail n = O(n −q ) for some q > 1.
The convolution of two sequences $(c_n)_{n \ge 0}$ and $(d_n)_{n \ge 0}$ is the sequence $c \star d$ given by $(c \star d)_n = \sum_{k=0}^{n} c_k d_{n-k}$. We write $c^{(q)}_n$ for a generic sequence of the form $C/(n + 1)^q$, for a generic $C$ that can change from one occurrence to the next, even on the same line, but only finitely many times in the whole article. We use repeatedly the fact that the convolution of two such sequences is bounded by a sequence of the same form. This fact reads
\[ c^{(q)} \star c^{(q)} \le c^{(q)}. \]
(Note that the sequence $c^{(q)}_n$ on the right is not the same as the sequences on the left, in accordance with the above convention.)
We wish to understand the moments of Birkhoff sums $S_n f$. Since martingale inequalities are very powerful, we will reduce to such martingales in the most naive way. Let $\mathcal{F}_k = T^{-k}(\mathcal{F}_0)$ (where $\mathcal{F}_0$ is the Borel $\sigma$-algebra); a function is $\mathcal{F}_k$-measurable if and only if it can be written as $u \circ T^k$ for some function $u$. We have
\[ S_n f = \sum_{k=0}^{n-1} A_k \circ T^k + E(S_n f \mid \mathcal{F}_n) \]
for some functions $A_k$ that we now describe. Note that this is a decomposition as a sum of reverse martingale differences, hence the moments of $S_n f$ will essentially be controlled by those of $A_k$.
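The stability of sequences of the form $C/(n+1)^q$ under convolution can be checked numerically. The sketch below computes the truncated convolution and the largest ratio $(c \star c)_n / c_n$ over a finite range, for $c_n = 1/(n+1)^q$; the helper names are illustrative, and a finite check is of course not a proof.

```python
def convolve(c, d):
    """Convolution (c * d)_m = sum_{k=0}^{m} c_k d_{m-k}, for m below the common length."""
    length = min(len(c), len(d))
    return [sum(c[k] * d[m - k] for k in range(m + 1)) for m in range(length)]


def generic_sequence(q: float, length: int, const: float = 1.0):
    """First terms of the sequence const / (n + 1)^q."""
    return [const / (n + 1) ** q for n in range(length)]


def convolution_ratio(q: float, length: int) -> float:
    """Largest value of (c * c)_n / c_n for n < length, where c_n = 1 / (n + 1)^q."""
    c = generic_sequence(q, length)
    conv = convolve(c, c)
    return max(conv[n] / c[n] for n in range(length))
```

For $q = 2$ the ratio stays bounded: from $(n+1)/((k+1)(n-k+1)) \le 1/(k+1) + 1/(n-k+1)$ one gets the explicit bound $4 \sum_{j \ge 1} j^{-2} < 7$, independently of the range.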
Let L be the transfer operator, it satisfies E(u | F 1 ) = (Lu) • T . Hence, for k < n, Let us define a function F k = k i=0 L i f , this is the main function to understand. Lemma 3.1. If x is at height h and T x ∈ ∆ 0 , then First, we estimate 1 ∆ 0 F k . We use the formalism of renewal transfer operators introduced in Paragraph 2.1. As in (2.1), we write 1 ∆ 0 L n = ℓ+b=n T ℓ B b , where T ℓ counts the returns to the basis at time ℓ, and B b is an average over preimages at time b that did not return to the basis in between. Write Π for the projection on constant functions on ∆ 0 . Proposition 2.1 shows that the operator E ℓ = T ℓ − ΠT ℓ Π satisfies E ℓ c (q) ℓ . We get ℓ , the second sum is uniformly O(1). For the first sum, the function ΠT ℓ ΠB b f is constant by definition, and can be written as where t ℓ is uniformly bounded, and u b (f ) is summable (with sum at most |f | dµ/µ(∆ 0 )).
Consider now an arbitrary $x$, at height $h < k$, and with $Tx \in \Delta_0$. Then $F_k(x) = F_{k-h}(\pi x) + O(h)$ where $\pi x$ is the projection of $x$ in the basis of the tower, i.e., the unique preimage of $x$ under $T^h$. We get For each $b$, there are at most $h + 1$ values of $\ell$ for which $k - h < \ell + b \le k + 1$. Since $t_\ell$ is bounded, we obtain
3.1. The case $q > 2$. In this paragraph, we prove Theorem 1.4 in the case $q > 2$. It suffices to prove the desired estimate for $p = 2q - 2$, since the other estimates follow using (1.2).
We start from the decomposition First, we control the last term, which is easier. Write Q = 2q − 2, we have One can use transfer operators techniques, or argue directly as in [MN08]: since the speed of decay of correlations against bounded functions is O(1/n q−1 ) by [You99], we have Hence, L k f L Q C/k (q−1)/Q = k −1/2 , giving E(S n f | F n ) L Q Cn 1/2 . Then, we turn to the first sum It is a sum of reverse martingale differences, hence we may apply Burkholder-Rosenthal inequality in the form of (2.3): Consider a point x ∈ X. If it does not belong to ∆ 0 , it has a unique preimage z, and moreover A k (z) = 0. Hence, L(|A k | r )(x) = 0. Suppose now x ∈ ∆ 0 . Let z α denote its preimages (with respective heights h α − 1).
We have proved that We use this inequality to estimate the two sums on the right hand side of (3.4). For r = 2, the above integral is uniformly bounded since ϕ has a moment of order 2. Hence, the first sum in (3.4) is bounded by n Q/2 = n q−1 . For r = Q = 2q − 2, the above integral is bounded by k q−2 thanks to Lemma 2.6. Summing over k, it follows that the second sum in (3.4) is bounded by n q−1 , as desired.
3.2. The case q < 2. In this paragraph, we prove Theorem 1.4 in the case q ∈ (1, 2). Again, it suffices to prove the estimate (1.1) regarding the weak q-moment, i.e., S n f L q,w Cn 1/q , since the other estimates follow using (1.3). We start again from the decomposition S n f = A k • T k + E(S n f | F n ). We rely on the von Bahr-Esseen result for weak moments given in Theorem 2.5, for Q = q.
First, we control the last term, as above: we have E(S n f | F n ) L q n k=1 L k f L q . Moreover, we have as above L k f L q C/k (q−1)/q . Summing over k, As the weak L q -norm is dominated by the strong L q -norm, this is the desired control. Now, we turn to the contribution of A k . We want to estimate Hence, for s larger than a fixed constant, This shows that A k L q,w is uniformly bounded. Summing over k and using Theorem 2.5, we get Cn, as desired.
3.3. The case q = 2. In this paragraph, we prove Theorem 1.4 in the case q = 2. Contrary to the previous cases, it is not sufficient to prove the result at the critical exponent p = 2, one should also control all p > 2. The arguments in the proof of the case q > 2 (notably Burkholder's inequality (3.4) combined with (3.5)) give, for a general p 2, First, we have |L k f | p dµ C/k since the speed of decay of correlations is 1/k. Hence, L k f L p k −1/p and the last term in (3.6) is bounded by n p−1 .
Let us now deal with p = 2. Lemma 2.6 gives ∆ 0 (ϕ∧ k) 2 dµ log k since we are precisely at the critical exponent for which there is an additional logarithmic factor. Summing over k and using (3.6), we obtain S n f 2 L 2 n log n as desired. Consider then p > 2. Again, ∆ 0 (ϕ ∧ k) 2 dµ log k, hence the first sum in (3.6) gives a contribution C(n log n) p/2 , which is bounded by Cn p−1 as p/2 < p − 1. For the second sum in (3.6), Lemma 2.6 gives (ϕ ∧ k) p dµ Ck p−2 . Summing over k, we get a bound n p−1 .

Concentration bounds
In this section, we prove Theorem 1.9 about concentration inequalities in Young towers with tail n = O(n −q ) for some q > 1. As before, we write c (q) n for a generic sequence that is O(n −q ).
Consider a general function K(x 0 , x 1 , . . . ) which is separately Lipschitz in each variable, with corresponding constants Lip i (K). Fix any reference point x * in the space.
To study the magnitude of $K(x, Tx, \dots)$, the idea is to decompose it as a sum of reverse martingale differences. We consider $K$ as a function defined on the space $X^{\mathbb{N}}$, endowed with the probability measure $\mu \otimes \delta_{Tx} \otimes \delta_{T^2 x} \otimes \cdots$. Let $\mathcal{F}_k$ be the σ-algebra generated by indices starting with $k$ (i.e., a function $f(x_0, x_1, \dots)$ on $X^{\mathbb{N}}$ is $\mathcal{F}_k$-measurable if it does not depend on $x_0, \dots, x_{k-1}$). Let $K_k(x_k, x_{k+1}, \dots) = E(K \mid \mathcal{F}_k)(x_k, x_{k+1}, \dots) = \sum_{T^k x = x_k} g^{(k)}(x) K(x, Tx, \dots, T^{k-1}x, x_k, x_{k+1}, \dots)$ (4.1). This function plays the role of the function $F_k$ (defined after (3.2)) for Birkhoff sums, and is the main object to understand. As in the proof of Lemma 3.1, we want to express $K_k(x_k, \dots)$, for $x_k \in \Delta_0$, using the transfer operator restricted to the basis, i.e., $T_n$. Define for $i \le k$ a function $w_i$ on the basis by where for each $y$ we define $j(y)$ as the last time in $[0, i-1]$ for which $T^{j(y)}(y) \in \Delta_0$. If there is no such time, then $j(y) = -1$. The idea is that, for each preimage $y$ of $x$ under $T^i$, we replace its last excursion outside of $\Delta_0$ by the trivial sequence $x_*, \dots, x_*$.
A simple telescoping argument then gives: Indeed, in the expression (4.1), if one starts replacing successively each excursion outside of $\Delta_0$, one ends up adding sums of the functions $w_i(x)$, and the remaining term (where all excursions have been replaced) is $\sum_{T^k x = x_k} g^{(k)}(x) K(x_*, \dots, x_*, x_k, \dots)$, which reduces to $K(x_*, \dots, x_*, x_k, \dots)$ since $\sum_{T^k x = x_k} g^{(k)}(x) = 1$ as the measure is invariant. The above expression also reads We will be able to use it since we know a lot about the operators $T_n$ (their properties, expressed in Proposition 2.1, were already at the heart of the proof of Lemma 3.1), but we first need to understand $w_i$ more properly.
Proof. First, we control the supremum of $w_i$. Write $w_i(x) = \sum_y g^{(i)}(y) H(y)$, then $|H(y)| \le \sum_{\ell = j(y)+1}^{i-1} \mathrm{Lip}_\ell(K)$. The sum of $g^{(i)}(y)$ over those points with $j(y) < \ell$ is $\sum_{T^{i-\ell} z = x} g^{(i-\ell)}(z)$, where the sum is restricted to those points $z$ that do not come back to the basis before time $i - \ell$. By bounded distortion, this is comparable to $\mu\{\varphi > i - \ell\} \le c^{(q)}_{i-\ell}$.
We estimate now the Lipschitz constant of $w_i$. Write for $x, x' \in \Delta_0$ where we have paired together the preimages $y$ and $y'$ of $x$ and $x'$ under $T^i$ that belong to the same cylinder of length $i$. For the second sum, bounded distortion gives $|g^{(i)}(y) - g^{(i)}(y')| \le Cd(x, x') g^{(i)}(y')$, hence the Lipschitz norm of this sum is at most $C\|w_i\|_\infty$, which has already been controlled in (4.3). For the first sum, we have where $\Psi_a(z) = \rho^{\mathrm{Card}\{0 \le t < a,\ T^t z \in \Delta_0\}}$: this function measures the expansion of the map $T^a$ applied to $z$, since each return to the basis gives an expansion factor of $\rho^{-1} > 1$ by definition of the distance. Using bounded distortion, we get

By [CG12, Lemma 4.4], the sequence
n . The desired bound for the Lipschitz constant of w i follows.
Then, we turn to the analogue of Lemma 3.1.
Lemma 4.2. If $x_k$ is at height $h$ and $x_{k+1} \in \Delta_0$, then When $h > k$, the first sum vanishes, and the second one reduces to $\sum_{a=0}^k \mathrm{Lip}_a(K)$ since $\mathrm{Lip}_a(K) = 0$ for $a < 0$.
If all the $\mathrm{Lip}_a(K)$ are of order 1 (which is the case for instance with Birkhoff sums), it is easy to check that the expression in the lemma reduces to $O(1 + h \wedge k)$ as in Lemma 3.1.
Proof. The case $h > k$ is easy (just substitute each variable in the expression of $K_k(x_k, \dots)$ with the corresponding variable in $K_{k+1}(x_{k+1}, \dots)$). Let us deal with the more interesting case $h \le k$.
We first prove the inequality Lip a (K).
We replace successively all the variables with index in $(k-h, k]$ in the expressions of $K_k(x_k, \dots)$ and $K_{k+1}(x_{k+1}, \dots)$ with $x_*$, introducing an error at most $\sum_{a=k-h+1}^k \mathrm{Lip}_a(K)$ that corresponds to the last term in (4.5). Letting we may then work with $\tilde K$ instead of $K$. It satisfies $\mathrm{Lip}_a(\tilde K) \le \mathrm{Lip}_a(K)$ for $a \le k - h$, and $\mathrm{Lip}_a(\tilde K) = 0$ for $a > k - h$. Let $w_i$ be the corresponding functions for $\tilde K$, and let $x = \pi x_k$ be the projection of $x_k$ in the basis of the tower. We get from (4.2) We write $T_\ell = \Pi T_\ell \Pi + E_\ell$, where $\Pi$ is the projection on constant functions, and $\|E_\ell\| \le c^{(q)}_\ell$ by Proposition 2.1. We have

By (3.1), this is bounded by
j , which is bounded by (4.5) (to see this, in (4.5), take b = 0 in the first sum over b, and then j = k − h − a in the next sum). In the same way, we have which is again bounded by (4.5) (up to a shift of one in the indices, take b = 0 in the first sum of (4.5) and j = k − a in the second sum).
We turn to the main terms, coming from $\Pi T_\ell \Pi$. We have $\Pi T_\ell \Pi w_i = t_\ell u_i$, for some scalar sequences $t_\ell$ and $u_i$. Moreover, $u_i$ is bounded by $\|w_i\|$, and $|t_\ell - t_{\ell+1}| \le c^{(q)}_\ell$ by Proposition 2.1. The resulting term is Bounding $|u_i|$ by $\sum_{a+b=i} \mathrm{Lip}_a(K) c^{(q)}_b$ and $|t_{j+1} - t_j|$ with $c^{(q)}_j$, we readily check that all those terms are bounded by (4.5).
This concludes the proof of (4.5). To conclude, we should show that the coefficient of $\mathrm{Lip}_a(K)$ in this expression is bounded by min (h + 1)c as the sequences that are $O(1/n^{q-1})$ are stable under convolution. This proves the upper bound c For the other one, note that j . From this point on, one can continue the computation as above, getting in the end the bound (h + 1)c
Remark 4.3. The article [CG12] already proved concentration estimates in Young towers, but only for $q > 2$. In this case, the estimates were not as good as those in Theorem 1.9. Moreover, all the estimates started diverging when $q \le 2$. There are three main differences in the current approach that make it possible to improve upon [CG12]:
• The decomposition (4.2) of $K_k$, where one replaces one excursion at a time in the definition of $w_i$, is more efficient than the corresponding decomposition of [CG12] where one only replaces one variable at a time (this creates some useless redundancy in the estimates, which is not a problem when $q > 2$ but causes divergence of the estimates when $q \le 2$).
• The main difference between the current paper and [CG12] is that, in Lemma 4.2, we compare directly $K_k$ to $K_{k+1}$. On the contrary, in [CG12], $K_k$ and $K_{k+1}$ are compared to explicit integral quantities (see for instance Lemma 2.3 there). This is more intuitive and natural, since it expresses the mixing properties of the system. However, when $q \le 2$, the convergence towards these integrals is rather slow, making again the estimates diverge. In the proof of Lemma 4.2, we do not claim that $K_k$ is close to any explicit or meaningful quantity, only that it is close to $\sum t_{k-i} u_i$. This is sufficient to prove that $K_k$ is close to $K_{k+1}$ since $t_n$ is close to $t_{n+1}$ by Proposition 2.1. Both are also close to $\lim t_i$ if $n$ is large enough, and this is essentially what is used in [CG12], but this gives a worse estimate.
• In the case $q > 2$, the main new ingredient compared to the techniques of [CG12] is Lemma 4.4 below.
We can now deduce concentration bounds in the different situations we considered for moment bounds.
4.1. The case q > 2. In this paragraph, we prove Theorem 1.9 in the case q > 2. As we explained after the statement of this theorem, it suffices to prove the result for p = 2q − 2.
In this situation, we use (4.4) in the form Lip a (K), i.e., we always use the same term $c^{(q-1)}_{k-h-a}$ in the minimum in (4.4). Let us start the proof of the theorem. The quantity $K - E(K)$ can be decomposed as $\sum_{k \ge 0} (K_k - K_{k+1})$. Since this is a sum of reverse martingale differences, we may use Burkholder-Rosenthal inequality in the form of (2.3), to obtain a bound If $x_{k+1}$ is not in the basis of the tower, then $E(|D_k|^r \mid \mathcal{F}_{k+1})(x_{k+1}, \dots) = 0$ and there is nothing to do. Assume now that $x_{k+1}$ is in the basis. Let $z_\alpha$ denote its preimages, with respective heights $h_\alpha$. We have With (4.6), we get Using the inequality $(X + Y)^r \le C(X^r + Y^r)$ to separate the two sums, we get two different terms. We should then sum over $k$, and get a bound in terms of $\sum \mathrm{Lip}_a(K)^2$. First, we deal with the first sum Since the sequence $c^{(q-1)}_n$ is summable, we have by (2.5) Summing over $k$, we get a term The sum over $h$ factorizes out. Then, for each $a$, the sum over $\ell$ gives a finite contribution since $c^{(q-1)}$ is summable. We are left with $\sum_a \mathrm{Lip}_a(K)^r$, which is bounded by $\bigl(\sum_a \mathrm{Lip}_a(K)^2\bigr)^{r/2}$ as desired.
Then, we deal with the second sum $\sum_{a=k-h+1}^k \mathrm{Lip}_a(K)$. Summing over $k$, the corresponding term is We need to treat separately the cases $r = 2$ and $r = 2q - 2$. For $r = 2$, we simply use Cauchy-Schwarz inequality: We can factorize out $\sum_h \mu(\varphi = h + 1) h^2$, which is finite since $q > 2$, by Lemma 2.6. We are left with $C \sum_a \mathrm{Lip}_a(K)^2$ as desired.
For the case r = 2q − 2, we should prove an inequality It turns out that this inequality is more difficult than the previous ones. It is given in Lemma 4.4 below. With this lemma, the proof is complete.
Lemma 4.4. Let $q > 2$. Consider a sequence $a_n \ge 0$ with $\sum_{n \ge N} a_n = O(N^{-q})$. There exists a constant $C$ such that, for any sequence $(u_n) \in \ell^2(\mathbb{Z})$,
Although the statement of the lemma is completely elementary, this result is not trivial, even for $a_n = 1/n^{q+1}$ (as is maybe indicated by the fact that it fails for $q = 2$). In particular, we have not been able to find a direct proof: we need to resort to maximal inequalities and interpolation.
Proof. We associate to a sequence $u_n$ the sequence $v(n, h) = \frac{\sum_{i=n-h}^{n+h} u_i}{(2h+1)^{1/2}}$.
We consider $v$ as a function on the space $\mathbb{Z} \times \mathbb{N}$ endowed with the measure $\nu(n, h) = (h+1)^{q-1} a_h$.
By Lemma 2.6, we have $\sum_{h \ge t} (h+1)^{q-1} a_h \le C/(t+1)$. Hence, This shows that $v$ is bounded in $L^\infty$ and in weak $L^2$ by $C\|u\|_{\ell^2}$. One could deduce boundedness in any $L^p$ for $2 < p < \infty$ by using classical interpolation arguments, but it is simpler to use the formula (1.3): we get Taking $p = 2q - 2$, we get The powers of $h$ cancel on the right, and we are left with the statement of the lemma. This is the desired moment estimate.
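Two steps of this proof are short enough to spell out. The following is a sketch based only on the definitions of $v$ and $\nu$ given in the proof: the $L^\infty$ bound comes from the Cauchy-Schwarz inequality over the $2h+1$ indices, and for $p = 2q-2$ the weight $(h+1)^{q-1}$ of $\nu$ is compensated by the normalization $(2h+1)^{p/2}$ of $v$.

```latex
|v(n,h)| = \frac{\bigl|\sum_{i=n-h}^{n+h} u_i\bigr|}{(2h+1)^{1/2}}
  \le \frac{(2h+1)^{1/2}\bigl(\sum_{i=n-h}^{n+h} u_i^2\bigr)^{1/2}}{(2h+1)^{1/2}}
  \le \|u\|_{\ell^2},
\qquad
\int |v|^{2q-2}\,d\nu
  = \sum_{n,h} a_h\, \frac{(h+1)^{q-1}}{(2h+1)^{q-1}}\,
    \Bigl|\sum_{i=n-h}^{n+h} u_i\Bigr|^{2q-2}.
```

Since $(h+1)/(2h+1) \in [1/2, 1]$, the right-hand side is comparable to $\sum_{n,h} a_h \bigl|\sum_{i=n-h}^{n+h} u_i\bigr|^{2q-2}$.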
Let us now start the proof of (1.4). Thanks to Proposition 2.5, the decomposition We have $D_k(x) = 0$ if $Tx \notin \Delta_0$, and otherwise Lemma 4.2 gives the bound where $h = h(x)$. We should bound the weak $L^q$ norm of both terms on the right to conclude. Let us denote them by $U_k(x)$ and $V_k(x)$.
We start with $V_k$. Fix some $s \ge 0$, let $h_0(k)$ be minimal such that $\sum_{a=k-h_0+1}^k \mathrm{Lip}_a(K) \ge s$. Then where $M(k)$ is the maximal function associated to $\mathrm{Lip}_i(K)$, i.e., , by Theorem 2.3. This is the desired upper bound. We turn to $U_k$. We have Lip a (K) min (h + 1)c The next lemma shows that this is bounded by $C \sum \mathrm{Lip}_i(K)^q$ (set $\varepsilon = q - 1$, $n = k - h$, $i = a$ and $u_i = \mathrm{Lip}_i(K)$ to reduce to this statement). This concludes the proof.
Lemma 4.5. Let $q > 1$ and $\varepsilon > 0$. Consider a sequence $a_n \ge 0$ with $\sum_{n \ge N} a_n = O(N^{-q})$. There exists a constant $C$ such that, for any sequence $(u_n) \in \ell^q(\mathbb{Z})$,
Proof. We proceed as in the proof of Lemma 4.4. Define a sequence v(n, h) = 1 (1 + h) 1−ε i∈Z u i · min h + 1 1 + |n − i| 1+ε , We consider it as a function on the space $\mathbb{Z} \times \mathbb{N}$ with the measure $\nu(n, h) = a_h (1 + h)^{q(1-\varepsilon)}$. We have for any $n \in \mathbb{Z}$ i∈Z min h + 1 1 + |n − i| 1+ε , This shows that the operator $A : u \mapsto v$ is bounded from $\ell^\infty(\mathbb{Z}, \mu)$ (where $\mu$ is the counting measure) to $\ell^\infty(\mathbb{Z} \times \mathbb{N}, \nu)$. Moreover, writing $m = n - i$, As we have seen above, the last sum over $m$ is $O((1 + h)^{1-\varepsilon})$. Hence, the sum over $h$ and $m$ reduces to $\sum_h a_h (1 + h)^{q(1-\varepsilon)}$, which is finite by Lemma 2.6. This shows that $\|v\|_{\ell^1(\nu)} \le C\|u\|_{\ell^1(\mu)}$.
The operator $A : u \mapsto v$ is bounded from $\ell^r(\mu)$ to $\ell^r(\nu)$ for $r = 1$ and $r = \infty$. By interpolation (see Theorem 2.2), it is also bounded from $\ell^q(\mu)$ to $\ell^q(\nu)$. This is the desired inequality.
4.3. The case q = 2. In this paragraph, we prove Theorem 1.9 in the case $q = 2$. As we explained after the statement of this theorem, it suffices to prove the result for $p \ge 2$. We follow essentially the same steps as in the $q > 2$ case. We start with Burkholder-Rosenthal inequality (2.3) Moreover, for $r \in \{2, p\}$, we have (4.10) Let us first consider the contribution of the first line when we sum over $k$. For $r = 2$, Lemma 4.5 shows that the resulting term is bounded by $C \sum \mathrm{Lip}_a(K)^2$. Its contribution to Burkholder-Rosenthal inequality is therefore at most This bound is compatible with the statement of the theorem. For $r = p$, we write Using again Lemma 4.5, it follows that the contribution of this term to Burkholder-Rosenthal inequality is at most $\sum \mathrm{Lip}_i(K)^2 \bigl(\sum \mathrm{Lip}_i(K)\bigr)^{p-2}$ as desired.
Let us now turn to the second line of (4.10). We define a sequence a∈Z Lip a (K). Let us control its weak $L^2$ norm. Let $s \ge 0$. For fixed $k$, let $h_0(k)$ be the smallest $h$ such that $\sum_{a=k-h+1}^k \mathrm{Lip}_a(K) \ge s$. Then where $M(k)$ is the maximal function associated to $\mathrm{Lip}_a(K)$, defined in (4.9). By Theorem 2.3, it satisfies $\sum_k M(k)^2 \le C \sum \mathrm{Lip}_a(K)^2$. Hence, we have proved that the weak $L^2$ norm of $v$ is bounded by $C\bigl(\sum \mathrm{Lip}_a(K)^2\bigr)^{1/2}$. Using the bounds on the weak $L^2$ norm of $v$ and on its $L^\infty$ norm, one deduces a bound on its strong $L^2$ norm as in (4.8), and on its strong $L^p$ norm for $p > 2$ as in (4.7). These bounds read: For $p = 2$, we deduce directly that the contribution of the second line of (4.10) to Burkholder-Rosenthal inequality is bounded as in the statement of the theorem.
For $p > 2$, we also obtain that the contribution of this line, for $r = p$, is bounded as desired. It remains to check the contribution of this line with $r = 2$. Writing $u_a = \mathrm{Lip}_a(K)$, we should prove that Since this equation is homogeneous, it suffices to prove it when $\sum u_a^2 = 1$. In this case, writing $x = \sum u_a \ge 1$, it reduces to the inequality $(1 + \log x)^{p/2} \le Cx^{p-2}$, which is trivial on $[1, \infty)$.
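For completeness, here is one way to see the final elementary inequality. This is a sketch in which the auxiliary function $g$ is our own notation:

```latex
g(x) = \frac{(1+\log x)^{p/2}}{x^{p-2}}, \qquad g(1) = 1, \qquad
\lim_{x\to\infty} g(x) = 0 \quad (p > 2),
\qquad\text{hence}\qquad
(1+\log x)^{p/2} \le \Bigl(\sup_{[1,\infty)} g\Bigr)\, x^{p-2}.
```

The supremum is finite because $g$ is continuous on $[1, \infty)$ and tends to $0$ at infinity.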

Appendix A. Speed of convergence to stable laws
In this appendix, our goal is to prove Proposition 1.3. To do so, we estimate the speed of convergence of the Birkhoff sums to the stable law, first on the basis ∆ 0 of the tower using the Nagaev-Guivarc'h spectral method. Then, we induce back those estimates to the whole tower. Those ideas are classical: the first step comes from [AD01], the second step from [MT04] (see [Gou13] for a general explanation of the method). However, since we want quantitative estimates, we need to go beyond the results of these articles.
Let $Y = \Delta_0$ be the basis of the Young tower. We denote by $T_Y : Y \to Y$ the induced map on the basis, by $\mu_Y = \mu_{|Y}/\mu(Y)$ the induced probability measure, by $S^Y_n$ the Birkhoff sums for $T_Y$, and by $\varphi : Y \to \mathbb{N}$ the first return time to $Y$.
We define a function $f$ on the tower, by $f = 1 - 1_Y/\mu(Y)$, so that $\int f \, d\mu = 0$. The induced function on the basis of the tower is by definition $f_Y(x) = \sum_{k=0}^{\varphi(x)-1} f(T^k x) = \varphi(x) - 1/\mu(Y)$. Denote by $L$ the transfer operator associated to $T_Y$, and define a family of perturbed transfer operators $L_t(u) = L(e^{itf_Y} u)$. Their interest is that $\int e^{itS^Y_n f_Y} \, d\mu_Y = \int L_t^n 1 \, d\mu_Y$ (A.1). Hence, spectral properties of $L_t$ make it possible to understand the characteristic function of $S^Y_n f_Y$, and therefore its closeness to the limiting stable law.
Lemma A.1. The family of operators $t \mapsto L_t$ is $C^1$.
Proof. We omit the standard argument which shows in fact that the family is $C^q$, see for instance [AD01, Theorem 2.4].
The unperturbed operator $L = L_0$ has a simple eigenvalue at 1, and the rest of its spectrum is contained in a disk of strictly smaller radius. This spectral description persists for small $t$, see [Kat66]. Denote by $\lambda_t$ the leading eigenvalue of $L_t$, by $\Pi_t$ the corresponding (one-dimensional) spectral projection, and by $Q_t = L_t - \lambda_t \Pi_t$ the part of $L_t$ corresponding to the rest of the spectrum. All those quantities depend in a $C^1$ way on $t$, by Lemma A.1. Moreover, for small $t$, we have $L_t^n = \lambda_t^n \Pi_t + Q_t^n$ with $\|Q_t^n\| \le Cr^n$ (A.2) for some fixed $r < 1$. The main contribution in this equation comes from the perturbed eigenvalue $\lambda_t$.
Lemma A.2. We have for small t > 0 where c is a complex number with ℜc < 0.
Let $Z_Y$ be the real probability distribution whose characteristic function is given for $t > 0$ by $E(e^{itZ_Y}) = e^{ct^q}$, where $c$ is given by Lemma A.2. It is a totally asymmetric stable law of index $q$. We can now estimate the speed of convergence of $S^Y_n f_Y$ to $Z_Y$:
Proposition A.3. There exists $C > 0$ such that for any $n > 0$ and for any $s \in \mathbb{R}$,
In particular, we recover the (already known) convergence of $S^Y_n f_Y/n^{1/q}$ to $Z_Y$, the novelty being the control on the speed of convergence. Below, in Proposition A.4 and Theorem A.5, we also recover known stable limits, with additional controls on the speed of convergence.
Proof. The quantity to estimate is the $L^\infty$-norm of the difference between the distribution functions of $S^Y_n f_Y/n^{1/q}$ and $Z_Y$. Berry-Esseen's lemma (see for instance [Fel66, Lemma XVI.3.2]) ensures that, for any $M > 0$, this quantity is bounded by where $C$ is a universal constant, and $\varphi_n$ and $\psi$ denote respectively the characteristic functions of $S^Y_n f_Y/n^{1/q}$ and $Z_Y$. We estimate this integral, taking $M = M_n = \alpha_0 n^{1/q}$ for some suitably small $\alpha_0$.
First, for $t < 1/n$, we have In the same way, $|\psi(t) - 1| \le Ct^q$. Hence, Now, we turn to the interval $t \in [1/n, M_n]$. Combining the formula (A.1) for $\varphi_n$ and the spectral expansion (A.2) of $L_t^n$, we get $\varphi_n(t) = \lambda(t/n^{1/q})^n u(t/n^{1/q}) + r_n(t/n^{1/q})$, where $r_n$ is exponentially small, $u$ is a $C^1$ function defined close to 0 and the asymptotic expansion of $\lambda$ is given in Lemma A.2. The contribution of $r_n$ to the integral (A.3) is exponentially small (this is why we had to discard the interval $[0, 1/n]$). We can write $\lambda(s) = e^{cs^q + B(s)}$ where $B(s) = O(s^{q+\varepsilon})$, by Lemma A.2. Hence, $\lambda(t/n^{1/q})^n = e^{n(ct^q/n + B(t/n^{1/q}))} = e^{ct^q} e^{nB(t/n^{1/q})} = \psi(t) e^{nB(t/n^{1/q})}$.
Hence, $|\psi(t) e^{nB(t/n^{1/q})}| \le e^{\Re(c) t^q} e^{C\alpha_0^\varepsilon t^q}$. If $\alpha_0$ is small enough, this is bounded by $e^{-at^q}$, for some $a > 0$. Since the function $u$ is $C^1$ with $u(0) = 1$, it follows that Finally, in $I_2$, we use the inequality $|e^s - 1| \le |s| e^{|s|}$, to get a bound $I_2 \le \int_0^{M_n} |\psi(t)| e^{n|B(t/n^{1/q})|} \frac{n|B(t/n^{1/q})|}{t} \, dt$.
As above, the factor $|\psi(t)| e^{n|B(t/n^{1/q})|}$ is bounded by $e^{-at^q}$. Moreover, the second factor is bounded by $Ct^{q+\varepsilon-1} n^{-\varepsilon/q}$. This gives $I_2 \le Cn^{-\varepsilon/q}$. Finally, we obtain a bound for (A.3) of the form $Cn^{-1/q} + Cn^{-\varepsilon/q}$, which is bounded by $Cn^{-\varepsilon/q}$ as $\varepsilon < 2 - q < 1$.
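The exponent bookkeeping behind the bound on $I_2$ can be written out as follows. This is a sketch, assuming only $|B(s)| \le Cs^{q+\varepsilon}$ and the exponential bound $e^{-at^q}$ on the first factor; the constant $C'$ is our own notation.

```latex
\frac{n\,|B(t/n^{1/q})|}{t}
  \le \frac{C n\,(t/n^{1/q})^{q+\varepsilon}}{t}
  = C\, t^{q+\varepsilon-1}\, n^{1-(q+\varepsilon)/q}
  = C\, t^{q+\varepsilon-1}\, n^{-\varepsilon/q},
\qquad
I_2 \le C n^{-\varepsilon/q} \int_0^\infty e^{-a t^q}\, t^{q+\varepsilon-1}\,dt
  \le C' n^{-\varepsilon/q}.
```

The integral converges at $0$ because $q + \varepsilon - 1 > 0$, and at infinity thanks to the factor $e^{-at^q}$.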
We can then lift the above bound to the original Birkhoff sums $S_n f$. Let $Z = \mu(Y)^{1/q} Z_Y$; it is again a (completely asymmetric) stable law of index $q$.
Proposition A.4. Let $\delta = \min((q-1)/(1 + 2q^2), \varepsilon/q) > 0$. There exists $C > 0$ such that for any $n > 0$ and for any $s \in \mathbb{R}$,
Proof. For $x \in Y$, the Birkhoff sums $S_n f(x)$ and $S^Y_{n\mu(Y)} f_Y(x)$ should be close (since a return to $Y$ takes on average $1/\mu(Y)$ iterates of $T$, both sums involve roughly the same number of iterations of $T$), and we know that $S^Y_{n\mu(Y)} f_Y(x)/(n\mu(Y))^{1/q}$ is close to $Z_Y$ in distribution. (We write $n\mu(Y)$ instead of its integer part for notational simplicity.) The result follows if we can show that the different errors are suitably small.
Define a function $H$ on $\Delta$ as follows: if $x$ is at height $i$ (i.e., it belongs to $\Delta_{\alpha,i}$ for some $\alpha$), let $\pi x = T^{-i} x$ be its unique preimage in the basis, and let $H(x) = \sum_{j=0}^{i-1} f(T^j \pi x)$.
Let $N(n, x)$ denote the number of returns to $Y$ of a point $x \in Y$ before time $n$. We get $S_n f(x) = S^Y_{N(n,x)} f_Y(x) + H(T^n x)$. We expect $N(n, x)$ to be close to $n\mu(Y)$, hence we decompose further as $S_n f(x) = S^Y_{n\mu(Y)} f_Y(x) + E_n(x) + F_n(x)$, where $E_n(x) = S^Y_{N(n,x)} f_Y(x) - S^Y_{n\mu(Y)} f_Y(x)$ and $F_n(x) = H(T^n x)$. Suppose that, for $u_n = n^{-\delta}$ for some $\delta \in (0, \varepsilon/q]$, we have $\mu_Y\{|E_n|/n^{1/q} > u_n\} \le Cu_n$ and $\mu_Y\{|F_n|/n^{1/q} > u_n\} \le Cu_n$ (A.4).
As $Z_Y$ has a bounded density, the probability that $\mu(Y)^{1/q} Z_Y$ belongs to the interval $[s - 2u_n, s)$ is bounded by $Cu_n$. Finally, we obtain $\mu_Y\{S_n f/n^{1/q} > s\} \le P(\mu(Y)^{1/q} Z_Y > s) + Cn^{-\varepsilon/q} + Cu_n$.
The lower bound is similar, and we obtain the conclusion of the proposition.
It remains to prove (A.4). We first deal with the bound involving $F_n$. We have $\mu(F_n \ge u_n n^{1/q}) = \mu(H \circ T^n \ge u_n n^{1/q}) = \mu(H \ge u_n n^{1/q})$.
The function $H$ can only be $\ge A$ on the set of points with height at least $A$. The set of points with height $i$ has measure $\mathrm{tail}_{i+1} \sim Ci^{-q}$, hence $\mu(H \ge A) \le CA^{-q+1}$. We get $\mu(F_n \ge u_n n^{1/q}) \le C(u_n n^{1/q})^{-(q-1)}$. This is bounded by $Cu_n$ if $u_n = n^{-\delta}$ with $\delta \le (q-1)/q^2$.
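The threshold on $\delta$ comes from a short exponent computation, sketched here under the assumptions $A \ge 1$ and $u_n = n^{-\delta}$; the constant $C'$ is our own notation.

```latex
\mu(H \ge A) \le \sum_{i \ge A} C i^{-q} \le C' A^{-(q-1)},
\qquad
(u_n n^{1/q})^{-(q-1)} = n^{(\delta - 1/q)(q-1)} \le n^{-\delta}
\iff \delta q \le \frac{q-1}{q}
\iff \delta \le \frac{q-1}{q^2}.
```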
In the first set, as $S^Y_{N(n,x)} f_Y(x)$ and $S^Y_{n\mu(Y)} f_Y(x)$ are separated by $u_n n^{1/q}$, one of them is distant from $S^Y_{n\mu(Y) - M_n} f_Y(x)$ by at least $u_n n^{1/q}/2$. Hence, the first set is included in By the invariance of the measure $\mu_Y$ under $T_Y$, the measure of this set is |S Y k f Y | u n n 1/q /2 .
The sequence $S^Y_i f_Y/i^{1/q}$ converges in distribution, but more is true: it follows from [CG07, Lemma 7.1 and proof of Theorem 2.10] that this sequence remains bounded in $L^1$, and that the weak $L^1$ norm of the corresponding maxima also remains bounded. Hence, the above equation is bounded by $CM_n^{1/q}/(u_n n^{1/q})$. This is bounded by $Cu_n$ if $u_n = n^{-\delta}$ with $\delta \le (1 - r)/(2q)$.
We can now conclude the proof of Proposition 1.3. The probability distribution $Z$ has heavy tails, since it is a stable law of index $q$: there exists $c > 0$ such that, for all $s \ge 1$, we have $P(Z > s) \ge cs^{-q}$. It follows from Proposition A.4 that $\mu_Y\{S_n f/n^{1/q} > s\} \ge cs^{-q} - Cn^{-\delta}$. This is $\ge cs^{-q}/2$ if $Cn^{-\delta} \le cs^{-q}/2$, which holds for $s \in [1, n^r]$ if $r < \delta/q$ and $n$ is large enough. In this range, it follows that $\mu\{S_n f/n^{1/q} > s\} \ge c' s^{-q}$.
Using (1.3) for the first equality, we have $\int |S_n f/n^{1/q}|^q \, d\mu = q \int_{s=0}^\infty s^{q-1} \mu\{|S_n f/n^{1/q}| \ge s\} \, ds \ge q \int_{s=1}^{n^r} s^{q-1} c' s^{-q} \, ds = c' q r \log n$.
This is the desired lower bound.
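The two computations behind this lower bound can be checked in a few lines. This is a sketch with the constants $c$, $c'$, $C$, $r$, $\delta$ as in the text:

```latex
s \le n^r \;\Longrightarrow\;
\tfrac{c}{2}\, s^{-q} \ge \tfrac{c}{2}\, n^{-rq} \ge C n^{-\delta}
\quad\text{as soon as } n^{\delta - rq} \ge 2C/c,
\qquad
q \int_1^{n^r} s^{q-1}\, c' s^{-q}\,ds
  = c' q \int_1^{n^r} \frac{ds}{s}
  = c' q r \log n.
```

The first condition holds for all large $n$ precisely because $r < \delta/q$, i.e. $\delta - rq > 0$.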
One can also deduce from Proposition A.4 a speed of convergence towards the stable law $Z$ on the whole space $(\Delta, \mu)$. Although this is not needed for Proposition 1.3, we include it for completeness:
Theorem A.5. Let $\delta = \min((q-1)/(1 + 2q^2), \varepsilon/q)$. There exists $C > 0$ such that for any $n > 0$ and for any $s \in \mathbb{R}$, $|\mu\{x : S_n f(x)/n^{1/q} > s\} - P(Z > s)| \le Cn^{-\delta}$.
Proof. Consider a set $\Delta_{\alpha,i}$, with its renormalized probability measure $\mu_{\alpha,i} = \mu_{|\Delta_{\alpha,i}}/\mu(\Delta_{\alpha,i})$. This measure is sent by $T^{h_\alpha - i}_*$ to a measure on $Y$, which is equivalent to $\mu_Y$, with a density bounded from above and from below, and with uniformly bounded Lipschitz constant. Proposition A.3 still works for this measure, with uniform constants, since all we need to apply the spectral argument is that the density is Lipschitz. It follows that Proposition A.4 also works for these measures. Adding the additional error coming from the $h_\alpha - i$ first steps needed to reach $Y$ (bounded by $(h_\alpha - i)/n^{1/q}$), we deduce: for $n > h_\alpha - i$, $|\mu_{\alpha,i}\{x \in \Delta_{\alpha,i} : S_n f(x)/n^{1/q} > s\} - P(Z > s)| \le C(n - (h_\alpha - i))^{-\delta} + C(h_\alpha - i)/n^{1/q}$.