Decay of correlations for non-Hölder observables

We consider the general question of estimating decay of correlations for non-uniformly expanding maps, for classes of observables which are much larger than the usual class of Hölder continuous functions. Our results give new estimates for many non-uniformly expanding systems, including Manneville-Pomeau maps, many one-dimensional systems with critical points, and Viana maps. In many situations, we also obtain a Central Limit Theorem for a much larger class of observables than usual. Our main tool is an extension of the coupling method introduced by L.-S. Young for estimating rates of mixing on certain non-uniformly expanding tower maps.


Introduction
In this paper, we are interested in mixing properties (in particular, decay of correlations) of non-uniformly expanding maps. Much progress has been made in recent years, with upper estimates being obtained for many examples of such systems. Almost invariably, these estimates are for observables which are Hölder continuous. Our aim here is to extend the study to much larger classes of observables.
Let $f : (X, \nu) \circlearrowleft$ be some mixing system. We define a correlation function
$$C_n(\varphi, \psi; \nu) = \left| \int (\varphi \circ f^n)\,\psi \, d\nu - \int \varphi \, d\nu \int \psi \, d\nu \right|$$
for $\varphi, \psi \in L^2$. The rate at which this sequence decays to zero is a measure of how quickly $\varphi \circ f^n$ becomes independent from $\psi$. It is well known that for any non-trivial mixing system, there exist $\varphi, \psi \in L^2$ for which correlations decay arbitrarily slowly. For this reason, we must restrict at least one of the observables to some smaller class of functions, in order to get an upper bound for $C_n$.
Here, we present a result which is general in the context of towers, as introduced by L.-S. Young ([Yo]). There are many examples of systems which admit such towers, and we shall see that under a fairly weak assumption on the relationship between the tower and the system (which is satisfied in all the examples we mention) we get estimates for certain classes of observables with respect to the system itself. One of the main strengths of this method is that these classes of observables may be defined purely in terms of their regularity with respect to the manifold; this contrasts with some results, where regularity is considered with respect to some Markov partition.
All of our results shall take the following form. Given a system $f : X \circlearrowleft$, a mixing acip $\nu$, and $\varphi \in L^\infty(X, \nu)$, $\psi \in \mathcal{I}$, for some class $\mathcal{I} = (Ri, \gamma)$ as above, we obtain in each example an estimate of the form
$$C_n(\varphi, \psi; \nu) \le \|\varphi\|_\infty \, C(\psi) \, u_n,$$
where $\|\cdot\|_\infty$ is the usual norm on $L^\infty(X, \nu)$, $C(\psi)$ is a constant depending on $f$ and $\psi$, and $(u_n)$ is some sequence decaying to zero with rate determined by $f$ and $R_\varepsilon(\psi)$. Notice that we make no assumption on the regularity of the observable $\varphi$; when discussing the regularity class of observables, we shall always be referring to the choice of the function $\psi$. (This is not atypical, although some existing results do require that both functions have some minimum regularity.) For brevity, we shall simply give an estimate for $u_n$ in the statement of each result.
For each example we also have a Central Limit Theorem for those observables which give summable decay of correlations, and are not coboundaries. We recall that a real-valued observable $\psi$ satisfies the Central Limit Theorem for $f$ if there exists $\sigma > 0$ such that for every interval $J \subset \mathbb{R}$,
$$\nu\left\{ x \in X : \frac{1}{\sqrt{n}} \sum_{j=0}^{n-1} \left( \psi(f^j(x)) - \int \psi \, d\nu \right) \in J \right\} \to \frac{1}{\sigma\sqrt{2\pi}} \int_J e^{-t^2/2\sigma^2} \, dt \quad \text{as } n \to \infty.$$
Note that the range of examples given in the following subsections is meant to be illustrative rather than exhaustive, and so we shall omit some simple generalisations for which essentially the same results hold. We shall instead try to make clear the conditions needed to apply these results, and direct the reader to the papers mentioned below for further examples which satisfy these conditions.

Uniformly expanding maps
Let $f : M \circlearrowleft$ be a $C^2$ local diffeomorphism of a compact Riemannian manifold. We say $f$ is uniformly expanding if there exists $\lambda > 1$ such that $\|Df_x v\| \ge \lambda \|v\|$ for all $x \in M$, and all tangent vectors $v$. Such a map admits an absolutely continuous invariant probability measure $\mu$, which is unique and mixing.
Such maps are generally regarded as being well understood, and in particular, results of exponential decay of correlations for observables in (R1) go back to the seventies, and the work of Sinai, Ruelle and Bowen ([Si], [R], [Bo]). For a more modern perspective, see for instance the books of Baladi ([Ba]) and Viana ([V2]).
We have not seen explicit claims of similar results for observables in classes (R2)-(R4). However, it is well known that any such map can be coded by a one-sided full shift on finitely many symbols, so an analogous result on shift spaces would be sufficient, and may well already exist. The estimates here are probably not sharp, particularly in the (R4) case.
The other examples we consider are not in general reducible to finite alphabet shift maps, so we can be more confident that the next set of results are new.

Maps with indifferent fixed points
These are perhaps the simplest examples of strictly non-uniformly expanding systems. Purely for simplicity, we restrict to the well known case of the Manneville-Pomeau map.
In the case where $\psi \in (R4, \gamma)$ for every large $\gamma$, this gives $u_n = O(n^{1 - 1/\alpha})$, which is the bound obtained in [Yo] for $\psi \in (R1)$. We do not give separate estimates for observables in classes (R2) and (R3), as we obtain the same upper bound in each case. Note that the polynomial upper bound for (R1) observables is known to be sharp ([Hu]), and hence the above gives a sharp bound in the (R2) and (R3) cases, and for $(R4, \gamma)$ when $\gamma$ is large.
The above results apply in the more general 1-dimensional case considered in [Yo], where in particular a finite number of expanding branches are allowed, and it is assumed that $x f''(x) \approx x^\alpha$ near the indifferent fixed point.
In our remaining examples, estimates will invariably correspond to either the above form, or that of Theorem 1, and we shall simply say which is the case, specifying the parameter τ as appropriate.
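To make the role of the indifferent fixed point concrete, the following is a minimal numerical sketch. It assumes the common normalisation $f(x) = x + x^{1+\alpha} \bmod 1$ with $0 < \alpha < 1$ for the Manneville-Pomeau map; the function names and the threshold are illustrative choices only. It shows that orbits starting near $0$ take a long time to escape, which is the mechanism behind polynomial rather than exponential rates.

```python
def manneville_pomeau(x, alpha):
    """One step of the (assumed) map f(x) = x + x^(1+alpha) mod 1 on [0, 1)."""
    y = x + x ** (1.0 + alpha)
    # For x in [0, 1) we have y in [0, 2), so one subtraction suffices.
    return y - 1.0 if y >= 1.0 else y


def orbit(x0, alpha, n):
    """Return the first n points x0, f(x0), ..., f^(n-1)(x0) of the orbit."""
    points = []
    x = x0
    for _ in range(n):
        points.append(x)
        x = manneville_pomeau(x, alpha)
    return points


def escape_time(x0, alpha, threshold=0.5, max_iter=10**6):
    """Number of iterates needed before the orbit of x0 first reaches
    [threshold, 1). The threshold is an illustrative choice."""
    x = x0
    for n in range(max_iter):
        if x >= threshold:
            return n
        x = manneville_pomeau(x, alpha)
    return max_iter


if __name__ == "__main__":
    for x0 in (0.1, 0.01, 0.001):
        print(x0, escape_time(x0, alpha=0.5))
```

Near the fixed point the increment $f(x) - x = x^{1+\alpha}$ is very small, so the escape time grows as the starting point approaches $0$.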

One-dimensional maps with critical points
Let us consider the systems of [BLS]. These are one-dimensional multimodal maps, where there is some long-term growth of derivative along the critical orbits. Let $f : I \to I$ be a $C^3$ interval or circle map with a finite critical set $\mathcal{C}$ and no stable or neutral periodic orbit. We assume all critical points have the same critical order $l \in (1, \infty)$; this means that for each $c \in \mathcal{C}$, there is some neighbourhood in which $f$ can be written in the form
$$f(x) = \pm |\varphi(x - c)|^{l} + f(c)$$
for some diffeomorphism $\varphi : \mathbb{R} \to \mathbb{R}$ fixing 0, with the $\pm$ allowed to depend on the sign of $x - c$.
For $c \in \mathcal{C}$, let $D_n(c) = |(f^n)'(f(c))|$. From [BLS] we know there exists an acip $\mu$ provided
$$\sum_{n} D_n^{-1/(2l-1)}(c) < \infty \quad \text{for all } c \in \mathcal{C}.$$
If $f$ is not renormalisable on the support of $\mu$ then $\mu$ is mixing.
Case 1: Suppose there exist $C > 0$, $\lambda > 1$ such that $D_n(c) \ge C\lambda^n$ for all $n \ge 1$, $c \in \mathcal{C}$. Then we have estimates for $(u_n)$ exactly as in the uniformly expanding case (Theorem 1).
Case 2: Suppose there exist $C > 0$, $\alpha > 2l - 1$ such that $D_n(c) \ge Cn^\alpha$ for all $n \ge 1$, $c \in \mathcal{C}$. Then we have estimates for $(u_n)$ as in the indifferent fixed point case (Theorem 2) for every $\tau < \frac{\alpha - 1}{l - 1}$. In particular, the Central Limit Theorem holds in either case when $\psi \in (R4, \gamma)$ for sufficiently large $\gamma$, depending on $R_\infty(\psi)$.
Again, we have restricted our attention to some particular cases; analogous results should be possible for the intermediate cases considered in [BLS]. In particular, for the class of Fibonacci maps with quadratic critical points (see [LM]) we obtain estimates as in Theorem 2 for every τ > 1.

Viana maps
Next we consider the class of Viana maps, introduced in [V1]. These are examples of non-uniformly expanding maps in more than one dimension, with sub-exponential decay of correlations for Hölder observables. They are notable for being possibly the first examples of non-uniformly expanding systems in more than one dimension which admit an acip, and also because the attractor, and many of its statistical properties, persist in a C 3 neighbourhood of systems.
Let $a_0$ be some real number in $(1, 2)$ for which $x = 0$ is pre-periodic for the system $x \mapsto a_0 - x^2$. We define a skew product $\hat f : S^1 \times \mathbb{R} \circlearrowleft$ by
$$\hat f(s, x) = \bigl(ds \bmod 1,\ a_0 + \alpha \sin(2\pi s) - x^2\bigr),$$
where $d$ is an integer $\ge 16$, and $\alpha > 0$ is a constant. When $\alpha$ is sufficiently small, there is a compact interval $I \subset (-2, 2)$ for which $S^1 \times I$ is mapped strictly inside its own interior, and $\hat f$ admits a unique acip, which is mixing for some iterate, and has two positive Lyapunov exponents ([V1], [AV]). The same is also true for any $f$ in a sufficiently small $C^3$ neighbourhood $\mathcal{N}$ of $\hat f$.
Let us fix some small $\alpha$, and let $\mathcal{N}$ be a sufficiently small $C^3$ neighbourhood of $\hat f$ such that for every $f \in \mathcal{N}$ the above properties hold. Choose some $f \in \mathcal{N}$; if $f$ is not mixing, we consider instead the first mixing power.
Another way of saying the above is that if $\psi \in (R4, \gamma)$, then $u_n = O(n^{2 - \zeta\gamma})$, with the usual dependency of $\zeta$ on $R_\infty(\psi)$. Note that for observables in $\bigcap_{\gamma > 1} (R4, \gamma)$, we get super-polynomial decay of correlations, the same estimate as we obtain for Hölder observables (though Baladi and Gouëzel have recently announced a stretched exponential bound for Hölder observables; see [BG]).
There are a number of generalisations we could consider, such as allowing $d \ge 2$ ([BST]; note they require $f$ to be $C^\infty$ close to $\hat f$), or replacing $\sin(2\pi s)$ by an arbitrary Morse function.

Non-uniformly expanding maps
Finally, we discuss probably the most general context in which our methods can currently be applied, the setting of [ALP]. In particular, this setting generalises that of Viana maps.
Let $f : M \to M$ be a transitive $C^2$ local diffeomorphism away from a singular/critical set $S$, with $M$ a compact finite-dimensional Riemannian manifold. Let Leb be a normalised Riemannian volume form on $M$, which we shall refer to as Lebesgue measure, and $d$ a Riemannian metric. We assume $f$ is non-uniformly expanding, or more precisely, there exists $\lambda > 0$ such that
For almost every $x$ in $M$, we may define
The decay rate of the sequence $\mathrm{Leb}\{E(x) > n\}$ may be considered to give a degree of hyperbolicity.
Where $S$ is non-empty, we need the following further assumptions, firstly on the critical set. We assume $S$ is non-degenerate, that is, $m(S) = 0$, and there exists $\beta > 0$ such that for all $x \in M \setminus S$ we have $d(x, S)^{\beta} \lesssim \|Df_x v\| / \|v\| \lesssim d(x, S)^{-\beta}$ for all $v \in T_x M$, and the functions $\log \det Df$ and $\log \|Df^{-1}\|$ are locally Lipschitz with Lipschitz constant $\lesssim d(x, S)^{-\beta}$.
Now let $d_\delta(x, S) = d(x, S)$ when this is $\le \delta$, and 1 otherwise. We assume that for any $\varepsilon > 0$ there exists $\delta > 0$ such that for Lebesgue a.e. $x \in M$, lim sup
We define a recurrence time
Let $f$ be a map satisfying the above conditions, and for which there exists $\alpha > 1$ such that
Then $f$ admits an acip $\nu$ with respect to Lebesgue measure, and we may assume $\nu$ to be mixing by taking a suitable power of $f$.

Young's tower
In the previous section, we indicated the variety of systems we may consider. We shall now state the main technical result, and with it the conditions a system must satisfy in order for our result to be applicable. As verifying that a system satisfies such conditions is often considerable work, we refer the reader to those papers mentioned in each of the previous subsections for full details.
The relevant setting for our arguments will be the tower object introduced by Young in [Yo], and we recap its definition. We start with a map $F^R : (\Delta_0, m_0) \circlearrowleft$, where $(\Delta_0, m_0)$ is a finite measure space. This shall represent the base of the tower. We assume there exists a partition (mod 0) $\mathcal{P} = \{\Delta_{0,i} : i \in \mathbb{N}\}$ of $\Delta_0$, such that $F^R|\Delta_{0,i}$ is an injection onto $\Delta_0$ for each $\Delta_{0,i}$. We require that the partition generates, i.e. that $\bigvee_{j=0}^{\infty} (F^R)^{-j}\mathcal{P}$ is the trivial partition into points. We also choose a return time function $R : \Delta_0 \to \mathbb{N}$, which must be constant on each $\Delta_{0,i}$.
We define a tower to be any map F : (∆, m) determined by some F R , P, and R as follows. Let ∆ = {(z, l) : z ∈ ∆ 0 , l < R(z)}. For convenience let ∆ l refer to the set of points (·, l) in ∆. This shall be thought of as the lth level of ∆. (We shall freely confuse the zeroth level {(z, 0) : z ∈ ∆ 0 } ⊂ ∆ with ∆ 0 itself. We shall also happily refer to points in ∆ by a single letter x, say.) We write ∆ l,i = {(z, l) : z ∈ ∆ 0,i } for l < R(∆ 0,i ). The partition of ∆ into the sets ∆ l,i shall be denoted by η.
The map $F$ is then defined as follows:
$$F(z, l) = \begin{cases} (z, l + 1) & \text{if } l + 1 < R(z), \\ (F^R z, 0) & \text{otherwise.} \end{cases}$$
We notice that the map $F^{R(x)}(x)$ on $\Delta_0$ is identical to $F^R(x)$, justifying our choice of notation. Finally, we define a notion of separation time; for $x, y \in \Delta_0$, $s(x, y)$ is defined to be the least integer $n \ge 0$ such that $(F^R)^n x$, $(F^R)^n y$ are in different elements of $\mathcal{P}$. For $x, y$ in some $\Delta_{l,i}$, where $x = (x_0, l)$, $y = (y_0, l)$, we set $s(x, y) := s(x_0, y_0)$; for $x, y$ in different elements of $\eta$, $s(x, y) = 0$.
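The tower construction and the separation time can be illustrated on a toy example. The sketch below is not one of the systems considered in this paper: it assumes a hypothetical base $\Delta_0 = [0, 1)$ with partition elements $\Delta_{0,i} = [2^{-(i+1)}, 2^{-i})$, return time $R = i + 1$ on $\Delta_{0,i}$, and $F^R$ mapping each element linearly onto $[0, 1)$. Exact rational arithmetic is used so that partition membership is decided without rounding.

```python
from fractions import Fraction


def partition_index(z):
    """Index i of the toy base element [2^-(i+1), 2^-i) containing z."""
    if not 0 < z < 1:
        # 0 lies in no element: the partition is only defined mod 0.
        raise ValueError("z must lie in (0, 1)")
    i = 0
    while z < Fraction(1, 2 ** (i + 1)):
        i += 1
    return i


def return_time(z):
    """Toy return time R, constant on each partition element."""
    return partition_index(z) + 1


def base_map(z):
    """Toy F^R: maps each partition element linearly onto [0, 1)."""
    i = partition_index(z)
    return (z - Fraction(1, 2 ** (i + 1))) * 2 ** (i + 1)


def tower_map(point):
    """F(z, l) = (z, l + 1) if l + 1 < R(z), and (F^R z, 0) otherwise."""
    z, level = point
    if level + 1 < return_time(z):
        return (z, level + 1)
    return (base_map(z), 0)


def separation_time(x, y, max_iter=64):
    """Least n >= 0 such that (F^R)^n x and (F^R)^n y lie in different
    partition elements; returns max_iter if no separation is seen."""
    for n in range(max_iter):
        if partition_index(x) != partition_index(y):
            return n
        x, y = base_map(x), base_map(y)
    return max_iter
```

For instance, the point $3/8$ has $R = 2$, and applying `tower_map` twice to `(Fraction(3, 8), 0)` returns `(Fraction(1, 2), 0)`, in agreement with $F^{R(x)}(x) = F^R(x)$. Points whose orbit under the toy base map hits $0$ are not good points in the sense used below, and the sketch raises an error for them.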
We say that the Jacobian $JF^R$ of $F^R$ with respect to $m_0$ is the real-valued function such that for any measurable set $E$ on which $F^R$ is injective,
$$m_0(F^R(E)) = \int_E JF^R \, dm_0.$$
We assume $JF^R$ is uniquely defined, positive, and finite $m_0$-a.e. We require some further assumptions.
• Measure structure: Let $\mathcal{B}$ be the σ-algebra of $m_0$-measurable sets. We assume that all elements of $\mathcal{P}$ and of each $\bigvee_{i=0}^{n-1} (F^R)^{-i}\mathcal{P}$ belong to $\mathcal{B}$, and that $F^R$ and $(F^R|\Delta_{0,i})^{-1}$ are measurable functions. We then extend $m_0$ to a measure $m$ on $\Delta$ as follows: for $E \subset \Delta_l$, any $l \ge 0$, we let $m(E) = m_0(F^{-l}E)$, provided that $F^{-l}E \in \mathcal{B}$. Throughout, we shall assume that any sets we choose are measurable. Also, whenever we say we are choosing an arbitrary point $x$, we shall assume it is a good point, i.e. that each element of its orbit is contained within a single element of the partition $\eta$, and that $JF^R$ is well-defined and positive at each of these points.
Let $F : (\Delta, m) \circlearrowleft$ be a tower, as defined above. We define classes of observable similar to those we consider on the manifold, but characterised instead in terms of the separation time $s$ on $\Delta$. Given a bounded function $\psi : \Delta \to \mathbb{R}$, we define the variation for $n \ge 0$:
$$v_n(\psi) = \sup\{ |\psi(x) - \psi(y)| : s(x, y) \ge n \}.$$
Let us use this to define some regularity classes:
We shall see that the classes (V1)-(V4) of regularity correspond naturally with the classes (R1)-(R4) of regularity on the manifold respectively, under fairly weak assumptions on the relation between the system and the tower we construct for it. (We shall discuss this further in §13.) These classes are essentially those defined in [P], although there the functions are considered to be potentials rather than observables.
We now state the main technical result.
Theorem 6 Let $F : (\Delta, m) \circlearrowleft$ be a tower satisfying the assumptions stated above. Then $F : (\Delta, m) \circlearrowleft$ admits a unique acip $\nu$, which is mixing. Furthermore, for all $\varphi, \psi \in L^\infty(\Delta, m)$,
$$C_n(\varphi, \psi; \nu) \le \|\varphi\|_\infty \, C(\psi) \, u_n,$$
where $C(\psi) > 0$ is some constant, and $(u_n)$ is a sequence converging to zero at some rate determined by $F$ and $v_n(\psi)$. In particular:
The existence of a mixing acip is proved in [Yo], as is the result in the case $\psi \in (V1)$. As a corollary of the above, we get a Central Limit Theorem in the cases where the rate of mixing is summable.
In §13 we shall give the exact conditions needed on a system in order to apply the above results.

Overview of method
Our strategy in proving the above theorem is to generalise a coupling method introduced by Young in [Yo]. Our argument follows closely the line of approach of that paper, and we give an outline of the key ideas here.
First, we need to reduce the problem to one in a slightly different context. Given a system $F : (\Delta, m) \circlearrowleft$, we define a transfer operator $F_*$ which, for any measure $\lambda$ on $\Delta$ for which $F$ is measurable, gives a measure $F_*\lambda$ on $\Delta$ defined by
$$(F_*\lambda)(A) = \lambda(F^{-1}A)$$
whenever $A$ is a $\lambda$-measurable set. Clearly any $F$-invariant measure is a fixed point for this operator. Also, a key property of $F_*$ is that for any function
Next, we define a variation norm on $m$-absolutely continuous signed measures, that is, on the difference between any two (positive) measures which are absolutely continuous. Given two such measures $\lambda, \lambda'$, we write

Now let us fix an acip ν and choose observables
where ψν denotes the unique measure which has density ψ with respect to ν. So Hence we may reduce the problem to one of estimating the rate at which certain measures converge to the invariant measure, in terms of the variation norm. In fact, it will be useful to consider the more general question of estimating |F n * λ − F n * λ ′ | for a pair of measures λ, λ ′ whose densities with respect to m are of some given regularity. (We shall require an estimate in the case λ ′ = ν when we consider the Central Limit Theorem.) Let us now outline the main argument. We work with two copies of the system, and the direct product and consider it to be a measure on ∆ × ∆. If we let π, π ′ : ∆ × ∆ → ∆ be the projections onto the first and second coordinates respectively, we have that Our strategy will involve summing the differences between the two projections over small regions of the space, only comparing them at convenient times that vary with the region of space we are considering. At each of these times, we shall subtract some measure from both coordinates so that the difference is unaffected, yet the total measure of P 0 is reduced, giving an improved upper bound on the difference.
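The reduction from correlations to convergence of measures described above can be written out explicitly. The following is a sketch under two assumptions: the change-of-variables identity $\int \varphi \, d(F_*\lambda) = \int (\varphi \circ F) \, d\lambda$ for bounded $\varphi$, and the bound $|\int \varphi \, d(\lambda - \lambda')| \le \|\varphi\|_\infty |\lambda - \lambda'|$ for the variation norm. The notation $\bar\psi := \int \psi \, d\nu$ is introduced here only for convenience.

```latex
\begin{align*}
\int (\varphi \circ F^n)\,\psi \, d\nu
  &= \int \varphi \, d\bigl(F^n_*(\psi\nu)\bigr), \\
\int \varphi \, d\nu \int \psi \, d\nu
  &= \int \varphi \, d(\bar\psi\,\nu), \\
\Bigl| \int (\varphi \circ F^n)\,\psi \, d\nu
      - \int \varphi \, d\nu \int \psi \, d\nu \Bigr|
  &\le \|\varphi\|_\infty \,
     \bigl| F^n_*(\psi\nu) - \bar\psi\,\nu \bigr| .
\end{align*}
```

Since $\nu$ is invariant, $\bar\psi\,\nu = F^n_*(\bar\psi\,\nu)$, so under these assumptions the right-hand side compares the push-forwards of two measures with the same total mass, which is the quantity the coupling argument is designed to estimate.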
The key difference between our method and that of [Yo] is that we introduce a sequence (ε n ), which shall represent the rate at which we attempt to subtract measure from P 0 . When the densities of λ, λ ′ are of class (V 1), (ε n ) can be taken to be a small constant, and the method here reduces to that of [Yo]; however, by allowing sequences ε n → 0, we may also consider measure densities of weaker regularity.
We shall see that it is possible to define an induced map $\hat F : \Delta \times \Delta \to \Delta_0 \times \Delta_0$ for which there is a partition $\hat\xi_1$ of $\Delta \times \Delta$, with every element mapping injectively onto $\Delta_0 \times \Delta_0$ under $\hat F$. In fact, there is a stopping time
The density of $P_0$ with respect to $m \times m$ has essentially the same regularity as the measures $\lambda, \lambda'$, and the density of $\hat F_*(P_0|\Gamma)$ will be similar, except possibly weakened slightly by any irregularity in the map $\hat F$. (We shall see that the map $\hat F$ is not too irregular.) Let
For any $\varepsilon_1 \in [0, 1]$, we may write
for some (positive) measure $P_1|\Gamma$; this is uniquely defined since $\hat F|\Gamma$ is injective. Essentially, we are subtracting some amount of mass from the measure $\hat F_*(P_0|\Gamma)$. Moreover, we are subtracting it equally from both coordinates; this means that writing $\Gamma = A \times B$ and $k = T_1(\Gamma)$, the distance between the measures $F^k_*(\lambda|A)$ and $F^k_*(\lambda'|B)$, both defined on $\Delta_0$, is unaffected. However, we also see that the remaining measure $\hat F_*(P_1|\Gamma)$ has smaller total mass, and this is an upper bound for $|F^k_*(\lambda|A) - F^k_*(\lambda'|B)|$.
We fix an $\varepsilon_1$, and perform this subtraction of measure for each $\Gamma \in \hat\xi_1$, obtaining a measure $P_1$ defined on $\Delta \times \Delta$. The total mass of $P_1$ represents the difference between $F^n_*\lambda$ and $F^n_*\lambda'$ at time $n = T_1$, taking into account that $T_1$ is not constant over $\Delta \times \Delta$. Clearly, we obtain the best upper bound by taking $\varepsilon_1 = 1$; however, we shall see that it is to our advantage to choose some smaller value for $\varepsilon_1$.
We choose a sequence (ε n ), and proceed inductively as follows. First, we define a sequence of partitions {ξ i } such that Γ ∈ξ i is mapped injectively onto ∆ 0 × ∆ 0 underF i . Now given the measure P i−1 , we take an element Γ ∈ξ i and consider the measureF i and specify P i |Γ bŷ As before, we construct a measure P i , the total mass of which gives an upper bound at time T i . To fully determine the sequence {P i }, it remains to choose a sequence (ε i ). Our choice relates to the regularity of the densities dλ dm , dλ ′ dm . This is relevant because the method requires that the family of measure densities has some uniform regularity. (In fact, we require that the log of each of the above densities is suitably regular.) We require this in order that at the ith stage of the procedure, when we subtract an ε i proportion of the minimum local density, this corresponds to a similarly large proportion of the average density. Hence, provided this regularity is maintained, the total mass of P i−1 is decreased by a similar proportion. When we subtract a constant from a density as above, this weakens the regularity. However, at the next step of the procedure, we work with elements of the partitionξ i+1 . Since these sets are smaller, we regain some regularity by working with measures on ∆ 0 × ∆ 0 pushed forward from such sets. That is, we expect the densities dF i+1 * to be more regular than the densities (This relies on the mapF being smooth enough that another application of the operatorF * doesn't much affect the regularity.) The degree of regularity we gain in this way depends on the initial regularity of Φ, and hence of dλ dm , dλ ′ dm , with respect to the sequence of partitions. In the usual case, where dλ dm , dλ ′ dm ∈ (V 1), the regularity we gain each time we refine the partition is similar to the regularity we lose when we subtract a small constant proportion of the density; hence we may take every ε i to be a small constant ε. 
Where the initial regularities are not so good, we gain less regularity from refining the partition, and so we may only subtract correspondingly less measure.
For this reason, outside the (V 1) case we shall require that the sequence (ε i ) converges to zero at some minimum rate. However, if (ε i ) decays faster than necessary, we will simply obtain a suboptimal bound. So part of the problem is to try to choose a sequence (ε i ) decaying as slowly as is permissible. We shall also need to take into account the stopping time T 1 (which is unbounded), in order to estimate the speed of convergence in terms of the original map F .
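The trade-off in the choice of $(\varepsilon_i)$ can be illustrated numerically. The sketch below rests on a simplifying assumption that is ours, not a statement from the text: that step $i$ of the construction removes at least a fraction $\varepsilon_i / K$ of the remaining unmatched mass, for some hypothetical constant $K > 1$. It also ignores the stopping times, so it only tracks the mass after $i$ steps of the induced map, not after $n$ iterates of $F$.

```python
def remaining_mass(eps, K):
    """Upper bounds on the unmatched mass after each subtraction step.

    Assumes (hypothetically) that step i removes at least a fraction
    eps[i] / K of the mass that is still unmatched, with K > 1.
    """
    if K <= 1:
        raise ValueError("K is assumed to be > 1")
    mass = 1.0
    bounds = []
    for e in eps:
        if not 0.0 <= e < 1.0:
            raise ValueError("each eps must lie in [0, 1)")
        mass *= 1.0 - e / K
        bounds.append(mass)
    return bounds


n = 50
# Constant eps: the bound decays exponentially in the number of steps.
constant = remaining_mass([0.5] * n, K=2.0)
# eps_i of order 1/i: the bound decays only polynomially.
slowing = remaining_mass([0.5 / i for i in range(1, n + 1)], K=2.0)

if __name__ == "__main__":
    print(constant[-1], slowing[-1])
```

With a constant sequence the bound after $i$ steps is $(1 - \varepsilon/K)^i$, whereas with $\varepsilon_i$ of order $1/i$ the product decays like a negative power of $i$. This is why a sequence $(\varepsilon_i)$ that decays faster than necessary yields a weaker bound.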

Related work
Let us now mention some other results concerning estimates on decay of correlations for non-Hölder observables. Most of these are stated in the context of one-sided finite alphabet shift maps, or subshifts of finite type. (For a comprehensive discussion of shift maps and equilibrium measures, we suggest the book of Baladi, [Ba].) Shift maps are relatively simple dynamical systems, but are often used to code more complicated systems via a semi-conjugacy, in much the same way that each of the examples we consider can be represented by a suitable tower (see §13). Where a system $F : X \to X$ being coded has an invariant measure $\mu$ which is absolutely continuous with respect to Lebesgue measure, $\mu$ is an equilibrium measure for the potential $\phi = -\log JF$, where $JF$ is the Jacobian with respect to Lebesgue measure. Most results for shift maps work with an equilibrium measure given by a potential $\phi$ which is Hölder continuous (in terms of the usual metric on shift spaces: two sequences are said to be distance $2^{-n}$ apart if they agree for exactly the first $n$ symbols). This assumption corresponds to assuming good distortion for $JF$.
There have been various results concerned primarily with weakening the assumption on the regularity of $\phi$, and obtaining (slower) upper bounds for the rate of mixing with respect to the corresponding equilibrium measures. Kondah, Maume and Schmitt ([KMS]) used a method of Birkhoff cones and projective metrics, Bressaud, Fernandez and Galves ([BFG]) used a coupling method (different from the one described here), with estimates given in terms of chains of complete connections, and Pollicott ([P]) introduced a method involving composing transfer operators with conditional expectations. Each of these results has slightly different assumptions and gives slightly different estimates, but in each case a number of different classes of potentials are considered, and estimates are given for observables of some similar regularity to (usually not much worse than) the potential. In particular, in all three examples polynomial mixing is given for a potential and observables with variations decaying at suitable polynomial rates.
In addition, Fisher and Lopes ([FL]) and Isola ([I]) have obtained polynomial decay of correlations for some specific classes of potentials on the full 2-shift, each for a class of observables not unlike our (V4) class.
We emphasise that each of the above results concerns only shifts (or subshifts) on a finite alphabet. Furthermore, with the exception of uniformly expanding maps (§2.1), none of the examples we consider may be coded (with a sufficiently regular semi-conjugacy) by shifts with finite alphabets, due to distortion considerations. Hence outside the uniform case, there is no direct overlap between the above results and our results. These are essentially generalisations in a different direction, but are worth mentioning if only to note the variety of different methods that have been applied to obtain results for non-Hölder observables in slightly different contexts.
We now move on from shift spaces to mention a result on towers. Following the method of [KMS] mentioned above, Maume-Deschamps ( [M]) essentially reproduced Young's results ( [Yo]), with some slightly weaker estimates. Together with Buzzi ([BM1]), the method has been extended to allow the bounded distortion assumption of §3 to be replaced by a condition on the variation of JF , and also to allow observables of similar regularity. One important difference is that the variation is defined in terms of JF and a partition of the whole tower (the partition η of §3) rather than JF R and the base partition, as here. This significantly reduces the classes of observables they can consider; for instance, for any tower with unbounded return times, our class (V 1) contains many functions that cannot be dealt with at all by their method. It is not clear that any estimates can be obtained for our examples, except for the uniformly expanding case; for Manneville-Pomeau maps, for instance (see §2.2), while we can construct the same tower in their context, the semi-conjugacy between the tower and the system is not regular enough to give any comparable results. Some applications are given in the related paper [BM2], including certain multi-dimensional piecewise expanding affine maps.
We also note that a similar result to that of [BM1] was obtained by Holland ( [Hol]), using a coupling method very similar to the one we use here.
Finally, we mention a result which applies directly to certain non-uniformly expanding systems, rather than to a symbolic space or tower. Pollicott and Yuri ([PY]) consider a class of maps of arbitrary dimension with a single indifferent periodic orbit and a given Markov structure, including in particular the Manneville-Pomeau interval maps. The class of observables considered is dynamically defined; each observable is required to be Lipschitz with respect to a Markov partition corresponding to some induced map, chosen to have good distortion properties. This class includes all functions which are Lipschitz with respect to the manifold, and while some estimates are weaker than comparable results for Hölder observables, bounds are obtained for some observables which cannot be dealt with at all by our methods, such as certain unbounded functions.

Coupling
Over the next few sections, we give the proof of the main technical theorem.
Let F : (∆, m) be a tower, as defined in §3. Let I = {ϕ : ∆ → R | v n (ϕ) → 0}, and let I + = {ϕ ∈ I : inf ϕ > 0}. We shall work with probability measures whose densities with respect to m belong to I + . We see where C ϕ depends on inf ϕ. Let λ, λ ′ be measures with dλ dm , dλ ′ dm ∈ I + , and let P = λ × λ ′ . For convenience, we shall write v n (λ) = v n ( dλ dm ) for such measures λ, and use the two notations interchangeably. We let C λ = C ϕ above, where ϕ = dλ dm . We shall write ν for the unique acip for F , which is equivalent to m.
We consider the direct product F × F : (∆ × ∆, m × m) , and specify a return function to ∆ 0 × ∆ 0 . We first fix n 0 > 0 to be some integer large enough that m(F −n ∆ 0 ∩ ∆ 0 ) ≥ some c > 0 for all n ≥ n 0 . Such an integer exists since ν is mixing and equivalent to m. Now we letR(x) be the first arrival time to ∆ 0 (settingR|∆ 0 ≡ 0). We define a sequence {τ i } of stopping time functions on ∆ × ∆ as follows: and so on, alternating between the two coordinates x, y each time. Correspondingly, we shall define an increasing sequence ξ 1 < ξ 2 < . . . of partitions of ∆×∆, according to each τ i . First, let π, π ′ be the coordinate projections of ∆ × ∆ onto ∆, that is, π(x, y) := x, π ′ (x, y) := y. At each stage we refine the partition according to one of the two coordinates, alternating between the two copies of ∆. First, ξ 1 is given by taking the partition into rectangles E × ∆, E ∈ η, and refining so that τ 1 is constant on each element Γ ∈ ξ 1 , and F τ1 |π(Γ) is for each Γ an injection onto ∆ 0 . To be precise, we write using throughout the convention that for a partition ξ, ξ(x) denotes the element of ξ containing x. Subsequently, we say ξ i is the refinement of ξ i−1 such that each element of ξ i−1 is partitioned in the first (resp. second) coordinate for i odd (resp. even) so that τ i is constant on each element Γ ∈ ξ i , and F τi maps π(Γ) (resp. π ′ (Γ)) injectively onto ∆ 0 .
For $i \ge 1$ we let $T_i$ be the time corresponding to the $i$th iterate of $\hat F$, i.e. $T_1 \equiv T$, and for $i \ge 2$,
$$T_i = T_{i-1} + T \circ \hat F^{i-1}.$$
Corresponding to $\{T_i\}$ we define a sequence of partitions $\eta \times \eta \le \hat\xi_1 \le \hat\xi_2 \le \dots$ of $\Delta \times \Delta$ similarly to before, such that for each $\Gamma \in \hat\xi_n$, $T_n|\Gamma$ is constant and $\hat F^n$ maps $\Gamma$ injectively onto $\Delta_0 \times \Delta_0$. It will be convenient to define a separation time $\hat s$ with respect to $\hat\xi_1$; $\hat s(w, z)$ is the smallest $n \ge 0$ such that $\hat F^n w$, $\hat F^n z$ are in different elements of $\hat\xi_1$. We notice that if $w = (x, x')$, $z = (y, y')$, then $\hat s(w, z) \le \min(s(x, y), s(x', y'))$.
Proof: Let w = (x, x ′ ), z = (y, y ′ ). Whenŝ(w, z) ≥ n, there exists k ∈ N withF n ≡ (F × F ) k when restricted to the element ofξ n containing w, z. So log JF n (w) Let j be the number of times F i (x) enters ∆ 0 , for i = 1, . . . , k. We have for some C ′ > 0, and similarly for x ′ , y ′ . So log JF n (w) for some CF > 0. For the second part, we have We now come to the core of the argument. We choose a sequence (ε i ) < 1, which represents the proportion of P we try to subtract at each step of the construction. Let ψ 0 ≡ψ 0 ≡ Φ. We proceed as follows; we push-forward Φ byF to obtain the function ψ 1 (z) := Φ(z) JF (z) . On each element Γ ∈ξ 1 we subtract the constant ε 1 inf{ψ 1 (z) : z ∈ Γ} from the density ψ 1 |Γ. We continue inductively, pushing forward by dividing the density by JF (F z) to get ψ 2 (z), subtracting ε 2 inf{ψ 2 (z) : z ∈ Γ} from ψ 2 |Γ for each Γ ∈ξ 2 , and so on. That is, we define: We show that under certain conditions on (ε i ), the sequence {ψ i } satisfies a uniform bound on the ratios of its values for nearby points. A similar proposition to the following was obtained simultaneously but independently by Holland ([Hol]); there, the emphasis was on the regularity of the Jacobian.
are both satisfied for some sufficiently large constant $K_0$ allowed to depend only on $F$ and $v_0(\Phi)$. Then there exist $\bar\delta < 1$ and $\bar C > 0$ each depending only on $F$, $C_\Phi$ and $v_0(\Phi)$ such that if we choose $\varepsilon_i = \bar\delta \varepsilon'_i$ for each $i$, then for all $w, z$ with $\hat s(w, z) \ge i \ge 1$,
Proof: Suppose we are given such a sequence $(\varepsilon'_i)$ and assume that for each $i$ we have
for every $w, z$ with $\hat s(w, z) \ge i$. We shall see that we may achieve this by a suitable choice of $(\varepsilon_i)$. We note that when $\hat s(w, z) \ge i$,
Applying this inductively, we obtain the estimate
We see this is bounded above by the constant $(C_\Phi + \beta^{-1} C_{\hat F}) K_0 =: \bar C$. So we have
It remains to examine the choice of sequence $(\varepsilon_i)$ necessary for (6) to hold. For now, let $(\varepsilon_i)$ be some sequence with $\varepsilon_i \le \varepsilon'_i$ for each $i$. Let $\Gamma = \hat\xi_i(w) = \hat\xi_i(z)$ and write $\varepsilon_{i,\Gamma} := \varepsilon_{i,w} = \varepsilon_{i,z}$. Then
We see that $0 \le \varepsilon_{i,\Gamma} / \psi_i(w) \le \varepsilon_i$ for all $w \in \Gamma$, and so $C_1$ may be chosen so as not to depend on anything. Continuing from the estimate above,
where $C_2$ may be chosen independently of $i, w, z$ since $\psi_i(w) / \psi_i(z) \ge e^{-(\bar C + C_{\hat F})}$, provided that at each stage we choose $\varepsilon_i$ small enough that $C_1 C_2 \frac{\varepsilon_i}{1 - \varepsilon_i} \le \varepsilon'_i$. We confirm that it is sufficient to take $\varepsilon_i = \bar\delta \varepsilon'_i$ for small enough $\bar\delta > 0$. This means

Choosing a sequence $(\varepsilon_i)$
Having shown that it is sufficient for our purposes for the sequence $(\varepsilon'_i)$ to satisfy conditions (4) and (5), we now consider how we might choose a sequence $(\varepsilon_i)$ which, subject to these conditions, decreases as slowly as possible. Having chosen a sequence, we shall then estimate the rate of convergence this gives us.
Proof: We start by defining a sequence $(v^*_i) > 0$ as follows:
unless $v_i(\Phi)$ decays exponentially fast, in which case $v^*_i$ decays at some (possibly slower) exponential rate. To see this, suppose otherwise, in the case where $v_i(\Phi)$ decays slower than any exponential speed. Then for large $i$ certainly $v^*_i > v_i(\Phi)$, and so $v^*_i = cv^*_{i-1}$ for large $i$, and $(v^*_i)$ decays exponentially fast. But this means $v_i(\Phi)$ decays exponentially fast, which is a contradiction.

Let us now choose ε
(We ignore the trivial case $v_0(\Phi) = 0$.) We see that all terms are small enough that (5) is satisfied, and in particular,

Convergence of measures
We introduce a sequence of measure densities $\hat\Phi_0 \equiv \Phi \ge \hat\Phi_1 \ge \hat\Phi_2 \ge \dots$ corresponding to the sequence $\{\psi_i\}$ in the following way:
Lemma 2 Given a sequence $(\varepsilon_i) = (\bar\delta\varepsilon'_i)$ satisfying the assumptions of Proposition 1, there exists $K > 1$ dependent only on $F$, $C_\Phi$ and $v_0(\Phi)$ such that for
Proof: If we fix $i \ge 1$, $\Gamma \in \hat\xi_i$, and $w, z \in \Gamma$, then by Proposition 1 we have
Now we obtain a relationship between $\hat\Phi_i$ and $\hat\Phi_{i-1}$ by writing
So for any $z \in \Delta \times \Delta$ we have that
The above lemma gives an estimate on the total mass of $\hat\Phi_i$ for each $i$. To obtain an estimate for the difference between $F^n_*\lambda$ and $F^n_*\lambda'$, we must use this, and also take into account the length of the simultaneous return time $T$.
Lemma 3 For all $n > 0$,
where $K$ is as in the previous lemma.
Proof: We define a sequence $\{\Phi_i\}$ of measure densities, corresponding to the measure unmatched at time $i$ with respect to $F \times F$. We shall often write $\Phi_i(m \times m)$, say, to refer to the measure which has density $\Phi_i$ with respect to
The first term is clearly $\le 2\int \Phi_n \, d(m \times m)$. Our construction should ensure the remaining terms are zero, since we have arranged that the measure we subtract is symmetric in the two coordinates. To confirm this, we partition $\Delta \times \Delta$ into regions on which each $T_m$ is constant, at least while $T_m < n$.
Consider the family of sets $A_{k,i}$, $i, k \in \mathbb{N}$, where $A_{k,i} := \{z \in \Delta \times \Delta : T_i(z) = k\}$. Clearly, each $A_{k,i}$ is a union of elements of $\hat\xi_i$, and for any fixed $k$ the sets $A_{k,i}$ are pairwise disjoint. It is also clear that on any
We show that this measure is unchanged if we replace $\pi$ with $\pi'$ in the last expression. Let $E \subset \Delta$ be an arbitrary measurable set, and fix some $\Gamma \in \hat\xi_i | A_{k,i}$. Then
where $C$ is constant on $\Gamma$. This equals
Since $(m \times m)(E \times \Delta) = (m \times m)(\Delta \times E)$, the terms of the sum in (7) all have zero value, as claimed. Now in fact, since $T_i \ge i$, all terms of the series are zero for $i > n$. For $1 \le i \le n$,
The estimate claimed for $|F^n_*\lambda - F^n_*\lambda'|$ follows easily.
Finally we state a simple relationship between $P\{T > n\}$ and $(m \times m)\{T > n\}$. From now on we shall use the convention that $P\{\text{condition} \mid \Gamma\} := \frac{1}{P(\Gamma)} P\{x \in \Gamma : x \text{ satisfies condition}\}$.
Sublemma 2 There exists $\bar K > 0$ depending only on $C_\Phi$ and $v_0(\Phi)$ s.t. $\forall i \ge 1$,
The dependence of $\bar K$ on $P$ may be removed entirely if we take only $i \ge$ some $i_0(P)$.

Combinatorial estimates
In Lemma 3 we have given the main estimate involving $P$, $T$, and the sequence $(\varepsilon_i)$. It remains to relate $P$ and $T$ to the sequence $m_0\{R > n\}$. Primarily, this involves estimates relating the sequences $P\{T > n\}$ and $m_0\{R > n\}$. We shall state only some key estimates of the proof, referring the reader to [Yo] for full details. Our statements differ slightly, as the estimates of [Yo] are stated in terms of $m\{\hat R > n\}$; they are easily reconciled by noting that $m\{\hat R > n\} = \sum_{i > n} m_0\{R > i\}$. (As earlier, $\hat R \ge 0$ is the first arrival time to $\Delta_0$.)
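The reconciliation between the two tails (the tail on the whole tower is the sum over $i > n$ of the base tails) can be checked numerically. The following is a minimal sketch only: the names `tower_tail` and `geometric_tail` and the truncation parameter `cutoff` are assumptions introduced here for illustration, and the infinite sum is truncated, so the base tail is presumed summable and negligible beyond `cutoff`.

```python
def tower_tail(base_tail, n, cutoff=10_000):
    """Approximate the tower tail as sum_{i > n} base_tail(i).

    The infinite sum is truncated at `cutoff` (an assumption of this sketch).
    """
    return sum(base_tail(i) for i in range(n + 1, cutoff + 1))


def geometric_tail(theta):
    """Hypothetical base tail m_0{R > i} = theta**i (exponential return times)."""
    return lambda i: theta ** i


# Exponential base tail: the tower tail is again exponential,
# with closed form theta**(n + 1) / (1 - theta).
theta = 0.5
for n in (0, 5, 10):
    print(n, tower_tail(geometric_tail(theta), n), theta ** (n + 1) / (1 - theta))
```

For a polynomial base tail of order $i^{-\alpha}$ the same function returns a quantity of order $n^{1-\alpha}$, so one power is lost in passing from the base to the tower.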
This proposition follows from estimates involving the combinatorics of the intermediate stopping times $\{\tau_i\}$. Let us make explicit a key sublemma used in the proofs, concerning the regularity of the pushed-forward measure densities $\frac{dF^n_*\lambda}{dm}$; for the rest of the argument we refer to [Yo], as the changes are minor.
Sublemma 3 For any $k > 0$, let
where the dependence on $\lambda$ is only on $v_0(\lambda)$ and $C_\lambda$, and may be removed entirely if we only consider $k \ge$ some $k_0(\lambda)$.
Proof: Let $\varphi = \frac{d\lambda}{dm}$, fix $x, y \in \Delta_0$, and let $x_0, y_0$ be the unique points in $\Omega$ such that $F^k x_0 = x$, $F^k y_0 = y$. We note that
where $j$ is the number of visits to $\Delta_0$ up to time $k$. Clearly the penultimate bound can be made independent of $\lambda$ for $j \ge$ some $j_0(\lambda)$.
The following result combines the estimates above with those of the previous section.

When
for sufficiently small $\delta_1$.
Proof: In the first case, Proposition 2 and Lemma 3 tell us that for any $0 < \delta_1 < 1$,
for some $0 < \theta_0 < 1$. For sufficiently small $\delta_1$, the middle term is $\le C[\delta_1 n]\theta'^{n}$, which decays at some exponential speed in $n$.
In the second case, for any $0 < \delta_1 < 1$ we have
We estimate the middle term by noting that
For the last step, note that Proposition 2 applies to the normalisation of $(m \times m)$ to a probability measure.

Specific regularity classes
We now combine all of our intermediate estimates to obtain a rate of decay of correlations in the specific cases mentioned in Theorem 6. First, we set $\zeta = \bar\delta / K$, which can be seen to depend only on $F$, $C_\Phi$ and $v_0(\Phi)$. Throughout this section, we shall let $C$ denote a generic constant, allowed to depend only on $F$ and $\Phi$, which may vary between expressions.

Exponential return times
In this subsection, we suppose that $m_0\{R > n\} = O(\theta^n)$, and hence $m\{\hat R > n\} = O(\theta^n)$.
It is easy to show that e −ζ(log

Polynomial return times
Here we suppose $m_0\{R > n\} = O(n^{-\alpha})$ for some $\alpha > 1$. Suppose $v_n(\Phi) = O(n^{-\gamma})$, for some $\gamma > 2/\zeta$. We can take $(\varepsilon_i)$ such that
By Proposition 3, for some $\delta_1$,
The third term here is of order $n^{1 - \zeta\gamma}$. To estimate the second term, we consider three cases.
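To see how a sequence of order $1/i$ can produce a polynomial rate, the sketch below computes a product of factors $1 - c/i$ numerically. It rests on an assumption made purely for illustration, namely that the mass surviving each matching step contracts by a factor of the form $1 - c/i$, with $c$ playing the role of $\zeta\gamma$. The names `surviving_mass` and `local_exponent` and the starting index `i0` are hypothetical and do not come from the text.

```python
import math


def surviving_mass(c, n, i0):
    """Return prod_{i=i0}^{n} (1 - c/i), summing logarithms for stability.

    Assumes i0 > c, so that every factor is strictly positive.
    """
    return math.exp(sum(math.log1p(-c / i) for i in range(i0, n + 1)))


def local_exponent(c, n, i0):
    """Estimate the polynomial decay exponent from the products at 2n and n."""
    ratio = surviving_mass(c, 2 * n, i0) / surviving_mass(c, n, i0)
    return math.log(ratio) / math.log(2)


# With c = 3 and i0 = 4 the product telescopes to 6 / (n (n - 1) (n - 2)),
# so the estimated exponent should be close to -3.
print(surviving_mass(3.0, 10, 4), 1 / 120)
print(local_exponent(3.0, 20_000, 4))
```

Under this assumption the surviving mass after $n$ steps is of order $n^{-c}$, which is the kind of polynomial factor that appears in the estimates of this subsection.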

Decay of correlations
Finally, we show how estimates for decay of correlations may be derived directly from those for the rates of convergence of measures.
Taking $\lambda' = \nu$, we see that $v_n(\Phi) \le C\beta^n + Cv_n(\psi)$. This shows that estimates for $|F^n_*\lambda - F^n_*\lambda'|$ carry straight over to estimates for decay of correlations. To check that the dependency of the constants is as we require, we note that we can take
So an upper bound for this constant is determined by $v_0(\psi)$. Clearly $C_\Phi$ depends only on $F$ and an upper bound for $v_0(\psi)$, and in particular these constants determine $\zeta = \bar\delta / K$.

Central Limit Theorem
We verify the Central Limit Theorem in each case for classes of observables which give summable decay of autocorrelations (that is, summable decay of correlations under the restriction ϕ = ψ).
A general theorem of Liverani ( [L]) reduces in this context to the following.
Theorem 7 Let $(X, \mathcal{F}, \mu)$ be a probability space, and $T : X \circlearrowleft$ a (non-invertible) ergodic measure-preserving transformation. Let $\varphi \in L^\infty(X, \mu)$ be such that
where $\hat T^*$ is the dual of the operator $\hat T : \varphi \mapsto \varphi \circ T$. Then the Central Limit Theorem holds for $\varphi$ if and only if $\varphi$ is not a coboundary.
In the above, the dual operator $\hat T^*$ is the Perron-Frobenius operator corresponding to $T$ and $\mu$, that is, $(\hat T^*\varphi)(x) = \sum_{y : Ty = x} \frac{\varphi(y)}{JT(y)}$.
Of course the Jacobian $JT$ here is defined in terms of the measure $\mu$. Let $\varphi : \Delta \to \mathbb{R}$ be an observable which is not a coboundary, and for which
We shall show that $\phi$ satisfies the assumptions of the theorem above. It is straightforward to check that $C_n(\varphi, \varphi; \nu) = C_n(\phi, \phi; \nu) = \int (\phi \circ F^n)\phi \, d\nu$. Hence condition (8) above is satisfied for $\phi$.
Since $m$ and $\nu$ are equivalent measures, it suffices to verify the condition in (9) $m$-a.e. The operator $\hat F^*$ is defined in terms of the invariant measure, so for a measure $\lambda \ll m$ it sends $\frac{d\lambda}{d\nu}$ to $\frac{dF_*\lambda}{d\nu}$. By a change of coordinates (or rather, of reference measure), we find that
where $P$ is the Perron-Frobenius operator with respect to $m$, that is, the operator sending densities $\frac{d\lambda}{dm}$ to $\frac{dF_*\lambda}{dm}$.
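As a concrete instance of a Perron-Frobenius operator and of the duality used here, consider the doubling map $x \mapsto 2x \bmod 1$ on $[0, 1]$. This map is not one of the systems treated in the text; it is chosen as an assumption of this sketch because Lebesgue measure is both the reference and the invariant measure and the Jacobian is identically 2. The function names and the quadrature resolution are likewise illustrative.

```python
def transfer(phi):
    """Perron-Frobenius operator of T(x) = 2x mod 1 with respect to Lebesgue.

    Each x has the two preimages x/2 and (x+1)/2, and the Jacobian of T is 2
    at both, so the operator averages phi over the preimages.
    """
    return lambda x: (phi(x / 2) + phi((x + 1) / 2)) / 2


def integrate(g, steps=2_000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / steps
    return h * sum(g((k + 0.5) * h) for k in range(steps))


def correlation(phi, psi, n):
    """C_n(phi, psi) computed through duality: the integral of (phi o T^n) psi
    equals the integral of phi times the n-th transfer iterate of psi."""
    density = psi
    for _ in range(n):
        density = transfer(density)
    return integrate(lambda x: phi(x) * density(x)) - integrate(phi) * integrate(psi)


# For phi(x) = psi(x) = x a direct computation gives C_n = 2**(-n) / 12.
for n in range(1, 5):
    print(n, correlation(lambda x: x, lambda x: x, n), 2 ** -n / 12)
```

Working with the transfer iterates of the density, rather than composing the observable with the map, is the same change of viewpoint used in the argument: decay of correlations is read off from the convergence of pushed-forward densities.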
We shall now write $\phi$ as the difference of the densities of two (positive) measures of similar regularity to $\phi$. We let $\tilde\phi = b(\phi + a)$, for some large $a$, with $b > 0$ chosen such that $\int \tilde\phi\rho \, dm = 1$. We define measures $\lambda, \lambda'$ by
It is straightforward to check this gives two probability measures, and that
As we showed in the previous section, $v_n(\phi\rho) \le C\beta^n + Cv_n(\phi)$ for some $C > 0$. Also, $b\phi + \tilde\phi = b(2\phi + a)$, which is bounded below by some positive constant, provided we choose sufficiently large $a$. We easily see $v_n(\lambda), v_n(\lambda') \le Cv_n(\varphi)$. We now follow the construction of the previous sections for these given measures $\lambda, \lambda'$, and consider the sequence of densities $\Phi_n$ defined in §8. We have
Let $\psi_n$ be the density of the first term with respect to $m$, and $\psi'_n$ the density of the second. Since $P$ is a linear operator, we see that
These densities have integral and distortion which are estimable by the construction. We know $\int \psi_n \, dm = \int \psi'_n \, dm = \int \Phi_n \, d(m \times m)$. In the cases we consider (sufficiently fast polynomial variations) this is summable in $n$; notice that we have already used this expression as a key upper bound for $\frac{1}{2}|F^n_*\lambda - F^n_*\lambda'|$ (see Lemma 3). It remains to show that a similar condition holds pointwise, by showing that $\psi_n, \psi'_n$ both have bounded distortion on each $\Delta_l$, and hence $|F^n_*\lambda - F^n_*\lambda'|$ is an upper bound for $\psi_n + \psi'_n$, up to some constant. This follows non-trivially from Proposition 1, which gives a distortion bound on $\{\hat\Phi_k\}$, and hence on $\{\Phi_n\}$ when we restrict to elements of a suitable partition. The remainder of the argument is essentially no different from that given in [Yo], and we omit it here.

Applications
Having obtained estimates in the abstract framework of Young's tower, we now discuss how these results may be applied to other settings. First, we define formally what it means for a system to admit a tower.
Let $X$ be a finite dimensional compact Riemannian manifold, with Leb denoting some Riemannian volume (Lebesgue measure) on $X$. We say that a locally $C^1$ non-uniformly expanding system $f : X \circlearrowleft$ admits a tower if there exists a subset $X_0 \subset X$, $\mathrm{Leb}(X_0) > 0$, a partition (mod Leb) $\mathcal{P}$ of $X_0$, and a return time function $R : X_0 \to \mathbb{N}$ constant on each element of $\mathcal{P}$, such that
• for every $\omega \in \mathcal{P}$, $f^R|\omega$ is an injection onto $X_0$;
• $f^R$ and $(f^R|\omega)^{-1}$ are Leb-measurable functions, $\forall \omega \in \mathcal{P}$;
• $\bigvee_{j=0}^{\infty} (f^R)^{-j}\mathcal{P}$ is the trivial partition into points;
• the volume derivative $\det Df^R$ is well-defined and non-singular (i.e. $0 < |\det Df^R| < \infty$) Leb-a.e., and $\exists C > 0$, $\beta < 1$, such that $\forall \omega \in \mathcal{P}$, $\forall x, y \in \omega$,
where $s$ is defined in terms of $f^R$ and $\mathcal{P}$ as before.
We say the system admits the tower $F : (\Delta, m) \circlearrowleft$ if the base $\Delta_0 = X_0$, $m|\Delta_0 = \mathrm{Leb}|X_0$, and the tower is determined by $\Delta_0$, $R$, $F^R := f^R$ and $\mathcal{P}$ as in §3. It is easy to check that the usual assumptions of the tower hold, except possibly for aperiodicity and finiteness. In particular, $|\det Df^R|$ equals the Jacobian $JF^R$.
If $F : (\Delta, m) \circlearrowleft$ is a tower for $f$ as above, there exists a projection $\pi : \Delta \to X$, which we shall simply call the tower projection, and which is a semi-conjugacy between $f$ and $F$; that is, for $x \in \Delta_l$, with $x = F^l x_0$ for $x_0 \in \Delta_0$, $\pi(x) := f^l(x_0)$. In all the examples we have mentioned in §2, the standard tower constructions (as given in the papers we cited there) provide us with a tower projection $\pi$ which is Hölder-continuous with respect to the separation time $s$ on $\Delta$. That is, given a Riemannian metric $d$, in each case we have that $\exists \beta < 1$ such that for $x, y \in \Delta$,
$$d(\pi(x), \pi(y)) = O(\beta^{s(x,y)}). \tag{10}$$
Note that the issue of the regularity of $\pi$ is often not mentioned explicitly in the literature, but essentially follows from having good distortion control for every iterate of the map. (Formally, a tower is only required to have good distortion for the return map $F^R$, which is not sufficient.)
Given a system $f$ which admits a tower $F : (\Delta, m) \circlearrowleft$ with projection $\pi$ satisfying (10), we show how the observable classes (R1–4) on $X$ correspond to the classes (V1–4) of observables on $\Delta$. Recall that for given $\psi$, $R_\varepsilon(\psi) := \sup\{|\psi(x) - \psi(y)| : d(x, y) \le \varepsilon\}$.
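The quantity $R_\varepsilon(\psi)$ can be estimated numerically for a given observable. The sketch below is an illustration under stated assumptions: the phase space is taken to be an interval with $d(x, y) = |x - y|$, the supremum is replaced by a maximum over a finite sorted grid (so the result is only a lower bound for the true value), and the function name `modulus_of_continuity` is introduced here.

```python
def modulus_of_continuity(psi, eps, grid):
    """Grid estimate of sup{|psi(x) - psi(y)| : |x - y| <= eps}.

    `grid` must be sorted in increasing order. Only grid points are compared,
    so the returned value is a lower bound for the true supremum.
    """
    values = [psi(x) for x in grid]
    best = 0.0
    for i in range(len(grid)):
        for j in range(i + 1, len(grid)):
            if grid[j] - grid[i] > eps:
                break  # the grid is sorted, so later points are further away
            best = max(best, abs(values[i] - values[j]))
    return best


# A Hölder observable with exponent 1/2: the estimate should be close to
# sqrt(eps), attained between the points 0 and eps.
grid = [k / 1000 for k in range(1001)]
for eps in (0.01, 0.04, 0.16):
    print(eps, modulus_of_continuity(lambda x: x ** 0.5, eps, grid))
```

Tabulating the estimate over a range of $\varepsilon$ gives a rough picture of how the regularity of an observable degrades near its worst points, which is the information the regularity classes encode.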
Given a regularity for ψ in terms of R ε (ψ), we estimate the regularity of ψ • π, which is an observable on ∆.
Proof: The computations are entirely straightforward, so we shall just make explicit the (R4) case for the purposes of illustration.
Let us point out that the condition (10) is not necessary for us to apply these methods. If we are given some weaker regularity on π, the classes (V 1 − 4) shall simply correspond to some larger observable classes on the manifold. It remains to check that the semi-conjugacy π preserves the statistical properties we are interested in.
Lemma 6 Suppose the Central Limit Theorem holds for $(F, \nu)$ for some observable $\varphi : \Delta \to \mathbb{R}$. Then the Central Limit Theorem also holds for $(f, \pi_*\nu)$ for the observable $\phi = \varphi \circ \pi$.
Figure 1: A system admitting a tower via a non-Hölder semi-conjugacy.
(See figure 1.) Note that the map has unbounded derivative near $a$, and that $a$ maps onto the critical point at 1. It is easy to check that $f$ is monotone increasing on each interval, and that $f$ has a Markov structure on the intervals
Taking $\Delta_0 = [0, b]$, $\mathcal{P} = \{[0, a], (a, b)\}$ with $R([0, a]) = 2$, $R((a, b)) = 1$, it is clear that the conditions for $f$ to admit a tower $F : \Delta \circlearrowleft$ are satisfied. For $x, y \in [0, a]$, we have that $|x - y| \approx (b/a)^{-s(x,y)}$. If we fix $k$ and consider $|f(x) - f(y)|$ for $x, y \in [0, a)$ with $s(x, y) = k$, then for $y$ close to $a$, we have $|f(x) - f(y)| \approx (k \log(b/a) + C)^{-\alpha} \approx k^{-\alpha}$ for some $C$. This determines the regularity of the tower projection $\pi$, which is in particular not Hölder continuous. However, if we take $\psi \in (R1, \gamma)$ for some