Polynomial loss of memory for maps of the interval with a neutral fixed point

We give an example of a sequential dynamical system consisting of intermittent-type maps which exhibits loss of memory with a polynomial rate of decay. A uniform bound holds for the upper rate of memory loss. The maps may be chosen in any sequence, and the bound holds for all compositions.


Introduction
The notion of loss of memory for non-equilibrium dynamical systems was introduced in the 2009 paper by Ott, Stenlund and Young [10]; they wrote: Let $\rho_0$ denote an initial probability density w.r.t. a reference measure $m$, and suppose its time evolution is given by $\rho_t$. One may ask if these probability distributions retain memories of their past. We will say a system loses its memory in the statistical sense if for two initial distributions $\rho_0$ and $\hat\rho_0$, $\int |\rho_t - \hat\rho_t|\,dm \to 0$.
In [10] the rate of convergence of the two densities was proved to be exponential for certain sequential dynamical systems composed of one-dimensional piecewise expanding maps. Coupling was the technique used for the proof. The same technique was subsequently applied to time-dependent Sinai billiards with moving scatterers by Stenlund, Young, and Zhang [14], and it again gave an exponential rate. A different approach, using the Hilbert projective metric, allowed Gupta, Ott and Török [7] to obtain exponential loss of memory for time-dependent multidimensional piecewise expanding maps.
All the previous papers prove an exponential loss of memory in the strong sense, namely $\int |\rho_t - \hat\rho_t|\,dm \le C e^{-\alpha t}$.
In the invertible setting, Stenlund [13] proves loss of memory in the weak sense for random compositions of Anosov diffeomorphisms, namely the decay to zero of $\left|\int f\circ T_n\,d\mu_1 - \int f\circ T_n\,d\mu_2\right|$, where $f$ is a Hölder observable, $T_n$ denotes the composition of $n$ maps and $\mu_1$ and $\mu_2$ are two probability measures absolutely continuous with respect to the Riemannian volume whose densities are Hölder. It is easy to see that loss of memory in the strong sense implies loss of memory in the weak sense, for densities in the corresponding function spaces and $f \in L^\infty_m$.

A natural question is: are there examples of time-dependent systems exhibiting loss of memory with a slower rate of decay, say polynomial, especially in the strong sense? We will construct such an example in this paper as a (modified) Pomeau-Manneville map $T_\alpha$, $0 < \alpha < 1$, referred to below as (0.1). We use this version of the Pomeau-Manneville intermittent map because the derivative is increasing on $[0,1)$, where it is defined, and this allows us to simplify the exposition. We believe the result remains true for time-dependent systems comprised of the usual Pomeau-Manneville maps, for instance the version studied in [9]. We will refer quite often to [9] in this note. As in [9], we will identify the unit interval $[0,1]$ with the circle $S^1$, in such a way that the map becomes continuous. We will see in a moment how an initial density evolves under composition with maps which are slight perturbations of (0.1). To this purpose we will define the perturbations of the usual Pomeau-Manneville map that we will consider.
The perturbation will be defined by considering maps $T_\beta(x)$ as above with $0 < \beta \le \alpha$. Note that $T_\beta = T_\alpha$ on $2/3 \le x \le 1$. The reference measure will be Lebesgue ($m$). If $0 < \beta_k \le \alpha$ is chosen, we denote by $P_{\beta_k}$ the Perron-Frobenius (PF) transfer operator associated to the map $T_{\beta_k}$.
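To make the maps $T_\beta$ and the operators $P_\beta$ concrete, here is a minimal numerical sketch. It is not taken from the text: it assumes that the map has the explicit form $T_\beta(x) = x + \frac{3^\beta}{2^{1+\beta}}x^{1+\beta}$ on $[0, 2/3]$ and $T_\beta(x) = 3x - 2$ on $[2/3, 1]$, a formula chosen to be consistent with the properties stated above (all $T_\beta$ coincide on $[2/3, 1]$, and the left branch maps $[0, 2/3]$ onto $[0, 1]$). The helper names `T`, `dT_left`, `left_inverse`, `transfer_operator` and `integrate` are hypothetical.

```python
def T(x, beta):
    """Assumed form of the intermittent map T_beta on [0, 1]."""
    if x <= 2.0 / 3.0:
        return x + (3.0 ** beta / 2.0 ** (1.0 + beta)) * x ** (1.0 + beta)
    return 3.0 * x - 2.0


def dT_left(x, beta):
    """Derivative of the (assumed) left branch of T_beta."""
    return 1.0 + (1.0 + beta) * (3.0 ** beta / 2.0 ** (1.0 + beta)) * x ** beta


def left_inverse(y, beta, iters=60):
    """Preimage of y under the left branch, found by bisection on [0, 2/3]."""
    lo, hi = 0.0, 2.0 / 3.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if T(mid, beta) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def transfer_operator(f, beta):
    """Return P_beta f: the sum of f / |T'| over the two preimages of x."""
    def Pf(x):
        y_left = left_inverse(x, beta)
        y_right = (x + 2.0) / 3.0          # right branch has slope 3
        return f(y_left) / dT_left(y_left, beta) + f(y_right) / 3.0
    return Pf


def integrate(f, n=2000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


# Each P_beta preserves the Lebesgue integral, whatever beta is chosen.
P1 = transfer_operator(lambda x: 1.0, 0.5)
P2 = transfer_operator(P1, 0.3)            # P_{0.3} applied after P_{0.5}
print(integrate(P1), integrate(P2, 400))
```

Both printed values should be close to 1, up to quadrature error, which reflects the fact that every $P_\beta$ preserves the integral with respect to $m$.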
Let us suppose $\phi$, $\psi$ are two observables in an appropriate (soon to be defined) functional space; then the basic quantity that we have to control is

$$\int \left|P_{\beta_n}\circ\cdots\circ P_{\beta_1}(\phi) - P_{\beta_n}\circ\cdots\circ P_{\beta_1}(\psi)\right| dx. \qquad (0.2)$$

Our goal is to show that it decays polynomially fast and independently of the sequence $P_{\beta_n}\circ\cdots\circ P_{\beta_1}$: we stress that there is no probability vector to weight the $\beta_k$. Note that, by the results of [11], one cannot in general have a faster than polynomial decay, because that is the (sharp) rate when iterating a single map $T_\beta$, $0 < \beta < 1$.
In order to prove our result, Theorem 1.6, we will follow the strategy used in [9] to get a polynomial upper bound (up to a logarithmic correction) for the correlation decay. We introduced there a perturbation of the transfer operator which was, above all, a technical tool to recover the loss of dilatation around the neutral fixed point by replacing the observable with its conditional expectation on a small ball around each point. It turns out that the same technique allows us to control the evolution of two densities under concatenation of maps, provided we can control the distortion of this sequence of maps. The control of distortion will in fact be the major difficulty of this paper.
Note that the convergence of the quantity (0.2) implies the decay of the non-stationary correlations with respect to $m$, provided $\phi$ is essentially bounded and $\mathbf{1}\left(\int \psi\,dm\right)$ remains in the functional space where the convergence of (0.2) takes place. In particular, this holds for $C^1$ observables, see Theorem 1.6. Conze and Raugy [4] call the decorrelation described above decorrelation for the sequential dynamical system $T_{\beta_n}\circ\cdots\circ T_{\beta_1}$. Estimates on the rate of decorrelation (and the function space in which decay occurs) are a key ingredient in the Conze-Raugy theory to establish central limit theorems for the sums $\sum_{k=0}^{n-1}\phi(T_{\beta_k}\circ\cdots\circ T_{\beta_1}x)$, after centering and normalisation. The question could be formulated in this way: does the centered and normalised sum converge in distribution to the normal law $N(0,1)$?
It would be interesting to establish such a limit theorem for the sequential dynamical system constructed with our intermittent map (0.1). Besides the central limit theorem, other interesting questions could be considered for our sequential dynamical systems, for instance the existence of concentration inequalities (see the recent work [2] in the framework of the Conze-Raugy theory), and the existence of stable laws, especially for perturbations of maps $T_\alpha$ with $\alpha > 1/2$, which is the range for which the unperturbed map exhibits stable laws [6].
We said above that we did not choose the sequence of maps $T_\beta$ according to some probability distribution. A random dynamical system has been considered in the recent paper [3] for similar perturbations of the usual Pomeau-Manneville map. To establish a correspondence with our work, let us say that those authors perturbed the map $T_\alpha$ by modifying again the slope, but taking this time finitely many values $0 < \alpha_1 < \alpha_2 < \cdots < \alpha_r \le 1$, with a finite discrete law. This random transformation has a unique stationary measure, and the authors consider annealed correlations on the space of Hölder functions. They prove in [3] that such annealed correlations decay polynomially at a rate bounded above by $n^{1-\frac{1}{\alpha_1}}$.

As a final remark, we would like to address the question of proving the loss of memory for intermittent-like maps, but with the sequence given by adding a varying constant to the original map, considered to act on the unit circle (additive noise). This problem seems much harder and a possible strategy would be to consider induction schemes, as was done recently in [12] to prove stochastic stability in the strong sense.
We will see that very often the choice of $\beta_k$ will not be important in the construction of the concatenation; in this case we will adopt the following notation, where the exponent of the $P$'s is the number of transfer operators in the concatenation:

$$P_{\beta_n}\circ P_{\beta_{n-1}}\circ\cdots\circ P_{\beta_m} := P_m^{n-m+1}, \qquad P_k^n = P_{k+n-1}\circ P_{k+n-2}\circ\cdots\circ P_k.$$

In the same way, when we concatenate maps we will use the analogous notation $T_k^n = T_{k+n-1}\circ\cdots\circ T_k$. Finally, for any sequences of numbers $\{a_n\}$ and $\{b_n\}$, we will write $a_n \approx b_n$ if $c_1 b_n \le a_n \le c_2 b_n$ for some constants $c_2 \ge c_1 > 0$. The first derivative will be denoted as either $T'$ or $DT$ and the value of $T$ at the point $x$ as either $Tx$ or $T(x)$.
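The index conventions for concatenations are easy to get wrong, so the following small sketch spells them out. The function `concatenate` and the toy operators in `ops` are hypothetical illustrations and not objects from the paper; the toy operators only record the order in which they act.

```python
def concatenate(ops, k, n):
    """Return P_k^n = ops[k+n-1] o ... o ops[k]: n operators, ops[k] acts first."""
    def composed(x):
        for j in range(k, k + n):
            x = ops[j](x)
        return x
    return composed


# Toy "operators", indexed from 1, that append their own index to a trace.
ops = {j: (lambda trace, j=j: trace + [j]) for j in range(1, 8)}

# P_3^4 = P_6 o P_5 o P_4 o P_3: four operators, starting with P_3.
print(concatenate(ops, 3, 4)([]))  # [3, 4, 5, 6]
```

With this convention the composition running from index $m$ to index $n$ is `concatenate(ops, m, n - m + 1)`, matching $P_m^{n-m+1}$.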
The cone, the kernel, the decay
Thanks to a general theory by Hu [8], we know that the density $f$ of the absolutely continuous invariant measure of $T_\alpha$ in the neighborhood of 0 satisfies $f(x) \le \text{constant}\cdot x^{-\alpha}$, where the value of the constant has an expression in terms of the value of $f$ at the pre-image of 0 different from 0. We will construct a cone which is preserved by the transfer operator of each $T_\beta$, $0 < \beta \le \alpha$, and the density of each $T_\beta$ will be the only fixed point in a suitable subset of that cone.
We define the cone of functions […]. Then a direct computation shows that […] The result now follows since the maps […]

We now denote $m(f) = \int_0^1 f(x)\,dx$ and recall that for any $0 < \beta < 1$ we have $m(P_\beta f) = m(f)$.

Lemma 1.2. Given $0 < \alpha < 1$, the cone is preserved by all the operators $P_\beta$, $0 < \beta \le \alpha$, provided $a$ is large enough.
Proof. Let us suppose that $\int_0^1 f\,dx = 1$; then we look for a constant $a$ for which $P_\beta f(x) \le a x^{-\alpha}$. Using the notations in the proof of the previous Lemma and remembering that […], where the last step is justified by the fact that $\beta \le \alpha$ and $0 \le \chi_\beta \le 1/2$. By taking the common denominator one gets […]

Remark 1.3. The preceding two lemmas imply the following properties which will be used later on.

For any concatenation
See the proof of Lemma 2.4 in [9] for the proof of the first item; the second follows immediately from the first.

Remark 1.4. Using the previous Lemmas it is also possible to prove the existence of the density in $\mathcal{C}_2$ for the unique a.c.i.m. by using the same argument as in Lemma 2.3 in [9].
We now take $f \in \mathcal{C}_2$ and define the averaging operator for $\varepsilon > 0$:

$$A_\varepsilon f(x) := \frac{1}{2\varepsilon}\int_{B_\varepsilon(x)} f\,dy,$$

where $B_r(x)$ denotes the ball of radius $r$ centered at the point $x \in S^1$, and define a new perturbed transfer operator $P_{\varepsilon,m}$ by […], where $n_\varepsilon$ will be defined later on. It is very easy to see that […]

Proof. By linearity and contraction of the operators $P_\beta$ we bound the left hand side of the quantity in the statement of the lemma by $\int |A_\varepsilon f - f|\,dx$, and this quantity gives the prescribed bound as in Lemma 3.1 in [9].
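The role of the averaging operator can be illustrated on a uniform grid of the circle. The sketch below assumes that $A_\varepsilon f(x)$ is the mean of $f$ over the ball $B_\varepsilon(x)$ and replaces that integral by a circular moving average; the names `averaging_operator` and `l1_norm` and the sample function are hypothetical. It checks numerically the two properties used in the argument: the average preserves the integral and does not increase the $L^1$ norm.

```python
import math


def averaging_operator(values, eps):
    """Discrete A_eps on a uniform grid of the circle: mean over a window of
    half-width eps (rounded down to whole grid cells), with wrap-around."""
    n = len(values)
    w = int(eps * n)                 # half-width of the ball in grid points
    window = 2 * w + 1
    return [
        sum(values[(i + j) % n] for j in range(-w, w + 1)) / window
        for i in range(n)
    ]


def l1_norm(values):
    """Riemann-sum approximation of the L^1 norm on the circle of length 1."""
    return sum(abs(v) for v in values) / len(values)


n = 200
f = [math.sin(2 * math.pi * i / n) + 0.3 * math.cos(6 * math.pi * i / n)
     for i in range(n)]
Af = averaging_operator(f, 0.05)

print(sum(f) / n, sum(Af) / n)       # the two means agree up to rounding
print(l1_norm(f), l1_norm(Af))       # the second value is not larger
```

The discrete window is only an approximation of the ball of radius $\varepsilon$; refining the grid makes the two agree more closely.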
It is straightforward to get the following representation for the operator $P_{\varepsilon,m}$: […] We now observe that standard computations (see for instance Lemma 3.2 in [9]) allow us to show that the preimages $a_n^\alpha := T_{\alpha,1}^{-n} 1$ verify $a_n^\alpha \approx \frac{1}{n^{1/\alpha}}$; here $T_{\alpha,1}^{-1}$ denotes the left inverse branch of $T_\alpha$, a notation which we will also use later on. Those points are the boundaries of a countable Markov partition and they will play a central role in the following computations; notice that the factors $c_1$, $c_2$ in the bounds $c_1 \frac{1}{n^{1/\alpha}} \le a_n^\alpha \le c_2 \frac{1}{n^{1/\alpha}}$ depend on $\alpha$ (and therefore on $\beta$), but we will only use the $a_n$ associated to the exponent $\alpha$; in particular we will denote by $c_\alpha$ the constant $c_2$ associated to $T_\alpha$; the dependence on $\alpha$, although implicit, will not play any role in the following.
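The polynomial scaling of the preimages $a_n^\alpha$ can be checked numerically. The sketch below is an illustration under an assumption: it takes the left branch of $T_\alpha$ to be $x \mapsto x + \frac{3^\alpha}{2^{1+\alpha}}x^{1+\alpha}$ on $[0, 2/3]$, which is not stated in the text here. The helpers `T_left`, `left_preimage` and `preimages_of_one` are hypothetical names.

```python
def T_left(x, alpha):
    """Assumed left branch of T_alpha on [0, 2/3]."""
    return x + (3.0 ** alpha / 2.0 ** (1.0 + alpha)) * x ** (1.0 + alpha)


def left_preimage(y, alpha, iters=60):
    """Solve T_left(x) = y for x in [0, 2/3] by bisection."""
    lo, hi = 0.0, 2.0 / 3.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if T_left(mid, alpha) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def preimages_of_one(alpha, n_max):
    """Return [a_0, a_1, ..., a_{n_max}] with a_0 = 1 and a_n the left preimage of a_{n-1}."""
    a = [1.0]
    for _ in range(n_max):
        a.append(left_preimage(a[-1], alpha))
    return a


alpha = 0.5
a = preimages_of_one(alpha, 500)

# If a_n is comparable to n^(-1/alpha), these ratios stay between two positive constants.
ratios = [a[n] * n ** (1.0 / alpha) for n in range(1, 501)]
print(min(ratios), max(ratios))
```

The smallest and largest ratios play the role of the constants $c_1$ and $c_2$ over the range of $n$ that was sampled; they are only numerical indications and not the sharp constants.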
We will prove in the next section the following important fact.
A similar rate of decay holds for $C^1$ observables $\phi$ and $\psi$ on $S^1$; in this case the rate of decay has an upper bound given by […], where the function $F : \mathbb{R} \to \mathbb{R}$ is affine.
We now treat the first term $I$ in $\phi$ on the right hand side (the terms in $\psi$ being equivalent), and we consider the last term $III$ after that. We thus have: […] To simplify the notations we put $Q_k := P^{n_\varepsilon}_{m+1+(k-1)n_\varepsilon}$, which reduces the above inequality to […] By induction we can easily see that […] with $R_{-1} = 1$ and $Q_0 = 1$; by setting $\phi_m := P_1^m(\phi)$ and $\hat\phi_m = P_1^m(\phi - \psi)$, we have therefore to bound by the quantity […] We now observe that $Q_{j-l-1}\phi_m \in \mathcal{C}_2$; moreover $\|R_m g\|_1 \le \|g\|_1$ for all $g \in \mathcal{C}_2$, $1 \le m \le k$, since $R_m$ is a concatenation of transfer operators and the averaging map $A_\varepsilon$, which are all contractions on $L^1$. Then we finally get, by invoking also Lemma 1.5, […]

We now look at the third term $III$ which, using the simplified notations introduced above, could be written as $III = \|R_k\cdots R_1\hat\phi_m\|_1$. By using Property (P) and by applying the same arguments as in the footnote 6 in [9], one gets $\|R_k\cdots R_1\hat\phi_m\|_1 \le e^{-\gamma k}\|\phi - \psi\|_1$.
In conclusion we get

for a conveniently chosen κ.
In order to prove the second part of the theorem for $C^1$ observables, we invoke the same argument as at the end of the proof of Theorem 4.1 in [9]. We notice in fact that if $\psi \in C^1$ then we can choose $\lambda, \nu \in \mathbb{R}$ such that $\psi_{\lambda,\nu}(x) = \psi + \lambda x + \nu \in \mathcal{C}_2$, the dependence of the parameters with respect to the $C^1$ norm being affine.
For instance $\lambda$ and $\nu$ could be chosen so as to satisfy the following constraints:

Distortion: Proof of Property (P)
The main technical problem is now to check the positivity of the kernel; we will follow closely the strategy of the proof of Proposition 3.3 in [9]. We recall that […] where $J = B_\varepsilon(z)$ is an interval which we will take later on as a ball of radius $\varepsilon$ around $z$. By iterating we get (we denote by $T_{l,k}^{-1}$, $k = 1, 2$, the two inverse branches of $T_l$): […] where $x_{n_\varepsilon} = T_{1,l_1}^{-1}\cdots T_{n_\varepsilon,l_{n_\varepsilon}}^{-1}x$ ranges over all points in the preimage of $x \in T_{n_\varepsilon}\circ\cdots\circ T_1 J$. The quantity on the right hand side is bounded from below by […] We have therefore to control the ratio

$$\inf_{z\in J}\frac{1}{|T_1'(z)\,T_2'(T_1 z)\cdots T_m'(T_{m-1}\cdots T_1 z)|},$$

where $m$ is the time needed for an interval $J$ of length greater than $2\varepsilon$ to cover all the circle. We proceed as in the proof of Proposition 3.3 in [9].
We first need to introduce some notation. Recall that $a_n^\alpha$ is the sequence of the preimages of 1 by the left branch of $T_\alpha$. We use similarly $a_n^\beta$ for $T_\beta$ and define $a_n^0$ as the infimum over all $\beta > 0$ of $a_n^\beta$. Remark that $a_n^0$ is the sequence of the preimages of 1 by the left branch of the map $T_0$. For $k \ge 1$, we define the sequence $a_n^k$ so that $a_0^k = 1$ and $a_n^k$ is the leftmost preimage of 1 by $T_{k+1}^n$. In particular, $a_n^k$ is the preimage of $a_{n-1}^{k+1}$ by the left branch of $T_{k+1}$. Remark that $a_n^k$ is a decreasing sequence in $n$ and that $a_n^0 \le a_n^k \le a_n^\alpha$.

We define the intervals $I_n^k = [a_{n+1}^k, a_n^k]$, which satisfy $T_{k+1}^n I_n^k = [\frac{2}{3}, 1]$. We also define $I_{n,+}^k = I_{n+1}^k \cup I_n^k = [a_{n+2}^k, a_n^k]$. We define the intermittent region $I = [0, a_2^0]$ and the hyperbolic region $H = [a_2^0, 1]$. Let $J$ be an interval of length $2\varepsilon$. We will iterate $J$ under the non-stationary dynamics until it covers the whole space, and will control the distortion in the meantime.
At time $k$, the iterate $K = T_1^k J$ satisfies one of the following conditions:
1. $K \cap I = \emptyset$;
2. $K \cap I \ne \emptyset$, and $K$ contains at most one $a_\ell^k$, $\ell > 2$;
3. $K \cap I \ne \emptyset$, and $K$ contains more than one $a_\ell^k$, $\ell > 2$.
Case 1. Suppose we are in situation 1. Either one of the iterates of $K$ will cross the point $\frac{2}{3}$, where the maps are not differentiable, or it will fall into situation 2 or 3. Let $n \ge 1$ be the time spent before one of these situations occurs.
Since all maps are uniformly expanding on the hyperbolic region with uniformly bounded second derivatives, by standard computations, we have for all $a, b \in K$: […]
Since $0 < \beta \le \alpha < 1$, the ratio $\frac{|T_\beta'' x|}{|T_\beta' x|}$ and the quantity $\frac{1}{|T_\beta' x|}$ are bounded from above uniformly in $\beta$ and $x \in H$ respectively by $D > 0$ and $0 < r < 1$. We then have […] where $c_1 = \frac{D}{1-r}$ depends only on $\alpha$. After integration with respect to $b$, we find […]

If this new iterate of $K$ intersects the intermittent region, we consider situation 2 or 3, and continue the algorithm. If it is still in the hyperbolic region, but now contains the point $\frac{2}{3}$, we proceed in the following way. Let us call $L = T_{k+1}^n(K)$ the new iterate, and $L_l$ and $L_r$ the parts of the interval to the left and to the right of $\frac{2}{3}$, respectively. Either $|L_l| > \frac{1}{3}|L|$ or $|L_r| > \frac{1}{3}|L|$. In the first case, after one iteration, the image of $L_l$ will be contained in $[\frac{2}{3}, 1]$, with the right extremity at 1. So after, say, $m$ steps, it will cover the whole unit interval, and the distortion is well controlled during this iteration. We then have […] Setting $n_1 = n + m + 1$, we thus have […]

In the second case, if the right part is longer than the left part, after one iteration, the iterate of the right part will be of the form $[0, x]$, and we fall into case 3 of the algorithm. We can apply the control on the distortion given in case 3 to $L_r$. As in the previous case, doing this, we will get a factor $\frac{1}{3}$, but as we will see, case 3 leads to the end of the algorithm, so we will meet the discontinuity point $\frac{2}{3}$ at most once during the whole procedure. Hence the factor $\frac{1}{3}$ will appear only once, and will not multiply itself several times, which could have spoiled the estimate.
Case 2. $K$ is included in an interval $I^k_{\ell,+}$. Since $T^\ell_{k+1}(I^k_{\ell,+}) = [a_2^{k+\ell}, 1] \subset [a_2^0, 1]$, after exactly $\ell$ iterations, the image of $K$ will be included in the hyperbolic region, and we continue with case 1. During this period of time, the distortion is controlled using the Koebe principle, which we recall below.

Koebe principle. For all $\tau > 0$, there exists $C = C(\tau) > 0$ such that for all increasing diffeomorphisms $g$ of class $C^3$ with a non-positive Schwarzian derivative, and for all subintervals $J_1 \subset J_2$ such that $g(J_2)$ contains a $\tau$-scaled neighborhood of $g(J_1)$ (i.e. the intervals on the left and on the right of $g(J_1)$ in $g(J_2)$ have length at least $\tau|g(J_1)|$), one has […]

We apply it to $g$ defined as the composition of the analytic extensions to $(0, +\infty)$ of the left branches of $T_{k+\ell}, \dots, T_{k+1}$ with $J_1 = I^k_{\ell,+}$ and $J_2 = [\delta, 2]$, where $\delta = \delta(k, \ell)$ is chosen small enough so that $\delta < a^k_{\ell+2}$ and $T_0^\ell\delta < \frac{1}{2}a_2^0$. The map $g$ has non-positive Schwarzian derivative since it is a composition of maps that have non-positive Schwarzian derivatives.
We have […], which does not depend on the composition of maps, nor on the number of steps $\ell$. The interval at the left of $g(J_1)$ in $g(J_2)$ contains $[\frac{1}{2}a_2^0, a_2^0]$, and thus has length at least $\frac{1}{2}a_2^0 \ge \tau(1 - a_2^0) \ge \tau|g(J_1)|$. Similarly, the interval at the right of $g(J_1)$ in $g(J_2)$ contains $[1, 2]$ and thus has length at least $1 \ge \tau(1 - a_2^0) \ge \tau|g(J_1)|$. We have proved that $g(J_2)$ contains a $\tau$-scaled neighborhood of $g(J_1)$, so the Koebe principle implies there exists $C = C(\tau)$ such that for all $a, b \in J_1$ one has […]

Case 3. If more than one third of $K$ is in $[\frac{2}{3}, 1]$ and is of the form $[a, 1]$, then we consider $K \cap [\frac{2}{3}, 1]$ and case 1 will hold until we cover the whole interval, and we lose a factor $\frac{1}{3}$ (only once). Otherwise, we define $\ell_-$ as the least integer such that $I^k_{\ell_-}$ is included in $K$. We consider two sub-cases according to whether the part of $K$ to the right of $a^k_{\ell_-}$ is of length at least $\frac{|K|}{3}$ or not.

In the first sub-case, we set […], we have by step 2: […] Since $T^2_{k+\ell_-} I_1^{k+\ell_- - 1} = [0, 1]$, and $|T_\beta'(x)|$ is bounded from above uniformly in $\beta$ and $x$ by some constant $M > 0$, we find […]

In the second sub-case, we choose $K'$ in such a way that $|K'| \ge \frac{|K|}{3}$, the right extremity of $K'$ is $a^k_{\ell_-}$ and the left extremity is to the right of 0. We cut $K'$ into pieces $I^k_{\ell_-}, \dots, I^k_{\ell_+}$ such that their union is of length at least $\frac{|K'|}{3} \ge \frac{|K|}{9}$, with $\ell_+$ minimal. This choice to cut $K'$ rather than $K$ allows us to estimate $\ell_+$: indeed, if we set $K' = [a, a^k_{\ell_-}]$, since $\ell_+$ is minimal, the length of $[a, a^k_{\ell_+ - 1}]$ is at least $\frac{2|K'|}{3} \ge \frac{2|K|}{9}$. Hence, […] and consequently $\ell_+ = O(|K|^{-\alpha})$. By the computation done for case 2, we have […], which is sent after one iteration onto the whole interval, we have thanks to Remark 1.3 item 2 […] with $c_3$ the constant given in Remark 1.3, since […]

Conclusion. Let $J$ be an interval of length at least $2\varepsilon$. We associate to $J$ a sequence of integers $n_1, m_1, n_2, m_2, \dots, n_p$ such that for $n_1$ steps the iterates of $J$ are in the hyperbolic region (with possibly $n_1 = 0$), then for $m_1$ steps they are in the situation of case 2, then again for $n_2$ steps in the hyperbolic region (recall that from case 2 we can only fall into case 1), and so on, until one iterate of $J$ crosses the singularity $\frac{2}{3}$, or case 3 happens. These two situations lead to the end of the algorithm. We will only consider the situation where case 3 happens and the part of $K$ to the right of $a^k_{\ell_-}$ has length not more than $\frac{|K|}{3}$, the others being similar. For $n \ge n_1 + m_1 + \dots + n_p + \ell_+ + 1$, we have […] One has to estimate the supremum over all possible values of $t = n_1 + m_1 + \dots + n_p + \ell_+$ and show that it is of order $\varepsilon^{-\alpha}$. Let $n'_\varepsilon$ be the minimal time needed for an interval of length at least $2\varepsilon$ to cover all the circle. We claim that $n'_\varepsilon = O(\varepsilon^{-\alpha})$, which concludes the proof since $n_1 + m_1 + \dots + n_p \le n'_\varepsilon$, as after these iterations $J$ has not covered the circle, and $\ell_+ = O(\varepsilon^{-\alpha})$, as we showed previously.
It remains to prove the claim. Since the first derivatives of all the $T_\beta$ are strictly increasing on the circle, the minimal time associated to intervals $J$ of size $2\varepsilon$ will be attained when an iterate of $J$ is located around 0, then moving according to case 3. We first consider an iterate whose length is one third of that of $J$ (see above), located in $(0, 2\varepsilon/3)$: we call this situation F. This implies $a^1_{d+t} \le 2\varepsilon/3$, which in turn shows that the time needed to cover the circle is $n''_\varepsilon = \left[\frac{3c_\alpha}{2\varepsilon}\right]^\alpha$. Take now $J$ far from 0; if in $n''_\varepsilon$ steps it does not meet the point $2/3$, it will cover the circle, since the derivatives will be continuous along the path. Otherwise, if it meets $2/3$ in fewer than $n''_\varepsilon$ steps, the worst subsequent situation is to be sent to 0 in the situation F. In conclusion, the minimal time associated to intervals $J$ of size $2\varepsilon$ will be bounded from above by $2n''_\varepsilon$.
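The order $\varepsilon^{-\alpha}$ of the covering time in situation F can be observed numerically in the simplest, stationary case where every map in the sequence equals $T_\alpha$. The sketch below is only an illustration: it assumes the left branch $x \mapsto x + \frac{3^\alpha}{2^{1+\alpha}}x^{1+\alpha}$ on $[0, 2/3]$, which is not stated in the text here, and the names `T_left` and `covering_time` are hypothetical.

```python
def T_left(x, alpha):
    """Assumed left branch of T_alpha on [0, 2/3]."""
    return x + (3.0 ** alpha / 2.0 ** (1.0 + alpha)) * x ** (1.0 + alpha)


def covering_time(delta, alpha):
    """Number of iterates of T_alpha needed for [0, delta] to cover the circle.

    While the right endpoint is below 2/3 the image of [0, r] is [0, T(r)];
    once it reaches 2/3, one more iterate maps [0, 2/3] onto the whole circle.
    """
    right = delta
    steps = 0
    while right < 2.0 / 3.0:
        right = T_left(right, alpha)
        steps += 1
    return steps + 1


alpha = 0.5
for delta in (1e-2, 1e-4, 1e-6):
    n = covering_time(delta, alpha)
    # If the covering time is of order delta^(-alpha), the last column stays bounded.
    print(delta, n, n * delta ** alpha)
```

The non-stationary case treated in the proof requires the full argument above; this experiment only shows the scaling for a single map.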
paper was completed during the Workshop Mathematics for Planet Earth. MN was supported by NSF grant DMS 1101315. AT was partially supported by the Simons Foundation grant 239583. The authors thank the referee for the useful comments. We are extremely grateful to Sébastien Gouëzel who pointed out a gap in the distortion argument of a preliminary version and suggested how to correct it by using the Koebe principle.

Note added in proof
A more careful analysis shows that in the proof of Property (P) the monotonicity of the derivatives is not necessary to estimate $n_\varepsilon$. Thus, Theorem 1.6 holds for more general maps than (0.1), e.g. those in [9]; the details can be found in [1].