Instability for a priori unstable Hamiltonian systems: a dynamical approach

In this article, we consider an a priori unstable Hamiltonian system with three degrees of freedom, for which we construct a drifting solution with an optimal time of instability. Such a result has already been proved by Berti, Bolle and Biasco using variational arguments, and by Treschev using his separatrix map theory. Our approach is new: it is based on a special type of symbolic dynamics corresponding to the random iteration of a family of twist maps of the annulus, and it gives the first concrete application of this idea, introduced by Moeckel in an abstract setting and further studied by Marco. Our method should also be useful in obtaining the optimal time of instability in the more difficult context of a priori stable Hamiltonian systems.


Introduction
The theory of perturbations of Hamiltonian systems is essentially the study of near-integrable Hamiltonian systems, generated by functions of the form
$$H(\theta, I) = h(I) + f(\theta, I), \quad (\theta, I) \in \mathbb{A}^n = \mathbb{T}^n \times \mathbb{R}^n,$$
where H is sufficiently smooth and f sufficiently small.
1. For f = 0, H = h is an integrable system and the situation is well understood. In this case, the variables (θ, I) are called angle-action coordinates for h and the Hamiltonian depends only on the action variables. The phase space $\mathbb{A}^n$ is trivially foliated by invariant Lagrangian tori parametrized by the action variables: indeed the equations of motion of H read
$$\dot\theta = \nabla h(I), \quad \dot I = 0,$$
so for each $I_0 \in \mathbb{R}^n$, the torus $T_0 = \mathbb{T}^n \times \{I_0\}$ is Lagrangian and invariant under the Hamiltonian flow. The latter is complete and restricts to a linear flow on $T_0$ with frequency $\omega_0 = \nabla h(I_0)$, that is
$$\Phi^H_t(\theta, I_0) = (\theta + t\omega_0 \ [\mathbb{Z}^n],\ I_0), \quad t \in \mathbb{R}.$$
The action variables remain fixed for all times, and all solutions are quasi-periodic.
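The linear flow described above can be illustrated numerically. The following is only a minimal sketch: the choice $h(I) = \frac12 |I|^2$ (so that $\nabla h(I) = I$), the function names and the convention $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ are assumptions made for illustration.

```python
def grad_h(I):
    """Frequency map for the illustrative choice h(I) = |I|^2 / 2, so grad h(I) = I."""
    return list(I)


def integrable_flow(theta, I, t):
    """Time-t map of the unperturbed flow: angles rotate linearly, actions stay fixed.

    Angles are taken modulo 1 (assumption: T = R/Z).
    """
    omega = grad_h(I)  # frequency vector omega_0 = grad h(I_0)
    new_theta = [(th + t * w) % 1.0 for th, w in zip(theta, omega)]
    return new_theta, list(I)


if __name__ == "__main__":
    theta, I = integrable_flow([0.1, 0.7], [0.5, -0.25], 2.5)
    print(theta, I)  # the action vector is returned unchanged
```

Whatever the time t, the returned action vector equals the initial one, which is the numerical counterpart of the statement that the actions are first integrals when f = 0.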
2. Now if we let f be nonzero but small, that is $|f| = \varepsilon \ll 1$ with respect to some norm $|\cdot|$, the situation is dramatically different. For such a near-integrable system, the action variables are no longer first integrals and we may find "unstable" solutions (θ(t), I(t)) that experience a substantial drift in the action direction. Such an instability phenomenon is usually referred to as Arnold diffusion, after the original work of Arnold ([Arn64]). In that paper, Arnold devised a beautiful mechanism to construct an example of an analytic near-integrable system with three degrees of freedom (n = 3) possessing an orbit (θ(t), I(t)) drifting in the space of actions, that is $|I(\tau) - I(0)| \ge 1$ for some time $\tau = \tau(\varepsilon) > 0$, which is called the time of instability (or time of diffusion).
3. However, showing that such drifting solutions exist in some large class of near-integrable Hamiltonian systems is a very difficult task. Indeed, on the one hand KAM theory ([Kol54], [Mos62] and [Arn63]) gives, under some non-degeneracy condition on h and provided ε is small enough, the existence of a set of positive measure of quasi-periodic solutions. These solutions are perpetually stable, in the sense that the variation of their action components is of order at most $\sqrt{\varepsilon}$ for all time. Moreover, for n = 2 and if the integrable Hamiltonian is isoenergetically non-degenerate ([Arn63]), one can even show that all solutions are stable for all time. On the other hand, for n ≥ 3 we know from Nekhoroshev theory ([Nek77], [Nek79]) that all solutions are stable, not for all time, but for an interval of time which is exponentially long with respect to $\varepsilon^{-1}$, provided that h meets some quantitative transversality condition and the perturbation is sufficiently small. But even though the existence of drifting orbits for near-integrable Hamiltonian systems seems to be quite exceptional, Arnold conjectured that this topological instability is in fact a "typical" phenomenon (see [Arn63] or [AKN06]). We refer to [Loc99] for a very lucid and enlightening discussion of Arnold's mechanism (and of Arnold diffusion in general). Under some "generic" conditions and for n = 3, Mather ([Mat04]) announced that this conjecture holds if the unperturbed Hamiltonian is convex (the convexity is required by the use of his variational methods). This is so far the best result in this direction; however, his proof is highly technical and still incomplete.
Another connected question, which is even harder, is to find orbits that densely fill the energy level (this is the so-called quasi-ergodic hypothesis): for instance, in [Her98] it is asked whether there exists a Hamiltonian system, $C^\infty$-smooth and, for r ≥ 2, $C^r$-close to the integrable Hamiltonian $h(I) = \frac12 |I|^2$, which has a dense orbit on an energy level (progress towards this question has been made recently in [KLS10] and [KZZ09]).
4. Yet there is another case, derived from Arnold's original example, which is much simpler and hence more studied and better understood, in which the unperturbed Hamiltonian is completely integrable (in the sense of symplectic geometry) but possesses some "hyperbolicity". The prototype for such a system is given by an uncoupled product of rotators with a pendulum. This case is usually referred to as a priori unstable (following the terminology introduced in [CG94]), in contrast with the a priori stable case where the unperturbed system is integrable in angle-action coordinates and therefore has no hyperbolic feature (one can also say that a priori stable systems are "fully elliptic"). The interest in studying such a priori unstable systems is in fact twofold. First, one can use the original strategy of Arnold to find unstable orbits, by constructing and following a "transition chain" made of sets with suitable hyperbolic properties (or minimizing properties when using variational arguments). Second, due to normal form theory, the study of a priori stable systems in a neighbourhood of a simple resonance reduces to the a priori unstable case (see [LMS03] for example). Results on the topological instability of a priori unstable systems under fairly general conditions can be found in [DdlLS06] and [Tre04] by means of geometrical methods, and in [CY04], [Ber08] and [CY09] by variational methods. However, there are many difficulties in using these results to tackle Arnold's conjecture, which concerns a priori stable systems.

5. Another feature of a priori unstable systems is that one avoids all exponentially small phenomena which are typical for analytic (or Gevrey) a priori stable systems, such as the lower bound on the time of instability imposed by Nekhoroshev's theorem. Indeed, if µ denotes the small parameter in the a priori unstable case, then it was realized by Lochak that the time of instability should be polynomial, and then Bernard ([Ber96]), adapting Bessi's work ([Bes96]), showed that one can obtain an upper bound on the time of instability which is of order $\mu^{-2}$. In [Loc99], it was conjectured that the optimal time of instability should be $\mu^{-1}\ln\mu^{-1}$, and this was proved by Berti, Bolle and Biasco ([BBB03]), using Bessi's ideas and suitable a priori estimates, obtained by convexity arguments, to localize the minima of the action functional. This time of instability was also found by Treschev ([Tre04]) using fairly different geometric methods.
6. The goal of this article is to give yet another method to construct an example with this optimal time of instability, which should be useful to obtain the optimal time of instability for a priori stable systems. Our approach is dynamical: it uses the notion of polysystem introduced by Moeckel ([Moe02]) in the context of Arnold diffusion, which corresponds to the random iteration of a family of maps (the term "polysystem" is due to Marco; such systems are just locally constant skew-products over a Bernoulli shift, and they are also known as iterated function systems). More precisely, our example uses an explicit construction of a polysystem, which can be seen as a concrete realisation of an abstract mechanism introduced by Marco in [Mar08]. Finally, let us also note that polysystems are a crucial ingredient in the recent work of Marco towards the instability for general a priori stable systems with 3 degrees of freedom ([Mar10b], [Mar10c], [Mar10d] and [Mar10a]).

Main results
7. For n ≥ 1, let us recall that a function $f \in C^\infty(\mathbb{A}^n)$ is α-Gevrey, for α ≥ 1, if for any compact subset $K \subseteq \mathbb{A}^n$ there exist two positive constants $C_K$ and $L_K$ such that
$$|\partial^k f(x)| \le C_K L_K^{|k|} (k!)^\alpha, \quad x \in K,\ k \in \mathbb{N}^{2n},$$
with the standard multi-index notation. We shall denote by $G^\alpha(\mathbb{A}^n)$ the space of such functions. For α = 1, these are exactly the analytic functions, but for α > 1, the space $G^\alpha(\mathbb{A}^n)$ contains nonzero compactly supported functions (we refer to [MS02], Appendix A, for more details). Our perturbation will be α-Gevrey for α > 1.
Below we shall state two versions of our theorem, one for Hamiltonian flows (Theorem 2.1) and one for Hamiltonian diffeomorphisms (Theorem 2.2). For the discrete case, our unperturbed diffeomorphism is the time-one map of the Hamiltonian flow generated by h, $\Phi^h : \mathbb{A}^2 \to \mathbb{A}^2$. In the continuous case, our unperturbed system is the a priori unstable Hamiltonian with three degrees of freedom defined by
$$\hat h(\theta, I) = h(\theta_1, \theta_2, I_1, I_2) + I_3 = \frac12 (I_1^2 + I_2^2) + I_3 + \cos 2\pi\theta_1.$$

Let us state our main result.
Theorem 2.1. For α > 1, there exist positive constants C, $\mu_0$ and a function $f \in G^\alpha(\mathbb{A}^3)$ such that for $0 < \mu \le \mu_0$, the Hamiltonian system $H_\mu = \hat h + \mu f$ has an orbit $(\theta(t), I(t))_{t\in\mathbb{R}}$ such that $\lim_{t\to\pm\infty} I_2(t) = \pm\infty$, with the estimates
$$|I_2(\tau) - I_2(0)| \ge 1, \quad \tau \le C\mu^{-1}\ln\mu^{-1}.$$
We note that even though our perturbation is only Gevrey regular, using the techniques developed in [LM05] one can obtain Theorem 2.1 in the analytic case, but with considerably more work. However, as opposed to [BBB03], where the method works for an open set of perturbations, our construction really requires a specific perturbation (this will be explained below in section 3.2, point 3, when discussing the choice of our perturbation).
The unstable orbit constructed in the theorem above is bi-asymptotic to infinity, and the estimates show that the time of instability is of order $\mu^{-1}\ln\mu^{-1}$. As we have already said, similar examples have been constructed by means of very different methods in [BBB03] and [Tre04] (see also [CG03]). Moreover, in [BBB03] the authors proved that for such a system the following stability estimates hold true in the analytic case: given any ρ > 0, there exist positive constants $\mu_0$ and c such that for $\mu \le \mu_0$, every solution satisfies
$$|I(t) - I(0)| \le \rho, \quad |t| \le c\mu^{-1}\ln\mu^{-1}.$$
Their proof relies on Nekhoroshev's estimates, for both convex and steep integrable systems, that lead to exponential stability in the part of the phase space (away from the separatrices) where one can introduce angle-action coordinates for the integrable part (thus reducing the system to an a priori stable one), as well as some direct arguments leading to the time of stability $\mu^{-1}\ln\mu^{-1}$ close to the separatrices. Therefore this time of instability is optimal within the analytic category. But as it is only polynomial (and in fact almost linear) and since the arguments close to the separatrices are essentially independent of the regularity, it should also be optimal in the Gevrey case and in the $C^k$ case for k large enough, provided that the corresponding Nekhoroshev estimates hold true. In [MS02] (resp. in [Bou10b]), exponential (resp. polynomial) stability estimates have been proved for Gevrey (resp. $C^k$) perturbations of quasi-convex systems. Therefore, to obtain a stability result similar to [BBB03] in lower regularity, one only needs the more general steep case, and this has been done in [Bou10a].

9. Let us recall that one of our main motivations in using a new mechanism to construct an unstable orbit with an optimal time of instability in this a priori unstable case is to tackle the corresponding problem for the a priori stable case, which is much more challenging. In fact, in this case the optimal time of instability τ(ε) is of order $\exp(\varepsilon^{-a})$, for some positive exponent a, and it was shown recently in [BM11] that one has the lower bound $a \ge (2\alpha(n-1))^{-1}$ for α-Gevrey perturbations, with α ≥ 1 (this includes the analytic case α = 1). In the same paper, the authors conjectured that this value is indeed optimal, in the sense that the upper bound $a \le (2\alpha(n-1))^{-1}$ should also hold (for the moment, one only knows that $a \le (2\alpha(n-2))^{-1}$; this is due to Herman, Marco and Sauzin ([MS02]) for the case α > 1, and to Ke Zhang ([Zha09]) for the more difficult analytic case α = 1). We believe that using some technical tools introduced in the present work, in particular the construction of our polysystem, one should finally be able to reach the value $a = (2\alpha(n-1))^{-1}$ (first in the easiest case α > 1). Indeed, one can show that polysystems do appear in the a priori stable case, as in the examples of [Mar05] and [LM05], our example being just an a priori unstable version of these. But of course, whereas in the a priori unstable case the splitting (which is the "distance" between stable and unstable manifolds of partially hyperbolic tori, or equivalently, the size of the drift in the action after one homoclinic excursion in the annulus) is simply of order µ, in the a priori stable case it is exponentially small (with respect to ε) and therefore much more difficult to deal with.
Nevertheless, we believe that one can "embed" our polysystem into an a priori stable system, either by refining the splitting estimates of [Mar05] and [LM05], or by using an embedding mechanism analogous to [MS02], in which the authors use a very clever lemma to estimate the time of instability without any splitting estimates (they easily recover the latter as a consequence of their construction).
10. We shall not prove Theorem 2.1 directly, but the following equivalent version in terms of diffeomorphisms. Given a function H on $\mathbb{A}^n$, we shall denote by $\Phi^H_t : \mathbb{A}^n \to \mathbb{A}^n$ the time-t map of its Hamiltonian flow and by $\Phi^H = \Phi^H_1$ the time-one map.
Theorem 2.2. For α > 1, there exist positive constants C, $\mu_0$ and a function $f \in G^\alpha(\mathbb{A}^2)$ such that for $0 < \mu \le \mu_0$, the diffeomorphism $F_\mu = \Phi^{\mu f} \circ \Phi^h$ has an orbit $(\theta^k, I^k)_{k\in\mathbb{Z}}$ such that $\lim_{k\to\pm\infty} I^k_2 = \pm\infty$, with the estimates
$$|I^N_2 - I^0_2| \ge 1, \quad N \le C\mu^{-1}\ln\mu^{-1}.$$
Theorem 2.1 follows easily from Theorem 2.2 by a classical suspension argument (see [MS02] for a simple method in the Gevrey case using compactly supported functions), so we shall not repeat the details. Of course, the constants $\mu_0$, C and the function f are not the same in both theorems, but we have kept the same notation for simplicity. Moreover, in the sequel we will not give explicit values for these constants; in fact it will sometimes be more convenient to use asymptotic notations: given u(µ) and v(µ) defined for µ ≥ 0, we shall write u(µ) = O(v(µ)) if there exist positive constants µ' and c, independent of µ, such that the inequality u(µ) ≤ c v(µ) holds true for 0 ≤ µ ≤ µ'.
11. The plan of this paper is the following. In section 3, we shall describe the perturbation. We will show that the perturbed system has an invariant normally hyperbolic manifold, the stable and unstable manifolds of which intersect transversely along a homoclinic annulus.
Then, this will be used in section 4 to show the following generalisation of the Birkhoff-Smale theorem to the normally hyperbolic case: near the homoclinic manifold, there exists an invariant set on which a suitable iterate of the system is conjugated to a symbolic dynamics, more precisely to a skew-product on the annulus $\mathbb{A}$ over a Bernoulli shift. Let us give a precise definition.
Given an alphabet $A \subseteq \mathbb{N}$, we let $\Sigma_A = A^{\mathbb{Z}}$ be the Cantor set of bi-infinite sequences of elements in A. We will denote by $\sigma = \sigma_A$ the left Bernoulli shift on $\Sigma_A$, that is, if $\bar n = (n_k)_{k\in\mathbb{Z}}$ belongs to $\Sigma_A$, then $\sigma(\bar n) = (n'_k)_{k\in\mathbb{Z}}$ is defined by $n'_k = n_{k+1}$, $k \in \mathbb{Z}$.
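Definition 2.3. Let M be a manifold. A skew-product on M over σ is a map
$$G : \Sigma_A \times M \to \Sigma_A \times M, \quad G(\bar n, x) = (\sigma(\bar n), F_{\bar n}(x)),$$
where $F_{\bar n} : M \to M$ is an arbitrary map, for $\bar n \in \Sigma_A$.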
However, we are not able to prove the existence of our orbit by directly using this symbolic dynamics. Hence in section 5, we will first construct this orbit for a simplified model called a polysystem, which appears as a random iteration of "standard" maps of the annulus, and which is close to our skew-product. Here is the definition.
Definition 2.4. A polysystem on M is a skew-product over a Bernoulli shift of the form
$$G(\bar n, x) = (\sigma(\bar n), f_{n_0}(x)),$$
with $f_n : M \to M$ for $n \in A$.
In other terms, a polysystem is just a locally constant skew-product over the shift, and this special property will greatly help us in the construction of our orbit. Moreover, polysystems are equivalent to iterated function systems on the annulus, and these dynamical systems are known to have drifting orbits under generic assumptions (see [Moe02]). Then, in section 6, the orbit for the polysystem will be considered as a pseudo-orbit for the skew-product and, using some hyperbolicity of our orbit, we will conclude the existence of an orbit for the general skew-product using shadowing arguments.
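The difference between a general skew-product and a polysystem can be made concrete with a small sketch. Everything here is an illustrative assumption rather than part of the construction: a bi-infinite sequence is modelled as a Python function k ↦ n_k, the fibre M is the real line, and the two toy maps are arbitrary.

```python
def shift(seq):
    """Left Bernoulli shift: (sigma(n))_k = n_{k+1}.

    A bi-infinite sequence is modelled as a function k -> n_k.
    """
    return lambda k: seq(k + 1)


def skew_product_step(seq, x, F):
    """One step of G(n, x) = (sigma(n), F_n(x)); F may inspect the whole sequence."""
    return shift(seq), F(seq, x)


def polysystem_step(seq, x, maps):
    """One step of a polysystem: the fibre map depends only on the symbol n_0."""
    return shift(seq), maps[seq(0)](x)


def polysystem_orbit(seq, x0, maps, steps):
    """Projection onto M of the forward orbit: apply f_{n_0}, f_{n_1}, ... in order."""
    xs = [x0]
    for _ in range(steps):
        seq, x = polysystem_step(seq, xs[-1], maps)
        xs.append(x)
    return xs


if __name__ == "__main__":
    # Hypothetical alphabet {1, 2} with two toy maps of the real line.
    maps = {1: lambda x: x + 1, 2: lambda x: 2 * x}
    seq = lambda k: 1 if k % 2 == 0 else 2  # the sequence ..., 1, 2, 1, 2, ...
    print(polysystem_orbit(seq, 0, maps, 4))
```

In `skew_product_step` the fibre map receives the full sequence, while `polysystem_step` only reads `seq(0)`; this is the "locally constant" property mentioned above.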
Finally we have gathered in an appendix some estimates on the so-called time-energy coordinates for the simple pendulum that are used in section 4.

Construction of the perturbation
This section is devoted to a description of our system, which is similar to the one introduced in [Mar05]. We will first consider the integrable case, and then explain the construction of the perturbation.

Integrable system
Our integrable diffeomorphism $F_0 : \mathbb{A}^2 \to \mathbb{A}^2$ is the time-one map of the Hamiltonian flow generated by h, that is $F_0 = \Phi^h$, which is the product of the pendulum map $\Phi^P = \Phi^{\frac12 I_1^2 + \cos 2\pi\theta_1}$ and the integrable twist map $\Phi^S = \Phi^{\frac12 I_2^2}$ (see figure 1). It is an "a priori unstable" map in the sense that it possesses an invariant normally hyperbolic annulus. Indeed, let O = (0, 0) be the hyperbolic fixed point of the pendulum map $\Phi^P$; its stable and unstable manifolds obviously coincide. We denote by S the upper part of the separatrix. Due to the product structure of the map $F_0$, it is easy to see that the annulus $\mathcal{A} = \{O\} \times \mathbb{A}$ is invariant by $F_0$, and one can check that it is symplectic for the canonical structure of $\mathbb{A}^2$. But the most important feature is that this annulus is normally hyperbolic, more precisely it is r-normally hyperbolic for any $r \in \mathbb{N}$, in the sense of [HPS97]. To see this, just decompose the tangent bundle of $\mathbb{A}^2$ along $\mathcal{A}$ as
$$T_{\mathcal{A}}\mathbb{A}^2 = T\mathcal{A} \oplus E^s \oplus E^u,$$
where $E^s$ (resp. $E^u$) is the one-dimensional contracting (resp. expanding) direction associated to the hyperbolic fixed point. Then note that the restriction of $F_0$ to $\mathcal{A}$ coincides with the integrable twist map $\Phi^S$, hence there is zero contraction and expansion in the tangent direction $T\mathcal{A}$.
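The product structure of $F_0$ can be explored numerically. The sketch below is an illustration under stated assumptions, not the construction used in the proofs: the pendulum time-one map is approximated with a leapfrog scheme (the step count is an arbitrary choice), the function names are hypothetical, and angles are taken modulo 1. Hamilton's equations for $\frac12 I_1^2 + \cos 2\pi\theta_1$ read $\dot\theta_1 = I_1$, $\dot I_1 = 2\pi \sin 2\pi\theta_1$, so the linearization at O has eigenvalues $\pm 2\pi$.

```python
import math


def pendulum_energy(theta, I):
    """Pendulum Hamiltonian I^2 / 2 + cos(2 pi theta)."""
    return 0.5 * I * I + math.cos(2 * math.pi * theta)


def pendulum_time_one_map(theta, I, steps=2000):
    """Approximate the time-one map of the pendulum flow by leapfrog.

    theta is kept on the real line (a lift of the angle); the step count
    is an illustrative choice, not a value taken from the text.
    """
    dt = 1.0 / steps
    force = lambda th: 2 * math.pi * math.sin(2 * math.pi * th)  # dI/dt = -dh/dtheta
    for _ in range(steps):
        I += 0.5 * dt * force(theta)   # half kick
        theta += dt * I                # drift
        I += 0.5 * dt * force(theta)   # half kick
    return theta, I


def twist_time_one_map(theta, I):
    """Exact time-one map of the flow of I^2 / 2: an integrable twist map."""
    return (theta + I) % 1.0, I


def F0(theta1, theta2, I1, I2):
    """Product of the (approximate) pendulum map and the twist map."""
    t1, i1 = pendulum_time_one_map(theta1, I1)
    t2, i2 = twist_time_one_map(theta2, I2)
    return t1 % 1.0, t2, i1, i2


if __name__ == "__main__":
    # A point of the annulus {O} x A stays on it, and moves by the twist map.
    print(F0(0.0, 0.3, 0.0, 0.25))
    # A small displacement along the unstable direction I = 2 pi theta grows
    # roughly by the factor exp(2 pi) after one unit of time.
    th, _ = pendulum_time_one_map(1e-6, 2 * math.pi * 1e-6)
    print(th / 1e-6, math.exp(2 * math.pi))
```

The first call shows the invariance of the annulus under the product map, and the second gives a rough numerical view of the expansion rate normal to it.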
The 3-dimensional stable and unstable manifolds of A also coincide, and are given by the product

Perturbed system
Now we will describe the geometric properties of our perturbed system. It will be of the form where µ > 0 is the small parameter, and f : A 2 → R a function of the form to be defined precisely below.
1. First $\chi : \mathbb{T} \to \mathbb{R}$ will be a bump function. Of course such a function can be chosen to be α-Gevrey for α > 1 but not analytic. More precisely, let us choose $\bar\theta \in \mathbb{T}$ such that Since the separatrix S is symmetric about the section $\{\theta_1 = 1/2\}$, $\bar\theta$ is well-defined. Then choose any $\tilde\theta \in\ ]\bar\theta, 1/2[$, and a function $\chi \in G^\alpha(\mathbb{T})$, α > 1, such that Since χ is identically zero at 0, the perturbation $\Phi^{\mu f}$ is the identity on the normally hyperbolic annulus $\mathcal{A}$, and therefore the latter remains invariant and normally hyperbolic for the map $F_\mu$. Moreover, as χ vanishes in a neighbourhood of 0, some pieces of the stable and unstable manifolds for $F_\mu$ will coincide with the ones of $F_0$. The use of a bump function here is only to simplify the construction; this assumption can be relaxed, but at the expense of a more involved analysis (as in [LM05]).
For (τ, e, θ 2 , I 2 ) ∈ D * , our perturbation can be written explicitly as and otherwise the diffeomorphism Φ µf is the identity.

5. In the sequel we shall need the following property.
Proposition 3.1. The immersed manifolds W + (A, F µ ) and W − (A, F µ ) intersect transversely along the annulus Let us remark that this annulus is also given by in the original coordinates.
Proof. By definition of D, one easily sees that the sets are disjoint from D. Since F µ and F 0 coincide outside D, we can define pieces of stable and unstable manifolds by Then on the one hand, and on the other hand,

Now in time-energy coordinates, one simply has
and therefore while, using the expression of the perturbation (2), Since f 2 is nowhere zero, one easily sees that the manifolds ) intersect transversely along the annulus

Construction of a symbolic dynamic
6. In this section, we will take advantage of the fact that our diffeomorphism possesses an invariant normally hyperbolic annulus $\mathcal{A}$, whose stable and unstable manifolds intersect transversely along a homoclinic annulus $I_\mu$.
In the case where the normally hyperbolic manifold is a point, under such a transverse homoclinic intersection it is well-known that chaotic dynamics arise: the system has an invariant Cantor set on which a suitable iterate is conjugated to a shift map (this is the Horseshoe theorem, due to Birkhoff, Smale and Alexeiev). In our more general situation, the symbolic dynamic is more complicated.

7. Recall from definition 2.3 that a skew-product (over σ) is completely defined by the family of maps $(F_{\bar n})_{\bar n\in\Sigma_A}$, $F_{\bar n} : M \to M$, and the skew-product will be denoted by $[[F_{\bar n}]]_{\bar n\in\Sigma_A}$. Then one can easily see that a sequence In this section, we will need our alphabet A to contain integers of order $\ln\mu^{-1}$, but for subsequent arguments, we will also need it to contain integers $n \in \mathbb{N}$ as large as $\mu^{-1/2}\ln\mu^{-1}$, so we may already fix For simplicity, we shall get rid of the subscript µ.
8. Let $O_\mu \subseteq D^*$ be a neighbourhood (in time-energy coordinates) of the homoclinic annulus $I_\mu$. For $x \in O_\mu$, let us set We will show below that there exists $\Lambda_\mu \subseteq O_\mu$ such that for any $x \in \Lambda_\mu$, the doubly infinite sequence $\bar n^x = (n^x_k)_{k\in\mathbb{Z}}$ is a well-defined element of $\Sigma_A$. In fact, $\Lambda_\mu$ is homeomorphic to a Cantor set of annuli, and we will be interested in the dynamics restricted to this set. For that, following Moser ([Mos73]) we define the transversal map 0 < +∞, which is the first return map to the neighbourhood $O_\mu$, and if the sequence $\bar n^x$ is well-defined, then so isF n µ (x) for n ∈ Z.
9. In the sequel, we will consider the discrete topology on the alphabet A, and the sets $A^{\mathbb{N}}$ and $\Sigma_A = A^{\mathbb{Z}}$ will be endowed with the product topology, for which they are compact and metrizable. The goal of this section is to prove the following proposition.
The proof of this result is long and technical. We will first prove an abstract result in section 4.1, which is contained in Proposition 4.4, and then we will apply this proposition to our example in section 4.2.

1. Here we will use the framework developed by Chaperon ([Cha04], [Cha08]), which was designed to obtain rather general invariant manifold theorems, in particular in the normally hyperbolic case.
Consider a complete metric space X and a complete subspace Y of a metric space F . We endow the product space X × F with the product metric, that is for ( and we make the following two assumptions.
(H1) There exists a constant $\rho_0^{-1} > 0$ such that for all $x \in X$ and $y, y' \in Y$, $d(g(x, y), g(x, y')) \ge \rho_0^{-1} d(y, y')$.
Hence, for all x ∈ X, the map is a bijection.
, then G is Lipschitz. We will say that the map h satisfies hypothesis (H) if it satisfies both hypotheses (H1) and (H2). Under these assumptions, we will consider three positive constants ν 0 , σ 0 and κ 0 where ν 0 is the Lipschitz constant of G with respect to Y , that is and κ 0 the Lipschitz constant of f . Let us now formulate another hypothesis.
The following result is due to Chaperon ([Cha04]). is the graph of a contracting map Φ : X → Y , with 2. The above theorem is concerned with the iteration of a single map h, but with this formalism we obviously have an analogous result for the iteration of a family of maps. More precisely, consider a family of maps where A is the given alphabet. We will assume that each map satisfies hypothesis (H) and a uniform version of hypothesis (L), that is: Then one can state the following proposition.
Proposition 4.3. With the previous notations, assume that Y is bounded and that for any n ∈ A the maps

satisfy hypotheses (H) and (L'). Then for any sequencen
is the graph of a contracting map Φn + : X → Y , with Moreover, if we endow the space of continuous function C(X, Y ) with the topology of pointwise convergence, then the map is continuous.
The proof is the same as in Theorem 4.2, with some obvious modifications.
3. Now we consider two metric spaces $F^+$ and $F^-$, and two complete subspaces $X \subseteq F^-$ and $Y \subseteq F^+$. Let V be another complete metric space, and One has to think of V as a central direction and $F^-$ (resp. $F^+$) as a contracting (resp. expanding) direction. We consider two families of maps $(h^+_n)_{n\in A}$ and $(h^-_n)_{n\in A}$ of the form and we decompose them as $h^\pm_n = (f^\pm_n, g^\pm_n)$, where The hypotheses (H) and (L') for $h^\pm_n$ refer to these decompositions. Let us also denote by
4. The aim of this section is to prove the following result.
Proposition 4.4. With the previous notations, assume that X, Y are bounded, V is locally compact and that for any n ∈ A the maps satisfy hypotheses (H) and (L'). Let us also assume that for any n ∈ A, (ii) the sets Z + n (resp. Z − n ) are pairwise disjoint.
Then for any sequencen ∈ Σ A , the map h + n 0 has an invariant set Λ and there exist a homeomorphism which conjugates h + n 0 |Λ to the skew-product on V given by As a consequence, the functions Fn : V → V , forn ∈ Σ A , are also defined by the following equation This remark will be useful later on to obtain the estimates (4) in Proposition 4.4.
Proof. Forn ∈ Σ A , let us definē Since each family of maps (h + n ) n∈A and (h − n ) n∈A satisfies hypotheses (H) and (L'), we can apply Proposition 4.3 so both sets are graphs of contracting maps Therefore for each v ∈ V , the maps are also contracting, and so are the maps Since X and Y are complete, these maps have fixed points x(n, v) ∈ X and y(n, v) ∈ Y from the contraction principle, and by uniqueness they satisfy Then by the previous relation this set is non-empty since it is the graph of the contraction and, from (6) and (7), Moreover, as the maps are continuous, from the contraction principle one also has the continuity of the mapn The fact that this set is invariant under h + n 0 will follow from our condition (i). Indeed, if z ∈ Λ, then z ∈ Λ(n) for somen ∈ Σ A and so by definition From the first relation we get and since by hypothesis and Λ(n) ⊆ Z + n 0 , we get from the second relation Now (8) and (9) means exactly that h + n 0 (z) ∈ Λ(σ(n)), so Λ is positively invariant under h + n 0 . In fact, a completely similar argument using condition (i) shows that h + n 0 (Λ(n)) = Λ(σ(n)), and hence Λ is totally invariant under h + n 0 . More generally, one obtains h ± n k−1 • · · · • h ± n 0 (Λ(n)) = Λ(σ ±k (n)), k ∈ Z.
Next let us prove that as a consequence of (ii) the union is disjoint. Let $\bar n \neq \bar n'$, so there exists $l \in \mathbb{Z}$ such that $n_l \neq n'_l$. First suppose that l = 0, then on the one hand and on the other hand As $Z^+_{n_0}$ and $Z^+_{n'_0}$ are disjoint by hypothesis, then so are $\Lambda(\bar n)$ and $\Lambda(\bar n')$. Now if l ≥ 1, we can assume without loss of generality that $n_k = n'_k$ for 0 ≤ k ≤ l − 1, and as before so $\Lambda(\bar n)$ and $\Lambda(\bar n')$ have to be disjoint. Finally, the case l ≤ −1 is completely similar using the hypothesis that the sets $Z^-_n$ are pairwise disjoint for $n \in A$. To conclude, as Λ is a disjoint union, every point z ∈ Λ can be uniquely is a well-defined continuous bijection, and as V is locally compact, one can check that it is a homeomorphism with respect to the product topology on The last equality exactly means that the map $h^+_{n_0} : \Lambda \to \Lambda$ is conjugated by Υ to the skew-product where $F(\bar n, v) = F^+_{n_0}(\Phi_{\bar n}(v), v)$. This ends the proof.

Proof of Proposition 4.1
This section is entirely devoted to the proof of Proposition 4.1, which will be done in several steps. We will use the notations and estimates on time-energy coordinates contained in Appendix A, and this will require choosing µ sufficiently small. Moreover, we shall use various coordinates, but we shall keep the same notation for the diffeomorphisms expressed in different coordinates.
Step 1. Straightening of the invariant manifolds.
Using time-energy coordinates on the first factor, our homoclinic annulus is given by and in a neighbourhood of it, the stable manifold of $F_\mu$ is given by while the unstable manifold is Now we introduce the change of coordinates where
$$\tau' = \tau - (\mu f_2(\theta_2))^{-1} e, \quad e' = (\mu f_2(\theta_2))^{-1} e.$$
Step 2. Choice of the box Z.
Our goal is to use Proposition 4.4, so we will explain how to choose the domain Z. Our central direction $V = \mathbb{A}$ will be the annulus given by the coordinates $(\theta_2, I_2)$, and our contracting and expanding directions will be one-dimensional, so $X \subseteq F^- = \mathbb{R}$ and $Y \subseteq F^+ = \mathbb{R}$. We will choose with $\tau_0 = c_1\mu^{2\pi-1}$, $e_0 = c_2\mu^{2\pi-1}$, and $c_1 > 2\pi$, $c_2 > 2\pi$. Eventually where all the constructions will take place (see figure 2). This domain Z is located above the homoclinic annulus $I_\mu$, and for µ small enough it is contained in the domain $D^*$ where the manifolds are straightened.
But as the period function T is decreasing, this gives e n − δ e n > e n+1 + δ e n+1 which implies that e (A 1 n ) > e (A 2 n+1 ). Let us also remark that by definition of our domain D * , one can ensure that for µ small enough (and so n − T (e) is small enough)
Step 4. Construction of the domains Z − n .
The construction is similar (see figure 4), namely Z − n ∩ Z(θ 2 , I 2 ) is a rectangle with vertices − δ τ n , e 0 , θ 2 , I 2 , B 2 n = − e n µf 2 (θ 2 ) + δ τ n , e 0 , θ 2 , I 2 , where δ τ n is defined by for a constant c 4 > π max{c 1 +4π, c 1 +c 2 }, and T n = T (e n ) (see Appendix A and (22)). As in the previous step, one can check that τ (B 1 n+1 ) > τ (B 2 n ) so the domains Z − n are pairwise disjoint.
Step 6. Relative position of $Z^\pm_n$ and $F^{\pm n}_\mu(Z^\pm_n)$.
Here we will prove that figures 5 and 6 make sense, that is, we will show that the horizontal (resp. vertical) edges of $F^n_\mu(Z^+_n)$ (resp. $F^{-n}_\mu(Z^-_n)$) are not contained in Z. The upper horizontal edge of $Z^+_n$ is the segment joining $A^2_n$ to $A^3_n$, therefore the upper horizontal edge of $F^n_\mu(Z^+_n)$ is the segment joining $\tilde A^2_n$ to $\tilde A^3_n$. We can compute
$$e'(\tilde A^2_n) = \left(\frac{1}{f_2(\theta_2 + nI_2)} + \frac{1}{f_2(\theta_2)}\right)\mu^{-1} e_n - \tau_0 + n - T(e_n + \delta^e_n)$$
and
$$e'(\tilde A^3_n) = \left(\frac{1}{f_2(\theta_2 + nI_2)} + \frac{1}{f_2(\theta_2)}\right)\mu^{-1} e_n + \tau_0 + n - T(e_n + \delta^e_n).$$
From (11) we have $T(e_n + \delta^e_n) \le n - c_3\mu^{2\pi-1}$, and as $\tau_0 = c_1\mu^{2\pi-1}$ and $c_3 > c_1 + c_2$ this gives $e'(\tilde A^2_n) > e_0$. We also have $e'(\tilde A^3_n) = e'(\tilde A^2_n) + 2\tau_0 > e_0$, and so the upper horizontal edge of $F^n_\mu(Z^+_n)$ is not contained in Z. For the lower horizontal edge of $F^n_\mu(Z^+_n)$, which is the segment joining $\tilde A^4_n$ to $\tilde A^1_n$, one has to prove that $e'(\tilde A^4_n) < 0$ and $e'(\tilde A^1_n) < 0$. We compute and as $2c_3 > c_3 > c_1 + 2\pi$, this gives $e'(\tilde A^4_n) < 0$. We also have $e'(\tilde A^1_n) = e'(\tilde A^4_n) - 2\tau_0 < 0$, and so the lower horizontal edge of $F^n_\mu(Z^+_n)$ is not contained in Z. Similarly, one can check that the vertical edges of $F^{-n}_\mu(Z^-_n)$ are not contained in Z, and this follows from the choice of the constant $c_4$.
Step 7. Definition of the maps h ± n . Our maps h + n and h − n will be suitable extensions of the maps F n µ and F −n µ . First we set and we want to define Lipschitz extensions of h ± n to Z in order to have Let us begin with the maps h + n . Take z = (τ , e , θ 2 , I 2 ) ∈ Z \ Z + n , then we can find a uniquez = (τ , e (z), θ 2 , I 2 ) ∈ Z + n : indeed, either e n + δ e n µf 2 (θ 2 ) < e < e 0 in which case we choose e (z) = e n + δ e n µf 2 (θ 2 ) , or 0 < e < e n − δ e n µf 2 (θ 2 ) and then we choose e (z) = e n − δ e n µf 2 (θ 2 ) .
Then we set h + n (z) = F n µ (z) + (0, α + n (e (z) − e (z)), 0, 0), for some positive constant α + n yet to be chosen. Hence the map is a well-defined Lipschitz extension of F n µ and, using the form of the extension, one can check that For the maps h − n , this is completely analogous. For z = (τ , e , θ 2 , I 2 ) ∈ Z \ Z − n , then we can find a uniquez = (τ (z), e , θ 2 , Then we define
Step 8. Verification of Hypotheses (H) and (L').
By Step 6 (see Figure 5), the images under (h + n ) |Z + n = (F n µ ) |Z + n of the horizontal edges of Z + n are not contained in Z, and this implies that so the hypothesis (H2) is satisfied.
Now it remains to show hypotheses (H1) and (L'). This follows from the choice of α + n and lengthy calculations of the various partial derivatives of F n µ , using both the explicit expression obtained in Step 5 and the estimates of Appendix A. We find the following: if we define α + n = π −1 (µ|T n |), then (H1) is satisfied with and we estimate the Lipschitz constants so (L') is satisfied for µ small enough as The situation for h − n is of course similar.
Then on the one hand, using the definition of f 1 we can estimate This concludes the proof.

Construction of a pseudo-orbit
1. In this section, we will restrict our study to a special type of skew-product over a Bernoulli shift, called a "polysystem" (see [Mar08]; it is also known as an iterated function system).
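Recall from Definition 2.4 that a polysystem is a skew-product (over $\sigma$) such that for any $\underline{n} = (n_k)_{k\in\mathbb{Z}} \in \Sigma_{\mathcal{A}}$, one has $F_{\underline{n}} = f_{n_0}$. So a polysystem does not depend on the whole sequence $\underline{n} \in \Sigma_{\mathcal{A}}$ but only on its first component $n_0 \in \mathcal{A}$; hence, instead of being defined by a family of maps indexed by $\Sigma_{\mathcal{A}}$, it is defined by a family of maps indexed by $\mathcal{A}$. Then one can easily see that a sequence $(n_k, x_k)_{k\in\mathbb{Z}}$ in $\mathcal{A}\times M$ gives rise to an orbit $(\sigma^k(\underline{n}), x_k)_{k\in\mathbb{Z}}$ in $\Sigma_{\mathcal{A}}\times M$, where $\underline{n} = (n_k)_{k\in\mathbb{Z}}$, for the polysystem $[[f_n]]_{n\in\mathcal{A}}$ if and only if
$$x_{k+1} = f_{n_k}(x_k), \quad k\in\mathbb{Z},$$
and its projection onto $M$ corresponds to the iteration of the maps $(f_n)_{n\in\mathcal{A}}$ in the order prescribed by the sequence $\underline{n}$.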
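The characterization above says that the projection of an orbit of a polysystem is obtained by applying the maps in the order given by the symbol sequence. A minimal Python sketch of this iteration follows. The function name `iterate_polysystem` and the two toy maps are illustrative assumptions, not objects from the paper, and only a finite forward piece of an orbit is computed.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def iterate_polysystem(
    maps: Dict[int, Callable[[Point], Point]],
    symbols: Iterable[int],
    x0: Point,
) -> List[Point]:
    """Return [x_0, x_1, ...] where x_{k+1} = f_{n_k}(x_k).

    `maps` plays the role of the family (f_n) indexed by the alphabet,
    `symbols` is a finite piece (n_0, n_1, ...) of the symbol sequence.
    """
    orbit = [x0]
    for n in symbols:
        orbit.append(maps[n](orbit[-1]))
    return orbit


# Hypothetical toy family: two translations of the plane.
toy_maps = {
    1: lambda p: (p[0] + 1.0, p[1]),
    2: lambda p: (p[0], p[1] + 2.0),
}

orbit = iterate_polysystem(toy_maps, [1, 2, 2, 1], (0.0, 0.0))
```

Only the current symbol is needed at each step, which is exactly what distinguishes a polysystem from a general skew-product over the shift.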

2. In the previous section, we showed that the dynamics of the first return map of our diffeomorphism $F_\mu$ in a neighbourhood of the homoclinic annulus is conjugate to the skew-product map
$$F_{\underline{n}}(\theta, I) = \bigl(\theta + n_0 I,\ I + 2\mu f_1(\phi_{\underline{n}}(\theta, I)) \cos 2\pi(\theta + n_0 I)\bigr), \quad \underline{n} \in \Sigma_{\mathcal{A}},\ (\theta, I) \in \mathbb{A}.$$
These "standard" maps can be seen as perturbations of iterates of the integrable twist map $T(\theta, I) = (\theta + I, I)$: more precisely, we can write $f_n = V \circ T^n$, where $V(\theta, I) = (\theta, I + 2\mu \cos 2\pi\theta)$ is a "vertical" map which is close to the identity if $\mu$ is small. We will first construct an orbit for the polysystem $[[f_n]]_{n\in\mathcal{A}}$ defined by the maps $f_n$, and in the next section, we will use it as a pseudo-orbit for the skew-product $[[F_{\underline{n}}]]_{\underline{n}\in\Sigma_{\mathcal{A}}}$ defined by the maps $F_{\underline{n}}$.
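The factorization $f_n = V \circ T^n$ can be checked numerically. The sketch below is only an illustration: the function names and the sample values of $\theta$, $I$, $n$ and $\mu$ are assumptions chosen for the example, and angles are reduced modulo 1.

```python
import math


def twist(theta: float, I: float):
    """Integrable twist map T(theta, I) = (theta + I, I), theta taken mod 1."""
    return ((theta + I) % 1.0, I)


def vertical(theta: float, I: float, mu: float):
    """Vertical map V(theta, I) = (theta, I + 2 mu cos(2 pi theta))."""
    return (theta, I + 2.0 * mu * math.cos(2.0 * math.pi * theta))


def standard_map(n: int, theta: float, I: float, mu: float):
    """f_n(theta, I) = (theta + n I, I + 2 mu cos(2 pi (theta + n I)))."""
    new_theta = (theta + n * I) % 1.0
    return (new_theta, I + 2.0 * mu * math.cos(2.0 * math.pi * new_theta))


def vertical_after_twist_power(n: int, theta: float, I: float, mu: float):
    """Compute V(T^n(theta, I)) by applying T exactly n times, then V."""
    for _ in range(n):
        theta, I = twist(theta, I)
    return vertical(theta, I, mu)


def circle_distance(a: float, b: float) -> float:
    """Distance between two angles on the circle R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


direct = standard_map(5, 0.3, 0.137, 1e-3)
composed = vertical_after_twist_power(5, 0.3, 0.137, 1e-3)
```

Both computations agree up to rounding, since $T^n(\theta, I) = (\theta + nI, I)$ and $V$ only changes the action.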
3. The goal of this section is to prove the following proposition.
Proposition 5.1. There exists a positive constant $C$ such that for $\mu$ small enough, there exists an orbit $(\sigma^k(\underline{n}), x_k)_{k\in\mathbb{Z}}$ in $\Sigma_{\mathcal{A}} \times \mathbb{A}$ for the polysystem $[[f_n]]_{n\in\mathcal{A}}$ defined by
$$f_n(\theta, I) = (\theta + nI,\ I + 2\mu \cos 2\pi(\theta + nI)), \quad n \in \mathcal{A},\ (\theta, I) \in \mathbb{A},$$
such that $\lim_{k\to\pm\infty} I_k = \pm\infty$, and the estimates

We will see that this proposition holds only if we can choose the integers $n_k$, $k \in \mathbb{Z}$, as large as $\mu^{-1/2} \ln \mu^{-1}$, so this explains the choice of The upper bound on the sum of the integers $n_k$ needed to produce a drift of order one is basically the "time of instability" in this context. In the sequel, we shall only explain the construction of the positive sequence $(n_k, \theta_k, I_k)_{k\geq 0}$, since the construction of the negative sequence $(n_k, \theta_k, I_k)_{k\leq -1}$ is completely similar.

4. Let us now describe the construction of our orbit. Throughout this section we will need two real numbers $0 < K < K' < \sqrt{2}/2$. First, we fix $K$ such that $0 < K < \sqrt{2}/2$ and we define the non-empty domain We shall only need the first condition $\cos 2\pi\theta \geq K$ in this section; the other condition $\sin 2\pi\theta \leq -K$ will be used in the next section. One can also write Now $K$ being fixed, we shall also consider $K' \in\, ]K, \sqrt{2}/2[$ such that $B_{K'} = J_{\delta/2} \times \mathbb{R}$, where $J_{\delta/2}$ is an interval of length $\delta/2$. Now recall that we have $f_n = V \circ T^n$ with
$$V(\theta, I) = (\theta, I + 2\mu \cos 2\pi\theta), \quad T(\theta, I) = (\theta + I, I), \quad (\theta, I) \in \mathbb{A},$$
and by definition, the map $V$ leaves the set $B_K$ invariant and produces in $B_K$ a drift in the $I$-direction which is at least equal to $2\mu K$ by the condition $\cos 2\pi\theta \geq K$. Therefore, to prove the first part of our proposition, we will construct a sequence of points $(\theta_k, I_k)_{k\in\mathbb{Z}}$ in $\mathbb{A}$ for which we can find a sequence of integers $(n_k)_{k\in\mathbb{Z}}$ in $\mathcal{A}$ such that Indeed, in this case $(\theta_{k+1},$ Then to prove our second part, we shall need estimates on these integers $n_k$, $k \in \mathbb{Z}$. In fact the relation (16) can be written as where $R_I$ is the rotation on the circle $\mathbb{T}$ of angle $I$. Most of the time, that is, if either $I_k$ is irrational or $I_k = p/q$ with $|q| \geq \delta^{-1}$, this will be realized, and the integers $n_k$, $k \in \mathbb{Z}$, can be estimated. This is an easy consequence of the "ergodization" theorem recalled below, which we shall use crucially in our construction.
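The drift mechanism described above can be illustrated numerically: at each step one looks for an integer $n$ such that $\cos 2\pi(\theta + nI) \geq K$, and then applies $f_n$, which raises the action by at least $2\mu K$. The sketch below is a naive greedy search and not the construction of the paper. It ignores the condition $\sin 2\pi\theta \leq -K$, the constraint $n \in \mathcal{A}$ and the decomposition into fast, slow and resonant domains; the function names, the cut-off `n_max` and the sample parameters are assumptions.

```python
import math


def first_admissible_step(theta, I, K, n_max):
    """Smallest n in [1, n_max] with cos(2 pi (theta + n I)) >= K, or None.

    Returns (n, new_theta, cos_value) so that the same cosine value is
    reused when the action is updated.
    """
    for n in range(1, n_max + 1):
        new_theta = (theta + n * I) % 1.0
        c = math.cos(2.0 * math.pi * new_theta)
        if c >= K:
            return n, new_theta, c
    return None


def greedy_drift(theta, I, mu, K, steps, n_max=1000):
    """Apply f_{n_k} repeatedly, choosing n_k greedily.

    Returns a list of triples (n_k, theta_{k+1}, I_{k+1}), preceded by the
    initial condition stored as (None, theta_0, I_0).
    """
    history = [(None, theta, I)]
    for _ in range(steps):
        found = first_admissible_step(theta, I, K, n_max)
        if found is None:  # search failed, e.g. very close to a bad rational
            break
        n, theta, c = found
        I = I + 2.0 * mu * c  # increment is at least 2 * mu * K
        history.append((n, theta, I))
    return history


# Illustrative parameters (assumptions, not values from the paper).
mu, K = 1e-3, 0.5
history = greedy_drift(theta=0.2, I=math.sqrt(2.0) - 1.0, mu=mu, K=K, steps=20)
total_time = sum(n for n, _, _ in history[1:])
```

In this toy setting `total_time`, the sum of the chosen integers, plays the role of the "time" spent to achieve the drift; the estimates on the $n_k$ in the text are what control this quantity.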

5. For $I \in \mathbb{R}$, consider the rotation $R_I(\theta) = \theta + I$ defined on the circle $\mathbb{T}$. Given $0 < \delta < 1$, let $\mathcal{J}_\delta$ be the set of intervals of $\mathbb{T}$ of length $\delta$. We define the $\delta$-ergodization time $N(I, \delta) \leq +\infty$ by
$$N(I, \delta) = \inf\{n \in \mathbb{N} \mid \{\theta, \dots, R_I^n(\theta)\} \cap J \neq \emptyset,\ \forall \theta \in \mathbb{T},\ \forall J \in \mathcal{J}_\delta\},$$
or equivalently One can easily see that $N(I, \delta) < +\infty$ except if $I$ is a rational number with a denominator smaller than $\delta^{-1}$, but it is more difficult to prove that when it is finite, this number is essentially given by the inverse of the distance to these "bad" rationals. This is the content of the theorem below, which is due to Berti, Biasco and Bolle ([BBB03]).
Theorem 5.2 (Berti-Bolle-Biasco). There exists a positive constant $M$ such that if

This is a consequence of Theorem 4.2 in [BBB03] (see also the estimate (5.3) in [BBB03]), where the above statement is proved in both the continuous and the multi-dimensional cases. Of course one can give an explicit value for the numerical constant $M$, but this will not be useful.
In fact, most of the time we shall use this result in the following form.
Lemma 5.3. Let I ∈ R \ R δ , θ ∈ T and J ⊆ T any interval of length δ/2. Then for any m ∈ N, one can find an integer such that θ + nI ∈ J.
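For intuition, the ergodization time of a rotation can be computed by brute force in simple cases. The sketch below relies on the observation that, by translation invariance, every orbit segment $\{\theta, \dots, R_I^n(\theta)\}$ meets every interval of length $\delta$ as soon as the points $0, I, \dots, nI$ (mod 1) leave no gap of length $\delta$ or more on the circle. The strict inequality used for the gaps is an assumed convention (the answer at borderline values depends on whether intervals are open or closed), and the function name and the cut-off `n_max` are assumptions too.

```python
def ergodization_time(I: float, delta: float, n_max: int = 2000):
    """Brute-force delta-ergodization time of the rotation by I on R/Z.

    Returns the smallest n <= n_max such that the points 0, I, ..., n*I
    (mod 1) leave only gaps strictly smaller than delta, or None if no
    such n is found (which stands for N(I, delta) = +infinity or > n_max).
    """
    points = [0.0]
    for n in range(1, n_max + 1):
        points.append((n * I) % 1.0)
        ordered = sorted(points)
        gaps = [b - a for a, b in zip(ordered, ordered[1:])]
        gaps.append(1.0 - ordered[-1] + ordered[0])  # gap across 0 ~ 1
        if max(gaps) < delta:
            return n
    return None


# Rotation by 1/4: four equally spaced points are needed for delta = 0.3.
quarter = ergodization_time(0.25, 0.3)

# Rotation by 1/2: the denominator 2 is smaller than 1/0.3, so the orbit
# never becomes 0.3-dense and the search fails.
half = ergodization_time(0.5, 0.3, n_max=50)
```

The second call illustrates the "bad" rationals: when the denominator is smaller than $\delta^{-1}$, the finitely many points of the orbit always leave a gap larger than $\delta$.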
6. In order to construct our orbit, we now know that we have to consider the distance to this set of "bad rationals" R δ . So given µ > 0, we first define the domain of "fast drift" and finally the domain of "resonances" Obviously we have But in the sequel, it will be more convenient to further decompose these sets. Indeed, for p/q ∈ R δ , if we define then, for µ small enough, one has From now on we will assume that µ is sufficiently small with respect to δ and M , which are fixed, so the above decomposition (17) is in fact a partition of A (since the set R δ is discrete) and moreover the following properties hold true: for any (θ, I) ∈ D S (µ), there exists a unique p/q ∈ R δ such that (θ, I) ∈ D S (µ, p/q), and if m = inf{n ∈ N* | f n (θ, I) ∉ D S (µ, p/q)}, then m is well defined and necessarily the point (θ m , I m ) = f m (θ, I) does not belong to D S (µ) (it is either in D R (µ) or in D F (µ), since the size of the jump, which is of order µ, is much smaller than the width of the connected components of D R (µ) or D F (µ)). We also ask the same for a point (θ, I) ∈ D R (µ): under the iteration of a map f n , the first time it escapes the domain D R (µ, p/q), it also escapes the domain D R (µ) (in fact it enters the domain D S (µ)). This situation is depicted in Figure 8.
7. The construction of our orbit will be inductive, and we will start with a point $(\theta_0, I_0) \in D_F(\mu)$. Then we have the following easy application of the ergodization theorem.

8. As long as the orbit stays in $D_F(\mu)$, we can use the previous lemma. Then it will enter the domain of slow drift, and this is where the ergodization theorem gives integers as large as $\mu^{-1/2} \ln \mu^{-1}$. However, in the lemma below we will see how, after iterating a finite number of maps $f_{n_k}$, $n_k \in \mathcal{A}$, with $n_k$ of order $\mu^{-1} \ln \mu^{-1}$, one can actually cross this domain of slow drift.
Let us point out that in the proof of Proposition 5.1 below, the above lemma will always be used with j ≥ 1 so item (iv) will be available.
Proof. There exists a unique p/q ∈ R δ such that (θ, I) ∈ D S (µ, p/q). By Lemma 5.3 (with m = [ln µ −1 ]) and since we can find an integer n 0 ∈ A such that f n 0 (θ, I) ∈ B K . Now if f n 0 (θ, I) ∉ D S (µ), then we can take j = 0 in the statement and assertions (i), (ii) and (iii) are proven.
Otherwise, we construct inductively (n k ) 1≤k≤j where j ∈ N is defined by Since our orbit always stays in B K , then j is obviously well-defined, and at each step we have used Lemma 5.3 so conditions (i) and (ii) are satisfied. Moreover by a previous remark we can also write and condition (iii) is also satisfied. Finally, let us write (θ 0 , I 0 ) = (θ, I) and (θ k+1 , I k+1 ) = f n k • · · · • f n 0 (θ 0 , I 0 ), for 0 ≤ k ≤ j. Since these points belong to B K we have The second term on the right-hand side is a Riemann sum, hence for µ small enough it can be estimated by an integral, namely This is exactly (iv), and so this ends the proof.
9. Now that we have escaped the domain of slow drift, we are in the resonant domain. Here we cannot use any ergodization result. However, our point belongs to $B_{K'}$, and in the lemma below we will show that by iterating $j$ times a map $f_n$, with $n$ of order $\ln \mu^{-1}$, it can cross the resonant domain while staying in the larger domain $B_K$.
Note that it is possible that the point $f_n^j(\theta, I)$ does not belong to $B_{K'}$ either, but then we are back in the domain of slow drift and we can find an integer $n \in \mathcal{A}$ such that $T^n(f_n^j(\theta, I)) \in B_{K'} \subseteq B_K$.
We will also add a further smallness condition on $\mu$ by requiring that
$$\frac{2}{K \ln \mu^{-1}} < \frac{1}{2\pi} \max\{\arccos K - \arccos K',\ \arcsin K' - \arcsin K\}.$$
Let us define inductively where j is defined as follows: j = inf{j 1 , j 2 } with It follows from the definition of j that (θ k , I k ) ∈ D R (µ), for 0 ≤ k ≤ j − 1, so now let us show that (θ k , I k ) ∈ B K .
In fact we will prove below by induction on k that Since n ≤ 2 ln µ −1 this implies and as k ≤ j 2 , then This means that cos 2πθ k ≥ K and sin 2πθ k ≤ −K, and therefore this gives So now let us go through the induction, that is, let us prove (18). This is obviously true for k = 0, so let us assume it is satisfied for some 0 ≤ k ≤ j − 2. We have θ k+1 = θ k + nr k , but since I k ≥ p/q − µ^{1/2} / ln µ^{−1} and n = dq then θ k+1 ≥ θ k + dp − nµ as dp is an integer. Then, using the induction hypothesis this gives Similarly using the fact that I k ≤ p/q + µ^{1/2} / ln µ^{−1} one obtains therefore (18) is proven and so is (i). Now assume that j = j 2 , then since (θ k , I k ) ∈ B K for 0 ≤ k ≤ j − 1, which means that j 2 ≥ j 1 . This is absurd, therefore j = j 1 , so (θ j , I j ) ∉ D R (µ, p/q) and this means that (θ j , I j ) ∉ D R (µ). This proves (ii).
10. Now we can finally conclude the proof of Proposition 5.1.
Proof of Proposition 5.1. We choose any point (θ 0 , I 0 ) ∈ D F (µ). Applying successively Lemma 5.4, Lemma 5.5, Lemma 5.6 and Lemma 5.5 once again, in this precise order, one obtains a positive sequence (n k , θ k , I k ) k∈N ∈ A × A which gives a positive orbit for our polysystem, that is Note that since we have started in D F (µ), each time Lemma 5.5 is applied, it is with an integer j ≥ 1. Now by construction, T n k (θ k , I k ) ∈ B K for any k ∈ N, and since by definition of B K one obtains This clearly shows that lim k→+∞ I k = +∞, which proves the first part of the statement. Then as if we set N = [(µK) −1 ] + 1, It remains to estimate the sum of integers To do that, we will write For each k ∈ σ 1 , we know that n k ≤ 2 ln µ −1 and hence To estimate S 2 , first observe that the set R δ is discrete, so its intersection with the compact set {I 0 ≤ I ≤ I N } is finite. But the latter set is included in {I 0 ≤ I ≤ I 0 + 3}, so the constant is independent of µ. Then setting Now each σ 2 (p/q) is not reduced to a point, so by Lemma 5.5 This finally gives with C = 2 sup{4K −1 , M 2K −1 + (2K ) −1 }, and this proves the proposition.
Proof of Theorem 2.2

11. In this section we will prove Theorem 2.2; in view of Proposition 4.1, this will follow easily from the next result.
Proposition 6.1. There exists a positive constant $C$ such that for $\mu$ small enough, there exists an orbit $(\sigma^k(\underline{n}), x_k)_{k\in\mathbb{Z}}$ in $\Sigma_{\mathcal{A}} \times \mathbb{A}$ for the skew-product

The strategy will be to consider the orbit constructed in Proposition 5.1 as a pseudo-orbit for the skew-product. In order to find a true orbit nearby, we shall need some hyperbolicity, which is described below.

Recall that
The condition sin 2πθ ≤ −K will be used here to prove the following lemma.
Lemma 6.2. Let x ∈ A such that T n (x) ∈ B K , for n ∈ A, and x̃ ∈ R 2 a lift of x. Then the eigenvalues λ ± of df n (x̃) are real and for µ small enough, they satisfy Moreover, if e ± ∈ R 2 are eigenvectors associated to λ ± , and for v ∈ R 2 , v = v + e + + v − e − , then 1 2 , and where | . | is the supremum norm on R 2 .
Therefore we easily obtain and then using the equality λ + λ − = 1, for µ small enough one finds This proves the first part of the statement.
13. Let us now describe an abstract fixed point theorem that will be used to find an orbit close to our pseudo-orbit. Consider a Banach space (E, | . |) and T : E → E a continuous linear map. Recall that the spectrum Sp(T ) of T is the set of complex numbers λ such that T C − λId C is not an automorphism of E C , where E C and T C are the complexifications of E and T .
Given two real numbers κ s , κ u satisfying 0 < κ s < 1 < κ u , we say that T is (κ s , κ u )-hyperbolic if In such a case, there exists a T -invariant decomposition and a constant c > 0 such that |(T |Es ) n | ≤ cη n s , |(T |Eu ) −n | ≤ cη n u , for any n ∈ N, η s < κ s , η u > κ u and where | . | is the induced norm on linear operators. In fact, one can always find a norm ‖ · ‖ on E which is adapted to T in the following sense: ‖ · ‖ is equivalent to | . | and satisfies (i) ‖x s + x u ‖ = sup{‖x s ‖, ‖x u ‖}, x s ∈ E s , x u ∈ E u ; (ii) ‖T |Es ‖ ≤ κ s , ‖(T |Eu ) −1 ‖ ≤ κ −1 u .
The following theorem is proved in [Yoc95], section 2.1.
Then U has a unique fixed point p ∈ E, and if ‖ · ‖ is a norm adapted to T ,

14. Now we can prove Proposition 6.1.
Proof of Proposition 6.1. Consider the orbit $(\sigma^k(\underline{n}), \theta_k, I_k)_{k\in\mathbb{Z}}$ in $\Sigma_{\mathcal{A}} \times \mathbb{A}$ given by Proposition 5.1. Then the proof is an immediate consequence of the following claim: there exists a sequence $(\theta'_k, I'_k)_{k\in\mathbb{Z}}$ in $\mathbb{A}$ such that $(\sigma^k(\underline{n}), \theta'_k, I'_k)_{k\in\mathbb{Z}}$ is an orbit for the skew-product and $|I'_k - I_k| \leq \mu^2$, $k \in \mathbb{Z}$.
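So let us construct this orbit. Let $x_k = (\theta_k, I_k) \in \mathbb{A}$, $k \in \mathbb{Z}$, and let $\tilde{x}_k = (\theta_k, I_k) \in \mathbb{R}^2$, $k \in \mathbb{Z}$, be one of its lifts. First we define a linear map Using the supremum norm on $\mathbb{R}^2$ let us define Then $E$ is obviously a Banach space with the norm Now recall that for any $\tilde{x} = (\theta, I) \in \mathbb{R}^2$, if $s = -2\pi\mu \sin 2\pi(\theta + nI)$, then
$$df_n(\tilde{x}) = \begin{pmatrix} 1 & n \\ 2s & 1 + 2ns \end{pmatrix} \in M_2(\mathbb{R}).$$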
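The hyperbolicity used here can be read off the matrix above: its determinant is $1$ and its trace is $2 + 2ns$, so the eigenvalues are real with $\lambda_- < 1 < \lambda_+$ and $\lambda_+ \lambda_- = 1$ as soon as $ns > 0$, which is the case when $\sin 2\pi(\theta + nI) \leq -K$. The following Python sketch checks this on one sample point; the function names and the numerical values are assumptions made for the illustration.

```python
import math


def jacobian(n: int, theta: float, I: float, mu: float):
    """Differential of f_n at (theta, I), written as a 2x2 tuple of rows."""
    s = -2.0 * math.pi * mu * math.sin(2.0 * math.pi * (theta + n * I))
    return ((1.0, float(n)), (2.0 * s, 1.0 + 2.0 * n * s))


def real_eigenvalues(matrix):
    """Return (lambda_minus, lambda_plus) of a 2x2 matrix, or None if complex."""
    (a, b), (c, d) = matrix
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4.0 * det
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return ((trace - root) / 2.0, (trace + root) / 2.0)


# Sample point with theta + n*I = 0.875, where cos = sqrt(2)/2 and
# sin = -sqrt(2)/2, so both conditions defining B_K hold for K < sqrt(2)/2.
n, theta, I, mu = 10, 0.375, 0.05, 1e-3
m = jacobian(n, theta, I, mu)
lam_minus, lam_plus = real_eigenvalues(m)
```

For these sample values the trace is slightly larger than $2$, which reflects the weak hyperbolicity available for small $\mu$; larger $n$ makes the expansion stronger.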
Let us set where ‖ · ‖ k is a norm in R 2 adapted to T k , k ∈ Z, then ‖ · ‖ is adapted to T , and from Lemma 6.2 Note that if v is a fixed point of U , then F σ k−1 (n) (x̃ k−1 + v k−1 ) = x̃ k + v k , so (σ k (n), x′ k ) k∈Z ∈ Σ A × A, where x′ k ∈ A is the projection onto A of x̃′ k = x̃ k + v k ∈ R 2 , is an orbit for the skew-product. To prove that U has a fixed point, we will use Theorem 6.3 and for that we need to estimate the Lipschitz constant of ∆ = U − T with respect to the adapted norm ‖ · ‖. First if v, v′ ∈ E satisfy |v| ≤ µ 2 , |v′| ≤ µ 2 , then from Taylor  is obvious from the definition of the maps (f n ) n∈A .
Then setting ∆ = U − T , for any v, v′ ∈ E one easily obtains and this gives In particular, this shows U (F ) ⊆ F , and the Lipschitz constant of U − T with respect to the adapted norm ‖ · ‖
By definition of U , this shows that v is in fact a fixed point of U , and therefore the sequence (σ k (n), x′ k ) k∈Z ∈ Σ A × A, where x′ k is the projection onto A of x̃′ k = x̃ k + v k , k ∈ Z, is an orbit for the skew-product. If we set x′ k = (θ′ k , I′ k ) ∈ A, then from (19) we obtain |I′ k − I k | ≤ µ 2 , k ∈ Z.
This concludes the proof.