Continuous averaging proof of the Nekhoroshev theorem

In this paper we develop the continuous averaging method of Treschev to work in the setting of simultaneous Diophantine approximation, and apply the result to give a new proof of the Nekhoroshev theorem. We obtain a sharp normal form theorem and explicit estimates of the stability constants appearing in the Nekhoroshev theorem.


Introduction
In the papers [Tr1, Tr2], Treschev developed a new averaging method called continuous averaging. It is a powerful tool for deriving sharp constants in exponentially small splitting problems for Hamiltonian systems with one and a half degrees of freedom. However, the technical details become very heavy when the method is applied to Hamiltonian systems with more degrees of freedom. For this reason, the method has not yet been applied to many other problems.
We complexify the variables and extend the domain of $(I, x, y)$ to a $\sigma$-neighborhood and that of $\theta$ to a $\rho$-neighborhood of the original domains, respectively. The extended phase space in the complex domain is $D(\rho, \sigma)$, where $\rho$ is the width of analyticity in $\theta$ and $\sigma$ is that in the slow variables $I, x, y$.
for some constants $a, b, C_0, C_1, C_2 > 0$ independent of $\varepsilon$, where $I(t)$ is the action-variable component of any orbit of the Hamiltonian (1.1) with initial condition in the set $D$.
There are many works studying the stability exponents $a$ and $b$ (cf. [LN, Po, BM]). Their approaches are based on a careful study of the geometric and number-theoretic aspects of resonances. Instead, in this paper we sharpen the estimates in the analytic part of the proof using continuous averaging, and obtain an improved normal form (see Theorem 3.1). We then apply the normal form in Lochak's argument to obtain the Nekhoroshev theorem (see Theorem 2.1), with all the stability constants estimated explicitly. In this paper we only treat the case $a = b = 1/(2n)$, but the normal form theorem can easily be applied to other prescribed $a$ and $b$ to get the corresponding $C_2$.
The method of Lochak, called simultaneous Diophantine approximation, turns out to be an important alternative to the classical approach via small divisor techniques, as explained in [L2]. Its main idea is to do the averaging in a vicinity of a periodic orbit, so it is essentially an averaging procedure for systems with one fast angle. In general, the dependence on the fast angle can be removed up to an exponentially small remainder. This makes simultaneous Diophantine approximation suitable for proving the Nekhoroshev theorem. The work [PT] can be considered as a development of continuous averaging for the small divisor case. This paper is the first to develop continuous averaging in the setting of simultaneous Diophantine approximation.
We point out the relation between continuous averaging and some important PDEs. The idea of continuous averaging is to study the averaging procedure through a PDE instead of iterations. The PDE has the form $H_\delta = \{H, F\}$, where, in some special cases, $F$ is the Hilbert transform of $H$ (see Section 4 for more details). Equations of this type have been studied (cf. [CCF]) as simplified models for the quasi-geostrophic equation (cf. [KNV]), the incompressible Euler equation, etc. It would be interesting to apply PDE techniques to our problem.
To state our theorems, we need the following definitions.
(1) We use $|\cdot|$, $|\cdot|_2$, $|\cdot|_\infty$ to denote the $\ell^1$, $\ell^2$, $\ell^\infty$ norms of a vector in $\mathbb{R}^n$ or $\mathbb{Z}^n$. Without causing confusion, we also use $|\cdot|$ to denote the absolute value of a function with values in $\mathbb{R}$ or $\mathbb{C}$.
We also use the following definition to characterize the convexity of the unperturbed part H 0 (I).
Definition 1.2. Consider a Hamiltonian $H_0(I)$ defined on $G^n + \sigma$. We define the associated constants $M_\pm > 0$ to characterize the convexity of $H_0(I)$.
Now we state a simplified version of our main theorem. The complete version is stated in the next section.
Theorem 1.1. Consider a Hamiltonian system (1.1) satisfying inequalities (1.3) in Definition 1.2, with $n \ge 2$ and $m \ge 0$. For every orbit $(I, \theta, x, y)(t)$ with initial condition $(I, \theta, x, y)(0) \in D(\rho, \sigma)$ and $(x(t), y(t)) \in W^{2m} + \sigma$, the following estimates hold provided $\varepsilon$ is small enough.
The norm $|\cdot|_\infty$ is taken over $I \in G^n$.
This theorem gives an estimate of the stability constant $C_2$ in (1.2). For a given system, one needs to optimize $\rho_1$ under the constraints of the theorem. The decomposition of $\rho$ can be written qualitatively as $\rho = \rho_1 + c_0\mu^{1/n} + c_1\rho$
A possible application of the result is to the 3-body problem, in order to obtain long-time stability. This direction has already been pioneered in [N]. However, the mass ratio of Jupiter to the Sun obtained in [N] is too small to be satisfactory. On the other hand, in [FGKR] the authors construct diffusing orbits for the restricted planar 3-body problem, with diffusion time polynomial in $1/\varepsilon$.
The paper is organized as follows. In Section 2 we give a complete statement of the main theorem and compare it with previous results. In Section 3 we state a normal form theorem (Theorem 3.1) about averaging in a vicinity of periodic orbits. This is the main result obtained using continuous averaging, and it improves the corresponding results in [LNN, N]. In Section 4 we give a brief introduction to the continuous averaging method. In Section 5 we prove Theorem 3.1; this section is a higher-dimensional generalization of the case studied in Section 4, and we draw analogies between the two. With the normal form theorem, we first show a local stability result of Nekhoroshev type in Section 6, and then global stability in Section 7. Here local stability means stability in a neighborhood of a periodic orbit, and global stability means stability for all initial conditions. Finally, there are two appendices: Appendix A contains some technical estimates for continuous averaging, and Appendix B collects some basics of majorant estimates.

The complete statement of the main theorem and discussions
We give a complete statement of the main theorem as follows.
The norm $|\cdot|_\infty$ is taken over $I \in G^n$.
The constant $\mu$ plays the same role as the constant $E$ in [LNN, N]. It is dual to $\varepsilon$, since only the product $\varepsilon\mu$ enters the original Hamiltonian. We need $\mu$ to be small to satisfy the first bullet point in Theorem 2.1; the same restriction is expressed in [LNN, N] by introducing a constant $g$. The second bullet point can easily be satisfied by taking $\varepsilon$ small enough. To improve the stability time, we want $\rho_1$ to be as large as possible, but the third bullet point restricts $\rho_1$, so we need to optimize among $\rho_1$, $\rho_2$, $\rho_3$. This restriction appears because the width of analyticity of the action variables $I$ and the degenerate variables $x, y$ is finite. It shows up in a different form in [LNN] as item (ii) of Theorem 2.1, where the choice of $R$ can be as small as $\varepsilon^{1/(2n)}$. See Remarks 6.1 and 7.1 for further discussion. We will see from Theorem 3.1 below that our normal form theorem obtained from continuous averaging improves the one obtained by the iteration method. Therefore we also obtain an improved $C_2$ here, even though $\rho_1$ is not given explicitly.

Normal form
Our main work in this paper is to obtain a normal form theorem using continuous averaging. Following Lochak, we do the averaging in a neighborhood of a periodic orbit.
where each of the terms is given in the next definition.
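Definition 3.2. We use the Taylor expansion of $H_0$ to split it as $H_0(I) = \langle\omega^*, I\rangle + G(I)$, where $G(I)$ contains the higher-order terms. For the $H_1$ part, we use the Fourier expansion $H_1 = \sum_{k\in\mathbb{Z}^n} H^k(I, x, y)e^{i\langle k,\theta\rangle}$ to write $\varepsilon H_1(I, \theta, x, y) = \varepsilon\bar H(I, \theta, x, y) + \varepsilon\tilde H(I, \theta, x, y)$, where
$\bar H = \sum_{\langle k,\omega^*\rangle = 0} H^k e^{i\langle k,\theta\rangle}$ is the resonant part, and
$\tilde H = \sum_{\langle k,\omega^*\rangle \ne 0} H^k e^{i\langle k,\theta\rangle}$ is the nonresonant part.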
The exponential smallness obtained here improves that of [LNN, N]: continuous averaging enables us to get rid of some extraneous numerical factors that worsen the estimates. Moreover, our method has the advantage that it does not require the preliminary transformation that is necessary in [LNN, LN]. The proof of this result is given in Section 5.

A brief introduction to the continuous averaging
In this section, we give an introduction to the continuous averaging method; see Chapter 5 of [TZ] for more details. We try to explain the key points of the method that will be used in our later proof. with initial value $Z|_{\delta=0} = z$, then the change of variables is symplectic, and we obtain the transformed equations, where the subscript $\delta$ denotes the partial derivative with respect to $\delta$. The last equality follows from the fact that the Poisson bracket is invariant under symplectic transformations. In the following, we only work with the variables $z$.
To simplify our discussion, we consider a special case of (1.1) with m = n = 1. A further simplification is to consider only time-periodic nonautonomous systems.
This is equivalent to requiring that $H_0(I) = I$ in equation (1.1) and that $H_1(x, y, \theta)$ be independent of $I$. From equation (4.3), we obtain (4.4), where $\{\cdot,\cdot\}_{(x,y)}$ denotes the $(x, y)$ part of the Poisson bracket.
Our goal is to show that if we choose a suitable Hamiltonian isotopy $F$ and extend $\delta$ as far as possible, the dependence of $H$ on $\theta$ can be made exponentially small, i.e. $O(e^{-c/\varepsilon})$ for some constant $c$.
Suppose $H(z, \delta)$ has the Fourier expansion (4.5), in which the $\theta$-independent term is $\varepsilon$ times the zeroth Fourier coefficient of $H_1$. We choose the Hamiltonian isotopy $F$ as the "Hilbert transform" of $H$ (4.6). In terms of Fourier coefficients, equation (4.4) then takes the form (4.7). We show that this $F$ is a good choice: it makes the dependence on $\theta$ decrease exponentially.
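A minimal numerical sketch of the periodic Hilbert transform on a one-dimensional grid may help fix ideas. We assume the standard Fourier multiplier $-i\,\mathrm{sgn}(k)$ with $\mathrm{sgn}(0) = 0$; the overall sign used for $F$ in (4.6) depends on the Poisson-bracket convention, which is not reproduced here, so the helper `hilbert_periodic` and the test function are illustrative assumptions only.

```python
import numpy as np

def hilbert_periodic(h_samples):
    """Periodic Hilbert transform via the Fourier multiplier -i*sgn(k).

    h_samples: values of a real 2*pi-periodic function on a uniform grid.
    The zeroth Fourier coefficient (the theta-average) is sent to zero.
    """
    n = len(h_samples)
    coeffs = np.fft.fft(h_samples)
    k = np.fft.fftfreq(n, d=1.0 / n)          # integer wave numbers 0, 1, ..., -1
    return np.real(np.fft.ifft(-1j * np.sign(k) * coeffs))

theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
h = 2.0 + np.cos(theta) + 0.5 * np.sin(3 * theta)   # hypothetical H
f = hilbert_periodic(h)
# The mean 2.0 disappears; cos(theta) -> sin(theta), sin(3 theta) -> -cos(3 theta).
expected = np.sin(theta) - 0.5 * np.cos(3 * theta)
```

Only the factor $i$ and the sign of $k$ enter the multiplier, which is the feature of $F$ emphasized in the comparison with the Lie method.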

4.2. The choice of the Hamiltonian isotopy $F$.

4.2.1. Heuristic argument. Following [TZ], we explain the heuristic ideas that justify this choice of $F$. If we set $\varepsilon = 0$ in (4.7), we get $H^k_\delta = -|k|H^k$, whose solution $H^k(\delta) = e^{-|k|\delta}H^k(0)$ tends to zero as $\delta \to \infty$. If instead we neglect only the third term on the RHS of (4.7), the resulting equation has an exact solution of the form (4.8), where $g$ denotes the Hamiltonian flow generated by the Hamiltonian $H_1$. Notice the imaginary unit $i$ in (4.8): the flow is taken in purely imaginary time. As $\delta$ increases, the complex width of analyticity is gradually lost, so formula (4.8) makes sense only if $\varepsilon\delta < \rho$, where $\rho$ is the width of analyticity in $\theta$. This is an obstacle to extending $\delta$.
We see from the heuristic argument that this choice of F gives us the exponential decay as well as a good guess for the stopping time.

4.2.2. Comparison with the Lie method. The Lie method is used in the works [N, LN, LNN]. Before working out the detailed proof of the above heuristic argument, we first explain the "Hilbert transform". In fact, this choice of $F$ is strongly related to classical averaging theory. Let us recall what is usually done in the Lie method.
Define the linear operator of taking the Lie derivative along the Hamiltonian flow generated by the Hamiltonian function $\hat F$. The time-1 map of (4.1) and (4.3) is expressed through this operator. In each step of the iteration, we need to solve the cohomological equation (4.10). In fact, we are only able to solve (4.10)
By comparing the Fourier coefficients, we obtain (4.11). Now we can explain why we choose $F$ as the Hilbert transform of $H$ in (4.6): we select $F$ to inherit the most important information in $\hat F$, namely the imaginary unit $i$ and $\mathrm{sgn}(k)$. The reader can check that the heuristic argument above still goes through if we use, in (4.3), the $\hat F$ whose Fourier coefficients are given by (4.11).
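To make the comparison concrete in one dimension, the sketch below assumes the cohomological equation takes the form $\omega\,\partial_\theta\hat F = H - \langle H\rangle$ for a frequency $\omega$ (one common convention; the exact form of (4.10) is not reproduced here). Its Fourier solution is $\hat F_k = H_k/(ik\omega) = -i\,\mathrm{sgn}(k)H_k/(|k|\omega)$, so the Hilbert-transform multiplier $-i\,\mathrm{sgn}(k)$ keeps the factor $i$ and $\mathrm{sgn}(k)$ and drops the small divisor $|k|\omega$. The names `solve_cohomological` and `spectral_derivative` are hypothetical helpers.

```python
import numpy as np

def solve_cohomological(h_samples, omega):
    """Solve omega * dF/dtheta = h - mean(h) for 2*pi-periodic h (zero-mean F)."""
    n = len(h_samples)
    coeffs = np.fft.fft(h_samples)
    k = np.fft.fftfreq(n, d=1.0 / n)
    F_coeffs = np.zeros_like(coeffs)
    nz = k != 0
    F_coeffs[nz] = coeffs[nz] / (1j * k[nz] * omega)   # divide by i k omega
    return np.real(np.fft.ifft(F_coeffs)), F_coeffs, coeffs, k

def spectral_derivative(f_samples):
    """d/dtheta of a periodic function via the multiplier i k."""
    n = len(f_samples)
    k = np.fft.fftfreq(n, d=1.0 / n)
    return np.real(np.fft.ifft(1j * k * np.fft.fft(f_samples)))

theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
h = 1.0 + np.cos(theta) + 0.25 * np.cos(2 * theta)      # hypothetical H
omega = 2.0                                             # hypothetical frequency
F_lie, F_coeffs, h_coeffs, k = solve_cohomological(h, omega)
nz = k != 0
residual = omega * spectral_derivative(F_lie) - (h - h.mean())
hilbert_coeffs = -1j * np.sign(k) * h_coeffs            # Hilbert-transform multiplier
```

In this toy setting the Lie generator and the Hilbert transform differ exactly by the factor $1/(|k|\omega)$ on each nonzero mode.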
4.3. The integral equation. Now we take into account the third term on the RHS of equation (4.7). We first remove the $-|k|H^k$ term in equation (4.7) by setting $u_k = e^{|k|\delta}H^k$, which leads to equation (4.12). If we define the operator $g^{is}_* f := f \circ g^{is}$, where $g^t$ is the flow generated by the Hamiltonian $-H_1$, then the exact solution of the truncated equation would be $g^{\varepsilon\sigma_k i\delta}_* u_k(x, y, 0)$. Using the method of variation of parameters for ODEs, we can then write the exact solution of equation (4.12) in the following integral form (4.13).
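The variation-of-parameters step can be illustrated on a scalar toy model $u' = i\lambda u + \varphi(s)$, where the rotation $e^{i\lambda s}$ loosely stands in for the imaginary flow $g$ and $\varphi$ for the nonhomogeneous term; $\lambda$, $u(0)$ and $\varphi$ are hypothetical choices. The sketch compares the Duhamel formula $u(t) = e^{i\lambda t}u(0) + \int_0^t e^{i\lambda(t-s)}\varphi(s)\,ds$, evaluated by the trapezoidal rule, with a direct RK4 integration.

```python
import numpy as np

lam = 1.5                  # hypothetical rotation rate (toy stand-in for the flow g)
u0 = 1.0 + 0.5j            # hypothetical initial value

def phi(s):
    return np.cos(s)       # hypothetical nonhomogeneous term

def duhamel(t, n=4001):
    """u(t) = e^{i lam t} u0 + int_0^t e^{i lam (t-s)} phi(s) ds, trapezoidal rule."""
    s = np.linspace(0.0, t, n)
    integrand = np.exp(1j * lam * (t - s)) * phi(s)
    ds = s[1] - s[0]
    integral = ds * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return np.exp(1j * lam * t) * u0 + integral

def rk4_forced(t, steps=2000):
    """Direct RK4 integration of u' = i lam u + phi(s) on [0, t]."""
    rhs = lambda s, u: 1j * lam * u + phi(s)
    h, s, u = t / steps, 0.0, complex(u0)
    for _ in range(steps):
        k1 = rhs(s, u)
        k2 = rhs(s + h / 2, u + h / 2 * k1)
        k3 = rhs(s + h / 2, u + h / 2 * k2)
        k4 = rhs(s + h, u + h * k3)
        u += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        s += h
    return u

t_end = 2.0
gap = abs(duhamel(t_end) - rk4_forced(t_end))
```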
We will analyze this equation to study its solution. To do so, we need a good control of the non-homogeneous term, i.e. the second term in the RHS.

4.4. Control of the nonhomogeneous term of equation (4.13). To control the nonhomogeneous term, we use majorant estimates. The majorant relation "$\ll$" is defined as follows.
Definition 4.1. For any two functions $f(z)$, $g(z)$, $z = (z_1, z_2, \dots, z_m)$, analytic at the point $z = 0$, with Taylor expansions $f = \sum_\alpha f_\alpha z^\alpha$ and $g = \sum_\alpha g_\alpha z^\alpha$, we write $f \ll g$ if $|f_\alpha| \le g_\alpha$ for every multi-index $\alpha$.
The strategy of the proof is first to guess a majorant assumption, and then to show that the function in the assumption satisfies an equation that majorates the integral equation (4.13). This verifies the assumption and closes the argument. We make the majorant assumption (4.14), where $\delta_* \sim \rho/\varepsilon$ is the maximal extension time determined by the homogeneous part of equation (4.7) in the heuristic argument. The factor $e^{-|k|\rho}$ reflects how the Fourier coefficients of an analytic perturbation decay, and $\mu = \|H_1\|_\rho$. We choose $Y = x + y$ because it simplifies the computation of derivatives. The integrand of equation (4.13) can then be majorated by an expression involving a constant $C$. This $C$ depends on the smoothness and magnitude of $H_1$ and on the number of combinations $l + m = k$. The number of such combinations is easy to estimate in the one-dimensional case, but in higher dimensions it becomes very difficult; this is the main difficulty that we need to overcome in this paper.
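A small sketch of Definition 4.1 for one-variable truncated power series, assuming coefficients are stored as arrays: we check $f \ll F$ and $g \ll G$, and that the relation survives products and differentiation, which is what lets majorant arguments close up. The majorant $F = \mu/(1 - z/\sigma)$, with coefficients $\mu\sigma^{-j}$, is the classical Cauchy majorant; the specific $f$, $g$, $\mu$, $\sigma$ and helper names are hypothetical.

```python
import numpy as np

def majorizes(G, f, rtol=1e-12):
    """Coefficientwise test of f << G: |f_j| <= G_j for every index j."""
    return bool(np.all(np.abs(f) <= G * (1 + rtol) + 1e-15))

def mul(a, b):
    """Product of truncated power series, keeping len(a) coefficients."""
    return np.convolve(a, b)[: len(a)]

def deriv(a):
    """Coefficients of the derivative: (j + 1) * a_{j+1}."""
    return np.arange(1, len(a)) * a[1:]

N, mu, sigma = 12, 1.0, 2.0
j = np.arange(N)
F = mu * sigma ** (-j)                         # Cauchy majorant mu / (1 - z/sigma)
f = F * 0.9**j * np.exp(1j * j)                # hypothetical series with |f_j| <= F_j
G = F.copy()
g = F * np.exp(-0.5j * j**2) * (j % 3 != 1)    # hypothetical, some coefficients zero
```

Products and derivatives only combine coefficients with nonnegative weights, so the triangle inequality carries $f \ll F$, $g \ll G$ over to $fg \ll FG$ and $f' \ll F'$.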
If we can solve the equation $V_\delta = C(V^2)_Y$, then equation (4.13) is majorated accordingly, and this checks the majorant assumption (4.14). The initial data for this equation involve $\sigma$, the width of analyticity in the slow variables $(x, y)$.

4.5. Outcome of the continuous averaging procedure. The Burgers equation can be solved explicitly by the method of characteristics, and its solution involves the square root of $(\sigma - Y)^2 - 8\sigma C\delta$. Requiring $(\sigma - Y)^2 - 8\sigma C\delta \ge 0$ (at $Y = 0$), we obtain the maximal flow time allowed by the slow variables: $\delta < \sigma/(8C)$.
Therefore each Fourier coefficient $H^k$ after the continuous averaging is bounded by $e^{-\delta} = O(e^{-c/\varepsilon})$ for some constant $c$. Adding up all these Fourier terms, we recover the Hamiltonian after the averaging, whose $\theta$-dependent part is of order $O(e^{-c/\varepsilon})$. This is the result proved in [Tr1, Tr2, TZ]. We will work out all the details in Section 5.5.

Continuous averaging proof of the Normal form Theorem 3.1
Now we prove Theorem 3.1 using the continuous averaging method. Let us go back to the setup of Section 3. Since we are looking at motions very close to a periodic orbit, the continuous averaging explained in the previous section can be applied. The periodic orbit corresponds to the fast angle $\theta$ in equation (4.5). The nonresonant part $\tilde H$ corresponds to the $\theta$-dependent term $\sum_{k\ne 0} H^k e^{ik\theta}$ in equation (4.5). The term $\langle\omega^*, I\rangle$ produces the exponential decay in the same way as the term $I$ in equation (4.5) does in equation (4.8), and the term $G(I)$ generates the imaginary flow in the same way as the term $H_1$ in equation (4.5) does in equation (4.8). Finally, the resonant term $\bar H$ leads to additional difficulties.
We devote the remaining part of this section to the proof of Theorem 3.1. The proof is organized as follows.
• Set up the continuous averaging in terms of the Hamiltonian and obtain a heuristic understanding of the averaging process in Section 5.1.
• Apply it to the Hamiltonian vector field in Section 5.2.
• Following the procedure of Section 4, define the operator $g$ to write the differential equations as integral equations, then write down the majorant equation and prove the majorant relations.
• Derive the estimates of the theorem from the majorant estimates in Section 5.5.
5.1. Continuous averaging for Hamiltonian (3.2). In this subsection, we write down the continuous averaging equations and develop a heuristic understanding of them. We start with a definition. As we have seen in Section 4, the process of continuous averaging involves several different aspects: exponential decay, the imaginary flow, and nonhomogeneous terms.
Definition 5.1. We define a partition of the width of analyticity $\rho$: $\rho = \rho_1 + 2\rho_2 + \rho_3$, with $\rho_1, \rho_2, \rho_3 > 0$. For $\delta > 0$, we also define the following sets, which form a partition of the grid $\mathbb{Z}^n$. Finally, we define two functions of $\delta$ associated to the above sets.
• We split the analyticity width $\rho$ of the fast angle $\theta$ as $\rho = \rho_1 + 2\rho_2 + \rho_3$. This splitting is quite flexible; we will optimize it to make $\rho_1$ as large as possible in Sections 6 and 7. Here $\rho_1$ is used to control the imaginary flow, $\rho_3$ is used to do the averaging, and $\rho_2$ is the width of analyticity in the angular variables remaining after averaging. These distinctions will become clear in the course of the proof.
• We choose the cut-off $K$ so that if $|k| \ge K$, the corresponding Fourier coefficient is smaller than $e^{-\rho_3 K}$, which we regard as sufficiently small. A Fourier coefficient with $k \in D_\pm(\delta)$ becomes smaller as $\delta$ increases. Once it is smaller than $e^{-\rho_3 K}$, the vector $k$ enters $D_>(\delta)$. So $D_\pm(\delta)$ keeps shrinking as $\delta$ increases. We stop the continuous averaging once $D_\pm(\delta) = \emptyset$.
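The explicit formulas for the sets in Definition 5.1 are missing from this text. As a hedged sketch, we assume $D_0 = \{k : \langle k, \omega^*\rangle = 0\}$, $D_\pm(\delta) = \{k : \pm\langle k, \omega^*\rangle > 0,\ \rho_3|k| + |\langle k, \omega^*\rangle|\delta \le \rho_3 K\}$ (the restriction quoted in Appendix A.1), and $D_>(\delta)$ the rest, with $\omega^* = p/T$ for an integer vector $p$. The code enumerates a finite box of $\mathbb{Z}^2$ and shows $D_\pm(\delta)$ shrinking to $\emptyset$; the function `partition`, the vector `p`, and all numbers are hypothetical.

```python
import itertools

def partition(K, rho3, p, T, delta, box):
    """Split the box [-box, box]^n of Z^n into D0, D+, D-, D> (hypothetical reading)."""
    D0, Dplus, Dminus, Dbig = [], [], [], []
    for k in itertools.product(range(-box, box + 1), repeat=len(p)):
        d = sum(ki * pi for ki, pi in zip(k, p))   # <k, omega*> = d / T
        l1 = sum(abs(ki) for ki in k)              # |k| is the l1 norm
        if d == 0:
            D0.append(k)
        elif rho3 * l1 + abs(d) / T * delta <= rho3 * K:
            (Dplus if d > 0 else Dminus).append(k)
        else:
            Dbig.append(k)
    return D0, Dplus, Dminus, Dbig

K, rho3, p, T = 3, 0.5, (1, 1), 1.0               # omega* = (1, 1), period T = 1
sizes = {}
for delta in (0.0, 0.5, 1.5):
    D0, Dp, Dm, Db = partition(K, rho3, p, T, delta, box=K)
    sizes[delta] = (len(D0), len(Dp), len(Dm), len(Db))
```

Since $\langle k, \omega^*\rangle \in \mathbb{Z}/T$, every nonresonant $k$ has $|\langle k, \omega^*\rangle| \ge 1/T$, so in this toy setting $D_\pm(\delta)$ is empty as soon as $\delta \ge \rho_3 K T$.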
Proof. From the definition of F we know the Fourier harmonics of F come only from D ± (δ). As a result for any k = 0, we must write k = l ± + l for l ± ∈ D ± (δ) and some l. The equation (5.3b) is straightforward.
Lemma 5.3. Let $R$ be the confinement radius of $I$, i.e. $|I|_2 \le R$ for $I \in G^n + \sigma \subset \mathbb{C}^n$. Then we have the following estimates.
Proof. We first note that $G(0) = \nabla G(0) = 0$. For $|G|$, we use the Taylor formula and Definition 1.2 to get the estimate in the lemma. For $|\nabla G(I)|_2$, we use the analogous formula for $\nabla G$.
The following lemma helps us to understand the heuristic ideas behind the process of continuous averaging and Definition 5.1.
Lemma 5.4. If we omit the terms with $k = l_\pm + l$ on the RHS of equations (5.3), then equations (5.3) can be solved explicitly, and the solution satisfies the estimates below. Moreover, at the stopping time $\delta_*$ we have the corresponding estimates, where the domain of the variables is $(I, x, y) \in (G^n + \sigma) \times (W^{2m} + \sigma)$.
If we truncate equations (5.3), then the first and the third become $H^k_\delta = 0$ for $k \in D_>(0) \cup D_0$, which gives the corresponding estimate of $|H^k|$ stated in the lemma. However, equation (5.3b) becomes
In the splitting $\rho = \rho_1 + 2\rho_2 + \rho_3$, we use $\rho_1$ to bound the term $\langle k, \nabla G\rangle$. Namely, we need $|\sigma_k\langle k, \nabla G\rangle|\delta \le \rho_1|k|$. By Lemma 5.3, it is enough to require $M_+ R\delta_* \le \rho_1$ (5.4). This also gives an upper bound for $\delta$. We equate this upper bound with the one given in Lemma 5.1 to obtain the value of $K$ in Definition 5.1. The definition of $D_\pm(\delta)$ implies that once this $H^k$ term is already below $e^{-2\rho_2|k|}e^{-\rho_3 K}$, the vector $k$ enters $D_>(\delta)$ and no longer belongs to $D_\pm(\delta)$.

Continuous averaging for a vector field.
In order for the majorant estimates to be applicable to understand equations (5.3), we need to write the continuous averaging equations in terms of Hamiltonian vector field.
Definition 5.2. We introduce the vector fields $h_*$, $h_0$, $\bar h$, $\tilde h$ corresponding to the parts $\langle\omega^*, I\rangle$, $G$, $\bar H$, $\tilde H$ of the Hamiltonian (3.2), and the vector field $f$ corresponding to $F$. We also use $h_k$ to denote the $k$-th Fourier coefficients of $\bar h$ and $\tilde h$. Moreover, corresponding to $F$ in equation (5.2), we define:
With this definition, we can rewrite the continuous averaging equations (5.3).
Lemma 5.5. If we set $v_k = h_k e^{S_k(\delta)\langle\omega^*, k\rangle}$ (recall that $S_k(\delta)$ was defined in Definition 5.1), then equations (5.3) can be rewritten in terms of Hamiltonian vector fields in the following form.
Proof. In equations (5.3), we replace the Poisson bracket by the Lie bracket, and the upper-case letters $H$, $F$ by the lower-case letters $h$, $f$, respectively. Then we remove the term $-|\langle\omega^*, k\rangle|h_k$ in the second case, as we did in Section 4.3, by setting $v_k = h_k e^{S_k(\delta)\langle\omega^*, k\rangle}$. A direct computation then proves the lemma.
5.3. The operator g and the majorant commutator. What we do next is to write the differential equations for v k 's as integral equations. As we did in Section 4.3, we first need to define an operator g which solves the homogeneous part of equations (5.6), esp. (5.6b).
Definition 5.3 (Section 2 of [PT]). Let $g^t$ be the Hamiltonian flow of the vector field $h_0(I)$ generated by the Hamiltonian $G(I)$. We put $f_k = \hat f(I, x, y)e^{i\langle k,\theta\rangle}$ for an arbitrary analytic function $\hat f$ defined on $D(\rho, \sigma)$, and then define the operator $g$ acting on $f_k$. It is shown in Sections 5 and 7 of [PT] that $g$ has the following two properties.
With this operator $g$, we can write the differential equations (5.6) as integral equations.
Lemma 5.6. If we denote the terms with $k = l_\pm + l$ in equations (5.6a), (5.6b), (5.6c) by $\eta_a$, $\eta_b$, $\eta_c$ respectively, then we have the following three integral equations, equivalent to equations (5.6).
Proof. Equations (5.7a) and (5.7c) are straightforward. Equation (5.7b) follows from the first property of the operator $g$ above and the method of variation of parameters for ODEs.

5.3.2. The majorant commutator. We need the following majorant commutator to perform the estimates.
Definition 5.4 (Section 7 of [PT]). For any two functions $\hat F, \hat G : \mathbb{C}^{n+2m} \to \mathbb{C}$ and any two vectors $l, k \in \mathbb{Z}^n$, we define the majorant commutator as follows. For this commutator, we have the following lemmas.

Majorant equation, the derivation and the solution.
5.4.1. Majorant control on the initial value. We first establish majorant control of the initial value.
Lemma 5.9. For $|\delta| \le \delta_*$, $R < \sigma$, and $k \in \mathbb{Z}^n$, we have the following majorant control of $v_k(I, x, y, 0)$.
Proof. We first consider $v_k(I, x, y, 0) = h_k$. We know that $|H^k|, |h_k|_\infty \le \mu e^{-\rho|k|}$ for $(I, x, y) \in (G^n + \sigma) \times (W^{2m} + \sigma)$, from the definition of $\mu$ in Definition 1.1. Then we use Lemma B.1 (4) in Appendix B to obtain the majorant control of $v_k(I, x, y, 0)$. Now we consider the effect of $g$. The operator $g$ is defined by the Hamiltonian flow generated by the Hamiltonian $iG(I)$ in Definition 5.3. The variables $I, x, y$ are constants of motion of this Hamiltonian flow, so $g$ only shrinks the width of analyticity in $\theta$ and has no influence on that of $I, x, y$. From the definition of $g$, the factor $e^{i\langle k,\theta\rangle}$ picks up at most a factor $e^{|\langle k, \nabla G\rangle|\delta}$. We also have $|\langle k, \nabla G\rangle|\delta \le M_+ R\delta_*|k| \le \rho_1|k|$ according to inequality (5.4). Since $\rho - \rho_1 = 2\rho_2 + \rho_3$, this tells us that $|g_\delta v_k(I, x, y, 0)|_\infty \le \mu e^{-(2\rho_2 + \rho_3)|k|}$ for $(I, x, y) \in (G^n + \sigma) \times (W^{2m} + \sigma)$.
Now use Lemma B.1 (4) in Appendix B again to obtain the lemma.

Majorant equations.
The following construction is given in [PT].
Definition 5.5. Consider a continuous function $a(\delta)$. We define the functions $W$ and $W_{|k|}$ as the solutions of the following PDEs. (5.8)
Lemma 5.10. The solutions $W$ and $W_{|k|}$ are given explicitly by the formulas below. The solutions are defined up to time $\delta_*$ and for $Y$ satisfying the restrictions (5.10).
Proof. The fact that $W$ and $W_{|k|}$ are exact solutions can be checked directly. To obtain the restriction for $\delta_*$, we need to ensure $(\sigma - Y)^2 - 4A(\delta) \ge 0$ so that the square root makes sense.
We want that, when $\delta = \delta_*$, we still have $|I|_2, |x|_2, |y|_2 \le R$.
Remark 5.2. Let us try to understand the PDEs (5.8) heuristically. Consider the model equation $U_t = WU_x + VU$. It can be solved by the method of characteristics. The characteristics are given by $dx/dt = -W$, along which the PDE takes the form $dU/dt = VU$, so that $U = U_0 e^{\int V\,dt}$. Thus $W$ determines how fast the characteristics approach their intersection, while $V$ determines how $U$ grows.
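Remark 5.2 can be checked on the simplest instance, with constant $W$ and $V$ (hypothetical values; the coefficients in (5.8) are not constant). Then the characteristic through $(x, t)$ starts at $x_0 = x + Wt$, and the sketch verifies by finite differences that $U(x, t) = U_0(x + Wt)e^{Vt}$ solves $U_t = WU_x + VU$, with a hypothetical Gaussian profile $U_0$.

```python
import numpy as np

W, V = 0.7, 0.3            # hypothetical constant transport speed and growth rate

def U0(x):
    return np.exp(-x**2)   # hypothetical initial profile

def U(x, t):
    # The characteristic through (x, t) started at x0 = x + W t, since dx/dt = -W;
    # along it dU/dt = V U, so U = U0(x0) * exp(V t).
    return U0(x + W * t) * np.exp(V * t)

def residual(x, t, h=1e-5):
    """Central-difference residual of U_t - (W U_x + V U)."""
    U_t = (U(x, t + h) - U(x, t - h)) / (2 * h)
    U_x = (U(x + h, t) - U(x - h, t)) / (2 * h)
    return U_t - (W * U_x + V * U(x, t))

xs = np.linspace(-2.0, 2.0, 9)
max_res = np.max(np.abs(residual(xs, 1.0)))
```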
The main result of this section is summarized in the following proposition, which implies the solutions of equations (5.8) majorate that of equation (5.7).
Proposition 5.11. For any $\tau$ such that $|\tau| + \delta \le \delta_*$, we have the following majorant control of the solution $v_k(I, x, y, \delta)$ of equation (5.7), under the restriction (5.10) coming from Lemma 5.10. (The expressions of $a(\delta)$ and $A(\delta)$ will be given explicitly in Lemma 5.13.) Moreover, $A(\delta_*)$ is given explicitly.
Proof. We first cite Proposition A.1 in [PT].
Applying the definition of the majorant commutator (Definition 5.4), we obtain an estimate in which we use Lemma 5.12(3). This gives $(W W_{|l|})_Y \ll 2W W_{|l|,Y}$. We introduce the notations (5.13). The second term on the RHS is the most complicated one, so we only consider this term; the other two terms are treated similarly. (5.14) Here $|l_>| \le K + |k|$, because $l_> = k - l_\pm$ and $|l_\pm| \le K$. We used Lemma 5.12(5) to decrease the exponent of $W_{|l|}$. We also imposed the mild restriction
$2(n + 2m) \le K$. (5.15)
If |k| ≥ K, we get the majorant equation for the W |k| part in equation (5.8).
If $|k| \le K$, using Lemma 5.12(1) and $W_{|k|} = W$, we replace the last "$\ll$" in (5.14) by the corresponding bound for $W$. This is the majorant equation for $W$ in equation (5.8).
For Σ ± and Σ 0 , we get the same majorant estimate with Σ > replaced by Σ ± and Σ 0 . Now the problem is to find a(δ) to give bound for 6Ke σ εσµ(2Σ ± + Σ 0 + Σ > ). We need to do some careful analysis for this and the result is summarized in the following lemma.
The proof of this lemma is given in Appendix A.
This lemma gives the restriction (5.12) in Proposition 5.11. What we have shown is that each integrand of equations (5.7) has the majorant estimate $e^{-|k|(\rho_3 + 2\rho_2)}\mu\zeta_k \ll W_{|k|,\delta}$, where $W_{|k|}$ satisfies equations (5.8). Combined with the majorant control on the initial condition in Lemma 5.9, this implies that the LHS of equations (5.7) is majorated by $W_{|k|}$. This completes the proof of the proposition.
5.5. The system after the averaging. The continuous averaging gives us the following information about the Hamiltonian vector fields.
Lemma 5.14. At time $\delta = \delta_*$, we have the following estimates.
Proof. Recall that in Lemma 5.5 we set $v_k = h_k e^{S_k(\delta)\langle\omega^*, k\rangle}$. Using the definition of $S_k(\delta)$ in Definition 5.1, we get $v_k = h_k$ for $k \in D_0 \cup D_>(0)$, so Proposition 5.11 applies directly to such $k$. For $k \in D_\pm(0)$, we must have $\rho_3|k| + S_k(\delta_*)\langle\omega^*, k\rangle = \rho_3 K$ according to the definition of $D_\pm(\delta)$; we then apply Proposition 5.11 to this case.
where $\bar\Psi$ is the resonant term and $\tilde\Psi$ is the nonresonant term, as defined in Definition 3.2. The following lemma gives estimates for the functions $\bar\Psi$ and $\tilde\Psi$.

The deviation of action variables in the real domain.
Lemma 5.16. Under the same hypothesis as Lemma 5.15, after the averaging the total deviation of the variables is (I , θ , x , y ) − (I, θ, x, y)| ∞ ≤ 5εµT 2π(ρ 3 + ρ 2 ) n .
Here the norm | · | ∞ is taken in the real domain.
Proof. For simplicity, we consider only the $I$ component; the other components are similar. From equation (4.2), we obtain an expression whose RHS is a real function. Then (5.16) holds. Hence we have the estimate (recall 2πT = T .)
Proof of Theorem 3.1. Lemmas 5.15 and 5.16 complete the proof of the theorem. Notice that the conditions of the theorem coincide with those of Lemmas 5.15 and 5.16, where the last condition in the theorem is exactly $A(\delta_*) < 4\sigma^2/25$. Lemma 5.15 gives the estimates of $\bar\Psi$ and $\tilde\Psi$, and Lemma 5.16 gives the estimates for the deviation of the variables.
6. Local stability: stability in a vicinity of a given periodic orbit
In this section, we derive a stability result using the normal form Theorem 3.1. Recall that in Section 3 we set $\omega^* = \frac{\partial H_0}{\partial I}(0)$ as the frequency vector of the periodic orbit under consideration. We consider initial conditions $I(0)$ such that $|I(0)|_2 \le r$.
We obtain inequality (6.1) using Definition 1.2. For the first term on its RHS, we use energy conservation and Lemma 5.15. For the second term on the RHS of inequality (6.1), we derive a further estimate; to bound the first term of that estimate, we use the Hamiltonian equations, Lemma 5.15, and the fact that $\langle\omega^*, \partial\bar\Psi/\partial\theta\rangle = 0$, which holds because $\bar\Psi$ contains only harmonics with $\langle k, \omega^*\rangle = 0$.
We choose We set Then we have The proof is now complete.
Remark 6.1. We introduce restriction (6.3) instead of a constant $g$ as in [LNN, N]. The two restrictions of this theorem imply that $\rho_2$ and $\rho_3$ can be taken sufficiently small if $\varepsilon$ is. Then $\rho_1$ can be very close to $\rho$. The restrictions of Theorem 3.1 are also satisfied for $\varepsilon$ small enough. Therefore we obtain an improved stability time compared with [LNN, N].
7. Global stability: stability for arbitrary initial data
In this section, we consider the stability result for arbitrary initial data and prove Theorem 2.1. We first prove the following lemma.
Proof. First recall the Dirichlet theorem on simultaneous approximation: for any $\alpha \in \mathbb{R}^n$ and any real $Q > 1$, there exists an integer $q$ with $1 \le q < Q$ such that $|q\alpha - p|_\infty \le Q^{-1/n}$ for some $p \in \mathbb{Z}^n$. An improved estimate can be obtained by rescaling $\alpha$ to $\alpha/|\alpha|_\infty$, one of whose entries is $\pm 1$, removing that entry, and applying the Dirichlet theorem to the remaining $n - 1$ components. We get the following: there exists a rational vector $\alpha^*$ of period $T = q/|\alpha|_\infty$, with $q \in \mathbb{N}$, $1 < q < Q$, and $|\alpha^* - \alpha|_\infty \le \frac{1}{T Q^{1/(n-1)}}$ (see the (only) Proposition in [N]).
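The two approximation steps can be sketched by brute force. `dirichlet_q` searches for the $q$ guaranteed by the Dirichlet theorem, and `periodic_approximation` performs the rescaling by $|\alpha|_\infty$, drops the $\pm 1$ entry, and returns $\alpha^* = p/T$ with $T = q/|\alpha|_\infty$. Both helpers, the test vector $\alpha$ and the value of $Q$ are hypothetical; this is a numerical illustration, not the construction of [N].

```python
import math

def dist_to_int(v):
    """Distance from the real number v to the nearest integer."""
    return abs(v - round(v))

def dirichlet_q(alpha, Q):
    """Smallest integer q with 1 <= q < Q and max_i ||q alpha_i|| <= Q^(-1/n)."""
    bound = Q ** (-1.0 / len(alpha))
    q = 1
    while q < Q:
        if max(dist_to_int(q * a) for a in alpha) <= bound:
            return q
        q += 1
    return None   # excluded by the Dirichlet theorem

def periodic_approximation(alpha, Q):
    """Rescale by |alpha|_inf, drop the +-1 entry, approximate the other n-1 entries."""
    n = len(alpha)
    i0 = max(range(n), key=lambda i: abs(alpha[i]))
    amax = abs(alpha[i0])
    beta = [a / amax for a in alpha]                 # beta[i0] = +-1
    q = dirichlet_q([b for i, b in enumerate(beta) if i != i0], Q)
    p = [round(q * b) for b in beta]                 # integer vector
    T = q / amax                                     # period of alpha*
    alpha_star = [pi / T for pi in p]                # alpha* = p / T
    return alpha_star, T, q

alpha = [1.0, math.sqrt(2), math.pi / 3]             # hypothetical frequency vector
Q = 200.0
alpha_star, T, q = periodic_approximation(alpha, Q)
err = max(abs(a - b) for a, b in zip(alpha_star, alpha))
```

The error bound follows from $|\alpha^*_i - \alpha_i| = \frac{|\alpha|_\infty}{q}|p_i - q\beta_i| \le \frac{1}{T Q^{1/(n-1)}}$.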
The frequency vector is $\omega(I) = \nabla H_0(I)$. Consider two points $I^*$ and $I$ such that $\omega(I^*)$ is as stated in the lemma and approximates $\omega(I)$ in the same way as $\alpha^*$ approximates $\alpha$.
Remark 7.1. The first restriction in Theorem 2.1 can be satisfied by making $\mu$ smaller and $\varepsilon$ larger; this leads to a shorter stability time. Here $\mu$ plays the same role as the factor $g$ in [LNN, N]. The second restriction can always be satisfied by making $\varepsilon$ small. The third restriction can be satisfied by making $\mu$ or $\rho_1/\rho_3$ small. Moreover, since $n!$ grows very fast, this restriction is easy to satisfy for large $n$.
Appendix A. Proof of Lemma 5.13
The proof consists of Claims 1, 2, and 3 below, which estimate $\Sigma_\pm$, $\Sigma_>$, and $\Sigma_0$ in Lemma 5.13, respectively. Before the proof, let us first analyze the geometry of the integer vectors involved.
A.1. The geometry of integer vectors. Let us look at Figure 2.
• The diamond: the diamond in the figure encloses all the vectors $k$ with $|k| \le K$ (in dimension 3 it is an octahedron; in general it is the ball of radius $K$ in the $\ell^1$ norm). For large $K$, the number of integer vectors enclosed in the diamond is approximately its volume $\frac{(2K)^n}{n!}$. Indeed, in dimension $n$, the diamond consists of $2^n$ simplices, each of volume $\frac{K^n}{n!}$.
• The hyperplane: the small arrow indicates the rational frequency $\omega^*$. $HP_0$ is the hyperplane perpendicular to $\omega^*$, and $HP_0 = D_0$. The $(n-1)$-dimensional volume of $HP_0 \cap$ Diamond is at most $\frac{(2K)^{n-1}}{(n-1)!}$, which is the $(n-1)$-dimensional volume of an $(n-1)$-dimensional diamond. Any vector lying above $HP_0$ has positive inner product with $\omega^*$, while any vector below has negative inner product. Moreover, if two vectors lie on the same hyperplane parallel to $HP_0$, they have the same inner product with $\omega^*$. Let us denote $HP_d = \{k \in \mathbb{Z}^n \mid \langle k, \omega^*\rangle = d/T\}$. Up to lower-order terms in $K$, $HP_0 \cap$ Diamond contains at most $\frac{(2K)^{n-1}}{(n-1)!}$ integer points.
• The parallelogram: consider the vectors $l_+$, $l_-$, $k$ in Figure 2. Suppose $l_+ + l_- = k$. Then the three vectors together with the origin form a parallelogram. Suppose $\langle k, \omega^*\rangle T = 1$ and $\langle l_+, \omega^*\rangle T = 2$; then $\langle l_-, \omega^*\rangle T = -1$. The vectors $l_+$ and $l_-$ can move on their corresponding hyperplanes, but the parallelogram structure is always preserved.
• The shape of the diamond under the averaging flow: in the definition of $D_\pm(\delta)$, we have the restriction $|l_\pm|\rho_3 + |\langle l_\pm, \omega^*\rangle|\delta \le \rho_3 K$ for $l_\pm \in D_\pm(\delta)$. When $\delta = 0$, this is our diamond. As $\delta$ increases, the diamond collapses, i.e. there are fewer integer vectors on each $HP_d$ with $d \ne 0$. The rate of collapse depends on the inner product $|\langle l_\pm, \omega^*\rangle|$: on $HP_d$ the restriction reads $|l_\pm| \le K - |d|\delta/(\rho_3 T)$, so the farther a hyperplane $HP_d$ is from $HP_0$ (the larger $|d|$), the faster it collapses, its $\ell^1$-radius decreasing at rate $|d|/(\rho_3 T)$. $HP_0$ does not change at all. When $\delta = \delta_*$, the diamond has collapsed to its intersection with $HP_0$. By then we have killed all the nonresonant terms up to the desired exponential smallness $e^{-\rho_2|k| - \rho_3 K}$. We denote the collapsed diamond at time $\delta$ by Diamond($\delta$).
Moreover, the majorant relation is also preserved by solving differential equations or integral equations.
We have the following theorem.
Theorem B.1 (Chapter 5 of [TZ]). If $\hat f(z, \delta)$, $0 \le \delta \le \delta_0$, is a solution of the majorant system (B.2) associated with (B.1), then the system (B.1) has a solution and $f_k(z, \delta) \ll \hat f_k(z, \delta)$ for any $\delta \in [0, \delta_0]$, $k \in \mathbb{Z}$. The same is true if we rewrite systems (B.1) and (B.2) in the integral form
$f_k(z, \delta) = f_k(z) + \int_0^\delta F_k(f(z, s), z, s)\,ds$, $\quad \hat f_k(z, \delta) = \hat f_k(z) + \int_0^\delta \hat F_k(\hat f(z, s), z, s)\,ds$.
With this theorem, we treat $\delta$ as a parameter instead of a variable, so we do not need to do the Taylor expansion with respect to $\delta$.
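Theorem B.1 concerns general majorant systems; a componentwise analogue in its simplest setting is a linear constant-coefficient system $f_\delta = Af$ with majorant system $\hat f_\delta = \hat A\hat f$, where $\hat A = |A|$ entrywise and $\hat f(0) = |f(0)|$. Every term of the exponential series satisfies $|A^m f_0| \le \hat A^m|f_0|$ componentwise, so $|f(\delta)| \le \hat f(\delta)$. The toy sketch below, with random hypothetical data and our own Taylor-series helper `expm_apply`, checks this; it illustrates the principle and is not the theorem itself.

```python
import numpy as np

def expm_apply(A, v, delta, terms=80):
    """Taylor-series approximation of exp(delta * A) @ v (adequate for small norms)."""
    result = np.zeros(len(v), dtype=complex)
    term = np.asarray(v, dtype=complex)
    for m in range(terms):
        result = result + term
        term = delta * (A @ term) / (m + 1)
    return result

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))   # hypothetical system
f0 = rng.normal(size=n) + 1j * rng.normal(size=n)            # hypothetical data
A_hat, F0 = np.abs(A), np.abs(f0)                            # majorant system and data

checks = []
for delta in (0.1, 0.5, 1.0):
    f = expm_apply(A, f0, delta)                 # solution of f' = A f
    F = expm_apply(A_hat, F0, delta).real        # solution of F' = A_hat F
    checks.append(bool(np.all(np.abs(f) <= F * (1 + 1e-12))))
```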
completed while the author was visiting IAS, and he would like to thank IAS for its hospitality.