Normal form for lower dimensional elliptic tori in Hamiltonian systems

We give a proof of the convergence of an algorithm for the construction of lower dimensional elliptic tori in nearly integrable Hamiltonian systems. The existence of such invariant tori is proved by leading the Hamiltonian to a suitable normal form. In particular, we adapt the procedure described in a previous work by Giorgilli and co-workers, where the construction was made so as to be used in the context of the planetary problem. We extend the proof of the convergence to the cases in which the two sets of frequencies, describing the motion along the torus and the transverse oscillations, have the same order of magnitude.


Foreword
In a previous work in collaboration with Locatelli (see [7]), we have emphasized the key role played by the invariant elliptic tori in the dynamics of the FPUT non-linear chains with a small number of nodes. In fact, in that article we have described a suitable algorithm constructing the normal forms corresponding to those lower-dimensional manifolds. Moreover, we have shown that most of the initial conditions of the same type of those considered in the original work written by Fermi, Pasta and Ulam (see [15]) stay in the effective stability region around elliptic tori. These are keystone concepts to which Prof. Antonio Giorgilli has repeatedly given fundamental contributions during his scientific career: dynamics in FPUT models, perturbative methods constructing normal * 2020 Mathematics Subject Classification. Primary: 37J40; Secondary: 70H08, 70H15. Key words and phrases: lower dimensional invariant tori, KAM theory, normal form methods, perturbation theory for Hamiltonian systems.

Introduction
Lower dimensional elliptic tori can be seen as an extension of stable periodic solutions, because they are invariant manifolds on which the motion is quasi-periodic but characterized by a number of frequencies less than the number of degrees of freedom of the system; furthermore, the linear approximation of the local expansion of the equations of motion in the remaining degrees of freedom possesses an elliptic equilibrium point in correspondence of the origin. For what concerns the persistence of elliptic tori in nearly-integrable Hamiltonian systems of finite dimension, a perturbation theory was firstly proposed by Melnikov (see [35]), even if a complete proof was lacking; the first proofs were given by Moser ([37]) and Eliasson ([13]); in [40], Pöschel gave another proof of the existence of lower dimensional elliptic tori using a quadratic perturbative method and including the case of tori embedded in infinite dimensional Hamiltonian systems. These results have been further improved in [41], in order to make them suitable to applications in nonlinear PDEs. More recently, in [4] and [5], the authors proved the existence of elliptic tori in two kinds of planetary problems. However, analytical results concerning the persistence of these invariant objects tend to be unusable when dealing with the application to specific dynamical systems, being the hypothesis on the smallness of the perturbation usually unrealistically small. Moreover, when dealing with physical problems, the perturbation is not a free parameter, but it is given. Therefore, despite the numerical evidence of stable motions, the existence of invariant manifolds cannot be guaranteed by the mean of these analytical results, because the problem is not so close to an integrable one.
For what concerns the KAM theorem, several different proof schemes have been introduced in the last 50 years with the aim of improving the applicability to realistic physical problems. In this context, approaches based on explicit techniques that can be translated in an algorithm, which can be coded in any programming language, are in a better position. In the present work, the attention will be focused on the particular kind of proofs based on a classical convergence scheme with respect to the small parameter ε (see [21] and [22]). The key idea of these approaches is that explicit computations of proper normal forms can significantly reduce the size of the perturbation so as to allow the application of analytical results, otherwise inaccessible. Moreover, they can be effectively complemented with computer-assisted estimates (see [11] and, for computer-assisted rigorous results based on a different technique, e.g., [9]). In the same spirit of what was done for the existence of full dimensional invariant tori, in [43] the authors developed a constructive algorithm for showing the existence of lower dimensional elliptic tori in the planetary problem. The construction of elliptic tori was performed by giving the Hamiltonian a suitable normal form, using an infinite sequence of near-to-the-identity canonical transformations defined by the Lie series operator. In the subsequent paper [25], the complete proof of the convergence of such a procedure was included. Though, that method and the proof were designed with the aim of being applied to the study of planetary systems, where typically the frequencies of the motion on the lower dimensional torus are faster with respect to the transverse frequencies, that are indeed generated by the small perturbation.
We adapt that algorithm and, more specifically, the proof so as to apply it to a general class of Hamiltonians for which the frequencies of the torus are of the same order as the frequencies describing the limit of the small oscillations that are transverse to the torus. On one hand such a difference does not affect the general structure of the algorithm, on the other hand it has a role in the proof of the convergence of the construction of the normal form, in particular in the discussion of the so called geometry of the resonances. Moreover, let us remark that this new formulation of the theorem includes also the cases in which the two sets of frequencies are of different order, as the one considered in [25]. The main obstacle to the convergence is the introduction of possible small divisors, that are integer combinations of the two sets of frequencies, i.e., the frequencies of the invariant torus and the ones related to the motion transverse to the torus. Furthermore, the algorithm itself introduces small corrections on both the frequencies, thus making the non-resonance condition to be verified at each step different and somehow unpredictable. Since the transverse frequencies have the same weight with respect to the others and they depend on the frequencies of the torus, we need to introduce some hypotheses in order to assure the convergence for an initial set of frequencies of positive measure. Let us stress that some of the hyphoteses used here could be weakened. For example, in [25] the authors assumed the weaker condition τ instead of the usual Diophantine condition in the estimates for the analitic part (see [27] and [26] for a more detailed discussion about the condition τ and its link with Bruno numbers). However, while the set of Diophantine numbers has positive Lebesgue measure, the one satisfying condition τ that are not Diophantine has zero Lebesgue measure. Since the result we are presenting here cannot ensure the convergence for a specific set of initial frequencies and, therefore, we cannot ensure that the frequencies for which it will work satisfy such a condition, we decided to not include it here. In [3], the authors showed how also the Melnikov conditions can be imposed just to the final frequencies. Both these kind of improvements could be considered by substituting the measure argument in favour of a statement for specific final frequencies. Something in this direction has already been done in the case of full dimensional tori in [42], where the set of initial frequencies for which the theorem is valid is determined a posteriori.

Outline of the algorithm and of the proof
By following the approach of Pöschel [40], let us consider a family of initial Hamiltonians parametrized with respect to ω (0) and written as where the variables, split in action-angle variables (p, q) and canonical variables (x, y), belong to neighborhoods of the origin. We collect the Hamiltonian terms in the functions F j , where the subscript j denotes the quantity 2 deg(p) + deg(x, y). Moreover, the terms of total degree j ≥ 3 are split in the averaged terms F (av.) ≥3 and in the part with null average with respect to the angles q, i.e., εF (n.av.) ≥3 . As usual, ε denotes a small parameter. Here, ω (0) is considered to be a parameter with values in some open domain. Under the hypothesis of the nondegeneracy of the frequency-action map related to the integrable approximation, we can expand a given analytic quasi-integrable Hamiltonian as in formula (1) by doing expansions for different values of the actions p, that therefore correspond to different frequency vectors ω (0) . An example of how to define such a family of Hamiltonians in the case of the FPUT model can be found in [7]. The search for an elliptic torus proceeds by applying to the Hamiltonian a perturbative procedure that is a natural adaptation of the original scheme of Kolmogorov. Let us give a sketch of it.
We look for a canonical transformation that gives the Hamiltonian above the normal form where ω (∞) = ω (∞) (ω (0) ) and Ω (∞) = Ω (∞) (ω (0) ) . For this purpose, the transformation should remove all terms F 0 (q; ω (0) ), F 1 (q, x, y; ω (0) ) and F 2 (p, q, x, y; ω (0) ). Indeed, as a straightforward consequence of the normal form above, the torus P = 0, X = Y = 0 is invariant and elliptic, and it carries a quasi-periodic motion with frequencies ω (∞) . The proof is divided in two parts: analytic and geometric. The analytic part requires an infinite sequence of near to the identity canonical transformations through the Lie series (see, e.g., [31] and [20] for a self-consistent introduction) defined as where L f (·) = {f, ·} is the Lie derivative with respect to a generic dynamic function f , being {·, ·} the Poisson bracket 1 . An accurate control of the accumulation of the small divisors is needed in order to assure the convergence of the procedure. Starting with H (0) as in (1), we introduce an infinite sequence of Hamiltonians such that at every step the size of the perturbing terms is reduced, as indicated by the factor ε r+1 . During the iteration of the algorithm, we introduce possible small divisors, that are integers combinations of the frequencies ω (r) and Ω (r) , which are at each step different. This is an additional issue to the control of their contribution. Usually, the accumulation of small divisors is controlled by the mean of a perturbation method that exhibits a fast convergence (quadratic method), as in the original paper of Kolmogorov. Here, we follow the classical procedure that works step-by-step in powers of the parameter ε, with the aim of producing an effective constructive algorithm. The accumulation of small divisors can be geometrically controlled, as it follows some strict "selection rules". The method is essentially the same that has been used for the proof of Kolmogorov theorem in [21] and [22] and described in [23]. The analytic procedure is followed by a geometric argument concerning the estimate of the measure of a suitable set of nonresonant frequencies, based on an adaptation of the approach described in [40]. The scheme is actually a revisiting of the geometric construction of non-resonant domains that can be found, e.g., in [1], [38], [39], [13] and [28].

Statement of the result
We come now to a formal statement of our main result. Let us consider a 2(n 1 + n 2 )dimensional phase space endowed with n 1 pairs of action-angle coordinates (p, q) ∈ O 1 × T n 1 and n 2 pairwise conjugated canonical variables (x, y) ∈ O 2 ⊆ R 2n 2 , where both O 1 ⊆ R n 1 and O 2 are open sets containing the origin. We also introduce an open set U ⊂ R n 1 and the frequency vector ω (0) ∈ U which plays the role of a parameter. Theorem 1.1 Consider a family of analytic real Hamiltonians as in (1), parametrized by the n 1 -dimensional frequency vector ω (0) , with ε playing the usual role of small parameter. Let us assume that (a) the frequencies Ω (0) j : U → R are analytic functions of ω (0) ∈ U ; similarly F 0 , F 1 , F 2 and F ≥3 are analytic functions of (p, q, x, y; (c) the functions F j are such that j = 2p + m, where p is the degree in p, while m is the degree in (x, y); in particular, F ≥3 contains terms with degree j at least 3; (d) for some E > 0 one has sup (p,q,x,y;ω (0) )∈O 1 ×T n 1 ×O 2 ×U F j (p, q, x, y; ω (0) ) ≤ E ∀ j ≥ 0.
Then, there is a positive ε ⋆ such that for 0 ≤ ε < ε ⋆ the following statement holds true: there exists a non-resonant set U (∞) ⊂ U of positive Lebesgue measure and with the measure of U \ U (∞) tending to zero for ε → 0 for bounded U, such that for each ) and an analytic canonical transformation (p, q, x, y) = ψ (∞) ω (0) (P , Q, X, Y ) leading the Hamiltonian in the normal form We remark that the main difference with respect to the result reported in [25] is the lack of the small parameter ε in front of the terms which describe the motion transverse to the n 1 -dimensional torus, as these transverse oscillation are of the same order of the quasi-periodic motion. For the sake of simplicity, in the statement above we omitted any explicit estimate of the threshold value ε ⋆ , which can be found in (104). In turn, this estimate depends on a set of parameters introduced in lemma 3.1, in proposition 3.11 and in formulae (40), (53) and (70). The whole proof requires a lot of technical estimates and it is organized as follows. In section 2 we describe the formal algorithm. Section 3 is devoted to the proof of theorem 1.1 and it is composed of two different parts: the analytic part of the proof contains the quantitative estimates that lead to the proof of convergence, with a special emphasis to the control of small divisors; in the geometric part of the proof, we show that our procedure applies to a set of initial frequencies of large relative measure. Finally, in section 4 we comment the result and the possible applications.

Formal algorithm
This section is devoted to the formal algorithm that takes a Hamiltonian (1) of the family H (0) , parametrized by the frequency vector ω (0) , and brings it into a normal form. Therefore, the section includes all the formulae that will be used in order to establish the convergence of the normalization process. We found convenient to use the complex variables (z, iz), defined by the canonical transformation z = (x + iy)/ √ 2, in order to deal with the transverse oscillations.

Initial settings and strategy of the formal algorithm
For some fixed positive integer K, we introduce the distinguished classes of functions P sK m,l , with integersm,l, s ≥ 0 , which can be written as g(p, q, z, iz) = m∈N n 1 |m|=m (ℓ,l)∈N 2n 2 |ℓ|+|l|=l k∈Z n 1 |k|≤sK c m,ℓ,l,k p m z ℓ (iz)l exp(ik · q) , with coefficients c m,ℓ,l,k ∈ C. Here we denote by | · | the ℓ 1 -norm and we adopt the multi-index notation, i.e., p m = n 1 j=1 p m j j . We say that g ∈ P sK ℓ in case Finally, we denote by g ϑ = T n dϑ 1 . . . dϑ n g/(2π) n the average of a function g with respect to the angles ϑ ∈ T n . We shall also omit the dependence of the function from the variables, unless it has some special meaning.
In the following lemma we state a relevant algebraic property what we are going to use several times.
The straightforward proof is left to the reader.
We start with the Hamiltonian in the form where f (0,s) ℓ ∈ P sK ℓ . The Hamiltonian (1) may be written in the form (5) (see section 3). Starting from H (0) , we introduce an infinite sequence of Hamiltonians H (r) r≥0 with H (r) in normal form up to order r , according to the following description. We transform H (r−1) into H (r) via a near-the-identity canonical transformation generated by a composition of three Lie series of the form where χ 1 and χ (r) 2 are determined by homological equations. At every normalization step the frequencies ω (r) and Ω (r) may change by a small quantity (see formulae (25) and (26)).
The small divisors are controlled by introducing a non-resonance condition up to a finite order rK, namely min k∈Z n 1 , 0<|k|≤rK ℓ∈Z n 2 , 0≤|ℓ|≤2 and min 0<|ℓ|≤2 ℓ · Ω (r−1) (ω (0) ) ≥ γ , where γ > 0 and τ ≥ n 1 − 1. For |ℓ| = 0 condition (7) is the usual condition of strong non-resonance, while for |ℓ| = 1, 2 it is usually referred to as the first and second Melnikov condition, respectively. We come to the description of the generic r-th normalization step. Let us write the Hamiltonian H (r−1) as where f (r−1,s) ℓ ∈ P sK ℓ . In the expansion above, the functions f (r−1,s) ℓ may depend analytically on ε, the relevant information being that they carry a common factor ε s . Such an expansion is clearly not unique, but this is harmless.

First stage of the normalization step
Our aim is to remove the term f Considering the Taylor-Fourier expansion we readily get The denominators are not zero in view of the non-resonance condition (7) with |ℓ| = 0 .
The new Hamiltonian is determined as the Lie series with generating function ε r χ (r) 0 , namely The functions f (I;r,s) ℓ are recursively defined as with f q has been omitted. Indeed, it does not affect the Hamilton equations, but it is just the level of the energy of the solution we are going to construct.

Second stage of the normalization step
We remove f (I;r,r) 1 from (13). We determine a second generating function χ Again, if we consider the Taylor-Fourier expansion then the solution of the homological equation is where the divisors cannot vanish in view of condition (7) and of the condition Ω

The new Hamiltonian is calculated as
and is given the form (13), replacing the upper index I by II , with

Third stage of the normalization step
We remove here at the same time the q-dependent part and the average one of f (II;r,r) 2 .
Considering again the Taylor-Fourier expansion we determine the generating function χ with where the coefficients c We get where the divisors cannot vanish in view of conditions (7) and (8).
The terms of the Hamiltonian can now be calculated as The average terms that cannot be eliminated contained in f (II;r,r) 2 are recollected in the frequencies ω (r) and Ω (r) , which are therefore corrected as: and Ω (r) with Z (r) as in (22). The transformed Hamiltonian is given the form (9), replacing r − 1 with r, namely eventually with a change of the frequencies ω (r) and Ω (r) . Let us add a few considerations. First, we remark that our formulation of the algorithm works for both real and complex Hamiltonians. However, if the expansion (5) contains only real functions, then all terms of type ω (r) · p , n 2 j=1 Ω (r) j z jzj and f (r,s) ℓ generated by the algorithm are real too, as it can be easily checked. Finally, the Hamiltonian H (r) in (27) has the same form of H (r−1) , so that the induction step can be iterated provided the conditions (7) and (8) hold true with r + 1 in place of r .
3 Proof of the convergence of the algorithm 3.1 Analytic settings open balls centered at the origin with radii ρ and R, respectively, W is a subset of R n 1 , while the subscripts σ and h denote the usual complex extensions of real domains (see [20]), i.e., and Let us consider a generic analytic function g : where We define the weighted Fourier norm It is convenient to introduce the Lipschitz constant for the Jacobian of the function We notice that the dependence on the parameter ω (0) plays no role in the following, thus we shorten the notation by ignoring it. (a') the initial set W (0) of frequencies is non-resonant up to the finite order K , namely every being dist the usual euclidean distance from a vector to a set and K (0) ℓ the closed convex hull of the gradient set ℓ ; (d') the following upper bounds hold true: We give a sketch of the proof. The Hamiltonian terms should be split in different parts, each of them with a finite expansion. For any fixed value of index ℓ , standard arguments on the Fourier decay of the coefficients allow us to determine suitable values of the parameters K and σ , such that for s ≥ 0 the norms of the functions f (0,s) ℓ are uniformly bounded (see, e.g., the proof of lemma 5.2 in [20]). For a fixed K, we can determine γ > 0 , τ > n 1 − 1 , h 0 > 0 and a compact set W ⊂ U such that a Diophantine inequality in (a') is satisfied, being U the initial set of frequency vectors ω (0) in the hypotheses of theorem 1.1. The first property in (b') is a straightforward consequence of the analyticity of Ω (0) j on the domain U and the second property is satisfied if we choose W (0) small enough. Indeed, by reducing W (0) we could reduce also the gradient set G (0) ℓ in view of the analiticity of Ω (0) in W (0) . Property (c') is a consequence of the fact that we can organize different terms of the Taylor-Fourier expansion of the Hamiltonian. Finally, the inequality at point (d') is satisfied for appropriate values of ρ and R determined by standard arguments on the Taylor expansions of homogeneous polynomials 2 . Taking into account that the usual sup-norm is bounded by the weighted Fourier one, for ε < 1 we have

This implies that the Hamiltonian
We give now some estimates for the Lie series. We shorten the notation by writing · α in place of the norm · α(ρ,σ,R) , where α is a positive number.
Estimates similar to (35) are contained, e.g., in [20]. Nevertheless, a little additional work is needed in order to adapt them to the present context.

Analytic part
In this section, we translate our formal algorithm into a recursive scheme of estimates on the norms of the functions involved in the normalization process.

Small divisors
It is well known that the accumulation of the small divisors is one of the major obstacles to the convergence of perturbative proof schemes. In order to control it, firstly, we introduce lists of indices S = {s 1 , . . . , s j } that may contain repeated elements and we give some definitions that will be useful later.

Definition 3.3
For any fixed ω (0) , we say that the sequence of frequency vectors is Diophantine, if there are two constants γ > 0 and τ ≥ n 1 − 1 such that (7) and (8) are satisfied, for all r ≥ 1.
Definition 3.4 Let S be a list of indices. We define the evaluation operator V on S as follows: where τ is related to the Diophantine condition in definition 3.3. Moreover, let S 1 , . . . , S n be lists of indices, the operator MAX extracts the list of indices that maximizes the evaluation operator, i.e., Finally, we introduce the operator N k on a list S, which counts the numbers of elements in S with value in a quadratic range, that is Now, let us describe the mechanism of accumulation of the small divisors in a rather informal way. The key remark is that the small divisors are created by the solution of a homological equation and accumulate through Lie derivatives. For instance, when we solve (10) with r = 1, in χ (1) 0 we introduce a divisor which can be estimated by a 1 = γ/K τ according to condition (7). We can therefore associate to χ For a sum of several Lie derivatives, we select the greater list, according to the definition of the operator MAX in (37). Thus, the process of accumulation of divisors is described via the union of lists. An heuristic analysis of the mechanism of accumulation, with the aim of identifying the worst one, could be made by unfolding all the recursive definitions of the functions f (...) ... in (14), (19) and (24), in consequence of the three transformations of coordinates introduced at every normalization step. The process is substantially similar to the simpler cases of the Birkhoff normal form or the Kolmogorov normal form. Using as a reference the description in [23] and [33], we define a valid selection rule that allows us to control geometrically the product of small divisors.
A second source of divergence is connected with the restriction of domains required by the estimate of multiple Poisson brackets. The restrictions are controlled by a sequence d r that should converge to some positive d small enough. We define the sequences {d r } r≥0 and {δ r } r≥1 as At the r-th step of the algorithm, we find an Hamiltonian H (r) which is analytic in D (1−dr)(ρ,σ,R) , namely in a domain restricted by δ r . Nevertheless, the parameters can be chosen in a way that lim r→∞ d r = 1/4 and, therefore, the sequence of domains converges to a compact set with no empty interior. Finally, in order to give quantitative estimates of the accumulation of small divisors, we are going to use the operator N k (·) that counts how many indices of a specific range we have in a list.

Convergence of the algorithm under non-resonance conditions
It is convenient to introduce the constant This will allow us to produce uniform estimates in the parameters. The number of summands in the recursive formulae (14), (19) and (24) Lemma 3.5 Consider a Hamiltonian H (0) of type (5). Assume that the non-resonance hypotheses (7) and (8) are satisfied up to order r ≥ 1, for some γ > 0 and τ ≥ n 1 − 1. Then, the formal algorithm described in section 2 can be iterated up to introduce H (r−1) expanded as in (9) with f (r−1,s) ℓ ∈ P sK ℓ and also the r-th normalization step can be performed. Moreover, we assume that the terms of the Hamiltonian H (r−1) are bounded so that where ζ ℓ = 3 − ℓ if ℓ ≤ 3, ζ ℓ = 0 otherwise, and V is the operator defined in (36). Therefore, the norms of the generating functions of the r-th step are subject to the following limitations: Furthermore, the terms appearing in the expansion of the new Hamiltonian H (r) in (27) are bounded by The lists of positive integer numbers H (r,s) ℓ ,and G (r) j 0≤j≤2 are recursively defined as follows: Finally, for r ≥ 1 , the variations of the frequencies induced by the r-th normalization step, are bounded by The proof is made by induction on r and it is deferred to appendix A.
Lemma 3.6 The following rules hold true for the list of indices defined in (45) for all r ≥ 1 and s ≥ r when ℓ = 0, 1, 2, or s ≥ 0 if ℓ ≥ 3: The proof is made by induction and it is deferred to appendix B. We can estimate the sequence {ν r,s } r≥0 , s≥0 , counting the number of summands generated by the algorithm, as in the following lemma. The proof is available in [25].
Lemma 3.8 Consider a Hamiltonian H (0) of type (5). Assume that the non-resonance hypotheses (7) and (8) are satisfied up to order r ≥ 1, with γ > 0 and τ ≥ n 1 − 1. Then the formal algorithm described in section 2 can be iterated up to introduce H (r−1) expanded as in (9), with f (r−1,s) ℓ ∈ P sK ℓ , and also the r-th normalization step can be performed. Moreover, let us suppose that conditions (c') and (d') of lemma 3.1 are satisfied. Therefore, at step r of the normalization procedure, the generating functions are bounded by: 2e the functions f (r,s) ℓ , with s > r for 0 ≤ ℓ ≤ 2, s ≥ 0 for ℓ ≥ 3, are bounded by: finally, the sequences {ω (r) } r≥0 and {Ω (r) } r≥0 satisfy: where A = M 3 2 12(4+τ ) 2 8 and M is defined as in (40).
Proof Thanks to the hypotheses (c') and (d') of lemma 3.1, it is straightforward that the terms of the Hamiltonian at step 0 statisfy (42). Then, lemma 3.5 applies and we can iterate it up to r times. Therefore, since we can use the estimates in (43), (44) and (46), in order to prove the new inequalities in the statement we need to provide an upper bound for V(·). Using the definition (36) and the rules for the list of indices in (48), we have the following estimate: It is convenient to work with log 2 V G (r) 0 : Thus, it follows that V G . For the generating functions, it suffices to observe that we can estimate ν r−1,r , ν (I) r,r and ν (II) r,r with ν r,r and finally use the estimate provided in lemma 3.7. This allows to fully justify (49). The proof of (50) is very similar; indeed, we have an analogous estimate for log 2 V H (r,s) ℓ as that in (52), by replacing r with s. Using that ν r,s ≤ ν s,s , we can verify (50) . We have now to prove (51). Using again (46) and the estimates for V(G (r) 2 ) and ν r,r , the inequality directly follows. This concludes the proof.
We summarize the analytic estimates in the following (f ') the parameter ε is smaller than the "analytic threshold value" ε ⋆ an , being where M is defined in (40) and τ is related to the Diophantine condition in 3.3. Then, there exists an analytic canonical transformation Φ (∞) (54) Furthermore, the norms of the functions f (∞,s) ℓ ∈ P sK ℓ are bounded by and the sequences {ω (r) } r≥0 and {Ω (r) } r≥0 converge to the limits ω (∞) and Ω (∞) , respectively.
The rest of the section contains a sketch of the proof, which depends on all the previous lemmas.
First, thanks to the hypothesis (e') of non-resonance and the condition (c'), (d') of lemma 3.1, the estimates in lemma 3.8 hold true at every step r. Remarking that εA < 1 we have that H (r) , written as in (27), is analytic on D (1−dr)(ρ,σ,R) ⊃ D 3/4(ρ,σ,R) . With similar calculations, in view of condition (f'), since the inequalities in (51) hold true at every step r, we can conclude that the sequences {ω (r) } r≥0 and {Ω (r) } r≥0 are Cauchy sequences, so they have limits.
Let us now focus on the difference H (r) − H (r−1) . Using (27) and (9) and the fact that f (r,r) ℓ = 0 for ℓ = 0, 1, 2 and f Therefore, in view of the inequalities in (50)-(51), we have Recalling that the sup-norm is bounded by the weighted Fourier one, this proves that {H (r) } r≥0 is a Cauchy sequence of analytic Hamiltonians admitting a limit H (∞) , since the r.h.s. of the estimate above tends to zero for r → ∞. Moreover, formulae (56)-(57) imply that also {ω (r) } r≥0 , {Ω (r) } r≥0 and {f It remains to prove that the canonical transformation Φ (∞) ω (0) : D 1/2(ρ,σ,R) → D 3/4(ρ,σ,R) is analytic. The proof is based on standard arguments in the Lie series theory, that we recall here, referring to subsection 4.3 of [21] for more details. Let us denote by ϕ (r) the canonical change of coordinates induced by the r-th step, i.e., Using inequalities in (43), one can easily verify that the following inequalities hold true: For example, let us consider the generating function χ (r) 0 , that depends only on q. Then, Thanks to the condition on ε, the sum converges and we obtain the corresponding estimate in (59). Estimates for the other Lie series appearing in (59) can be deduced analogously using the inequalities in (43) and in (35). Therefore, one has ϕ (r) (D (1/2+d r−1 )(ρ,σ,R) ) ⊂ D (1/2+dr )(ρ,σ,R) and defining Φ (r) = ϕ (1) • . . . • ϕ (r) one has Φ (r) (D 1/2(ρ,σ,R) ) ⊂ D 3/4(ρ,σ,R) . By repeatedly using the so-called exchange theorem for Lie series, one immediately obtains that H (r) = H (0) • Φ (r) . By using estimate (59), we can prove that the canonical transformation Φ (∞) Actually, with some additional effort, we could prove that Φ (∞) ω (0) differs from the identity just for terms of order O(ε) . As a final comment, we adopt the symbol Φ (∞) ω (0) in order to emphasize the parametric dependence of that canonical transformation on the initial frequency ω (0) , which is relevant in the next section.

Geometric part: measure of the resonant regions
The aim of this section is to show that there exists a set of frequencies ω (0) of relatively big measure, for which our procedure converges. To this end we consider the sequence (ω (r) , Ω (r) ) r≥0 as function of the parameter ω (0) recursively defined by (25) and (26).
We start from the compact set W (0) ⊂ R n 1 and its complex extension W h 0 , the image having large relative measure. Let us describe the process in some more detail. For r ≥ 1 and some fixed positive values of the parameters γ , τ ∈ R and K ∈ N , we define the sequence of real domains {W (r) } r≥0 as follows: at each step r, we remove from W (r−1) all the resonant regions related to the new small divisors appearing in the formal algorithm described in section 2. Therefore, where each so called resonant strip is defined as follows: The complex extensions W (r) hr are technically necessary in order to work with Cauchy estimates, but it is crucial that the measure is evaluated on the real domain. It will be also convenient to introduce the functions δω (r) and ∆Ω (r) defined as The following part is an adaptation of the approach described by Pöschel in [40]. First, we report here lemma D.1 in [40], since it will be used in the proof of the proposition. Lemma 3.10 Let f be a real analytic function from W h to C n , with W h the complex estension of a domain W ⊂ R n . If then f admits a real analytic inverse ϕ on Wh 4 , that satisfies the following inequalities where the norm · ∞; is defined in a way analogous to (34).
with η = min{1/K , σ} . Considering the sequence of Hamiltonians {H (r) } r≥0 , formally defined by the algorithm in section 2, let us assume that the functions ω (1) , Ω (1) , . . . , ω (r) , Ω (r) satisfy the following hypotheses up to a fixed normalization step r ≥ 0: (g') the function ω (s) (ω (0) ) has analytic inverse ϕ (s) on W where δω (s) and ∆Ω (s) are defined as in (62), while A is given in (53); (j') the parameter ε is smaller than the "geometric threshold value" Then, the function ω (r) (ω (0) ) admits an analytic inverse ϕ (r) : W . Moreover, the following non-resonance inequalities hold true: Finally, the Lipschitz constants related to the Jacobians of the functions ϕ (r) and Ω (r) •ϕ (r) are uniformly bounded as where B is defined as Proof The proof proceeds by induction. By hypotheses, Ω (0) (ω) is an analytic function on the complex extended domain W (0) h 0 and its Jacobian is uniformly bounded in W is defined in (34). Thus, starting from the inequality at point (a') of lemma 3.1, if then the non-resonance conditions (7) and (8)  h 0 for 0 < |k| ≤ K , |ℓ| ≤ 2. More precisely, if ω ∈ W (0) and |ω| < h 0 , we have and, for 0 < |ℓ| ≤ 2 , For r = 0, the inequalities in (68) immediately follow from the previous ones, recalling that ϕ (0) = Id. The first change of the frequencies might occur at the end of the first perturbation step and the transformed fast frequencies read j (ω) ≤ γσ εA in view of the assumption (i') in proposition 3.11. If in view of lemma 3.10, the function (Id + δω (0) ) admits an analytic inverse ϕ (1) : , the following estimates hold true: where the norm · ∞;W (0) h 0 4 on the Jacobian of the function ϕ (1) −Id : Wh 0 4 → C n 1 is defined in a way analogous to (34). The growth of the Lipschitz constants for the function ϕ (1) −Id is controlled by settingJ i.e., we have ∂ ϕ (1) Recall now that the step r = 1 includes also the preparation of the next step r = 2, namely cutting out the resonant regions Thus we need an upper bound on both the sup-norm of Ω (1) • ϕ (1) and the Lipschitz constant of its Jacobian. The new transverse frequencies can be written as Using this formula we can bound the Jacobian of the function Ω (1) • ϕ (1) . To this end, we need the preliminary estimate . Thus, we can ensure that the Jacobian of the transformed transverse frequencies is bounded as Using again formula (77), we can bound the deterioration of the non-resonance conditions involving the transverse frequencies. In fact, for ℓ ∈ Z n 2 , |ℓ| ≤ 2 , we have Thus, for ω ∈ W , 0 < |k| ≤ K and |ℓ| ≤ 2 , we obtain the non-resonance estimate where we started from estimate (72) (holding true on all the complex domain W uniformly with respect to ω ∈ W k,ℓ for K < |k| ≤ 2K , |ℓ| ≤ 2 . First, we remove them from the domain, by defining W (1) according to (60)-(61). Having required that the new radius of the complex extension is so small that for K < |k| ≤ 2K , |ℓ| ≤ 2 and ω ∈ W (1) , |ω| < h 1 , we have Here we used the definitions (60)-(61) for the real set W (1) and subtracted the contribution due to the complex extension; moreover, we also used formulae (78), (82), (recall that , in view of (60) and (82)). The inequalities in (68) are justified in view of (80), (81) and (83). This concludes the proof of the statement for r = 1 .
Iterating the procedure for a generic step r > 1 is now straightforward, provided the sequence of restrictions of the frequency domain is suitably selected. Let us describe such a scheme of estimates in detail.
Recall now that we have to prepare for the next step, namely we have to cut out the resonant regions defined in (61), i.e.,
Thus, we need an upper bound on both the sup-norm of Ω (r) • ϕ (r) and the Lipschitz constant of its Jacobian. The new transverse frequencies Ω (r) ϕ (r) (ω (r) )) can be written as Since we have max 1≤j≤n 1 sup ω∈W (r−1) j (ω) ≤ γ(εA) r , in view of the hypothesis (i') in proposition 3.11, using again lemma 3.10 so as to estimate |∂(φ (r) − Id)/∂ω| ∞;W (r−1) , the following inequality (89) at the previous inductive step r − 1, formula (87) and the Cauchy inequality, we can provide an estimate for Therefore, we can bound the Jacobian of the function Using again formula (62) and the inequality above at the previous inductive step, we can bound the deterioration of the non-resonance conditions involving the transverse frequencies. In fact, for ℓ ∈ Z n 2 , |ℓ| ≤ 2 , we have Thus, for ω ∈ W , 0 < |k| ≤ rK and |ℓ| ≤ 2 , if h r−1 satisfies the following condition ), (90) and (84) allows to obtain the non-resonance where we used this same inequality at the previous inductive step, i.e., Moreover, using the inequalities for the transverse frequencies at step r − 1, (90) and the estimate for h r−1 in (91), one can easily obtain the lower bound uniformly with respect to ω ∈ W k,ℓ as they are defined in (61) for rK < |k| ≤ (r + 1)K , |ℓ| ≤ 2 . Then, if we require that for rK < |k| ≤ (r + 1)K , |ℓ| ≤ 2 and ω ∈ W (r) , |ω| < h r , we have Therefore, for r ≥ 1 , it is convenient to resume all the conditions in the following way: whereμ r = ∞ s=r µ s and B is defined in (70). The inequalities in (ii) and (iii) follow from (i) and (65), by estimatingμ 0 as follows: The sequence {h r } r≥0 in (65) has been chosen so as to satisfy all the smallness conditions (71) and (iv), which are required along this proof. The smallness conditions (74) and (i) on ε are satisfied since ε < ε * ge . Let us see it in details. For r = 1, condition (i) is verified. At step r the condition is verified if that is satisfied when ε < ε * ge , which is defined in (67). We have to prove now that h 0 and h r , as defined in proposition 3.11, satisfy condition (iv). Let us start with h 0 . By definition h 0 ≤ γ 4K τ e(J 0 + 1 σ ) , then the inequality with the second term in (71) is satisfied. Using that max{ K 2 , 1 σ } ≤ max{K, 1 σ } = 1 η , we obtain the following estimate By definition h r = h r−1 2 τ +2 , therefore the inequality with the first term in (iv) is satisfied. Therefore, we only need to show the second inequality. We need a lower bound for the following term: Since h r = h 0 (2 τ +2 ) r and (r + 1) τ +1 2 r−1 ≤ (2 r ) τ +1 2 r = (2 τ +2 ) r , the inequality is satisfied. This concludes the proof.

Explicit estimates for the volume of the resonant strips
In order to prove theorem 1.1, we need to focus on a set of tori with "Diophantine" frequencies as defined in definition 3.3. First, let us denote by K We emphasize that the previous definition is analogous to that of G (0) ℓ given in the context of hypothesis (b') in lemma 3.1. That assumption ensures that the Euclidean distance from the initial set of frequencies is such that dist(k, K is just a little bigger than the original one, so that also at the r-th step it is safely away from the integer vectors, i.e., dist(k, K In order to prove that, the following further smallness condition is needed: Let us see it in details. In order to control the growth of K (r) ℓ , it suffices to show that the following estimate holds true with ω ∈ W (r) . Since W (r) ⊂ W (r−1) ⊂ W , using the inequality in (88), that has been verified in the proof of proposition 3.11, we can bound the function as follows: where J r−1 , that is the Lipschitz constant about the Jacobian of Ω (r−1) (ω (0) ), has been estimated as in (iii) at the end of the proof of proposition 3.11. Then, if ε satisfies the condition in (97), by using also the definition and the estimate of µ r−1 in (i) at the end of the proof of proposition 3.11 and the definition of h r−1 in (65), for r ≥ 2 we have where the condition (97) is used in the last inequality. For r = 1, the r.h.s. in the estimate (99) can be reformulated as where we used again the estimate (i) at the end of the proposition 3.11 and the condition (97). This ends the justification of (98). Therefore, we obtain the following lower bound: The estimate of the volume of resonant zones that are removed at each step is based on lemma 8.1 of Pöschel paper [40], that we report here (the Lebesgue measure is denoted by m(·)).
where D is the diameter of W (0) with respect to the sup-norm.
The volume of the resonant regions must be compared with respect to the initial set W (0) . To this end, we estimate the measure of ϕ (r) (R (r) k,ℓ ) in the original coordinates ω (0) , recalling that ϕ (r) is the inverse function of ω (r) (ω (0) ). Using lemma 3.12 and assumption (97), for k ∈ Z n 1 \ {0} with rK < |k| ≤ (r + 1)K and |ℓ| < 2 we have Starting from the first inequality in (69) and using the well known Gershgorin circle theorem, we control the actual expansion of the resonant zones due to the stretching of the frequencies, by verifying that Using the inequalities (100)-(101), we easily get a final estimate of the total volume of the resonant regions included in W (0) : where c n 2 = (n 2 + 1)(2n 2 + 1) is the maximum number of polynomial terms having degree ≤ 2 in the transverse variables (z, iz) and the maximum number of k such that rK < |k| ≤ (r + 1)K is estimated with K · 2 n 1 ((r + 1)K) n 1 −1 . The last series is convergent if τ > n 1 and the sum is of order O(γ).

Proof
The proof is a straightforward combination of lemma 3.8 and propositions 3.9 and 3.11, which summarize the analytic study of the convergence of our algorithm and the geometric part, respectively. We sketch the argument.
According to lemma 3.1 the family of Hamiltonians H (0) , parametrized by the frequency vectors ω (0) and defined on a real domain, can be extended to a complex domain D ρ,σ,R × W h 0 with suitable parameters; moreover, their expansions can be written as H (0) in (5). Possibly modifying the values of parameters γ and τ , we can choose γ and τ > n 1 such that the property (a') of lemma 3.1 is still satisfied and the estimate of the resonant volume in the last row of (102) is smaller than m(W h 0 ), i.e., we impose that wanted non-resonance conditions are satisfied. This cannot be avoided, because the change of the frequencies is an effect of the perturbation that is not known at the beginning of the procedure. By suitably controlling the small corrections on the frequencies introduced by the algorithm, we can prove the existence of tori for a set U (∞) of positive measure of initial frequencies, but at the same time we cannot obtain the same result for one specific torus or, equivalently, for a specific Diophantine frequency vector ω. Indeed, if on one hand we could add at every step a translation of the actions of the torus in order to fix the frequency vector, as suggested by Kolmogorov in the case of full dimensional tori, at the same time we cannot fix the transverse frequencies Ω j , that depend generically on ω. However, the geometric construction above shows that with high probability the method will work fine at every order; in particular, it is even less probable to encounter a resonance in the explicit calculations, because the algorithm is iterated up to a finite order. In [42], in the case of the classic Kolmogorov algorithm for constructing full dimensional tori (but without the translations to fix the frequency vector), the authors present an approach to determine a posteriori the initial frequencies in order to converge to a picked value of ω.
With some additional work, we believe that the same approach could be extended to the case of lower dimensional elliptic tori. Let us now comment the threshold value ε ⋆ on the small parameter ε, that is explicitly defined in (104). Although the definition involves many parameters, we might produce an asymptotic estimate of the volume of the resonant regions for ε → 0. In view of inequality (102), it is O(γ). Moreover, one can easily show that ε ⋆ is O(γ 3 ), since every element in (104) is O(1/A) and A ∼ 1/γ 3 . Thus the complement of the set of the invariant elliptic tori, i.e., W (0) \ ∞ r=0 ϕ (r) W (r) hr , has a measure estimated by O(ε 1/3 ) . This law is strictly related to the fact that each normalization step requires the solution of three homological equations and this is related with the occurrence of the factor 3 in the accumulation rules of the small divisors that are stated in lemma 3.6. However, an estimate such that the measure of the complementary set of the invariant tori should be O ε 1/3 is not optimal. For instance, this same quantity has been proved to be smaller than a bound O(ε b 1 ) , with b 1 < 1/2, in [4] and [5], where the existence of elliptic tori in a couple of planetary problems is investigated. Let us emphasize that considering a planetary problem (as in [25]) makes much easier the control of the transversal frequencies because they are O(ε). Here, careful additional estimates have been introduced, e.g., in subsection 3.3.1, in order to deal with this difficulty. On the other hand, it is well known that purely analytic estimates are usually unrealistically small. For this reason we did not pay attention to produce optimal estimates.
The interest of such an algorithm is indeed related to the fact that, being fully constructive, it can be used to obtain semi-analytical results in realistic problems, by explicitly performing a number of perturbation steps (e.g., with the help of an algebraic manipulator especially designed for these kind of computations, see [29]) and significantly improving both the applicability threshold and the estimate of the measure. Furthermore, by implementing a suitable scheme of computer-assisted estimates, the semi-analytical results could be made also fully rigorous (see, e.g., [10] and [32] in the case of full dimensional tori). Computer-assisted proofs of existence of elliptic lower dimensional tori have been given e.g. in [34], where the authors described a constructive algorithm without the request of using action-angle canonical coordinates nor the quasi-integrability of the systems. Also the Poincaré-Lindstedt method can be used for computing an approximation of the periodic or quasi-periodic orbits (see, e.g., [14], [16], [12] and [18]). Still, these methods can be used to construct a particular solution of the Hamilton equations, but further work is necessary in order to study the dynamics in its neighborhood. On the contrary, the fact that the method is based on a construction of a suitable normal form, i.e., we produce not just a specific solution but also a local holomorphic Hamiltonian, makes it an optimal starting point for studying the dynamics in the neighborhood of the tori.
From a theoretical point of view, a description of the approach to follow in order to investigate the stability in the neighborhood of invariant tori can be found in [36]. In that article, in the case of full dimensional tori, Morbidelli and Giorgilli showed that, if ρ is the distance with respect to the torus, therefore the upper bound on the diffusion rate is superexponentially small, as it roughly goes as exp(exp(−1/ρ)). This phenomenon is strictly related to the fact that the more an invariant solution is robust (with respect to the perturbation) the more its neighborhood is filled by other invariant tori. This realizes a sort of freezing of the diffusion that is made extremely slow (see [28]). A similar phenomenon happens in the case of lower dimensional elliptic tori, which are surrounded by a stable zone filled by tori of maximal dimension. The slow diffusion from invariant tori is particularly useful in those physical problems for which an eventual proof of stability for perpetual times is much more than necessary, while it is more appropriate to refer to the effective stability. This concept is easy to define in the case, e.g., of a planetary system: the typical lifetime of a star, despite very long, is obviously finite; therefore, in this case, it would be enough to show that the motion is forced to be confined in a small subset of the phase space for a time that exceeds the expected lifetime of the star. To this purpose, a simple tool for providing a lower bound for the time needed to diffuse from small region in the neighborhood of a lower dimensional or full dimensional invariant torus is given by the Birkhoff normal form. Suitable estimates of the remainder of Birkhoff normal form, can be used in order to investigate the stability in Nekhoroshev sense (see, e.g., [36], [24] and [44]), thus for times exponentially long with respect to the perturbation. In [7], Birkhoff normal form is used in this way in order to explain the stability numerically observed in the FPUT model, that is a chain of particle interacting with anharmonic potentials. Moreover, also in this case, the computation of the stability time can be made rigorous with a computer-assisted scheme of estimates (for an introduction, see [6], where it is considered the simpler case of Birkhoff normal forms in the neighborhood of an elliptic equilibrium point). Finally, the convenience of combining different normal forms and of doing expansions nearby elliptic tori is emphasized in [8], where an efficient construction of a one-dimensional elliptic torus turns out to be an advantageous intermediate step before the construction of a full dimensional KAM tori in its neighborhood.