A parametrised version of Moser's modifying terms theorem

A sharpened version of Moser’s ‘modifying terms’ KAM theorem is derived, and it is shown how this theorem can be used to investigate the persistence of invariant tori in general situations, including those where some of the Floquet exponents of the invariant torus may vanish. The result is ‘structural’ and works for dissipative, Hamiltonian, reversible and symmetric vector fields. These results are derived for the contexts of real analytic, Gevrey regular, ultradifferentiable and finitely differentiable perturbed vector fields. In the first two cases, the conjugacy constructed in the theorem is shown to be Gevrey smooth in the sense of Whitney on the set of parameters that satisfy a “Diophantine” non-resonance condition.


Introduction.
1.1. Object. Moser's modifying terms theorem [23] is in essence an averaging result. On the phase space $M = \mathbb{T}^m \times \mathbb{R}^n$, it considers small deformations $\tilde X$ of an integrable vector field where $\omega \in \mathbb{R}^m$ and $A \in gl(n, \mathbb{R})$ are constant and are assumed to satisfy so-called Diophantine non-resonance conditions. The theorem says that if the deformation is sufficiently small in some function norm, say $0 < \varepsilon = \|\tilde X - X\| \ll 1$, then there is a constant vector field with $\delta \in \mathbb{R}^m$, $\mu \in \mathbb{R}^m$ and $B \in gl(n, \mathbb{R})$, such that the following holds. If $X_0$ denotes the modified vector field $X_0 = \tilde X - \Delta$, then there is a conjugacy $\Phi$, $\varepsilon$-close to the identity, for which In particular, the torus $T = \mathbb{T}^m \times \{0\}$ is invariant under $\Phi_* X_0$, and consequently the torus $\Phi^{-1}(T)$ is invariant under $X_0$. The natural interpretation of the vector field $\Delta$ is that it represents that part of the perturbation which cannot be removed by successive averaging. The object of the present article is to derive a modifying terms theorem for parametrised families of vector fields, incorporating results on smoothness [28] and Gevrey-regularity [25,26] of parameter dependence that have been added to KAM theory since Moser's article appeared. An extension to general Carleman (or ultradifferentiable) classes is given as well. A second motivation is to make the result a convenient tool for quasi-periodic bifurcation theory. In particular, the condition imposed by Moser that $\mathrm{ad}\,A$ should be semi-simple is removed, so that all situations can be treated for which the unperturbed invariant tori have several Floquet exponents equal to zero. Recall that if a vector field is of the form of the right hand side of equation (2), then the eigenvalues of $A$ are called the Floquet exponents of the invariant torus $T$. As an application, we sketch the analysis of persistence of tori in the quasi-periodic Bogdanov-Takens bifurcation.
The main result of the present article is to show the existence of a modifying terms vector field $\Delta$ with the above properties, for small parametrised deformations $\tilde X$ of integrable vector fields $X$. Here the vector fields $X$ and $\tilde X$ can be restricted to an admissible structure in the sense of [9], like Hamiltonian, volume preserving, equivariant etc. The deformations are either real analytic, Gevrey-regular, ultradifferentiable or finitely (but sufficiently often) differentiable, and for each category we find regularity properties of the conjugacy $\Phi$ and the vector field $\Delta$. In this way, the results contribute to a resolution of problem 10 of Sevryuk's list [33].
1.2. Related work. Invariant tori with one or more vanishing Floquet exponents occur in the integrable versions of many bifurcation scenarios. In the context of a degenerate Hopf bifurcation Chenciner [13] has investigated the saddle-node bifurcation of invariant quasi-periodic circles. His results have been extended by Broer, Huitema, Takens and Braaksma [9], and, in the context of Hamiltonian vector fields, by Hanßmann [18]. The scope of these studies is restricted to the case of a one-dimensional normal space, in the general context, or a two-dimensional normal space, in the Hamiltonian context. More recently, higher order degeneracies have been studied as well [6,19,37].
For one-dimensional normal spaces, the Rüssmann-Herman translated torus theorem is available, which is the discrete-time analogue of the modifying terms theorem. Recently, the modifying terms theorem has been applied in several settings [15,16].
Higher dimensional normal spaces have been treated extensively by other methods in the case of non-vanishing Floquet exponents; we refer the reader to [8,9] and the references there. The results reached in those investigations were restricted to the case that all Floquet exponents are distinct; recently, this restriction has been removed by the work of Hoo [20], which extended previous work of de Jong [17] and Ciocci [14].
Pöschel [28] demonstrated that the conjugacies of KAM theory depend differentiably in the sense of Whitney on the parameters, even in the case that the original

2.1.1. Vector fields and invariant tori. In the following, a family of objects is always taken in the sense of a parametrised family, where the parameter takes values in some subset of a finite dimensional vector space.
Let $M$ be a manifold. We consider small deformations of families of vector fields $X$ on $M$ that leave a family of embedded tori $T$ invariant. Let $TM$, $TT$ and $T_T M$ denote respectively the tangent bundle to $M$, the tangent bundle to $T$ and the restriction of $TM$ to $T$. The quotient $T_T M / TT$ is a smooth vector bundle over $T$, the normal bundle $NT$ of $T$. By the tubular neighbourhood theorem, $NT$ is diffeomorphic to an open neighbourhood $U$ of $T$. Assuming the normal bundle to be trivial, the diffeomorphism transfers vector fields on $U$ to vector fields on $NT \cong \mathbb{T}^m \times \mathbb{R}^n$; note that then $T \cong \mathbb{T}^m \times \{0\}$.
Accordingly, in the following families of vector fields X(p) on the phase space M = T m × R n will be considered, where the parameter p takes values in a space P which is an open and bounded neighbourhood of the origin of R q . Note that M can still be identified with the normal bundle NT of the torus T .
A regularly parametrised family of vector fields $p \mapsto X(p)$ is usually not distinguished from the equivalent vertical vector field $X$ on $M \times P$. Recall that a vector field is called vertical if the canonical projection of $X$ to the tangent bundle $TP$ of the parameter space $P$ vanishes everywhere. A vertical vector field on $M \times P$ is typically written as
$$X = f(x, y, p)\,\frac{\partial}{\partial x} + g(x, y, p)\,\frac{\partial}{\partial y}, \tag{3}$$
where $x \in \mathbb{T}^m$, $y \in \mathbb{R}^n$ and $p \in P$. The set of all differentiable vertical vector fields on $M \times P$ is denoted by $\mathcal{X}$.
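2.1.2. Normal linear vector fields. If $X \in \mathcal{X}$ is a vector field of the form (3), the normal linear part $NX$ of $X$ is defined as
$$NX = f(x, 0, p)\,\frac{\partial}{\partial x} + \left( g(x, 0, p) + \frac{\partial g}{\partial y}(x, 0, p)\,y \right) \frac{\partial}{\partial y}. \tag{4}$$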
Note that the flow of NX maps fibers of the normal bundle NT affinely to fibers; "normal affine vector field" would perhaps be a more appropriate name, but we stick to the convention introduced in [9]. Generally, a vector field L will be called normally linear if it is equal to its normal linear part. If X ∈ X is such that the term g(x, 0, p) in (4) vanishes identically, then X is tangent to the torus T , and T is invariant under the flow of X. Introduce for ε > 0 the scaling diffeomorphism D ε (x, y, p) = (x, ε −1 y, p). If X is tangent to T , then lim ε↓0 (D ε ) * X = NX, and consequently Hence, without loss of generality, it can be assumed that the unperturbed vertical vector field is normally linear.
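2.1.3. Integrability. A vertical vector field $X \in \mathcal{X}$ is called integrable, if it is equivariant with respect to the action $\Theta$ of the group $\mathbb{T}^m$ on $M \times P$ that is given as $\Theta_\beta(x, y, p) = (x + \beta, y, p)$ for $\beta \in \mathbb{T}^m$. Equivariance means that $(\Theta_\beta)_* X = X$ for all $\beta \in \mathbb{T}^m$. Define the $\mathbb{T}^m$-average $[f]$ of a function $f$ defined on $M \times P$ as
$$[f](y, p) = \int_{\mathbb{T}^m} f(x, y, p)\,dx;$$
here $dx$ denotes the Haar measure on $\mathbb{T}^m$. If $X = f\,\frac{\partial}{\partial x} + g\,\frac{\partial}{\partial y} \in \mathcal{X}$ is any vector field, the integrable part $[X]$ of $X$ is given as
$$[X] = [f]\,\frac{\partial}{\partial x} + [g]\,\frac{\partial}{\partial y}.$$
Note that with this definition, a vector field $X$ is integrable if and only if $X = [X]$. A vector field $X$ which is such that $[X] = 0$ is said to be mean-0. Any vector field can be decomposed in an integrable part and a mean-0 part: $X = [X] + (X - [X])$.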
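The averaging operation of this subsection can be illustrated numerically. The following is a minimal sketch, assuming that the torus is identified with $[0, 2\pi)^m$ and that the Haar measure is normalised to total mass 1; the function names and the sample component `f_example` are hypothetical and serve only as an illustration.

```python
import itertools
import math


def torus_average(f, m, n_grid=32):
    """Approximate the T^m-average of f by a uniform Riemann sum.

    Assumption: the torus is [0, 2*pi)^m and the Haar measure has
    total mass 1, so the average of a constant c equals c.
    """
    nodes = [2.0 * math.pi * i / n_grid for i in range(n_grid)]
    total = 0.0
    for x in itertools.product(nodes, repeat=m):
        total += f(x)
    return total / n_grid ** m


def split_integrable(f, m, n_grid=32):
    """Return the pair ([f], f - [f]): the average and the mean-0 part."""
    avg = torus_average(f, m, n_grid)
    return avg, (lambda x: f(x) - avg)


# Hypothetical component of a vector field on T^2, at fixed (y, p).
def f_example(x):
    return 1.5 + math.cos(x[0]) + 0.25 * math.sin(x[0] + 2.0 * x[1])


avg, remainder = split_integrable(f_example, m=2)
mean_of_remainder = torus_average(remainder, m=2)
```

For trigonometric polynomials of degree below the grid size the uniform Riemann sum is exact up to rounding, which is why a coarse grid suffices in this sketch.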

2.1.4. Frequencies. An integrable vector field $X$ of the form (5) can be written uniquely as $X = L + Q$ with $L = NX$ and $Q = X - NX$. The normal linear part $L$ of $X$ is then of the form
$$L = \omega(p)\,\frac{\partial}{\partial x} + \bigl( \mu(p) + A(p)\,y \bigr) \frac{\partial}{\partial y}.$$
Note that if $\mu(p_0) = 0$, then the vector field $X(p_0)$ is tangent to $T$, which is consequently invariant. The maps $\omega : P \to \mathbb{R}^m$ and $\Omega : P \to \mathbb{R}^m \times gl(n, \mathbb{C})$, the latter given by $\Omega(p) = (\omega(p), A(p))$, are called the internal frequency map and the (full) frequency map of $X$, respectively. For a given frequency map $\Omega$, let

2.1.5. Structures. In order to describe families of vector fields that admit certain symmetries, "admissible structures" are introduced, following [9,23]. For every $d > 0$ and every vertical vector field $X$, define the Fourier truncation $T_d X$ of $X$ as An admissible structure is a pair $(\mathfrak{g}, \mathfrak{h})$, where $\mathfrak{g}$ is the Lie algebra of a finite dimensional Lie group $G \subset GL(n, \mathbb{R})$, and where $\mathfrak{h} \subset \mathcal{X}$ is an infinite dimensional Lie algebra of vector fields on $M$, such that $\mathfrak{g}$ and $\mathfrak{h}$ satisfy the following properties. For every $X \in \mathfrak{h}$, the normal linear vector field $NX$ as well as the truncation $T_d X$ is in $\mathfrak{h}$, for every $d > 0$. Moreover, the frequency map $\Omega = (\omega, A)$ of an integrable vector field in $\mathfrak{h}$ takes values in $\mathbb{R}^m \times \mathfrak{g}$.
Let U be an open and bounded subset of M , and let Φ : U → M be an embedding. If for any X ∈ h the vector field Φ * X is the restriction of a vector field Y ∈ h to Φ(U ), then Φ is called a structure-preserving conjugacy associated to h. is a smooth versal unfolding of Ω 0 , if for every smooth deformation Ω = (ω, A) of Ω 0 (that is, for every smooth map p → Ω(p) for which Ω(0) = Ω 0 ) defined on an open neighbourhood P of the origin of R q , the following holds. There is a smaller neighbourhoodP ⊂ P of 0 and there are maps ψ :P → Σ and C :P → GL(n, R), such that ψ(0) = 0, C(0) = I and ω(ψ(p)) = ω(p) C(p)Ā(ψ(p))C(p) −1 = A(p).
More generally, $\bar\Omega$ is a versal unfolding of $\Omega_0$ in the Lie algebra $\mathbb{R}^m \times \mathfrak{g}$ of the Lie group $\mathbb{T}^m \times G$, if $\Omega_0 \in \mathbb{R}^m \times \mathfrak{g}$, and if for every smooth deformation $\Omega$ of $\Omega_0$ taking values in $\mathbb{R}^m \times \mathfrak{g}$, maps $\psi : P \to \Sigma$ and $C : P \to G$ can be found such that the equations (7) hold. The map $\Omega$ is called miniversal if the dimension of $P$ is the smallest possible for a versal unfolding (see [1], §30).
for all k ∈ Z m \{0}. If κ > m − 1 and if γ 0 > 0 is sufficiently small, the set of (γ 0 , κ)-Diophantine vectors has positive Lebesgue measure in R m . For A ∈ g, let α = α A be the vector of imaginary parts of the eigenvalues of A. If A depends continuously on a parameter p, the components of α are assumed to be arranged such that they depend continuously on p.
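A finite-order version of the Diophantine condition can be checked directly. The sketch below assumes the standard form $|\langle k, \omega \rangle| \geq \gamma_0 |k|^{-\kappa}$ with $|k| = \sum_i |k_i|$; since the defining inequality is not reproduced in the text above, both the form of the inequality and the choice of norm are assumptions, and the function name is hypothetical.

```python
import itertools
import math


def is_diophantine_up_to(omega, gamma, kappa, d):
    """Check |<k, omega>| >= gamma * |k|^(-kappa) for all 0 < |k| <= d.

    Assumption: |k| is the sum of the absolute values of the entries.
    Only finitely many k are tested, so a True result is necessary,
    not sufficient, for the full Diophantine condition.
    """
    m = len(omega)
    for k in itertools.product(range(-d, d + 1), repeat=m):
        norm_k = sum(abs(ki) for ki in k)
        if norm_k == 0 or norm_k > d:
            continue
        inner = sum(ki * wi for ki, wi in zip(k, omega))
        if abs(inner) < gamma * norm_k ** (-kappa):
            return False
    return True


golden = (1.0, (math.sqrt(5.0) - 1.0) / 2.0)   # badly approximable frequency ratio
resonant = (1.0, 0.5)                          # k = (1, -2) gives <k, omega> = 0

golden_ok = is_diophantine_up_to(golden, gamma=0.3, kappa=1.5, d=50)
resonant_ok = is_diophantine_up_to(resonant, gamma=0.3, kappa=1.5, d=50)
```

Truncating at $|k| \leq d$ mirrors the way finite-order conditions are used at each step of an iterative scheme; the cut-off `d=50` here is arbitrary.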
2.1.8. Normal conjugacies. The vector field X is said to be normally conjugated to a normal linear vector field L at a parameter value p, if there is a neighbourhood U of T and a conjugacy Φ(p) : Note that if L is tangent to T , and X is normally conjugated to L, then X is tangent to the torus Φ −1 (T ), and this torus is invariant under the flow of X.
Let $\pi_2 : M \to \mathbb{R}^n$ be the projection

2.2. Differentiability classes. The modifying terms theorem stated below will be proved for several differentiability classes.
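2.2.1. Notation. Let $V \subset \mathbb{R}^m$ be an open set, and let $W$ be a normed vector space. For a multi-index $\beta \in \mathbb{N}^m$, the $\beta$-derivative $D^\beta f$ with respect to $x \in \mathbb{R}^m$ of a $|\beta|$-times differentiable function $f$ is defined as
$$D^\beta f = \frac{\partial^{|\beta|} f}{\partial x_1^{\beta_1} \cdots \partial x_m^{\beta_m}}.$$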

2.2.2. Finitely differentiable functions. For $V$ and $W$ as above, let $f : V \to W$ be a continuous function that satisfies for some $0 < s < 1$ the inequality Then $f$ is Hölder continuous with exponent $s$. The smallest $C$ such that the inequality holds is the Hölder norm $\|f\|_s$ of $f$. The space of Hölder continuous functions $f$ : the space of $s$-times differentiable functions is a Banach space, which will also be denoted by $C^s$.
Let f C M h be the smallest constant C for which these estimates are satisfied: this defines the C M h -norm of f , and with this norm C M h is a Banach space. Note that if h 1 < h 2 , then C M h2 ⊂ C M h1 . If M k = k!, then C M is the class C ω of real analytic functions, and C M h is the space C ω h of real analytic functions that can be extended to complex analytic functions on a complex strip of width h in the imaginary direction. Since this class will be used extensively in the following, the norm . C M h is written as |.| h in this case. If M k = (k!) µ , with µ > 1, then C M is the Gevrey class G µ . The associated Gevrey spaces are denoted by G µ h . Unlike the real analytic class, for every µ > 1 there are functions in G µ h with compact support.
2.2.4. Whitney smoothness. The definitions of the function spaces just introduced can be extended to cover functions $f : F \to W$ that are defined on closed sets $F \subset V$, by replacing partial derivatives $D^\beta f$ with components $f_\beta$ of a Whitney jet (cf. [34]). Let a collection of functions $\{f_\beta\}_\beta : F \to W$ be given such that $f_0 = f$ and such that the following consistency condition is satisfied for all $\beta$: At every interior point $x$ of $F$ obviously $f_\beta(x) = D^\beta f(x)$. Finite differentiability and the smoothness classes $C^M$ are now defined for functions on closed sets in the obvious way. Whitney differentiable functions of a given smoothness class can be extended from $F$ to all of $V$; however, the results in this direction become weaker with increasing differentiability. For finite differentiability, there is a continuous linear extension operator [34]; for smooth functions, extension can still be shown to be a continuous operation [24]. Finally, Gevrey regular functions can be extended to Gevrey functions of the same class, but in general not continuously [5].

2.2.5. Smoothness classes. The regularity of conjugacies and invariant tori in the results below depends on the regularity of the data; to shorten the statement of the theorem, the following formalism is introduced: the original vector field and its perturbations (the "data") will be in a smoothness class $B$, while mappings that are constructed in the proof of the theorem will be in a less regular class $B'$, which depends on the original class $B$. For each of the four $B$-classes $C^\omega$, $G^\mu$, $C^M$ and $C^s$, we describe the corresponding $B'$-class.
Let f (x, y, .) ∈ B 2 (P , W ); by specifying B 1 and B 2 , we specify B . Note that since parameters are restricted to the closed set P , the smoothness of the parameter dependence is always meant in the sense of Whitney.
Let $\ell > 0$ be a positive integer, which denotes the maximal degeneracy of a normal eigenvalue of the unperturbed vector field $X$.
1. Analytic data. If B = C ω 2h , then for any ζ > 0, 2. Gevrey regular data. If B = G µ h with µ > 1, then for any ζ > 0 , where h 1 , h 2 > 0 are some constants, and where Fix η > 0, and for every s ∈ N let λ s = (s + 1) log C 0 + ηs log s + log M s , where C 0 = max{c 1 /h, C} with c 1 the constant given in lemma 4.1 below. Let λ * : [0, ∞) → R be the largest convex function such that λ * (s) ≤ λ s for s ∈ N. Denote by Lλ * the Legendre transform of λ * , which is given by We construct a function g M as follows. For a fixed constant 1 < β < 2, chosen in the course of the proof, let g 0 = Lλ * (0) and let Finally, let g M be the largest convex function such that for all j. Here r j = r 0 a j 1 with 0 < a 1 < 1 and r 0 > 0, which are also chosen in the course of the proof.
Then for any fixed ζ > 0 Note that always B ⊂ C n+1 .

Remark 1. The conjugacies can be extended as maps, using the theorems mentioned above, to larger parameter sets; however, in general they will cease to be conjugacies on these larger sets.
Remark 2. The size of the open neighbourhood of the unperturbed vector field X for which the perturbation theorem below holds, will in general depend on the constant ζ.
2.2.6. Vector fields. Since all tangent bundles which will appear in this article are trivial, a vector field $X = f\,\frac{\partial}{\partial x} + g\,\frac{\partial}{\partial y}$ is identified with its component map $(f, g)$. The classes of vertical vector fields whose components are of class $B$ or $B'$ are denoted by $\mathcal{X} = \mathcal{X}(M \times P)$ or $\mathcal{X}' = \mathcal{X}'(U \times P')$, respectively. In particular, $\mathcal{X}^\omega$, $\mathcal{X}^\mu$, $\mathcal{X}^M$ and $\mathcal{X}^s$ denote the classes of vector fields that are analytic, Gevrey regular, Carleman regular and finitely differentiable, respectively. The norms $\|X\|_B$ of a vector field $X$ in $\mathcal{X}$ are defined analogously to the function norms above.

2.3. Parametrised modifying terms theorem. In order to formulate the main theorem, let the parameter space $P \subset \mathbb{R}^m \times \mathfrak{g} \times \mathbb{R}^{\bar q}$ be an open connected set. Write $p \in P$ as $p = (\bar\Omega, \bar p) = (\bar\omega, \bar A, \bar p)$, and let $(\mathfrak{g}, \mathfrak{h})$ define an admissible structure of vector fields.

Main Theorem.
Fix $\Omega_0 = (\omega_0, A_0) \in ND_c$, and let a frequency map $\Omega$ be given as $\Omega(p) = \Omega_0 + \bar\Omega$. Let $X \in \mathfrak{h} \cap \mathcal{X}$ be an integrable vector field with normal linear part $L_\Omega$. Then there exists an $\varepsilon_0 > 0$ such that for any perturbation $P \in \mathfrak{h} \cap \mathcal{X}$ with $\|P\|_B < \gamma_0 \varepsilon_0$, the following holds.
There is an integrable vector field $\Lambda \in \mathfrak{h} \cap \mathcal{X}'$, $\|\Lambda\|_{B'} < C \|P\|_B$, such that if $\Omega \in ND_c$, then $X + P - \Lambda$ is normally conjugated to $L_\Omega$ by a vertical mean-0 structure-preserving conjugacy $\Phi$ in $B'$. We have that $\Phi$ is normally linear in $y$ and that $\|\Phi - \mathrm{id}\|_{B'} < C \|P\|_B$ for some $C > 0$.
The proof of this theorem is given in section 4.

3.1.
Perturbations of non-linear integrable families. We are interested in the following situation. Let p → Ω(p) = (ω(p), A(p)) be a frequency map, defined on a neighbourhood P of 0 ∈ R q . Denote by A 0 : R n → R n the linear map given by the matrix Let N and R denote the kernel and the range of A 0 , respectively. Choose complementary subspaces N c and R c to N and R; that is, Given these choices, there is a unique decomposition of a vector z ∈ R n as a sum z = z 1 + z 2 with z 1 ∈ R and z 2 ∈ R c . Define projections π R and π c R by setting π R z = z 1 and π c R z = z 2 ; projections π N and π c N are defined analogously. Let X ∈ h ∩ X be an integrable vector field of the form where q 1 = O(|y|) and q 2 = O(|y| 2 ). Note that we do not make any assumptions on the matrix A(p) in terms of multiplicity or vanishing of eigenvalues, and that therefore the "standard" KAM theorem, as for instance in [9], is not applicable.
Introducing µ = π c N τ and ν = π N τ , and projecting equation (10) on both R and R c , we obtain and , if p takes values in a neighbourhood of 0, then equation (11) can be solved for µ = µ(ν, p) as a function of ν and p. Let then the vector field X has an invariant torus at y = τ (ν, p).
In the statement of the following theorem a map F : There exists an ε 0 > 0, independent of γ 0 , such that for any P ∈ h∩X with P B < γ 0 ε 0 the following holds.
There is a smaller neighbourhoodP of 0, a conjugacy Φ p : T m × R n → T m × R n , a frequency mapΩ :P → R m × g, and maps µ : N ×P → N c , τ : N ×P → R n , f : N ×P → R c , both B -smooth, as well as maps ρ 1 : R n ×P → R m , ρ 2 : R n ×P → R n , ρ 3 : R n ×P → g, at least C n+1 -smooth, with the following properties.
The map µ = µ(ν, p), with ν ∈ N and p ∈P solves the equation The The map f is of the form The frequency mapΩ reads aŝ Finally, if τ = τ (ν, p) and f (ν, p) = 0, then Φ is a mean-τ conjugacy that normally conjugates X + P to LΩ at all parameters for whichΩ(p) is normally Diophantine.
Proof. The proof runs along the same lines as the example of the introduction.
Let Ψ τ : M × P → M × P be a localising transformation, given by Its normal linear part takes the form such that Λ 1 B ≤ Cε, and a B -smooth conjugacy The modifying terms vector field Λ 1 can be extended, non-uniquely, to a vector field defined for allΩ that is at least C n+1 , see [34], and which will also be denoted by Λ 1 .

3.2. Corollaries. Note that if, in the situation of theorem 3.1, the linear map $A_0$ is invertible, then $\dim R^c = 0$, and the equation $f = 0$ disappears. Moreover, if $\Omega$ is a versal unfolding of $\Omega(0)$, then so is $\hat\Omega$, and the set of parameters $p$ such that $\hat\Omega(p) \in ND_c$ has positive Lebesgue measure.

3.3. Reduction of parameters. The previous results can also be applied to situations with few parameters. The reduction is based on the following result of Pyartli.
Theorem 3.2. (Pyartli [29]). Let $U$ be an open neighbourhood of a point $q \in \mathbb{R}^m$, and let a smooth map $\alpha : \mathbb{R}^m \to \mathbb{R}^n$ ($n > m$) be given, parametrising an $m$-dimensional submanifold $S$ in $\mathbb{R}^n$. Assume that there is a curve $\xi$ : . If $\kappa > n^2 - n + 1$, and if $\gamma > 0$ is sufficiently small, then the set has positive Lebesgue measure in $U$.
The significance of this theorem is expressed by the following, less precise, reformulation: if κ > 0 is sufficiently large, then for a generic frequency map Ω, the inverse image Ω −1 ( ND c ) has positive Lebesgue measure.
Suppose Ω is a frequency map such that Ω(0) ∈ ND c . It is always possible to find a versal unfoldingΩ of Ω(0), defined on another parameter space Σ, such that Ω is a subfamily ofΩ; that is, such that there is a map σ : P → Σ with the property that Ω(p) =Ω(σ(p)). Hence, a given vector field X(p) = L Ω (p) + Q(p) -only the parameter dependence is made explicit -can be replaced byX(p, σ) = LΩ(σ) + Q(p) with (p, σ) ∈ P × Σ. By theorem 2.3, for every small perturbation P (p) there is an integrable vector fieldδ(p, σ) such thatX + P +δ has an invariant quasi-periodic torus of mean 0 wheneverΩ(σ) ∈ ND c , sinceΩ is quasi-periodically nondegenerate. Then, by using that X(p) =X(p, σ(p)), the conclusion is obtained that for a generic set of vector fields X, there is an integrable vector field δ(p) =δ(p, σ(p)), such that the set of parameters p for which X(p) + P (p) + δ(p) has an invariant quasi-periodic torus of mean 0, has positive Lebesgue measure in P.
3.4.1. Integrable normal form. Recall that a Bogdanov-Takens singularity occurs if a singular point, say $x = 0$, of a planar vector field $Z_0$, has a multiple eigenvalue 0 with geometric multiplicity 1; that is, the linearisation has a nilpotent part. We assume that $Z_0$ is a member of a family of vector fields $Z_\sigma$, parametrised by a two-dimensional parameter $\sigma$. If some nondegeneracy conditions are met, by a suitable change of phase space and parameter space coordinates, the vector field can be brought into the form where $b = \pm 1$ and $r = O(|y|^3)$. Note that $Z_\sigma$ is an unfolding of the nilpotent singularity $y = 0$. We shall limit our attention to the case $b = 1$. Consider now the integrable unfolding $X_\sigma$ of the normally nilpotent invariant Introduce the standard basis vectors $e_1 = (1, 0)$ and $e_2 = (0, 1)$. In terms of subsection 3.1, we have We choose $N^c = R^c = \mathbb{R} e_2$. Let $\pi_1$ and $\pi_2$ be the projections on $\mathbb{R} e_1$ and $\mathbb{R} e_2$ respectively. Then $\pi_N = \pi_R = \pi_1$ and $\pi^c$ We shall assume that the smoothness class $B$ contains $C^s$, where $s > 0$ is such that $B$ contains at least $C^4$. For sufficiently small $\varepsilon > 0$, theorem 3.1 ensures the existence of a $B'$-smooth map $\Phi$ and functions $\mu$, $\tau$, $f$, $\rho_1$, $\rho_2$, $\rho_3$, such that $\|\rho_i\|_{C^3} \leq C\varepsilon$, and such that the following hold.

3.4.4. Quasi-periodic Hopf bifurcations. At parameters for which the normal frequencies of an invariant $m$-torus are located on the imaginary axis, quasi-periodic Hopf bifurcations can occur. The full normal form analysis is not given here, but it runs along entirely standard lines. From the normal part $\hat A$ of the frequency map, we obtain the conditions and $D(\tau_1, \sigma) = \det \hat A = -2\tau_1 + \tau_1^2 \eta_7 + \varphi_7 > 0$. Note that necessarily at all quasi-periodic saddle-node bifurcation points $\sigma^*$, with corresponding $\tau^*$ Solving equation (22) for $\tau_1$, we obtain substitution in (21) yields the locus of the quasi-periodic Hopf bifurcation points as those parameter values $\sigma$ such that $\hat\Omega(-\sigma_2 + \varphi_8, \sigma)$ is normally Diophantine, for which as long as $D(-\sigma_2 + \varphi_8, \sigma) > 0$. This yields

with $q_1 = O(|y|)$ and $q_2 = O(|y|^2)$. The perturbation term $P$ will be written as it satisfies $\|P\|_B \leq \gamma_0 \varepsilon_0$. In the following the vector field $X + P$ shall be denoted by $\tilde X$. After scaling the time by $t = \gamma_0 t'$, it may be assumed that the Diophantine condition $ND_c$ is of the form $ND_c(1, \gamma/\gamma_0, \kappa)$, and that $\|P\|_B = \varepsilon < \varepsilon_0$. Note that the frequency map $\Omega(p) = \Omega_0 + \bar\Omega$ is a linear function of $p$.

4.1.1. Multiple normal eigenvalues. For the following remarks, cf. [20,7]. In order to motivate the definition of the parameter domains below, we need some estimates on the parameter dependence of eigenvalues in the case that the matrix $A_0$ has multiple eigenvalues. Let $f$ be the characteristic polynomial of $A(p) = A_0 + \bar A$; that is, If $\lambda \in \mathbb{C}$ is an $\ell$-fold zero of $f(z, 0)$, then by the Weierstraß preparation theorem (see for instance [21], p. 155), there are unique analytic functions $q(z, p)$, $a_i(p)$, defined in a neighbourhood of $(z, p) = (\lambda, 0)$, such that $q(\lambda, 0) \neq 0$, $a_i(0) = 0$ for $i = 0, \cdots, n - 1$, and The function $g(z, p) = 1/q(z, p)$ is defined in a, possibly smaller, neighbourhood of $(\lambda, 0)$, and For $z \in \mathbb{C}^N$, introduce the norm The functions $z_k$ satisfy for some $C > 0$. To see this, assume (as we may) that $U$ is the common domain of definition for the functions $a_i(p)$ and $z_k(p)$. Since the $a_i(p)$ are analytic and satisfy $a_i(0) = 0$, there is a constant $C > 0$ such that $|a_i(p)| < C|p|$ on $U$. For $|p| < 1/(\ell C)$ and $|z| \geq 1$, it follows that consequently $|z_k(p)| < 1$ if $|p| < 1/(\ell C)$, and then $f_\lambda(z_k(p), p) = 0$ implies that In turn, this implies inequality (24). We conclude that the eigenvalues of $A(p)$ are Hölder continuous as a function of $p$. The Hölder exponent is equal to $1/\ell$, where $\ell$ is the largest multiplicity of an eigenvalue of $A_0$. Moreover, for all $p$ such that the eigenvalues of $A(p)$ are all different, they depend analytically on $p$.
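The Hölder behaviour of the eigenvalues can be seen in the simplest degenerate case. The following sketch takes the hypothetical family $A(p) = A_0 + pE$ with $A_0$ a nilpotent $2 \times 2$ Jordan block and $E$ the matrix with a single 1 in the lower left corner (both chosen here purely for illustration), so that 0 is a double eigenvalue of $A_0$. It estimates the exponent with which the eigenvalues move away from 0 as $p$ grows.

```python
import cmath
import math


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    trace = a + d
    det = a * d - b * c
    disc = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + disc) / 2.0, (trace - disc) / 2.0


def eigenvalue_displacement(p):
    """Largest distance of an eigenvalue of A(p) = [[0, 1], [p, 0]] from 0."""
    z1, z2 = eigenvalues_2x2(0.0, 1.0, p, 0.0)
    return max(abs(z1), abs(z2))


def estimated_exponent(p_small, p_large):
    """Slope of log(displacement) against log(p) between two parameter values."""
    ratio = eigenvalue_displacement(p_large) / eigenvalue_displacement(p_small)
    return math.log(ratio) / math.log(p_large / p_small)


exponent = estimated_exponent(1e-8, 1e-4)
```

In this example the eigenvalues are $\pm\sqrt{p}$, so the estimated exponent agrees with one over the multiplicity of the eigenvalue 0 of $A_0$.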

Parameter domains.
Define the distance d between two points x, y ∈ C N as d(x, y) = |x − y|, where the norm |.| has been introduced in equation (23).
Define the open complex strip U + r of width r around a set U by Let {d j } be a given sequence of positive real numbers, monotonically increasing towards infinity. Let the set nd j c ⊂ R m × R n of normally Diophantine frequencies be the set of vectors (ω, α), where α is of the form are satisfied for all (k, ) ∈ Z m × Z n for which 0 < |k| ≤ d j , | | ≤ 2. For given Ω = (ω, A) ∈ T m × g, let α A be the vector of imaginary parts of eigenvalues of A. Introduce the set ND j c ⊂ T m × g of normally Diophantine Ω = (ω, A) by requiring that their frequency vectors (ω, α A ) are normally Diophantine.
Furthermore, if {ρ j } is a positive sequence that decreases monotonically to 0, let ND j c (ρ j ) ⊂ T m × g be the set of Ω = (ω, A) such that their frequency vec- Finally, introduce P(ρ j ) = p ∈ P | Ω(p) ∈ ND j c (ρ j ) , and note that P(ρ j+1 ) ⊂ P(ρ j ) and ∩ ∞ j=1 P(ρ j ) = P . Take p ∈ P andp ∈ P\P(ρ j ). Recall from subsubsection 4.1.1 that the normal eigenvalues α A (p) are Hölder continuous with Hölder exponent , where is the highest algebraic multiplicity of an eigenvalue of A 0 . Then and |p − p| > ρ j /C . As a consequence, we have that . Let in this case U be the complex neighbourhood V + h of V ; otherwise, if no analytic extension ofX to a complex neighbourhood of V exists, let U be equal to V .
For analytic functions on some complex open set O, the norm

4.2. Structure of the proof. One of the main technical problems of the proof is to deal with the smoothness of the vector field $\tilde X$ in the non-analytic cases. We shall work with analytic approximations: in the first part of the proof a sequence of analytic vector fields $\{\tilde X_j\}$ is constructed, where $\tilde X_j$ is defined on $\tilde D_j$, which tends to $\tilde X$ in an appropriate sense.
In the second part of the proof, coordinate transformations Φ j , "modifying terms" vector fields Λ j and auxiliary vector fields X j , ∆ j and∆ j are constructed inductively by the following "staircase construction".
To set up the induction, choose as the identity (Φ 1 ) p (x, y) = (x, y), Λ 0 = 0, and X 1 as the restriction ofX 1 to D 1 . Note that due to the assumptions r 1 ≤r 1 and ρ 1 ≤ρ 1 , we have that D 1 ⊂D 1 , so that Φ 1 is well-defined. At the beginning of the induction step, assume that an embedding a domainD j of the form (27), and an integrable vector field Λ j defined onD j and another vector field X j defined on D j are already determined. During the induction step, an embedding Ψ j : D j+1 → D j and vector fields ∆ j on D j+ 1 2 , and∆ j and Λ j+1 on Φ j (D j+ 1 2 ), are constructed simultaneously, such that the following two properties hold. First, the vector fields Λ j+1 and∆ j = Λ j − Λ j+1 = Φ j * ∆ j are integrable. Second, the vector fieldX j defined on D j+1 that satisfies Ψ j * Xj = X j + ∆ j has the property that its normal linear part NX j is much closer to L Ω than NX j , in a sense that will be made precise below. Note that, unlike the vector field∆ j , the vector field ∆ j need not and in general will not be integrable.
The coordinate transformation Φ j+1 is then obtained by setting With the knowledge of Φ j+1 , the domainD j+1 is determined by (27), and the vector field X j+1 is determined by setting Finally, we show that the limits Remark 3. Necessary for these constructions is that for all j: The first inclusion ensures that the vector field X j is defined on D j , and the second ensures that Λ j+1 is defined on all ofD j+1 .

4.3. Approximation. In order to construct analytic approximations $\tilde X_j$ of $\tilde X$ on the complex domains $\tilde D_j$, a modified version of Zehnder's approximation technique (see [39]) is used, which gives explicit information on the growth of constants that depend on the degree of differentiability.

be a monotonically decreasing sequence of positive real numbers. For every $\eta \in (0, 1)$, and for every $j > 0$, there exists an entire holomorphic function $f_j : \mathbb{C}^n \to \mathbb{C}$, taking real values on real vectors, such that and here c 1 = 2 e 2 2 6 η 2n . If $f$ is periodic in its argument, then every $f_j$ can be chosen to be periodic with the same periods.
The proof of lemma 4.1 follows [39] closely; the main difference is that C ∞ bump functions are replaced by Gevrey regular bump functions.
The construction of these bump functions is the content of the next lemma. Then in lemma 4.1 the approximating functions are constructed by convolving $f$ with the inverse Fourier transform $\varphi$ of Gevrey bump functions $\hat\varphi$. Estimates on the derivatives of the smoothed functions are obtained in terms of the derivatives of $\hat\varphi$. Finally, the smoothing is applied repeatedly in different directions.
for all x ∈ R, and all s ≥ 0.
Proof. The function ψ is constructed by repeatedly convolving multiples of indicator functions (see e.g. [22]). Introduce a k = c (k + 1) −1−η and choose c such that ∞ k=0 a k = 1. Since The convolution u * v of two integrable functions u, v : R → R is given by Using the sequence a k , define a sequence of functions u k = H a0 * H a1 * · · · * H a k , and note that R u k dx = 1 since R H a dx = 1. It follows from theorem 1.3.5 of [22] and the fact that a k = 1 that the sequence {u k } converges uniformly to a smooth function u : R → R with support in [0, 1], which is such that R u dx = 1 and , it is odd, and R v dx = 0. Hence, its primitive is even, vanishes for all x in the complement of [−2, 2], and satisfies ψ(x) = 1 for |x| ≤ 1. Moreover, for s ≥ 1, and ψ ∈ G 1+η . Using c ≥ η 1+η ≥ η/2 for 0 < η < 1 yields the lemma.
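The repeated convolution of indicator functions used in this proof can be imitated on a grid. The sketch below makes several assumptions that are not in the text: $H_a$ is taken to be the indicator of $[0, a]$ divided by $a$, the sequence $a_k$ is truncated to finitely many terms and renormalised so that the widths still sum to 1, and the convolutions are evaluated with a trapezoidal primitive and linear interpolation. All function names are hypothetical.

```python
def cumulative_primitive(values, step):
    """Cumulative trapezoidal primitive U of grid samples, with U[0] = 0."""
    prim = [0.0]
    for left, right in zip(values, values[1:]):
        prim.append(prim[-1] + 0.5 * step * (left + right))
    return prim


def interpolate(prim, step, t):
    """Piecewise linear evaluation of the primitive; constant off the grid."""
    if t <= 0.0:
        return 0.0
    pos = t / step
    i = int(pos)
    if i >= len(prim) - 1:
        return prim[-1]
    return prim[i] + (pos - i) * (prim[i + 1] - prim[i])


def repeated_box_convolution(eta=0.5, n_terms=10, step=5e-4, x_max=1.2):
    """Sample u = H_{a_0} * ... * H_{a_{K-1}} with a_k = c (k+1)^(-1-eta)."""
    raw = [(k + 1) ** (-1.0 - eta) for k in range(n_terms)]
    c = 1.0 / sum(raw)                      # normalise so that sum(a_k) = 1
    widths = [c * r for r in raw]
    n = int(round(x_max / step))
    xs = [i * step for i in range(n + 1)]
    # u_0 = H_{a_0}: height 1/a_0 on [0, a_0], zero elsewhere.
    u = [1.0 / widths[0] if x <= widths[0] else 0.0 for x in xs]
    for a in widths[1:]:
        # (u * H_a)(x) = (U(x) - U(x - a)) / a, with U a primitive of u.
        prim = cumulative_primitive(u, step)
        u = [(interpolate(prim, step, x) - interpolate(prim, step, x - a)) / a
             for x in xs]
    return xs, u, widths


xs, u, widths = repeated_box_convolution()
total_mass = cumulative_primitive(u, 5e-4)[-1]
```

Each convolution with a box function is a moving average, which is why a primitive and two evaluations per grid point are enough; the truncation to ten terms only changes the smoothness of the limit, not its support or its integral.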
We can now prove lemma 4.1. The proof consists of three parts: first we define holomorphic approximations f j of f ; then we show that these converge to f as j → ∞, and finally we demonstrate the bound on the difference |f j − f j−1 |.
Proof. Let $\hat\varphi$ be equal to the function $\psi$ given by lemma 4.2, and let $\varphi$ be its inverse Fourier transform, given as The function $\hat\varphi$ is a Schwartz function, that is, $|x|^k \hat\varphi^{(s)}(x)$ is bounded for every $k, s > 0$; as the Fourier transformation interchanges differentiation and multiplication with a monomial, the transformed function $\varphi$ is a Schwartz function as well, and hence $\varphi$ and all its derivatives are integrable. Moreover, since $\hat\varphi$ is even, the function $\varphi$ maps $\mathbb{R}$ into itself, it satisfies $\int_{\mathbb{R}} \varphi\,dx = \hat\varphi(0) = 1$, and as $\hat\varphi$ has compact support, the function $\varphi$ can be continued analytically to an entire function $\varphi$ on $\mathbb{C}$.
For t > 0, introduce ϕ t (x) = tϕ(tx); note that for every t > 0 the function ϕ t has the same properties as those stated for ϕ in the previous paragraph. For every bounded continuous real-valued function f on R, the analytic smoothing S t f of f is defined by The analytic smoothing of f is an entire holomorphic function on C, taking real values on real arguments. It is easy to verify that if f is periodic, then so is S t f , and for functions f with bounded derivatives, smoothing commutes with differentiation: for s ∈ N with s < r we have S t f (s) = (S t f ) (s) . The holomorphic approximations f j are defined as where {ρ j } is the given monotonic sequence. Let s ∈ [0, r), and introduce g = f [s] , where [s] is the largest integer smaller than or equal to s. We wish to show convergence of S t g to g as t → ∞ in the C α -norm, where 0 < α = s − [s] < r − [s] = β. For this, note that g ∈ C β (R). Fix δ > 0 arbitrarily.
For $h > \delta$, we have that For the first inequality, we used the fact that $\varphi_t$ is even. The last inequality follows by choosing $t$ so large that the integral on the penultimate line is smaller than $\delta^\beta$.

FLORIAN WAGENER
For $0 < h \le \delta$, the following straightforward estimates hold: As $\delta > 0$ was arbitrary, $\|f - S_t f\|_{C^s} \to 0$ as $t \to \infty$. This shows the first clause of lemma 4.1.
We need an explicit bound of $\chi_s(y)$ for all $|y| < 1$. Using (31) and the fact that $\hat\varphi$ is the Fourier transform of $\varphi$ yields: The inequality follows since the support of $\hat\varphi$ is contained in $[-2, 2]$. By splitting the domain of integration over $x$, noting that the integrand is even in $x$, and repeated partial integration over $\xi$, the following estimate is obtained: Restricted to the support of $\hat\varphi$, the integrands are estimated using (29) and $0 \le \eta \le 1$, which yields Let $\rho_j$ be as in the statement of the lemma, and set $f_j = S_{\rho_j^{-1}} f$. Combining the estimate (34) with (33) yields It follows immediately from (30) and the definition of $\chi_s(y)$ that

Consequently Consider now a $C^r$ function $f : \mathbb{R}^n \to \mathbb{R}$. To ease notation, let $S^i_j$ denote the smoothing operator $S_{\rho_j^{-1}}$ in the direction of $x_i$. Introduce $f_j = S^1_j \cdots S^n_j f$, and estimate The second inequality follows from (33) and the fact that the smoothing operators in the different directions commute, and the final inequality follows from equation (35). As $\chi_0(0) \le 2^{11}/\eta^2$, we obtain that This implies the lemma in the general case.

and $f(\bar x) = p\bar x - g(p)$ for some $\bar x > 0$. If equality holds and $f$ is differentiable at $\bar x$, then $p = f'(\bar x)$.
The function $g$ is also convex. Moreover, if $\lim_{x\to\infty} f(x)/x = \infty$, then the gradient $f'(x) = p$ of $f$ tends to infinity as $x \to \infty$, $g(p)$ is defined for all $p > 0$, and $\lim_{p\to\infty} g(p)/p = \infty$ as well.
As an example we calculate the Legendre transformation of $f(x) = a\,e^{bx} - cx$, which will be needed later. Since $f$ is differentiable, $p = f'(x) = ab\,e^{bx} - c$. Solving for $x$ yields that $x = (1/b)\log((p + c)/(ab))$. We find $g$ by substitution: $g(p) = px - f(x) = \frac{p+c}{b}\left(\log\frac{p+c}{ab} - 1\right)$.

In general, if $f$ is convex, left- and right-hand limits of the derivative $f'$ exist at every point $x > 0$. The interval $\partial f(x_0) = [\lim_{x\uparrow x_0} f'(x), \lim_{x\downarrow x_0} f'(x)]$ is called the subgradient of $f$. If the graph of $f$ has a corner, that is, if $x_0$ is such that the subgradient $\partial f(x_0)$ has nonempty interior, then for $p \in \partial f(x_0)$: $g(p) = p x_0 - f(x_0)$.

Let $\{f_n\}_{n=0}^{\infty}$ be an increasing sequence of real numbers. The largest convex put differently, $f^*$ is that function for which the epigraph

We obtain from lemma 4.1 that there exists a sequence of entire holomorphic functions $f_j$, converging to $f$ in every $C^s$-norm, and such that Let $0 < \eta < 1$ be a given constant, and let $c_1$ be as in lemma 4.1. Let $\lambda : [0, \infty) \to \mathbb{R}$ be any strictly increasing convex function, such that for $s \in \mathbb{N}$, $\lambda(s) \ge \lambda_s = \log C_0 + s \log C_0 + \eta \log s! + \log M_s$.
Note that we could take for λ the largest convex minorant λ * of the sequence {λ s }, since for every other function λ satisfying the conditions we have λ(s) ≥ λ * (s). It is however convenient, when dealing with the Gevrey class, to be able to work with differentiable functions λ.
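For a differentiable $\lambda$, the Legendre transform can be computed by solving $\lambda'(s) = p$. As an illustration, the sketch below works this out for a function of the form $\lambda(s) = \nu s \log s + (s+1)\log c$. The constants $\nu > 0$ and $c > 0$ are placeholders introduced here, and the convention $\mathcal{L}\lambda(p) = \sup_{s > 0}(ps - \lambda(s))$ is assumed.

```latex
\begin{align*}
\lambda'(s) &= \nu(\log s + 1) + \log c = p
  \quad\Longrightarrow\quad
  s_p = \exp\!\Big(\frac{p - \log c}{\nu} - 1\Big), \\
\mathcal{L}\lambda(p) &= p\,s_p - \lambda(s_p)
  = s_p\big(p - \nu \log s_p - \log c\big) - \log c
  = \nu\,s_p - \log c \\
  &= \nu\,e^{-1 - \log c/\nu}\,e^{p/\nu} - \log c .
\end{align*}
```

The transform grows exponentially in $p$, so that $\lim_{p\to\infty} \mathcal{L}\lambda(p)/p = \infty$, in line with the superlinear growth of $\lambda$.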
Recall that the domains D j are defined in terms of the decreasing sequence {r j } in (26). and for any vector fieldX ∈ X M h , there is a sequence of approximating holomorphic vector fieldsX j such that h , which holds for every s ∈ N. The left hand side of the inequality does not depend on s.
Let $s_j$ be the smallest value of $s$ such that the right hand side of the inequality is minimal, that is, such that for all $s \ge 0$: $b_j \le e^{\lambda(s)}\,r_j^{s}$. Taking logarithms of this inequality yields that $\lambda(s) \ge \log b_j + s \log r_j^{-1}$; moreover, equality holds if $s = s_j$. This is exactly the formulation of the Legendre transform, and we find that $\log b_j = -\mathcal{L}\lambda(\log r_j^{-1})$. Since $\lim_{s\to\infty} \lambda(s)/s = \infty$, we have that $\lim_{p\to\infty} \mathcal{L}\lambda(p)/p = \infty$. Making use of the fact that $\{r_j\}$ is a decreasing geometric sequence, we find for fixed $s \in \mathbb{N}$ that

For the Gevrey class $G^\mu_h$, the constants $M_s$ equal $(s!)^\mu$ with $\mu > 1$, and $\lambda$ can be taken equal to $\lambda(s) = (\mu + \eta)\,s \log s + (s + 1) \log c_1$, for some $\eta > 0$. We find $\mathcal{L}\lambda(p) = C\,e^{p/(\mu+\eta)} - \log c_1$, where $C = (\mu + \eta)\,e^{-1-\log c_1/(\mu+\eta)}$. Consequently $b_j = c_1 \exp\big(-C\,r_j^{-1/(\mu+\eta)}\big)$.

4.3.4. Application. Recall from subsection 4.1 that the vector field $\tilde X$ can be written in the form $\tilde X = Z + P = L_\Omega + \tilde Q + P$, where $L_\Omega = \omega(p)\frac{\partial}{\partial x} + A(p)y\frac{\partial}{\partial y}$, and where $\tilde Q = q_1(y, p)\frac{\partial}{\partial x} + q_2(y, p)\frac{\partial}{\partial y}$ is integrable and such that $N\tilde Q = 0$. The map $L_\Omega$ is real analytic; the (vertical) vector fields $\tilde Q$ and $P$ are in the smoothness class $\mathcal{X}(M \times P)$.
In the case that $\tilde X$ is itself real analytic, take $\tilde X_j = \tilde X$ for all $j$.
For the other cases, lemmas 4.1 and 4.3 yield a sequence $\{b_j\}$, which is determined only by the smoothness class, and holomorphic approximations $\tilde Q_j$ and $P_j$ of $\tilde Q$ and $P$ respectively, defined on $\tilde D_j$, that satisfy Here $b_j = c_s r_j^s$ in the case that $B = C^s$, and $b_j$ is given by lemma 4.3 if $B = C^M_s$. Note that in general the normal linear part $N\tilde Q_j$ of $\tilde Q_j$ will not vanish identically.
In both cases, define $\tilde X_j = L_\Omega + \tilde Q_j + P_j$, and note that the vector fields $\tilde X_j$ are holomorphic and tend to $\tilde X$ as $j \to \infty$. This concludes the first stage of the proof.

4.4. The induction step. This subsection treats the second stage of the proof, the inductive construction of the embedding $\Phi_{j+1}$ and the vector fields $X_{j+1}$, $\Delta_j$ and $\tilde\Delta_j$. At the beginning of the construction, an embedding $\Phi_j : D_j \to \tilde D_j$ and a vector field $X_j$ on $D_j$ are given.
As sketched in subsection 4.2, the aim of the induction step is to construct an embedding $\Phi_{j+1}$ and an integrable vector field $\tilde\Delta_j$, such that the normal linear part of the vector field $X_{j+1}$ that satisfies is "much" closer to $L = L_\Omega = \omega\frac{\partial}{\partial x} + Ay\frac{\partial}{\partial y}$ than $NX_j$. If $X_j$ is written as $X_j = L + R_j + Q_j$, where $Q_j$ is such that $NX_j = L + R_j$ and $NQ_j = 0$, the 'distance' between $NX_j$ and $L$ can be expressed by the size of $R_j$. We shall demonstrate that $|R_j|_j \to 0$ as $j \to \infty$; moreover, the speed of this convergence is linked to the smoothness of the limiting embedding $\Phi_\infty = \lim_{j\to\infty} \Phi_j$.

Induction assumptions.
We begin by stating the induction hypothesis precisely. It is assumed that embeddings Ψ 1 , ..., Ψ j−1 , Φ 1 , ..., Φ j and vector fields X 1 , ..., X j , Λ 1 , ..., Λ j are already constructed as indicated in subsection 4.2. All embeddings and all vector fields are complex extensions of real analytic ones, taking real values when restricted to real vectors.
To formulate the assumptions, introduce maps $\varphi_i$ and $\psi_i$ by setting $\Phi_i = \mathrm{id}_{D_i} + \varphi_i$ and $\Phi_i^{-1} = \mathrm{id}_{\Phi(D_i)} + \psi_i$, and define maps $(\varphi_i)_p$ and $(\psi_i)_p$ taking values in $\mathbb{T}^m \times \mathbb{R}^n$ by setting $\varphi_i(x, y, p) = ((\varphi_i)_p(x, y), 0)$ etc.
Hypothesis. There is a constant $c \in (0, 1)$, not depending on $j$, such that and such that for all $1 \le i \le j$. Moreover, there is a constant $C > 0$, also not depending on $j$, such that for $R_j$ and $Q_j$ as in (40), Finally, the vector fields $\tilde\Delta_i$ are integrable for all $1 \le i \le j - 1$.
Note that the hypothesis holds for the case $j = 1$, with $X_1 = \tilde X_1$ and $\Phi_1 = \mathrm{id}_{D_1}$.

4.4.2.
'+' and '·'-notation. In order not to overburden the notation, so-called '+'-notation will be used. All indices 'j' are dropped, and indices 'j + 1' are replaced by '+'. In this notation, the vector field $X_j + \Delta_j$ defined on $D_j$ is written as $X + \Delta$, defined on $D$.
In the estimates below, also the so-called '·'-notation will be used. Whenever s < · t is written, it is taken to signify s < M t, where the constant M does not depend on j.

Inclusion of domains.
According to the sketch of the proof given in 4.2, see in particular Remark 3, we should have that $\tilde D_+ \subset \Phi(D_{1/2})$ and $\Phi(D) \subset \tilde D$. We shall require a little bit more.
Recall that V is a bounded real neighbourhood of T = T m × {0}, and that U equals the complex neighbourhood V +h in the real analytic case, and V otherwise.
Proof. The second clause is immediate. The first clause is a direct consequence of the induction hypothesis; this can be seen as follows. For the first inclusion, take $z = z_0 + z_1 \in U + r$ such that $z_0 \in U$ and $|z_1| < r$. Then by the mean value theorem, there is $\vartheta \in (0, 1)$ such that for $z_\vartheta = z_0 + \vartheta z_1$: Since $|z_1 + \varphi_p'(z_\vartheta) z_1| < r + cr$, the condition $r + 2cr < \tilde r$ implies that $\Phi_p(z) \in \Phi_p(U) + \tilde r - cr$.
To see the second inclusion, take z = z 0 + z 1 ∈ U + r such that z 0 ∈ U , z is on the boundary of U +r 1 2 and that the norm |z 1 | of z 1 is minimal, and therefore equal to |z 1 | = r 1 2 . With the same notation as before, again (46) holds. We conclude that the distance from Φ p (z) to Φ p (U ) is bounded from below by . Consequently any point in the set Φ p (U ) +r + is necessarily in the interior of the set Φ p (U + r 1 2 ). We shall assume that {r j }, {r j }, {ρ j } and {ρ j } are decreasing geometric sequences; in particular, we set for some 0 < a 1 , a 2 < 1, and writer j =r 0 a j 1 , r j = r 0 a j 1 , etc. In terms of these constants, the inequalities (44) are equivalent to 1 + 2c <r Necessarily the constant c should be so small that 1 + 2c 1 − c < 1 2a 1 + 1; note that for any given a 1 , such a c > 0 exists, since the left hand side of this inequality tends to 1 as c ↓ 0. the 'modifying terms' δ(p) ∈ C m , b(p) ∈ C n and B(p) ∈ g C taking real values on real vectors. Note that since Λ is integrable, the vector field Λ + = Λ +∆ will be integrable as well.
The vector field $\Delta$ on $D$ is the image of $\tilde\Delta$ under the inverse of the already known map $\Phi$; it can be written in the form where $|\Theta| \le |D\psi|_{\Phi(D)}\,(|\delta| + |b| + |B|)$.

4.4.5.
Form of the conjugacy. The new conjugacy $\Phi_+$ will be of the form $\Phi_+ = \Phi \circ \Psi$; given $\Psi$, introduce $\check X = \Psi^{-1}_*(X + \Delta)$. The conjugacy $\Psi$ is taken as the time-1 map $e^{-Y}$ of a real analytic average-0 vector field $-Y \in \mathfrak{h}$, defined on $D$ and written as $Y = u\frac{\partial}{\partial x} + (v_0 + v_1 y)\frac{\partial}{\partial y}$. Requiring $Y$ to be of average-0 (over $\mathbb{T}^m$) is equivalent to requiring the coefficient functions to satisfy $[u]_{\mathbb{T}^m} = 0$ and

Recall that the Lie bracket of two vector fields
∂ ∂y is given as We have Ψ −1 = exp(Y ) anď where The coefficient functions u, v 0 and v 1 of Y will be chosen as trigonometric polynomials in x.
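For any vertical vector field $Z$ on $D$, introduce the Fourier truncation $T_d Z$ to order $d$. That is, if $Z = \sum_{k \in \mathbb{Z}^m} Z_k(y, p)\,e^{i\langle k, x\rangle}$, let $T_d Z = \sum_{|k| \le d} Z_k(y, p)\,e^{i\langle k, x\rangle}$.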

The vector fields $\Delta$ and $Y$ are determined by the requirement that they annihilate the contribution of the term $T_d R$ in $\check Z$; that is, they are taken to solve the homological equation Note that this is an equation in $\mathfrak{h}$. Under the assumption that (50) holds, using (49) and writing $\check X = L + \check R + \check Q$ with $N\check X = L + \check R$, it follows that In the next subsections, equation (50) is solved and estimates for $\check R$ and $\check Q$ are given.

4.4.6.
Determining the conjugacy. The techniques of solving the homological equation (50) are mostly well-known and only brief indications are given. However, the determination of the modifying terms δ, b and B requires some care.
Here $f(x, p)$, $g_0(x, p)$ and $g_1(x, p)$ are trigonometric polynomials in $x$, taking real values on real vectors; the functions $\tilde\delta$, $\tilde b$ and $\tilde B$ are also trigonometric polynomials in $x$; moreover, they depend analytically on $p$ as well as on $(\delta, b, B)$, and they satisfy estimates of the form the analytic functions $q_1$ and $q_2$ satisfy $q_1 = O(|y|)$ and $q_2 = O(|y|^2)$. Equation (50) can be split into three components: Here In the following, we set

Equations (52)-(54) are solved in three steps. First $v_0$ and $b$ will be determined from equation (53), as functions of $(x, p, \delta, B)$ and $(p, \delta, B)$ respectively. Then $\delta$ and $B$ will be determined from equations (52) and (54), and finally $u$ and $v_1$ are obtained from the same equations.

Equation (53) is equivalent to the following relations between the Fourier coefficients of $v_0$ and $g_0$: recall that $v_{00} = 0$ since $Y$ is average-0. The first equation is solved by using the implicit function theorem together with the estimates (42) and (51), which yields $b = \hat b(p, \delta, B)$. Note that the estimate (59) below will imply that the second equation can be solved on $D$, and that it yields analytic solutions $\hat v_{0k}$: for $0 < |k| \le d$, and $v_{0k} = 0$ otherwise.

Averaging equations (52) and (54) leads to where everywhere $\hat b(p, \delta, B)$ is substituted for $b$. Applying the implicit function theorem again yields solutions $\delta = \delta(p)$ and $B = B(p)$. Substituting these in $\hat b$ and $\hat v_{0k}$ yields $b(p)$ and $v_{0k}(p)$. Finally equations (52) and (54) are solved for the case $0 < |k| \le d$; this yields As before, estimates (59) and (60) imply that these solutions are bounded analytic functions.
Note that the vector field $Y = u\frac{\partial}{\partial x} + (v_0 + v_1 y)\frac{\partial}{\partial y}$ is a linear combination of vector fields in $\mathfrak{h}$, and therefore $Y \in \mathfrak{h}$.
As mentioned, this implies that all formal solutions given above are in fact welldefined analytic functions.
Recall that Cramer's rule allows us to express the inverse of a matrix $A$ as $A^{-1} = \frac{1}{\det A}\,A^*$, where $A^*$ is the adjoint of $A$, that is, the matrix whose $(i, j)$'th element is the minor of the matrix obtained from $A$ by removing the $i$'th row and the $j$'th column. We have to invert the linear maps $i\langle k, \omega\rangle I - A$ and $i\langle k, \omega\rangle I - \mathrm{ad}\,A$. If $\lambda_i$, $i = 1, \dots, n$ are the eigenvalues of $A$, then the eigenvalues of these maps are $i\langle k, \omega\rangle - \lambda_i$ and $i\langle k, \omega\rangle - (\lambda_{i_1} - \lambda_{i_2})$ respectively, for $i, i_1, i_2 \in \{1, \dots, n\}$. The matrix elements of the adjoint to these maps contain terms with at most $n$ factors $\langle k, \omega\rangle$ in the first case, and $n^2$ such factors in the second case.

Using Cramer's rule, and Rüssmann's technique to obtain optimal estimates (cf. [31,32]), for $b$ and $v_0$ the following inequalities are obtained: Using these, a second application of Cramer's rule and Rüssmann's estimates yields for $u$, $v_1$, $\delta$ and $B$: The factor $(r - r_{1/2})$ in the denominator of the estimates of $\delta$ and $u$ is due to estimating the derivative of $q_1$ with respect to $y$, and the factor $r$ in the denominators of estimates of $B$ and $v_1$ is due to the fact that $g_1$ is the derivative of $T_d R$ with respect to $y$, evaluated at $y = 0$. In the same estimates the factors $(r_{1/4} - r_{1/2})$ stem from derivatives of $v_0$ and $q_2$, respectively. Finally note that the relation $r_\vartheta - r_{\vartheta'} <\!\cdot\; r$ has been used repeatedly, for $\vartheta' - \vartheta \ge \frac{1}{4}$. The estimates can be combined in

4.4.8. Mapping of domains. The following result is needed in the estimates below.
have not yet been fully specified; only a number of conditions, namely (47), (48) and (69), have been given which they have to satisfy. We give here sufficient conditions for every statement in the induction hypothesis.

Condition (41) is vacuous if $j = 1$. We have to show that it holds for $i = j$, if the induction hypothesis is satisfied for $i < j$; that is, we have to show that $|\Psi - \mathrm{id}|_+ < c\,r_+$. It follows from (48) and (63) that Therefore (41) is certainly satisfied if for given $c$, this condition can always be satisfied if $r_0$ is chosen sufficiently small.

Condition (42) is trivially satisfied for $j = 1$, since $\Phi_1 = \mathrm{id}$ and $\varphi_1$ and $\psi_1$ vanish identically. For $i = j + 1$ the condition can be written as Details are given only for the estimate of $D\psi_+$, the others being easier. Note first that . By placing the condition $(1 + c)\,r_+ < r_{15/16}$ on $c$, or equivalently, by demanding that we ensure that $\Phi^{-1}$ maps the domain $D_+$ inside $D_{15/16}$; now we can estimate the derivative of $\Psi^{-1} - \mathrm{id}$ on this domain.
The first part of condition (43) is satisfied for j = 1 if the size ε of the initial perturbation is sufficiently small; the second part can be satisfied by choosing r 0 sufficiently small. To show that these conditions hold for j + 1, if they hold for j, is the subject of the next subsection.
Excepting this last verification, we have the following conditions on the sequences $\{r_j\}$, $\{\tilde r_j\}$, $\{\rho_j\}$, $\{\tilde\rho_j\}$, $\{d_j\}$ (conditions (47), (48), (69) and (75), together with the condition that $\rho_0 > 0$ is small enough to imply $d_1 > 2$): If $r_0$ and $a_1$ are given, then $a_2$, $c$ and $\tilde r_0$ can always be found such that these inequalities all hold. Note therefore that we are always free to choose $r_0$ and $a_1$, provided $r_0 > 0$ and $0 < a_1 < 1$.

4.5.
Smallness of the remainder term. The sequences $r_j = r_0 a_1^j$ and $\rho_j = \rho_0 a_2^j$ have now to be determined in such a way that $|R_{j+1}|_{j+1} \ll |R_j|_j$; in the next subsection, this will be shown to ensure that the embeddings $\Phi_j = \Psi_1 \circ \cdots \circ \Psi_{j-1}$ converge to an embedding $\Phi_\infty$ that has the properties stated in theorem 2.3. Note that from this point onwards, the '+'- and '·'-notations are dropped.
Inequality (72) reads then as where the constant $C$ does not depend on $j$. Recall that the truncation level is defined in (58), which reads as We introduce $\varepsilon = \|P\|_B$. There are several cases, depending on the smoothness class of the original perturbed vector field $\tilde X = X + P$. If $\tilde X$ is real analytic, then $\tilde X_j = \tilde X$ for all $j$, and the first term in (76) vanishes. If $\tilde X$ fails to be real analytic, there is an approximating holomorphic sequence $\tilde X_j$ satisfying $|\tilde X_{j+1} - \tilde X_j|_{\tilde D_{j+1}} \le \varepsilon\,b_{j+1}$, where the $b_j$ are given by lemmas 4.1 or 4.3. In particular, if $\tilde X$ is Gevrey regular, approximations can be found for which the quantity $\log 1/b_j$ increases exponentially in $j$. If $\tilde X$ is not Gevrey, but still in some Carleman class, then $\log 1/b_j$ increases more slowly than exponentially, but faster than any linear function in $j$. Finally, in the finitely differentiable class, the sequence $\log 1/b_j$ increases linearly with $j$.
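The linear and exponential growth rates can be made explicit in the two extreme non-analytic cases. The sketch below assumes $r_j = r_0 a_1^j$ with $0 < a_1 < 1$, and $b_j = c_s r_j^s$ in the finitely differentiable case, as stated earlier. For the Gevrey case it assumes a bound of the form $b_j = c_1 \exp(-C r_j^{-1/(\mu+\eta)})$, which is what the Legendre transform of $\lambda(s) = (\mu+\eta)\,s\log s + (s+1)\log c_1$ suggests; it should be read as an illustration rather than as the precise statement of lemma 4.3.

```latex
\begin{align*}
\text{finitely differentiable:}\quad
\log\frac{1}{b_j} &= -\log c_s - s \log r_0 + j\,s \log\frac{1}{a_1}, \\
\text{Gevrey:}\quad
\log\frac{1}{b_j} &= C\,r_0^{-1/(\mu+\eta)}\,\Big(a_1^{-1/(\mu+\eta)}\Big)^{j} - \log c_1 .
\end{align*}
```

The first expression is affine in $j$; in the second, the base $a_1^{-1/(\mu+\eta)}$ exceeds 1, so the growth is exponential in $j$.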
For each of these four cases, a sequence {δ j } will be determined that decreases monotonically towards 0, such that, under appropriate conditions, |R j | j ≤ δ j for all j ∈ N.
First, we make some definitions that will hold for several of the cases considered below. If $a_1 \in (0, 1)$ is fixed, we choose $a_2 \in (0, a_1^{\kappa+1})$ such that With this choice, and setting we have that

Lemma 4.6. Let $\tilde X \in \mathcal{X}^\omega_h$ be real analytic. If $\varepsilon_0 > 0$, $r_0 > 0$ and $\rho_0 > 0$ are sufficiently small, and if $\delta_j = \varepsilon\,e^{-\beta^j}$ for $0 < \varepsilon < \varepsilon_0$, then (78) holds for all $j$.
Proof. Recall that $0 < \varepsilon < \varepsilon_0$. We proceed by induction. It is given that $|R_0|_0 \le \varepsilon$. With the induction assumption $|R_j|_j \le \varepsilon\,e^{-\beta^j}$, inequality (76) reads as where $c_0 = r_0 d_0 (1 - a_1)/8$. For given sequences $d_j$ and $r_j$, the first term in this sum can be made smaller than $1/2$ by choosing $\varepsilon_0 > 0$ sufficiently small. The second term is of the form $e^{f(j)}$, where $f(x) = \log C_0 + x \log\alpha - A\beta^x$, with $A = c_0 + 1 - \beta$ and $\alpha$ only depending on $a_1$, $a_2$, $\kappa$, $m$ and $n$, but not on $r_0$ and $\rho_0$. Computing $f'$, we see that this concave function, restricted to $x \ge 0$, takes its maximum at $x^* = \log\big(\log\alpha/(A\log\beta)\big)/\log\beta$ if $\log\alpha/\log\beta \ge A$, otherwise at $x^* = 0$. If we take $\rho_0$ sufficiently small, and, by (79), consequently $c_0$ and $A$ sufficiently large, the second case occurs; the value of the maximum is then $f(0) = \log C_0 - A$. It follows that Note that by fixing $r_0$ and taking $\rho_0$ sufficiently small, again by invoking (79), the right hand side can be made smaller than $1/2$. It follows that we can make the right hand side of (80) smaller than 1, uniformly in $j$, by taking $\varepsilon_0 > 0$ and $\rho_0$ sufficiently small.
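The location of the maximum of $f$ used in this proof can be verified by a direct computation. The sketch below assumes $\beta > 1$ and $A > 0$, which appear to be implicit in the argument.

```latex
\begin{align*}
f(x)  &= \log C_0 + x \log\alpha - A\beta^{x}, \\
f'(x) &= \log\alpha - A\,\beta^{x}\log\beta, \qquad
f''(x) = -A\,\beta^{x}(\log\beta)^{2} < 0, \\
f'(x^{*}) = 0 &\iff \beta^{x^{*}} = \frac{\log\alpha}{A\log\beta}
  \iff x^{*} = \frac{1}{\log\beta}\,\log\frac{\log\alpha}{A\log\beta}, \\
x^{*} \ge 0 &\iff \frac{\log\alpha}{\log\beta} \ge A .
\end{align*}
```

If $\log\alpha/\log\beta < A$, then $f'(0) < 0$ and, by concavity, the maximum over $x \ge 0$ is attained at $x = 0$, with value $f(0) = \log C_0 - A$.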
Proof. The proof resembles that of the previous lemma. Using (38), inequality (76) reads as |R j+1 | j+1 ε e −β j+1 < C e −β j C r where $c_0 = r_0 d_0 (1 - a_1)/8$. The first term can be made smaller than $1/3$ by taking $r_0$ sufficiently small. It follows exactly as in the proof of lemma 4.6 that if $\varepsilon > 0$, $r_0 > 0$ and $\rho_0 > 0$ are sufficiently small, the other two terms are both smaller than $1/3$, making the right hand side smaller than 1, uniformly in $j$.

4.5.3.
Case three: ultradifferentiability. If $\tilde X$ is in the Carleman class $\mathcal{X}^M$, that is, if it is infinitely differentiable but not Gevrey regular, let $\{\lambda_s\}$ be the sequence given in (37), and let $\lambda^* : [0, \infty) \to \mathbb{R}$ be its largest convex minorant. We construct a function $g_M$ as follows. Let $g_0 = \mathcal{L}\lambda^*(0)$ and let $g_j = \min\{\beta g_{j-1},\ \mathcal{L}\lambda^*(\log r_j^{-1})\}$.
Finally, let $g_M$ be the convex function whose epigraph equals the convex hull of the points $(\log r_j^{-1}, g_j)$ and the half-line $\{(0, g_0 + t) \mid t \ge 0\}$. Then $g_M$ is a convex minorant of $\mathcal{L}\lambda^*$, which moreover satisfies $g_M(\log r_{j+1}^{-1}) \le \beta\,g_M(\log r_j^{-1})$. Since $g_M$ is a minorant of $\mathcal{L}\lambda^*$, it follows from lemma 4.3 that there is an approximating sequence $\tilde X_j$ such that $|\tilde X_{j+1} - \tilde X_j|_{\tilde D_{j+1}} < \varepsilon\,e^{-g_M(\log r_j^{-1})}$.
The sequence $\{\sigma_j\}$ given by $\sigma_j = g_M(\log r_{j-1}^{-1})$ has by construction of $g_M$ the property that $\sigma_{j+1} < \beta\sigma_j$ for all $j$. Note that it follows from lemma 4.3 that $\sigma_j$ increases faster than any linear function of $j$.
If we choose C 2 = 3C, the first term is at most equal to 1/3. By the choice of the σ j , we have σ j+1 − 2σ j < (β − 2)σ j < 0; as a consequence, the second term on converges absolutely and uniformly on the intersection D ∞ = ∞ j=1 D j , and the limit Φ ∞ is therefore at least continuous there.
This inequality describes exactly the anisotropic differentiability (in the sense of [28]) of the conjugacy in the presence of a multiple normal eigenvalue of multiplicity . We find for all $\alpha$ satisfying this inequality that

4.6.2. A lemma. We need the following result a couple of times. The result follows from this.