Hypoelliptic multiscale Langevin diffusions: Large deviations, invariant measures and small mass asymptotics

We consider a general class of non-gradient hypoelliptic Langevin diffusions and study two related questions. The first one is large deviations for hypoelliptic multiscale diffusions. The second one is small mass asymptotics of the invariant measure corresponding to hypoelliptic Langevin operators and of related hypoelliptic Poisson equations. The invariant measure corresponding to the hypoelliptic problem and appropriate hypoelliptic Poisson equations enter the large deviations rate function due to the multiscale effects. Based on the small mass asymptotics we derive that the large deviations behavior of the multiscale hypoelliptic diffusion is consistent with the large deviations behavior of its overdamped counterpart. Additionally, we rigorously obtain an asymptotic expansion of the solution to the related density of the invariant measure and to hypoelliptic Poisson equations with respect to the mass parameter, characterizing the order of convergence. The proof of convergence of invariant measures is of independent interest, as it involves an improvement of the hypocoercivity result for the kinetic Fokker-Planck equation. We do not restrict attention to gradient drifts and our proof provides explicit information on the dependence of the bounds of interest in terms of the mass parameter.

The Langevin equation is one of the most classical equations in probability theory as well as in mathematical physics ([18,11,26]). It describes, under Newton's law, the motion of a particle of mass $\tau$ in a force field $f(q)$, $q \in \mathbb{R}^n$, subject to random fluctuations and to a friction proportional to the velocity. Here $W_t$ is the standard Wiener process (Brownian motion) in $\mathbb{R}^n$ and $\lambda > 0$ is the friction coefficient.
In this paper we are interested in the case where the force field $f(q)$ has multiscale structure and the magnitude of the random fluctuations is small. In particular, our starting object of interest is the second order hypoelliptic multiscale Langevin equation (0.1), where $\varepsilon, \delta \ll 1$ and $\delta = \delta(\varepsilon) \downarrow 0$ as $\varepsilon \downarrow 0$. Here, $\lambda(q) > 0$ is an inhomogeneous friction coefficient. Moreover, $\varepsilon$ represents the strength of the noise, whereas $\delta$ is the parameter that separates the scales. We study the homogenization regime where $\varepsilon/\delta \to \infty$ as $\varepsilon, \delta \downarrow 0$. It is well known that when $\tau \downarrow 0$, the solution to (0.1) approximates that of a first order equation. In particular, if $\lambda$ is a constant, then in the overdamped case, i.e. when $\tau$ is small, the motion can be approximated by the first order Langevin equation (0.2) (see for example [12]). The situation is much more complex when the friction coefficient also depends on the position, see [15,13]. In particular, in the setting of (0.1), the motion of $q^\varepsilon$ as $\tau \downarrow 0$ is approximated by equation (0.3), where $\alpha(q, r) = \sigma(q, r)\sigma^T(q, r)$. Clearly, when $\lambda(q) = \lambda$ is constant, (0.3) reduces to (0.2).
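The small mass approximation described above can be illustrated numerically. The following is a minimal sketch, not the paper's model: it assumes a one-dimensional equation of the form $\tau \ddot q = f(q) - \lambda \dot q + \sigma \dot W$ with constant $\lambda$, and compares it with the first order equation $\lambda \dot q = f(q) + \sigma \dot W$. The function names, the Euler-Maruyama discretization and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def simulate_underdamped(f, lam, tau, sigma, q0, v0, T, dt, rng):
    """Euler-Maruyama for the assumed form tau*q'' = f(q) - lam*q' + sigma*W'."""
    q, v = q0, v0
    for _ in range(int(round(T / dt))):
        dW = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        q_new = q + v * dt
        v = v + (f(q) - lam * v) / tau * dt + sigma / tau * dW
        q = q_new
    return q


def simulate_overdamped(f, lam, sigma, q0, T, dt, rng):
    """Euler-Maruyama for the assumed first order form lam*q' = f(q) + sigma*W'."""
    q = q0
    for _ in range(int(round(T / dt))):
        dW = rng.gauss(0.0, math.sqrt(dt))
        q = q + f(q) / lam * dt + sigma / lam * dW
    return q


def linear_force(q):
    """Hypothetical force field f(q) = -q, used only for this illustration."""
    return -q


if __name__ == "__main__":
    # Noise is switched off so that the two end points can be compared directly.
    for tau in (0.1, 0.01):
        q_under = simulate_underdamped(linear_force, 1.0, tau, 0.0, 1.0, 0.0,
                                       1.0, 1e-3, random.Random(0))
        q_over = simulate_overdamped(linear_force, 1.0, 0.0, 1.0,
                                     1.0, 1e-3, random.Random(0))
        print(f"tau={tau}: |q_under - q_over| = {abs(q_under - q_over):.2e}")
```

The explicit scheme is only stable when `dt` is small compared with `tau / lam`, which is why the step size is kept well below the smallest mass used.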
The first goal of this paper is to consider the large deviations behavior of the solution $q^\varepsilon$ to (0.1) in such a way that, when the mass is small, it is consistent with the large deviations behavior of the solution to the overdamped counterpart (0.3), or equivalently (0.2). In particular we want to investigate the conditions under which the tail behavior of (0.1) and of (0.3) agree, at least in a limiting sense.
It turns out that we get interesting non-trivial behavior when the mass $\tau$ relates to $\varepsilon, \delta$ in a specific way that will be explained in the sequel. For this reason we shall write $\tau^{\varepsilon}$ in place of $\tau$ when we want to emphasize this dependence. We prove that if the mass of the particle $\tau$ scales appropriately with the order of the fluctuations, and in particular if it is of order $\delta^2/\varepsilon$, i.e., if $\tau = m\,\delta^2/\varepsilon$ with $m$ small but positive, then the large deviations behaviors of the overdamped and underdamped systems agree. The large deviations result for (0.1) is given in Theorem 1.4 and the agreement in terms of the large deviations behavior of (0.1) and (0.3) is given in Theorem 1.8. In order to derive the large deviations principle we follow the weak convergence approach, see [6]. This framework transforms the large deviations problem into the convergence of a hypoelliptic stochastic control problem. Due to the hypoellipticity one needs certain a-priori bounds that establish compactness, see [14]. We obtain an explicit form of the control (equivalently, change of measure) that leads to the proof of the large deviations upper bound in the multiscale hypoelliptic case. Even though we do not address this issue in the current paper, we mention that the explicit information on the optimal control can be used for the construction of provably efficient Monte Carlo schemes in the spirit of the constructions done in [8,29] for the corresponding elliptic case.
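To make the scaling concrete, the following sketch evaluates the mass $\tau = m\,\delta^2/\varepsilon$ along a sequence with $\varepsilon/\delta \to \infty$. The helper name `mass_from_scaling` and the particular choice $\delta = \varepsilon^2$ are assumptions made only for illustration; the text requires only that $\delta$ vanishes faster than $\varepsilon$.

```python
def mass_from_scaling(eps, delta, m):
    """Return tau = m * delta**2 / eps for positive eps, delta and m."""
    if eps <= 0 or delta <= 0 or m <= 0:
        raise ValueError("eps, delta and m must all be positive")
    return m * delta ** 2 / eps


if __name__ == "__main__":
    m = 0.5  # hypothetical value of the mass parameter
    for eps in (1e-1, 1e-2, 1e-3):
        delta = eps ** 2  # hypothetical scale separation, so eps/delta = 1/eps
        tau = mass_from_scaling(eps, delta, m)
        print(f"eps={eps:g}  delta={delta:g}  eps/delta={eps / delta:g}  tau={tau:g}")
```

With this particular choice the mass behaves like $m\,\varepsilon^3$, so it vanishes much faster than either small parameter.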
Under the parametrization $\tau = m\,\delta^2/\varepsilon$ and when $\delta \ll \varepsilon$, we derive the large deviations principle for $\{q^\varepsilon, \varepsilon > 0\}$, where $q^\varepsilon$ solves (0.1), see Theorem 1.4. The large deviations rate function is derived in closed form and it depends on $m$. The next natural question is whether, as $m \downarrow 0$, the large deviations rate function converges to that of the large deviations principle for the overdamped case, i.e., for the solution to (0.3). However, to our surprise, we find that even in the case of constant diffusion the rigorous proof of such a convergence is highly involved. We prove such a convergence in the special case of diffusion coefficient $\sigma(q, r) = \sqrt{2D\lambda(q)}\,I$, $D > 0$ (which is the parametrization of the fluctuation-dissipation theorem) and we include a discussion of the general variable diffusion coefficient case in Remark 1.9. This result supports the claim that the large deviations behavior of the multiscale second order Langevin diffusion and that of its first order counterpart agree, see Theorem 1.8.
The second and related goal of this paper is to rigorously develop small mass asymptotics for the invariant measure, see Theorem 1.6, and for certain Poisson equations, see Theorem 1.7, that appear in the rate function of the large deviations principle (see Theorem 1.4) due to the homogenization effects. Our proof of the convergence as $m \downarrow 0$ of the large deviations rate function requires a thorough analysis of the small mass asymptotics for the invariant measure of the fast motion corresponding to (0.1). In particular, since we allow the drift term $b(q, r)$ to be a general vector field rather than a gradient field, our proof of the convergence involves a non-trivial improvement of the hypocoercivity result for the linear Fokker-Planck equation ([30, Section 1.7], see also [5]). If $b(q, r)$ is not a gradient field, then certain operators that appear in the analysis are not anti-symmetric. This implies that extra terms appear that need to be appropriately handled. Then, making use of and extending the hypocoercivity results of [30], we prove that the invariant measures corresponding to the $m > 0$ case converge in $L^2$ to the invariant measure corresponding to the $m = 0$ problem. Here we make use of the $((\cdot, \cdot))$ inner product introduced in [30] and we combine the different terms in such a way that the desired bounds follow. To accomplish this goal in the general non-gradient case, we use the structure of the hypoelliptic operator in an effective way.
Using the convergence of the invariant measure as the mass parameter goes to zero and the Poincaré inequality, we also prove that the solution to the related hypoelliptic Poisson equation converges to the solution of the appropriate elliptic Poisson equation (the so-called "cell problems") in the appropriate $L^2$ sense as the mass parameter goes to zero. These Poisson equations appear due to the homogenization effects of the drift $b(q, r)$. In addition, the proof provides a rigorous justification of the corresponding multiscale expansion of the solutions of the corresponding equations in powers of $\sqrt{m}$ as $m \downarrow 0$.
Related heuristic asymptotic expansions, i.e., expansions given without proof, can also be found in [25]. We would like to emphasize that our method of proof allows us to obtain upper bounds for the norms of interest with detailed dependence on the parameters of interest, such as the mass of the particle.
Partial motivation for our work comes from chemical physics and biology, and in particular from the dynamical behavior of proteins such as their folding and binding kinetics. As was suggested long ago (e.g., [19,32]), the potential surface of a protein might have a hierarchical structure with potential minima within potential minima. As a consequence, the roughness of the energy landscapes that describe proteins has numerous effects on their kinetic properties as well as on their behavior at equilibrium.
One of the first papers that used a simple model with two separated time scales to model diffusion in rough potentials is [32]. The situation usually investigated [19,32,9] is based on the first order equation (0.2), even though the physical model, and what is often used in molecular simulations, is the more complex second order Langevin equation that involves both position and velocity (see for example [20]) and that would usually also include more than two separated time scales. The usual choice of coefficients is $2D\lambda I$, where $k_\beta$ is the Boltzmann constant and $T$ is the temperature, chosen in such a way that the fluctuation-dissipation theorem holds. We remark here that our formulation is general and includes the parametrization suggested by the fluctuation-dissipation theorem as a special case. Notice that the choice of the separable drift $b(q, q/\delta) = -\nabla Q(q/\delta)$ represents the motion of a massless particle in a rough potential $\varepsilon Q(q/\delta) + V(q)$. In particular, the model of interest in this case becomes
The questions of interest in [32,9] are related to the effect of taking $\delta \downarrow 0$ with $\varepsilon$ small but fixed. This is almost the same as requiring that $\delta$ goes to 0 much faster than $\varepsilon$ does, which is the regime that we study in this paper.
The related mathematical literature is quite rich. For the related hypocoercivity theory the reader is referred to [30]. For the case $\delta = 1$, the large deviations principles of the solutions to (0.1) and (0.2) as $\varepsilon \downarrow 0$ are compared in [4]. For the case $\varepsilon = 1$, periodic homogenization for a special case of (0.1) (in particular when $c(q, r) = 0$ and $b(q, r) = b(r)$) has been addressed in [14]. Also, when $\varepsilon = 1$, random homogenization for (0.1) when $c(q, r) = 0$ and in the special case of gradient drift $b(q, r) = -\nabla Q(r)$ has been addressed in [2,24]. More is known about the overdamped case (0.2), see [7,17,21,28], where homogenization and large deviations results for the solution to equations of the form (0.2) are obtained under different relations between $\varepsilon$ and $\delta$, in both periodic and random environments.
The rest of the paper is structured as follows. In Section 1 we formulate the problem, our assumptions and the main results of this paper in detail. In Sections 2-3 we prove the large deviations principle for the hypoelliptic problem. In Sections 4-6 and in the Appendix we study the small mass asymptotics.
In particular, using the weak convergence approach we turn the large deviations principle into a law of large numbers for a stochastic control problem. Section 2 proves the convergence of the controlled stochastic equation and Section 3 proves the convergence of the cost functional, which is the Laplace principle. In Section 4 we prove the small mass limit of the rate function in the case of diffusion coefficient $\sigma(q, r) = \sqrt{2D\lambda(q)}\,I$, using the convergence of the invariant measures as $m \to 0$ (Section 5) and of the related "cell problems", which are auxiliary Poisson equations that appear in the rate functions due to homogenization effects (Section 6). We emphasize that Section 5 is of independent interest as it is an extension of the hypocoercivity result for the linear kinetic Fokker-Planck equation [30, Section 1.7], since we do not restrict our attention to drifts that are of gradient form. The method of proof also yields explicit decay rates of the norms of interest with respect to parameters of interest such as the mass of the particle. Most of the proofs of technical lemmas are deferred to the Appendix.

Problem formulation, assumptions and main results
In this section, we formulate more precisely the problem that we are studying in this paper, and we state our main assumptions and our main results. In preparation for stating the main results, we recall the concept of a Laplace principle.
Definition 1.1. Let $\{q^\varepsilon, \varepsilon > 0\}$ be a family of random variables taking values in a Polish space $S$ and let $I$ be a rate function on $S$. We say that $\{q^\varepsilon, \varepsilon > 0\}$ satisfies the Laplace principle with rate function $I$ if for every bounded and continuous function $h : S \to \mathbb{R}$,
$$\lim_{\varepsilon \downarrow 0} -\varepsilon \ln E\left[\exp\left\{-\frac{h(q^\varepsilon)}{\varepsilon}\right\}\right] = \inf_{x \in S}\left[I(x) + h(x)\right].$$
If the rate function has compact level sets, then the Laplace principle is equivalent to the corresponding large deviations principle with the same rate function (see Theorems 2.2.1 and 2.2.3 in [6]). Hence, instead of proving a large deviations principle for $\{q^\varepsilon\}$ we prove a Laplace principle for $\{q^\varepsilon\}$.
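The Laplace principle can be checked by hand in a toy case. The sketch below is not taken from the paper: it assumes the standard form of the principle, namely that $-\varepsilon \ln E[\exp(-h(q^\varepsilon)/\varepsilon)]$ converges to $\inf_x [I(x) + h(x)]$, and applies it to the hypothetical family $q^\varepsilon = \sqrt{\varepsilon}\,Z$ with $Z$ a standard Gaussian, for which $I(x) = x^2/2$. The test function `h_bounded` and the quadrature routine `laplace_functional` are illustrative names.

```python
import math


def h_bounded(x):
    """Hypothetical bounded continuous test function h(x) = min((x - 1)^2, 4)."""
    return min((x - 1.0) ** 2, 4.0)


def laplace_functional(h, eps, lo=-6.0, hi=6.0, n=12001):
    """Approximate -eps * ln E[exp(-h(X)/eps)] for X ~ N(0, eps) by a Riemann sum."""
    dx = (hi - lo) / (n - 1)
    # Exponent of the integrand: Gaussian density exponent plus -h/eps.
    exponents = []
    for i in range(n):
        x = lo + i * dx
        exponents.append(-(0.5 * x * x + h(x)) / eps)
    shift = max(exponents)  # log-sum-exp shift to avoid underflow for small eps
    total = sum(math.exp(e - shift) for e in exponents) * dx
    log_expectation = shift + math.log(total) - 0.5 * math.log(2.0 * math.pi * eps)
    return -eps * log_expectation


if __name__ == "__main__":
    # inf_x [x^2/2 + h(x)] is attained at x = 2/3 and equals 1/3.
    for eps in (0.1, 0.01, 0.001):
        print(f"eps={eps:g}: {laplace_functional(h_bounded, eps):.5f}  (limit 1/3)")
```

In this Gaussian example the quantity can also be computed in closed form, up to a negligible truncation effect, as $1/3 + (\varepsilon/2)\ln 3$, which makes the rate of convergence to the infimum visible.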
Our main regularity assumption regarding the coefficients of (0.1) is given by Condition 1.2.
Condition 1.2. The functions $b(q, r)$, $c(q, r)$, $\sigma(q, r)$ are
1. periodic with period 1 in the second variable in each direction, and
2. $C^1(\mathbb{R}^d)$ in $r$ and $C^2(\mathbb{R}^d)$ in $q$, with all partial derivatives continuous and globally bounded in $q$ and $r$.
Using the parametrization τ = m δ 2 ε , the system being considered is t we obtain the following system of equations which we also supplement with initial conditionṡ Condition 1.2, guarantees that (1.1) and (1.2) too, have a unique strong solution; this is a classical result, see for example [12] or Theorem 5.2.1 of [23]. The infinitesimal generator for the (q, p) process satisfying (1.2) is given by where we recall that α(q, r) = σ(q, r)σ T (q, r). We can assume that p o is a random variable, as long as it is independent of the driving Wiener process W t and as long as E e Sometimes, we may write X ε t = (q ε t , p ε t ). Let | • | be the Euclidean norm in R d and introduce the control set The result in [3] gives the following representation Here the process q ε t is the q-component of the hypoelliptic controlled diffusion process X ε t = (q ε t , p ε t ): Let us define now the operator For each fixed q, the operator L m q defines a hypoelliptic diffusion process on (p, r) ∈ Y = R d × T d . Let µ(dpdr|q) be the unique invariant measure for this process. Notice that L m q is effectively the operator corresponding to the fast motion. The following centering condition is essential for the validity of the results. Condition 1.3. We assume that for every q ∈ R d Y b(q, r)µ(dpdr|q) = 0.
Let us consider the preliminary cell problem It is clear that the solution to (1.4) Φ depends also on q, but we sometimes suppress this in the notation for convenience. By the work of [14], we know that under Condition To support the claim that the particular parametrization is consistent with the large deviations principle of the overdamped case (0.3), we need to prove that lim m→0 S m (φ) = S 0 (φ), where S 0 (φ) is the rate function associated to (0.3). To that end, we recall the corresponding large deviations result from [7].
Let µ 0 (dr|q) be the unique invariant measure corresponding to the operator α(q, r) : ∇ 2 r equipped with periodic boundary conditions in r (q is being treated as a parameter here).
By Theorem 1.6, Condition 1.3 implies the following centering condition for the drift Under this centering condition, the cell has a unique bounded and sufficiently smooth solution χ = (χ 1 , ..., χ d ). After these definitions we recall the result from [7] that will be of use to us.
For the small mass, i.e., $m \to 0$, asymptotics that follow, we assume that $\sigma(q, r) = \sqrt{2D\lambda(q)}\,I$, $D > 0$, i.e., we assume that the noise is such that we are in fluctuation-dissipation balance. In this case, for a function $f \in C^2(Y)$, we have We denote by $\mu(dp\,dr|q) = \rho^m(p, r|q)\,dp\,dr$ the invariant measure corresponding to the operator $L^m_q$. Also, let us write $\mu_0(dr|q) = \rho^0(r|q)\,dr$ for the invariant measure corresponding to the operator $L^0_q$. Let us also define $\pi(dp) = \rho^{OU}(p)\,dp$ to be the invariant measure on $\mathbb{R}^d$ for the Ornstein-Uhlenbeck process with generator $A$. With this notation, let us write $\rho^m(p, r) = \tilde\rho^m(p, r)\rho^0(p, r)$, where $\rho^0(p, r) = \rho^{OU}(p)\rho^0(r)$, suppressing the dependence on $q$.
Then, in Sections 5 and 6 respectively we prove the following theorems, which constitute the second main result of our paper. Theorem 1.6. Let Condition 1.2 hold and assume that $\sigma(q, r) = \sqrt{2D\lambda(q)}\,I$, $D > 0$. Then, for every $q \in \mathbb{R}^d$, we have Using then Theorems 1.6 and 1.7 we prove in Section 4 that the rate function $S^m(\phi)$ converges to $S^0(\phi)$ as $m \downarrow 0$. holds. When the diffusion coefficient $\sigma$ is not a multiple of the identity matrix, the operator $A$ is not the classical Ornstein-Uhlenbeck operator that has the Gaussian measure $\rho^{OU}(p)\,dp \sim e^{-|p|^2/(2D)}\,dp$ as its invariant measure. Some of our technical lemmas use this explicit structure in order to derive the necessary estimates. However, since the spirit of the proof does not rely on this structure, we believe that this is only a technical problem.
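The statement that the Gaussian density is invariant for the Ornstein-Uhlenbeck operator can be verified in one line. The computation below is a sketch under the assumption that $A = -p \cdot \nabla_p + D\Delta_p$, which is the form given for $A$ in Section 6; $A^{\dagger}$ denotes the formal adjoint with respect to Lebesgue measure $dp$, a symbol introduced here only for this check.

```latex
\begin{align*}
A^{\dagger}\rho
  &= \nabla_p \cdot (p\,\rho) + D\,\Delta_p \rho
   = \nabla_p \cdot \bigl(p\,\rho + D\,\nabla_p \rho\bigr), \\
\rho^{OU}(p) \propto e^{-|p|^2/(2D)}
  &\;\Longrightarrow\; D\,\nabla_p \rho^{OU} = -p\,\rho^{OU}
   \;\Longrightarrow\; A^{\dagger}\rho^{OU} = 0 .
\end{align*}
```

The flux $p\,\rho + D\,\nabla_p\rho$ vanishes identically for the Gaussian, which is the explicit structure the technical lemmas rely on.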

Law of large numbers
In this section we study the limiting behavior of the solution to the control problem (1.3). It turns out that we need to consider the solution to (1.3) together with an appropriate occupation measure and then consider the limit of the pair. Let us be more specific now.
scales parameter. We introduce the occupation measure Let us define the function Definition 2.1 captures the notion of a viable pair as introduced in [7] which characterizes the required law of large numbers.
T ]) will be called viable with respect to (γ, L m q ) or simply viable if there is no confusion, if the following are satisfied. The function ψ t is absolutely continuous, P is square integrable in the sense that L m ψs g(p, r)P(dz, dpdr, ds) = 0 ; We write (ψ, P) ∈ V (γ,Lq) .
Let us apply Itô's formula to Φ p ε t , q ε t δ in (1.4) and use (2.7) to get a representation formula for q ε t as follows: Using this representation formula, Condition 1.2 and Theorem 3.3 of [14] (see also Appendix A), we can then establish that for every η > 0 This implies the tightness of the family {q ε • }. Tightness of the occupation measures {P ε,∆ , ε > 0} follows from the bound [6]. Notice that the last inequality in (2.8) follows by the uniform L 2 bound on the family of controls {u ε , ε > 0}.
In addition, as in Proposition 3.1 of [7], we can show that the family {P ,∆ , > 0} is Next, we prove that any accumulation point will be a viable pair according to Defini- Making use of (1.4) and (2.7) we get Combining the latter expression with (2.9) we get Due to the a-priori bounds from Appendix A the right hand side of the last display goes to zero in L 2 , which means that and, in probability, γ(q s , (p, r), z)∇f (q s )P(dz, dpdr, ds) → 0. (2.11) Relations (2.10) and (2.11) imply that the pair (q, P) solves the martingale problem associated with (2.4), which then proves that (2.4) holds.
Let us now analyze the different terms in (2.12). We start by observing that converges to zero uniformly. Hence, the left hand side of (2.12) converges to zero in probability as ε ↓ 0.
Let us next study the right hand side of (2.12). We have the following:
1. Condition 1.2, the uniform $L^2$ bound on the controls and tightness of $\{q^\varepsilon, \varepsilon > 0\}$ imply that the first and the third term on the right hand side of (2.12) converge to zero in probability as $\delta/\varepsilon \downarrow 0$.
2. The second term on the right hand side of (2.12) also converges to zero in probability, by the fact that $\delta/\varepsilon \downarrow 0$ and by the uniform integrability of $P^{\varepsilon,\Delta}$.
Thus, by combining the behavior of the different terms on the left and on the right hand side of (2.12), we obtain that we should necessarily have that = t follows from the fact that analogous property holds at the prelimit level, P(Z × Y × {t}) = 0 and the continuity of t → P(Z × Y × [0, t]) and (2.6) follows.

Laplace principle
The main result of this section is the following Laplace principle. During the proof of Theorem 3.1 we also establish the alternative representation of Theorem 1.4. with the convention that the infimum over the empty set is ∞. Then for every bounded and continuous function h Moreover, for each s < ∞, the set In other words, {q ε • , ε > 0} satisfies the Laplace principle with rate function S(•).
Proof of Theorem 3.1. The proof of this theorem borrows some of the arguments of the related proof of the LDP for the elliptic overdamped case of Theorem 2.10 in [7]. We present here the main arguments, emphasizing the differences. Part 1. [Laplace principle lower bound]. Theorem 2.2 and Fatou's lemma, guarantee the validity of the following chain of inequalities.
Hence, the lower bound has been established. Part 2.
[Laplace principle upper bound and alternative representation]. We first observe that one can write (3.1) in terms of a local rate function Here we set Z×Y L m q f (p, r)P(dz, dpdr) = 0, ∀f ∈ C 2 loc (Y) , Z×Y |z| 2 P(dz, dpdr) < ∞ and ν = Z×Y γ(q, (p, r), z)P(dz, dpdr) We can decompose the measure P ∈ P(Z × Y) into the form P(dz, dpdr) = η(dz|p, r)µ(dpdr|q) , where µ is a probability measure on Y and η is a stochastic kernel on Z given Y. This is referred to as the "relaxed" formulation because the control is characterized as a distribution on Z (given q and (p, r)) rather than as an element of Z. We now have, for every f ∈ C 2 loc (Y) and for every q ∈ R d , that Y L m q f (p, r)µ(dpdr) = 0 .
Here we have used the independence of $L^m_q$ from the control variable $z$ to eliminate the stochastic kernel $\eta$. Thus $\mu(dp\,dr)$ is the unique invariant measure corresponding to the operator $L^m_q$, written as $\mu(dp\,dr|q)$.
Since the cost is convex in z and γ is affine in z, the relaxed control formulation is equivalent to the following ordinary control formulation of the local rate function (p, r), v(p, r))µ(dpdr) .
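The equivalence between the relaxed and the ordinary control formulations rests on a short convexity argument, which can be sketched as follows. The display assumes that the running cost is quadratic in $z$ (as suggested by the integrability condition on $|z|^2$) and defines $v$ as the conditional mean of the kernel $\eta$; this choice of $v$ is the standard one and is stated here as an assumption rather than quoted from the source.

```latex
\begin{align*}
v(p,r) &:= \int_{Z} z\,\eta(dz \mid p,r), \\
\int_{Z} \gamma\bigl(q,(p,r),z\bigr)\,\eta(dz \mid p,r)
  &= \gamma\bigl(q,(p,r),v(p,r)\bigr)
  && \text{(since $\gamma$ is affine in $z$)}, \\
\int_{Z} |z|^{2}\,\eta(dz \mid p,r)
  &\ge |v(p,r)|^{2}
  && \text{(Jensen's inequality)} .
\end{align*}
```

Replacing the kernel by its mean therefore leaves the constraint on $\nu$ unchanged and does not increase the cost, so the infimum over relaxed controls is attained within ordinary controls.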
One can show as in [7, Section 5] that L r (q, ν) = L o (q, ν). Let us recall now the definitions of r m (q) and Q m (q) from Theorem 1.4. For any v ∈ A o q,ν we can write ν = Y γ(q, (p, r), v(p, r))µ(dpdr|q) Then, ν − r m (q) can be treated as β, and κ(q, (p, r)) = 1 √ m (∇ p Φ(p, r)) T (σ(q, r)) T , u(p, r) = (v(p, r)) T in Lemma 5.1 of [7]. We apply this lemma and then we get that for all This shows that and the minimum is achieved with the control given by (3.2). Now, that we have identified that the action functional can be written in the proceeding form we can proceed in proving the Laplace principle upper bound. We must show that for all bounded, continuous functions h mapping C( By the variational representation formula, it is enough to prove that lim sup To be precise, we consider for the limiting variational problem in the Laplace principle a nearly optimal control pair (ψ, P). In particular, let η > 0 be given and consider It is clear now that L o (x, ν) is continuous and finite at each pair (x, ν) ∈ R 2d . Hence, a standard mollification argument, allows us to assume thatψ is piecewise constant, see It is easy to see that Condition 1.2 guarantees thatū t is continuous in all of its arguments and that (1.3) has a unique strong solution with u t =ū t . Then, by Theorem 2.2, we obtain that in distributionq ε D →q, wherē γ q s , (p, r),ūψ s (q s , (p, r)) µ(dpdr|q s )ds.
Keeping in mind the definition of A o q,ψt and that ψ 0 = q o , we obtain that s ds = ψ t for any t ∈ [0, T ], with probability 1 .
Therefore, we finally obtain that Since η is arbitrary, we are done with the proof of the Laplace principle upper bound.
At the same time get the explicit form of the rate function  , r m (q) and Q 0 (q), r 0 (q) be as in Theorems 1.4 and 1.5 respectively. Then, for any η > 0, there exist some m 0 > 0 such that for every q ∈ R d and every 0 < m < m 0 we have Proof. For notational convenience and without loss of generality, we shall set σ(q, r) = 2I, D = λ(q) = 1. Since q is viewed as a parameter, we do not mention it explicitly in the formulas. We have Taking absolute value and using Cauchy-Schwarz inequality we obtain  The results (4.1) and (4.2) imply that there exists a uniform constant C such that sup m∈(0,1) and by classical elliptic regularity theory there exists a uniform constant C, clearly independent of m, such that

L 2 Convergence of the invariant density
In this section we prove Theorem 1.6. For notational convenience and without loss of generality, let us assume in this section that $D = \lambda = 1$ and, as a consequence, that $\alpha(q, r) = 2I$ (recall that $\sigma(q, r) = \sqrt{2D\lambda(q)}\,I$). Since $q \in \mathbb{R}^d$ is viewed as a parameter, it will not be mentioned explicitly.
Notice that in the case of a gradient drift, i.e., when $b(q, r) = -\nabla_r V(q, r)$, (5.1) is immediately true even without the limit. In fact, in this case the invariant density is $\rho^m(p, r) = \rho^{OU}(p)\rho^0(r)$ for every finite $m \in \mathbb{R}_+$, which implies that $\tilde\rho^m(p, r) = 1$, completing the proof of (5.1). Our goal here is to show that (5.1) remains true in the more general setting where the drift is not necessarily of gradient form.
By Condition 1.2 the drift b(q, r) and its partial derivatives are uniformly bounded with respect to q. For this reason we sometimes suppress the dependence on q and write b(q, r) = b(r). Also, for notational convenience, let us set h(r) = b(r) − ∇ r log ρ 0 (r) .
This definition for h(r) will also be used throughout the rest of the paper.
Notice that in the gradient case, i.e., when $b(r) = -\nabla V(r)$, we have that $h(r) = 0$, but in the general case one has $h(r) \neq 0$. Let us next establish some useful relations.
Lemma 5.1. Let $f, g$ be two functions that belong to the domain of definition of $L^m_q$. Then, we have the identity $\int_Y \left[ L^m_q f(p, r)\, g(p, r) + L^m_q g(p, r)\, f(p, r) \right] \rho^0(p, r)\,dp\,dr =$ In particular, we have that
Lemma 5.2. Let $f, g$ be two functions that are in $W^{1,0}_2(Y)$, i.e., the set of functions that, together with their first derivatives with respect to $p$, are in $L^2(Y)$. Then, there exists a finite constant $K < \infty$ that depends only on $\sup_{r \in \mathbb{T}^d} |h(r)|$ such that $\langle h(r) \cdot p, f g \rangle_{L^2(Y;\rho^0)} \le K \left( \|f\|_{L^2(Y;\rho^0)} \|\nabla_p g\|_{L^2(Y;\rho^0)} + \|\nabla_p f\|_{L^2(Y;\rho^0)} \|g\|_{L^2(Y;\rho^0)} \right)$.
Lemma 5.3. For every $\eta > 0$, there exists a constant $K < \infty$ that depends only on $\sup_{r \in \mathbb{T}^d} |h(r)|$ such that

Lemma 5.5.
There is a universal constant $K > 0$ that depends on $\sup_{r \in \mathbb{T}^d} |h(r)|$, but not on $m > 0$, such that for all $m$ sufficiently small There is a universal constant $K > 0$ that depends on $\sup_{r \in \mathbb{T}^d} \max(|h(r)|, |\nabla_r h(r)|)$, but not on $m > 0$, such that for all $m$ sufficiently small Let us define $L^1$ to be the operator $L^m_q$ with $m = 1$. We recall that where $A = -p \cdot \nabla_p + \Delta_p$ and $B = p \cdot \nabla_r + b(q, r) \cdot \nabla_p$. It is easy to check that, with respect to the measure $\rho^0(p, r)\,dp\,dr$, we can actually write that One can also check that the adjoint operator of $B$ is formally given by $B^* = -B + p \cdot h(r)$. Notice that the latter relation implies that $B$ is antisymmetric only if $h(r) = 0$, which essentially is the case of gradient drift. However, in the general case $h(r) \neq 0$, which implies that $B$ is not antisymmetric. Next, we introduce the operator $C = [A, B] = [\nabla_p, p \cdot \nabla_r + b(r) \cdot \nabla_p] = \nabla_r$.
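The commutator identity at the end of the preceding paragraph can be checked componentwise. The following sketch takes $\nabla_p$ in the first slot of the bracket, as written in the text, and uses only that $b$ does not depend on $p$; the test function $f$ and the index notation are introduced here for the computation.

```latex
\begin{align*}
\bigl[\partial_{p_i},\, B\bigr] f
  &= \partial_{p_i}\Bigl(\sum_{j} p_j\,\partial_{r_j} f + \sum_{j} b_j(r)\,\partial_{p_j} f\Bigr)
     - \sum_{j}\bigl(p_j\,\partial_{r_j} + b_j(r)\,\partial_{p_j}\bigr)\partial_{p_i} f \\
  &= \sum_{j} \delta_{ij}\,\partial_{r_j} f
   = \partial_{r_i} f ,
   \qquad i = 1,\dots,d .
\end{align*}
```

All second order terms cancel, so the bracket recovers exactly the missing derivatives in the $r$ direction. This is the mechanism by which hypoellipticity enters the argument.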
A word on notation now. In order to make the notation lighter we will write from now on · = · L 2 (Y;ρ 0 ) , and ·, · = ·, · L 2 (Y;ρ 0 ) , for the norm and for the inner product in the space L 2 (Y; ρ 0 ).
In order to show that (5.1) holds, we use the work of [30]. In particular, as in [30], let $a, b, c$ be constants to be chosen such that $1 > a > b > c > 0$ and let us define the norm In fact, as it is argued in [30], the norms $((f, f))$ and $\|f\|^2_{H^1(Y;\rho^0)}$ are equivalent as Since we are dealing with a real Hilbert space, all the inner products are real. By polarization we have One important difference between the current setup and the setup of [30] is that there $B^* = -B$, whereas here that is not the case, as we have $B^* = -B + p \cdot h(r)$. Keeping that in mind and repeating the argument of the proof of Theorem 18 in [30], we obtain that there are constants $a, b, c$ that are sufficiently small, with $1 \gg a \gg b \gg 2c$ and $b^2 < ac$ (the exact same constants as in [30]), such that The bracket term on the right hand side of the inequality is due to the fact that in our case $h(r) \neq 0$ and thus $B$ is not anti-symmetric. The bracket term is equal to zero in [30].
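The formula for the adjoint of $B$ used above follows from an integration by parts, sketched below under the normalization $D = \lambda = 1$ of this section, so that $\nabla_p \rho^0(p,r) = -p\,\rho^0(p,r)$ and $\nabla_r \rho^0(p,r) = \nabla_r \log\rho^0(r)\,\rho^0(p,r)$. Boundary terms are assumed to vanish (periodicity in $r$, Gaussian decay in $p$), and $f, g$ are generic smooth test functions.

```latex
\begin{align*}
\langle Bf, g\rangle
  &= \int_Y \bigl(p\cdot\nabla_r f + b(r)\cdot\nabla_p f\bigr)\, g\, \rho^0 \,dp\,dr \\
  &= -\int_Y f\,\Bigl[\,p\cdot\nabla_r\bigl(g\rho^0\bigr) + b(r)\cdot\nabla_p\bigl(g\rho^0\bigr)\Bigr]\,dp\,dr \\
  &= -\int_Y f\,\Bigl[\,Bg + g\;p\cdot\bigl(\nabla_r\log\rho^0(r) - b(r)\bigr)\Bigr]\rho^0\,dp\,dr \\
  &= \bigl\langle f,\; -Bg + \bigl(p\cdot h(r)\bigr)\,g \bigr\rangle ,
  \qquad h(r) = b(r) - \nabla_r\log\rho^0(r).
\end{align*}
```

The extra multiplication operator $p \cdot h(r)$ is precisely what produces the bracket term in the hypocoercivity estimate, and it disappears when $h \equiv 0$.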
Let us now choose $f = \delta_m$ in (5.2). The strategy of the proof is: (a) bound from below the bracket term on the right hand side of (5.2) using Lemmas 5.2-5.6 and the equation that $\delta_m$ satisfies, and (b) bound from above the left hand side of (5.2) using Lemmas 5.2-5.6 and the equation that $\delta_m$ satisfies. Putting the two bounds together one then obtains a bound for $\|\delta_m\|^2_{H^1}$ which, combined with the Poincaré inequality for the measure $\rho^0(p, r)\,dp\,dr$, gives the convergence to zero in (5.1) that we need.
We would like to highlight here that one of the obstacles in putting the lower and upper bounds together, are the order one terms f, L 1 f in the definition of ((f, L 1 f )) and f, Bf in the lower bound (5.2). However, as it turns out, see (5.6), for f = δ m , we actually have that L 1 δ m , δ m − Bδ m , δ m = o( √ m) which then allows us to proceed with the bounds. The rest of the terms are being handled via Lemmas 5.2-5.6. We start with obtaining a lower bound for the bracket term on the right hand side of Let η > 0 to be chosen. By Lemmas 5.2-5.3, recalling that Aδ m = ∇ p δ m and Cδ m = ∇ r δ m and using the generalized Cauchy inequality ab ≤ ηa 2 + 1 4η b 2 we have that where the positive constant K < ∞ may change from line to line but it is always independent of m. Choosing now η = η(m) such that lim m↓0 η(m) = lim m↓0 So, overall we have that for m sufficiently small there isη(m) ↓ 0 as m ↓ 0 such that  Hence, recalling the definition of the inner product ((·, ·)), using (5.3) and rearranging the expression a little bit we have obtained the following bound The next goal is to derive an appropriate upper bound for the left hand side of (5.4). First, we need to obtain the equation that δ m satisfies. By factoring out ρ m (p, r) = ρ 0 (p, r)ρ m (p, r) where ρ 0 (p, r) = ρ OU (p)ρ 0 (r), we obtain the following equation forρ m (p, r): where we recall that h(r) = b(r) − ∇ r log ρ 0 (r). Hence, the equation for δ m (p, By multiplying both sides of (5.5) by δ m and integrating over Y with respect to the measure ρ 0 (p, r)dpdr we then obtain that Hence, using (5.4) and (5.6) we have the following bound Our next goal is to derive upper bounds for the terms T i (δ m ) for i = 1, 2, 3, 4. For better readability, we collect the required bounds in the following lemma, which we also prove in Appendix B.
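The generalized Cauchy inequality invoked in the bounds above follows from expanding a square. In the sketch below $a$ and $b$ denote arbitrary real numbers, unrelated to the constants $a, b, c$ of the modified norm, and $\eta > 0$ is free.

```latex
\begin{align*}
0 \;\le\; \Bigl(\sqrt{\eta}\,a - \frac{b}{2\sqrt{\eta}}\Bigr)^{2}
  = \eta\,a^{2} - ab + \frac{b^{2}}{4\eta}
  \quad\Longrightarrow\quad
  ab \;\le\; \eta\,a^{2} + \frac{1}{4\eta}\,b^{2} .
\end{align*}
```

The freedom in $\eta$ is what allows the choice $\eta = \eta(m)$ in the argument: a small $\eta$ makes the first term absorbable, at the price of a large factor $1/(4\eta)$ on the second.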
Now that we have obtained the desired bounds for the terms T i (δ m ) for i = 1, 2, 3, 4 let us put them together. There are some constants K 1 , K 2 < ∞, and a sequencê η(m) = max{η(m), √ m η(m) } ↓ 0 such that for m sufficiently small Now we choose m small enough such thatη(m) < 1, (η(m) + √ m)K 2 < 1/2. Moreover, we also note that since by construction b 1 we can write for m small enough b(1 + √ m) 1/2. In fact the proof of [30] shows that we can choose a, b, c to be positive but as small as we want, as long we choose the constants a, b, c to be ordered appropriately. Putting these estimates together, we get that there is some constant K 3 < ∞ such that for m small enough, one has In order now to close the estimate we need to use Poincaré inequality. Here we make the assumption that the drift b(r) is such that the invariant measure ρ 0 (p, r)dpdr satisfies the Poincaré inequality with constant κ > 0 . In particular, for a function Q(p, r), we have that the Poincaré inequality in the following form holds Let us set now Q(p, r) = δ m (p, r). Notice that by definition of δ m (p, r) we have Y δ m (p, r)ρ 0 (p, r)dpdr = 0 .
Therefore, we have obtained δ m 2 ≤ κ δ m 2 H 1 . (5.10) Inserting now (5.10) into (5.9), we finally obtain that for m small enough from which the desired result finally follows: This concludes the L 2 (Y; ρ 0 ) convergence of the invariant measures.

Convergence of the solution to the cell problem
The goal of this section is to analyze the cell problem (1.4) satisfied by $\Phi(p, r)$ and to prove Theorem 1.7. As will become clear from the proof below, we prove even more: we rigorously derive an asymptotic expansion of $\Phi(p, r)$ in powers of $\sqrt{m}$. Let us recall our assumption $\alpha(q, r) = 2D\lambda(q)I$. Let $\ell = 1, 2, \dots, d$ be a given direction and let us define where $e_\ell$ is the unit vector in direction $\ell$. Then, bearing in mind (1.4), the equation that $\Psi_\ell(p, r)$ satisfies is given by where we have already defined $A = -p\cdot\nabla_p + D\Delta_p$ and $B = p\cdot\nabla_r + b(q, r)\cdot\nabla_p$.
Hence, we have that Having established the last display, it is easy to see that in order to show (4.2), we basically need to show that or, in other words, it is sufficient to show $\lim_{m\to 0}\|\sqrt{m}\nabla_p\Psi_{\ell,2}(p, r)\|_{L^2(Y;\rho_m)} = 0$, (6.6) and Relation (6.6) holds because $\Psi_{\ell,2}(p, r)$ is the solution to the elliptic problem (6.4) and by Theorem 1.6.
So, it remains to prove (6.7). At this point let us recall that $\Psi^m_{\ell,3}(p, r)$ is the solution to (6.5), i.e., it solves $L^m_q\Psi^m_{\ell,3}(p, r) = -\sqrt{m}B\Psi_{\ell,2}(p, r)$. (6.8) Notice that for the purposes of this section $q$ is viewed as a fixed parameter by the operators, and recall that we have already assumed $\alpha(q, r) = 2D\lambda(q)I$; namely, $D\lambda(q)$ is viewed as a fixed constant. Hence, from now on and for notational convenience, we shall assume without loss of generality that $\alpha(q, r) = 2I$, i.e., that $D = \lambda(q) = 1$. Let us first apply Lemma 5.1 and we get $\le \kappa\|\nabla_p f\|_{L^2(Y;\rho_0)}$, (6.10) for some constant $\kappa > 0$ independent of $m$.
for some constant $C > 0$ independent of $m$. $\left[\nabla_p f(p, r)\cdot\alpha(q, r)\nabla_p g(p, r)\right]\rho_m(p, r)\,dp\,dr$. (6.13) In particular, setting $f = g = \Psi^m_{\ell,3}$ in (6.9), we have that But we also know that $\Psi^m_{\ell,3}(p, r)$ satisfies (6.8). Therefore, multiplying both sides of (6.8) by $\Psi^m_{\ell,3}(p, r)$ and integrating against the invariant density $\rho_m(p, r)$ gives us the or, in other words (6.14) We now have the estimate Applying Lemma 6.3 and Lemma 6.2 and the fact that $\lim$ Thus we have by (6.14) as $m \to 0$. This is (6.7), completing the proof of Theorem 1.7.

A On properties of the solution to the hypoelliptic cell problem
In this section we recall some results on the solution to the hypoelliptic Poisson equation (1.4) from [14]. Since the set-up of the current paper has some differences from the setup in [14], we formulate the results that we need in the current setup, even though we emphasize that the derivation follows basically from [14].
Under the assumptions made in this paper, Theorem 3.3 from [14] guarantees that (1.4) has a smooth solution that does not grow too fast at infinity. In particular, for every $\eta > 0$ we can write $\Phi(p, r) = e^{\frac{\eta}{2}|p|^2}\tilde\Phi(p, r)$, where $\tilde\Phi \in \mathcal{S}$, the Schwartz space of smooth functions with fast decay. Furthermore, as can be derived from the proof of Theorem 3.3 of [14], if we let $\sigma_{\max} = \max_{i,j=1,\dots,d}\sup_{(q,r)}|\sigma_{i,j}(q, r)|$, then for every $\eta \in (0, 2\sigma_{\max}^{-2})$ the solution $\Phi$ is unique (up to additive constants) in the space $L^2\left(Y, e^{-\eta|p|^2}\,dp\,dr\right)$.
Moreover, it is clear that for each fixed $q$, the operator $L^m_q$ defines a hypoelliptic diffusion process on $(p, r) \in Y = \mathbb{R}^d\times\mathbb{T}^d$. Let us denote this process by $(p_\cdot, r_\cdot)$. We recall then the following useful bounds from [14]. Based on these bounds, the computations of [14] reveal that the following bounds for the solution to (1.4) are true. In particular we have that for every $T, p > 0$ there exists a constant $C > 0$ that is independent of $\varepsilon, \delta$ such that Therefore, we obtain $\int_{\mathbb{R}^d} p f(p, r)g(p, r)\rho_{OU}(p)\,dp = \int_{\mathbb{R}^d}\nabla_p\left(f(p, r)g(p, r)\right)\rho_{OU}(p)\,dp = \int_{\mathbb{R}^d}\left(\nabla_p f(p, r)g(p, r) + f(p, r)\nabla_p g(p, r)\right)\rho_{OU}(p)\,dp$.
Multiplying both sides by $h(r)\rho_0(r)$ and integrating over $r \in \mathbb{T}^d$ we then obtain, after using the Hölder inequality, This completes the proof of the lemma.
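The identity $\int p\,fg\,\rho_{OU}\,dp = \int\nabla_p(fg)\,\rho_{OU}\,dp$ used in this proof is Gaussian integration by parts. As a minimal sketch, assume that $\rho_{OU}$ is the standard Gaussian density on $\mathbb{R}^d$ (a normalization we adopt for illustration; for a Gaussian with covariance $D\,I$ a factor $D$ would appear on the right) and that $u = fg$ is smooth with $u\,\rho_{OU}$ vanishing at infinity.

```latex
% Assumption: \rho_{OU}(p) = (2\pi)^{-d/2} e^{-|p|^2/2}, hence \nabla_p\rho_{OU} = -p\,\rho_{OU}.
\begin{align*}
\int_{\mathbb{R}^d} p\,u(p)\,\rho_{OU}(p)\,dp
  &= -\int_{\mathbb{R}^d} u(p)\,\nabla_p\rho_{OU}(p)\,dp \\
  &= \int_{\mathbb{R}^d} \nabla_p u(p)\,\rho_{OU}(p)\,dp ,
  \qquad u = fg ,
\end{align*}
% where the second equality is integration by parts with vanishing boundary terms.
```

The product rule $\nabla_p(fg) = g\,\nabla_p f + f\,\nabla_p g$ then gives the two-term form.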
Similarly, we have Thus, we get Putting the representations of $\mathrm{Term1}_m$ and $\mathrm{Term2}_m$ together, we have in fact obtained Hence, by Lemma 5.2 there exists a constant $K < \infty$ that depends on $\sup_{r\in\mathbb{T}^d}|h(r)|$ such that where we use the generalized Cauchy-Schwarz inequality $ab \le \eta|a|^2 + \frac{1}{4\eta}|b|^2$ for any $\eta \in (0, \infty)$. This concludes the proof of the lemma.
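For completeness, the weighted inequality $ab \le \eta a^2 + \frac{1}{4\eta}b^2$ invoked in these proofs follows from expanding a square. The short derivation below is for real numbers $a, b$ and any $\eta > 0$.

```latex
\begin{align*}
0 \le \Big(\sqrt{\eta}\,a - \frac{b}{2\sqrt{\eta}}\Big)^2
  = \eta a^2 - ab + \frac{b^2}{4\eta}
\quad\Longrightarrow\quad
ab \le \eta a^2 + \frac{1}{4\eta}\,b^2 .
\end{align*}
```

Taking $\eta = 1/2$ recovers the familiar bound $ab \le \frac{1}{2}(a^2 + b^2)$; letting $\eta$ depend on $m$ trades a small coefficient on one term for a large coefficient on the other.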
Proof of Lemma 5.4. Recall that by (5), the equation for $\delta_m(p, r) = \tilde\rho_m(p, r) - 1$ is Let us now multiply the last equation by $\delta_m(p, r)$ and integrate over $Y$ against $\rho_0(p, r)$.
Doing so, we get The next step is to rewrite the term $\langle B\delta_m, \delta_m\rangle_{L^2(Y;\rho_0(p,r))}$. By Lemma 5.3 we have Inserting the latter expression into (B.2) we obtain The next step is to apply Lemma 5.1 with $f(p, r) = g(p, r) = \delta_m(p, r)$ to get
Combining the last two expressions, we obtain and after rearranging, we obtain This concludes the proof of the lemma.
Proof of Lemma 5.5. The proof goes along the same lines as that of Lemma 5.4. We apply $\partial_{p_i}$ to both sides of equation (B.1) and we get the following equation Multiplying both sides of the above equation by $\partial_{p_i}\delta_m$ and integrating with respect to the $L^2(Y; \rho_0)$ inner product we get We apply Lemma 5.1 with $f(p, r) = g(p, r) = \partial_{p_i}\delta_m(p, r)$ to get We now apply Lemma 5.3 and we have Furthermore, we can calculate Thus, we get the identity Making use of Lemma 5.2 and Young's inequality we estimate where $K > 0$ is a constant that depends only on $\sup_{r\in\mathbb{T}^d}|h(r)|$. This implies the lemma.
Proof of Lemma 5.6. The proof again goes along the same lines as that of Lemma 5.4. We apply $\partial_{r_i}$ to both sides of equation (B.1) and we get the following equation Multiplying both sides of the above equation by $\partial_{r_i}\delta_m$ and integrating with respect to the $L^2(Y; \rho_0)$ inner product we get We apply Lemma 5.1 with $f(p, r) = g(p, r) = \partial_{r_i}\delta_m(p, r)$ to get We now apply Lemma 5.3 and we also have Furthermore, we can calculate We can apply a straightforward generalization of Lemma 5.2 with $h(r)$ replaced by $\partial_{r_i}h(r)$, as well as Young's inequality, to estimate the right hand side of the above equation by Thus we get the identity Making use of Lemma 5.2 and Young's inequality again we estimate where $K > 0$ is a constant that depends only on $\sup_{r\in\mathbb{T}^d}\max(|h(r)|, |\nabla_r h(r)|)$. This implies the lemma.
Proof of Lemma 5.7. We start with $T_1(\delta_m)$. By Lemma 5.3 with $f = \delta_m$ we have Thus, by Lemma 5.2 with $f = g = \delta_m$ we have the following bound Next we derive an upper bound for $T_2(\delta_m) = \langle A\delta_m, AL^1\delta_m\rangle$. For this purpose we first notice that where in the last inequality we used Lemma 5.3. Then, using the equation for $\delta_m$, (5.6) and Lemma 5.2 we have The next step is to use Lemma 5.5. Doing so we get the bound Next we derive an upper bound for $T_4(\delta_m) = \langle C\delta_m, CL^1\delta_m\rangle$. For this purpose we first notice that where in the last inequality we used Lemma 5.3. Then, using the equation for $\delta_m$ Using Lemma 5.2 we subsequently obtain $K\left[\eta\|\nabla_r\delta_m\|^2 + \frac{1}{4\eta}\|\nabla_p\nabla_r\delta_m\|^2\right] + (1 + \sqrt{m})\left[\eta\|\nabla_r\delta_m\|^2 + \frac{1}{4\eta}K\|\nabla_p\delta_m\|^2\right] + \sqrt{m}K\left[\|\nabla_r\delta_m\|^2 + \|\nabla_p\delta_m\|^2 + \|\nabla_p\nabla_r\delta_m\|^2 + \|\delta_m\|^2\right]$.
The constant K may change from line to line, but it is always independent of m.
Next we bound terms from above. Using Lemma 5.2, we have for $\eta > 0$ The constant $K$ may change from line to line. Using Lemmas 5.5 and 5.6 we obtain Applying then Lemma 5.4 to estimate the term $\|\nabla_p\delta_m\|^2$ on the fourth line of the last display, we obtain the following bound This concludes the proof of the lemma.

C Proofs of Lemmas in Section 6.
Proof of Lemma 6.1. This can be shown by using Theorem 4.2.5 in [1]. Let $(P_t)_{t\ge 0}$ be the Markov semigroup corresponding to the generator $L^1$ on $Y$. By Lemma 5.1 with $m = 1$, we obtain for the first term (recall that $\rho_0(p, r)\,dp\,dr$ is the invariant measure corresponding to the operator $L^1$) that the Dirichlet form associated with $(P_t)_{t\ge 0}$ can be calculated as follows Thus by Theorem 4.2.5 of [1] the validity of the Poincaré inequality is equivalent to exponential convergence to equilibrium of the semigroup $(P_t)_{t\ge 0}$: $\int_Y\left(P_t f - \int_Y (P_t f)\rho_0(p, r)\,dp\,dr\right)^2\rho_0(p, r)\,dp\,dr \le c(f)e^{-2t/\kappa}$ for some constant $\kappa > 0$. The above inequality is true since $L^1_q$ admits a spectral gap (see [10]).
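One direction of the equivalence quoted from [1], from a Poincaré inequality to exponential decay of the variance, follows from a short Grönwall argument. The sketch below is written for a generic Markov semigroup $(P_t)$ with generator $L$, invariant probability measure $\mu$ and Dirichlet form $\mathcal{E}(g) = -\int g\,Lg\,d\mu$. These symbols, and the hypothesis $\int g^2\,d\mu \le \kappa\,\mathcal{E}(g)$ for mean-zero $g$, are our illustrative assumptions and not the notation of [1].

```latex
% Assumption: g_t := P_t f - \int f\,d\mu, so \int g_t\,d\mu = 0 by invariance of \mu,
% and \partial_t g_t = L g_t because L annihilates constants.
\begin{align*}
\frac{d}{dt}\int g_t^2\,d\mu
  &= 2\int g_t\,L g_t\,d\mu
   = -2\,\mathcal{E}(g_t)
   \le -\frac{2}{\kappa}\int g_t^2\,d\mu , \\
\int g_t^2\,d\mu
  &\le e^{-2t/\kappa}\int g_0^2\,d\mu
  \qquad \text{(Gr\"onwall's inequality)} .
\end{align*}
```

The rate $2/\kappa$ obtained this way matches the exponent $e^{-2t/\kappa}$ appearing in the decay estimate.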
Proof of Lemma 6.2. We make use of equations (6.8) and (6.9) as well as Lemma 5.2, and we get for some constant $K > 0$ independent of $m$.
This proves the lemma.
Proof of Lemma 6.3. Let us write $\Psi$ in place of $\Psi^m_{\ell,3}$ for simplicity of notation. We set $f = \Psi^2$ and we look for the equation that $f$ satisfies: Using the equation ( This gives Making use of (C.3), (C.4), the fact that $f \ge 0$ and Lemma 5.2 we get, for some constant $K > 0$ independent of $m$ that may vary from line to line, Now we apply Lemma 6.1 and we see that for some $\kappa > 0$ we have In the last step we used the fact that $f = \Psi^2$. Now we apply Lemma 6.2 and we see that $\|\Psi\|^2_{L^2(Y;\rho_0)} \le Km^3$ for some constant $K > 0$ independent of $m$. Thus we see that $\|f\|_{L^2(Y;\rho_0)} \le K\left[\|\nabla_p f\|_{L^2(Y;\rho_0)} + m^3\right]$. (C.6) Combining (C.5) and (C.6) we see that This gives $\lim_{m\to 0}\|\nabla_p f\|^2_{L^2(Y;\rho_0)} = 0$. Applying (C.6) again, we see that the claim of the lemma follows.
Proof of Lemma 6.4. The proof of this lemma is completely analogous to that of Lemma 5.1 and thus it is omitted.