Non-equilibrium steady state and subgeometric ergodicity for a chain of three coupled rotors

We consider a chain of three rotors (rotators) whose ends are coupled to stochastic heat baths. The temperatures of the two baths can be different, and we allow some constant torque to be applied at each end of the chain. Under some non-degeneracy condition on the interaction potentials, we show that the process admits a unique invariant probability measure, and that it is ergodic with a stretched exponential rate. The interesting issue is to estimate the rate at which the energy of the middle rotor decreases. As it is not directly connected to the heat baths, its energy can only be dissipated through the two outer rotors. But when the middle rotor spins very rapidly, it fails to interact effectively with its neighbors due to the rapid oscillations of the forces. By averaging techniques, we obtain an effective dynamics for the middle rotor, which then enables us to find a Lyapunov function. This and an irreducibility argument give the desired result. We finally illustrate numerically some properties of the non-equilibrium steady state.


Introduction
Hamiltonian chains of mechanical oscillators have been studied for a long time. Several models describe a linear chain of masses, with polynomial interaction potentials between adjacent masses, and pinning potentials which tie the masses down in the laboratory frame. Under the assumption that the interaction is stronger than the pinning, it was shown in [5] that the model has an invariant probability measure when the chain is attached at each extremity to two heat baths at different temperatures. That paper, and later developments, see e.g., [3], relied on analytic arguments, showing in particular that the infinitesimal generator has compact resolvent in a suitable function space.
Two elements were added later in the paper [12]: First, the authors used a more probabilistic approach, based on Harris recurrence as developed by Meyn and Tweedie [10]. Second, a detailed analysis allowed them to understand the transfer of energy from the central oscillators to the (dissipative) baths. In that case the convergence to the stationary state is of exponential rate. In [1], this reasoning was extended to more general contexts.
The dynamics of the chain is very different when the pinning potential is stronger than the interaction potential. In that case the chain may have breathers, i.e., oscillators concentrating a lot of energy, which is transferred only very slowly to their neighbors. This may lead to subexponential ergodicity, as shown by Hairer and Mattingly [7] in the case of a chain of 3 oscillators with strong pinning.
In this paper, we discuss a model with three rotors (see Figure 1), each given by an angle q i ∈ T = R/2πZ and a momentum p i ∈ R, i = 1, 2, 3. The phase space is therefore Ω = T 3 × R 3 , and we will consider the measure space (Ω, B), where B is the Borel σ-field over Ω. We will denote the points of Ω by x = (q, p) with q = (q 1 , q 2 , q 3 ) and p = (p 1 , p 2 , p 3 ). Figure 1 -A chain of three rotors with two external torques τ1 and τ3 and two heat baths at temperatures T1 and T3.
We introduce the Hamiltonian with some smooth interaction potentials W b : T → R, b = 1, 3, and some smooth pinning potentials U i : T → R, i = 1, 2, 3. We now let the two outer rotors (i.e., the rotors 1 and 3) interact with Langevin-type heat baths at temperatures T 1 , T 3 > 0, and with coupling constants γ 1 , γ 3 > 0. Moreover, we apply some constant (possibly zero) external forces τ 1 and τ 3 to the two outer rotors. Introducing w b = W b and u i = U i , we obtain the system of SDE: where B 1 and B 3 are standard independent Brownian motions.
Notation. In the sequel, the index b always refers to the rotors 1 and 3 at the boundaries of the chain, and we write b instead of b=1,3 .
Remark 1.1. Our model can be viewed as an extreme case of that studied in [7]. A key factor in that paper is to realize how the frequency of one isolated pinned oscillator depends on its energy. Indeed, for an isolated oscillator with Hamiltonian p 2 /2 + q 2k /(2k), the frequency grows like the energy to the power 1 2 − 1 2k . When k → ∞, the exponent converges to 1 2 . In this limit, the pinning potential formally becomes an infinite potential well, so that the variable q is constrained to a compact interval. In our model, the position (angle) of a rotor lives in a compact space, and its frequency scales like its momentum, i.e., like the square root of its energy. Therefore, we can view our rotor model as some kind of "infinite pinning" limit.
We make the following non-degeneracy assumption (clearly satisfied for e.g., w 1 = w 3 = sin): Assumption 1.2. There is at least one b ∈ {1, 3} such that for each s ∈ T, at least one of the derivatives w (k) b (s), k ≥ 1 is non-zero. For all initial conditions x ∈ Ω and all times t ≥ 0, we denote by P t (x, · ) the transition probability of the Markov process associated to (1.1). Since the coefficients of the SDE (1.1) are globally Lipschitz, the solutions are almost surely defined for all times and all initial conditions, so that P t (x, · ) is well-defined as a probability measure on (Ω, B).
We now introduce the main theorem, in which we denote by ν TV the total variation norm sup A∈B ν(A) − inf A∈B ν(A) of a signed measure ν on (Ω, B). (i) The transition kernel P t has a density p t (x, y) in C ∞ ((0, ∞) × Ω × Ω).
(ii) The process admits a unique invariant measure π, which has a smooth density. (iii) There are constants C, λ, β > 0 such that for all x = (q 1 , q 2 , . . . , p 3 ) ∈ Ω, 2 )e βH(x) e −λt 1/2 . This result follows immediately from Theorem 2.1, Proposition 2.2, and Proposition 5.1, as will be discussed in the next sections. Remark 1.4. If both heat baths have the same temperature, say, T 1 = T 3 = T > 0, and the forces τ 1 and τ 3 are zero, then the system is at thermal equilibrium and the Gibbs measure with density proportional to e −H/T is invariant. Indeed, one easily checks that this density verifies the stationary Fokker-Planck equation L * e −H/T = 0, where L * is the formal adjoint of the generator L introduced below.
Remark 1.5. In fact, the results we prove here apply with hardly any modification to the "star" configuration with one central rotor interacting with m external rotors, which in turn are coupled to heat baths (i.e., m + 1 rotors and m heat baths).
In addition, some studies (e.g., [9]) consider chains with fixed boundary conditions. For the left end of the chain, this corresponds to adding a "dummy" rotor 0 which does not move but interacts with rotor 1. This is covered by our theory by adding some contribution to the pinning potential U 1 . The same applies to the right end and U 3 .
Chains of rotors provide toy models for the study of non-equilibrium statistical mechanics. In [9] long chains have been studied numerically, and it appears that even when the external temperatures are different and external forces are applied, local thermal equilibrium is satisfied in the stationary state in the limit of infinitely long chains. This stationary state may have some surprising features, like a large amount of energy in the bulk of the chain when the boundary conditions are properly chosen. In our case of course we are far from local thermal equilibrium, since we only study systems made of three rotors. We will present some numerical simulations of our system in §6, highlighting some interesting properties of the stationary state.
What corresponds here to the breathers observed in other models is the situation where the energy of the system is very large and mostly concentrated in the middle rotor. The middle rotor then spins very rapidly, and the interaction forces oscillate so fast that they have very little net effect. In this case, the middle rotor effectively decouples from the rest of the system, and the main difficulty is to show that its energy eventually decreases with some well-controlled bounds.
The idea used in [7] for the chain of three pinned oscillators is to average the oscillatory forces, and exhibit a negative feedback in the regime where the breather dominates the dynamics. The proof of Theorem 1.3 in the present paper is based on a systematization of this idea, as explained in §3. 4.
The paper is structured as follows: In §2 we introduce a sufficient condition for subgeometric ergodicity from [2]. In §3 we study the behavior of the middle rotor. In §4 we show how to use the study of p 2 to get a Lyapunov function. In §5 we provide the necessary technical input to the theorem of [2]. Finally, we illustrate numerically some properties of the non-equilibrium steady state in §6.

Ergodicity and Lyapunov function
The proof of Theorem 1.3 relies on the results of [2] which in turn are based on the celebrated theory exposed in [10]. The theory of [10] shows that one can prove the ergodicity of an irreducible Markov process and estimate the rate of convergence toward its invariant measure if one has a good control of the return times of the process to particular sets, called petite sets. A set K is petite if there exist a probability measure a on [0, ∞) and a non-zero measure ν a on Ω such that for all x ∈ K one has ∞ 0 P t (x, · )a(dt) ≥ ν a ( · ). In the case we are interested in, control arguments and the hypoellipticity of the generator imply that each compact set is petite (see §5.1 for a proof of this property).
Let L be the infinitesimal generator of the process, i.e., the second-order differential operator Recall that for any sufficiently regular function f we have Lf (x) = d dt f (y)P t (x, dy) t=0 . A classical way to control the return times to a petite set is to make use of Lyapunov functions. We call Lyapunov function a smooth function V : Ω → [1, ∞) with compact level sets (i.e., due to the structure of Ω, a function such that V (q, p) → ∞ when p → ∞) such that for all x ∈ Ω, is an increasing function, and K is a petite set. If one can find such a function, and prove that some skeleton P ∆ (∆ > 0) is µ-irreducible for some measure µ (i.e., µ(A) > 0 implies that for all x ∈ Ω there exists k ∈ N such that P k∆ (x, A) > 0), then the Markov process is ergodic, with rate depending on ϕ. In the common case where ϕ(V ) ∝ V , the convergence is geometric if = 1 and polynomial if < 1 (see [2,11]). In this paper, we obtain We rely on the work of Douc, Fort and Guillin [2], which gives a sufficient condition for subgeometric ergodicity of continuous-time Markov processes. We give here a simplified version of their result, adapted to our purpose. This statement is based on Theorem 3.2 and Theorem 3.4 of [2]. Theorem 2.1 (Douc-Fort-Guillin (2009)). Assume that some skeleton chain is irreducible and that there exist a smooth function V : Ω → [1, ∞) with V (q, p) → ∞ when p → ∞, an increasing differentiable concave function ϕ : [1, ∞) → (0, ∞), a petite set K, and a constant C such that (2.1) holds. Then the process admits a unique invariant measure π, and there exists a constant C such that ds ϕ(s) . The core of the paper is devoted to the construction of a Lyapunov function such that (2.1) is satisfied with ϕ(s) ∼ s/ log s, and a set K which is compact and therefore petite. This yields a stretched exponential convergence rate (see Lemma 2.3 below). The existence of an irreducible skeleton required by Theorem 2.1 and the fact that every compact set is petite are proved in §5.
One might at first think that a Lyapunov function is simply given by the Hamiltonian H. Unfortunately, this is not the case, as where the right-hand side remains positive when p 1 , p 3 are small and p 2 → ∞. Thus, there is no bound of the form (2.1) for H. The same problem occurs if we take any function of the energy f (H).
In order to find a bona fide Lyapunov function, we will need more insight into how fast all three momenta decrease. The equality (2.2) suggests that p 1 and p 3 will not cause any problem. In fact, we have for b = 1, 3, that essentially decays at exponential rate when it is large. This is of course due to the friction terms that act on p 1 and p 3 directly. Such a result does not hold for p 2 . In fact, the decay of p 2 is much slower. Our main insight is that in a sense The proof of such a relation occupies a major part of this paper. As indicated earlier, this very slow damping of p 2 comes from the lack of effective interaction when the forces oscillate very rapidly. Once we have gained enough understanding of the dynamics of p 2 , we will be able to construct a Lyapunov function, whose properties are summarized in There is a function V : Ω → [1, ∞) that satisfies the two following properties: 1. There are constants β, c 1 , c 2 such that There are positive constants c 3 , c 4 and a compact set K such that Lemma 2.3. In the notation of Theorem 2.1, the function ϕ defined in (2.3) yields a convergence rate for some c, λ > 0.
Proof. A simple calculation shows that H ϕ (u) = 1 . Clearly the desired bound holds for λ < √ 2c 4 and c large enough.

Effective dynamics for the middle rotor
The hardest and most interesting part of the problem is to determine how p 2 decreases when it is very large. 2 In this section, we obtain some asymptotic, effective dynamics for p 2 when p 2 → ∞.

Expected rate
Before we start making any proof, we can get a hint of how p 2 decreases in the regime where p 2 is very large and both p 1 , p 3 are small. Assume for simplicity that u i ≡ 0 and that W b (s) = −κ cos(s) so that w b (s) = κ sin(s). In the regime of interest, we expect the middle rotor to decouple, so that p 2 will evolve very slowly. We will consider the system over times that are small enough for p 2 to remain almost constant (say equal to ω), but large enough for some "quasi-stationary" regime to be reached. The reader can think of ω as being the "initial" value of p 2 . For b = 1, 3, we expect p b to be well approximated, at least qualitatively, by the equation 1 The 2 in the denominator ensures that ϕ is concave and increasing on [1, ∞), as required in Theorem 2.1. 2 To simplify notation, we say p2 is large, but we always really mean that |p2| is large.
We have neglected the exponentially decaying part p b (0)e −γ b t since we assume that a quasi-stationary regime is reached. By (2.2), the rate of energy flowing into of the system at b is

Squaring p b and taking expectations, what remains is
Neglecting again an exponentially decaying term, we obtain Since cos 2 (ωt) oscillates very rapidly around its average 1/2, we expect to see an effective contribution − γκ 2 2ω 2 . This approximation was obtained by assuming that p 2 is almost constant and equal to ω. Now, when p 2 is very large, the energy H is dominated by the contribution 1 2 p 2 2 , so that we expect to have We will obtain this result rigorously in Proposition 3.4.

Notations
Let Ω † = {(q, p) ∈ Ω : p 2 = 0}. We denote throughout X t = (q(t), p(t)) the solution of the stochastic differential equation (1.1) with initial condition X 0 = (q(0), p(0)). For now, we restrict ourselves to X 0 ∈ Ω † since we aim to obtain an effective dynamics for the middle rotor by performing an expansion in negative powers of p 2 . Remark that since d dt p 2 is bounded, there is for each initial condition X 0 ∈ Ω † a deterministic time t * > 0 (proportional to |p 2 (0)|) such that X t ∈ Ω † for all t ∈ [0, t * ) and all realizations of the random noises. To define a smooth Lyapunov function on the whole space Ω we will perform a regularization in §4.
Definition 3.1. We let U be the set of stochastic processes u t which are solutions of an SDE of the form for some functions f i : Ω → R.
Notation: In the sequel, we write For any smooth function h on Ω, the stochastic process h(X t ) is in U by the Itō formula (see below). Without further mention, we will both see h as a function on Ω and as the stochastic process h(X t ). When referring to the stochastic process, we shall write simply dh instead of dh(X t ). Of course, only very few processes in U can be written in the form h(X t ) for some function h on Ω.
The variables p 2 and q 2 will play a special role, as we are merely interested in the regime where p 2 is very large. For any function f over Ω we call the quantity the q 2 -average of f (or simply the average of f ), which is a function of p, q 1 and q 3 only.
Assumption 3.2. We assume This assumption merely fixes the additive constants of the potentials and therefore results in no loss of generality.
For conciseness, we shall omit the arguments of the potentials and forces, always assuming that To simplify the notations, we also introduce the potentials Φ 1 , Φ 2 , Φ 3 associated to the three rotors, and the corresponding forces ϕ 1 , ϕ 2 , ϕ 3 defined by Of course, Φ i and ϕ i are functions of q only. With these notations the dynamics reads more concisely We will mainly deal with functions of the form p 2 p n 1 p m 3 g(q) and their linear combinations. We therefore introduce the notion of degree. Definition 3.3. We say that a function f on Ω † has degree ∈ Z if it can be written as a finite sum of elements of the kind p 2 p n 1 p m 3 g(q) for some n, m ∈ N and a smooth function g : a generic expression of order at most (which can vary from line to line), i.e., a finite sum of functions of degree , − 1, − 2,. . . .
We have by the Itō formula that for any smooth function f on Ω (3.4) (By the discussion following Definition 3.1, f , its partial derivatives, p 2 and the functions ϕ i in this SDE are evaluated on the trajectory X t .) Observe that when acting on a function of degree , the contribution d + increases the degree of p 2 by one, while d 0 and d − respectively leave it unchanged and decrease it by one. In this sense, we will see d + as the "dominant" part of d.

General idea
In this section we introduce the main idea, which consists in successively removing oscillatory terms order by order in the dynamics of p 2 . We perform here the first step of the method in a somewhat naive, but pedestrian way. In the next two sections, we systematize the method and apply it. We begin by looking at the equation When p 2 is large while p 1 and p 3 are small, the right-hand side is highly oscillatory and its timeaverage is almost zero, since ϕ 2 = 0. We will proceed to a change of variable in order to "see through" this oscillatory term. We first make the relation between the time-average and the q 2 -average more precise. Consider some function g on Ω. In the regime where p 2 is very large and p 1 , p 3 are small, the only fast variable is q 2 . Now consider some interval of time [0, T ] short enough so that the other variables do not change significantly, but still large enough for q 2 to swipe through [0, 2π) many times. We have in that case so that the time-average of g is expected to be very close to the q 2 -average g . Now, we want to estimate p 2 (t) = t 0 ϕ 2 (q(s))ds in this situation. Approximating ϕ 2 as in (3.6) and integrating formally with respect to time (remember that ϕ 2 = −∂ q 2 Φ 2 ) leads naturally to the decomposition which consists in writing p 2 as sum of an oscillatory term Φ 2 /p 2 which is supposed to capture "most" of the oscillatory dynamics, and some (hopefully) nicely behaved "slow" processp 2 . And indeed, if we differentiate (3.7) we get As a result, we have a new processp 2 which is asymptotically equal to p 2 in the regime of interest, and whose dynamics involves only terms that are small when p 2 is large, so thatp 2 is indeed a slow variable. Observe that the choice of adding Φ 2 /p 2 to p 2 has the effect that d + (Φ 2 /p 2 ) = −ϕ 2 dt, which precisely cancels the right-hand side of (3.5) while the remaining terms have negative powers of p 2 . This observation is the starting point of the systematization of the method. Unfortunately, (3.8) is not good enough to understand howp 2 (and therefore p 2 ) decreases in the long run, since the dynamics (3.8) ofp 2 still involves oscillatory terms. The idea is therefore to eliminate these oscillatory terms by absorbing them into a further change of variablep 2 =p 2 + G for some suitably chosen G. The result is that dp 2 is a sum of terms of degree −2 at most, which turn out to be still oscillatory. This procedure must then be iterated, successively eliminating oscillatory terms order by order, until we get some dynamics that has a non-zero average (which happens after finitely many steps). We will follow this idea, but in a way that does not require to write the successive changes of variable explicitly. More precisely, we will prove (By Assumption 3.2, no arbitrary additive constant appears in W 2 1 and W 2 3 .) The next two sections are devoted to proving Proposition 3.4.

Averaging
The crux of our analysis is to average oscillatory terms in the dynamics. This a well known problem in differential equations. In classical averaging theory [13,15], it is an external small parameter ε that gives the time scale of the fast variables. Here, the role of ε is played by 1/p 2 , which is a dynamical variable. We develop an averaging theory adapted to this case, and also to the stochastic nature of the problem.
The starting point is as follows. Imagine that for a function h on Ω we find an expression of the kind for some function f = f (X t ) of degree and some stochastic process r t ∈ U (see Definition 3.1) which denotes the part of the dynamics that we do not want to interfere with. Thinking of f (X t ) as a highly oscillatory quantity when p 2 is very large, we would like to write h =h + F for some small function F on Ω such that dh = f dt + dr t + small corrections , (3.11) where the notion of small will be made precise in terms of powers of p 2 . That is, we want to find somē h close to h, such that its dynamics involves, instead of f dt, the q 2 -average f dt plus some smaller corrections. In other words, we are looking for some F such that Remembering that in terms of powers of p 2 , d + is the dominant part of d, the key is to find some F such that Thus, we really need to invert L + (which is in fact the dominant part of the generator L when p 2 is large). We call here K the space of smooth functions Ω † → R, and we denote by K 0 the space of functions f ∈ K such that f = 0. Note that L + maps K to K 0 since for all f ∈ K, we have by periodicity We can define a right inverse (L + ) −1 : K 0 → K 0 by letting for all g ∈ K 0 where the integration "constant" c(p, q 1 , q 3 ) is uniquely defined by requiring that (L + ) −1 g = 0.
This leads naturally to the following Definition 3.5. For any function f ∈ K, we define the operator Q : K → K 0 by Remark 3.6.
• If f is a function of degree , then Qf is of degree − 1.
• By construction, • Moreover, by definition, Qf is the only function such that Therefore, if (3.10) holds for some f of degree , then we obtain a quantitative expression for (3.11), namely where the corrections are small in the sense that Qf , d 0 (Qf ) and d − (Qf ) have degree respectively − 1, − 1 and − 2.
Remark 3.7. Observe that (3.7) can be written now as p 2 =p 2 + Qϕ 2 , since Qϕ 2 = −Φ 2 /p 2 . Thus, the "naive" correction we added in (3.7) also follows from the systematic method we have just introduced. This is no surprise: the naive correction in (3.7) was motivated by the approximation (3.6) in which only q 2 moves, which corresponds to considering only d + .
Remark 3.8. Our averaging procedure is inspired by techniques of [7]. There, the equivalent of L + is the generator −q 2k−1 2 ∂ p 2 + p 2 ∂ q 2 of the free dynamics of the middle oscillator, where q 2k 2 /(2k) is the pinning potential. In their case, one cannot explicitly invert L + , but one can show that (L + ) −1 basically acts as a division by E , where E 2 is the energy of the middle oscillator. Again, taking formally the limit k → ∞, one obtains that (L + ) −1 acts as a division by √ E 2 , much like in our case where (L + ) −1 acts as a division by p 2 ∼ √ E 2 .
We now restate our averaging method as the following lemma, which follows from a trivial rearrangement of the terms in (3.12). Lemma 3.9. (Averaging lemma) Consider some function f =Ô(p 2 ) for some ∈ Z. Then We now prove Proposition 3.4 by using Lemma 3.9 repeatedly.

Proof of Proposition 3.4
We make the following observations, which we will use without reference. For any function f on Ω † that is smooth in q 2 , we have by periodicity (3.14) Moreover, if g is another such function, then we can integrate by parts to obtain Furthermore, we have by Assumption 3.2, (3.3) and (3.14) that We start by doing again the first step, which we did in §3.3, but this time using the new toolset. In order to average the right-hand side of dp 2 = ϕ 2 dt , we use Lemma 3.9 with f = ϕ 2 , which is of order 0. We have f = 0 and Qf = −Φ 2 /p 2 (by definition of ϕ 2 and Φ 2 ). We obtain This is exactly what we found in (3.8). We deal next with the terms −p b w b /p 2 dt in (3.15). Using Lemma 3.9 with f = p b w b /p 2 (and therefore with Qf where here and in the sequel, we denote by dÔ(p k 2 ) any generic expression of the kind dw(X t ) for some function w =Ô(p k 2 ) on Ω. Here dÔ p −2 . Substituting (3.16) into (3.15) leads to dp 2 = I dt + J dt + 1 We next deal with the terms I dt and J dt.
Since I is of order −2 and I = 0, we find that QI is of order −3 and thus d − (QI) =Ô p −4 2 dt. Applying Lemma 3.9 with f = I, we find (3.18) Using that QI = 0, the definition (3.4) of d 0 leads, upon inspection, to where E is a sum of terms of order −3 and E = 0. Applying Lemma 3.9 to w b ∂ p b (QI)dt and E dt, Using the definition of w b , integrating by parts once and using (3.13), we have for b = 1, 3, where again we have used that (3.19) and then the result into (3.18) we finally get We next deal with the term J dt of (3.17). First, by Lemma 3.9,

(3.22)
Unfortunately, J = 0, 3 and we will need some more subtle identifications. Integrating by parts, we have (3.23) On the other hand, since p −3 2 Φ 2 2 does not depend on q 2 , we find d  .

This together with (3.17) and (3.21) finally show that
, which implies (3.9) and completes the proof of Proposition 3.4.
Remark 3.10. We can argue (in a nonrigorous way) that when |p 2 | is very large, the dynamics ofp 2 is approximately that of a particle interacting with two "effective" heat baths at temperatures T 1 and T 3 , but with some coupling of magnitude p −4 2 . Indeed, we can write (3.9) in the canonical "Langevin" form dp . We would like to introduce an effective temperatureT b by some Einstein-Smoluchowski relation of which instead of a constant is an oscillatory quantity (with mean T b ). Now observe that these oscillations disappear if we approximate the oscillatory term W b in σ b by its quadratic mean W 2 b 1/2 . This approximation is reasonable in the following sense: for small t and large |p 2 |, we have that

But then, by the Dambis-Dubins-Schwarz representation theorem, there is another Brownian motioñ
Clearly, when |p 2 | is very large, τ (t) ≈ t so that M (t) is very close toB b t . In this sense, when |p 2 | → ∞, it is reasonable to approximate Remark 3.11. The ergodicity of 1D Langevin processes is well understood: for any δ ∈ (−1, 0), processes satisfying an SDE of the kind dp ∼ −C 1 p δ dt + C 2 dB t asymptotically (when |p| → ∞) are typically ergodic with a rate bounded above and below by exp(−c ± t (1+δ)/(1−δ) ) for some constants c + , c − > 0 (see [2,6] and references therein, in particular [6] for the lower bound). As argued in Remark 3.10, the variablep 2 (which is expected to be the component of the system that limits the convergence rate) essentially obeys an equation of the kind dp ∼ −C 1 p −3 dt + C 2 p −2 dB t asymptotically. It is easy to check that a change of variable y = p 1/3 yields the asymptotic dynamics dy ∼ −C 1 y −1/3 dt + C 2 dB t so that with δ = 1/3, we expect a rate exp(−ct 1/2 ). This suggests that the rate of convergence we find is optimal.

Lyapunov function
We now prove Proposition 2.2. Throughout this section,p 2 is the function defined in Proposition 3.4. The basic idea is to consider a Lyapunov function where (p) is non-zero only when |p 2 | is much larger than |p 1 | and |p 3 |. We will obtain that LV −ϕ(V ), with ϕ(s) ∼ s/ log(s) as in Proposition 2.2. The fact that we do not have a bound of the kind LV −cV (which would yield exponential ergodicity) comes from the very slow decay of p 2 . The basic idea is that, when p 2 → ∞ and p 1 , p 3 ∼ 0, We now introduce the necessary tools to make this observation rigorous.
Lemma 4.1. For β > 0 small enough, there are constants C 1 , C 2 > 0 such that , and p k 2 = p k 2 +Ô(p k−1 2 ) for all k, we find after taking the expectation value which gives the desired bound if β is small enough (recall that the W 2 b are bounded).

Convention:
We fix β > 0 small enough so that the conclusions of Lemma 4.1 and Lemma 4.2 hold.
Let k ≥ 1 be an integer and R > 0 be a constant (which we will fix later). We split Ω into three disjoint sets Ω 1 , Ω 2 , Ω 3 defined by Fix some m, n ∈ N and ≥ 1.

Convention:
We fix k and R such that the conclusions of Lemma 4.3 hold. By construction (p) is smooth, (p) = 0 on Ω 1 and (p) = 1 on Ω 3 , with some transition on Ω 2 . Remember thatp 2 is by construction smooth on Ω † , i.e., when p 2 = 0. In particular, since Ω 2 ∪ Ω 3 ⊂ Ω † , the function (p)p 2 2 e β 2p 2 2 is smooth on Ω, and so is V . We can now finally give the Proof of Proposition 2.2. We show here that V satisfies the conditions enumerated in Proposition 2.2 if A is large enough. Let us first prove the first statement, which is that there exist c 1 , c 2 > 0 such that 3 ) e βH ≤ c(p 2 2 + 2C 4 p 2 + C 2 4 )e βH ≤ c(1 + p 2 2 )e βH . But then V ≤ 1 + c(1 + p 2 2 )e βH ≤ c(1 + p 2 2 )e βH , where the last inequality holds because H is bounded below, so that e βH is bounded away from zero.
• On Ω 2 , the key is to observe that there is a polynomial z(p 1 , p 2 , p 3 ) such that 3 ) e βH , where the second inequality comes from (4.4). Now, since p 2 1 + p 2 2 ∼ |p 2 | 1/k on Ω 2 , we have that z(p)e − β 2 (p 2 1 +p 2 3 ) is bounded on Ω 2 . Therefore, by this and Lemma 4.1, we have on Ω 2 , which, as in the previous case, implies that (4.7) holds on Ω 2 if M is large enough. On the set {x ∈ Ω 3 : C 1 − C 2 (p 2 1 + p 2 3 ) ≤ −1}, we simply have LV ≤ −e βH , so that (4.7) holds trivially. On the other hand, on the set {x ∈ Ω 3 : C 1 − C 2 (p 2 1 + p 2 3 ) > −1} the quantity p 2 1 + p 2 3 is bounded, so that e β 2p 2 2 ≥ ce βH by (4.4), which with (4.8) implies that LV ≤ (C 1 − C 2 (p 2 1 + p 2 3 ))e βH − cAe βH ≤ (C 1 − cA)e βH . By making A large enough, we again find a bound LV ≤ −ce βH , so that (4.7) holds. Therefore, (4.7) holds on all of Ω. To obtain (4.6), we need only show that e βH ≥ cV /(2 + log V ). By the boundedness of the potentials and the definition of V , we have 1 + p 2 2 ≤ 2H + c ≤ c log(e βH ) + c ≤ c log V + c ≤ c(log V + 2). But then by (4.5) we indeed have that e βH ≥ cV /(1 + p 2 2 ) ≥ cV /(2 + log V ). This completes the proof of Proposition 2.2. to be strictly positive. Remark 4.6. Although we assume throughout that T 1 and T 3 are strictly positive, the computations that lead to the Lyapunov function apply to zero temperatures as well (the temperatures only appear in some non-dominating terms in V and LV ). In that case, the existence of an invariant measure can still be obtained by compactness arguments (see e.g., Proposition 5.1 of [7]). However, the smoothness, uniqueness and convergence assertions do not necessarily hold: when T 1 = T 3 = 0 the system is deterministic, the transition probabilities are not smooth, and there is at least one invariant measure concentrated at each stationary point of the system. The positive temperatures assumption is crucial in the next section.

Smoothness and irreducibility
This section is devoted to proving that the hypotheses of Theorem 2.1 other than the existence of the Lyapunov function are satisfied. More precisely we will prove the following proposition.
Proposition 5.1. The following properties hold.
(i) The transition probabilities P t (x, · ) have a density p t (x, y) that is smooth in (t, x, y) when t > 0. In particular, the process is strong Feller. (ii) The time-1 skeleton (X n ) n=0,1,2,··· is irreducible, and the Lebesgue measure m on (Ω, B) is a maximal irreducibility measure. (iii) Every compact set is petite.
In a sense, (i) shows that we have some effective diffusion in all directions at very short times, and (ii) shows that every part of the phase space is eventually reached with positive probability. Observe that (iii) follows from (i) and (ii). Indeed, by (i), (ii) and Proposition 6.2.8 of [10], every compact set is petite for the time-1 skeleton. But then every compact set is also petite with respect to the process X t (simply by choosing a sampling measure on [0, ∞) that is concentrated on N). Therefore, we need only prove (i) and (ii), which we do in the next two subsections.

Smoothness
We show here that the semigroup has a smoothing effect. More specifically, we show that a Hörmander bracket condition is satisfied, so that the transition probability P t (x, dy) has a density p t (x, y) that is smooth in t, x and y, and every invariant measure has a smooth density [8].
We identify vector fields over Ω and the corresponding first-order differential operators in the usual way (we identify the tangent space of Ω with R 6 ). This enables us to consider Lie algebras of vector fields over Ω of the kind i (f i (q, p)∂ q i + g i (q, p)∂ p i ), where the Lie bracket [ · , · ] is the usual commutator of two operators.
is the drift part of L.
By the definition of a Lie algebra, M is closed under linear combinations and Lie brackets. Proof. By definition, the constant vector fields ∂ p 1 and ∂ p 3 belong to M. Moreover, for b = 1, 3, Since M is closed under linear combinations and ∂ p b ∈ M, it follows that ∂ q b ∈ M for b = 1, 3. Thus it only remains to show that at each x ∈ Ω, we can span the directions of ∂ q 2 and ∂ p 2 . In the following, f denotes a generic function on Ω that can be each time different. We have [∂ q b , A 0 ] = w b (q 2 − q b )∂ p 2 + f (q)∂ p b so that commuting n − 1 times with ∂ q b we get that for all n ≥ 1 w Commuting the above with A 0 , we find that for all n ≥ 1, For any fixed x ∈ Ω, there is by Assumption 1.2 some b ∈ {1, 3} and some n ≥ 1 such that Thus, by (5.1) and (5.2) the proof is complete. Thus, we have proved Proposition 5.1 (i).

Irreducibility
We show in this section that the process has an irreducible skeleton. We give in fact two different proofs. The first one is given in a general and abstract framework, and works for chains of any lengths. The second one is more explicit, gives more than the irreducibility of a skeleton, but relies strongly on the fact that the chain is made of only three rotors.

Abstract version
Consider the transition probabilitiesP t ( · , · ) of the system at equilibrium, i.e., with parameters τ 1 = τ 3 = 0 and T 1 = T 3 = 1 β for some β > 0. For all x and t, the measures P t (x, · ) andP t (x, · ) are equivalent. This equivalence holds because any change of the parameters τ 1 , τ 3 (respectively T 1 , T 3 ) can be absorbed by shifting (respectively scaling) the Brownian motions appropriately. Therefore, it is enough to prove the irreducibility claim at equilibrium.
At equilibrium, the Gibbs measure ν with density 1 Z exp(−βH) is invariant (with some normalization constant Z) as mentioned earlier. Note that we do not assume a priori that ν is the unique invariant measure at equilibrium, nor that the system at equilibrium is irreducible. The only two properties that we need are invariance and (everywhere) positiveness of the density of ν . Proof. We have by the invariance of ν, which completes the proof.
Lemma 5.5. Let A be a closed set. If A is invariant underP 1 (i.e.,P 1 (x, A) = 1 for all x ∈ A), then either A = ∅ or A = Ω.
Proof. By Lemma 5.4, A cP 1 (x, A)dν = AP 1 (x, A c )dν = 0 sinceP 1 (x, A c ) = 0 for all x ∈ A. This implies thatP 1 (x, A) = 0 for all x ∈ A c , since x →P 1 (x, A) is continuous on the open set A c and ν has an everywhere positive density. But thenP t (x, A) is 1 when x ∈ A and 0 when x ∈ A c , so that by continuity we have ∂A = ∅. Since Ω is connected, the conclusion follows.
Note that same does not hold for non-closed sets: for example Ω minus any set of zero Lebesgue measure is still an invariant set.
Proof. As discussed above, it is enough to prove the result at equilibrium, i.e., withP 1 ( · , · ). Let B be a set such that m(B) > 0. We need to show that the set A = {x ∈ Ω : ∞ n=1P n (x, B) = 0} is empty. By the smoothness of x →P n (x, B), it is easy to see that A c = {x ∈ Ω : ∃n > 0,P n (x, B) > 0} is open, so that A is closed. Moreover, for all x ∈ A it holds that 0 = ∞ n=1P y, B). But since by the definition of A we have ∞ n=1P n (y, B) > 0 for all y ∈ A c , we must haveP 1 (x, A c ) = 0 for all x ∈ A, so that A is invariant. But then by Lemma 5.5 either A = ∅ or A = Ω. We need to eliminate the second possibility. Since m(B) > 0 and ν has positive density, we have ν(B) > 0. By the invariance of ν, we have ΩP 1 (x, B)dν = ν(B) > 0. But then there is some x ∈ Ω such thatP 1 (x, B) > 0, so that x ∈ A c . Therefore A = Ω, and thus A = ∅ and the process is irreducible with measure m. That m is a maximal irreducibility measure follows immediately from the fact that the transition probabilities are absolutely continuous with respect to m. This completes the proof. move q b , p b , b = 1, 3, almost instantly to any configuration. In the next lemma, we show how the middle rotor can be forced into any configuration with some piecewise constant force. Then, we will show that this piecewise constant force can "almost" be realized by moving q b appropriately.
Observe that as soon as ∆ > 2π/K * , we can choose a ∈ [0, K * ] so thatq 2 (Θ + 2∆) takes any value (modulo 2π). In particular, we can choose it to be q f 2 , so that we have the advertised result with T * = Θ + 2π/K * . Remark 5.9. We have given a proof only if u 2 ≡ 0. However, the result remains true even if u 2 = 0, although the proof is much more involved. Typically, if the pinning is stronger than the interaction forces w b , and the initial condition is such that p 2 is small, we sometimes have to push the middle rotor several times back and forth to increase its energy enough to pass above the "potential barrier" created by U 2 . Conversely, we sometimes have to brake the middle rotor with some non-trivial controls.
We now have some piecewise constant control g(t) that can bring the middle rotor to the final configuration of our choice. It remains to show that we can make the external rotors follow some trajectories that have the appropriate initial and terminal conditions, and such that the force exerted on the middle rotator closely approximates g(t). We do not prove this in detail, but we list here the main steps.
• Since K − ≤ g(t) ≤ K + , it is possible to find piecewise smooth functions q * b (t), b = 1, 3, such that b w b (q 2 (t) − q * b (t)) ≡ g(t), whereq 2 (t) is the solution of (5.4). • Let δ > 0 be small. We can find some smooth trajectories q b (t) compatible with the boundary conditions x i and x f , such that q b (t) = q * b (t) for all t ∈ [0, T ] \ A δ , where A δ consists of a finite number of intervals of total length at most δ. We can choose the controls f b so that the q b (t) constructed here are solutions to (5.3) (when δ is small, f b (t) is typically very large for t ∈ A δ ).
• Since the interaction forces w b are bounded, their effect during the times t ∈ A δ is negligible when δ is small. More precisely, it can be shown that the solution q 2 (t) and p 2 (t) of (5.3) converge uniformly on [0, T ] to the solutionsq 2 (t) andp 2 (t) of (5.4) when δ → 0. Therefore, the system is approximately controllable in the sense of Proposition 5.7.

Numerical illustrations
In this section we illustrate some properties of the invariant measure in the case where U i ≡ 0 and W 1 = W 3 = − cos.
We use throughout the values γ 1 = γ 3 = 1 and τ 1 = 0. We give examples of how the marginal distributions of p 1 , p 2 , p 3 depend on the temperatures T 1 , T 3 and the external force τ 3 . We apply the numerical algorithm given in [9] with time-increment h = 0.001. The resulting graphs are quite independent of h. In order to obtain good statistics and smooth curves, the probability densities shown below are sampled over 10 8 units of time and several hundred bins.
At equilibrium, i.e., when T 1 = T 3 = 1/β and τ 3 = 0 (remember that τ 1 = 0 in this section), the marginal law of each p i has a density proportional to exp(−βp 2 i /2) for i = 1, 2, 3. This is obviously not the case out of equilibrium. Moreover, since we work with a finite number of rotors, we do not expect to see any form of local thermal equilibrium in the bulk of the chain (here the "bulk" consists of only the middle rotor). Clearly, the distribution of p 2 can be quite far from Maxwellian (Gaussian).
In Figure 2 we show the marginal distributions of p 1 , p 2 , p 3 for different temperatures and no external force. For each pair of temperatures, we show the distributions both in linear and logarithmic scale. At equilibrium, when T 1 = T 3 = 10, all three distributions coincide exactly and are Gaussian. However, when T 1 = T 3 , we see that the distribution of p 2 is not Gaussian (clearly, the distribution is not a parabola in logarithmic scale).  frequent the switches between these two regimes. The asymmetry of the two maxima in Figure 3 is explained by the inequality T 1 < T 3 , which makes the fluctuations larger in the second regime, so that the mean sojourn time there is shorter.