Hypocoercive relaxation to equilibrium for some kinetic models via a third order differential inequality

This paper studies some particular kinetic models in which the randomness acts only on the velocity variable. In this setting the Markovian generator usually fails to satisfy a Poincaré inequality, so Gronwall's lemma does not readily yield the exponential decay of $F_t$ (the $L^2$ norm of a test function along the semigroup). Nevertheless, for the kinetic Fokker-Planck dynamics and for a piecewise deterministic evolution, we show that $F_t$ satisfies a third order differential inequality, which gives an explicit rate of convergence to equilibrium.


Introduction
In order to improve MCMC algorithms one can try to resort to higher order dynamics, for instance kinetic ones. Indeed, non-reversible dynamics naturally possess more inertia than reversible ones and have less tendency to turn back and hesitate than the simple reversible process. This matters for escaping local minima, and such non-reversible processes may then converge faster to equilibrium (cf. [13], [14], [38], [39]).
For instance, [39] compares numerically the following procedures for sampling the Gibbs measure $e^{-U(x)}dx$ associated to a potential $U$: first the Fokker-Planck dynamics $dX_t = -U'(X_t)\,dt + \sigma\,dB_t$, and secondly the kinetic Fokker-Planck one (shortened from now on to kFP; it is called Langevin dynamics in [39], but we stick here to [11] for the denomination), where $B_t$ stands for a standard Brownian motion. It turns out, numerically, that the second one is generally more efficient, in the sense that it converges faster toward the steady regime.
Despite (or thanks to) all this work, some phenomena arising from the interplay between the deterministic transport and the stochastic part of the generator still deserve to be better understood. In particular the convergence to equilibrium appears to be inhomogeneous in time: in [24], where the $L^2$ distance $d(t)$ between the distribution at time $t$ and the equilibrium is explicitly computed for the kFP process with a quadratic potential, the decay is flat for small times, i.e. $d(t) \simeq 1 - ct^3$. Indeed, if $d'(0)$ were non-zero, it would imply a Poincaré inequality (see [1]), but none is satisfied there. Furthermore in some cases we have $d(t) = g_t e^{-\lambda t}$ for some $\lambda > 0$ but with a periodic prefactor $g_t$. Such oscillations, linked to the competition for the convergence to equilibrium between the position and the velocity (see the discussion p.66 of [12]), have also been numerically observed for the Boltzmann equation in [22]. This behaviour is reminiscent of functions of the form $\varphi(t) = e^{-\lambda t}\left(a + b\cos(\nu t + \theta)\right)$, which are solutions of $\left[(\partial_t + \lambda)^3 + \nu^2(\partial_t + \lambda)\right]\varphi = 0$.
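The claim that such damped oscillating functions solve a third order equation can be checked by hand. The short computation below is a sketch added for the reader; the auxiliary function $g$ is notation introduced here and does not come from the cited works. Writing $\varphi(t) = e^{-\lambda t} g(t)$ with $g(t) = a + b\cos(\nu t + \theta)$, each application of $\partial_t + \lambda$ simply differentiates $g$:

```latex
\begin{align*}
(\partial_t + \lambda)\varphi &= e^{-\lambda t} g'(t), \qquad
(\partial_t + \lambda)^3\varphi = e^{-\lambda t} g'''(t), \\
g'(t) &= -b\nu \sin(\nu t + \theta), \qquad
g'''(t) = b\nu^3 \sin(\nu t + \theta), \\
\left[(\partial_t + \lambda)^3 + \nu^2(\partial_t + \lambda)\right]\varphi
  &= e^{-\lambda t}\left(g'''(t) + \nu^2 g'(t)\right) = 0 .
\end{align*}
```

Equivalently, the characteristic polynomial $(X + \lambda)^3 + \nu^2 (X + \lambda)$ has the roots $-\lambda$ and $-\lambda \pm i\nu$: one purely decaying mode and a pair of damped oscillating ones.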
The third order may also be linked to the number of Lie brackets one has to take in Hörmander's hypoellipticity theory to obtain a full rank (cf. [32]), and is expected to get bigger for higher order models (for instance oscillator chains [19]). Yet most of the current results rely on the existence of some quantity that somehow decreases at all times, in other words on a first order differential equation (with the notable exception of [40], where the usual dissipation of entropy is checked in mean in time). We can expect, in fact, a third order differential inequality to be satisfied, which can account for these inhomogeneities. This is the scope of the present article. This is not a new idea (cf. [11], [34]) but to our knowledge it had never been successfully completed. In fact for the kFP model it has been noted in [24] that no linear combination of the $L^2$ norm and its first three derivatives can be non-positive for all test functions, so we will clarify in the sequel the meaning of a third order differential inequality.
$(X_t, Y_t) \in \mathbb{R}^2$ is then the position-speed process of a particle in a potential $U$ with friction and noise. Results about its convergence to equilibrium can be found in [24] for a quadratic potential and, depending on one's favourite method, in [30], [18] or [11] (among others) for more general cases (the coupling method, in [18], only deals with convex potentials). The second one is a generalised version of the telegraph process, for which $(X_t, Y_t) \in \mathbb{R} \times \{\pm 1\}$, where $dX_t = Y_t\,dt$ and $Y_t$ jumps to its opposite following an inhomogeneous rate $a(X_t, Y_t)$. Here the particle goes forward at constant speed and only makes U-turns (cf. Figures 1 and 2 for an illustration). In the classical telegraph process the jump rate $a$ is constant over its definition space. If we take $X_t \in \mathbb{R}/2\pi\mathbb{Z}$ to ensure ergodicity, we obtain maybe one of the simplest toy models for kinetic processes, cited as a basic example in [21] or [17] and precisely studied in [36]. When the rate is no longer constant, the underlying algebra collapses. An ergodic version on the real line has recently been investigated in [23] but, the coupling method being used again, the invariant measure corresponds to a convex potential.
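To make the telegraph dynamics concrete, here is a minimal simulation sketch on the torus. It is an illustration under stated assumptions, not the procedure used for the figures of this paper: the thinning (acceptance-rejection) technique, the function names, the requirement that the rate be bounded by `rate_bound`, and the example potential $U(x) = -\cos(x)$ are all choices made here.

```python
import math
import random


def simulate_telegraph(rate, rate_bound, x0, y0, horizon, rng):
    """Return (X_T, Y_T) for a telegraph process on the torus R / 2*pi*Z.

    rate(x, y) plays the role of the jump rate a(x, y); this sketch assumes
    0 <= rate(x, y) <= rate_bound everywhere (needed for thinning).
    """
    x, y, t = x0, y0, 0.0
    two_pi = 2.0 * math.pi
    while True:
        # Candidate event times arrive at the constant rate `rate_bound`.
        wait = rng.expovariate(rate_bound) if rate_bound > 0 else math.inf
        if t + wait >= horizon:
            # No candidate before the horizon: free transport up to time T.
            x = (x + y * (horizon - t)) % two_pi
            return x, y
        # Deterministic transport dX = Y dt until the candidate time.
        x = (x + y * wait) % two_pi
        t += wait
        # Thinning: accept the U-turn with probability rate / rate_bound.
        if rng.random() * rate_bound < rate(x, y):
            y = -y


def example_rate(x, y):
    # Hypothetical choice a(x, y) = (y U'(x))_+ with U(x) = -cos(x),
    # so that U'(x) = sin(x) and the rate is bounded by 1.
    return max(y * math.sin(x), 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    print(simulate_telegraph(example_rate, 1.0, 0.5, -1, 10.0, rng))
```

With a rate identically zero the particle never turns back and the output is pure transport, which gives a simple sanity check of the sketch.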
In our cases, $(X_t, Y_t)$ has a unique invariant measure denoted $\mu$. Recall that the semigroup $(P_t)_{t \ge 0}$ of operators on $L^2(\mu)$ is defined by
$$P_t f(x, y) = \mathbb{E}\left[f(X_t, Y_t) \mid (X_0, Y_0) = (x, y)\right].$$
Its infinitesimal generator $L$ is
$$Lf = \lim_{t \to 0} \frac{P_t f - f}{t}$$
for $f$ such that the limit exists. To focus on other questions, from now on we assume the existence of a core $D$ dense in $L^2(\mu)$, stable by $L$, and we will always consider $f \in D$. For a more analytical setting of the problem, denoting by $\tilde L$ the dual of $L$, which operates on measures, the law $\mu_t$ of $(X_t, Y_t)$ is the (weak) solution of $\partial_t \mu_t = \tilde L \mu_t$.
For the kFP model, $\mu = e^{-U(x)}dx \otimes e^{-y^2/2}dy$ is the Gibbs measure associated to the Hamiltonian $U(x) + \frac{y^2}{2}$. For the telegraph one, $\mu$ is the product of $e^{-U(x)}dx$ with a measure on the velocities $\{\pm 1\}$. These two processes share some common features. One of them is that there is no coercivity from the deterministic part of the dynamics when the potential is not convex; in other words two particles coupled with the same random part have no tendency to get closer. On the other hand the randomness only occurs in the velocity variable, and thus the processes are fully degenerate in the sense of [4] and their Bakry-Émery curvature (Definition 5.3.4 in [1]) is equal to $-\infty$.
The study is restricted to dimension 1 in order to keep a reasonable level of computations and keep the main ideas clear. The author successfully applied the method presented below to the telegraph process in higher dimension, but surely we could improve our understanding of it and write it in a more abstract setting, better suited for generalization.

Figure 1: First marginal of the telegraph process at different times with a bi-modal invariant law $e^{-U(x)}dx$, $(X_0, Y_0) = (7, -1)$ and $a(x, y) = (yU'(x))_+$. While the potential decreases along the trajectory, the process is deterministic. It easily escapes from the local minimum.

Figure 2: Here $a(x, y) = 1 + (yU'(x))_+$. In other words, contrary to Figure 1, there is always a minimal level of randomness: the behaviour is more diffusive and it takes longer to leave the local minimum.
• For $\nu_* > 0$, $\varphi$ presents damped oscillations with a period governed by $\nu_*$ and a magnitude of order $e^{-\eta t}$. The theorem shows that $F$ is interlaced with $\varphi$: $F$ can be above $\varphi$ but only if it has already been below, and not for too long.
• The rate of convergence is independent of the function $f$, but this result does not give a bound for the operator norm of the semigroup in $L^2$. As will be seen in the sequel, the initial conditions of $\varphi$ depend on $F(0)$ and $\|\partial_x f_0\|^2$, which can be arbitrarily large with $F_0 = 1$ (we could obtain a bound by using estimates from the pseudodifferential calculus theory, but our aim was to avoid resorting to this powerful tool and to stay very elementary). The result in [16] does the job with no derivative, but not exactly with the $L^2$ norm; it could be possible to do the same in the present work.
Proof. The Gronwall lemma gives In the case where $\nu_* \le 0$, using the Gronwall lemma twice gives So now assume $\nu_* > 0$ and define $M_t$ is always nondecreasing and it is constant when $h$ is increasing. Let us show that $h_t \le M_t$ at every time. Assume it is false and consider $s = \inf\{t > 0,\ h_t > M_t\}$. $M_t$ is constant for $t$ in a neighborhood of $s$, $h_{s-\varepsilon} < M_s$ and $h_{s+\varepsilon} > M_s$ for $\varepsilon > 0$ small enough. So, as $h''_s \le -\nu_* h_s < 0$, necessarily $h'_s > 0$, which and we have reached a contradiction.
Concerning the length of an interval where F > φ, in other words where h > 0, define and so vanishes, according to the Sturm-Liouville comparison theorem (cf. [15] for instance), between two successive zeros of cos(ν * t + θ) for any θ.
In Section 2, the kFP and telegraph models are proven to satisfy an inequality of the form (4). Section 3 is devoted to numerical studies, whose conclusion is that the method can give the right order of magnitude for the exponential rate of convergence, but should not be trusted to compute the parameters which give the asymptotically fastest convergence. Finally an appendix gathers the proofs of the technical lemmas used throughout this work.
Acknowledgements. The author thanks Laurent Miclo, who initiated this work, and Sebastien Gadat, for fruitful discussions.

Third order inequality
We start with considerations applying to both models. To compute the derivatives of $F_t$, we will split $L$ into its symmetric and antisymmetric parts. More precisely, if $A$ and $B$ are operators on $L^2(\mu)$, we denote by $A^*$ the dual operator of $A$ and by $[A, B] = AB - BA$ the Lie bracket. $\langle \cdot, \cdot \rangle$ stands for the scalar product on $L^2(\mu)$.
The proof is given in the appendix. The successive derivatives of $F_t$ could also be obtained with iterated $\Gamma$-calculus (see [34]), in particular for models where the invariant measure is not so easy to handle.
As in kinetic models the coercive part $K$ of $L$ only acts on the velocity variable, one cannot find any $\lambda > 0$ such that, for all $f_t$, $F'_t \le -\lambda F_t$. We call $\mu_1$ (resp. $\mu_2$) the first (resp. second) marginal of $\mu$, namely the position (resp. velocity) distribution at equilibrium. In our specific models we will have $\mu = \mu_1 \otimes \mu_2$. We call $V = \mathrm{Ker}(\mu_2 - 1)$ the set of functions which do not depend on $y$. The orthogonal projections onto $V$ and $V^\perp$ will be respectively denoted by $\pi_V$ and $\pi_\perp$. We will write $f_V = \pi_V f_t$ and $f_\perp = \pi_\perp f_t$; as $f_V$ only depends on $x$ we will sometimes consider $f_V$ as a one-parameter function in $L^2(\mu_1)$. Finally let $G_t = \|\partial_x f_t\|^2$, and recall that a measure $\nu$ is said to satisfy a Poincaré (or spectral gap) inequality with constant $c$ if $c\,\nu(g^2) \le \nu(|\nabla g|^2)$ whenever $\nu g = 0$.

Lemma 2.
We have $\mu_1 f_V = 0$. In particular, if $\mu_1$ satisfies a Poincaré inequality with constant $c$, then $G_t \ge c\|f_V\|^2$. Proof. For the first assertion, Furthermore $\partial_x^*\partial_x$ is self-adjoint and stabilizes $V$, so it stabilizes $V^\perp$ and Then $G_t - \frac{c}{d}F'_t \ge cF_t$ is clear.
Now we will show that in both models, the inequality (4) holds for some parameters.

The kinetic Fokker-Planck process
In this section (from Lemma 3 to Theorem 2) the generator is The invariant measure is $\mu = e^{-U(x)}dx \otimes e^{-y^2/2}dy$ so that From now on we will make some assumptions on the potential $U$:
Assumption 1. The potential $U$ is smooth, $U''$ is bounded and $\mu_1 = e^{-U(x)}dx$ satisfies a Poincaré inequality with constant $c_U$.
The smoothness and the Poincaré inequality conditions are usual assumptions (for instance in [42], [16]); however the boundedness of $U''$ is quite restrictive, and could be an artefact due to the lack of subtlety of some of our computations.
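For readers who want to experiment with the kFP dynamics, here is a minimal Euler-Maruyama sketch. It rests on assumptions made here: the normalisation $dX_t = Y_t\,dt$, $dY_t = -U'(X_t)\,dt - Y_t\,dt + \sqrt{2}\,dB_t$ (one standard choice compatible with the invariant measure $e^{-U(x)}dx \otimes e^{-y^2/2}dy$, possibly differing from the author's sign conventions), the explicit Euler scheme, and the function name `kfp_euler_maruyama`.

```python
import math
import random


def kfp_euler_maruyama(grad_u, x0, y0, dt, n_steps, rng):
    """Euler-Maruyama path for the (assumed) kinetic Fokker-Planck SDE

        dX = Y dt,   dY = -U'(X) dt - Y dt + sqrt(2) dB.

    grad_u is the derivative U' of the potential; returns the list of (x, y).
    """
    x, y = x0, y0
    noise_scale = math.sqrt(2.0 * dt)  # standard deviation of sqrt(2) dB over dt
    path = [(x, y)]
    for _ in range(n_steps):
        # Both updates use the values at the beginning of the step.
        x_new = x + y * dt
        y_new = y - (grad_u(x) + y) * dt + noise_scale * rng.gauss(0.0, 1.0)
        x, y = x_new, y_new
        path.append((x, y))
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    # Quadratic potential U(x) = x^2 / 2, i.e. U'(x) = x (illustrative choice).
    path = kfp_euler_maruyama(lambda x: x, 0.0, 0.0, 0.01, 200_000, rng)
    mean_y2 = sum(y * y for _, y in path) / len(path)
    print("time average of y^2:", mean_y2)
```

For a long trajectory the time average of $y^2$ should be close to 1, the variance of the Gaussian velocity marginal, up to discretisation bias and Monte Carlo error.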
We can decompose $L = K + R - R^*$ with We compute in the appendix the brackets appearing in Lemma 1: As expected the operator $-\partial_x^*\partial_x$ appears in the third derivative: it brings the coercivity in position, which is missing in $F'_t$. However it is known (cf. [24], [36]) that no linear combination of $F_t$, $F'_t$, $F''_t$ and $F'''_t$ can be non-positive for every $f \in L^2(\mu)$.
In the particular cases treated in [24] and [36], $G_t$, the norm of the gradient in space, appears naturally, thanks to Lemma 2. Indeed the smallest eigenvalue of $\partial_y^*\partial_y$ on $V^\perp = (\mathrm{Ker}\,\partial_y^*\partial_y)^\perp$ is 1 (Poincaré inequality for the Gaussian distribution) and thus On the other hand, Assumption 1 and Lemma 2 ensure $G_t \ge c_U\|f_V\|^2$ and lead to Finally, in order to close the differential inequality, we need the first derivative of $G_t$ (see the Appendix for the proof): We can now find a linear combination of $F_t$, $G_t$ and their derivatives which is always non-positive. The terms in $\|f_V\|^2$ will be controlled by $F'''_t$, the ones in $\|f_\perp\|^2$ by $F'_t$ and $G'_t$.

Lemma 5.
Let $A \in \mathbb{R}$ and $\beta, k > 0$. Under Assumption 1, there exists $\tau_* \in \mathbb{R}$ such that for all Proof. The above computations (Lemma 1 and 3) allow us to write, for any $A \in \mathbb{R}$, The operator $R\left(6 + 6(2K) + 2A\right)$ is annoying because, as a quadratic form on $L^2(\mu)$, it is neither non-negative nor non-positive; we will give a rather crude upper bound for it, using the Cauchy-Schwarz and $2ab \le a^2 + b^2$ inequalities, by the sum of a term $RR^*$, to be controlled by $G'_t$, and of a term only acting on $V^\perp$, controlled by $F'_t$.
More precisely, remark that $R = R\pi_\perp$ and furthermore that $\pi_\perp$ commutes with the self-adjoint operators $\mathrm{Id}$, $K$ and $U''(x)$, which stabilize $V$ (and so $V^\perp$ too). Thus, for any $\beta > 0$, We obtain, taking into account Lemma 4, Now we want to replace the terms with $U''$ by something that does not depend on $x$ (under Assumption 1).
So by denoting the previous computation leads to The eigenvalues of $2K$ on $V^\perp$ being the $-2n$ for $n \in \mathbb{Z}_+$ (the eigenvectors are the so-called Hermite polynomials), consider any $k \ge 0$ and so that $P(2K) + 2\tau K + k$ gets to be a non-positive bilinear form on $V^\perp$ for all $\tau \ge \tau_k$, in other words On the other hand $G_t \ge c_U\|f_V\|^2$ (cf. Lemma 2) so that Now it remains to get rid of $G_t$ thanks to (6), and to find a common root for $Q_1$ and $Q_3$ in order for inequality (4) to hold.
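The spectral statement used above (eigenvalues $-2n$ of $2K$, with Hermite polynomials as eigenvectors) can be checked on polynomials with exact integer arithmetic. The sketch below assumes that $K$ acts on functions of $y$ as $\partial_y^2 - y\partial_y$, i.e. $K = -\partial_y^*\partial_y$ in $L^2(e^{-y^2/2}dy)$, which is consistent with the text but is stated here as an assumption; the helper names are illustrative.

```python
def hermite(n):
    """Coefficients (lowest degree first) of the probabilists' Hermite
    polynomial He_n, built from He_{k+1}(y) = y He_k(y) - k He_{k-1}(y)."""
    prev, cur = [1], [0, 1]  # He_0 = 1, He_1 = y
    if n == 0:
        return prev
    for k in range(1, n):
        nxt = [0] + cur  # multiplication by y shifts the coefficients
        for i, c in enumerate(prev):
            nxt[i] -= k * c
        prev, cur = cur, nxt
    return cur


def apply_k(p):
    """Apply the (assumed) operator K = d^2/dy^2 - y d/dy to a polynomial
    given by its coefficient list (lowest degree first)."""
    out = [0] * len(p)
    for k, c in enumerate(p):
        if k >= 2:
            out[k - 2] += k * (k - 1) * c  # second derivative of c * y^k
        out[k] -= k * c                    # -y * derivative of c * y^k
    return out


if __name__ == "__main__":
    for n in range(6):
        h = hermite(n)
        # K He_n = -n He_n, hence 2K has the eigenvalue -2n.
        print(n, apply_k(h) == [-n * c for c in h])
```

The case $n = 0$ corresponds to constants in $y$, that is to $V$; on $V^\perp$ only $n \ge 1$ occurs, so the largest eigenvalue of $2K$ there is $-2$.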

The telegraph process
This section is a replica of the previous one. From Lemma 6 to Theorem 3, the generator is As in the kFP case we compute the derivatives of $F_t$ and $G_t$, proceed with a differential inequality and conclude with a particular choice of the parameters involved in the above approach. First, the invariant measure has to be made explicit:

Lemma 6. The unique (up to a constant) invariant measure of the telegraph model is
Proof. Note that $yU'(x) = a(x, y) - a(x, -y)$. We check for all smooth $f \in L^2(\mu)$, so that $\mu$ is invariant. On the other hand, the process is clearly irreducible (from any point $X_0$ it can reach any ball in finite time with positive probability) and aperiodic ($X_t$ can go back to $X_0$ at an arbitrarily small time $s$ with positive probability), and the uniqueness of its invariant probability follows.
We write $f_-(x, y) = f(x, -y)$ and remark that and, keeping the previous notation $\pi_V$ and $\pi_\perp$ (or $f_V$ and $f_\perp$) for the orthogonal projections on $V$ and $V^\perp$, Thus $yV = V^\perp$ and $yV^\perp = V$, and more precisely $\pi_\perp y = y\pi_V$ and $\pi_V y = y\pi_\perp$.
Now recall that $\partial_x^* = -\partial_x + U'$ and define Then Note that $a + a_-$ and $U'$ do not depend on $y$ and so, seen as self-adjoint operators on $L^2(\mu)$, they commute with $\pi_\perp$ and $\pi_V$. In particular this gives $K^* = K$. Now thanks to these considerations we can compute the following brackets (see the appendix for details).
Furthermore $U(x) = \int_0^x \left(a(z, 1) - a(z, -1)\right)dz$ satisfies Assumption 1. Then, again $F_t$ is controlled by $G_t$ and $F'_t$. Indeed, Under Assumption 2, We will also need the derivative of $G_t$, computed in the appendix: We are now ready to prove a result similar to Lemma 5.

Lemma 9. Under Assumption 2, there exist polynomials $\tilde Q_1$ and $\tilde Q_3$, respectively of first and third order, such that $\tilde Q_3(\partial_t)F_t + \tilde Q_1(\partial_t)G_t \le 0$.

Proof. Lemma 1 and 7 give, for any $A \in \mathbb{R}$, We consider any $h \ge 0$ and write Now for the extra $-4hR^*R(a + a_-)$, for any $\alpha \in (0, 1]$, via the Cauchy-Schwarz inequality, Then following again the steps of Lemma 5, for any $\beta > 0$, we bound Gathering all this, and recalling $K = -(a + a_-)\pi_\perp$ we get For the last term, as long as $\beta \le 1$, Choose $\alpha < 1$, let $k \in [0, 4c_U(1 - \beta)]$ and define $h_k$ such that We will note $(1 + h_k)$ and define the function so that everything comes down to Under Assumption 2, on the one hand Lemma 2 gives $\|\partial_x f_V\|^2 \ge c_U\|f_V\|^2$ and, on the other hand, $H$ is bounded; so there exists $\tau_k$ such that Thus, for all $\tau \ge \tau_k$, Finally we get Here ends the proof that (4) is satisfied for the telegraph model:

Theorem 3. Under Assumption 1, there exist $\lambda, \eta > 0$, and $t \mapsto \nu_t \ge \nu_* \in \mathbb{R}$ with $\mathrm{Re}(\eta - \sqrt{-\nu_*}) > 0$ such that $(\partial_t + \lambda)\left((\partial_t + \eta)^2 + \nu_t\right)F_t \le 0$.
Proof. We keep the notations used in the proof of Lemma 9; our purpose is to find some parameters for which $\tilde Q$ is zero for $k = 4c_U(1 - \beta)$, else negative. We take $\tau \ge \tau_{4c_U(1-\beta)}$ large enough so that, for any $k \in [0, 4c_U(1 - \beta)]$, $\tilde Q_3$ has a unique non-positive real root, which is continuous with respect to $k$. This real root is zero for $k = 0$ and negative otherwise, thus by continuity there exists a $k \ge 0$ such that $\tilde Q_3$ and $\tilde Q_1$ have a common root. We call $-\lambda$ this root and consider $u, v, w \in \mathbb{R}$ such that (7) exactly gives $\nu_t \ge \nu_*$. It remains to find some parameters for which $-\lambda$ and $\mathrm{Re}(-\eta \pm \sqrt{-\nu_*})$ are negative. These are the real parts of the roots of Take $A > -c_U a_*^{-1}$, $\alpha \in (0, 1)$, $\beta = 1$, $\tau$ large enough and $k = 0$, then $h_k = \lambda_k = 0$ and zero is a common root. Thus $\tilde Q_1(X) = 2X$ and $\tilde Q_3(X) + \frac{c_U}{2a_*}X\tilde Q_1(X) + c_U\tilde Q_1(X) = X\left(X^2 + \left(A + \frac{c_U}{a_*}\right)X + \tau + 2c_U\right)$.
If $X^2 + \left(A + c_U a_*^{-1}\right)X + \tau + 2c_U$, a polynomial with positive coefficients, has real roots, they are negative, and else $\mathrm{Re}(\eta \pm \sqrt{-\nu_*}) = \eta = \frac{1}{2}\left(A + c_U a_*^{-1}\right) > 0$. Now if $\beta$ is slightly less than 1 this is still the case by continuity, but then $-\lambda$ is a real root of a polynomial with positive coefficients, so $\lambda > 0$.
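The last step of the argument only uses elementary facts about a quadratic factor $X^2 + pX + q = (X + \eta)^2 + \nu$, whose roots are $-\eta \pm \sqrt{-\nu}$. The sketch below makes this bookkeeping explicit; the helper names and the numerical values chosen for $A$, $c_U$, $a_*$ and $\tau$ are purely illustrative assumptions and do not come from the paper.

```python
import cmath


def decompose_quadratic(p, q):
    """Return (eta, nu) such that X^2 + p X + q = (X + eta)^2 + nu."""
    eta = p / 2.0
    nu = q - eta * eta
    return eta, nu


def root_real_parts(eta, nu):
    """Sorted real parts of the two roots -eta +/- sqrt(-nu)."""
    s = cmath.sqrt(-nu)  # purely imaginary when nu > 0 (oscillating regime)
    return tuple(sorted(((-eta + s).real, (-eta - s).real)))


if __name__ == "__main__":
    # Hypothetical parameter values, only meant to illustrate the computation.
    A, c_u, a_star, tau = 1.0, 1.0, 1.0, 3.0
    p = A + c_u / a_star   # coefficient of X in the quadratic factor
    q = tau + 2.0 * c_u    # constant coefficient
    eta, nu = decompose_quadratic(p, q)
    print("eta =", eta, "nu =", nu)
    print("real parts of the roots:", root_real_parts(eta, nu))
```

When $\nu > 0$ both roots share the real part $-\eta$, which is the oscillating case; when $\nu \le 0$ the roots are real, and they are negative as soon as $p$ and $q$ are positive.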
Using the notations of Lemma 5 and 9, for the kFP process, and for the telegraph one. $C$ is the set of parameters for which $Q_1$ and $Q_3$ have a common root and the inequality is proven, for instance in the kFP model one needs ($P$ defined in Lemma 5). Nevertheless we can deal with this numerically and compare the obtained results with the theoretical rates when they are known, namely in the case of a quadratic potential for the kFP process (see [24]) and for the constant jump rate of the telegraph process on the torus (see [36]). Obviously, such examples should only be considered as benchmarks and are not really interesting processes for MCMC algorithms. As a consequence, once we have seen that the numerical rates can be of the right order of magnitude for the kFP model, we will not push this analysis deeper.
First of all, we adapt Lemma 5 in order to allow some changes in the parameters. The same computations lead to

Lemma 10. Consider the generator Under Assumption 1, for any $A, k \in \mathbb{R}$ and $\beta > 0$ there exists $\tau \in \mathbb{R}$ such that On the other hand, In an MCMC algorithm, $U$ would be given while $b^{-1}$, the variance of the invariant speed, and $v$, the ratio between the antisymmetric and symmetric parts of the dynamics, should be chosen to get the fastest convergence to equilibrium (given the instantaneous randomness injected in the system).

Figure 3: Left: theoretical rate computed in [24]. Right: numerical rate given by Theorem 2. When $v$ is small (i.e. the antisymmetric part of $L$ is in a sense weak), the numerical rate is not very accurate and can miss the values for which the non-reversible process is faster (asymptotically) than the reversible one. It becomes better with big values of $v$ and for some $b$ we get the right order of magnitude.

The real exponential rate of convergence for $L_{v,b,U}$ with $U''$ constant is (see [24]) Here are some numerical optimal rates given by Lemma 10 (to be compared to the theoretical one, in brackets if it is not 1) for $U'' = 1$ and different values of $v$ and $b$ (see also