On The Rates of Decay to Equilibrium in Degenerate and Defective Fokker-Planck Equations

We establish sharp long time asymptotic behaviour for a family of entropies to defective Fokker-Planck equations and show that, much like defective finite dimensional ODEs, their decay rate is an exponential multiplied by a polynomial in time. The novelty of our study lies in the amalgamation of spectral theory and a quantitative non-symmetric hypercontractivity result, as opposed to the usual approach of the entropy method.


INTRODUCTION
1.1. Background. The study of Fokker-Planck equations (sometimes also called Kolmogorov forward equations) has a long history, going back to the early 20th century. Originally, Fokker and Planck used their equation to describe Brownian motion in a PDE form, rather than its usual SDE representation. In its most general form, the Fokker-Planck equation reads as
$$\partial_t f(t,x) = \sum_{i,j=1}^{d} \partial_{x_i x_j}\big(D_{ij}(x) f(t,x)\big) - \sum_{i=1}^{d} \partial_{x_i}\big(A_i(x) f(t,x)\big), \tag{1.1}$$
with $t > 0$, $x \in \mathbb{R}^d$, and where $D_{ij}(x)$, $A_i(x)$ are real valued functions, with $D(x) = (D_{ij}(x))_{i,j=1,\dots,d}$ being a positive semidefinite matrix. The Fokker-Planck equation has many uses in modern mathematics and physics, with connections to statistical physics, plasma physics, stochastic analysis and mathematical finance. For more information about the equation, we refer the reader to [19]. Here we will consider a very particular form of (1.1) that allows degeneracies and defectiveness to appear.

The Fokker-Planck Equation in our Setting.
In this work we will focus our attention on Fokker-Planck equations of the form
$$\partial_t f(t,x) = Lf(t,x) := \operatorname{div}\big(D\nabla f(t,x) + Cx\,f(t,x)\big), \qquad t > 0,\ x \in \mathbb{R}^d, \tag{1.2}$$
with appropriate initial conditions, where the matrices $D$ (the diffusion matrix) and $C$ (the drift matrix) are assumed to be constant and real valued. In addition to the above, we will also assume the following:
(A) $D$ is a positive semidefinite matrix with $1 \le r := \operatorname{rank}(D) \le d$.
(B) All the eigenvalues of $C$ have positive real part (this is sometimes called positively stable).
(C) There exists no non-trivial $C^T$-invariant subspace of $\operatorname{Ker}(D)$ (this is equivalent to hypoellipticity of (1.2), cf. [12]).
Each of these conditions has a significant impact on the equation:
• Condition (A) allows the possibility that our Fokker-Planck equation is degenerate ($r < d$).
• Condition (B) implies that the drift term confines the system. Hence it is crucial for the existence of a non-trivial steady state to the equation.
• Condition (C) tells us that when $D$ is degenerate, $C$ compensates for the lack of diffusion in the appropriate direction and "pushes" the solution back to where diffusion happens.
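To make Conditions (A)-(C) concrete, the following worked check uses a hypothetical $2 \times 2$ pair of matrices chosen purely for illustration (it is not taken from the references). It is only a sketch of how each condition can be verified by hand.

```latex
% Illustrative matrices (an assumption made for this example only):
\[
D = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \qquad
C = \begin{pmatrix} 0 & -1 \\ 1 & 1 \end{pmatrix}.
\]
% (A): D is positive semidefinite with r = rank(D) = 1 < d = 2 (degenerate case).
% (B): both eigenvalues of C have real part 1/2 > 0:
\[
\det(\lambda I - C) = \lambda(\lambda - 1) + 1 = \lambda^2 - \lambda + 1,
\qquad \lambda_{1,2} = \frac{1 \pm i\sqrt{3}}{2}.
\]
% (C): Ker(D) = span{e_1} is one dimensional, so its only non-trivial
% subspace is Ker(D) itself, and it is not C^T-invariant:
\[
C^T e_1 = \begin{pmatrix} 0 & 1 \\ -1 & 1 \end{pmatrix}
          \begin{pmatrix} 1 \\ 0 \end{pmatrix}
        = \begin{pmatrix} 0 \\ -1 \end{pmatrix}
        \notin \operatorname{span}\{e_1\}.
\]
```

In this example the symmetric part $(C + C^T)/2$ equals $D$, so the pair is already in a normalised form of the kind used later in the text.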
Equations of the form (1.2), with emphasis on the degenerate structure (and hence $d \ge 2$), have been extensively investigated recently (see [2], [17]) and were shown to retain much of the structure of their non-degenerate counterpart. When it comes to the question of long time behaviour, it has been shown in [2] that under Conditions (A)-(C) there exists a unique equilibrium state $f_\infty$ to (1.2) with a unit mass (it was actually shown that the kernel of $L$ is one dimensional) and that the convergence rate to it can be explicitly estimated by the use of the so-called (relative) entropy functionals. Based on [3,5], and denoting by $\mathbb{R}^+ := \{x \in \mathbb{R} \mid x > 0\}$ and $\mathbb{R}^+_0 := \mathbb{R}^+ \cup \{0\}$, we introduce these entropy functionals:
Definition 1.1. We say that a function $\psi$ is a generating function for an admissible relative entropy if $\psi \not\equiv 0$, $\psi \in C(\mathbb{R}^+_0) \cap C^4(\mathbb{R}^+)$, $\psi(1) = \psi'(1) = 0$, $\psi'' > 0$ on $\mathbb{R}^+$ and
$$(\psi''')^2 \le \tfrac{1}{2}\,\psi''\,\psi''''. \tag{1.3}$$
For such a $\psi$, we define the admissible relative entropy $e_\psi(\cdot|f_\infty)$ to the Fokker-Planck equation (1.2) with a unit mass equilibrium state $f_\infty$, as the functional
$$e_\psi(f|f_\infty) := \int_{\mathbb{R}^d} \psi\!\left(\frac{f(x)}{f_\infty(x)}\right) f_\infty(x)\,dx, \tag{1.4}$$
for any non-negative $f$ with a unit mass.

Remark 1.2.
It is worth noting a few things about Definition 1.1:
• As $\psi$ is only defined on $\mathbb{R}^+_0$, the admissible relative entropy can only be used for non-negative functions $f$. This, however, is not a problem for equation (1.2) as it propagates non-negativity.
This means that up to some multiplicative constant, $e_2$ is the square of the (weighted) $L^2$ norm.
A detailed study of the rate of convergence to equilibrium of the relative entropies for (1.2) when $r < d$ was completed recently in [2]. Denoting by $L^1_+(\mathbb{R}^d)$ the space of non-negative $L^1$ functions on $\mathbb{R}^d$, the authors have shown the following:
(ii) If one of the eigenvalues from the set (1.6) is defective, then for any $\epsilon > 0$ there exists a fixed geometric constant $c_\epsilon$, that doesn't depend on $f$, such that
$$e_\psi(f(t)|f_\infty) \le c_\epsilon\, e_\psi(f_0|f_\infty)\, e^{-2(\mu - \epsilon)t}, \qquad t \ge 0. \tag{1.7}$$
The loss of the exponential rate $e^{-2\mu t}$ in part (ii) of the above theorem is to be expected, however it seems that replacing it by $e^{-2(\mu - \epsilon)t}$ is too crude. Indeed, if one considers the closely related, finite dimensional, ODE equivalent
$$\dot{x}(t) = -Bx(t),$$
where the matrix $B \in \mathbb{R}^{d \times d}$ is positively stable and has, for example, a defect of order 1 in an eigenvalue with real part equal to $\mu > 0$ (defined as in (1.5)), then one notices immediately that
$$|x(t)|^2 \le c\,|x_0|^2 \left(1 + t^2\right) e^{-2\mu t}, \qquad t \ge 0,$$
i.e. the rate of decay is worsened by a multiplication of a polynomial of the order twice the defect of the "minimal eigenvalue". The goal of this work is to show that the above is also the case for our Fokker-Planck equation.
Footnote 1: An eigenvalue is defective if its geometric multiplicity is strictly less than its algebraic multiplicity. We will call the difference between these numbers the defect of the eigenvalue.
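The polynomial correction can be seen in the smallest defective case. The sketch below assumes that the ODE in question is $\dot{x} = -Bx$ and takes a single $2 \times 2$ Jordan block as a hypothetical choice of $B$ (defect 1). The constant 2 is simply what this particular computation yields and is not claimed to be optimal.

```latex
% Hypothetical defective matrix: one Jordan block with eigenvalue mu > 0.
\[
B = \begin{pmatrix} \mu & 1 \\ 0 & \mu \end{pmatrix},
\qquad
e^{-Bt} = e^{-\mu t}\begin{pmatrix} 1 & -t \\ 0 & 1 \end{pmatrix}.
\]
% Solution with initial datum x_0 = (x_{0,1}, x_{0,2})^T:
\[
x(t) = e^{-Bt}x_0
     = e^{-\mu t}\begin{pmatrix} x_{0,1} - t\,x_{0,2} \\ x_{0,2} \end{pmatrix}.
\]
% Using (a - b)^2 <= 2a^2 + 2b^2:
\begin{align*}
|x(t)|^2 &= e^{-2\mu t}\big[(x_{0,1} - t\,x_{0,2})^2 + x_{0,2}^2\big] \\
         &\le e^{-2\mu t}\big[2x_{0,1}^2 + (2t^2 + 1)\,x_{0,2}^2\big] \\
         &\le 2\,(1 + t^2)\,|x_0|^2\,e^{-2\mu t}.
\end{align*}
% The polynomial factor cannot be removed: x_0 = (0,1)^T gives
\[
|x(t)|^2 = (1 + t^2)\,e^{-2\mu t}.
\]
```

The degree of the polynomial, 2, is twice the defect of the eigenvalue $\mu$, in line with the heuristic stated above.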
We will mostly focus our attention on the natural family of relative entropies $e_p(\cdot|f_\infty)$, with $1 < p \le 2$, which are generated by
$$\psi_p(y) := \frac{y^p - p(y-1) - 1}{p(p-1)}.$$
Notice that $\psi_1$ can be understood to be the limit of the above family as $p$ goes to 1.
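As a sanity check on this family, the following short computation works out the two endpoint cases. It is a sketch that assumes the normalised form $\psi_p(y) = \frac{y^p - p(y-1) - 1}{p(p-1)}$ and the entropy $e_\psi(f|f_\infty) = \int \psi(f/f_\infty)\,f_\infty\,dx$; both are assumptions on the displayed definitions.

```latex
% Case p = 2: the generating function is a pure square.
\[
\psi_2(y) = \frac{y^2 - 2(y-1) - 1}{2} = \frac{(y-1)^2}{2},
\]
\[
e_2(f|f_\infty)
  = \int_{\mathbb{R}^d} \frac{1}{2}\Big(\frac{f}{f_\infty} - 1\Big)^2 f_\infty\,dx
  = \frac{1}{2}\int_{\mathbb{R}^d} \frac{(f - f_\infty)^2}{f_\infty}\,dx
  = \frac{1}{2}\,\|f - f_\infty\|^2_{L^2(\mathbb{R}^d, f_\infty^{-1})}.
\]
% Limit p -> 1: numerator and denominator both vanish at p = 1,
% so differentiate each in p (L'Hopital).
\[
\lim_{p \to 1}\psi_p(y)
  = \lim_{p \to 1}\frac{y^p\log y - (y-1)}{2p - 1}
  = y\log y - y + 1.
\]
```

The limit is the generating function of the Boltzmann entropy, and the $p = 2$ formula makes sense for every real $y$.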
An important observation about the above family, that we will use later, is the fact that the generating function for $p = 2$, associated to the entropy $e_2$, is actually defined on $\mathbb{R}$ and not only $\mathbb{R}^+$. This is not surprising as we saw the connection between $e_2$ and the $L^2$ norm. This means that we are allowed to use $e_2$ even when we deal with functions without a definite sign. Our main theorem for this paper is the following: for $t \ge 0$, where $c_p > 0$ is a fixed geometric constant, that doesn't depend on $f_0$, and $f_\infty$ is the unique equilibrium with unit mass.
The main idea, and novelty, of this work is in combining elements from spectral theory and the study of our $p$-entropies. We will give a detailed study of the geometry of the operator $L$ in the space $L^2(\mathbb{R}^d, f_\infty^{-1})$ and deduce, from its spectral properties, the result for $e_2$. Since the other entropies, $e_p$ for $1 < p < 2$, lack the underlying geometry of the $L^2$ space that $e_2$ enjoys, we will require additional tools: we will show a quantitative result of hypercontractivity for non-symmetric Fokker-Planck operators that will assure us that after a certain, explicit time, any solution to our equation with finite $p$-entropy will belong to $L^2(\mathbb{R}^d, f_\infty^{-1})$. This, together with the dominance of $e_2$ over $e_p$ for functions in $L^2(\mathbb{R}^d, f_\infty^{-1})$, will allow us to "push" the spectral geometry of $L$ to solutions with initial datum that only has finite $p$-entropy.
We have recently become aware that the long time behaviour of Theorem 1.4 has been shown in a preprint by Monmarché, [15]. However, the method he uses to show this result is a generalised entropy method (more on which can be found in §5), while we have taken a completely different approach to the matter.
The structure of the work is as follows: In §2 we will recall known facts about the Fokker-Planck equation (degenerate or not). In §3 we carry out the spectral investigation of $L$ and prove Theorem 1.4 for $p = 2$. In §4 we will show our non-symmetric hypercontractivity result and conclude the proof of Theorem 1.4. Lastly, in §5 we will recall another important tool in the study of Fokker-Planck equations, the Fisher information, and show that Theorem 1.4 can also be formulated for it, due to the hypoelliptic regularisation of the equation.

THE FOKKER-PLANCK EQUATION
This section is mainly based on recent work of Arnold and Erb (see [2]). We will provide here, mostly without proof, known facts about degenerate (and non-degenerate) Fokker-Planck equations of the form (1.2).
Moreover, if $f_0 \not\equiv 0$ it is strictly positive for all $t > 0$.

Theorem 2.2. Assume that the diffusion and drift matrices, D and C, satisfy Conditions (A)-(C). Then, there exists a unique stationary state f
Corollary 2.4. The Fokker-Planck operator $L$ can be rewritten as
A surprising, and useful, property of (1.2) is that the diffusion and drift matrices associated to it can always be simplified by using a change of variables. The following can be found in
The above matrix normalisation has additional impact on the calculation of the adjoint operator:
Corollary 2.6. Let $C_s = D$. Then:
(i) $L^* f = \operatorname{div}\big(D\nabla f + C^T x\,f\big)$, where $L^*$ denotes the (formal) adjoint of $L$, considered w.r.t. $L^2(\mathbb{R}^d, f_\infty^{-1})$. The domain of $L$ will be discussed in §3.
(ii) The kernels of $L$ and $L^*$ are both spanned by $\exp(-|x|^2/2)$. This is not true in general, i.e. for a Fokker-Planck operator $L$ without the matrix normalisation assumption.
Proof. (i) Under the normalising coordinate transformation of Theorem 2.5 we see from (2.2) that (2.4)
(ii) follows from (2.1) and $K = I$.
From this point onwards we will always assume that Conditions (A)-(C) hold, and that we are in the coordinate system where $D$ is of form (2.3) and equals $C_s$.
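The claim about the kernel can be checked directly in the normalised coordinates. The computation below is a sketch that assumes the operator has the form $Lf = \operatorname{div}(D\nabla f + Cx\,f)$ with $D = C_s$; the symbol $C_{as} := (C - C^T)/2$ for the antisymmetric part of $C$ is notation introduced here.

```latex
% Candidate kernel element and its gradient:
\[
g(x) := e^{-|x|^2/2}, \qquad \nabla g(x) = -x\,g(x).
\]
% The flux reduces to the antisymmetric part of C, since D = C_s:
\[
D\nabla g + Cx\,g = (C - D)\,x\,g = (C - C_s)\,x\,g = C_{as}\,x\,g.
\]
% Its divergence vanishes because Tr(C_as) = 0 and <C_as x, x> = 0:
\begin{align*}
Lg = \operatorname{div}(C_{as}\,x\,g)
   &= \operatorname{Tr}(C_{as})\,g + \langle C_{as}x, \nabla g\rangle \\
   &= 0 - \langle C_{as}x, x\rangle\,g = 0.
\end{align*}
```

The same steps go through with $C^T$ in place of $C$, because $C^T$ has the same symmetric part and its antisymmetric part is $-C_{as}$.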

THE SPECTRAL STUDY OF L
The main goal of this section is to explore the spectral properties of the Fokker-Planck operator $L$ in $L^2(\mathbb{R}^d, f_\infty^{-1})$, and to see how one can use them to understand rates of convergence to equilibrium for $e_2$. The crucial idea we will implement here is that, since $L^2(\mathbb{R}^d, f_\infty^{-1})$ decomposes into orthogonal eigenspaces of $L$ with eigenvalues that get increasingly farther to the left of the imaginary axis, one can deduce improved convergence rates on "higher eigenspaces". The first step in achieving the above is to recall the following result from [2], where we use the notation $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$:

and V m are invariant under L and its adjoint (and thus under the flow of (1.2)).
Moreover, the spectrum of L satisfies where λ j j =1,...,d are the eigenvalues (with possible multiplicity) of the matrix C.

The eigenfunctions of L (or eigenfunctions and generalized eigenfunctions in the
Let us note that this orthogonal decomposition is non-trivial since $L$ is in general non-symmetric. The above theorem quantifies our previous statement about "higher eigenspaces": the minimal distance between the eigenvalues of $L$ restricted to the "higher" $L$-invariant eigenspace $V_m$ and the imaginary axis is $m\mu$. Thus, the decay we expect to find for initial datum from $V_m$ is of order $e^{-2m\mu t}$ (in the quadratic entropy, e.g.). However, as the functions we will use in our entropies are not necessarily contained in only finitely many $V_m$, we might need to pay a price in the rate of convergence. This intuition is indeed true. Denoting by
$$H_k := \bigoplus_{m \ge k} V_m \tag{3.1}$$
for any $k \ge 0$, we have the following:
Then for any $0 < \epsilon < \mu$ there exists a geometric constant $c_{k,\epsilon} \ge 1$ that depends only on $k$ and $\epsilon$ such that
$$e_2(f(t)|f_\infty) \le c_{k,\epsilon}\, e_2(f_0|f_\infty)\, e^{-2(k\mu - \epsilon)t}, \qquad t \ge 0. \tag{3.2}$$
The loss of an $\epsilon$ in the decay rate of (3.2), compared to the decay rate solely on $V_k$, can have two causes:
(1) For drift matrices $C$ with a defective eigenvalue with real part $\mu$, the larger decay rate $2k\mu$ would not hold in general. This is illustrated in (1.7), which provides the best possible purely exponential decay result, as proven in [2].
(2) For non-defective matrices $C$, the improved decay rate $2k\mu$ actually holds, but our method of proof, which uses the Gearhart-Prüss Theorem, cannot yield this result.
The decay estimate (3.2) will be improved in Theorem 3.11: There, the $\epsilon$-reduction drops out in the non-defective case.
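To illustrate why the distance to the imaginary axis grows linearly in $m$, the following sketch assumes that the spectrum of $L$ restricted to $V_m$ consists of the sums $-\sum_j \alpha_j\lambda_j$ over multi-indices of total order $m$ (treat this form as an assumption here). The eigenvalues $\lambda_{1,2} = \frac{1 \pm i\sqrt{3}}{2}$ are a hypothetical choice for a $2 \times 2$ drift matrix, so that $\mu = 1/2$.

```latex
% Assumed form of the spectrum on the m-th invariant subspace:
\[
\sigma\big(L|_{V_m}\big)
  = \Big\{ -\sum_{j=1}^{d}\alpha_j\lambda_j \;:\;
           \alpha \in \mathbb{N}_0^d,\ \sum_{j=1}^{d}\alpha_j = m \Big\}.
\]
% Hypothetical example: d = 2, lambda_{1,2} = (1 +- i sqrt(3))/2, mu = 1/2.
\begin{align*}
m = 1:&\quad -\lambda_1,\ -\lambda_2,
        && \text{real part } -\tfrac{1}{2} = -\mu, \\
m = 2:&\quad -2\lambda_1,\ -(\lambda_1 + \lambda_2) = -1,\ -2\lambda_2,
        && \text{real part } -1 = -2\mu.
\end{align*}
% In general, since Re(lambda_j) >= mu for every j:
\[
\operatorname{Re}\Big(-\sum_{j=1}^{d}\alpha_j\lambda_j\Big)
  = -\sum_{j=1}^{d}\alpha_j\operatorname{Re}(\lambda_j)
  \le -\mu\sum_{j=1}^{d}\alpha_j = -m\mu.
\]
```

Equality holds when $\alpha$ is supported on eigenvalues with real part exactly $\mu$, which is why $m\mu$ is the minimal distance and not merely a lower bound.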

Remark 3.4.
As we hinted in the introduction to our work, an important observation to make here is that the initial data, $f_0$, doesn't have to be non-negative (and in many cases, is not). While this implies that $f(t)$ might also fail to be non-negative, this poses no problems as $e_2$ is the squared (weighted) $L^2$ norm (up to a constant). Theorem 3.2 would not work in general for $e_p$ as the non-negativity of $f(t)$ is crucial there (in other words, $f_0$ would not be admissible).
The main tool to prove Theorem 3.2 is the Gearhart-Prüss Theorem (see for instance Th. 1.11 Chap. V in [8]). In order to be able to do that, we will need more information about the dissipativity of L and its resolvents with respect to H k .
Proof. Given f ∈ D(L), and denoting g : where we have used the fact that C s = D. Thus, L is dissipative.
To show the second statement we use the Lumer-Phillips Theorem (see for instance Th. 3.15 Chap. II in [8]). Since $V_m$ is finite dimensional, and is invariant under $L$ (Theorem 3.1 again), we can consider the linear bounded operator $L|_{V_m} : V_m \to V_m$. Since we have shown that $L$ is dissipative, we can conclude that the eigenvalues of $L|_{V_m}$ have non-positive real parts, implying that $(\lambda I - L)|_{V_m}$ is invertible. This in turn implies that completing the proof.
To study the resolvents of $L$ we will need to use some information about its "dual": the Ornstein-Uhlenbeck operator. For a given symmetric positive semidefinite matrix $Q = (q_{ij})$ and a real, negatively stable matrix $B = (b_{ij})$ on $\mathbb{R}^d$ we consider the Ornstein-Uhlenbeck operator
$$P_{Q,B} := \frac{1}{2}\sum_{i,j=1}^{d} q_{ij}\,\partial^2_{x_i x_j} + \sum_{i,j=1}^{d} b_{ij}\,x_j\,\partial_{x_i}. \tag{3.3}$$
Similarly to our conditions on the diffusion and drift matrices, we will only be interested in Ornstein-Uhlenbeck operators that are hypoelliptic. In the above setting, this corresponds to the condition
$$\operatorname{rank}\big[Q^{1/2}, BQ^{1/2}, \dots, B^{d-1}Q^{1/2}\big] = d.$$
The hypoellipticity condition guarantees the existence of an invariant measure, $d\mu$, to the process. This measure has a density w.r.t. the Lebesgue measure, which is given by
$$\frac{d\mu}{dx}(x) = c_M \exp\Big(-\frac{1}{2}\,x^T M^{-1} x\Big), \qquad M := \int_0^\infty e^{Bs}\,Q\,e^{B^T s}\,ds,$$
where $c_M > 0$ is a normalisation constant. It is well known that the above definition of $M$ is equivalent to finding the unique solution to the continuous Lyapunov equation
$$Q = -BM - MB^T. \tag{3.4}$$
(See for instance Theorem 2.2 in [20], §2.2 of [13].)
Hypoelliptic Ornstein-Uhlenbeck operators have been studied for many years, and more recently in [18] the authors considered them under the additional possibility of degeneracy in their diffusion matrix $Q$. In [18], the authors described the domain of the closed operator $P_{Q,B}$, and found the following resolvent estimate:
Theorem 3.6. Consider the hypoelliptic Ornstein-Uhlenbeck operator $P_{Q,B}$, as in (3.3), and its invariant measure $d\mu(x)$. Then there exist some positive constants $c, C > 0$ such that for any $z \in \Gamma_\kappa$, with (3.5)
We illustrate the spectrum of $P_{Q,B}$ and the domain $\Gamma_\kappa$ in Figure 1. In order to use the above theorem for our operator, $L$, we show the connection between it and $P$ in the following lemma:
Lemma 3.7. Assume that the associated diffusion and drift matrices for $L$, defined
We start by recalling that we assume that $D = C_s$. Since (3.4) can be rewritten as $2D = CM + MC^T$ for our choice of $Q$ and $B$, we conclude that $M = I$ for $P_{2D,-C}$ and that $P^*_{2D,-C} = P_{2D,-C^T}$ (the last equality can be shown in a similar way to (2.4)).
Thus, the invariant measure corresponding to both these operators is where the adjoint is considered w.r.t.
With this at hand we can recast, and improve, Theorem 3.6 for the operator L and its closure. Proposition 3.8. Let any k ∈ N 0 be fixed. Consider the set Γ κ , defined by (3.5), associated to Q = 2D, B = −C T (Condition (C) guarantees the existence of such κ). Then we have that, for any z ∈ Γ κ , the operator (L − z I ) | H k : H k → H k is well defined, closable, and its closure is invertible with where C > 0 is the same constant as in Theorem 3.6.
Proof. We consider the case k = 0 first. Due to Theorem 3.6 we know that for any which can also be written differently due to (3.7), as This implies that L − z I is bijective on its appropriate space. Next we notice that, with the notations from Lemma 3.7 from which we conclude that completing the proof for this case. We now turn our attention to the restrictions (L − z I ) | H k with k ≥ 1 and domain Moreover, the dissipativity of L on D(L) assures us that L is dissipative, and as such closable, on the Hilbert space H k . Thus (L − z I )| H k is closable too and Additionally, since the only part of L 2 R d , f −1 ∞ that is not in H k is a finite dimensional subspace of D(L), we can conclude that Given z in the resolvent set of L we know that L − z I | V m : V m → V m is invertible for any m and as such We conclude that (L − z I )| H k is injective with a dense range in H k for any z ∈ Γ κ , and hence invertible on its range. The validity of (3.8) for k = 0 allows us to extend our inverse to H k with the same uniform bound as is given in (3.8). The general case is now proved.
From this point onward, we will assume that we are dealing with the closed operator $\overline{L}$ and with its appropriate domain (that includes all the spaces $V_m$, $m \in \mathbb{N}_0$) when we consider our equation. We will also write $L$ instead of $\overline{L}$ in what is to follow. Lemma 3.5 and Proposition 3.8 are all the tools we need to estimate the uniform exponential stability of our evolution semigroup on each $H_k$, an estimate that is crucial to show Theorem 3.2.
Proposition 3.9. Consider the Fokker-Planck operator $L$, defined on $L^2(\mathbb{R}^d, f_\infty^{-1})$, and the spaces $\{H_k\}_{k \ge 1}$ defined in (3.1). Then, for any $0 < \epsilon < \mu$, the semigroup generated by the operator $(L + (k\mu - \epsilon)I)|_{H_k}$, with domain $D(L) \cap H_k$, is uniformly exponentially stable. I.e., there exists some geometric constant $C_{k,\epsilon} > 0$ such that
Proof. We will show that and conclude the result from the fact that $L$ generates a contraction semigroup according to Lemma 3.5 and the Gearhart-Prüss Theorem.
The study of upper bounds for the resolvents of $L + [k\mu - \epsilon]I$ in the right-hand complex plane relies on subdividing this domain into several pieces. This is illustrated in Figure 2, which we will refer to during the proof to help visualise this division.
Since $L$ generates a contraction semigroup, for any $\epsilon > 0$, $L - \epsilon I$ generates a semigroup that is uniformly exponentially stable on $L^2(\mathbb{R}^d, f_\infty^{-1})$. The Gearhart-Prüss Theorem applied to $L - \epsilon I$ implies that where we removed the subscript $H_k$ from the operator on the left-hand side to simplify notations. Since (this term corresponds to the right-hand side of the dashed line in Figure 2). From the above we conclude that which implies that we only need to show that the second term in the parenthesis is finite (this term corresponds to the area between the dashed line and the imaginary axis in Figure 2). Using Proposition 3.8 we conclude that
[Figure 2, caption fragment: where the eigenvalues of the $2 \times 2$ matrix $C$ are given by $\lambda_{1,2} = \frac{1 \pm \sqrt{7}\,i}{2}$. The empty dots are the eigenvalues of the operator $L + [2\mu - \epsilon]I$ that disappear due to the restriction to $H_2$, and the shaded area represents the compact set $\{z \in \mathbb{C} \mid 0 \le \operatorname{Re} z \le 2\mu\} \cap \{z \in \Gamma_\kappa + 2\mu - \epsilon\}$ where $\kappa = 1$.]
(represented in Figure 2 by the domain between the two solid blue curves). We conclude that $M_{k,\epsilon} < \infty$ if and only if Since $\operatorname{Re} z = -\epsilon$ is the closest vertical line to $\operatorname{Re} z = 0$ which intersects $\sigma\big((L + [k\mu - \epsilon]I)|_{H_k}\big)$, we notice that $\{0 < \operatorname{Re} z \le k\mu\} \cap \{z \in \Gamma_\kappa + k\mu - \epsilon\}$ (represented by the shaded area in Figure 2) is a compact set in the resolvent set of $(L + [k\mu - \epsilon]I)|_{H_k}$. As the resolvent map is analytic on the resolvent set, we conclude that $M_{k,\epsilon} < \infty$, completing the proof.
Remark 3.10. While the constant mentioned in (3.9) is a fixed geometric one, the original Gearhart-Prüss theorem doesn't give an estimate for it. However, recent studies have improved the original theorem and have managed to find an explicit expression for this constant by paying a small price in the exponential power. As we can afford to "lose" another small $\epsilon$, we could use references such as [11,14] to have a more concrete expression for $C_{k,\epsilon}$. We will avoid giving such an expression in this work to simplify its presentation.

We finally have all the tools to show Theorem 3.2:
Proof of Theorem 3.2. Using the invariance of V 0 and H k under L and Proposition 3.9 we find that for any f k ∈ H k showing the desired result.
Theorem 3.2 has given us the ability to control the rate of convergence to equilibrium of functions with initial data that, up to $f_\infty$, live on a "higher eigenspace". Can we use this information to understand what happens to the solution of an arbitrary initial datum $f_0 \in L^2(\mathbb{R}^d, f_\infty^{-1})$ with unit mass? The answer to this question is yes. Since
$$L^2(\mathbb{R}^d, f_\infty^{-1}) = V_0 \oplus \Big(\bigoplus_{m=1}^{k} V_m\Big) \oplus H_{k+1}$$
for any $k \ge 1$ and all the above spaces are invariant under the Fokker-Planck semigroup, we are motivated to split the solution of our equation into a part in $V_0 \oplus H_{k+1}$ and a part in $\bigoplus_{m=1}^{k} V_m$, which is a finite dimensional subset of $D(L)$. As we now know that decay in $\bigoplus_{m=1}^{k} V_m$ is slower than that for $H_{k+1}$ we will obtain a sharp rate of convergence to equilibrium. We summarise the above intuition in the following theorem:

Theorem 3.11. Consider the Fokker-Planck equation (1.2) with diffusion and drift matrices satisfying Conditions (A)-(C). Let f
be a given function with unit mass such that where $f_{k_0} \in V_{k_0}$ is non-zero and $\tilde{f}_{k_0} \in H_{k_0+1}$. Denote by $[L]_{k_0}$ the matrix representation of $L$ with respect to an orthonormal basis of $V_{k_0}$ and let where $\mu$ is defined in (1.5). Then, there exists a geometric constant $c_{k_0}$, which is independent of $f_0$, such that

Remark 3.12.
As can be seen in the proof of the theorem, the sign of $f_0$ plays no role. As such, the theorem could have been stated for $f_0 \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d, f_\infty^{-1})$. We decided to state it as is since it is the form we will use later on, and we wished to avoid possible confusion.
Proof of Theorem 3.11. Due to the invariance of all $V_m$ under $L$ we see that with $e^{Lt} f_{k_0} \in V_{k_0}$ and $e^{Lt}\tilde{f}_{k_0} \in H_{k_0+1}$. From Theorem 3.2 we conclude that for any $0 < \epsilon < \mu$. Next, we denote by $d_k := \dim(V_k)$ and let $\{\xi_i\}_{i=1,\dots,d_{k_0}}$ be an orthonormal basis for $V_{k_0}$. The invariance of $V_m$ under $L$ implies that we can write with $a(t) := \big(a_1(t), \dots, a_{d_{k_0}}(t)\big)$ satisfying the simple ODE This, together with the definition of $n_{k_0}$ and the fact that a matrix and its transpose share eigenvalues and defect numbers, implies that we can find a geometric constant that depends only on $k_0$ such that Since we see, by combining Theorem 3.2 and (3.11) that Hence This completes the proof, as we have seen that
Remark 3.13. The idea to split a solution into a few parts is viable only for the 2-entropy. The reason behind it is that such splitting, regardless of whether or not it can be done to functions outside of $L^2(\mathbb{R}^d, f_\infty^{-1})$, will most likely create functions without a definite sign. These functions cannot be explored using the $p$-entropy with $1 < p < 2$.
Theorem 3.11 gives an optimal rate of decay for the 2-entropy. However, one can underestimate the rate of decay by using Theorem 3.2 and remove the condition $f_{k_0} \ne 0$ to obtain the following:
Corollary 3.14. The statement of Theorem 3.11 remains valid when replacing $k_0$ by any $1 \le k_1 \le k_0$. However, the decay estimate (3.10) will not be sharp when
Proof of Theorem 1.4 for $p = 2$. The proof follows immediately from Corollary 3.14 for $k_1 = 1$.
Now that we have learned everything we can on the convergence to equilibrium for $e_2$, we can proceed to understand the convergence to equilibrium of $e_p$.

NON-SYMMETRIC HYPERCONTRACTIVITY AND RATES OF CONVERGENCE FOR THE p−ENTROPY
In this section we will show how to deduce the rate of convergence to equilibrium for the family of $p$-entropies, with $1 < p < 2$, from $e_2$. The main thing that will make the above possible is a non-symmetric hypercontractivity property of our Fokker-Planck equation, namely that any solution to the equation with (initially only) a finite $p$-entropy will eventually be "pushed" into $L^2(\mathbb{R}^d, f_\infty^{-1})$, at which point we can use the information we gained on $e_2$. Before we show this result, and see how it implies our main theorem, we explain why and how this non-symmetric hypercontractivity helps.
(ii) for any 1 < p 1 < p 2 ≤ 2 there exists a constant C p 1 ,p 2 > 0 such that In particular, for any 1 < p < 2 for a fixed geometric constant.
Proof. (i ) is trivial. To prove (i i ) we consider the function g (y) := p 2 (p 2 −1) p 1 (p 1 −1) Clearly g ≥ 0 on R + , and it is easy to check that it is continuous. Since we have lim y→∞ g (y) = 0, we can conclude the result using (1.4).
It is worth noting that the second point of part (ii) of Lemma 4.1 can be extended to a general generating function for an admissible relative entropy. The following is taken from [3]:
Lemma 4.2. Let $\psi$ be a generating function for an admissible relative entropy. Then one has that
$$\psi(y) \le 2\psi''(1)\,\psi_2(y), \qquad y \ge 0.$$
In particular $e_p \le 2e_2$ for any $1 < p < 2$ whenever $e_2$ is finite.
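The passage from the pointwise bound to $e_p \le 2e_2$ can be spelled out in a few lines. This is a sketch that assumes the normalised generating functions $\psi_p(y) = \frac{y^p - p(y-1) - 1}{p(p-1)}$ and the entropy $e_\psi(f|f_\infty) = \int \psi(f/f_\infty)\,f_\infty\,dx$.

```latex
% Derivatives of the assumed generating function:
\[
\psi_p'(y) = \frac{y^{p-1} - 1}{p - 1}, \qquad
\psi_p''(y) = y^{p-2}, \qquad
\psi_p''(1) = 1.
\]
% Pointwise bound with psi = psi_p, recalling psi_2(y) = (y-1)^2 / 2:
\[
\psi_p(y) \le 2\,\psi_p''(1)\,\psi_2(y) = 2\,\psi_2(y) = (y-1)^2,
\qquad y \ge 0.
\]
% Integrating against f_infty with y = f / f_infty:
\[
e_p(f|f_\infty)
  = \int_{\mathbb{R}^d}\psi_p\Big(\frac{f}{f_\infty}\Big) f_\infty\,dx
  \le 2\int_{\mathbb{R}^d}\psi_2\Big(\frac{f}{f_\infty}\Big) f_\infty\,dx
  = 2\,e_2(f|f_\infty).
\]
```

The constant 2 comes entirely from $\psi_p''(1) = 1$, so it does not depend on $p$.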
Lemma 4.1 assures us that, if we start with initial data in $L^2(\mathbb{R}^d, f_\infty^{-1})$, then $e_p$ will be finite. Moreover, due to Theorem 1.4 for $p = 2$, and the fact that the solution to (1.2) However, one can easily find initial data $f_0 \notin L^2(\mathbb{R}^d, f_\infty^{-1})$ with finite $p$-entropies. If one can show that the flow of the Fokker-Planck equation eventually forces the solution to enter $L^2(\mathbb{R}^d, f_\infty^{-1})$, we would be able to utilise the idea we just presented, at least from that time on. This explicit non-symmetric hypercontractivity result is the main new theorem we present in this section.
(i) Then, for any $q > 1$, there exists an explicit $t_0 > 0$ that depends only on geometric constants of the problem such that the solution to (1.2) satisfies for $t \ge \tilde{t}_0(p) > 0$, which can be given explicitly.
$Q$ and drift matrix $B$), discussed in §3. With this notation, (4.3) is equivalent to . Since $e_2$ decreases along the flow of our equation, (4.4) is valid for $p = 2$ with $C_{2,d} = 1$. Thus, by using the Riesz-Thorin theorem one can improve inequality (4.4) to the same inequality with the constant We would like to point out at this point that a simple limit process shows that (4.4) is also valid for $p = 1$, but there is no connection between the $L^1$ norm of $g$ and the Boltzmann entropy, $e_1$, of $f_0$.

Remark 4.5. Since its original definition for the Ornstein-Uhlenbeck semigroup in the work of Nelson, [16], the notion of hypercontractivity has been studied extensively for Markov diffusive operators (implying selfadjointness). A contemporary review of this topic can be found in [4]. For such selfadjoint generators, hypercontractivity is equivalent to the validity of a logarithmic Sobolev inequality, as proved by Gross [10]. For non-symmetric generators, however, this equivalence does not hold: While a log Sobolev inequality still implies hypercontractivity of related semigroups (cf. the proof of Theorem 5.2.3 in [4]), the reverse implication is not true in general (cf. Remark 5.1.1 in [22]). In particular, hypocoercive degenerate parabolic equations cannot give rise to a log Sobolev inequality, but they may exhibit hypercontractivity (as just stated above). The last 20 years have seen the emergence of the more delicate study of hypercontractivity for non-symmetric and even degenerate semigroups. Notable works in the field are the paper of Fuhrman, [9], and more recently the work of Wang et al., [6,7,21]. Most of these works consider an abstract Hilbert space as an underlying domain for the semigroup, and to our knowledge none of them give an explicit time after which one can observe the hypercontractivity phenomenon (Fuhrman gives a condition on the time in [9]).
Our hypercontractivity theorem, which we will prove shortly, gives not only an explicit and quantitative inequality, but also provides an estimate on the time one needs to wait before the hypercontractivity occurs. To keep the formulation of Theorem 4.3 simple we did not include this "waiting time" there, but we emphasised it in its proof. Moreover, the hypercontractivity estimate from Theorem 4.3(i) only requires (4.1), a weighted $L^1$ norm of $f_0$. This is weaker than in usual hypercontractivity estimates, which use $L^p$ norms as on the r.h.s. of (4.4).

It is worth noting that we prove our theorem under the setting of the $e_p$ entropies, which can be thought of as $L^p$ spaces with a weight function that depends on $p$.
In order to be able to prove Theorem 4.3 we will need a few technical lemmas. where This is a well known result, see for instance §1 in [12] or §6.5 in [19].

Lemma 4.7. Assume that the diffusion and drift matrices, D and C, satisfy Conditions (A)-(C), and let K be the unique positive definite matrix that satisfies
Then (in any matrix norm) where c > 0 is a geometric constant depending on n and µ, with n being the maximal defect of the eigenvalues of C with real part µ, defined in (1.5).
Proof. We start the proof by noticing that K is given by (see for instance [18]). As such ˆ∞ t e −Cs e −C T s d s.
Using the fact that $Ae^{-Ct}A^{-1} = e^{-ACA^{-1}t}$ for any regular matrix $A$, we conclude that, if $J$ is the Jordan form of $C$, then where $A_J$ is the similarity matrix between $C$ and its Jordan form. For a single Jordan block of size $n+1$ (corresponding to a defect of $n$ in the eigenvalue $\lambda$), $J$, we find that Thus, we conclude that
$$\|e^{Jt}x\|_1 \le (n+1)\left(1 + t^n\right)e^{\operatorname{Re}(\lambda)t}\,\|x\|_1, \qquad t \ge 0.$$
Due to the equivalence of norms on finite dimensional spaces, there exists a geometric constant $c_1 > 0$, that depends on $n$, such that
$$\|e^{Jt}\| \le c_1\left(1 + t^n\right)e^{\operatorname{Re}(\lambda)t}. \tag{4.7}$$
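The bound for a single Jordan block can be verified by an explicit expansion. The sketch below writes the block as $J = \lambda I + N$, where $N$ is the nilpotent shift (notation introduced here), and works in the $\ell^1$ norm.

```latex
% Jordan block of size n+1 with eigenvalue lambda; N has ones on the
% superdiagonal and zeros elsewhere, so N^{n+1} = 0 and the series is finite:
\[
J = \lambda I + N, \qquad
e^{Jt} = e^{\lambda t}\sum_{k=0}^{n}\frac{t^k}{k!}\,N^k.
\]
% Two elementary facts: N^k only shifts (and drops) coordinates, and
% t^k <= 1 for t <= 1 while t^k <= t^n for t >= 1:
\[
\|N^k x\|_1 \le \|x\|_1, \qquad
t^k \le 1 + t^n \quad (0 \le k \le n,\ t \ge 0).
\]
% Combining them, with |e^{lambda t}| = e^{Re(lambda) t} and 1/k! <= 1:
\begin{align*}
\|e^{Jt}x\|_1
  &\le e^{\operatorname{Re}(\lambda)t}\sum_{k=0}^{n}\frac{t^k}{k!}\,\|x\|_1 \\
  &\le (n+1)\,(1 + t^n)\,e^{\operatorname{Re}(\lambda)t}\,\|x\|_1 .
\end{align*}
% Smallest defective case, n = 1:
\[
e^{Jt} = e^{\lambda t}\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}.
\]
```

The factor $n + 1$ is just the number of terms in the sum, so it depends only on the size of the block.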
Coming back to C, we see that the above inequality together with (4.6) imply that e −Ct is controlled by the norm of C's largest (measured by the defect number) Jordan block of the eigenvalue with smallest real part. From this, and (4.7), we conclude that The same estimation for e −C T t implies that for some geometric constant c 3 > 0 that depends on n.
we conclude the desired result.
While we can continue with a general matrix $K$, it will simplify our computations greatly if $K$ were $I$. Since we are working under the assumption that $D = C_s$, the normalisation from Theorem 2.5 implies exactly that. Thus, from this point onwards we will assume that $K$ is $I$.
Lemma 4.8. For any $\epsilon > 0$ there exists an explicit $t_1 > 0$ such that for all $t \ge t_1$ where $W(t)$ is as in Lemma 4.7. An explicit, but not optimal choice for $t_1$ is given by where $0 < \alpha < \mu$ is arbitrary and $c > 0$ is given by Lemma 4.7.
Proof. We have that for any invertible matrix $A$ In addition, if $\|A - I\| < 1$, then Thus, for any $t > 0$ such that $\|W(t) - I\| < 1$ we have that Combining the above with (4.10), shows the first result for $t_1 = \tilde{t}_1(\epsilon)$.
To prove the second claim we will show that For this elementary proof we use the fact that for any $a, b > 0$. Thus, choosing $a = 2\alpha$, where $0 < \alpha < \mu$ is arbitrary, and $b = 2n$ we have that As a consequence, if then $s \ge \tilde{t}_1(\epsilon)$ due to (4.11). The smallest possible $s$ in (4.12) is obtained by solving the corresponding equality for $t$, and yields (4.9), concluding the proof.
We now have all the tools to prove Theorem 4.3.
Proof of Theorem 4.3. To show (i) we recall Minkowski's integral inequality, which will play an important role in estimating the $L^p$ norms of $f(t)$.
Proof of Theorem 1.4 for $1 < p < 2$. Using Theorem 4.3 (ii) we find an explicit $T_0(p)$ such that for any $t \ge T_0(p)$ the solution to the Fokker-Planck equation, Proceeding similarly to the previous remark (but now with $q = 2$ and $\epsilon = \frac{p-1}{4p}$) we have $\epsilon_1 := \min$ 4p . This yields the following upper bound for the "waiting time" in the hypercontractivity estimate (4.3): Using Lemma 4.2, Theorem 1.4 for $p = 2$ (which was already proven in §3), and inequality (4.3) we conclude that for any $t \ge T_0(p)$ To complete the proof we recall that any admissible relative entropy decreases along the flow of the Fokker-Planck equation (see [2] for instance). Thus, for any $t \le T_0(p)$ we have that The theorem now follows from (4.19) and (4.20), together with the fact that for $1 < p < 2$ where $C_p := \sup_{x \ge 0} \frac{x}{(p(p-1)x + 1)^{2/p}} < \infty$.
We end this section with a slight generalisation of our main theorem:
Theorem 4.10. Let $\psi$ be a generating function for an admissible relative entropy. Assume in addition that there exists $C_\psi > 0$ such that
$$\psi_p(y) \le C_\psi\,\psi(y) \tag{4.21}$$
for some $1 < p < 2$ and all $y \in \mathbb{R}^+$. Then, under the same setting of Theorem 1.4 (but now with the assumption $e_\psi(f_0|f_\infty) < \infty$) we have that where $c_{p,\psi} > 0$ is a fixed geometric constant.
Proof. The proof is almost identical to the proof of Theorem 1.4. Due to (4.21) we know that $e_p(f_0|f_\infty) < \infty$. As such, according to Theorem 4.3 (ii) there exists an explicit $T_0(p)$ such that for all $t \ge T_0(p)$ we have that $f(t) \in L^2(\mathbb{R}^d, f_\infty^{-1})$ and
Fokker-Planck equation and finding a closed functional inequality for it. By an appropriate integration in time, one can then obtain (5.1). Problems start arising with the above method when $D$ is not invertible. As can be seen from the expression of $I^D_\psi$, there are some functions that are not identically $f_\infty$ yet yield a zero Fisher information. In recent work of Arnold and Erb ([2]), the authors managed to circumvent this difficulty by defining a new positive definite matrix $P_0$ that is strongly connected to the drift matrix $C$, and for which (5.1) is valid as a functional inequality. They proceeded to successfully use the Bakry-Émery method on $I^{P_0}_\psi$ and conclude from it, and the log-Sobolev inequality, rates of decay for $I^D_\psi$ (which is controlled by $I^{P_0}_\psi$) and $e_\psi$. This is essentially what is behind the exponential decay in Theorem 1.3. Moreover, in the defective case (ii), it led to an $\epsilon$-reduced exponential decay rate. As we have managed to obtain better convergence rates to equilibrium (in relative entropy) for the case of defective drift matrices $C$, one might ask whether or not the same rates will be valid for the associated Fisher information $I^D_p := I^D_{\psi_p}$. The answer to that question is yes, and we summarise this in the next theorem:

Theorem 5.3. Consider the Fokker-Planck equation (1.2) with diffusion and drift matrices $D$ and $C$ which satisfy Conditions (A)-(C). Let $\mu$ be defined as in (1.5) and assume that one, or more, of the eigenvalues of $C$ with real part $\mu$ are defective. Denote by $n > 0$ the maximal defect of these eigenvalues. Then, for any $1 < p \le 2$, the solution $f(t)$ to (1.2) with initial datum $f_0 \in L^1_+(\mathbb{R}^d)$ that has a unit mass and $I^{P_0}_p(f_0|f_\infty) < \infty$ satisfies:
where $c_p(f_0)$ depends on $I^{P_0}$
Proof. We first note that Proposition 4.4 from [2] implies the estimate $e_p(f_0|f_\infty) \le c\,I^{P_0}_p(f_0|f_\infty) < \infty$, and hence Theorem 1.4 applies. This decay of $e_p$ carries over to $I^{P_0}_p$ due to the following two ingredients: For small $t$ we can use the purely exponential decay of $I^{P_0}_p$ as established in Proposition 4.5 of [2] (with the rate $2(\mu - \epsilon)$). For large time we use the (degenerate) parabolic regularisation of the Fokker-Planck equation (1.2): As proven in Theorem 4.8 of [2] we have for all $\tau \in (0, 1]$ that where $\psi$ is the generating function for an admissible relative entropy, and $\kappa > 0$ is the minimal number such that there exists $\tilde{\lambda} > 0$ with $\sum_{j=0}^{\kappa} C^j D\,(C^T)^j \ge \tilde{\lambda} I$. The existence of such $\kappa$ and $\tilde{\lambda}$ is guaranteed by Condition (C) and equivalent to the rank condition (3.6), cf. Lemma 2.3 in [1].