The Poincar\'e inequality and quadratic transportation-variance inequalities

It is known that the Poincar\'e inequality is equivalent to the quadratic transportation-variance inequality (namely $W_2^2(f\mu,\mu) \leqslant C_V \mathrm{Var}_\mu(f)$), see Jourdain \cite{Jourdain} and most recently Ledoux \cite{Ledoux18}. We give two alternative proofs of this fact. In particular, we achieve a smaller $C_V$ than before, equal to twice the Poincar\'e constant. The same argument leads to further characterizations of the Poincar\'e inequality. As a by-product, our method also yields the equivalence between the logarithmic Sobolev inequality and strict contraction of the heat flow in Wasserstein space, provided that the Bakry-\'Emery curvature has a lower bound (here the control constants may depend on the curvature bound). Next, we present a comparison inequality between $W_2^2(f\mu,\mu)$ and its centralization $W_2^2(f_c\mu,\mu)$ for $f_c = \frac{|\sqrt{f} - \mu(\sqrt{f})|^2}{\mathrm{Var}_\mu (\sqrt{f})}$, which may be viewed as a special counterpart of Rothaus' lemma for relative entropy. It then yields a new bound on $W_2^2(f\mu,\mu)$ in terms of the variance of $\sqrt{f}$ rather than of $f$. As a by-product, we obtain another proof of the quadratic transportation-information inequality from the Lyapunov condition, avoiding the Bobkov-G\"otze characterization of Talagrand's inequality.


Introduction
The aim of this paper is to investigate some links between the Poincaré inequality (PI for short) and various inequalities comparing the quadratic Wasserstein distance with the variance. Some conclusions might extend to abstract metric measure spaces; nevertheless, for simplicity, our basic framework is specified as follows. Let $E$ be a connected complete Riemannian manifold of finite dimension, $d$ the geodesic distance, $dx$ the volume measure, $P(E)$ the collection of all probability measures on $E$, $\mu(dx) = e^{-V(x)}dx \in P(E)$ with $V \in C^1(E)$, $L = \Delta - \nabla V\cdot\nabla$ the $\mu$-symmetric diffusion operator with domain $D(L)$, and $\Gamma(f,g) = \nabla f\cdot\nabla g$ the carré du champ operator with domain $D(\Gamma)$, satisfying the integration by parts formula $\int \Gamma(f,g)\,d\mu = -\int f\,Lg\,d\mu$ for $f \in D(\Gamma)$ and $g \in D(L)$. The quadratic Wasserstein distance between $\nu$ and $\mu$ in $P(E)$ is $W_2(\nu,\mu) = \inf_{\pi\in C(\nu,\mu)} \big(\iint d^2(x,y)\,\pi(dx,dy)\big)^{1/2}$, where $C(\nu,\mu)$ denotes the set of all couplings $\pi$ on $E\times E$ with marginals $\nu$ and $\mu$ respectively. Throughout this paper we focus on the quadratic Wasserstein distance, so it is convenient to assume that $\mu$ has a finite moment of order 2. The reader is referred to standard references such as Bakry-Gentil-Ledoux [2] and Villani [16,17] for detailed presentations.
Our motivation partially arises from the problem of how to characterize the exponential decay of the quadratic Wasserstein distance along the heat flow. It is known that the exponential decay of the heat semigroup $P_t = \exp(tL)$ in $L^2$-norm is equivalent to PI, which reads $\mathrm{Var}_\mu(f) \leqslant C_P \int \Gamma(f,f)\,d\mu$ for any $f \in D(\Gamma)\cap L^2(\mu)$ (we simply denote by $\mu(h) = \int h\,d\mu$ the expectation and by $\mathrm{Var}_\mu(f) = \mu(f^2) - (\mu(f))^2$ the variance). Similarly, the exponential decay of $P_t$ in relative entropy is equivalent to the logarithmic Sobolev inequality (LSI for short), which bounds $\mathrm{Ent}_\mu(f)$ by a constant multiple of $I_\mu(f)$ for any $f>0$ with $\sqrt{f} \in D(\Gamma)$ (denote by $\mathrm{Ent}_\mu(f) = \int f\log f\,d\mu$ the relative entropy and by $I_\mu(f) = \int \frac{\Gamma(f,f)}{f}\,d\mu$ the Fisher information). It seems hard, however, to give a proper answer to the same question in Wasserstein space, namely to find an equivalent inequality characterizing $W_2^2(P_t\nu,\mu) \leqslant e^{-2\kappa t}W_2^2(\nu,\mu)$ (or up to a multiple) with $\kappa>0$ for any $\nu = f\mu \in P(E)$. Turning to weaker replacements, one natural candidate is to compare $W_2$ with the variance. Such a comparison follows quickly from the control inequality of weighted total variation (see [16, Proposition 7.10]) and the Hölder inequality: $W_2^2(\nu,\mu) \leqslant 2\|d^2(x_0,\cdot)(\nu-\mu)\|_{TV} \leqslant 2\int d^2(x_0,\cdot)\,|f-1|\,d\mu \leqslant C\sqrt{\mathrm{Var}_\mu(f)}$ if $d^4(x_0,\cdot)$ is $\mu$-integrable. At the very least, this gives the integrability of $W_2^2(P_t\nu,\mu)$ over $t\in[0,\infty)$ provided that PI holds, which is helpful for the semigroup analysis.
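As a brief illustration of the first equivalence, the following sketch records the standard argument that PI gives exponential $L^2$ decay. It assumes PI in the form $\mathrm{Var}_\mu(g)\leqslant C_P\int\Gamma(g,g)\,d\mu$; the test function $g\in D(L)$ and the differentiation under the integral sign are assumptions of the sketch. Since $\mu(P_tg)=\mu(g)$ by invariance,
\[
\frac{d}{dt}\mathrm{Var}_\mu(P_tg) = 2\int P_tg\,LP_tg\,d\mu = -2\int\Gamma(P_tg,P_tg)\,d\mu \leqslant -\frac{2}{C_P}\,\mathrm{Var}_\mu(P_tg),
\qquad\text{hence}\qquad
\mathrm{Var}_\mu(P_tg)\leqslant e^{-2t/C_P}\,\mathrm{Var}_\mu(g).
\]
The entropy case is analogous. If the LSI is normalized as $\mathrm{Ent}_\mu(f)\leqslant\frac{C_{LS}}{2}I_\mu(f)$ (one common normalization, assumed here), then for a probability density $f$
\[
\frac{d}{dt}\mathrm{Ent}_\mu(P_tf) = -I_\mu(P_tf)\leqslant-\frac{2}{C_{LS}}\,\mathrm{Ent}_\mu(P_tf),
\qquad\text{hence}\qquad
\mathrm{Ent}_\mu(P_tf)\leqslant e^{-2t/C_{LS}}\,\mathrm{Ent}_\mu(f).
\]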
If $\mu$ fulfills Talagrand's inequality ($W_2H$ for short), namely the control of $W_2(\nu,\mu)$ by relative entropy as In particular, for $p = 2$ it covers $W_2^2(\nu,\mu) \leqslant C\,\mathrm{Var}_\mu(f)$, and for $p = 1$ it gives which suggests an improved decay rate of $W_2$ along the heat flow. Since $W_2H$ implies PI with $C_P \leqslant C_T$ (see [2] for example), it is natural to ask about the relation between PI and a transportation-variance inequality like (1.1). Indeed, Jourdain [10] proved their equivalence in dimension one. Ding [6] claimed a general inequality between $W_2$ and the so-called Rényi-Tsallis divergence of order $\alpha$, which equals the variance for $\alpha = 2$ (we were, however, unable to verify Remark 3.3 therein for small variance; we may have misunderstood something). Then Ledoux [12] provided a very streamlined proof of the general result that PI is equivalent to the quadratic transportation-variance inequality ($W_2V$ for short) $W_2^2(\nu,\mu) \leqslant C_V\,\mathrm{Var}_\mu(f)$. We give two alternative proofs of this fact and achieve a smaller constant $C_V \leqslant 2C_P$. Conversely, various perturbation techniques ensure PI with a constant no more than $C_V$ if one assumes $W_2V$ (see [12]). Precisely, our first main result is the following.
Theorem 1.1. Let $\nu = f\mu \in P(E)$. The Poincaré inequality implies each of the following inequalities:
There are two approaches to this end, and both aim at the inequality (see also (2.1) below) The first approach is a shortcut based on the interpolation technique developed by Kuwada [11] and further by [12]. The other one appeals to the derivative formula of $W_2^2(P_tf\mu,\mu)$ in $t$ (almost everywhere), which is slightly different from what Otto-Villani employed in [15, Lemma 2]. Our method does not involve the theory of solving the Fokker-Planck equation on Riemannian manifolds, so as a by-product we reprove their lemma for nice initial data without the curvature condition.
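A quick sanity check of the transportation-variance inequality with $C_V = 2C_P$ in the simplest example may be helpful. The Gaussian setting and the shift parameter $m$ below are illustrative assumptions, not taken from the text. Let $E=\mathbb{R}$, $\mu=N(0,1)$ (for which $C_P=1$) and $\nu=N(m,1)$, so that $f(x)=e^{mx-m^2/2}$. Then
\[
W_2^2(\nu,\mu)=m^2,
\qquad
\mathrm{Var}_\mu(f)=\mu(f^2)-1=e^{m^2}-1,
\]
and the inequality $W_2^2(\nu,\mu)\leqslant 2C_P\,\mathrm{Var}_\mu(f)$ reads $m^2\leqslant 2\,(e^{m^2}-1)$, which holds because $e^x-1\geqslant x$ for all $x\geqslant 0$. Both sides are of order $m^2$ as $m\to0$, so in this family a bound that is linear in the variance has the right order for small perturbations.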
Another by-product is the equivalence between the LSI and strict contraction of the heat flow in Wasserstein space (by which we actually mean a strictly exponential decay of $W_2(P_tf\mu,\mu)$ with some multiple in front), provided that the Bakry-Émery curvature has a lower bound. One can compare the following with the well-known characterization of the curvature-dimension condition through the heat flow contraction (see [2, Theorem 9.7.2] for this fact and [2, Subsection 3.4.5] for the precise definition of the curvature-dimension condition $CD(\rho,\infty)$).
Proposition 1.3. Assume $V$ is a smooth potential such that the curvature-dimension condition $CD(\rho,\infty)$ holds for some $\rho\in\mathbb{R}$. Then the following two statements are equivalent: (1) there exist two constants $C>0$ and $\kappa>0$ such that for all $t>0$ and any $\nu=f\mu\in P(E)$, $W_2(P_t\nu,\mu)\leqslant Ce^{-\kappa t}W_2(\nu,\mu)$; (2) there exists a constant $C_{LS}>0$ such that the LSI holds.
Remark 1.4. The constants involved here may depend on $\rho$. If the LSI holds, we have $\kappa = 1/C_{LS}$. Very recently, Wang [19] discussed exponential contraction in any $W_p$ ($p\geqslant1$) for a class of diffusion semigroups and gave the implication from (2) to (1) as well.
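To see the rate of Remark 1.4 in an explicit case, one may take (as an illustrative assumption, not from the text) $E=\mathbb{R}$ and $V(x)=\frac{x^2}{2}+\frac12\log(2\pi)$, so that $\mu=N(0,1)$, $Lf=f''-xf'$ is the Ornstein-Uhlenbeck generator and $CD(1,\infty)$ holds. For $\nu=N(m,1)$ with density $f(x)=e^{mx-m^2/2}$, Mehler's formula gives $P_tf(x)=e^{me^{-t}x-m^2e^{-2t}/2}$, that is, $P_t\nu=N(me^{-t},1)$, and therefore
\[
W_2(P_t\nu,\mu)=|m|\,e^{-t}=e^{-t}\,W_2(\nu,\mu).
\]
Within this family of initial data, statement (1) holds with $C=1$ and $\kappa=1$. This is consistent with $\kappa=1/C_{LS}$ if the Gaussian LSI is written as $\mathrm{Ent}_\mu(f)\leqslant\frac{C_{LS}}{2}I_\mu(f)$ with $C_{LS}=1$; that normalization is an assumption of this check.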
Next, we are interested in comparing $W_2^2(\nu,\mu)$ with $\mathrm{Var}_\mu(\sqrt{f})$ rather than $\mathrm{Var}_\mu(f)$. In general, one cannot expect an inequality as strong as $W_2^2(\nu,\mu)\leqslant C\,\mathrm{Var}_\mu(\sqrt{f})$, since by PI it would imply $W_2^2(\nu,\mu)\leqslant\frac14CC_PI_\mu(f)$, which is called the quadratic transportation-information inequality ($W_2I$ for short, see [9]), and it is known that $W_2I$ is strictly stronger than PI and even than $W_2H$. What we present first is a new inequality between the Wasserstein distance and its "centralization", which may be viewed as a special counterpart of Rothaus' lemma for relative entropy (see [2, Lemma 5.1.4]), namely for any $a\in\mathbb{R}$ Precisely, we have the following. If the Poincaré inequality holds, then there exist two constants $C_1$ and $C_2$ such that For instance, we can take $C_1=2$ and $C_2=96C_P$. Actually our method implies that $C_1$ can approach 1 but should be strictly greater than 1. Moreover, $f_c$ can be extended to for any $\theta\in(0,2c)$ associated with two constants $C_1(\theta)$ and $C_2(\theta)$ depending on $\theta$.
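The passage from a hypothetical bound $W_2^2(\nu,\mu)\leqslant C\,\mathrm{Var}_\mu(\sqrt{f})$ to $W_2I$ uses only PI applied to $\sqrt{f}$ and the chain rule for $\Gamma$. Spelled out as a routine computation (added here for the reader, with PI in the form $\mathrm{Var}_\mu(g)\leqslant C_P\int\Gamma(g,g)\,d\mu$):
\[
\mathrm{Var}_\mu(\sqrt{f})\leqslant C_P\int\Gamma(\sqrt{f},\sqrt{f})\,d\mu
= C_P\int\frac{\Gamma(f,f)}{4f}\,d\mu
=\frac{C_P}{4}\,I_\mu(f).
\]
In the same spirit, the centralized density $f_c=\frac{|\sqrt{f}-\mu(\sqrt{f})|^2}{\mathrm{Var}_\mu(\sqrt{f})}$ is indeed a probability density with respect to $\mu$, since
\[
\mu(f_c)=\frac{\mu\big(|\sqrt{f}-\mu(\sqrt{f})|^2\big)}{\mathrm{Var}_\mu(\sqrt{f})}=1 .
\]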
As a consequence, when $E$ has a finite diameter, it follows from the definition of $W_2$ that which, we think, cannot be directly concluded from Theorem 1.1. It then quickly yields $W_2I$ from PI again. Moreover, an LSI holds by using the HWI inequality in [15,16,2] under the curvature-dimension condition $CD(\rho,\infty)$, with the control constant There is a large literature on the LSI; for example, one can compare the above (1.2) with [18, Theorem 1.4] on the constant estimate on compact manifolds by means of semigroup analysis.
When $E$ is unbounded, we have at least by using [16, Proposition 7.10] that It gives a direct way to derive $W_2I$ from the so-called Lyapunov condition. Recall from [13] that the Lyapunov condition here means that there exists a function $W>0$ such that $W^{-1}$ is locally bounded and, for some $c>0$, $b\geqslant0$ and $x_0\in E$, holds in the sense of distributions where $Q_C$ denotes the infimum-convolution operator and $Q_Ch$ solves the Hamilton-Jacobi equation $\frac{d}{dt}Q_th+\frac12|\nabla Q_th|^2=0$ for initial data $h$, see [2,3] for example. Nevertheless, for the stability problem of $W_2H$ under bounded perturbation, various additional curvature conditions are needed so far, see for example [8,14].
When we turn to the same problem for $W_2I$, it would be more robust to find a direct method deriving $W_2I$ from (1.4) without invoking $W_2H$. Theorem 1.5 plays exactly this role.
The paper is organized as follows. In Section 2, we give a quick proof of Theorem 1.1. In Sections 3 and 4, we compute the derivative of the quadratic Wasserstein distance along the heat flow, and then complete the other proof of Theorem 1.1. The equivalence of the LSI and strict contraction of the heat flow in Wasserstein space is shown in Section 5. Section 6 is devoted to the comparison inequality for the centralization of the quadratic Wasserstein distance, and Section 7 provides a direct proof of $W_2I$ under the Lyapunov condition.

The first proof of Theorem 1.1
Recall that, for any bounded Lipschitz function $h$, its infimum-convolution is defined for any $t>0$ by According to [11,12], for any decreasing function $\lambda\in C^1[0,+\infty)$ with $\lambda(0)=1$ and $\lim_{t\to\infty}\lambda(t)=0$, one has a semigroup interpolation by virtue of the Hamilton-Jacobi equation, integration by parts and the Hölder inequality that Using the Kantorovich dual (see [2, Section 9.2] and [16]) It is flexible to choose a nice $\lambda$ to prove Theorem 1.1. , then it follows We will revisit (2.1) in Section 4 by means of the derivative estimate of the Wasserstein distance.
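For orientation, the two objects recalled in this section usually take the following form. The normalization is an assumption of this sketch, chosen to agree with the Hamilton-Jacobi equation $\frac{d}{dt}Q_th+\frac12|\nabla Q_th|^2=0$ and with the identity $W_2^2(\nu_t,\mu)=2\int Q_1h_t\,d\nu_t$ (for $\mu(h_t)=0$) used in Section 3:
\[
Q_th(x)=\inf_{y\in E}\Big\{h(y)+\frac{d^2(x,y)}{2t}\Big\},
\qquad
\frac12W_2^2(\nu,\mu)=\sup_{h}\Big\{\int Q_1h\,d\nu-\int h\,d\mu\Big\},
\]
where the supremum runs over bounded Lipschitz functions $h$. As a formal check on $E=\mathbb{R}$ with $\mu=N(0,1)$ and $\nu=N(m,1)$ (an illustrative example, with $h(y)=my$ Lipschitz but unbounded), one finds
\[
Q_th(x)=mx-\frac{m^2t}{2},
\qquad
\frac{d}{dt}Q_th+\frac12|\nabla Q_th|^2=-\frac{m^2}{2}+\frac{m^2}{2}=0,
\qquad
2\int Q_1h\,d\nu=2\Big(m^2-\frac{m^2}{2}\Big)=m^2=W_2^2(\nu,\mu).
\]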
Proof. It consists of two parts.
Part 1. First of all, using the inequality $\log x\leqslant x-1$ yields that For Conversely, assume there exists some $C>0$ such that Various perturbation techniques give PI with a constant √ 2C, see [12,17] and the references therein. For completeness, we write down a sketch.
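The elementary bound behind the first step of Part 1 can be spelled out as follows; this is a routine computation added for the reader, assuming only that $f\geqslant0$ is a probability density with respect to $\mu$, so that $\mu(f)=1$:
\[
\mathrm{Ent}_\mu(f)=\int f\log f\,d\mu\leqslant\int f\,(f-1)\,d\mu=\mu(f^2)-\mu(f)=\mu(f^2)-1=\mathrm{Var}_\mu(f).
\]
Consequently, any bound of $W_2^2(f\mu,\mu)$ by a multiple of $\mathrm{Ent}_\mu(f)$ immediately gives a bound by the same multiple of $\mathrm{Var}_\mu(f)$.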
Part 2. Bounding the relative entropy by other functionals should lead to new types of transportation-variance inequalities. Indeed, for any $p\geqslant1$, Jensen's inequality (recall $\mu(f)=1$ here) gives If PI holds, it follows similarly from (2.1) that which covers the second inequality in Theorem 1.1 for $p=1$ and also gives the third one Using PI again yields which gives the fourth inequality in Theorem 1.1. The fifth inequality follows by taking $p=1$: Conversely, still following the routine of the perturbation technique, (2.4) implies PI too. More precisely, recalling the first part, we have a result similar to (2.3):

Derivative of quadratic Wasserstein distance along heat flow
In this section, we compute the derivative formula of $W_2(\nu_t,\mu)$ for $\frac{d\nu_t}{d\mu}=P_tf$. Recall that, in our notation, Otto-Villani [15, Lemma 2] (see also [16, Subsection 9.3.4]) was actually concerned with the upper right-hand derivative of $W_2(\nu,\nu_t)$ and found a bound on this derivative under a curvature lower bound $\rho I$ for some $\rho\in\mathbb{R}$ (namely the curvature-dimension condition $CD(\rho,\infty)$). The difference between $W_2(\nu_t,\mu)$ and $W_2(\nu,\nu_t)$ is that the former might be integrable for $t\in[0,+\infty)$.
According to [16, Exercise 2.36], there exists $h_t\in L^1(\mu)$ such that $\mu(h_t)=0$ and $Q_1h_t\in L^1(\nu_t)$, and the conjugate pair $(Q_1h_t,h_t)$ attains the supremum as Given nice initial data, we obtain the derivative formula for $W_2^2(\nu_t,\mu)$ for almost all $t$ with no condition on curvature.
Then for almost all $t>0$, there exists some $h_t\in L^1(\mu)$ satisfying (3.2) and
Proof. It consists of four steps. Note that $L^1(\nu_t)\subset L^1(\mu)$ in our case, since $f$ has a positive lower bound and then $\nu_t(|h|)\geqslant\inf f\cdot\mu(|h|)$. The assumption $Lf\in L^\infty(E)$ is reasonable since the resolvent operator $R_\lambda$ sends $C_b(E)$ into $C_b(E)\cap D(L)$ and $L=-R_\lambda^{-1}+\lambda I$ (see for example Evans [7, Subsection 7.4.1]).
Step 1. To show the continuity of $W_2(\nu_t,\mu)$ in $t$.
Using the control inequality of weighted total variation (see [16, Proposition 7.10]) yields that for any $t,t'>0$ It follows from the triangle inequality
Step 2. To choose a conjugate pair $(Q_1h_t,h_t)$ satisfying (3.2) and some auxiliary "maximality" (which will be introduced in (3.3) and applied in the next step).
First of all, let $(Q_1\tilde h_t,\tilde h_t)\in L^1(\nu_t)\times L^1(\mu)$ satisfy $\mu(\tilde h_t)=0$ and $Q_1\tilde h_t$ may not have a gradient, so we take a sequence of bounded Lipschitz functions Without loss of generality, assume $u_\infty=\lim$ We want to show that $(Q_1h_t,h_t)$ is also a conjugate pair satisfying $W_2^2(\nu_t,\mu)=2\int Q_1h_t\,d\nu_t$. The difference between $(Q_1h_t,h_t)$ and $(Q_1\tilde h_t,\tilde h_t)$ is that the former can be approximated by a special sequence of bounded Lipschitz pairs with the property (3.3).
To this end, by the definition of infimum convolution, we have first which means that $Q_1h_{k,t}$ falls between two $L^1$-convergent sequences. By virtue of the Prokhorov theorem (namely the tightness argument) together with the fact that $L^1(\nu_t)\subset L^1(\mu)$, one can extract a subsequence of $Q_1h_{k,t}$ (still denoted by $Q_1h_{k,t}$ for ease of notation) converging in $L^1(\nu_t)$. Denote $\varphi_t=\lim_{k\to\infty}Q_1h_{k,t}$, which satisfies On the other hand, due to the definition of $h_{k,t}$ in (3.3), it follows Hence, $(\varphi_t,h_t)$ attains the supremum of the dual Kantorovich problem too. Moreover, $\varphi_t=Q_1h_t$ almost everywhere with respect to $\nu_t$, and with respect to $\mu$ as well since $f$ has a positive lower bound.
For $(Q_1h_t,h_t)$, we have Recall the approximating sequence $(Q_1h_{k,t},h_{k,t})$ for $(Q_1h_t,h_t)$ in Step 2; using the integration by parts formula and the Hölder inequality yields that Since $Q_sh_{k,t}$ solves the Hamilton-Jacobi equation $\frac{d}{ds}Q_sh_{k,t}+\frac12|\nabla Q_sh_{k,t}|^2=0$ (see [7, Subsection 3.3.2]), we have by (3.3) (namely the integral "maximality" for $Q_1h_{k,t}$) that Note that $A_t$ is continuous in $t$.
Step 4. To show the Lipschitz property of $W_2^2(\nu_t,\mu)$. For convenience, denote $F(t)=W_2^2(\nu_t,\mu)$. Heuristically, using (3.5) and (3.7) yields a local estimate that for any $t>0$ there exists $s>0$ such that $A_t+\varepsilon$. For any $t\in[a,b]$, there exists some $\eta_t\in(0,b-a]$ by using (3.5) and (3.7) such that for all $s\in(0,\eta_t]$ On the other hand, the continuity of $F(t)$ implies that there exists $\tilde\eta_t\in(0,\eta_t]$ such that for all $-s\in[-\tilde\eta_t,0]$ Then the open interval $I_t=(t-\tilde\eta_t,t+\eta_t)$ has length no less than $\eta_t$ and no more than $2\eta_t$, and holds for any $t_2\leqslant t\leqslant t_1$ or $t\leqslant t_2\leqslant t_1$ in $I_t$ The collection of all $I_t$ forms an open covering of $[a,b]$, which admits a finite sub-covering $\mathcal{I}$. To reduce overlaps, we have to make a selection. Starting from $t_0=a$, one can successively take the $i$-th open interval $I_{t_i}$ from $\mathcal{I}$ for $i=1,2,\ldots$ satisfying the following two properties:
(1) $I_{t_i}\cap I_{t_{i-1}}\neq\emptyset$, and $I_{t_i}$ contains the right-hand endpoint of $I_{t_{i-1}}$.
(2) If there is another $I_{t^*}\in\mathcal{I}$ intersecting $I_{t_{i-1}}$, then $I_{t^*}\subset\bigcup_{j\leqslant i}I_{t_j}$, namely the right-hand endpoint of $I_{t^*}$ does not exceed that of $I_{t_i}$. This means that $I_{t_i}$ covers more effectively than any other $I_{t^*}$.
This procedure stops at step $N$ once $I_{t_N}$ contains $b$. Now we have a chain $I_{t_0},I_{t_1},\ldots,I_{t_N}$ in which each element only intersects its neighbors, which means their overlap is at most 2-fold at every point in In any case, we obtain an interpolation by (3.8) Similarly, it follows from (3.6) and (3.7) that Combining the above estimates yields that $F(t)=W_2^2(\nu_t,\mu)$ is locally Lipschitz and then has a derivative for almost all $t>0$ as $\frac{d}{dt}$ It follows that for almost all $t>0$ $\frac{d}{dt}$ which can be rewritten as $\frac{d}{dt}$ The proof is completed.

Remark 3.2.
It is interesting to ask further whether $h_t=\tilde h_t$ almost everywhere (namely $u_\infty=0$ in (3.4)). For any positive $\alpha$ and $\beta$ with $\alpha+\beta=1$, we have $\alpha Q_1\tilde h_t+\beta Q_1h_t\leqslant Q_1(\alpha\tilde h_t+\beta h_t)$ and $W_2^2(\nu_t,\mu)=2\int(\alpha Q_1\tilde h_t+\beta Q_1h_t)\,d\nu_t\leqslant2\int Q_1(\alpha\tilde h_t+\beta h_t)\,d\nu_t\leqslant W_2^2(\nu_t,\mu)$, which implies $\alpha Q_1\tilde h_t+\beta Q_1h_t=Q_1(\alpha\tilde h_t+\beta h_t)$ almost everywhere. It follows that for almost every $x\in E$ and $h=\tilde h_t$ or $h_t$ or $\alpha\tilde h_t+\beta h_t$, $Q_1h(x)$ can take its value at the same critical point $y_x$ such that $Q_1h(x)=h(y_x)+\frac12d^2(x,y_x)$ (or along the same sequence of points $\{y_x\}$). If $u_\infty\neq0$ and $\tilde h_t$ is bounded and differentiable, we have $\nabla\tilde h_t(y_x)=\nabla h_t(y_x)=x-y_x$ and then $\nabla\tilde h_t(y_x)=\nabla h_t(y_x)\equiv0$ since $h_t=(1-u_\infty)\tilde h_t$, which means that $\tilde h_t$ has to be a constant function and furthermore $\tilde h_t\equiv0$ because $\mu(\tilde h_t)=0$. This suggests that $h_t=\tilde h_t$ is true; however, it seems complicated to deal with $L^1$ functions.
The same argument is also effective in reproving Lemma 2 in [15] as which avoids using the second inequality in (3.1).

The second proof of Theorem 1.1
Proof. Assume PI holds with a constant $C_P$. Recall that which implies $\mathrm{Ent}_\mu(P_tf)\to0$ as $t\to\infty$. Using the same method as in the second part of [15, Lemma 3] yields $W_2(\nu_t,\mu)\to0$ too. More precisely, $W_2(\nu_t,\mu)$ decays exponentially fast since for any continuous $\xi$ with $|\xi(x)|\leqslant C(d^2(x_0,x)+1)$, where the integrability of $d^4(x_0,\cdot)$ comes from PI as well.
For simplicity, assume that $f$ fulfills all the conditions in Lemma 3.1; then using the Hölder inequality we get (2.1) again The following steps are the same as those in Section 2.
2 (ν s , µ)ds (it is finite since $W_2(\nu_t,\mu)$ decays exponentially fast), (4.1) can be rewritten as and then Substituting this estimate back into (4.1) for $t=0$ gives us The following steps are the same as before. By the way, if one is concerned with the quantity W 2 , it also decays exponentially fast provided that PI holds. Firstly, we have for any $g^2\mu\in P(E)$ (denote $m=\mu(g)$) $\mathrm{Var}_\mu(g^2)\leqslant\int|g^2-m^2|^2\,d\mu\leqslant2\int|g-m|^4\,d\mu+8m^2\int|g-m|^2\,d\mu$.
Then it follows from PI that d dt µ (P t g − m) which implies by taking $\lambda=3$ that $\frac{d}{dt}\Lambda_t\leqslant-3C_P^{-1}\Lambda_t$ and then $\Lambda_t\leqslant\exp(-3t/C_P)\,\Lambda_0$. Hence using Theorem 1.1 yields for $g=\sqrt{f}$ that where the total rate is no more than $e^{-2t/C_P}$.

The logarithmic Sobolev inequality and strict contraction of heat flow in Wasserstein space
In this section, we prove Proposition 1.3. The curvature-dimension condition plays a fundamental role, allowing us to compare several functionals of the heat flow at different times. The derivative estimate of Lemma 3.1 is also useful.
Proof. Assume V is a smooth potential satisfying the curvature-dimension condition CD(ρ, ∞).
If the LSI holds, it is known that the entropy along the heat flow decays exponentially fast. Moreover, Talagrand's inequality holds (see [15] or [2, Theorem 9.6.1]), namely for any positive bounded $f$ and any $t>T>0$ it follows from the same argument as [2, Page 446] that , which attains its minimum at $T_0=\frac{1}{2|\rho|}\log(1+C_{LS}|\rho|)$. So now we obtain the exponential decay for $t>T_0$.
For $0<t\leqslant T_0$, there is a general bound according to the heat flow contraction in Wasserstein space (see [2, Theorem 9.7.2]) as Combining the two regions gives us a control constant $C:=\max\{\gamma(T_0),e^{(2C_{LS}^{-1}-2\rho)T_0},1\}$ such that for all $t>0$ and κ : Conversely, if $W_2(P_tf\mu,\mu)\leqslant Ce^{-\kappa t}W_2(f\mu,\mu)$, there exists $t$ (independent of $f$) such that $\eta:=Ce^{-\kappa t}<1$. Using the derivative estimate for nice $f$ (see Lemma 3.1) yields Based on the heat flow contraction and information contraction (see [2, Eq. 5.7.4]) we have further where the last step comes from the Cauchy-Schwarz inequality for any $\varepsilon>0$. Taking $\varepsilon=\eta=\frac12$ explicitly, $W_2I$ follows: Since $W_2I$ is equivalent to the LSI under $CD(\rho,\infty)$ by virtue of the HWI inequality (see [15] or [2, Subsection 9.3]), the proof is complete.
Proof. For any bounded Lipschitz $h$ with $\mu(h)=0$, let $m_t=\mu(Q_th)$; we have Taking any interval $[a,b]\subset\mathbb{R}_+$ and any nonnegative $\varphi\in C^1([a,b])$, we integrate both sides to get For convenience, denote the three terms on the right-hand side by $I_1,I_2,I_3$ respectively. Using the Cauchy-Schwarz, Hölder and Poincaré inequalities yields for any $\lambda>0$ where the last step comes from the Hamilton-Jacobi equation. Integration by parts gives If $\varphi(a)=\varphi(b)=0$, we have further and then Now we want to drop the first integral on the right-hand side of the above inequality. For instance, take $a=\frac12$, $b=1$, $\varphi(t)=(t-a)(b-t)$ (satisfying $\varphi(a)=\varphi(b)=0$, $\varphi\geqslant0$ and $|\varphi'|\leqslant\frac12$), and $\lambda=C_P^{-1}$; then for $t\in[a,b]$ the quantity $\psi:=(4\lambda(b-a)C_P\varphi'+1)\geqslant0$, which implies $\int_a^bm_t\varphi\psi\,dt\leqslant0$ since the monotonicity of $Q_t$ in $t$ gives $m_t=\mu(Q_th)\leqslant\mu(h)=0$. Hence $I_2+I_3\leqslant C_P\sigma^2$.
Finally, combining all the above estimates yields $I_0\leqslant I_1+C_P\sigma^2$.
Denote $M=\int_a^b\varphi\,dt=\frac{1}{48}$; it follows that $M\cdot\mu(Q_bhf)\leqslant I_0\leqslant I_1+C_P\sigma^2\leqslant M\cdot\mu(Q_ah(\sqrt{f}-c)^2)+C_P\sigma^2$, which implies by the Kantorovich dual of the $W_2$-distance that The proof is completed.
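The numerical choices in this proof can be checked directly. The following computation is added for the reader and uses only $a=\frac12$, $b=1$, $\varphi(t)=(t-a)(b-t)$ and $\lambda=C_P^{-1}$ from the text:
\[
M=\int_a^b(t-a)(b-t)\,dt=\frac{(b-a)^3}{6}=\frac{1}{48},
\qquad
\varphi'(t)=a+b-2t=\frac32-2t\in\Big[-\frac12,\frac12\Big],
\qquad
\psi=4\lambda(b-a)C_P\varphi'+1=2\varphi'+1\in[0,2].
\]
Dividing the final chain of inequalities by $M$ turns $C_P\sigma^2$ into $48\,C_P\sigma^2$. Together with the factor 2 in the Kantorovich dual, this appears consistent with the constant $C_2=96C_P$ quoted in the introduction, although the precise bookkeeping depends on display equations not reproduced here.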