Integral Criteria for Transportation-Cost Inequalities

In this paper, we provide a characterization of a large class of transportation-cost inequalities in terms of the exponential integrability of the cost function under the reference probability measure. Our results completely extend previous works by Djellout, Guillin and Wu and by Bolley and Villani.


Introduction
Throughout the paper, (X , d) will be a Polish space equipped with its Borel σ-field. The set of probability measures on X will be denoted by P(X ).
1.1. Norm-entropy inequalities and transportation cost inequalities. The aim of this paper is to give necessary and sufficient conditions for inequalities of the following form:

∀ν ∈ P(X ),   α(‖ν − µ‖*_Φ) ≤ H(ν | µ),   (1.1)

where
• α : R⁺ → R⁺ ∪ {+∞} is a convex lower semi-continuous (l.s.c.) function vanishing at 0,
• the semi-norm ‖ν − µ‖*_Φ is defined by

‖ν − µ‖*_Φ = sup_{ϕ ∈ Φ} ( ∫_X ϕ dν − ∫_X ϕ dµ ),

where Φ is a set of bounded measurable functions on X which is symmetric, i.e. ϕ ∈ Φ ⇒ −ϕ ∈ Φ,
• the quantity H(ν | µ) is the relative entropy of ν with respect to µ, defined by

H(ν | µ) = ∫_X log(dν/dµ) dν

if ν is absolutely continuous with respect to µ, and +∞ otherwise.
Inequalities of the form (1.1) were introduced by C. Léonard and the author in [12]. They are called norm-entropy inequalities. An important particular case is when Φ is the set of all bounded 1-Lipschitz functions on X : Φ = BLip₁(X , d). Indeed, in that case ‖ν − µ‖*_Φ is the optimal transportation cost between ν and µ associated to the metric cost function d(x, y). Let us recall that if c : X × X → R⁺ is a lower semi-continuous function, then the optimal transportation cost between ν ∈ P(X ) and µ ∈ P(X ) is defined by

T_c(ν, µ) = inf_{π ∈ Π(ν,µ)} ∫∫ c(x, y) dπ(x, y),

where π describes the set Π(ν, µ) of all probability measures on X × X having ν as first marginal and µ as second marginal. According to the Kantorovich–Rubinstein duality theorem (see e.g. Theorem 1.3 of [18]), if the cost function c is the metric d, the following identity holds:

T_d(ν, µ) = sup_{ϕ ∈ BLip₁(X ,d)} ( ∫_X ϕ dν − ∫_X ϕ dµ ) = ‖ν − µ‖*_{BLip₁}.

In this setting, inequality (1.1) becomes

∀ν ∈ P(X ),   α(T_d(ν, µ)) ≤ H(ν | µ).   (1.5)

Such an inequality is called a convex transportation-cost inequality (convex T.C.I.).
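On a finite space, the optimal transportation cost T_c(ν, µ) defined above is a finite linear program over the couplings Π(ν, µ), so it can be computed directly from its definition. The following sketch is not part of the paper; it assumes NumPy and SciPy are available, and the discrete example (two Dirac masses on the line) is ours:

```python
import numpy as np
from scipy.optimize import linprog

def transport_cost(nu, mu, cost):
    """Optimal transportation cost T_c(nu, mu) between two probability
    vectors on a finite space: minimize sum_ij cost[i, j] * pi[i, j]
    over all couplings pi with first marginal nu and second marginal mu."""
    n, m = len(nu), len(mu)
    # Marginal constraints: sum_j pi[i, j] = nu[i] and sum_i pi[i, j] = mu[j].
    A_rows = np.kron(np.eye(n), np.ones((1, m)))
    A_cols = np.kron(np.ones((1, n)), np.eye(m))
    A_eq = np.vstack([A_rows, A_cols])
    b_eq = np.concatenate([nu, mu])
    res = linprog(cost.reshape(-1), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun

# Three points 0, 1, 2 on the line, with the metric cost d(x, y) = |x - y|.
x = np.array([0.0, 1.0, 2.0])
d = np.abs(x[:, None] - x[None, :])
nu = np.array([1.0, 0.0, 0.0])   # Dirac mass at 0
mu = np.array([0.0, 0.0, 1.0])   # Dirac mass at 2
print(transport_cost(nu, mu, d))  # all mass moves distance 2, so T_d = 2
```

For two Dirac masses the only coupling is the product measure, so the value is d(0, 2) = 2, which the linear program recovers.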
1.2. Applications of transportation-cost inequalities. After the seminal works of K. Marton [14, 15] and M. Talagrand [17], new efforts have been made to understand this kind of inequality. The reason for this interest is the link between T.C.I.s and concentration of measure inequalities. Namely, according to a general argument due to K. Marton, if µ satisfies (1.5), then µ has the following concentration property: for every measurable A ⊆ X with µ(A) ≥ 1/2 and every r ≥ r₀ := α⁻¹(log 2),

µ(Aʳ) ≥ 1 − e^{−α(r − r₀)},

where Aʳ := {x ∈ X : d(x, A) ≤ r} denotes the r-enlargement of A. For a proof of this fact, see e.g. Theorem 9 of [12]. Other applications of T.C.I.s were investigated in [8], [3], [2] and [12]. In these papers, it was shown that T.C.I.s are an efficient tool for deriving precise deviation results for Markov chains and empirical processes. One can also consult [5] and [10] for applications of norm-entropy inequalities to the study of conditional principles of Gibbs type for empirical measures and random weighted measures.
1.3. Necessary and sufficient conditions for norm-entropy inequalities. Our main result gives necessary and sufficient conditions on µ for (1.1) to be satisfied. Before stating it, let us introduce some notation. In all that follows, C will denote the set of convex functions α : R⁺ → R⁺ ∪ {+∞} which are lower semi-continuous (l.s.c.) and such that α(0) = 0. For a given α, the monotone convex conjugate of α will be denoted by α⊛. It is defined by

α⊛(s) = sup_{t ≥ 0} { st − α(t) },   s ≥ 0.

Note that if α belongs to C, then α⊛ also belongs to C. Furthermore, one has the relation α⊛⊛ = α. If α is in C, the Orlicz space L_{τ_α}(X , µ) associated to the function τ_α := e^α − 1 is defined by

L_{τ_α}(X , µ) = { f : X → R measurable : ∫_X τ_α(|f|/λ) dµ < +∞ for some λ > 0 },

where µ-almost everywhere equal functions are identified. The space L_{τ_α}(X , µ) is equipped with its classical Luxemburg norm ‖·‖_{τ_α}, i.e.

∀f ∈ L_{τ_α}(X , µ),   ‖f‖_{τ_α} = inf { λ > 0 : ∫_X τ_α(|f|/λ) dµ ≤ 1 }.

We will need the following assumptions on α:

Assumptions. (1.6)

We can now state the main result of this paper, which will be proved in Section 2.
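On a finite probability space, the map λ ↦ ∫ τ_α(|f|/λ) dµ is decreasing in λ, so the infimum defining the Luxemburg norm can be located by bisection. The sketch below is a numerical illustration of the definition only (the discrete setting and function names are ours, not the paper's):

```python
import math

def luxemburg_norm(f_vals, weights, alpha, iters=100):
    """Luxemburg norm ||f||_{tau_alpha} on a finite probability space:
    inf { lam > 0 : sum_i weights[i] * tau_alpha(|f_i| / lam) <= 1 },
    with tau_alpha(x) = exp(alpha(x)) - 1.  Assumes f is not identically 0."""
    def excess(lam):  # decreasing in lam
        return sum(w * (math.exp(alpha(abs(v) / lam)) - 1)
                   for v, w in zip(f_vals, weights))
    # Bracket the norm: hi with excess(hi) <= 1, lo with excess(lo) > 1.
    hi = 1.0
    while excess(hi) > 1:
        hi *= 2
    lo = hi / 2
    while excess(lo) <= 1:
        lo /= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if excess(mid) > 1:
            lo = mid
        else:
            hi = mid
    return hi

# Sanity check with alpha(x) = x^2 and f identically 1: the norm solves
# exp(1 / lam^2) - 1 = 1, i.e. lam = 1 / sqrt(log 2) ~ 1.2011.
alpha = lambda x: x * x
print(luxemburg_norm([1.0] * 4, [0.25] * 4, alpha))
```

For a constant function the infimum can be solved in closed form, which gives the check used above.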
Theorem 1.7. Let α ∈ C satisfy Assumptions (A₁) and (A₂) and let µ ∈ P(X ). The following statements are equivalent: More precisely, if (1) holds true, then one can take M = 3a. Conversely, if (2) holds true, then one can take a = √(2 m_α M), with m_α defined by where the constants s_{α⊛} and c_{α⊛} are given by (1.6).
• If Φ contains an element which is not µ-a.e. constant, and if inequality (1.1) holds for some α ∈ C, then α satisfies Assumption (A₂) (see Lemma 2.1).
• The constant a = √(2 m_α M) is not optimal. This can easily be checked by considering the celebrated Pinsker inequality:

∀ν ∈ P(X ),   ‖ν − µ‖²_TV ≤ 2 H(ν | µ),

where ‖ν − µ‖_TV is the total-variation norm, defined by

‖ν − µ‖_TV = sup { ∫_X ϕ dν − ∫_X ϕ dµ : ϕ measurable, |ϕ| ≤ 1 }.

In order to prove Theorem 1.7, we will take advantage of the dual formulation of norm-entropy inequalities developed in [12]. Namely, according to Theorem 3.15 of [12], we have the following result: inequality (1.1) with α ∈ C is equivalent to the condition

∀ϕ ∈ Φ, ∀s ∈ R⁺,   ∫_X e^{sϕ} dµ ≤ e^{s⟨ϕ,µ⟩ + α⊛(as)}.   (1.11)

According to (1.11), the only thing to know is how to majorize the Laplace transform of a centered random variable X satisfying an Orlicz integrability condition of the form E[e^{α(|X|/λ)}] < +∞ for some λ > 0. Estimates of this kind are very useful in probability theory, because they enable one to control the deviation probabilities of sums of independent and identically distributed random variables. In [12], we have shown how to deduce Pinsker inequality from the classical Hoeffding estimate (see Section 2.3 of [12]). We also proved that the weighted version of Pinsker inequality (1.20), recently obtained by Bolley and Villani in [3], is a consequence of the Bernstein estimate (see Corollaries 3.23 and 3.24 of [12]). Here, Theorem 1.7 will follow very easily from the following theorem, which is due to Kozachenko and Ostrovskii (see [13] and [4], p. 63-68):

Theorem 1.12. Suppose that α ∈ C satisfies Assumptions (A₁) and (A₂); then for all f ∈ L_{τ_α}(X , µ) such that ∫_X f dµ = 0, the following holds:

For further information on the preceding result, we refer to Chapter VII of [11] (p. 193-197), where a complete and detailed proof is given. Before proving Theorem 1.7, we discuss below some of its applications.
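As a concrete illustration of the Pinsker inequality invoked above, both sides can be evaluated exactly on a finite space. The short sketch below (our notation, not the paper's) compares ‖ν − µ‖²_TV with 2 H(ν | µ) for a pair of discrete distributions:

```python
import math

def relative_entropy(nu, mu):
    """H(nu | mu) = sum_i nu_i log(nu_i / mu_i), with the convention 0 log 0 = 0."""
    return sum(p * math.log(p / q) for p, q in zip(nu, mu) if p > 0)

def tv_norm(nu, mu):
    """Total-variation norm ||nu - mu||_TV = sum_i |nu_i - mu_i|."""
    return sum(abs(p - q) for p, q in zip(nu, mu))

nu = [0.5, 0.3, 0.2]
mu = [0.2, 0.3, 0.5]
# Pinsker: ||nu - mu||_TV^2 <= 2 H(nu | mu).
print(tv_norm(nu, mu) ** 2, 2 * relative_entropy(nu, mu))  # 0.36 vs ~0.55
```

Here the total-variation side is 0.36 while the entropy side is about 0.55, consistent with the inequality.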
1.4. Applications to T.C.Is. Applying the preceding theorem to the case where Φ is the Lipschitz ball BLip 1 (X , d), one obtains the following result.
Theorem 1.13. Let α ∈ C satisfy Assumptions (A₁) and (A₂) and let µ ∈ P(X ) be such that ∫_X d(x₀, x) dµ(x) < +∞ for all x₀ ∈ X . The following statements are equivalent: More precisely, if (2) holds true, then one can take a = 2

Actually, other transportation cost inequalities can be deduced from Theorem 1.7. Using a majorization technique developed by F. Bolley and C. Villani in [3], we will prove the following result:
More precisely, if (2) holds true, then one can take a = √(2K m_α) · inf_{x₀∈X } ‖c(x₀, ·)‖_{τ_α}. Furthermore, if dom α = R⁺, then the following inequality holds. Contrary to what happens in the case where c is the metric d, a transportation-cost inequality α(T_c(ν, µ)) ≤ H(ν | µ) can hold even if α does not satisfy Assumption (A₂). The best-known example is the Talagrand inequality, also called the T₂-inequality. Let us recall that a probability measure µ on Rⁿ satisfies the Talagrand inequality T₂(a) if

∀ν ∈ P(Rⁿ),   T_{d²}(ν, µ) ≤ 2a H(ν | µ),

where d is the Euclidean distance. Gaussian measures do satisfy a T₂-inequality; this was first shown by Talagrand in [17]. In this case, the corresponding α is a linear function and hence its monotone conjugate α⊛ does not satisfy (A₂). Several sufficient conditions for the Talagrand inequality are known. In [16], F. Otto and C. Villani showed that if dµ = e^{−Φ} dx is a probability measure on Rⁿ satisfying a logarithmic Sobolev inequality with constant a, then it also satisfies the inequality T₂(a). Furthermore, if µ satisfies T₂(a), then it satisfies the Poincaré inequality with constant a/2. An alternative proof of these facts was proposed by S. G. Bobkov, I. Gentil and M. Ledoux in [1]. In a recent paper, P. Cattiaux and A. Guillin gave an example of a probability measure satisfying T₂ but not the logarithmic Sobolev inequality (see [6]). A necessary and sufficient condition for T₂ is not yet known. Other examples of transportation-cost inequalities involving a linear α can be found in [1], [9] and [6]. The common feature of these T₂-like inequalities is that they enjoy a dimension-free tensorization property (see e.g. Theorem 4.12 of [12]), which in turn implies a dimension-free concentration phenomenon.
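The Gaussian case of the T₂-inequality can be checked in closed form: for one-dimensional Gaussians, W₂²(N(m₁, σ₁²), N(m₂, σ₂²)) = (m₁ − m₂)² + (σ₁ − σ₂)², and the relative entropy is explicit. The sketch below (our illustration; with the normalization T_{d²} ≤ 2aH, the standard Gaussian satisfies T₂(1), Talagrand's original case) verifies the inequality on a grid of Gaussian ν:

```python
import math

def w2_sq(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between the one-dimensional
    Gaussians N(m1, s1^2) and N(m2, s2^2) (closed form)."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

def kl(m1, s1, m2, s2):
    """Relative entropy H( N(m1, s1^2) | N(m2, s2^2) )."""
    return (math.log(s2 / s1)
            + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5)

# T_2 for the standard Gaussian mu = N(0, 1): W_2^2(nu, mu) <= 2 H(nu | mu)
# over a grid of Gaussian nu (equality is attained when sigma = 1).
for m in [-2 + 0.5 * i for i in range(9)]:
    for s in [0.5 + 0.25 * j for j in range(7)]:
        assert w2_sq(m, s, 0.0, 1.0) <= 2 * kl(m, s, 0.0, 1.0) + 1e-12
print("T2 verified on the grid")
```

Restricted to translates (σ = 1), both sides equal (m₁ − m₂)², illustrating why the linear α here is sharp and cannot satisfy (A₂).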
1.5. About the literature. Theorems 1.13 and 1.14 extend previous results obtained by H. Djellout, A. Guillin and L. Wu in [8] and by F. Bolley and C. Villani in [3]. In [8], H. Djellout, A. Guillin and L. Wu obtained the first integral criterion for the so-called T₁-inequality. Let us recall that a probability measure µ on X is said to satisfy the inequality T₁(a) if

∀ν ∈ P(X ),   T_d(ν, µ) ≤ √(2a H(ν | µ)).   (1.18)

According to Jensen's inequality, T_d(ν, µ)² ≤ T_{d²}(ν, µ), and thus T₂(a) ⇒ T₁(a). The inequality T₁ is weaker than T₂, and it is also considerably easier to study. According to Theorem 3.1 of [8], the following propositions are equivalent:
(1) ∃a > 0 such that µ satisfies T₁(a);
(2) ∃δ > 0 such that ∫∫ e^{δ d(x,y)²} dµ(x) dµ(y) < +∞.
The link between the constants a and δ was then improved by F. Bolley and C. Villani in [3] (see (1.24) below).
In [3], F. Bolley and C. Villani obtained the following weighted versions of Pinsker inequality: if χ : X → R⁺ is a measurable function, then for all ν ∈ P(X ), using the following upper bound (see [18], Prop. 7.10).

In order to derive T.C.I.s from norm-entropy inequalities, we will follow the lines of [3]. To do this, we will deduce from Theorem 1.7 a general version of the weighted Pinsker inequality (see Theorem 2.7). Theorem 1.14 will follow from Theorem 2.7 and from Lemma 3.2, which generalizes inequality (1.22).

Necessary and sufficient conditions for norm-entropy inequalities.
Let us begin with a remark on Assumption (A₂).

Lemma 2.1. If Φ contains an element ϕ₀ which is not µ-a.e. constant, and if inequality (1.1) holds for some α ∈ C, then α satisfies Assumption (A₂).

Proof. Let us define Λ_{ϕ₀}(s) = log ∫_X e^{sϕ₀} dµ, for all s ∈ R. According to Theorem 1.10, we have

∀s ≥ 0,   Λ_{ϕ₀}(s) ≤ s⟨ϕ₀, µ⟩ + α⊛(as).

It is well known that

Λ_{ϕ₀}(s) = s⟨ϕ₀, µ⟩ + (s²/2) Var_µ(ϕ₀) + o(s²),   s → 0,

and Var_µ(ϕ₀) > 0 since ϕ₀ is not µ-a.e. constant. From this it follows that lim inf_{s→0⁺} α⊛(s)/s² > 0, which easily implies (1.6).
The rest of this section is devoted to the proof of Theorem 1.7. The following lemma will be useful in the sequel.

Lemma 2.3. Let X be a random variable such that E[e^{δ|X|}] < +∞ for some δ > 0. Denote by Λ_X the Log-Laplace transform of X, defined by Λ_X(s) = log E[e^{sX}], and by Λ*_X its Cramér transform, defined by Λ*_X(t) = sup_{s∈R} {st − Λ_X(s)}. Then the following upper bound holds:

Proof. (See also Lemma 5.1.14 of [7].) Let a < b, with a ∈ R ∪ {−∞} and b ∈ R ∪ {+∞}, be the endpoints of dom Λ*_X. Since Λ*_X is convex and l.s.c., {Λ*_X ≤ t} is an interval with endpoints a ≤ a(t) ≤ b(t) ≤ b, for all t ≥ 0. As a consequence, if a(t) > a, the continuity of Λ*_X on ]a, b[ easily implies that Λ*_X(a(t)) = t. Thus, according to (2.4), P(X < a(t)) ≤ e^{−t}.

If a(t) = a, then P(X < a) = lim_{n→+∞} P(X < a − 1/n), where (i) comes from (2.4) and (ii) from the fact that a − 1/n ∉ dom Λ*_X. Therefore, in all cases, P(X < a(t)) ≤ e^{−t}. In the same way, we have P(X > b(t)) ≤ e^{−t}. As a consequence, integrating by parts and using (2.5) in (*) below, we get the announced bound.

Now, let us prove Theorem 1.7.
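To make the Log-Laplace and Cramér transforms of Lemma 2.3 concrete, Λ*_X can be approximated by maximizing st − Λ_X(s) over a grid of s and compared with a known closed form. The sketch below is illustrative only and not part of the proof; it uses a Bernoulli(p) variable, for which Λ*(t) = t log(t/p) + (1 − t) log((1 − t)/(1 − p)) on [p, 1]:

```python
import math

p = 0.3
def log_laplace(s):
    """Log-Laplace of a Bernoulli(p) variable: log E exp(sX)."""
    return math.log(1 - p + p * math.exp(s))

def cramer(t, s_grid):
    """Cramér transform Lambda*(t) = sup_s { s t - Lambda(s) },
    approximated by a maximum over a finite grid of s values."""
    return max(s * t - log_laplace(s) for s in s_grid)

t = 0.6
grid = [0.001 * k for k in range(20000)]   # s in [0, 20)
approx = cramer(t, grid)
exact = t * math.log(t / p) + (1 - t) * math.log((1 - t) / (1 - p))
print(approx, exact)  # both ~ 0.1920
```

The grid restricted to s ≥ 0 suffices here because t = 0.6 exceeds the mean p = 0.3, so the supremum is attained at a positive s (namely s = log 3.5).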
More precisely, if χ ∈ L_{τ_α}(X , µ), then one can take a = 2√(2 m_α) ‖χ‖_{τ_α}. Conversely, if (1) holds true, then one can take 3a if µ has no atoms, and 3a + ∫_X χ dµ · ‖1‖_{τ_α} otherwise. Furthermore, the Luxemburg norm ‖χ‖_{τ_α} can be estimated in the following way:

Remark 2.8. If α ∈ C satisfies Assumptions (A₁) and (A₂) and is such that dom α = R⁺, we have thus shown the following weighted version of Pinsker inequality: Inequality (2.9) completely extends Bolley and Villani's results (1.20) and (1.21). The proof of Bolley and Villani is very different from ours. Roughly speaking, it relies on a direct comparison of the two integrals ∫_X χ |dν/dµ − 1| dµ and ∫_X (dν/dµ) log(dν/dµ) dµ.
The case where dom α is a bounded interval is left to the reader.
Remark 2.11. It is easy to show that when α(x) = x², the Luxemburg norm ‖χ‖_{τ_{x²}} can be estimated in the following way: With this upper bound, one obtains

(2.12)

which differs from (1.21) only by numerical factors. The following proposition gives a way to improve the constants in the preceding inequality.
Proposition 2.13. For every measurable function χ : X → R⁺, the following inequality holds (2.14)

Proof. First let us show that if X is a real random variable such that E[e^{X²}] < +∞, one has the following upper bound:

E e^{s(X−E[X])} ≤ e^{s²/2} · (E e^{X²})^{2s²},   for all s ≥ 0.

Let X̃ be an independent copy of X. According to Jensen's inequality, we have E e^{s(X−E[X])} ≤ E e^{s(X−X̃)}. The random variable X − X̃ is symmetric, thus E[(X − X̃)^{2k+1}] = 0 for all k. Consequently, expanding the exponential and using (2k)! ≥ 2^k k!,

E e^{s(X−X̃)} ≤ E e^{s²(X−X̃)²/2}.

It is easily seen that E e^{s²(X−X̃)²/2} ≤ (E e^{s²X²})², and if s ≤ 1, (E e^{s²X²})² ≤ (E e^{X²})^{2s²}. Hence the bound holds for s ≤ 1. But if s ≥ 1, one has

E e^{s(X−E[X])} ≤ E e^{s(X−X̃)} ≤ E e^{s²/2 + (X−X̃)²/2} ≤ e^{s²/2} · (E e^{X²})² ≤ e^{s²/2} · (E e^{X²})^{2s²}.

So the inequality E e^{s(X−E[X])} ≤ e^{s²/2} · (E e^{X²})^{2s²} holds for all s ≥ 0.
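The bound E e^{s(X−E[X])} ≤ e^{s²/2} · (E e^{X²})^{2s²} can be checked exactly, with no sampling, on a finite distribution, where every expectation is a finite sum. The sketch below uses a symmetric three-point law and a grid of s values (a setting of our choosing, purely for illustration):

```python
import math

def expect(f, dist):
    """Expectation of f(X) for a finite distribution {value: probability}."""
    return sum(p * f(x) for x, p in dist.items())

# Symmetric three-point law; E exp(X^2) is trivially finite here.
dist = {-1.0: 0.25, 0.0: 0.5, 1.0: 0.25}
mean = expect(lambda x: x, dist)              # = 0 by symmetry
M = expect(lambda x: math.exp(x * x), dist)   # E exp(X^2)

# Check E exp(s (X - E X)) <= exp(s^2 / 2) * M ** (2 s^2) on a grid of s.
for k in range(1, 51):
    s = 0.1 * k
    lhs = expect(lambda x: math.exp(s * (x - mean)), dist)
    rhs = math.exp(s * s / 2) * M ** (2 * s * s)
    assert lhs <= rhs
print("bound verified for s in (0, 5]")
```

For this law the left-hand side is 1/2 + cosh(s)/2, so the inequality can also be confirmed by hand against e^{s²/2}.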

Applications to transportation cost inequalities.
In this section, we will see how to derive transportation-cost inequalities from norm-entropy inequalities. Let us begin with the proof of Theorem 1.13.
Proof of Theorem 1.14. Theorem 1.14 follows immediately from Propositions 3.3 and 3.6.