Transportation-cost inequalities for diffusions driven by Gaussian processes

We prove transportation-cost inequalities for the law of SDE solutions driven by general Gaussian processes. Examples include the fractional Brownian motion, but also more general processes like the bifractional Brownian motion. In the case of multiplicative noise, our main tool is Lyons' rough paths theory. We also give a new proof of Talagrand's transportation-cost inequality on Gaussian Fréchet spaces. Finally, we show that transportation-cost inequalities yield an easy criterion for proving Gaussian tail estimates for functions defined on the underlying space. This result can be seen as a further generalization of the "generalized Fernique theorem" on Gaussian spaces [Friz-Hairer 2014; Theorem 11.7] used in rough paths theory.


Introduction
Transportation-cost inequalities can be seen as a functional approach to the concentration of measure phenomenon (cf. Ledoux's work [Led01] for an introduction to the theory of measure concentration and the work [GL10] by Gozlan and Léonard for an overview of transport inequalities). They usually take the following form. Let $(E, d)$ be a metric space and let $P(E)$ denote the set of probability measures on the Borel sets of $E$. We say that a $p$-transportation-cost inequality holds for a probability measure $\mu \in P(E)$ if there is a constant $C$ such that
$$W_p(\nu, \mu) \le \sqrt{C\, H(\nu \mid \mu)} \qquad (0.1)$$
holds for all $\nu \in P(E)$. Here $W_p(\nu, \mu)$ denotes the Wasserstein $p$-distance
$$W_p(\nu, \mu) := \Big( \inf_{\pi \in \Pi(\nu, \mu)} \int_{E \times E} d(x, y)^p \, d\pi(x, y) \Big)^{1/p},$$
where $\Pi(\nu, \mu)$ is the set of all probability measures on the product space $E \times E$ with marginals $\nu$ resp. $\mu$, and $H(\nu \mid \mu)$ is the relative entropy (or Kullback-Leibler divergence) of $\nu$ with respect to $\mu$, i.e.
$$H(\nu \mid \mu) := \begin{cases} \int_E \log \frac{d\nu}{d\mu} \, d\nu & \text{if } \nu \ll \mu, \\ +\infty & \text{otherwise.} \end{cases}$$
If (0.1) holds, we will say that $T_p(C)$ holds for the measure $\mu$.
Inequalities of type (0.1) were first considered by Marton (cf. [Mar86], [Mar96]). The cases $p = 1$ and $p = 2$ are of special interest: the 1-transportation-cost inequality, i.e. the weakest form of (0.1), is actually equivalent to Gaussian concentration, as was shown by Djellout, Guillin and Wu in [DGW04] (using preliminary results by Bobkov and Götze obtained in [BG99]). The 2-transportation-cost inequality was first proven by Talagrand for the Gaussian measure on $\mathbb R^d$ in [Tal96] with the sharp constant $C = 2$ (for this reason it is also called Talagrand's transportation-cost inequality).
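As a sanity check of (0.1) with $p = 2$ and $C = 2$, one can test Talagrand's inequality on the real line with $\mu = N(0, 1)$ and $\nu = N(m, s^2)$, where both sides are explicit: $W_2(\nu, \mu)^2 = m^2 + (s - 1)^2$ and $H(\nu \mid \mu) = \frac12 (s^2 + m^2 - 1 - 2 \log s)$. The following Python sketch (an illustration restricted to one-dimensional Gaussians, not part of the paper's argument) evaluates the gap $2 H(\nu \mid \mu) - W_2(\nu, \mu)^2 = 2(s - 1 - \log s)$, which is nonnegative and vanishes exactly for translations ($s = 1$); this is why the constant $C = 2$ cannot be improved.

```python
import math


def w2_gaussian_1d(m1: float, s1: float, m2: float, s2: float) -> float:
    """Wasserstein-2 distance between N(m1, s1^2) and N(m2, s2^2) on the real line."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def kl_to_standard_normal(m: float, s: float) -> float:
    """Relative entropy H(N(m, s^2) | N(0, 1)) for s > 0."""
    return 0.5 * (s ** 2 + m ** 2 - 1.0 - 2.0 * math.log(s))


def talagrand_gap(m: float, s: float, C: float = 2.0) -> float:
    """C * H(nu | mu) - W_2(nu, mu)^2 for nu = N(m, s^2) and mu = N(0, 1).

    T_2(C) holds for this particular pair exactly when the gap is >= 0.
    """
    return C * kl_to_standard_normal(m, s) - w2_gaussian_1d(m, s, 0.0, 1.0) ** 2


if __name__ == "__main__":
    for m, s in [(0.0, 1.0), (1.0, 1.0), (0.5, 0.2), (-2.0, 3.0)]:
        print(f"m={m:+.1f}, s={s:.1f}: gap = {talagrand_gap(m, s):.6f}")
    # With C slightly below 2, a pure translation already violates the inequality.
    print(talagrand_gap(1.0, 1.0, C=1.9))
```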
$T_2(C)$ is particularly interesting since it has the dimension-free tensorization property: if $T_2(C)$ holds for two probability measures $\mu_1$ and $\mu_2$, it also holds for the product measure $\mu_1 \otimes \mu_2$ with the same constant $C$ (see also [GL07] for a general account of tensorization properties of transportation-cost inequalities). One says that a probability measure $\mu$ has the dimension-free concentration property if measure concentration holds for $\mu$ and all product measures $\mu^{\otimes n}$ with the same parameters (a precise definition can be found e.g. in [Goz09]). It follows that the 2-transportation-cost inequality implies the dimension-free Gaussian concentration property for the underlying measure.
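To see where the dimension-free constant comes from, the following short derivation treats only the special case in which $\nu = \nu_1 \otimes \nu_2$ is itself a product measure, and it assumes that $E_1 \times E_2$ carries the metric $d^2 = d_1^2 + d_2^2$. Gluing (near-)optimal couplings $\pi_1 \otimes \pi_2$ and using additivity of the relative entropy on products gives $T_2(C)$ for $\mu_1 \otimes \mu_2$ with the same $C$; the general case (arbitrary $\nu$) additionally needs a disintegration and the chain rule for relative entropy, cf. [GL07].

```latex
W_2^2(\nu_1 \otimes \nu_2,\, \mu_1 \otimes \mu_2)
  \le W_2^2(\nu_1, \mu_1) + W_2^2(\nu_2, \mu_2)          % couple with \pi_1 \otimes \pi_2
  \le C\, H(\nu_1 \mid \mu_1) + C\, H(\nu_2 \mid \mu_2)  % T_2(C) for \mu_1 and for \mu_2
  =   C\, H(\nu_1 \otimes \nu_2 \mid \mu_1 \otimes \mu_2) % entropy is additive on products
```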
Gozlan realized in [Goz09] that the converse is also true: if $\mu$ possesses the dimension-free Gaussian concentration property, $T_2(C)$ holds for $\mu$. We also remark that the 2-transportation-cost inequality has gained much attention because it is intimately linked to other famous concentration inequalities, notably to the logarithmic Sobolev inequality: in their celebrated paper [OV00], Otto and Villani showed that in a smooth Riemannian setting, the logarithmic Sobolev inequality implies the 2-transportation-cost inequality.
Since then, this result has been generalized in several directions, see e.g. the recent work of Gigli and Ledoux [GL13] and the references therein.
In this work, we will mainly study transportation-cost inequalities for the law of a continuous diffusion $Y$ induced by a stochastic differential equation (SDE) driven by a general Gaussian process, i.e. $Y\colon [0, T] \to \mathbb R^d$ solves
$$dY_t = b(Y_t)\, dt + \sum_{i=1}^m \sigma_i(Y_t)\, dX^i_t, \qquad Y_0 = \xi \in \mathbb R^d, \qquad (0.2)$$
where $X = (X^1, \dots, X^m)$ is a continuous Gaussian process. The case that has been studied, up to a certain extent, is that of the fractional Brownian motion (fBm). By definition, a fBm with Hurst parameter $H \in (0, 1)$ is a centered Gaussian process with covariance $R(s, t) = \frac12 \big( |t|^{2H} + |s|^{2H} - |t - s|^{2H} \big)$, and it is easily seen that we obtain the usual Brownian motion for $H = 1/2$. However, for $H \neq 1/2$ this process is neither a semimartingale nor a Markov process. Guendouzi shows $T_1(C)$ for the $L^1$-metric for a mixed SDE involving a fBm with Hurst parameter $H > 1/2$ in [Gue12]. Saussereau studies more general equations in [Sau12] and shows $T_1(C)$ and $T_2(C)$, in particular situations also for the uniform metric. However, all equations he considers are either driven by a fBm with Hurst parameter $H > 1/2$, have additive noise or are one-dimensional. All these examples have something in common: it is known that in these cases, the solution to (0.2) is a continuous function of the driving process, path by path. This is not true in the general case of (0.2) (and already fails, for instance, for the usual Brownian motion). For studying equation (0.2) in full generality, one needs further ingredients, and we will use Lyons' rough paths theory to achieve this goal. Let us mention that our results imply those obtained in [Sau12] in the case of fBm. There is a further challenge when studying transportation-cost inequalities for solutions to (0.2) for general Gaussian processes $X$. The standard tool to establish transportation-cost inequalities, following [FÜ04] and [DGW04], is the Girsanov transformation. In a non-martingale framework, this argument completely breaks down.
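For readers who want to experiment with the fBm, the following Python sketch (an illustration only, not a tool used in the paper) samples it on a grid by a Cholesky factorization of the covariance matrix $R(t_i, t_j)$ defined above. This method is exact in law on the grid but costs $O(n^3)$, so it is chosen for simplicity rather than speed; the function names are hypothetical. For $H = 1/2$ the covariance reduces to $\min(s, t)$, the Brownian covariance.

```python
import numpy as np


def fbm_covariance(s, t, hurst):
    """R(s, t) = (|t|^{2H} + |s|^{2H} - |t - s|^{2H}) / 2, vectorised over s and t."""
    s, t = np.asarray(s, dtype=float), np.asarray(t, dtype=float)
    return 0.5 * (np.abs(t) ** (2 * hurst) + np.abs(s) ** (2 * hurst)
                  - np.abs(t - s) ** (2 * hurst))


def sample_fbm(hurst, n_steps=200, T=1.0, seed=0):
    """One fBm path on the grid t_k = k T / n_steps, k = 0, ..., n_steps.

    Exact in law on the grid (Cholesky factor of the covariance matrix),
    but O(n^3): meant for small illustrations only.
    """
    times = np.linspace(T / n_steps, T, n_steps)  # skip t = 0, where the variance vanishes
    cov = fbm_covariance(times[:, None], times[None, :], hurst)
    chol = np.linalg.cholesky(cov)
    values = chol @ np.random.default_rng(seed).standard_normal(n_steps)
    return np.concatenate(([0.0], times)), np.concatenate(([0.0], values))


if __name__ == "__main__":
    t, x = sample_fbm(hurst=0.7, n_steps=100)
    print(t[-1], x[-1])
```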
In the case of the fBm, it can still be applied up to a certain point due to the Mandelbrot-van Ness representation of the fBm as a stochastic integral with respect to standard Brownian motion [MVN68]. However, there are many Gaussian processes (and we will encounter a class of them in Example 2.7) for which such a representation is simply not known. Our approach can be seen as an attempt to prove concentration inequalities for diffusions while avoiding the Girsanov transformation. Let us explain our strategy and the contribution of this work. In Section 1, we consider transportation-cost inequalities on infinite dimensional Gaussian spaces. It turns out that in this framework, the quadratic transport inequality even holds for the Cameron-Martin metric, which is defined as follows: if $H$ denotes the Cameron-Martin space associated to a Gaussian measure $\gamma$, set
$$d_H(x, y) := \begin{cases} |x - y|_H & \text{if } x - y \in H, \\ +\infty & \text{otherwise.} \end{cases} \qquad (0.3)$$
The fact that a transport inequality holds for $\gamma$ and this metric may be surprising at first sight, since it is known that for infinite dimensional spaces, the Hilbert space $H$ has $\gamma$-measure 0; in other words, $d_H(x, y) = \infty$ "very often". In this form, the quadratic transport inequality was first proven by Feyel and Üstünel on Gaussian Banach spaces in [FÜ04, Theorem 3.1] using the Girsanov transformation (cf. also Gentil's PhD thesis [Gen01]). The proof we give does not rely on the Girsanov transformation and holds even in Fréchet spaces (cf. Theorem 1.2). Our main tool for proving transport inequalities for solutions to (0.2) will be a contraction principle, first proven by Djellout, Guillin and Wu [DGW04] (cf. Lemma 4.1), which transfers a transport inequality along a Lipschitz continuous map; to apply it, we need the solution map associated to (0.2) to be Lipschitz continuous. This is usually true for additive noise, and we study this case in Section 2.1 first. Interestingly, due to the strong form of the Gaussian transportation-cost inequality, we obtain such inequalities for the law of $Y$ for metrics which are much larger than the uniform metric (cf. Theorem 2.2 and the discussion in Example 2.7) when the drift $b$ in (0.2) is Lipschitz continuous.
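The statement that $d_H(x, y) = \infty$ "very often" can be made tangible for Brownian motion on $[0, T]$, whose Cameron-Martin space consists of absolutely continuous paths $h$ with $h(0) = 0$ and $|h|_H^2 = \int_0^T |h'(t)|^2 \, dt$. The Python sketch below (an illustration under these assumptions, with hypothetical function names) evaluates this norm on piecewise-linear interpolations: for a smooth path it converges as the grid is refined, while for a simulated Brownian path it grows like the square root of the number of grid points, reflecting that Brownian paths do not lie in $H$.

```python
import numpy as np


def cm_norm_bm(path, T=1.0):
    """Cameron-Martin norm |h|_H for Brownian motion on [0, T], evaluated on the
    piecewise-linear interpolation of `path` (uniform grid, path[0] == 0):
    |h|_H^2 = int_0^T |h'(t)|^2 dt = sum_k (h_{k+1} - h_k)^2 / dt."""
    increments = np.diff(np.asarray(path, dtype=float))
    dt = T / len(increments)
    return float(np.sqrt(np.sum(increments ** 2) / dt))


def cm_distance_bm(x, y, T=1.0):
    """d_H(x, y) = |x - y|_H for two discretised paths on the same grid."""
    return cm_norm_bm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float), T)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for n in (100, 1_000, 10_000):
        grid = np.linspace(0.0, 1.0, n + 1)
        smooth = np.sin(grid)  # an element of H: its norm converges as n grows
        bm = np.concatenate(([0.0], np.cumsum(rng.standard_normal(n)) * np.sqrt(1.0 / n)))
        # The Brownian column grows like sqrt(n): the limiting path is not in H.
        print(n, cm_norm_bm(smooth), cm_norm_bm(bm))
```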
We further study the case where $b$ only satisfies a one-sided Lipschitz condition in Theorem 2.6. We proceed with the multiplicative noise case in Section 2.2. As already mentioned, here we cannot expect the solution map to be Lipschitz continuous in the usual topologies anymore. The key idea is to use the rough path factorization: instead of studying the map $X(\omega) \mapsto Y(\omega)$ directly, we consider an intermediate step; namely, we decompose this map as
$$X(\omega) \overset{S}{\longmapsto} \mathbf X(\omega) := S(X(\omega)) \overset{I(\cdot,\, \xi)}{\longmapsto} Y(\omega) = I(\mathbf X(\omega), \xi). \qquad (0.4)$$
The map $S$ is called the lift map; it takes a Gaussian trajectory and maps it to a rough path. It is not continuous, but easy to analyze. The map $I$ is called the Itō-Lyons map; it is known to be continuous, and even locally Lipschitz continuous in rough paths topology (in fact, this result can be seen as the main theorem of rough paths theory). The point now is that $S$ can be shown to be locally Lipschitz continuous from $H$ to a rough paths space; hence the composition, seen as a map from $H$ to the space of continuous paths, is locally Lipschitz continuous. The contraction principle allows us to conclude $T_{2-\varepsilon}(C)$ for any $\varepsilon > 0$, cf. Theorem 2.14 (the $\varepsilon$-correction stems from the fact that we only have local Lipschitz continuity). Finally, in Section 3 we discuss the link between $T_p(C)$ and tail estimates for functions, and relate $T_p(C)$ to the generalized Fernique theorem (cf. [DOR15, Theorem 17], [FH14, Theorem 11.7] and [FO10]), which is of fundamental importance in rough paths theory (cf. [FH14, Chapter 11]). This section does not depend on the previous ones and may be of independent interest. Let us finally mention that we believe our approach can be carried over to SDEs in infinite dimensions, i.e. to stochastic partial differential equations, in particular to those considered in Hairer's theory of regularity structures [Hai14] or in Gubinelli-Imkeller-Perkowski's approach using paracontrolled distributions [GIP15].
Indeed, in both theories it was understood that (after a possible renormalization) singular equations like the KPZ equation (cf. also [Hai13]) often admit a factorization similar to (0.4), and this was the basic ingredient we needed for ordinary SDEs as well.

Notation
If $(X, \mathcal F)$ is a measurable space, $P(X)$ denotes the set of all probability measures defined on $\mathcal F$. If $X$ is a topological space, $\mathcal F$ will usually be the Borel σ-algebra $\mathcal B(X)$. If $X$ and $Y$ are measurable spaces and $\nu \in P(X)$, $\mu \in P(Y)$, then $\Pi(\nu, \mu)$ denotes the set of all probability measures on the product space $X \times Y$ with marginals $\nu$ resp. $\mu$ (the couplings of $\nu$ and $\mu$).
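In general, computing $W_p$ requires optimizing over all couplings in $\Pi(\nu, \mu)$. On the real line, however, the monotone coupling (matching quantiles) is optimal for the cost $|x - y|^p$ with $p \ge 1$, so the distance between two empirical measures with the same number of atoms reduces to sorting. The Python sketch below uses this well-known special case purely as an illustration; the function name is hypothetical.

```python
import numpy as np


def wasserstein_p_empirical_1d(xs, ys, p=2.0):
    """W_p between the empirical measures (1/n) sum_i delta_{x_i} and
    (1/n) sum_i delta_{y_i} on the real line, with d(x, y) = |x - y|.

    For p >= 1 the monotone coupling (the i-th smallest x is matched with the
    i-th smallest y) is optimal in dimension one, so no search over
    Pi(nu, mu) is needed.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("this sketch assumes two samples of equal size")
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))
```

For example, shifting a sample by a constant $c$ gives $W_p = |c|$ for every $p$, and by Jensen's inequality $W_1 \le W_p$ in general, which is why (0.1) with $p = 1$ is the weakest of these inequalities.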

Transportation inequality on a Gaussian space
In this section, we give a proof of $T_2(2)$ on Gaussian spaces for the Cameron-Martin metric defined in (0.3), a result which was first proven on Banach spaces by Feyel and Üstünel [FÜ04, Theorem 3.1] using the Girsanov transformation. Our strategy will be to "approximate" the infinite dimensional space by finite dimensional ones, on which we know from Talagrand's original result that $T_2(2)$ holds.
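We start with an abstract approximation result.

Lemma 1.1. Let $X$ and $Y$ be Polish spaces and let $(\mu_n)$ and $(\nu_n)$ be sequences of probability measures on $X$ resp. $Y$ which converge weakly to some probability measures $\mu$ resp. $\nu$. Let $c_n\colon X \times Y \to [0, \infty)$ be a nondecreasing sequence of bounded, continuous functions such that $c_n \uparrow c$ pointwise, where $c\colon X \times Y \to [0, \infty]$. Then
$$\inf_{\pi \in \Pi(\mu, \nu)} \int c(x, y) \, d\pi(x, y) \le \liminf_{n \to \infty} \inf_{\pi \in \Pi(\mu_n, \nu_n)} \int c_n(x, y) \, d\pi(x, y).$$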
In the following, we aim to consider Gaussian measures on linear spaces. Typically, one assumes that the space is locally convex, i.e. its topology is generated by a family of seminorms separating points (cf. [Bog98, Chapter 2 and Appendix A]). It will be convenient for us to assume that the space is also Polish, i.e. separable and completely metrizable. Such spaces are also called separable Fréchet spaces. A Gaussian Fréchet space is a triple $(F, H, \gamma)$ where $F$ is a separable Fréchet space, $\gamma$ is a Gaussian measure on the Borel σ-field $\mathcal B(F)$ and $H$ denotes the Cameron-Martin space, which is a separable Hilbert space $(H, \langle \cdot, \cdot \rangle)$ lying in $F$. The following theorem is the main result of this section.

Theorem 1.2. Let $(F, H, \gamma)$ be a Gaussian Fréchet space and let $d_H$ be the Cameron-Martin metric defined in (0.3). Then $T_2(2)$ holds for $\gamma$ with respect to $d_H$, i.e. $W_2(\nu, \gamma) \le \sqrt{2 H(\nu \mid \gamma)}$ for every $\nu \in P(F)$, where $W_2$ is computed with respect to $d_H$.
Proof. Note that for every $h, k \in H$, there are elements $\hat h, \hat k \in F^*$ such that Note that with this definition, $\langle p_n(h), p_n(k) \rangle_{H_n} = \langle p_n(h), p_n(k) \rangle_H$. Consider the image measure $\tilde\gamma_n := \gamma \circ p_n^{-1}$. Then $(H_n, \tilde\gamma_n)$ is a finite dimensional Gaussian space, and we know from Talagrand's result that $T_2(2)$ holds there. Consider the inclusion maps $\iota_n\colon H_n \to F$ and set $\gamma_n := \tilde\gamma_n \circ \iota_n^{-1}$. By the contraction principle in Lemma 4.1, we see that for every holds for all $n \ge 1$ where $\tilde d$ holds for every $\nu \in P(F)$ and $n \ge 1$. We collect some facts about the functions $d_n$. First, it is clear by definition that all $d_n\colon F \times F \to [0, \infty)$ are bounded and continuous. Furthermore, for fixed $x, y \in F$, and applying $\hat e_k$ on both sides shows that $\hat e_k(x - y) = \hat e_k(z)$ holds for every $k \in \mathbb N$. Hence $x - y = z \in H$ and we have shown the claim. Next, we show that $\gamma_n \to \gamma$ weakly for $n \to \infty$. Let $g\colon F \to \mathbb R$ be a bounded, continuous function. Then for $n \to \infty$, which shows weak convergence. Choose any $\nu \in P(F)$ with $\nu \ll \gamma$. Set $f := d\nu/d\gamma$ and define $d\nu_n := f \, d\gamma_n$. From (1.1), we have Assume first that $f$ is bounded and continuous. In this case, we have $\nu_n \to \nu$ weakly for $n \to \infty$, and we can use Lemma 1.1 for the left-hand side and weak convergence for the right-hand side of the above inequality to conclude that indeed Assume next that the density $f$ is bounded by some $C > 0$. Let $(f_n)$ be a sequence of continuous functions converging $\gamma$-a.s. to $f$. We may assume w.l.o.g. that $0 \le f_n \le C$ for all $n$; otherwise we replace each $f_n$ by $(f_n \wedge C) \vee 0$. Set $\alpha_n := \|f_n\|_{L^1(\gamma)}$ and $d\nu_n := (f_n / \alpha_n) \, d\gamma$. We have shown that for every $n \in \mathbb N$, The above inequality implies that also holds for every fixed $n, m \in \mathbb N$. From Lebesgue's dominated convergence theorem, we can conclude that $\nu_n \to \nu$ weakly and holds for every $m \in \mathbb N$ and every bounded density $f$. Now let $f$ be an arbitrary density function. Set $f_n := f \wedge n$, $\alpha_n := \|f_n\|_{L^1(\gamma)}$ and $d\nu_n := (f_n / \alpha_n) \, d\gamma$.
Using monotone convergence, we see that $\nu_n \to \nu$ weakly and $H(\nu_n \mid \gamma) \to H(\nu \mid \gamma)$ for $n \to \infty$. As before, Lemma 1.1 shows that (1.2) holds for every $\nu \ll \gamma$ with density function $f$ and every $m \in \mathbb N$. Taking the limit inferior along a subsequence of $m$ in (1.2), we can use Lemma 1.1 a fourth time to conclude the assertion of our theorem.

Banach spaces
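Let $(B, \|\cdot\|)$ be a separable Banach space and set $d_B(x, y) := \|x - y\|$. If $\gamma$ is a Gaussian measure on $B$, set
$$\sigma^2 := \sup_{\ell \in B^*,\ \|\ell\| \le 1} \int_B \ell(x)^2 \, d\gamma(x). \qquad (1.3)$$
As an immediate corollary of Theorem 1.2 we obtain:

Corollary 1.3. Let $(B, H, \gamma)$ be a Gaussian Banach space and let $\sigma^2$ be defined as in (1.3). Then $T_2(2\sigma^2)$ holds for $\gamma$ with respect to $d_B$, i.e. $W_2(\nu, \gamma) \le \sqrt{2 \sigma^2 H(\nu \mid \gamma)}$ for every $\nu \in P(B)$.

Proof. It is well known that $\sigma < \infty$ and that for every $h \in H$ one has $\|h\| \le \sigma |h|_H$, cf.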
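The corollary reduces to a one-line comparison of metrics. Since $\|h\| \le \sigma |h|_H$ for every $h \in H$, one has $d_B(x, y) \le \sigma \, d_H(x, y)$ for all $x, y \in B$ (if $x - y \notin H$, the right-hand side is $+\infty$). Hence every coupling is cheaper under $d_B$ than under $\sigma d_H$, and Theorem 1.2 (in the form $T_2(2)$ for $d_H$) gives:

```latex
W_{2,d_B}(\nu,\gamma)^2
  = \inf_{\pi \in \Pi(\nu,\gamma)} \int \|x - y\|^2 \, d\pi(x,y)
  \le \sigma^2 \inf_{\pi \in \Pi(\nu,\gamma)} \int d_H(x,y)^2 \, d\pi(x,y)
  = \sigma^2 \, W_{2,d_H}(\nu,\gamma)^2
  \le 2\sigma^2 \, H(\nu \mid \gamma).
```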

Rough paths spaces
In the case $B = C_0([0, T], \mathbb R^d)$, Theorem 1.2 immediately generalizes to rough paths spaces. Let $\gamma$ be a Gaussian measure on $B$ with corresponding Cameron-Martin space $H$. For the sake of simplicity, we will assume that $H$ is continuously embedded in $C_0$; otherwise we could use a smaller space lying in $C_0$ instead. Let $D$ be a rough paths space (which could either be geometric or non-geometric, a $p$-variation or an $\alpha$-Hölder rough paths space, cf. [LCL07], [FV10b] or [FH14] for a precise definition) and assume that there is a measurable map $S\colon C_0 \to D$ such that $\pi_1 \circ S = \mathrm{Id}_{C_0}$ holds, where $\pi_1\colon D \to C_0$ is the projection map. The map $S$ is called a lift map. Set $\boldsymbol\gamma := \gamma \circ S^{-1}$. Abusing notation, we define $d_H\colon D \times D \to [0, \infty]$ by $d_H(\mathbf x, \mathbf y) := d_H(\pi_1 \mathbf x, \pi_1 \mathbf y)$. Then $T_2(2)$ holds for $\boldsymbol\gamma$ on $D$ with respect to $d_H$. Indeed, since $\pi_1 \circ S = \mathrm{Id}_{C_0}$, we have $d_H(S(x), S(y)) = d_H(x, y)$ for all $x, y \in C_0$, hence $S$ is (in particular) 1-Lipschitz and the result follows from Theorem 1.2 and Lemma 4.1.

Applications to diffusions

SDEs with additive noise
In this section, we will consider SDEs of the form  space into the space of paths with finite p-variation. Showing Lipschitz continuity for the p-variation metric is a much easier task which will immediately yield concentration inequalities in p-variation topology.
We start with a simple calculation.
Proof. Existence and uniqueness are classical; we only need to prove the estimate (2.4).
Using the equations, we see that Taking both sides to the power $q$ and summing over all increments, Taking now the supremum over all partitions implies The integral can be estimated by Gronwall's inequality implies the claim.

for some $q \in [1, \infty)$. Let $Y$ be the solution to the SDE (2.2) and let $\mu$ be the law of $Y$. Then for every $\nu \in P(C_\xi)$,

Proof. Follows from Theorem 1.2, the contraction principle in Lemma 4.1 and Proposition 2.1.
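The mechanism behind Proposition 2.1 is that, for additive noise, the solution depends Lipschitz continuously on the driving path. The Python sketch below illustrates a sup-norm analogue of this (the estimate (2.4) itself is stated in $q$-variation): for $Y_t = \xi + \int_0^t b(Y_s)\, ds + x_t$ with an $L$-Lipschitz drift, Gronwall's inequality gives $\sup_t |Y^1_t - Y^2_t| \le e^{LT} (|\xi^1 - \xi^2| + \sup_t |x^1_t - x^2_t|)$, and the same bound holds exactly for the explicit Euler scheme. The unit diffusion coefficient, the drift $b(y) = -\sin y$ and all function names are hypothetical choices for illustration.

```python
import numpy as np


def solve_additive(b, xi, x, T=1.0):
    """Explicit Euler scheme for Y_t = xi + int_0^t b(Y_s) ds + x_t along a
    path x sampled on a uniform grid of [0, T] with x[0] == 0."""
    x = np.asarray(x, dtype=float)
    n = len(x) - 1
    dt = T / n
    y = np.empty(n + 1)
    y[0] = xi
    for k in range(n):
        y[k + 1] = y[k] + b(y[k]) * dt + (x[k + 1] - x[k])
    return y


def lipschitz_bound(xi1, xi2, x1, x2, lip_b, T=1.0):
    """Gronwall bound e^{L T} (|xi1 - xi2| + sup_t |x1_t - x2_t|) for sup_t |Y1_t - Y2_t|."""
    gap = np.max(np.abs(np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)))
    return np.exp(lip_b * T) * (abs(xi1 - xi2) + gap)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1_000
    grid = np.linspace(0.0, 1.0, n + 1)
    x1 = np.concatenate(([0.0], np.cumsum(rng.standard_normal(n)) / np.sqrt(n)))
    x2 = x1 + 0.1 * np.sin(3.0 * grid)   # a perturbation of the driving path
    b = lambda y: -np.sin(y)              # Lipschitz drift with constant L = 1
    y1, y2 = solve_additive(b, 0.5, x1), solve_additive(b, 0.4, x2)
    print(np.max(np.abs(y1 - y2)), lipschitz_bound(0.5, 0.4, x1, x2, lip_b=1.0))
```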

Remark 2.3.
Embeddings of the form (2.5) play a crucial role in Gaussian rough paths theory and we will revisit them also in the next section. Sufficient conditions for such embeddings, as well as many examples of Gaussian processes for which they hold, are given in [FGGR16].
Next, we aim to relax the assumptions on $b\colon \mathbb R^d \to \mathbb R^d$. In the case of Brownian motion, it is well known (cf. [PR07]) that (2.2) has a unique solution provided $b$ is continuous and satisfies the following one-sided Lipschitz condition for some constant $L \ge 0$:
$$\langle x - y,\, b(x) - b(y) \rangle \le L |x - y|^2 \quad \text{for all } x, y \in \mathbb R^d. \qquad (2.7)$$
However, one has to be careful when solving (2.2) pathwise: in [CHJ13, p. 43], the authors show that there are trajectories which lead to explosion in finite time of solutions to (2.2) although the vector field $b$ satisfies (2.7). In [RS17] and [SS17], a further condition on $b$ was introduced. Together with (2.7), this condition prevents explosion, even in the more general case of multiplicative noise. This condition takes the following form: (2.8) In the following, we will assume both (2.7) and (2.8).
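A typical drift covered by a one-sided Lipschitz condition of the usual form $\langle x - y, b(x) - b(y) \rangle \le L |x - y|^2$, but not by a global Lipschitz condition, is the hypothetical one-dimensional example $b(y) = y - y^3$: since $b(x) - b(y) = (x - y)(1 - (x^2 + xy + y^2))$ and $x^2 + xy + y^2 \ge 0$, the condition holds with $L = 1$, while the difference quotients of $b$ are unbounded. The Python sketch below checks both facts on random pairs; it is only an illustration and says nothing about the additional condition (2.8).

```python
import numpy as np


def drift(y):
    """Hypothetical example b(y) = y - y^3: continuous and one-sided Lipschitz
    with constant L = 1, but not globally Lipschitz."""
    return y - y ** 3


def one_sided_ratio(x, y):
    """<x - y, b(x) - b(y)> / |x - y|^2 in dimension one; equals 1 - (x^2 + xy + y^2)."""
    return (x - y) * (drift(x) - drift(y)) / (x - y) ** 2


def lipschitz_ratio(x, y):
    """|b(x) - b(y)| / |x - y|; unbounded as |x| and |y| grow."""
    return np.abs(drift(x) - drift(y)) / np.abs(x - y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, y = rng.uniform(-10.0, 10.0, size=(2, 100_000))
    keep = x != y
    print(one_sided_ratio(x[keep], y[keep]).max())  # never above 1
    print(lipschitz_ratio(x[keep], y[keep]).max())  # large (at most 299 on [-10, 10]^2)
```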
generates a continuous two-parameter flow.
Proof. The fact that the equations (2.9) possess unique solutions is a special case of [RS17, Theorem 4.3]. We only need to prove the estimate (2.10). It is easy to see that the solutions $y^i$ to (2.9) are given by $z^i + x^i$. Set We define an increasing sequence of numbers $0 =: \tau_0 < \tau_1 < \dots$ as follows: Note that there is a minimal number $N \in \mathbb N$ such that $\tau_N = T$. Indeed, otherwise we would obtain an increasing sequence $(\tau_n)$, bounded by $T$, which therefore converges to some number $\tau$; but $(\beta_{\tau_n})$ cannot converge, although $\beta$ is continuous, which is a contradiction. By construction, for every $n = 0, \dots, N - 1$, one either has $\beta_t \le \delta$ or $\beta_t \ge \delta/2$ for every $t \in [\tau_n, \tau_{n+1}]$. In the first case, which holds for all $t \in [\tau_n, \tau_{n+1}]$. In the case $n = 0$, we have $|z^1_{\tau_n} - z^2_{\tau_n}| = |\xi^1 - \xi^2|$. For $n \ge 1$, we know that $\beta_t \le \delta$ for $t \in [\tau_{n-1}, \tau_n]$, therefore This shows that in all cases we have considered, the estimate Note that this is true for any $\delta > 0$; therefore we can conclude that holds true. The claim follows from the equality $y^i = z^i + x^i$ and the triangle inequality.
Theorem 2.6. Assume $b\colon \mathbb R^d \to \mathbb R^d$ is continuous and satisfies (2.7) and (2.8). Let $X\colon [0, T] \to \mathbb R^d$ be a continuous Gaussian process with corresponding Gaussian measure $\gamma$ on the space of continuous functions, and let $\sigma^2$ be defined as in (1.3). Let $Y$ be the solution to the SDE (2.2) and let $\mu$ be the law of $Y$. Then for every $\nu \in P(C_\xi)$,

Proof. This is a consequence of Corollary 1.3, the contraction principle in Lemma 4.1 and Proposition 2.5.
Example 2.7. We finally discuss an example to illustrate our findings. Let $B^{H,K}\colon [0, T] \to \mathbb R^m$ be a bifractional Brownian motion, i.e. a continuous, centered Gaussian process with independent components, where the covariance of each component is given by
$$R(s, t) = \frac{1}{2^K} \Big( \big( t^{2H} + s^{2H} \big)^K - |t - s|^{2HK} \Big)$$
with $H \in (0, 1)$ and $K \in (0, 1]$. This process was introduced in [HV03] and further studied e.g. in [RT06, KRT07]. Note that for $K = 1$ we obtain a fractional Brownian motion, and for $K = 1$ and $H = 1/2$ the usual Brownian motion. In general, it is not known whether the process can be written as a stochastic integral with respect to Brownian motion (as is the case for the fractional Brownian motion) or whether it is adapted to a Brownian filtration. This rules out any Girsanov transformation techniques. It can be shown that $B^{H,K}$ has sample paths of $\alpha$-Hölder regularity for any $\alpha < HK$ (and the sample paths therefore have finite $1/\alpha$-variation), but not better. Martin space $H$ to the space $C([0, T], \mathbb R^d)$, and therefore also to the Hilbert space $L^2([0, T], \mathbb R^d)$. If $(e_n)$ denotes an orthonormal basis of $L^2$, we define a scalar product
$$\langle x, y \rangle_\sim := \sum_{n} \frac{1}{n^2} \langle x, e_n \rangle_{L^2} \langle y, e_n \rangle_{L^2}.$$
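The covariance of the bifractional Brownian motion can be explored numerically in the same way as for the fBm. The Python sketch below (hypothetical helper names, Cholesky sampling chosen only for simplicity) implements the covariance of a single component as stated above; it reduces to the fBm covariance for $K = 1$ and to $\min(s, t)$ for $K = 1$, $H = 1/2$, and its variance is $t^{2HK}$, in line with the Hölder exponent $HK$ mentioned above.

```python
import numpy as np


def bifbm_covariance(s, t, hurst, k):
    """Covariance of one component of B^{H,K} for s, t >= 0:
    R(s, t) = 2^{-K} ((t^{2H} + s^{2H})^K - |t - s|^{2HK})."""
    s, t = np.asarray(s, dtype=float), np.asarray(t, dtype=float)
    return 2.0 ** (-k) * ((t ** (2 * hurst) + s ** (2 * hurst)) ** k
                          - np.abs(t - s) ** (2 * hurst * k))


def sample_bifbm(hurst, k, n_steps=100, T=1.0, seed=0):
    """One path of a single component on a uniform grid (Cholesky method, O(n^3))."""
    times = np.linspace(T / n_steps, T, n_steps)  # skip t = 0, where the variance vanishes
    cov = bifbm_covariance(times[:, None], times[None, :], hurst, k)
    values = np.linalg.cholesky(cov) @ np.random.default_rng(seed).standard_normal(n_steps)
    return np.concatenate(([0.0], times)), np.concatenate(([0.0], values))
```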
Let $\tilde L^2$ denote the space $L^2([0, T], \mathbb R^d)$ equipped with this scalar product. Then integration induces a Hilbert-Schmidt operator from $H$ to $\tilde L^2$, which can therefore be uniquely extended to the whole space $C([0, T], \mathbb R^d)$ almost surely and induces a Gaussian measure on $\tilde L^2$ (cf. e.g. [Hai09, Theorem 3.44]). It can be shown (using explicit bounds on this map) that the associated Gaussian process on $\tilde L^2$ actually has continuous sample paths almost surely, and that its Cameron-Martin space can again be continuously embedded in the space of $q$-variation paths with the same choice of $q$. From now on, assume that the $\sigma_i$ satisfy one of the stated regularity assumptions. In case the drift $b$ is Lipschitz continuous, we can solve (2.1), and the law $\mu$ of the solution $Y$ satisfies the quadratic transport inequality for some constant $C > 0$ and any $\nu \in P(C_\xi)$ by Theorem 2.2. Note that $q < 1/(HK)$ (e.g. $q = 1$ in the case of Brownian motion), and we cannot expect the sample paths of $Y$ itself to have finite $q$-variation. Assuming only continuity, (2.7) and (2.8) for $b$, we can still solve (2.1), and the law $\mu$ of the solution $Y$ satisfies the quadratic transport inequality for another constant $C > 0$ and any $\nu \in P(C_\xi)$ by Theorem 2.6. In the case of the fractional Brownian motion (i.e. $K = 1$), the transport inequalities (2.12) and (2.13) may be compared to the corresponding results obtained in [Sau12] (namely Theorem 1 and Theorem 3). Note that our results imply those and are even stronger in several regards (the quadratic transport inequality instead of $T_1$, a larger metric, and weaker regularity assumptions on the vector fields).

SDEs with multiplicative noise
Next we will consider SDEs with multiplicative noise, i.e. equations of the form
$$dY_t = b(Y_t) \, dt + \sum_{i=1}^m \sigma_i(Y_t) \, dX^i_t, \qquad Y_0 = \xi, \qquad (2.14)$$
where $\xi \in \mathbb R^d$, $X = (X^1, \dots, X^m)$ is a continuous $m$-dimensional Gaussian process and $b, \sigma_1, \dots, \sigma_m\colon \mathbb R^d \to \mathbb R^d$ are continuous vector fields. The problem in (2.14) is, of course, to make sense of the stochastic integrals if $X$ is not a semimartingale.
We start by discussing a simple case; namely, we assume that the driving process is one-dimensional. Under further assumptions on the vector fields, we can use the Doss-Sussmann representation to define the solution to (2.14) pathwise for any continuous driving signal². If we further assume that the solution space is also one-dimensional, we can follow [Sau12] to derive the following result:

Theorem 2.8. Assume $m = d = 1$ and consider the equation
$$dY_t = b(Y_t) \, dt + \sigma(Y_t) \, dX_t, \qquad Y_0 = \xi, \qquad (2.15)$$
where $X\colon [0, T] \to \mathbb R$ is a continuous Gaussian process. We further assume that $b$ is bounded by some constant $B$ and Lipschitz continuous with Lipschitz constant $L_b$. For the diffusion vector field $\sigma$, we assume that it is Lipschitz continuous with Lipschitz constant $L_\sigma$ and that there are constants $0 < \sigma_1 \le \sigma_2$ such that $\sigma_1 \le \sigma(x) \le \sigma_2$ for all $x \in \mathbb R$.
Then equation (2.15) has a unique continuous solution $Y$, and its law $\mu$ satisfies the quadratic transport inequality for all $\nu \in P(C_\xi)$, where $C > 0$ is a constant depending on the variance of the Gaussian measure given in (1.3), on $T$ and on all constants above.
Proof. Under the stated conditions, one can show, using the Lamperti transform (cf. [Sau12, proof of Theorem 12 on p. 12]), that the solution map associated to (2.14) is Lipschitz continuous on the space of continuous functions. The result follows from Corollary 1.3 and the contraction principle in Lemma 4.1.
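The following toy computation illustrates why the one-dimensional solution map is Lipschitz continuous for arbitrary continuous drivers. It treats only the case $b = 0$, and the diffusion coefficient is a hypothetical one manufactured so that the Lamperti (Doss-Sussmann) transform is explicit: with $\varphi(u) = 2u + \sin u$ and $\sigma := \varphi' \circ \varphi^{-1}$, one has $1 \le \sigma \le 3$, $\sigma$ is Lipschitz, and $Y_t = \varphi(\varphi^{-1}(\xi) + X_t - X_0)$ solves $dY = \sigma(Y)\, dX$ path by path, so $\sup_t |Y^1_t - Y^2_t| \le 3 \sup_t |X^1_t - X^2_t|$. A Heun scheme along a smooth driver serves only as a consistency check.

```python
import numpy as np

# Hypothetical example: phi(u) = 2u + sin(u) has phi' = 2 + cos(u) in [1, 3].
# Then sigma := phi' o phi^{-1} satisfies 1 <= sigma <= 3, is Lipschitz, and
# Y_t = phi(phi^{-1}(xi) + X_t - X_0) solves dY = sigma(Y) dX path by path.


def phi(u):
    return 2.0 * u + np.sin(u)


def phi_inv(y, iters=60):
    """Invert phi by bisection; 2u - 1 <= phi(u) <= 2u + 1 brackets the root."""
    y = np.asarray(y, dtype=float)
    lo, hi = (y - 1.0) / 2.0, (y + 1.0) / 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        below = phi(mid) < y  # phi is increasing, so the root lies above mid
        lo, hi = np.where(below, mid, lo), np.where(below, hi, mid)
    return 0.5 * (lo + hi)


def sigma(y):
    """Diffusion coefficient with values in [1, 3]."""
    return 2.0 + np.cos(phi_inv(y))


def pathwise_solution(xi, x):
    """Doss-Sussmann / Lamperti-type solution map X -> Y (with drift b = 0)."""
    x = np.asarray(x, dtype=float)
    return phi(phi_inv(xi) + (x - x[0]))


def heun(xi, x):
    """Heun scheme for dY = sigma(Y) dX along a smooth driver (consistency check only)."""
    y = np.empty(len(x))
    y[0] = xi
    for k in range(len(x) - 1):
        dx = x[k + 1] - x[k]
        s0 = sigma(y[k])
        pred = y[k] + s0 * dx
        y[k + 1] = y[k] + 0.5 * (s0 + sigma(pred)) * dx
    return y
```

Including a bounded Lipschitz drift, as in Theorem 2.8, turns the explicit formula into a random ODE; that case is not covered by this sketch.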
2 Note that we can also use rough paths theory for m = 1 to make sense of (2.14) since the iterated integrals are canonically given as products in this case.
Note that this result generalizes [Sau12, Theorem 2] to arbitrary Gaussian processes. It is even stronger than [Sau12, Theorem 2], since we can deduce the quadratic transportation inequality, not only the simple one ($T_1$). We can also deduce the quadratic transportation inequality for general Gaussian processes under the conditions stated in [Sau12, Theorem 4] for the uniform metric. Indeed, an inspection of the proof reveals that under these conditions, the solution map associated to (2.14) is again Lipschitz continuous; therefore we can conclude as before. Now assume that $X$ is an $m$-dimensional Brownian motion. In contrast to the additive noise case or the one-dimensional case, the solution map $I(\cdot, \xi)$: where $X = (X^1, \dots, X^m)$ is the canonical process induced by the Gaussian measure $\gamma$.
Our key result will be that for Brownian-like Gaussian processes (we will be more precise later), we have an estimate of the form almost surely for every $x, y \in C_0$, where $L$ is a random variable which possesses moments of every order w.r.t. the Gaussian measure $\gamma$. Together with Lemma 4.1, this yields a transportation inequality, which is stated in Theorem 2.14. We will not attempt to give an overview of rough paths theory, since we will use it merely as a tool. Instead, we refer to the monographs [LQ02], [LCL07], [FV10b] and [FH14]. The terms and notation we use coincide with those of [FV10b], with the only exception that we use the symbol $D^{0,p}_g$ to denote the space of geometric $p$-variation rough paths $C^{0,p\text{-var}}_0([0, T]; G^{[p]}(\mathbb R^m))$ equipped with the $p$-variation metric.
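To make the lift map $S$ from (0.4) more concrete: for a path of bounded variation, such as a piecewise-linear one, the level-2 component of the lift is given canonically by iterated Riemann-Stieltjes integrals, and for Gaussian processes like Brownian motion one typically obtains $S(X)$ as a limit of lifts of piecewise-linear approximations (cf. [FV10b]). The Python sketch below (hypothetical function names) computes the level-1 and level-2 terms over the whole time interval for a piecewise-linear path via Chen's identity. The symmetric part of level 2 is determined by level 1 (the path is geometric), so the new information sits in the antisymmetric part, the Lévy area; for $m = 1$ the level-2 term is simply half the squared increment, in line with footnote 2.

```python
import numpy as np


def step2_signature(path):
    """Level-1 and level-2 iterated integrals over [0, T] of the piecewise-linear
    interpolation of `path` (array of shape (n + 1, m)), built with Chen's identity."""
    path = np.asarray(path, dtype=float)
    m = path.shape[1]
    level1 = np.zeros(m)
    level2 = np.zeros((m, m))
    for inc in np.diff(path, axis=0):
        # A straight segment with increment `inc` has level-2 term inc (x) inc / 2;
        # Chen: X2_{s,u} = X2_{s,t} + X2_{t,u} + X1_{s,t} (x) X1_{t,u}.
        level2 = level2 + np.outer(inc, inc) / 2.0 + np.outer(level1, inc)
        level1 = level1 + inc
    return level1, level2


def levy_area(level2):
    """Antisymmetric part of the level-2 term; entry [0, 1] is the Levy area for m = 2."""
    return 0.5 * (level2 - level2.T)
```

For the counter-clockwise unit square, traversed once, the level-1 term vanishes and the Lévy area equals the enclosed area 1.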
We start with some deterministic estimates for rough paths. If $\omega$ is a control function and $\alpha > 0$, recall the definition of $N_\alpha(\omega; [s, t])$ resp. of $N_\alpha(x; [s, t])$ for geometric rough paths $x$ ([CLL13], [FR13]). The next proposition is a version of [BFRS16, Theorem 4] for the $p$-variation metric.

Proposition 2.9. Let $x^1$ and $x^2$ be weakly geometric $p$-rough paths for some $p \ge 1$.
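The quantity $N_\alpha(\omega; [s, t])$ counts how many consecutive pieces of $\omega$-size $\alpha$ fit into $[s, t]$ by a greedy construction. The Python sketch below assumes one common convention, $\tau_0 = s$, $\tau_{i+1} = \inf\{u > \tau_i : \omega(\tau_i, u) \ge \alpha\} \wedge t$ and $N_\alpha = \sup\{n : \tau_n < t\}$, and evaluates it along a finite time grid; the precise definition should be taken from [CLL13], and the function name is hypothetical. For the control $\omega(s, t) = t - s$ on $[0, 1]$ and $\alpha = 0.3$, the stopping times are $0, 0.3, 0.6, 0.9$, so $N_\alpha = 3$.

```python
def greedy_count(omega, s, t, alpha, grid):
    """Greedy count N_alpha(omega; [s, t]) evaluated along a finite time grid.

    Assumed convention: tau_0 = s,
    tau_{i+1} = min{u in grid : u > tau_i, omega(tau_i, u) >= alpha}, or t if none,
    and N_alpha = max{n : tau_n < t}. `omega` should be a control, i.e.
    continuous, superadditive and vanishing on the diagonal.
    """
    points = sorted(u for u in grid if s < u <= t)
    count, tau = 0, s
    while True:
        nxt = next((u for u in points if u > tau and omega(tau, u) >= alpha), t)
        if nxt >= t:  # tau_{count + 1} = t, so tau_count is the last time before t
            return count
        count, tau = count + 1, nxt
```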

Consider the rough differential equations (RDEs)
…, m are two families of vector fields on $\mathbb R^d$, $\theta > p$, and $\beta$ is a bound on $|\sigma^1|_{\mathrm{Lip}^\theta}$ and $|\sigma^2|_{\mathrm{Lip}^\theta}$.
Then for every $\alpha > 0$ there is a constant $C = C(\theta, p, \beta, \alpha)$ such that

Proof. The proof follows [BFRS16, Lemma 7 and Theorem 4]. Let $\omega$ be a control function such that $\sup_{s < t} \|x^j\|_{p\text{-var}; [s,t]} / \omega(s, t)^{1/p} \le 1$ for $j = 1, 2$. Set $\bar y := y^1 - y^2$ and $\kappa :=$ We claim that there is a constant $C = C(\theta, p)$ such that for every $s < t$, Concerning the second term, fix some $D \in \mathcal P([S, T])$. We have and the result follows from the triangle inequality for the $p$-variation seminorm and standard estimates.
Lemma 2.11. Let $x^1 := x$ and $x^2 := T_h(x)$ (the translation of $x$ by $h$), where $x$ is a weakly geometric $p$-rough path for some $p \in [1, 3)$ and $h$ is a path of finite $q$-variation with $\frac1p + \frac1q > 1$. Consider the solutions $y^1$ and $y^2$ to the RDEs as in Proposition 2.9 with $f^1 = f^2$ and $y^1_S = y^2_S$. Then where $C$ is a constant depending on $p$, $q$, $\theta$ and $\beta$.
Proof. We will only consider the case $p \in [2, 3)$; the case $p \in [1, 2)$ is similar (and easier). using the deterministic estimates for the Itō-Lyons map proven in [FR13]. With [FH14, Lemma 11.12], we conclude that for a larger constant $C$.
We come back to our original setup. Assume that $\gamma$ is a Gaussian measure on the Borel sets of the Banach space $C_0([0, T], \mathbb R^m)$ induced by a continuous $\mathbb R^m$-valued Gaussian process $X$. As usual, we denote the corresponding Cameron-Martin space by $H$. Assume that there is some $p \in [1, 3)$ and a measurable lift map $S\colon C_0 \to D^{0,p}_g$ such that the diagram (2.16) commutes on a set of full $\gamma$-measure. We now make further assumptions on our lift map $S$. Suppose that
(i) There is a continuous embedding $\iota\colon H \to C^{q\text{-var}}([0, T], \mathbb R^m)$ for some $1 \le q \le p$ with $\frac1p + \frac1q > 1$ (note that this implies $1 \le q < 2$ when $p \ge 2$).
(ii) The set has full γ-measure.
The next theorem is our main result for the multiplicative case.
Example 2.15. Let us come back to the bifractional Brownian motion $B^{H,K}\colon [0, T] \to \mathbb R^m$ already considered in Example 2.7. In [RT06] and [KRT07], it was shown that in the case $2HK = 1$, the process has many similarities to the usual Brownian motion. The same holds true here: from [FGGR16, Example 2.12], we know that the covariance of the bifractional Brownian motion has finite mixed $(1, \rho)$-variation for $\rho = (2HK)^{-1}$. In particular, we can choose $q = 1$ in the case $2HK = 1$, and Theorem 2.14 applies. Therefore, we can almost (i.e. modulo an $\varepsilon$-correction) deduce Talagrand's transport inequality in this case. However, for the Brownian motion (which we obtain for the choice $K = 1$ and $H = 1/2$), it is known that Talagrand's inequality holds for the uniform distance (which is smaller than the $p$-variation distance) even without the $\varepsilon$-correction, cf. [Üst12]. It remains an open problem how to obtain the full quadratic transport inequality for diffusions driven by a multidimensional Brownian motion without using the Girsanov transformation.

Tail estimates for functionals
In the following, we aim to explain why it is useful to have $p$-transportation-cost inequalities for $p > 1$. This section is independent of the previous ones and may be of interest in its own right.
It is well known that transportation-cost inequalities imply Gaussian measure concentration. This was first discovered by Marton ([Mar86], [Mar96]). In [DGW04], it was shown that for $p = 1$ the converse is true: Gaussian tails imply the 1-transportation-cost inequality. In the case of a Gaussian Banach space $(E, H, \gamma)$, it is a classical result (cf. [Bog98, 4.5.6. Theorem]) that $H$-Lipschitz functions on Gaussian spaces have Gaussian tails. This result was further generalized in [DOR15, Theorem 17] (cf. also [FO10] and [FH14, Theorem 11.7]), where it was shown that linear growth of a function in $H$-directions already implies that it has Gaussian tails. More precisely, if there is a constant $\sigma > 0$ and a measurable map $g\colon E \to [0, \infty]$ with $g < \infty$ on a set of positive $\gamma$-measure such that $f\colon E \to [0, \infty]$ satisfies
$$f(x) \le g(x - h) + \sigma |h|_H \qquad (3.1)$$
for all $x$ in a set of full $\gamma$-measure and all $h \in H$, then $f$ has Gaussian tails. In the following, we will prove an abstract result which implies that we may even choose $\sigma$ random in (3.1) and still obtain Gaussian tails for $f$.

Theorem 3.1. Let $E$ be a linear Polish space and let $\mu$ be a probability measure defined on its Borel σ-algebra. Assume that there is a normed subspace $U \subseteq E$ and let $d_U$: Assume that there is a $p \in [1, \infty)$ and a constant $C$ such that for every $\nu \in P(E)$, Let $(F, d)$ be some metric space and let $f\colon E \to F$ be measurable w.r.t. the Borel σ-algebra. Choose $r_0 \ge 0$ and some element $e \in F$ such that Assume that there are measurable functions $g, \sigma$: holds for every $x \in E$ and every $h \in U$, and assume that $g \in L^1(\mu)$ and $\sigma \in L^q(\mu)$, where $q \in (1, \infty]$ is chosen such that $\frac1p + \frac1q = 1$. for all $r \ge r_1$, where $r_1 = r_0 + 4 \|g\|_{L^1(\mu)} + \|\sigma\|_{L^q(\mu)} \sqrt{2C \log(a^{-1})}$. In particular, the random variable $d(f(\cdot), e)\colon E \to [0, \infty)$ has Gaussian tails.
For any measurable set $A \subseteq E$ and $r \ge 0$ we define $A^r := \{x \in E : \text{there is an } \tilde x \in A \text{ such that } d_f(x, \tilde x) \le r\}$.
If $\mu(B) = 0$, it follows that $\{x \in E : d(f(x), e) \le r_0 + r\}$ has full measure. In other words, $d(f(\cdot), e)$ is bounded almost surely and the claimed estimate is trivial. If $\mu(B) > 0$, we can use our calculations above to conclude that
$$1 - \exp\Big( -\frac{(r - \bar r)^2}{C \|\sigma\|^2_{L^q}} \Big) \le \mu\{x \in E : d(f(x), e) \le r_0 + r\}$$
holds for every $r \ge \bar r$, and the claim follows.
In the Gaussian case, Theorem 1.2 immediately implies the following corollary.

Corollary 3.2. Let $(F, H, \gamma)$ be a Gaussian Fréchet space and let $f\colon F \to [0, \infty]$ be measurable. Assume that there are nonnegative random variables $g \in L^1(\gamma)$ and $\sigma \in L^2(\gamma)$ such that $f(x + h) \le g(x) + \sigma(x) |h|_H$ holds for every $x \in F$ and $h \in H$. Then $f$ has Gaussian tails.
Remark 3.3. Corollary 3.2 is more general than the generalized Fernique theorem proven in [DOR15, Theorem 17] and [FH14, Theorem 11.7] (cf. also [FO10]), since it allows $\sigma$ to be an $L^2(\gamma)$-random variable. Moreover, our proof does not rely on the Borell-Sudakov-Cirelson inequality ([Bor75], [SC74]) and can be applied in more general frameworks whenever transport inequalities are available. Non-Gaussian examples include the laws of diffusions driven by Gaussian processes, as shown in this work.
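A simple instance of the Gaussian case is $f(x) = \sup_{t \le T} |x_t|$ under the Wiener measure on $C_0([0, T], \mathbb R)$. Since $|h_t| \le \sqrt{t}\, \|h'\|_{L^2}$, one has $f(x + h) \le f(x) + \sqrt{T} |h|_H$, so the assumptions of Corollary 3.2 hold with $g = f$ and the constant $\sigma \equiv \sqrt{T}$ (this example only uses a deterministic $\sigma$ and does not exercise the random-$\sigma$ extension). The Python sketch below compares a Monte Carlo estimate of the tail with the classical bound $P(f > r) \le 2 e^{-r^2/(2T)}$, which follows from the reflection principle; the sample sizes and function names are arbitrary illustrative choices.

```python
import numpy as np


def sup_abs_bm_samples(n_paths=10_000, n_steps=250, T=1.0, seed=0):
    """Monte Carlo samples of f(W) = sup_{t <= T} |W_t| on a uniform time grid."""
    rng = np.random.default_rng(seed)
    increments = rng.standard_normal((n_paths, n_steps)) * np.sqrt(T / n_steps)
    return np.abs(np.cumsum(increments, axis=1)).max(axis=1)


def gaussian_tail_bound(r, T=1.0):
    """Reflection-principle bound P(sup_{t <= T} |W_t| > r) <= 2 exp(-r^2 / (2 T)), r >= 0."""
    return 2.0 * np.exp(-r ** 2 / (2.0 * T))


if __name__ == "__main__":
    samples = sup_abs_bm_samples()
    for r in (1.0, 2.0, 3.0):
        print(r, np.mean(samples > r), gaussian_tail_bound(r))
```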

A generalized contraction principle
The next Lemma is a generalization of [DGW04, Lemma 2.1].