The five gradients inequality on differentiable manifolds

The goal of this paper is to derive the so-called five gradients inequality for optimal transport theory, for general cost functions, on two classes of differentiable manifolds: locally compact Lie groups and compact Riemannian manifolds.


Introduction
Among variational problems involving optimal transportation and Wasserstein distances, it has been shown in [5] that the so-called 'five gradients inequality' plays a distinguished role. Indeed, it has been used to derive BV and Sobolev estimates for the solutions of the JKO scheme for diffusion evolution equations in $\mathbb{R}^n$. This is the case, for example, of nonlinear diffusion [5], weighted ultrafast diffusion [14] and Fokker-Planck or Keller-Segel equations for chemotaxis [6]. It has also been used to provide bounds on the perimeters of solutions of some variational problems involving mutually singular measures [3]. Remarkably, the five gradients inequality also allowed us to establish stronger convergence estimates for the JKO scheme for the Fokker-Planck equation [18]. From a physical perspective, working in $\mathbb{R}^n$ with the quadratic cost function $c(x,y) = |x-y|^2$ is quite restrictive: a recent generalization to different costs is presented in [4], but the setting is still the Euclidean space. Many models inspired by physics and biology involve diffusions on interfaces [9,11,20,21], and it is convenient to model them using Riemannian manifolds. It is then natural to extend the results in [4,5] to differentiable manifolds, and the first main result of this paper is a derivation of the five gradients inequality on any compact, complete and smooth (actually $C^3$) Riemannian manifold for any cost function $c(x,y) = h(d(x,y))$, where $h$ is a convex function.
In the sequel, every probability measure $\mu\in\mathcal{P}(M)$ will always be absolutely continuous with respect to the volume measure $\mathrm{vol}_g$; by a slight abuse of notation, we will often identify measures with their densities with respect to $\mathrm{vol}_g$.
Theorem 1.1. Let $(M,g)$ be a smooth compact manifold with Ricci curvature bounded from below by $K\in\mathbb{R}$. Let $d$ denote the Riemannian distance, let $h\in C^1([0,\infty))$ be a nonnegative, strictly convex, increasing function such that $h'(0)=0$, and assume that the cost function $c: M\times M\to\mathbb{R}$ is defined by $c(x,y) := h(d(x,y))$. Finally, let $\ell$ be an increasing, convex, isotropic function on the tangent space such that $\ell(0)=0$, and let $\mu,\nu\in W^{1,1}(M)\cap\mathcal{P}(M)$. Then, denoting by $\gamma$, $\varphi$ and $\psi$ respectively the optimal plan and the optimal Kantorovich potentials for the optimal transport problem between $\mu$ and $\nu$ for the cost $c$, it holds
$$\int_M\Big(\ell'(\nabla\varphi)\cdot\nabla\mu+\ell'(\nabla\psi)\cdot\nabla\nu\Big)\,\mathrm{vol}_g \;\ge\; K\int_{M\times M}\ell'\big(h'(d(x,y))\big)\,d(x,y)\,d\gamma(x,y). \tag{1}$$
The compactness hypothesis on $(M,g)$ is not really necessary, as it can be replaced by requiring that $\mu$ and $\nu$ have compact support: in this way we can also consider spaces with constant negative curvature, which would otherwise be excluded by the compactness assumption.
Remark 1.2. Notice that $\ell' : [0,\infty)\to[0,\infty)$. In (1) we apply $\ell'$ to a vector: it acts by modifying the length of the vector while keeping its direction, namely $\ell'(v) := \ell'(|v|)\,\frac{v}{|v|}$ for $v\neq0$. It would be interesting to have a generalization to a general monotone map acting on vectors: however, while $\ell'$ acts isotropically, so that it can be canonically identified on every tangent space, it is not clear how a generic monotone map should change from point to point. We will see that, when a canonical identification of tangent spaces exists (for example on Lie groups), a more general monotone function can indeed be used (see Theorem 1.4).
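Remark 1.3. It is not surprising that a lower Ricci bound appears in the formula: this is to be expected, since inequality (1) with $\ell(t) = h(t) = t^2/2$ (for $K\ge0$) yields $K$-contractivity of the 2-Wasserstein distance along the heat flow, which in turn is known to be equivalent to having Ricci curvature bounded below by $K$ (see [22]). Indeed, let $\mu_0,\nu_0\in W^{1,1}(M)\cap\mathcal{P}(M)$, and let $\partial_t\mu_t = \Delta\mu_t$ and $\partial_t\nu_t = \Delta\nu_t$. Then $\mu_t,\nu_t\in W^{1,1}(M)\cap\mathcal{P}(M)$ and, denoting by $(\varphi_t,\psi_t)$ the Kantorovich potentials between $\mu_t$ and $\nu_t$ for the cost $d^2/2$, (1) formally gives
$$\frac{d}{dt}\frac12W_2^2(\mu_t,\nu_t) = \int_M\varphi_t\,\Delta\mu_t\,\mathrm{vol}_g+\int_M\psi_t\,\Delta\nu_t\,\mathrm{vol}_g = -\int_M\big(\nabla\varphi_t\cdot\nabla\mu_t+\nabla\psi_t\cdot\nabla\nu_t\big)\,\mathrm{vol}_g \le -K\,W_2^2(\mu_t,\nu_t),$$
so that by Grönwall we eventually get $W_2(\mu_t,\nu_t)\le e^{-Kt}W_2(\mu_0,\nu_0)$.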
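The contraction in Remark 1.3 can be checked by hand in the simplest flat case, which lies outside the compact setting of the remark and is used here only as an illustration: on $M=\mathbb{R}$ (so $K=0$) the heat flow $\partial_t\mu_t=\Delta\mu_t$ maps a Gaussian $N(m,v)$ to $N(m,v+2t)$, and the 2-Wasserstein distance between one-dimensional Gaussians has the closed form $\sqrt{(m_1-m_2)^2+(\sqrt{v_1}-\sqrt{v_2})^2}$. The Python sketch below, with initial data chosen by us, evaluates $t\mapsto W_2(\mu_t,\nu_t)$ and checks that it never exceeds $e^{-Kt}W_2(\mu_0,\nu_0)$.

```python
import math

def heat_flow_gaussian(mean, var, t):
    """On R, d/dt mu = mu'' sends N(mean, var) to N(mean, var + 2t)."""
    return mean, var + 2.0 * t

def w2_gaussians_1d(m1, v1, m2, v2):
    """Closed-form W_2 distance between the 1D Gaussians N(m1, v1) and N(m2, v2)."""
    return math.sqrt((m1 - m2) ** 2 + (math.sqrt(v1) - math.sqrt(v2)) ** 2)

K = 0.0                 # Ricci lower bound of the flat line
mu0 = (0.0, 0.25)       # illustrative initial data: (mean, variance)
nu0 = (1.5, 4.0)
times = [0.1 * k for k in range(51)]
dists = [
    w2_gaussians_1d(*heat_flow_gaussian(*mu0, t), *heat_flow_gaussian(*nu0, t))
    for t in times
]
for t, d in zip(times[::10], dists[::10]):
    print(f"t = {t:.1f}   W_2 = {d:.6f}   bound = {math.exp(-K * t) * dists[0]:.6f}")
```

The gap between the means is preserved, and only the spread term $|\sqrt{v_1+2t}-\sqrt{v_2+2t}|$ shrinks, so the distance is non-increasing, as expected for $K=0$.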
A separate result, which may be of independent interest, holds in $\mathbb{R}^n$ and, more generally, on Lie groups $G$: in those cases we obtain the inequality for any cost $c(x,y)$ which is $L$-Lipschitz and translation invariant, without any convexity assumption. The proof in this case is rather different from the one introduced in [4,5,6] and is more inspired by the "regularity by duality" approach, which is why we can make less restrictive assumptions.
We denote by $\mathfrak{g}$ the Lie algebra of the Lie group, that is, the vector space of right-invariant vector fields on $G$: it is naturally isomorphic to $T_eG$ and it has an antisymmetric multiplication given by the Lie bracket. Given a subspace $\mathcal{X}\subseteq\mathfrak{g}$, we say that it is a generating subspace if the algebra generated by it is the whole $\mathfrak{g}$. These subspaces are sometimes also called horizontal distributions, and they have the following crucial property: if we define the control distance associated with the distribution (called the Carnot-Caratheodory distance), obtained by minimizing the length of curves whose derivatives belong at each point to the corresponding horizontal subspace, then this distance is finite for every pair of points (as long as $G$ is connected) and, moreover, it generates the same topology as the Riemannian distance.
Then a function $f: G\to\mathbb{R}$ is called H-Lipschitz if it is Lipschitz with respect to the Carnot-Caratheodory distance, and the horizontal gradient $\nabla_H f$ represents the restriction of the differential of $f$ to the distribution $\mathcal{X}$: given an orthonormal basis $X_1,\dots,X_m$ of $\mathcal{X}$, we can identify $\nabla_H f = (X_1f,\dots,X_mf)$.
Theorem 1.4. Let $G$ be a connected Lie group with Lie algebra $\mathfrak{g}$ and let $\mathcal{X}\subseteq\mathfrak{g}$ be a right-invariant generating distribution of dimension $m$. Let us consider on $G$ the control distance relative to $\mathcal{X}$, $d(x,y) = d_{\mathcal{X}}(x,y)$; let $h: G\to[0,\infty)$ be an $L$-Lipschitz continuous function (with respect to $d$), let $\mu,\nu\in W^{1,1}_H(G)$, and let $G:\mathbb{R}^m\to[0,+\infty)$ be a convex function of the form $G(x) = \int_{S^{m-1}}f_v(v\cdot x)\,d\sigma(v)$ for a family of even convex functions $f_v$ and a positive measure $\sigma$ on $S^{m-1}$.
Let $\varphi,\psi$ be two optimal $L$-Lipschitz Kantorovich potentials for the optimal transport problem between $\mu$ and $\nu$ with the cost $c(x,y) = h(y^{-1}x)$. Then
$$\int_G\nabla G(\nabla_H\varphi)\cdot\nabla_H\mu\,dx+\int_G\nabla G(\nabla_H\psi)\cdot\nabla_H\nu\,dx\ge0. \tag{2}$$
For any horizontal right-invariant vector field $X\in\mathcal{X}$ we have $X = v\cdot(X_1,\dots,X_m)$ for some $v$; considering the special case of (2) with $G(w) = f(w\cdot v)$, where $f\in C^1(\mathbb{R})$ is an even convex function, we obtain
$$\int_Gf'(X\varphi)\,X\mu\,dx+\int_Gf'(X\psi)\,X\nu\,dx\ge0. \tag{3}$$
Notice that, in particular, (2) holds for the notable cases $G(x) = \|x\|^p$ for every $p>1$: it is sufficient to choose $\sigma$ as the uniform measure on $S^{m-1}$ and $f_v(t) = C_p|t|^p$ for a suitable constant $C_p>0$. Notice that $G$ plays the same role as the function $\ell$ in Theorem 1.1 but, thanks to the richer geometry allowed by the Lie structure, we can drop the isotropy assumption. Indeed, in this setting we can even prove a five gradients inequality for directional derivatives (see (3)).
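The choice of $\sigma$ and $f_v$ for $G(x)=\|x\|^p$ rests on the rotation invariance of $x\mapsto\int_{S^{m-1}}|v\cdot x|^p\,d\sigma(v)$ when $\sigma$ is uniform. A minimal Python check in dimension $m=2$, with $\sigma$ normalized to a probability measure and the integral discretized by the trapezoidal rule in the angle (both choices are ours), confirms that $G(x)/\|x\|^p$ is a constant $c_p$, so that $C_p=1/c_p$ gives exactly $\|x\|^p$; for $p=3$ one has $c_3=4/(3\pi)$.

```python
import math

def G_average(x, p, n=20000):
    """G(x) = integral over S^1 of |v . x|^p d sigma(v), with sigma the uniform
    probability measure, discretized by the trapezoidal rule in the angle."""
    total = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        total += abs(math.cos(theta) * x[0] + math.sin(theta) * x[1]) ** p
    return total / n

p = 3.0
points = [(1.0, 0.0), (0.3, -0.4), (2.0, 1.7), (-0.05, 0.9)]
ratios = [G_average(x, p) / math.hypot(*x) ** p for x in points]
print(ratios, 4.0 / (3.0 * math.pi))  # the ratio c_p does not depend on x
```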
As a consequence of Theorem 1.4, it is clear that the compactness assumption on the Riemannian manifold in Theorem 1.1 is not necessary. On the contrary, having Ricci curvature bounded from below seems to be a necessary condition. The class of manifolds that 'behave' similarly to compact manifolds is that of Riemannian manifolds of bounded geometry. Let us recall that a Riemannian manifold $(M,g)$ is of bounded geometry if $g$ is complete and the curvature tensor and all its covariant derivatives are bounded. This leads to the following conjecture.
Conjecture 1.5.A complete Riemannian manifold is of bounded geometry if and only if the five gradients inequality holds.
Notice that, due to Remark 1.3 and thanks to the results of the seminal paper by Sturm and von Renesse [22], the double implication holds at least for compact Riemannian manifolds. However, the equivalence between contraction in the Wasserstein distance and Ricci curvature bounds goes much further: Savaré proved in [19, Theorem 4.1] that contraction of the Wasserstein distance along the heat flow implies Ricci curvature bounded from below in the abstract setting of infinitesimally Hilbertian metric spaces with synthetic Ricci curvature bounded from below, introduced in [13]. We thus believe the following to be true as well.
Conjecture 1.6. Let $(X,d,\mu)$ be an infinitesimally Hilbertian Polish metric measure space. Then $(X,d,\mu)$ is $\mathrm{RCD}(K,\infty)$ if and only if the five gradients inequality holds.

Sketch of the proof in $\mathbb{R}^n$
We present here the idea at the core of the proof of the "five gradients inequality" in the settings that we consider in the present work. Unlike the explicit approach of [4,5], we deduce the inequality as a consequence of the optimality of the Kantorovich potentials in the dual formulation of the optimal transport problem for a Lipschitz cost function; moreover, this approach does not require any a priori smoothness of $\mu$, $\nu$ or the potentials.
For simplicity, we explain our strategy directly in $\mathbb{R}^n$. Let $\mu,\nu\in W^{1,1}(\mathbb{R}^n)$ be compactly supported probability measures on $\mathbb{R}^n$, let $\varphi,\psi$ be the Lipschitz Kantorovich potentials for the classical quadratic Wasserstein distance between $\mu$ and $\nu$, and let $T:\mathbb{R}^n\to\mathbb{R}^n$ be the optimal transport map. Let us first describe the linear case, namely when $\nabla G$ in (2) is the identity map and the "five gradients inequality" actually becomes a "four gradients inequality". Observe that the pair $(\varphi_h,\psi_h)$ defined by $\varphi_h(x) = \varphi(x+he)$, $\psi_h(x) = \psi(x+he)$, for some unit vector $e\in S^{n-1}$ and $h\in\mathbb{R}$, is a good competitor for the dual problem, because we still have $\varphi_h(x)+\psi_h(y)\le|x-y|^2/2$. Moreover, if we set $E(h) := \int_{\mathbb{R}^n}\varphi_h\,d\mu+\int_{\mathbb{R}^n}\psi_h\,d\nu$, then from the optimality of the Kantorovich potentials we deduce that $E$ has a maximum point at $h=0$, thus $E(h)+E(-h)-2E(0)\le0$.
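Since the Lebesgue measure on $\mathbb{R}^n$ is translation invariant, setting $\mu_h(x) := \mu(x+he)$ and $\nu_h(x) := \nu(x+he)$, the previous inequality implies that
$$-\int_{\mathbb{R}^n}(\varphi_h-\varphi)\,d(\mu_h-\mu)-\int_{\mathbb{R}^n}(\psi_h-\psi)\,d(\nu_h-\nu)\le0;$$
then, dividing by $h^2$, taking the limit as $h\to0$ and summing over an orthonormal basis of $\mathbb{R}^n$, one obtains
$$-\int_{\mathbb{R}^n}\nabla\varphi\cdot\nabla\mu\,dx-\int_{\mathbb{R}^n}\nabla\psi\cdot\nabla\nu\,dx\le0.$$
Let us now consider a convex function $G:\mathbb{R}^n\to\mathbb{R}$ as in Theorem 1.4. Let $e\in S^{n-1}$ be fixed; then the pair $(\tilde\varphi_h,\tilde\psi_h)$ defined by $\tilde\varphi_h(x) = \varphi(x)+hf\ldots$ no longer satisfies the constraint $\tilde\varphi_h(x)+\tilde\psi_h(y)\le|x-y|^2/2$ for every $x,y\in\mathbb{R}^n$. However, the crucial observation is that the constraint still holds on the support of any optimal plan $\gamma$. In particular
$$\int_{\mathbb{R}^n}\tilde\varphi_h\,d\mu+\int_{\mathbb{R}^n}\tilde\psi_h\,d\nu = \int\big(\tilde\varphi_h(x)+\tilde\psi_h(y)\big)\,d\gamma\le\int\frac{|x-y|^2}{2}\,d\gamma = \int_{\mathbb{R}^n}\varphi\,d\mu+\int_{\mathbb{R}^n}\psi\,d\nu.$$
Computations similar to those of the linear case then lead to
$$-\int_{\mathbb{R}^n}f_e'(\nabla\varphi\cdot e)\,\nabla\mu\cdot e\,dx-\int_{\mathbb{R}^n}f_e'(\nabla\psi\cdot e)\,\nabla\nu\cdot e\,dx\le0$$
for the fixed direction $e$, and a final integration over $e\in S^{n-1}$ with respect to $\sigma$ provides
$$-\int_{\mathbb{R}^n}\nabla G(\nabla\varphi)\cdot\nabla\mu\,dx-\int_{\mathbb{R}^n}\nabla G(\nabla\psi)\cdot\nabla\nu\,dx\le0.$$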
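The sign of the four and five gradients inequalities of this sketch can be tested numerically in dimension one, where optimal maps are explicit. The Python sketch below is our own illustration, not taken from [4,5]; it assumes $\mu=N(0,1)$, $\nu=N(m,s^2)$, the cost $|x-y|^2/2$ and the monotone optimal map $T(x)=m+sx$, so that $\varphi'(x)=x-T(x)$ and $\psi'(y)=y-T^{-1}(y)$. A direct computation with Gaussian moments gives $\int\varphi'\mu'+\int\psi'\nu'=(1-s)^2/s$ in the linear case and, for $G(t)=t^4/4$ (so $\nabla G(t)=t^3$), $\int G'(\varphi')\mu'+\int G'(\psi')\nu'=3(1-s)^2\big((1-s)^2+m^2\big)/s$; both are nonnegative, as predicted.

```python
import math

def gauss(x, m=0.0, s=1.0):
    """Density of the Gaussian N(m, s^2)."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def gradients_lhs(m, s, g_prime, L=12.0, n=24001):
    """Trapezoidal approximation of  int g'(phi') mu' dx + int g'(psi') nu' dy
    for mu = N(0, 1), nu = N(m, s^2) and the cost |x - y|^2 / 2 on R."""
    h = 2.0 * L / (n - 1)
    total = 0.0
    for k in range(n):
        w = 0.5 * h if k in (0, n - 1) else h      # trapezoid weights
        x = -L + k * h
        dphi = x - (m + s * x)                      # phi'(x) = x - T(x)
        dmu = -x * gauss(x)                         # mu'(x)
        y = m + s * x                               # y = T(x), so dy = s dx
        dpsi = y - (y - m) / s                      # psi'(y) = y - T^{-1}(y)
        dnu = -(y - m) / s ** 2 * gauss(y, m, s)    # nu'(y)
        total += w * (g_prime(dphi) * dmu + s * g_prime(dpsi) * dnu)
    return total

for m, s in [(0.5, 2.0), (-1.0, 0.5), (0.0, 1.3)]:
    a = 1.0 - s
    linear = gradients_lhs(m, s, lambda t: t)
    cubic = gradients_lhs(m, s, lambda t: t ** 3)
    print(m, s, linear, a * a / s, cubic, 3 * a * a * (a * a + m * m) / s)
```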
In the present work we apply the above strategy in the context of locally compact Lie groups and compact Riemannian manifolds. The first setting does not present substantial differences with respect to the Euclidean case $\mathbb{R}^n$, as it is possible to find a point-independent orthonormal basis of the (horizontal) tangent space and the Haar measure is invariant under translations. The Riemannian case is instead more complicated, and one has to argue locally. For the linear case, in essence, in the above sketch we used that if $\varphi(x)+\psi(y)\le|x-y|^2/2$, then $\Delta\varphi(x)+\Delta\psi(y)\le0$ on the contact set $\{\varphi(x)+\psi(y) = |x-y|^2/2\}$; this inequality is obtained by using a simultaneous parallel variation in $x$ and $y$, which does not alter the distance between them, and then averaging over the directions.
In the Riemannian setting, in order to perform a simultaneous variation in $x$ and $y$, we use a flow along the geodesic connecting them, similar to the one generated by Fermi coordinates in [1, Section 3]; now $d(x_t,y_t)^2$ is no longer constant, and we will see in Section 4 that the Ricci curvature naturally appears in its second variation. Notice that in [1] sharper estimates are obtained for $\frac{d^2}{dt^2}d(x_t,y_t)^2$: we expect, as a consequence, that some finer estimates could be obtained in our case as well, but we did not pursue them for the sake of simplicity (i.e. to keep a right-hand side depending linearly on $K$).
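To see curvature enter the second variation in the simplest case, one can take the unit sphere $S^2$ (all sectional curvatures equal to 1, hence $\mathrm{Ric}=g$) and two points $x=(0,0)$, $y=(0,\theta)$ on the equator, written in (latitude, longitude), moved simultaneously by $t$ along their meridians, i.e. along the parallel unit normal to the connecting geodesic. This particular variation is our choice for illustration and is not claimed to coincide with the flow used in Section 4. From $\cos d(x_t,y_t)=\sin^2t+\cos^2t\cos\theta$ one gets $\frac{d^2}{dt^2}\big|_{t=0}d(x_t,y_t)^2=-4\theta\tan(\theta/2)$, which is at most $-2\theta^2$, whereas moving both points along the equator leaves the distance unchanged. The Python sketch below approximates this second derivative by a central difference.

```python
import math

def point(lat, lon):
    """Point of the unit sphere in R^3 with the given latitude and longitude (radians)."""
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))

def dist(p, q):
    """Great-circle distance between two points of the unit sphere."""
    c = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, c)))

def second_variation_sq_dist(theta, t=1e-3):
    """Central second difference at t = 0 of t -> d(x_t, y_t)^2, where x_t = (t, 0)
    and y_t = (t, theta) are both moved by t along their meridians."""
    f = lambda s: dist(point(s, 0.0), point(s, theta)) ** 2
    return (f(t) + f(-t) - 2.0 * f(0.0)) / t ** 2

for theta in (0.3, 1.0, 2.0):
    print(theta, second_variation_sq_dist(theta),
          -4 * theta * math.tan(theta / 2), -2 * theta ** 2)
```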

Applications
We conclude this introduction by commenting on the BV regularity of certain variational problems in optimal transportation on Riemannian manifolds. Indeed, a direct application of our main Theorem 1.1 provides explicit estimates for the Wasserstein projection of measures with BV densities onto the set $\{\rho\in\mathcal{P}(M) : \rho\ll\mathrm{vol}_g \text{ and } \frac{d\rho}{d\mathrm{vol}_g}\le f\}$ in terms of the BV norm of $f$. This problem is of particular interest for the choice $f = 1$, as it corresponds to the projection onto densities with an $L^\infty$ bound, and thus has a natural application in the context of evolutionary PDEs describing crowd motion. The Euclidean version of this result has been proved in [5, Theorem 1.1] for $f = 1$ and in [5, Theorem 1.2] for a generic $f$ with mass larger than 1, the key ingredient of both proofs being the 'five gradients inequality' in $\mathbb{R}^n$. Thanks to McCann's polar factorization theorem on Riemannian manifolds [16] and straightforward computations in local charts, the same argument provided in [5] still works on smooth compact Riemannian manifolds and allows us to apply the Riemannian version of the 'five gradients inequality', i.e. estimate (1).
More precisely, we obtain the following.
Theorem 1.7. Let $(M,g)$, $K$, $d$, $h$ be as in Theorem 1.1 and $\ell(t) = t$. Let $\eta:\mathbb{R}_+\to\mathbb{R}\cup\{+\infty\}$ be a convex and l.s.c. function and let $\nu\in\mathcal{P}(M)\cap BV(M)$ be a given measure. If $\bar\mu$ is a solution of
where $\gamma$ is an optimal plan between $\bar\mu$ and $\nu$ for the cost $c = h\circ d$. Moreover, if $f\in BV(M)$ is a nonnegative function with mass $\int_Mf\,\mathrm{vol}_g\ge1$ and $\bar\mu\in\operatorname{argmin}\{C_c(\mu,\nu) : \mu\in\mathcal{P}(M),\ \mu\le f\,\mathrm{vol}_g \text{ a.e.}\}$,
where, again, $\gamma$ is an optimal plan between $\bar\mu$ and $\nu$ for $c = h\circ d$.
Estimates (4) and (5) then have some important consequences depending on the sign of the Ricci curvature. Indeed, recalling that $\gamma$ is an optimal plan for the cost $c = h\circ d$ with marginals $\bar\mu$ and $\nu$, the two estimates provide an explicit control of the change of the BV norm in terms of either the 1-Wasserstein distance or (via Jensen's inequality) a suitable homogenized version of the optimal cost $C_c$ between the two measures $\bar\mu$ and $\nu$, depending on the sign of $K$. Indeed, one has
Notice that in the application of Theorem 1.7 to the JKO scheme one has $h(t) = t^2$, so that, for $K<0$, this last estimate reduces to $-KW_2(\bar\mu,\nu)$.
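For $K<0$, the passage from $\int d\,d\gamma$ to $W_2$ only uses Jensen's (equivalently, the Cauchy-Schwarz) inequality for the $W_2$-optimal plan. The Python sketch below illustrates it with empirical measures on $\mathbb{R}$ (our choice; the theorem concerns a compact manifold), using the fact that in one dimension the sorted, monotone coupling is optimal for the cost $|x-y|^2$, and checks $-K\int d\,d\gamma\le-KW_2$ for a hypothetical value $K<0$.

```python
import math
import random

def monotone_coupling_moments(xs, ys):
    """For two empirical measures on R with the same number of atoms, the sorted
    (monotone) coupling is optimal for |x - y|^2. Return int d dgamma and
    (int d^2 dgamma)^(1/2) = W_2 for that coupling."""
    pairs = list(zip(sorted(xs), sorted(ys)))
    first = sum(abs(x - y) for x, y in pairs) / len(pairs)
    w2 = math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))
    return first, w2

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
ys = [rng.expovariate(1.0) for _ in range(1000)]
first, w2 = monotone_coupling_moments(xs, ys)
K = -0.7  # hypothetical negative lower Ricci bound, for illustration only
print(-K * first, -K * w2)  # by Jensen, the first value never exceeds the second
```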

Possible generalizations
We also want to comment on five gradients inequalities holding for other distance-like functions. Notice that, in Subsection 1.1, the relevant properties of the squared Wasserstein distance that we use are:
• there exists a dual formulation, which depends linearly on the dual variables;
• the admissible set of dual potentials is invariant under translation;
• the optimal dual potentials are differentiable almost everywhere.
This immediately shows that the sketch we gave for $|x-y|^2$ also works with $\ell(x-y)$, where $\ell$ is any positive Lipschitz cost (this is what we do in Theorem 1.4 in the case $G = \mathbb{R}^n$). However, we can also change the type of functional. Notable examples are:
• Entropic optimal transport problem, for a Lipschitz cost $h$: Assuming that $G$ is $\lambda$-convex, we expect the following inequality to hold
notice that the constraint in the dual is not translation invariant: in fact, in this case it is useful to pass to the potentials $u(x) = \varphi(x)+\varepsilon\ln(\mu(x))$ and $v(y) = \psi(y)+\varepsilon\ln(\nu(y))$, for which we can obtain the five gradients inequality, and then use the convexity of $G$ to come back to $\varphi$ and $\psi$.
Then, the following inequality is expected to hold for every convex function $G$:
Structure of the paper
In Section 2 we discuss the setting of Lie groups and prove Theorem 1.4; Section 3 collects some known results about the variation of the arc-length and optimal transport theory on Riemannian manifolds, which will be exploited in Section 4 to show that suitable second variations of the arc-length are bounded from above in terms of the Ricci curvature. In the first part of Section 5 we use the construction of Section 4 to recover a weighted pointwise four gradients inequality for sufficiently smooth initial measures, which eventually implies the nonlinear five gradients inequality. Finally, the second part of Section 5 is devoted to the proof of Theorem
- $\Gamma(E)$ denotes the space of smooth sections of a principal bundle $E$ over $M$.
- $TM$ and $F_OM$ denote respectively the tangent bundle and the orthonormal frame bundle.
- We denote the Levi-Civita connection by $\nabla$ and the Christoffel symbols by $\Gamma^k_{ij}$.
- $\Pi^bV(a)$ denotes the parallel transport of the element $V(a)$ of the vector field $V$ along the geodesic connecting $a$ and $b$.
- We denote the Riemann curvature tensor by $R: T_pM\times T_pM\times T_pM\to T_pM$ and recall that, for any $v,w,z\in T_pM$, $R(v,w)z := [\nabla_v,\nabla_w]z-\nabla_{[v,w]}z$.
- We denote the Ricci curvature by $\mathrm{ric}: T_pM\times T_pM\to\mathbb{R}$ and the Ricci tensor by $\mathrm{Ric}_p: T_pM\to T_pM$, and recall that, for any $p\in M$, any $v,w\in T_pM$ and any orthonormal basis $\{e_i\}\subset T_pM$, $\mathrm{ric}(v,w) = g(\mathrm{Ric}_p(v),w) = \sum_ig(R(v,e_i)e_i,w)$.
- We denote the scalar curvature by $\mathrm{scal}: M\to\mathbb{R}$ and recall that, for any $p\in M$ and any orthonormal basis $\{e_i\}\subset T_pM$, $\mathrm{scal}(p) = \sum_i\mathrm{ric}(e_i,e_i) = \sum_ig(\mathrm{Ric}_p(e_i),e_i)$.

Setup
Throughout this paper, we adopt the following setup:
(man) $(M,g)$ is a smooth compact Riemannian manifold; we denote by $K$ the lower bound of the Ricci curvature and by $\tilde K$ the constant that uniformly bounds all the sectional curvatures;
(meas) $\mu,\nu$ are probability measures on $M$ absolutely continuous with respect to the volume form; with a slight abuse of notation, we use the same letters to denote their densities with respect to $\mathrm{vol}_g$ (the volume form induced by the metric $g$);
(cvx) $\ell\in C^1([0,\infty))$ is an increasing convex function such that $\ell(0)=0$.
Moreover, in order to prove the results in full generality, we will argue by approximation with more regular objects, for which we use the following stronger assumption:
(cost2reg) the cost function $c$ satisfies (cost), where, additionally, $h(t) = f(t^2)$ for some $f\in C^2([0,+\infty))$.

Five gradients inequality on locally compact Lie groups
Let $G$ be a locally compact connected Lie group of dimension $n$ and denote by $l_g: G\to G$ (resp. $r_g: G\to G$) the left action (resp. the right action) defined by $l_g(x) := gx$ (resp. $r_g(x) := xg^{-1}$).
As usual, we denote by $dl_g$ and $dr_g$ the lifts of these actions to the tangent bundle. Let us consider the Lie algebra $\mathfrak{g}$, that is, the set of right-invariant vector fields $X$, i.e. such that $dr_g\circ X = X\circ r_g$ for any $g\in G$, endowed with the product given by the Lie bracket.
Let now $\{X_i\}_{i=1}^d$ be an orthogonal set of right-invariant vector fields and denote by $\Phi^i_t$ the flow of $X_i$, i.e. the map $\Phi^i:\mathbb{R}\times G\to G$ that, for each $t\in\mathbb{R}$, sends $x\in G$ to the point obtained by following for time $t$ the integral curve starting at $x$, defined by $\frac{d}{dt}\Phi^i_t(x) = X_i(\Phi^i_t(x))$, $\Phi^i_0(x) = x$. Notice that we allow for $d<n$, which is for example the setting of nilpotent Carnot groups. Since the vector fields $X_i$ are chosen to be right-invariant, it is easy to see that $\Phi^i_t(x) = \Phi^i_t(1_G)\,x$ for every $x\in G$, where $1_G$ is the unit of $G$. Denoting by $dx$ the right Haar measure of $G$, we get
and, by uniqueness of the flow, $\Phi^i_t(1_G) = (\Phi^i_{-t}(1_G))^{-1}$. This is because the right Haar measure is the unique (up to a positive multiplicative constant) countably additive, nontrivial measure $dx$ on the Borel subsets of $G$ satisfying the following properties: (1) $dx$ is right-translation-invariant; (2) $dx$ is finite on every compact set; (3) $dx$ is outer regular on Borel sets $S\subseteq G$; (4) $dx$ is inner regular on open sets $U\subseteq G$.
Remark 2.1. Notice that, on a generic (smooth) Riemannian manifold $(M,g)$, Equation (6) can only be satisfied by divergence-free vector fields. However, not every manifold admits an orthogonal set of divergence-free vector fields, see e.g. [10].
A natural assumption on the vector fields $X_1,\dots,X_d$ is the Hörmander condition: letting $\Delta = \langle X_1,\dots,X_d\rangle$ be the distribution generated by $X_1,\dots,X_d$ as a subspace of $\mathfrak{g}$, we say that $\Delta$ satisfies the Hörmander condition if it generates the whole $\mathfrak{g}$ as an algebra. In this case, we say that $(G,\Delta)$ is a Lie group with its associated right-invariant sub-Riemannian structure $\Delta$.
Then, an important concept is that of horizontal regularity. We consider rectifiable horizontal curves $\gamma:[0,1]\to G$, that is, curves such that $\dot\gamma\in\Delta = \langle X_1,\dots,X_d\rangle$, and we accordingly define a distance, called the Carnot-Caratheodory distance,
$$d_{CC}(x,y) = \min\Big\{\int_0^1\|\dot\gamma(t)\|\,dt : \gamma \text{ horizontal curve such that } \gamma(0)=x,\ \gamma(1)=y\Big\}.$$
By the left invariance of the construction, it is obvious that, for every $h\in G$, we have $d_{CC}(x,y) = d_{CC}(hx,hy) = d_{CC}(y^{-1}x,1_G)$ and, crucially, thanks to the Hörmander condition we have $d_{CC}(x,y)<+\infty$, that is, every two points can be connected by a horizontal curve of finite length. Next, we say that $f:G\to\mathbb{R}$ is horizontally Lipschitz (or H-Lipschitz) if it is Lipschitz with respect to the distance $d_{CC}$.
We recall that, thanks to Pansu's differentiability theorem, every H-Lipschitz function is differentiable $dx$-a.e. along the horizontal directions; we denote by $\nabla_Hf$ its gradient.
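As a concrete instance of the Hörmander condition (an example of ours, not used in the paper), consider the first Heisenberg group, realized on $\mathbb{R}^3$ with the law $(x,y,z)\cdot(x',y',z')=(x+x',y+y',z+z'+(xy'-yx')/2)$, which we assume for this illustration. Differentiating $s\mapsto g_s\cdot p$ at $s=0$ for $g_s=(s,0,0)$ and $g_s=(0,s,0)$ gives the right-invariant fields $X_1=\partial_x+\frac y2\partial_z$ and $X_2=\partial_y-\frac x2\partial_z$, whose bracket is $[X_1,X_2]=-\partial_z$; hence $\langle X_1,X_2\rangle$ generates the whole three-dimensional Lie algebra although $d=2<n=3$. The Python sketch below recovers the bracket by central finite differences and checks that $X_1$, $X_2$, $[X_1,X_2]$ are linearly independent at a sample point.

```python
def X1(p):
    """Right-invariant field d/dx + (y/2) d/dz of the Heisenberg law assumed above."""
    x, y, z = p
    return (1.0, 0.0, 0.5 * y)

def X2(p):
    """Right-invariant field d/dy - (x/2) d/dz of the same law."""
    x, y, z = p
    return (0.0, 1.0, -0.5 * x)

def directional_derivative(F, p, v, eps=1e-6):
    """Central-difference approximation of DF(p) v for a vector field F on R^3."""
    plus = F(tuple(a + eps * b for a, b in zip(p, v)))
    minus = F(tuple(a - eps * b for a, b in zip(p, v)))
    return tuple((u - w) / (2.0 * eps) for u, w in zip(plus, minus))

def bracket(F, H, p):
    """Coordinate Lie bracket [F, H](p) = DH(p) F(p) - DF(p) H(p)."""
    a = directional_derivative(H, p, F(p))
    b = directional_derivative(F, p, H(p))
    return tuple(u - w for u, w in zip(a, b))

def det3(a, b, c):
    """Determinant of the 3x3 matrix with columns a, b, c, i.e. a . (b x c)."""
    cross = (b[1] * c[2] - b[2] * c[1], b[2] * c[0] - b[0] * c[2], b[0] * c[1] - b[1] * c[0])
    return sum(u * w for u, w in zip(a, cross))

p = (0.3, -1.2, 2.0)
B = bracket(X1, X2, p)
print(B, det3(X1(p), X2(p), B))  # nonzero determinant: the three fields span R^3
```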
The transport costs we will use are left-invariant Lipschitz functions, that is, they are of the form $c(x,y) = h(y^{-1}x)$ for some H-Lipschitz $h:G\to[0,\infty)$. In the sequel we will need a lemma about the strong $L^1$ convergence of difference quotients to directional derivatives for horizontal Sobolev functions, which can be defined as in the Euclidean case using the integration by parts formula that holds also in $G$: we say that a function $f\in L^1(G)$ is a Sobolev function if for every $i = 1,\dots,d$ there exists
In this case we say that $f\in W^{1,1}_H(G)$ and we define $\nabla_Hf = (g_1,\dots,g_d)$.
Lemma 2.2. Let $f\in W^{1,1}_H(G)$ and let $\Phi_t$ be the flow of a horizontal right-invariant vector field $X\in\mathfrak{X}(G)$. Then
$$\frac{f\circ\Phi_t-f}{t}\to Xf\quad\text{in } L^1(G) \text{ as } t\to0.$$
Proof. For $dx$-a.e. $x\in G$, the function $t\mapsto f(\Phi_t(x))$ is absolutely continuous with derivative $Xf(\Phi_t(x))$. In particular, for almost every $x$ we have
$$\frac{f(\Phi_t(x))-f(x)}{t} = P_t(Xf)(x), \tag{8}$$
where $P_t: L^1(G)\to L^1(G)$ is defined as $P_t(g)(x) := \frac1t\int_0^tg(\Phi_s(x))\,ds$. This equality can also be proven by duality and linearity starting from (7).
Using Fubini's theorem, Jensen's inequality and the invariance of $dx$ under translations, it is easy to see that $\|P_tg\|_p\le\|g\|_p$ for every $p\ge1$. Moreover, if $g$ is H-Lipschitz, we also have $|P_tg(x)-g(x)|\le\mathrm{Lip}(g)\,t/2$. By the triangle inequality and approximation with compactly supported Lipschitz functions, we obtain that $\|P_tg-g\|_1\to0$ for every $g\in L^1(G)$. Now, using this observation and (8), we deduce
$$\frac{f(\Phi_t(x))-f(x)}{t}-Xf(x) = P_t(Xf)(x)-Xf(x)\to0\quad\text{in } L^1(G),$$
which concludes the proof.
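Lemma 2.2 can be observed in the simplest abelian case $G=(\mathbb{R},+)$, where the Haar measure is the Lebesgue measure, $X=\frac{d}{dx}$ and $\Phi_t(x)=x+t$. For the hat function $f(x)=\max\{0,1-|x|\}\in W^{1,1}(\mathbb{R})$ (our choice), whose derivative jumps at $-1,0,1$, a direct computation gives $\big\|\frac{f\circ\Phi_t-f}{t}-f'\big\|_{L^1}=2t$ for $0<t<1$, so the difference quotients converge in $L^1$ even though they do not converge uniformly. The Python sketch below evaluates this norm with a midpoint rule whose cells are aligned with all breakpoints, so that it is exact up to rounding.

```python
def hat(x):
    """The W^{1,1} test function f(x) = max(0, 1 - |x|)."""
    return max(0.0, 1.0 - abs(x))

def hat_prime(x):
    """Weak derivative of hat: 1 on (-1, 0), -1 on (0, 1), 0 elsewhere."""
    if -1.0 < x < 0.0:
        return 1.0
    if 0.0 < x < 1.0:
        return -1.0
    return 0.0

def l1_error(t, a=-3.0, b=3.0, n=60000):
    """Midpoint-rule value of || (f(x + t) - f(x)) / t - f'(x) ||_{L^1(a, b)},
    i.e. the L^1 distance between the difference quotient along Phi_t and f'."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += abs((hat(x + t) - hat(x)) / t - hat_prime(x)) * h
    return total

for t in (0.1, 0.01):
    print(t, l1_error(t), 2 * t)
```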
We are now ready to prove the five gradients inequality on locally compact Lie groups.
Proof of Theorem 1.4. First of all, the existence of $d$-Lipschitz Kantorovich potentials $\varphi,\psi$ is given by Theorem 3.3, in particular by (16), which holds in every Polish metric space as long as $c$ is Lipschitz.
Let now $\Phi_t$ be the flow of the vector field $X$. Let us then define
Let us moreover consider the set $S_{\varphi,\psi} := \{(x,y)\in G\times G : \varphi(x)+\psi(y) = h(y^{-1}x)\}$.
Furthermore, for any $t>0$ and $(x,y)\in S_{\varphi,\psi}$ it holds
$$\frac{\varphi_t(x)-\varphi(x)}{t}+\frac{\psi_t(y)-\psi(y)}{t} = \frac{\varphi_t(x)+\psi_t(y)-\varphi(x)-\psi(y)}{t} = \frac{\varphi_t(x)+\psi_t(y)-h(y^{-1}x)}{t}\le0. \tag{10}$$
Since $f$ is an even convex function, by the monotonicity of its derivative, for any $s\le-t$ we have $f'(s)\le f'(-t)$. Furthermore, since $f$ is even, its first derivative is odd, i.e. $f'(-t) = -f'(t)$, which allows us to conclude that for any $t+s\le0$ it holds
$$f'(t)+f'(s)\le0. \tag{11}$$
Combining inequality (10) with equations (9a) and (9b), and applying inequality (11), we obtain, for any $(x,y)\in S_{\varphi,\psi}$, $\tilde\varphi_t(x)+\tilde\psi_t(y)\le h(y^{-1}x)$.
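Finally, consider an optimal plan $\gamma$ for $\mu$ and $\nu$. We know that it is concentrated on $S_{\varphi,\psi}$ and that its marginals are $\mu$ and $\nu$. Then we find that
$$\int_G\tilde\varphi_t\,d\mu+\int_G\tilde\psi_t\,d\nu = \int_{G\times G}\big(\tilde\varphi_t(x)+\tilde\psi_t(y)\big)\,d\gamma(x,y)\le\int_{G\times G}h(y^{-1}x)\,d\gamma(x,y) = \int_G\varphi\,d\mu+\int_G\psi\,d\nu.$$
In particular, we can write the same inequality for $-t$ and then add them up to obtain
$$\int_G(\tilde\varphi_t+\tilde\varphi_{-t}-2\varphi)\,d\mu+\int_G(\tilde\psi_t+\tilde\psi_{-t}-2\psi)\,d\nu\le0.$$
Now we compute:
Letting $H(x) = f'\Big(\frac{\varphi_t(x)-\varphi(x)}{t}\Big)$, $g_t := \Phi_t(1_G)$ and $g_{-t} := \Phi_{-t}(1_G) = g_t^{-1}$, we notice that
In particular, using the change of variables $g\mapsto g_t^{-1}\cdot g$, which leaves the Haar measure $dx$ invariant, we get
We can do the same for the term with $\tilde\psi_t$; then, dividing by $t^2$ and letting $t\to0$, we conclude using dominated convergence and Lemma 2.2.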
Let us now consider the horizontal tangent space $H$, which is isomorphic to $\mathbb{R}^d$ with some metric $g$, which we can assume to be the usual scalar product, up to a change of coordinates. For any $v\in H$ we can consider the corresponding right-invariant vector field $X_v$. We can now average (3) for $f = f_v$ to get
Notice that $X_v\varphi = v\cdot\nabla_H\varphi$, and similarly for the other terms. Denoting $H_v(w) = f_v(v\cdot w)$, we have $\nabla H_v(w) = f_v'(v\cdot w)\,v$; in particular, we have
and similarly for the other term in (12). Using now Fubini's theorem and the linearity of the gradient, we see that (12) is equivalent to
Notice that if $f_v$ is independent of $v$ (and $\sigma$ is the uniform measure), then $G$ is rotation invariant and so $G(w) = g(\|w\|)$ for some convex $g$. When $f_v(t) = |t|^p$ we get precisely $g(t) = c_p|t|^p$ for some $c_p>0$, which lets us conclude.
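Inequality (11) only uses that $f'$ is odd and nondecreasing. The Python sketch below tests it for the even convex functions $f(s)=|s|^p$ with $p\in\{1.5,2,3\}$ (our choice of examples) on randomly drawn pairs with $t+s\le0$; the small tolerance only absorbs floating-point rounding.

```python
import random

def f_prime(s, p):
    """Derivative of the even convex function f(s) = |s|^p, p > 1: odd and nondecreasing."""
    return p * abs(s) ** (p - 1.0) * (1.0 if s >= 0.0 else -1.0)

def count_violations(p, trials=10000, seed=1):
    """Count random pairs with t + s <= 0 for which f'(t) + f'(s) > 0 (up to rounding)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        t = rng.uniform(-5.0, 5.0)
        s = rng.uniform(-10.0, 0.0) - t   # so that t + s <= 0, up to rounding
        if f_prime(t, p) + f_prime(s, p) > 1e-9:
            bad += 1
    return bad

for p in (1.5, 2.0, 3.0):
    print(p, count_violations(p))
```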

Preliminary results in Riemannian geometry
In this section, we collect some facts about optimal transport and Riemannian geometry that we will need in the sequel. We refer the reader to [16,23] for more details on optimal transport theory (in the particular case of Riemannian manifolds) and to [2,7] for the variation of the arc length on Riemannian manifolds.

Optimal transport on compact Riemannian manifolds with general costs
We begin by recalling the definition of a semi-concave function on a Riemannian manifold.
where we used that $\psi,\psi'$ are bounded in terms of the diameter of $M$, and that $f'$ and $f''$ are bounded on bounded sets.
Let now $\mathcal{P}(M)$ be the set of probability measures on $M$, a smooth compact Riemannian manifold.
Theorem 3.3. Let $\mu,\nu\in\mathcal{P}(M)$ be two probability measures on $M$ which are absolutely continuous with respect to $\mathrm{vol}_g$: with a slight abuse of notation, we will sometimes identify $\mu$ and $\nu$ with their densities with respect to $\mathrm{vol}_g$. Let us consider a cost function $c = h\circ d$ that satisfies (cost) for some strictly increasing continuous function $\lambda:[0,+\infty)\to[0,+\infty)$. Then the following hold:
where $\Pi(\mu,\nu)$ is the set of transport plans, i.e.
$\Pi(\mu,\nu) := \{\gamma\in\mathcal{P}(M\times M) : (\pi_x)_\#\gamma = \mu,\ (\pi_y)_\#\gamma = \nu\}$, has a unique solution, which is of the form $\gamma_T := (\mathrm{id},T)_\#\mu$, and $T:M\to M$ is a solution of the problem min
(ii) The map $T:\{\mu>0\}\to\{\nu>0\}$ is $\mathrm{vol}_g$-a.e. invertible and its inverse $S := T^{-1}$ is a solution of the problem min
(iii) We have
$$\min_{\gamma\in\Pi(\mu,\nu)}\int_{M\times M}c\,d\gamma = \max\Big\{\int_M\varphi(x)\,d\mu(x)+\int_M\psi(y)\,d\nu(y) : \varphi(x)+\psi(y)\le c(x,y),\ \forall x,y\in M\Big\}. \tag{16}$$
(iv) If $\varphi,\psi$ are optimal in (16), they are clearly Lipschitz and differentiable almost everywhere; moreover we have:
• $T(x) = \exp_x(\lambda^{-1}(-\nabla\varphi(x)))$ and $S(y) = \exp_y(\lambda^{-1}(-\nabla\psi(y)))$ almost everywhere; in particular, the gradients of the optimal functions are uniquely determined (even in case of non-uniqueness of $\varphi$ and $\psi$) a.e. on $\{\mu>0\}$ and $\{\nu>0\}$, respectively;
• if $c = h\circ d$ satisfies (cost2reg), the functions $\varphi$ and $\psi$ are $\Lambda$-concave for some $\Lambda\in\mathbb{R}$;
• $\varphi(x) = \min_{y\in M}\{h(d(x,y))-\psi(y)\}$ and $\psi(y) = \min_{x\in M}\{h(d(x,y))-\varphi(x)\}$;
• if we denote by $\chi^c$ the c-transform of a function $\chi:M\to\mathbb{R}$, defined through $\chi^c(y) = \inf_{x\in M}\{c(x,y)-\chi(x)\}$, then the maximal value in (16) is also equal to
and the optimal $\varphi$ is the same as above, and is such that $\varphi = (\varphi^c)^c$ a.e. on $\{\mu>0\}$.
(vi) If $\nu\in\mathcal{P}(M)$ is given, the functional $F:\mathcal{P}(M)\to\mathbb{R}$ defined through
As a consequence, $\phi$ is the first variation of $F$ and from point (v) we deduce that $\phi$ coincides with the $\varphi$ of the optimal pair $(\varphi,\psi)$ in (16).
The only non-standard point is the last one. For more details we refer to [17, Section 7.2] (see also [3] for a sketch of the proof). Uniqueness of $\psi$ on $\operatorname{supp}(\nu)$ is obtained from the uniqueness of its gradient and the connectedness of $\{\nu>0\}$.
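The c-transform of point (iv) can be experimented with on finite point sets. The Python sketch below uses points on the unit circle $S^1$, the cost $c=h\circ d$ with $h(t)=t^2/2$, and random data (all choices of ours), and checks three elementary facts: $\chi^{cc}\ge\chi$; the pair $(\chi^{cc},\chi^c)$ satisfies the constraint $\varphi(x)+\psi(y)\le c(x,y)$ of (16); and $\chi^{ccc}=\chi^c$, i.e. $\psi=\chi^c$ satisfies $\psi=(\psi^c)^c$, the discrete counterpart of the identity $\varphi=(\varphi^c)^c$ for c-concave potentials (here the cost is symmetric, so a single transform serves in both directions).

```python
import math
import random

def circle_cost(a, b):
    """c = h(d) with h(t) = t^2 / 2 and d the geodesic distance on the unit circle."""
    d = abs(a - b) % (2.0 * math.pi)
    d = min(d, 2.0 * math.pi - d)
    return 0.5 * d * d

def c_transform(chi, src, dst):
    """chi^c(y) = min over x in src of c(x, y) - chi(x), evaluated at every y in dst."""
    return [min(circle_cost(x, y) - cx for x, cx in zip(src, chi)) for y in dst]

rng = random.Random(2)
xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(60)]
ys = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(50)]
chi = [rng.uniform(-1.0, 1.0) for _ in xs]

psi = c_transform(chi, xs, ys)        # psi = chi^c on ys
phi = c_transform(psi, ys, xs)        # phi = chi^cc on xs (the cost is symmetric)
psi_again = c_transform(phi, xs, ys)  # chi^ccc on ys, expected to equal psi
print(max(abs(a - b) for a, b in zip(psi, psi_again)))
```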
We can now state (and prove) the following result, which is a direct consequence of [8,12,16].
Theorem 3.4. Let $M$ be a smooth Riemannian manifold, let $\mu,\nu: M\to(0,\infty)$ be two continuous probability densities, locally bounded away from zero and infinity on $M$, and let us consider a cost function $c: M\times M\to[0,+\infty)$ satisfying (cost2reg). Let $T:M\to M$ denote the optimal transport map for the cost $c$ sending $\mu$ onto $\nu$. Then there exist two closed sets $\Sigma_\mu,\Sigma_\nu\subset M$ of measure zero such that $T: M\setminus\Sigma_\mu\to M\setminus\Sigma_\nu$ is a homeomorphism of class $C^{0,\beta}_{loc}$ for any $\beta<1$. In addition, if both $\mu$ and $\nu$ are of class $C^{k,\alpha}$, then $T$:
Proof. As shown in [16, Theorem 13], given two probability densities $\mu$ and $\nu$ supported on $M$, there exists a c-convex function $u: M\to\mathbb{R}\cup\{\infty\}$ such that $u$ is differentiable $\mu$-a.e. and $T_u(p) = \exp_p(\lambda^{-1}(\nabla u(p)))$ (where $\lambda$ is that of (cost)) is the unique optimal transport map sending $\mu$ onto $\nu$. Furthermore, as noted in Remark 3.2, $c$ is semiconcave; in particular, the semiconcavity is inherited by the optimal Kantorovich potential $\varphi$, and so $u = -\varphi$ is semiconvex. By Alexandrov's theorem, $u$ is twice differentiable almost everywhere and, arguing as in [8, Proposition 4.1], we conclude that $T_u(p)$ is not in the cut locus of $p$. Since the cut locus is closed and $c$ is smooth outside the cut locus, there exists a set $X$ of full measure such that, for every $p_0\in X$, $u$ is twice differentiable at $p_0$ and there exists a neighborhood $U_{p_0}\times V_{T_u(p_0)}\subset M\times M$ of $(p_0,T_u(p_0))$ such that $c\in C^\infty(U_{p_0}\times V_{T_u(p_0)})$. By taking a local chart around $(p_0,T_u(p_0))$ we reduce ourselves to [12, Theorem 1.3]. This shows that $T_u$ is a local homeomorphism (resp. diffeomorphism) around almost every point. In particular, since $T_u$ is invertible a.e., it can be shown that $T_u$ is also a global homeomorphism (resp. diffeomorphism) outside a closed singular set of measure zero. For more details we refer to [12].
It follows that $T$ has the form $T(p) := \exp_p(\lambda^{-1}(\nabla u))$ for a proper c-convex function $u$ defined on $M$ which is smooth on $M\setminus\Sigma$, where $\Sigma$ is a closed set of zero volume; notice that under assumption (cost2reg) the vector field $\Xi = \lambda^{-1}(\nabla u)$, the generator of $T$, is at least $C^2$ on the open set $M\setminus\Sigma$. Thus, $T$ is a diffeomorphism only on $M\setminus\Sigma$. Moreover, there exist two functions $\varphi,\psi: M\to\mathbb{R}$, smooth on $M\setminus\Sigma$, which are the Kantorovich potentials of the optimal transport problem. Since $h$ is bounded and Lipschitz, we can assume that $\varphi$, $\psi$, $\nabla\varphi$ and $\nabla\psi$ are uniformly bounded on $M$. Actually, $\varphi = \psi^c$ and the optimality of the Kantorovich potentials implies
We conclude this section by showing that the optimal transport map is stable.
Proposition 3.5 (Stability of optimal transport). Let $M$ be a compact manifold and let $c_n,c: M\times M\to[0,+\infty)$ be $L$-Lipschitz, uniformly bounded cost functions such that $c_n$ converges uniformly to $c$. Let $\mu_n,\nu_n$ be probability measures on $M$, let $\gamma_n$ be an optimal plan for $C_{c_n}(\mu_n,\nu_n)$ and let $\varphi_n,\psi_n$ be $c_n$-concave optimal Kantorovich potentials. We further assume that $\varphi_n,\psi_n$ are $L$-Lipschitz and uniformly bounded. Suppose that $\mu_n\rightharpoonup\mu$ and $\nu_n\rightharpoonup\nu$. Then, up to subsequences, $\varphi_n$ converges uniformly to $\varphi$, $\psi_n$ converges uniformly to $\psi$ and $\gamma_n\rightharpoonup\gamma$, where $\gamma,\varphi,\psi$ are respectively an optimal plan and optimal potentials for $C_c(\mu,\nu)$.
If in addition $\mu_n,\mu\ll\mathrm{vol}_g$ and $c_n,c$ satisfy (cost), we also have $\nabla\varphi_n\to\nabla\varphi$ $\mathrm{vol}_g$-a.e.
Proof. We already know that we can assume $\varphi_n,\psi_n$ to be $L$-Lipschitz and uniformly bounded.
By Ascoli-Arzelà we can thus extract subsequences that converge uniformly to $\varphi,\psi$; passing to the limit in $\varphi_n(x)+\psi_n(y)\le c_n(x,y)$ we obtain $\varphi(x)+\psi(y)\le c(x,y)$, that is, $\varphi$ and $\psi$ are admissible potentials for $c$. Thanks to the weak compactness of $\mathcal{P}(M\times M)$ we can also assume that $\gamma_n\rightharpoonup\gamma$; since $\gamma_n\in\Pi(\mu_n,\nu_n)$, it is easy to see that $\gamma\in\Pi(\mu,\nu)$. If $c_n$ satisfy (cost) then, arguing similarly to [16, Lemma 7], at every point $x$ of differentiability of $\varphi_n$ we have that $\varphi_n(x)+\psi_n(y) = c_n(x,y)$ on the support of any optimal transport plan (in particular, on the support of $\gamma_n$); moreover, $\varphi_n(x)+\psi_n(T_n(x)) = c_n(x,T_n(x))$, where
we deduce that $\gamma,\varphi,\psi$ are respectively an optimal plan and optimal potentials. Let us consider the set $A$ of full $\mathrm{vol}_g$ measure where $\varphi$ and $\varphi_n$, for every $n\in\mathbb{N}$, are differentiable. Let us fix $x\in A$ and let $\bar y$ be a limit point of the sequence $T_n(x)$; using the uniform continuity of $\varphi_n,\psi_n,c_n$ and their uniform convergence, passing to the limit in (20) we obtain $\varphi(x)+\psi(\bar y) = c(x,\bar y)$. Arguing again as in [16, Lemma 7], we get $\bar y = \exp_x(\lambda^{-1}(-\nabla\varphi(x))) = T(x)$; since this holds true for every limit point, we get $T_n(x)\to T(x)$ in $A$. Moreover, as in [16, Lemma 7], we get that $x'\mapsto d(x',\bar y)$ is differentiable at $x$ as well, and so there is a unique geodesic between $x$ and $\bar y$, which in turn implies $\nabla\varphi_n(x)\to\nabla\varphi(x)$.

Variations of the arc-length
Let $(M,g)$ be a connected, oriented, compact, smooth Riemannian manifold of dimension $n$. On account of the Hopf-Rinow theorem, $M$ is geodesically complete, i.e. for every $p\in M$ the exponential map $\exp_p: T_pM\to M$ is defined on the entire tangent space. This implies that for any two points $p$ and $q$ in $M$ there exists a length-minimizing geodesic $\gamma:[a,b]\to M$ with $\gamma(a)=p$ and $\gamma(b)=q$, and their Riemannian distance $d$ coincides with the arc-length of $\gamma$, namely
$$d(p,q)=L(\gamma):=\int_a^b\|\dot\gamma(t)\|\,dt,$$
where $\dot\gamma$ denotes the tangent vector to $\gamma$ and $\|\dot\gamma\|:=g(\dot\gamma,\dot\gamma)^{1/2}$. In what follows $\gamma$ is parametrized by arc length on $[0,\ell]$, where $\ell=L(\gamma)$. Consider a smooth orthonormal frame $\{e_i\}_{i=1,\dots,n}$ defined in an open neighborhood $U$ of $\gamma$. Each vector field $e_i$ is the infinitesimal generator of a (local) flow, which we denote by $\Phi_i$. Then, for $\varepsilon>0$ small enough, we can define $n$ smooth variations of $\gamma$ via
$$f_i:[0,\varepsilon]\times[0,\ell]\to U,\qquad(s,t)\mapsto f_i(s,t):=\Phi_i(s,\gamma(t)).$$
For every fixed $s\in[0,\varepsilon]$ we denote by $\gamma^s_i:[0,\ell]\to M$ the curve obtained as a variation of $\gamma$ whose endpoints are $f_i(s,0)$ and $f_i(s,\ell)$ and whose variational vector field is exactly $e_i|_\gamma=:\xi_i$. Denoting by $\dot\gamma^s_i$ the tangent vector to $\gamma^s_i$, we have the following.
Lemma 3.6. Let $\gamma^s_i$ be a variation of the geodesic $\gamma$ connecting $p$ to $q$, defined as above. Then the first and second variations of the arc-length, $\frac{d}{ds}\big|_{s=0}L(\gamma^s_i)$ and $\frac{d^2}{ds^2}\big|_{s=0}L(\gamma^s_i)$, are given respectively by the first and second variation formulas; in particular $\frac{d}{ds}\big|_{s=0}L(\gamma^s_i)=g(\xi_i,\dot\gamma)\big|_{t=0}^{t=\ell}$.
Proof. We parametrize $\gamma$ by arc-length and denote by $E$ the energy of $\gamma^s_i$. Since $\gamma$ is a geodesic, $\dot\gamma$ is parallel along $\gamma$, i.e. $\nabla_{\dot\gamma}\dot\gamma=0$. Using [2, Theorem 2.6.5] together with the computations in the proof of [2, Theorem 5.3.2], we obtain the result.
Remark 3.7.Let now γ ´s i be the variations of γ obtained with the variational vector fields ´ξi and consider the Taylor expansion of Lpγ p¨q i q with respect to s, namely Lpγ s i q s 2 `ops 2 q.
Using d ds ˇˇs "0 Lpγ s i q `d ds ˇˇs "0 Lpγ ´s i q " 0 and Lpγ ´s i q we can immediately conclude that Lpγ s i q `Lpγ ´s i q " 2Lpγq `d2 ds 2 ˇˇs "0 Lpγ s i qs 2 `ops 2 q .
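Moreover, if $h$ is an increasing, non-negative $C^2$ function, then
$$h(L(\gamma^s_i))=h(L(\gamma))+s\,h'(L(\gamma))\,\frac{d}{ds}\Big|_{s=0}L(\gamma^s_i)+\frac{s^2}{2}\Big[h''(L(\gamma))\Big(\frac{d}{ds}\Big|_{s=0}L(\gamma^s_i)\Big)^2+h'(L(\gamma))\,\frac{d^2}{ds^2}\Big|_{s=0}L(\gamma^s_i)\Big]+o(s^2).$$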
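A quick numerical sanity check of the two expansions in Remark 3.7, in the simplest (flat, hypothetical) setting: a segment in $\mathbb R^2$ whose endpoints are translated by $sv$ and $sw$, so that $L(s)=|(q-p)+s(w-v)|$ and its derivatives at $s=0$ are explicit. The profile $h(r)=r^2/2+r^4/4$ is an arbitrary convex increasing choice. The symmetric second differences should match $L''(0)$ and $h''(L)L'(0)^2+h'(L)L''(0)$ respectively.

```python
import math

# Hypothetical flat test case: segment from p to q, endpoints moved by s*v and s*w.
p, q = (0.0, 0.0), (2.0, 0.5)
v, w = (0.3, 1.0), (-0.2, 0.4)


def L(s):
    """Length of the segment from p + s v to q + s w."""
    return math.hypot(q[0] + s * w[0] - p[0] - s * v[0],
                      q[1] + s * w[1] - p[1] - s * v[1])


def h(r):    # convex increasing profile h(r) = r^2/2 + r^4/4
    return 0.5 * r**2 + 0.25 * r**4


def dh(r):   # h'
    return r + r**3


def d2h(r):  # h''
    return 1.0 + 3.0 * r**2


# Exact derivatives of L at s = 0, with d = q - p and u = w - v.
d = (q[0] - p[0], q[1] - p[1])
u = (w[0] - v[0], w[1] - v[1])
L0 = math.hypot(*d)
L1 = (d[0] * u[0] + d[1] * u[1]) / L0
L2 = (u[0] ** 2 + u[1] ** 2 - L1 ** 2) / L0

s = 1e-3
# L(s) + L(-s) = 2 L(0) + s^2 L''(0) + o(s^2)
lhs_len = (L(s) + L(-s) - 2 * L(0.0)) / s**2
# h(L(s)) + h(L(-s)) - 2 h(L(0)) = s^2 [h''(L) L'^2 + h'(L) L''] + o(s^2)
lhs_h = (h(L(s)) + h(L(-s)) - 2 * h(L(0.0))) / s**2
predicted_h = d2h(L0) * L1**2 + dh(L0) * L2
print(lhs_len, L2)
print(lhs_h, predicted_h)
```

Note that the first-order terms cancel in both symmetric combinations, which is exactly why only second derivatives survive in Remark 3.7.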

Simultaneous local variations on good normal neighborhoods
This section is dedicated to the construction, in Theorem 4.7, of simultaneous local variations of geodesics between a point $q\in M$ and its image $T(q)\in M$. Such variations are designed to behave as much as possible like those obtained by rigid translation in the Euclidean setting.
For the sake of clarity, we briefly describe the construction:
1. Given $p$ and $T(p)$, we consider an orthonormal frame at $p$ and the corresponding one at $T(p)$, obtained by parallel transport along the geodesic connecting $p$ and $T(p)$. Then we consider neighborhoods $U_p$ and $T(U_p)$, small enough that in each of them we have good control of the orthonormal frames ($V_i$ and $W_i$ respectively) defined via the exponential map from $p$ and $T(p)$; for the details of these properties see Definition 4.3 of $\varepsilon$-small normal neighborhood.
2. For every $q\in U_p$ we interpolate, along the geodesic from $q$ to $T(q)$, between the frames $V_i(q)$ and $W_i(T(q))$. Notice that even though $\Pi_{T(q)}V_i\neq W_i$, we still have $\Pi_{T(q)}V_i\approx W_i$ (Lemma 4.9), so the interpolating frames $X_i$ are almost parallel and almost orthonormal.
3. Given $i\in\{1,\dots,n\}$, we construct variations $\gamma^s_i$ of the geodesic $\gamma$ between $q$ and $T(q)$ simply by deforming $\gamma$ through the flow of the vector field $X_i$, as shown below.

Good normal neighborhood and ε-small normal neighborhood
Now we introduce the notion of good normal neighborhood of a point $p\in M$. To this end, let $V_p\subseteq T_pM$ be a neighborhood of $0\in T_pM$ such that the exponential map is a diffeomorphism between $V_p$ and $\exp_p(V_p)=:N_p\subseteq M$. We shall refer to $N_p$ as a normal neighborhood of $p$.
Let $N_p$ be a normal neighborhood of $p$ and denote by $\Upsilon:\mathbb R^n\to T_pM$ a linear isometry between $\mathbb R^n$ and $T_pM$. Then we can use the exponential map $\exp_p: T_pM\to M$ to define a system of coordinates by $(x^1,\dots,x^n):=(\exp_p\circ\Upsilon)^{-1}: N_p\to\mathbb R^n$. We will refer to this system of coordinates as geodesic normal coordinates. This choice of coordinates induces a local coordinate frame $\{e_j\}:=\{\partial_{x^j}\}$, and the Christoffel symbols of the Levi-Civita connection $\nabla$ are the $n^3$ functions $\Gamma^k_{ij}$ defined by $\nabla_{e_i}e_j=\sum_k\Gamma^k_{ij}e_k$. With the next proposition we recall an important property of geodesic normal coordinates; we refer to [2, Proposition 2.6.31] for more details.
Proposition 4.1.Let p P N p P M and let px 1 , . . ., x n q be geodesic normal coordinates.Denote with Γ k ij the Christoffel symbols in geodesic normal coordinates.Then we have gpe j , e i qppq " where δ is the Kronecker delta.
Corollary 4.2.Let te j u be a local frame for geodesic normal coordinates and define r e j " e j {}e j }.Then for every ε P R, there exists a δ " δpεq P R such that for any 0 ă δ ă δ the n 3 -smooth function r for every q in the geodesic ball B δ ppq Ă N p of radius δ.
Proof.By a straightforward computation, we can see that where Γ k ij are the Christoffel symbols in geodesic normal coordinates.Since for every fixed i, j, k, Γ k ij are smooth functions, then we can conclude.
Let now $T$ be a diffeomorphism between two open subsets of $M$. Then, whenever $N_p$ is a normal neighborhood of $p$, $T(N_p)$ is also a normal neighborhood of $T(p)$. Based on this observation we can finally introduce the definition of $\varepsilon$-small normal neighborhood with respect to the diffeomorphism $T$. In what follows we will use the following notation: given a map $T:M\to M$ and a vector field $X:M\to TM$, $\Pi_{T(p)}X(p)$ denotes the parallel transport of $X(p)$ along the geodesic between $p$ and $T(p)$.
Definition 4.3. Let $\Omega,\Delta$ be open subsets of $M$ and let $T\in\mathrm{Diff}(\Omega,\Delta)$ with infinitesimal generator $\Xi$, i.e. a section of $TM$ such that $T(p)=\exp_p(\Xi(p))$. Let $\varepsilon>0$ and $p\in\Omega$ be fixed. We say that $U_p\subset N_p$ is an $\varepsilon$-small normal neighborhood of $p$ with respect to $T$ if the following hold:
(i) $U_p$ is open and $U_p$, $T(U_p)$ are $\varepsilon$-separated, that is
$$d(p',q')\ge4\varepsilon\qquad\forall\,p'\in U_p,\ q'\in T(U_p);\qquad(25)$$
in particular $U_p\cap T(U_p)=\emptyset$;
(ii) $\operatorname{diam}(U_p),\operatorname{diam}(T(U_p))\le\varepsilon$;
(iii) there exists $\delta<\bar\delta/2$, where $\bar\delta=\bar\delta(\varepsilon)$ is provided by Corollary 4.2, such that the $\delta$-tubular neighborhood of $U_p$ is contained in the geodesic ball $B_{\bar\delta}(p)$ and the $\delta$-tubular neighborhood of $T(U_p)$ is contained in $B_{\bar\delta}(T(p))$;
(iv) it is possible to define two local frames for geodesic normal coordinates, $\{e_j\}$ on $U_p$ and $\{f_j\}$ on $T(U_p)$, such that the corresponding modifications $\tilde e_j=e_j/\|e_j\|$ and $\tilde f_j=f_j/\|f_j\|$ satisfy the estimates of Corollary 4.2 for every $q\in U_p$, where, with a slight abuse of notation, we use the same symbol $\tilde\Gamma^k_{ij}$ for the symbols on both $U_p$ and $T(U_p)$;
(v) $\tilde e_1(p)=(\Xi/\|\Xi\|)(p)$ and $\tilde f_i(T(p))=\Pi_{T(p)}\tilde e_i(p)$ for all $i=1,\dots,n$;
(vi) for every $q\in U_p$ we have $|(\Xi/\|\Xi\|)(q)-(\Xi/\|\Xi\|)(p)|<\varepsilon$ and $|\Pi_{T(q)}(\Xi/\|\Xi\|)(q)-\Pi_{T(p)}(\Xi/\|\Xi\|)(p)|<\varepsilon$.
Lemma 4.4. If $T\in\mathrm{Diff}(\Omega,\Delta)$ is a diffeomorphism with infinitesimal generator $\Xi$, then every non-fixed point $p$ of $T$ admits an $\varepsilon$-small normal neighborhood with respect to $T$ inside $\Omega$ for every $\varepsilon>0$ small enough. In particular, it is sufficient that $\varepsilon\le C\,d(p,T(p))$.
Proof.We aim to show that it is always possible to find a proper set of geodesic normal coordinates in U p and T pU p q so that the requests of Definition 4.3 are satisfied.Let p be a non-fixed point for T and δ be the parameter provided by Corollary 4.2, then it is always possible to find δ ă δ such that the geodesic balls B δ ppq and B δ pT ppqq are disjoint in M, and in fact ε-separated.Up to further decrease δ, conditions (i), (ii), (iii) of Definition 4.3 are clearly satisfied.Moreover, being the vector field Ξ smooth on Ω, also conditions (vi) and (vii) of Definition 4.3 must be true up to further decrease δ.Finally, observe that a rigid rotation inside R n does not affect the notion of geodesic normal coordinates.As a consequence, it is always possible to find a system of geodesic normal coordinates inside U p so that e 1 ppq is parallel to Ξppq, thus r e 1 ppq " Ξppq{}Ξppq}.In a similar way, we can find a system of geodesic normal coordinates inside T pU p q so that f i pT ppqq " Π T ppq e i ppq (notice that orthogonality is preserved by parallel transport).Thanks to the estimates of Corollary 4.2, we conclude the proof.Remark 4.5.Notice that a fixed point of T does not have a ε-small normal neighborhood.However, it is still possible to have properties (ii) to (vii) constructing a local normalized frame tr e i u around p on a normal neighborhood N p and then letting r f i " r e i : the conclusion follows taking a small enough neighborhood U p .
In what follows, whenever $U_p$ is an $\varepsilon$-small normal neighborhood of $p$ with respect to $T$, we will always denote $V_i(q)=\tilde e_i(q)$ and $W_i(T(q))=\tilde f_i(T(q))$ for all $q\in U_p$ and $i\in\{1,\dots,n\}$. We recall that $V_1(p)=(\Xi/\|\Xi\|)(p)$, while $W_i(T(p))=\Pi_{T(p)}V_i(p)$.
Lemma 4.6 (Generalized Berger's Lemma). Let $(M,g)$ be a $C^2$ compact Riemannian manifold and let $\tilde K$ be a uniform bound for its sectional curvatures. Then for any $p\in M$ and $u,v,w,z\in T_pM$ one has
$$|g(R(u,v)w,z)|\le7\tilde K\,\|u\|\,\|v\|\,\|w\|\,\|z\|;$$
by the arbitrariness of $z$ we also have $\|R(u,v)w\|\le7\tilde K\,\|u\|\,\|v\|\,\|w\|$.
This concludes the proof.
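To see where a bound of the type in Lemma 4.6 comes from, the following sketch uses a space form, where the curvature tensor is explicit: $R(u,v)w=\kappa\,(g(v,w)u-g(u,w)v)$ (one common sign convention, assumed here and not necessarily the paper's). In this model Cauchy-Schwarz already gives the constant $2\tilde K$ with $\tilde K=|\kappa|$, well within the general constant $7\tilde K$; the random sampling below is an illustration, not a proof.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def riemann_space_form(kappa, u, v, w):
    """R(u, v)w = kappa (g(v, w) u - g(u, w) v) on a space of constant curvature kappa.

    With this sign convention g(R(u, v)v, u) = kappa for orthonormal u, v.
    """
    return [kappa * (dot(v, w) * ui - dot(u, w) * vi) for ui, vi in zip(u, v)]


rng = random.Random(0)
kappa, dim = -1.5, 4
K_tilde = abs(kappa)          # uniform bound on the sectional curvatures
worst = 0.0
for _ in range(2000):
    u, v, w, z = ([rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(4))
    ratio = abs(dot(riemann_space_form(kappa, u, v, w), z)) / (
        norm(u) * norm(v) * norm(w) * norm(z))
    worst = max(worst, ratio)

e1, e2 = [1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]
sectional = dot(riemann_space_form(kappa, e1, e2, e2), e1)
print(worst, 7 * K_tilde, sectional)
```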
As a direct consequence of the generalized Berger's lemma, if $U_p$ is an $\varepsilon$-small normal neighborhood of $p$ with respect to $T$, using Corollary 4.2 we get the corresponding estimates for the frame $\{V_i\}$ inside $U_p$. Analogous estimates hold inside $T(U_p)$:
$$|g(R(W_i,W_j)W_k,W_j)|\le2\tilde K\qquad\text{and}\qquad|g(W_j,W_1-\Pi(\Xi)/\|\Xi\|)|<\varepsilon$$
for all $i,j,k\in\{1,\dots,n\}$, where the field $\Pi(\Xi)$ is, as usual, obtained by parallel transport along the geodesics between $q$ and $T(q)$ for every $q\in U_p$.

Construction of perturbed geodesics
We can now state the main result of this section.
Theorem 4.7.Let M be a compact Riemannian manifold with the Ricci curvature bounded from below by K, and let again r K be a uniform bound for the sectional curvatures.Let ε ą 0 and T P DiffpΩ, ∆q be a diffeomorphism with infinitesimal generator Ξ.For any p, consider U p its ε-small normal neighborhood with respect to T and the respective frames tV i u P ΓpF O U p q and tW i u P ΓpF O T pU p qq provided by Definition 4.3 (or Remark 4.5 in case p is a fixed point for T ).Then for every geodesic γ connecting q P U p to q :" T pqq P T pU p q (with length Lpγq " dpq, qq) there exists a family of variations tγ s i u i satisfying such that Lpγ s i q ˇˇˇă Cε, Lpγ s i q ă Cε ´KLpγq and where C is a positive constant depending only on K, r K, the dimension and diameter of M and the Lipschitz constant of Ξ on U p .
The rest of this section is devoted to the proof of Theorem 4.7. Let us observe that, given $q\in U_p$, in general the identities $V_1(q)=\dot\gamma(0)$ and $W_1(T(q))=\Pi_{T(q)}V_1(q)$ do not hold. Indeed, these identities are, a priori, only valid if $q=p$. The mismatch between $\dot\gamma(0)$ and $V_1(q)$, and between $W_1(T(q))$ and $\Pi_{T(q)}V_1(q)$, is the main difficulty in the proof of Theorem 4.7. On the other hand, whenever $q$ satisfies both
$$V_1(q)=\dot\gamma(0)\qquad\text{and}\qquad W_1(T(q))=\Pi_{T(q)}\dot\gamma(0),\qquad(28)$$
estimates (26) and (27) would be straightforward. This lucky situation is investigated in the following lemma.
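Lemma 4.8. Let $\gamma:[\alpha,\beta]\to M$ be a geodesic parametrized by arc length, with $\gamma(\alpha)=x$. Let $\{e_i(x)\}$ be an orthonormal basis of $T_xM$ such that $e_1(x)=\dot\gamma(\alpha)$, and denote the parallel transport of $e_i(x)$ along $\gamma$ by $e_i(\gamma(t))=\Pi_{\gamma(t)}e_i(x)$. Finally, consider the smooth variation of $\gamma$ given by
$$f_i:[0,\delta]\times[\alpha,\beta]\to M,\qquad(s,t)\mapsto f_i(s,t):=\exp_{\gamma(t)}(s\,e_i),$$
where we omit the dependence of $e_i$ on the point $\gamma(t)$, and set $\gamma^s_i(\cdot)=f_i(s,\cdot)$. Then $\frac{d}{ds}\big|_{s=0}L(\gamma^s_i)=0$, and in the second variation $\frac{d^2}{ds^2}\big|_{s=0}L(\gamma^s_i)$ only the curvature term survives.
Proof. For any fixed $t\in[\alpha,\beta]$, $f_i(\cdot,t)$ is a geodesic generated by the 'starting' tangent vector $e_i$. By extending $\{e_i\}$ to these geodesics $f_i$ by parallel transport, we obtain
$$\nabla_{\dot\gamma}e_i=0\qquad\text{and}\qquad\nabla_{e_i}e_i=0.$$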
By using that parallel transport preserves the scalar product, i.e.
$$g(e_i(\gamma(t)),e_j(\gamma(t)))=g(\Pi_{\gamma(t)}e_i(x),\Pi_{\gamma(t)}e_j(x))=g(e_i(x),e_j(x)),$$
and that the $\{e_i\}$ are orthonormal, by Lemma 3.6 we can conclude our proof.
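Lemma 4.8 can be tested on the unit sphere, with $\gamma$ an arc of the equator of length $\ell$, $e_1=\dot\gamma$, and $e_2$ the north direction; both are parallel along the equator. Assuming the standard sign convention in which $S^2$ has sectional curvature $+1$, the second variation of $L(\gamma^s_2)$ should reduce to the curvature term $-\ell$, while for $e_1$ the variation is a mere reparametrization, $\gamma^s_1(t)=\gamma(t+s)$, so both derivatives vanish. This sketch measures the varied curves $t\mapsto\exp_{\gamma(t)}(s\,e_i)$ by polyline length and takes central differences in $s$.

```python
import math


def exp_sphere(x, v):
    """Exponential map of the unit sphere S^2 in R^3 at x, applied to tangent v."""
    nv = math.sqrt(sum(c * c for c in v))
    if nv == 0.0:
        return list(x)
    return [math.cos(nv) * xi + math.sin(nv) * vi / nv for xi, vi in zip(x, v)]


def gamma(t):          # unit-speed geodesic along the equator
    return [math.cos(t), math.sin(t), 0.0]


def gamma_dot(t):      # e_1: tangent field, parallel along gamma
    return [-math.sin(t), math.cos(t), 0.0]


def north(t):          # e_2: unit normal field, also parallel along the equator
    return [0.0, 0.0, 1.0]


def varied_length(s, field, ell, m=4000):
    """Polyline length of t -> exp_{gamma(t)}(s * field(t)) for t in [0, ell]."""
    pts = []
    for k in range(m + 1):
        t = ell * k / m
        pts.append(exp_sphere(gamma(t), [s * c for c in field(t)]))
    return sum(math.dist(pts[k], pts[k + 1]) for k in range(m))


ell, s = 1.5, 1e-2
results = {}
for name, field in (("e1", gamma_dot), ("e2", north)):
    L_plus, L_0, L_minus = (varied_length(x, field, ell) for x in (s, 0.0, -s))
    first = (L_plus - L_minus) / (2 * s)
    second = (L_plus + L_minus - 2 * L_0) / s**2
    results[name] = (first, second)
    print(name, first, second)
```

For $e_2$ the varied curve is the latitude circle at height $\sin s$, of length $\ell\cos s$, so the second difference approaches $-\ell$.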
As already observed, in general we cannot apply Lemma 4.8 to the points of $U_p$. We now want to understand how much equality (28) is violated.
Lemma 4.9. There exists a constant $C=C(\mathrm{Lip}(\Xi|_{U_p}),\tilde K,n)$ controlling, for every $q\in U_p$ and $\bar q=T(q)\in T(U_p)$, the deviation of $W_i(\bar q)$ from $\Pi_{\bar q}V_i(q)$. In case $p$ is a fixed point of $T$ we can suppose $V_i=W_i$, and
$$\|W_i(\bar q)-\Pi_{T(q)}V_i\|\le C\varepsilon\,d(q,T(q))\qquad\forall\,i=1,\dots,n.$$
Proof.Let us denote ℓ " dpp, pq.At first let us define the geodesic σ : r0, 1s Ñ M connecting p to q; then for every s P r0, 1s we consider the geodesic γ s : r0, 1s Ñ M from σpsq to T pσpsqq.Notice that | 9 γ s | " dpσpsq, T pσpsqqq ď dpp, qq `2ε.We then want to bound Jps, tq " B s γ s ptq.Since J is a variation of geodesics, it satisfies the Jacobi equation γ s ptq, Jps, tqq 9 γ s ptq " 0.
Given that $\Xi$ is $L$-Lipschitz in $U_p$, we obtain $|\nabla_{\dot\gamma_s}J(s,0)|\le L|\partial_s\gamma_s(0)|=L\,d(p,q)$. Since $|J(s,0)|=|\partial_s\gamma_s(0)|=d(p,q)$, we finally have $|J(s,t)|\le\sqrt{f_s(t)}\le C\,d(p,q)$ for every $t,s$. Now, for every $s$, we consider the field transported along $\gamma_s$, defined as $V_i(t,s)=\Pi_{\gamma_s(t)}(V_i(0,s))$.
Notice that we want to understand how far $V_i(1,1)$ is from $W_i(\bar q)$. To this end we consider $h(s)=V_i(1,s)-W_i(\gamma(1,s))$: we want to estimate $h(1)$, and we know that $h(0)=0$ by definition of $W_i$ at $\gamma(1,0)=\bar p$. Using the properties of $W_i$ as a local frame for geodesic normal coordinates, its variation can be estimated in charts, and a similar estimate holds for $V_i$. In particular, we now only need to estimate $\partial_sV_i(1,s)=\nabla_{\partial_s\gamma}V_i(1,s)$; to do so we use the Riemann tensor, the fact that $V_i(t,s)$ is parallel along the curves $\gamma_s$ for every $s$, and then the (generalized) Berger's Lemma 4.6. At this point we can use $\|\partial_s\gamma_s\|\le C\,d(p,q)$, $\|\dot\gamma_s\|\le\ell+2\varepsilon$ and $\|V_i(t,s)\|=1$ to conclude, for some constant $C$ depending only on $U_p$. As for the fixed point case, we can suppose $V_i=W_i$ and that the whole geodesic $\gamma$ between $q$ and $T(q)$ lies inside $U_p$, which yields the corresponding estimate.
Definition 4.10 (Quasi-orthonormality). Let $V$ be an $n$-dimensional Hilbert space and $0\le\sigma<1/n$. We say that $\{X_i\}_i$ is a $\sigma$-orthonormal basis if $|\langle X_i,X_j\rangle-\delta_{i,j}|\le\sigma$ for all $i,j$.
Lemma 4.11. Let $\{X_i\}_i$ be a $\sigma$-orthonormal basis of $V$, an $n$-dimensional Hilbert space. Then, as long as $\sigma n<1/2$, there exists a dimensional constant $C_n$ such that for any linear map $A:V\to V$
$$\Big|\operatorname{tr}(A)-\sum_i\langle AX_i,X_i\rangle\Big|\le C_n\,\sigma\,\|A\|.\qquad(29)$$
Proof. First of all, the condition $\sigma<1/n$ guarantees that $\{X_i\}$ is in fact a basis, since the Gram matrix $(\langle X_i,X_j\rangle)_{i,j}$ is diagonally dominant and thus invertible. Now, consider any linear map $A:V\to V$ and let $(A_{i,j})$ be the matrix of $A$ with respect to $\{X_i\}$, i.e. $AX_i=\sum_jA_{i,j}X_j$. Notice that $|\langle AX_i,X_{i'}\rangle|=|\sum_jA_{i,j}\langle X_j,X_{i'}\rangle|\ge|A_{i,i'}|-\sigma\sum_j|A_{i,j}|$; summing this inequality over $i'$ we get
$$(1-n\sigma)\sum_j|A_{i,j}|\le\sum_{i'}|\langle AX_i,X_{i'}\rangle|.\qquad(30)$$
In the end we can estimate $\operatorname{tr}(A)=\sum_iA_{i,i}$ using estimate (30) on the remainder, and so we can conclude, for example, by choosing $C_n=4n^2$.
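The following sketch assumes that Lemma 4.11 compares $\operatorname{tr}(A)$ with $\sum_i\langle AX_i,X_i\rangle$ for a $\sigma$-orthonormal basis, with an error of order $C_n\sigma\|A\|$ (our reading, based on its proof and on the later use of (29)). It runs on a random $3\times3$ example in pure Python and uses the Frobenius norm of $A$, for which the elementary bound $|\operatorname{tr}(A)-\sum_i\langle AX_i,X_i\rangle|=|\operatorname{tr}(A(I-XX^T))|\le n\sigma\|A\|_F$ holds, comfortably within the choice $C_n=4n^2$.

```python
import random

rng = random.Random(1)
n, eps = 3, 0.01

# Columns of X are the vectors X_1..X_n: a small perturbation of the standard basis.
X = [[(1.0 if r == i else 0.0) + eps * rng.uniform(-1.0, 1.0) for i in range(n)]
     for r in range(n)]


def col(M, i):
    """i-th column of the matrix M (as a list)."""
    return [M[r][i] for r in range(len(M))]


def inner(a, b):
    return sum(x * y for x, y in zip(a, b))


def apply(M, x):
    """Matrix-vector product M x."""
    return [inner(row, x) for row in M]


gram = [[inner(col(X, i), col(X, j)) for j in range(n)] for i in range(n)]
sigma = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
            for i in range(n) for j in range(n))

A = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
trace_A = sum(A[i][i] for i in range(n))
quasi_trace = sum(inner(apply(A, col(X, i)), col(X, i)) for i in range(n))
frob_A = sum(a * a for row in A for a in row) ** 0.5
gap = abs(trace_A - quasi_trace)
print(sigma, gap, n * sigma * frob_A, 4 * n**2 * sigma * frob_A)
```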
Proposition 4.12.Let M, ε, p, T , U p , tV i u, tW i u be as Theorem 4.7.Consider q P U p , γ the geodesic connecting q to T pqq and, for simplicity, denote ℓ " Lpγq.Then it is possible to construct tX i u P F O M| γ for which the following holds: (I) X i pγp0qq " V i pqq and X i pγpℓqq " W i pT pqqq for every i P t1, . . ., nu.
(II) The variations of $\gamma$ given by $\gamma^s_i(t)=\exp_{\gamma(t)}(sX_i(\gamma(t)))$ satisfy
$$\frac{d}{ds}\Big|_{s=0}L(\gamma^s_i)=g(W_i(T(q)),\dot\gamma(\ell))-g(V_i(q),\dot\gamma(0)),$$
together with an estimate of $\frac{d^2}{ds^2}\big|_{s=0}L(\gamma^s_i)$.
(III) Moreover, $\{X_i\}$ is a $2C\varepsilon$-orthonormal basis in the sense of Definition 4.10, that is,
$$|g(X_i(\gamma(t)),X_j(\gamma(t)))-\delta_{i,j}|\le2C\varepsilon\qquad\forall\,i,j\in\{1,\dots,n\},\ t\in[0,\ell].$$
Proof. Let $\eta:[0,\ell]\to[0,1]$ be a smooth function such that $\eta(0)=0$ and $\eta(\ell)=1$; we can also assume $|\eta'(\tau)|\le2/\ell$. Now we consider $\tilde V_i$ and $\tilde W_i$, the parallel transports along $\gamma$ of $V_i(\gamma(0))$ and $W_i(\gamma(\ell))$ respectively. Thanks to Lemma 4.9 and condition (ii) of Definition 4.3, we have $\|\tilde W_i(\gamma(\tau))-\tilde V_i(\gamma(\tau))\|\le C\varepsilon$ for some $C=C(\mathrm{Lip}(\nabla\varphi|_{U_p}),\tilde K,n)$ and every $\tau\in[0,\ell]$. Then we define the vector fields $X_i$ as
$$\Gamma(F_OM|_\gamma)\ni X_i(\gamma(\tau)):=(1-\eta(\tau))\,\tilde V_i(\gamma(\tau))+\eta(\tau)\,\tilde W_i(\gamma(\tau)),\qquad\tau\in[0,\ell].$$
Plugging the resulting estimate in (23) and using (25), we obtain the estimate of the second derivative of the length $L(\gamma^s_i)$ in (II) (notice that the variation curves have been constructed in such a way that, when properly extended outside $\gamma$, we have $\nabla_{X_i}X_i=0$ on $[0,\ell]$). Next, simply using the definition of $X_i$ in (22), we obtain its first derivative. A similar calculation can be carried out in the fixed point case, using the corresponding estimate in Lemma 4.9.
In order to prove quasi-orthonormality, we first consider the case $i=j$, where we observe that $1-\varepsilon^2\le g(X_i,X_i)\le1$. For the off-diagonal terms we then differentiate $g(X_i,X_j)$ along $\gamma$, using $\|X_i\|=g(X_i,X_i)^{1/2}\le1$ and $\|\nabla_{\dot\gamma}X_i\|\le2C\varepsilon/\ell$. Since $g(X_i(\gamma(t)),X_j(\gamma(t)))=0$ for $t=0$ and $t=\ell$, we deduce that $|g(X_i,X_j)|\le2C\varepsilon$, concluding the proof.
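The interpolation in Proposition 4.12 can be mimicked in a flat model (our simplification: a single copy of $\mathbb R^3$, no curvature). Take $\tilde V_i=e_i$ and $\tilde W_i=Re_i$ for a small rotation $R$, let $\varepsilon=\max_i\|\tilde W_i-\tilde V_i\|$ and $X_i=(1-\eta)\tilde V_i+\eta\tilde W_i$. Expanding the inner products gives $\|X_i\|^2=1-\eta(1-\eta)\|\tilde W_i-\tilde V_i\|^2\in[1-\varepsilon^2/4,1]$ and $|\langle X_i,X_j\rangle|\le\varepsilon/2$ for $i\neq j$, consistent with (and sharper than) the bounds used in the proof; the script checks this on a grid of values of $\eta$.

```python
import math


def rodrigues(axis, theta):
    """Rotation matrix about the unit vector `axis` by angle theta (Rodrigues' formula)."""
    kx, ky, kz = axis
    c, s = math.cos(theta), math.sin(theta)
    K = [[0.0, -kz, ky], [kz, 0.0, -kx], [-ky, kx, 0.0]]
    return [[(c if i == j else 0.0) + s * K[i][j] + (1.0 - c) * axis[i] * axis[j]
             for j in range(3)] for i in range(3)]


def inner(a, b):
    return sum(x * y for x, y in zip(a, b))


axis = [1.0 / math.sqrt(3.0)] * 3
R = rodrigues(axis, 0.05)
V = [[1.0 if r == i else 0.0 for r in range(3)] for i in range(3)]   # V~_i = e_i
W = [[R[r][i] for r in range(3)] for i in range(3)]                  # W~_i = R e_i
eps = max(math.sqrt(sum((W[i][r] - V[i][r]) ** 2 for r in range(3)))
          for i in range(3))

worst_low, worst_high, worst_off = math.inf, -math.inf, 0.0
for k in range(101):
    eta = k / 100
    # linear interpolation between the two orthonormal frames
    X = [[(1 - eta) * V[i][r] + eta * W[i][r] for r in range(3)] for i in range(3)]
    for i in range(3):
        nii = inner(X[i], X[i])
        worst_low, worst_high = min(worst_low, nii), max(worst_high, nii)
        for j in range(i + 1, 3):
            worst_off = max(worst_off, abs(inner(X[i], X[j])))
print(eps, worst_low, worst_high, worst_off)
```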
We now have all the ingredients to prove the main Theorem of this section.
Proof of Theorem 4.7.We discuss in detail the proof of the more difficult case when p is not a fixed point for T , while we refer to Remark 4.13 for the suitable modifications of the following argument in the case of a fixed point.Let q P U p be fixed and γ be the length minimizing geodesic connecting q to T pqq inside M. We consider the geodesic parametrized by the arclength, so γ : r0, ℓs Ñ M where ℓ denotes the length of γ.In what follows we will use the same notation as in Proposition 4.12.Let then X i be the class of variation fields introduced in Proposition 4.12, where tV i u and tW i u are the usual coordinate frames associated to the ε-small normal neighborhood U p .Accordingly to Proposition 4.12, we then define the variations of γ f i : r0, δ{2s ˆr0, ℓs Ñ M, ps, tq Þ Ñ f i ps, tq :" exp γptq ´s X i pγptqq ¯γs i p¨q " f i ps, ¨q, where δ is provided by the definition of ε-small normal neighborhood and, hence, it ultimately depends on ε.Condition (26) follows immediately by Proposition 4.12 pIq.
Next, gathering together (22), the first order identity in Proposition 4.12 (II), and the fact that $\Pi_{T(q)}\dot\gamma(0)=\dot\gamma(\ell)$, $\ldots$ $\Big|\frac{d}{ds}\Big|_{s=0}L(\gamma^s_i)\Big|<C\varepsilon$ for every $i$, and $\sum_{i=2}^n\frac{d^2}{ds^2}\Big|_{s=0}L(\gamma^s_i)<(C\varepsilon-K)L(\gamma)$, where $C$ is the same constant of Theorem 4.7. Indeed, we can find an $\varepsilon$-small normal neighborhood $V_p$ of $p$ with respect to $T$ and some $U_p\subset V_p$ such that also $T(U_p)\subset V_p$. Then, by choosing $W_i=V_i$ and using that $d(q,T(q))$ is very small, estimates (26) and (27) are still valid in this setting.
We conclude this section with an immediate consequence of Theorem 4.7 and Remark 4.13.
Corollary 4.14.Let T P Diff pΩ, ∆q with infinitesimal generator Ξ and p P Ω, where Ω, ∆ are open subsets of M. For every ε ą 0 there exists an ε-small normal neighborhood U p such that the following holds.For every q P U p and s P r0, δ{2s one can define q ˘s i :" exp q p˘sV i pqqq and q ˘s i :" exp T pqq p˘sW i pT pqqqq , where W i " V i in case p were a fixed point for T .Denoting as usual γ the geodesic connecting q and q " T pqq and ℓ " Lpγq, for every s P r0, δ{2s and every i P t1 . . .nu there exist suitable variations of γ γ ˘s i : r0, ℓs Ñ M such that γ ˘s i p0q " q ˘s i and γ ˘s i pℓq " q ˘s i satisfying estimates (27).Moreover, for every non-negative increasing h P C 2 pMq one has n ÿ i"2 rhpdpq `s i , q `s i qq`hpdpq ´s i , q ´s i qq´2hpdpq, qqqs ď s 2 pCε´KLpγqq h 1 pdpq, qqq`ops 2 q`s 2 ε 2 r C (32) hpdpq `s 1 , q `s 1 qq `hpdpq ´s 1 , q ´s 1 qq ´2hpdpq, qqq ď 2s 2 Cε h 1 pdpq, qqq `ops 2 q `s2 ε where C depends only n, K, r K and r C depends on n and sup r0,diamMs h 2 .
Proof.The proof is immediate once we observe that the curve γ ˘s i connecting q ˘s i and q ˘s i is surely longer than dpq ˘s i , q ˘s i q.We recall that, by construction, γ is precisely the geodesic between q and q " T pqq.Therefore, dpq ˘s i , q ˘s i q ď Lpγ ˘s i q , and dpq, qq " Lpγq, and the conclusion follows by gathering together (34) and (24).
5 Linear and nonlinear gradient inequalities

The smooth case
As a first step, we shall prove, in the smooth setting, the validity of the 4 gradient inequality on small enough neighborhoods of an arbitrary point outside a negligible set Σ.In what follows we will use the notation introduced in Theorem 4.7.
Proposition 5.1.Assume the Setup 1.4, where additionally h and ℓ respectively satisfy (cost2reg) and (cvxreg).Let ϕ, ψ, the optimal Kantorovich potentials between µ and ν for cost c " h ˝d and T be the optimal map from µ to ν.Then there exists a vol g -negligible closed set Σ such that for every p P MzΣ there exists a constant c p ą 0 depending on the geometric setting of the problem (thus on n, µ, ν, h, ϕ, ψ and the geometry of M) such that for every 0 ă ε ă 1 one can find an ε-small normal neighborhood of p, denoted by U p , for which the following inequality holds whenever U is a geodesic open ball contained in U p .If moreover p is not a fixed point for T we also have the stronger inequality ż U µ div ∇ϕ vol g `żT pU q ν div ∇ψ vol g ď ´K ż U h 1 `dpq, T pqqq ˘dpq, T pqqqdµ `I1 pU q `εc p µpU q, where I 1 pU q :" ş U µ ¨∇V ∇ V ϕ vol g `şT pU q ν ¨∇W ∇ W ψ vol g , in which V " ∇ϕ }∇ϕ} and W " ∇ψ }∇ψ} .Moreover I 1 pU q satisfies I 1 pU q ď c p εµpU q . (37) Proof.Let us consider Σ " Σ µ Y Σ ν given by Theorem 3.4 and c " h ˝d.In particular we have vol g pΣq " 0 and T is at least a C 3 diffeomorphism in MzΣ; moreover, for the infinitesimal generator of T we have Ξ " λ ´1p´∇ϕq P C 2 pMzΣq which implies that ϕ P C 3 pMzΣq; Let p, ε be fixed and let U p be an ε-small normal neighborhood of p with respect to T .Observe, that the existence of such U p is ensured by Lemma 4.4, moreover estimates of Corollary 4.14 applies in this setting.We introduce the fields t˘V i u, t˘W i u given by Definition 4.3 and the associated flows " B s Φ ˘ps, qq " ˘Vi pΦ ˘ps, qqq Φ ȋ p0, qq " ˘Vi pqq " B s Ψ ˘ps, T pqqq " ˘Wi pΨ ˘ps, T pqqqq Ψ ȋ p0, T pqqq " ˘Wi pT pqqq for every q P U p and s P r0, δs, where δ is given by the small normal neighborhood, so it ultimately depends on p and ε.Moreover, we let Φ ȋ " Ψ ȋ whenever p is a fixed point for T .For future use, we calculate Φ ì ps, ¨q7 vol g " vol g `s pdiv V i qvol g `opsq which, thanks again to Corollary 4.2 and for all s P r0, 
δs.Analogous estimates hold also for Φ í and Ψ ȋ .Moreover, from (18)  where C, r C are geometric constants provided by Corollary 4.14.Let us underline that r C do not depend on p nor on U , while C depends on p and U through the Lipschitz constant of Ξ, which is still controlled on compact sets of MzΣ.On the other hand, using (38), the uniform bounds of ϕ, ψ and µ, ν and the relation ν " T 7 µ, for every i " 1 . . .n it is easy to get ż U ϕpΦ ì ps, qqq `ϕpΦ í ps, qqq `ψpΨ ì ps, T pqqqq `ψpΨ í ps, T pqqqq ´2`ϕ pqq `ψpT pqqq ˘dµ ě ´żU X Φ í pU q `ϕpΦ ì ps, qqq ´ϕpqq ˘`µpΦ ì ps, qqq ´µpqq ˘vol g ´żT pU qXΨ í pT pU qq `ψpΨ ì ps, qqq ´ψpqq ˘`νpΨ ì ps, qqq ´νpqq ˘vol g ´p2s 2 εn `ops 2 qq `}∇ϕ µ} L 8 vol g pU q `}∇ψ ν} L 8 vol g pT pU qq ˘`boundary terms (41) where the boundary terms correspond to ż U zΦ í pU q `ϕpΦ ì ps, qqq ´ϕpqq ˘µpqqvol g ´żΦ í pU qzU `ϕpΦ ì ps, qqq ´ϕpqq ˘µpΦ ì ps, qqqvol g `żT pU qzΨ í pT pU qq `ψpΨ ì ps, qqq ´ψpqq ˘νpqqvol g ´żΨ í pT pU qqzT pU q `ψpΨ ì ps, qqq ´ψpqq ˘νpΨ ì ps, qqqvol g .
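Observe that, $\partial U$ and $\partial T(U)$ being smooth hypersurfaces, standard geometric arguments allow us to compute $\lim_{s\to0}\frac{1}{s^2}(\text{boundary terms})$ as boundary integrals, where we denote by $n_i$, $\bar n_i$ the $i$-th components of the outer unit normals to $\partial U$ and $\partial T(U)$, respectively. Gathering together (39) and (41), dividing by $s^2$ and taking the limit as $s\to0$, we then obtain
$$\begin{aligned}&-\int_U\nabla\varphi\cdot\nabla\mu\,\mathrm{vol}_g-\int_{T(U)}\nabla\psi\cdot\nabla\nu\,\mathrm{vol}_g+\int_{\partial U}\nabla\varphi\cdot n\,\mu+\int_{\partial T(U)}\nabla\psi\cdot\bar n\,\nu\\&\quad\le-K\int_Uh'(d(q,T(q)))\,d(q,T(q))\,d\mu+\varepsilon\Big[\mathrm{vol}_g(U)\big(\varepsilon\tilde C+n^2\|\nabla\varphi\,\mu\|_{L^\infty}\big)+\mathrm{vol}_g(T(U))\,n^2\|\nabla\psi\,\nu\|_{L^\infty}\Big].\end{aligned}$$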
Finally, by applying the divergence theorem on Riemannian manifolds, we conclude (35) with a suitable constant $c_p$.
Since we chose $\varepsilon<1$, $c_p$ does not depend on the choice of $U$, and this concludes the proof.
Dividing by $s^2$ we get $I_1(U)-C\varepsilon\mu(U)\le\lim_{s\to0}I^s_1/s^2\le I_1(U)+C\varepsilon\mu(U)$. Using again (40), we obtain $I_1(U)\le C\varepsilon\mu(U)$, and using instead the upper estimate of $I^s_1$ in terms of $I_1$ we get the improved inequality (36).
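The first-order expansion of $\Phi^+_i(s,\cdot)_\#\mathrm{vol}_g$ in the proof above rests on the classical fact that the Jacobian determinant of the time-$s$ flow of a vector field $V$ equals $1+s\operatorname{div}V+O(s^2)$. Here is a minimal flat check in $\mathbb R^2$, with a hypothetical smooth field $V$, an RK4 integration of the flow and a finite-difference Jacobian; it illustrates the mechanism only, not the Riemannian estimate (38) itself.

```python
import math


def V(x, y):
    """Hypothetical smooth planar vector field, used only for this check."""
    return (math.sin(y) + x * x, x * y)


def div_V(x, y):
    # d/dx (sin y + x^2) + d/dy (x y) = 2x + x
    return 3.0 * x


def flow(x, y, s, steps=20):
    """Time-s flow of V starting at (x, y), integrated with classical RK4."""
    dt = s / steps
    for _ in range(steps):
        k1 = V(x, y)
        k2 = V(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = V(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = V(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return x, y


def jacobian_det(x, y, s, eps=1e-5):
    """det D(flow_s) at (x, y), by central finite differences."""
    fxp, fxm = flow(x + eps, y, s), flow(x - eps, y, s)
    fyp, fym = flow(x, y + eps, s), flow(x, y - eps, s)
    a = (fxp[0] - fxm[0]) / (2 * eps)
    b = (fyp[0] - fym[0]) / (2 * eps)
    c = (fxp[1] - fxm[1]) / (2 * eps)
    d = (fyp[1] - fym[1]) / (2 * eps)
    return a * d - b * c


q = (0.5, 0.3)
s = 1e-4
rate = (jacobian_det(*q, s) - 1.0) / s   # should approximate div V(q)
print(rate, div_V(*q))
```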
An immediate consequence of Proposition 5.1 is the following.
Proposition 5.2. Assume Setup 1.4, where additionally $h$ and $\ell$ satisfy (cost2reg) and (cvxreg), respectively. Let $\varphi$, $\psi$ be the optimal Kantorovich potentials between $\mu$ and $\nu$ for the cost $c=h\circ d$, and let $T$ be the optimal map from $\mu$ to $\nu$. Let $\Sigma$ be the set given by Proposition 5.1.
Proof.Given p P MzΣ and 0 ă ε ă 1 arbitrary small, thanks to Proposition 5.1, we know that there exists some ε-small normal neighborhood U p of p where (35) holds for any geodesic ball U Ă U p .Thanks to the regularity assumptions we have that div ∇ϕpqq and div ∇ψ `qȃ re continuous, respectively in U p and T pU p q, as a consequence of Theorem 3.4.In particular, dividing (35) by µpU q " νpT pU qq (by the injectivity of T ) we deduce, by the continuity of the integrands, that div ∇ϕpqq `div ∇ψ `T pqq ˘ď ´Kh 1 pdpq, T pqqqqdpq, T pqqq `cp ε.

The general case
We are now in a position to prove the main result of this paper, namely Theorem 1.1. As already anticipated, in order to get the result in full generality, dropping the smoothness of $\mu$ and $\nu$ and assumptions (cost2reg) and (cvxreg), we will perform an approximation procedure with more regular objects: we will use Proposition 5.2 for the smooth case and rely on Proposition 3.5 to conclude. We now prove the last ingredient, a general approximation result for $\lambda$-concave functions, which is needed to deal with the exceptional set $\Sigma$ appearing in Proposition 5.2.
Proposition 5.3 (Approximation of $\lambda$-concave functions). Let $\varphi:(M,g)\to\mathbb R$ be an $L$-Lipschitz $\lambda$-concave function such that $\|\varphi\|_\infty\le M$, and let $\ell$ be a convex function satisfying (cvxreg). Suppose that there is an open set $\Omega\subseteq M$ such that $\varphi\in C^2(\Omega)$. Then for every compact set $K\subseteq\Omega$ there exist $A>0$, depending only on the manifold $M$, and a sequence $\varphi_m:(M,g)\to\mathbb R$ such that
(i) $\varphi_m\in C^2(M)$; $\varphi_m$ is also $(AL+AM)$-Lipschitz and $(A\lambda+AM+AL)$-concave;
(iii) $\operatorname{div}(\ell'(\nabla\varphi_m))\le C$ distributionally in $M$, where $C$ depends only on $L$, $\lambda$, the bound $M$ on $\|\varphi\|_\infty$, the manifold $M$ and $\ell$.
Proof.To prove the existence of such sequence, it is sufficient to use a finite C 2 partition of unity tχ i u N i"1 such that |∇χ i | ď C 1 and }D 2 χ i } ď C 2 supported on some tΩ i u N i"1 , then do a compactly supported convolution in charts ϕ m,i " η i m ˚pϕχ i q and then sum up the contributions again.It is clear that ϕ m,i P C 2 pMq and so we have also ϕ m P C 2 pMq.Moreover, by the usual properties of convolutions, we have ϕ m,i Ñ ϕ i uniformly on M, ∇ϕ m,i Ñ ∇ϕ i in L p pMq, and D 2 ϕ m,i Ñ D 2 ϕ i uniformly on K X Ω i , so we deduce piiq by summing up the separate contributions.For the first point, we have ∇ϕ m " ÿ i η m ˚pχ i ∇ϕq `ηm ˚pϕ∇χ i q |∇ϕ m ppq| ď L we conclude by choosing A ě maxtN, N C 1 u.With a similar strategy, we obtain the uniform bound for the quasi-concavity using the following estimate for the distributional second derivative D 2 pϕχ i q " χ i D 2 ϕ `∇ϕ b ∇χ i `ϕD 2 χ i ď λ `LC 1 `M C 2 ; convolving and summing up the contribution we obtain the claim for the quasi-concavity by choosing A ě maxtN, N C 1 , N C 2 u In order to do the final estimate we observe that if ∇ϕ m ppq " 0 there is nothing to prove since ℓ 1 p∇ϕ m q " 0 in a neighborhood of p, while if ∇ϕ m ppq ‰ 0 we can consider X 1 " ∇ϕ m {}∇ϕ m }pqq and pX i q i"1...n an orthonormal basis on T q M complementing X 1 : we extend this basis as geodesic normal coordinates in a small neighborhood of q.
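We thus have, at $p$, $\operatorname{div}(V)=\sum_ig(\nabla_{X_i}V,X_i)=\sum_i\nabla_{X_i}g(V,X_i)$, since $\nabla_{X_i}X_i(p)=0$ in geodesic normal coordinates; in particular $\operatorname{div}(\nabla\varphi)=\sum_i\nabla_{X_i}\nabla_{X_i}\varphi$. We denote $D_{ii}\varphi=\nabla_{X_i}\nabla_{X_i}\varphi$, and the quasi-concavity of $\varphi_m$ implies $D_{ii}\varphi_m(p)\le A\lambda+AM+AL$ for every $i=1,\dots,n$. We now compute $\operatorname{div}(\ell'(\nabla\varphi_m))$ in this frame. This concludes the proof.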
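The key mechanism in the proof of Proposition 5.3 is that convolution with a nonnegative kernel of unit mass does not increase Lipschitz constants and preserves upper bounds on second derivatives. The sketch below shows this in one flat chart only (no partition of unity, a uniform grid, and the hypothetical datum $\varphi(x)=-|x|+\lambda x^2/2$): every discrete second difference of the mollified function is a convex combination of those of $\varphi$, so it stays $\le\lambda$, while the kink at $0$ is smoothed out.

```python
def mollify(values, weights):
    """Discrete convolution sum_j w_j f[k + j - r], kept only where the stencil fits."""
    r = len(weights) // 2
    return [sum(w * values[k + j - r] for j, w in enumerate(weights))
            for k in range(r, len(values) - r)]


def second_diffs(values, dx):
    """Discrete second derivatives (f[k+1] - 2 f[k] + f[k-1]) / dx^2."""
    return [(values[k + 1] - 2 * values[k] + values[k - 1]) / dx**2
            for k in range(1, len(values) - 1)]


def lipschitz(values, dx):
    """Largest discrete slope |f[k+1] - f[k]| / dx."""
    return max(abs(values[k + 1] - values[k]) / dx for k in range(len(values) - 1))


dx = 0.01
lam = 2.0
xs = [-1.0 + dx * k for k in range(201)]
phi = [-abs(x) + 0.5 * lam * x * x for x in xs]   # lam-concave, with a kink at 0

r = 10
raw = [r + 1 - abs(j - r) for j in range(2 * r + 1)]   # triangular kernel, >= 0
total = sum(raw)
weights = [w / total for w in raw]                     # unit mass

phi_m = mollify(phi, weights)
print(max(second_diffs(phi, dx)), min(second_diffs(phi, dx)))
print(max(second_diffs(phi_m, dx)), min(second_diffs(phi_m, dx)))
print(lipschitz(phi, dx), lipschitz(phi_m, dx))
```

On a manifold the same estimate is applied chart by chart, which is where the constants $C_1$, $C_2$ of the partition of unity enter.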
We have now all the ingredients to prove the main Theorem 1.1.
Letting now $D\uparrow\Omega$ we have $\varepsilon\to0$, and we get the conclusion by monotone convergence and the fact that $\mu(M\setminus\Omega)=0$. In order to prove (1) for any $\mu,\nu\in W^{1,1}(M)$, we perform another approximation argument: we consider smooth positive densities $\mu_m$, $\nu_m$ converging in $L^1$ to $\mu$, $\nu$ respectively and such that $\nabla\mu_m\to\nabla\mu$ and $\nabla\nu_m\to\nabla\nu$ in $L^1(M)$, where we can assume that the convergence is dominated. Moreover, thanks to Proposition 3.5, we also have $\nabla\varphi_m\to\nabla\varphi$ pointwise $\mathrm{vol}_g$-almost everywhere, where $\varphi_m$ are the Kantorovich potentials for $\mu_m$, $\nu_m$, and so we can pass to the limit via dominated convergence, since $\varphi_m$, $\varphi$ are also uniformly Lipschitz. In order to prove the theorem also for costs $c$ satisfying (cost), we can approximate $c$ uniformly with costs $c_m$ satisfying (cost2reg) and again conclude using Proposition 3.5. Finally, we can approximate every $\ell$ satisfying (cvx) with some $\ell_m$ satisfying (cvxreg) in such a way that $\ell'_m\uparrow\ell'$, and we conclude again by dominated convergence (notice that this approximation argument also works when $\ell'(0)\neq0$, paying attention to Remark 1.2 for the correct interpretation of (1)).
With this choice, if $\chi=\tilde\varrho-\varrho$ is the difference between two probability measures, then we have $\ldots,\nu)=\max\{\ldots\}$, which is convex. Moreover, if $\{\nu>0\}$ is a connected open set, we can choose a particular potential $\phi$, defined as
$$\phi(x)=\inf\{h(d(x,y))-\psi(y):y\in\operatorname{supp}(\nu)\},$$
where $\psi$ is the unique (up to additive constants) optimal function $\psi$ in (16) (i.e. $\phi$ is the $c$-transform of $\psi$ computed on $M\times\operatorname{supp}(\nu)$).
$T_n(x)=\exp_x(\lambda_n^{-1}(-\nabla\varphi_n(x)))$. We can now compute
$$\lim_{n\to\infty}\Big|\int_{M\times M}(\varphi+\psi-c)\,d\gamma_n\Big|=\lim_{n\to\infty}\Big|\int_{M\times M}\big(\varphi+\psi-c-(\varphi_n+\psi_n-c_n)\big)\,d\gamma_n\Big|\le\lim_{n\to\infty}\big(\|\varphi-\varphi_n\|_\infty+\|\psi-\psi_n\|_\infty+\|c-c_n\|_\infty\big)=0;$$
$\ldots T(q))-\Pi_{T(q)}V_i(q),\dot\gamma(\ell)\big)\big|\le C\varepsilon\,d(q,T(q))$, where in the end we used Lemma 4.9: we conclude that the first order inequality in (27) holds. We next notice that the lower bound on the Ricci curvature implies that, if $\{V_i\}$ is an orthonormal basis, we have $\ldots$ Notice that $\operatorname{Ric}(v,v)=-\operatorname{tr}(A)$, where $A:T_pM\to T_pM$ is the linear map defined by $AX:=R(X,v)v$; in particular, thanks to Lemma 4.6 we have $\|A\|\le7\tilde K\|v\|^2$, where $\tilde K$ is a bound on the sectional curvature, and so, using that $\{X_i\}$ is a $2C\varepsilon$-orthonormal basis, we get, by (29), $\ldots C\varepsilon$, and in particular, by the antisymmetry of the Riemann tensor and Lemma 4.6, we have $|g(R(X_1,\dot\gamma)\dot\gamma,X_1)|\le7\tilde KC^2\varepsilon^2$. So, applying now (31) to the estimates (II) of Proposition 4.12, we get $\ldots$, where the constant $C$ depends only on the Lipschitz constant of $\Xi$ in $U_p$ and $C_n$ depends only on the dimension $n$. Similarly, we get $\frac{d^2}{ds^2}\big|_{s=0}L(\gamma^s_1)\le2C\varepsilon$.
Remark 4.13. Let $\varepsilon>0$ be fixed. If $p$ is a fixed point of $T$, then we can easily obtain $\big|g(W_i\ldots$