A glimpse into the differential topology and geometry of optimal transport

This note exposes the differential topology and geometry underlying some of the basic phenomena of optimal transportation. It surveys basic questions concerning Monge maps and Kantorovich measures: existence and regularity of the former, uniqueness of the latter, and estimates for the dimension of their support, as well as the associated linear programming duality. It shows that the answers to these questions concern the differential geometry and topology of the chosen transportation cost. It also establishes new connections, some heuristic and others rigorous, based on the properties of the cross-difference of this cost and its Taylor expansion at the diagonal.


1 Introduction
What is optimal transportation? This subject, reviewed by Ambrosio and Gigli [5], McCann and Guillen [58], Rachev and Rüschendorf [69], and Villani [81] [82] among others, has become a topic of much scrutiny in recent years, driven by applications both within and outside mathematics. However, the problem has also led to the development of its own theory, in which a number of challenging questions arise and some fascinating answers have been discovered. The present manuscript is intended to reveal some of the differential topology and geometry underlying these questions, their solutions and variants, and to give some novel, simple, yet powerful heuristics for a few highlights from the literature that we survey. It attempts to frame the phenomenology of the subject without delving deeply into the many methodologies, both novel and standard, which are used to pursue it. The new heuristics are largely based on the properties of the cross-difference (4) and its Taylor expansion (6) at the diagonal.
Given Borel probability measures $\mu^\pm$ on complete separable metric spaces $M^\pm$, and a continuous bounded function $c(x, y)$ representing the cost per unit mass transported from $x \in M^+$ to $y \in M^-$, the basic question is to correlate the measures $\mu^+$ and $\mu^-$ so as to minimize the total transportation cost. In Monge's 1781 formulation [63], we seek to minimize

$$\mathrm{cost}(G) := \int_{M^+} c(x, G(x))\, d\mu^+(x) \tag{1}$$

among all Borel maps $G: M^+ \to M^-$ pushing $\mu^+$ forward to $\mu^- = G_\#\mu^+$, where the pushed-forward measure is defined by $G_\#\mu^+(Y) = \mu^+(G^{-1}(Y))$ for each Borel set $Y \subset M^-$. This question is interesting because it leads to canonical ways to reparameterize one distribution of mass with another. When the probability measures are given by densities $d\mu^\pm(x) = f^\pm(x)\,dx$ on manifolds $M^\pm$, we can expect $G$ to satisfy the Jacobian equation $\pm\det[\partial G^i/\partial x^j] = f^+(x)/f^-(G(x))$. Additional desirable properties of $G$ can sometimes be guaranteed by a suitable choice of transportation cost; for example, $G$ will be irrotational for the quadratic cost $c(x, y) = \frac12|x - y|^2$ on Euclidean space [9]. For subsequent purposes, we will often assume the cost $c(x, y)$ and manifolds $M^\pm$ to be smooth, but quite general otherwise.
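To make (1) and the pushforward concrete, here is a minimal sketch for finitely supported measures, an assumption made purely for illustration (the text allows general Borel measures). Measures are represented as Python dicts from atoms to masses; the helper names `pushforward` and `monge_cost` are hypothetical.

```python
from collections import defaultdict


def pushforward(mu, G):
    """Return G_# mu for a finitely supported measure mu = {atom: mass}."""
    nu = defaultdict(float)
    for x, mass in mu.items():
        nu[G(x)] += mass  # (G_# mu)({y}) = mu(G^{-1}({y})): atoms with the same image merge
    return dict(nu)


def monge_cost(mu, G, c):
    """Discrete analogue of (1): the sum over atoms x of c(x, G(x)) * mu({x})."""
    return sum(mass * c(x, G(x)) for x, mass in mu.items())


quadratic = lambda x, y: 0.5 * (x - y) ** 2

mu_plus = {0.0: 0.25, 1.0: 0.25, 2.0: 0.5}
shift = lambda x: x + 1.0
print(pushforward(mu_plus, shift))                  # {1.0: 0.25, 2.0: 0.25, 3.0: 0.5}
print(monge_cost(mu_plus, shift, quadratic))        # 0.5
print(pushforward(mu_plus, lambda x: min(x, 1.0)))  # {0.0: 0.25, 1.0: 0.75}
```

A map may merge atoms (as the last line shows) but can never split one, which is one reason Monge's problem can fail to have admissible maps when $\mu^+$ has atoms.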
In Kantorovich's 1942 formulation, we seek to minimize

$$\mathrm{cost}(\gamma) := \int_{M^+ \times M^-} c(x, y)\, d\gamma(x, y) \tag{2}$$

over all joint measures $\gamma \ge 0$ on $M^+ \times M^-$ having $\mu^+$ and $\mu^-$ as marginals. The form of the latter problem, namely minimizing the linear functional $\mathrm{cost}(\gamma)$ on the convex set $\Gamma(\mu^+, \mu^-) := \{\gamma \ge 0 \mid \pi^\pm_\#\gamma = \mu^\pm\}$, where $\pi^+(x, y) = x$ and $\pi^-(x, y) = y$, makes it easy to show the Kantorovich infimum is attained. A result of Pratelli [67], following Ambrosio and Gangbo, asserts that its value coincides with the Monge infimum,

$$\min_{\gamma \in \Gamma(\mu^+, \mu^-)} \mathrm{cost}(\gamma) = \inf_{G_\#\mu^+ = \mu^-} \mathrm{cost}(G), \tag{3}$$

if $c$ is continuous and $\mu^+$ is free of atoms. However, it is not straightforward to establish uniqueness of the Kantorovich minimizer, nor whether the Monge infimum is attained, and if so, whether the mapping $G$ which attains it is continuous. A sufficient condition $(A1)'_+$ for existence (and uniqueness) of optimizers $G$ (and $\gamma$) was found by Gangbo [30] and Levin [44], building on work of many authors, including Brenier, Caffarelli, Gangbo, McCann, Rachev and Rüschendorf. When $M^\pm \subset \mathbf{R}^n$ are the closures of open domains, sufficient conditions for the existence of a smooth minimizer $G: M^+ \to M^-$ were provided by Ma, Trudinger and Wang [53], building on work of Delanoë, Caffarelli, Urbas and Wang, and later refined through work of Delanoë, Figalli, Ge, Kim, Liu, Loeper, McCann, Rifford, Trudinger, Villani and Wang, among others. See Appendix A for a statement of their conditions $(A0)'$-$(A4)'$. At the same time, we introduce a new but equivalent formulation of conditions $(A0)'$-$(A4)'$ in terms of the cross-difference (4), which emphasizes their purely topological (A0)-(A2) and geometric (A3)-(A4) nature, exposing their naturality and relevance. This process of reformulation, begun with Kim in [38], is completed here, as part of a series of questions and responses.
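For uniform measures on $n$ atoms each, Kantorovich's problem (2) is a finite linear program over doubly stochastic matrices. By Birkhoff's theorem the extreme points of that polytope are permutation matrices, so its minimum coincides with the Monge minimum over bijections even though $\mu^+$ has atoms here. The sketch below checks this numerically with SciPy's `linprog` (HiGHS backend); the point sets and the helper `kantorovich_lp` are hypothetical.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def kantorovich_lp(C, a, b):
    """Minimize sum_ij C[i, j] g[i, j] over g >= 0 with row sums a and column sums b."""
    m, n = C.shape
    A_eq = np.vstack([np.kron(np.eye(m), np.ones((1, n))),   # sum_j g[i, j] = a[i]
                      np.kron(np.ones((1, m)), np.eye(n))])  # sum_i g[i, j] = b[j]
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun, res.x.reshape(m, n)


x = np.array([0.0, 1.0, 3.0, 4.5])
y = np.array([0.5, 2.0, 2.5, 5.0])
C = 0.5 * (x[:, None] - y[None, :]) ** 2   # quadratic cost c(x, y) = |x - y|^2 / 2
n = len(x)
uniform = np.full(n, 1.0 / n)

kantorovich_min, gamma = kantorovich_lp(C, uniform, uniform)
# Monge minimum: average cost over all bijections x_i -> y_p(i), each atom carrying mass 1/n.
monge_min = min(C[np.arange(n), list(p)].mean() for p in itertools.permutations(range(n)))
print(kantorovich_min, monge_min)          # equal up to solver tolerance
```

The redundant marginal constraint (total mass appears in both row and column sums) is harmless for HiGHS, which removes it in presolve.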
2 Why do Kantorovich minimizers concentrate on low-dimensional sets?
Abstractly, one expects a linear functional $\mathrm{cost}(\gamma)$ on a convex set $\Gamma(\mu^+, \mu^-)$ to attain its infimum at one of the extreme points. So it is interesting to understand the extreme points of $\Gamma(\mu^+, \mu^-)$. Such extreme points are sometimes called simplicial measures. Despite much progress, surveyed in [2], a characterization of simplicial measures in terms of their support has long remained elusive and is probably too much to hope for. Recall that a measure $\gamma \ge 0$ is simplicial if it is not the midpoint of any non-degenerate segment in $\Gamma(\pi^+_\#\gamma, \pi^-_\#\gamma)$. Ahmad, Kim and McCann [2] showed each simplicial measure $\gamma$ vanishes outside the union of a graph $\{(x, G(x)) \mid x \in M^+\}$ and an antigraph $\{(H(y), y) \mid y \in M^-\}$, generalizing a result of Hestir and Williams [35] from the special case of Lebesgue measure $\mu^\pm$ on the unit interval $M^\pm = [0, 1]$. This shows $\gamma$ concentrates on a set whose topological dimension should not exceed $\max\{n^+, n^-\}$, where $n^\pm = \dim M^\pm$. Taking $n^+ \le n^-$ without loss of generality, if the measure $\mu^-$ fills the space $M^-$, then $\gamma$ cannot concentrate on any subset of lower dimension than $n^-$, so it would seem we have identified the topological dimension of the set on which $\gamma$ concentrates to be precisely $n^- = \max\{n^+, n^-\}$. Unfortunately, this simple argument is somewhat deceptive. Although the graph and antigraph of [35] and [2] enjoy further structure, they are not generally $\gamma$-measurable; a priori it is conceivable that their closures might actually fill the product space $M^+ \times M^-$. With some assumptions on the topology of the cost function $c$ and spaces $M^\pm$, it is possible to better estimate the size of the support of the particular extreme points of interest using the more robust notion of Hausdorff dimension. The basic object of geometrical relevance will be the support $\mathrm{spt}\,\gamma$ of the Kantorovich optimizer, defined as the smallest closed subset $S \subset M^+ \times M^-$ carrying the full mass of $\gamma$.
If the Monge infimum (3) is attained by a map $G: M^+ \to M^-$ and the Kantorovich minimizer is unique, it will turn out that $\mathrm{spt}\,\gamma$ agrees ($\gamma$-a.e.) with the graph of $G$; when this map is a diffeomorphism, $\gamma$ concentrates on a subset of dimension $n^+ = \dim M^+$ in $M^+ \times M^-$. We shall show why this might be expected more generally, assuming $M^\pm$ to be (smooth) manifolds henceforth.
Setting $N := M^+ \times M^-$ and $S = \mathrm{spt}\,\gamma$, consider the cross-difference [56]

$$\delta(x, y; x_0, y_0) := c(x, y_0) + c(x_0, y) - c(x, y) - c(x_0, y_0) \tag{4}$$

defined on $N^2$. An observation, special cases of which date back to Monge, asserts $\delta(x, y; x_0, y_0) \ge 0$ on $S^2 \subset N^2$; in other words, we cannot lower the cost by exchanging partners between $(x, y)$ and $(x_0, y_0)$; for a modern proof, see Gangbo and McCann [31]. This fact is called the c-monotonicity of $S$.
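The cross-difference (4) and the c-monotonicity test it encodes are easy to evaluate on a finite set of pairs. For the quadratic cost $c(x, y) = \frac12|x - y|^2$ on the line one computes $\delta(x, y; x_0, y_0) = (x - x_0)(y - y_0)$, so c-monotonicity reduces to ordinary monotonicity. A minimal sketch, with hypothetical sample points and helper names:

```python
import itertools


def cross_difference(c, x, y, x0, y0):
    """delta(x, y; x0, y0) = c(x, y0) + c(x0, y) - c(x, y) - c(x0, y0), as in (4)."""
    return c(x, y0) + c(x0, y) - c(x, y) - c(x0, y0)


def is_c_monotone(c, S, tol=1e-12):
    """True if delta >= 0 on S x S for a finite set S of pairs (x, y)."""
    return all(cross_difference(c, x, y, x0, y0) >= -tol
               for (x, y), (x0, y0) in itertools.product(S, repeat=2))


c = lambda x, y: 0.5 * (x - y) ** 2
increasing = [(0.0, 0.5), (1.0, 2.0), (3.0, 2.5)]
crossing = [(0.0, 2.5), (1.0, 2.0), (3.0, 0.5)]
print(is_c_monotone(c, increasing), is_c_monotone(c, crossing))  # True False
```

In the `crossing` set, exchanging the partners of $(0, 2.5)$ and $(1, 2)$ lowers the total cost, which is exactly what a negative cross-difference records.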
If $c \in C^2$, then $(x_0, y_0) \in N$ is a critical point for the function $\delta_0(x, y) := \delta(x, y; x_0, y_0)$, whose Hessian

$$h := \tfrac12 \mathrm{Hess}_{(x_0, y_0)} \delta_0 \tag{5}$$

is then well-defined as a coordinate-independent bilinear form (though it need not be at points $(x, y) \ne (x_0, y_0)$ which are non-critical). Now for $(x_0, y_0) \in S$, we have $\delta_0(x, y) \ge 0$ on $S$, with equality at $(x_0, y_0)$. On the other hand, the symmetries of the cross-difference $\delta$ ensure that the Hessian $h$ contributes the only non-vanishing term in the second-order Taylor expansion of $\delta_0$; more explicitly,

$$\delta_0(x_0 + \Delta x, y_0 + \Delta y) = h((\Delta x, \Delta y), (\Delta x, \Delta y)) + o(|\Delta x|^2 + |\Delta y|^2) \quad \text{as } (\Delta x, \Delta y) \to 0. \tag{6}$$

It is then not so surprising to discover that the Hessian $h$ controls the geometry and dimension of the support of any Kantorovich optimizer $\gamma$ near $(x_0, y_0)$ in various ways, as we now make precise following Pass [66] and my joint works with Kim [38], and with Pass and Warren [60]. Let $(k^+, k^0, k^-)$ be the $(+, 0, -)$ signature of $h$, meaning $k^+, k^0, k^- \in \mathbf{N}$ count the number of positive, zero, and negative eigenvalues of $h$ in one and hence any choice of coordinates.
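The block structure of $h$ follows from a routine second-order expansion of (4) about $(x_0, y_0)$, assuming $c \in C^2$: the constant, first-order and pure second-order terms ($D^2_{xx}c$ and $D^2_{yy}c$) cancel in pairs, leaving only the mixed term.

```latex
\begin{aligned}
\delta_0(x_0+\Delta x, y_0+\Delta y)
  &= c(x_0+\Delta x, y_0) + c(x_0, y_0+\Delta y) - c(x_0+\Delta x, y_0+\Delta y) - c(x_0, y_0) \\
  &= -\sum_{i=1}^{n^+} \sum_{j=1}^{n^-} \frac{\partial^2 c}{\partial x^i \partial y^j}(x_0, y_0)\, \Delta x^i \Delta y^j
     + o(|\Delta x|^2 + |\Delta y|^2), \\
h &= \tfrac12 \mathrm{Hess}_{(x_0,y_0)} \delta_0
   = -\frac12 \begin{pmatrix} 0 & D^2_{xy} c(x_0, y_0) \\ D^2_{yx} c(x_0, y_0) & 0 \end{pmatrix}.
\end{aligned}
```

In particular $h$ vanishes on the two diagonal blocks, and $h((\Delta x, \Delta y), (\Delta x, \Delta y)) = -\Delta x \cdot D^2_{xy}c(x_0, y_0)\,\Delta y$.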
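In fact, since the matrix $h$ is symmetric, in any coordinate system we can find a basis of orthogonal eigenvectors for $h$. Since $h$ vanishes on the diagonal $n^+ \times n^+$ and $n^- \times n^-$ blocks (which is why only the mixed term survives in (6)), if $(\Delta x, \Delta y)$ is an eigenvector with eigenvalue $\lambda > 0$ then $(-\Delta x, \Delta y)$ is an eigenvector with eigenvalue $-\lambda$; thus $k^+ = k^- =: k$, and we say $c$ has rank $2k$ at $(x_0, y_0)$. In this case

$$\Delta x = -\frac{1}{2\lambda} \sum_{j=1}^{n^-} D^2_{x y^j} c(x_0, y_0)\, \Delta y^j$$

is determined by $\Delta y$ and vice versa, so at most $k \le \min\{n^+, n^-\}$ linearly independent eigenvectors can correspond to positive eigenvalues [66].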
Lower semicontinuity of $k = k(x_0, y_0)$ follows from the fact that $c \in C^2$.
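A quick numerical check of the signature claim, assuming only the block form of $h$ coming from the Taylor expansion (6): the nonzero eigenvalues of $h$ are $\pm\frac12$ times the singular values of $B = D^2_{xy}c(x_0, y_0)$, so $k^+ = k^- = \mathrm{rank}\,B$. The random matrix `B` below stands in for a hypothetical mixed Hessian of rank 2.

```python
import numpy as np


def cross_difference_hessian(B):
    """Matrix of h = (1/2) Hess delta_0 = -(1/2) [[0, B], [B^T, 0]], with B = D^2_xy c(x0, y0)."""
    n_plus, n_minus = B.shape
    return -0.5 * np.block([[np.zeros((n_plus, n_plus)), B],
                            [B.T, np.zeros((n_minus, n_minus))]])


def signature(h, tol=1e-10):
    """(k_plus, k_zero, k_minus): counts of positive, zero and negative eigenvalues of h."""
    eig = np.linalg.eigvalsh(h)
    return (int(np.sum(eig > tol)), int(np.sum(np.abs(eig) <= tol)), int(np.sum(eig < -tol)))


rng = np.random.default_rng(0)
n_plus, n_minus, k = 3, 4, 2
B = rng.standard_normal((n_plus, k)) @ rng.standard_normal((k, n_minus))  # generic rank-k B
h = cross_difference_hessian(B)
eig = np.sort(np.linalg.eigvalsh(h))
print(signature(h))                  # (k, n_plus + n_minus - 2k, k) = (2, 3, 2)
print(np.allclose(eig, -eig[::-1]))  # eigenvalues come in pairs +lambda, -lambda
```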
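The Hessian $h$ of the cross-difference also determines the spacelike, timelike, and lightlike cones $\Sigma^+$, $\Sigma^-$ and $\Sigma^0 \subset T_{(x_0, y_0)}N$ according to the definitions $\Sigma^\pm := \{P \in T_{(x_0, y_0)}N \mid \pm h(P, P) > 0\}$ and $\Sigma^0 := \{P \in T_{(x_0, y_0)}N \mid h(P, P) = 0\}$. A subset $S \subset N$ is called spacelike at $(x_0, y_0) \in S$ if $h(z'(0), z'(0)) \ge 0$ for each $C^1$ curve $z: (-\epsilon, \epsilon) \to S$ with $z(0) = (x_0, y_0)$, where $z'(0)$ is the tangent vector and $h$ denotes the Hessian (5) at $(x_0, y_0) = z(0)$. Similarly, $S$ is timelike (or lightlike) if the inequality is reversed (or if both inequalities hold). By the Taylor expansion (6), $\delta_0(z(t)) = t^2 h(z'(0), z'(0)) + o(t^2)$, so c-monotonicity ($\delta_0 \ge 0$ on $S$) forces $S$ to be spacelike at each of its points.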
Corollary 2.4 (Dimensional bounds) If $c$ is $C^2$ and has rank $2k$ at a point $(x_0, y_0)$ where $S$ has a well-defined tangent space $T$, then c-monotonicity of $S \subset N$ implies the dimension of this tangent space satisfies $\dim T \le n^+ + n^- - k$.
Proof. Fix coordinates on $N$. As a consequence of the (Courant-Fischer) min-max formula for eigenvalues of $h$ at $(x_0, y_0)$, the signature $(k^+, k^0, k^-) = (k, n^+ + n^- - 2k, k)$ of $h$ limits the dimension of any subspace of $T_{(x_0, y_0)}N$ consisting of vectors which are not timelike to $k^+ + k^0 = n^+ + n^- - k$. Since the preceding lemma shows the tangent space $T$ of $S$ to be such a subspace, its dimension satisfies the asserted bound.
The following much stronger result of Pass [66] asserts $S$ is contained in a spacelike Lipschitz submanifold of the prescribed dimension, and hence yields differentiability almost everywhere as a consequence rather than a hypothesis. The case $k = n^+ = n^-$ was worked out earlier by McCann, Pass and Warren [60], by adapting an idea of Minty [61] [3] from the special case $c(x, y) = -x \cdot y$.
Theorem 2.5 (Rectifiability [66]) If $c$ has rank $2k$ at $(x_0, y_0)$ and is $C^2$ nearby, then on a (possibly smaller) neighbourhood $N_0$ of $(x_0, y_0)$, each c-monotone set $S \cap N_0$ is contained in a spacelike Lipschitz submanifold $L \subset N_0$ of dimension $n^+ + n^- - k$.

Idea of proof. A kernel of the proof can be apprehended already in the one-dimensional case $n^\pm = 1$. When $c$ has rank zero, taking $L = N_0$ implies the result, so assume $c$ has full rank ($2k = 2$), meaning either $\partial^2 c/\partial x \partial y < 0$ or $\partial^2 c/\partial x \partial y > 0$ near $(x_0, y_0)$. In the first case, c-monotonicity of $S$ implies $S \cap R$ is contained in a non-decreasing subset of any sufficiently small two-dimensional rectangle $R = B_\epsilon(x_0) \times B_\epsilon(y_0)$. This monotonicity is well known in both mathematical [51] and economic contexts [76] [62]. Rotating coordinates by setting $u = (x + y)/\sqrt{2}$ and $v = (y - x)/\sqrt{2}$, the monotonicity is equivalent to asserting that $S \cap R$ is contained in the graph $\{(u, V(u))\}$ of a function $v = V(u)$ with Lipschitz constant one. In the second case, c-monotonicity would imply $S \cap R$ is non-increasing, hence contained in a 1-Lipschitz graph of $u$ over $v$.
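The rotation step in the idea of proof can be checked directly: for two points with $\Delta x \, \Delta y \ge 0$ one has $|\Delta y - \Delta x| \le |\Delta x + \Delta y|$, i.e. $|\Delta v| \le |\Delta u|$. The sketch below tests the 1-Lipschitz property pairwise on random, hypothetical points; the helper names are illustrative.

```python
import math
import random


def rotate(x, y):
    """u = (x + y)/sqrt(2), v = (y - x)/sqrt(2), as in the idea of proof."""
    return (x + y) / math.sqrt(2), (y - x) / math.sqrt(2)


def is_one_lipschitz_graph(points, tol=1e-12):
    """True if |v_a - v_b| <= |u_a - u_b| for all pairs, i.e. v is a 1-Lipschitz function of u."""
    uv = [rotate(x, y) for x, y in points]
    return all(abs(va - vb) <= abs(ua - ub) + tol
               for i, (ua, va) in enumerate(uv) for (ub, vb) in uv[i + 1:])


random.seed(1)
xs = sorted(random.uniform(0, 1) for _ in range(50))
ys = sorted(random.uniform(0, 1) for _ in range(50))
nondecreasing = list(zip(xs, ys))               # a non-decreasing subset of the square
print(is_one_lipschitz_graph(nondecreasing))              # True
print(is_one_lipschitz_graph([(0.0, 1.0), (1.0, 0.0)]))   # False: a decreasing pair
```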
When the rank of $c$ is maximal (i.e. $k = \min\{n^+, n^-\}$), the dimensional bound becomes $\dim L \le \max\{n^+, n^-\}$. Taking $n^+ \le n^-$ without loss of generality, if the measure $\mu^-$ fills $M^-$ (say, by being mutually absolutely continuous with respect to Lebesgue measure in any coordinate patch), the dimension of the Lipschitz submanifold $L$ on which $\gamma$ concentrates cannot be less than $n^-$, in which case we see the bound given by the theorem is sharp.

Example 2.6 (Submodular costs on the line) If $M^\pm = \mathbf{R}$, there is a unique measure in $\Gamma(\mu^+, \mu^-)$ whose support $S = \mathrm{spt}\,\gamma$ forms a non-decreasing subset of the plane. This measure is the unique minimizer of Kantorovich's problem (3) for each cost $c \in C^2(\mathbf{R}^2)$ satisfying $\partial^2 c/\partial x \partial y < 0$; see e.g. [56]. Apart from at most countably many vertical segments, the set $S$ is contained in the graph of some non-decreasing $G: \mathbf{R} \to \mathbf{R} \cup \{\pm\infty\}$. Unless $\mu^+$ has atoms, the vertical segments in $S$ are $\gamma$-negligible, in which case $\gamma = (\mathrm{id} \times G)_\#\mu^+$ and Monge's infimum is attained uniquely by $G$.
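Example 2.6 can be checked by brute force in a discrete setting: put equal masses on five hypothetical atoms on each side. Since these measures have atoms, only the optimality of the non-decreasing coupling is being illustrated here, not the Monge conclusion. For each of the submodular costs below, pairing sorted $x$'s with sorted $y$'s attains the minimum over all bijections.

```python
import itertools
import math


def monotone_cost(xs, ys, c):
    """Cost of pairing the k-th smallest x with the k-th smallest y (a non-decreasing coupling)."""
    return sum(c(x, y) for x, y in zip(sorted(xs), sorted(ys)))


def min_cost_over_bijections(xs, ys, c):
    """Brute-force Monge minimum over all bijections xs -> ys (equal masses on all atoms)."""
    return min(sum(c(x, y) for x, y in zip(xs, perm)) for perm in itertools.permutations(ys))


xs = [0.3, -1.2, 2.0, 0.9, -0.1]
ys = [1.5, -0.4, 0.2, 3.1, -2.0]
submodular_costs = {
    "quadratic": lambda x, y: (x - y) ** 2,     # d^2 c / dx dy = -2
    "minus product": lambda x, y: -x * y,       # d^2 c / dx dy = -1
    "cosh": lambda x, y: math.cosh(x - y),      # d^2 c / dx dy = -cosh(x - y)
}
for name, c in submodular_costs.items():
    print(name, monotone_cost(xs, ys, c), min_cost_over_bijections(xs, ys, c))
```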
Example 2.7 (Transporting mass between spheres) Transporting mass on the surface of the earth has led to consideration of the cost function $c(x, y) = \frac12|x - y|^2$ restricted to the unit sphere $x, y \in \partial B^{n+1}$ [26] [59], a problem considered earlier in the context of shape recognition [32] [1]. The restricted cost has rank $2n$ except on the degenerate set $c = 1$, where it has rank $2n - 2$. Thus any c-cyclically monotone subset $S$ of the $2n$-dimensional product space $\partial B^{n+1}(0) \times \partial B^{n+1}(0)$ has dimension at most $n$ except along the degenerate set, where it has dimension at most $n + 1$ (in spite of the fact that the degenerate set is $(2n - 1)$-dimensional). Since the degenerate set separates the orientation preserving and orientation reversing parts $S^+$ and $S^-$ of $S$, this means that $S^+$ cannot intersect $S^-$ transversally (except in dimension $n = 1$); instead, if $S^+$ meets $S^-$ at a point where both have $n$-dimensional tangent spaces, these spaces must have $n - 1$ directions in common. For example, if $n = 3$, both $S^+$ and $S^-$ are generically 3-dimensional, but near the degenerate set their union is contained in a 4-dimensional Lipschitz submanifold, whereas the cost degenerates on a smooth 5-dimensional hypersurface.
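The rank drop in Example 2.7 can be observed numerically. On the sphere, $c(x, y) = \frac12|x - y|^2 = 1 - x \cdot y$, and in orthonormal tangent frames the mixed Hessian becomes $B_{ij} = -e_i(x) \cdot f_j(y)$; since $h$ has rank $2\,\mathrm{rank}\,B$, the sketch below (random points, hypothetical helper names) should report $2n$ for a generic pair and $2n - 2$ on the degenerate set $x \cdot y = 0$, i.e. $c = 1$.

```python
import numpy as np


def tangent_basis(p):
    """Orthonormal basis (as columns) of the tangent space p^perp to the unit sphere at p."""
    # The last n right-singular vectors of the 1 x (n+1) matrix p^T span its null space p^perp.
    _, _, vt = np.linalg.svd(p.reshape(1, -1))
    return vt[1:].T


def mixed_hessian_rank(x, y, tol=1e-10):
    """Rank of h at (x, y) for c(x, y) = |x - y|^2 / 2 = 1 - x . y restricted to the sphere."""
    B = -tangent_basis(x).T @ tangent_basis(y)   # B_ij = -e_i(x) . f_j(y)
    return 2 * np.linalg.matrix_rank(B, tol=tol)


n = 3                                            # spheres of dimension n in R^(n+1)
rng = np.random.default_rng(0)
x = rng.standard_normal(n + 1); x /= np.linalg.norm(x)
y = rng.standard_normal(n + 1); y /= np.linalg.norm(y)
y_perp = y - (y @ x) * x; y_perp /= np.linalg.norm(y_perp)   # x . y_perp = 0, i.e. c = 1
print(mixed_hessian_rank(x, y), mixed_hessian_rank(x, y_perp))  # 2n = 6 and 2n - 2 = 4
```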
In summary, c-monotonicity implies rectifiability of $S = \mathrm{spt}\,\gamma \subset N = M^+ \times M^-$ in a dimension determined locally by the rank of the Hessian $h$ of the cross-difference $\delta_0 \in C^2$; moreover, $S$ must be spacelike with respect to this Hessian (5). If $h$ is non-degenerate, we will eventually see that $h$ can be viewed as a pseudo-metric on $N$ whose pseudo-Riemannian sectional curvatures combine with $\mu^\pm$ to determine smoothness of $S$.

3 When do optimal maps exist?
We now turn to the more classical question of attainment of the infimum (3). To expect existence of Monge maps, we generally need $\mu^+$ to be more than atom-free: we need $\mu^+$ not to concentrate positive mass on any lower-dimensional submanifold of $M^+$, or more precisely on any hypersurface parameterized locally in coordinates as the graph of a difference of convex functions. This condition, proposed by Gangbo and McCann [31], is sharp in a sense made precise by Gigli [33]; the hypersurfaces in question are Lipschitz continuous and $C^2$-rectifiable. Absolute continuity of $\mu^+$ in coordinates, i.e. the existence of a density $f^+$ such that $d\mu^+(x) = f^+(x)\,dx$, is more than enough to guarantee this. However, we also require further structure of the transportation cost.
For $c \in C^1(N)$, the Gangbo [30] and Levin [44] criterion for existence of Monge solutions $G: M^+ \to M^-$ given in Appendix A is equivalent to:

$(A1)_+$ For each $x_0 \in M^+$ and $y_0 \ne y_1 \in M^-$, the function $x \in M^+ \mapsto \delta_0(x, y_1)$ has no critical points, where $\delta_0(x, y_1) = \delta(x, y_1; x_0, y_0)$ is from (4).

Naturally, this implies $n^+ \ge n^-$, since we cannot generally hope to use a (rectifiable) map $G$ on a low-dimensional space to spread a measure over a higher-dimensional space. In fact, $(A1)_+$ implies something stronger: namely that every solution of the Kantorovich problem is a Monge solution. This in turn implies uniqueness of the Kantorovich (and hence Monge) solution, for the following reason. Suppose two Kantorovich solutions exist, and both correspond to Monge solutions: $\gamma_0 = (\mathrm{id} \times G_0)_\#\mu^+$ and $\gamma_1 = (\mathrm{id} \times G_1)_\#\mu^+$. Convexity of $\Gamma(\mu^+, \mu^-)$ and linearity of the Kantorovich cost imply $\gamma_2 := (\gamma_0 + \gamma_1)/2$ is again a solution, hence by $(A1)_+$ it must concentrate on the graph of a map $G: M^+ \to M^-$. It is then easy to argue $\gamma_i = (\mathrm{id} \times G)_\#\mu^+$ for $i = 0, 1, 2$ as in e.g. [2]. This implies $\gamma_0 = \gamma_1$; moreover $G_0 = G = G_1$ $\mu^+$-a.e. Thus we arrive at the following theorem [30] [44] [2] [33]:

Theorem 3.1 (Existence and uniqueness of optimal maps) Let $\mu^\pm$ be probability measures on manifolds $M^\pm$, with a cost $c \in C^1(M^+ \times M^-)$ which is bounded and satisfies $(A1)_+$. If $\mu^+$ assigns zero mass to each Lipschitz hypersurface in $M^+$, then Kantorovich's minimum is uniquely attained, and the minimizer $\gamma = (\mathrm{id} \times G)_\#\mu^+$ vanishes outside the graph of a map $G$ solving Monge's problem. In fact, not all Lipschitz hypersurfaces are required: it is enough that $\mu^+$ vanish on each hypersurface locally parameterizable in coordinates as the graph of a difference of two convex functions.
Notice $(A1)_+$ asserts the restriction of $\delta_0$ to each horizontal fibre $M^+ \times \{y_1\}$ has no critical points, except on the fibre $y_1 = y_0$ where $\delta_0$ vanishes identically. To guarantee invertibility of the map $G$, we need the same condition to hold for the reflected cost $c^*(y, x) := c(x, y)$, meaning the roles of $M^+$ and $M^-$ are interchanged. If both $c$ and $c^*$ satisfy $(A1)_+$, we say (A1) holds. Thus (A1) is equivalent to asserting that, for each $(x_0, y_0) \in N$, the point $(x_0, y_0)$ is the only critical point of $\delta_0(x, y)$.
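Differentiating (4) in $x$ makes $(A1)_+$ concrete: $x_0$ drops out of the $x$-derivative, so the condition says that $y \mapsto D_x c(x, y)$ is injective for each $x$, which is often called the twist condition. A short derivation, with two standard Euclidean examples:

```latex
\begin{aligned}
D_x \delta_0(x, y_1) &= D_x c(x, y_0) - D_x c(x, y_1), \\
(A1)_+ \iff &\ \text{for every } x \in M^+, \ y \in M^- \mapsto D_x c(x, y) \in T^*_x M^+ \text{ is injective}, \\
c(x, y) = \tfrac12 |x - y|^2 \text{ on } \mathbf{R}^n: &\quad D_x c(x, y) = x - y \ \text{(injective in } y), \\
c(x, y) = -x \cdot y \text{ on } \mathbf{R}^n: &\quad D_x c(x, y) = -y \ \text{(injective in } y).
\end{aligned}
```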
On the other hand, $(A1)_+$ also fails for many interesting geometries. We mention two such examples. In the first, the cost function of interest to Monge [63], optimal maps turn out to exist but are not unique; their non-uniqueness was quantified with Feldman [22]. In the second, Monge's infimum turns out not to be attained, despite the fact that the Kantorovich minimizer is unique.

Example 3.3 Let $c(x, y) = |x - y|$ on $M^\pm \subset \mathbf{R}^n$, the cost originally considered by Monge [63]. Taking $M^+$ disjoint from $M^-$ ensures smoothness of $c$. Notice that when $n = 1$ and $M^+$ and $M^-$ are disjoint intervals, every $\gamma \in \Gamma(\mu^+, \mu^-)$ has the same total cost $\mathrm{cost}(\gamma)$. In this case the solution to Kantorovich's problem is badly non-unique. Clearly $(A1)_+$ also fails in this case. In higher dimensions, the situation is slightly less degenerate since the cost takes a range of values on $\Gamma(\mu^+, \mu^-)$, but it remains true that its extrema are not uniquely attained. In this setting, it can be a difficult problem to show that Monge's infimum is attained. This problem was first solved by Sudakov in the plane $n = 2$; he asserted a result in all dimensions, but it was later discovered that one of his claims sometimes fails if $n > 2$. This existence result was extended to higher dimensions by Evans and Gangbo, assuming $\mu^\pm$ to be given by Lipschitz continuous densities on $\mathbf{R}^n$ [21], and for general absolutely continuous densities $\mu^\pm$ by Ambrosio [4], Trudinger-Wang [78] and Caffarelli-Feldman-McCann [12] simultaneously and independently. The last group also considered costs given by non-Euclidean norms, but with smooth and strongly convex unit balls, restrictions removed in a sequence of papers by different teams of authors including Ambrosio, Bernard, Buffoni, Bianchini, Caravenna, Kirchheim, and Pratelli, and culminating in work of Champion and De Pascale [13].
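The one-dimensional degeneracy is a one-line computation: when $M^+$ lies to the left of $M^-$, $|x - y| = y - x$ on $M^+ \times M^-$, so $\mathrm{cost}(\gamma) = \int y\, d\mu^-(y) - \int x\, d\mu^+(x)$ for every $\gamma \in \Gamma(\mu^+, \mu^-)$. The sketch confirms this for equal masses on a few hypothetical atoms by comparing all bijections.

```python
import itertools

# Hypothetical atoms with equal masses: M+ = [0, 1] lies to the left of M- = [2, 3].
xs = [0.1, 0.4, 0.7, 0.95]
ys = [2.0, 2.3, 2.8, 3.0]
distance = lambda x, y: abs(x - y)

# Since y > x for every pair, |x - y| = y - x and every bijection costs sum(ys) - sum(xs).
costs = [sum(distance(x, y) for x, y in zip(xs, perm)) for perm in itertools.permutations(ys)]
print(min(costs), max(costs), sum(ys) - sum(xs))   # all three agree up to rounding
```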
On the other hand, if $M^+$ is a compact manifold without boundary, then $x \in M^+ \mapsto \delta_0(x, y_1)$ must attain at least one maximum and one minimum, so that, as long as the cost is assumed differentiable, $(A1)_+$ cannot be satisfied. In this case, it will not always be true that Monge's infimum (3) is attained, as my examples with Gangbo [32] show:

Example 3.4 (Transporting mass between spheres, revisited) Restrict $c(x, y) = \frac12|x - y|^2$ to $M^\pm = \partial B_1(0) \subset \mathbf{R}^{n+1}$, so that $0 \le c \le 2$, as in Example 2.7. Take $\mu^\pm$ to be mutually absolutely continuous with respect to surface area $\mathcal{H}^n$ on their respective spheres, but take most of the mass of $\mu^+$ to be concentrated near the north pole and most of the mass of $\mu^-$ to be concentrated near the south pole. Then Monge's infimum (3) will not be attained, despite the fact that the Kantorovich minimizer $\gamma$ is unique. The intersection of $S = \mathrm{spt}\,\gamma$ with the set $\{c \le 1\}$ is contained in the graph of a map $G: M^+ \to M^-$, while the intersection $S \cap \{c \ge 1\}$ is contained in the graph of a map $H: M^- \to M^+$, sometimes called an antigraph. If the densities $f^\pm = d\mu^\pm/d\mathcal{H}^n$ are bounded above and below, so that $\log f^\pm \in L^\infty$, then $G$ is a homeomorphism of $\partial B_1$ and $H$ may be taken to be continuous [32]; both maps enjoy a local Hölder exponent of continuity $\alpha = 1/(4n - 1)$ except possibly where their graphs touch the set $\{c = 1\}$ where the rank of $c$ drops from $2n$ to $2n - 2$ [59]. It may be possible to improve this Hölder exponent to $\alpha = 1/(2n - 1)$ using techniques of Liu [46], but even when $f^\pm$ are smooth we have no idea how to prove $G$ will be smoother, nor how to extend Hölder continuity of $G$ up to the degenerate set $\{c = 1\}$.
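To see concretely why compactness defeats $(A1)_+$, take $n = 1$ in Example 3.4: parametrizing the unit circle by angle, $c = 1 - \cos(\theta - \phi)$, and the $\theta$-derivative of $\delta_0(\cdot, y_1)$ is $\sin(\theta - \phi_0) - \sin(\theta - \phi_1)$, which must change sign. A crude grid scan with hypothetical angles locates the resulting critical points.

```python
import math

# Points on the unit circle are parametrized by angle, so that
# c(theta, phi) = |x - y|^2 / 2 = 1 - cos(theta - phi).


def d_delta0(theta, phi0, phi1):
    """Derivative in theta of delta(theta, phi1; theta0, phi0); the theta0 terms are constant."""
    return math.sin(theta - phi0) - math.sin(theta - phi1)


phi0, phi1 = 1.1, 2.5                        # hypothetical y0 != y1 on the circle
grid = [2 * math.pi * i / 1000 for i in range(1001)]
# Each sign change of the derivative brackets a critical point of theta -> delta0(theta, phi1).
critical_brackets = [(a, b) for a, b in zip(grid, grid[1:])
                     if d_delta0(a, phi0, phi1) * d_delta0(b, phi0, phi1) < 0]
print(len(critical_brackets))                # 2: a maximum and a minimum, so (A1)_+ fails
```

Analytically, the derivative equals $2\cos(\theta - \frac{\phi_0 + \phi_1}{2})\sin\frac{\phi_1 - \phi_0}{2}$, which vanishes at $\theta = \frac{\phi_0 + \phi_1}{2} \pm \frac{\pi}{2}$ for every choice of $\phi_0 \neq \phi_1$.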
Notice that global differentiability of the cost is crucial to this discussion. For costs whose differentiability fails, even on a small set such as the Riemannian cut locus, the theorem which follows gives many natural examples where existence and uniqueness both hold.

Theorem 3.5 (Minimizing Riemannian distance squared) Let $c(x, y) = d^2(x, y)/2$ be half the squared distance induced by some Riemannian metric on a compact manifold $M^+ = M^-$. If $\mu^+$ is absolutely continuous (with respect to Riemannian volume), then the Kantorovich minimizer is unique in (3), and takes the form $\gamma = (\mathrm{id} \times G)_\#\mu^+$ for a map $G$ solving Monge's problem [57].
In case $M^\pm$ are round spheres [49] (or quotients [18], submersions [37] or products thereof [25]), and both $\mu^\pm$ are given by smooth positive densities with respect to surface area, the map $G$ will be a smooth diffeomorphism.
Notice that the existence and uniqueness asserted in Theorem 3.5 are not quite a corollary of Theorem 3.1, since compactness of the manifold $M^\pm$ forces the cut locus to be non-empty. Here the cut locus is defined as (the closure of) the set of points where differentiability of the cost $c = d^2/2$ fails.

4 When are optimal measures unique?
The preceding section shows that if the cross-difference $\delta_0(x, y) = \delta(x, y; x_0, y_0)$ has no critical points unless $x = x_0$ or $y = y_0$, then Monge's problem is soluble and the Kantorovich problem admits a unique solution. Although very useful when it applies, this criterion is not satisfied in all cases of interest, for example when trying to minimize the restriction of the quadratic cost $c(x, y) = |x - y|^2/2$ to the Euclidean unit sphere $M^\pm = \partial B_1(0) \subset \mathbf{R}^{n+1}$. In such situations, my results with Chiappori and Nesheim [14], and with Ahmad and Kim [2], may be useful.

Notice that if the manifold $M^+$ is compact, hypothesis (7) restricts its Morse structure to be that of the sphere, so the theorem generalizes Example 3.4. However, apart from the continuity results of [32] [59] and [?], it is not known when $G$ and $H$ can be expected to be smooth. It is even more shocking that no criterion analogous to Theorem 4.1 is known which guarantees uniqueness of the Kantorovich minimizer on the torus, or indeed on any other compact manifold $M^\pm$ apart from the sphere.

5 When are optimal maps continuous? Smooth?
Examples 3.2, 3.4 and Theorem 3.5 complement Theorems 2.5 and 3.1 by providing a variety of settings where the optimal map $G$ is continuous and/or the support of the optimal measure can actually be shown to be smooth. In each case, we need the cost to be suitable, the domain geometry to be favorable, and the measures to be positive, bounded and possibly smooth.
Following the analysis of a number of such examples, including the restriction of $c(x, y) = -\log|x - y|$ to the unit sphere [83] [84], a general theory for addressing such questions has begun to be developed, starting from the pioneering work of Ma, Trudinger and Wang [53], who identified conditions on the transportation cost $c$ which are close to being necessary and sufficient for smoothness of $G$. Their work is set on bounded domains $M^\pm \subset \mathbf{R}^n$, and as we now explain, each of their conditions can be reformulated in terms of the topology and geometry of the cross-difference $\delta_0(x, y) = \delta(x, y; x_0, y_0)$ from (4) and its Hessian $h = \frac12 \mathrm{Hess}_{(x_0, y_0)} \delta_0$. Where $c$ has full rank $2n$, the Hessian $h$ is non-degenerate and can be understood as a pseudo-Riemannian metric tensor on the product space. According to Claim 2.1, this pseudo-metric tensor is not positive definite, but instead has the same number of spacelike and timelike dimensions. At each point $(x_0, y_0) \in N$, the light cone separating these spacelike from timelike directions contains the tangent spaces to $\{x_0\} \times M^-$ and $M^+ \times \{y_0\}$. Just as in Riemannian (and Lorentzian) geometry, the pseudo-metric tensor $h$ induces a geometry on the product space $N = M^+ \times M^-$, including geodesics and a pseudo-Riemannian curvature tensor $R_{i'j'k'l'}$, which assigns a sectional curvature to each pair of vectors $P, Q \in T_{(x_0, y_0)}N$. The explicit formulae expressing geodesics and the curvature tensor (12) in terms of $h$ can be found in [38] or deduced from Appendix A; they are precisely analogous to the Riemannian case.
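In terms of these notions, we may now state conditions equivalent to those of Ma, Trudinger and Wang, $(A0)'$-$(A4)'$, found in Appendix A below:

(A0) $c \in C^4(N)$, and for each $(x_0, y_0) \in N = M^+ \times M^- \subset \mathbf{R}^n \times \mathbf{R}^n$:
(A1) $(x, y) \in N \mapsto \delta_0(x, y)$ from (4) has no critical points save $(x_0, y_0)$;
(A2) $c$ has rank $2n$, so $h = \frac12 \mathrm{Hess}_{(x_0, y_0)} \delta_0$ defines a pseudo-metric tensor;
(A3) the sectional curvature assigned by $h$ to the pair $p \oplus 0$ and $0 \oplus q$ is non-negative whenever $p \in T_{x_0}M^+$ and $q \in T_{y_0}M^-$ make $p \oplus 0$ and $0 \oplus q$ $h$-orthogonal;
(A4) the slices $\{x_0\} \times M^-$ and $M^+ \times \{y_0\}$ are $h$-geodesically convex.

Here a subset $Z \subset N$ is said to be $h$-geodesically convex if each pair of points $(x_0, y_0)$ and $(x_1, y_1) \in Z$ can be joined by a geodesic in $N$ lying entirely within $Z$, geodesics being defined relative to the pseudo-metric $h$.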
The most intriguing of these conditions is the curvature condition (A3). A large body of example costs which satisfy [53] or violate [48] this condition have now been established. Among the former we may mention the restriction of the Euclidean distance squared to the graphs $M^\pm \subset \mathbf{R}^{n+1}$ of any pair of 1-Lipschitz convex functions [53], as well as the Riemannian distance squared on the round sphere [49], and any products [38], submersions [38] or perturbations [18] [27] [28] thereof. Among the latter we may mention the Riemannian distance squared on any manifold $(M, g_{ij})$ with negative sectional curvature somewhere [48], and the restriction of the Euclidean distance squared to the graphs of two functions in $\mathbf{R}^{n+1}$, one of which is convex and the other non-convex [53]. Thus the distance squared in hyperbolic space, $c = d^2_{\mathbf{H}^n}$, violates (A3), though $c = -\cosh d_{\mathbf{H}^n}$ satisfies it [45] [42].
Concluding continuity or higher regularity of $G$ at present requires a slight strengthening of one of the geometric conditions (A3) or (A4). If the inequality in (A3) holds strictly whenever the $h$-orthogonal vectors $p \oplus 0$ and $0 \oplus q$ are non-vanishing, we denote that by $(A3)_s$. If instead the geodesic convexity of the sets in (A4) is strong (i.e. 2-uniform, in the sense of Example 3.2 or Appendix A), we denote that by $(A4)_s$. Under these assumptions the following extensions of Theorem 3.1 and Example 3.2 have been proved, in works of Ma, Trudinger, Wang, Loeper, Liu, Figalli, Kim and myself.
It is possible to construct smooth bounded $f^\pm$ for which continuity of $G$ fails in the absence of either (A3) or (A4), as was done by Loeper [48] and by Ma, Trudinger and Wang [53] respectively. Still, there are few results quantifying the discontinuities of $G$, except for the cost $c(x, y) = \frac12|x - y|^2$ of Example 3.2 [85] [23] [24], for which examples of discontinuous maps go back to Caffarelli [10].
6 Closed forms and c-cyclical monotonicity

The sections above have discussed many necessary conditions for optimality of $\gamma$, but few sufficient conditions. In fact, for bounded continuous $c \in C(M^+ \times M^-)$, a condition on the support $S = \mathrm{spt}\,\gamma$ well known to be necessary and sufficient for optimality in $\Gamma(\pi^+_\#\gamma, \pi^-_\#\gamma)$ is c-cyclical monotonicity: $S$ is called c-cyclically monotone if each $k \in \mathbf{N}$, $(x_1, y_1), \ldots, (x_k, y_k) \in S$, and permutation $\tau$ on $k$ letters satisfy the following inequality:

$$\sum_{i=1}^{k} c(x_i, y_i) \le \sum_{i=1}^{k} c(x_i, y_{\tau(i)}). \tag{8}$$

This result can be found in Pratelli [68] or Schachermayer-Teichmann [74], building on earlier works of Knott-Smith, Gangbo-McCann, Rüschendorf, and Ambrosio-Pratelli. The case $k = 2$ corresponds to the c-monotonicity condition which implies that $S$ is $h$-spacelike. The result quoted above shows the cross-difference $\delta(x, y; x_0, y_0)$ is just the first in an infinite sequence of functions whose non-negativity on $S^k$ for each $k \in \mathbf{N}$ characterizes optimality of $\gamma$. In fact, since all permutations are made up of cycles, for each $k$ it is enough to check (8) for the cyclic permutation $\tau(i) = i + 1$ if $i < k$ with $\tau(k) = 1$. This family of conditions has a differential topological content whose relevance we now try to make clear.
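On a finite set, (8) can be checked directly using only cyclic shifts, as noted above. The sketch below uses a hypothetical $3 \times 3$ cost matrix on index sets to show that the condition for $k = 3$ is genuinely stronger than c-monotonicity ($k = 2$): the diagonal pairing survives every exchange of two partners, yet a 3-cycle lowers its cost from 3 to 0.

```python
import itertools


def is_c_cyclically_monotone(c, S, max_k=None, tol=1e-12):
    """Check inequality (8) on a finite set S of pairs (x, y) for cycles of length <= max_k."""
    max_k = len(S) if max_k is None else max_k
    for k in range(2, max_k + 1):
        # Ordered k-tuples of distinct points with the cyclic shift tau(i) = i + 1 mod k;
        # cycles through repeated points split into shorter cycles, so they add nothing new.
        for chosen in itertools.permutations(S, k):
            lhs = sum(c(x, y) for x, y in chosen)
            rhs = sum(c(chosen[i][0], chosen[(i + 1) % k][1]) for i in range(k))
            if lhs > rhs + tol:
                return False
    return True


C = [[1, 0, 3],
     [3, 1, 0],
     [0, 3, 1]]
cost = lambda i, j: C[i][j]
diagonal = [(0, 0), (1, 1), (2, 2)]
print(is_c_cyclically_monotone(cost, diagonal, max_k=2))  # True: no swap of two partners helps
print(is_c_cyclically_monotone(cost, diagonal))           # False: a 3-cycle lowers the cost
print(is_c_cyclically_monotone(cost, [(0, 1), (1, 2), (2, 0)]))  # True: the optimal pairing
```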
Choose any map $G: U^+ \subset M^+ \to M^-$ defined on a subset $U^+ \subset M^+$, whose graph lies inside $S$. Any differentiable loop $\sigma: S^1 \to U^+$ may be approximated by $x_i = \sigma(\theta_i)$ for a partition $0 < \theta_1 < \cdots < \theta_k \le 2\pi$ as fine as we please. The non-negative sums (8), with $y_i = G(x_i)$, then approximate Riemann sums for the integral

$$0 \le \int_0^{2\pi} D_x c(\sigma(\theta), G(\sigma(\theta))) \cdot \sigma'(\theta)\, d\theta$$

arbitrarily closely. If the form $x \in U^+ \mapsto D_x c(x, G(x))$ is continuous on an open set $U^+ \subset M^+$ containing $\sigma$, then the Riemann integral exists. Since the curve can be traversed in either direction, the non-negative integral must actually vanish, hence the form must be closed: for $U^+$ simply connected, there would exist $u \in C^1_{loc}(U^+)$ such that $D_x c(x, G(x)) = Du(x)$. Similarly, if $G$ could be continuously inverted on a simply connected domain $U^- \subset M^-$, there would exist $v \in C^1_{loc}(U^-)$ such that $D_y c(G^{-1}(y), y) = Dv(y)$. These suppositions are not so implausible when (A1)-(A2) hold, since $S$ at least coincides with the graph of a map $G$ and has a well-defined tangent space $\mathcal{H}^n$-almost everywhere.
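The closedness heuristic can be tested numerically for the quadratic cost $c(x, y) = \frac12|x - y|^2$, where $D_x c(x, G(x)) = x - G(x)$. Assuming two hypothetical maps of the plane, the gradient $G = Du$ of the convex function $u(x) = x_1^2 + x_1 x_2 + 3x_2^2$ and a rotation by 90 degrees (not a gradient), the sketch approximates the loop integral over the unit circle by a Riemann sum. It vanishes for the first map and equals $-2\pi$ for the second, so the graph of the rotation cannot lie in any c-cyclically monotone set.

```python
import math


def loop_integral(form, k=2000):
    """Riemann sum for the integral over [0, 2 pi] of form(sigma(t)) . sigma'(t), sigma = unit circle."""
    total = 0.0
    for i in range(k):
        t = 2 * math.pi * i / k
        x = (math.cos(t), math.sin(t))
        dx = (-math.sin(t), math.cos(t))
        w = form(x)
        total += (w[0] * dx[0] + w[1] * dx[1]) * (2 * math.pi / k)
    return total


def form(G):
    """The form x -> D_x c(x, G(x)) = x - G(x) for the quadratic cost."""
    def w(x):
        gx = G(x)
        return (x[0] - gx[0], x[1] - gx[1])
    return w


def gradient_map(x):
    """G = Du for the convex u(x) = x1^2 + x1 x2 + 3 x2^2 (hypothetical example)."""
    return (2 * x[0] + x[1], x[0] + 6 * x[1])


def rotation_map(x):
    """Rotation by 90 degrees: DG is antisymmetric, so G is not a gradient."""
    return (-x[1], x[0])


print(loop_integral(form(gradient_map)))   # approximately 0: the form is closed (exact)
print(loop_integral(form(rotation_map)))   # approximately -2 pi: the form is not closed
```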
However, despite the fact that neither $G$ nor its inverse will be continuous in general, some vestige of this integrability persists. If $c$ is Lipschitz continuous, for example, then (8) implies the existence of Lipschitz $u, v$ such that $c(x, y) - u(x) - v(y) \ge 0$ on $N = M^+ \times M^-$, with equality holding throughout $S$. This fact, which goes back to [72] [71], is in many senses better than mere integrability of a form: it requires no topological restriction on the domains, and not only do we get the first-order condition $Du(x) = D_x c(x, y)$ for those points $(x, y) \in S$ with $x$ in the set $\mathrm{Dom}\,Du$ of full $\mathcal{H}^n$ measure where $u$ is differentiable, but as a second-order condition we also get positive semi-definiteness of the matrix $D^2_{xx}c(x, y) - D^2u(x) \ge 0$ if $x \in \mathrm{Dom}\,D^2u$, and analogous conditions for $v$. In particular, $S$ is contained in the graph of the gradient (more precisely, the subdifferential) of a convex function when $c(x, y) = -x \cdot y$ or $c(x, y) = \frac12|x - y|^2$ on $U^\pm \subset \mathbf{R}^n$. As Gangbo and McCann argue [31], this rough integrability result of Rockafellar and Rochet implies the famous duality of Kantorovich [36], Koopmans and Beckmann [40]:

$$\min_{\gamma \in \Gamma(\mu^+, \mu^-)} \mathrm{cost}(\gamma) = \sup_{(u^+, u^-) \in \mathrm{Lip}_c} \int_{M^+} u^+\, d\mu^+ + \int_{M^-} u^-\, d\mu^-, \tag{9}$$

with the supremum over

$$\mathrm{Lip}_c := \{(u^+, u^-) \text{ Lipschitz} \mid u^+(x) + u^-(y) \le c(x, y) \text{ for all } (x, y) \in N\} \tag{10}$$

being attained at $(u^+, u^-) = (u, v)$. Indeed, for any $(u^+, u^-) \in \mathrm{Lip}_c$, integrating the inequality (10) against $\gamma \in \Gamma(\mu^+, \mu^-)$ yields

$$\int_{M^+} u^+\, d\mu^+ + \int_{M^-} u^-\, d\mu^- = \int_N \big(u^+(x) + u^-(y)\big)\, d\gamma(x, y) \le \mathrm{cost}(\gamma). \tag{11}$$

Thus the min dominates the sup in (9). Starting from $\gamma \in \Gamma(\mu^+, \mu^-)$ with c-cyclically monotone support, Rochet's generalization of Rockafellar's theorem provides $(u^+, u^-) = (u, v) \in \mathrm{Lip}_c$, bounded and Lipschitz if $c$ is, such that equality holds in (11), and hence in (9), as desired.
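For finitely supported $\mu^\pm$, both sides of the duality (9) are finite linear programs, so the absence of a duality gap can be checked directly. A minimal sketch with SciPy's `linprog`, assuming hypothetical atoms and masses; the dual variables `u`, `v` play the roles of $u^+$ and $u^-$ in (10).

```python
import numpy as np
from scipy.optimize import linprog


def primal(C, a, b):
    """Kantorovich minimum: min sum C * g over g >= 0 with row sums a, column sums b."""
    m, n = C.shape
    A_eq = np.vstack([np.kron(np.eye(m), np.ones((1, n))),
                      np.kron(np.ones((1, m)), np.eye(n))])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
                  bounds=(0, None), method="highs")
    return res.fun


def dual(C, a, b):
    """sup of a . u + b . v over pairs with u[i] + v[j] <= C[i, j], a discrete form of (9)-(10)."""
    m, n = C.shape
    A_ub = np.hstack([np.kron(np.eye(m), np.ones((n, 1))),   # row (i, j) picks out u[i]
                      np.kron(np.ones((m, 1)), np.eye(n))])  # ... and v[j]
    res = linprog(-np.concatenate([a, b]), A_ub=A_ub, b_ub=C.ravel(),
                  bounds=(None, None), method="highs")
    return -res.fun, res.x[:m], res.x[m:]


x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5, 3.0])
C = 0.5 * (x[:, None] - y[None, :]) ** 2
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.4, 0.4, 0.2])
primal_value = primal(C, a, b)
dual_value, u, v = dual(C, a, b)
print(primal_value, dual_value)   # equal up to solver tolerance: no duality gap
```

The dual is invariant under $(u, v) \mapsto (u + t, v - t)$, so the solver returns one representative of a family of optimal potentials.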

7 Connections to differential geometry
We have already seen that the pseudo-Riemannian geometry induced on the product space $N = M^+ \times M^-$ by the metric tensor $h = \frac12 \mathrm{Hess}_{(x_0, y_0)} \delta_0$ plays a key role in determining whether or not maps $y = G(x)$ which solve Monge's problem (1) are smooth. Here $h$ is the Hessian of the cross-difference (4)-(5) associated to the cost $c$. The symmetries $\delta(x, y; x_0, y_0) = \delta(x_0, y_0; x, y) = -\delta(x, y_0; x_0, y)$ ensure that $h$ vanishes on the $n \times n$ diagonal blocks. The involution $U(\Delta x, \Delta y) = (\Delta x, -\Delta y)$ on $T_{(x_0, y_0)}N$ allows us to define an antisymmetrized analog of $h$ by $\omega(P, Q) = h(P, U(Q))$.
Here $\omega$ turns out to be a symplectic form if and only if $h$ has the full rank $2n = 2n^\pm$ that we often assume. Notice the similarity to Kähler geometry, with the splitting $T_{(x_0, y_0)}N = T_{x_0}M^+ \oplus T_{y_0}M^-$ of the tangent space associated to $U$ playing the role of the almost complex structure $J$, and the cost $c$ playing the role of the Kähler potential. For geometric measure theory in such geometries see Harvey and Lawson [34]. Kim and McCann showed that any c-optimal diffeomorphism $G: M^+ \to M^-$ has a graph which is $\omega$-Lagrangian in addition to being $h$-spacelike. Conversely, when (A0)-(A4) hold, any diffeomorphism with an $\omega$-Lagrangian and $h$-spacelike graph is necessarily c-optimal [38]. Here an $n$-dimensional submanifold $S \subset N$ is called $\omega$-Lagrangian if $\omega(P, Q) = 0$ for each $(x_0, y_0) \in S$ and every pair of tangent vectors $P, Q \in T_{(x_0, y_0)}S$. Being $\omega$-Lagrangian is essentially the integrability condition which asserts closure of the form $D_x c|_{(x, G(x))}$ on $M^+$; in case $c(x, y) = -x \cdot y$, it amounts to equality of the cross-derivatives $\partial G^i/\partial x^j = \partial G^j/\partial x^i$, which implies the existence of $u$ such that $G(x) = Du(x)$ on simply connected domains.
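For $c(x, y) = -x \cdot y$ on $\mathbf{R}^n$, the block form of $h$ makes these objects explicit: $D^2_{xy}c = -I$, so $\omega$ has matrix $\frac12\begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}$, and a graph with tangent vectors $(v, DG\,v)$ is $\omega$-Lagrangian exactly when $DG$ is symmetric, since $\omega$ restricted to the graph has matrix $\frac12(DG^T - DG)$. The sketch (NumPy; the Jacobians are hypothetical) checks both statements at a single point.

```python
import numpy as np

n = 2
Z = np.zeros((n, n))
B = -np.eye(n)                                     # D^2_{xy} c for c(x, y) = -x . y
H = -0.5 * np.block([[Z, B], [B.T, Z]])            # matrix of h, as in (5)
U = np.block([[np.eye(n), Z], [Z, -np.eye(n)]])    # involution U(dx, dy) = (dx, -dy)
Omega = H @ U                                      # omega(P, Q) = h(P, U Q) = P^T H U Q


def is_lagrangian(DG, tol=1e-12):
    """True if omega vanishes on the tangent space {(v, DG v)} to the graph of G."""
    T = np.vstack([np.eye(n), DG])                 # columns (e_i, DG e_i)
    return np.allclose(T.T @ Omega @ T, 0.0, atol=tol)


print(np.allclose(Omega, -Omega.T), abs(np.linalg.det(Omega)) > 0)  # antisymmetric, nondegenerate
print(is_lagrangian(np.array([[2.0, 1.0], [1.0, 6.0]])))   # symmetric DG, as for G = Du: True
print(is_lagrangian(np.array([[0.0, -1.0], [1.0, 0.0]])))  # rotation: False
```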
So far these geometric structures (the pseudo-metric h, the symplectic form ω, c-cyclical monotonicity, and c-optimality) reflect only the cost function c(x,y), and not the densities $d\mu^\pm(x) = f^\pm(x)\,dx$. Remarkably, however, there is a conformally equivalent pseudo-metric $\tilde h$, whose conformal factor involves the densities $f^\pm$, for which the graph of an optimal map G with $G_\#\mu^+ = \mu^-$ turns out to be a zero mean curvature surface, and in fact $\tilde h$-volume maximizing among homologous surfaces. This surprising connection of optimal transportation to geometric measure theory was discovered with Kim and Warren [39].
Thus the properties of optimal maps relate both to sectional curvatures with respect to h and to mean curvatures with respect to $\tilde h$. On the other hand, in the special case of the quadratic cost $c = d^2$ on a Riemannian manifold $M = M^\pm$, several surprising connections relate optimal transportation to the Riemannian geometry of $(M, g_{ij})$. For example, in this case Loeper and Villani conjecture [50], and in some cases have proved, that (A3)$_s$ implies convexity of the tangent injectivity locus, that is, of the domain in $T_{x_0}M$ bounded by the cut locus of each given point $x_0 \in M$ lifted to $T_{x_0}M$ by the inverse Riemannian exponential map $\exp^{-1}_{x_0}$. An earlier development involved lifting the metrical distance d from M to the space $P(M)$ of Borel probability measures $\mu^\pm \in P(M)$ using the minimal transportation cost $d_2(\mu^+,\mu^-)^2 = \min_{\gamma \in \Gamma(\mu^+,\mu^-)} \mathrm{cost}(\gamma)$ with respect to the distance squared $c = d^2$ [6] [20] [64]. Geodesic convexity of various entropy functionals on $P(M)$ turns out to be equivalent to Ricci non-negativity of (M, g). This was shown by von Renesse and Sturm [70], building on my own work [55], on joint work of Cordero-Erausquin, Schmuckenschläger and myself [15], and on work of Otto and Villani [65]. This idea was turned on its head by Lott and Villani [52], and independently by Sturm [77], who used geodesic convexity of the same entropies to define Ricci non-negativity in (not necessarily smooth) metric-measure spaces. This non-negativity is stable under measured Gromov-Hausdorff convergence, and has significant consequences.
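For intuition about the distance $d_2$, consider the simplest case $M = \mathbf{R}$ with $c = d^2 = |x-y|^2$. Between two uniform empirical measures with the same number of atoms, pairing the atoms in increasing order (the monotone rearrangement) is an optimal plan, so $d_2$ reduces to a root-mean-square difference of sorted samples. The sketch below assumes this one-dimensional, equal-mass setting; the function name is hypothetical.

```python
import math


def d2_empirical_line(xs, ys):
    """Hypothetical helper: d_2 between the uniform empirical measures
    (1/m) sum_i delta_{xs[i]} and (1/m) sum_i delta_{ys[i]} on the real line,
    for the cost c = d^2 = |x - y|^2. On R the monotone (sorted) pairing is an
    optimal plan for this cost, so no optimization is needed."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes the same positive number of atoms")
    m = len(xs)
    min_cost = sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / m
    return math.sqrt(min_cost)  # d_2^2 equals the minimal transport cost


if __name__ == "__main__":
    print(d2_empirical_line([0.0, 1.0], [1.0, 2.0]))  # 1.0: a unit translation
```

Taking the square root of the minimal cost is what makes $d_2$ a metric on $P(\mathbf{R})$; in particular it satisfies the triangle inequality, which fails for the minimal cost itself.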

A Ma-Trudinger-Wang conditions
The conditions (A0)-(A4) above have been synthesized in a language selected to manifest their topological and geometric invariance, aspects not readily apparent [7] from the original formulation by Ma, Trudinger, and Wang [53] in coordinates on the bounded sets $M^\pm \subset \mathbf{R}^n$, which we now recall.
Use subscripts such as i and j to denote derivatives with respect to $x^i$ and $y^j$, and commas to separate derivatives in $M^+$ from those in $M^-$, so that $c_{i,j} = \partial^2 c/\partial x^i \partial y^j$ and $c_{ij,kl} = \partial^4 c/\partial x^i \partial x^j \partial y^k \partial y^l$, etc. Also let $c^{k,l}$ denote the matrix inverse of $c_{i,j}$, and let $D_x c(x,y) = (c_1, c_2, \ldots, c_n)(x,y)$. Then the original conditions of Ma, Trudinger and Wang were formulated as the existence of a constant $C_0 > 0$ such that:
(A0)$'$ $c \in C^4(\bar N)$, and for each $(x_0,y_0) \in \bar N = \bar M^+ \times \bar M^- \subset \mathbf{R}^n \times \mathbf{R}^n$:
(A1)$'_+$ the map $y \in \bar M^- \mapsto D_x c(x_0,y) \in T^*_{x_0}M^+$ is injective;
(A1)$'$ both $c(x,y)$ and $c^*(y,x) := c(x,y)$ satisfy (A0)$'$ and (A1)$'_+$;
(A2)$'$ $\det c_{i,j}(x_0,y_0) \ne 0$;
(A3)$'_s$ $(-c_{ij,kl} + c_{ij,m}c^{m,n}c_{n,kl})\,p^i p^j q^k q^l \ge C_0 |p|^2 |q|^2$ whenever $p^i c_{i,j} q^j = 0$;
(A4)$'$ the sets $D_x c(x_0, M^-) \subset \mathbf{R}^n$ and $D_y c(M^+, y_0) \subset \mathbf{R}^n$ are convex.
Here the Einstein summation convention is in effect, and |p| and |q| denote the Euclidean norms of $p \in T_{x_0}M^+$ and $q \in T_{y_0}M^- \subset \mathbf{R}^n$, respectively.
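As a quick sanity check on (A2)$'$ and (A3)$'_s$, consider the bilinear cost $c(x,y) = -x\cdot y$, or equivalently the quadratic cost $\frac12|x-y|^2$, which differs from it by $\frac12|x|^2 + \frac12|y|^2$ and therefore has the same mixed derivatives:

$$
c_{i,j} = -\delta_{ij}, \quad c^{k,l} = -\delta_{kl}, \quad \det c_{i,j} = (-1)^n \ne 0, \quad c_{ij,m} = 0, \quad c_{ij,kl} = 0
\;\Longrightarrow\; (-c_{ij,kl} + c_{ij,m}c^{m,n}c_{n,kl})\,p^i p^j q^k q^l = 0.
$$

Thus (A1)$'_+$ (since $D_x c(x_0,y) = -y$) and (A2)$'$ hold, while the tensor in (A3)$'_s$ vanishes identically: these costs satisfy the inequality with $C_0 = 0$, but not with any $C_0 > 0$.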
The method of Ma, Trudinger and Wang is heavily based on a priori $C^2$ estimates, which require a maximum principle for the directional second derivatives $D^2_{pp}u := u_{ij}p^i p^j$ of the unknown maximizers $u^\pm \in C(M^\pm)$ for the dual problem (9). A second-order linear elliptic equation satisfied by $D^2_{pp}u$ is obtained by twice differentiating the prescribed Jacobian equation for the map G, which is a fully nonlinear equation of Monge-Ampère type for the potential $u = u^+$. Condition (A3)$'_s$ ensures that the zeroth-order term in the elliptic equation satisfied by $D^2_{pp}u$ has a coefficient with the correct sign to admit a maximum principle.
The relaxation (A3)$'$ of $C_0 > 0$ to $C_0 = 0$, and the strengthening (A4)$'_s$, which requires all principal curvatures of $D_x c(x_0, M^-)$ and $D_y c(M^+, y_0)$ to be positive, were introduced in the subsequent investigation of boundary regularity by Trudinger and Wang [79]. We leave it as an exercise to the reader to confirm the equivalence of each primed hypothesis (A0)$'$-(A4)$'$ and its variants to the corresponding unprimed hypothesis in the text. The connection of these conditions to the Riemann curvature tensor
$$\sec^{(N,h)}_{(x_0,y_0)}\big((p \oplus 0) \wedge (0 \oplus q)\big) = (-c_{ij,kl} + c_{ij,m}c^{m,n}c_{n,kl})\,p^i p^j q^k q^l \qquad (12)$$
and to the geodesic equations for the pseudo-metric $h = \frac12 \mathrm{Hess}_{(x_0,y_0)}\delta_0$ was first discovered in my joint work with Kim [38]. However, the link to the cross-difference $\delta_0(x,y)$ originates in the present work.