1 Introduction

Optimal transport continues to be a very active field of research, both in mathematics and in applications. One of the central objects is the \(L^p\)-Kantorovich metric \(W_p\), defined by

$$\begin{aligned} W_p(\mu ,\nu ) = \left( \inf _{\pi \in \Pi (\mu ,\nu )}\int _{\mathcal {X}\times \mathcal {X}}d(x,y)^p\,d\pi (x,y)\right) ^{1/p}, \end{aligned}$$

where \(\mu \) and \(\nu \) are Borel probability measures on the metric space \((\mathcal {X},d)\), and \(\Pi (\mu ,\nu )\) is the set of all couplings of \(\mu \) and \(\nu \).

The metric \(W_2\) plays a special role in the theory, as it is the crucial object in the gradient flow formulation of dissipative PDE (starting from [11, 20]) and in the synthetic theory of Ricci curvature [14, 22], which builds on McCann’s discovery that several important functionals enjoy convexity properties along \(W_2\)-geodesics [16].

In spite of the robustness of the optimal transport theory, it is well known that if the underlying space is discrete, \(W_2\) has several undesirable properties that hamper its usefulness. In particular, if \(\mathcal {X}\) is discrete, the metric space \((\mathcal {P}(\mathcal {X}), W_2)\) does not contain any non-trivial geodesics.

To circumvent this problem, several authors introduced discrete dynamical transport metrics \(\mathcal {W}\), based on discrete versions of the Benamou–Brenier formulation of optimal transport [2, 15, 17]. These metrics have been intensively studied in recent years; in particular, gradient flow formulations have been obtained for nonlinear evolution equations [6, 19], and a discrete theory of Ricci curvature has been developed based on geodesic convexity of entropy functionals along discrete optimal transport [5, 18]. Such Ricci curvature bounds have subsequently been obtained in various discrete probabilistic models [4, 7, 8].

In spite of the relevance of the notion of geodesic convexity, geometric properties of \(\mathcal {W}\)-geodesics are currently poorly understood. The aim of this paper is to obtain results of this type. We focus on the issue of locality of geodesics in the space of probability measures.

More precisely, let \((\mathcal {X}, \mathsf {d})\) be a metric space, and consider a geodesic metric \(\mathsf {D}\) on (a subset of) the space of Borel probability measures \(\mathcal {P}(\mathcal {X})\). We say that a subset \(\mathcal {Y}\subseteq \mathcal {X}\) has the weak locality property if any pair of probability measures \(\mu _0, \mu _1 \in \mathcal {P}(\mathcal {X})\) supported in \(\mathcal {Y}\) can be connected by a geodesic that is supported in \(\mathcal {Y}\) at all times. The notion of strong locality is defined by requiring this property to hold for any geodesic connecting \(\mu _0\) and \(\mu _1\). If any pair of measures can be connected by a unique geodesic, the notions of weak and strong locality coincide, but this property is currently unknown for discrete dynamical transport metrics.

If \((\mathcal {X}, \mathsf {d})\) is a geodesic metric space, and \(\mathsf {D}\) is the Kantorovich metric \(W_p\) for some \(1 \le p < \infty \), it is well known that a subset \(\mathcal {Y}\) has the weak (resp. strong) locality property if and only if it is weakly (resp. strongly) geodesically convex. This follows from the fact that geodesics in \((\mathcal {P}_p(\mathcal {X}), W_p)\) are supported on geodesics in \((\mathcal {X}, \mathsf {d})\); cf. [12] for a precise formulation of this result in a general setting.

Interestingly, the issue of locality in the discrete setting [with a discrete dynamical transport metric \(\mathcal {W}\) on \(\mathcal {P}(\mathcal {X})\) instead of \(W_p\)] turns out to be much more delicate. For example, if one considers the complete graph on a three-point set \(K_3\), then any geodesic connecting two Dirac masses transports a nontrivial part of the mass via the third point. Hence, two-point subsets of \(K_3\) do not have the locality property. This is shown in Sect. 6 of this paper.

Based on this observation one may conjecture that any nontrivial \(\mathcal {W}\)-geodesic has support on the whole graph. However, we show that this is not the case. In fact, the main contribution of this paper is the introduction of a geometric condition for subsets \(\mathcal {Y}\subseteq \mathcal {X}\) (the retraction property), that is shown to be sufficient for locality; see Theorem 4.11. The retraction property is easy to check in concrete examples, as is shown in Sect. 4.

As an application of our main result, we show that if \(\mathcal {X}\) is any subset of the grid \(\mathbb {Z}^d\) with the usual graph structure, and \(\mathcal {Y}\subseteq \mathcal {X}\) is a hyperrectangle, then any pair of measures supported in \(\mathcal {Y}\) can be connected by a geodesic supported in \(\mathcal {Y}\). In particular, this property holds for measures supported on subsets of lines, or k-dimensional hyperplanes of dimension less than d. Let us also mention that discrete Ricci curvature bounds in the sense of [5, 18] are inherited by subsets with the retraction property; see Corollary 4.12.

A key ingredient in the proof of our main result is a duality result for the discrete transport metric \(\mathcal {W}\), which was recently obtained by Gangbo, Li, and Mou (under slightly more restrictive conditions on the transition rates) [9]. We interpret this result (Theorem 3.4 below) in terms of subsolutions of a discrete Hamilton–Jacobi equation and present a different proof based on Fenchel–Rockafellar duality. We then show that subsolutions of the Hamilton–Jacobi equation on a subset \(\mathcal {Y}\subseteq \mathcal {X}\) can be extended to the full space \(\mathcal {X}\), provided that \(\mathcal {Y}\) has the retraction property; cf. Theorem 4.10. Our main theorem is then a straightforward consequence of this result.

Structure of the paper In Sect. 2 we collect the necessary preliminaries on discrete transport metrics. Section 3 contains the dual formulation of the transport problem in terms of Hamilton–Jacobi subsolutions. In Sect. 4 we introduce the retraction property, we show the extension result for subsolutions to the Hamilton–Jacobi equation (Theorem 4.10), and we prove the main result on weak locality of subsets with the retraction property (Theorem 4.11). In Sect. 5 we show that the strong locality property holds for Markov chains with “dead ends”. Finally, it is shown in Sect. 6 that geodesics between Dirac measures on the triangle have full support.

2 The discrete transport distance

In this section we briefly recall the definition and basic properties of the discrete transport distance constructed in [2, 15, 17].

Let \(\mathcal {X}\) be a finite set, and let \(Q: \mathcal {X}\times \mathcal {X}\rightarrow \mathbb {R}_+\) denote the transition rates for a Markov chain on \(\mathcal {X}\). Without loss of generality, we use the convention that \(Q(x,x) = 0\) for all \(x \in \mathcal {X}\). The corresponding generator \(\mathcal {L}\) acts on functions \(\phi : \mathcal {X}\rightarrow \mathbb {R}\) by

$$\begin{aligned} \mathcal {L}\phi (x) = \sum _{y \in \mathcal {X}} Q(x,y)\big (\phi (y)-\phi (x)\big )\;. \end{aligned}$$

We assume that Q is irreducible, i.e., each pair \((x, y)\in \mathcal {X}\times \mathcal {X}\) can be connected, for some \(n \in \mathbb {N}\), by a path \(\{x_i\}_{i=0}^n\) satisfying \(x_0 = x\), \(x_n = y\), and \(Q(x_{i-1}, x_i) > 0\) for \(i = 1, \ldots , n\). This assumption implies the existence of a unique stationary probability measure \(\pi \) on \(\mathcal {X}\). Moreover, \(\pi \) is strictly positive. We will furthermore assume that Q is reversible with respect to \(\pi \), i.e., the detailed balance condition holds:

$$\begin{aligned} \pi (x) Q(x,y) = \pi (y) Q(y,x) \quad \text { for all } x,y \in \mathcal {X}\;. \end{aligned}$$

The triple \((\mathcal {X},Q,\pi )\) will be referred to as a Markov triple.

A Markov chain induces a graph on the vertex set \(\mathcal {X}\), whose edge set is given by \(\mathcal {E}= \{ (x,y) \in \mathcal {X}\times \mathcal {X}: Q(x,y) > 0 \}\). We write \(x \sim y\) iff \(Q(x,y) > 0\). The assumption that Q is irreducible corresponds to the graph \((\mathcal {X},\mathcal {E})\) being connected. The detailed balance condition implies that the graph is undirected.

In order to define the discrete transport distance on the set \(\mathcal {P}(\mathcal {X})\) of probability measures on \(\mathcal {X}\), we introduce the following objects.

Definition 2.1

(Continuity equation) A pair \((\mu ,V)\) is said to satisfy the continuity equation if

  1. (i)

    \(\mu :[0,T]\rightarrow \mathbb {R}^\mathcal {X}\) is continuous;

  2. (ii)

    \(V:[0,T]\rightarrow \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) is locally integrable;

  3. (iii)

    \(\mu _t\in \mathcal {P}(\mathcal {X})\) for all \(t\in [0,T]\);

  4. (iv)

    the continuity equation holds in the sense of distributions:

    $$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}\mu _t(x) + \frac{1}{2}\sum _{y\in \mathcal {X}}\left( V_t(x,y) - V_t(y,x) \right) = 0 \quad \text { for all } x\in \mathcal {X}. \end{aligned}$$
    (2.1)

In this case, we write \((\mu ,V) \in \mathsf {CE}_T\). Furthermore, \(\mathsf {CE}_T(\mu ^0,\mu ^1)\) denotes the collection of pairs \((\mu ,V)\in \mathsf {CE}_T\) satisfying \(\mu |_{t=0}=\mu ^0\) and \(\mu |_{t=T}=\mu ^1\).

Definition 2.2

(Admissible mean) An admissible mean is a continuous function \(\Lambda : \mathbb {R}_+ \times \mathbb {R}_+ \rightarrow \mathbb {R}_+\) that is \(C^\infty \) on \((0,\infty ) \times (0,\infty )\), symmetric, positively 1-homogeneous, non-decreasing in each of its variables, jointly concave, and normalised, i.e., \(\Lambda (1,1) = 1\).

Of particular interest to us is the logarithmic mean given by

$$\begin{aligned} \Lambda _{\text {log}}(s,t):=\int _0^1s^\alpha t^{1-\alpha }\; \mathrm {d}\alpha , \end{aligned}$$

since it arises in the entropic gradient flow structure for the master equation \(\partial _t \mu = \mathcal {L}^* \mu \). Other relevant examples of admissible means are the harmonic mean \(\Lambda _{\text {har}}(s,t) = \frac{2st}{s+t}\), the geometric mean \(\Lambda _{\text {geo}}(s,t) = \sqrt{st}\), and the arithmetic mean \(\Lambda _{\text {ari}}(s,t) = \frac{s+t}{2}\). Some of these means arise in gradient structures for porous medium equations; cf. [6]. From now on, we will fix an admissible mean \(\Lambda \).

The action functional for the discrete transport distance is defined using the convex and lower semicontinuous function \(A: \mathbb {R}^3 \rightarrow [0,\infty ]\) given by

$$\begin{aligned} A(s,t,w) := {\left\{ \begin{array}{ll} \frac{w^2}{\Lambda (s,t)},&{} w \in \mathbb {R},\ s,t>0,\\ 0, &{} w=0,\ s,t\ge 0,\\ +\infty , &{}\text {otherwise}. \end{array}\right. } \end{aligned}$$
(2.2)

For \(\mu \in \mathcal {P}(\mathcal {X})\) and \(V: \mathcal {X}\times \mathcal {X}\rightarrow \mathbb {R}\) we define the action by

$$\begin{aligned} \mathcal {A}(\mu ,V) =\frac{1}{2}\sum _{x,y\in \mathcal {X}}A\big (\mu (x)Q(x,y),\mu (y)Q(y,x), V(x,y)\big )\;. \end{aligned}$$

For brevity we sometimes write

$$\begin{aligned} \widehat{\mu }(x,y) := \Lambda \big ( \mu (x)Q(x,y), \mu (y) Q(y,x) \big )\;. \end{aligned}$$

Definition 2.3

(Discrete transport distance) For a Markov triple \((\mathcal {X}, Q, \pi )\) and an admissible mean \(\Lambda \), the discrete transport distance \(\mathcal {W}\) is defined for \(\mu _0,\mu _1\in \mathcal {P}(\mathcal {X})\) by

$$\begin{aligned} \mathcal {W}(\mu _0,\mu _1):=\inf \left\{ \sqrt{\int _0^1\mathcal {A}(\mu _t,V_t)\; \mathrm {d}t} : (\mu ,V)\in \mathsf {CE}_1(\mu _0,\mu _1)\right\} \;. \end{aligned}$$
(2.3)

It has been shown in [5] that minimisers exist in the minimisation problem above. Any minimal curve \((\mu _t)_{t\in [0,1]}\) is a constant speed geodesic, i.e., it satisfies \(\mathcal {W}(\mu _s,\mu _t)=|t-s|\mathcal {W}(\mu _0,\mu _1)\) for all \(s,t\in [0,1]\).

Remark 2.4

Without loss of generality we may assume in the minimisation (2.3) that V is anti-symmetric, i.e., \(V_t(x,y)= -V_t(y,x)\). In fact, for each \(U \in \mathbb {R}\), the quantity \(|V(x,y)|^2+|V(y,x)|^2\) is minimised among all choices of V(xy), V(yx) such that \(V(x,y) - V(y,x) = U\) by choosing \(V(y,x) = -V(x,y) = U/2\).

Finally, let us introduce some convenient notation to be used in the sequel. We denote the Euclidean inner products on \(\mathbb {R}^\mathcal {X}\) and \(\mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) by

$$\begin{aligned} \langle {\phi , \psi }\rangle = \sum _{x \in \mathcal {X}} \phi (x) \psi (x) \quad \text {and} \quad \langle \langle {\Phi , \Psi }\rangle \rangle = \frac{1}{2}\sum _{x,y \in \mathcal {X}} \Phi (x,y) \Psi (x,y) \ . \end{aligned}$$

The discrete gradient of a function \(\phi \in \mathbb {R}^\mathcal {X}\) will be denoted by \(\nabla \phi (x,y) = \phi (y) - \phi (x)\), and the discrete divergence of \(\Phi \in \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) is given by

$$\begin{aligned} \nabla \cdot \Phi (x) = \frac{1}{2}\sum _{y\in \mathcal {X}}\big (\Phi (x,y)-\Phi (y,x)\big )\;. \end{aligned}$$

Furthermore, for \(\mu \in \mathcal {P}(\mathcal {X})\) and \(\Phi \in \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) we write

$$\begin{aligned} \Vert \Phi \Vert _\mu := \sqrt{\langle \langle {\Phi ,\Phi \cdot \hat{\mu }}\rangle \rangle }\;. \end{aligned}$$

where the multiplication of \(\Phi \cdot \hat{\mu }\) is understood componentwise. For all \(\Phi ,V\in \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) and \(\mu \in \mathcal {P}(\mathcal {X})\), Young’s inequality yields

$$\begin{aligned} \langle \langle {\Phi ,V}\rangle \rangle \le \frac{1}{2} \Vert \Phi \Vert _\mu ^2 + \frac{1}{2} \mathcal {A}(\mu ,V)\;. \end{aligned}$$
(2.4)

3 Duality for discrete optimal transport

We present a dual formulation for the discrete transport distance which can be seen as a discrete analogue of the Kantorovich duality. This result has recently been proved in [9] using different methods; cf. Proposition 3.10 and Theorems 5.10 and 7.4 in that paper. Note that the result in [9] is stated under slightly stronger assumptions on the transition rates. In our notation, it is assumed there that \(Q(x,y) = Q(y,x)\) and \(\pi \) is constant. The slightly greater generality here does not cause additional difficulties.

Definition 3.1

(Hamilton–Jacobi subsolution) A function \(\phi \in H^1\big ((0,T);\mathbb {R}^\mathcal {X}\big )\) is said to be a Hamilton–Jacobi subsolution if for a.e. t in (0, T), we have

$$\begin{aligned} \langle {\dot{\phi }_t,\mu }\rangle + \frac{1}{2} \Vert \nabla \phi _t\Vert ^2_\mu \le 0 \quad \text { for all } \mu \in \mathcal {P}(\mathcal {X})\;. \end{aligned}$$
(3.1)

The collection of all Hamilton–Jacobi subsolutions is denoted \(\mathsf {HJ}^T_\mathcal {X}\).

Remark 3.2

Hamilton–Jacobi subsolutions obey a simple scaling relation: given \(\phi \in \mathsf {HJ}_\mathcal {X}^T\) and \(\lambda >0\), set \(\phi ^\lambda _t:=\lambda \phi _{\lambda t}\). It is immediate to check that \(\phi ^\lambda \in \mathsf {HJ}_\mathcal {X}^{\lambda T}\).

Remark 3.3

Informally, (3.1) may be seen as a one-sided discrete version of the Hamilton–Jacobi equation \(\partial _t \phi + \frac{1}{2}|\nabla \phi |^2 = 0\). Note however that the dependence on \(\mu \) in (3.1) is nonlinear, which prevents us from formulating the inequality pointwise in terms of \(\phi \) only. This is a crucial difference between the discrete and the continuous setting, and a source of several difficulties.

In the continuous setting, a full treatment of Hamilton–Jacobi equations relies on the theory of viscosity solutions [3], but this concept will not play any role in our discrete setting. Let us also mention that Hamilton–Jacobi equations have been studied in the setting of metric length spaces [10, 13] as well as on graphs [21]. Our discrete notion of Hamilton–Jacobi subsolution is different from the one studied in [21].

Theorem 3.4

(Duality formula) For \(\mu _0,\mu _1\in \mathcal {P}(\mathcal {X})\) we have

$$\begin{aligned} \frac{1}{2}\mathcal {W}^2(\mu _0,\mu _1)=\sup \big \lbrace \langle {\phi _1,\mu _1}\rangle -\langle {\phi _0,\mu _0}\rangle \,:\, \phi \in \mathsf {HJ}^1_\mathcal {X}\big \rbrace . \end{aligned}$$
(3.2)

This representation remains true if the supremum is restricted to the class of functions \(\phi \in C^1\big ([0,1],\mathbb {R}^\mathcal {X}\big )\) satisfying (3.1).

Let us first give a heuristic argument for the duality result above. We start by introducing a Lagrange multiplier for the continuity equation constraint and write

$$\begin{aligned} \frac{1}{2}\mathcal {W}^2(\mu _0,\mu _1)&=\inf \limits _{\mu ,V}\sup \limits _\phi \left\{ \int _0^1\frac{1}{2}\mathcal {A}(\mu _t,V_t)\; \mathrm {d}t + \int _0^1\langle {\phi _t,\dot{\mu }_t+\nabla \cdot V_t}\rangle \; \mathrm {d}t\right\} , \end{aligned}$$
(3.3)

where the supremum is taken over all (sufficiently smooth) functions \(\phi :[0,1]\rightarrow \mathbb {R}^\mathcal {X}\) and the infimum is taken over all (sufficiently smooth) curves \(\mu : [0,1] \rightarrow \mathbb {R}_+\) connecting \(\mu _0\) and \(\mu _1\), and over all \(V : [0,1] \rightarrow \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\). Here we do not require that \((\mu ,V)\) satisfies the continuity equation, but the inner supremum takes the value \(+\infty \) if \((\mu ,V)\) does not belong to \(\mathsf {CE}_1(\mu _0,\mu _1)\). We also do not require that \(\mu \) takes values in \(\mathcal {P}(\mathcal {X})\), but this is automatically enforced by the continuity equation.

Integrating by parts and using the min–max principle we obtain

$$\begin{aligned} \frac{1}{2}\mathcal {W}^2(\mu _0,\mu _1)&= \inf \limits _{\mu ,V}\sup \limits _\phi \left\{ \langle {\phi _1,\mu _1}\rangle -\langle {\phi _0,\mu _0}\rangle + \int _0^1\frac{1}{2}\mathcal {A}(\mu _t,V_t) -\langle {\dot{\phi }_t,\mu _t}\rangle -\langle \langle {\nabla \phi _t, V_t}\rangle \rangle \; \mathrm {d}t\right\} \\&= \sup \limits _\phi \left\{ \langle {\phi _1,\mu _1}\rangle -\langle {\phi _0,\mu _0}\rangle + \inf \limits _{\mu ,V} \int _0^1\frac{1}{2}\mathcal {A}(\mu _t,V_t) -\langle {\dot{\phi }_t,\mu _t}\rangle -\langle \langle {\nabla \phi _t, V_t}\rangle \rangle \; \mathrm {d}t\right\} . \end{aligned}$$

As the quantity to be minimised is positively 1-homogeneous in \((\mu , V)\), the infimum takes the value \(-\infty \) if \(\phi \) does not belong to \(\mathcal {H}\), the set of \(C^1\) functions \(\phi :[0,1]\rightarrow \mathbb {R}^{\mathcal {X}}\) satisfying

$$\begin{aligned} \int _0^1\frac{1}{2}\mathcal {A}(\mu _t,V_t) -\langle {\dot{\phi }_t,\mu _t}\rangle -\langle \langle {\nabla \phi _t, V_t}\rangle \rangle \; \mathrm {d}t \ge 0 \end{aligned}$$

for all \(\mu : [0,1] \rightarrow \mathbb {R}_+^\mathcal {X}\) and all \(V : [0,1] \rightarrow \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\). Consequently,

$$\begin{aligned} \frac{1}{2}\mathcal {W}(\mu _0,\mu _1)^2=\sup \{\langle {\phi _1,\mu _1}\rangle -\langle {\phi _0,\mu _0}\rangle \ : \ \phi \in \mathcal {H}\}. \end{aligned}$$

A simple localisation argument in t shows that \(\phi \in \mathcal {H}\) iff for all \(t\in [0,1]\) and \((\mu ,V)\in \mathbb {R}_+^\mathcal {X}\times \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\):

$$\begin{aligned} \frac{1}{2}\mathcal {A}(\mu ,V) -\langle {\dot{\phi }_t,\mu }\rangle - \langle \langle {\nabla \phi _t,V}\rangle \rangle \ge 0. \end{aligned}$$

We may write

$$\begin{aligned} \frac{1}{2}\mathcal {A}(\mu ,V) - \langle \langle {\nabla \phi _t,V}\rangle \rangle&= \frac{1}{4}\sum _{x,y}\frac{\big (V(x,y) - \widehat{\mu }(x,y) \nabla \phi _t(x,y)\big )^2}{\widehat{\mu }(x,y)} - \frac{1}{2}\Vert {\nabla \phi _t}\Vert ^2_\mu . \end{aligned}$$

Minimising over V we conclude that \(\phi \in \mathcal {H}\) iff the inequality

$$\begin{aligned} \langle {\dot{\phi }_t,\mu }\rangle + \frac{1}{2}\Vert {\nabla \phi _t}\Vert ^2_\mu \le 0 \end{aligned}$$

holds for all \(\mu \in \mathbb {R}^\mathcal {X}_+\) and \(t \in [0,1]\), which means that \(\phi \in \mathsf {HJ}_\mathcal {X}\).

We present a proof of Theorem 3.4 using the Fenchel–Rockafellar duality theorem; see, e.g., [23, Theorem 1.9]. Recall that given a normed vector space E with topological dual space \(E^*\) and a proper convex function \(F: E \rightarrow \mathbb {R}\cup \{+\,\infty \}\), its Legendre–Fenchel transform \(F^* : E^* \rightarrow \mathbb {R}\cup \{+\,\infty \}\) is defined by

$$\begin{aligned} F^* :E^*\rightarrow \mathbb {R}\cup \{+\,\infty \}, \qquad F^*(x^*) := \sup \limits _{x \in E}\Big \{ \langle {x,x^*}\rangle - F(x) \Big \}\;. \end{aligned}$$

Theorem 3.5

(Fenchel–Rockafellar duality) Let E be a normed vector space and \(E^*\) its topological dual. Let \(F,G:E\rightarrow \mathbb {R}\cup \{+\infty \}\) be proper convex functions and denote by \(F^*,G^*:E^*\rightarrow \mathbb {R}\cup \{+\infty \}\) their Legendre–Fenchel transforms. Assume that there is \(z_0\in E\) such that G is continuous at \(z_0\) and \(F(z_0),G(z_0)<\infty \). Then we have:

$$\begin{aligned} \sup \limits _{z\in E}\Big [ - F(z) - G(z) \Big ] = \min \limits _{z^*\in E^*} \Big [ F^*(z^*) + G^*(-z^*) \Big ]\;. \end{aligned}$$
(3.4)

Proof of Theorem 3.4

Let us first note that, by the convexity of the constraint (3.1), any \(\phi \in \mathsf {HJ}^1_\mathcal {X}\) can be approximated uniformly by \(C^1\) functions satisfying (3.1) by convolution (after scaling the function to a slightly larger interval \([-\,\delta ,1+\delta ]\) via Remark 3.2). Therefore, the final part of the theorem follows.

To show the dual representation with \(C^1\) functions, we will apply Theorem 3.5 in the following situation. Let E be the Banach space

$$\begin{aligned} E = C^1\big ([0,1],\mathbb {R}^\mathcal {X}\big ) \times L^2\big ((0,1),\mathbb {R}^{\mathcal {X}\times \mathcal {X}}\big ) . \end{aligned}$$

Since we can identify \(C^1\big ([0,1],\mathbb {R}^{\mathcal {X}}\big )\) with \(\mathbb {R}^\mathcal {X}\times C^0\big ([0,1],\mathbb {R}^{\mathcal {X}}\big )\) via the map \(I : \phi \mapsto (\phi _0, \dot{\phi })\), the dual space \(E^*\) can be identified with

$$\begin{aligned} E^*= \mathbb {R}^\mathcal {X}\times M\big ([0,1],\mathbb {R}^\mathcal {X}\big ) \times L^2\big ((0,1),\mathbb {R}^{\mathcal {X}\times \mathcal {X}}\big ), \end{aligned}$$

where the duality pairing between \((\phi , \Phi )\in E\) and \((b,\sigma ,V)\in E^*\) is given by

$$\begin{aligned} \langle {(\phi ,\Phi ), (b,\sigma ,V)}\rangle&= \langle {\phi _0,b}\rangle + \int _0^1\langle {\dot{\phi }_t,\; \mathrm {d}\sigma (t)}\rangle + \int _0^1\langle \langle {\Phi _t,V_t}\rangle \rangle \; \mathrm {d}t, \end{aligned}$$

keeping in mind that \(\sigma \) is a vector-valued measure.

Define the functionals \(F,G: E \rightarrow \mathbb {R}\cup \{+\,\infty \}\) by

$$\begin{aligned} F(\phi ,\Phi )&= {\left\{ \begin{array}{ll} -\langle {\phi _1,\mu _1}\rangle + \langle {\phi _0,\mu _0}\rangle , &{} \Phi = \nabla \phi ,\\ +\infty , &{} \text {otherwise}, \end{array}\right. }\\ G(\phi ,\Phi )&= {\left\{ \begin{array}{ll} 0, &{} (\phi ,\Phi )\in \mathcal {D},\\ +\infty , &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Here we say that a pair \((\phi ,\Phi )\in E\) belongs to \(\mathcal {D}\) if for all continuous curves \(t\mapsto \eta _t\in \mathbb {R}_+^\mathcal {X}\) we have \(\int _0^1\langle {\dot{\phi }_t,\eta _t}\rangle +\frac{1}{2} \Vert \Phi _t\Vert _{\eta _t}^2\; \mathrm {d}t \le 0\). It is readily checked that F and G are convex. Moreover, setting \(\bar{\phi }(t) = t(-\,1,\ldots ,-\,1)\) and \(\bar{\Phi } \equiv 0\), both F and G are finite at \((\bar{\phi },\bar{\Phi })\) and G is continuous at \((\bar{\phi },\bar{\Phi })\). Note that for \(\phi \in C^1\big ([0,1],\mathbb {R}^\mathcal {X}\big )\) we have \((\phi ,\nabla \phi )\in \mathcal {D}\) if and only if \(\phi \in \mathsf {HJ}_\mathcal {X}^1\) which follows from a simple localisation argument in t. Hence, the supremum in the left-hand side of (3.4) coincides with the supremum in the right-hand side of (3.2).

We will calculate the Legendre–Fenchel transforms of F and G. For F we obtain

$$\begin{aligned} F^*(b,\sigma ,V)&= \sup \limits _{(\phi ,\Phi )\in E} \left\{ \big \langle {(\phi ,\Phi ), (b,\sigma ,V)}\big \rangle - F(\phi ,\Phi ) \right\} \\&= \sup \limits _{\phi } \left\{ \langle {\phi _0, b}\rangle + \int _0^1 \langle {\dot{\phi }_t,\mathrm {d}\sigma (t)}\rangle + \int _0^1\langle \langle {\nabla \phi _t,V_t}\rangle \rangle \; \mathrm {d}t +\langle {\phi _1,\mu _1}\rangle - \langle {\phi _0,\mu _0}\rangle \right\} . \end{aligned}$$

Thus, by homogeneity of the last expression in \(\phi \), one has \(F^*(b,\sigma ,\nu )= + \infty \) unless \((\sigma , V)\) satisfies the continuity equation \(\partial _t \sigma + \nabla \cdot V = 0\) with boundary values \(-\,(\mu _0-b)\) and \(-\,\mu _1\), in the sense that

$$\begin{aligned} \langle {\phi _1,-\mu _1}\rangle - \langle {\phi _0,-(\mu _0-b)}\rangle =\int _0^1\langle {\dot{\phi }_t,\; \mathrm {d}\sigma (t)}\rangle + \int _0^1\langle \langle {\nabla \phi _t,V_t}\rangle \rangle \; \mathrm {d}t \end{aligned}$$
(3.5)

for all \(\phi \in C^1\big ([0,1],\mathbb {R}^\mathcal {X}\big )\). In particular, the distributional derivative of \(\sigma \) belongs to \(L^2([0,1];\mathbb {R}^\mathcal {X})\). Since the antiderivative of a distribution is unique up to a constant, the fundamental theorem of Lebesgue calculus implies that \(\sigma \) has the form \(\mathrm {d}\sigma (t)=\sigma _t\; \mathrm {d}t\) for some curve \((\sigma _t)_t\in H^1([0,1];\mathbb {R}^\mathcal {X})\). Moreover, (3.5) implies \(\sigma _0=-(\mu _0-b)\) and \(\sigma _1=-\mu _1\). Thus, we obtain

$$\begin{aligned} F^*(b,\sigma ,V)&= {\left\{ \begin{array}{ll} 0, &{} (-\sigma ,-V)\in \mathsf {CE}'(\mu _0-b,\mu _1),\\ +\infty , &{} \text {otherwise}, \end{array}\right. } \end{aligned}$$
(3.6)

where \(\mathsf {CE}'\) is defined by dropping the positivity and normalisation condition (i) in the definition of \(\mathsf {CE}\), and we have identified the measure \(\sigma \) with the \(H^1\)-map \(\sigma _t\).

As it suffices to calculate the transform of G at points \((b,\sigma ,V)\) where \(F^*(-b,-\sigma ,-V)\) is finite, we can assume that \(\mathrm {d}\sigma (t)= \sigma _t\; \mathrm {d}t\) with \((\sigma _t)_{t}\in H^1([0,1];\mathbb {R}^\mathcal {X})\). We claim that:

$$\begin{aligned} G^*(b,\sigma ,V) = {\left\{ \begin{array}{ll} \frac{1}{2}\int _0^1\mathcal {A}(\sigma _t,V_t)\; \mathrm {d}t, &{} b=0,\\ +\infty , &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(3.7)

Indeed, it follows that

$$\begin{aligned} G^*(b,\sigma ,V)&= \sup \limits _{(\phi ,\Phi )\in E} \left\{ \big \langle {(\phi ,\Phi ), (b,\sigma ,V)}\big \rangle - G(\phi ,\Phi ) \right\} \\&=\sup \limits _{(\phi ,\Phi )\in \mathcal {D}} \left\{ \langle {\phi _0, b}\rangle + \int _0^1\langle {\dot{\phi }_t, \sigma _t}\rangle + \langle \langle {\Phi _t,V_t}\rangle \rangle \; \mathrm {d}t \right\} . \end{aligned}$$

Since \((\phi , \Phi ) \in \mathcal {D}\) implies \((\phi + c, \Phi ) \in \mathcal {D}\) for all \(c \in \mathbb {R}^\mathcal {X}\), we have \(G^*(b,\sigma ,V)=+\infty \) unless \(b=0\). Moreover, from the definition of \(\mathcal {D}\) we infer that \(G^*(b,\sigma ,V)=+\infty \) unless \(\sigma _t\in \mathbb {R}_{+}^\mathcal {X}\) for a.e. t.

Let us assume that \(b=0\) and \(\int _0^1\mathcal {A}(\sigma _t,V_t)\; \mathrm {d}t<\infty \). Then we obtain

$$\begin{aligned} \begin{aligned} G^*(0,\sigma ,V)&=\sup \limits _{(\phi ,\Phi )\in \mathcal {D}} \left\{ \int _0^1\langle {\dot{\phi },\sigma _t}\rangle + \langle \langle {\Phi _t,V_t}\rangle \rangle \; \mathrm {d}t \right\} \\&\le \sup \limits _{(\phi ,\Phi )\in \mathcal {D}} \bigg \{ \int _0^1 - \frac{1}{2}\Vert \Phi _t\Vert _{\sigma _t}^2 + \langle \langle {\Phi _t,V_t}\rangle \rangle \; \mathrm {d}t \bigg \} \le \frac{1}{2} \int _0^1 \mathcal {A}(\sigma _t, V_t) \; \mathrm {d}t , \end{aligned} \end{aligned}$$
(3.8)

where the first inequality follows from the definition of \(\mathcal {D}\) and the second from (2.4).

It remains to show that we have in fact equality. First we consider a convolution in time yielding smooth pairs \(\sigma ^\varepsilon _t\), \(V^\varepsilon _t\) converging to \(\sigma _t\), \(V_t\) as \(\varepsilon \rightarrow 0\). Then we set for \(\delta > 0\), \(\sigma ^{\delta ,\varepsilon }_t = \sigma _t^\varepsilon + \delta \pi \). By convexity of the action and monotonicity of the mean \(\Lambda \) we have

$$\begin{aligned} \int _0^1\mathcal {A}(\sigma ^{\delta ,\varepsilon }_t,V^{\varepsilon }_t)\; \mathrm {d}t \le \int _0^1\mathcal {A}(\sigma ^{\varepsilon }_t,V^{\varepsilon }_t)\; \mathrm {d}t \le \int _0^1\mathcal {A}(\sigma _t,V_t)\; \mathrm {d}t. \end{aligned}$$
(3.9)

The convexity and lower semicontinuity of A further implies the lower semicontinuity of the action; see [1, Theorem 3.4.3] for a general result on lower semicontinuity of integral functionals and the proof of [5, Theorem 3.2] for the application to the action functional \(\mathcal {A}\). Consequently,

$$\begin{aligned} \int _0^1\mathcal {A}(\sigma _t,V_t)\; \mathrm {d}t= \lim _{\delta \rightarrow 0} \lim _{\varepsilon \rightarrow 0} \int _0^1\mathcal {A}(\sigma ^{\delta ,\varepsilon }_t,V^{\varepsilon }_t)\; \mathrm {d}t. \end{aligned}$$

Now, we can choose in particular \((\phi ^{\delta ,\varepsilon },\Phi ^{\delta ,\varepsilon })\) such that

$$\begin{aligned} \Phi ^{\delta ,\varepsilon }_t = \frac{V^{\varepsilon }_t}{\hat{\sigma }^{\delta ,\varepsilon }_t},\quad \dot{\phi }^{\delta ,\varepsilon }(x)=-\frac{1}{2}\sum _y\partial _1\Lambda \big (\rho ^{\delta ,\varepsilon }_t(x),\rho ^{\delta ,\varepsilon }_t(y)\big )|\Phi ^{\delta ,\varepsilon }_t(x,y)|^2Q(x,y), \end{aligned}$$

where \(\sigma ^{\delta ,\varepsilon }=\rho ^{\delta ,\varepsilon }\pi \).

We claim that \((\phi ^{\delta ,\varepsilon },\Phi ^{\delta ,\varepsilon })\in \mathcal {D}\). To see this, we use the inequality

$$\begin{aligned} \partial _1\Lambda (s,t)u+\partial _2\Lambda (s,t)v\ge \Lambda (u,v) \quad \forall s,t>0,\ u,v\ge 0, \end{aligned}$$

which is an identity for \(s=v, t=u\), see [5, Lemma 2.2]. From this we infer that for any \(\mu =\tilde{\rho }\pi \in \mathcal {P}(\mathcal {X})\) we have

$$\begin{aligned} \begin{aligned} \langle {\dot{\phi }_t^{\delta ,\varepsilon },\mu }\rangle =&-\frac{1}{2}\sum _{x,y}\partial _1\Lambda \big (\rho _t^{\delta ,\varepsilon }(x),\rho _t^{\delta ,\varepsilon }(y)\big )\tilde{\rho }(x) |\Phi ^{\delta ,\varepsilon }_t(x,y)|^2Q(x,y)\pi (x) \\ =&-\frac{1}{4}\sum _{x,y}\Big [\partial _1\Lambda \big (\rho ^{\delta ,\varepsilon }(x), \rho ^{\delta ,\varepsilon }(y)\big )\tilde{\rho }(x)+\partial _2\Lambda \big (\rho ^{\delta ,\varepsilon }(x),\rho ^{\delta ,\varepsilon }(y)\big )\tilde{\rho }(y)\Big ]\\&\times |\Phi _t^{\delta ,\varepsilon }(x,y)|^2 Q(x,y)\pi (x)\\&\le -\frac{1}{4}\sum _{x,y}\Lambda \big (\tilde{\rho }(x),\tilde{\rho }(y) \big )|\Phi ^{\delta ,\varepsilon }_t(x,y)|^2 Q(x,y)\pi (x) = -\frac{1}{2}\Vert \Phi ^{\delta ,\varepsilon }_t\Vert _\mu ^2, \end{aligned} \end{aligned}$$
(3.10)

which proves the claim. Note that for \(\tilde{\rho }=\rho ^{\delta ,\varepsilon }_t\) we obtain equality.

Next we claim that

$$\begin{aligned}&\lim _{\delta \rightarrow 0} \lim _{\varepsilon \rightarrow 0}\int _0^1\langle {\dot{\phi }^{\delta ,\varepsilon },\sigma _t}\rangle \; \mathrm {d}t=-\frac{1}{2}\int _0^1\mathcal {A}(\sigma _t,V_t)\; \mathrm {d}t. \end{aligned}$$

To prove this, we compare the left-hand side and the second line in (3.10). The limit \(\varepsilon \rightarrow 0\) is justified by dominated convergence, since (3.10) yields the majorant

$$\begin{aligned} \frac{1}{2} \Vert \Phi ^{\delta ,\varepsilon }_t\Vert _{\sigma _t}^2 \le \frac{C}{\delta } \mathcal {A}(\sigma _t^\varepsilon , V_t^\varepsilon ) , \end{aligned}$$

where C depends on Q and \(\pi \). The right-hand side converges as \(\varepsilon \rightarrow 0\) by (3.9). The limit \(\delta \rightarrow 0\) is justified by monotone convergence. Similarly, we have

$$\begin{aligned} \lim _{\delta \rightarrow 0} \lim _{\varepsilon \rightarrow 0} \int _0^1\langle \langle {V_t, \Phi _t^{\varepsilon ,\delta }}\rangle \rangle \; \mathrm {d}t = \int _0^1\mathcal {A}(\sigma _t,V_t)\; \mathrm {d}t. \end{aligned}$$

Here, we can use the estimate \(|ab|\le \frac{1}{2}a^2+\frac{1}{2}b^2\) to obtain a majorant that converges by (3.9) as before. Thus the expression in the first braced bracket of (3.8) converges to the right-hand side of (3.8) with this choice of \((\phi ^{\delta ,\varepsilon },\Psi ^{\delta ,\varepsilon })\) as \(\delta ,\varepsilon \rightarrow 0\). A similar argument yields \(G^*(0,\sigma ,V)=\infty \) if \(\int _0^1\mathcal {A}(\sigma _t,V_t)\; \mathrm {d}t=\infty \). Combining (3.6), (3.7) and the fact that \(\mathcal {A}(\sigma ,V)=+\infty \) unless \(\sigma \in \mathbb {R}^\mathcal {X}_+\), we obtain

$$\begin{aligned} F^*(-b,-\sigma ,-V)+ G^*(b,\sigma ,V) = {\left\{ \begin{array}{ll} \frac{1}{2}\int _0^1\mathcal {A}(\sigma _t,V_t)\; \mathrm {d}t, &{} (\sigma ,V)\in \mathsf {CE}(\mu _0,\mu _1),\ b=0,\\ +\infty , &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Thus the infimum in the right-hand side of (3.4) coincides with \(\frac{1}{2}\mathcal {W}(\mu _0,\mu _1)^2\). An application of Theorem  3.5 concludes the proof. \(\square \)

4 Locality of optimal curves

In this section we investigate locality properties for discrete transport geodesics. More precisely, we study the following question: Given two probability measures supported in a subset \(\mathcal {Y}\) of a state space \(\mathcal {X}\), is there an optimal curve connecting them that is supported in \(\mathcal {Y}\)? The crucial tool to analyse this question is the dual characterisation of the transport problem given in the previous section. We prove two types of results.

Firstly, we show that the question can be answered affirmatively, under a simple condition (the retraction property of the subgraph \(\mathcal {Y}\)), which will be introduced below. This property ensures that any competitor in the dual problem on the subgraph can be extended to a competitor on the full graph. We present several examples where this property is satisfied. Later, in Sect. 6, we will show that locality may fail if the retraction property is not satisfied.

We start by introducing the retraction property and we give several examples. To increase readability, we often write subscripts instead of parentheses, e.g., \(Q_{xy} = Q(x,y)\).

A subset \(\mathcal {Y}\subseteq \mathcal {X}\) is said to be connected if any two distinct points \(y, y' \in \mathcal {Y}\) can be connected by a path \(\{y_i\}_{i=0}^n \subseteq \mathcal {Y}\) satisfying \(y_0 = y\), \(y_n = y'\), and \(Q(y_{i-1}, y_i) > 0\) for \(i=1,\ldots ,n\).

Definition 4.1

(Retraction property) A connected subset \(\mathcal {Y}\subseteq \mathcal {X}\) has the retraction property if there exists a map \(T:\mathcal {X}\rightarrow \mathcal {Y}\) such that

  1. (R1)

    \(T(y)=y\) for all \(y\in \mathcal {Y}\);

  2. (R2)

    For all \(y,y' \in \mathcal {Y}\) with \(y\ne y'\), and all \(x \in T^{-1}(y)\), we have

    $$\begin{aligned} \sum _{x'\in T^{-1}(y')} Q(x,x') \le Q(y,y'). \end{aligned}$$

The map T is called a retraction of \(\mathcal {X}\) onto \(\mathcal {Y}\).

Remark 4.2

If the Markov triple \((\mathcal {X},Q,\pi )\) corresponds to a simple random walk (i.e., \(Q(x, y) \in \{0,1\}\) for all \(x, y \in \mathcal {X}\)), the retraction property can be rephrased in graph theoretical terms. Indeed, it is readily verified that the retraction property holds if and only if there exists a map \(T : \mathcal {X}\rightarrow \mathcal {Y}\) with the following properties:

\((R1')\) :

\(T(y) = y\) for all \(y \in \mathcal {Y}\);

\((R2')\) :

If \(x \sim x'\), then \(T(x) = T(x')\) or \(T(x) \sim T(x')\);

\((R3')\) :

If \(x_1' \sim x\), \(x_2' \sim x\), and \(T(x_1') = T(x_2')\) for some \(x_1' \ne x_2'\), then \(T(x) = T(x_1')\).

Definition 4.3

(Restriction) The restriction of a Markov triple \((\mathcal {X},Q,\pi )\) to a connected subset \(\mathcal {Y}\subseteq \mathcal {X}\) is the Markov triple \((\mathcal {Y},Q|_{\mathcal {Y}},\pi |_{\mathcal {Y}})\), where \(Q|_{\mathcal {Y}}\) is the restriction of Q to \(\mathcal {Y}\times \mathcal {Y}\), and \(\pi |_{\mathcal {Y}}\) is the normalised restriction of \(\pi \) to \(\mathcal {Y}\).

Connectedness of \(\mathcal {Y}\) implies that the Markov triple \((\mathcal {Y},Q|_{\mathcal {Y}},\pi |_{\mathcal {Y}})\) is irreducible, and the detailed balance relation is obviously inherited. The following result implies that if \(\mathcal {Y}\) has the retraction property as a subset of \(\mathcal {X}\), it also has this property as a subset of any set \(\mathcal {X}'\) with \(\mathcal {Y}\subseteq \mathcal {X}' \subseteq \mathcal {X}\).

Lemma 4.4

Let \((\mathcal {X}, Q, \pi )\) be a Markov triple and \(\mathcal {Y}\subseteq \mathcal {X}' \subseteq \mathcal {X}\). If \(T : \mathcal {X}\rightarrow \mathcal {Y}\) is a retraction, then its restriction \(T|_{\mathcal {X}'} : \mathcal {X}' \rightarrow \mathcal {Y}\) is a retraction as well.

Proof

This follows immediately from the definition. \(\square \)

We present some examples of sets with the retraction property.

Example 4.5

(Cycle) For \(n \ge 2\), let \(\mathcal {X}=\mathbb {Z}/n\mathbb {Z}\), and set \(Q_{j,j+1}=Q_{j+1,j}=1\) and \(Q_{ij}=0\) otherwise. All computations are to be understood modulo n. We claim that the subset \(\{1,\dots ,k\}\) of \(\mathcal {X}\) has the retraction property if and only if \(2k \le n\). In this case, a retraction is given as follows (cf. Fig. 1):

$$\begin{aligned} T : \mathcal {X}\rightarrow \{1,\dots ,k\},\qquad T(j) = {\left\{ \begin{array}{ll}j &{} \text {if } 1 \le j \le k,\\ 2k-j+1 &{} \text {if } k+1 \le j \le 2k,\\ 1 &{} \text {if } 2k+1 \le j \le n . \end{array}\right. } \end{aligned}$$

Indeed, to check sufficiency, note that \((R1')\) is trivial, \((R2')\) holds since \(n \ge 2k\), and \((R3')\) is readily checked as well. Necessity follows from a simple argument.

Fig. 1
figure 1

Retraction of a 9-cycle \(\mathcal {X}\) onto a 4-point set \(\mathcal {Y}\). The labels indicate the image of the corresponding vertex under the retraction T

Example 4.6

(Grid) Consider \(\mathbb {Z}^d\) with the usual graph structure given by \(Q_{xy}=1\) if \(|x-y|=1\) and \(Q_{xy}=0\) otherwise. Let \(\mathcal {Y}\subseteq \mathbb {Z}^d\) be a nonempty subset of the form \(\mathcal {Y}= \mathcal {R}\cap \mathbb {Z}^d\), where \(\mathcal {R}= \prod _{j=1}^d [a_j,b_j]\) is a hyperrectangle, and let \(\mathcal {X}\) be a connected subgraph of \(\mathbb {Z}^d\) containing \(\mathcal {Y}\). We claim that \(\mathcal {Y}\) has the retraction property. Indeed, it is readily checked that a retraction from \(\mathcal {X}\) to \(\mathcal {Y}\) can be obtained by mapping \(x \in \mathcal {X}\) to the point in \(\mathcal {Y}\) that is closest to x with respect to the Euclidean distance.

Example 4.7

(2-Point space) Assume that Q takes values in \(\{0, 1\}\) and let \(x, y \in \mathcal {X}\) with \(Q_{xy} = 1\). A disjoint decomposition \(\mathcal {X}=A_x\cup A_y\) with \(x\in A_x\) and \(y\in A_y\) is called an x-y cut. An edge \((u,v) \in \mathcal {E}\) is a cross if \(u \in A_x\) and \(v\in A_y\). The subset \(\{x,y\}\) has the retraction property if and only if there exists an x-y cut such that no distinct crosses share a point. The correspondence between x-y cuts with this property and retractions is given by \(T^{-1}(x)=A_x\), \(T^{-1}(y)=A_y\) (Fig. 2).

Fig. 2
figure 2

An x-y-cut associated with a retraction

Example 4.8

(Honeycomb lattice) Let \((\mathcal {X},\mathcal {E})\) be a connected subgraph of the honeycomb lattice and define transition rates by setting \(Q_{xy} = 1\) if \((x,y)\in \mathcal {E}\) and zero otherwise. Then each fundamental cell \(\mathcal {Y}=\{y_1,\dots ,y_6\}\) (see Fig. 3) has the retraction property. Indeed, to obtain a retraction of \(\mathcal {X}\) onto \(\mathcal {Y}\), we partition the plane into 6 sectors separated by rays that originate at the centre of \(\mathcal {Y}\) and intersect the midpoints of the sides of \(\mathcal {Y}\) orthogonally. A retraction is then obtained by mapping each \(x \in \mathcal {X}\) to the unique \(y \in \mathcal {Y}\) that belongs to the same sector (cf. Fig. 3).

Fig. 3
figure 3

Part of the honeycomb lattice with a fundamental cell \(\mathcal {Y}\). The labels indicate the image of the corresponding vertex under the retraction

Example 4.9

(Trees) Assume that the graph \((\mathcal {X}, \mathcal {E})\) is a tree, i.e., it does not contain a cycle. Every subtree \(\mathcal {Y}\) of \(\mathcal {X}\) has the retraction property, and a retraction can be constructed as follows: Fix a vertex \(y \in \mathcal {Y}\). Since \(\mathcal {X}\) is a tree, for every \(x\in \mathcal {X}\) there is a unique path \(\gamma \) without self-intersections connecting x and y. The map assigning to x the first point where the path \(\gamma \) meets \(\mathcal {Y}\) is a retraction of \(\mathcal {X}\) onto \(\mathcal {Y}\). Note that the retraction property depends only on the graph \((\mathcal {X}, \mathcal {E})\) and not on the choice of the transition rates Q (as long as they give rise to the same graph).

Theorem 4.10

(Extension of Hamilton–Jacobi subsolutions) Let \((\mathcal {X}, Q, \pi )\) be a Markov triple, and let \(\mathcal {Y}\) be a connected subset of \(\mathcal {X}\). If \(\mathcal {Y}\) has the retraction property, then every Hamilton–Jacobi subsolution on \(\mathcal {Y}\) can be extended to a Hamilton–Jacobi subsolution on \(\mathcal {X}\).

Proof

Let \(\phi \) be a Hamilton–Jacobi subsolution on \(\mathcal {Y}\), and let T be a retraction of \(\mathcal {X}\) onto \(\mathcal {Y}\). Define \(\bar{\phi } : \mathcal {X}\rightarrow \mathbb {R}\) by \(\bar{\phi } := \phi \circ T\), so that \(\bar{\phi }|_\mathcal {Y}= \phi \) by (R1). We will show that for any \(\bar{\nu } \in \mathcal {P}(\mathcal {X})\), there exists \(\nu \in \mathcal {P}(\mathcal {Y})\) such that

$$\begin{aligned} \langle {\dot{\bar{\phi }}_t,\bar{\nu }}\rangle + \frac{1}{2} \Vert \nabla \bar{\phi }_t\Vert ^2_{\bar{\nu }} \le \langle {\dot{\phi }_t,\nu }\rangle + \frac{1}{2} \Vert \nabla \phi _t\Vert ^2_{\nu } \end{aligned}$$
(4.1)

for a.e. t. To improve readability, we omit the subscript t. As \(\phi \in \mathsf {HJ}_\mathcal {Y}\), the right-hand side of (4.1) is nonpositive, so this suffices to prove the theorem.

For \(\bar{\nu }\in \mathcal {P}(\mathcal {X})\) define \(\nu \in \mathcal {P}(\mathcal {Y})\) by \(\nu := T_\# \bar{\nu }\). Clearly,

$$\begin{aligned} \langle {\dot{\bar{\phi }}, \bar{\nu }}\rangle = \langle { {\dot{\phi }} \circ T , \bar{\nu }}\rangle = \langle {\dot{\phi }, T_\# \bar{\nu }}\rangle = \langle {\dot{\phi }, \nu }\rangle . \end{aligned}$$

It thus remains to show that \(\Vert \nabla \bar{\phi }_t\Vert _{\bar{\nu }} \le \Vert \nabla \phi _t\Vert _{\nu }\).

Splitting the sum we obtain

$$\begin{aligned} \Vert \nabla \bar{\phi }\Vert _{\bar{\nu }}^2&= \frac{1}{2} \sum _{x,x'\in \mathcal {X}}\Lambda (\bar{\nu }_x Q_{xx'}, \bar{\nu }_{x'} Q_{x'x})(\bar{\phi }_x - \bar{\phi }_{x'})^2\\&= \frac{1}{2} \sum _{\begin{array}{c} y,y'\in \mathcal {Y}\\ y\ne y' \end{array}}(\phi _y-\phi _{y'})^2\sum _{\begin{array}{c} x\in T^{-1}(y)\\ x'\in T^{-1}(y') \end{array}}\Lambda (\bar{\nu }_x Q_{xx'}, \bar{\nu }_{x'} Q_{x'x} ). \end{aligned}$$

The concavity and homogeneity of \(\Lambda \) imply

$$\begin{aligned} \begin{aligned}&\sum _{\begin{array}{c} x\in T^{-1}(y)\\ x'\in T^{-1}(y') \end{array}}\Lambda (\bar{\nu }_x Q_{xx'}, \bar{\nu }_{x'} Q_{x'x} )\\&\quad \le \Lambda \Bigg (\sum _{x\in T^{-1}(y)}\sum _{\begin{array}{c} x'\in T^{-1}(y') \end{array}} \bar{\nu }_x Q_{xx'}, \sum _{x'\in T^{-1}(y')} \sum _{\begin{array}{c} x\in T^{-1}(y) \end{array}} \bar{\nu }_{x'} Q_{x'x}\Bigg ) \;. \end{aligned} \end{aligned}$$

Given \(x \in \mathcal {X}\) and \(y,y' \in \mathcal {Y}\) with \(y\ne y'\) and \(T(x) = y\), the retraction property (R2) implies that \(\sum _{x'\in T^{-1}(y')} Q_{xx'} \le Q_{yy'}\) (and the same holds with primed and unprimed variables interchanged). Hence the monotonicity of \(\Lambda \) yields

$$\begin{aligned} \Lambda \Bigg (&\sum _{x\in T^{-1}(y)}\sum _{\begin{array}{c} x'\in T^{-1}(y') \end{array}} \bar{\nu }_x Q_{xx'}, \sum _{x'\in T^{-1}(y')} \sum _{\begin{array}{c} x\in T^{-1}(y) \end{array}} \bar{\nu }_{x'} Q_{x'x}\Bigg ) \\ {}&\le \Lambda \Bigg (Q_{yy'}\sum _{x\in T^{-1}(y)}\bar{\nu }_x, \ Q_{y'y} \sum _{x'\in T^{-1}(y')}\bar{\nu }_{x'}\Bigg ) =\Lambda (\nu _y Q_{yy'}, \nu _{y'} Q_{y'y} ) . \end{aligned}$$

Combining these inequalities, we infer that

$$\begin{aligned} \Vert \nabla \bar{\phi }\Vert _{\bar{\nu }}^2 \le \frac{1}{2}\sum _{y,y'\in \mathcal {Y}}(\phi _y - \phi _{y'})^2\Lambda (\nu _y Q_{yy'}, \nu _{y'} Q_{y'y}) = \Vert \nabla \phi \Vert _{\nu }^2 , \end{aligned}$$

which completes the proof. \(\square \)

The following result shows that any pair of measures supported in a set \(\mathcal {Y}\) with the retraction property can be connected by a geodesic supported in \(\mathcal {Y}\).

Theorem 4.11

(Weak locality under the retraction property) Let \((\mathcal {X}, Q, \pi )\) be a Markov triple, and let \(\mathcal {Y}\) be a subset of \(\mathcal {X}\) with the retraction property. For all \(\mu ^0,\mu ^1 \in \mathcal {P}(\mathcal {X})\) with support in \(\mathcal {Y}\) there exists a minimising \(\mathcal {W}\)-geodesic \((\mu _t)_{t \in [0,1]} \subseteq \mathcal {P}(\mathcal {X})\) connecting \(\mu ^0\) and \(\mu ^1\) such that \(\mu _t\) has support in \(\mathcal {Y}\) for all \(t\in [0,1]\).

In fact, we will show that any \(\mathcal {W}_\mathcal {Y}\)-geodesic \((\mu _t)_t \subseteq \mathcal {P}(\mathcal {Y})\) is also a \(\mathcal {W}_\mathcal {X}\)-geodesic when regarded as a curve in \(\mathcal {P}(\mathcal {X})\).

Proof

Let \((\mu _t)_t\) be a minimising geodesic in \(\mathcal {P}(\mathcal {Y})\) satisfying the continuity Eq. (2.1) with momentum vector field \((V_t)_t\). Consider the extension to \(\mathcal {X}\) defined by \(\bar{\mu }_t(x)=0\) if \(x\notin \mathcal {Y}\) and \(\bar{V}_t(x,x')=0\) if \(x\notin \mathcal {Y}\) or \(x'\notin \mathcal {Y}\). Clearly, \((\bar{\mu }_t,\bar{V}_t)_t\) has the same action as \((\mu _t,V_t)_t\).

Let \(\varepsilon > 0\). Since \((\mu _t,V_t)_t\) is a geodesic in \(\mathcal {P}(\mathcal {Y})\), Theorem 3.4 (applied in \(\mathcal {P}(\mathcal {Y})\)) implies that there exists a Hamilton–Jacobi subsolution \(\phi \in \mathsf {HJ}_\mathcal {Y}\) such that

$$\begin{aligned} \langle {\phi _1,\mu ^1}\rangle -\langle {\phi _0,\mu ^0}\rangle \ge \frac{1}{2} \int _0^1 \mathcal {A}(\mu _t,V_t) \; \mathrm {d}t - \varepsilon . \end{aligned}$$

By Theorem 4.10, \(\phi \) can be extended to a Hamilton–Jacobi subsolution \(\bar{\phi } \in \mathsf {HJ}_\mathcal {X}\). In particular, using Theorem 3.4 once more (this time in \(\mathcal {P}(\mathcal {X})\)),

$$\begin{aligned} \langle {\phi _1,\mu ^1}\rangle -\langle {\phi _0,\mu ^0}\rangle = \langle {\bar{\phi }_1,\mu ^1}\rangle -\langle {\bar{\phi }_0,\mu ^0}\rangle \le \frac{1}{2}\mathcal {W}_\mathcal {X}^2(\mu ^0,\mu ^1) . \end{aligned}$$

Since \(\varepsilon > 0\) is arbitrary, it follows that \(\int _0^1\mathcal {A}(\mu _t,V_t) \; \mathrm {d}t \le \mathcal {W}_\mathcal {X}^2(\mu ^0,\mu ^1)\), which yields the result. \(\square \)

It follows from the previous result that Ricci curvature bounds in the sense of [5, 18] are inherited by subsets with the retraction property. We recall that a Markov triple \((\mathcal {X}, Q,\pi )\) is said to have Ricci curvature bounded from below by \(\kappa \in \mathbb {R}\) if for any \(\mu _0, \mu _1 \in \mathcal {P}(\mathcal {X})\), and for some (equivalently, for any) \(\mathcal {W}\)-geodesic \((\mu _t)\) connecting \(\mu _0\) and \(\mu _1\), the relative entropy \(\mu \mapsto {{\,\mathrm{Ent}\,}}_\pi (\mu ) := \sum _{x \in \mathcal {X}}\mu (x) \log \big (\frac{\mu (x)}{\pi (x)}\big )\) satisfies the following \(\kappa \)-convexity inequality, for any \(0 \le t \le 1\):

$$\begin{aligned} {{\,\mathrm{Ent}\,}}_{\pi }(\mu _t) \le (1-t) {{\,\mathrm{Ent}\,}}_{\pi }(\mu _0) + t {{\,\mathrm{Ent}\,}}_{\pi }(\mu _1) - \frac{\kappa }{2}t(1-t) \mathcal {W}(\mu _0, \mu _1)^2 . \end{aligned}$$

In this case we write \({{\,\mathrm{Ric}\,}}(\mathcal {X}, Q, \pi ) \ge \kappa \); cf. [5, 18] for further details.

Corollary 4.12

Let \((\mathcal {X}, Q, \pi )\) be a Markov triple, and let \(\mathcal {Y}\) be a subset of \(\mathcal {X}\) with the retraction property. If \({{\,\mathrm{Ric}\,}}(\mathcal {X}, Q, \pi ) \ge \kappa \) for some \(\kappa \in \mathbb {R}\), then \({{\,\mathrm{Ric}\,}}(\mathcal {Y}, Q|_\mathcal {Y}, \pi |_\mathcal {Y}) \ge \kappa \) as well.

Proof

Take \(\mu _0, \mu _1 \in \mathcal {P}(\mathcal {Y})\), and let \((\mu _t)_t\) be a \(\mathcal {W}_\mathcal {Y}\)-geodesic connecting them. By Theorem 4.11, \((\mu _t)_t\) is also a geodesic in \(\mathcal {P}(\mathcal {X})\). Since \({{\,\mathrm{Ric}\,}}(\mathcal {X}, Q, \pi ) \ge \kappa \), it follows that \(t \mapsto {{\,\mathrm{Ent}\,}}(\mu _t|\pi )\) is \(\kappa \)-convex. As \({{\,\mathrm{Ent}\,}}_{\pi |_\mathcal {Y}}(\mu _t) = {{\,\mathrm{Ent}\,}}_{\pi }(\mu _t) + \log (\pi (\mathcal {Y}))\) and \(\mathcal {W}_\mathcal {Y}(\mu _0, \mu _1) = \mathcal {W}_\mathcal {X}(\mu _0, \mu _1)\), we infer that \(t \mapsto {{\,\mathrm{Ent}\,}}_{\pi |_\mathcal {Y}}(\mu _t)\) is \(\kappa \)-convex as well, which yields the result. \(\square \)

5 Optimal transport avoids dead ends

In this section we prove the intuitively natural statement that optimal curves do not transport mass into “dead ends”. We formalise this concept by considering the gluing of two Markov triples along a vertex.

Definition 5.1

(Gluing of Markov triples) Let \((\mathcal {X}_1,Q_1,\pi _1)\) and \((\mathcal {X}_2,Q_2,\pi _2)\) be Markov triples, and fix \(x_1\in \mathcal {X}_1\), \(x_2\in \mathcal {X}_2\). The gluing of the two triples at \(x_1, x_2\) is the Markov triple \((\mathcal {X},Q,\pi )\) defined by setting

$$\begin{aligned} \mathcal {X}= (\mathcal {X}_1\sqcup \mathcal {X}_2) / \{x_1, x_2\} \end{aligned}$$

and \(*= [x_1] = [x_2]\). For brevity, let us write \(\mathcal {X}_i' := \mathcal {X}_i \backslash \{x_i\}\). We have canonical injections \(\mathcal {X}_1' \rightarrow \mathcal {X}\), \(\mathcal {X}_2' \rightarrow \mathcal {X}\), and we identify elements of \((\mathcal {X}_1\sqcup \mathcal {X}_2) \backslash \{x_1,x_2\}\) with their respective images. We define transition rates \(Q : \mathcal {X}\times \mathcal {X}\rightarrow \mathbb {R}\) by

$$\begin{aligned} Q(x,y) = \left\{ \begin{array}{ll} Q_i(x,y) &{} \text {if }x, y \in \mathcal {X}_i' ,\\ Q_i(x,x_i) &{} \text {if }x \in \mathcal {X}_i'\text { and }y = *,\\ Q_i(x_i,y) &{} \text {if }x = *\text { and }y \in \mathcal {X}_i',\\ 0&{} \text {otherwise}.\end{array} \right. \end{aligned}$$

It is easy to see that Q is irreducible and reversible, and the unique invariant probability measure is given by

$$\begin{aligned} \pi (x) = \frac{1}{1 - \pi _1(\mathcal {X}_1') \pi _2(\mathcal {X}_2')} \times \left\{ \begin{array}{ll} \pi _1(x)\pi _2(x_2) &{} \text {if }x \in \mathcal {X}_1',\\ \pi _1(x_1)\pi _2(x) &{} \text {if }x \in \mathcal {X}_2',\\ \pi _1(x_1)\pi _2(x_2) &{} \text {if }x = *. \end{array} \right. \end{aligned}$$

Definition 5.2

(Dead end) Let \((\mathcal {X}, Q, \pi )\) be a Markov triple, and let \(\mathcal {X}_1, \mathcal {X}_2 \subseteq \mathcal {X}\). We say that \(\mathcal {X}_2\) is a dead end for \(\mathcal {X}_1\) (and vice versa) if the intersection of \(\mathcal {X}_1\) and \(\mathcal {X}_2\) contains exactly one point (denoted “\(*\)”), and moreover, \(Q(x,y) = Q(y,x) = 0\) whenever \(x \in \mathcal {X}_1'\) and \(y \in \mathcal {X}_2'\). Here, we write \(\mathcal {X}_i' = \mathcal {X}_i \backslash \{ *\}\).

Remark 5.3

The notions of dead end and gluing of Markov triples are compatible in the following sense: Let \((\mathcal {X}, Q, \pi )\) be a Markov triple, and suppose that \(\mathcal {X}_2 \subseteq \mathcal {X}\) is a dead end for \(\mathcal {X}_1 \subseteq \mathcal {X}\) with intersection point \(*\). Then one recovers \((\mathcal {X}, Q, \pi )\) by gluing together the restrictions of \(\mathcal {X}\) to \(\mathcal {X}_1\) and \(\mathcal {X}_2\) at \(*\).

Proposition 5.4

Let \((\mathcal {X}_1,Q_1,\pi _1)\) and \((\mathcal {X}_2,Q_2,\pi _2)\) be Markov triples, and let \((\mathcal {X}, Q, \pi )\) be the Markov triple obtained by gluing the triples at \(x_1 \in \mathcal {X}_1\) and \(x_2 \in \mathcal {X}_2\). Then \(\mathcal {X}_1\) and \(\mathcal {X}_2\) have the retraction property as subsets of \(\mathcal {X}\).

Proof

Define \(T : \mathcal {X}\rightarrow \mathcal {X}_1\) by \(T(x) = x\) for \(x \in \mathcal {X}_1\) and \(T(x) = *\) for \(x \in \mathcal {X}_2'\). One verifies that T indeed defines a retraction by distinguishing cases. \(\square \)

In view of Theorem 4.11, the previous result implies that any two measures \(\mu _0, \mu _1\) supported in (the image of) \(\mathcal {X}_1\) can be connected by a geodesic that is supported in \(\mathcal {X}_1\) for all times; i.e., weak locality holds. We will now show that in fact strong locality holds: any geodesic connecting \(\mu _0\) and \(\mu _1\) has to be supported in \(\mathcal {X}_1\).

Theorem 5.5

Let \((\mathcal {X}_1,Q_1,\pi _1)\) and \((\mathcal {X}_2,Q_2,\pi _2)\) be Markov triples, and let \((\mathcal {X}, Q, \pi )\) be the Markov triple obtained by gluing the triples at \(x_1 \in \mathcal {X}_1\) and \(x_2 \in \mathcal {X}_2\). If \((\mu _t)_{t\in [0,1]}\) is a geodesic in \((\mathcal {P}(\mathcal {X}),\mathcal {W})\) with \({\text {supp}}\mu _0, {\text {supp}}\mu _1\subseteq \mathcal {X}_1\), then \({\text {supp}}\mu _t\subseteq \mathcal {X}_1\) for all \(t\in [0,1]\).

Proof

Let \(t \mapsto V_t \in \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) be an anti-symmetric momentum vector field such that \((\mu , V)\) is a solution to the continuity equation with \(\int _0^1 \mathcal {A}(\mu _t, V_t) \; \mathrm {d}t = \mathcal {W}^2(\mu _0, \mu _1)\). We define a new curve \(t \mapsto \bar{\mu }_t \in \mathcal {P}(\mathcal {X})\) by

$$\begin{aligned} \bar{\mu }_t(x)={\left\{ \begin{array}{ll} \mu _t(x)&{} \text {if } x\in \mathcal {X}_1' ,\\ \mu _t(*)+\sum _{y\in \mathcal {X}_2'}\mu _t(y)&{} \text {if } x=*,\\ 0&{} \text {otherwise}, \end{array}\right. } \end{aligned}$$

and a new anti-symmetric momentum vector field \(t \mapsto \bar{V}_t \in \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) by

$$\begin{aligned} \bar{V}_t(x,y)={\left\{ \begin{array}{ll} V_t(x,y)&{} \text {if } x,y\in \mathcal {X}_1' \cup \{*\},\\ 0&{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

We claim that \((\bar{\mu }, \bar{V})\) solves the continuity Eq. (2.1) as well.

Indeed, this statement trivially holds for any \(x \in \mathcal {X}\backslash \{ *\}\). To prove the claim at \(*\), we note that for any \(y \in \mathcal {X}_2'\),

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}\mu _t(y) = \sum _{x \in \mathcal {X}_2' \cup \{*\}} V_t(x,y). \end{aligned}$$

Therefore, using the anti-symmetry of \(V_t\),

$$\begin{aligned} \sum _{y \in \mathcal {X}_2'} \frac{\mathrm {d}}{\mathrm {d}t}\mu _t(y) = \sum _{y \in \mathcal {X}_2'} \sum _{x \in \mathcal {X}_2' \cup \{*\}} V_t(x,y) = \sum _{y \in \mathcal {X}_2'} V_t(*,y) . \end{aligned}$$

Furthermore,

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}\mu _t(*) = \sum _{y\in \mathcal {X}_1'}V_t(y,*) + \sum _{y\in \mathcal {X}_2'}V_t(y,*) , \end{aligned}$$

hence by another application of the anti-symmetry,

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t}\bar{\mu }_t(*)&= \frac{\mathrm {d}}{\mathrm {d}t}\mu _t(*) + \sum _{y\in \mathcal {X}_2'} \frac{\mathrm {d}}{\mathrm {d}t}\mu _t(y) = \sum _{y\in \mathcal {X}_1'} V_t(y,*) = \sum _{y\in \mathcal {X}} \bar{V}_t(y,*) , \end{aligned}$$

which proves the claim.

For all \(t \in (0,1)\) and \(x, y \in \mathcal {X}\), we clearly have

$$\begin{aligned} A(\mu _t(x) Q(x,y), \mu _t(y) Q(y,x), V_t(x,y)) \ge A(\bar{\mu }_t(x) Q(x,y), \bar{\mu }_t(y) Q(y,x),\bar{V}_t(x,y)) . \end{aligned}$$

Moreover, if \(\mu _t(\mathcal {X}_2') > 0\) for some \(t \in (0,1)\), then there exists \(z \in \mathcal {X}_2'\) such that \(V_t(*, z) > 0\) and \(\Lambda (\mu _t(*) Q(*,z), \mu _t(z) Q(z,*)) > 0\) for all t on a set of positive measure in (0, 1). Therefore,

$$\begin{aligned} A(\mu _t(*) Q(*,z), \mu _t(z) Q(z,*), V_t(*,z)) > 0 = A(\bar{\mu }_t(*) Q(*,z), \bar{\mu }_t(z) Q(z,*),\bar{V}_t(*,z)) . \end{aligned}$$

This strict inequality contradicts the fact that \((\mu _t)_{t\in [0,1]}\) is a geodesic. \(\square \)

6 Nonlocality of optimal transport on the triangle

Consider a Markov triple \((\mathcal {X}, Q, \pi )\) and a connected subset \(\mathcal {Y}\subseteq \mathcal {X}\). In this section we show that locality of geodesics in \(\mathcal {P}(\mathcal {Y})\) may fail if \(\mathcal {Y}\) does not have the retraction property. We consider the simplest possible setting, where \((\mathcal {X}, Q, \pi )\) corresponds to simple random walk on a triangle, and \(\mathcal {Y}\subseteq \mathcal {X}\) is a two-point set. We show that the canonical lift of a geodesic between Dirac measures on the two-point space is not an optimal curve in \(\mathcal {P}(\mathcal {X})\), by constructing a competitor that transports mass along all edges.

Throughout this section we make the following additional assumption on the mean \(\Lambda \).

Assumption 6.1

For any \(s>0\) we have

$$\begin{aligned} \Lambda (s,t)\rightarrow \infty \quad \text {as } t\rightarrow \infty . \end{aligned}$$
(6.1)

If \(\Lambda (0,t)>0\) for \(t>0\), then (6.1) also holds for \(s=0\).

Clearly, this assumption is satisfied for the arithmetic, geometric, and logarithmic means, but not for the harmonic mean.

The main result of this section relies on the following lemma concerning the variation of the action functional on cycles of arbitrary length.

Lemma 6.2

For \(n \ge 3\), let \(\mathcal {X}= \mathbb {Z}/n\mathbb {Z}\) be equipped with transition rates \(Q_{ij}\) such that \(Q_{i,i+1},Q_{i+1,i}>0\) for all \(i\in \mathcal {X}\) and \(Q_{ij}=0\) otherwise. Let \(\mu , \nu \in \mathcal {P}(\mathcal {X})\), and let \(V,U \in \mathbb {R}^{\mathcal {X}\times \mathcal {X}}\) be anti-symmetric, and such that both \(\mathcal {A}(\mu ,V)\) and \(\mathcal {A}(\nu ,U)\) are finite. Assume that \(\mu _1,\mu _2>0\) and \(\mu _i = 0\) for all \(i\ne 1,2\), \(V_{12} \ne 0\), and \(V_{ij} = 0\) for all \(\{i,j\} \ne \{1,2\}\) and that \(U_{12}=0\). For \(\alpha \in [0,1]\) we define \(\mu ^\alpha =(1-\alpha )\mu +\alpha \nu \) and \(V^\alpha =(1-\alpha )V+\alpha U\). Then we have:

$$\begin{aligned}&\lim _{\alpha \rightarrow 0}\frac{1}{\alpha }\Big (\mathcal {A}(\mu ^\alpha ,V^\alpha )-\mathcal {A}(\mu ,V)\Big )\\&\quad = \frac{-V_{12}^2}{\Lambda \big (\mu _1 Q_{12}, \mu _2 Q_{21}\big )}\bigg [1+\frac{\partial _1\Lambda \big (\mu _1Q_{12},\mu _2Q_{21}\big )\nu _1Q_{12}+\partial _2\Lambda \big (\mu _1Q_{12},\mu _2Q_{21}\big )\nu _2Q_{21}}{\Lambda \big (\mu _1 Q_{12}, \mu _2 Q_{21}\big )}\bigg ]\nonumber \\&\qquad +\sum _{i=3}^{n-1}\frac{U_{i,i+1}^2}{\Lambda \big (\nu _iQ_{i,i+1}, \nu _{i+1}Q_{i+1,i}\big )} .\nonumber \end{aligned}$$
(6.2)

Proof

First note that

$$\begin{aligned} \mathcal {A}(\mu ,V) = \frac{V_{12}^2}{\Lambda \big (\mu _1 Q_{12},\mu _2 Q_{21} \big )}. \end{aligned}$$

Using the 1-homogeneity of \(\Lambda \) we observe that

$$\begin{aligned} \mathcal {A}(\mu ^\alpha ,V^\alpha )&=\frac{(1-\alpha )V_{12}^2}{\Lambda \big (\big (\mu _1+\frac{\alpha }{1-\alpha }\nu _1\big )Q_{12},\big (\mu _2+\frac{\alpha }{1-\alpha }\nu _2\big )Q_{21}\big )} + \alpha \sum _{i=3}^{n-1}\frac{U_{i,i+1}^2}{\Lambda \big (\nu _i Q_{i,i+1},\nu _{i+1}Q_{i+1,i}\big )}\\&\quad + \frac{\alpha U_{23}^2}{\Lambda \big (\big (\frac{1-\alpha }{\alpha }\mu _2+\nu _2\big )Q_{23},\nu _3 Q_{32}\big )} + \frac{\alpha U_{n1}^2}{\Lambda \big (\nu _n Q_{n1},\big (\frac{1-\alpha }{\alpha }\mu _1+\nu _1\big )Q_{1n}\big )}\\&=: T_1+T_2+T_3+T_4. \end{aligned}$$

Since \(\mu _1,\mu _2>0\), the first term in (6.2) is well-defined and easily seen to be the limit of \((T_1-\mathcal {A}(\mu ,V))/\alpha \). Obviously, \(T_2/\alpha \) converges to the second term in (6.2). Finally, \(T_3\) vanishes unless \(U_{23}\ne 0\). But in this case, since \((1-\alpha )/\alpha \rightarrow \infty \) as \(\alpha \rightarrow 0\), we see that \(T_3/\alpha \) converges to zero as \(\alpha \rightarrow 0\) by Assumption 6.1. A similar argument applies to \(T_4\). \(\square \)

Now we can prove the nonlocality result.

Theorem 6.3

Let \((\mathcal {X},Q,\pi )\) be a Markov triple with \(\mathcal {X}=\{1,2,3\}\) and such that \(Q(x,y) > 0\) for all \(x\ne y\). Let \((\mu _t)_{t \in [0,1]}\) be a \(\mathcal {W}\)-geodesic connecting \(\mu _0 = \delta _1\) to \(\mu _1 = \delta _2\). Then, \(\mu _t(3)>0\) for some \(0<t<1\).

As \(\mu _t(3)>0\) for some \(0<t<1\), the result implies that mass is transported along the edges (1, 3) and (3, 2).

Proof

Suppose that the geodesic \((\mu _t,V_t)_{t\in [0,1]}\) transports only along the edge (1, 2), i.e., \(V_t(2,3) = V_t(3,1) = 0\) for a.e. \(t \in (0, 1)\). Then \((\mu _t,V_t)\) must be given by the corresponding geodesic on the two point space \(\{1,2\}\). Obviously, we have \(\mu _t(1),\mu _t(2)>0\) for all \(t\in (0,1)\). Let \((\nu ,U) \in \mathsf {CE}_1(\delta _1,\delta _2)\) be a curve of finite action such that \(U_t(1,2)=0\) for a.e. t and \(\nu _t(1),\nu _t(2),\nu _t(3)>0\) for all \(0<t<1\). Define \((\mu ^\alpha ,V^\alpha )\in \mathsf {CE}_1(\delta _1,\delta _2)\) by \(\mu ^\alpha =(1-\alpha )\mu +\alpha \nu \) and \(V^\alpha =(1-\alpha )V+\alpha U\) for \(\alpha \in [0,1]\). Then Lemma 6.2 yields for a.e. t:

$$\begin{aligned} \lim _{\alpha \rightarrow 0}\frac{1}{\alpha }\big [\mathcal {A}(\mu _t^\alpha ,V_t^\alpha )-\mathcal {A}(\mu _t,V_t)\big ]<0. \end{aligned}$$

Consequently, there exists \(\alpha >0\) such that

$$\begin{aligned} \int _0^1\mathcal {A}(\mu _t^\alpha ,V_t^\alpha )\; \mathrm {d}t < \int _0^1\mathcal {A}(\mu _t,V_t)\; \mathrm {d}t, \end{aligned}$$

contradicting the optimality of \((\mu ,V)\). \(\square \)