$W_{1,+}$-interpolation of probability measures on graphs

We generalize an equation introduced by Benamou and Brenier, characterizing Wasserstein W_p-geodesics for p>1, from the continuous setting of probability distributions on a Riemannian manifold to the discrete setting of probability distributions on a general graph. Given initial and final distributions f_0 and f_1, we prove the existence of a curve (f_t) satisfying this Benamou-Brenier equation. We also show that such a curve can be described as a mixture of binomial distributions with respect to a coupling that solves a certain optimization problem.


Introduction
Given some p ≥ 1, we consider the space P p (X) of probability distributions over a metric space (X, d) having a finite p-th moment. On this space we define the Wasserstein distance W p by $W_p(\mu_0, \mu_1)^p := \inf_{\pi \in \Pi(\mu_0, \mu_1)} \int_{X \times X} d(x_0, x_1)^p \, d\pi(x_0, x_1)$ (1), where the set Π(µ 0 , µ 1 ) is the set of couplings of µ 0 and µ 1 , i.e. the set of probability distributions π on X × X having µ 0 and µ 1 as marginals.
A comprehensive study of the minimization problem (1), called the Monge-Kantorovitch problem, can be found in Villani's textbooks [Vil03] and [Vil08]. Let us recall what is important for our purposes: under very general assumptions, it is possible to prove the existence of a minimizer π ∈ Π(µ 0 , µ 1 ) for problem (1), called an optimal coupling, and that W p is indeed a metric on P p (X). Moreover, if we suppose that (X, d) is a geodesic space, i.e. if the distance d(x 0 , x 1 ) is exactly the length of the shortest curve joining x 0 to x 1 , then the metric space (P p (X), W p ) is also a geodesic space. In particular, each couple µ 0 , µ 1 ∈ P 2 (X) can be joined by a curve (µ t ) t∈[0,1] of minimal length for W 2 , called a W 2 -Wasserstein geodesic.
In their seminal papers [Stu06a], [Stu06b] and [LV09], Sturm and independently Lott and Villani studied the links between the geometry of a measured geodesic space (X, d, ν) and the behaviour of the entropy functional along the W 2 -Wasserstein geodesics on P 2 (X). For instance, (X, d, ν) is said to satisfy the curvature condition CD(K, ∞) for some K ∈ R if for each couple of probability distributions µ 0 , µ 1 ∈ P 2 (X) there exists a W 2 -geodesic (µ t ) t∈[0,1] such that $H_\nu(\mu_t) \le (1-t) H_\nu(\mu_0) + t H_\nu(\mu_1) - K \frac{t(1-t)}{2} W_2(\mu_0, \mu_1)^2$ (2), where the relative entropy functional H ν (·) is defined by $H_\nu(\mu) := \int_X \rho \log \rho \, d\nu$ if µ = ρν for some density ρ, and by H ν (µ) := ∞ otherwise.
If the measured geodesic space (X, d, ν) is a compact Riemannian manifold with its usual distance and normalized volume measure, the curvature condition CD(K, ∞) is shown to be equivalent to the bound Ric ≥ K on the Ricci curvature tensor. Another important property is the stability of the condition CD(K, ∞) under measured Gromov-Hausdorff convergence.
Moreover, if CD(K, ∞) is satisfied for some K > 0, one can prove functional inequalities on (X, d, ν) such as the logarithmic Sobolev inequality, which asserts that for any Lipschitz probability density f we have $H_\nu(f\nu) \le \frac{1}{2K} \int_X \frac{|\nabla^- f|^2}{f} \, d\nu$, where |∇ − f | is to be seen as a particular form of the norm of a gradient. As a corollary, it can be shown that under the condition CD(K, ∞) for K > 0 a Poincaré inequality holds: for any Lipschitz function h : X → R such that $\int_X h \, d\nu = 0$, we have $\int_X h^2 \, d\nu \le \frac{1}{K} \int_X |\nabla^- h|^2 \, d\nu$. Since the pioneering works of Sturm and Lott-Villani, the theory of measured geodesic spaces satisfying CD(K, ∞) has been thoroughly studied in a large number of papers, among which the most impressive are the works by Ambrosio, Gigli and Savaré (see for instance [AGS12]) and by Erbar, Kuwada and Sturm ([EKS13]).
Several obstacles prevent us from a direct generalization of Sturm-Lott-Villani theory to the framework of discrete metric spaces. Indeed, if (X, d) is a graph with its usual distance, equation (1) still defines a metric on the space P p (X), but if p > 1 then the length of non-trivial curves in (P(X), W p ) is +∞, which means that it is not a geodesic space. In particular, Wasserstein W 2 -geodesics do not exist in general.
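The loss of the geodesic property can be seen on the smallest possible example. The sketch below assumes a hypothetical graph made of two adjacent vertices and the curve µ t = (1 − t)δ a + tδ b ; on this graph any coupling between µ s and µ t has to move a mass |t − s| across the single edge, so that W p (µ s , µ t ) = |t − s| 1/p . The function names are illustrative only.

```python
def wp_two_point(s: float, t: float, p: float) -> float:
    """W_p between (1-s, s) and (1-t, t) on a graph with two adjacent vertices.

    A mass |t - s| must cross the single edge (distance 1), so the optimal
    cost is |t - s| and W_p = |t - s| ** (1 / p).
    """
    return abs(t - s) ** (1.0 / p)


def polygonal_length(n: int, p: float) -> float:
    """Sum of W_p distances along the uniform partition of [0, 1] into n steps."""
    return sum(wp_two_point(i / n, (i + 1) / n, p) for i in range(n))


if __name__ == "__main__":
    # For p = 1 the sum stays equal to 1; for p = 2 it grows like sqrt(n).
    for n in (10, 100, 1000):
        print(n, polygonal_length(n, 1.0), polygonal_length(n, 2.0))
```

For p = 1 the polygonal length is 1 for every partition, whereas for p > 1 it equals n 1−1/p and diverges as the partition is refined, which is the phenomenon described above.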
Several solutions have been proposed to overcome this difficulty, and there are now many different definitions of Ricci curvature bounds on discrete spaces. The most notable of them are the coarse Ricci curvature, defined by Ollivier in [Oll09], and the Erbar-Maas curvature, defined in [EM12]. The latter is based on the study of the gradient flow of the entropy and presents some similarities with our own approach.
In this paper, we place ourselves in the framework of a connected and locally finite graph G, endowed with its usual graph distance and the counting measure as the reference measure. In this framework, a probability distribution will be denoted by its density, i.e. by a function f : G → R + such that $\sum_{x \in G} f(x) = 1$. Given two probability distributions f 0 and f 1 on G, we investigate the question of the generalization of the notion of W p -geodesic joining f 0 to f 1 in a setting where such a curve does not exist. Our goal is to provide a way to choose, among the set of all W 1 -geodesics joining f 0 to f 1 , a curve which shares some properties satisfied by W p -geodesics for p > 1. Such curves will be called W 1,+ -geodesics on the graph G.
This article is to be seen as the first of a two-paper research work. A forthcoming article will investigate the convexity properties of the entropy functional along those particular W 1 -geodesics, with a view to obtaining a discrete version of equation (2) strong enough to imply discrete versions of log-Sobolev or Poincaré inequalities. This ultimate goal has to be kept in mind even in the present paper because it will motivate the definition of a W 1,+ -geodesic between f 0 and f 1 : along such a curve, some technical tools will allow us to give bounds on the second derivative of the entropy.
Our starting point is the article [BB99], by Benamou and Brenier. In this paper, the authors reformulate the Monge-Kantorovitch problem in terms of velocity fields and prove the following: Theorem 1.1. Let µ 0 , µ 1 be two probability distributions on a Riemannian manifold (M, g) and p > 1. Then $W_p(\mu_0, \mu_1)^p = \inf \int_M \int_0^1 |v_t(x)|^p \, d\mu_t(x) \, dt$ (6), the infimum being taken over the families of probability distributions (µ t ) := (f t d vol) joining µ 0 to µ 1 and all velocity fields (v t ) satisfying the continuity equation $\frac{\partial}{\partial t} f_t(x) + \nabla \cdot (v_t(x) f_t(x)) = 0$, where ∇· is the divergence operator on M . Moreover the minimizing curve (µ t ) t∈[0,1] is the W p -geodesic joining µ 0 to µ 1 .
This theorem has been extended to the framework of separable Hilbert spaces by Ambrosio, Gigli and Savaré in [AGS].
The strategy used by Erbar and Maas in [EM12] is based on a generalization of the minimization problem (6) in the framework of discrete Markov chains. Our approach will consist in defining a discrete version of a characterization of its solutions. More precisely, as pointed out in [BB99], the formal optimality condition for the optimization problem (6) can be written: $\frac{\partial}{\partial t} v_t(x) = -v_t(x) \cdot \nabla v_t(x)$ (7). Another point of view on the formal optimality condition (7) is provided by writing the velocity field (v t (x)) as the gradient of a family of convex functions v t := grad Φ t . As explained for instance in [OV00], it can be proven that such a function Φ satisfies the Hamilton-Jacobi equation $\frac{\partial}{\partial t} \Phi_t(x) + \frac{1}{2} |\nabla \Phi_t(x)|^2 = 0$ (8). It suffices to consider the gradient of equation (8) to recover equation (7).
The links between the convexity of the entropy H(t) of µ t and the Ricci curvature tensor on the manifold M are seen in the following heuristic formula, established by Otto and Villani in [OV00]: $H''(t) = \int_M \left( \|\mathrm{Hess}\, \Phi_t\|_{HS}^2 + \mathrm{Ric}(\nabla \Phi_t, \nabla \Phi_t) \right) d\mu_t$. In particular, the non-negativity of the tensor Ric easily implies that H ′′ (t) ≥ 0.
The formal optimality condition (7) on velocity fields makes sense only when v is regular enough. The question of the regularity of optimal couplings is a difficult topic, see for instance [AGS]. However, what is important for our purposes is that (7) can be used to construct W 2 -geodesics: if (f t (x)) is a smooth family of probability densities satisfying the transport equation (1.1) for a smooth velocity field (v t (x)) satisfying the condition (7), then the curve (f t (x)) is a W 2 -geodesic.
In the simpler framework of the real line R with usual distance and Lebesgue reference measure, it is possible to give an equivalent statement of this result without introducing explicitly the velocity field: Proposition 1.2. Let (f t (x)) be a family of smooth probability densities on R. We define the families of functions (g t ) and (h t ) by equation (10). We suppose that g t (z) > 0 and that the one-dimensional Benamou-Brenier condition (11) holds. Then (f t (x)) is a W p -geodesic for any p > 1.
Apart from regularity issues, which will not play an important role in a discrete framework, the main restriction made in the statement of Proposition 1.2 is the non-degeneracy condition g t (z) > 0. It is quite easy to prove that such a condition implies that f 0 is stochastically dominated by f 1 . In the setting of graphs, we will introduce the notion of W 1 -orientation (see Paragraph 2.2) in order to force the function g t to stay positive.
The main purpose of this article is to study curves in the space of probability distributions on a graph which satisfy a discrete version of the Benamou-Brenier condition (11).
• The goal of Section 2 is to provide a generalization of equations (10) and (11) to this discrete setting. We will first show that these equations can be recovered in a particular form in the case of contraction of measures. Given a couple of probability distributions f 0 , f 1 defined on G, we then endow G with an orientation which will allow us to give a general definition of W 1,+ -geodesics on G. The terminology "W 1,+ -geodesic" will be explained by considering a discrete version of problem (6) when p > 1 is close to 1.
• In Section 3, we look for necessary conditions satisfied by W 1,+ -geodesics on G. In particular, we prove in Theorem 3.18 that if f 0 and f 1 are finitely supported, then any W 1,+ -geodesic (f t ) can be written as a mixture of binomial distributions supported on geodesics of G.
• In Section 4 we prove the existence of W 1,+ -geodesics (f t ) with prescribed initial and final distributions f 0 and f 1 . The construction of such curves suggests strong links with the "Entropic Interpolations" studied in a recent series of papers by Léonard.
We will use the usual graph distance on G: d(x, y) is the length of the shortest path joining x to y. The set of geodesics joining x to y, denoted by Γ x,y , is the set of paths γ joining x to y such that L(γ) = d(x, y). The set of all geodesics on G is denoted by Γ(G).
A coupling π ∈ Π(f 0 , f 1 ) is said to be a W p -optimal coupling for some p ≥ 1 if it is a minimizer for the functional $I_p : \pi \mapsto \sum_{(x, y) \in G \times G} d(x, y)^p \, \pi(x, y)$.
We denote by Π p (f 0 , f 1 ) the set of W p -optimal couplings.
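The following minimal sketch evaluates the functional I p for a given coupling, the graph distance being computed by breadth-first search. The path graph and the two couplings below are hypothetical examples chosen for illustration; they are not taken from the paper.

```python
from collections import deque


def graph_distances(adj, source):
    """Breadth-first search: graph distance from `source` to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def transport_cost(adj, coupling, p):
    """I_p(pi) = sum over (x, y) of d(x, y)**p * pi(x, y)."""
    cache = {}
    total = 0.0
    for (x, y), mass in coupling.items():
        if x not in cache:
            cache[x] = graph_distances(adj, x)
        total += cache[x][y] ** p * mass
    return total


# Hypothetical example: the path graph 0 - 1 - 2 - 3.
PATH = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
# Two couplings of f_0 = (1/2, 1/2, 0, 0) and f_1 = (0, 0, 1/2, 1/2).
MONOTONE = {(0, 2): 0.5, (1, 3): 0.5}
CROSSING = {(0, 3): 0.5, (1, 2): 0.5}

if __name__ == "__main__":
    for p in (1, 2):
        print(p, transport_cost(PATH, MONOTONE, p), transport_cost(PATH, CROSSING, p))
```

In this example both couplings have the same cost I 1 = 2, whereas for p = 2 the monotone coupling is strictly cheaper than the crossing one. This is the kind of degeneracy of the case p = 1 that motivates selecting particular W 1 -geodesics.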

Contraction of measures and the Benamou-Brenier equation
Among early attempts to generalize particular Wasserstein geodesics to the discrete case, one important example is given by the thinning operation: Definition 2.2. Let f be a probability distribution finitely supported on Z + . The thinning of f is the family (T t f ) of probability distributions defined by $T_t f(k) := \sum_{l \ge 0} f(l) \binom{l}{k} t^k (1-t)^{l-k}$, where by convention $\binom{l}{k} = 0$ if l < 0 or if k ∉ {0, . . . , l}.
The operation f → T t f is often seen as a discrete version of the scaling operation $f \mapsto f_t$, where $f_t(x) := \frac{1}{t} f\left(\frac{x}{t}\right)$ (14), and is for instance used to state a weak law of small numbers (see [HJK10]) about the limit in distribution of T 1/n (f ⋆n ) when n → ∞.
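As an illustration, here is a minimal sketch of the thinning operation, assuming the usual binomial formula T t f (k) = Σ l f (l) C(l, k) t^k (1 − t)^(l−k) and a distribution f stored as a list indexed by 0, . . . , n. The function name `thinning` is a choice made for this example.

```python
from math import comb


def thinning(f, t):
    """Binomial thinning T_t f of a distribution f on {0, ..., n}.

    f is a list with f[l] the mass at l; 0 <= t <= 1.
    """
    n = len(f) - 1
    return [
        sum(f[l] * comb(l, k) * t**k * (1 - t) ** (l - k) for l in range(k, n + 1))
        for k in range(n + 1)
    ]


if __name__ == "__main__":
    f = [0.1, 0.2, 0.3, 0.4]
    for t in (0.0, 0.3, 1.0):
        ft = thinning(f, t)
        mean = sum(k * mass for k, mass in enumerate(ft))
        print(t, ft, mean)
```

With this formula T 1 f = f , T 0 f is the Dirac mass at 0, and the mean of T t f is t times the mean of f , which is the discrete counterpart of the scaling behaviour of the continuous operation.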
We know that, given a smooth probability density f on R, the family (f t ) defined by equation (14) is a W p -geodesic for any p ≥ 1. According to Sturm-Lott-Villani theory, the metric space (R, | · |) satisfies the condition CD(0, ∞), so the entropy H(t) of f t with respect to the Lebesgue measure is a convex function of t. On the other hand, a theorem by Johnson and Yu (see [YJ09]) asserts that the entropy of the thinning T t f is also a convex function of t. The proof of this theorem given in [Hil14] relies on the following: Proposition 2.3. Let (f t ) := (T t f ) be the thinning family associated to a probability distribution f = f 1 supported on Z + . We define the families of functions (g t ) and (h t ) by The triple (f t , g t , h t ) then satisfies the discrete Benamou-Brenier equation: Moreover, g t (k) ≥ 0, and if g t (k) = 0 then either f t (k + 1) = 0 or h t (k) = 0.
Remark 2.4. Denoting by ∇ 1 (resp. ∇ 2 ) the left derivative operator (resp. the left second derivative operator) defined by ∇ 1 u(k) := u(k) − u(k − 1) (resp. ∇ 2 u(k) = u(k) − 2u(k − 1) + u(k − 2)), we thus have The proof of the convexity of the entropy along thinning families relies so importantly on Proposition 2.3 that this proof can be used verbatim to prove a stronger statement: Proposition 2.5. Let (f t ) be a family of finitely supported probability distributions on Z. We suppose that the families of functions (g t ) and (h t ), defined by equation (15), satisfy the discrete Benamou-Brenier equation (16) and the non-negativity condition g t (k) ≥ 0. Then the entropy H(t) of f t is a convex function of t.
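The first-order relation between f t and g t can be checked numerically on a thinning family. The sketch below rests on two assumptions that are not spelled out in the text above: that g t is characterized by ∂f t /∂t (k) = −∇ 1 g t (k) with g t vanishing to the left of the support, and that f t is given by the usual binomial thinning formula. Under these assumptions a direct computation gives g t (k) = (k + 1) f t (k + 1)/t for 0 < t ≤ 1, which is manifestly non-negative. The helper names are hypothetical.

```python
from math import comb


def thinning(f, t):
    """Binomial thinning T_t f of a distribution f on {0, ..., n}."""
    n = len(f) - 1
    return [
        sum(f[l] * comb(l, k) * t**k * (1 - t) ** (l - k) for l in range(k, n + 1))
        for k in range(n + 1)
    ]


def flux(f, t):
    """Candidate g_t(k) = (k + 1) * f_t(k + 1) / t, for 0 < t <= 1."""
    ft = thinning(f, t) + [0.0]  # f_t vanishes beyond n
    return [(k + 1) * ft[k + 1] / t for k in range(len(f))]


def max_continuity_defect(f, t, eps=1e-6):
    """max over k of |d/dt f_t(k) + (g_t(k) - g_t(k-1))|, by central differences."""
    up, down = thinning(f, t + eps), thinning(f, t - eps)
    g = flux(f, t)
    defect = 0.0
    for k in range(len(f)):
        dfdt = (up[k] - down[k]) / (2 * eps)
        left_derivative = g[k] - (g[k - 1] if k > 0 else 0.0)
        defect = max(defect, abs(dfdt + left_derivative))
    return defect


if __name__ == "__main__":
    print(max_continuity_defect([0.1, 0.2, 0.3, 0.4], 0.5))
```

The printed defect should be of the order of the finite-difference error, which supports the identity ∂f t /∂t = −∇ 1 g t for this candidate g t .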
Because of the similarities with equation (11), it seems legitimate to consider a family of measures satisfying equation (16) and the non-negativity condition g t (k) ≥ 0 as a pseudo W p -geodesic, for p > 1, along which the entropy functional is convex, which is reminiscent of Sturm-Lott-Villani theory.
The notion of thinning has been extended in [Hil14] to the setting of general graphs in the following way: we consider a probability distribution f 1 defined on G and another probability measure f 0 which is a Dirac mass at a given point o ∈ G. In this case, an interpolating curve (f t ), called contraction of f 1 on o, is defined as a mixture of binomial distributions by $f_t(x) := \sum_{z \in G} f_1(z) \frac{1}{|\Gamma_{o,z}|} \sum_{\gamma \in \Gamma_{o,z}} \mathrm{bin}_{\gamma,t}(x)$, where the binomial distribution on γ is related to the classical binomial distribution by ∀p ∈ {0, . . . , L(γ)} , bin γ,t (γ(p)) := bin L(γ),t (p) and where |Γ o,z | denotes the cardinality of the set Γ o,z of geodesics joining o to z.
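The contraction of a measure can be sketched as follows, assuming that the mixture gives equal weight 1/|Γ o,z | to each geodesic joining o to z, as the description above suggests. The graph is given by a hypothetical adjacency dictionary and f 1 by a dictionary of masses; geodesics are enumerated by walking back from z along vertices whose distance to o decreases by one.

```python
from collections import deque
from math import comb


def bfs_dist(adj, source):
    """Graph distance from `source` to every vertex (connected graph assumed)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def geodesics(adj, o, z):
    """All shortest paths from o to z, each as a list of vertices."""
    dist = bfs_dist(adj, o)
    paths = []

    def extend(path):
        x = path[-1]
        if x == o:
            paths.append(path[::-1])
            return
        for y in adj[x]:
            if dist.get(y) == dist[x] - 1:
                extend(path + [y])

    extend([z])
    return paths


def contraction(adj, o, f1, t):
    """Mixture of binomial distributions along the geodesics from o."""
    ft = {x: 0.0 for x in adj}
    for z, mass in f1.items():
        paths = geodesics(adj, o, z)
        for gamma in paths:
            length = len(gamma) - 1
            for p, x in enumerate(gamma):
                weight = comb(length, p) * t**p * (1 - t) ** (length - p)
                ft[x] += mass / len(paths) * weight
    return ft


# Hypothetical example: the 4-cycle, contracting the Dirac mass at 2 on o = 0.
CYCLE = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

if __name__ == "__main__":
    print(contraction(CYCLE, 0, {2: 1.0}, 0.5))
```

On the 4-cycle there are two geodesics from 0 to 2, and at t = 1/2 the interpolated measure gives mass 1/4 to each of the four vertices.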
A couple of initial and final distributions δ o = f 0 and f 1 being given, we define a partial order on the set of vertices of G by writing x 1 ≤ x 2 if the vertex x 1 belongs to a geodesic γ ∈ Γ o,x2 . If x 1 ∼ x 2 and x 1 ≤ x 2 , we say that (x 1 x 2 ) is an oriented edge and we write x 2 ∈ F (x 1 ), To the oriented graph (G, →) are associated two other oriented graphs: Similarly, we define the graph of oriented triples (T (G), →) := (E(E(G)), →), having as vertices the triples (x 1 x 2 x 3 ) with x 1 → x 2 → x 3 and edges between each couple (x 0 x 1 x 2 ) and (x 1 x 2 x 3 ).
Remark 2.7. The graph G being now oriented, the notations E(G) and T (G) stand for (E(G), →) and (T (G), →), which is a slight abuse of notation. For instance, (xy) ∈ E(G) implies that x → y. This remark will still be valid once the W 1 -orientation on G has been introduced.
Orienting the graph G allows us to define a divergence operator: Similarly, the divergence of a function h : We use this orientation to express the function f t as a product of two functions satisfying interesting differential equations: Proposition 2.9. There exists a couple (P t ), (Q t ) of families of non-negative functions on G such that:

The functions P and Q satisfy the equations
This proposition is proven in [Hil14]. We can now use Definition 2.8 and 2.9 to state a generalized version of Proposition 2.3: (20) 1. The functions f , g and h satisfy the differential equations Remark 2.11. As in the thinning case, Proposition 2.10, and in particular equation (22) are used to study the convexity of the entropy functional along contraction families on graphs.

The W 1 -orientation
It is not possible to use Proposition 2.10 directly to propose a general Benamou-Brenier condition, because such a definition relies on an orientation of the graph G which has been constructed by using the fact that f 0 is a Dirac mass. As a first necessary step in the construction of general W 1,+ -geodesics, we thus need to find a nice orientation on G, depending on the initial and final measures f 0 and f 1 .
The term "nice orientation" is vague, but the study of the thinning and of the contraction families suggests that, in order to have interesting consequences on the convexity of the entropy, we should at least require that g t (x 1 x 2 ) ≥ 0 for every x 1 → x 2 ∈ E(G). As we will see at the end of this paragraph, this requirement can be interpreted in the framework of optimal transportation theory.
We first recall some properties of supports of W 1 -optimal couplings: Definition 2.12. Given a couple f 0 , f 1 of finitely supported measures, we associate the set Equivalently, C(f 0 , f 1 ) is the smallest subset of G×G containing the supports of all the W 1 -optimal couplings between f 0 and f 1 .
As f 0 and f 1 are finitely supported, we can consider the barycenter which by convexity is in Π 1 (f 0 , f 1 ) and which is clearly fully supported in C(f 0 , f 1 ).
Definition 2.16. Let f 0 , f 1 be two finitely supported probability distributions on G.
• An oriented path on the oriented graph (G, →) is an application γ : • We define a partial order relation on the vertices of G by writing x ≤ y if there exists an oriented path joining x to y.
An important property of the W 1 -orientation is the following: Theorem 2.17. Every oriented path on (G, →) is a geodesic.
The following shows that the W 1 -orientation is in some sense stable by restriction: The proof of this fact is inspired by the 'gluing lemma' stated and explained in [LV09]: let We consider the 'gluing' π of these three couplings, defined by: This shows the W 1 -optimality of π and the equality .
Theorem 2.17 shows that whenever π(a, d We now prove: We endow this graph with the W 1orientation with respect to f 0 , f 1 . There exists a family (g t ) : Moreover, there exists a family h t : T (G) → R such that We first prove a general result implying the existence of a family (g t ) such that ∂ ∂t f t (x) = −∇g t (x): Lemma 2.20. Let (G, →) be an oriented graph and u : G → R finitely supported such that x∈G u(x) = 0. Then there exists g : (E(G), →) → R with ∇g = u. Proof. We consider two scalar products, on the spaces of functions defined respectively on G and E(G), defined by The adjoint of the divergence operator ∇ is −∂, where ∂ is the linear operator ∂ defined by (∂u)(xy) := u(y) − u(x), in the sense that for any couple u, a of functions respectively defined on G and E(G). The kernel of ∂ is the one-dimensional space generated by the constant function v = 1. The condition x∈G u(x) = 0 is thus equivalent to < u, v > G = 0 or u ∈ (ker(∂)) ⊥G . We thus want to prove the inclusion (ker(∂)) ⊥G ⊂ im(∇). As the linear spaces we are considering are finite-dimensional, this inclusion is equivalent to (im(∇)) ⊥G ⊂ ker(∂). Let u ∈ (im(∇)) ⊥G . Then for any b : (E(G), →) → R we have < ∇b, u > G = 0, so < b, ∂u > E = 0, which proves that u ∈ ker(∂).
As we have x∈G ∂ ∂t f t (x) = 0, Lemma 2.20 gives the existence of a family (g t ) with ∂ ∂t f t (x) = −∇g t (x). However, this result does not provide an explicit construction of g and in general nothing can be said about its sign.
Proof of Theorem 2.19. Let G ′ be a spanning tree of G, i.e. a tree having the same vertices as G, but with possibly fewer edges. We endow G ′ with the restriction of the orientation on G. According to Lemma 2.20, there exists a family of functions (g t ) : E(G ′ ) → R + satisfying ∂ ∂t f t (x) = −∇g t (x). As G ′ is a tree, we know that removing an edge (x 0 y 0 ) from the graph G ′ will cut it into two disjoint subgraphs G ′ 1 := G ′ 1 (x 0 y 0 ) and G ′ 2 := G ′ 2 (x 0 y 0 ) such that x 0 ∈ G ′ 1 and y 0 ∈ G ′ 2 . Let u (x0y0) be the indicator function of G ′ 1 . This function satisfies (∂u (x0y0) )(xy) = −1 if (xy) = (x 0 y 0 ) and (∂u (x0y0) )(xy) = 0 otherwise, which implies: We want to prove that g t (x 0 y 0 ) ≥ 0. Actually we will prove that the function t → z∈G ′ By Proposition 2.18, we know that if π(x, y) > 0 then x ≤ y. In particular, we cannot have x ∈ G ′ 2 and y ∈ G ′ 1 . Equivalently, if x ≤ y, π(x, y) > 0 and y ∈ G 1 then x ∈ G 1 . Consequently, we have Furthermore, as (x 0 y 0 ) is an oriented edge, we know by the definition of W 1 -orientation that there exists (x, y) ∈ C(f 0 , f 1 ) such that x ≤ x 0 ≤ y 0 ≤ y. In particular, x ∈ G ′ 1 , y ∈ G ′ 2 and π(x, y) > 0.
This proves that the inequality (31) is actually strict, which shows the positivity of the family of functions (g t ) on E(G ′ ). The first point of Theorem 2.19 is proven by extending g t to E(G), setting g t (xy) := 0 if (xy) ∉ E(G ′ ).
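The construction on a spanning tree can be made explicit. The sketch below assumes that the divergence of a function g on oriented edges is (∇g)(x) = Σ y∈F (x) g(xy) − Σ y∈E(x) g(yx), which is the convention compatible with the adjoint relation used in the proof of Lemma 2.20. For an oriented tree and a function u with total sum zero, it solves ∇g = u by setting g(x 0 y 0 ) equal to the sum of u over the component containing x 0 once the edge (x 0 y 0 ) is removed. In the setting of Theorem 2.19 one would take u = −∂f t /∂t. All names are illustrative.

```python
def solve_divergence_on_tree(edges, u):
    """Solve (div g) = u on an oriented tree.

    edges: list of oriented edges (x, y) forming a tree on the keys of u.
    u: dict vertex -> value, with total sum zero.
    Returns a dict edge -> g(edge).
    """
    neighbours = {x: set() for x in u}
    for x, y in edges:
        neighbours[x].add(y)
        neighbours[y].add(x)
    g = {}
    for x0, y0 in edges:
        # Component containing x0 after removing the edge (x0, y0).
        component, stack = {x0}, [x0]
        while stack:
            x = stack.pop()
            for y in neighbours[x]:
                if {x, y} == {x0, y0} or y in component:
                    continue
                component.add(y)
                stack.append(y)
        g[(x0, y0)] = sum(u[z] for z in component)
    return g


def divergence(edges, g, vertices):
    """(div g)(x) = outgoing flux minus incoming flux at x."""
    div = {x: 0.0 for x in vertices}
    for x, y in edges:
        div[x] += g[(x, y)]
        div[y] -= g[(x, y)]
    return div


if __name__ == "__main__":
    tree = [(0, 1), (1, 2), (1, 3)]
    u = {0: 0.5, 1: 0.0, 2: -0.3, 3: -0.2}
    g = solve_divergence_on_tree(tree, u)
    print(g, divergence(tree, g, u))
```

On a tree the solution is unique, which is why the sign of g can be read directly from the mass balance between the two components, as in the proof above.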
The existence of a family of functions (h t ) such that ∂ ∂t g t = −∇h t is proven by Lemma 2.20. We only need to check that (xy)∈E(G) ∂ ∂t g t (xy) = 0. We are actually going to prove the stronger statement: To prove this fact, we consider the function u := (x0y0)∈E(G ′ ) u (x0y0) . The function u satisfies (∂u)(xy) = 1 for every x → y ∈ E(G ′ ). We then have: Let π ∈ Π 1 (f 0 , f t ). We know by Proposition 2.18 that if π(x, y) > 0 then x ≤ y. On the other hand, if x ≤ y then there exists a path x = γ 0 → · · · → γ n = y and we have u(y) − u(x) = (u(γ n ) − u(γ n−1 )) + · · · + (u(γ 1 ) − u(γ 0 )) = n = d(x, y), so we have Differentiating with respect to t shows that the sum (xy)∈E(G) g t (xy) is constant and equal to W 1 (f 0 , f 1 ). To finish the proof of the theorem, we extend (h t ) to T (G) by defining h t ( Actually, Theorem 2.19 can be strengthened in the following way: Proposition 2.21. In Theorem 2.19, we can replace the assertion ∀(xy) ∈ E(G) , g t (xy) ≥ 0 by ∀(xy) ∈ E(G) , g t (xy) > 0.
Proof. The proof of Theorem 2.19 allowed us to construct, given a spanning tree G ′ ⊂ G, a family of functions (g G ′ t ) such that g G ′ t (xy) > 0 when (xy) ∈ E(G ′ ) and g G ′ t (xy) = 0 when (xy) / ∈ E(G ′ ). But for each edge (x 0 y 0 ) ∈ E(G) there exists a spanning tree G ′ ⊂ G with (x 0 y 0 ) ∈ E(G ′ ). We define a family (g t ) : E(G) → R as the barycenter where T is the (finite) set of spanning trees for G. Then g t > 0 and satisfies the conditions of Theorem 2.19. We finally construct a suitable family (h t ) by defining h t := 1 as in the proof of Theorem 2.19.

Definition of W 1,+ -geodesics
Having now constructed an orientation of G associated to each couple of finitely supported probability distributions f 0 , f 1 ∈ P(G), we propose a definition of W 1,+ -geodesic inspired by Proposition 2.10: Definition 2.22. Let G be a graph, W 1 -oriented with respect to a couple of finitely supported probability distributions f 0 , f 1 . A family (f t ) is said to be a W 1,+ -geodesic if: 1. The curve (f t ) is a W 1 -geodesic.
2. There exist two families (g t ) and (h t ), defined respectively on E(G) and T (G), such that ∂ ∂t f t = −∇g t and ∂ ∂t g t = −∇h t .
3. For every (xy) ∈ E(G) we have g t (xy) > 0.
4. The triple (f t , g t , h t ) satisfies the Benamou-Brenier equation (32).
Remark 2.23. In the sequel, the assertion "let (f t ) be a W 1,+ -geodesic" means "let ((f t ), (g t ), (h t )) be a triple of families of functions satisfying the conditions of Definition 2.22". This is an abuse of language because nothing is a priori known about the uniqueness of the families (g t ) and (h t ) associated to a W 1,+ -geodesic.
Remark 2.24. We can check that any contraction of measure on a graph is also a W 1,+ -geodesic: if f 0 = δ o is a Dirac measure, then the set Π 1 (f 0 , f 1 ) has only one element, and it is easy to prove that the W 1 -orientation with respect to f 0 , f 1 coincides with the orientation used for contraction of measures. Proposition 2.10 shows that the other points of Definition 2.22 are satisfied by contraction families.
It is possible to state (32) in terms of two different velocity fields: and the velocity functions V +,t and V −,t by The following differential equations then hold: Proof. We use the definitions of g t and h t and then apply the Benamou-Brenier equation (32) to write: The second formula is proven by similar methods.
We now give some heuristic arguments explaining the terminology 'W 1,+ -geodesic'. Let us consider the minimization problem described by equation (6) of Theorem 1.1, when the parameter p = 1 + ε is close to 1. We use the expansion a 1+ε = a exp(ε log(a)) = a + ε a log(a) + O(ε 2 ), valid for a > 0, to write $\int_M \int_0^1 |v_t(x)|^{1+\varepsilon} \, d\mu_t(x) \, dt = \int_M \int_0^1 |v_t(x)| \, d\mu_t(x) \, dt + \varepsilon \int_M \int_0^1 |v_t(x)| \log |v_t(x)| \, d\mu_t(x) \, dt + O(\varepsilon^2)$. The integral $\int_M \int_0^1 |v_t(x)| \, d\mu_t(x) \, dt$ is exactly equation (6) for p = 1. We thus know, by Theorem 1.1, that the minimizers of this integral over the set of families (f t ) of probability measures with f 0 , f 1 prescribed and ∂ ∂t f t (x) + ∇ · (v t (x)f t (x)) = 0 are exactly the W 1 -geodesics joining f 0 to f 1 . This suggests the following: Definition 2.26. We say that a curve (f t ) of probability measures on a Riemannian manifold M is a W 1,+ -geodesic on M if it is solution to the minimization problem $\inf \int_M \int_0^1 |v_t(x)| \log |v_t(x)| \, d\mu_t(x) \, dt$, where the infimum is taken over the set of all W 1 -geodesics between f 0 and f 1 and where the velocity field (v t ) is defined by the continuity equation ∂ ∂t f t (x) + ∇ · (v t (x)f t (x)) = 0. The formal optimality condition on (v t ) obtained by applying Euler-Lagrange equations is the same as for W p -geodesics, namely equation (7). The next proposition shows that W 1,+ -geodesics on a graph can be related to a minimization problem similar to the continuous one described in Definition 2.26: Proposition 2.27. Let G be a graph, W 1 -oriented with respect to two finitely supported probability distributions f 0 , f 1 . We consider the problem where the infimum is taken over the set of W 1 -geodesics (f t ) between f 0 and f 1 such that the velocity v +,t (xy) is defined by equation (33) from a positive family (g t ) with ∂ ∂t f t (x) = −∇g t (x). We suppose that there exists a W 1,+ -geodesic (f t ) joining f 0 to f 1 .
Then (f t ) is a critical point for I + in the following sense: if (u t ) is a family of functions defined on E(G) satisfying the boundary conditions u 0 (xy) = u 1 (xy) = 0, then Remark. Recall that, given a W 1 -geodesic (f t ), the continuity equation ∂ ∂t f t (x) = −∇g t (x) may be solved by a family (g t ) which is not necessarily always positive. We restrict ourselves to the families of positive (g t ), which always exist by Proposition 2.21, in order to write |v +,t (xy)| = v +,t (xy).
Proof of Proposition 2.27. When η is small, we have the expansion On the other hand, we use the boundary conditions u 0 = u 1 = 0 to write: which proves that I + f + η∇u, g − η ∂u ∂t = I + (f, g) + O(η 2 ). Remark. Similarly, it can be proven that a W 1,+ -geodesic is also critical for the functional

W 1,+ -geodesics as mixtures of binomial distributions
W 1,+ -geodesics have been constructed as generalizations of contraction families, which have been defined as mixture of binomial distributions. In this section, we fix a W 1,+ -geodesic (f t ) on G, joining two finitely supported probability distributions f 0 , f 1 ∈ P(G). It will always be assumed that the graph G is W 1 -oriented with respect to f 0 , f 1 and that every path is an oriented path, thus a geodesic, by Theorem 2.17.
The main purpose of this section is to prove Theorem 3.18: (f t ) can also be expressed as a mixture of binomial measures, with respect to a coupling π ∈ Π(f 0 , f 1 ) solution to a certain minimization problem. The key ingredients to the proof of this theorem are the study of the behaviour of (f t ) along particular geodesics of G, called extremal and semi-extremal geodesics, and the construction of two sub-Markov kernels K, K ⋆ on G associated to (f t ).

Extremal geodesics
Recall that we write x 2 ∈ F (x 1 ) and x 1 ∈ E(x 2 ) if x 1 ≤ x 2 and d(x 1 , x 2 ) = 1 or equivalently if (x 1 x 2 ) is an oriented edge of G. If γ is a geodesic of G, it will be sometimes convenient to use the notation γ i := γ(i).
Definition 3.1. Let γ be a geodesic on G.
Proof. If L(γ) = 0, equation (41) is equivalent to ∂ ∂t f t (γ 0 ) = −(∇g t )(x 0 ), which is true by the definition of (g t ). If L(γ) ≥ 1, we notice that Proposition 2.25 gives: Multiplying by C γ (t) and applying equation (42) leads to the result. Equation (41) takes a simpler form in the case where the set E(e 0 (γ)) (or F (e 1 (γ)), or both) is empty. This motivates the following: Definition 3.3. We define the particular subsets of vertices of G: • The set of initial vertices A ⊂ G contains every x 1 ∈ G such that E(x 1 ) is empty.
• The set of final vertices B ⊂ G contains every x 1 ∈ G such that F (x 1 ) is empty.
Remark 3.4. The sets A and B are both non-empty. If we suppose for instance that B is empty, then we can construct an infinite sequence (x n ) n≥0 in G such that x n+1 ∈ F (x n ). But, f 0 and f 1 being finitely supported and G being locally finite, the set of oriented edges of G is finite, so x p = x q for a couple of indices q > p. This means that there exists a non-trivial oriented path γ joining x p to itself, which is a contradiction because γ is a geodesic of G by Theorem 2.17.
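For a finite oriented graph given by its list of oriented edges, the sets A and B of Definition 3.3 can be computed directly. The sketch below is only illustrative; the function name and the example graph are hypothetical.

```python
def initial_and_final_vertices(vertices, oriented_edges):
    """Return (A, B): vertices without predecessor, vertices without successor."""
    has_predecessor = {y for _, y in oriented_edges}
    has_successor = {x for x, _ in oriented_edges}
    initial = {x for x in vertices if x not in has_predecessor}
    final = {x for x in vertices if x not in has_successor}
    return initial, final


if __name__ == "__main__":
    vertices = {0, 1, 2, 3}
    edges = [(0, 1), (1, 2), (1, 3)]
    print(initial_and_final_vertices(vertices, edges))
```

When the orientation has no oriented cycle, as is the case for the W 1 -orientation, both returned sets are non-empty, in agreement with the remark above.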
An immediate corollary of Proposition 3.2 is the following: Proposition 3.5. Let γ be a geodesic of G.
• If γ ∈ EΓ, then C γ (t) = C γ is a constant function of t.
Proof. If γ ∈ EΓ, then the sets E(e 0 (γ)) and F (e 1 (γ)) are empty, which by Proposition 3.2 shows that C γ is a constant function of t. We prove the second point by induction on m = m(γ) := sup{L(γ) :γ ∈ SEΓ 2,x }, which only depends on the endpoint e 1 (γ) = x . If m = 0 then γ ∈ EΓ and this case has been considered in the first point. We now fix a geodesic γ ∈ SEΓ 1,x such that m(γ) ≥ 1. We apply Proposition 3.2 and use the fact that e 0 (γ) ∈ A to write: It is easily shown that, for z ∈ F (x), m(γ ∪ {z}) = m(γ) − 1, which proves by induction on m that C γ (t) is polynomial in t of degree less than m(γ).
Sub-Markov kernels associated to a W 1,+ -geodesic
The fact that the function C γ is constant and positive on extremal geodesics allows us to introduce a useful function on ordered subsets of G: Definition 3.6. Given an ordered p-tuple z 1 ≤ z 2 ≤ · · · ≤ z p of vertices of G, we define where E(z 1 , . . . z p ) ⊂ EΓ is defined by: If γ is a geodesic of G, we denote by m(γ) the number m(e 0 (γ(0)), · · · e 1 (γ)).
Writing that both sets have the same cardinality gives the result.
• The operators K and K * are adjoint for the scalar product < f, g >:= x∈G f (x)g(x)m(x).
• The iterated kernel K n is supported on the set of couples (x n , x 0 ) such that x 0 ∈ E n (x n ), i.e. such that x 0 ≤ x n and d(x n , x 0 ) = n. For such a couple we have • Similarly, for x n ∈ F n (x 0 ), i.e. for x n ≤ x 0 such that d(x n , x 0 ) = n we have • The operators K and K * are nilpotent.
Proof. The first point comes from the fact that, if x 0 / ∈ B, there exists a bijection between the set E(x 0 ) and the disjoint union x1∈F (x0) E(x 0 , x 1 ). The second point is proven similarly. The third point is proven by noticing that both scalar products < Kf, g > and < f, K * g > are equal to To prove the fourth point, we write the general formula for the iterated kernel for some n ≥ 2: The product K(x n , x n−1 ) · · · K(x 1 , x 0 ) is non-zero if and only if x 0 → · · · → x n , i.e. if (x 0 , . . . , x n ) is a geodesic. This proves that K n (x n , x 0 ) > 0 implies that x 0 ∈ E n (x n ). Moreover we have: by Proposition 3.7. The fifth point is proven similarly. The nilpotency of K and K * comes from the fact that (G, →) has a finite diameter: if n > Diam(G) then K n = 0 and (K * ) n = 0.
Remark 3.10. The first point of Proposition 3.9 shows that K can easily be transformed into a Markov kernel: it suffices to add a vertex ω (often called "cemetery") to G and oriented edges ω → x for every x ∈ A. The sub-Markov kernel K is extended into a Markov kernel on G ∪ ω by defining K(ω, ω) = 1 and K(ω, x) = 1 for every x ∈ A. The kernel K * can be treated similarly, by considering the oriented edges (x, ω) for x ∈ B.

Polynomial structure of W 1,+ -geodesics
In this paragraph we use properties of the functions C γ (t) and of the sub-Markovian kernels K, K ⋆ to give an expression of (f t ) as a mixture of binomial distributions on geodesics of G.
A direct calculation proves the following fundamental result: Proposition 3.11. Let x ∈ G be a vertex and γ, γ̃ be two geodesics on G with γ ∈ SEΓ 1,x and γ̃ ∈ SEΓ 2,x . Then where γ ∪ γ̃ is the concatenation of γ and γ̃.
Remark 3.12. A first consequence of Propositions 3.5 and 3.11 is the fact that, for any x ∈ G, f t (x) is a polynomial function of t such that deg(f t (x)) ≤ Diam(G).
We also use Proposition 3.11 to show the following:
Proposition 3.13. For $x \in G$, we consider two semi-extremal curves $\gamma^{(1)}, \gamma^{(2)} \in SE\Gamma_{1,x}$. The quotient $C_{\gamma^{(1)}}(t) / C_{\gamma^{(2)}}(t)$ does not depend on $t$ and is equal to $m(\gamma^{(1)}) / m(\gamma^{(2)})$. Furthermore, we have
Proof. Let $\tilde\gamma$ be in $SE\Gamma_{2,x}(G)$. Then Proposition 3.11 shows that We use the fact that this quotient does not depend on $\tilde\gamma$ to write The second point is proven by writing
We now introduce two families of functions which play the same role as in the case of contraction of measures:
Definition 3.14. We define the functions $P_t(x)$ and $Q_t(x)$ by
Proposition 3.15. The functions $f_t$, $g_t$ and $h_t$ are related to $P_t$, $Q_t$ and $m$ by
Proof. To prove the first point, we notice that the concatenation map $(\gamma^{(1)}, \gamma^{(2)}) \mapsto \gamma^{(1)} \cup \gamma^{(2)}$ is a bijection between the sets $SE\Gamma_{1,x_0} \times SE\Gamma_{2,x_0}$ and $E(x_0)$. We then use Proposition 3.11 to write: To prove the second point, given a couple of vertices $x_0 \to x_1$ we consider the bijection between the sets $SE\Gamma_{1,x_0} \times SE\Gamma_{2,x_1}$ and $E(x_0, x_1)$ given by the concatenation $(\gamma^{(1)}, \gamma^{(2)}) \mapsto \gamma^{(1)} \cup \gamma^{(2)}$. Moreover, if $\gamma^{(1)} \in SE\Gamma_{1,x_0}$ and $\gamma^{(2)} \in SE\Gamma_{2,x_1}$ have length $L_1 \ge 2$ and $L_2 \ge 2$ we have: Summing over all $\gamma^{(1)}, \gamma^{(2)}$ gives Replacing $f_t(x_0)$ and $f_t(x_1)$ by their expressions in terms of $P_t$, $Q_t$ proves the second point. The third point is simply proven by using the Benamou-Brenier equation:
Proposition 3.16. The functions $P_t$ and $Q_t$ satisfy the differential equations
Proof. When applied to semi-extremal geodesics, Proposition 3.2 takes a simpler form. More precisely, if $\gamma^{(2)} \in SE\Gamma_{2,x_0}$, we have On the other hand, by Proposition 3.13, we have: Summing this last equation over $x_{-1} \in E(x_0)$ gives the result. The differential equation for $Q_t(x_0)$ is proven similarly.
Proposition 3.17. There exist two functions $a, b : G \to \mathbb{R}$ such that
Proof. For $x \in G$, let $a(x) := P_0(x)$ be the constant term of the polynomial $t \mapsto P_t(x)$. Using Proposition 3.16 and Proposition 3.9, we have The proof of the second point is quite similar: define $\tilde Q_t(z) := Q_{1-t}(z)$ and $b(y) := \tilde Q_0(y) = Q_1(y)$. As we have $\frac{\partial \tilde Q_t(z)}{\partial t} = (K^* \tilde Q_t)(z)$, we use again Proposition 3.9 to conclude.
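The mechanism behind this proof can be made explicit. The following is a sketch under an assumption: it supposes that the first equation of Proposition 3.16 reads $\partial_t P_t = K P_t$, by analogy with the identity $\partial_t \tilde Q_t = K^* \tilde Q_t$ used above; the exact statement may carry additional normalisations. A linear differential equation driven by a nilpotent operator has a polynomial solution, because the exponential series terminates:

```latex
\frac{\partial P_t}{\partial t} = K P_t, \quad P_0 = a
\qquad \Longrightarrow \qquad
P_t = \exp(tK)\, a = \sum_{n=0}^{\mathrm{Diam}(G)} \frac{t^n}{n!}\, K^n a ,
```

since $K^n = 0$ for $n > \mathrm{Diam}(G)$ by Proposition 3.9. The same reasoning applied to $\tilde Q_t$ with $K^*$ and $\tilde Q_0 = b$ would give $Q_t = \tilde Q_{1-t} = \sum_{n} \frac{(1-t)^n}{n!} (K^*)^n b$.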
We are now ready to write the $W_{1,+}$-geodesic $(f_t)$ as a mixture of binomial distributions:
Theorem 3.18. For any couple of vertices $x \le y \in G$ we define the binomial probability distribution $\mathrm{bin}_{(x,y),t}$ on $G$, associated to the map $m$ and supported on the set of vertices $z \in G$ such that $x \le z \le y$, by The $W_{1,+}$-geodesic $(f_t)_{t \in [0,1]}$ is a mixture of such binomial distributions:
Proof. The theorem follows from the calculation: and from the fact that $\frac{m(x,z)\, m(z,y)}{m(z)} = m(x, z, y)$ (by Proposition 3.7).
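To give a feeling for such mixtures, here is a minimal sketch in the simplest possible setting. It assumes a path graph with integer vertices, where there is a single geodesic between $x \le y$, and it assumes that the weights coming from $m$ play no role, so that $\mathrm{bin}_{(x,y),t}$ reduces to the ordinary binomial distribution with parameters $d(x,y) = y - x$ and $t$. The coupling below and the function names are hypothetical and chosen for illustration; on a general graph the distribution also involves $m$.

```python
from math import comb


def binomial_on_segment(x, y, t):
    """Ordinary binomial distribution on the vertices x, x+1, ..., y of a path graph."""
    n = y - x
    return {x + k: comb(n, k) * t**k * (1 - t)**(n - k) for k in range(n + 1)}


def mixture(coupling, t):
    """f_t(z) = sum over couples (x, y) of coupling[(x, y)] * bin_{(x,y),t}(z)."""
    f_t = {}
    for (x, y), weight in coupling.items():
        for z, p in binomial_on_segment(x, y, t).items():
            f_t[z] = f_t.get(z, 0.0) + weight * p
    return f_t


# Hypothetical coupling supported on couples x <= y of the path {0, 1, 2}.
coupling = {(0, 1): 0.25, (0, 2): 0.25, (1, 1): 0.25, (1, 2): 0.25}

f_start = mixture(coupling, 0.0)  # first marginal of the coupling
f_half = mixture(coupling, 0.5)   # interpolated distribution at t = 1/2
f_end = mixture(coupling, 1.0)    # second marginal of the coupling
```

At $t = 0$ and $t = 1$ the mixture returns the two marginals of the coupling, and each $f_t(z)$ is a polynomial in $t$ of degree at most the diameter of the graph, in line with Remark 3.12.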

Existence of W 1,+ -geodesics
In the previous section, we showed that any $W_{1,+}$-geodesic $(f_t)$ can be expressed as a mixture of binomial distributions with respect to a certain coupling between $f_0$ and $f_1$. We now turn to the question of the existence of a $W_{1,+}$-geodesic $(f_t)$ joining two fixed probability distributions $f_0$, $f_1$. Throughout this section, we fix such a couple and endow the underlying graph $G$ with the $W_1$-orientation associated to $f_0$, $f_1$.
• If $p \ge 2$ and $\gamma : \{0, \ldots, p\} \to G$ is a geodesic, then
3. An equivalent way to define the extension of $m$ is to define $m(\gamma)$ on extremal geodesics using equation (58) and to extend it to general $(p+1)$-tuples as in Definition 3.6, the quantity $m(\gamma)$ playing the role of $C_\gamma$.
Theorem 4.4. A $W_1$-geodesic $(f_t)$ is a $W_{1,+}$-geodesic if and only if there exist:
• A function $m : E(G) \to \mathbb{R}_+^*$ satisfying $\nabla m(x) = 0$ for $x \notin A \cup B$, extended to ordered families of $G$.
• A couple of non-negative functions a, b : G → R + , such that equations (56) and (57) hold.
Proof. The "only if" part of Theorem 4.4 is exactly Theorem 3.18. Indeed, the restriction to E(G) of the function m constructed from a W 1,+ -geodesic (f t ) satisfies ∇m = 0 outside of A ∪ B, and using Definition 4.1 to extend this restriction to ordered families allows us to recover the original m. Moreover, the functions a and b introduced in Proposition 3.17 are non-negative: a(x) is the constant term of the polynomial P t (x), which is non-negative for every t ∈ [0, 1], and the same goes for b(x).
Conversely, let $(f_t)$ be a curve satisfying the assumptions of Theorem 4.4. We define the polynomial functions Direct calculations show that $f_t(z) = m(z) P_t(z) Q_{1-t}(z)$. Moreover, using the definition of $m(x, z)$ and $m(z, y)$, one can easily prove that $P_t$ and $Q_t$ satisfy the differential equations This allows us to write $\frac{\partial}{\partial t} f_t(z) = -\nabla g_t(z)$ where we define Similarly, defining $h_t(x_0 x_1 x_2) := m(x_0, x_1, x_2) P_t(x_0) Q_{1-t}(x_2)$ we have $\frac{\partial}{\partial t} g_t(x_0 x_1) = -\nabla h_t(x_0 x_1)$. The positivity of $P_t$ and $Q_{1-t}$ implies the positivity of $g_t(x_0 x_1)$. Moreover, the formula which shows that $(f_t)$ is a $W_{1,+}$-geodesic.
The task of finding a $W_{1,+}$-geodesic joining $f_0$ to $f_1$ is simplified by Theorem 4.4, which turns it into the static problem of finding a coupling $\pi$ between $f_0$ and $f_1$ such that $\pi(x, y) = \frac{m(x,y)}{d(x,y)!}\, a(x)\, b(y)\, 1_{x \le y}$ for a couple of functions $a(x)$, $b(y)$ defined on $G$ and for a function $m$ constructed as in Definition 4.1.
This method can be used to prove the existence of W 1,+ -geodesics with prescribed initial and final distributions: Theorem 4.5. Let f 0 , f 1 ∈ P(G) be finitely supported. Then there exists a W 1,+ -geodesic between f 0 and f 1 .
Proof. Let $m : E(G) \to \mathbb{R}_+^*$ be any positive function with $\nabla m(x) = 0$ for every $x \notin A \cup B$, extended to ordered families of $G$. We set $c(x, y) := \frac{m(x,y)}{d(x,y)!}$. By Theorem 4.4, it suffices to prove the existence of a coupling $\pi \in \Pi_1(f_0, f_1)$ such that $\pi(x, y) = c(x, y)\, a(x)\, b(y)\, 1_{x \le y}$ for a couple of positive functions $a, b : G \to \mathbb{R}$.
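As a numerical aside, a coupling of this product form can be searched for by iterative proportional fitting (also known as Sinkhorn scaling). This is a standard technique and is not the method of the proof, which proceeds by convex minimisation. The sketch below assumes a path graph $\{0, 1, 2\}$, the trivial choice $m \equiv 1$ (so that $c(x,y) = 1/d(x,y)!$), and hypothetical marginals; the function name `scale_to_coupling` is introduced here for illustration. Convergence is taken for granted on this small positive example and is not claimed in general.

```python
from math import factorial


def scale_to_coupling(c, f0, f1, iterations=500):
    """Iterative proportional fitting.

    Looks for a(x), b(y) such that pi(x, y) = c(x, y) * a(x) * b(y)
    has marginals f0 and f1. `c` is a dict {(x, y): positive weight}
    whose keys are the admissible couples x <= y.
    """
    a = {x: 1.0 for x in f0}
    b = {y: 1.0 for y in f1}
    for _ in range(iterations):
        # Rescale a so that the first marginal is matched exactly.
        for x in a:
            a[x] = f0[x] / sum(c[(u, y)] * b[y] for (u, y) in c if u == x)
        # Rescale b so that the second marginal is matched exactly.
        for y in b:
            b[y] = f1[y] / sum(c[(x, v)] * a[x] for (x, v) in c if v == y)
    pi = {(x, y): c[(x, y)] * a[x] * b[y] for (x, y) in c}
    return a, b, pi


# Hypothetical marginals on the path {0, 1, 2}; m is assumed identically 1.
f0 = {0: 0.5, 1: 0.5}
f1 = {1: 0.5, 2: 0.5}
c = {(x, y): 1.0 / factorial(y - x) for x in f0 for y in f1 if x <= y}

a, b, pi = scale_to_coupling(c, f0, f1)
row_sums = {x: sum(p for (u, _), p in pi.items() if u == x) for x in f0}
col_sums = {y: sum(p for (_, v), p in pi.items() if v == y) for y in f1}
```

By construction the returned `pi` has the product form $c(x,y)\,a(x)\,b(y)$ with positive `a` and `b`; what the iteration has to achieve is that `row_sums` and `col_sums` agree with `f0` and `f1`.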
We will adopt the following point of view on the set $\Pi_1(f_0, f_1)$: in the space $\mathbb{R}^D$ with the usual scalar product, we consider the particular families of vectors $(j_{0,x})_{x \in G}$ and $(j_{1,y})_{y \in G}$ defined by $\forall (x, y) \in D$, $j_{0,x_0}(x, y) := 1_{x = x_0}$, $j_{1,y_0}(x, y) := 1_{y = y_0}$.
If for every $(x, y) \in D$ we have $x_0 \ne x$ then we set $j_{0,x_0} = 0$.
In particular, we have In other words, Π 1 (f 0 , f 1 ) is seen as the intersection of the "quadrant" R D + with an affine subspace of R D directed by the vector subspace V ⊥ , where V is the vector space generated by the families (j 0,x ) x∈G and (j 1,y ) y∈G .
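The marginal constraints behind this description can be written out explicitly. The following is a reconstruction from the definitions of $j_{0,x}$ and $j_{1,y}$, not a quotation; it assumes that $D$ is a set of couples $(x, y)$ containing the support of every element of $\Pi_1(f_0, f_1)$.

```latex
\langle \pi, j_{0,x_0} \rangle = \sum_{y \,:\, (x_0, y) \in D} \pi(x_0, y),
\qquad
\langle \pi, j_{1,y_0} \rangle = \sum_{x \,:\, (x, y_0) \in D} \pi(x, y_0),
\\[1ex]
\Pi_1(f_0, f_1) = \left\{ \pi \in \mathbb{R}^D_+ \;:\;
\langle \pi, j_{0,x} \rangle = f_0(x), \;
\langle \pi, j_{1,y} \rangle = f_1(y) \;\; \forall x, y \in G \right\}.
```

Two elements of this set have the same scalar product with every $j_{0,x}$ and every $j_{1,y}$, so their difference is orthogonal to $V$, which is why the affine subspace is directed by $V^\perp$.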
Depending on the dimension of $\Pi_1(f_0, f_1)$ as a subset of an affine subspace of $\mathbb{R}^D$, we will consider two cases:
1. The dimension of $\Pi_1(f_0, f_1)$ is zero. In this case, the vector space $V$ is $\mathbb{R}^D$. In particular, the vector $l \in \mathbb{R}^D$, with components $l(x, y) := \log \frac{\pi(x,y)}{c(x,y)}$ for every couple $(x, y) \in D$, can be written in the form $l = \sum_{x \in G} A(x)\, j_{0,x} + \sum_{y \in G} B(y)\, j_{1,y}$ for a couple of functions $A, B$ defined on $G$. Taking the exponential of each side proves that $\pi$ can be written in the form $\pi(x, y) = c(x, y)\, a(x)\, b(y)\, 1_{x \le y}$ with $a(x) := \exp(A(x))$ and $b(y) := \exp(B(y))$.
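2. The dimension of $\Pi_1(f_0, f_1)$ is at least one. We consider the mapping $J : \mathbb{R}^D_+ \to \mathbb{R}$ defined by
$$J(\pi) := \sum_{(x,y) \in D} \left[ \pi(x, y) \log \frac{\pi(x, y)}{c(x, y)} - \pi(x, y) \right], \tag{59}$$
where the variables are denoted by $\pi(x, y)$, for $(x, y) \in D$. The function $J$ is clearly continuous on $\mathbb{R}^D_+$ and smooth on $(\mathbb{R}_+^*)^D$. Moreover, we have:
$$\frac{\partial J}{\partial \pi(x, y)} = \log \frac{\pi(x, y)}{c(x, y)}. \tag{60}$$
The Hessian of $J$ is thus a diagonal matrix with positive coefficients $\left( \frac{1}{\pi(x,y)} \right)_{(x,y) \in D}$, so $J$ is strictly convex on $(\mathbb{R}_+^*)^D$.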
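The two derivative computations used here can be checked term by term. The following is a routine verification, writing $\varphi_c(u) := u \log(u/c) - u$ for a single summand of $J$; the name $\varphi_c$ is introduced here for convenience and does not appear in the original text.

```latex
\varphi_c'(u) = \log\frac{u}{c} + u \cdot \frac{1}{u} - 1 = \log\frac{u}{c},
\qquad
\varphi_c''(u) = \frac{1}{u} > 0 \quad (u > 0).
```

Since $J$ is a sum of such terms, one for each variable $\pi(x, y)$, all mixed second derivatives vanish, which is why the Hessian is diagonal.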
The set $\Pi_1(f_0, f_1)$ being compact, the infimum of $J$ on $\Pi_1(f_0, f_1)$ is attained at some coupling $\tilde\pi$. As $J$ is strictly convex and $\Pi_1(f_0, f_1)$ is a convex subset of $\mathbb{R}^D$, we know that $\tilde\pi$ is unique and that we have either $\tilde\pi \in \partial \Pi_1(f_0, f_1)$ or $\tilde\pi \in \Pi_1(f_0, f_1)^\circ$, and in this second case $\tilde\pi$ is a critical point for the restriction of $J$ to $\Pi_1(f_0, f_1)$.
We have proven the existence of a unique critical point $\tilde\pi \in \Pi_1(f_0, f_1)^\circ$ for the restriction of $J$ to $\Pi_1(f_0, f_1)$. As $\Pi_1(f_0, f_1)$ is a subset of an affine space directed by the vector subspace $V^\perp$, the gradient of $J$ at $\tilde\pi$ is orthogonal to $V^\perp$, i.e. $\mathrm{grad}_{\tilde\pi} J \in V$.
In other terms,
$$\mathrm{grad}_{\tilde\pi}(J) = \sum_{x \in G} A(x)\, j_{0,x} + \sum_{y \in G} B(y)\, j_{1,y} \tag{61}$$
for a couple of functions $A, B : G \to \mathbb{R}$. Due to the particular form taken by $j_{0,x}$ and $j_{1,y}$, Equation (61) can be rewritten in a simple way: $\forall (x, y) \in D$, $\mathrm{grad}_{\tilde\pi}(J)(x, y) = A(x) + B(y)$.
But equation (60) gives an explicit formula for $\mathrm{grad}_{\tilde\pi}(J)(x, y)$, which allows us to write, for $(x, y) \in D$: $\frac{\tilde\pi(x, y)}{c(x, y)} = \exp\left( \mathrm{grad}_{\tilde\pi}(J)(x, y) \right) = \exp(A(x)) \exp(B(y))$, so that $\tilde\pi$ has the required form with $a(x) := \exp(A(x))$ and $b(y) := \exp(B(y))$.
Remark 4.6. The particular form taken by $W_{1,+}$-geodesics (see Equation (57)) and the minimisation problems associated with the functionals (59) and (37) are reminiscent of the theory of entropic interpolations, constructed in a recent series of articles by Léonard. A survey of the main results of this theory is found in [Leo14]. A construction of entropic interpolations and a discussion of the cases where they can be described as mixtures of binomials is found in [Leo13b]. Another paper, [Leo13a], addresses the question of the convexity of entropy along such interpolations.
A major difference between these two kinds of interpolations lies in their construction: in order to define an entropic interpolation on a graph G, one requires an underlying Markov chain to which is canonically associated a positive measure R 01 on the set of couples of vertices x, y ∈ G. On the other hand, the definition of a W 1,+ -interpolation does not require an underlying Markov chain. It only relies on the "metric-measure" properties of the graph G, endowed with its counting measure. However, to each W 1,+ -geodesic is associated a function m on the ordered subsets of G, which is used to construct sub-Markov kernels.
A complete understanding of the links between entropic interpolations and $W_{1,+}$-geodesics, and more specifically between the measure $R_{01}$ of entropic interpolations and the function $m$ of $W_{1,+}$-geodesics, is still under investigation.