A survey of the Schrödinger problem and some of its connections with optimal transport

This article is aimed at presenting the Schrödinger problem and some of its connections with optimal transport. We hope that it can be used as a basic user's guide to the Schrödinger problem. We also give a survey of the related literature. In addition, some new results are proved.


Introduction
We denote by P(Y) and M_+(Y) the sets of all probability measures and all positive measures on a space Y.
In 1931, Schrödinger [Sch31, Sch32] addressed a problem which is translated in modern terms as follows. Let X = R^n, or more generally a complete connected Riemannian manifold without boundary, let Ω = C([0, 1], X) be the space of all continuous X-valued paths on the unit time interval [0, 1], and denote by R ∈ M_+(Ω) the law of the reversible Brownian motion on X, i.e. the Brownian motion with the volume measure as its initial distribution. Remark that R is an unbounded measure on Ω whenever the manifold X is not compact. Define the relative entropy of any probability measure P with respect to R by
H(P|R) := ∫_Ω log(dP/dR) dP ∈ (−∞, ∞], P ∈ P(Ω),
if P is absolutely continuous with respect to R and the above integral is meaningful, and H(P|R) = ∞ otherwise. A precise definition of the relative entropy with respect to an unbounded measure R is presented in the Appendix. The dynamic Schrödinger problem is
H(P|R) → min; P ∈ P(Ω) : P_0 = µ_0, P_1 = µ_1, (S_dyn)
where µ_0, µ_1 ∈ P(X) are prescribed values of the initial and final time marginals P_0 := P(X_0 ∈ ·) and P_1 := P(X_1 ∈ ·) of P. Here, (X_t)_{0≤t≤1} is the canonical process on Ω.
The unique solution P of (S_dyn), whenever it exists, disintegrates as
P(·) = ∫_{X²} R^{xy}(·) π(dxdy), (1.1)
where π is the unique solution of the static Schrödinger problem
H(π|R_01) → min; π ∈ P(X²) : π_0 = µ_0, π_1 = µ_1. (S)
The disintegration formula (1.1) means that P shares its bridges with R, that is P^{xy} = R^{xy} for π-almost all (x, y), and that this mixture of bridges is governed by the unique solution π of the static Schrödinger problem (S). It also follows from (1.1) that the values of the dynamic and static problems are equal: inf (S_dyn) = inf (S).
The structure of problem (S) is similar to that of the Monge-Kantorovich problem
∫_{X²} c(x, y) π(dxdy) → min; π ∈ P(X²) : π_0 = µ_0, π_1 = µ_1, (MK)
where c : X² → [0, ∞) represents the cost for transporting a unit mass from the initial location x to the final location y. Both are convex optimization problems but, unlike (S), the linear program (MK) might admit infinitely many solutions. Since (1.2) writes as R_01(dxdy) ∝ exp(−c(x, y)) vol(dx)vol(dy) with c(x, y) = d²(x, y)/2, x, y ∈ X, it happens that the Schrödinger problem (S) is connected to the quadratic Monge-Kantorovich optimal transport problem (MK) which is specified by this quadratic cost function. The natural dynamic version of (MK) is
∫_Ω C dP → min; P ∈ P(Ω) : P_0 = µ_0, P_1 = µ_1, (MK_dyn)
with
C(ω) := ∫_0^1 |ω̇_t|²/2 dt, (1.3)
where we put C(ω) = ∞ when ω is not absolutely continuous. Let us comment on the choice of this dynamic version of (MK). For all x, y ∈ X, we have
c(x, y) = inf{C(ω); ω ∈ Ω : ω_0 = x, ω_1 = y} (1.4)
and this infimum is attained at the constant speed geodesic path γ^{xy} between x and y, which is assumed to be unique for any (x, y), for simplicity. Therefore, the solutions of (MK) and (MK_dyn) are in one-one correspondence:
• If P solves (MK_dyn), then P_01 := P((X_0, X_1) ∈ ·) solves (MK);
• If π solves (MK), then the solution of (MK_dyn) is
P = ∫_{X²} δ_{γ^{xy}} π(dxdy), (1.5)
where δ_a denotes the Dirac measure at a.
Furthermore, we have equality of the values of the problems: inf (MK) = inf (MK_dyn) ∈ [0, ∞].
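On a finite state space, the static problem (MK) is a linear program; when both marginals are uniform over n points, Birkhoff's theorem guarantees that some optimal coupling is a permutation. The following sketch, with made-up points on the real line and the quadratic cost c(x, y) = (x − y)²/2 from the text, solves such a toy instance by brute force over permutations; the data are illustrative only.

```python
import itertools

# Toy instance of (MK) on the real line with quadratic cost c(x,y) = (x-y)^2/2
# and uniform marginals mu_0, mu_1 on n = 3 points (illustrative data).
xs = [0.0, 1.0, 2.0]
ys = [0.2, 1.3, 2.9]

def c(x, y):
    return (x - y) ** 2 / 2

# With uniform marginals, an optimal coupling can be chosen among the
# permutation couplings (vertices of the Birkhoff polytope), so brute
# force over permutations solves this instance of (MK) exactly.
n = len(xs)
best_cost, best_perm = min(
    (sum(c(xs[i], ys[s[i]]) for i in range(n)) / n, s)
    for s in itertools.permutations(range(n))
)
print(best_cost, best_perm)
```

As expected for a convex cost on the line with sorted points, the monotone (identity) rearrangement comes out as the optimal permutation.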
Again, the dynamic Schrödinger problem (S dyn ) and the dynamic Monge-Kantorovich problem (MK dyn ) are similar. Comparing their respective solutions (1.1) and (1.5), we see that the π's solve their respective static problem (S) and (MK), while for each (x, y) the bridge R xy ∈ P(Ω) in (1.1) plays the role of δ γ xy ∈ P(Ω) in (1.5).
All the notions pertaining to the Monge-Kantorovich optimal transport problems (MK) and (MK_dyn) which are invoked below are discussed in great detail in C. Villani's textbook [Vil09].
Based on these properties of the displacement interpolations, F. Otto [JKO98, Ott01] developed an informal theory aimed at considering the metric space (P_2(X), W_2) as a Riemannian manifold. This informal approach relies on the idea that, in view of the current equation (1.8)-(a), ∂_t µ|_{t=0} = −∇·(µ∇ψ_0) is a candidate for being a tangent vector at µ_0. Second order calculus also requires taking (1.8)-(b) into account.
The analogue of the displacement interpolation exists with (MK_dyn) replaced by (S_dyn). This entropic interpolation also enjoys properties similar to (1.7) and (1.8). They are discussed below.
The Monge-Kantorovich problem is a limit of Schrödinger problems. It is well known that, taking R^k to be the reversible Brownian motion with variance 1/k, i.e. the Markov measure associated with the generator ∆/(2k) and with the volume measure as its initial distribution, the bridges of R^k converge: for each (x, y), we have
lim_{k→∞} R^{k,xy} = δ_{γ^{xy}} ∈ P(Ω) (1.10)
with respect to the usual narrow topology σ(P(Ω), C_b(Ω)). This result is an easy consequence of Schilder's theorem, a large deviation result (as k tends to infinity) whose rate function is precisely the dynamic cost function C given at (1.3), see [DZ98]. In fact, the dynamic and static Monge-Kantorovich problems are respectively the Γ-limits of sequences of dynamic and static Schrödinger problems associated with the sequence (R^k)_{k≥1} in M_+(Ω) of reference path measures [Mik04, Léo12a]. More precisely (but still informally), considering the sequence of re-normalized Schrödinger problems
H(P|R^k)/k → min; P ∈ P(Ω) : P_0 = µ_0, P_1 = µ_1, (S^k_dyn) (1.11)
we have Γ-lim_{k→∞}(S^k_dyn) = (MK_dyn), and similarly the re-normalized static version satisfies Γ-lim_{k→∞}(S^k) = (MK). Recall that this implies that, under some compactness requirements, the values converge: lim_{k→∞} inf (S^k_dyn) = inf (MK_dyn), and any limit point of the sequence of minimizers (P^k)_{k≥1} of (S^k_dyn) solves (MK_dyn). A similar statement holds for the static problems. In particular, the time-marginal flow µ^k_t := P^k_t, t ∈ [0, 1], of the solution to (S^k_dyn) converges as k tends to infinity to the displacement interpolation [µ_0, µ_1] when (MK) admits a unique solution, for instance when both µ_0 and µ_1 are absolutely continuous. Denoting [µ_0, µ_1]^k := (µ^k_t)_{0≤t≤1} and calling it the entropic interpolation of order k, the convergence holds with respect to the topology of uniform convergence on C([0, 1], P_2(X)) where P_2(X) is equipped with W_2.
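The static counterpart of this Γ-convergence can be observed numerically on a finite state space: minimizing H(π|R^k_01)/k over couplings, with R^k_01 ∝ exp(−k c), is entropic optimal transport with regularization ε = 1/k, and its minimizer is computed by the classical iterative proportional fitting (Sinkhorn) procedure. The sketch below uses illustrative data and log-domain iterations for numerical stability; it shows the transport cost of the entropic plan approaching the Monge-Kantorovich value as ε decreases.

```python
import numpy as np

def logsumexp(A, axis):
    m = A.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(A - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def entropic_plan(C, mu0, mu1, eps, n_iter=2000):
    """Minimizer of <C, pi> + eps * H(pi | mu0 x mu1) over couplings of
    (mu0, mu1), i.e. a static Schroedinger problem with reference joint
    proportional to exp(-C/eps); computed by log-domain Sinkhorn."""
    K = -C / eps
    psi = np.zeros(len(mu1))
    for _ in range(n_iter):
        # alternately enforce the two marginal constraints
        phi = np.log(mu0) - logsumexp(K + psi[None, :], axis=1)
        psi = np.log(mu1) - logsumexp(K + phi[:, None], axis=0)
    return np.exp(K + phi[:, None] + psi[None, :])

xs = np.array([0.0, 1.0, 2.0]); ys = np.array([0.2, 1.3, 2.9])
C = (xs[:, None] - ys[None, :]) ** 2 / 2          # quadratic cost
mu0 = mu1 = np.ones(3) / 3

costs = []
for eps in [1.0, 0.2, 0.02]:                      # eps = 1/k, k growing
    pi = entropic_plan(C, mu0, mu1, eps)
    costs.append((pi * C).sum())
print(costs)   # decreases toward the (MK) value 0.47/3 as eps shrinks
```

This only illustrates the convergence of values; the Γ-convergence statements above are of course much stronger.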
The Schrödinger problem is a regular approximation of the Monge-Kantorovich problem. Now we explain informally why, in some sense, (S^k_dyn) is a regularization of its limiting Monge-Kantorovich problem (MK_dyn). Unlike (MK_dyn), for each k ≥ 1, (S^k_dyn) admits a unique solution P^k ∈ P(Ω). It can be proved that P^k is a Markov diffusion whose semigroup generator (A^k_t)_{0≤t≤1} is of the following form:
A^k_t = ∇ψ^k_t·∇ + ∆/(2k), 0 ≤ t < 1,
with ψ^k the smooth function on [0, 1) × X which is the unique classical solution of the Hamilton-Jacobi-Bellman equation
∂_t ψ + |∇ψ|²/2 + ∆ψ/(2k) = 0, ψ_{t=1} = ψ^k_1,
for some measurable function ψ^k_1 designed for recovering² µ^k_1 = µ_1 as the final distribution of the weak solution to
∂_t µ = −∇·(µ∇ψ^k_t) + ∆µ/(2k), µ_{t=0} = µ_0,
which is the evolution equation of the entropic interpolation [µ_0, µ_1]^k of order k.
Remark that the current equation (1.8)-(a), ∂_t µ = −∇·(µ∇ψ_t), is driven by a first order operator, to be compared with the second order operator A^k_t above. We see that, as a consequence of the smoothing and positivity-improving effects of the Laplace operator, the entropic interpolation of order k, [µ_0, µ_1]^k, is positive and regular on (0, 1) × X. This is in contrast with the limiting displacement interpolation [µ_0, µ_1].
² In fact, one can recover exactly µ_1 if it has a regular density. Otherwise, one can build a sequence (µ^k_1)_{k≥1} such that lim_{k→∞} µ^k_1 = µ_1.
Extension of the framework. We have chosen R^k to be attached to the Brownian motion, but taking R^k to be any Markov measure on a Polish state space X satisfying a large deviation principle with some rate function C leads to limiting Monge-Kantorovich problems associated with alternate cost functions C and c which are still linked by the contraction formula (1.4). Such extensions based on continuous random paths are considered in [Léo12a]. Extensions where the reference measure R is a random walk on a discrete graph are investigated in [Léoa], see also Sections 4 and 5 below.
New results. Although this article is mainly a survey, we have obtained some new results. Theorem 2.9 recollects several sufficient conditions on the reference path measure R and the prescribed marginal measures µ_0 and µ_1 for the unique solution P of (S_dyn) to admit the product-shaped Radon-Nikodym derivative
dP/dR = f_0(X_0) g_1(X_1), (1.13)
where f_0 and g_1 are positive measurable functions on X. The slight innovation is due to the possibility that R might have an infinite mass, e.g. the reversible Brownian motion on R^n. Theorem 2.12 is a significant improvement of Theorem 2.9 in the important special case where R is assumed to be Markov. Under some additional requirement on R, it states that (1.13) holds where f_0 and g_1 may vanish on some sets. Proposition 2.10 simply states that, if R is Markov, then the solution P of (S_dyn) is also Markov. Although this is intuitively clear, the author could not find any proof of this result in the literature. Finally, the Benamou-Brenier type formulas stated at Propositions 4.1 and 4.2 are new results.
Outline of the paper. In Section 2, the dynamic and static Schrödinger problems are rigorously stated, their main properties of existence and uniqueness are discussed and the shape of their minimizers is described. This specific shape, given by (1.13), suggests introducing in Section 3 the notion of (f, g)-transform of a Markov measure R, which is a time-symmetric version of Doob's h-transform. In particular, the classical analogue of Born's formula, which was Schrödinger's motivation in [Sch31, Sch32], is derived in Theorem 3.4. Then, in Section 4, we illustrate the general results of Sections 2 and 3. First, we revisit the case where R is the reversible Brownian motion. Then, we consider a discrete setting where the reference measure is a reversible random walk on a graph. In Section 5, we see that slowing the reference Markov process down to a complete absence of motion is the right asymptotic to consider for recovering optimal transport from minimal entropy. Technically, this is expressed in terms of Γ-convergence results in the spirit of (1.11). In Section 6, by means of basic large deviation results, we present the motivation for addressing the entropy minimization problem (S_dyn). This leads us naturally to the lazy gas experiment, a starting point of the Lott-Sturm-Villani theory. The literature is discussed in Section 7.
Acknowledgements. Many thanks to Jean-Claude Zambrini for numerous fruitful discussions. The author also wishes to thank Toshio Mikami and a careful referee for pointing out a gap in the preliminary version of the article.

Schrödinger's problem
We begin fixing some notation and describing the general framework. Then, Schrödinger's problem is stated and its main properties are discussed in a general setting.
Path measures. Depending on the context, we denote by the same letter the set Ω = C([0, 1], X) of all continuous paths from the unit time interval [0, 1] to the topological state space X, or the set Ω = D([0, 1], X) of all càdlàg (right-continuous and left-limited) paths. We furnish X with its Borel σ-field and Ω with the canonical σ-field σ(X_t; 0 ≤ t ≤ 1), which is generated by the time projections
X_t : ω = (ω_s)_{0≤s≤1} ∈ Ω ↦ ω_t ∈ X, 0 ≤ t ≤ 1.
The mapping X = (X_t)_{0≤t≤1} : Ω → Ω, which is the identity on Ω, is usually called the canonical process. We call a path measure any positive measure Q ∈ M_+(Ω) on Ω. Its time-marginals are the push-forward measures
Q_t := (X_t)_# Q ∈ M_+(X), 0 ≤ t ≤ 1.
This means that for any Borel subset A ⊂ X, Q_t(A) = Q(X_t ∈ A). If Q describes the behaviour of the random path (X_t)_{0≤t≤1} of some particle, then Q_t describes the behaviour of the random position X_t of the particle at time t. Remark that the flow (Q_t)_{0≤t≤1} ∈ M_+(X)^{[0,1]} contains less information than the path measure Q ∈ M_+(Ω). In particular, (Q_t)_{0≤t≤1} doesn't tell us anything about the correlations between two positions at different times s and t, which are encoded in Q_{st} := (X_s, X_t)_# Q ∈ M_+(X²). We shall be primarily concerned with the endpoint marginal measure
Q_{01} := (X_0, X_1)_# Q ∈ M_+(X²),
meaning that for any Borel subset B ⊂ X², Q_{01}(B) = Q((X_0, X_1) ∈ B). We also denote Q^{xy} = Q(· | X_0 = x, X_1 = y) ∈ P(Ω), the bridge of Q between x and y. For each Q ∈ M_+(Ω), the disintegration formula (A.7) with φ = (X_0, X_1) writes as follows:
Q(·) = ∫_{X²} Q^{xy}(·) Q_{01}(dxdy).
We assume that the topological state space X is a Polish (separable and complete metric) space and equip Ω = D([0, 1], X) with the corresponding Skorokhod topology. It is well known [Bil68] that this topology turns Ω into a Polish space and that the corresponding Borel σ-field is precisely the canonical one: σ(X_t; 0 ≤ t ≤ 1). Moreover, in restriction to C([0, 1], X), the Skorokhod topology is the topology of uniform convergence, which also turns C([0, 1], X) into a Polish space.
We still have the coincidence of the Borel σ-field and the canonical one. The path space Ω is furnished with this topology.
Why unbounded path measures. One may wonder why a random behaviour should be described by an unbounded measure rather than a probability measure. We have in mind, as a particular but important application, the reversible Brownian motion on X = R^n. It is the Brownian motion whose forward dynamics is driven by the heat semigroup as usual, but whose random initial position X_0 is uniformly distributed on R^n. Denoting by R ∈ M_+(Ω) the corresponding path measure on Ω = C([0, 1], R^n), R_0(dx) = dx is the Lebesgue measure³ on R^n and R(·) = ∫_{R^n} W_x(·) dx, where W_x is the Wiener probability measure with initial marginal δ_x. Clearly, R has the same infinite mass as R_0. Similarly, the simple random walk on a countably infinite graph X admits an unbounded reversing measure R_0, so that the corresponding reversible simple random walk is described by an unbounded measure R ∈ M_+(Ω) with Ω = D([0, 1], X). Considering such reversible path measures R ∈ M_+(Ω) as reference measures usually simplifies computations.
³ Although this paper is not concerned with the interpretation of such a description, one should note that a "frequentist" interpretation fails unless one introduces an infinite system of independent particles initially distributed according to a Poisson point process with a uniform spatial frequency. An alternate information viewpoint is also relevant: the Lebesgue measure (or any of its positive multiples) is the least informative a priori measure for modelling our complete lack of knowledge about the initial position. Indeed, it is invariant under isometries and translations, and the entropic problems to be considered below are insensitive to homotheties (up to an additive constant).
Relative entropy. Let r be some σ-finite positive measure on some space Y. The relative entropy of the probability measure p with respect to r is loosely defined by
H(p|r) := ∫_Y log(dp/dr) dp
if p ≪ r, and H(p|r) = ∞ otherwise. The rigorous definition of the relative entropy and its basic properties are recalled in Appendix A.
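On a finite space Y this loose definition can be implemented directly. The following sketch uses the usual conventions (0 log 0 = 0 and H(p|r) = ∞ when p is not absolutely continuous with respect to r) and also illustrates that H(·|r) can take negative values when r is not a probability measure, which is why the careful definition of Appendix A is needed in the unbounded case.

```python
import math

def relative_entropy(p, r):
    """H(p|r) = sum_y p(y) log(p(y)/r(y)) for a probability vector p and a
    (possibly unnormalized) nonnegative reference measure r; returns +inf
    when p is not absolutely continuous w.r.t. r."""
    h = 0.0
    for p_y, r_y in zip(p, r):
        if p_y > 0.0:
            if r_y == 0.0:
                return math.inf    # p is not absolutely continuous w.r.t. r
            h += p_y * math.log(p_y / r_y)
    return h

r = [0.2, 0.3, 0.5]
print(relative_entropy(r, r))                                  # 0.0
print(relative_entropy([0.5, 0.5, 0.0], [1.0, 0.0, 1.0]))      # inf
# against a reference of total mass > 1, the entropy can be negative:
print(relative_entropy([0.5, 0.5, 0.0], [2.0, 2.0, 2.0]))      # log(1/4) < 0
```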
Statement of Schrödinger's problem. The main data is a given reference path measure R ∈ M_+(Ω). In this section, any (non-zero) σ-finite path measure in M_+(Ω) can serve as a reference measure. We first state a dynamic version (S_dyn) of Schrödinger's problem which is associated with R. Then, we define Schrödinger's problem (S) as a static projection of (S_dyn), and the connections between the solutions of (S) and (S_dyn) are described in Proposition 2.3.
Considering the projection R_01 = (X_0, X_1)_# R ∈ M_+(X²) of R on the product space X² as a reference measure leads us to Schrödinger's (static) problem.
These optimization problems are closely connected. This is the content of the next proposition.
Let us particularize the consequences of the additive property formula (A.8) with r = R, p = P and φ = (X_0, X_1). We have, for all P ∈ P(Ω),
H(P|R) = H(P_01|R_01) + ∫_{X²} H(P^{xy}|R^{xy}) P_01(dxdy),
which implies that H(P_01|R_01) ≤ H(P|R), with equality (when H(P|R) < ∞) if and only if P^{xy} = R^{xy} for P_01-almost every (x, y) ∈ X², see (A.9) and (A.10). Note that this additive property formula is available since both X² and Ω are Polish spaces. Therefore, P is the (unique) solution of (S_dyn) if and only if it disintegrates as (2.2).
Existence results. We present below, in Proposition 2.5, a simple criterion for (S) and (S_dyn) to have a solution. We first need a preliminary result.
Take µ_0, µ_1 ∈ P(X). The static and dynamic Schrödinger problems (S) and (S_dyn) admit a (unique) solution if and only if inf (S_dyn) = inf (S) < ∞, or equivalently if and only if the prescribed marginals µ_0 and µ_1 are such that there exists some π° ∈ P(X²) with
π°_0 = µ_0, π°_1 = µ_1 and H(π°|R_01) < ∞. (2.5)
Proof. The first identity comes from the proof of Proposition 2.3. Since X is Polish, the probability measures µ_0 and µ_1 are tight measures on X, and it follows with the Prokhorov criterion on X² that the closed constraint set Π(µ_0, µ_1) := {π ∈ P(X²) : π_0 = µ_0, π_1 = µ_1} is uniformly tight and therefore compact in P(X²).

Taking (2.3) into account, (A.3) and (A.4) provide a lower bound for H(π|R_01) in restriction to Π(µ_0, µ_1). Together with (2.4), this implies that H(·|R_01) is bounded below and lower semi-continuous on the compact set Π(µ_0, µ_1). Hence, (S) admits a solution if and only if inf (S) < ∞. We already remarked that (S_dyn) has a solution if and only if (S) has a solution, that is: inf (S) < ∞, or equivalently if and only if (2.5) is satisfied.
Proposition 2.5. Suppose that R 0 = R 1 = m ∈ M + (X ) (this is satisfied in particular when R is reversible with m as its reversing measure).
Remark that for (v) to be satisfied, it is necessary that m is a bounded measure.
Let us look at statement (c). With the variational representation formula (A.5), one sees that (iv) and (v) imply (iii).
The dual problem. Take a positive measurable function B on X. Define C_B(X) to be the space of all continuous functions u : X → R such that sup |u|/B < ∞, and P_B(X) := {µ ∈ P(X); ∫_X B dµ < ∞}. Based on the variational representation of the relative entropy (A.6), and on the observation that for each π ∈ P(X²) with marginals in P_B(X) the functions ϕ ⊕ ψ with ϕ, ψ ∈ C_B(X) are π-integrable, it can be proved that a dual problem to the Schrödinger problem (S) is
∫_X ϕ dµ_0 + ∫_X ψ dµ_1 − log ∫_{X²} e^{ϕ⊕ψ} dR_01 → max; ϕ, ψ ∈ C_B(X), (D)
where ϕ ⊕ ψ : (x, y) ∈ X² ↦ ϕ(x) + ψ(y) ∈ R, and it is assumed that the prescribed marginals satisfy µ_0, µ_1 ∈ P_B(X). In particular, the dual equality inf (S) = sup (D) ∈ (−∞, ∞] is satisfied. This is proved, for instance, in [Léo01a] when the reference measure is a probability measure. In the general case, take (2.6) into account to get back to a reference measure with a finite mass. Of course, there is no reason for the dual attainment to hold in general in a space of regular functions such as C_B(X)². Suppose however that µ_0 and µ_1 are such that (D) is attained at some pair (ϕ, ψ). Then, the dual equality ∫_{X²} ϕ ⊕ ψ dπ − log ∫_{X²} e^{ϕ⊕ψ} dR_01 = H(π|R_01) and the case of equality in (A.6) lead us to
dπ/dR_01 = e^{ϕ⊕ψ} / ∫_{X²} e^{ϕ⊕ψ} dR_01, (2.7)
at least when dπ/dR_01 > 0. The shape of the minimizer π of (S) will be discussed further in the next subsection. Similarly, a dual problem to the dynamic version (S_dyn) of (S) is
∫_X ϕ dµ_0 + ∫_X ψ dµ_1 − log ∫_Ω e^{ϕ(X_0)+ψ(X_1)} dR → max; ϕ, ψ ∈ C_B(X). (D_dyn)
We observe that (D) = (D_dyn).
Some properties of the minimizer π of (S). We give some details about the structure of the unique minimizer π of (S), which is assumed to exist, for instance under the hypotheses of Proposition 2.5.
It is proved in [Léo01b, Thm. 5.1 & (5.9)]⁴ that there exist two functions ϕ, ψ : X → R such that:
(i) dπ/dR_01 = e^{ϕ⊕ψ}, π-almost everywhere;
(ii) the tensor sum ϕ ⊕ ψ is R_01-measurable.
It is tempting to write that π has the shape dπ/dR_01 = f ⊗ g, where f = e^ϕ and g = e^ψ are such that the marginal constraints
µ_0(dx) = f(x) E_R[g(X_1) | X_0 = x] R_0(dx), µ_1(dy) = g(y) E_R[f(X_0) | X_1 = y] R_1(dy) (2.8)
are satisfied. This was already suggested by (2.7). But this is not allowed in the general case. Indeed, two obstacles have to be avoided. Some comments are necessary.
Obstacle (i). Firstly, the identity (i) is only valid π-almost everywhere, and it might happen that it doesn't hold true R_01-almost everywhere: (i) may fail on some measurable subset of X² which is π-negligible but not R_01-negligible.
Obstacle (ii). Secondly, statement (ii) does not imply that ϕ and ψ are respectively R_0- and R_1-measurable on X. Only the tensor sum ϕ ⊕ ψ is R_01-measurable on the product space X². Hence, one is not allowed to consider the conditional expectations in (2.8).
Avoiding obstacle (i). To avoid obstacle (i), it is enough to slightly modify the prescribed marginals µ_0 and µ_1 as follows. It is shown in [Léo01b] that (i) is satisfied R_01-a.e. (rather than π-a.e.) if (µ_0, µ_1) is in the intrinsic core, icor C, of the set of all admissible constraints
C := {(µ_0, µ_1) ∈ P(X)²; inf (S)_{(µ_0,µ_1)} < ∞}.
Recall that for any convex set C, its intrinsic core icor C is the set of all points x ∈ C such that, for any y ∈ C, the segment [y, x] can be extended beyond x without leaving C. It is also shown in [Léo01b] that C = {Λ* < ∞}, where Λ* is the convex conjugate of the extended real valued function Λ which is defined for any measurable functions ϕ, ψ by Λ(ϕ, ψ) = log ∫_{X²} e^{ϕ⊕ψ} dR_01 ∈ (−∞, ∞]. Therefore, C is a convex subset of P(X)². In particular, considering small convex perturbations of the marginals, we observe that for any admissible (µ_0, µ_1) ∈ C and any arbitrarily small ǫ > 0, one can pick (µ^ǫ_0, µ^ǫ_1) ∈ icor C which is arbitrarily close to (µ_0, µ_1) in total variation norm; the corresponding solution π^ǫ of (S)_{(µ^ǫ_0,µ^ǫ_1)} satisfies
dπ^ǫ/dR_01 = e^{ϕ^ǫ⊕ψ^ǫ}, R_01-a.e.,
for some functions ϕ^ǫ and ψ^ǫ such that ϕ^ǫ ⊕ ψ^ǫ is jointly R_01-measurable.
⁴ The assumptions of [Léo01b] require that R_01 is a probability measure. In the general case where R_01 is unbounded, use (2.6) to go back to the unit mass setting.
⁵ In the important special case where R is a probability measure, just take w = 0.
Proposition 2.6. We say that the constraint (µ_0, µ_1) is internal if it is in the intrinsic core of the set of all admissible constraints: (µ_0, µ_1) ∈ icor C. In this case, we have
dπ/dR_01 = e^{ϕ⊕ψ}, R_01-a.e.,
for some jointly R_01-measurable function ϕ ⊕ ψ on X².
Summing up. Putting Propositions 2.5, 2.6 and the above considerations together with (2.8), we obtain the following result.
Theorem 2.8. Suppose that the hypotheses of Proposition 2.5 hold, where w and B appear at (ii) and (iii) above, and that in addition:
(vii) (µ_0, µ_1) is internal, see the statement of Proposition 2.6. This is the case for instance when m is a probability measure and µ_0, µ_1 ≥ ǫm, for some ǫ > 0.
Then, (S) admits a unique solution π and
dπ/dR_01 = f_0 ⊗ g_1, i.e. dπ/dR_01(x, y) = f_0(x)g_1(y), (2.10)
where the positive functions f_0 and g_1 are m-measurable and solve
µ_0(dx) = f_0(x) E_R[g_1(X_1) | X_0 = x] m(dx), µ_1(dy) = g_1(y) E_R[f_0(X_0) | X_1 = y] m(dy), (2.11)
which is called the Schrödinger system.
• It is not necessary, for f_0(X_0)g_1(X_1) R to be well defined, that f_0(X_0) and g_1(X_1) are R-integrable, since f_0 and g_1 are positive measurable functions. Only a notion of integration of nonnegative functions is required, see [Léoc].
• The assumption (vii) is here to make sure that dπ/dR_01 > 0. If it is not satisfied, dπ/dR_01 may not have a product form and its structure may be quite complex.
The complete description of d π/dR 01 in this case is given in [Léo01b].
• In view of (2.9), for the assumption (vii) to hold it is enough that µ_0, µ_1 ≥ ǫ e^{−w} m for some ǫ > 0, and we can choose w = 0 when m is a probability measure.
The solution P of (S dyn ). We deduce from this theorem the characterization of P .
Theorem 2.9. Suppose that the hypotheses of Theorem 2.8 are satisfied. Then, (S_dyn) admits the unique solution
P = f_0(X_0) g_1(X_1) R, (2.12)
where f_0 and g_1 are the measurable positive functions which appear at (2.10) and solve (2.11).
Proof. The existence of the solution P and its representation by the Radon-Nikodym formula (2.12) are direct consequences of Proposition 2.3, Theorem 2.8, (2.2) and (2.10).
The special case where R is Markov. We are going to assume that the reference path measure R is Markov. Under this restriction, we obtain at Theorem 2.12 below a more efficient version of Theorem 2.8. Let us recall the time-symmetric definition of the Markov property.
Markov property. One says that R is Markov if for any 0 ≤ t ≤ 1,
R(X_{[0,t]} ∈ · , X_{[t,1]} ∈ · | X_t) = R(X_{[0,t]} ∈ · | X_t) R(X_{[t,1]} ∈ · | X_t),
signifying that under R, for any t, conditionally on the present state X_t at time t, past and future are independent. This is equivalent to the usual forward time-oriented Markov property:
R(X_{[t,1]} ∈ · | X_{[0,t]}) = R(X_{[t,1]} ∈ · | X_t), for all 0 ≤ t ≤ 1.
Proposition 2.10. Suppose that the reference measure R ∈ M_+(Ω) is Markov. If it exists, the solution P of (S_dyn) is also Markov.
Proof. We need some notation.
Accept this claim for a while and suppose, ad absurdum, that P is not Markov. Then, applying the claim at a suitable time t ∈ [0, 1], we obtain some P* ∈ P(Ω) whose time marginals are unchanged, P*_s = P_s for all s ∈ [0, 1], while H(P*|R) < H(P|R): P is not the solution to (S_dyn), a contradiction.
It remains to prove the claim. With (A.8), we see that replacing the past and future of P, conditionally on the present state X_t = z, by their independent product does not increase the relative entropy. Since this holds for P_t-almost every z, this amounts to saying that P(·) = ∫_X Q^{tz}_< ⊗ Q^{tz}_>(·) P_t(dz), which is the desired forward Markov property at time t. This completes the proofs of the claim and the proposition.

Reversibility.
A path measure R ∈ M_+(Ω) is said to be reversible with m ∈ M_+(X) as its reversing measure (m-reversible for short) if R_0 = m and, for any 0 ≤ u ≤ v ≤ 1, R is invariant with respect to the time reversal mapping rev_uv defined by rev_uv(ω)_t := ω_{u+v−t} for t ∈ [u, v], the path being unchanged outside [u, v]. Clearly, this implies that R_u = R_v for any u, v. In other words, R is m-stationary, i.e. R_t = m for all 0 ≤ t ≤ 1.
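On a finite graph, m-reversibility of a random walk reduces to the detailed balance condition m(x)K(x, y) = m(y)K(y, x), which is exactly the symmetry of the joint law R_01 = m ⊗ K and implies m-stationarity. A quick numerical sanity check on a hypothetical 3-point weighted graph:

```python
import numpy as np

# Random walk on a weighted graph: K(x,y) = w(x,y)/w(x), reversing
# measure m(x) proportional to w(x) := sum_y w(x,y). Illustrative weights.
w = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 3.0],
              [1.0, 3.0, 0.0]])
m = w.sum(axis=1) / w.sum()
K = w / w.sum(axis=1, keepdims=True)

R01 = m[:, None] * K                       # joint law of (X_0, X_1) under R

print(np.allclose(R01, R01.T))             # detailed balance holds
print(np.allclose(m @ K, m))               # m-stationarity: R_t = m
```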
This notion is invoked at statement (c) of the following result.
Theorem 2.12. Suppose that the reference measure R ∈ M_+(Ω) satisfies the properties (i), (ii) and (iii). Suppose also that the constraint (µ_0, µ_1) satisfies the remaining hypotheses of Proposition 2.5.
(a) Then, the unique solution P of (S_dyn) is also Markov and
P = f_0(X_0) g_1(X_1) R, (2.13)
where f_0 and g_1 are the m-measurable nonnegative functions which appear at (2.10) and solve the Schrödinger system (2.11).
(b) Conversely, let P be defined by (2.13) with f_0 and g_1 two m-measurable nonnegative functions solving the Schrödinger system (2.11). Then, P is Markov and it is the unique solution of (S_dyn).

(c) For the properties (i), (ii) and (iii) to hold, it is enough that R is an m-reversible Markov measure which admits a regenerative set: a measurable subset X° ⊂ X.
Proof. • Proof of (a). By Proposition 2.5, the properties (iii)-(vii) assure the existence of the unique solution P of (S_dyn). With Proposition 2.3, we have P^{xy} = R^{xy} for π-almost all (x, y). This means that
P(·) = ∫_{X²} R^{xy}(·) π(dxdy).
On the other hand, we have just seen at Proposition 2.10 that P is Markov. But it is proved in [LRZ] that under the assumptions (i) and (ii), if P ∈ P(Ω) is a Markov measure such that dP/dR = h(X_0, X_1) for some measurable function h, then there exist two measurable nonnegative functions f and g such that P = f(X_0)g(X_1) R. This proves statement (a).
• Proof of (b). The fact that P is the solution of (S dyn ) is proved in [Csi75] by a geometric approach or in [Léo01b] by a functional analytic approach. We easily see that P inherits the Markov property of R using the time-symmetric definition of the Markov property together with the product shape of (2.13).
• Proof of (c). Statement (c) is an easy exercise.
Remark that, unlike Theorem 2.8, Theorem 2.12 does not require that the constraint (µ 0 , µ 1 ) is internal. Also remark that the functions f 0 and g 1 are nonnegative (in contrast with Theorem 2.8 where they are positive) and that it may happen that R 0 (f 0 = 0) or R 1 (g 1 = 0) is positive.
Theorem 2.12 extends a similar result by Föllmer and Gantert in [FG97], where, for the product shape formula (2.13) to hold, it is required that R ≪ P and also that X° has full measure.

(f, g)-transform of a Markov measure
Motivated by Theorems 2.9 and 2.12, we introduce the transform f_0(X_0)g_1(X_1) R of a Markov measure R and call it an (f, g)-transform. It was already noticed by Föllmer [Föl88, FG97] and Nagasawa [Nag89] that it is a time-symmetric version of Doob's usual h-transform [Doo57, Doo00].
We are going to assume for simplicity that the reference path measure R is reversible. Let us recall the definition of this notion.
(f, g)-transform of a reversible Markov measure. Let us first state an assumption which will hold for the remainder of the paper: the reference path measure R is an m-reversible Markov measure. The Markov property of the reference measure will turn out to be crucial for the description of the dynamics of the solution P of (S_dyn). Indeed, we have already seen at Proposition 2.10 that P inherits the Markov property from R. It follows that its dynamics is characterized by its stochastic derivatives. On the other hand, reversibility is only assumed for simplicity.
Definition 3.2. Let f_0, g_1 : X → [0, ∞) be two nonnegative measurable functions such that E_R(f_0(X_0)g_1(X_1)) = 1. The path measure
P := f_0(X_0) g_1(X_1) R ∈ P(Ω) (3.1)
is called an (f, g)-transform of R.
This definition is motivated by Theorems 2.9 and 2.12 which assert that the solution P of (S dyn ) is an (f, g)-transform of R. Note that under Theorem 2.9's assumptions, f 0 and g 1 are positive, while they are allowed to vanish under Theorem 2.12's assumptions, as in Definition 3.2.
Let us introduce, for each t ∈ [0, 1], the functions f_t, g_t : X → [0, ∞) defined by
f_t(z) := E_R[f_0(X_0) | X_t = z], g_t(z) := E_R[g_1(X_1) | X_t = z], z ∈ X. (3.2)
Remark that although we have E_R(f_0(X_0)g_1(X_1)) < ∞, this does not ensure that f_0(X_0) and g_1(X_1) are integrable. We have to use positive integration to give a meaning to these conditional expectations, see [Léoc]. The next result is a kind of converse of Theorems 2.9 and 2.12.
Theorem 3.3. If the functions f_0 and g_1 entering the definition of the (f, g)-transform P of R given at (3.1) satisfy
∫_X g_0 f_0 log f_0 dm < ∞ and ∫_X f_1 g_1 log g_1 dm < ∞
(with the convention 0 log 0 = 0), then P_01 and P are the unique solutions of (S) and (S_dyn) respectively, where the prescribed constraints µ_0, µ_1 ∈ P(X) are chosen to satisfy (2.11), i.e. using notation (3.2),
µ_0 = f_0 g_0 m, µ_1 = f_1 g_1 m. (3.3)
Note that there may exist solutions of (S_dyn) which are not (f, g)-transforms of R. This happens when the support of the solution is not a rectangle (i.e. the product of Borel subsets), see [FG97, §2] or [Léo01b, §5]. The next result extends the product formulas (3.3) to all t ∈ [0, 1].
Theorem 3.4 (Euclidean analogue of Born's formula). The path measure P = f_0(X_0)g_1(X_1) R is Markov and, for each 0 ≤ t ≤ 1, its time marginal P_t ∈ P(X) is given by
P_t = f_t g_t m. (3.4)
Remark 3.5. It follows with (3.4) that for all t, f_t g_t > 0 P_t-a.e., but not R_t-a.e. in general.
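The Born-type formula (3.4) can be verified numerically on a discrete-time reversible chain (a sketch with made-up data): solving the Schrödinger system for the endpoint joint R_0N = m ⊗ K^N by iterative proportional fitting, and using that, by reversibility, f_j = K^j f_0 and g_j = K^{N−j} g_1 realize the conditional expectations (3.2) at intermediate times, each P_j = f_j g_j m is a probability measure interpolating between µ_0 and µ_1.

```python
import numpy as np

# Reversible reference chain from symmetric weights (self-loops make it
# aperiodic); N time steps. Illustrative data.
w = np.array([[1.0, 2.0, 0.0, 1.0],
              [2.0, 1.0, 3.0, 0.0],
              [0.0, 3.0, 1.0, 2.0],
              [1.0, 0.0, 2.0, 1.0]])
m = w.sum(axis=1) / w.sum()
K = w / w.sum(axis=1, keepdims=True)
N = 5
R0N = m[:, None] * np.linalg.matrix_power(K, N)   # endpoint joint of R

mu0 = np.array([0.7, 0.1, 0.1, 0.1])
mu1 = np.array([0.1, 0.2, 0.3, 0.4])

f0 = np.ones(4)                                   # solve (2.11) by IPFP
for _ in range(500):
    g1 = mu1 / (R0N.T @ f0)
    f0 = mu0 / (R0N @ g1)

# Born-type formula: P_j = f_j * g_j * m with, by reversibility,
#   f_j = K^j f0       (= E[f0(X_0) | X_j]),
#   g_j = K^(N-j) g1   (= E[g1(X_N) | X_j]).
masses = []
for j in range(N + 1):
    f_j = np.linalg.matrix_power(K, j) @ f0
    g_j = np.linalg.matrix_power(K, N - j) @ g1
    P_j = f_j * g_j * m
    masses.append(P_j.sum())
print(masses)                                     # all ~ 1
```

That every P_j has total mass one for intermediate j is exactly where reversibility enters: m(x)K^j(x, y) = m(y)K^j(y, x).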
Proof of Theorem 3.4. The Markov property of P is a direct consequence of Theorem 3.3 and Proposition 2.10. We propose an alternate simple proof. To prove that P is Markov, we show that for each 0 ≤ t ≤ 1 and any bounded measurable functions a ∈ σ(X_{[0,t]}) and b ∈ σ(X_{[t,1]}),
E_P(ab | X_t) = E_P(a | X_t) E_P(b | X_t),
which is the desired result. In the underlying computation, equality (i) is a general result about conditioning; note that we do not divide by zero P-a.s. Equality (ii) uses crucially the assumed Markov property of R, and one obtains (iii) by considering separately the cases b ≡ 1 and a ≡ 1 in the just obtained identity, where the Markov property of R is used again at the marked equality.
Forward and backward generators. Let Q ∈ M_+(Ω) be a Markov measure. Its forward stochastic derivative (∂_t + →L^Q) is defined by
(∂_t + →L^Q_t)u(t, x) := lim_{h↓0} h^{−1} E_Q[u(t + h, X_{t+h}) − u(t, X_t) | X_t = x],
for any measurable function u : [0, 1] × X → R in the set dom →L^Q for which this limit exists Q_t-a.e. for all 0 ≤ t < 1. In fact, this definition is only approximate; we give it here as a support for understanding the relations between the forward and backward generators. For a precise statement, see [Léod, §2]. Since the time reversed Q* of Q is still Markov, Q admits a backward stochastic derivative (−∂_t + ←L^Q) which is defined by
(−∂_t + ←L^Q_t)u(t, x) := lim_{h↓0} h^{−1} E_Q[u(t − h, X_{t−h}) − u(t, X_t) | X_t = x].
It is proved in [Léod, §2] that these stochastic derivatives are extensions of the extended forward and backward generators of Q in the sense of semimartingales. In particular, they offer us a natural way of computing generators. Later on, we shall call →L^Q and ←L^Q generators, rather than stochastic derivatives.
For simplicity, we denote →L^R = ←L^R = L, without the superscript R and without the time arrows, since R is assumed to be reversible. We also write →A = →L^P and ←A = ←L^P for the generators of the (f_0, g_1)-transform P defined by (3.1).
Stochastic derivatives were introduced by E. Nelson in [Nel67] while studying the dynamical properties of Brownian motion. The above definition (more precisely the one of [Léod]), which extends Nelson's, is necessary for technical reasons. In general, the forward and backward generators of P depend explicitly on t. The following informal statement has long been known in specific situations. In the important examples discussed at Section 4 below, these claims are easy consequences of Itô's formula (for instance see [RY99, Ch. 8, §3] in the continuous diffusion case).
Informal statement 3.6. Under some hypotheses on R, the forward and backward generators of P are given, for any function u : [0, 1] × X → R belonging to some class U_R of regular functions, by

→A u = L u + Γ(g_t, u)/g_t, ←A u = L u + Γ(f_t, u)/f_t, (3.5)

where Γ(u, v) := L(uv) − u Lv − v Lu is the carré du champ operator of L and f_t, g_t are defined at (3.2). Because of (3.4), for any t no division by zero occurs P_t-a.e.
Rigorous statement and proof are given in [Léod] for instance.
Idea of proof of Statement 3.6. To obtain the forward generator of P, we compute the stochastic derivative

(∂_t + →A) u(t, X_t) = lim_{h↓0} (1/h) E_P[u(t+h, X_{t+h}) − u(t, X_t) | X_t] = lim_{h↓0} (1/h) E_R[(u(t+h, X_{t+h}) − u(t, X_t)) g_1(X_1) | X_t] / g_t(X_t),

where the Markov property of R is used at the second identity. But

E_R[(u(t+h, X_{t+h}) − u(t, X_t)) g_1(X_1) | X_t] = E_R[(u g)(t+h, X_{t+h}) | X_t] − (u g)(t, X_t),

where the first equality is a martingale identity: E_R[g_1(X_1) | X_{t+h}] = g_{t+h}(X_{t+h}). We conclude by means of the definition of L: (∂_t + →A) u = (∂_t + L)(u g)/g, and with the following identity

L(u g) = u Lg + g Lu + Γ(u, g), together with (∂_t + L) g = 0,

which give →A u = Lu + Γ(g_t, u)/g_t. One sees that it is necessary that the functions f_t and g_t be regular enough to be in the domains of the carré du champ operators. For instance, choosing f_0, g_1 ∈ dom L insures that f ∈ dom(−∂_t + L) and g ∈ dom(∂_t + L), and also that f and g are classical solutions of the following parabolic PDEs

(−∂_t + L) f = 0, 0 < t ≤ 1, f_{t=0} = f_0; (∂_t + L) g = 0, 0 ≤ t < 1, g_{t=1} = g_1. (3.6)

Even better, since R is assumed to be m-reversible, its Markov generator L is self-adjoint on L²(m) and for any f_0, g_1 in L²(m) we have f ∈ dom(−∂_t + L) and g ∈ dom(∂_t + L).
It is worthwhile describing the dynamics (3.5) in terms of

ϕ := log f, ψ := log g. (3.7)

Remark that because of (3.4), for any 0 ≤ t ≤ 1, ϕ_t and ψ_t are well defined P_t-a.e. In analogy with the Kantorovich potentials which appear in optimal transport theory, we call ϕ and ψ the Schrödinger potentials. They are solutions of the "second order" Hamilton-Jacobi equations

(−∂_t + B)ϕ(t, x) = 0, 0 < t ≤ 1, P_t-a.e., ϕ_0 = log f_0, t = 0, (3.8)

and

(∂_t + B)ψ(t, x) = 0, 0 ≤ t < 1, P_t-a.e., ψ_1 = log g_1, t = 1, (3.9)

where the non-linear operator B is defined by Bu := e^{−u} L e^{u} for any function u such that e^u ∈ dom L.
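As a sanity check on the operator B, consider the Brownian generator L = ∆/2 of Section 4; a direct computation (using ∇e^u = e^u ∇u) recovers the expression Bu = ∆u/2 + |∇u|²/2 used there, which turns (3.8)-(3.9) into genuine Hamilton-Jacobi-Bellman equations:

```latex
Bu \;=\; e^{-u}\,L\,e^{u}
   \;=\; \tfrac12\, e^{-u}\,\nabla\!\cdot\!\big(e^{u}\,\nabla u\big)
   \;=\; \tfrac12\, e^{-u}\big(e^{u}\,\Delta u + e^{u}\,|\nabla u|^{2}\big)
   \;=\; \tfrac12\,\Delta u + \tfrac12\,|\nabla u|^{2}.
```

The quadratic term |∇u|²/2 is the Hamiltonian of the kinetic action, which is the first hint of the connection with optimal transport developed at Section 5.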

Standard examples
We present two well-known reference processes: the reversible Brownian motion and a reversible random walk on a graph. We also apply the above general results to these important examples.
Reversible Brownian motion. The reversible Brownian motion R on X = R n is specified by L = ∆/2, R 0 (dx) = m(dx) = dx (4.1) where the Markov generator L = ∆/2 is defined on C 2 (R n ). It is easily checked that R is m-reversible.
Let P = f_0(X_0) g_1(X_1) R ∈ P(Ω) be any (f, g)-transform of R. By the regularity improving property of the heat kernel, (f_0, g_1) is such that f ∈ dom(−∂_t + ∆/2) for t ∈ (0, 1] and g ∈ dom(∂_t + ∆/2) for t ∈ [0, 1). We have Bu = ∆u/2 + |∇u|²/2 and Γ(u, v) = ∇u · ∇v for any u, v ∈ C²(X). The expressions (3.5) of the forward and backward generators become

→A = ∆/2 + ∇ψ_t · ∇, ←A = ∆/2 + ∇ϕ_t · ∇, (4.2)

and tell us that the density µ_t(x) := dP_t/dx solves the parabolic PDE

∂_t µ_t = ∆µ_t/2 − ∇·(µ_t ∇ψ_t), 0 < t ≤ 1,

where ψ solves (3.9):

∂_t ψ_t + ∆ψ_t/2 + |∇ψ_t|²/2 = 0, 0 ≤ t < 1, ψ_1 = log g_1,

and, in the reversed sense of time,

−∂_t µ_t = ∆µ_t/2 − ∇·(µ_t ∇ϕ_t), 0 ≤ t < 1,

where ϕ solves (3.8):

−∂_t ϕ_t + ∆ϕ_t/2 + |∇ϕ_t|²/2 = 0, 0 < t ≤ 1, ϕ_0 = log f_0.

It is important to note the smoothing effect of the semigroup of R which allows us to define the classical gradients ∇ψ_t and ∇ϕ_t for all t in [0, 1) and (0, 1] respectively. They are the forward and backward drift vector fields of the canonical process under P. Also recall that, as a direct consequence of (3.4), no logarithm of zero is taken, P_t-almost surely, i.e. almost everywhere, and we have the time-reversal formula

∇ψ_t + ∇ϕ_t = ∇ log µ_t, 0 < t < 1.
Back to the Schrödinger problem. Under the assumptions of Theorem 3.3, P = f_0(X_0) g_1(X_1) R solves (S_dyn) with the prescribed marginals given at (3.3). It is an (f, g)-transform of R, and we have just seen that there exist ϕ and ψ such that its forward and backward generators are given by (4.2).
Reversible random walk on a graph. Let R be a random walk on a countable connected graph (X, ∼), where x ∼ y means that x and y are next neighbours. Its generator is given, for any finitely supported function u, by

Lu(x) = Σ_{y: y∼x} (u(y) − u(x)) J_x(y), x ∈ X, (4.10)

where J_x(y) > 0 for all x ∼ y is interpreted as the average frequency of jumps from x to y. Its dual formulation is the current equation

d/dt ρ_t(x) = Σ_{y: y∼x} [ρ_t(y) J_y(x) − ρ_t(x) J_x(y)], x ∈ X,

where ρ_t(x) := R(X_t = x). Clearly, the path measure R admits the stationary measure m ∈ M_+(X) if and only if Σ_{y: y∼x} [m_y J_y(x) − m_x J_x(y)] = 0 for all x ∈ X. It admits m as a reversing measure if this global equilibrium property is reinforced into the following detailed one: m_x J_x(y) = m_y J_y(x), for all x, y ∈ X, x ∼ y. The special case where J_x(y) = 1 for all y ∼ x, the number n_x of neighbours of x being assumed finite for all x, corresponds to the simple random walk on the graph (X, ∼). It is easily checked that its reversing measure is m_o = Σ_{x∈X} n_x δ_x ∈ M_+(X), which is unbounded when X is infinite. For simplicity, we assume in the general case that sup_{x∈X} J_x(X) < ∞, where J_x(X) := Σ_{y: y∼x} J_x(y) is the global average frequency of jumps at x. This ensures that for any initial marginal R_0 ∈ M_+(X), there exists a Markov measure R ∈ M_+(Ω) with generator L. Moreover, any bounded function u : X → R is in the domain of L.
Identifying the generator L with the matrix (L(x, y))_{x,y∈X} given by L(x, y) = J_x(y) if y ∼ x, L(x, x) = −J_x(X) and L(x, y) = 0 otherwise, and the functions as column vectors indexed by X, we observe that the solutions of the heat equations (3.6) are

f_t = e^{tL} f_0 and g_t = e^{(1−t)L} g_1, 0 ≤ t ≤ 1. (4.11)

It follows that for any couple (f_0, g_1) of nonnegative functions and any 0 ≤ t ≤ 1, f_t ∈ dom(−∂_t + L) and g_t ∈ dom(∂_t + L). We have

Bu(x) = Lu(x) + Σ_{y: y∼x} θ(u(y) − u(x)) J_x(y) with θ(a) := e^a − a − 1, a ∈ R, (4.12)

and Γ(u, v)(x) = Σ_{y: y∼x} [u(y) − u(x)][v(y) − v(x)] J_x(y), for any bounded functions u, v and any x ∈ X. Let P = f_0(X_0) g_1(X_1) R be any (f, g)-transform of the random walk R. Applying (3.5), the forward and backward generators of P turn out to be the jump generators associated respectively with the time-dependent jump measures

→J_{t,x}(y) = (g_t(y)/g_t(x)) J_x(y) and ←J_{t,x}(y) = (f_t(y)/f_t(x)) J_x(y), y ∼ x.

Again, no division by zero occurs P_t-a.e. for every 0 ≤ t ≤ 1, i.e. everywhere for each 0 < t < 1. The functions f and g satisfy (4.11) and the Schrödinger potentials ψ and ϕ satisfy (3.9):

∂_t ψ_t(x) + Σ_{y: y∼x} (e^{ψ_t(y) − ψ_t(x)} − 1) J_x(y) = 0, 0 ≤ t < 1, ψ_1 = log g_1, (4.13)

and (3.8):

−∂_t ϕ_t(x) + Σ_{y: y∼x} (e^{ϕ_t(y) − ϕ_t(x)} − 1) J_x(y) = 0, 0 < t ≤ 1, ϕ_0 = log f_0.

Minimal action. Let θ* be the convex conjugate of θ defined at (4.12), i.e. θ*(b) = (1 + b) log(1 + b) − b for b ≥ −1 (with the convention 0 log 0 := 0) and θ*(b) = ∞ for b < −1.
Consider the problem of minimizing, among all (ν, j) where ν = (ν_t)_{0≤t≤1} is a measurable path in P(X) and j is a measurable nonnegative function, the corresponding action under the following constraints: Proposition 4.2. Let µ_0, µ_1 ∈ P(X) be such that inf(S) < ∞, for instance when the assumptions of Proposition 2.5 are satisfied. The unique solution to the minimal action problem (4.14) is induced by the unique solution P of (S_dyn), with g_1 a solution of (2.11). Moreover, ψ is the unique classical solution of the Hamilton-Jacobi-Bellman equation (4.13) and inf{H(π|R_01); π ∈ P(X²)} : where R^x_1 := R(X_1 ∈ · | X_0 = x) ∈ P(X) and (π^x)_{x∈X} ∈ P(X)^X is any measurable Markov kernel.
The ratio g_t(y)/g_t(x) entering the forward jump measure looks like a discrete logarithmic derivative, analogous to ∇ψ_t(x) = ∇ log g_t(x).
Proof. The proof follows the same lines as that of Proposition 4.1.
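The matrix representation (4.11) is easy to experiment with numerically. The sketch below (a hypothetical 4-cycle with unit jump rates, not an example from the text) builds the generator matrix, computes f_t = e^{tL} f_0 and g_t = e^{(1−t)L} g_1 with scipy, and checks that the total mass of the product f_t g_t against the reversing measure m, i.e. of the marginal of Theorem 3.4, is constant in t, as it must be by self-adjointness of L in L²(m):

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical 4-cycle graph with unit jump rates J_x(y) = 1 for x ~ y.
n = 4
A = np.zeros((n, n))
for x in range(n):
    A[x, (x + 1) % n] = A[x, (x - 1) % n] = 1.0   # adjacency = jump rates
L = A - np.diag(A.sum(axis=1))    # generator: Lu(x) = sum_y (u(y)-u(x)) J_x(y)
m = np.ones(n)                    # counting measure reverses the walk (symmetric rates)

f0 = np.array([1.0, 0.2, 0.1, 0.3])   # arbitrary nonnegative boundary data
g1 = np.array([0.5, 1.5, 0.7, 0.2])

def ft(t):
    return expm(t * L) @ f0           # f_t = e^{tL} f_0, cf. (4.11)

def gt(t):
    return expm((1 - t) * L) @ g1     # g_t = e^{(1-t)L} g_1, cf. (4.11)

# Total mass of f_t g_t m: constant in t since <e^{tL}f0, e^{(1-t)L}g1>_m
# has zero time derivative when L is self-adjoint in L^2(m).
mass = [float(np.dot(ft(t) * gt(t), m)) for t in (0.0, 0.3, 0.7, 1.0)]
print(mass)
```

Here m is the counting measure, which reverses the walk because the rates are symmetric; with non-symmetric rates one would use a measure satisfying the detailed balance condition m_x J_x(y) = m_y J_y(x) instead.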

Slowing down
In this section, we describe an efficient way to recover optimal transport as a limit of Schrödinger problems. The main idea consists in slowing the reference process down to a no-motion process. In the following lines, we present some heuristics and refer the reader to [Léo12a] for a rigorous treatment in the case where X is a real vector space and [Léoa] in the alternate case where X is a discrete graph. The specific case of the reversible Brownian motion has been investigated by T. Mikami in [Mik04] with a stochastic control approach which differs from what is presented below.
Let R be Markov with generator L. The slowed down process is represented by the sequence (R^k)_{k≥1} in M_+(Ω) of Markov measures associated with the generators L_k := L/k, k ≥ 1. Remark that slowing the process down doesn't modify its reversible measure m; one converges more slowly towards the same equilibrium. Suppose also that the sequence (R^k)_{k≥1} in M_+(Ω) obeys a large deviation principle in Ω with speed α_k and rate function C, meaning approximately that for a "large class" of measurable subsets A of Ω, we have

R^k(A) ≍_{k→∞} exp(−α_k inf_{ω∈A} C(ω)).

For instance, in the case (4.1) when R is the reversible Brownian motion on R^n, Schilder's theorem tells us that C is the kinetic action (1.3) and α_k = k. In the case (4.10) when R is a reversible random walk, it is proved in [Léoa] that α_k = log k and the rate function C(ω) is simply the total number of jumps of the path ω, where in this situation Ω is the space of all right-continuous paths with finitely many jumps. At a heuristic level, the Γ-convergence, as k tends to infinity, of

H(P|R^k)/α_k → min; P ∈ P(Ω) : P_0 = µ_0, P_1 = µ_1, (S^k_dyn)

to

∫_Ω C dP → min; P ∈ P(Ω) : P_0 = µ_0, P_1 = µ_1, (MK_dyn)

is best seen with the dual problems. Without getting into the details: to show this Γ-convergence, it is enough to prove that the objective functions of the dual problems converge pointwise, see [Léo12a] for the details. Let us check this pointwise convergence.
(2) In the case of a random walk on a graph (4.10), we have: The diffusion case is treated in detail in [Léo12a]. In the specific Brownian case (1), the Schrödinger problem converges to the quadratic Monge-Kantorovich problem and, as already remarked at (1.10), the bridges converge as follows: lim_{k→∞} R^{k,xy} = δ_{γ^{xy}} ∈ P(Ω), where γ^{xy}_t = (1 − t)x + ty, 0 ≤ t ≤ 1, is the constant speed geodesic path between x and y. In case (MK_dyn) has a unique solution P, we also have lim_{k→∞} P^k = P ∈ P(Ω).
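In case (1), the slowing down heuristics can be tried out numerically: for the quadratic cost, the static problem (S^k) amounts to an entropic regularization of (MK) with parameter ε ~ 1/k, which a Sinkhorn fixed-point loop solves. The sketch below (an 8-point toy example on [0, 1] with uniform marginals; the names and sizes are ours, not from the text) compares the transport cost of the entropic plan with the exact Monge-Kantorovich value:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
x = rng.random(8)
y = rng.random(8)
C = 0.5 * (x[:, None] - y[None, :]) ** 2        # quadratic cost c(x,y) = |x-y|^2/2

def entropic_cost(eps, iters=5000):
    # Transport cost <pi, C> of the entropic plan with Gibbs kernel e^{-C/eps}
    # and uniform marginals: a static Schrodinger problem with k ~ 1/eps.
    K = np.exp(-C / eps)
    a = b = np.ones(8) / 8
    u = np.ones(8)
    v = np.ones(8)
    for _ in range(iters):                      # Sinkhorn fixed-point iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    pi = u[:, None] * K * v[None, :]
    return float((pi * C).sum())

# Exact (MK) value: with uniform marginals this is an assignment problem.
rows, cols = linear_sum_assignment(C)
mk = float(C[rows, cols].sum() / 8)

print(entropic_cost(0.5), entropic_cost(0.05), mk)
```

As ε decreases, the cost of the entropic plan decreases towards the exact value, mirroring at the level of couplings the convergence of bridges lim_{k→∞} R^{k,xy} = δ_{γ^{xy}} mentioned above.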
Since the rigorous version of the Informal Statement 5.1 is simpler to state in the second case (2) of a random walk on a graph, we refer the reader to [Léo12a] for the details about (1) and we restrict our attention to (2). In this random walk case, the rigorous version of the Informal Statement 5.1 is stated below at Theorem 5.2. Some preparation is needed; in particular, let us recall basic facts about Γ-convergence.
Γ-convergence. Recall that Γ-lim_{k→∞} f_k = f on the metric space Y if and only if for any y ∈ Y, (a) lim inf_{k→∞} f_k(y_k) ≥ f(y) for any convergent sequence y_k → y, and (b) lim_{k→∞} f_k(y°_k) = f(y) for some sequence y°_k → y. A function f is said to be coercive if for any a ≥ inf f, {f ≤ a} is a compact set.
The sequence (f_k)_{k≥1} is said to be equi-coercive if for any real a, there exists some compact set K_a such that ∪_k {f_k ≤ a} ⊂ K_a. If in addition to Γ-lim_{k→∞} f_k = f the sequence (f_k)_{k≥1} is equi-coercive, then lim_{k→∞} inf f_k = min f, and any limit point of any sequence of approximate minimizers of the f_k's minimizes f. For more details about Γ-convergence, see [DM93] for instance.
Theorem 5.2 ([Léoa]). Assume that the random walk R and the prescribed marginal measures µ_0, µ_1 ∈ P(X) satisfy the hypotheses of Theorem 2.12. For each k ≥ 2, let P^k ∈ P(Ω) and π^k ∈ P(X²) be the respective solutions of (S^k_dyn) and (S^k).
In particular, lim_{k→∞} inf(S^k_dyn) = inf(MK_dyn) and any limit point of (P^k)_{k≥2} is a solution of (MK_dyn). A more careful study allows one to show that there is a unique limit point P, so that lim_{k→∞} P^k = P ∈ P(Ω), and that P is the only solution of the auxiliary entropy minimization problem (S̃_dyn) which is stated below.
In particular, lim_{k→∞} inf(S^k) = inf(MK) and any limit point of (π^k)_{k≥2} is a solution of (MK), the Monge-Kantorovich problem associated with the metric cost c = d_∼, the usual graph distance. Furthermore, this sequence admits the unique limit point P_01, so that lim_{k→∞} π^k = P_01 ∈ P(X²).
It is also proved in [Léoa] that for any distinct x, y ∈ X, lim_{k→∞} R^{k,xy} = R̃^{xy} ∈ P(Ω), where R̃^{xy} is an (x, y)-bridge concentrated on G, the set of all geodesic paths on (X, d_∼). Remark that the set G^{xy} of all geodesic paths between any two distinct states x and y is infinite, since the instants of jump are not specified: only the ordered enumeration of the visited states is relevant. Let us denote by M(µ_0, µ_1) ⊂ P(X²) the set of all solutions of the Monge-Kantorovich problem (MK) with c = d_∼, and introduce the subsequent auxiliary entropic minimization problem (S̃_dyn). The set of all solutions of (MK_dyn) consists of all P ∈ P(Ω) concentrated on G, i.e. P(G) = 1, and such that the endpoint marginal P_01 ∈ P(X²) solves (MK). Although (MK_dyn) always has infinitely many solutions (for any distinct x, y, G^{xy} is infinite), the sequence of Schrödinger problems (S^k_dyn) selects a unique limit point: lim_{k→∞} P^k = P, where P is the unique solution of (S̃_dyn). We obtain the corresponding results about the static problems (S^k) and (MK) by considering the push-forward mapping P ∈ P(Ω) → P_01 ∈ P(X²).

The statistical physics motivation of Schrödinger's problem
We consider a large number n of independent (non-interacting) moving random particles in the state space X. They are described by the independent stochastic processes Y¹, ..., Yⁿ taking their random values in Ω, with the laws

Law(Y^i) = R(· | X_0 = y^i_0), 1 ≤ i ≤ n,

where R ∈ M_+(Ω) is a path measure and y^i_0 ∈ X is the deterministic initial position of the i-th particle. It is also assumed that the particles are indistinguishable. Therefore, one doesn't lose information considering the empirical probability measure

L^n := (1/n) Σ_{i=1}^n δ_{Y^i},

which is a random element of P(Ω). At each time t, the empirical measure of the particle system is the following random element of P(X):

L^n_t := (1/n) Σ_{i=1}^n δ_{Y^i_t}.

Suppose that the initial positions are close to a profile µ_0 ∈ P(X), i.e. lim_{n→∞} L^n_0 = µ_0
with respect to the narrow topology σ(P(X), C_b(X)). The law of large numbers tells us that, as n tends to infinity, L^n converges in law to the deterministic limit µ_0 R := ∫_X R(· | X_0 = x) µ_0(dx) in P(Ω), and in particular that at time t = 1, lim_{n→∞} L^n_1 = (µ_0 R)_1. Schrödinger addressed the following problem. Suppose that at the final time t = 1, you observe the system in a profile L^n_1 far away from the expected profile (µ_0 R)_1: for all large enough n, L^n_1 is in a very small neighbourhood of some µ_1 ∈ P(X) which doesn't contain (µ_0 R)_1. This may happen since n is finite, but it is a very rare event, i.e. one with an exponentially small probability, see (6.5) below. Nevertheless, conditionally on this rare event, what is the most likely dynamical behaviour of the whole random system described by L^n? Before stating this rigorously at Problem 6.1 below, take a metric d on P(X) compatible with the narrow topology and denote B(µ, ǫ) = {ν ∈ P(X); d(ν, µ) < ǫ} the open ball centred at µ with radius ǫ > 0.
Solving the problem without getting into details. Schrödinger's approximate proof contains the main ideas. It is based on a statistical physics approach. The main tool for obtaining the limiting behaviour of the combinatoric terms as n tends to infinity is Stirling's formula. As pointed out by Föllmer in [Föl88], its modern counterpart, which is available in a much more general setting, is Sanov's theorem.
Informal statement 6.2 (Informal statement of Sanov's theorem). Let Y¹, ..., Yⁿ, ... be a sequence of independent identically distributed Ω-valued random variables with common law R ∈ P(Ω). Define L^n := (1/n) Σ_{i=1}^n δ_{Y^i}, its empirical measure. Then, for a "large class" of measurable subsets A of P(Ω), we have

Prob(L^n ∈ A) ≍_{n→∞} exp(−n inf_{P∈A} H(P|R)). (6.3)

The rigorous statement of this result is in terms of a large deviation principle. It is valid for a general class of spaces Ω, not necessarily a path space. For a comprehensive introduction to the theory of large deviations, including Sanov's theorem, a good textbook is [DZ98]. One says that H(·|R) is the rate function of the large deviations of {L^n}_{n≥1} as n tends to infinity.
Idea of proof (a hint to agree with this statement). We consider informally the situation where Ω is replaced by a three-point set. Take Ω = {a, b, c}, R = αδ_a + βδ_b + γδ_c and P = pδ_a + qδ_b + rδ_c with α, β, γ, p, q, r > 0 and α + β + γ = p + q + r = 1. Then, assuming that np, nq and nr are integers,

Prob(L^n = P) = [n!/((np)!(nq)!(nr)!)] α^{np} β^{nq} γ^{nr} ≈ exp(−n [p log(p/α) + q log(q/β) + r log(r/γ)]) = exp(−n H(P|R)),

where we used Stirling's formula: k! ≈ exp[k log k − k] as k tends to infinity.
This hint is very much in the spirit of Schrödinger's derivation in [Sch32].
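The three-point computation above is easy to corroborate numerically: exact multinomial log-probabilities (computed via log-gamma, so Stirling's formula is not even needed) match −n H(P|R) up to a correction of order (log n)/n. A small sketch, with hypothetical values of (α, β, γ) and (p, q, r):

```python
import math

R = (0.5, 0.3, 0.2)                  # reference law (alpha, beta, gamma)
P = (0.2, 0.3, 0.5)                  # observed empirical law (p, q, r)
H = sum(p * math.log(p / r) for p, r in zip(P, R))   # relative entropy H(P|R)

def log_prob(n):
    # Exact log Prob(L^n = P) for the multinomial law, assuming np, nq, nr
    # are integers (they are for the n used below).
    counts = [round(p * n) for p in P]
    lp = math.lgamma(n + 1)
    for k, r in zip(counts, R):
        lp += k * math.log(r) - math.lgamma(k + 1)
    return lp

n = 10000
print(-log_prob(n) / n, H)           # the two numbers agree to about 1e-3
```

The residual gap is exactly the (1/2) log(2πk) terms that Stirling's crude form k! ≈ exp[k log k − k] discards; it is washed out by the division by n.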
Since P → H(P |R) ∈ [0, ∞] is strictly convex and H(P |R) = 0 if and only if P = R, see (A.2), one observes that if R ∈ A, (6.3) leads to the law of large numbers: lim n→∞ L n = R, with an exponential rate of convergence.
We need a slight modification of Sanov's theorem.
Note that this problem admits a unique solution P^ǫ since H(·|R) is a strictly convex function on the convex set C_ǫ. Existence is obtained as usual by showing that H(·|R) has compact sublevel sets. Finally, as ǫ decreases to zero, C_ǫ decreases to C = {P ∈ P(Ω); P_0 = µ_0, P_1 = µ_1} and the objective functions of the minimization problems on C_ǫ,

H(P|R) + ι_{C_ǫ}(P), where ι_{C_ǫ}(P) = 0 if P ∈ C_ǫ and ∞ otherwise,

increase towards H(P|R) + ι_C(P). Together with some compactness, this monotonicity allows one to prove easily that lim_{ǫ→0} P^ǫ = P, where P is the unique solution to the limiting minimization problem

H(P|R) → min; P ∈ P(Ω) : P_0 = µ_0, P_1 = µ_1,

which is precisely (S_dyn). Therefore, we have informally obtained the answer to Schrödinger's question.
Informal statement 6.4 (The answer to Schrödinger's question). The limit (6.2) is

lim_{ǫ↓0} lim_{n→∞} Prob(L^n ∈ · | L^n_1 ∈ B(µ_1, ǫ)) = δ_P ∈ P(P(Ω)),

where P is the unique solution to the entropy minimization problem (S_dyn). Loosely speaking, this means that conditionally on L^n_0 ≈ µ_0 and L^n_1 ≈ µ_1, the whole system L^n tends in law, as n tends to infinity, towards P. In fact, the rigorous proof of this theorem [Léo10, Thm. 7.3] uses large deviation principles and shows that this convergence is exponentially fast. Therefore, the Borel-Cantelli lemma allows us to state an almost sure version of this conditional law of large numbers.
The same line of reasoning leads to the following evaluation of the probability that the system evolves spontaneously from the prepared initial profile µ_0 to the unexpected final profile µ_1:

Prob(L^n_1 ∈ B(µ_1, ǫ) | L^n_0 ∈ B(µ_0, ǫ)) ≍ exp(−n inf(S_dyn)), as n → ∞ and then ǫ → 0. (6.5)

These considerations show that solving Schrödinger's problem amounts to solving the convex minimization problem (S_dyn) which, in statistical physics, enters the class of Boltzmann-Gibbs conditioning principles.
For a variation on this theme, with killed particles, see [DGW90].
The lazy gas experiment. In his textbook [Vil09], in a section entitled "A fluid mechanics feeling for Ricci curvature - The lazy gas experiment", C. Villani writes the following sentences. Take a perfect gas in which particles do not interact, and ask him to move from a certain prescribed density field at time t = 0, to another prescribed density field at time t = 1. Since the gas is lazy, he will find a way to do so that needs a minimal amount of work (least action principle). Measure the entropy 11 of the gas at each time, and check that it always lies above the line joining the final and initial entropies. If such is the case, then we know that we live in a nonnegatively curved space.
The Schrödinger problem suggests a slight (more realistic :-) variant of this thought experiment where displacement interpolations are replaced with entropic interpolations, see (1.12). This is really a lazy gas experiment, while in some sense, the above mentioned lazy gas experiment in [Vil09] is a very lazy gas experiment. Indeed, in the displacement interpolation setting, not only do the particles need to find a cooperative lazy behaviour (the transport mapping x → y), but each individual particle must also find an economic way of travelling.
11 Here, the entropy is standard Boltzmann's one: p → −H(p|vol), which is a concave function.
In his 1932 article [Sch32], Schrödinger quotes the following footnote remark of Eddington 16: "The whole interpretation is very obscure, but it seems to depend on whether you are considering the probability after you know what has happened or the probability for the purposes of prediction. The ψψ̄ is obtained by introducing two symmetrical systems of ψ waves travelling in opposite directions in time; one of these must presumably correspond to probable inference from what is known (or is stated) to have been the condition at a later time." In 1931, wave mechanics is newly born and many physicists are puzzled by its possible interpretations. Based on Eddington's remark, one may wonder at first sight if in the quantum world knowledge from the far future is available. Of course, this is not so, but why? In his 1931-32 papers, Schrödinger solves this paradox by providing an amazingly close analogue of the quantum wave function propagation in the classical world, by means of the entropy minimization problem (S_dyn). In particular, formula (3.4) in Theorem 3.4, P_t = f_t g_t m, must be interpreted as the classical analogue of Born's formula: P_t(dx) = ψ_t(x) ψ̄_t(x) dx.
Let us quote [Sch32] again (this quotation also appears in [Föl88]) to emphasize that, although derived in a heuristic manner in [Sch31, Sch32], the system (2.11) and Born's formula (3.4) are motivated by the following question of large deviations in the framework of the lazy gas experiment: Imaginez que vous observez un système de particules en diffusion, qui soient en équilibre thermodynamique. Admettons qu'à un instant donné t_0 vous les ayez trouvées en répartition à peu près uniforme et qu'à t_1 > t_0 vous ayez trouvé un écart spontané et considérable par rapport à cette uniformité. On vous demande de quelle manière cet écart s'est produit. Quelle en est la manière la plus probable ? 17 As a concluding comment in his 1932 article, Schrödinger writes: La fonction [d'onde] complexe ψ correspond à deux fonctions réelles, de sorte qu'il suffit de définir les conditions aux limites en se donnant la valeur de ψ à un seul instant déterminé ; c'est la façon de voir généralement admise en mécanique quantique. Est-elle la seule admissible ? Dans notre problème, cela reviendrait à regarder comme données les valeurs de f et g 18 à un instant déterminé (au lieu des valeurs de leur produit à deux instants différents), chose inadmissible et absolument dénuée de sens. Doit-on interpréter la remarque d'Eddington, citée plus haut, comme signalant la nécessité de modifier cette manière de voir en mécanique ondulatoire et prendre comme conditions aux limites les valeurs d'une seule probabilité réelle à deux instants différents ? 19
16 This is a classical problem: a probability problem in the theory of Brownian motion. But eventually an analogy with wave mechanics will appear. This analogy struck me so hard once I discovered it that it is difficult for me to believe that it is purely accidental. As an introduction, let me quote a remark that I found in the "Gifford lectures" of A. S. Eddington (Cambridge, 1928, p. 216 et sqq).
Discussing the interpretation of wave mechanics, Eddington writes in a footnote the following remark: "The whole interpretation is very obscure, . . . " 17 Imagine that you observe a system of diffusing particles which is in thermal equilibrium. Suppose that at a given time t 0 you see that their repartition is almost uniform and that at t 1 > t 0 you find a spontaneous and significant deviation from this uniformity. You are asked to explain how this deviation occurred. What is its most likely behaviour?
18 With the notation of the present article.
19 The complex [wave] function ψ corresponds to two real functions. Therefore, it is enough to define the limit conditions by prescribing the value of ψ at a unique given time. This is the regular practice in quantum mechanics. Is it the only admissible one? In our problem, this would correspond to considering that the values of f and g [with the notation of the present article] are prescribed at a given time (instead of the values of their product at two distinct times). This is inadmissible and meaningless. Should one interpret the previously quoted remark of Eddington as a hint for the necessity of modifying our usual way of looking at quantum mechanics, by defining the limit conditions in terms of the values of a single real probability at two distinct times?
This was performed in 1942 by R. Feynman in his PhD thesis [Fey05], without knowing Schrödinger's contribution. Feynman's thesis is entitled The principle of least action in quantum mechanics. Based on a seminal article by Dirac [Dir33], entitled The Lagrangian in quantum mechanics (also reproduced in [Fey05]), and in contrast with the regular Hamiltonian approach, Feynman's thesis proposes a Lagrangian approach to quantum mechanics which will be further developed in several directions, see [FH65].
Föllmer's contribution. Although Schrödinger obtains the classical Born formula (3.4), he does not write explicitly the problems (S_dyn) and (S). Their explicit formulation is due to H. Föllmer in his Saint-Flour lecture notes [Föl88]. Proposition 2.3, which is based on the additive property of the relative entropy (A.8), also appears in [Föl88].
Early mathematical developments. Although this part of Schrödinger's work was forgotten for some decades, it influenced leading mathematicians soon after its publication.
Reciprocal processes, 1932. Very soon after Schrödinger's 1931 article, S. Bernstein published in 1932 an article [Ber32] about the general problem of deriving limit theorems for sequences of dependent random variables. Among other notions, he explored the Markov property and, motivated by [Sch31], proposed a type of time-correlation which is less restrictive than the Markov property and is still symmetric with respect to time reversal 20. He suggested that such stochastic processes could be called reciprocal processes. While a Markov measure Q ∈ M_+(Ω) satisfies, for any 0 ≤ t ≤ 1,

Q(X_[t,1] ∈ · | X_[0,t]) = Q(X_[t,1] ∈ · | X_t),

a path measure Q ∈ M_+(Ω) is reciprocal if for any 0 ≤ s ≤ t ≤ 1,

Q(X_[s,t] ∈ · | X_[0,s], X_[t,1]) = Q(X_[s,t] ∈ · | X_s, X_t).

Any Markov measure is reciprocal, but the converse is false. The theory of reciprocal processes was forgotten for a while after Bernstein's article and was eventually developed by B. Jamison in 1974, [Jam74, Jam75]. A significant contribution of Jamison to the theory of the Schrödinger problem was to show that its solution P is not only reciprocal, but also Markov, and that it is indeed an h-transform of the Markov reference process. This is performed without any entropy, but solely by means of reciprocal transitions. Föllmer recovered these results in [Föl88] using the entropy minimization problem (S_dyn). For more information about the relations between reciprocal and Markov measures, see [LRZ].
Time-reversal, 1936. In the very first lines of his celebrated paper [Kol36] about Markov processes and time-reversal, A. Kolmogorov quotes Schrödinger's 1931 paper as a motivation. This has been surprisingly forgotten afterwards.
Schrödinger system, 1940. Schrödinger had left open the problem of finding criteria for the system (2.11) to have a solution (f_0, g_1). In 1940, R. Fortet [For40] proposed a partial solution and in 1960, A. Beurling [Beu60] gave a solution close to the statement of Theorem 2.8. Beurling's proof also relies upon an entropy argument. Beurling's result was improved by Jamison in [Jam75], who obtained the complete solution of Schrödinger's system.
20 It is not clear that Bernstein was aware of the time-symmetry of the Markov property. This symmetry was clearly identified twenty years later by J.L. Doob in his textbook [Doo53].
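In a finite state space, the Schrödinger system (2.11) becomes a matrix scaling problem: find diagonal rescalings f_0, g_1 of a given joint law R_01 so that the rescaled matrix has the prescribed marginals. The iterative proportional fitting (Sinkhorn) loop below, which alternately enforces each marginal, is one standard way to solve it; the 5-point joint law r and the marginals are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
r = rng.random((5, 5))
r /= r.sum()                         # hypothetical joint law R_01, with r > 0
mu0 = np.full(5, 0.2)                # prescribed initial marginal
mu1 = rng.random(5)
mu1 /= mu1.sum()                     # prescribed final marginal

f0 = np.ones(5)
g1 = np.ones(5)
for _ in range(1000):                # iterative proportional fitting
    f0 = mu0 / (r @ g1)              # enforce the first marginal
    g1 = mu1 / (r.T @ f0)            # enforce the second marginal

pi = f0[:, None] * r * g1[None, :]   # candidate solution of (S): f0 x r x g1
print(pi.sum(axis=1), pi.sum(axis=0))
```

Convergence is unproblematic here because r has full support; when r has vanishing entries, the existence of (f_0, g_1) is exactly the delicate point settled by Fortet, Beurling and Jamison.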
Stochastic deformations of mechanics. The aim of Euclidean quantum mechanics (EQM), which has mainly been developed by J.-C. Zambrini since 1986 [Zam86, CZ91, CWZ00, CZ08], is to transfer by analogy known results from quantum mechanics to the theory of stochastic processes, and the other way round 21. The starting paper [Zam86] of this program relies on Schrödinger's discovery and adapts Jamison's results for an appropriate class of reciprocal processes (unlike Zambrini, Jamison doesn't use the time-reversed filtration in his construction of reciprocal processes). Then the EQM program was extended to the derivation of rigorous results about various kinds of stochastic processes which are suggested by the textbook Quantum mechanics and path integrals by Feynman and Hibbs [FH65]. This textbook presents, indeed, a time-symmetric (Lagrangian) approach to quantum mechanics which extends Feynman's early works and in particular his PhD thesis [Fey05].
Feynman's approach is an enlightening, efficient and intuitive guideline for physicists, but unfortunately it is impossible to put it on a rigorous mathematical ground: it is proved that Feynman's integral is a badly defined object. However, replacing Feynman's integration by stochastic calculus suggests interesting results about diffusion processes. The first of these results was the celebrated Feynman-Kac formula [Kac49]. The EQM viewpoint, however, is that there is much more in Feynman's method than this time-asymmetric measure theoretic perturbative formula. EQM uses Kac's strategy in a systematic manner and its basic program is to obtain rigorous stochastic analogues of several intuitive statements from [FH65]; intuitive, but highly efficient since they are corroborated by experiments. In EQM, the natural stochastic processes to work with are reciprocal processes. However, in several important situations, it appears that the critical (solving some variational problem) reciprocal processes are Markov. In this case, it is sufficient to work with (f, g)-transforms of Markov reference processes (see [Jam75, Föl88] for an h-transform representation) and their extensions f_0(X_0) exp(∫_0^1 U(X_t) dt) g_1(X_1) R with the additional Feynman-Kac integral term; the mapping (x, y) → exp(∫_0^1 U(X_t) dt) R^{xy} is the classical analogue of Feynman's propagator. These extensions of h-transforms are also used by M. Nagasawa in [Nag89, Nag00], who also explores connections between stochastic processes and quantum physics which are highly inspired by the Schrödinger problem.
It is also possible to stochastically deform all the mathematical tools of classical mechanics to derive new results about diffusion processes. For instance, M. Thieullen designed a second order calculus for reciprocal processes in [Thi93], and, without referring to (S_dyn) or entropy in general, M. Thieullen and J.-C. Zambrini have obtained a stochastic deformation of the Noether theorem [TZ97].
An interesting problem. This suggests that it would also be interesting to derive a Noether-type theorem for the Monge-Kantorovich dynamical problem. Let us give some hint of what is meant. In the Euclidean case, the displacement interpolation [µ_0, µ_1] is a solution of (MK_dyn) with C = C_kin := ∫_[0,1] |Ẋ_t|²/2 dt, the kinetic action. It has a constant speed; this means that twice the average kinetic energy

∫_Ω |Ẋ_t|² dP = ∫_X |∇ψ_t(x)|² µ_t(dx),

with the notation of the Benamou-Brenier formula (1.7), doesn't depend on time t. What happens when considering, instead of C_kin, the action functional C = ∫_[0,1] (|Ẋ_t|²/2 + U(X_t)) dt, which should be connected with some Newton equation? What are the quantities that are conserved along the minimizer [µ_0, µ_1], in terms of the symmetries of the potential U?
Stochastic optimal control. Optimal transport can be deformed into a stochastic optimal control problem. This is mainly the contribution of T. Mikami; see [Mik09] for an overview of this approach and some of its main developments. With (4.7), one obtains that the Brownian Schrödinger problem (S_dyn), i.e. taking R to be the reversible Brownian motion, is also expressed as follows:

E_{P^u} ∫_0^1 L(u_t) dt → min; u ∈ A : P^u_0 = µ_0, P^u_1 = µ_1, (7.1)

where A is the set (of admissible controls) which consists of all the R^n-valued progressively measurable processes u, and P^u (if it exists) is the law of the semimartingale

X^u_t = X^u_0 + ∫_0^t u_s ds + W_t, 0 ≤ t ≤ 1,

where W is a standard Brownian motion starting from 0 and L(u) = |u|²/2, u ∈ R^n. This is a stochastic version of the quadratic Monge-Kantorovich problem (MK_dyn):

E_{P^u} ∫_0^1 L(u_t) dt → min; u ∈ A_MK : P^u_0 = µ_0, P^u_1 = µ_1,

which is obtained by replacing A with A_MK, the set of all controls u ∈ L¹_{R^n}([0, 1]), and taking P^u to be the law of X^u_t = X^u_0 + ∫_0^t u_s ds, 0 ≤ t ≤ 1, a process with a random initial position and a deterministic evolution. This theory extends naturally to the case where L is strictly convex, regular and coercive enough: lim_{|u|→∞} L(u)/|u|^p = ∞ for some p > 1. But results close to optimal transport are obtained with L admitting a quadratic growth, i.e. in harmony with the Brownian motion W.
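The bridge between the entropic and the control formulations is a Girsanov-type energy identity, which is the content behind (4.7): for a law P^u ≪ R with drift u, the relative entropy splits into an initial cost plus the expected kinetic action of the drift. Schematically (a sketch, under the integrability conditions needed for Girsanov's theorem):

```latex
H(P^u|R) \;=\; H(P^u_0|R_0) \;+\; E_{P^u}\!\int_0^1 \frac{|u_t|^2}{2}\,dt,
```

so that minimizing H(P|R) over laws with fixed marginals µ_0, µ_1 is the same as the control problem (7.1), up to the additive term H(µ_0|R_0), which is fixed by the constraint.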
When $L$ is quadratic, if the Brownian motion $W$ is replaced with $\sqrt{\epsilon}\, W$ and $\epsilon$ tends to zero, then Mikami shows in [Mik04] that (S$_{\mathrm{dyn}}$) tends to (MK$_{\mathrm{dyn}}$). Unfortunately, this type of convergence remains unclear unless $L$ is quadratic, i.e. unless the stochastic optimal control problem corresponds to (S$_{\mathrm{dyn}}$). T. Mikami and M. Thieullen have proved a Kantorovich-type dual equality in [MT06] for (7.1) and recovered related optimal transport results in [MT08]. T. Mikami has intensively studied the connections between stochastic control and optimal transport. In particular, soon after the discovery by Jordan, Kinderlehrer and Otto [JKO98] of the relation between gradient flows, Wasserstein distance and dissipative evolution equations, he proposed in [Mik00] a stochastic approach to the JKO approximation scheme. In addition to the already cited articles by Mikami, several other works by the same author are related to a probabilistic approach to optimal transport: [Mik02, Mik06, Mik08, Mik12]. Let us also quote the early contributions of Mikami [Mik90] and P. Dai Pra [Dai91], where the Schrödinger problem is translated in terms of stochastic control.
Putting $\rho_k(dxdy) = Z_k^{-1} e^{-kc(x,y)}\, \rho(dxdy)$ with $Z_k = \int_{\mathcal X^2} e^{-kc}\, d\rho < \infty$, up to the additive constant $\log(Z_k)/k$, this minimization problem rewrites as (S$_k$) with $\rho_k$ instead of $R^k_{01}$. See for instance the papers by Rüschendorf and Thomsen [RT93, RT98] and the references therein. Also interesting are the papers by Dupuy, Galichon and Salanie [GS, DG], with an applied point of view.
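On a finite state space, the static problem of minimizing $H(\pi|\rho_k)$ under marginal constraints can be solved by the iterative proportional fitting procedure (Sinkhorn iterations), in the spirit of the Rüschendorf-Thomsen framework cited above. A minimal numerical sketch, with an illustrative quadratic cost and toy marginals chosen for the example:

```python
import numpy as np

# Toy static Schrodinger problem: minimize H(pi | rho_k) over couplings
# pi with marginals mu0, mu1, where rho_k is proportional to exp(-k c) rho.
n, k = 5, 10.0
x = np.linspace(0.0, 1.0, n)
c = 0.5 * (x[:, None] - x[None, :]) ** 2   # quadratic cost c(x, y)
rho = np.ones((n, n)) / n**2               # uniform reference coupling
K = rho * np.exp(-k * c)                   # unnormalized rho_k

mu0 = np.full(n, 1.0 / n)                  # illustrative marginals
mu1 = np.array([0.1, 0.15, 0.2, 0.25, 0.3])

# Sinkhorn / IPF: alternately rescale rows and columns to fit marginals.
a = np.ones(n)
for _ in range(500):
    b = mu1 / (K.T @ a)                    # fit the second marginal
    a = mu0 / (K @ b)                      # fit the first marginal
pi = a[:, None] * K * b[None, :]
print(pi.sum(axis=1), pi.sum(axis=0))      # ~ mu0 and mu1
```

The optimizer has the product form $\pi = a \otimes b \cdot e^{-kc}\rho$, a discrete analogue of the $f \otimes g$ factorization of Schrödinger bridges; letting $k \to \infty$ drives the solution toward an optimal transport plan.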
Appendix A. Relative entropy with respect to an unbounded measure

This appendix is a short version of [Léoc, § 2], to which we refer for more details. Let $r$ be some $\sigma$-finite positive measure on some space $Y$. The relative entropy of the probability measure $p$ with respect to $r$ is loosely defined by
$$H(p|r) := \int_Y \log(dp/dr)\, dp \in (-\infty,\infty], \qquad p \in P(Y) \tag{A.1}$$
if $p \ll r$, and $H(p|r) = \infty$ otherwise. More precisely, when $r$ is a probability measure, we have
$$H(p|r) = \int_Y h(dp/dr)\, dr \in [0,\infty], \qquad p, r \in P(Y),$$
with $h(a) = a \log a - a + 1 \ge 0$ for all $a \ge 0$ (taking $h(0) = 1$). Hence, the definition (A.1) is meaningful. It follows from the strict convexity of $h$ that $H(\cdot|r)$ is also strictly convex. In addition, since $h(a) = \inf h = 0 \iff a = 1$, we also have for any $p \in P(Y)$,
$$H(p|r) = \inf H(\cdot|r) = 0 \iff p = r. \tag{A.2}$$
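On a finite space with a probability reference measure, the formula $H(p|r) = \sum_i h(p_i/r_i)\, r_i$ and its two basic properties, nonnegativity and vanishing exactly at $p = r$, can be checked directly. A small sketch with illustrative measures:

```python
import numpy as np

def h(a):
    # h(a) = a log a - a + 1 >= 0 for a >= 0, with the convention h(0) = 1
    a = np.asarray(a, dtype=float)
    safe = np.where(a > 0, a, 1.0)
    return np.where(a > 0, a * np.log(safe) - a + 1.0, 1.0)

def rel_entropy(p, r):
    # H(p|r) = sum_i h(p_i / r_i) r_i, assuming r_i > 0 (so p << r)
    return float(np.sum(h(p / r) * r))

r = np.array([0.2, 0.3, 0.5])       # illustrative reference probability
p = np.array([0.5, 0.25, 0.25])     # illustrative probability measure
print(rel_entropy(p, r))            # > 0 since p != r
print(rel_entropy(r, r))            # = 0, because h(1) = 0
```

Note that since $p$ and $r$ are both probability measures, the $-a + 1$ part of $h$ integrates to zero and the formula reduces to the familiar $\sum_i p_i \log(p_i/r_i)$.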
If $r$ is unbounded, one must restrict the definition of $H(\cdot|r)$ to some subset of $P(Y)$, as follows. As $r$ is assumed to be $\sigma$-finite, there exist measurable functions $W : Y \to [1,\infty)$ such that
$$z_W := \int_Y e^{-W}\, dr < \infty. \tag{A.3}$$
Define the probability measure $r_W := z_W^{-1} e^{-W} r$, so that $\log(dp/dr) = \log(dp/dr_W) - W - \log z_W$. It follows that for any $p \in P(Y)$ satisfying $\int_Y W\, dp < \infty$, the formula
$$H(p|r) := H(p|r_W) - \int_Y W\, dp - \log z_W \in (-\infty,\infty] \tag{A.4}$$
is a meaningful definition of the relative entropy, which is coherent in the following sense: if $\int_Y W'\, dp < \infty$ for another measurable function $W' : Y \to [0,\infty)$ such that $z_{W'} < \infty$, then
$$H(p|r_W) - \int_Y W\, dp - \log z_W = H(p|r_{W'}) - \int_Y W'\, dp - \log z_{W'} \in (-\infty,\infty].$$
Therefore, $H(p|r)$ is well-defined for any $p \in P(Y)$ such that $\int_Y W\, dp < \infty$ for some measurable nonnegative function $W$ verifying (A.3). It can be proved that
$$H(p|r) \overset{(i)}{=} \int_Y h(dp/dr)\, dr \overset{(ii)}{=} \sup\Big\{ \int_Y u\, dp - \log \int_Y e^u\, dr;\ u \in C_W(Y) \Big\},$$
where identity (i) is valid when $p$ is assumed to be absolutely continuous with respect to $r$, and identity (ii) is meaningful when $Y$ is a topological space equipped with its Borel $\sigma$-field, since we have set $C_W(Y)$ to be the space of all continuous functions $u : Y \to \mathbb R$ such that $\sup |u|/W < \infty$, where $W$ is any nonnegative function satisfying (A.3).
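The coherence of definition (A.4), namely that the value does not depend on the choice of $W$, can be checked numerically. The sketch below (an illustration, not from the paper) takes $r$ to be the counting measure on $\{0, 1, 2, \dots\}$, which is $\sigma$-finite with infinite mass, and two illustrative weights $W(n) = 2\log(n+2)$ and $W'(n) = 3\log(n+2)$, both satisfying (A.3):

```python
import numpy as np

# r = counting measure on {0,1,2,...}: sigma-finite, infinite total mass.
# Check that H(p|r_W) - \int W dp - log z_W is the same for two choices
# of W, and agrees with the direct value sum p log p (since r({n}) = 1).
N = 100_000                              # truncation level for z_W
n = np.arange(N)
W1 = 2.0 * np.log(n + 2.0)               # z_{W1} = sum (n+2)^{-2} < inf
W2 = 3.0 * np.log(n + 2.0)               # z_{W2} = sum (n+2)^{-3} < inf
z1, z2 = np.exp(-W1).sum(), np.exp(-W2).sum()
rW1, rW2 = np.exp(-W1) / z1, np.exp(-W2) / z2   # probability measures r_W

p = np.zeros(N)                          # p supported on {0, 1, 2}
p[:3] = [0.5, 0.3, 0.2]

def H_prob(p, q):                        # entropy w.r.t. a probability q
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

H_via_W1 = H_prob(p, rW1) - np.dot(W1, p) - np.log(z1)
H_via_W2 = H_prob(p, rW2) - np.dot(W2, p) - np.log(z2)
direct = float(np.sum(p[:3] * np.log(p[:3])))
print(H_via_W1, H_via_W2, direct)        # all three coincide
```

The agreement is exact up to floating-point error: the terms $\int W\, dp$ and $\log z_W$ cancel the $W$-dependence introduced by $r_W$, exactly as in the algebraic identity following (A.3).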
In this case, it follows that, being the supremum of affine continuous functions, $H(\cdot|r)$ is a convex lower semi-continuous function with respect to the weak topology $\sigma\big(\{p \in P(Y); \int_Y W\, dp < \infty\}, C_W(Y)\big)$. Clearly, identity (i) entails that $H(p|r) = \infty$ whenever $p \in P(Y)$ is such that $\int_Y W\, dp = \infty$. It follows from the strict convexity of $H(\cdot|r_W)$ and (A.4) that $H(\cdot|r)$ is also strictly convex.
Let $Y$ and $Z$ be two Polish spaces equipped with their Borel $\sigma$-fields. For any measurable function $\phi : Y \to Z$ and any measure $q \in M_+(Y)$, we have the disintegration formula
$$q(dy) = \int_Z q(dy \mid \phi = z)\, \phi_\# q(dz),$$
where $z \in Z \mapsto q(\cdot \mid \phi = z) \in P(Y)$ is measurable, and the following additive property is valid for any $p \in P(Y)$ and any $\sigma$-finite $r \in M_+(Y)$:
$$H(p|r) = H(\phi_\# p \mid \phi_\# r) + \int_Z H\big(p(\cdot \mid \phi = z) \mid r(\cdot \mid \phi = z)\big)\, \phi_\# p(dz).$$
In particular, as $r(\cdot \mid \phi = z)$ is a probability measure for each $z$, with (A.2) we see that
$$H(p|r) \ge H(\phi_\# p \mid \phi_\# r),$$
with equality if and only if $p(\cdot \mid \phi = z) = r(\cdot \mid \phi = z)$, $\forall z$, $\phi_\# p$-a.s. \tag{A.10}
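The additive property and the resulting contraction inequality can be verified on a finite product space. A minimal sketch (an illustration with randomly generated measures, not from the paper), with $Y = \{0,1\} \times \{0,1\}$ and $\phi$ the first-coordinate projection:

```python
import numpy as np

# Verify H(p|r) = H(phi_# p | phi_# r)
#              + int_Z H(p(.|phi=z) | r(.|phi=z)) d(phi_# p)(z)
# on Y = {0,1} x {0,1}, phi = projection on the first coordinate.
rng = np.random.default_rng(1)
p = rng.random((2, 2)); p /= p.sum()      # probability on Y (row = phi)
r = rng.random((2, 2)); r /= r.sum()      # probability reference on Y

def H(a, b):
    # relative entropy between positive arrays of equal total mass
    return float(np.sum(a * np.log(a / b)))

pz, rz = p.sum(axis=1), r.sum(axis=1)     # pushforwards phi_# p, phi_# r
lhs = H(p, r)
rhs = H(pz, rz) + sum(pz[z] * H(p[z] / pz[z], r[z] / rz[z])
                      for z in range(2))  # conditional (bridge-like) term
print(lhs, rhs)                           # identical; and lhs >= H(pz, rz)
```

The conditional term is nonnegative by (A.2), which gives the inequality $H(p|r) \ge H(\phi_\# p \mid \phi_\# r)$; it vanishes exactly when the conditional laws of $p$ and $r$ agree, which is the finite-space analogue of a mixture sharing its bridges with $R$.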