Weak approximation of Schrödinger-Föllmer diffusion

We consider the weak convergence of the Euler-Maruyama approximation for Schrödinger-Föllmer diffusions, which are solutions of Schrödinger bridge problems and can be used for sampling from given distributions. We show that the distribution of the terminal random variable of the time-discretized process converges weakly to the target distribution under mild regularity conditions.


Introduction
In this paper we are concerned with a weak solution of the following stochastic differential equation:
$$dX_t = \nabla \log h(t, X_t)\,dt + dB_t, \quad 0 \le t \le T, \qquad X_0 = 0. \tag{1.1}$$
Here, $\nabla$ denotes the gradient operator, $\{B_t\}_{0 \le t \le T}$ is a $d$-dimensional standard Brownian motion on some probability space, and, for a given Borel probability measure $\mu$ on $\mathbb{R}^d$ with positive density $\rho$, the function $h$ is defined by
$$h(t, x) = \int_{\mathbb{R}^d} \frac{\rho(y)}{p_T(0, y)}\, p_{T-t}(x, y)\,dy,$$
where $p_s$ is the transition density function of $\{B_t\}$.
Eq. (1.1) appears in the so-called Schrödinger bridge problem between the Dirac measure $\delta_0$ with mass at the origin and $\mu$. To explain the Schrödinger bridge problem briefly, let $W^d = C([0,T], \mathbb{R}^d)$ be the space of all $\mathbb{R}^d$-valued continuous functions on $[0,T]$. Denote by $\mathcal{P}(W^d)$ the totality of Borel probability measures on $W^d$. Assume here that (1.1) has a weak solution and denote by $P^*$ the corresponding path measure on $W^d$. Further denote by $D_{KL}(Q \,\|\, P)$ the Kullback-Leibler divergence, or relative entropy, of $Q \in \mathcal{P}(W^d)$ with respect to the Wiener measure $P \in \mathcal{P}(W^d)$, defined by
$$D_{KL}(Q \,\|\, P) = \begin{cases} E_Q\!\left[\log \dfrac{dQ}{dP}\right] & \text{if } Q \ll P, \\ +\infty & \text{otherwise}, \end{cases}$$
where $E_R$ denotes the expectation with respect to a probability measure $R$ on a measurable space. If $D_{KL}(P^* \,\|\, P) < \infty$, then $P^*$ is the unique solution of the minimization problem of $D_{KL}(Q \,\|\, P)$ over all $Q \in \mathcal{P}(W^d)$ such that the marginal distributions of $Q$ at times $0$ and $T$ are given by $\delta_0$ and $\mu$, respectively.

The name Schrödinger bridge problem comes from Erwin Schrödinger's works [10] and [11]. His aim was to study the transition probability that most likely occurs under constraints on the initial and terminal time distributions of the empirical measures of independent Brownian particles. We refer to Chetrite et al. [2], an English translation of [10], for an exposition of Schrödinger's original approach. Föllmer [6] discovered that this problem is one of large deviations and is nearly equivalent to the minimization problem above. Moreover, [6] shows the optimality of $P^*$. In particular, the marginal distribution of $P^*$ at time $T$ coincides with $\mu$. We refer to, e.g., Chen et al. [1] and Léonard [9] for detailed surveys of Schrödinger bridges.
Sampling from a given probability distribution is an important issue, primarily in fields such as statistics and machine learning. For instance, in Bayesian estimation, the accuracy of sampling from an unnormalized posterior distribution is a critical concern. Thus, various sampling methods have been studied, including Markov chain Monte Carlo methods and Langevin samplers. Recently, Huang et al. [7] proposed using Schrödinger bridges as a sampler of $\mu$. Based on this sampling method, Dai et al. [3] apply Schrödinger bridges to global optimization. Following [7] and [3], we call a weak solution of (1.1) the Schrödinger-Föllmer diffusion.
In sampling applications, we need to consider a time discretization of the Schrödinger-Föllmer diffusion, which is also discussed in [7] and [3]. Both studies assume that the drift function in (1.1) is of linear growth, Lipschitz continuous in $x$, and $1/2$-Hölder continuous in $t$ on $[0,T] \times \mathbb{R}^d$, and then justify the Euler-Maruyama method for (1.1) as a strong approximation. In many practical situations, e.g., in Bayesian estimation, the density $\rho$ has a complicated form, so in general it may be difficult to check that the drift function satisfies these conditions. In the present paper, we aim to find weaker sufficient conditions under which the Euler-Maruyama method for the Schrödinger-Föllmer diffusion converges. In particular, we discuss the weak approximation of (1.1), which is carried out in the next section.

Main results
Let $\mathcal{P}(\mathbb{R}^d)$ be the set of all Borel probability measures on $\mathbb{R}^d$. Let $\mu \in \mathcal{P}(\mathbb{R}^d)$ have a density $\rho$ satisfying the following condition: (A1) [...] Under (A1) there exists a weak solution of (1.1), and the corresponding path measure $P^*$ satisfies [...], where $\{x_t\}$ is the coordinate map on $W^d$ (see [6] and Dai-Pra [4]).
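Let us consider the Euler-Maruyama approximation of the weak solution of (1.1). To this end, let $(\Omega, \mathcal{F}, \mathbb{P})$ be an atomless and complete probability space, and $\{Z_i\}_{i=1}^n$ an IID sequence on $(\Omega, \mathcal{F}, \mathbb{P})$ such that each $Z_i$ is a $d$-dimensional standard normal vector. Let $\{t_i\}_{i=0}^n$ be given by $t_i = ih$, and define $\{Y_i\}_{i=0}^n$ by
$$Y_{i+1} = Y_i + b(t_i, Y_i)\,h + \sqrt{h}\,Z_{i+1}, \quad i = 0, \ldots, n-1, \tag{2.1}$$
with $Y_0 = 0$, where $h = T/n$ and
$$b(t, y) = \nabla \log h(t, y) = \frac{\nabla E[\varphi(y + \sqrt{T-t}\,Z)]}{E[\varphi(y + \sqrt{T-t}\,Z)]}, \quad (t, y) \in [0, T) \times \mathbb{R}^d,$$
where $E = E_{\mathbb{P}}$.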
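The recursion above can be sketched in a few lines of Python. This is a minimal illustration in dimension $d = 1$, not the implementation used in [7] or [3]. It assumes the usual Euler-Maruyama update $Y_{i+1} = Y_i + b(t_i, Y_i)h + \sqrt{h}Z_{i+1}$ with $t_i = ih$, treats $Z$ in the drift formula as a standard normal variable, and assumes that $\varphi$ can be evaluated pointwise. The function names `mc_drift` and `euler_maruyama` and the Monte Carlo sample size `m` are hypothetical. The gradient in the numerator is replaced by Gaussian integration by parts, $\frac{d}{dy}E[\varphi(y + sZ)] = E[Z\varphi(y + sZ)]/s$, which is a substitution made for this sketch and holds for sufficiently regular $\varphi$.

```python
import math
import random


def mc_drift(phi, t, y, T, rng, m=256):
    """Monte Carlo estimate of b(t, y) = d/dy log E[phi(y + sqrt(T - t) Z)].

    Assumption of this sketch: the derivative of the expectation is rewritten
    with Gaussian integration by parts, E[Z phi(y + s Z)] / s, so that no
    derivative of phi is needed. Requires t < T.
    """
    s = math.sqrt(T - t)
    numerator = 0.0
    denominator = 0.0
    for _ in range(m):
        z = rng.gauss(0.0, 1.0)
        weight = phi(y + s * z)
        numerator += z * weight
        denominator += weight
    return numerator / (s * denominator)


def euler_maruyama(phi, T, n, rng, m=256):
    """Return one sample of Y_n, started from Y_0 = 0 with step h = T / n."""
    h = T / n
    y = 0.0
    for i in range(n):
        t_i = i * h  # grid point t_i = i h, always strictly less than T
        drift = mc_drift(phi, t_i, y, T, rng, m)
        y += drift * h + math.sqrt(h) * rng.gauss(0.0, 1.0)
    return y
```

As a sanity check under the additional assumption that $\varphi$ is the ratio of $\rho$ to the $N(0, T)$ density, a Gaussian target $N(a, T)$ gives $\varphi(x) = \exp(ax/T - a^2/(2T))$ and the exact drift is the constant $a/T$, so `euler_maruyama` should return samples with mean close to $a$.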
In the framework of weak solutions, it is convenient to work with bounded drift functions. To this end, we shall impose the following condition on $\mu$: (A2) [...]

Here is our first main result.
Theorem 2.1. Let $\{Y_i\}_{i=0}^n$ be as in (2.1). Under (A1) and (A2), the distribution of $Y_n$ under $\mathbb{P}$ converges weakly to $\mu$ as $n \to \infty$.
Remark 2.2. Suppose that $\rho(x) \propto e^{-V(x)}$. Then a sufficient condition for (A2) to hold is that [...] is globally Lipschitz.
Remark 2.3. Huang et al. [7] assume that the drift $b$ is of linear growth, Lipschitz continuous in $x$, and $1/2$-Hölder continuous in $t$ on $[0,T] \times \mathbb{R}^d$, and under these assumptions they justify the Euler-Maruyama method as a strong approximation. Remark 4.1 in [7] states that a set of sufficient conditions for these to hold is that $\varphi$ is bounded from below and both $\varphi$ and $\nabla\varphi$ are Lipschitz continuous. If $\rho(x) \propto e^{-V(x)}$ as in the previous remark, then this requires that $\tilde{V}$ has bounded derivatives up to second order, which is stronger than our conditions (A1) and (A2).
Remark 2.4. It should be noted that the weak convergence of the Euler-Maruyama method under non-Lipschitz conditions on the drift term has already been studied, for example, in Zhang [12]. To apply the results from [12] to (1.1), the drift function $\nabla \log h(t, x)$ in (1.1) must be continuous in $t$ on $[0,T]$ for any $x$. However, under our conditions (A1) and (A2), the continuity of $t \mapsto \nabla \log h(t, x)$ at $T$ cannot be assured. That is, Schrödinger-Föllmer diffusions generally fall outside the scope of existing studies on the weak convergence of the Euler-Maruyama method.
Proof of Theorem 2.1.
Step (i). [...] With this definition, we obtain $|b(t, x)| \le C_0$ for a.e. $x \in \mathbb{R}^d$ and any $t \in [0,T]$.
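Step (ii). Let $\{X_t\}_{0 \le t \le T}$ be a $d$-dimensional standard Brownian motion on $(\Omega, \mathcal{F}, \mathbb{P})$. By Step (i), the process $b(t, X_t)$, $0 \le t \le T$, is bounded $\mathbb{P}$-a.s., whence we can define the probability measure $\mathbb{P}^*$ on $(\Omega, \mathcal{F})$ by
$$\frac{d\mathbb{P}^*}{d\mathbb{P}} = \exp\left(\int_0^T b(t, X_t)^{\mathsf{T}}\,dX_t - \frac{1}{2}\int_0^T |b(t, X_t)|^2\,dt\right),$$
where $a^{\mathsf{T}}$ denotes the transposition of a vector $a$. Further, by Girsanov's theorem, the process
$$B^*_t := X_t - \int_0^t b(s, X_s)\,ds, \quad 0 \le t \le T,$$
is a Brownian motion under $\mathbb{P}^*$. Thus $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P}^*, X, B^*)$ is a weak solution of (1.1), i.e.,
$$X_t = \int_0^t b(s, X_s)\,ds + B^*_t. \tag{2.2}$$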
In particular, $\mathbb{P}^*(X_T)^{-1} = \mu$.
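Step (iii). Define $\bar{X}_t$ by
$$\bar{X}_t = \bar{X}_{t_i} + b(t_i, \bar{X}_{t_i})(t - t_i) + B^*_t - B^*_{t_i}, \quad t_i < t \le t_{i+1}, \quad i = 0, \ldots, n-1,$$
with $\bar{X}_0 = 0$. Note that $\mathbb{P}^*(\bar{X}_T)^{-1} = \mathbb{P}(Y_n)^{-1}$. Further, $\{\bar{X}_t\}$ satisfies
$$\bar{X}_t = \int_0^t b(\tau_n(s), \bar{X}_{\tau_n(s)})\,ds + B^*_t, \quad 0 \le t \le T, \tag{2.3}$$
where $\tau_n(s) = \lfloor ns/T \rfloor\, T/n$, so that $\tau_n(s) = t_i$ for $s \in [t_i, t_{i+1})$, and $\lfloor x \rfloor$ denotes the greatest integer not exceeding $x \in \mathbb{R}$. In particular, $\{B^*_t\}$ is adapted to the augmented natural filtration $\mathbb{G}$ generated by $\{\bar{X}_t\}$.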
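Then put $\beta_t = b(\tau_n(t), \bar{X}_{\tau_n(t)}) - b(t, \bar{X}_t)$ and consider another probability measure $\hat{\mathbb{P}}$ on $(\Omega, \mathcal{F})$ defined by
$$\frac{d\hat{\mathbb{P}}}{d\mathbb{P}^*} = \exp\left(-\int_0^T \beta_t^{\mathsf{T}}\,dB^*_t - \frac{1}{2}\int_0^T |\beta_t|^2\,dt\right).$$
Again by Girsanov's theorem, the process
$$\hat{B}_t := B^*_t + \int_0^t \beta_s\,ds, \quad 0 \le t \le T,$$
is a $\hat{\mathbb{P}}$-Brownian motion. This leads to
$$\bar{X}_t = \int_0^t b(s, \bar{X}_s)\,ds + \hat{B}_t, \quad 0 \le t \le T, \tag{2.4}$$
whence $(\Omega, \mathcal{F}, \mathbb{F}, \hat{\mathbb{P}}, \bar{X}, \hat{B})$ is also a weak solution of (1.1). By the uniqueness in law for the weak solution of (1.1) obtained by Girsanov's theorem, we have $\mathbb{P}^* X^{-1} = \hat{\mathbb{P}} \bar{X}^{-1}$ (see Proposition 5.3.10 in Karatzas and Shreve [8]). Now, for any $\Gamma \in \mathcal{B}(W^d)$, [...] Since $\beta$ is $\mathbb{G}$-adapted, as in the proof of Lemma 2.4 in [8], we have
$$\int_0^T \beta_s^{\mathsf{T}}\,d\hat{B}_s = \lim_{k \to \infty} \int_0^T (\beta^{(k)}_s)^{\mathsf{T}}\,d\hat{B}_s \quad \text{a.s.},$$
possibly along a subsequence, for some $\mathbb{G}$-adapted simple processes $\{\beta^{(k)}_t\}_{0 \le t \le T}$, $k \in \mathbb{N}$. The process $\hat{B}$ is also $\mathbb{G}$-adapted, whence there exists a $\mathcal{B}(W^d)$-measurable map $\Phi$ such that $\Phi(\bar{X}) = \exp\{[...]\}$. In exactly the same way, we see [...], where $\theta^{(n)}_s = b(\tau_n(s), X_{\tau_n(s)}) - b(s, X_s)$, $0 \le s \le T$. This means [...]

Step (iv). Let $A \in \mathcal{B}(\mathbb{R}^d)$. Using the Burkholder-Davis-Gundy inequality, the Cauchy-Schwarz inequality, Doob's maximal inequality, and the boundedness of $b$, we observe [...] for some positive constants $C$, which may differ from line to line. Hence [...] Now, by the continuity of $b$ on $[0, T) \times \mathbb{R}^d$,
$$\lim_{n \to \infty} b(\tau_n(s), X_{\tau_n(s)}) = b(s, X_s), \quad dt \times d\mathbb{P}^*\text{-a.e.}$$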
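Thus, applying the bounded convergence theorem, we obtain
$$\lim_{n \to \infty} E_{\mathbb{P}^*}\left[\int_0^T |\theta^{(n)}_s|^2\,ds\right] = 0.$$
Consequently, we have shown that the total variation distance between $\mathbb{P}(Y_n)^{-1}$ and $\mu$ converges to zero as $n \to \infty$.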
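The change of drift used in the proof can be unpacked in one line of algebra. The display below is a sketch in notation chosen here: $\bar{X}$ denotes the continuous-time Euler-Maruyama interpolation, $B^*$ the Brownian motion under $\mathbb{P}^*$ that drives it, $\beta_s = b(\tau_n(s), \bar{X}_{\tau_n(s)}) - b(s, \bar{X}_s)$ the drift mismatch, and $\hat{B}_t = B^*_t + \int_0^t \beta_s\,ds$ the shifted process. The bar and hat accents are assumptions about the original typography.

```latex
\begin{align*}
\bar{X}_t
  &= \int_0^t b(\tau_n(s), \bar{X}_{\tau_n(s)})\,ds + B^*_t \\
  &= \int_0^t b(s, \bar{X}_s)\,ds + \int_0^t \beta_s\,ds + B^*_t \\
  &= \int_0^t b(s, \bar{X}_s)\,ds + \hat{B}_t .
\end{align*}
```

Since $b$ is bounded, so is $\beta$, and Girsanov's theorem applies without further integrability checks. The discretization error is thereby moved from the dynamics into a density, which is why the final estimate is in total variation.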
Next, we shall consider the case where the density $\rho$ may have compact support. Again we impose the following integrability condition as in (A1): (A3) [...] It should be emphasized here that $\rho$ is not necessarily a positive function, whence (1.1) may not have a weak solution. Thus we introduce the approximate distribution $\mu_\varepsilon$ with density $\rho_\varepsilon$ defined by
$$\rho_\varepsilon(y) = (1 - \varepsilon)\rho(y) + \varepsilon G_T(y), \quad y \in \mathbb{R}^d,$$
where $\varepsilon \in (0, 1)$ and
$$G_T(y) = p_T(0, y) = \frac{e^{-|y|^2/(2T)}}{(2\pi T)^{d/2}}, \quad y \in \mathbb{R}^d.$$
To guarantee the boundedness of $b^\varepsilon$ on $[0,T] \times \mathbb{R}^d$ except on null sets, we impose the following condition on $\varphi$: (A4) There exists a positive constant $C_1$ such that [...] As a criterion of convergence, we adopt the total variation distance $\delta$ defined by
$$\delta(\nu_1, \nu_2) = \sup_{A \in \mathcal{B}(\mathbb{R}^d)} |\nu_1(A) - \nu_2(A)|, \quad \nu_1, \nu_2 \in \mathcal{P}(\mathbb{R}^d).$$
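Then we have the following:

Theorem 2.5. Let $\{Y^\varepsilon_i\}_{i=0}^n$ be as in (2.6). Denote by $\mathcal{L}(Y^\varepsilon_n)$ the distribution of $Y^\varepsilon_n$ under $\mathbb{P}$. Under (A3) and (A4), we have
$$\lim_{\varepsilon \to 0} \lim_{n \to \infty} \delta(\mathcal{L}(Y^\varepsilon_n), \mu) = 0.$$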
Remark 2.6. Huang et al. [7] analyse the weak convergence of $\mathcal{L}(Y^\varepsilon_n)$ under the condition that $b^\varepsilon$ is of linear growth, Lipschitz continuous in $x$, and $1/2$-Hölder continuous in $t$ on $[0,T] \times \mathbb{R}^d$ for any $\varepsilon > 0$. A set of sufficient conditions for these to hold is the Lipschitz continuity of both $\varphi$ and $\nabla\varphi$. See Section 4.3 in [7]. If $\varphi$ is Lipschitz, then $\rho(x) \le C(1 + |x|)G_T(x)$ for some constant $C > 0$, from which (A3) follows. In this sense, our conditions (A3) and (A4) are weaker than those imposed in [7].

Remark 2.7. Consider the case where $\rho$ is given by the density estimator with triangular kernel, i.e.,
$$\rho(x) = \frac{1}{m h_0} \sum_{j=1}^m \left(1 - \frac{|x - x_j|}{h_0}\right)^+$$
for some $x_1, \ldots, x_m \in \mathbb{R}$, $h_0 > 0$, and $(x)^+ = \max(x, 0)$ for $x \in \mathbb{R}$. Then it is straightforward to see that $\varphi_j(x) := (1 - |x - x_j|/h_0)^+ e^{|x|^2/(2T)}$ satisfies (A4) and that $\varphi_j'$ is not continuous on $\mathbb{R}$. Therefore, in this case, the conditions (A3) and (A4) are satisfied, but the Lipschitz continuity of $\varphi'$ is not.
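The compactly supported example in Remark 2.7 and the Gaussian mixing used to define $\rho_\varepsilon$ can be illustrated numerically. The sketch below works in dimension one and assumes the standard normalization $\rho(x) = \frac{1}{m h_0}\sum_j (1 - |x - x_j|/h_0)^+$ for the triangular-kernel estimator. The function names and the sample points are hypothetical and chosen only for illustration.

```python
import math


def triangular_kde(x, centers, h0):
    """Triangular-kernel density estimate rho(x) on the real line.

    Assumes the normalization 1 / (m * h0), under which rho integrates to 1.
    """
    m = len(centers)
    total = sum(max(1.0 - abs(x - c) / h0, 0.0) for c in centers)
    return total / (m * h0)


def gaussian_density(y, T):
    """G_T(y) = p_T(0, y), the N(0, T) density in dimension d = 1."""
    return math.exp(-y * y / (2.0 * T)) / math.sqrt(2.0 * math.pi * T)


def rho_eps(x, centers, h0, T, eps):
    """Mixture density rho_eps = (1 - eps) * rho + eps * G_T."""
    return (1.0 - eps) * triangular_kde(x, centers, h0) + eps * gaussian_density(x, T)
```

With, say, `centers = [-1.0, 0.5, 2.0]` and `h0 = 0.5`, the estimate `triangular_kde` vanishes outside a bounded set, whereas `rho_eps` is strictly positive everywhere for any `eps` in $(0, 1)$. This positivity is the property needed for the drift $b^\varepsilon$ to be well defined.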
Proof of Theorem 2.5. Let us confirm that $b^\varepsilon$ can be defined as an essentially bounded function on $[0,T] \times \mathbb{R}^d$. By (A4), the function $\varphi$ is locally Lipschitz. This together with Rademacher's theorem tells us that $\nabla\varphi$ exists almost everywhere in $\mathbb{R}^d$ for any $t \in [0,T]$. Therefore, the process $b^\varepsilon(t, X_t)$, $0 \le t \le T$, is bounded $\mathbb{P}$-a.s. Thus we can proceed in exactly the same way as in the proof of Theorem 2.1 and obtain
$$\delta(\mathcal{L}(Y^\varepsilon_n), \mu_\varepsilon) \le C_\varepsilon\, E_{\mathbb{P}^*}\left[\int_0^T |\theta^{(n)}_s|^2\,ds\right]^{1/2},$$
where $\mathbb{P}^*$ and $\theta^{(n)}$ are defined as in the proof of Theorem 2.1 with $b$ replaced by $b^\varepsilon$, and $C_\varepsilon$ is a positive constant depending on $\varepsilon$. By the continuity of $b^\varepsilon$ on $[0, T) \times \mathbb{R}^d$,
$$\lim_{n \to \infty} b^\varepsilon(\tau_n(s), X_{\tau_n(s)}) = b^\varepsilon(s, X_s), \quad dt \times d\mathbb{P}^*\text{-a.e.}$$
Thus, applying the bounded convergence theorem, we obtain
$$\lim_{n \to \infty} E_{\mathbb{P}^*}\left[\int_0^T |\theta^{(n)}_s|^2\,ds\right] = 0.$$
This together with $\lim_{\varepsilon \to 0} \delta(\mu_\varepsilon, \mu) = 0$ completes the proof of the theorem.
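The limit $\lim_{\varepsilon \to 0} \delta(\mu_\varepsilon, \mu) = 0$ used in the proof follows directly from the mixture form $\rho_\varepsilon = (1 - \varepsilon)\rho + \varepsilon G_T$. A short derivation, writing $\gamma_T$ (a symbol introduced here for convenience) for the Gaussian law with density $G_T$:

```latex
\begin{align*}
\delta(\mu_\varepsilon, \mu)
  &= \sup_{A \in \mathcal{B}(\mathbb{R}^d)}
     \bigl| (1 - \varepsilon)\mu(A) + \varepsilon\gamma_T(A) - \mu(A) \bigr| \\
  &= \varepsilon \sup_{A \in \mathcal{B}(\mathbb{R}^d)}
     \bigl| \gamma_T(A) - \mu(A) \bigr|
   = \varepsilon\,\delta(\gamma_T, \mu) \le \varepsilon .
\end{align*}
```

Combined with the triangle inequality, this gives $\delta(\mathcal{L}(Y^\varepsilon_n), \mu) \le \delta(\mathcal{L}(Y^\varepsilon_n), \mu_\varepsilon) + \varepsilon$, so the iterated limit in Theorem 2.5 vanishes once the first term tends to zero as $n \to \infty$ for each fixed $\varepsilon$.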