Ergodicity of self-attracting motion

The aim of this paper is to study the asymptotic behaviour of a class of self-attracting motions on $\mathbb{R}^d$. Using stochastic approximation methods, these processes have already been studied by Benaïm, Ledoux and Raimond (2002) in a compact setting. We also relate the asymptotic behaviour of the self-attracting Brownian motion to the McKean-Vlasov process that was studied, via the decrease of the free energy, by Carrillo, McCann and Villani (2003). Combining these methods, we obtain sufficient conditions for the (limit-quotient) ergodicity of the self-attracting diffusion, together with a speed of convergence.

1. Introduction

1.1. Statement of the problem. This text is devoted to the study of the asymptotic behaviour of a Brownian motion interacting with its own past trajectory, a so-called "self-interacting motion". Namely, we fix an interaction potential function $W : \mathbb{R}^d \to \mathbb{R}$, and consider the stochastic differential equation
\[ dX_t = \sqrt{2}\, dB_t - \Big( \frac{1}{t} \int_0^t \nabla W(X_t - X_s)\, ds \Big)\, dt, \tag{1} \]
where $(B_t, t \ge 0)$ is a standard Brownian motion, with a given initial condition $X_0$ (and the condition of continuity at $t = 0$). This equation can be rewritten using the normalized occupation measure $\mu_t$:
\[ \mu_t := \frac{1}{t} \int_0^t \delta_{X_s}\, ds, \]
where $\delta_x$ is the Dirac measure concentrated at the point $x$. Using this convention, the equation (1) becomes
\[ dX_t = \sqrt{2}\, dB_t - \nabla W * \mu_t (X_t)\, dt, \tag{2} \]
where $*$ stands for the convolution. Note that the equations (1), (2) clearly have singularities at $t = 0$, which is the reason why they are sometimes considered only after some positive time $r > 0$. We discuss the existence and uniqueness questions for the solution in the appendix.

Similar problems have been studied since the 1990s, for instance by Durrett and Rogers [8], or Benaïm, Ledoux and Raimond [2], initially to model the evolution of polymers or ants. The first time-continuous self-interacting processes were introduced by Durrett and Rogers [8] under the name of "Brownian polymers". They are solutions to SDEs of the form
\[ dX_t = dB_t + \Big( \int_0^t f(X_t - X_s)\, ds \Big)\, dt, \tag{3} \]
where $(B_t, t \ge 0)$ is a standard Brownian motion and $f$ is a given function. We remark that, in the latter equation, the drift term is given by the non-normalized measure $t\mu_t$ and not by $\mu_t$ as in the process we study here. As the process $(X_t, t \ge 0)$ evolves in an environment changing with its past trajectory, this SDE defines a self-interacting diffusion, which can be either self-repelling or self-attracting, depending on the function $f$. In any dimension, Durrett & Rogers obtained that the upper limit of $|X_t|/t$ does not exceed a deterministic constant whenever $f$ has compact support. Nevertheless, very few results are known as soon as the interaction is not self-attracting.
Self-interacting diffusions depending on the (convolved) empirical measure $(\mu_t, t \ge 0)$ have been considered since the work of Benaïm, Ledoux & Raimond [2]. A major difference between these diffusions and Brownian polymers is that the drift term is divided by $t$. This implies that the interaction with the distant past is less important than the interaction with the recent past (the interaction is not "uniform in time" anymore). Benaïm et al. have shown in [2,3] that the asymptotic behaviour of $\mu_t$ can be related to the analysis of some deterministic dynamical flow defined on the space of Borel probability measures. One can then go further in this study and give sufficient conditions for the a.s. convergence of the empirical measure. It turns out that, with a symmetric interaction, $\mu_t$ converges a.s. to a local minimum of a nonlinear free energy functional (each local minimum having a positive probability of being chosen), this free energy being a Lyapunov function for the deterministic flow. These results are valid for a compact manifold. Part of them have recently been generalized to $\mathbb{R}^d$ (see [9]) assuming a confinement potential satisfying some conditions; these hypotheses on the confinement potential are required since in general the process can be transient, and is thus very difficult to analyze. In these works, no rate of convergence is obtained. Most of these results are summarized in a recent survey of Pemantle [12], which also includes self-interacting random walks.
Coming back to the process introduced by Durrett & Rogers, all the results obtained have in common that the drift may overcome the noise, so that the randomness of the process is "controlled". To illustrate this, let us mention, for the same model of Durrett & Rogers, the case of a repulsive and compactly supported function $f$, which was conjectured in [8] and has been partially solved very recently by Tarrès, Tóth and Valkó [15]:

Conjecture (Durrett & Rogers [8]). Suppose that $f : \mathbb{R} \to \mathbb{R}$ is an odd function of compact support, such that $x f(x) \ge 0$. Then, for the process $X$ defined by (3), the quotient $X_t/t$ converges a.s. to 0.
In (1), the drift term is divided by $t$, and so it is bounded for a compactly supported interaction $W$. As for the process of the conjecture, the interaction potential is in general not strong enough for the process (1) to be recurrent, and the behaviour is then very difficult to analyze. In particular, it is hard to predict the relative importance of the drift term (in competition with the Brownian motion) in the evolution.
On the other hand, in our case of uniformly convex $W$, the interaction potential is attractive enough for the (slightly modified) diffusion to be comparable to an Ornstein-Uhlenbeck process, which gives access to its ergodic behaviour.
Another problem, related to the one considered in this paper, is the diffusion corresponding to the McKean-Vlasov PDE. Namely, consider the Markov process defined by the SDE
\[ dY_t = \sqrt{2}\, dB_t - \nabla W * \nu_t (Y_t)\, dt, \tag{4} \]
where $\nu_t$ stands for the law of $Y_t$, and $W$ is a smooth uniformly convex function. The asymptotic law of $Y$ has been intensively studied in recent years, by Carrillo, McCann & Villani [5], Bolley, Guillin & Villani [4], or Cattiaux, Guillin & Malrieu [7] for instance. It turns out that, under some assumptions, the laws $\nu_t$ converge to a limit measure $\nu^*$. This measure is characterized as a fixed point of a map $\Pi : \nu \mapsto \Pi(\nu)$ associating to a measure $\nu$ the probability measure
\[ \Pi(\nu)(dx) := \frac{1}{Z} e^{-W * \nu(x)}\, dx, \]
which is the stationary measure of the process obtained by replacing $\nu_t$ in the right-hand side of (4) with $\nu$; here $Z = Z_\nu$ is the normalization constant. In particular, Carrillo, McCann & Villani [5] have shown, using some mass transport tools, that the relative free energy corresponding to $\nu_t$ with respect to $\nu^*$ decreases exponentially fast to 0. Talagrand's inequality then allows one to compare the relative free energy to the Wasserstein distance in the case of uniform convexity of the interaction potential $W$, and so they have obtained the decrease to 0 of the quadratic Wasserstein distance between $\nu_t$ and $\nu^*$.
We remark that a major difference between the preceding Markov process and the (non-Markov) self-interacting diffusion is that the asymptotic $\sigma$-algebra is in general not trivial for the non-Markov process. Nevertheless, we will use a similar mass transport method to show the convergence of the empirical measure $\mu_t$.
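To make the dynamics concrete, the following is a minimal simulation sketch in dimension one, not the method of the paper. It assumes that the equation has the form $dX_t = \sqrt{2}\,dB_t - W' * \mu_t(X_t)\,dt$, starts at a positive time $r$ with $\mu_r = \delta_{x_0}$ to avoid the singularity at $t = 0$, and uses a plain Euler-Maruyama step. The function name `simulate_self_attracting` and all parameter names are hypothetical.

```python
import math
import random


def simulate_self_attracting(grad_w, x0=0.0, r=1.0, t_end=20.0, dt=0.02, seed=0):
    """Euler-Maruyama sketch (d = 1) for dX_t = sqrt(2) dB_t - (W' * mu_t)(X_t) dt.

    Assumption: the occupation measure is initialised at time r > 0 as
    mu_r = delta_{x0}, which avoids the singularity of the equation at t = 0.
    Returns the list of times and the list of positions.
    """
    rng = random.Random(seed)
    n_steps = int(round((t_end - r) / dt))
    times, path = [r], [x0]
    x, t = x0, r
    for _ in range(n_steps):
        # mu_t = (r * delta_{x0} + sum_j dt * delta_{X_j}) / t, where the sum runs
        # over the positions visited before the current one (left Riemann sum).
        visited = path[:-1]
        drift = (r * grad_w(x - x0) + dt * sum(grad_w(x - y) for y in visited)) / t
        x = x - drift * dt + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
        t += dt
        times.append(t)
        path.append(x)
    return times, path
```

For instance, `simulate_self_attracting(lambda u: u)` corresponds to the quadratic potential $W(x) = x^2/2$, for which the drift pulls $X_t$ towards the running mean of its past. The cost is quadratic in the number of steps, since the whole past trajectory enters the drift.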

1.2. Main results. Our results are analogous to those of Carrillo et al. [5]: under some assumptions imposed on the interaction potential $W$, we show that the empirical measure $\mu_t$ almost surely converges to an equilibrium state, which is unique up to translation:

Theorem 1 (Main result). Suppose that $W \in C^2(\mathbb{R}^d)$, and that $W$ is:
1) spherically symmetric: $W(x) = W(|x|)$;
2) uniformly convex: denoting by $S^{d-1}$ the $(d-1)$-dimensional sphere,
\[ \exists\, C_W > 0 : \quad \forall x \in \mathbb{R}^d,\ \forall v \in S^{d-1}, \quad \frac{\partial^2 W}{\partial v^2}\Big|_x \ge C_W; \]
3) of at most polynomial growth: for some polynomial $P$, we have
\[ \forall x \in \mathbb{R}^d, \quad |W(x)| + |\nabla W(x)| + \|\nabla^2 W(x)\| \le P(|x|). \tag{6} \]
Then, there exists a unique symmetric density $\rho_\infty : \mathbb{R}^d \to \mathbb{R}_+$, such that almost surely, there exists $c_\infty$ such that

Moreover, there exists $a > 0$ such that the speed of convergence of $\mu_t$ toward $\rho_\infty(\cdot + c_\infty)$ for the Wasserstein distance is at least $\exp\{-a \sqrt[k+1]{\log t}\}$, where $k$ is the degree of $P$.
Remark 1. The assumption 1) corresponds to the physical assumption that the interaction force between two particles is directed along the line joining them, and to Newton's third law (that is, the equality between the action and the reaction forces). The symmetry assumption cannot be omitted, as an example in the appendix shows.
Remark 2. We will suppose in the following, without any loss of generality, that $P \ge 1$ is of degree $k \ge 2$ and such that for all $x, y \in \mathbb{R}^d$, we have $P(|x - y|) \le P(|x|) P(|y|)$. Indeed, we can choose $P(|x|) = A(1 + |x|^k)$, where $A$ is a large enough constant. This will be used in §2.2.
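The submultiplicativity claimed in Remark 2 can be checked directly. The computation below is a sketch under the assumption (not stated explicitly in the text) that "large enough" means $A \ge 2^{k-1}$; it uses only the triangle inequality and the convexity of $s \mapsto s^k$ on $[0, \infty)$.

```latex
\begin{align*}
|x - y|^k &\le (|x| + |y|)^k \le 2^{k-1}\big(|x|^k + |y|^k\big), \\
P(|x - y|) = A\big(1 + |x - y|^k\big)
  &\le A\,\big(1 + 2^{k-1}(|x|^k + |y|^k)\big) \\
  &\le A\, 2^{k-1} \big(1 + |x|^k + |y|^k + |x|^k |y|^k\big) \\
  &= A\, 2^{k-1} \big(1 + |x|^k\big)\big(1 + |y|^k\big) \\
  &\le A^2 \big(1 + |x|^k\big)\big(1 + |y|^k\big) = P(|x|)\, P(|y|).
\end{align*}
```

The third line uses $2^{k-1} \ge 1$ and $|x|^k |y|^k \ge 0$, and the last line uses $2^{k-1} \le A$. The condition $P \ge 1$ holds as soon as $A \ge 1$.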
The origin of the following remark will become clear after the discussion in §2.4.

Remark 3. The density $\rho_\infty$ is the same limit density as in the result of [5], uniquely defined (among the centered densities) by the following property: $\rho_\infty$ is a positive function, proportional to $e^{-W * \rho_\infty}$.
We can also consider the same drifted motion in the presence of an external potential $V$. The following result is a generalization of Theorem 1 (where we replace $C_W$ by $C$ in the notation):

Theorem 2. Let $X$ be the solution to the equation
\[ dX_t = \sqrt{2}\, dB_t - \big( \nabla V(X_t) + \nabla W * \mu_t(X_t) \big)\, dt. \]
Suppose that $V \in C^2(\mathbb{R}^d)$ and $W \in C^2(\mathbb{R}^d)$, and that:
1) $W$ is spherically symmetric: $W(x) = W(|x|)$;
2) $V$ and $W$ are convex, $\lim_{|x| \to \infty} V(x) = +\infty$, and either $V$ or $W$ is uniformly convex:
3) $V$ and $W$ have at most polynomial growth: for some polynomial $P$ we have $\forall x \in \mathbb{R}^d$
Then, there exists a unique density $\rho_\infty : \mathbb{R}^d \to \mathbb{R}_+$, such that almost surely

As the proof of the latter theorem is almost identical to the proof of Theorem 1, we do not present it here. It suffices to add $V$ in the arguments below. Moreover, if $V$ is symmetric with respect to some point $q$, then the corresponding density $\rho_\infty$ is also symmetric with respect to the same point $q$.
The proof of Theorem 1 is split into two parts. Consider a natural "reference point" for a measure $\mu$:

Definition 1. Consider a measure $\mu$ on $\mathbb{R}^d$, decreasing fast enough for $W * \mu$ to be defined. The center of $\mu$ is the point $c_\mu = c(\mu)$ such that $\nabla W * \mu(c_\mu) = 0$, or equivalently, the point where the convolution $W * \mu$ (the potential generated by $\mu$) takes its minimal value. Also, we define the centered measure $\mu^c$ as the translation of the measure $\mu$ bringing $c_\mu$ to the origin:
\[ \mu^c := \mu(\cdot + c_\mu). \]

Remark 4. This notion of center was previously introduced by Raimond in [13]. Indeed, to study the linear attracting $d$-dimensional case of Brownian polymers, Raimond defined the center and proved that the process remains close to $c_t = c(\mu_t)$ (and that $c_t$ converges a.s.). A sufficient condition for the existence of the center is that $W$ is convex, and the center is unique if $W$ is strictly convex.
The first part of the proof of Theorem 1 consists in proving the convergence of the centered occupation measures:

Theorem 3. Under the assumptions of Theorem 1, for some symmetric density function $\rho_\infty : \mathbb{R}^d \to \mathbb{R}_+$, we have almost surely

The second part is the convergence of the centers:

Theorem 4. Under the assumptions of Theorem 1, almost surely the centers $c_t := c(\mu_t)$ converge to some (random) limit $c_\infty$.
It is clear that the two latter theorems imply the main result. Let us sketch their proofs.
1.3. Outline of the proof and physical interpretation.
1.3.1. Existence and uniqueness. First, a standard remark is Markovianization: the behaviour of the pair $(X_t, \mu_t)$ is Markovian. The reader will find it, together with some other standard remarks, in §2.1.1. Unfortunately, the Markov process $(X_t, \mu_t)$ is infinite-dimensional and, in general (except for the case of a polynomial interaction $W$), we do not manage to reduce it to a finite-dimensional process. So, we do not use this information directly to obtain interesting properties of $\mu_t$, because the state space is too large. After this remark, we discuss the global existence and uniqueness of the solutions of (2) in §2.1.4.

1.3.2. Discretization. The next step is discretization: we take a (well-chosen and deterministic) sequence of times $T_n \to \infty$, with $T_n \gg T_{n+1} - T_n \gg 1$, and consider the behaviour of the measures $\mu_{T_n}$. As $T_n \gg T_{n+1} - T_n$, it is natural to expect (and we will give the corresponding statement) that the empirical measures $\mu_t$ on the interval $[T_n, T_{n+1}]$ almost do not change and thus stay close to $\mu_{T_n}$. So, on this interval we can approximate the solution $X_t$ of (2) by the solution of the same equation with $\mu_t \equiv \mu_{T_n}$: in other words, by a Brownian motion in a potential $W * \mu_{T_n}$ that does not depend on time.
On the other hand, the sequence of general term $T_{n+1} - T_n$ increases. So, using the Birkhoff ergodic theorem, we see that the (normalized) distribution $\mu_{[T_n, T_{n+1}]}$ of values of $X_t$ on these intervals becomes (as $n$ increases) close to the equilibrium measures $\Pi(\mu_{T_n})$ for a Brownian motion in the potential $W * \mu_{T_n}$, where (see §3.1)
\[ \Pi(\mu)(dx) := \frac{1}{Z(\mu)} e^{-W * \mu(x)}\, dx, \qquad Z(\mu) := \int_{\mathbb{R}^d} e^{-W * \mu(x)}\, dx. \]
This could motivate us to approximate the behaviour of the measures $\mu_t$ by trajectories of the flow (on the infinite-dimensional space of measures)
\[ \dot\mu = \frac{1}{t} \big( \Pi(\mu) - \mu \big), \tag{10} \]
or, after a logarithmic change of variable $\theta = \log t$,
\[ \mu' = \Pi(\mu) - \mu. \tag{11} \]
In fact, it is not a priori clear that the flow defined by (11) exists, as the space of measures is infinite-dimensional. Though the flow can be shown to be well defined on a subspace of exponentially decreasing measures, we prefer to avoid all these problems by working directly with the discretized model in §3.1. Nevertheless, this flow serves very well in motivating the functions considered and the lemmas describing their behaviour, as our discretized procedure is in fact the Euler method for finding solutions to (11).

1.3.3. Physical interpretation: gas re-distribution. Before proceeding further, let us give a physical interpretation of the flow (11), predicting its asymptotic behaviour. Namely, note that a Brownian motion drifted by some potential $V$ can be thought of as the movement of gas particles under this potential, and the stationary probability measure, $m = \frac{1}{Z_V} e^{-V}\, dx$, is the density with which the gas becomes distributed after some time passes. So, in dimension one, a discrete approximation to the flow (11) can be seen as follows. We take a tube, filled with $W$-interacting gas, divided into many very small cells (see Fig. 1).
Figure 1. Gas: phase "separation".

Each unit of time, small parts (of proportion $\varepsilon$) of the gas in these cells are separated, allowed to travel along the tube, and left to equilibrate in the potential generated. This part of the gas being small, its self-interaction is negligible, so its new distribution is governed by the field $V := W * \mu$, generated by the major part of the particles, which stay fixed in their cells. The small part is then equilibrated to its weight $\varepsilon$ times $\Pi(\mu)$. Then, it is separated again by the cells, so the distribution after such a step becomes
\[ (1 - \varepsilon)\, \mu + \varepsilon\, \Pi(\mu). \]
On the other hand, this procedure does not require any work (in the physical sense) to be done: the only actions are opening and closing the doors. So, by the general principle, one can expect that the system will tend to its equilibrium. A tool that allows one to show that this is the case is the free energy, which we recall in the next paragraph.
We conclude by noticing that the same physical interpretation can be considered for the problem in any dimension $d$, by placing in $\mathbb{R}^{d+1}$ two close parallel walls (corresponding to the tube in dimension one), and placing the cells along them.
1.3.4. Free energy functional. A tool that allows one to show the convergence of trajectories of (11) is the free energy which, due to a general physical principle, should not increase along the trajectories as long as we do not do any work.
Namely, consider an absolutely continuous probability measure $\mu = \mu(x)\, dx$ (by an abuse of notation, we denote the measure and its density by the same letter). Imagine $\mu(x)$ as the density of a gas whose particles perform the Brownian motion $\sqrt{2}\, dB_t$ and interact through the potential $W(x - y)$. Then, one defines the free energy of $\mu$ as the sum of its "entropy" $H$ and "potential energy":
\[ F(\mu) := H(\mu) + \frac{1}{2} \iint W(x - y)\, \mu(x)\, \mu(y)\, dx\, dy, \]
where the entropy of the measure $\mu$ is
\[ H(\mu) := \int \mu(x) \log \mu(x)\, dx. \]
Then, as we have already said, a general physical principle says that, as we are doing no work on the system, the free energy should decrease, and the system should tend to its minimum. Indeed, the free energy $F$ is a Lyapunov function for the flow (11) (when it is defined: it is defined only for measures that are absolutely continuous with respect to the Lebesgue measure, and otherwise $F(\mu) = +\infty$). This can be seen by joining two statements. The first is that the measure $\Pi(\mu)$ is (in accordance with the same physical principle) the unique global minimum of the free energy $F_V(\mu) = H(\mu) + \int V\, d\mu$ of a non-interacting Brownian motion in the exterior potential $V = W * \mu$ (see §1.3.3 and Lemma 7 in §2.4). The second is the inequality
\[ \frac{d}{d\varepsilon} F\big(\mu + \varepsilon (m - \mu)\big)\Big|_{\varepsilon = 0} \le F_{W * \mu}(m) - F_{W * \mu}(\mu), \]
where $m = \Pi(\mu)$. On one hand, it can be easily seen by an explicit computation, noticing that the entropy part is convex. On the other hand, such a differentiation corresponds to replacing some small parts of the gas distributed with respect to the measure $\mu$ by parts distributed with respect to the measure $m$, and in the right-hand side we have the corresponding free energies of these small parts in the potential generated by the main part of the gas. Then, differentiating the function $F$ along the trajectories of the flow (11), one finds for the solution $\mu(\theta)$
\[ \frac{d}{d\theta} F(\mu(\theta)) \le F_{W * \mu(\theta)}\big(\Pi(\mu(\theta))\big) - F_{W * \mu(\theta)}\big(\mu(\theta)\big) \le 0, \]
with equality if and only if $\mu(\theta) = \Pi(\mu(\theta))$. Finally (and we recall these arguments in §3.1), the fixed points of $\Pi$ are exactly the translation images of the density $\rho_\infty$, that is, the centered global minimum of the functional $F$.
So, roughly speaking, the function $F$ is a Lyapunov function of the flow (11). The words "roughly speaking" here refer to the fact that these arguments are non-rigorous: we avoided showing that the flow is indeed well defined, the free energy functional is defined only for absolutely continuous measures, etc. Nevertheless, all of this serves well as a motivation for the (rigorous) lemmas on the behaviour of the free energy used in this paper.
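The heuristic above can be explored numerically. The following is a minimal sketch, not the construction used in the proofs: it iterates the gas re-distribution step $\mu \mapsto (1 - \varepsilon)\mu + \varepsilon\,\Pi(\mu)$ on a one-dimensional grid and records the free energy after each step. It assumes the form $F(\mu) = \int \mu \log \mu + \frac{1}{2}\iint W(x - y)\,\mu(x)\,\mu(y)\,dx\,dy$ with $\Pi(\mu) \propto e^{-W * \mu}$; all function names (`make_grid`, `gibbs_map`, `free_energy`, `euler_step`, `run`) are hypothetical.

```python
import math


def make_grid(half_width=6.0, n=121):
    """Uniform grid on [-half_width, half_width]; returns the nodes and the spacing."""
    dx = 2.0 * half_width / (n - 1)
    return [-half_width + i * dx for i in range(n)], dx


def convolve_potential(w, xs, mu, dx):
    """(W * mu)(x_i) = sum_j W(x_i - x_j) mu_j dx for a density mu on the grid."""
    return [sum(w(x - y) * m for y, m in zip(xs, mu)) * dx for x in xs]


def gibbs_map(w, xs, mu, dx):
    """Pi(mu): density proportional to exp(-W * mu), normalised on the grid."""
    v = convolve_potential(w, xs, mu, dx)
    v_min = min(v)  # shift by the minimum for numerical stability
    weights = [math.exp(-(vi - v_min)) for vi in v]
    z = sum(weights) * dx
    return [wi / z for wi in weights]


def free_energy(w, xs, mu, dx):
    """Assumed form: F(mu) = int mu log mu + (1/2) int int W(x - y) mu(x) mu(y)."""
    entropy = sum(m * math.log(m) for m in mu if m > 0.0) * dx
    v = convolve_potential(w, xs, mu, dx)
    interaction = 0.5 * sum(vi * m for vi, m in zip(v, mu)) * dx
    return entropy + interaction


def euler_step(w, xs, mu, dx, eps):
    """One re-distribution step: mu -> (1 - eps) mu + eps Pi(mu)."""
    pi_mu = gibbs_map(w, xs, mu, dx)
    return [(1.0 - eps) * m + eps * p for m, p in zip(mu, pi_mu)]


def run(w, xs, mu, dx, eps=0.2, n_steps=60):
    """Iterate the step and record the free energy before and after each step."""
    energies = [free_energy(w, xs, mu, dx)]
    for _ in range(n_steps):
        mu = euler_step(w, xs, mu, dx, eps)
        energies.append(free_energy(w, xs, mu, dx))
    return mu, energies
```

For the quadratic potential $W(x) = x^2/2$ the map $\Pi$ sends $\mu$ to a unit-variance Gaussian centered at the mean of $\mu$, so the iterates started from a centered density approach the standard Gaussian, and the recorded energies are non-increasing. For other convex $W$ the monotonicity of the discrete iterates is only expected for small $\varepsilon$.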
We conclude this paragraph by indicating that for the dynamics in the presence of an exterior potential $V$ (the case of Theorem 2) one has to replace the free energy function by

and, instead of $F_{W * \mu}$, consider $F_{V + W * \mu}$ for the energy of "small parts".
We are now ready to conclude the sketches of the proofs of Theorems 3 and 4 (as was already mentioned, they immediately imply Theorem 1). Namely, we consider the discretized Euler-like evolution of the flow (10), defined by the rule
\[ \tilde\mu_{T_{n+1}} = \frac{T_n}{T_{n+1}}\, \tilde\mu_{T_n} + \frac{T_{n+1} - T_n}{T_{n+1}}\, \Pi(\tilde\mu_{T_n}). \tag{15} \]
For the measures $\tilde\mu_{T_n}$ defined by this procedure, we obtain (using discrete rigorous analogues of the informal arguments of the previous paragraph) some estimates on the speed with which their free energy decreases. This allows us to estimate the distances from these measures to the set of translates of $\rho_\infty$ (because they are the only minima of $F$). Now, we take the true random trajectory $\mu_t$, and estimate the distance from the centered measures $\mu^c_t$ to the equilibrium point. To do this at some moment $t$, we choose an earlier moment $t'$, replace the measure $\mu_{t'}$ by a close smooth measure $\tilde\mu_{t'}$, and consider its deterministic discrete iterates by (15). On one hand, for this new trajectory the free energy is defined (as we have chosen a smooth approximation), so we control the decrease of energy and hence the distance from the centered measure $\tilde\mu^c_t$ to $\rho_\infty$. On the other hand, an accurate computation allows us to control the distance between the random measure $\mu_t$ and the approximating deterministic image $\tilde\mu_t$ of its smooth perturbation. The sum of these distances then bounds the distance from $\mu^c_t$ to $\rho_\infty$, and the obtained estimate tends to 0 as $t \to \infty$. This concludes the proof of Theorem 3.
Finally, to prove Theorem 4, one first computes the speed of drift of the center $c_t$, and then shows that the series of general term $|c_{T_{n+1}} - c_{T_n}|$ converges, and that the oscillations $\operatorname{osc}_{[T_n, T_{n+1}]} c_t$ tend to zero. This implies the existence of the limit of $c_t$ as $t \to \infty$.

1.4. Outline of the paper. At the beginning of Section 2, we show the existence and uniqueness of solutions to (2) starting at any positive moment $r > 0$. The discussion of this topic at $t = 0$ is postponed to the appendix. In the rest of Section 2, we present some crucial preliminary computations which are at the basis of our proofs. Most of the material there is not new, except for the combination of stochastic approximation of the empirical measure (see [2]) with free energy functionals (see [5]) and the derivation of a bound on the convergence rate. Finally, Section 3 is devoted to the proofs of our main results.
1.5. Acknowledgments. The authors are very grateful to two anonymous referees for their useful comments, which led to a rewriting of the paper for a better understanding.

2. Preliminaries
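As usual, we denote by $\mathcal{M}(\mathbb{R}^d)$ the space of signed (bounded) Borel measures on $\mathbb{R}^d$ and by $\mathcal{P}(\mathbb{R}^d)$ its subspace of probability measures. We will need the following measure space:
\[ \mathcal{M}(\mathbb{R}^d; P) := \Big\{ \mu \in \mathcal{M}(\mathbb{R}^d) : \int_{\mathbb{R}^d} P(|x|)\, |\mu|(dx) < \infty \Big\}, \]
where $|\mu|$ is the variation of $\mu$ (that is, $|\mu| := \mu^+ + \mu^-$ with $(\mu^+, \mu^-)$ the Hahn-Jordan decomposition of $\mu$: $\mu = \mu^+ - \mu^-$). Belonging to this space will enable us to always check the integrability of $P$ (and therefore of $W$ and its derivatives, thanks to the domination condition (6)) with respect to the (random) measures to be considered. We endow this space with the dual weighted supremum norm (or dual $P$-norm) defined for $\mu \in \mathcal{M}(\mathbb{R}^d; P)$ by
\[ \|\mu\|_P := \sup_{\varphi :\, |\varphi| \le P} \Big| \int_{\mathbb{R}^d} \varphi\, d\mu \Big| = \int_{\mathbb{R}^d} P(|x|)\, |\mu|(dx). \]
We recall that $P(|x|) \ge 1$, so that $\|\mu\|_P \ge |\mu(\mathbb{R}^d)|$. This norm naturally arises in the approach to ergodic results for time-continuous Markov processes of Meyn & Tweedie [11]. It also makes $\mathcal{M}(\mathbb{R}^d; P)$ a Banach space. Next, we consider $\mathcal{P}(\mathbb{R}^d; P) = \mathcal{M}(\mathbb{R}^d; P) \cap \mathcal{P}(\mathbb{R}^d)$. We remark that both $\mathcal{M}(\mathbb{R}^d; P)$ and $\mathcal{P}(\mathbb{R}^d; P)$ contain any probability measure with an exponential tail and, in particular, any compactly supported measure. For any $\kappa > 0$, we also define
\[ \mathcal{P}_\kappa(\mathbb{R}^d; P) := \big\{ \mu \in \mathcal{P}(\mathbb{R}^d; P) : \|\mu\|_P \le \kappa \big\}. \]

2.1. Existence and uniqueness of solutions.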
2.1.1. Markovian form; local existence and uniqueness. The first step in studying the trajectories of (2) is to pass to the couple $(X_t, \mu_t)$. A standard remark is that the behaviour of this couple is Markovian, though infinite-dimensional (and in general, except for $W$ being polynomial, it cannot be reduced to a finite-dimensional Markov process). This reduction is easily implied by the identity
\[ \mu_{t+s} = \frac{t}{t+s}\, \mu_t + \frac{1}{t+s} \int_t^{t+s} \delta_{X_u}\, du. \tag{19} \]
Note that the second term in the right-hand side of (19) can be written as $\frac{s}{t+s}\, \mu_{[t,t+s]}$, where $\mu_{[t,t+s]}$ is the empirical measure during the time interval $[t, t+s]$:
\[ \mu_{[t,t+s]} := \frac{1}{s} \int_t^{t+s} \delta_{X_u}\, du. \]
Now, passing $\mu_t$ to the left-hand side of (19), dividing by $s$ and passing to the limit as $s \to 0$, we obtain the following SDE for the couple $(X_t, \mu_t)$:
\[ dX_t = \sqrt{2}\, dB_t - \nabla W * \mu_t(X_t)\, dt, \qquad \dot\mu_t = \frac{1}{t} \big( -\mu_t + \delta_{X_t} \big). \tag{20} \]
For any $t_0 > 0$, the local existence and uniqueness of solutions to (20), in a neighbourhood of $t_0$, is implied by well-known arguments: see Theorem 11.2 of [14].
However, in order to study the asymptotic behaviour of solutions to (2), we should first show the global existence of these solutions, in other words, that they do not explode in finite time. This will be done in §2.1.4.
Note also that the equation (20) clearly has a singularity at $t = 0$. To avoid this singularity, the equation (20) is sometimes considered with an initial condition $(X_r, \mu_r)$ at some positive time $r > 0$ (and thus for $t \in [r, \infty)$). After the time-shift $s = t - r$, the system (20) transforms to
\[ dX_s = \sqrt{2}\, dB_s - \nabla W * \mu_s(X_s)\, ds, \qquad \dot\mu_s = \frac{1}{s + r} \big( -\mu_s + \delta_{X_s} \big). \]
In fact, we could restrict our consideration to such situations only (as, anyway, we are interested in the asymptotic behaviour of solutions at infinity), but it is interesting to show that the equation (2) indeed has existence and uniqueness of solutions for any initial value problem $X_0 = x_0$. This is done in the appendix.

2.1.2. Center-drift estimates. A natural "reference point" that one can associate to a measure $\mu$ is the equilibrium point $c_\mu = c(\mu)$ of the potential it generates with $W$, defined by the equation $\nabla W * \mu(c_\mu) = 0$ (see Definition 1, §1.2), that we refer to as the center of the measure $\mu$. Also, it will be convenient to consider the centered measure $\mu^c$, obtained from $\mu$ by the translation that shifts the center to the origin.
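For a finitely supported measure in dimension one, the center can be computed numerically. The sketch below is an illustration only: it assumes $d = 1$, an odd and increasing derivative $W'$ (which follows from the symmetry and convexity of $W$), and solves $\sum_i w_i W'(c - x_i) = 0$ by bisection. The function name `center` and its parameters are hypothetical.

```python
def center(grad_w, atoms, weights, n_iter=200):
    """Center of mu = sum_i weights[i] * delta_{atoms[i]} in dimension one.

    Solves g(c) = sum_i weights[i] * W'(c - atoms[i]) = 0 by bisection.
    Assumption: W' is odd and increasing, so g is increasing with
    g(min(atoms)) <= 0 <= g(max(atoms)).
    """
    def g(c):
        return sum(wt * grad_w(c - a) for a, wt in zip(atoms, weights))

    lo, hi = min(atoms), max(atoms)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid  # the root lies to the left of mid
        else:
            lo = mid  # the root lies at mid or to its right
    return 0.5 * (lo + hi)
```

For the quadratic potential $W(x) = x^2/2$, `center(lambda u: u, atoms, weights)` returns the mean of the measure. For other potentials, such as $W(x) = x^2/2 + x^4/4$ with `grad_w = lambda u: u + u**3`, the center generally differs from the mean but is still translation-equivariant.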
Note that the implicit function theorem allows one to estimate (on an interval of existence of the solution $(X_t, \mu_t)$ to (20)) the derivative $\dot c_t$ of $c_t := c_{\mu_t}$. In particular, we will see that $c_t$ is a $C^1$-function on this interval.
Indeed, the function ( and for any $(x, t)$ we have $\nabla^2 W * \mu_t(x) \ge C_W I > 0$. The implicit function theorem thus implies that $c_t$ is a $C^1$-function of $t$ (on the interval of existence of the solution), and that

This implies that the projection of the center drift velocity on the line from

This also immediately gives an upper bound on the drift speed:

2.1.3. Law of X-center distances: Ornstein-Uhlenbeck estimate. To continue our study, we would first like to obtain an estimate on the behaviour of the distance $|X_t - c_t|$. Namely, we are going to compare it with the (absolute value of an) Ornstein-Uhlenbeck process, and to obtain exponential-decrease bounds on its occupation measure in §2.2.1.
ii) $Z_t$ is the absolute value of a 3d-dimensional Ornstein-Uhlenbeck process.

Proof.
In the same way, the desired $Z_t$ shall satisfy the equation

where $\gamma$ is also a Brownian motion. So, take a one-dimensional standard Brownian motion $\beta$ independent of the Brownian motion $B$ and let $\gamma$ be defined as

where $\alpha : [0, +\infty) \to [0, 1]$ is a $C^\infty$-function which is identically zero in some neighbourhood of 0 and such that $\alpha(r) = 1$ for any $r \ge 1$. The process $Z$ is then defined by (23).
We point out that, as $B$ and $\beta$ are independent, $B$ is a $d$-dimensional Brownian motion while $\beta$ is one-dimensional. It follows that $Z$ defined by (23) is the absolute value of a 3d-dimensional Ornstein-Uhlenbeck process.
On the other hand, for any $t$, either $|X_t - c_t| \le 1$ and then automatically $|X_t - c_t| \le 2 + Z_t$, or $|X_t - c_t| \ge 1$ and then both $|X_t - c_t|$ and $Z_t$ share exactly the same Brownian component (as $\alpha \equiv 1$), with the inequality between the drift terms of $2 + Z_t$ and $|X_t - c_t|$:

Proposition 2 (global existence). For any $r > 0$ and for any initial condition $(X_r, \mu_r)$, the solution to (20) exists (and is unique) on the whole interval $[r, +\infty)$.
Proof. As we already have local existence and uniqueness, it suffices to check that the solution $X_t$ cannot explode in finite time (this will imply that the measures $\mu_t$, as the occupation measures of $X_t$, also stay in a compact domain for any bounded interval of time).
Let us introduce the increasing sequence of stopping times $\tau_0 = 0$ and

In order to show that the solution never explodes, we use the comparison of $X_t - c_t$ with the Ornstein-Uhlenbeck process $Z_t$ (see §2.1.3). So, we have for the corresponding $Z$ that $|X_{\min(t, \tau_n)} - c_{\min(t, \tau_n)}| \le 2 + Z_{\min(t, \tau_n)}$.
As $Z$ does not explode in finite time, letting $n$ go to infinity, we conclude that $X_t - c_t$ does not explode in finite time. To conclude, one has to use the inequality (22):

Any trajectory of $Z$ being bounded on any finite interval of time, the integral $\int_r^t \frac{P(Z_s)}{s}\, ds$ is finite for any $t \ge r$. So, the process $(X_t, t \ge 0)$ does not explode in finite time and there exists a global strong solution.

2.2.1. Estimates for the centered empirical measure. We shall now estimate the behaviour of the centered measures $\mu^c_t$. Namely, we are going to prove that these measures are exponentially decreasing. For brevity, we introduce the following

Definition 2. Let $\alpha, C > 0$ be given. Then

Also, for non-probability positive measures, we denote the spaces defined by the same

For what follows, we need one easy lemma.
Lemma 1. Let $Z$ be the absolute value of a 3d-dimensional Ornstein-Uhlenbeck process. Then, there exists $C_1 > 0$ such that, for almost any trajectory $Z_t$, one has

Proof. Note that the Ornstein-Uhlenbeck process is ergodic, with stationary measure

Thus for all $t$ large enough, $\frac{1}{t} \int_0^t e^{|Z_s|}\, ds \le I + 1$. Applying Chebyshev's inequality, we see that for all $r > 0$, $\frac{1}{t}\, |\{ s \le t : Z_s > r \}| < (I + 1)\, e^{-r}$.
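The Chebyshev step at the end of this proof can be spelled out as follows. This is a sketch that uses only $Z_s \ge 0$ (so that $e^{Z_s} = e^{|Z_s|}$) and the bound on the time average stated just before; $\mathbf{1}$ denotes an indicator function.

```latex
\begin{align*}
\mathbf{1}_{\{Z_s > r\}} &< e^{Z_s - r} = e^{-r} e^{|Z_s|} \quad \text{for every } s, \\
\frac{1}{t}\, \big| \{ s \le t : Z_s > r \} \big|
  = \frac{1}{t} \int_0^t \mathbf{1}_{\{Z_s > r\}}\, ds
  &< e^{-r}\, \frac{1}{t} \int_0^t e^{|Z_s|}\, ds
  \le (I + 1)\, e^{-r}.
\end{align*}
```

The pointwise inequality is strict because $e^{Z_s - r} > 1$ when $Z_s > r$ and $e^{Z_s - r} > 0$ otherwise, which gives the strict inequality after integration over $[0, t]$.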
The main result of this subsection is the following, showing that the measure $\mu_t$ belongs to the set $K_{\alpha, C}$.

Proposition 3. There exist two constants $\alpha, C > 0$ such that a.s. at any sufficiently large time $t$, $\mu_t \in K_{\alpha, C}$.
To prove this proposition, we need two intermediate lemmas, whose proofs are postponed.
Proof of Proposition 3. First, let us estimate the drift of the center. Namely, taking together (22) and Proposition 1, we have

for the corresponding Ornstein-Uhlenbeck trajectory $Z_t$. On the other hand, $Z$ is a Harris recurrent process and $P(Z)$ is integrable with respect to the Gaussian measure, thus due to the limit-quotient (or Birkhoff) theorem, almost surely there exists a limit

So, almost surely from some time $t_1$ we have

Therefore, after this time we can estimate the displacement of the center between the moments $t/2$ and $t$: $\forall t > t_1$

In fact, the same estimate holds for any $t'$ between $t/2$ and $t$:

This immediately implies that for any $t > t_1$ and $n \in \mathbb{N}$ such that $2^{-n+1} t > t_1$, one has

Let us now apply Lemma 3. First let us decompose, for any $t \in [t_1, 2t_1]$, the measure $\mu_{2t}$ as $\frac{1}{2} \mu_t + \frac{1}{2} \mu_{[t, 2t]}$, then the measure $\mu_{4t}$ as $\frac{1}{4} \mu_t + \frac{1}{4} \mu_{[t, 2t]} + \frac{1}{2} \mu_{[2t, 4t]}$, ..., and finally the measure $\mu_{2^n t}$ as $\frac{1}{2^n} \mu_t + \frac{1}{2^n} \mu_{[t, 2t]} + \dots + \frac{1}{2} \mu_{[2^{n-1} t, 2^n t]}$. An induction argument, together with Lemma 2, immediately shows that in each such decomposition, the second term shifted by the corresponding $c(\mu_{2^j t})$ belongs to $\tilde K^0_{\alpha, C}$. The only part that is left to handle is $\frac{1}{2^n} \mu_t$. But the distance between $c_t$ and $c_{2^n t}$ does not exceed $C_3 n$, and the centered measure $\mu^c_t$ is compactly supported. So it is contained in a ball of some (random) radius $R$ that can be chosen uniformly over $t \in (t_1, 2t_1)$. Now the measure $\frac{1}{2^n} \mu_t$ is of total weight $2^{-n}$ and it vanishes outside a ball of radius $R$. If $\alpha$ is small enough so that $e^{\alpha C_3} < 2$, then for any $r > C_3 n + R$, we have $\frac{1}{2^n} \mu_t(|y - c_{2^n t}| > r) \le \frac{1}{2^n} \mu^c_t(|y| > r - C_3 n) = 0$, and for $r \le C_3 n + R$ and $n$ big enough, $\frac{1}{2^n} \mu_t(|y - c_{2^n t}| > r) \le 2^{-n} < e^{-n \alpha C_3} e^{-\alpha R} \le e^{-\alpha r}$. The middle inequality comes, for $n$ large enough, from a comparison between the bases of the exponentials, $e^{\alpha C_3} < 2$, with respect to which the multiplicative constant $e^{-\alpha R}$ is minor.
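Finally, joining the obtained $\frac{1}{2^n} \mu_t(\cdot + c_{2^n t}) \in \tilde K^0_{\alpha, 1}$ and $\big( \frac{1}{2^n} \mu_{[t, 2t]} + \dots + \frac{1}{2} \mu_{[2^{n-1} t, 2^n t]} \big)(\cdot + c_{2^n t}) \in \tilde K^0_{\alpha, C}$, we obtain $\mu_{2^n t} \in K_{\alpha, C+1}$.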
Proof of Lemma 2. This lemma immediately follows from Lemma 1, once we notice that

Proof of Lemma 3. First, let us estimate the position of the center of $\tilde\mu$ in a way that is linear in $\lambda$ and does not depend on $\alpha$ and $C$, thus in particular proving the statement i). Indeed, $c_{\tilde\mu}$ is the minimum point of the function $W * \tilde\mu$. At the point $c_\mu$, the gradient of this function can be bounded as

because the norm $\|\nu(\cdot + c_\mu)\|_P$ is uniformly bounded due to the condition $\nu(\cdot + c_\mu) \in K_{\alpha_0, C_0}$. Now, restricting the function $W * \tilde\mu$ to the line joining $c_\mu$ and $c_{\tilde\mu}$, that is considering

Once $\alpha$ is small enough so that $C'' \alpha < 1/2$ and $\alpha < \alpha_0$, and once $C$ is greater than $2 C_0 e^{\alpha_0 C''}$, the right-hand side of (28) is not greater than $C e^{-\alpha r}$, which concludes the proof.

2.2.2. Estimates for the centered measure $\Pi$.
Lemma 4. For any $\kappa > 1$, the map $\Pi$ restricted to $\mathcal{P}_\kappa(\mathbb{R}^d; P)$ is bounded and Lipschitz.
Proof. First, we need to show that $Z(\mu)$ is bounded from below on $\mathcal{P}_\kappa(\mathbb{R}^d; P)$. For $\mu \in \mathcal{P}_\kappa(\mathbb{R}^d; P)$, the domination condition (6) implies that $W * \mu(x) \le \|\mu\|_P\, P(|x|) \le \kappa P(|x|)$. So we have:

we hence have the following bound for $\Pi(\mu)$:

Note that $\Pi$ is $C^1$ on $\mathcal{P}(\mathbb{R}^d; P)$ endowed with the strong topology. As the set of probability measures has no interior point, we have to specify the meaning of $C^1$: there exists a continuous linear operator $D\Pi(\mu) : \mathcal{M}_0(\mathbb{R}^d; P) \to \mathcal{M}_0(\mathbb{R}^d; P)$, continuously depending on $\mu$, such that $\|\Pi(\mu') - \Pi(\mu) - D\Pi(\mu)(\mu - \mu')\|_P = O(\|\mu - \mu'\|_P)$ provided that $\mu' \in \mathcal{P}(\mathbb{R}^d; P)$ and $\mu'$ converges toward $\mu$. Indeed, it is easy to see that

Now, note that the norms $\|D\Pi\|$ are uniformly bounded for $\mu \in \mathcal{P}_\kappa(\mathbb{R}^d; P)$ (for any given $\kappa$). Indeed, fix $\nu \in \mathcal{M}_0(\mathbb{R}^d; P)$. Since $|W * \nu(x)| \le \|\nu\|_P\, P(|x|)$, we find that

For $\mu \in \mathcal{P}_\kappa(\mathbb{R}^d; P)$, the same computation as for the bound (29) on the norm of $\Pi(\mu)$ enables us to control the last integral. Hence, we deduce a bound (call it $C'_\kappa$) on the norm of the differential. Thus, $\Pi$ is Lipschitz as stated.
We now prove the exponential decrease for the centered measure Π(µ).

Proposition 4. There exists a positive constant $C_\Pi$ such that for all $\mu \in \mathcal{P}(\mathbb{R}; P)$, for all
Proof. Note first that, imposing the condition $C_\Pi \ge e^{2C_W}$, we can restrict ourselves to $R \ge 2$: for $R < 2$, the estimate is obvious. The measure $\Pi(\mu)$ has the density $\frac{1}{Z(\mu)} e^{-W * \mu(x)}$. To avoid working with the normalization constant $Z(\mu)$, we will prove a stronger inequality, that is
Passing to polar coordinates centered at the center $c_\mu$, we want to prove that
It suffices to prove such an inequality "directionwise": for all
But from the uniform convexity of $W$ and the definition of the center, the function $f(\lambda) = W * \mu(c_\mu + \lambda v)$ satisfies $f'(0) = 0$ and $f''(r) \ge C_W$ for all $r > 0$. Hence, $f$ is monotone increasing on $[0, \infty)$, and in particular,
On the other hand, for all $\lambda \ge 2$, $f'(\lambda) \ge f'(2) \ge 2C_W$, and thus $f(\lambda) \ge 2C_W(\lambda - 2) + f(2)$. Hence,
Comparing (31) and (32), we obtain the desired exponential decrease.
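The convexity step used in the proof above can be spelled out as follows. This is only a restatement of the two facts $f'(0) = 0$ and $f'' \ge C_W$ quoted in the text, with no further assumption.

```latex
\begin{align*}
f'(\lambda) &= f'(0) + \int_0^{\lambda} f''(r)\,dr \;\ge\; C_W\,\lambda
  && (\lambda \ge 0),\\
f'(\lambda) &\ge f'(2) \;\ge\; 2C_W
  && (\lambda \ge 2,\ \text{since } f' \text{ is nondecreasing}),\\
f(\lambda) &= f(2) + \int_2^{\lambda} f'(r)\,dr \;\ge\; f(2) + 2C_W(\lambda - 2)
  && (\lambda \ge 2).
\end{align*}
```

In particular, $e^{-f(\lambda)} \le e^{-f(2)}\, e^{-2C_W(\lambda - 2)}$ for $\lambda \ge 2$, which is where the exponential tail comes from.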

2.3. A new transport metric: $T_P$-metric. Usually, to estimate the distance between two probability measures, one introduces the Wasserstein distance. Namely, for $\mu_1, \mu_2 \in \mathcal{P}(\mathbb{R}^d; P)$, we define $W_2^2(\mu_1, \mu_2) := \inf\{E(|\xi_1 - \xi_2|^2)\}$, where the infimum is taken over the pairs of random variables such that the law of $\xi_1$ is $\mu_1$ and the law of $\xi_2$ is $\mu_2$. In our setting, for a measure $\mu$, the corresponding probability measure $\Pi(\mu)$ is defined using the convolution $W * \mu$. So, it would be rather natural to use a distance that looks like the one for the weak* topology, but allows us to control $W * \mu$ for our unbounded function $W$. This motivates the introduction of a new metric resembling the Wasserstein distance:
Definition 3. For $\mu_1, \mu_2 \in \mathcal{P}(\mathbb{R}^d; P)$, we define the $P$-translation distance between them as
where the infimum is taken over the maps $f$ :
We also denote the $T_P$-distance between two $c$-centered measures by $T^c_P(\mu_1, \mu_2) = T_P(\mu_1(\cdot + c), \mu_2(\cdot + c))$.
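As a point of comparison for the Wasserstein distance recalled above, here is a minimal numerical sketch in dimension one, for two empirical measures with the same number of equally weighted atoms. It relies on the standard fact that on the real line the monotone (sorted) coupling is optimal for the quadratic cost. The function name `w2_empirical_1d` and the sample values are ours, and the sketch does not compute the $T_P$-distance of Definition 3.

```python
import math

def w2_empirical_1d(xs, ys):
    """W_2 distance between two 1D empirical measures with equally many, equally weighted atoms."""
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same number of atoms")
    # On the line, pairing the sorted atoms gives an optimal coupling for the quadratic cost.
    pairs = zip(sorted(xs), sorted(ys))
    mean_sq = sum((x - y) ** 2 for x, y in pairs) / len(xs)
    return math.sqrt(mean_sq)

# Hypothetical sample: translating a measure by c moves it by |c| in W_2.
sample = [0.3, -1.2, 2.0, 0.7]
shifted = [x + 1.5 for x in sample]
print(w2_empirical_1d(sample, shifted))
```

The translation example reflects why centered versions of a distance, such as $T^c_P$, are convenient: the displacement of the center is factored out before the measures are compared.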
Remark 5. In dimension one, we have the equivalent definition:
The following lemma will be useful to show the convergence of the empirical measure in the $W_2$ sense, as Proposition 6 shows.
Now let $\xi_1, \xi_2$ be two random variables corresponding to the $T_P$-optimal transport of $\mu_1$ to $\mu_2$. We then have
Indeed, the inequality (34) is due to the fact that the path between $\xi_1$ and $\xi_2$ either stays outside the ball of radius $\max(|\xi_1|, |\xi_2|)/2$ centered at 0, in which case we estimate its length from below by $|\xi_1 - \xi_2|$, or has a part joining the vector of maximal norm to this ball, which is of length at least $\max(|\xi_1|, |\xi_2|)/2 \ge |\xi_1 - \xi_2|/4$.
It is clear from the definition that $T_P$ is a distance. Taking also into account that $|P'| \le P$, one easily obtains
(35) $\|\mu_2\|_P \le \|\mu_1\|_P + T_P(\mu_1, \mu_2)$.
Thus, the set $\mathcal{P}(\mathbb{R}^d; P)$ is $T_P$-complete. Indeed, a $T_P$-Cauchy sequence $(\mu_n)$ has a weak limit $\mu$, and it is easy to check that $\|\mu\|_P = \lim_{n\to\infty} \|\mu_n\|_P < \infty$. So, $\mu \in \mathcal{P}(\mathbb{R}^d; P)$. Now, we are going to estimate the deviation of trajectories in terms of the $T_P$-metric, a result that will be useful in §3.1.
Lemma 6. For µ 1 , µ 2 ∈ P(R d ; P ), the following statements hold: 1) The map c is locally Lipschitz in the sense of T P -metric: 4) For all κ > 0, µ c : P κ (R d ; P ) → P(R d ; P ) is T P -Lipschitz.
2.4. Free energy functional. We recall from §1.3.4 that the free energy of a measure is defined as The free energy of a non-self-interacting gas in an exterior potential V is defined as and the map Π associates to a measure µ the probability measure 1 Z e −W * µ(x) dx (when W * µ is well-defined).
The first auxiliary statement implies that, as mentioned in §1.3.4, $\Pi(\mu)$ is the unique global minimum of $F_{W*\mu}$.
Lemma 7. For any potential $V$ such that $e^{-V}$ is integrable, the probability measure $Z^{-1} e^{-V}$ is the unique global minimum of $F_V$ on $\mathcal{P}(\mathbb{R}^d)$.
Proof. Let $\mu = Z^{-1} e^{-V}$. Then, for any absolutely continuous measure $\nu$, letting $\rho(x) = Z e^{V(x)} \nu(x)$ be its density with respect to $\mu$, we see that
and thus Jensen's inequality, applied to the convex function $\rho \log \rho$, leads immediately to the conclusion.
Now, for the free energy functional, McCann [10] proved the following
Proposition 5 (McCann). There exists a centered symmetric density $\rho_\infty$, which is the unique (up to translation) global minimum of $F$. Moreover, $F$ is a displacement convex functional, that is, for two probability measures $\mu_0, \mu_1$ and the Wasserstein-optimal transport between them
Finally, the transport distance from a centered measure $\mu$ to $\rho_\infty$ can be estimated as
Remark 6. i) The uniqueness of the minimum comes from the strict displacement convexity of the restriction to the space of centered measures.
ii) The functional F is not convex in the usual sense, due to the self-interacting part. iii) Inequality (14) together with Lemma 7 immediately imply that the minimum of F is also a fixed point of Π.
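The statement of Lemma 7 can be checked numerically on a finite state space. The sketch below assumes the usual form of the free energy in an exterior potential, $F_V(\nu) = \int \nu \log \nu + \int V \, d\nu$ (entropy plus potential energy), which is our reading of the definition recalled in §2.4. The five-point potential `V` and the helper names are hypothetical.

```python
import math
import random

def free_energy(p, V):
    """Discrete analogue of F_V(p) = sum p log p + sum V p, with the convention 0 log 0 = 0."""
    return sum(pi * math.log(pi) + vi * pi for pi, vi in zip(p, V) if pi > 0.0)

def gibbs(V):
    """Probability vector proportional to exp(-V), together with its normalization Z."""
    weights = [math.exp(-v) for v in V]
    Z = sum(weights)
    return [w / Z for w in weights], Z

# Hypothetical potential on a 5-point state space.
V = [0.0, 0.5, 2.0, 1.0, 3.0]
mu, Z = gibbs(V)
F_min = free_energy(mu, V)  # equals -log Z for the Gibbs measure

# Compare with randomly drawn competitors nu.
rng = random.Random(0)
gaps = []
for _ in range(1000):
    raw = [rng.random() for _ in V]
    total = sum(raw)
    nu = [r / total for r in raw]
    # F_V(nu) - F_V(mu) is the relative entropy of nu with respect to mu.
    gaps.append(free_energy(nu, V) - F_min)

print(F_min, -math.log(Z), min(gaps))
```

In this discrete setting the gap $F_V(\nu) - F_V(\mu)$ equals $\sum \rho \log \rho \, \mu$ with $\rho = \nu/\mu$, so its nonnegativity is the Jensen inequality invoked in the proof.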
Finally, as we are going to work in §3.1.3 with the discretized flow, we will need two auxiliary statements for the free energy:
Lemma 8. For all absolutely continuous measures $\mu, \nu \in \mathcal{P}(\mathbb{R}^d; P)$ of finite free energy and for all $\lambda \in [0, 1]$, we have
where $\varphi_\mu(\cdot) := F_{W*\mu}(\cdot)$ is the free energy in the $\mu$-generated potential. Moreover, for all absolutely continuous $\mu \in \mathcal{P}(\mathbb{R}^d; P)$, we have H(ν)). So, it suffices to prove (38) with the entropy terms removed from both sides (from both $F$ and $\varphi_\mu$ in the right-hand side). After this removal, the formula becomes a Taylor expansion of a degree two polynomial. The same holds for (39), noting that the entropy terms are exactly the same on both sides.

3. Proofs
3.1. Proof of Theorem 3. In fact, we will prove a stronger statement, controlling the speed of convergence in the sense of the transport distance: There exists a > 0 such that almost surely, as t → ∞, where k is the degree of the polynomial P , as well as The proof of this statement will be decomposed into several propositions. We first present them all, postponing their proofs; then deduce from them Proposition 6. Finally, we prove these propositions.
In order to prove this statement, as announced in §1.3, we will discretize the random process. Namely, we define the sequence $T_n$ of moments of time as $T_n := n^{3/2}$, so that $\Delta T_n := T_{n+1} - T_n$ is of order $T_n^{1/3}$. Also, for what follows, we associate to a random trajectory $(X_t, t \ge 0)$ the sequence $(L_n)$ defined as
An easy conclusion from the Ornstein-Uhlenbeck comparison of §2.1.3 and the logarithmic drift of the center is that almost surely $L_n \le C'_3 \log n$ and $L'_n \le \log n$ for any $n$ large enough. Now, let us state the first of the propositions mentioned above, the one allowing us to estimate the "Euler-method" one-step error in the description of the behaviour of the measures $\mu_t$:
Proposition 7. Almost surely there exists $n_0$ such that for any $n \ge n_0$, we have 5d .
Associated to the moments of time $T_n$, consider the following, roughly speaking, Euler-approximation maps for the flow $\dot m = \frac{1}{t}(\Pi(m) - m)$, with the knots chosen at the moments $T_n$:
Let us first exhibit an invariant set for $\Phi$.
Lemma 9. For any α, C as in Lemma 3, corresponding to α 0 = C W and C 0 = C Π (from Proposition 4), if µ ∈ K α,C and i ≤ j, then Φ j i (µ) ∈ K α,C . Proof. This is a direct corollary of Lemma 3.
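To illustrate the Euler-type discretization of the flow $\dot m = \frac{1}{t}(\Pi(m) - m)$ with knots $T_n = n^{3/2}$, here is a small deterministic sketch. Everything specific in it is our assumption, not the paper's setting: dimension one, the quadratic potential $W(x) = x^2/2$, a grid on $[-10, 10]$, and one step of the form $\mu \mapsto \mu + \frac{\Delta T_n}{T_{n+1}}(\Pi(\mu) - \mu)$. For this $W$, the map $\Pi$ sends $\mu$ to the unit-variance Gaussian with the same mean, so the iteration should keep the mean and drive the variance to 1.

```python
import math

# Assumptions (ours, for illustration only): dimension 1, W(x) = x^2 / 2,
# Euler step mu -> mu + (dT_n / T_{n+1}) * (Pi(mu) - mu), with T_n = n^{3/2}.
DX = 0.05
GRID = [-10.0 + DX * k for k in range(401)]

def moments(mu):
    """Mean and variance of a density sampled on GRID (Riemann sums)."""
    m1 = sum(x * p for x, p in zip(GRID, mu)) * DX
    m2 = sum(x * x * p for x, p in zip(GRID, mu)) * DX
    return m1, m2 - m1 * m1

def Pi(mu):
    """Density proportional to exp(-W * mu) on GRID."""
    m1, _ = moments(mu)
    # For quadratic W, W * mu(x) = ((x - m1)^2 + var) / 2; the constant cancels on normalizing.
    weights = [math.exp(-0.5 * (x - m1) ** 2) for x in GRID]
    Z = sum(weights) * DX
    return [w / Z for w in weights]

def euler_flow(mu, i, j):
    """Apply the Euler steps attached to the knots T_i, ..., T_{j-1}."""
    for n in range(i, j):
        step = ((n + 1) ** 1.5 - n ** 1.5) / (n + 1) ** 1.5
        target = Pi(mu)
        mu = [(1.0 - step) * p + step * q for p, q in zip(mu, target)]
    return mu

# Start from the uniform density on [-1, 3].
mu0 = [1.0 if -1.0 <= x <= 3.0 else 0.0 for x in GRID]
total = sum(mu0) * DX
mu0 = [p / total for p in mu0]

mean0, var0 = moments(mu0)
mean_end, var_end = moments(euler_flow(mu0, 1, 200))
print(mean0, var0, mean_end, var_end)
```

Because the weights $1 - \Delta T_n / T_{n+1} = T_n / T_{n+1}$ telescope, the initial measure retains only the mass fraction $T_i / T_j$ after the steps from $i$ to $j$, which is the discrete counterpart of the $1/t$ factor in the flow.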
Denote, for a probability measure µ and for a number h > 0, by µ (h) the "smoothening convolution" where U h (0) is the radius h ball in R d , centered at the origin.
The following proposition allows us to compare the deterministic Euler-like behaviour of the measure smoothened at some moment $T_i$ with the true random trajectory:
Proposition 8. There exist constants $A, C_1, C_2, C_3 > 0$ such that almost surely there exists $n_0$ for which the following statements hold. For any $j > i \ge n_0$ and any $h > 0$,
provided that the right-hand side of (41) does not exceed $C_3$. Also, under the same condition,
Next, we have to show that the deterministic trajectory of an absolutely continuous measure approaches the set of translates of $\rho_\infty$ sufficiently fast. To do this, due to the estimate (37), it suffices to estimate the free energy:
Proposition 9. Let $\mu \in K_{\alpha,C}$. Then, there exist $a_1, C_4, C_5 > 0$ such that almost surely there exists $n_0$ for which the following statements hold for any $j \ge i \ge n_0$:
Now, modulo these propositions, we are ready to prove Proposition 6.
Proof of Proposition 6. Recall from Proposition 7 that $\beta = \min(8C_W, (5d)^{-1})$. Note first that the distances $T^{c_{T_n}}_P(\mu_t, \mu_{T_n})$ for $t \in [T_n, T_{n+1}]$ are uniformly bounded for $n$ sufficiently big by
where $L_n$ is defined by (40). Hence, it suffices to check the estimate for the subsequence of moments $T_n$:
Now, for any sufficiently big $n$, take $i := [n^{1-\delta}]$, where a small $\delta > 0$ will be chosen and fixed (in a way that does not depend on $n$) later. Then, considering for some $h > 0$ a smoothened convolution $\mu^{(h)}$
provided that the right-hand side does not exceed $C_3$. Denote by const a generic constant. Let us estimate the first term in the right-hand side:
So, for any fixed choice of $\delta < \frac{\beta/2}{1 + A + (\beta/2)}$, the first term in the right-hand side of (42) will decrease as a negative power of $n$ and thus quicker than $e^{-a \sqrt[k+1]{\log T_n}}$.
. For such a choice of h, the second term in the right-hand side of (42) is not greater than T i Tn ∼ n −δ . So it also decreases quicker than e −a k+1 √ log Tn and thus Finally, we have to estimate T c Tn T i ), ρ ∞ (· + c Tn )). To do this, it suffices to estimate the free energy F (Φ n i (µ (h) T i )), as Indeed, T i ) ≤ C 6 log n for some constant C 6 . Hence, from the first part of Proposition 9, for j = Applying the second part, with Φ j i (µ (h) T i ) as a starting measure, we obtain Let us now prove Propositions 7-9.
3.1.1. One-step error estimate. This section is devoted to the proof of Proposition 7. To estimate the difference between the occupation measure of $X_t$ on $[T_n, T_{n+1}]$ and the measure $\Pi(\mu_t)$, we first introduce another process, for which $\Pi(\mu_{T_n})$ is the stationary measure: the process with "frozen" measure $\mu_{T_n}$. More precisely, on $[T_n, T_{n+1})$ we consider a process $Y$ with some choice of $Y_{T_n}$, satisfying
generated by the same Brownian motion $B_t$ as $X_t$. In other words, the couple $(X_t, Y_t)$ satisfies
The following lemma allows us to control the difference between them:
Adding and subtracting $\nabla W * \mu_{T_n}(X_t)$, we see
The last term can be rewritten as
Denoting the first term by $D_t$ and taking the scalar product with $X_t - Y_t$, we see
Finally, notice that $|D_t| \le P(2L_n) \frac{\Delta T_n}{T_n}$, as it is the difference between the forces generated at $X_t$ by $\mu_{T_n}$ and by $\mu_t = \mu_{T_n} + \frac{t - T_n}{t}\left(\mu_{[T_n,t]} - \mu_{T_n}\right)$.
For what follows (see Proposition 10 and Lemma 12 below), we will have to assume that the initial distribution of $Y_{T_n}$ is absolutely continuous with respect to $\Pi(\mu_{T_n})$, and to use an estimate on its density. So finally, we define the process $Y_t$ for all $t$ in the following way: for every interval $[T_n, T_{n+1})$ the initial value $Y_{T_n}$ is chosen randomly according to the restriction of $\Pi(\mu_{T_n})$ to the unit ball $U_1(c_{T_n})$. On each new interval, the choice is independent of $X$ and of all the past. Then, inside the interval $(T_n, T_{n+1})$, the couple $(X_t, Y_t)$ satisfies (45).
Let us compare the occupation measures of the processes X and Y on these intervals of time. Denote byμ [Tn,T n+1 ] the occupation measure of Y on the interval [T n , T n+1 ]. Then, we have the following: Lemma 11. For any family of choices Y Tn ∈ U 1 (c Tn ), we have as n → ∞, provided that for n sufficiently big L n ≤ C ′ 3 log n. Proof. The measures µ [Tn,T n+1 ] andμ [Tn,T n+1 ] are both images of the normalized Lebesgue measure 1 ∆Tn Leb [Tn,T n+1 ] under the maps X • and Y • respectively. So, consider the transport ξ s (t) = (1 − s)X t + sY t between them.
Using this transport, we have an estimate
By definition of $L_n$, we have $|X_t - c_{T_n}| \le L_n$ for all $t \in [T_n, T_{n+1}]$, and due to Lemma 10,
provided that $L_n \le C'_3 \log n$ and $n$ is sufficiently big. This implies that
Now, substituting the obtained estimates into the right-hand side of (46), we see that Tn P (2L n + 2) · e −C W (t−Tn) (L n + 1) (we have used that $\Delta T_n \sim T_n^{1/3}$, and once again the logarithmic growth of $L_n$).
Now, we will compare the occupation measure of $Y$ on $[T_n, T_{n+1}]$ with $\Pi(\mu_{T_n})$. To do this, we use Proposition 1.2 of Cattiaux & Guillin [6] (see also Wu [17]), stating that the trajectory mean of a function $\psi$ is, with a probability close to 1 that can be exponentially controlled, close to its stationary mean. Namely, this proposition says the following:
Proposition 10 (Cattiaux & Guillin [6]). Given a process $\xi$ with a stationary measure $m$ and Poincaré constant $C_P$, an initial measure $\nu$ and a function $\psi$ satisfying $|\psi| \le 1$, one has for any $0 < \rho < 1$ and $t > 0$
We will use this proposition with $\psi$ being the indicator function $\psi = \mathbb{1}_M$ of various sets $M$: it then allows us to compare the occupation measure of the set $M$ to its $\Pi(\mu_{T_n})$-measure.
We know that m = Π(µ Tn ) is the unique stationary measure of the drifted Brownian motion (44). Also, the Poincaré constant for this process is 2C W (see [1]).
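The way Proposition 10 is used, comparing the time a trajectory spends in a set $M$ with the stationary mass of $M$, can be illustrated on a toy frozen process. The sketch below is a simplification under our own assumptions: dimension one, frozen potential $V(y) = y^2/2$, and the noise normalized as $dY = \sqrt{2}\, dB - V'(Y)\, dt$ so that the stationary density is proportional to $e^{-V}$. The process is sampled with the exact Ornstein-Uhlenbeck transition, and all names and numerical parameters are hypothetical.

```python
import math
import random

def occupation_fraction(a, b, total_time, dt, seed=0):
    """Fraction of sampling times at which dY = sqrt(2) dB - Y dt lies in [a, b]."""
    rng = random.Random(seed)
    decay = math.exp(-dt)
    # Exact one-step standard deviation of this Ornstein-Uhlenbeck process.
    noise = math.sqrt(1.0 - math.exp(-2.0 * dt))
    y = 0.0
    hits = 0
    steps = int(total_time / dt)
    for _ in range(steps):
        y = decay * y + noise * rng.gauss(0.0, 1.0)
        if a <= y <= b:
            hits += 1
    return hits / steps

def stationary_mass(a, b):
    """Mass of [a, b] under the standard Gaussian, the stationary law for V(y) = y^2 / 2."""
    def cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return cdf(b) - cdf(a)

empirical = occupation_fraction(-0.5, 0.5, total_time=5000.0, dt=0.1)
print(empirical, stationary_mass(-0.5, 0.5))
```

Repeating the experiment with different seeds or a longer `total_time` gives a rough feeling for the deviation probabilities that the proposition controls; the sketch itself makes no claim about the constants involved.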
To proceed, we have to specify the initial measure $\nu = \nu_n$ for $Y_{T_n}$, and we choose it to be the measure $\Pi(\mu_{T_n})$ restricted to the ball $U_1(c_{T_n})$ and then normalized accordingly. Then,
the latter inequality is due to the exponential tails of $\Pi(\mu_{T_n})$. Having made these choices, we are going to prove the following
Lemma 12. As $n \to \infty$, we have almost surely
Proof. The previous estimates imply that almost surely, for all $n$ sufficiently big, the process $Y_t$ stays on $[T_n, T_{n+1}]$ inside the ball $U_{R_n}(c_{T_n})$, where $R_n := 3L_n$. Now, take this ball and cut it into some number $N_n$ of parts $M_1, \dots, M_{N_n}$ of diameter less than $\varepsilon_n := \frac{2dR_n}{\sqrt[d]{N_n}}$ (using the cubic grid with step $2R_n / \sqrt[d]{N_n}$, that is, decomposing each of the coordinate segments of length $2R_n$ into $\sqrt[d]{N_n}$ parts). We will choose and fix the number $N_n$ later. For each of these parts, choose
Let $\psi_j = \mathbb{1}_{M_j}$. Then, the probability that the occupation measures of all the parts $M_j$ by $Y$ on $[T_n, T_{n+1}]$ are $\rho_j$-close to their "theoretical" values $\Pi(\mu_{T_n})(M_j)$ is at least
As the variance $\mathrm{Var}_{\Pi(\mu_{T_n})}(\psi_j)$ does not exceed $\Pi(\mu_{T_n})(\psi_j)$, we have a lower bound for the probability by
we pay at most $P(3L_n)\,\varepsilon_n = O\big((\Delta T_n)^{-\frac{1}{5d}}\big)$. Next, bring the exterior part of $\Pi(\mu_{T_n})$ to the ball $U_{R_n}(c_{T_n})$: due to the exponential decrease estimates, we pay at most
as $R_n = 3 \log T_n$. Finally, let us redistribute the parts left: we pay at most
Adding these three estimates, we obtain that the $T^{c_{T_n}}_P$-distance between the occupation measure of $Y$ on $[T_n, T_{n+1}]$ and $\Pi(\mu_{T_n})$ is $O\big((\Delta T_n)^{-\beta}\big)$ with $\beta = \min(8C_W, (5d)^{-1})$, as desired.
Putting Lemmas 11 and 12 together, and recalling that $\Delta T_n \sim T_n^{1/3}$, we conclude that almost surely, for all $n$ sufficiently big,
Proposition 7 is proven.
Proof of Proposition 8. We prove the proposition by induction on j. The case j = i is obvious: the only term in the right-hand side is C 1 h, being an estimate for the distance to the smoothened convolution: provided that h ≤ 1 (because the norm µ c T i P is bounded due to the exponential tails of µ c ).
Let us now check the induction step. Namely, assume that the conclusion holds for some $j \ge i$, and check it for $j + 1$. To do this, first shift the center of the translation distance from $c_{T_{j+1}}$ to $c_{T_j}$: from Proposition 3
provided that $|c_{T_{j+1}} - c_{T_j}| \le 1$. On the other hand, we have by Lemma 6
Now, the map $\Pi$ is Lipschitz on $K_{\alpha,C}$ by Proposition 4, so for any two measures $\nu_1, \nu_2$ one has
Substituting for $\nu_1$ and $\nu_2$ the images, translated by $c_{T_j}$, of the measures $\Phi^j_i(\mu^{(h)}_{T_i})$ and $\mu_{T_j}$ respectively, we see that
Now, using that by Proposition 7,
Finally, we fix the choice $A := A_1 + A_2$, and, using the induction assumption, the right-hand side of (51) is not greater than
The induction step is proved.
3.1.3. Decrease of energy. This section is devoted to the proof of Proposition 9. To estimate the decrease of energy, we will need the following is an increasing continuous function, and the constants C 7 , ε 0 , ε 1 depend only on α and C.
We postpone its proof, but we use it as a motivation for the next result, which immediately implies Proposition 9: Lemma 14. There exists n 0 such that for any µ ∈ K α,C and for any j ≥ i ≥ n 0 : with the initial condition y(T i ) = max(F (µ T i ), 1).
Proposition 9 is its immediate corollary, as the solution of (52) decreases exponentially for big energies $y$ and has the form $y(t) = \exp\Big\{-\sqrt[k+1]{\tfrac{C_7}{2}(k+1)\log(t/T_0)}\Big\}$ for $y \le \varepsilon_0$ (which happens for $t$ large enough).
We will need the following corollary to Lemma 8:
Corollary 1. For any fixed $\alpha, C$, there exists $C''$ such that for all $\mu \in K_{\alpha,C}$, for all $0 < \lambda < 1$,
Proof. For $\mu \in K_{\alpha,C}$, the integral that is the coefficient of $\lambda^2$ is uniformly bounded.
Let us now prove the previous lemmas.
3.2. Proof of Theorem 4. As already shown in (22), we have
$\int_{T_n}^{T_{n+1}} \frac{P(L_n + |c_t - c_{T_n}|)}{C_W\, t}\, dt \le P(L_n + C_3)\, \frac{\Delta T_n}{T_n}.$
Thus, almost surely one has $\mathrm{osc}_{t \in [T_n, T_{n+1}]}\, c_t \to 0$ as $n \to \infty$. So, to prove Theorem 4, it suffices to show that the sequence $c_{T_n}$ converges almost surely. Now, let us estimate the distance $c_{T_{n+1}} - c_{T_n}$. Indeed,
Translating $c_{T_n}$ to the origin, using the decrease estimates of §3.1 and recalling that $c(\cdot) : K^0_{\alpha,C} \to \mathbb{R}^d$ is a $T_P$-Lipschitz function, we see that $\frac{\Delta T_n}{T_{n+1}} \cdot T^{c_{T_n}}_P(\mu_{[T_n,T_{n+1}]}, \mu_{T_n})$.
As in §3.1.2, the distance in the right-hand side can be estimated as a sum of two distances:
(59) $T^{c_{T_n}}_P(\mu_{[T_n,T_{n+1}]}, \mu_{T_n}) \le T^{c_{T_n}}_P(\mu_{[T_n,T_{n+1}]}, \Pi(\mu_{T_n})) + T^{c_{T_n}}_P(\Pi(\mu_{T_n}), \mu_{T_n})$.
We already have an estimate for the first term in this sum:
On the other hand, the limit density $\rho_\infty$ is a fixed point of the map $\Pi$. Since the map $\Pi$ is Lipschitz on $K^0_{\alpha,C}$, the second summand in (59) can be estimated as $T^{c_{T_n}}_P(\Pi(\mu_{T_n}), \mu_{T_n}) \le (\mathrm{Lip}_{K^0_{\alpha,C}}(\Pi) + 1) \cdot T_P(\mu^c_{T_n}, \rho_\infty)$. The latter distance is already estimated in the proof of Theorem 3: almost surely for $n$ sufficiently big, we have $T_P(\mu^c_{T_n}, \rho_\infty) \le \exp\{-a \sqrt[k+1]{\log T_n}\}$. Finally, adding the estimates for the first and the second terms in (59), we obtain that for all $n$ sufficiently big,
We choose $\delta$ such that $\delta\, \mathrm{Lip}(W) < 1/3$ and then $\chi$ is a contraction, as stated, with $\mathrm{Lip}(\chi) \le 1/2$. So, we have obtained existence and uniqueness of the solution on $[0, \delta]$.