On a strong form of propagation of chaos for McKean-Vlasov equations

This note shows how to considerably strengthen the usual mode of convergence of an $n$-particle system to its McKean-Vlasov limit, often known as propagation of chaos, when the volatility coefficient is nondegenerate and involves no interaction term. Notably, the empirical measure converges in a much stronger topology than weak convergence, and any fixed $k$ particles converge in total variation to their limit law as $n\rightarrow\infty$. This requires minimal continuity for the drift in both the space and measure variables. The proofs are purely probabilistic and rather short, relying on Girsanov's and Sanov's theorems. Along the way, some modest new existence and uniqueness results for McKean-Vlasov equations are derived.


Introduction
This note develops a simple but apparently new approach to analyzing McKean-Vlasov stochastic differential equations, of the form $dX_t = b(t, X_t, \mu_t)\,dt + \sigma(t, X_t)\,dW_t$, $\mu_t = \mathrm{Law}(X_t)$, $\forall t \ge 0$, in which the drift is merely bounded and measurable, with fairly weak continuity requirements in the measure variable. The volatility $\sigma$ is nondegenerate and independent of the measure, and this enables a line of argument based on Girsanov's theorem which leads to a much stronger propagation of chaos result than usual, along with some new results on existence and uniqueness.
Propagation of chaos here refers to the convergence of the $n$-particle system, defined by the SDE $dX^{n,i}_t = b(t, X^{n,i}_t, \mu^n_t)\,dt + \sigma(t, X^{n,i}_t)\,dW^i_t$, $\mu^n_t = \frac{1}{n}\sum_{j=1}^n \delta_{X^{n,j}_t}$, to the solution law $\mu$ of the McKean-Vlasov equation. Precisely, propagation of chaos typically means that the empirical measures $\mu^n$ (say, on the path space) converge weakly in probability to the deterministic measure $\mu$, or equivalently that the law of $(X^{n,1}, \ldots, X^{n,k})$ converges weakly to the product measure $\mu^{\otimes k}$ for any fixed $k$. In our context, we show that in fact the law of $(X^{n,1}, \ldots, X^{n,k})$ converges in total variation to $\mu^{\otimes k}$. Moreover, the sense in which $\mu^n$ converges in probability to $\mu$ can be strengthened; rather than working with the usual weak topology induced by duality with bounded continuous test functions, we work with the stronger topology induced by duality with bounded measurable test functions. In particular, our results will assume that $b(t, x, \mu)$ is continuous in $\mu$ in this stronger topology (or, for some results, in total variation) and merely measurable in $(t, x)$, and our coefficients may be path-dependent as well.
McKean-Vlasov equations have been studied in a variety of contexts since the seminal work of McKean [18]. Sznitman's monograph [21] is a classic introduction, and Gärtner's results [11] remain among the most general on existence, uniqueness, and propagation of chaos for models with (weakly) continuous coefficients.
More recently, interacting diffusion models of this form have enjoyed something of a renaissance, due in part (but certainly not entirely) to new applications in mean field game theory [17], and this is one impetus for revisiting these classical questions here. The McKean-Vlasov equations arising in mean field game theory can involve feedback controls obtained via Nash equilibrium problems. Regularity for these controls can be hard to come by, and this motivates a better understanding of somewhat more pathological dynamics. For instance, the recent work of [4] on mean field games with absorbing states naturally gives rise to McKean-Vlasov systems with discontinuous and path-dependent coefficients.
Several authors have studied McKean-Vlasov systems with various kinds of discontinuities arising in a variety of concrete applications. Noteworthy classes of examples include interactions based on ranks [20,14] and quantiles [8,16], to which our results apply in certain cases. One such example is given in Section 2.4, where we show that the particle approximation of Burgers' equation given in [3,13] holds in a stronger sense.
While several papers have studied McKean-Vlasov equations with discontinuities, the coefficients are often continuous enough, in the sense that the set of discontinuities has measure zero with respect to any candidate solution (see, e.g., [7]). In such a situation one can still apply the usual weak convergence arguments, which are not available for the general discontinuities in x we allow, more in the spirit of [13]. We lastly mention the interesting recent works [19,6] that deal with similarly irregular coefficients but less general interaction terms, with no results on propagation of chaos. While our existence and uniqueness results differ from those mentioned above, the main novelty of this work is the strong propagation of chaos result, Theorem 2.5.
Section 2 below states the main results, and proofs are given in Sections 3 and 4. It is worth stressing that all of the proofs are purely probabilistic.

Notation and topologies.
Let $E$ be a Polish space. For a signed Borel measure $\gamma$ on $E$, define the total variation norm $\|\gamma\|_{TV} = \sup\{\int_E \phi\,d\gamma : \phi \text{ measurable},\ |\phi| \le 1\}$. Let $\mathcal{P}(E)$ denote the set of Borel probability measures on $E$. For $\mu, \nu \in \mathcal{P}(E)$, define the relative entropy $H(\nu|\mu) = \int_E \log\frac{d\nu}{d\mu}\,d\nu$ if $\nu \ll \mu$, and $H(\nu|\mu) = \infty$ otherwise. Let $B(E)$ denote the set of bounded measurable real-valued functions on $E$. Define $\tau(E)$ to be the coarsest topology on $\mathcal{P}(E)$ such that the map $\mu \mapsto \int_E \phi\,d\mu$ is continuous for each $\phi \in B(E)$. This topology is somewhat well known in the large deviations literature as the $\tau$-topology. Notably, $(\mathcal{P}(E), \tau(E))$ is not separable or metrizable.
The map $E^n \ni (x_1, \ldots, x_n) \mapsto \frac{1}{n}\sum_{j=1}^n \delta_{x_j} \in \mathcal{P}(E)$ need not be measurable with respect to the Borel σ-field of $(\mathcal{P}(E), \tau(E))$, and we will need to work with a smaller σ-field for which we recover this measurability. Define $\mathcal{E}(\mathcal{P}(E))$ to be the smallest σ-field on $\mathcal{P}(E)$ such that the map $\mu \mapsto \int_E \phi\,d\mu$ is measurable for each $\phi \in B(E)$. It is well known that $\mathcal{E}(\mathcal{P}(E))$ coincides with the Borel σ-field on $\mathcal{P}(E)$ generated by the topology of weak convergence [2, Corollary 7.29.1].
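As a finite-state illustration of the two quantities above, the following sketch computes the total variation norm and the relative entropy of probability vectors and checks Pinsker's inequality numerically. It assumes the normalization in which the total variation norm is the supremum of $\int \phi\,d\gamma$ over measurable $|\phi| \le 1$ (on a finite set this is the $\ell^1$ distance), under which Pinsker's inequality reads $\|\nu - \mu\|_{TV}^2 \le 2H(\nu|\mu)$. The function names are illustrative only.

```python
import numpy as np

def tv_norm(p, q):
    """Total variation norm of p - q on a finite set, normalized as the
    supremum of sum(phi * (p - q)) over |phi| <= 1, i.e. the l1 distance."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.abs(p - q).sum())

def relative_entropy(nu, mu):
    """H(nu | mu) = sum nu * log(nu / mu), or +inf when nu is not
    absolutely continuous with respect to mu."""
    nu, mu = np.asarray(nu, float), np.asarray(mu, float)
    if np.any((mu == 0) & (nu > 0)):
        return float("inf")
    mask = nu > 0  # terms with nu = 0 contribute 0 by convention
    return float(np.sum(nu[mask] * np.log(nu[mask] / mu[mask])))

def pinsker_gap(nu, mu):
    """2 * H(nu | mu) - ||nu - mu||_TV^2, which Pinsker's inequality
    asserts is nonnegative under the normalization assumed above."""
    return 2.0 * relative_entropy(nu, mu) - tv_norm(nu, mu) ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for _ in range(5):
        mu = rng.dirichlet(np.ones(6))
        nu = rng.dirichlet(np.ones(6))
        print(tv_norm(nu, mu), relative_entropy(nu, mu), pinsker_gap(nu, mu))
```

With the other common normalization (supremum over sets, half the $\ell^1$ distance) the constant in Pinsker's inequality changes accordingly.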

The McKean-Vlasov equation. Fix a time horizon T > 0 and a dimension d ∈ N.
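Let $\mathcal{C} = C([0, T]; \mathbb{R}^d)$ denote the path space, endowed with the supremum norm. We will be interested in McKean-Vlasov equations of the form $$dX_t = b(t, X, \mu)\,dt + \sigma(t, X)\,dW_t, \qquad \mu = \mathrm{Law}(X), \tag{2.2}$$ stated more precisely in Definition 2.2 below. The data of the problem are coefficients $b : [0,T] \times \mathcal{C} \times \mathcal{P}(\mathcal{C}) \to \mathbb{R}^d$ and $\sigma : [0,T] \times \mathcal{C} \to \mathbb{R}^{d \times d}$, and an initial law $\lambda_0 \in \mathcal{P}(\mathbb{R}^d)$.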
For $\mu \in \mathcal{P}(\mathcal{C})$ and $t \in [0, T]$, let $\mu_t \in \mathcal{P}(\mathcal{C})$ denote the law of the process stopped at time $t$, defined as the image of $\mu$ through the map $\mathcal{C} \ni x \mapsto x_{\cdot \wedge t} \in \mathcal{C}$. At various points in the sequel, we will refer to the following assumptions: , and $\sigma$ is jointly Borel-measurable. In addition, the coefficients are progressive in the sense that Moreover, there exists a unique strong solution to the driftless SDE, For each $\mu \in \mathcal{P}(\mathcal{C})$, the following function is sequentially $\tau(\mathcal{C})$-continuous at $\mu$:
Remark 2.1. If one is careful about integrability, the assumptions can undoubtedly be relaxed to cover unbounded coefficients and stronger topologies for the continuity of $b(t, x, \mu)$ in $\mu$. We prefer to avoid obscuring the main line of argument with such generalities.
Definition 2.2. We say $\mu \in \mathcal{P}(\mathcal{C})$ is a weak solution of (2.2) if there exists a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, P)$ supporting a progressively measurable $d$-dimensional process $X$, a $d$-dimensional $\mathbb{F}$-Wiener process $W$, and an $\mathcal{F}_0$-measurable random vector $\xi$ with law $\lambda_0$, such that $P \circ X^{-1} = \mu$ and $dX_t = b(t, X, \mu)\,dt + \sigma(t, X)\,dW_t$, $X_0 = \xi$.
The closest result to Theorem 2.4 that we know of seems to come from the paper [5], from which we borrow the proof idea. A nearly identical form of Theorem 2.3 was given in [13, Theorem 2.2] and [4, Theorem C.1], though our proof seems to be much simpler.

Propagation of chaos.
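For $n \in \mathbb{N}$, let $(X^{n,1}, \ldots, X^{n,n})$ denote a weak solution on some filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, P)$ of the SDE system $dX^{n,i}_t = b(t, X^{n,i}, \mu^n)\,dt + \sigma(t, X^{n,i})\,dW^i_t$, $X^{n,i}_0 = \xi^i$, $i = 1, \ldots, n$, with $\mu^n = \frac{1}{n}\sum_{k=1}^n \delta_{X^{n,k}}$, where $W^1, \ldots, W^n$ are independent $d$-dimensional $\mathbb{F}$-Wiener processes, and $\xi^1, \ldots, \xi^n$ are i.i.d. and $\mathcal{F}_0$-measurable with law $\lambda_0$. Under assumptions (E) and (A), a standard argument by Girsanov's theorem guarantees existence and uniqueness in law for this SDE system.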
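To make the structure of the $n$-particle system concrete, here is a minimal Euler-Maruyama sketch. Everything specific in it is an assumption chosen for illustration and not taken from the text: dimension one, $\sigma \equiv 1$, standard normal initial law, and the bounded Markovian drift $b(t, x, \mu) = \int \tanh(y_t - x_t)\,\mu(dy)$. The time discretization itself is also only an approximation of the SDE system.

```python
import numpy as np

def simulate_particles(n, T=1.0, steps=200, seed=0):
    """Euler-Maruyama sketch of an n-particle system with mean-field drift.

    Hypothetical choices (not from the paper): d = 1, sigma = 1,
    lambda_0 = N(0, 1), and drift_i = (1/n) * sum_j tanh(x_j - x_i).
    Returns an array of shape (steps + 1, n) of particle positions.
    """
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = rng.standard_normal(n)            # i.i.d. initial states
    path = np.empty((steps + 1, n))
    path[0] = x
    for k in range(steps):
        # Entry (i, j) of the matrix is x_j - x_i; averaging over j
        # integrates the bounded kernel against the empirical measure.
        drift = np.tanh(x[None, :] - x[:, None]).mean(axis=1)
        x = x + drift * dt + np.sqrt(dt) * rng.standard_normal(n)
        path[k + 1] = x
    return path

if __name__ == "__main__":
    path = simulate_particles(n=500)
    print(path.shape, path[-1].mean(), path[-1].std())
```

The drift costs $O(n^2)$ per step in this form, which is acceptable for a sketch but not for large $n$.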

Theorem 2.5. Assume (E) and (A) hold. Suppose there exists a weak solution µ of (2.2).
For each $0 \le s < t \le T$, assume that the function $F_{s,t} : \mathcal{P}(\mathcal{C}) \to \mathbb{R}$ defined by is $\tau(\mathcal{C})$-continuous and $\mathcal{E}(\mathcal{P}(\mathcal{C}))$-measurable. Assume lastly that there exists $L > 0$ such that Then the following hold:
The closest result we know of to Theorem 2.5 is that of [1, Theorem 3], which proves (3) above even when $k$ can grow with $n$, but only when the coefficients (in particular, the interactions) take a very specific form.
The assumption (2.3) in Theorem 2.5 is worth commenting on, so we point out two notable sufficient conditions. First, in light of Pinsker's inequality, assumption (B1) is sufficient for (2.3). For a second example, suppose $\sigma$ is the identity, and the initial law $\lambda_0$ satisfies $\int_{\mathbb{R}^d} \exp(a|x|^2)\,\lambda_0(dx) < \infty$ for some $a > 0$. Then, by boundedness of $b$ and exponential integrability of Brownian motion, there exists $\tilde{a} > 0$ such that $\int_{\mathcal{C}} \exp(\tilde{a}\|x\|^2)\,\mu(dx) < \infty$, where $\|x\|$ is the supremum norm of $x \in \mathcal{C}$. It follows [12, Proposition 6.3] that there exists $C > 0$ such that $\mu$ satisfies the transport inequality $W_1(\mu, \nu) \le \sqrt{C H(\nu|\mu)}$, $\forall \nu \in \mathcal{P}(\mathcal{C})$, where $W_1$ denotes the 1-Wasserstein metric on $\mathcal{P}(\mathcal{C})$. If we assume $b(t, x, \cdot)$ is Lipschitz with respect to $W_1$, uniformly in $(t, x)$, then it follows that (2.3) holds.
Remark 2.6. Conclusion (3) of Theorem 2.5 implies in particular that $\lim_{n\to\infty} \mathbb{E}\big[\prod_{i=1}^k \phi_i(X^{n,i})\big] = \prod_{i=1}^k \int_{\mathcal{C}} \phi_i\,d\mu$ for each $k \in \mathbb{N}$ and $\phi_1, \ldots, \phi_k \in B(\mathcal{C})$. In fact, for fixed $k$, this convergence is uniform over all $\phi_1, \ldots, \phi_k \in B(\mathcal{C})$ satisfying $|\phi_i| \le 1$.
3)) the same local large deviation bounds as in [9, Theorem 5.2], but in the stronger topology $\tau(\mathcal{C})$. However, to deduce from this a full LDP in the topology $\tau(\mathcal{C})$ analogous to [9, Theorem 5.1], one would need to establish exponential tightness of $\mu^n$ in the same topology, which does not seem feasible.
Remark 2.9. It is not true in the setting of Theorem 2.5 that $P(\lim_n \mu^n = \mu) = 1$, where the limit is taken in $\tau(\mathcal{C})$. In fact, $P(\lim_n \mu^n = \mu) = 0$, because for each $\omega \in \Omega$ the countable set $S(\omega) = \{X^{n,i}(\omega) : n \in \mathbb{N},\ 1 \le i \le n\}$ satisfies both $\mu^n(\omega)(S(\omega)) = 1$ and $\mu(S(\omega)) = 0$, as $\mu$ is nonatomic. In general, a sequence of discrete measures can never $\tau(\mathcal{C})$-converge to a nonatomic measure, so we cannot hope to improve the convergence in probability stated in Theorem 2.5 (2). For this reason, we cannot state a version of Theorem 2.5 in line with more traditional propagation of chaos results (e.g., [11, Theorem 3.1]), in which the initial states $X^{n,i}_0$ are taken to be deterministic but with a prescribed limit $\lambda_0 = \lim_n \frac{1}{n}\sum_{k=1}^n \delta_{X^{n,k}_0}$.
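The obstruction in Remark 2.9 can be seen in a toy setting. The sketch below works on the real line rather than on path space, with $\mu$ taken to be the standard normal law; both are simplifying assumptions. It integrates the indicator of the set of atoms of an empirical measure, which is a bounded measurable test function: the empirical measure assigns it mass one, while a Monte Carlo estimate of its mass under $\mu$ is essentially zero. A bounded continuous test function behaves as weak convergence predicts.

```python
import numpy as np

def empirical_integral(phi, sample):
    """Integral of phi against the empirical measure of `sample`."""
    return float(np.mean(phi(sample)))

rng = np.random.default_rng(0)
atoms = rng.standard_normal(1000)      # atoms of the empirical measure mu^n
fresh = rng.standard_normal(100_000)   # independent draws from mu = N(0, 1)

def indicator_of_atoms(x):
    """Bounded measurable (but not continuous) test function 1_S,
    where S is the finite set of atoms of mu^n."""
    return np.isin(x, atoms).astype(float)

# mu^n(S) = 1 exactly, for every n.
mass_under_empirical = empirical_integral(indicator_of_atoms, atoms)
# Monte Carlo estimate of mu(S), which is 0 because mu has no atoms.
mass_under_mu_estimate = empirical_integral(indicator_of_atoms, fresh)
# A bounded continuous test function: the integral of tanh against N(0, 1) is 0.
continuous_test = empirical_integral(np.tanh, atoms)

if __name__ == "__main__":
    print(mass_under_empirical, mass_under_mu_estimate, continuous_test)
```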

A rank-based interaction.
A notable class of examples related to Burgers' and porous medium type PDEs fits into our framework. Consider the one-dimensional case $d = 1$, with $\sigma \equiv 1$ and $b(t, x, \mu) = g(\mu(\{y \in \mathcal{C} : y_t \le x_t\}))$, where $g : [0, 1] \to \mathbb{R}$ is Lipschitz continuous. The corresponding McKean-Vlasov equation is $dX_t = g(\mu_t(-\infty, X_t])\,dt + dW_t$, $\mu_t = \mathrm{Law}(X_t)$. Letting $V(t, x) = \mu_t(-\infty, x]$, one expects (cf. [20,13,3]) that $V$ is the unique generalized solution of the Burgers-type equation $\partial_t V = \frac{1}{2}\partial_{xx} V - \partial_x G(V)$, where $G$ is an antiderivative of $g$, and this reduces to Burgers' equation when $g(x) = x$. The corresponding $n$-particle approximation is $dX^{n,i}_t = g\big(\frac{1}{n}\sum_{k=1}^n 1_{\{X^{n,k}_t \le X^{n,i}_t\}}\big)\,dt + dW^i_t$, where $X^{n,i}_0$ are i.i.d. with law $\lambda_0$, and $W^i$ are independent Brownian motions. All of the assumptions of our Theorems 2.3, 2.4, and 2.5 hold in this example. Notably, our Theorem 2.5(3) is considerably stronger than [3, Theorem 3.2] or [13, Theorem 2.4], which provide only weak convergence.
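The rank-based particle approximation is simple to simulate, since each particle's drift depends only on its normalized rank. The following is a sketch under stated assumptions: an Euler-Maruyama discretization, a standard normal initial law, and $g(u) = u$ (the Burgers case) as the default. None of these choices is prescribed by the text, and the helper names are hypothetical.

```python
import numpy as np

def empirical_cdf(sample, grid):
    """Fraction of sample points <= each grid point."""
    return np.searchsorted(np.sort(sample), grid, side="right") / len(sample)

def simulate_rank_based(n, g=lambda u: u, T=1.0, steps=200, seed=0):
    """Euler-Maruyama sketch of dX^i = g(rank_i / n) dt + dW^i.

    rank_i / n is the empirical CDF evaluated at particle i, that is
    (1/n) * #{k : X^k <= X^i}. Initial law N(0, 1) is an assumption.
    Returns the particle positions at time T.
    """
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = rng.standard_normal(n)
    for _ in range(steps):
        u = empirical_cdf(x, x)   # normalized ranks in (0, 1]
        x = x + g(u) * dt + np.sqrt(dt) * rng.standard_normal(n)
    return x

if __name__ == "__main__":
    x_T = simulate_rank_based(n=2000)
    grid = np.linspace(-3.0, 4.0, 8)
    # Empirical approximation of V(T, .) on a coarse grid.
    print(np.round(empirical_cdf(x_T, grid), 3))
```

Sorting makes each step cost $O(n \log n)$, which is what makes rank-based systems cheap to simulate compared with general pairwise interactions.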

Existence and uniqueness proofs
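The proofs of both Theorems 2.3 and 2.4 rely on the following change of measure argument. Let $(\Omega, \mathcal{F}, \mathbb{F} = (\mathcal{F}_t)_{0 \le t \le T}, P)$ denote a filtered probability space supporting an $\mathbb{F}$-Wiener process $W$ and an $\mathcal{F}_0$-measurable random vector $\xi$ with law $\lambda_0$, and let $X$ denote the unique strong solution of the driftless SDE $dX_t = \sigma(t, X)\,dW_t$, $X_0 = \xi$. For each $\mu \in \mathcal{P}(\mathcal{C})$, define a measure $P^\mu \sim P$ by $\frac{dP^\mu}{dP} = \mathcal{E}_T\big(\int_0^\cdot \sigma^{-1}b(t, X, \mu) \cdot dW_t\big)$, where we define $\mathcal{E}_t(M) = \exp(M_t - \frac{1}{2}[M]_t)$ for any continuous martingale $M$. Girsanov's theorem implies that $W^\mu_t := W_t - \int_0^t \sigma^{-1}b(s, X, \mu)\,ds$ defines a $P^\mu$-Wiener process, and $dX_t = b(t, X, \mu)\,dt + \sigma(t, X)\,dW^\mu_t$. Then, a measure $\mu \in \mathcal{P}(\mathcal{C})$ is a weak solution of (2.2) if and only if $P^\mu \circ X^{-1} = \mu$.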
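The change of measure can be checked numerically in a toy case. The sketch below assumes $d = 1$, $\sigma \equiv 1$, a standard normal initial law, and a drift with the measure argument frozen, so that $b(t, X, \mu)$ is replaced by a hypothetical bounded function of $X_t$ (by default $\tanh$). It simulates the driftless process under $P$ and accumulates the stochastic exponential along each path; the weights should average to one, and reweighting by them should reproduce expectations under the drifted dynamics.

```python
import numpy as np

def girsanov_weights(n_paths=20000, T=1.0, steps=100, seed=0, drift=np.tanh):
    """Simulate driftless paths under P and the Girsanov density Z_T.

    Assumptions (not from the paper): d = 1, sigma = 1, X_0 ~ N(0, 1),
    and a drift depending only on the current state. Returns (X_T, Z_T)
    with Z_T = exp(sum b dW - 0.5 * sum b^2 dt) on the time grid.
    """
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = rng.standard_normal(n_paths)
    log_z = np.zeros(n_paths)
    for _ in range(steps):
        dw = np.sqrt(dt) * rng.standard_normal(n_paths)
        b = drift(x)                       # evaluated before the increment (adapted)
        log_z += b * dw - 0.5 * b**2 * dt  # discrete stochastic exponential
        x = x + dw                         # driftless dynamics under P
    return x, np.exp(log_z)

if __name__ == "__main__":
    x_T, z_T = girsanov_weights()
    print("mean of Z_T (should be near 1):", z_T.mean())
    # With constant drift c, the reweighted mean of X_T should be near c * T.
    x_T, z_T = girsanov_weights(drift=lambda x: 0.5 * np.ones_like(x))
    print("reweighted mean of X_T (should be near 0.5):", np.mean(z_T * x_T))
```

Because the drift is evaluated at the left endpoint of each step, the discrete weights have mean exactly one, so the only error in the first check is Monte Carlo noise.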
For $t \in [0, T]$ and $\mu, \nu \in \mathcal{P}(\mathcal{C})$, abbreviate $H_t(\nu|\mu) := H(\nu_t|\mu_t)$. Let $\Phi(\mu) := P^\mu \circ X^{-1}$ for $\mu \in \mathcal{P}(\mathcal{C})$. For any $\mu, \nu \in \mathcal{P}(\mathcal{C})$, we have Assumption (A) and nondegeneracy of $\sigma$ imply that $W$ and $X$ generate the same filtration. Hence, and so
Proof of Theorem 2.3. We use Banach's fixed point theorem on the complete metric space $(\mathcal{P}(\mathcal{C}), \|\cdot\|_{TV})$. For any $\mu, \nu \in \mathcal{P}(\mathcal{C})$, we use (3.1) along with assumption (B1) to get By Pinsker's inequality, Conclude by Picard iteration.
Proof of Theorem 2.4. This proof is by Schauder's fixed point theorem, on the topological vector space of bounded signed measures on $\mathcal{C}$ endowed with the weak* topology induced by $B(\mathcal{C})$. Note that the induced topology on the subset $\mathcal{P}(\mathcal{C})$ is exactly $\tau(\mathcal{C})$. Proceeding as in (3.1), for any $\mu \in \mathcal{P}(\mathcal{C})$ we have where the constant $c > 0$ comes from assumption (A). Hence, For the reader worried about measurability of the integrand $s \mapsto \|\nu_s - \mu_s\|^2_{TV}$, notice that we may write from which it is clear that the total variation norm is lower semicontinuous and thus Borel measurable with respect to the topology of weak convergence on $\mathcal{P}(\mathcal{C})$.
Sub-level sets of relative entropy are convex, compact, and metrizable in $\tau(\mathcal{C})$ [10, Lemma 6.2.12]. Hence, to apply Schauder's theorem it remains only to show that $\Phi : \mathcal{P}(\mathcal{C}) \to \mathcal{P}(\mathcal{C})$ is sequentially $\tau(\mathcal{C})$-continuous. Fix $\nu, \mu \in \mathcal{P}(\mathcal{C})$, and use Pinsker's inequality with (3.1) to get As a function of $\nu$, the right-hand side is sequentially $\tau(\mathcal{C})$-continuous at $\nu = \mu$ by assumption (B2), and this completes the proof.
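For orientation, the standard Girsanov computation of the relative entropy between two such laws can be sketched as follows. This is a sketch assuming $\sigma^{-1}b$ is bounded and that $X$ and $W$ generate the same filtration, written with the shorthand $\Delta_s := \sigma^{-1}(s, X)\big(b(s, X, \nu) - b(s, X, \mu)\big)$, which is introduced here for convenience; its notation and normalization need not match (3.1) exactly.

```latex
% Density of P^nu with respect to P^mu, written via the P^mu-Wiener process W^mu:
\frac{dP^\nu}{dP^\mu}
  = \mathcal{E}_T\Big(\int_0^\cdot \Delta_s \cdot dW^\mu_s\Big)
  = \exp\Big(\int_0^T \Delta_s \cdot dW^\mu_s - \frac12 \int_0^T |\Delta_s|^2\,ds\Big).

% Under P^nu, the process W^nu = W^mu - \int_0^\cdot \Delta_s\,ds is a Wiener process, so
\log \frac{dP^\nu}{dP^\mu}
  = \int_0^T \Delta_s \cdot dW^\nu_s + \frac12 \int_0^T |\Delta_s|^2\,ds .

% Restricting to time t and taking expectations under P^nu (the stochastic
% integral is a true martingale because \Delta is bounded):
H_t\big(\Phi(\nu)\,\big|\,\Phi(\mu)\big)
  = \frac12\, \mathbb{E}^{P^\nu}\!\Big[\int_0^t
      \big|\sigma^{-1}(s, X)\big(b(s, X, \nu) - b(s, X, \mu)\big)\big|^2 ds\Big].
```

Without the assumption that $X$ and $W$ generate the same filtration, the last line holds only as an inequality $\le$, since relative entropy can only decrease under the image map $X$.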

Proof of Theorem 2.5
We first introduce some notation, used in the proofs of both claims (1) and (2). We transfer the problem setup to a convenient probability space. Let $(\Omega, \mathcal{F}, P)$ be a probability space supporting an i.i.d. sequence of processes $X^i$ with law $\mu$. For $n \in \mathbb{N}$, let $\mathbb{F}^n = (\mathcal{F}^n_t)_{0 \le t \le T}$ denote the filtration generated by $(X^1, \ldots, X^n)$. There exist i.i.d. Wiener processes $W^1, W^2, \ldots$ such that and such that $W^i$ is adapted to the filtration generated by $X^i$. For $n \in \mathbb{N}$, let $\mu^n = \frac{1}{n}\sum_{i=1}^n \delta_{X^i}$. Define a measure $P^n$ on $(\Omega, \mathcal{F}^n_T)$ by $dP^n/dP = Z^n_T$, where we define the density process By Girsanov's theorem, $W^{n,i}_\cdot := W^i_\cdot - \int_0^\cdot \sigma^{-1}b(t, X^i, \mu^n)\,dt$ defines a $P^n$-Wiener process, and $dX^i_t = b(t, X^i, \mu^n)\,dt + \sigma(t, X^i)\,dW^{n,i}_t$. Hence $P^n \circ (X^1, \ldots, X^n)^{-1}$ is a weak solution of the $n$-particle system, and in the notation of Section 2.3 we have $P \circ (X^{n,1}, \ldots, X^{n,n})^{-1} = P^n \circ (X^1, \ldots, X^n)^{-1}$.
Proof of (1). Fix an $\mathcal{E}(\mathcal{P}(\mathcal{C}))$-measurable open set $U \subset \mathcal{P}(\mathcal{C})$ containing $\mu$. The goal is to show that $$\lim_{n\to\infty} P^n(\mu^n \notin U) = 0. \tag{4.1}$$ Fix $p, q \in (1, \infty)$, and let $p^*$ and $q^*$ denote the conjugate exponents, $p^* = p/(p-1)$ and $q^* = q/(q-1)$. Assume $p$ and $q$ are such that $M = LTpq/2$ is an integer, for reasons which will be clear later. Define $t_j = jT/M$ for $j = 0, \ldots, M$. We will show inductively that, for each $j$, Indeed, once this is established, it is easy to complete the proof of (1) as follows: By taking $j = 0$ and noting that $P^n$ and $P$ agree on $\mathcal{F}_{t_0} = \mathcal{F}_0$, it follows from (4.2) that lim sup Noting that $\lim_{x\to\infty} \big(\frac{x}{x-1}\big)^x = e$, we may send $p, q \to \infty$ in the above to get (2.4).