Particle method and quantization-based schemes for the simulation of the McKean-Vlasov equation

In this paper, we study three numerical schemes for the McKean-Vlasov equation \[\begin{cases} \;dX_t=b(t, X_t, \mu_t) \, dt+\sigma(t, X_t, \mu_t) \, dB_t,\: \\ \;\forall\, t\in[0,T],\;\mu_t \text{ is the probability distribution of }X_t, \end{cases}\] where $X_0$ is a known random variable. Under the assumption of Lipschitz continuity of the coefficients $b$ and $\sigma$, our first result establishes the convergence rate of the particle method with respect to the Wasserstein distance, which extends a previous work [BT97] established in the one-dimensional setting. In the second part, we present and analyse two quantization-based schemes: the recursive quantization scheme (a deterministic scheme) in the Vlasov setting, and the hybrid particle-quantization scheme (a random scheme, inspired by $K$-means clustering). Two examples are simulated at the end of this paper: the Burgers equation and a network of FitzHugh-Nagumo neurons in dimension 3.


Introduction
The McKean-Vlasov equation was originally introduced in [McK67] as a stochastic model naturally associated to a class of non-linear PDEs. Nowadays, it refers to the whole family of stochastic differential equations whose coefficients depend not only on the position of the process $X_t$ at time $t$ but also on its probability distribution $\mu_t$. This distribution-dependent structure of the McKean-Vlasov equation is widely used to model phenomena in statistical physics (see e.g. [MA01], [BJR22]), in mathematical biology (see e.g. [BFFT12], [BFT15]), and also in the social sciences and in quantitative finance, often motivated by the development of mean-field games and interacting diffusion models (see e.g. [LL18], [CL18] and [CD18]). Besides the original paper [McK67], an excellent reference for the general theory of the McKean-Vlasov equation and the propagation of chaos is [Szn91]. See also the lecture notes [Lac18].
In this paper, we consider a filtered probability space $(\Omega, \mathcal F, (\mathcal F_t)_{t\ge 0}, \mathbb P)$ satisfying the usual conditions and an $(\mathcal F_t)$-standard Brownian motion $(B_t)_{t\ge 0}$ valued in $\mathbb R^q$. For a separable Banach space $(E, \|\cdot\|_E)$, we denote the set of probability distributions on $E$ by $\mathcal P(E)$ and denote by $\mathcal P_p(E)$ the set of probability distributions on $E$ having a finite $p$-th moment, $p \ge 1$. The McKean-Vlasov equation writes \[ dX_t = b(t, X_t, \mu_t)\, dt + \sigma(t, X_t, \mu_t)\, dB_t, \quad \mu_t \text{ is the probability distribution of } X_t, \quad t \in [0, T], \tag{1.1} \] where $X_0: (\Omega, \mathcal F, \mathbb P) \to (\mathbb R^d, \mathcal B(\mathbb R^d))$ is a known random variable independent of the Brownian motion $(B_t)_{t\ge 0}$, and $b, \sigma$ are Borel functions defined on $[0, T] \times \mathbb R^d \times \mathcal P(\mathbb R^d)$ with respective values in $\mathbb R^d$ and $\mathbb M_{d,q}(\mathbb R)$, the space of real matrices with $d$ rows and $q$ columns. This paper aims to present three implementable numerical methods for the simulation of the McKean-Vlasov equation (1.1), namely the particle method and two quantization-based schemes, accompanied by a quantitative analysis of the corresponding simulation error.
In the following discussion, the notation $C_{p_1, \dots, p_n}$ means a positive constant depending on the parameters $p_1, \dots, p_n$; $|\cdot|_d$ denotes the Euclidean norm on $\mathbb R^d$ (we drop the subscript $d$ when there is no ambiguity) and $\langle \cdot \mid \cdot \rangle$ denotes the Euclidean inner product. We endow $\mathbb M_{d,q}(\mathbb R)$ with the operator norm $|||\cdot|||$ defined by $|||A||| := \sup_{|z|_q \le 1} |Az|_d$. For a random variable $X$, we write $\mathbb P_X$ or $\mathcal L(X)$ for its probability distribution and write $\|X\|_p$ for its $L^p$-norm, defined by $\|X\|_p := \big(\mathbb E\, |X|^p\big)^{1/p}$, $p \ge 1$. Moreover, let $\mathcal W_p$ denote the $L^p$-Wasserstein distance on $\mathcal P_p(\mathbb R^d)$, $p \ge 1$, defined by \[ \mathcal W_p(\mu, \nu) := \Big( \inf_{\pi \in \Pi(\mu, \nu)} \int_{\mathbb R^d \times \mathbb R^d} |x - y|^p \, \pi(dx, dy) \Big)^{1/p}, \tag{1.2} \] where $\Pi(\mu, \nu)$ denotes the set of all probability measures on $\mathbb R^d \times \mathbb R^d$ with marginals $\mu$ and $\nu$. We denote by $\delta_a$ the Dirac mass at a point $a \in \mathbb R^d$.
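For empirical measures with the same number of atoms on the real line, the infimum in (1.2) is attained by the monotone (sorted) coupling, so $\mathcal W_p$ reduces to sorting both samples. The following minimal sketch illustrates this; the function name and the toy sample are our own illustrative choices, not part of the paper's schemes.

```python
import numpy as np

def wasserstein_p_1d(x, y, p=2):
    """Empirical p-Wasserstein distance on R between two samples of equal
    size: the optimal coupling is the monotone rearrangement, so W_p^p is
    the mean p-th power gap between the sorted samples."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    assert x.shape == y.shape
    return (np.mean(np.abs(x - y) ** p)) ** (1.0 / p)

# Shifting a sample by a constant c moves its empirical measure by |c| in W_p.
u = np.random.default_rng(0).normal(size=1000)
print(wasserstein_p_1d(u, u + 1.0, p=2))  # approximately 1.0
```

This sorting identity is specific to $d = 1$; in higher dimension computing $\mathcal W_p$ requires solving an optimal transport problem.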
Throughout this paper, we make the following assumption on the coefficients $b$, $\sigma$ and on the initial random variable $X_0$. Note that this assumption depends on an index $p \in [2, +\infty)$.
(2) The coefficient functions $b$ and $\sigma$ are $\gamma$-Hölder continuous in $t$, $\gamma \in (0, 1]$, and Lipschitz continuous in $x$ and in $\mu$ in the following sense: there exists a constant $L > 0$ such that for every $t, s \in [0, T]$, $x, y \in \mathbb R^d$ and $\mu, \nu \in \mathcal P_p(\mathbb R^d)$, \[ |b(t, x, \mu) - b(s, y, \nu)| \vee |||\sigma(t, x, \mu) - \sigma(s, y, \nu)||| \le L \big( |t - s|^{\gamma} + |x - y| + \mathcal W_p(\mu, \nu) \big). \tag{1.4} \] Assumption I is a classical assumption for the existence and uniqueness of a strong solution $X = (X_t)_{t \in [0, T]}$ to the McKean-Vlasov equation (1.1) (see e.g. [Lac18], [Liu19, Chapter 5]) and for the convergence of the Euler scheme of (1.1), defined further in (1.6) (see [LP22, Proposition 2.1]). Besides, in the Vlasov case, that is, when there exist Borel functions $\beta: [0, T] \times \mathbb R^d \times \mathbb R^d \to \mathbb R^d$ and $a: [0, T] \times \mathbb R^d \times \mathbb R^d \to \mathbb M_{d,q}(\mathbb R)$ such that \[ b(t, x, \mu) = \int_{\mathbb R^d} \beta(t, x, u)\, \mu(du) \quad \text{and} \quad \sigma(t, x, \mu) = \int_{\mathbb R^d} a(t, x, u)\, \mu(du), \tag{1.5} \] a sufficient condition for Assumption I is Assumption II$_V$ (see below). This follows from the Kantorovich-Rubinstein dual representation of the Wasserstein distance $\mathcal W_1$, \[ \mathcal W_1(\mu, \nu) = \sup \Big\{ \int f \, d\mu - \int f \, d\nu \; : \; f \text{ is 1-Lipschitz} \Big\}, \] and from the fact that for every $p \ge 1$, $\mathcal W_1(\mu, \nu) \le \mathcal W_p(\mu, \nu)$ (see e.g. [Edw11] and [Vil09, Chapter 6]).
Assumption II$_V$. The functions $\beta$ and $a$ in (1.5) are $\gamma$-Hölder continuous in $t$, $\gamma \in (0, 1]$, and Lipschitz continuous in $(x, u)$ uniformly with respect to $t \in [0, T]$ in the following sense: there exists a constant $L > 0$ such that for every $t, s \in [0, T]$ and $x, y, u, v \in \mathbb R^d$, \[ |\beta(t, x, u) - \beta(s, y, v)| \vee |||a(t, x, u) - a(s, y, v)||| \le L \big( |t - s|^{\gamma} + |x - y| + |u - v| \big). \] We then turn to the construction of the three numerical schemes, which consists of two parts: the temporal discretization on $[0, T]$ and the spatial discretization on $\mathbb R^d$.
We fix $M \in \mathbb N^*$ and define $h = \frac{T}{M}$ as the time step. Let $t_m = t_m^M := m \cdot h$, $0 \le m \le M$, and let $Z_{m+1} := \frac{1}{\sqrt h}(B_{t_{m+1}} - B_{t_m})$, $0 \le m \le M - 1$. The random variables $Z_m$, $1 \le m \le M$, have the standard normal distribution $\mathcal N(0, \mathbf I_q)$, where $\mathbf I_q$ denotes the identity matrix of size $q \times q$. The theoretical Euler scheme of (1.1) is defined by \[ \bar X^M_{t_{m+1}} = \bar X^M_{t_m} + h\, b\big(t_m, \bar X^M_{t_m}, \bar\mu^M_{t_m}\big) + \sqrt h\, \sigma\big(t_m, \bar X^M_{t_m}, \bar\mu^M_{t_m}\big) Z_{m+1}, \qquad \bar X^M_{t_0} = X_0, \tag{1.6} \] where for every $m \in \{0, \dots, M\}$, $\bar\mu^M_{t_m}$ denotes the probability distribution of $\bar X^M_{t_m}$. We also define the continuous extension $(\bar X^M_t)_{t \in [0, T]}$ of $(\bar X^M_{t_0}, \dots, \bar X^M_{t_M})$ by \[ \bar X^M_t = \bar X^M_{t_m} + (t - t_m)\, b\big(t_m, \bar X^M_{t_m}, \bar\mu^M_{t_m}\big) + \sigma\big(t_m, \bar X^M_{t_m}, \bar\mu^M_{t_m}\big)\big(B_t - B_{t_m}\big), \quad t \in (t_m, t_{m+1}], \tag{1.7} \] with the same $\bar X_0 = X_0$. When there is no ambiguity, we will omit the superscript $M$ in (1.6) and in (1.7).
Under Assumption I, [LP22, Proposition 2.1] gives the following convergence rate of the theoretical Euler scheme.
We call (1.6) the theoretical Euler scheme since it is not directly implementable, contrary to the Euler scheme for a standard diffusion $dX_t = b(t, X_t) dt + \sigma(t, X_t) dB_t$. The reason is that the scheme (1.6) does not directly indicate how to simulate $\bar\mu^M_{t_m}$ in the coefficient functions $b$ and $\sigma$. To do this, we need a further spatial discretization on $\mathbb R^d$, a key point of this paper, to construct a discrete approximation of $\bar\mu^M_{t_m}$.

(2.1) Particle method. A first way to perform the spatial discretization is the particle method, inspired by the propagation of chaos property of the McKean-Vlasov equation (see e.g. [Szn91], [G 88], [Lac18] and [CST22]). In the one-dimensional setting, the convergence rates of the distribution function and the density function of the particle method have been established in [BT97]. In this paper, we establish the convergence rate with respect to the Wasserstein distance, which also holds in the higher-dimensional setting ($d \ge 2$).
We consider the same temporal discretization number $M$ and the same time step $h$ as in (1.6). For simplicity of notation, we will omit the superscript $M$ in the following discussion. Let $\bar X^{1,N}_0, \dots, \bar X^{N,N}_0$ be i.i.d. copies of $X_0$ in (1.1) and let $B^n := (B^n_t)_{t \in [0, T]}$, $1 \le n \le N$, be $N$ independent standard Brownian motions valued in $\mathbb R^q$, independent of the Brownian motion $(B_t)_{t \in [0, T]}$ in the initial McKean-Vlasov equation (1.1) and of $(X_0, \bar X^{1,N}_0, \dots, \bar X^{N,N}_0)$. The main idea of the particle method is to construct an $N$-particle system $(\bar X^{1,N}_{t_m}, \dots, \bar X^{N,N}_{t_m})_{0 \le m \le M}$ by computing, for every $n \in \{1, \dots, N\}$, \[ \bar X^{n,N}_{t_{m+1}} = \bar X^{n,N}_{t_m} + h\, b\big(t_m, \bar X^{n,N}_{t_m}, \bar\mu^N_{t_m}\big) + \sqrt h\, \sigma\big(t_m, \bar X^{n,N}_{t_m}, \bar\mu^N_{t_m}\big) Z^n_{m+1}, \qquad \bar\mu^N_{t_m} := \frac{1}{N} \sum_{n=1}^N \delta_{\bar X^{n,N}_{t_m}}, \tag{1.8} \] where $Z^n_{m+1} := \frac{1}{\sqrt h}\big(B^n_{t_{m+1}} - B^n_{t_m}\big)$, and then to use $\bar\mu^N_{t_m}$ defined in (1.8) as an estimator of $\bar\mu_{t_m}$ in (1.6) at each time step $t_m$, $1 \le m \le M$.
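As a concrete illustration of the particle update above, here is a minimal sketch for $d = q = 1$, where the empirical measure is simply passed to the coefficients as the array of current particle positions. The drift toward the empirical mean (a mean-field Ornstein-Uhlenbeck toy model) and all function names are our own illustrative assumptions, not coefficients from the paper.

```python
import numpy as np

def simulate_particles(b, sigma, x0, T, M, rng):
    """Sketch of the particle method in dimension 1: each particle takes an
    Euler step whose measure argument is the empirical measure of the current
    cloud, represented here by the raw array of positions.
    b and sigma have signature (t, x, cloud) -> array."""
    h = T / M
    x = x0.copy()                              # shape (N,)
    for m in range(M):
        t = m * h
        z = rng.standard_normal(x.shape)       # the i.i.d. increments Z^n_{m+1}
        x = x + h * b(t, x, x) + np.sqrt(h) * sigma(t, x, x) * z
    return x

# Toy Vlasov drift pulling every particle toward the empirical mean.
rng = np.random.default_rng(1)
b = lambda t, x, cloud: cloud.mean() - x
sigma = lambda t, x, cloud: 0.1 * np.ones_like(x)
cloud = simulate_particles(b, sigma, rng.normal(size=500), 1.0, 50, rng)
print(cloud.std())  # the cloud contracts around its mean
```

For $d > 1$ one would store the cloud as an `(N, d)` array and let `sigma` return matrices; the structure of the loop is unchanged.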
Remark that $\bar\mu^N_{t_m}$ in (1.8) is a random probability distribution, that is, it depends on $\omega \in \Omega$. Our first main result is the convergence rate of $\mathcal W_p(\bar\mu^N_{t_m}, \bar\mu_{t_m})$ in $L^p$ for every $0 \le m \le M$. Let $\big(\mathcal C([0, T], \mathbb R^d), \|\cdot\|_{\sup}\big)$ denote the space of $\mathbb R^d$-valued continuous functions defined on $[0, T]$, equipped with the uniform norm $\|\alpha\|_{\sup} := \sup_{t \in [0, T]} |\alpha_t|$. Let $\mathcal W_p$ also denote the Wasserstein distance on $\mathcal P_p\big(\mathcal C([0, T], \mathbb R^d)\big)$ (1.9), which is defined for every $\mu, \nu \in \mathcal P_p\big(\mathcal C([0, T], \mathbb R^d)\big)$ by \[ \mathcal W_p(\mu, \nu) := \Big( \inf_{\pi \in \Pi(\mu, \nu)} \int \|\alpha - \beta\|_{\sup}^p \, \pi(d\alpha, d\beta) \Big)^{1/p}, \tag{1.10} \] where $\Pi(\mu, \nu)$ denotes the set of all probability measures on $\mathcal C([0, T], \mathbb R^d) \times \mathcal C([0, T], \mathbb R^d)$ with marginals $\mu$ and $\nu$.

Theorem 1.2. Suppose that Assumption I holds for some $p \in [2, +\infty)$. Set $M \in \mathbb N^*$ and $h = \frac{T}{M}$ for the temporal discretization. Let $(\bar\mu_{t_m})_{1 \le m \le M}$, $(\bar\mu^N_{t_m})_{1 \le m \le M}$ be the probability distributions respectively defined by the theoretical Euler scheme (1.6) and the particle method (1.8).
(i) We have \[ \max_{1 \le m \le M} \big\| \mathcal W_p\big(\bar\mu^N_{t_m}, \bar\mu_{t_m}\big) \big\|_p \le C\, \big\| \mathcal W_p\big(\nu^N, \bar\mu\big) \big\|_p, \] where $\bar\mu$ is the probability distribution of the process $\bar X = (\bar X_t)_{t \in [0, T]}$ defined by (1.7) and $\nu^N := \frac{1}{N} \sum_{n=1}^N \delta_{Y^n}$ is the empirical measure built on $N$ i.i.d. copies $Y^1, \dots, Y^N$ of $\bar X$.

(ii) If, in addition, $\|X_0\|_{p + \varepsilon} < +\infty$ for some $\varepsilon > 0$, the right-hand side of (i) can be bounded by an explicit rate in $N$ (given by Theorem 2.6 below applied with $q = p + \varepsilon$), up to a constant $\widetilde C$ depending on $p$, $\varepsilon$, $d$, $b$, $\sigma$, $L$, $T$ and $\|X_0\|_{p + \varepsilon}$.
Theorem 1.2 can be considered as a strong error bound for the particle method. The weak error, that is, the upper bound of $|\Phi(\bar\mu_{t_m}) - \mathbb E\, \Phi(\bar\mu^N_{t_m})|$ for a function $\Phi: \mathcal P_p(\mathbb R^d) \to \mathbb R^d$ with appropriate differentiability, could be established in future work by applying techniques similar to [CST22]. Moreover, we refer to [AKH02] and [HL22] for studies of the density simulation of $\bar\mu_{t_m}$ based on the particle method (1.8).
(2.2) Quantization-based schemes. A second way to implement the spatial discretization is to use (optimal) vector quantization, also known as $K$-means clustering in unsupervised learning.
Consider a probability distribution $\mu$ on $\mathbb R^d$. The main idea of vector quantization is to use the projection of $\mu$ on a fixed quantizer (also called a quantization grid in the literature) $x = (x_1, \dots, x_K) \in (\mathbb R^d)^K$ as an approximation of $\mu$. More precisely, for a fixed quantization level $K \in \mathbb N^*$ and a quantizer $x = (x_1, \dots, x_K)$ satisfying $x_i \ne x_j$ if $i \ne j$, this projection, denoted by $\mu^x$, is defined by (see Figure 1) \[ \mu^x := \sum_{k=1}^K \mu\big(V_k(x)\big)\, \delta_{x_k}, \tag{1.11} \] where $\big(V_k(x)\big)_{1 \le k \le K}$ is a Voronoï partition generated by $x$, that is, a Borel partition of $\mathbb R^d$ satisfying \[ V_k(x) \subset \Big\{ \xi \in \mathbb R^d \; : \; |\xi - x_k| = \min_{1 \le j \le K} |\xi - x_j| \Big\}, \tag{1.12} \] and where the projection function $\mathrm{Proj}_x$ is defined by \[ \mathrm{Proj}_x(\xi) := \sum_{k=1}^K x_k \, \mathbb 1_{V_k(x)}(\xi). \tag{1.13} \] The approximation $\mu^x$ defined in (1.11) is deterministic. Moreover, if $\mu \in \mathcal P_p(\mathbb R^d)$, $p \ge 1$, there exists (at least) one optimal quantizer $x^* = (x^*_1, \dots, x^*_K)$ such that \[ \mathcal W_p\big(\mu, \mu^{x^*}\big) = \min \big\{ \mathcal W_p(\mu, \nu) \; : \; \nu \in \mathcal P_p(\mathbb R^d), \; \mathrm{card}(\mathrm{supp}(\nu)) \le K \big\}. \tag{1.14} \] The equality (1.14) shows that $\mu^{x^*}$ is the closest probability measure to $\mu$ with respect to the Wasserstein distance $\mathcal W_p$ among all probability distributions having a support of at most $K$ points ([GL00, Lemma 3.4]). Moreover, if $\mu \in \mathcal P_{p + \varepsilon}(\mathbb R^d)$ for some $\varepsilon > 0$, the optimal error $\mathcal W_p(\mu, \mu^{x^*})$ has an upper bound with convergence rate $K^{-1/d}$ (Zador's theorem, see e.g. [LP08] and [Pag18, Theorem 5.2]). In the quadratic setting ($p = 2$), an optimal quantizer $x^*$ can be found by several numerical methods such as Lloyd's algorithm (see further Algorithm 1 and [Llo82], [Kie82], [PY16]) or the CLVQ algorithm (see e.g. [Pag15, Section 3.2]). Figure 2 is an illustration of the quadratic optimal quantization at level $K = 60$ for the standard normal distribution $\mu = \mathcal N(0, \mathbf I_2)$, where the blue points represent an optimal quantizer and the red colour represents the weight $\mu(V_k(x))$ of each Voronoï cell (the darker, the heavier). The Voronoï partition $\big(V_k(x)\big)_{1 \le k \le K}$ generated by a fixed quantizer $x = (x_1, \dots, x_K)$ is not unique, since points on the hyperplane $H_{i,j} := \{\xi \in \mathbb R^d \mid |x_i - \xi| = |x_j - \xi|\}$ can be placed in either the Voronoï cell $V_i(x)$ or $V_j(x)$. However, the choice of the Voronoï partition does not affect the quantization error of the quantizer $x$ (see the further discussion around (3.1)-(3.2)).
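The projection $\mathrm{Proj}_x$ and the weights $\mu(V_k(x))$ of (1.11)-(1.13) can be sketched as follows for an empirical sample; the helper names are our own, and ties on the median hyperplanes are resolved by `argmin` (smallest index), which is one admissible choice of Voronoï partition.

```python
import numpy as np

def project(sample, grid):
    """Nearest-neighbour (Voronoi) projection Proj_x: send each point of
    `sample` (shape (n, d)) to the closest point of the quantizer `grid`
    (shape (K, d)); ties go to the smallest index."""
    d2 = ((sample[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)
    return grid[idx], idx

def quantized_weights(sample, grid):
    """Weights mu(V_k(x)) of the projected measure mu^x, for mu the
    empirical measure of `sample`."""
    _, idx = project(sample, grid)
    return np.bincount(idx, minlength=len(grid)) / len(sample)

sample = np.random.default_rng(2).normal(size=(10_000, 2))
grid = np.array([[-1.0, 0.0], [1.0, 0.0]])
w = quantized_weights(sample, grid)
print(w)  # roughly [0.5, 0.5] by symmetry of N(0, I_2)
```

Replacing `grid` by an optimized quantizer (e.g. the output of Lloyd's algorithm below) makes `(grid, w)` a discrete approximation of $\mu$ in the sense of (1.14).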
This idea of applying optimal quantization to simulate the McKean-Vlasov equation was first introduced in [GPPP05, Section 4] in a different framework. Besides, another quantization-based scheme, called the doubly quantized scheme, is proposed in [Liu19, Section 7.4], in which one implements quantized Gaussian random variables instead of $(Z_m)_{1 \le m \le M}$ in (1.6).
Unlike the particle method, the recursive quantization scheme is deterministic, that is, the simulation result does not depend on $\omega \in \Omega$.
(a) Recursive quantization scheme. The most natural idea for applying optimal quantization to the simulation of the McKean-Vlasov equation is to replace $\widetilde X_{t_m}$ and $\tilde\mu_{t_m}$ in (1.6) by their respective quantized projections. More precisely, we consider a quantizer $x^{(m)} = (x^{(m)}_1, \dots, x^{(m)}_K)$ for each time step $t_m$ and define the quantized random variable $\widehat X_{t_m}$ and its probability distribution $\hat\mu_{t_m}$ by \[ \begin{cases} \widetilde X_{t_{m+1}} = \widehat X_{t_m} + h\, b\big(t_m, \widehat X_{t_m}, \hat\mu_{t_m}\big) + \sqrt h\, \sigma\big(t_m, \widehat X_{t_m}, \hat\mu_{t_m}\big) Z_{m+1}, \\ \widehat X_{t_m} = \mathrm{Proj}_{x^{(m)}}\big(\widetilde X_{t_m}\big), \qquad \hat\mu_{t_m} = \mathcal L\big(\widehat X_{t_m}\big). \end{cases} \tag{1.16} \] In (1.16), at each time step $t_m$, we add a quantization procedure for $\widetilde X_{t_m}$, $\tilde\mu_{t_m} = \mathcal L(\widetilde X_{t_m})$ and inject their quantized projections $\widehat X_{t_m}$, $\hat\mu_{t_m}$ into the time step $t_{m+1}$. The scheme (1.16) is not directly implementable because of $\hat\mu_{t_m}$, for the same reason as the theoretical Euler scheme (1.6), so we call (1.16) the theoretical quantization-based scheme. However, in the Vlasov setting (1.5), we can circumvent this issue by using the recursive quantization method, first introduced in [PS15] for the diffusion equation $dX_t = b(t, X_t) dt + \sigma(t, X_t) dB_t$.
The main idea of the recursive quantization method is to construct the Markovian transitions of $(\widehat X_{t_m}, \hat\mu_{t_m})$ based on (1.16). To be more specific, for each time step $t_m$, by considering Voronoï partitions $\big(V_k(x^{(m)})\big)_{1 \le k \le K}$ generated by $x^{(m)}$, $0 \le m \le M$, and by applying (1.11), we can rewrite $\hat\mu_{t_m}$ in (1.16) as follows: \[ \hat\mu_{t_m} = \sum_{k=1}^K \mathbb P\big( \widetilde X_{t_m} \in V_k(x^{(m)}) \big)\, \delta_{x^{(m)}_k} =: \sum_{k=1}^K p^{(m)}_k \, \delta_{x^{(m)}_k}. \tag{1.17} \] The expression (1.17) shows that for every $m \in \{0, \dots, M\}$, the quantized distribution $\hat\mu_{t_m}$ is determined by the quantizer $x^{(m)} = (x^{(m)}_1, \dots, x^{(m)}_K)$ and its weight vector $p^{(m)} = (p^{(m)}_1, \dots, p^{(m)}_K)$. In the Vlasov setting (1.5), the transition step of (1.16), with respect to $\hat\mu_{t_m}$ in (1.17), writes: conditionally on $\{\widehat X_{t_m} = x^{(m)}_k\}$, \[ \widetilde X_{t_{m+1}} = x^{(m)}_k + h \sum_{j=1}^K p^{(m)}_j\, \beta\big(t_m, x^{(m)}_k, x^{(m)}_j\big) + \sqrt h \sum_{j=1}^K p^{(m)}_j\, a\big(t_m, x^{(m)}_k, x^{(m)}_j\big) Z_{m+1}. \]
Consequently, given $\widehat X_{t_m} = x^{(m)}_k$ and $p^{(m)}$, the transition probability is \[ \pi^{(m)}_{k, j} := \mathbb P\Big( \widetilde X_{t_{m+1}} \in V_j\big(x^{(m+1)}\big) \;\Big|\; \widehat X_{t_m} = x^{(m)}_k \Big), \tag{1.18} \] so that, given $p^{(m)}$, we can compute \[ p^{(m+1)}_j = \mathbb P\Big( \widetilde X_{t_{m+1}} \in V_j\big(x^{(m+1)}\big) \Big) = \sum_{k=1}^K \pi^{(m)}_{k, j}\, p^{(m)}_k. \] A detailed proof of the above equalities is provided in Section 3.2, where we also explain how to combine this recursive quantization method with Lloyd's algorithm to optimally compute the quantizer $x^{(m)}$ at each time step. The following theorem, whose proof is in Section 3.1, gives the error analysis of the quantization-based scheme (1.16).
Theorem 1.3 (Error analysis of the quantization-based scheme). Assume that Assumption I is satisfied with $p = 2$. Set $M \in \mathbb N^*$ and $h = \frac{T}{M}$ for the temporal discretization. Let $(\bar X_{t_m})_{0 \le m \le M}$ be the random variables defined by the Euler scheme (1.6). Consider a fixed $K$-level quantizer sequence $x^{(m)} = (x^{(m)}_1, \dots, x^{(m)}_K)$, $0 \le m \le M$, and define $(\widetilde X_{t_m})_{0 \le m \le M}$, $(\widehat X_{t_m})_{0 \le m \le M}$ by the quantization-based scheme (1.16).
(1) For every $m \in \{1, \dots, M\}$, the quadratic error $\|\bar X_{t_m} - \widehat X_{t_m}\|_2$ is bounded above by a weighted sum of the quantization errors of the previous steps (see (1.20)), where for every $j \in \{0, \dots, m\}$, $\Xi_j$ denotes the quadratic quantization error of the $j$-th step, that is, $\Xi_j := \|\widetilde X_{t_j} - \widehat X_{t_j}\|_2$.

(2) If, moreover, there exists $\varepsilon > 0$ such that $\|X_0\|_{2 + \varepsilon} < +\infty$ and if for every $0 \le m \le M$, $x^{(m)}$ is a quadratic optimal quantizer of $\widetilde X_{t_m}$, then the bound of (1) combined with Zador's theorem yields the convergence rate in $K^{-1/d}$ stated in (1.21).

(b) Hybrid particle-quantization scheme. Another way to simulate the McKean-Vlasov equation is to apply the optimal quantization method to the particle system (1.8), which leads to the following hybrid particle-quantization scheme. Consider the same initial random variables $\widetilde X^{1,N}_0, \dots, \widetilde X^{N,N}_0$ and the same Brownian motions $B^n$, $1 \le n \le N$, as in (1.8), and define \[ \widetilde X^{n,N}_{t_{m+1}} = \widetilde X^{n,N}_{t_m} + h\, b\big(t_m, \widetilde X^{n,N}_{t_m}, \mu^K_{t_m}\big) + \sqrt h\, \sigma\big(t_m, \widetilde X^{n,N}_{t_m}, \mu^K_{t_m}\big) Z^n_{m+1}, \quad 1 \le n \le N, \tag{1.22} \] where $\mu^K_{t_m}$ is the quantized projection of the empirical measure $\frac{1}{N} \sum_{n=1}^N \delta_{\widetilde X^{n,N}_{t_m}}$ on a quadratic optimal quantizer of level $K$. This scheme adds a quantization step of the empirical measure $\frac{1}{N} \sum_{n=1}^N \delta_{\widetilde X^{n,N}_{t_m}}$ at time $t_m$ and injects the quantized measure $\mu^K_{t_m}$ as an input to the simulation of $\widetilde X^{n,N}_{t_{m+1}}$. This scheme is inspired by the consistency of the optimal quantization established in [LP20]. Namely, we can expect that an optimal quantizer of the particle system at time $t_m$ is quasi-optimal for the measure $\bar\mu_{t_m}$ of (1.6), as a consequence of the convergence of the particle method obtained in Theorem 1.2.
The error analysis of the hybrid scheme (1.22) is established in the following proposition.
Proposition 1.4. Set the same temporal discretization as in Theorem 1.2 and Theorem 1.3. Assume that Assumption I holds true with $p = 2$. For any $m \in \{1, \dots, M\}$, let $\bar\mu^N_{t_m}$ be the empirical measure of the particles $(\bar X^{1,N}_{t_m}, \dots, \bar X^{N,N}_{t_m})$ defined by the particle system (1.8) and let $\mu^K_{t_m}$ be the quantized measure defined in (1.22). Then the inequality (1.23) holds, where $\Xi_{t_m}$ denotes the quadratic quantization error at time $t_m$ and $C_1$, $C_2$ are positive constants depending on $L$, $q$ and $T$.

Remark 1.5. We would like to highlight that the upper bounds in (1.20), (1.21) and (1.23) are not numerically optimal, since the two sums over $j = 0, \dots, m - 1$ on their right-hand sides diverge as the temporal discretization number $M \to +\infty$. These are common error bounds when dealing with Markovian quantization-based schemes (see e.g. [PS15]). However, we do not observe a large simulation error in the numerical experiments (see further Section 4), even for large values of $M$.

This paper is organised as follows.
In Section 2, we prove Theorem 1.2, the convergence rate of the particle method. The proof follows the same idea as the well-known propagation of chaos (see e.g. [Lac18, Theorem 3.3]): we compare the particle system (1.8) with another particle system without interaction, that is, a system composed of $N$ i.i.d. Itô processes $(Y^1_t, \dots, Y^N_t)_{t \in [0, T]}$ simulated by the continuous Euler scheme (1.7). Then the error of the particle method can be bounded, up to a constant multiplier, by the Wasserstein distance between the probability measure of $\bar X$ defined by (1.7) and the empirical measure defined on its i.i.d. copies $(Y^1, \dots, Y^N)$.
Section 3 is devoted to the quantization-based schemes (1.16), (1.18) and (1.22) and their error analyses. We first review the optimal quantization method in Subsection 3.1 and prove Theorem 1.3, the $L^2$-error analysis of the theoretical quantization-based scheme (1.16). Then, in Subsection 3.2, we show how to obtain the transition probability (1.18) for the recursive quantization scheme in the Vlasov setting, as well as the integration of Lloyd's algorithm. Finally, Subsection 3.3 contains the proof of Proposition 1.4, the $L^2$-error analysis of the hybrid particle-quantization scheme (1.22).
In Section 4, we illustrate two simulation examples using the above numerical methods. The first simulation concerns the one-dimensional Burgers equation, introduced in [Szn91] and [BT97]. This equation has a closed-form solution, so we can compute and compare the accuracy of the different methods. The second example is a 3-dimensional model, the network of FitzHugh-Nagumo neurons, first introduced and simulated in [BFFT12] (see also some corrections in [BFT15]) and also simulated in [dRES22].
Finally, for the reader's convenience, all the numerical methods presented in this paper and their connections are displayed in Appendix B, Figure 18, in which we also briefly mention their convergence rates.
Remark 1.6 (Comments on the particle method and on the hybrid particle-quantization scheme). In many research areas and applications related to the McKean-Vlasov equation, such as mean-field games (see e.g. [CD18]) and opinion dynamics (see e.g. [HK02]), the main research target is an $N$-particle system of the form \[ dX^n_t = b\big(t, X^n_t, \mu^N_t\big)\, dt + \sigma\big(t, X^n_t, \mu^N_t\big)\, dB^n_t, \qquad \mu^N_t := \frac{1}{N} \sum_{n=1}^N \delta_{X^n_t}, \quad 1 \le n \le N. \tag{1.24} \] The McKean-Vlasov equation (1.1) is considered as a limit equation of such a particle system (1.24) as $N \to +\infty$, as often appears in the propagation of chaos literature. The particle method (1.8) in this paper can be considered as a temporal discretization of (1.24). Furthermore, from the point of view of unsupervised learning, the hybrid particle-quantization scheme (1.22) essentially adds a $K$-means clustering step to the particle method, which can be computed in practice by e.g. the Python package sklearn.cluster.KMeans. This additional step can itself be meaningful for the modelling, since $K$-means clustering is usually used for feature extraction in unsupervised learning; for example, one can think of the $K$-means cluster centres as modelling some representative particles of the system. In Section 4.2, we show by numerical experiments that the hybrid particle-quantization scheme reduces the data volume while keeping a form of stability of the mean and variance of the test function value.
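In this unsupervised-learning reading, one step of the hybrid scheme can be sketched as follows, using sklearn.cluster.KMeans as suggested above. The toy drift and volatility and all helper names are our own illustrative assumptions, not the paper's coefficients.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_cloud(cloud, K):
    """One quantization step of the hybrid scheme: run K-means on the
    particle cloud (shape (N, d)) and return the cluster centres with
    their empirical weights, i.e. a discrete measure on K points."""
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(cloud)
    centers = km.cluster_centers_
    weights = np.bincount(km.labels_, minlength=K) / len(cloud)
    return centers, weights

def hybrid_step(cloud, centers, weights, b, sigma, t, h, rng):
    """Euler step for every particle, with the measure argument replaced
    by the quantized measure (centers, weights)."""
    z = rng.standard_normal(cloud.shape)
    return (cloud + h * b(t, cloud, centers, weights)
                  + np.sqrt(h) * sigma(t, cloud, centers, weights) * z)

rng = np.random.default_rng(3)
cloud = rng.normal(size=(400, 1))
centers, weights = quantize_cloud(cloud, K=8)
b = lambda t, x, c, w: (w * c.ravel()).sum() - x   # drift toward the quantized mean
sig = lambda t, x, c, w: 0.1
cloud = hybrid_step(cloud, centers, weights, b, sig, 0.0, 0.02, rng)
print(weights.sum())  # the weights form a probability vector
```

Storing only `(centers, weights)` instead of the full cloud is what produces the data-volume reduction mentioned above.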

Particle method and its convergence rate
In this section, we study the convergence of the particle method and prove Theorem 1.2 in three steps.
• Step 1 (Subsection 2.1): We introduce the space of stochastic processes in which the strong solution $X = (X_t)_{t \in [0, T]}$ of the McKean-Vlasov equation (1.1) and the process $\bar X = (\bar X_t)_{t \in [0, T]}$ defined by the continuous Euler scheme (1.7) live. We also define the space of probability distributions and the space of marginal distributions of these two processes.
• Step 2 (Subsection 2.2): We define the particle system (1.8) with interaction and an auxiliary particle system without interaction.

• Step 3 (Subsection 2.3): We compare these two particle systems and prove Theorem 1.2.

Spaces of stochastic processes and their probability distributions
Recall that $\big(\mathcal C([0, T], \mathbb R^d), \|\cdot\|_{\sup}\big)$ denotes the space of $\mathbb R^d$-valued continuous functions defined on $[0, T]$, equipped with the uniform norm. Then the unique solution $X = (X_t)_{t \in [0, T]}$ of (1.1) and the process $\bar X^M = (\bar X^M_t)_{t \in [0, T]}$ defined by the continuous Euler scheme (1.7) satisfy the following inequality: \[ \Big\| \sup_{t \in [0, T]} |X_t| \Big\|_p \vee \Big\| \sup_{t \in [0, T]} |\bar X^M_t| \Big\|_p \le C \big( 1 + \|X_0\|_p \big), \] where $C$ is a constant depending on $p$, $d$, $b$, $\sigma$, $L$ and $T$.

Let us now consider the space $\mathcal C\big([0, T], \mathcal P_p(\mathbb R^d)\big)$ of continuous applications from $[0, T]$ to $\mathcal P_p(\mathbb R^d)$ (2.3) and define the application $\iota: \mathcal P_p\big(\mathcal C([0, T], \mathbb R^d)\big) \to \mathcal C\big([0, T], \mathcal P_p(\mathbb R^d)\big)$ which maps a distribution $\mu$ to its family of marginal distributions $(\mu \circ \pi_t^{-1})_{t \in [0, T]}$, where $\pi_t$ denotes the coordinate projection at time $t$. The well-posedness of the application $\iota$ has been proved in [Liu19, Lemma 5.1.2], which also implies that the marginal distributions of the processes $X$ and $\bar X$ lie in the space $\mathcal C\big([0, T], \mathcal P_p(\mathbb R^d)\big)$.
Lemma 2.2. Let $(\mu_t)_{t \in [0, T]}$ and $(\bar\mu_t)_{t \in [0, T]}$ be the respective marginal distributions of the unique solution $X$ of (1.1) and of the process $\bar X$ defined by the continuous Euler scheme (1.7).
In [Lac18], the author defines an application $\mathcal W_{p, t}$, called the "truncated Wasserstein distance", on $\mathcal P_p\big(\mathcal C([0, T], \mathbb R^d)\big)$ by \[ \mathcal W_{p, t}(\mu, \nu) := \Big( \inf_{\pi \in \Pi(\mu, \nu)} \int \sup_{s \in [0, t]} |\alpha_s - \beta_s|^p \, \pi(d\alpha, d\beta) \Big)^{1/p}, \] where $\Pi(\mu, \nu)$ is the same set as in (1.10). The following lemma, whose proof is postponed to Appendix A, shows the relation between $\mathcal W_{p, t}(\mu, \nu)$ and $\sup_{s \in [0, t]} \mathcal W_p(\mu_s, \nu_s)$.

Definition of the particle systems with and without interaction
We define the continuous extension $(\bar X^{1,N}_t, \dots, \bar X^{N,N}_t)_{t \in [0, T]}$ of the particle system (1.8) as follows: $\bar X^{n,N}_0 \sim X_0$; for any $n \in \{1, \dots, N\}$ and for any $t \in (t_m, t_{m+1}]$, set \[ \bar X^{n,N}_t = \bar X^{n,N}_{t_m} + (t - t_m)\, b\big(t_m, \bar X^{n,N}_{t_m}, \bar\mu^N_{t_m}\big) + \sigma\big(t_m, \bar X^{n,N}_{t_m}, \bar\mu^N_{t_m}\big)\big(B^n_t - B^n_{t_m}\big). \tag{2.6} \] For every $t \in [t_m, t_{m+1})$, we define $\underline t = t_m$. Then the continuous extension $(\bar X^{1,N}_t, \dots, \bar X^{N,N}_t)_{t \in [0, T]}$ is the solution of the equations \[ d\bar X^{n,N}_t = b\big(\underline t, \bar X^{n,N}_{\underline t}, \bar\mu^N_{\underline t}\big)\, dt + \sigma\big(\underline t, \bar X^{n,N}_{\underline t}, \bar\mu^N_{\underline t}\big)\, dB^n_t, \quad 1 \le n \le N, \tag{2.7} \] with initial random variables $\bar X^{n,N}_0$.

Lemma 2.4. Suppose that Assumption I holds for some $p \in [2, +\infty)$.
(a) The coefficients $b$ and $\sigma$ have linear growth, in the sense that there exists a constant $C_{b, \sigma, L, T}$ depending on $b$, $\sigma$, $L$ and $T$ such that \[ |b(t, x, \mu)| \vee |||\sigma(t, x, \mu)||| \le C_{b, \sigma, L, T} \big( 1 + |x| + \mathcal W_p(\mu, \delta_0) \big). \tag{2.8} \] (b) Let $(\bar X^{1,N}_t, \dots, \bar X^{N,N}_t)_{t \in [0, T]}$ be the processes defined by (2.6). Then for a fixed temporal discretization number $M \in \mathbb N^*$, the uniform $L^p$-bound (2.9) holds. The proof of Lemma 2.4 is postponed to Appendix A.

The system $(\bar X^{1,N}_t, \dots, \bar X^{N,N}_t)_{t \in [0, T]}$ defined by (2.6) can be considered as a particle system with interaction through $\bar\mu^N_{t_m} = \frac{1}{N} \sum_{n=1}^N \delta_{\bar X^{n,N}_{t_m}}$. Now we define another particle system $(Y^1_t, \dots, Y^N_t)_{t \in [0, T]}$ without interaction, whose components are essentially i.i.d. copies of the process $\bar X = (\bar X_t)_{t \in [0, T]}$ defined by the continuous Euler scheme (1.7). Based on the same Brownian motions $B^n$, $n = 1, \dots, N$, as in (2.6), these processes are defined by \[ Y^n_t = Y^n_{t_m} + (t - t_m)\, b\big(t_m, Y^n_{t_m}, \bar\mu_{t_m}\big) + \sigma\big(t_m, Y^n_{t_m}, \bar\mu_{t_m}\big)\big(B^n_t - B^n_{t_m}\big), \quad t \in (t_m, t_{m+1}], \tag{2.10} \] where for every $t \in [0, T]$, $\bar\mu_t$ in (2.10) is the marginal distribution at time $t$ of the process $\bar X = (\bar X_t)_{t \in [0, T]}$ defined by (1.7). By using the same notation $\underline t$ as in (2.7), for every $n = 1, \dots, N$, the process $(Y^n_t)_{t \in [0, T]}$ defined by (2.10) is the solution of \[ dY^n_t = b\big(\underline t, Y^n_{\underline t}, \bar\mu_{\underline t}\big)\, dt + \sigma\big(\underline t, Y^n_{\underline t}, \bar\mu_{\underline t}\big)\, dB^n_t. \] Moreover, it is obvious that the processes $Y^n$, $1 \le n \le N$, are i.i.d. copies of $\bar X$, so that \[ \nu^{N, \omega} := \frac{1}{N} \sum_{n=1}^N \delta_{Y^n(\omega)} \] is the empirical measure of $\bar\mu = \mathbb P \circ \bar X^{-1}$. When there is no ambiguity, we will write $\nu^N$ instead of $\nu^{N, \omega}$. The random measure $\nu^{N, \omega}$ is valued in $\mathcal P_p\big(\mathcal C([0, T], \mathbb R^d)\big)$. As before, we write $\nu^{N, \omega}_t$ for its marginal distribution at time $t$.

Proof of Theorem 1.2
The proof of Theorem 1.2 relies on a variant of Gronwall's lemma (see e.g. [Pag18, Lemma 7.3] for the proof) and on the following Theorem 2.6 from [FG15], which gives a non-asymptotic upper bound for the convergence rate, in Wasserstein distance, of empirical measures.

Lemma 2.5 ("À la Gronwall" lemma). Let $f: [0, T] \to \mathbb R_+$ be a Borel, locally bounded, non-negative and non-decreasing function and let $\psi: [0, T] \to \mathbb R_+$ be a non-negative, non-decreasing function satisfying \[ \forall\, t \in [0, T], \qquad f(t) \le A \int_0^t f(s)\, ds + B\, \psi(t), \] where $A$, $B$ are two positive real constants. Then, for any $t \in [0, T]$, $f(t) \le B\, e^{At}\, \psi(t)$.

Theorem 2.6 ([FG15, Theorem 1]). Let $p > 0$ and let $\mu \in \mathcal P_q(\mathbb R^d)$ for some $q > p$. Let $U_1(\omega), \dots, U_n(\omega), \dots$ be i.i.d. random variables with distribution $\mu$ and let $\mu^{\omega}_n := \frac{1}{n} \sum_{i=1}^n \delta_{U_i(\omega)}$ denote the associated empirical measure. Then there exists a real constant $C$ depending only on $p$, $d$, $q$ such that, for all $n \ge 1$, $\mathbb E\, \mathcal W_p^p(\mu^{\omega}_n, \mu)$ is bounded by $C M_q^{p/q}(\mu)$ times an explicit rate in $n$, depending on the relative values of $p$, $d$ and $q$, where $M_q(\mu) := \int_{\mathbb R^d} |x|^q \mu(dx)$. In particular, Theorem 2.6 implies the bound (2.12) used below for $p \ge 2$.

Moreover, we need the following lemma, whose proof is postponed to Appendix A, for the proof of Theorem 1.2.

Lemma 2.7. Assume that Assumption I holds for an index $p \in [2, +\infty)$. Then the coefficients $b$ and $\sigma$ satisfy the inequalities (2.13), where $C_{d, p, L}$ is a constant depending on $d$, $p$ and $L$.
Proof of Theorem 1.2. (a) For every $n \in \{1, \dots, N\}$, the Minkowski inequality and Lemma 2.7 yield an upper bound on the error between $\bar X^{n,N}$ and $Y^n$ involving an integral term and the function $\psi$ defined in (2.14). Here we can apply Lemma 2.5, since Lemma 2.1 and Lemma 2.4-(b) imply, by the definition of $\psi(t)$ in (2.14), that $\psi$ is finite and non-decreasing. Then, by Lemma 2.5, we obtain the desired bound, where the second inequality follows from Lemma 2.3. For the convergence $\|\mathcal W_p(\bar\mu, \nu^N)\|_p \to 0$ as $N \to +\infty$, we refer to [Lac18, Corollary 2.14] among many other references.
(b) If $\|X_0\|_{p + \varepsilon} < +\infty$ for some $\varepsilon > 0$, then Assumption I holds with index $p + \varepsilon$, since $\mathcal W_p(\mu, \nu) \le \mathcal W_{p + \varepsilon}(\mu, \nu)$ for every $\mu$, $\nu$. For any $s \in [0, T]$, $\nu^N_s$ is the empirical measure of $\bar\mu_s$. It follows from Theorem 2.6, applied with $q = p + \varepsilon$, that $\mathbb E\, \mathcal W_p^p(\nu^N_s, \bar\mu_s)$ is bounded as in (2.17) for any $s \in [0, T]$, where $C$ is a constant depending on $p$, $\varepsilon$, $d$. Moreover, the inequality (2.2) implies that $\sup_{s \in [0, T]} M_{p + \varepsilon}^{p/(p + \varepsilon)}(\bar\mu_s) < +\infty$, so that $C' := C \sup_{s \in [0, T]} M_{p + \varepsilon}^{p/(p + \varepsilon)}(\bar\mu_s)$ is a constant depending on $p$, $\varepsilon$, $d$, $b$, $\sigma$, $L$, $T$ and $\|X_0\|_{p + \varepsilon}$. Then, by Lemma 2.5 and the bound (2.15), we obtain the inequality (2.21) with a constant $\widetilde C$ depending on $p$, $\varepsilon$, $d$, $b$, $\sigma$, $L$, $T$ and $\|X_0\|_{p + \varepsilon}$.
Finally, by the definition of $\psi(t)$ in (2.14), the bounds (2.15) and (2.21) imply the announced inequality for $\sup_{1 \le m \le M}$, where the final constant depends on $p$, $\varepsilon$, $d$, $b$, $\sigma$, $L$, $T$ and $\|X_0\|_{p + \varepsilon}$.

Quantization-based schemes and their error analyses
This section is devoted to the error analyses of the quantization-based schemes (1.16), (1.18) and (1.22). We start with a review of optimal quantization, as well as its connection with $K$-means clustering, and we prove Theorem 1.3, the $L^2$-error analysis of the quantization-based scheme (1.16), in Subsection 3.1. Next, in Subsection 3.2, we give the computational details of the transition probabilities (1.18) for the recursive quantization scheme and show how to optimally compute the quantizer $x^{(m)}$ by integrating Lloyd's algorithm, so as to reduce the quantization error. Finally, in Subsection 3.3, we prove Proposition 1.4, the error analysis of the hybrid particle-quantization scheme.

Preliminaries on the optimal quantization
In this section, $X: (\Omega, \mathcal F, \mathbb P) \to (\mathbb R^d, \mathcal B(\mathbb R^d))$ is a random variable having a finite $p$-th moment, $p \ge 1$, whose probability distribution is denoted by $\mu$. Let $K \in \mathbb N^*$ be the fixed quantization level. We call $x = (x_1, \dots, x_K) \in (\mathbb R^d)^K$ a quantizer of level $K$. The quantization error function of $\mu$ at level $K$ and order $p$, denoted by $e_{K, p}(\mu, \cdot)$, is defined by \[ e_{K, p}(\mu, x) := \Big( \int_{\mathbb R^d} \min_{1 \le k \le K} |\xi - x_k|^p \, \mu(d\xi) \Big)^{1/p}. \tag{3.1} \] Sometimes we write $e_{K, p}(X, \cdot)$ for the quantization error function since $\mathbb P_X = \mu$, but the value of this function depends only on the probability distribution $\mu$. For a quantizer $x = (x_1, \dots, x_K)$ such that $x_i \ne x_j$ for $i \ne j$, and for a Voronoï partition $\big(V_k(x)\big)_{1 \le k \le K}$ generated by $x$ as defined in (1.12), the quantization error of $x$ satisfies \[ e_{K, p}(\mu, x)^p = \sum_{k=1}^K \int_{V_k(x)} |\xi - x_k|^p \, \mu(d\xi). \tag{3.2} \] It is obvious that this value does not depend on the choice of the Voronoï partition. Moreover, still considering this Voronoï partition $\big(V_k(x)\big)_{1 \le k \le K}$, we have (see e.g. [GL00, Lemma 3.4]) \[ \mathcal W_p\big(\mu, \mu^x\big) \le \big\| X - \widehat X^x \big\|_p = e_{K, p}(\mu, x), \tag{3.3} \] where $\widehat X^x := \mathrm{Proj}_x(X)$ and $\mu^x := \mu \circ \mathrm{Proj}_x^{-1} = \mathcal L(\widehat X^x)$, with the projection function $\mathrm{Proj}_x$ defined by (1.13).
An optimal quantizer at level $K$ and order $p$ for $\mu$, denoted by $x^* = (x^*_1, \dots, x^*_K)$, is defined by \[ x^* \in \operatorname*{arg\,min}_{y \in (\mathbb R^d)^K} e_{K, p}(\mu, y). \tag{3.4} \] The existence of such an optimal quantizer has been proved in [Pol82]. There exist few results in the literature on the uniqueness of the optimal quantizer (see e.g. [Kie83]), but in practice we only need one optimal quantizer for the subsequent simulation, as all optimal quantizers have the same quantization error by (3.4). We call \[ e^*_{K, p}(X) = e^*_{K, p}(\mu) := \inf_{y \in (\mathbb R^d)^K} e_{K, p}(\mu, y) \] the optimal (quantization) error at level $K$ and order $p$ for $\mu$ (or for $X$).
Now we recall several basic properties of the optimal quantization.
Proposition 3.1. Let $K \in \mathbb N^*$ be the quantization level and let $p \ge 1$. We consider an $\mathbb R^d$-valued random variable $X$ having probability distribution $\mu \in \mathcal P_p(\mathbb R^d)$. Let $x^* = (x^*_1, \dots, x^*_K)$ be an optimal quantizer of $X$ at level $K$ and order $p$.
(1) (see [GL00, Theorem 4.1 and Theorem 4.12]) Assume that the support of $\mu$ satisfies $\mathrm{card}(\mathrm{supp}(\mu)) \ge K$, where $\mathrm{card}(\mathrm{supp}(\mu))$ denotes the cardinality of the support of $\mu$. Then any $K$-level optimal quantizer $x^*$ contains $K$ distinct points, i.e. for every $i, j \in \{1, \dots, K\}$ such that $i \ne j$, we have $x^*_i \ne x^*_j$.

(2) (Zador's theorem, see [LP08] and [Pag18, Theorem 5.2]) If $\mu \in \mathcal P_{p + \varepsilon}(\mathbb R^d)$ for some $\varepsilon > 0$, the optimal error has the following non-asymptotic upper bound for every quantization level $K \ge 1$: \[ e^*_{K, p}(\mu) \le C_{d, p, \varepsilon}\, \sigma_{p + \varepsilon}(\mu)\, K^{-1/d}, \] where $\sigma_{p + \varepsilon}(\mu) := \min_{a \in \mathbb R^d} \big( \int_{\mathbb R^d} |\xi - a|^{p + \varepsilon}\, \mu(d\xi) \big)^{1/(p + \varepsilon)}$.

(4) (Stationary property when $p = 2$, see e.g. [Pag18, Proposition 5]) For the quadratic optimal quantization (i.e. $p = 2$), any optimal quantizer $x^*$ is stationary in the sense that \[ \mathbb E\big[\, X \mid \widehat X^{x^*} \big] = \widehat X^{x^*}. \tag{3.7} \]

(5) (Consistency of the optimal quantization, see [Pol82, Theorem 9] and [LP20, Theorem 4 and Appendix A]) Consider a sequence of probability distributions $\mu_n, \mu \in \mathcal P_2(\mathbb R^d)$, $n \ge 1$, such that $\mathcal W_2(\mu_n, \mu) \to 0$ as $n \to +\infty$. For every $n \ge 1$, let $x^{(n)}$ be a quadratic optimal quantizer of $\mu_n$. Then any limiting point of $(x^{(n)})_{n \ge 1}$ is a quadratic optimal quantizer of $\mu$.

From a numerical point of view, the main idea of optimal quantization is to use $\widehat X^{x^*}$ (resp. $\mu^{x^*}$) as an approximation of the target random variable $X$ (resp. the target probability distribution $\mu$). Proposition 3.1-(3) shows that $\widehat X^{x^*}$ (resp. $\mu^{x^*}$) is the closest discrete representation of $X$ (resp.
of $\mu$) with respect to the $L^p$-norm (resp. the Wasserstein distance $\mathcal W_p$), among all random variables (resp. probability distributions) having a support of at most $K$ points. Furthermore, Proposition 3.1-(5) shows the connection between optimal quantization and $K$-means clustering in unsupervised learning (see e.g. [Pol82] and [LP20]). Given a sample $\{\eta_1, \dots, \eta_n\} \subset \mathbb R^d$, $K$-means clustering essentially finds an optimal quantizer with respect to the empirical measure $\tilde\mu_n := \frac{1}{n} \sum_{i=1}^n \delta_{\eta_i}$ of the sample. In the unsupervised learning context, such an optimal quantizer is called the cluster centres, and $\eta_1, \dots, \eta_n$ are often considered as i.i.d. samples having probability distribution $\mu$, so that $\mathcal W_p(\tilde\mu_n, \mu) \to 0$ a.s. if $\mu \in \mathcal P_p(\mathbb R^d)$ (see e.g. [Pol82, Theorem 7]).
Finally, we recall that there exist several numerical methods to find an optimal quantizer in the quadratic setting ($p = 2$), such as the (stochastic) gradient descent algorithm (see e.g. [Pag15, Section 3.2]) and Lloyd's fixed-point algorithm (see e.g. [Llo82] and [PY16] for its convergence). In this paper, we implement Lloyd's algorithm to obtain the quadratic optimal quantizer in the simulations, but the numerical results can be obtained by other methods as well. Lloyd's algorithm, whose objective is to find a stationary quantizer in the sense of (3.7), is described as follows.
Algorithm 1: Lloyd's (fixed-point) algorithm for a probability measure $\mu \in \mathcal P_2(\mathbb R^d)$. Set $K \in \mathbb N^*$ for the quantization level.
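A minimal sketch of Lloyd's iteration for an empirical measure (the input sample plays the role of $\mu \in \mathcal P_2(\mathbb R^d)$); the function names and the toy data are our own, and empty cells are simply left unchanged, one common practical convention.

```python
import numpy as np

def lloyd(sample, grid, n_iter=30):
    """Lloyd's fixed-point iteration for the empirical measure of `sample`
    (shape (n, d)): alternate (i) assign each point to its nearest centre
    (Voronoi step) and (ii) move each centre to the conditional mean of its
    cell, aiming at a stationary quantizer in the sense of (3.7)."""
    grid = grid.copy()
    for _ in range(n_iter):
        d2 = ((sample[:, None, :] - grid[None, :, :]) ** 2).sum(-1)
        idx = d2.argmin(axis=1)
        for k in range(len(grid)):
            cell = sample[idx == k]
            if len(cell):                     # keep empty cells unchanged
                grid[k] = cell.mean(axis=0)
    return grid

def distortion(sample, g):
    """Empirical quadratic distortion e_{K,2}^2 of the grid g."""
    return ((sample[:, None, :] - g[None, :, :]) ** 2).sum(-1).min(axis=1).mean()

rng = np.random.default_rng(4)
sample = rng.normal(size=(5000, 2))
g0 = rng.normal(size=(10, 2))
grid = lloyd(sample, g0)
print(distortion(sample, grid) <= distortion(sample, g0))  # True: Lloyd never increases it
```

Both steps of the iteration are distortion-non-increasing, which is why the final grid is at least as good as the initial one; convergence to a global optimum is not guaranteed.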
Proof of Theorem 1.3. (1) The definitions of $\bar X_{t_m}$, $\widetilde X_{t_m}$ and $\widehat X_{t_m}$ in (1.6) and (1.16) directly imply a recursive bound on $\|\bar X_{t_{m+1}} - \widetilde X_{t_{m+1}}\|_2$. Let $\mathcal F_0$ denote the $\sigma$-algebra generated by $X_0$ and, for every $m \in \{1, \dots, M\}$, let $\mathcal F_m$ denote the $\sigma$-algebra generated by $X_0, Z_1, \dots, Z_m$. Then $Z_{m+1}$ is independent of $\mathcal F_m$, while $\bar X_{t_m}$ and $\widehat X_{t_m}$ are $\mathcal F_m$-measurable. Moreover, Assumption I provides Lipschitz bounds on the increments of $b$ and $\sigma$, where we recall that $|||\cdot|||$ denotes the operator norm defined by $|||A||| := \sup_{|z| \le 1} |Az|$. Consequently, writing $\Xi_m = \|\widetilde X_{t_m} - \widehat X_{t_m}\|_2$ for the quadratic quantization error at time $t_m$ (see (3.3)), an induction on $m$ yields (1.20).

(2) Step 1. We first prove that there exists a constant $\widetilde C_{\max} > 0$ bounding the moments of the scheme uniformly in $m$, as in (3.11). Since for every $m \in \{0, \dots, M\}$ the quantizer $x^{(m)}$ is a quadratic optimal quantizer of $\widetilde X_{t_m}$, the distortion bound (3.12) holds. On the other hand, by the definition of $\widetilde X_{t_m}$ in (1.16) and Minkowski's inequality, then using that $\sigma(t_m, \widehat X_{t_m}, \hat\mu_{t_m}) \perp\!\!\!\perp Z_{m+1}$, and finally Lemma 2.4-(a) together with the inequality $\mathcal W_p(\hat\mu_{t_m}, \delta_0) \le \|\widehat X_{t_m}\|_p$, we obtain a linear recursive bound on $\|\widetilde X_{t_{m+1}}\|_p$. Thus (3.11) is obtained by taking $p = 2 + \varepsilon$ and choosing $\widetilde C_{\max}$ accordingly.

Step 2. Since for every $m \in \{0, \dots, M\}$ the quantizer $x^{(m)}$ is a quadratic optimal quantizer of $\widetilde X_{t_m}$, Zador's theorem (Proposition 3.1-(2)) combined with the uniform moment bound of Step 1 controls each $\Xi_m$ by $K^{-1/d}$ up to a constant. We then conclude by injecting this bound into (1.20).

Recursive quantization-based scheme for the Vlasov equation
In this section, we present the recursive quantization scheme in the Vlasov setting and show how to integrate Lloyd's algorithm into this scheme to reduce the quantization error at each time step.
Recall that in the Vlasov setting there exist β : … Write (C_k(x^{(m)}))_{1≤k≤K} for a Voronoï partition generated by x^{(m)}. Then the transition step of the theoretical quantization-based scheme (1.16) can be written as … where … since Z_{m+1} ∼ N(0, I_q). It follows that … and, by letting …, we get … Thus one can compute the weights p_j^{(m+1)}, 1 ≤ j ≤ K, starting from the weight vector p^{(m)} at time t_m by applying (3.15).

The formula (3.15) is valid for any quantizer sequence x^{(m)} with distinct components. Nonetheless, the quantization error Ξ_{t_m} = ‖X̃_{t_m} − X̂_{t_m}‖_2 of each time step t_m impacts the simulation result, as also indicated in the error analysis (1.20). One way to reduce the quantization error Ξ_{t_m} is to integrate Lloyd's algorithm (see Algorithm 1) into the recursive quantization scheme. Given x^{(m)} and p^{(m)}, Lloyd's iteration (3.9) sends x^{(m+1,[l])} = (…) to … The denominator of (3.16) can be computed directly by (3.15), while the numerator of (3.16) essentially amounts to computing the value … (3.19)

Remark 3.2. In dimension 1, Lloyd's iteration above has a low numerical cost, since the Voronoï cells in dimension 1 are simply intervals of R. For example, let x = (x_1, ..., x_K) ∈ R^K be a quantizer such that x_i < x_{i+1}, i = 1, ..., K − 1; one can choose the following Voronoï partition: … Let x^{(m)} = (x_1^{(m)}, ..., x_K^{(m)}) be the quantizer of the m-th time step. The transition probability π_{jk} reads … where F_{m,σ²} denotes the cumulative distribution function of N(m, σ²) with …
(3.20) Moreover, Lloyd's iteration (3.18) depends on the value … where f_{m,σ²}(ξ) is the density function of N(m, σ²) with m and σ defined in (3.20). In fact, to avoid computing the integral, (3.21) can alternatively be calculated by the following formula: for every … However, in higher dimensions, the main difficulty of this method is the computation of the integral of the density function of a multi-dimensional normal distribution over a Voronoï cell (see (3.15), (3.18)). There exist several numerical solutions, such as the pyhull package in Python (see also the website www.qhull.com) for cubature formulas for numerical integration over a convex set in low and medium dimensions (d = 2, 3, 4).
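In dimension 1, the weight-transition computation described above can be sketched as follows. This is a hedged illustration in the spirit of (3.15) and (3.20): `b` and `sig` stand for the Euler coefficients already evaluated at the current measure, the Voronoï boundaries are taken as midpoints of the sorted quantizer, and all names are ours:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def transition_weights(x_next, x_curr, p_curr, b, sig, t, h):
    """Sketch of the 1D weight update: the new weight of the Voronoi
    interval around x_next[k] is a p-mixture of Gaussian masses, one
    Gaussian per current quantizer point (Euler conditional law)."""
    K = len(x_next)
    # Voronoi boundaries of the sorted next-step quantizer: the midpoints.
    lo = [-math.inf] + [0.5 * (x_next[k] + x_next[k + 1]) for k in range(K - 1)]
    hi = lo[1:] + [math.inf]
    p_next = [0.0] * K
    for xj, pj in zip(x_curr, p_curr):
        m = xj + h * b(t, xj)                # Euler conditional mean
        s = abs(sig(t, xj)) * math.sqrt(h)   # Euler conditional std
        for k in range(K):
            mass = norm_cdf((hi[k] - m) / s) - norm_cdf((lo[k] - m) / s)
            p_next[k] += pj * mass
    return p_next
```

Since the Gaussian masses over a partition of R sum to one, the output is again a probability vector.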

L²-error analysis of the hybrid particle-quantization scheme
We prove Proposition 1.4 in this section.
Proof of Proposition 1.4. To simplify the notation, we will write … where the superscript "P" indicates the Particle method and the superscript "Q" indicates the hybrid particle-Quantization scheme. Moreover, let F_m be the σ-algebra generated by … The third term in (3.24) equals 0 since Z^n_{m+1}, n ∈ {1, ..., N}, are independent of F_m. Moreover, … and … Hence, (3.24) becomes … Remark that for any m ∈ {1, ..., M}, the measure … Hence, by letting … and by denoting the quantization error at time t_m by …, it follows from (3.25) that … Consequently, …

Remark 3.3. For every m = 1, ..., M, it follows from (3.3) that …
Thus one can implement Lloyd's algorithm at each time step in order to reduce the error bound on the right-hand side of (1.23). Moreover, for a fixed ω ∈ Ω, finding an optimal quantizer for the empirical measure (1/N) Σ_{n=1}^N δ_{X̃^{n,N}_{t_m}(ω)} is equivalent to computing the K-means cluster centers of the sample X̃^{1,N}_{t_m}(ω), ..., X̃^{N,N}_{t_m}(ω), for which one can use e.g. the sklearn.cluster.KMeans class in Python.
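For a fixed ω, one step of this hybrid procedure can be sketched as follows. The mean-field drift, the unit diffusion and the inline K-means loop are toy placeholders of ours; in practice one would plug in the model coefficients and could call sklearn.cluster.KMeans instead of the inline loop:

```python
import numpy as np

def hybrid_step(particles, K, h, rng, n_lloyd=5):
    """One Euler step on N particles followed by K-means compression.
    Toy mean-field drift b(x, mu) = -(x - mean(mu)) and unit diffusion."""
    N, d = particles.shape
    # Euler step: interaction through the empirical measure of the system.
    drift = -(particles - particles.mean(axis=0))
    Z = rng.normal(size=(N, d))
    particles = particles + h * drift + np.sqrt(h) * Z
    # Compress the N-particle cloud to a K-point quantizer (K-means).
    x = particles[rng.choice(N, size=K, replace=False)]
    for _ in range(n_lloyd):
        cell = ((particles[:, None, :] - x[None, :, :]) ** 2).sum(-1).argmin(1)
        for k in range(K):
            pts = particles[cell == k]
            if len(pts):
                x[k] = pts.mean(axis=0)
    # Weights: fraction of particles in each final Voronoi cell.
    cell = ((particles[:, None, :] - x[None, :, :]) ** 2).sum(-1).argmin(1)
    w = np.bincount(cell, minlength=K) / N
    return particles, x, w
```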

Simulation examples
In this section, we illustrate two simulation examples. The first one is the Burgers equation, introduced and already considered for numerical tests in [BT97] and [GPPP05]. This is a one-dimensional example with an explicit solution, so we use it to compare the accuracy and the computational time of the different numerical methods under consideration. The second example, the network of FitzHugh-Nagumo neurons, already numerically investigated in [BFFT12] and [BFT15] (see also [dRES22]), is a 3-dimensional example. All examples are implemented in Python 3.7.

Simulation of the Burgers equation, comparison of three algorithms
In [BT97], the authors analyse the solution of the Burgers equation and study the associated particle method: … where H is the Heaviside function H(z) = 1_{z≥0} and σ is a real constant. The cumulative distribution function V(t, x) of µ_t satisfies … Moreover, if the initial cumulative distribution function V_0 satisfies ∫_0^x V_0(y) dy = O(x), then the function V has a closed form (see [Hop50]): … Hence, if we consider X_0 = 0, the cumulative distribution function at time T = 1 reads … However, directly inverting the cumulative distribution function (4.2) is computationally extremely costly, and computing (4.4) by Monte Carlo simulation would induce its own statistical error, which could perturb our comparisons. Thus, instead of considering (4.4), we choose to directly compute (4.3) by … where Unifset is a uniformly spaced point set in [−2.5, 3.5].
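The modified criterion above, a supremum over a uniform grid of the gap between the simulated and the true cumulative distribution functions, can be sketched as follows (function and variable names are ours):

```python
import numpy as np

def sup_error_on_grid(particles, F_true, grid=np.linspace(-2.5, 3.5, 601)):
    """Sup over a uniform grid in [-2.5, 3.5] of |F_simu - F_true|,
    where F_simu is the empirical CDF of the simulated points."""
    xs = np.sort(np.asarray(particles))
    # Empirical CDF on the grid: fraction of points <= each grid value.
    F_simu = np.searchsorted(xs, grid, side="right") / len(xs)
    return np.max(np.abs(F_simu - F_true(grid)))
```

The same function applies verbatim to a quantizer-based CDF once the weighted empirical CDF is substituted for `F_simu`.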
In the following simulation, we choose σ² = 0.2 and M = 50, so that we have the same time step h = T/M = 0.02 for each method. We first give preliminary illustrations of the simulated cumulative distribution function obtained by the particle method (1.8), the recursive quantization scheme (1.16)-(1.18)-(1.19) (with and without Lloyd's algorithm) and the hybrid particle-quantization scheme (1.22) in Figures 3, 4, 5, 6 and 7. The true cumulative distribution functions are marked in red and the simulated ones in blue. In a second phase, we show the convergence rate of the simulation error (4.3) of the particle method (1.8) and of the recursive quantization scheme without Lloyd quantizer optimization (1.16)-(1.18)-(1.19), with respect to N and K respectively. Table 2 shows the simulation error (4.3) of the particle method (1.8) with respect to the number of particles N = 2^8, ..., 2^13 for a fixed M = 50. As the particle method (1.8) is a random algorithm, the errors shown in Table 2 are themselves random. Table 3 shows the convergence rate of the error of the recursive quantization scheme (1.16)-(1.18) with respect to the quantizer size K = 2^5, 2^6, 2^7, 2^8, 2^9, 2^10. Here we use a fixed quantizer sequence, namely a uniformly spaced point set in [−2.5, 3.5], without Lloyd's algorithm at each time step t_m:

K                              2^5      2^6      2^7      2^8      2^9      2^10
Error ‖F_simu − F_true‖_sup    0.07347  0.04176  0.02360  0.01471  0.01043  0.00829

We now give some comments on the numerical performance of the different methods, mainly through two aspects: the accuracy and the computing time.
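The empirical convergence rate suggested by Table 3 can be estimated by a least-squares fit in log-log scale; the short sketch below uses the Table 3 values (the fitting procedure is ours, not part of the paper):

```python
import numpy as np

# Errors of the recursive quantization scheme from Table 3, K = 2^5, ..., 2^10.
K_exps = np.arange(5, 11)
errors = np.array([0.07347, 0.04176, 0.02360, 0.01471, 0.01043, 0.00829])

# Least-squares slope of log2(error) against log2(K): the empirical
# exponent alpha in error ~ C * K^alpha.
slope, intercept = np.polyfit(K_exps, np.log2(errors), 1)
print(f"empirical rate: K^({slope:.3f})")
```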
• Comparison of the computing time.
The particle method (1.8) and the recursive quantization scheme (1.16)-(1.18)-(1.19) without Lloyd iteration are the two fastest methods. In fact, these two methods essentially amount to computing a Markov chain in R^N and R^K respectively. The application of Lloyd's iteration in the recursive quantization scheme is slightly faster than in the hybrid particle-quantization scheme since we used the closed formulas (3.22). We can also remark that, when the quantization level K is large, Lloyd's iteration is numerically costly even in dimension 1, and the application of Lloyd's algorithm only slightly reduces the quantization error.
• Comparison of the accuracy, measured by ‖F_simu − F_true‖_sup.
- The particle method (1.8) and the hybrid particle-quantization scheme (1.22) are random algorithms, whose simulation results, including the error ‖F_simu − F_true‖_sup, depend on ω in (Ω, F, P). Therefore, when assessing the accuracy of these two algorithms, one needs to compute at the same time the standard deviation of the errors. The recursive quantization scheme (1.16)-(1.18)-(1.19), by contrast, is deterministic.
- Comparing the particle size N in Table 2 with the quantizer size K in Table 3, one can remark that to achieve the same accuracy, fewer points are needed in the quantizer than among the particles. So if we need a discrete representation of the cumulative distribution function F (or, equivalently, a discrete representation of the probability distribution µ) to compute a further functional of µ, such as an integral with respect to µ, the recursive quantization-based scheme provides a smaller data set (a K-point quantizer and a K-size weight vector) than the particle method.
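To make the data-volume point concrete, a functional of µ is computed in exactly the same way from either discrete representation; a minimal one-dimensional sketch (names are ours):

```python
import numpy as np

def integral_quantizer(x, w, phi):
    """Approximate the integral of phi w.r.t. mu from a K-point quantizer x
    and its weight vector w (a K-point discrete representation of mu)."""
    return float(np.dot(w, phi(np.asarray(x))))

def integral_particles(xs, phi):
    """Same integral from N equally weighted particles."""
    return float(np.mean(phi(np.asarray(xs))))
```

With comparable accuracy in the underlying representation, the quantizer version stores K points and K weights instead of N ≫ K particles.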
- The error of the recursive quantization scheme (1.16)-(1.18)-(1.19), especially when implemented without Lloyd's quantizer optimization, strongly depends on the choice of the quantizer. In general, a practical way to choose the initial quantizer of a probability distribution µ is the self-quantization technique, for which we refer to [DGLP06], [GL00, Sections 7.1 and 14], [PP03] and [PPP04]. Another efficient trick is to rely on a so-called "splitting method", that is, using the trained quantizer of the Euler step t_m as the initial quantizer of the Euler step t_{m+1}.
In this one-dimensional example, we did not observe any obvious advantage of the hybrid particle-quantization scheme (1.22) compared with the other methods. However, in the next section, we will show that the hybrid method provides a fair balance between accuracy and data volume.

Simulation of the network of FitzHugh-Nagumo neurons in dimension 3
In this section, we consider the network of FitzHugh-Nagumo neurons introduced in [BFFT12] and [BFT15]: … where b : R^3 × P(R^3) → R^3 and σ : R^3 × P(R^3) → M_{3×3} are defined by … with the following parameter values … This model was first studied in [BFFT12] and then rigorously investigated and analyzed in [BFT15]. We are aware that equation (4.5) does not fulfill Assumption I, since the drift b defined by (4.6) is only locally Lipschitz in (x, µ). However, the drift b satisfies … for some c_1 > 0, and the coefficient σ is bounded. This ensures the existence of a strong solution living on the whole of R_+. In fact, the presence of the term −(x_1)³/3 also induces a mean-reversion property that makes it possible to control the long-run behaviour of the equation. On the other hand, such a drift with non-linear growth (in norm) usually induces an instability of the Euler scheme, as emphasized e.g. in [Lem07], at least when using equidistant time steps. We nevertheless chose this model for a 3D numerical illustration because of its challenging features, but with a refined time step (M = 5000, see below) to ensure stability. Other numerical schemes, typically semi-implicit ones, could be more stable but are beyond the scope of this paper.
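Since the exact coefficients and parameter values are given in [BFFT12]/[BFT15] and are not reproduced here, the sketch below only illustrates the structure of a FitzHugh-Nagumo-type drift with the cubic mean-reverting term −(x_1)³/3 mentioned above. Every constant and the form of the mean-field coupling are illustrative placeholders, not the model of the paper:

```python
import numpy as np

def fhn_drift(x, mu_mean, a=0.7, b_p=0.8, c=0.08, I=0.5, J=1.0):
    """Illustrative FitzHugh-Nagumo-type drift on R^3, x = (v, w, y).
    All parameter values are placeholders, NOT those of [BFFT12]/[BFT15];
    the mean-field interaction is caricatured by a linear coupling to the
    mean of mu (here `mu_mean`, standing in for the measure argument)."""
    v, w, y = x
    dv = v - v**3 / 3.0 - w + I - J * (v - mu_mean[0])  # cubic mean reversion
    dw = c * (v + a - b_p * w)                          # recovery variable
    dy = -y + mu_mean[2]                                # toy synaptic variable
    return np.array([dv, dw, dy])
```

The cubic term makes the first component of the drift strongly negative for large voltages, which is the mean-reversion property invoked in the text.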
In this section, we compare the performance of the particle method (1.8) and the hybrid particle-quantization scheme (1.22) in two respects. First, we compare these two methods visually by simulating the density function of (x_1, x_2) at T = 1.5, as in the original paper [BFFT12, Page 31, Figure 4, the third image on the right]. In this step, we choose M = 5000 for the temporal discretization to reduce (as much as possible) the error induced by the Euler scheme. The density functions simulated by the particle method and by the hybrid method are displayed in Figures 13, 14 and Figures 15, 16, 17 respectively. Next, since the particle method and the hybrid method are both random, we also carry out a quantitative comparison, described below. The density functions obtained by the two methods have a similar form, but the data volume produced by the particle method is 5000 (the number of particles) × 3 (the dimension), while the data volume produced by the hybrid method is 300 (the quantizer size) × 4 (the dimension plus one weight per quantizer point).
Intuitively, the hybrid method can be considered as adding a "feature extraction" step to the particle method. Comparing the third and fourth columns of Table 4, one can notice that this added step needs more computing time but reduces the size of the output data for the further computation of the test function ϕ(µ_T^simu) without increasing the standard deviation. However, the second column of the table shows that if we implement the particle method with a data size similar to the quantization level, the test value ϕ(µ_T^simu) exhibits a much larger standard deviation.
A Proofs of Lemma 2.3, Lemma 2.4 and Lemma 2.7

Step 2. We prove (2.9) in this step. The first step implies that …

For the proof of Lemma 2.7, we need the following two inequalities; we refer to [Pag18, Section 7.8], among other references, for more details.
Lemma A.1 (The Generalized Minkowski Inequality). For any (bi-measurable) process X = (X_t)_{t≥0}, for every p ∈ [1, ∞) and every T ∈ [0, +∞], … In particular, if (B_t)_{t≥0} is an (F_t)-standard Brownian motion and (H_t)_{t≥0} is an (F_t)-progressively measurable process having values in M_{d,q}(R) such that … (by Assumption I)
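For reference, the two inequalities invoked here are standard; in their usual forms (our transcription, following e.g. [Pag18, Section 7.8]) they read:

```latex
% Generalized Minkowski inequality: the L^p-norm of a time integral is
% bounded by the time integral of the L^p-norms.
\Big\| \int_0^T X_t \, dt \Big\|_p \;\le\; \int_0^T \| X_t \|_p \, dt .

% Burkholder-Davis-Gundy inequality, for an (F_t)-progressively measurable
% integrand H with values in M_{d,q}(R): there exists a constant C^{BDG}_p
% such that
\Big\| \sup_{t \in [0,T]} \Big| \int_0^t H_s \, dB_s \Big| \Big\|_p
\;\le\; C^{BDG}_p \, \Big\| \Big( \int_0^T \| H_s \|^2 \, ds \Big)^{1/2} \Big\|_p .
```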

Figure 1: A Voronoï partition.
Figure 2: The quadratic optimal quantization for …

Figure 7: Simulation by the hybrid particle-quantization scheme (1.22) with 5 Lloyd iterations at each Euler step.

Figures 8, 9 and 10 show the curve of the particle method errors with respect to N and the respective log-log error.

Figure 8: Error of the particle method (1.8) with respect to the particle size N.
Figure 9: Log-error of the particle method (1.8) with respect to log_2(N). The slope is approximately equal to −0.285.

Figure 10: The standard deviation of the error of the particle method (1.8) with respect to N.

Figures 11 and 12 show the curve of the error with respect to K and the respective log-log error.
We take a test function of the simulated distribution µ_T^simu at time T, rerun each method 200 times, and compare the mean and the standard deviation of ϕ(µ_T^simu). The particle method (1.8): …

Figure 13: The first and second coordinates of 5000 particles at time T = 1.5.

Figure 16: The Voronoï cells of the above quantizer. The colour of each Voronoï cell represents the weight of the cell (the darker, the heavier).

Figure 17: The density function simulated by the hybrid particle-quantization method (1.22). The vertical axis is the weight divided by the area of the corresponding Voronoï cell.

For a more precise comparison, we now set the time discretization number M = 150 and consider the following test function of the simulated distribution µ_T^simu at T = 1.5: ϕ(µ_T^simu) := ∫_{R^3} |ξ|² µ_T^simu(dξ) = E|X_T^simu|², and we rerun each method 200 times.
Theorem 1.2. Assume that Assumption I is in force for an index p ∈ [2, +∞). Set M ∈ N* and h = T/M. …

F_simu represents the cumulative distribution function simulated by the different numerical methods and F_true is the true cumulative distribution function (4.2). For two probability distributions µ, ν ∈ P_p(R^d) with respective cumulative distribution functions F and G, the Wasserstein distance W_p(µ, ν) can be computed by …

Table 1: Comparison of the three algorithms.

Table 2: Error of the particle method (1.8) with respect to the particle size N.

Table 3: Error of the recursive quantization scheme (1.16)-(1.18) with respect to the quantizer size K.

Table 4: Comparison of the simulation results ϕ(µ_T^simu) for the two methods (Table 4 shows the simulation results).

For every n = 1, ..., N, we have … where the first inequality above follows from the Minkowski inequality and the second from the fact that Z^n_{m+1} is independent of F_{t_m} and σ(t_m, X̄^{n,N}_{t_m}, μ̄^N_{t_m}) is F_{t_m}-measurable. By applying the inequality (2.8) from the first part, we obtain … (using that for every a, b ∈ R_+ and p ≥ 2, {E[a^p ∨ b^p]}^{1/p} ≤ {E[a^p] + E[b^p]}^{1/p} ≤ {E[a^p]}^{1/p} + {E[b^p]}^{1/p}) … ‖X̄^{n,N}_{t_m} + (t − t_m) b(t_m, X̄^{n,N}_{t_m}, μ̄^N_{t_m}) + σ(t_m, X̄^{n,N}_{t_m}, μ̄^N_{t_m})(B^n_t − B^n_{t_m})‖_p ≤ sup … < +∞ by applying (2.8), (A.1), (A.2) and (A.3).