A flexible split-step scheme for solving McKean-Vlasov Stochastic Differential Equations

We present an implicit split-step Euler-type method (dubbed SSM) for the simulation of McKean-Vlasov Stochastic Differential Equations (MV-SDEs) with drifts of superlinear growth in space, Lipschitz dependence on the measure, and a non-constant Lipschitz diffusion coefficient. The scheme is designed to leverage the structure induced by the interacting particle approximation system, including parallel implementation and the solvability of the implicit equation. The scheme attains the classical $1/2$ root mean-square error (rMSE) convergence rate in the stepsize and closes the gap left by [18, "Simulation of McKean-Vlasov SDEs with super-linear growth", IMA Journal of Numerical Analysis, 01 2021, draa099] regarding efficient implicit methods and their convergence rate for this class of McKean-Vlasov SDEs. A sufficient condition for mean-square contractivity of the scheme is presented. Several numerical examples are given, including a comparative analysis against other known algorithms for this class (Taming and Adaptive time-stepping) across parallel and non-parallel implementations.


Introduction
The aim of this paper is to present a numerical scheme for simulating McKean-Vlasov Stochastic Differential Equations (MV-SDEs) with drifts of superlinear growth in space and Lipschitz in the measure component, and Lipschitz diffusion coefficients. MV-SDEs differ from standard SDEs through the presence of the law of the solution process in the coefficients, and their dynamics are of the following type
$$dX_t = \big( v(t, X_t, \mu^X_t) + b(t, X_t, \mu^X_t) \big)\,dt + \sigma(t, X_t, \mu^X_t)\,dW_t, \qquad X_0 \in L^m_0(\mathbb{R}^d), \qquad (1.1)$$
where $\mu^X_t$ denotes the law of the solution process $X$ at time $t$, $W$ is a multidimensional Brownian motion, $v, b, \sigma$ are measurable maps and $X_0$ is a sufficiently integrable initial condition (in $L^m_0$ for $m \geq 2$). These equations were introduced by McKean in the sixties [40] and have been the target of much research since. There is a rich literature on well-posedness [6,42,44,45] and we point to the recent summary work [27] highlighting recent developments in regularity estimates, exponential ergodicity, long-time large deviations (also [1,19]), comparison theorems and the Vlasov-Fokker-Planck equations associated to MV-SDEs. One particular element of interest is the so-called propagation of chaos (PoC) introduced by Kac [30] and further studied in the MV-SDE literature [33,34,41,45]. The PoC phenomenon states that an MV-SDE is the limit of a certain weakly interacting particle system (of standard SDEs) as the system's size increases to infinity. Namely, (1.1) is the limit, as $N \to \infty$, of the $N$-dimensional system of $\mathbb{R}^d$-valued interacting particles $X^{i,N}$, with $\mu^{X,N}_t$ being the empirical measure given as $\mu^{X,N}_t(dx) := \frac{1}{N}\sum_{j=1}^N \delta_{X^{j,N}_t}(dx)$, where $\delta_{X^{j,N}_t}$ is the Dirac measure at the point $X^{j,N}_t$, the $W^i$ are independent Brownian motions and the initial conditions $X^i_0$ are independent and identically distributed across $i = 1, \cdots, N$.
In essence, the law µ X t in R d of (1.1) is approximated by the empirical average µ X,N t generated by the (R d ) N -system (a high-dimensional system). This methodology, appealing to the interacting particle system, includes a well-known quantified speed of convergence result (summarised in Proposition 2.2 below) providing a path for a numerical approximation: from X to {X i,N } i to {X i,N,π } i with X ·,N,π the numerical approximation of each particle in the SDE system.
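The approximation pipeline described above, from $\{X^{i,N}\}_i$ to its numerical approximation $\{X^{i,N,\pi}\}_i$, can be sketched in code. The following is a minimal illustration, not an example from this paper, of an explicit Euler discretisation of an interacting particle system in which the empirical measure enters the coefficients through the particle cloud itself; the linear mean-field coefficients are assumptions made for the sketch.

```python
import numpy as np

def euler_particle_system(drift, diffusion, x0, T, M, N, rng):
    """Explicit Euler for an interacting particle system of N scalar particles.

    drift(t, x, mu) and diffusion(t, x, mu) act componentwise on the particle
    vector x (shape (N,)); the empirical measure mu is represented here by the
    particle cloud itself.
    """
    h = T / M
    x = np.full(N, x0, dtype=float)
    for n in range(M):
        dW = rng.normal(0.0, np.sqrt(h), size=N)  # independent Brownian increments
        x = x + drift(n * h, x, x) * h + diffusion(n * h, x, x) * dW
    return x

# Illustrative (globally Lipschitz) coefficients: linear mean-field drift.
drift = lambda t, x, mu: -x + 0.5 * mu.mean()
diffusion = lambda t, x, mu: 0.2 * np.ones_like(x)
rng = np.random.default_rng(0)
xT = euler_particle_system(drift, diffusion, x0=1.0, T=1.0, M=100, N=500, rng=rng)
```

In this globally Lipschitz example the explicit scheme is well behaved; as discussed below, it is precisely in the superlinear setting that such explicit schemes break down, motivating the SSM.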
It is essential to highlight that although $\{X^{i,N}\}_i$ is a high-dimensional SDE and any existing numerical method a priori applies straightforwardly, all rates and results obtained in this straightforward way depend on the system's dimension $Nd$. The constants then explode as $N$ increases. Part of the difficulty of the method is to show that such constants, rates and results are independent of $N$, albeit depending on $d$. This step has its own non-trivial difficulties, which we discuss further below by drawing on prior work [18].
In this work, we focus on the class of MV-SDEs with drifts of superlinear growth in their spatial component [4,18,19,31,43], encapsulated in the function $v$, where $b, \sigma$ are uniformly Lipschitz (in space and measure), and $\sigma$ is non-constant; this structural assumption on $b, v$ is trivial from the theoretical perspective but plays an important role in the numerics. This class of MV-SDEs appears in several practical models in science: for example, in neuroscience, [3,12] introduce the mean-field FitzHugh-Nagumo model for neuronal networks in the brain; [11] discusses individual-based and swarming Cucker-Smale interaction models; and there are models of battery electrodes [21,25]. These equations do not have explicit or closed-form solutions and numerical approximations are needed. Moreover, standard explicit numerical methods suitable for the Lipschitz case fail to converge in the superlinear growth setting. This is exemplified by the 'particle corruption' phenomenon [18, Section 4.1] for MV-SDE particle-system numerics. This phenomenon is akin to the divergence of SDE schemes in the superlinear growth setting as highlighted in the seminal work [28].
The numerical approximation of McKean-Vlasov equations in the continuous case was initiated in [13] and has been investigated further in several recent works. We briefly mention a few on numerical schemes for MV-SDEs under the superlinear setting and in the Brownian framework. Tamed Euler schemes appeared first [18], shortly followed by tamed Milstein schemes [5,31] (appeared simultaneously) and Milstein schemes for delay MV-SDEs [4]. In [32] a tamed scheme is proposed (and a new wellposedness result) for MV-SDE featuring common noise in the particle system (which (1.2) does not) and where the diffusion is also allowed to grow superlinearly. Adaptive time-stepping methods come as an alternative to taming and in the context of MV-SDE they are proposed in [43] -these two methods will henceforth be referred to as the 'Taming' and the 'Adaptive' algorithm, respectively.
Outside the superlinear setting, [29] discusses the computational complexity of MV-SDE algorithms (uniformly Lipschitz drift and constant diffusion coefficient) and we point the reader there for an in-depth overview of the state of the art in that regard. Numerical approximations for MV-SDEs with non-Lipschitz conditions in measure and space exist [17] but impose a linear growth condition on the coefficients. The case of simulating MV-SDEs with discontinuous coefficients has been addressed in [35]. An alternative to the empirical measure approximation of (1.2) is to use a projection-type estimation of the marginal densities [7], where the error analysis requires differentiability of the coefficients. Variance reduction techniques have been analysed for this class of MV-SDEs, namely importance sampling [20], antithetic multilevel Monte Carlo sampling [4] and antithetic sampling [8]. There is also recent progress in the jump-diffusion setting [2,10].
Motivation. For the superlinear growth case described above, we are motivated by an open question left in [18] regarding implicit-type numerical methods for MV-SDEs. All methods described above are of explicit time-stepping nature, which are known to lose some of the geometric properties of the original system. For instance, taming destroys the strict dissipativity of the drift map, which then raises questions regarding the stability of the scheme's output across long time horizons (see [48]). Implicit methods for MV-SDEs are largely unexplored. The notable exceptions are [18,37], which are also the starting point for this work. In [37] the authors study MV-SDEs and associated particle systems with drifts of (symmetric) convolution-kernel type (as in [1]) and a constant diffusion coefficient. The theoretical (Bakry-Émery) machinery employed there yields a critical logarithmic Sobolev inequality estimate that is used to establish a concentration inequality for their implicit Euler scheme. Assumption-wise, their setting and the setting of this work do not cover each other but agree on a small class. Further, it is unclear how to extend their methodology to non-constant, possibly degenerate, diffusion coefficients.
More recently, an implicit method to deal with the superlinear growth was proposed in [18] (for general diffusion coefficients and without a concavity assumption), but convergence was shown under stronger restrictions than expected: the measure component of the drift must be Lipschitz in the Wasserstein-1 metric, not just in Wasserstein-2 (plus a uniformly bounded measure dependency). At the core of these difficulties was the use of stopping-time arguments, which revealed themselves difficult to handle with the measure dependency. The critical point is the proof of Lemma 5.12 (p. 41) in [18] and the calculations executed on pp. 48-49 there. Lastly, upon inspection of that proof, we emphasise that general $\theta$-methods for SDEs [26] will face the same difficulties.

Our contributions. In this work, we revisit the framework of [18] and propose a split-step numerical scheme inspired by the earlier work [26]. Our contributions can be summarised as follows.

(I) Main results. We propose a split-step method (SSM), see (2.3)-(2.4) below, for this class of MV-SDEs.
We prove its convergence and recover the $1/2$ convergence rate in root mean-square error (rMSE) under the same general assumptions as Taming [18] or Adaptive [43]. No differentiability or non-degeneracy assumptions are imposed and stopping-time arguments are fully avoided.
We provide a stability analysis of mean-square contractivity for the SSM (with a non-constant diffusion coefficient). To the best of our knowledge this has not yet been discussed for MV-SDE schemes in general. The stability of the SSM provides a theoretical foundation for carrying out simulations with larger timesteps, and we point to positive results by way of numerical simulation with the Cucker-Smale flocking model [22], where the SSM outperforms both the Taming [18] and Adaptive [43] algorithms.
With regard to known findings, the SSM presented here overcomes the limitations of the implicit method proposed in [18, Section 3.2] and extends its scope of application. In the context of standard SDE simulation (where the coefficients are independent of the measure), our results lift the differentiability restriction of [26, Assumption 3.1], allow for time dependence and can take advantage of the (possible) concavity of the map $v$ in (1.1); this is a mild improvement of known SDE results.
(II) Computational gains. The scheme is designed so that the superlinear term can be split from the main equation in a way that optimises/minimises the computational cost of the inversion method (arising from the implicit component). This flexibility is understood in the following way: given an MV-SDE, it is left to the user to choose which terms form $v$ and which terms form $b$ in (1.1) (within restrictions). Moreover, since there is a split between $v$ and $b$, the user may decide to add and subtract convenient terms to the drift (see Section 3.4.2 below; also [14, Eqs. (37) and (38)]); this trick allows one to transform a non-dissipative term $x \mapsto v(\cdot, x, \cdot)$ into a dissipative one at the expense of an increase in the Lipschitz constant of $b$.
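The add-and-subtract trick just described can be checked numerically. Below is a small sketch, with an illustrative drift $v(x) = x - x^3$ that is an assumption of the sketch rather than an example from the paper, estimating one-sided Lipschitz constants on a grid: shifting $\gamma x$ from $v$ to $b$ drives the estimate for the implicit part below zero while the Lipschitz constant of the explicit part grows by $\gamma$.

```python
import numpy as np

def one_sided_lipschitz_estimate(f, xs):
    """Grid estimate of sup_{x != y} <f(x) - f(y), x - y> / |x - y|^2 (scalar case)."""
    x, y = np.meshgrid(xs, xs)
    d = x - y
    keep = np.abs(d) > 1e-12                   # drop the diagonal x == y
    return np.max(((f(x) - f(y)) * d)[keep] / (d ** 2)[keep])

gamma = 2.0
v = lambda x: x - x ** 3                       # one-sided Lipschitz constant L_v = 1
v_shift = lambda x: v(x) - gamma * x           # implicit part: L_v - gamma = -1 (dissipative)
b_shift = lambda x: gamma * x                  # absorbed by b: its Lipschitz constant grows by gamma

xs = np.linspace(-3.0, 3.0, 401)
Lv = one_sided_lipschitz_estimate(v, xs)       # close to 1
Lv_shift = one_sided_lipschitz_estimate(v_shift, xs)  # close to 1 - gamma = -1
```

The shift is exact at the level of the one-sided Lipschitz constant (the quotient simply decreases by $\gamma$), which is the trade-off the paragraph above describes.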
The proposed SSM allows one to decouple the measure component, making the scheme amenable to a parallel implementation (e.g., [39]). Namely, one parallelizes the task of solving $N$ times an $\mathbb{R}^d$-system, as opposed to the non-parallelizable task of solving once the $\mathbb{R}^{dN}$-system. The computational gains of the parallel implementation of the SSM are shown to be on par with the Taming [18] and Adaptive [43] algorithms.
(III) Comparative study against known literature. We provide a comparative analysis against the Taming [18] and Adaptive [43] methods, across parallel and non-parallel implementations. The numerical study covers four examples of interest, highlighting a different flavour of these 3 algorithms including stability experiments.
We close this introduction with two comments. The general setting of [4,18,19,31,43] is based on drift maps $(t, x, \mu) \mapsto b(t, x, \mu)$, and it is this map that satisfies a one-sided Lipschitz condition in space and a uniform Lipschitz condition in measure. In (1.1), we specify the drift to be the sum $v + b$, where the polynomial growth is fully captured by $v$ while $b$ remains uniformly Lipschitz in its variables. Separating the drift into $v$ and $b$ is a natural, non-limiting assumption. Settings outside the scope of this work are: non-Lipschitz measure dependency [1,37], the superlinear diffusion case of [32] (also [43, Section 3.1]), common noise [32] and jumps [10]. Weak error analysis is left unaddressed.
Organisation of the paper. In Section 2, notation and the necessary concepts for this work are given. In Section 2.3 we state the SSM, the main convergence theorem and the stability results for the scheme. Section 3 provides several numerical examples for comparison with other methods. Examples covering the stability analysis in the superlinear case, notably the Cucker-Smale model, are also given in Section 3. All proofs are postponed to Section 4. For convenience, Appendix A contains a short description of the Taming and Adaptive algorithms.
Acknowledgements. The authors would like to thank the 3 referees for their thorough work and suggestions that led to non-trivial improvements.

Notation and Spaces
Let $\mathbb{N} = \{1, 2, \cdots\}$ be the set of natural numbers starting at 1 and let $\mathbb{R}$ denote the real numbers, with $\mathbb{R}^+ = [0, \infty)$. Also, we denote $\llbracket a, b \rrbracket := [a, b] \cap \mathbb{N} = \{a, \cdots, b\}$ for any $a, b \in \mathbb{N}$ with $a \leq b$. For $x, y \in \mathbb{R}^d$ we denote the scalar product of vectors by $\langle x, y \rangle$, and the Euclidean norm of $x$ is $|x| = (\sum_{j=1}^d x_j^2)^{1/2}$. The indicator function of a set $\Omega$ is denoted by $\mathbb{1}_\Omega$.
The space of probability measures on $\mathbb{R}^d$ is denoted by $\mathcal{P}(\mathbb{R}^d)$ and its subset of measures with finite second moment is denoted by $\mathcal{P}_2(\mathbb{R}^d)$. We recall the definition of the Wasserstein-2 distance metrizing $\mathcal{P}_2(\mathbb{R}^d)$,
$$W_2(\mu, \nu) = \Big( \inf_{\pi \in \Pi(\mu, \nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|^2 \, \pi(dx, dy) \Big)^{1/2},$$
where $\Pi(\mu, \nu)$ denotes the set of all probability-measure couplings between $\mu$ and $\nu$ on $\mathbb{R}^d \times \mathbb{R}^d$, i.e. $\pi \in \Pi(\mu, \nu)$ if and only if $\pi(\cdot \times \mathbb{R}^d) = \mu$ and $\pi(\mathbb{R}^d \times \cdot) = \nu$. Let $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ be a filtered probability space with $\mathcal{F}_t$ the augmented filtration generated by a standard $l$-dimensional Brownian motion $W = (W^1, \cdots, W^l)$, together with an additional sufficiently rich sub-$\sigma$-algebra $\mathcal{F}_0$ independent of $W$. We use $\mathbb{E}[\cdot] = \mathbb{E}^{\mathbb{P}}[\cdot]$ to denote the expectation operator under the measure $\mathbb{P}$. As usual, "i.i.d." means "independent and identically distributed" and "w.r.t." means "with respect to".
Fix $T \in (0, \infty)$. We follow the notation from [18]. Let $p \geq 1$. Define $L^p_t(\mathbb{R}^d)$ as the space of $\mathcal{F}_t$-measurable $\mathbb{R}^d$-valued random variables $X$ that satisfy $\mathbb{E}[|X|^p]^{1/p} < \infty$. We use $\mathcal{S}^p$ to denote the space of $\mathbb{R}^d$-valued continuous $\mathbb{F}$-adapted processes $X$ satisfying $\mathbb{E}[\sup_{t \in [0,T]} |X_t|^p]^{1/p} < \infty$. The cross-variation between two processes $X$ and $Y$ is denoted by $\langle X, Y \rangle$.
Throughout the text C ∈ R + is a constant that may change from line to line, may depend on the problem's data but is always independent of the constants h, M, N (associated with the numerical scheme and specified below).

Framework
We study the MV-SDE (1.1) for $m \geq 1$ and we denote the law of $X$ at time $t$ by $\mu^X_t$. Take $b, v, \sigma$ as measurable maps. Throughout the text, we make the following assumption.
The structural choice of having a drift $v + b$, with only $v$ containing the superlinear growth component, as opposed to a single drift map in the style of [18,43], is non-restrictive; its use is discussed in Remark 2.7.
Immediate well-known properties can be derived from this assumption.
The above assumptions cover a large range of models, as highlighted in Section 3 below. This setting subsumes the standard globally Lipschitz assumptions. For further examples we point to [11,19,24,38].

The interacting particle system approximation. In order to approximate the MV-SDE (1.1), we build an interacting particle system as follows: 1. For $i \in \llbracket 1, N \rrbracket$, take $X^{i,N}_0 = X^i_0$ as i.i.d. random initial conditions for each particle.
2. Each particle is driven by its own independent Brownian motion $W^i$ (all are i.i.d.). 3. The dynamics of the particle system are given by (1.2), and we set $\mu^{X,N}_t(dx) := \frac{1}{N} \sum_{j=1}^N \delta_{X^{j,N}_t}(dx)$.

Propagation of chaos (PoC).
Below we show a pathwise PoC result to control the difference between the original MV-SDE and the interacting particle system. For that, we introduce the auxiliary system (2.1) of non-interacting particles $X^i$, each an independent copy of (1.1) driven by $W^i$ with initial condition $X^i_0$ (so that the law of each $X^i$ at time $t$ is $\mu^X_t$, the law of the solution $X$ to (1.1)); under Assumption 2.1, this system is well-posed (see [15,33,41,45]). By showing the following propagation of chaos result, we can connect in a quantifiable manner the MV-SDE and the interacting particle system.

Proposition 2.4 (Propagation of chaos). Let Assumption 2.1 hold and suppose we have for some
Moreover, let $X^i \in \mathcal{S}^m$ satisfy (2.1) and assume $m > 4$. Then, we obtain a quantitative convergence rate between the non-interacting system (2.1) and the interacting particle system (1.2). Under this result, one can approximate MV-SDEs through the particle scheme. Thus, by showing the convergence of a numerical method to the interacting particle scheme, we obtain the convergence rate between the numerical method and the MV-SDE.
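The explicit rate asserted in Proposition 2.4 is not displayed here; for orientation, quantitative PoC bounds of Fournier-Guillin type (cf. [16] and Remark 2.5 below) are standard in this literature and, for $m > 4$ and a constant $C$ independent of $N$, typically take the following form:

```latex
\mathbb{E}\Big[\sup_{t\in[0,T]} W_2\big(\mu^{X,N}_t,\mu^{X}_t\big)^2\Big]
\;\le\; C\,
\begin{cases}
N^{-1/2}, & d<4,\\
N^{-1/2}\log N, & d=4,\\
N^{-2/d}, & d>4.
\end{cases}
```

This dimension-dependent rate is the benchmark against which the scheme's $h^{1/2}$ discretisation error is later combined.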

Remark 2.5 (Optimising the PoC convergence rate).
We highlight a result from [16] and later reviewed in [43] in the context of numerical methods for MV-SDEs. The PoC rate can be improved for the case d = 4, namely the log(N ) term can be omitted (under restrictions). This holds under the additional constraint of a constant diffusion coefficient σ, and a bounded drift with bounded derivatives. It does not cover the superlinear growth case here, nonetheless, it is reasonable to expect that the result can be lifted to match the drift condition in this work and a diffusion coefficient that is bounded (and sufficiently smooth).

The split-step method (SSM) for MV-SDEs: convergence and stability
The numerical scheme proposed in this work, and dubbed Split-Step Method (SSM), improves strongly on the implicit numerical scheme proposed in [18]. It is an enhanced variant of the split-step backward Euler scheme for standard SDEs [26, Eq. (3.8)-(3.9)] and here further optimised for the MV-SDE setting.
Define the uniform partition of $[0, T]$ as $\pi := \{t_n := nh : n \in \llbracket 0, M \rrbracket\}$ with $h := T/M$ for a prescribed $M \in \mathbb{N}$, and define recursively the split-step method (2.3)-(2.4) to approximate (1.2). We state immediately the main convergence result for the continuous-time extension of the scheme: under our assumptions, a continuous-time extension of the scheme exists and, letting $(X^i)_i$ be the solution of the non-interacting particle system (2.1), the scheme converges to it with the rate stated in Theorem 2.6. We remind the reader about Remark 2.5 concerning the PoC convergence rate.
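A minimal sketch of one SSM step is given below; it is consistent with the scheme's description (an implicit step for $v$ with the empirical measure frozen at the previous iterate, which is what decouples the particles, followed by an explicit Euler-Maruyama step for $b$ and $\sigma$ using the empirical measure of the intermediate values), but the precise form of (2.3)-(2.4) and all coefficients used here are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def ssm_step(x, t, h, dW, v, b, sigma, newton_iters=25, fd_eps=1e-7):
    """One split-step for N scalar particles (vectorised over the cloud x).

    Implicit part: solve y = x + h * v(t, y, mu) per particle, with the
    empirical measure mu frozen at the previous iterate x (decoupling the
    particles). Explicit part: Euler-Maruyama for b and sigma, with the
    empirical measure re-evaluated at the intermediate values y.
    """
    mu_x = x
    y = x.copy()
    for _ in range(newton_iters):  # componentwise Newton, finite-difference derivative
        g = y - x - h * v(t, y, mu_x)
        dg = 1.0 - h * (v(t, y + fd_eps, mu_x) - v(t, y, mu_x)) / fd_eps
        y = y - g / dg
    mu_y = y
    return y + b(t, y, mu_y) * h + sigma(t, y, mu_y) * dW

def ssm(v, b, sigma, x0, T, M, N, rng):
    h = T / M
    x = np.full(N, x0, dtype=float)
    for n in range(M):
        dW = rng.normal(0.0, np.sqrt(h), size=N)
        x = ssm_step(x, n * h, h, dW, v, b, sigma)
    return x

# Illustrative coefficients: cubic dissipative v, Lipschitz mean-field b.
v = lambda t, y, mu: y - y ** 3
b = lambda t, y, mu: 0.5 * (mu.mean() - y)
sigma = lambda t, y, mu: 0.3 * np.ones_like(y)
rng = np.random.default_rng(42)
xT = ssm(v, b, sigma, x0=2.0, T=2.0, M=40, N=500, rng=rng)  # h = 0.05
```

Because the implicit equation is solved per particle with the measure frozen, the Newton solves are embarrassingly parallel; only the empirical measure of the intermediate values requires communication between the two half-steps.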
Given $v(t, x, \mu)$, one can always add and subtract a linear term $\gamma x$, assigning $v(t, x, \mu) - \gamma x$ to the implicit part and $b(t, x, \mu) + \gamma x$ to the explicit part. This means that the one-sided Lipschitz constant $L_v$ becomes $L_v - \gamma$, and hence negative if $\gamma$ is sufficiently large. We show below that this operation is not free of cost. Concretely, there is an implication in terms of the scheme's stability since, for $L_v$ to become negative, the Lipschitz constant $L_b$ increases proportionally. In Section 3.4.2 we discuss this in view of a numerical example and via the mean-square stability result we provide in Theorem 2.9 for the SSM (2.3)-(2.4).

Lastly, to the best of our knowledge, the SSM scheme (2.3)-(2.4) is not of the usual form in which SSM schemes are presented in the literature. Usually there is no structural separation of $v + b$ and one sets $b = 0$. Consequently, there is no drift component in (2.4) (only a diffusion part), and thus a discussion of the benefits/drawbacks of adding/subtracting a $\gamma x$-term seems generally absent.
Remark 4.3 provides more details on the choice of h. The constraint of L v < −1/2 is not sharp. In fact, it can be replaced by L v < −ε for some ε ∈ (0, 1) at the expense of another constant growing proportionally to 1/ε. We choose for simplicity ε = 1/2, see Remark 4.3 and the definition of L v in Remark 2.2 for further details. This issue is relevant in case one sets b = 0 as is usual in the SSM literature (and the trick of Remark 2.7 cannot be applied).
Theorem 2.6 shows that the strong convergence rate of the SSM is $1/2$ (rMSE), which is the same as for Taming [18] and Adaptive [43]. Also, the complexity of the particle system is of order $N^2$ in the worst case; however, in several examples of Section 3 the complexity of the calculation of the interaction term is of order $N$.
After the convergence study of Theorem 2.6 we introduce the notion of mean-square contractivity and study the stability of the SSM (2.3)-(2.4).
The next result shows when the SSM is mean-square contractive.

Theorem 2.9. Let the assumptions of Theorem 2.6 hold, and assume a further form of the Lipschitz condition on the coefficients. Under the choice of $h$ stated in Theorem 2.6, the quantity $1 + \beta h$ is always positive; for sufficiently small $h$ we have $\beta < 0$, and thus the SSM is mean-square contractive.
The proof of this result is postponed to Section 4.2. We illustrate immediately the scope of our findings with several numerical results. The reasoning behind the specification of the Lipschitz constants of $b$ and $\sigma$ will become apparent in the examples of Section 3.4. The main motivation is to account for the contributions of the spatial and measure components separately, as a way to explore the flexibility allowed by the scheme in choosing the drift coefficients $v$ and $b$.
The equivalent of the mean-square contractivity notion (Definition 2.8) for the initial MV-SDE (1.1) is the so-called exponential mean-square stability inequality defined next. We will make use of this definition in Section 3.4 below.

Definition 2.10 (Exponential mean-square contractive solutions). Let $X, Y$ be two solutions to (1.1). If $\mathbb{E}[|X_t - Y_t|^2] \leq \mathbb{E}[|X_0 - Y_0|^2] e^{\eta t}$ for some real number $\eta < 0$, then the MV-SDE (1.1) is said to generate exponential mean-square contractive solutions.

Examples of interest
We now illustrate our numerical scheme (2.3)-(2.4) through several examples of interest. Moreover, alongside the SSM simulations we also provide a comparative analysis with two other numerical schemes: Taming [18] and Adaptive timestepping [43]. For convenience, Appendix A contains a brief description of the algorithms, including convergence results and conditions.
Since the exact solution to each example is unknown, we use a proxy for the true solution in order to compute approximation errors. Concretely, we compare SSM/Taming/Adaptive results with SSM/Taming/Adaptive results at a lower value of the timestep $h$ as a proxy (each method is compared with its own approximation of the true solution, but computed at a much smaller timestep). Within each example, all the methods use the same Brownian motion paths.
We consider the weak error ($k = 1$) and the strong error ($k = 2$) between the true solution $X_T$ and the approximation $\hat{X}_T$. Our main theorem covers only the strong convergence result; nonetheless, we also present the weak convergence rate estimation.
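The proxy-based error computation with shared Brownian paths can be sketched as follows; the scalar SDE, its coefficients, and the step-ratio are assumptions made for the sketch, and coarse Brownian increments are obtained by summing the fine ones so both resolutions see the same path.

```python
import numpy as np

def euler(drift, x0, T, dW):
    """Euler-Maruyama along prescribed (already sigma-scaled) noise increments dW."""
    M = dW.shape[1]
    h = T / M
    x = np.full(dW.shape[0], x0, dtype=float)
    for n in range(M):
        x = x + drift(x) * h + dW[:, n]
    return x

rng = np.random.default_rng(1)
T, M_fine, ratio, paths = 1.0, 1024, 16, 4000
drift = lambda x: -x                       # illustrative dissipative drift
sigma = 0.3                                # constant (additive) noise
dW_fine = sigma * rng.normal(0.0, np.sqrt(T / M_fine), size=(paths, M_fine))
# same Brownian paths at both resolutions: coarse increments sum the fine ones
dW_coarse = dW_fine.reshape(paths, M_fine // ratio, ratio).sum(axis=2)

x_proxy = euler(drift, 1.0, T, dW_fine)    # fine-step proxy for the true solution
x_hat = euler(drift, 1.0, T, dW_coarse)    # coarse approximation under test
weak_err = abs(np.mean(x_proxy) - np.mean(x_hat))      # k = 1
strong_err = np.sqrt(np.mean((x_proxy - x_hat) ** 2))  # k = 2 (rMSE)
```

By Jensen's inequality the rMSE always dominates the weak error computed this way, which is a useful sanity check on any implementation.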
We study several examples. The stochastic Ginzburg Landau equation is a well-studied example and provides a point of comparison with other methods. The second example is a multi-dimensional FitzHugh-Nagumo model of McKean-Vlasov type from neuroscience, which shows that the split-step method can properly deal with the superlinear term in a complex system. For the first example we discuss the parallel implementation; for the second example we discuss the accuracy w.r.t. runtime.
The third example lies outside the scope of our assumptions by featuring a non-Lipschitz measure dependency, but all the methods still work. Proving the convergence of the method under this setting is left for future research.
In the last part, we discuss the stability of the SSM as understood by Theorem 2.9. We first look at a linear case to compare the conditions for mean-square contractivity between MV-SDEs and the numerical scheme. Then, the non-linear Ginzburg Landau type equation is used to illustrate the mean-square contractivity of the split-step method. At last, the Cucker-Smale flocking model (a degenerate MV-SDE) shows that the split-step method has better properties compared to the other methods under larger choices of the timestep $h$.

Parallel implementation. To implement the SSM in parallel, one first executes the implicit step (2.3) of the SSM distributed across the cores, then calculates the empirical measure of the particle system in the first core, and finally the second step (2.4) of the SSM is executed, also in parallel. To implement Taming (A.1) and Adaptive (A.2) in parallel, at each timestep $h$ one needs to first communicate the empirical measure of the particle system between the different cores, and then each core can calculate its particles' dynamics independently.
A priori, one should expect Taming to be the fastest of the three algorithms. We highlight that, due to the nature of the empirical measure, after each timestep the processors need to communicate to update the calculation of the empirical measure; this leads to a well-known loss of parallelization power [9].
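The gather-reduce-broadcast communication pattern described above can be sketched as follows. This is an illustration of the pattern only, using threads and made-up Lipschitz coefficients; a real deployment would use processes or MPI, and the per-worker step would be one of the schemes discussed in the text.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def advance_chunk(chunk, mean, h, rng):
    """One explicit Euler step for a worker's block of particles; the empirical
    mean of the WHOLE cloud is received from the previous synchronisation point."""
    dW = rng.normal(0.0, np.sqrt(h), size=chunk.shape)
    return chunk + (-chunk + 0.5 * mean) * h + 0.2 * dW  # illustrative coefficients

def parallel_sim(N=400, M=50, h=0.02, cores=4):
    chunks = np.array_split(np.ones(N), cores)           # one block of particles per core
    rngs = [np.random.default_rng(s) for s in range(cores)]
    with ThreadPoolExecutor(max_workers=cores) as pool:
        for _ in range(M):
            # communication: gather the cloud and broadcast its empirical mean
            mean = np.concatenate(chunks).mean()
            chunks = list(pool.map(advance_chunk, chunks,
                                   [mean] * cores, [h] * cores, rngs))
    return np.concatenate(chunks)

xT = parallel_sim()
```

The per-step synchronisation on the empirical measure is exactly the communication cost that limits the parallel speed-up noted in the text.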

Example: the stochastic Ginzburg Landau equation
The first example we consider is the mean-field perturbation of the classic one-dimensional stochastic Ginzburg Landau equation, namely, for all $t \in [0, T]$, where $\sigma$, $c$ are constants. This equation is a toy one-dimensional MV-SDE that we use for a methodological comparison analysis.

In both the parallel and non-parallel implementations, Taming is the fastest, while the SSM takes slightly longer than the other methods, with a performance comparable to Adaptive. In a parallel implementation with 4 cores, we reach a reduction to nearly 27% of the non-parallel implementation's computational time. In this example, to reach the same strong error level, Taming takes nearly 7-times more time than the SSM; Adaptive is similar to the SSM.
All the parameters are the same as in [18, Section 4.3] ($V_0 = 0$, $w_0 = 1/2$, . . . ); see also [43, Section 4.4]. The split structure of the SSM enables us to deal flexibly with the superlinear term and the Lipschitz terms separately: we make judicious choices regarding $v$ and $b$ and thus can optimise the solver for the implicit part. Otherwise, we would have been forced to use a general-purpose solver, which is costlier. For the simulation, we take $N = 1000$, $T = 2$, the timestep is taken from $h \in \{10^{-4}, 2 \times 10^{-4}, 5 \times 10^{-4}, 10^{-3}, \ldots, 10^{-1}\}$ and the proxy for the true solution is calculated with $h = 10^{-5}$. Taming is implemented with $\alpha = 1/2$ and Adaptive with $h_\delta(x) = h \min(1, |x|^2 / |b(t, x, \mu)|^2)$. Fig 3.2(a) shows the weak error rate of Taming to be roughly $1/2$, with the other methods being $1.0$ (implementing Taming with $\alpha = 1$ yields a weak convergence rate of order $1.0$; we do not present that result). Fig 3.2(b) shows the strong error rate of all the methods to be roughly $1/2$; the error of the SSM is an order of magnitude lower than the others. Fig 3.2(c) shows that, to reach the same strong error level, Taming takes nearly 80-times more time than the SSM; Adaptive takes nearly 10-times more time than the SSM.

Example: Polynomial drift (non-Lipschitz measure dependency but still of onesided Lipschitz type)
We present an example from [8, Section 3.3] that falls outside the theoretical framework of this work. Take the one-dimensional MV-SDE, for $t \in [0, T]$ and $\gamma \in \mathbb{R}$, with the dynamics of the interacting particle system (1.2) given, for $i \in \llbracket 1, N \rrbracket$, in terms of independent Brownian motions $(W^i)_i$. Take $\gamma = -1$, $T = 1$. We have a superlinear measure component in (3.4) and none of the schemes is applicable (insofar as existing theoretical results allow). If the measure component were fixed (with finite 2nd moment), then the drift would satisfy a one-sided Lipschitz condition; the convergence of this type of scheme in this setting is left for future work.

Linear case: an Ornstein-Uhlenbeck McKean-Vlasov SDE
For the MV-SDE (see e.g. [8, Section 2.1]), for all $t \in [0, T]$ and $x_0 \in \mathbb{R}$, where $\rho, \lambda, \eta$ are constants, the first and second moments of $X$ can be computed in closed form. Let $X, Z$ be two solutions of (3.5) with $X_0$ and $Z_0$ as respective initial conditions; then, by direct calculation, for $\rho \leq 0$ and $\rho + \lambda < 0$, it follows from Definition 2.10 that (3.5) generates exponential mean-square contractive solutions. The parameters of this example are $L_v = \rho$, $L_{\tilde b} = \lambda^2$, $L_{\tilde v} = L_b = L_\sigma = L_{\tilde\sigma} = 0$. Plugging these into (2.7), in order to make $\beta < 0$ we need to choose $h$ satisfying the resulting constraint; from Definition 2.8, the split-step method (2.3)-(2.4) is then mean-square contractive. So, for the split-step method, such an $h$ exists when $\rho + \lambda < -1/2$. Both this condition and the condition for the SDE to generate exponential mean-square contractive solutions require the constraint $\rho + \lambda < 0$. Thus, the condition for a mean-square stable numerical solution is slightly stronger than the condition for the SDE to generate exponential mean-square contractive solutions.
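The contractive regime can be checked numerically. The sketch below assumes an illustrative form of the linear MV-SDE, drift $\rho x + \lambda \mathbb{E}[X]$ with constant diffusion $\eta$ (the precise form of (3.5) is not reproduced here), and runs a split-step iteration on two particle clouds driven by the same noise; since $v(x) = \rho x$ is linear, the implicit step has a closed form.

```python
import numpy as np

# Assumed (illustrative) linear MV-SDE: dX = (rho*X + lam*E[X]) dt + eta dW.
rho, lam, eta = -2.0, 0.5, 0.3       # rho + lam < 0: contractive regime
T, M, N = 5.0, 500, 2000
h = T / M
rng = np.random.default_rng(2)

x = np.full(N, 1.0)                  # two particle clouds driven by the SAME noise,
z = np.full(N, 6.0)                  # differing only in their initial condition
msd0 = np.mean((x - z) ** 2)
for n in range(M):
    dW = rng.normal(0.0, np.sqrt(h), size=N)
    # implicit step for v(x) = rho*x has the closed form y = x / (1 - h*rho)
    yx, yz = x / (1 - h * rho), z / (1 - h * rho)
    # explicit step with b(x, mu) = lam * mean(mu) and constant diffusion eta
    x = yx + lam * yx.mean() * h + eta * dW
    z = yz + lam * yz.mean() * h + eta * dW

msd = np.mean((x - z) ** 2)          # mean-square distance at time T
```

With the common-noise coupling, the diffusion cancels in the difference and the mean-square distance decays essentially at the exponential rate driven by $\rho + \lambda$, in line with Definition 2.10.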

Nonlinear case I: a stochastic Ginzburg Landau type equation
We illustrate the stability of the SSM scheme via a stochastic Ginzburg Landau type equation (in the style of that in Section 3.1): we consider a one-dimensional MV-SDE for all $t \in [0, T]$. The parameters of this example are $L_v = -5/2 - \gamma$, $L_{\tilde b} = L_\sigma = 1$, and $L_b = \gamma^2$, $L_{\tilde v} = L_{\tilde\sigma} = 0$; with these parameters it is known [47] that the system is conservative and the solution satisfies $X_t \to 0$ a.s. as $t \to \infty$. Plugging these into the mean-square stability constant $\beta$ in (2.7), when $\gamma = 0$ one must take $h$ small in order to make $\beta < 0$. We now employ the split-step method under different choices of $h$ and initial values. Set the number of particles to $N = 1000$. Set $\gamma = 0$, $T = 3$ and $X_0 = 1$. Figure 3.4(a) shows that $\mathbb{E}[|X^{i,N}_T|^2]$ decreases to zero under different (but small) values of $h$. Figure 3.4(b) shows that the mean-square contraction between $X$ and $Z$ exhibits an exponential decay (where $Z$ solves (3.7) for $Z_0 = 5, 10, 100$). Fix $h = 0.1$. Figure 3.4(c) shows that when $\gamma = -12$ the scheme performs poorly, and this follows from the conditions of Theorem 2.9 not being satisfied. For $\gamma \in \{0, 12\}$, the scheme shows contraction as $t \to \infty$, but it is much slower for $\gamma = 12$. Figure 3.4(d, lower graph) shows what happens when one shifts "slope from $v$ to $b$" via the linear term $\gamma x$ (see Remark 2.7). We now have $L_v = -\gamma - 5/2$; thus, when $\gamma \in (-5, 5)$ the figure shows contraction, with $X^{i,N}_T \approx 0$ as expected. There is a significant change for $\gamma < -10$, where the approximation does not converge to the correct value. For $\gamma \geq 5$ and higher (recall that $h = 0.1$ is fixed) the contraction seems to happen (although at a slower pace), but in Figure 3.4(d, upper graph) one sees that $\gamma \mapsto 1 + \beta(\gamma) h$ is now above $1.0$, which does not guarantee contraction (in the sense of Theorem 2.9).

Nonlinear case II: the two-dimensional Cucker-Smale flocking model
This example (see [22, Section 2]) highlights the stability of the split-step method. The SSM is stable under larger timesteps $h$ thanks to the implicit step for the superlinear part. The explicit methods (Taming and Adaptive) fail to deliver acceptable results at this level of $h$.

Applying our setting, this is a two-dimensional MV-SDE defined as follows,
where $\lambda$, $\sigma$ are constants. The dynamics of the particle system follow easily, where $i \in \llbracket 1, N \rrbracket$, $V^{i,N}, X^{i,N} \in \mathbb{R}$, and $(W^i)_i$ are independent Brownian motions. Figure 3.5(a,b) shows the distribution of $V$ at times $T = 1, 2$. All four methods have the same initial distribution (and same filtration); nonetheless, there is a slight skew between the final results. Taming with $\alpha = 0.5$ has a different distribution than the other three methods at $T = 1$; later, the SSM clusters at a different point than the Adaptive and Taming ($\alpha = 1$) methods, but the deviation is very small ($< 10^{-2}$). Considering the strong error graph (c), the two Taming methods fail to deliver acceptable results at the larger timesteps, while the SSM and Adaptive are in a similar position. For Adaptive, the error rate is nicely behaved, but there is a jump at $h = 0.02$. The split-step method's error decreases stably as $h$ decreases.

Discussion
We discuss some comparative advantages of the methods, starting with general comments. All schemes have the same convergence error rate, rMSE $\approx C h^{1/2}$. Taming is by far the easiest to implement, while Adaptive is the most complex, requiring tuning of the $h_\delta$ map for each case (see Appendix A). The SSM requires an implicit solver and ad-hoc choices of $v$ and $b$ for efficiency. Taming is the fastest algorithm, with the SSM and Adaptive running times being comparable with each other. In the way we presented the SSM, all methods are amenable to an efficient parallel implementation (under the caveat of processor communication [9]); moreover, in view of Remark 2.7, the SSM does not have any (real) restriction on the time-stepping even though one solves an implicit method.
From the numerical examples, we see that: 1. The strong error of the SSM is consistently one order of magnitude smaller than that of Taming. For the same choice of timestep, Taming is the fastest, with the SSM comparable to Adaptive. However, to reach the same strong error level, the SSM takes less computational time than Adaptive and significantly less than Taming.
2. Compared to Adaptive, the SSM converges in general no worse, and no clear domination of one over the other emerged. Implementation-wise (at the level of computing the rMSE), keeping the same filtration across different timestep choices is much harder for Adaptive with its step function h δ (x) (requiring sub-simulation from Brownian bridges) than for the SSM with a fixed timestep. As a rule of thumb, Adaptive takes on average twice as many timesteps as the SSM or Taming [43].
3. From the numerical examples and at the level of the strong error, the SSM performs better than Taming and Adaptive at larger timesteps h (via comparatively lower errors).
We have not investigated the effect of dimensionality; we suspect that the running-time gap between Taming and the SSM or Adaptive will widen. The SSM we present has the extra advantage of flexibility in how v and b are chosen, meaning that a layer of optimisation can be added to the implicit solver. Lastly, and partially addressed here with the stability analysis: do the schemes preserve the finer properties of the underlying dynamics? Are they geometrically ergodic? Do they preserve oscillatory dynamics, such as amplitudes, frequencies and phases of oscillations, even for large timesteps? It is known that explicit Euler-type schemes face difficulties in this regard, with implicit or splitting methods being more stable [14].
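To make the split-step structure concrete, the following is a minimal one-step sketch of an SSM-type scheme for the particle system. The fixed-point inner solver and the freezing of the empirical measure at the current particle cloud are simplifying assumptions of this sketch, not necessarily the exact scheme (2.3)-(2.4); note that the computation vectorises over particles, which is what makes the parallel implementation discussed above natural.

```python
import numpy as np

def ssm_step(X, h, t, v, b, sigma, rng, tol=1e-10, max_iter=200):
    """One step of a split-step (SSM-type) scheme: illustrative sketch.

    Inner implicit step: solve Y = X + h*v(t, Y, mu) particle-wise by
    fixed-point iteration (a valid solver when h*L_v < 1); here the
    empirical measure mu is frozen at the current cloud X, a simplifying
    choice for this sketch.  Outer step: explicit Euler with b, sigma
    evaluated at Y and the cloud of Y's.
    """
    Y = X.copy()
    for _ in range(max_iter):
        Y_new = X + h * v(t, Y, X)   # third argument: particle cloud as measure
        if np.max(np.abs(Y_new - Y)) < tol:
            Y = Y_new
            break
        Y = Y_new
    dW = rng.normal(scale=np.sqrt(h), size=X.shape)
    return Y + h * b(t, Y, Y) + sigma(t, Y, Y) * dW
```

Any splitting of the full drift into v (treated implicitly) and b (treated explicitly) plugs in through the function arguments, which is the flexibility referred to above.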

Proof of the convergence result for the split-step method (SSM)
Throughout this section Assumption 2.1 is assumed to hold for all results.

Proof of the main convergence result, Theorem 2.6
For all auxiliary results next, we assume the conditions of Theorem 2.6 are in force and we thus do not state them.

Preliminary results
As a first step, we state a result that allows us to re-write (2.3) and (2.4) as a map of X̂ i,N without the presence of the Y i,⋆,N . We first present a new, more general version of [26, Lemma 3.4] where the differentiability assumption is lifted and the maps are allowed to depend on time (and the measure component).

Lemma 4.1. Let v be as in Assumption 2.1 and choose h such that 1 − 2hL v > 0. Then the equation
c = d + hv(t, c, µ), (4.1)
with d ∈ R d , t ∈ [0, T ], µ ∈ P 2 (R d ) fixed and c unknown, has a unique solution in c. Define the functions v h and F h as
F h (t, d, µ) := c (the unique solution of (4.1)) and v h (t, d, µ) := v(t, F h (t, d, µ), µ). (4.2)
We then have, for all t ∈ [0, T ], x, y ∈ R d , µ, µ x , µ y ∈ P 2 (R d ), the following four inequalities, for x i , y i ∈ R d , i ∈ 1, N and µ x , µ y ∈ P 2 (R d ) the empirical measures associated with the collections {x i } i and {y i } i . We observe that to establish (4.3) one only needs 1 − hL v > 0; in other words, the condition 1 − 2hL v > 0 is not sharp for that result. Nonetheless, inequalities (4.4) and (4.6) are critical for our work and hence we impose a single constraint.
Proof. Existence and uniqueness for (4.1) can be proved via a strong monotonicity argument. Namely, fix some t ∈ [0, T ] and, from (4.1), define the operator A : R d → R d as A(u) = u − hv(t, u, µ) for u ∈ R d . Following [46, Definition 25.2 (p. 500)], the operator A is continuous and strongly monotone (uniformly in t) under the restrictions h > 0 and 1 − hL v > 0. This follows by directly injecting the one-sided condition on v (from Assumption 2.1) into the definition of a strongly monotone operator. Finally, from [46, Theorem 26.A (p. 557)] we conclude that the operator A is invertible and the inverse map is Lipschitz continuous. Thus, (4.1) has a unique measurable inverse given by F h from (4.2). See also [36, (p. 2596)].
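The solvability argument above can be illustrated numerically: for a hypothetical one-sided Lipschitz drift v(u) = u − u 3 (so L v = 1) and a stepsize h with 1 − hL v > 0, the implicit equation (4.1) is solved by simple iteration.

```python
import numpy as np

# Illustration of the monotone-operator argument for (4.1): for the
# one-sided Lipschitz drift v(u) = u - u**3 (hypothetical example, L_v = 1),
# the operator A(u) = u - h*v(u) is strongly monotone whenever 1 - h*L_v > 0,
# and c = d + h*v(c) has a unique solution reachable by simple iteration.
def v(u):
    return u - u**3

def solve_implicit(d, h, tol=1e-12, max_iter=10_000):
    """Solve c = d + h*v(c) by fixed-point iteration (sketch)."""
    c = d
    for _ in range(max_iter):
        c_new = d + h * v(c)
        if abs(c_new - c) < tol:
            return c_new
        c = c_new
    return c

h = 0.1                                   # satisfies 1 - h*L_v > 0
c = solve_implicit(2.0, h)
residual = abs(c - (2.0 + h * v(c)))      # should be at the tolerance level
```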
We now determine the Lipschitz constants of v h and F h of (4.2). For (4.3), suppose c, d ∈ R d , µ ∈ P 2 (R d ) satisfy c = d + hv(t, c, µ); then, from Assumption 2.1: Since c = d + hv h (t, d, µ), we have by re-arranging the terms and plugging in the inequality above For (4.4), suppose c 1 , c 2 , d 1 , d 2 ∈ R d , µ ∈ P 2 (R d ) satisfy c 1 = d 1 + hv(t, c 1 , µ) and c 2 = d 2 + hv(t, c 2 , µ); then For (4.5), suppose x, y, x̂, ŷ ∈ R d , µ x , µ y ∈ P 2 (R d ) satisfy x̂ = x + hv(t, x̂, µ x ), ŷ = y + hv(t, ŷ, µ y ). We then have To prove (4.6) we use the same notation and identities used to prove inequality (4.4) above. We have that and thus We now address the Lipschitz property of b h , σ h . Since they are of the same nature, we provide only the proof for b h , as that for σ h is identical. Let i ∈ 1, N ; using the definition of b h , then Assumption 2.1 followed by (4.5), we have The convergence result in the final statement follows straightforwardly from [26, Lemma 3.4]. This convergence result is applied with fixed N, and the parameter of the convergence is h (not N ). One only needs to apply their arguments over [0, T ] × R dN , where the measures µ ∈ P 2 (R d ) are taken to be compactly supported on the compact set in which the family of points {x i } i is contained.
After having introduced Lemma 4.1, we can finally address the continuous-time extension of the SSM (2.3)-(2.4) referenced in Theorem 2.6. The SSM can be written as a continuous-time SDE via linear interpolation of the iterates, namely, for t ∈ [t n , t n+1 ], i ∈ 1, N , where κ(t) = sup{ t n : t n ≤ t, n ∈ 0, M } and µ N tn = µ N n .

Moment bounds
We now employ the results of Lemma 4.1 to establish a domination of |Y i,⋆,N n | by |X̂ i,N n |. However, through the split-step structure, one can "add and subtract a linear component" in the drift before the split-step is executed, so as to make L v < 0 and thus remove the constraint on h; see Section 3.4.1.
Proof. From (2.3) and Remark 2.2, for any i, n and any t n ∈ π, we have, using Cauchy-Schwarz and the properties of v, that After this remark, emphasising the independence of the constants from h, we are in a position to prove the moment bounds for the output of the SSM.
Proof. Let i ∈ 1, N , n ∈ 0, M − 1 and recall (2.4). Using Assumption 2.1 and Remark 4.4, we have, by taking squares and applying Young's inequality: where we used By backward induction from n + 1 to zero, we have (after some simplification) Taking power p ≥ 2 on both sides, expectations, and re-organising the terms, we have (with C p independent of h, N ) There are 8 terms to be estimated, but in essence only 3 arguments are needed. We present them only for the most complex terms, since the remaining ones follow by simplifications of the arguments presented; we state them in the form of a supremum over n ∈ 0, M − 1 . We start with the last term of the 2nd line: apply Jensen's inequality twice after scaling the outer summation, and then the tower property to take advantage of the conditional independence between X̂ j,N k and ∆W i k , namely, We now address the second term in the 1st line. Using the Burkholder-Davis-Gundy (BDG) inequality and Jensen's inequality as above, the Lipschitz property of σ, Jensen's inequality and the domination of Finally, we address the last term in the 3rd line of the inequality; the result follows by applying Jensen's inequality Collecting the several inequalities and injecting them back into the initial inequality, we conclude that Taking the supremum and using that the particles are conditionally i.i.d. (for fixed k), The proof finishes after applying the discrete Grönwall inequality to the above, and using that the X̂ i,N 0 are i.i.d.
We now provide (moment) estimates for the continuous-time extension of the SSM.
Proof. Let i ∈ 1, N and n ∈ 0, M − 1 . From (4.7), set t n + s = t for all t ∈ [0, T ], n ∈ 0, M − 1 ; then Since s ≤ h, by the Lipschitz conditions on b h and σ h and Lemma 4.2, we have

Taking supremum over time and expectations on both sides yields
where I i,n h is given by . Using the BDG inequality, Jensen's inequality, the Lipschitz condition on σ h and Proposition 4.5 gives Taking the supremum over i ∈ 1, N , it follows by Proposition 4.5 that The last result in this block concerns incremental (in time) moment bounds of X̂ i,N .

Proposition 4.7.
There exists C ∈ R + such that for any p ≥ 2, with m ≥ (q + 1)p, . Thus, we have a constant C p depending only on p such that where q comes from the polynomial growth property of v in Assumption 2.1.

Local errors
After having discussed moment bounds, we now discuss the local error. Then there exist positive constants C 1 , C 2 , C 3 and q̄ = 2(q + 1) 2 such that for all t ∈ [0, T ], i ∈ 1, N , (4.14) where µ z and µ F h,z,µ z are the two empirical measures associated with {z i } i and {F h (t, z i , µ z )} i respectively, i.e., Proof. Recall the estimates given in Lemma 4.1. Using the identity F h (t, z i , µ z ) = z i + hv h (t, z i , µ z ), (4.3), Assumption 2.1, Young's inequality and Jensen's inequality, we have As in Lemma 4.1, we show the result only for b h , as the computation for σ h is identical (and overall very close to that for v h ). Using the definition of b h , the Lipschitz property of b and the definitions of µ F h,z,µ z and µ z , and by calculations similar to those above using Young's and Jensen's inequalities, we have Applying inequality (4.3) and the growth condition on v from Assumption 2.1 (as in the previous proof) yields the claim.
We analyse the components term by term. Namely, for (4. ): from Assumption 2.1, taking the supremum over [0, T ] and expectations, by Young's inequality we have By the 1/2-Hölder regularity in time and the assumption on v, the particles being i.i.d. and the Cauchy-Schwarz inequality, we have where in the last inequality we used Hölder's inequality on the product term, in combination with Propositions 4.6 and 4.7 with m ≥ 2(q + 1) 2 , to guarantee that the error satisfies E[ |X̂ i,N s − X̂ i,N κ(s) | 4 ] ≤ Ch 2 . We now make use of Proposition 4.8 and arguments similar to those above to deal with the last term of the initial inequality, where q̄ is defined in Proposition 4.8 with m ≥ 2(q + 1) 2 = q̄ so as to guarantee E[ |X̂ i,N κ(s) | q̄ ] ≤ C. We now proceed to estimate the b components. Using Young's inequality, for (4.16): For the first term above, the Lipschitz condition on b, (4.4) and Proposition 4.7 yield and similarly we obtain: (4.23) Consider the last term (4.18): taking expectations and using the BDG inequality, the Cauchy-Schwarz inequality and Proposition 4.8, Grönwall's lemma delivers the final result.
Now, the proof of the main theorem is concluded as follows.
Proof of Theorem 2.6. In relation to Points 1, 2 and 3 in the theorem's statement: Point 1 follows from Proposition 4.6; Point 2 follows from Proposition 4.9; the last point follows by a straightforward combination of Proposition 2.4 and Proposition 4.9.

Proof of the stability Theorem, Theorem 2.9
Proof of Theorem 2.9. Let i ∈ 1, N and n ∈ N. From (4.5) in Lemma 4.1, since the particles are identically distributed, we have where we used the tower property of the expectation with F tn -conditional expectations to deal with the Brownian increment term (it holds that E[ |∆W i n | 2 | F tn ] = h, using that all Y j,⋆,N n , G j,⋆,N n are F tn -adapted), the Cauchy-Schwarz inequality, and that the particles are i.i.d.

A Short description of the Taming and Adaptive time-stepping method
We provide a brief review of Taming [18] and Adaptive time-stepping [43] for superlinear-growth MV-SDEs in the context of the notation set in Sections 1 and 2. Each method approximates (1.1) through the interacting particle system (1.2), as described next. Table 1 summarises the strong error (rMSE) and stability results, and mentions the weak error as an open problem. Both schemes (plus the proposed SSM) hold under the same conditions: Assumption 2.1 and a sufficiently integrable initial condition (as in Theorem 2.6).
Method          rMSE   Stability
Taming [18]     0.5    Unknown
Adaptive [43]   0.5    Unknown
SSM             0.5    Contraction, Theorem 2.9

Table 1: Information regarding the different methods. Convergence of the Taming [18] and Adaptive [43] schemes holds under the same conditions as Assumption 2.1. The root MSE (rMSE) is the metric presented in (2.5). A weak error analysis has not been carried out, but experimental work points in the direction of weak error order 1 for all three methods.
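For reference, one tamed Euler step can be sketched as follows; the damping b/(1 + h α |b|) is the usual taming recipe with exponent α (the experiments above use α = 0.5 and α = 1), though the exact formula of [18] may differ in its details.

```python
import numpy as np

def taming_step(X, h, t, b, sigma, alpha, rng):
    """One tamed Euler step for the particle system (illustrative sketch).

    The superlinear drift is damped as b / (1 + h**alpha * |b|) so that the
    effective drift is of linear growth; the empirical measure is
    represented by the particle cloud X itself.
    """
    drift = b(t, X, X)
    tamed = drift / (1.0 + h**alpha * np.abs(drift))
    dW = rng.normal(scale=np.sqrt(h), size=X.shape)
    return X + tamed * h + sigma(t, X, X) * dW
```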

A.2 Adaptive time-stepping method
Adaptive [43] approximates (1.2) as follows: for t n ∈ [k n h, (k n + 1)h), k n ∈ N, and i ∈ 1, N ,
X̂ i,N tn+1 = X̂ i,N tn + b( t n , X̂ i,N tn , µ̂ X,N knh ) h i n + σ( t n , X̂ i,N tn , µ̂ X,N knh ) ∆W i tn .
The function h δ is specified in each example and is to be understood similarly to the taming technique: in essence, h δ is to be chosen such that | b(x)h δ (x)| is of linear growth. Adaptive modifies the timestep h in a dynamic fashion to control the growth of b, while Taming modifies the drift b itself to control the growth across the application of the scheme. The rMSE converges with rate of order 1/2; see [23] or [43].
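A minimal sketch of one adaptive step for a single particle is below; the specific step-size formula is an illustrative choice satisfying the linear-growth requirement on |b(x)| h δ (x), not the h δ map of [43], and the measure dependence of the coefficients is omitted for brevity.

```python
import numpy as np

def adaptive_step(x, t, b, sigma, h_min, h_max, rng):
    """One adaptive Euler step for a single particle (illustrative sketch).

    The step size shrinks where the drift is large so that |b(x)| * h
    stays bounded by h_max (linear growth of the effective increment);
    this formula is a hypothetical stand-in for the h_delta of [43].
    """
    h = float(np.clip(h_max / (1.0 + abs(b(t, x))), h_min, h_max))
    dW = rng.normal(scale=np.sqrt(h))
    return x + b(t, x) * h + sigma(t, x) * dW, h
```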