Convergence analysis of an explicit method and its random batch approximation for the McKean-Vlasov equations with non-globally Lipschitz conditions

In this paper, we present a numerical approach to solve the McKean-Vlasov equations, which are distribution-dependent stochastic differential equations, under some non-globally Lipschitz conditions on both the drift and diffusion coefficients. We establish a propagation of chaos result, based on which the McKean-Vlasov equation is approximated by an interacting particle system. A truncated Euler scheme is then proposed for the interacting particle system, allowing for a Khasminskii-type condition on the coefficients. To reduce the computational cost, the random batch approximation proposed in [Jin et al., J. Comput. Phys., 400(1), 2020] is extended to the interacting particle system, where the interaction may take place in the diffusion term. Convergence of almost order one-half is proved in the $L^p$ sense. Numerical tests are performed to verify the theoretical results.


Introduction
The McKean-Vlasov equations are distribution-dependent stochastic differential equations (SDEs) first introduced by H. McKean [29] to describe the Vlasov system via mean-field stochastic differential equations. They have been widely studied due to their important applications in nonlinear equations and finance [6,31,36].
In this paper we consider the following McKean-Vlasov SDE:
$$\mathrm{d}X_t = b\big(X_t, \mathcal{L}(X_t)\big)\,\mathrm{d}t + \sigma\big(X_t, \mathcal{L}(X_t)\big)\,\mathrm{d}W_t, \tag{1.1}$$
where $\mathcal{L}(X_t)$ indicates the law of $X_t$, i.e., $\mu_t := \mathcal{L}(X_t)$ satisfies $\mu_t(B) = \mathbb{P}(X_t \in B)$ for any Borel set $B \subset \mathbb{R}^d$. Here, $X_0 \in \mathbb{R}^d$ is the initial value of the SDE, sampled from an initial distribution $\mu_0$; $b: \mathbb{R}^d \times \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R}^d$ and $\sigma: \mathbb{R}^d \times \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R}^{d \times d'}$ are some given fields; and $W$ is a $d'$-dimensional standard Wiener process, so $X_t$ is the process at time $t$. Here, $\mathcal{P}_2$ is defined as follows. The set of all probability measures on $\mathbb{R}^d$ is denoted by $\mathcal{P}(\mathbb{R}^d)$, while the notation $\mathcal{P}_q$ for some $q \ge 1$ means the set of probability measures having a finite $q$-th moment, i.e.,
$$\mathcal{P}_q(\mathbb{R}^d) := \Big\{ \mu \in \mathcal{P}(\mathbb{R}^d) : \int_{\mathbb{R}^d} |x|^q \, \mu(\mathrm{d}x) < \infty \Big\}. \tag{1.2}$$
Clearly, $\mathcal{P}_q \subset \mathcal{P}_2$ for $q \ge 2$. We consider the dynamics for all $t \in (0, \infty)$. Later, for the numerical approximation, we will fix one, but arbitrary, terminal time $T$ to study the convergence. Different from classical SDEs [26], the coefficients of McKean-Vlasov SDEs depend on the law of the current variable $X_t$, so the dynamics of the law $\mathcal{L}(X_t)$ is nonlinear [23].
The existence and uniqueness of strong solutions of (1.1) have been established in [34] under some linear growth and Lyapunov-type conditions. Later, the existence and uniqueness of weak and strong solutions of McKean-Vlasov SDEs were studied by several authors under various conditions; we refer the reader to [4,16,30] for more details. To our knowledge, there is little literature reporting existence and uniqueness results for the solution of a McKean-Vlasov SDE with super-linear coefficients. In [11], the authors showed that the McKean-Vlasov SDE has a unique solution when the drift coefficient satisfies a one-sided Lipschitz condition but the diffusion term is still of linear growth. It is worthwhile to mention a recent result proving that the McKean-Vlasov SDE admits a unique solution when both drift and diffusion coefficients are super-linear w.r.t. the state [24].
As is well known, the first step in simulating McKean-Vlasov SDEs is usually to approximate the true measure $\mathcal{L}(X_t)$ by an empirical measure. That is, a particle system is adopted to simulate the McKean-Vlasov SDE following the theory of propagation of chaos [34]. The state of particle $i \in \{1, \ldots, N\}$ in the symmetric system of SDEs coupled in a mean-field scaling is then given by
$$\mathrm{d}X_t^{i,N} = b\big(X_t^{i,N}, \mu_t^N\big)\,\mathrm{d}t + \sigma\big(X_t^{i,N}, \mu_t^N\big)\,\mathrm{d}W_t^i, \quad \text{where} \quad \mu_t^N := \frac{1}{N}\sum_{j=1}^N \delta_{X_t^{j,N}}. \tag{1.3}$$
Here, $\delta_x$ denotes the Dirac measure at point $x$, $X_0^i \in \mathbb{R}^d$ are the initial values of the SDEs, and the processes $W^i$ ($1 \le i \le N$) are $N$ independent copies of the $d'$-dimensional standard Wiener process. We always assume that the $X_0^i$ are i.i.d. sampled from some initial law $\mu_0$. Assuming the coefficients of (1.1) are globally Lipschitz w.r.t. the state and the measure, the convergence of an Euler scheme applied to the particle system was shown in [7]. In [15], multilevel and multi-index Monte Carlo methods were proposed for Euler schemes applied to the McKean-Vlasov equation under global Lipschitz conditions. Bao and Huang [2] studied propagation of chaos and the convergence rate of the tamed Euler-Maruyama scheme for McKean-Vlasov SDEs by using associated weakly interacting particle systems, where the drift or diffusion term is Hölder continuous. The convergence of Milstein schemes, taming the drift under a one-sided Lipschitz condition, for a time-discretized interacting particle system was then discussed in [3]. Li et al. [25] proved strong convergence of Euler-Maruyama schemes for approximating McKean-Vlasov SDEs under local Lipschitz conditions w.r.t. the state variable. Under a Khasminskii-type monotonicity condition, instead of imposing a one-sided and a global Lipschitz condition on the drift and the diffusion coefficient, respectively, two explicit schemes were proposed in [24] for an interacting particle system, and the propagation of chaos property of the associated McKean-Vlasov equation was also studied therein. Hoeksema et al.
[17] proved a large deviation principle for the empirical measure of a general system of mean-field interacting diffusions with a singular drift and showed convergence to the associated McKean-Vlasov equation. Rached et al. [32] introduced a decoupling approach for McKean-Vlasov SDEs, which approximates the McKean-Vlasov law in the coefficients.
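To make the mean-field scaling concrete, the particle system above can be simulated with a plain Euler-Maruyama discretization. The sketch below uses illustrative one-dimensional coefficients (a confining double-well drift plus a pairwise attraction kernel, both our own choices, not the coefficients analyzed in this paper) and evaluates the full pairwise sum at every step.

```python
import numpy as np

def simulate_particle_system(N=200, T=1.0, dt=1e-2, seed=0):
    """Euler-Maruyama for dX^i = b(X^i, mu^N) dt + sigma dW^i, where mu^N
    is the empirical measure of the N particles. The pairwise interaction
    sum below costs O(N^2) per step, which is what the RBM later avoids."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=N)                      # i.i.d. samples from mu_0
    for _ in range(int(round(T / dt))):
        confine = X - X**3                      # superlinear confining drift
        # pairwise kernel k(x, y) = -(x - y): mean-field attraction
        interact = -(X[:, None] - X[None, :]).mean(axis=1)
        X = X + (confine + 0.5 * interact) * dt \
              + 0.3 * np.sqrt(dt) * rng.normal(size=N)
    return X

X_T = simulate_particle_system()
```

With these choices the particles settle near the wells of the double-well drift; the dense pairwise evaluation is the $O(N^2)$ per-step bottleneck discussed below.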
Clearly, the computational cost of these numerical schemes is $O(N^2)$ per time step if naively implemented, which is expensive in applications. To reduce the computational cost, a projection-based particle method was proposed in [5] for solving McKean-Vlasov stochastic differential equations. Recently, a random batch method (RBM) was introduced in [20] to simulate interacting particle systems efficiently; it reduces the computational cost from $O(N^2)$ per time step to $O(N)$ in the case of additive noise.
In this paper, we first extend the theory of propagation of chaos for a highly nonlinear McKean-Vlasov equation [24] to the case of $p \ge 2$. We then adopt a truncated Euler scheme, which is different from the tamed scheme in [24], to numerically solve (1.3) as an approximation of the McKean-Vlasov SDE. The main contribution of this paper to the numerical analysis is twofold. Firstly, we provide a strong convergence analysis of a truncated Euler scheme for the McKean-Vlasov SDEs under a Khasminskii-type monotonicity condition w.r.t. the state. Secondly, we extend the aforementioned random batch approximation to improve the efficiency of the truncated method, and establish the error estimate for the approximation. Different from the results in [20], the diffusion coefficient of the interacting particle systems considered in this paper may contain weakly interacting terms, so the random batch approximation also takes place in the diffusion.
The remainder of this article is organized as follows. In Section 2, some assumptions are introduced and the propagation of chaos property is obtained. In Section 3, we employ a truncated Euler method to solve the interacting particle systems and prove that the convergence order is almost one-half. Then, in Section 4, we present the convergence and efficiency of the numerical method obtained by integrating the RBM into the truncated Euler scheme. In the last section (Sect. 5), numerical experiments are given to verify the theoretical results.

Mathematical preliminaries
We first introduce some notation and basic definitions for later sections. The Euclidean norm of a $d$-dimensional vector is denoted by $|\cdot|$ and the Hilbert-Schmidt norm of a $d \times d'$ matrix is denoted by $\|\cdot\|$, where we recall that the Hilbert-Schmidt norm is given by $\|A\|^2 = \mathrm{tr}(A A^{\top})$ for a real matrix $A$. If $G$ is a set, its indicator function $\mathcal{I}_G$ is given by $\mathcal{I}_G(x) = 1$ for $x \in G$ and $\mathcal{I}_G(x) = 0$ otherwise. Moreover, $a \wedge b := \min(a, b)$ and $a \vee b := \max(a, b)$. The notation $u \otimes v$ for $u \in \mathbb{R}^m$ and $v \in \mathbb{R}^n$ means the tensor product of $u$ and $v$, namely $(u \otimes v)_{ij} = u_i v_j$. Throughout this article, $C > 0$ is a generic constant that may change its value from line to line. We use the notation $a \lesssim b$ to mean $a \le C b$ for some generic constant $C$ that is independent of the parameters being focused on.
Recall (1.2) for $\mathcal{P}_q$. For $q \ge 1$ and any $\mu, \nu \in \mathcal{P}_q(\mathbb{R}^d)$, the $q$-Wasserstein distance is defined by
$$W_q(\mu, \nu) := \Big( \inf_{\pi \in \Pi(\mu, \nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|^q \, \pi(\mathrm{d}x, \mathrm{d}y) \Big)^{1/q},$$
where $\Pi(\mu, \nu)$ is the set of couplings of $\mu$ and $\nu$ (i.e., joint distributions with marginals $\mu$ and $\nu$, respectively). Clearly, $\mathcal{P}_q(\mathbb{R}^d)$ is a Polish space under the $q$-Wasserstein metric. Regarding the $W_q$ distance, we have the following simple observations.

Lemma 2.1. Let $q \ge 1$. For any probability measure $\mu \in \mathcal{P}_q(\mathbb{R}^d)$,
$$W_q^q(\mu, \delta_0) = \int_{\mathbb{R}^d} |x|^q \, \mu(\mathrm{d}x).$$
For two empirical measures $\mu^N = \frac{1}{N}\sum_{i=1}^N \delta_{x_i}$ and $\nu^N = \frac{1}{N}\sum_{i=1}^N \delta_{y_i}$,
$$W_q^q(\mu^N, \nu^N) \le \frac{1}{N}\sum_{i=1}^N |x_i - y_i|^q.$$
The proof of the first claim with $q = 2$ can be found, for example, in Lemma 2.3 of [11], and the proof for general $q \ge 1$ can be performed similarly. The second follows from constructing the simple transport plan $\pi(\mathrm{d}x, \mathrm{d}y) = \frac{1}{N}\sum_{i=1}^N \delta_{(x_i, y_i)}$.
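In one dimension, the $q$-Wasserstein distance between two empirical measures with the same number of atoms is attained by matching order statistics, which makes the coupling bound of Lemma 2.1 easy to test numerically. A minimal sketch:

```python
import numpy as np

def wasserstein_q_1d(x, y, q=2):
    """Exact W_q between (1/N) sum_i delta_{x_i} and (1/N) sum_i delta_{y_i}
    on the real line: the optimal coupling is the monotone rearrangement,
    i.e. pair the sorted samples. Any other pairing, such as the identity
    pairing used in Lemma 2.1, gives an upper bound."""
    x, y = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    return float(np.mean(np.abs(x - y) ** q) ** (1.0 / q))

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = x + 2.0                       # shift every atom by 2
print(wasserstein_q_1d(x, y))     # approx 2.0: a rigid shift costs its size
```

Sorting both samples realizes the infimum over couplings in 1D, so the function above returns the exact distance, while the pairing bound of Lemma 2.1 holds for arbitrary orderings of the atoms.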
Assumption 2.3. There exists a pair of constants $K > 0$ and $p_0 > \max(2, 4)$ such that the coefficients $b(\cdot, \cdot): \mathbb{R}^d \times \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R}^d$ and $\sigma(\cdot, \cdot): \mathbb{R}^d \times \mathcal{P}_2(\mathbb{R}^d) \to \mathbb{R}^{d \times d'}$ satisfy, for all $x, y \in \mathbb{R}^d$ and $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^d)$,
$$\langle x - y, b(x, \mu) - b(y, \nu)\rangle + \frac{p_0 - 1}{2}\,\|\sigma(x, \mu) - \sigma(y, \nu)\|^2 \le K\big(|x - y|^2 + W_2^2(\mu, \nu)\big). \tag{2.2}$$
The assumptions here also appeared in [24]. We note that the Lipschitz continuity with respect to the measure is uniform. A typical example is $b(x, \mu) = b_0(x) + \int \tilde{b}(x, y)\,\mu(\mathrm{d}y)$, where $\tilde{b}$ is globally Lipschitz. As another comment, for condition (2.2) to hold, $b$ is not necessarily globally Lipschitz in $x$. In fact, as long as the growth can be controlled by the confining effect in $b$, the condition still holds. One example is $b = x - x^3$ and $\sigma = x^2$ for $x \in \mathbb{R}$.
The lower bound $2$ for $p_0$ in Assumption 2.3 is essential in our proof and is used in the proof of Theorem 3.4, where we need to bound the $p_1 q$-th moment for some $q \ge 2$ and $p_1 > 1$. The lower bound $4$ for $p_0$, however, is not essential for the convergence. It only ensures the existence of $p \in [2, p_0/2)$, so that the error estimates given below in Proposition 2.5 and Theorem 3.4 can be cleaner (otherwise, there would be a term like $N^{-(q-p)/q}$).
The assumptions made above clearly imply the following growth conditions on the coefficients:
- For any $p < p_0$, where $p_0$ is the parameter in Assumption 2.3, there exists a constant $C > 0$ such that
$$\langle x, b(x, \mu)\rangle + \frac{p - 1}{2}\,\|\sigma(x, \mu)\|^2 \le C\big(1 + |x|^2 + W_2^2(\mu, \delta_0)\big). \tag{2.3}$$
Combining these two simple estimates, the conclusion then follows.
- For the same constant $K$ as in Assumption 2.2, the corresponding bound holds for all $x \in \mathbb{R}^d$ and $\mu \in \mathcal{P}_2(\mathbb{R}^d)$.
Corresponding to (1.3), we consider $\bar{X}_t^i$, which solves (1.1) but with $\bar{X}_0^i = X_0^i$ and the same Brownian motions. In other words, the $\bar{X}_t^i$ satisfy, for any $i$,
$$\mathrm{d}\bar{X}_t^i = b\big(\bar{X}_t^i, \mathcal{L}(\bar{X}_t^i)\big)\,\mathrm{d}t + \sigma\big(\bar{X}_t^i, \mathcal{L}(\bar{X}_t^i)\big)\,\mathrm{d}W_t^i. \tag{2.5}$$
By standard techniques, one may show the following bounds on the moments (see, for example, [9], Thm. 1). For the convenience of the reader, we provide a short sketch in Appendix A.
Proposition 2.4. Let (2.3) hold for some $p \ge 2$ and $\mu_0 \in \mathcal{P}_p$. Then for any $T > 0$, there exists $C$ depending on $T$ and $p$, but independent of $N$ and $i$, such that
$$\sup_{0 \le t \le T} \mathbb{E}\big|X_t^{i,N}\big|^p + \sup_{0 \le t \le T} \mathbb{E}\big|\bar{X}_t^i\big|^p \le C.$$
The following result states that the trajectories of $\bar{X}^i$ and $X^{i,N}$ become identical as $N \to \infty$ almost surely. This implies the propagation of chaos for the interacting particle system, so that the mean-field limit to the McKean-Vlasov SDE holds.

Proposition 2.5 (Propagation of Chaos). Let Assumption 2.3 be satisfied. If for some $p \in [2, p_0/2)$ the initial law has a finite $p$-th moment (i.e., $\mu_0 \in \mathcal{P}_p \subset \mathcal{P}_2$), then
$$\sup_{1 \le i \le N}\, \sup_{0 \le t \le T} \mathbb{E}\big|X_t^{i,N} - \bar{X}_t^i\big|^p \le C N^{-1/2},$$
where the constant $C > 0$ depends on $T$, $p$ and $\mu_0$ but does not depend on $N$.
Proof. From equations (1.3) and (2.5), we write the equation for the difference $X_t^{i,N} - \bar{X}_t^i$ and apply Itô's formula. Due to Assumption 2.3, the resulting terms are bounded by $|X_t^{i,N} - \bar{X}_t^i|^2$ and the Wasserstein distance between the empirical measures; in the last inequality, we have applied $W_2(\mu, \nu) \le W_p(\mu, \nu)$. Then, by the Minkowski inequality, the Wasserstein distance between the empirical measures is controlled by the particle-wise differences together with the distance between the empirical measure of the i.i.d. copies and the true law; the last equality holds due to the symmetry of the particles.
The remaining Wasserstein term between the empirical measure of the i.i.d. copies and the true law is controlled by the Wasserstein distance estimate for empirical measures of i.i.d. samples in Theorem 1 of [13]. Here, since $p_0 > 2p$, one can choose the moment $q$ such that $q > 2p$, and the terms $N^{-(q-p)/q}$ in Theorem 1 of [13] can be removed for simplicity. Then, applying Grönwall's inequality completes the proof.
We remark that the requirement $p < p_0/2$ is only used so that the term $N^{-(q-p)/q}$ is absent, for simplicity. Clearly, if $p \in [p_0/2, p_0)$, there is still convergence. Another remark is that the rate of the propagation of chaos here is like $N^{-1/(2p)}$ under the $L^p$ norm when $p$ is large. This low rate is due to the usage of Wasserstein distances to gauge the continuity of the distribution dependence in the coefficients. If the dependence on the distribution is through some statistics, the rate may be improved to $N^{-1/2}$ due to the central limit theorem. Note that though the particles are dependent for finite $N$, the central limit theorem rate is expected to hold in the large-$N$ regime due to propagation of chaos, though a rigorous proof needs careful estimates (see, for example, [35] regarding the rate of convergence for statistics).
Lemma 3.1. The truncated coefficients $b_\Delta$ and $\sigma_\Delta$ satisfy the growth condition (3.1); moreover, for any $p \in [2, p_0)$, where $p_0$ is the parameter in Assumption 2.3, there exists a constant $C > 0$ such that the Khasminskii-type condition (3.2) holds for the truncated coefficients.

Proof. Here, we only need to verify the Khasminskii-type condition (3.2).
Q. GUO ET AL.

The truncated scheme
The numerical approximation of $X_{t_k}^{i,N}$ at $t_k$ is denoted by $X_k^{i,\Delta}$. With the truncated coefficients, the numerical solutions $X_k^{i,\Delta}$ are generated by the basic Euler-Maruyama scheme
$$X_{k+1}^{i,\Delta} = X_k^{i,\Delta} + b_\Delta\big(X_k^{i,\Delta}, \mu_k^{N,\Delta}\big)\,\Delta + \sigma_\Delta\big(X_k^{i,\Delta}, \mu_k^{N,\Delta}\big)\,\Delta W_k^i, \tag{3.3}$$
where $\Delta W_k^i = W_{t_{k+1}}^i - W_{t_k}^i$, the initial value is $X_0^{i,\Delta} = X_0^i$, and $\mu_k^{N,\Delta} := \frac{1}{N}\sum_{j=1}^N \delta_{X_k^{j,\Delta}}$. The scheme (3.3) is the basic truncated scheme studied in this work.
We introduce two versions of the extension of the numerical solution at the discrete time points to $t \in [0, \infty)$. The first is the piecewise constant extension, given by $\bar{X}_t^{i,\Delta} = X_k^{i,\Delta}$ for $t_k \le t < t_{k+1}$, and the second is the continuous extension of the truncated Euler-Maruyama method, defined by the corresponding Itô process (3.4). Here, the initial values of both extensions coincide with $X_0^i$.
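The truncated scheme can be sketched in a few lines: evaluate the coefficients at states projected onto a ball whose radius $h(\Delta)$ grows slowly as $\Delta \to 0$. The projection and the specific coefficients below are illustrative choices in the spirit of Mao's truncated Euler-Maruyama method, not the exact mapping used in this paper.

```python
import numpy as np

def truncate(x, h):
    """Project states onto the ball of radius h; the coefficients are then
    evaluated at the projected states, so superlinear terms cannot blow
    up the scheme over a single step."""
    return np.clip(x, -h, h)            # 1D version of (|x| ^ h) x/|x|

def truncated_em(N=500, T=1.0, dt=2.0**-7, seed=1):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=N)
    h = dt**-0.25                        # h(dt) grows slowly as dt -> 0
    for _ in range(int(round(T / dt))):
        Xt = truncate(X, h)
        m = Xt.mean()                    # statistic of the empirical measure
        drift = Xt - Xt**3 - 0.5 * (Xt - m)   # illustrative superlinear drift
        diff = 0.5 * Xt                       # diffusion evaluated at Xt too
        X = X + drift * dt + diff * np.sqrt(dt) * rng.normal(size=N)
    return X

X_T = truncated_em()
```

Because both coefficients are evaluated at the projected state, their values per step stay bounded in terms of $h(\Delta)$, while inside the ball the scheme coincides with the plain Euler-Maruyama method.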

Analysis of the truncated scheme
We present some preliminary properties of the numerical solutions. We first give some moment estimates of the numerical solution to ensure certain stability. The first lemma below controls the difference between the two versions of the extension.

Lemma 3.2. For any $\Delta \in (0, \Delta^*]$, any $t \ge 0$ and any $p \ge 2$, the two extensions of the numerical solution are close in the $p$-th moment, with a positive constant $C$ independent of $\Delta$.

Proof. By the definitions in (3.4), one can write the difference explicitly. Using Hölder's inequality, the Burkholder-Davis-Gundy inequality and (3.1), the claim follows.

Proposition 3.3. Suppose (3.2) holds and $\mu_0 \in \mathcal{P}_p \subset \mathcal{P}_2$. Then for any $\Delta \in (0, \Delta^*]$ and any $T > 0$, the $p$-th moments of the numerical solutions are bounded up to time $T$, where the bound $C$ is a positive constant dependent on $T$ and $p$ but independent of $\Delta$.

Proof. Recall (3.4). By Itô's formula, one obtains a moment inequality in which we apply Lemma 2.1. By Lemma 3.2, the difference between the two extensions is controlled. By the Minkowski inequality and the symmetry of the particles, the empirical-measure term is bounded by the particle moments. By the monotonicity of the bound on the right-hand side, the estimate can be closed, and an application of Grönwall's inequality then completes the proof.
Below, we prove the convergence of the truncated scheme. Define two stopping times for particle $i$: the first exit times of the exact particle $X^{i,N}$ and of the numerical solution from a large ball. The estimates in Proposition 3.3 ensure that $\tau_i \to +\infty$ a.s. as $\Delta \to 0$. Clearly, before $\tau_i$, the truncated method reduces to the usual Euler-Maruyama scheme.
Below, we make use of these stopping times to obtain the convergence of the truncated scheme, and the following is our first main result.
Theorem 3.4 (Rate of Convergence). Suppose Assumptions 2.2 and 2.3 hold and $\mu_0 \in \mathcal{P}_{p_0}$. Let $\{X_t^{i,N}\}$ be the solution to the $N$-particle system (1.3) and $X_k^{i,\Delta}$ be the solution to the truncated scheme. Then the strong error between them satisfies the stated bound, where $C$ is a positive constant dependent on $T$, $p$ and $\mu_0$ but independent of $N$ and $\Delta$. Consequently, combining with Proposition 2.5, one has a corresponding error bound against the McKean-Vlasov SDE itself, where $C$ depends on $T$, $p$ and $\mu_0$ but is independent of $N$ and $\Delta$. Hence, it holds that
$$\lim_{\Delta \to 0,\, N \to \infty} \Big( \sup_{1 \le i \le N}\, \sup_{0 \le t \le T} \mathbb{E}\big|\bar{X}_t^{i,\Delta} - \bar{X}_t^i\big|^p \Big)^{1/p} = 0.$$
Proof. For $0 \le t \le T \wedge \tau_i$, the coefficients $b$ and $\sigma$ that particle $i$ sees agree with the truncated ones, $b_\Delta$ and $\sigma_\Delta$. Hence, by Itô's formula, the error admits a decomposition into the terms $I_1$, $I_2$ and $I_3$. Here, we can take any $p' \in (p, p_0)$.
The leading term is then controlled according to Assumption 2.3.
For $I_1$, we first note that the stopped time integral is bounded by the unstopped one, because for $t > \tau_i$ the integrand is nonnegative. Then again, by Lemma 2.1, the Minkowski inequality and the symmetry of the particles, the empirical-measure terms are bounded by the particle-wise errors. Here, $\tau_i$ has been dropped because it is not suitable for stopping the other particles $X^{j,N}$. Therefore, one arrives at an integral inequality. Note that for all $t \in [0, T]$, the decomposition (3.10) by indicator functions holds; here, recall that $\mathcal{I}_G$ is the indicator function of the set $G$. By Young's inequality, the product term is split, where $p_2 > 1$ is some constant such that $p p_2 < p_0$. The expectation in the first term is then bounded. By the moment estimates of $X^{i,N}$ and $\bar{X}^{i,\Delta}$ and the Markov inequality, the probability that a stopping time occurs before $T$ is controlled. Balancing the terms, we may choose the parameters accordingly. Therefore, applying the Grönwall inequality yields the bound up to the stopping times. Combining with the propagation of chaos result (Prop. 2.5), one therefore obtains the eventual estimate as listed in the statement.

Now, we make some observations. First of all, we note that if $\sigma$ is globally Lipschitz and $b$ is one-sided Lipschitz, then $p_0$ can be arbitrarily large. Then, $h(\Delta)$ can be chosen as a polynomial of $\Delta$, and for large enough $p_0$ the last term can be dropped. Hence, one has

Corollary 3.5. If $\sigma$ is globally Lipschitz and $b$ is one-sided Lipschitz, then the simplified error bound (3.11) holds.

In this case, one may choose an $h(\Delta)$ that grows slowly as $\Delta \to 0$, like $h(\Delta) \sim \Delta^{-\epsilon}$ for very small $\epsilon$. This indicates that the order of the strong error can be arbitrarily close to one half. However, if $p_0$ has an upper bound, choosing $h(\Delta) \sim \Delta^{-\epsilon}$ with small $\epsilon$ leads to a large bound for the last term involving $h(\Delta)$. In this case, one needs to choose a suitable $h$ to optimize the rate.

The random batch approximation for the interacting particle systems and its error analysis
The computational cost of simulating the particle systems in (1.3) and (3.4) is high due to the $O(N^2)$ complexity per time step. In this section, we extend the recently proposed RBM [18-21] to our particle systems and establish the approximation error estimate.
The RBM is based on the so-called "random mini-batch" idea, which is famous for its application in the stochastic gradient descent (SGD) algorithm [33] for optimization in machine learning. The key methodology of the "mini-batch" is to find a cheap, unbiased random estimator of the original quantity using a small subset of the data/particles. How to design the random batch estimator clearly depends on the application. In the RBM for particle systems, a random grouping strategy was proposed in [19-21], while an importance sampling in Fourier space was proposed for the Random Batch Ewald method in molecular dynamics [19,22]. Note that at each single step, the random estimator has an $O(1)$ error for the quantity. The key reason the method converges is the error cancellation in time; it is this type of law of large numbers (in time) that ensures convergence. A difference between the RBM and SGD is that the method is designed to capture dynamical properties of the systems, not just the equilibrium distribution.
In the original work [20], the interaction appears only in the drift, through a pairwise kernel. Under this setting, the interacting particle system becomes the drift-interaction analogue of (1.3). The initial values $X_0^i$ are i.i.d. sampled from $\mu_0$. Here, we have used the average $\frac{1}{N-1}\sum_{j \ne i}$ over the other particles to approximate the mean-field interaction, and there is no significant difference regarding the propagation of chaos.
Suppose we aim to simulate until time $T > 0$. The RBM does the following. Choose a batch size $p \ll N$, $p \ge 2$, that divides $N$. For the time step $\Delta$ and $t_m := m\Delta$, on each time subinterval $[t_m, t_{m+1})$ there are two steps: (1) at the time grid point $t_m$, we divide the $N$ particles into $n := N/p$ groups (batches) randomly; (2) the particles evolve with interaction inside their own batches only.
In the case above, the dependence on the measure is linear in the drift. As proved in [20], for a fixed configuration, the random batch approximation in the drift is unbiased and there is no restriction on the batch size $p$.
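The two steps above (random regrouping at each grid point, then in-batch interaction only) can be sketched as follows for a drift-only interaction with additive noise, matching the setting of [20]. The kernel and confining drift are hypothetical choices, and we assume the batch size $p$ divides $N$.

```python
import numpy as np

def rbm_step(X, dt, p, rng, kernel, confine):
    """One RBM step: shuffle, split into batches of size p, and let each
    particle interact with its p-1 batchmates only, so the cost per step
    is O(N p) instead of O(N^2)."""
    N = X.size
    idx = rng.permutation(N)               # step (1): random regrouping
    drift = confine(X)
    for s in range(0, N, p):               # step (2): in-batch interaction
        batch = idx[s:s + p]
        xb = X[batch]
        pair = kernel(xb[:, None], xb[None, :])
        np.fill_diagonal(pair, 0.0)        # exclude self-interaction
        drift[batch] += pair.sum(axis=1) / (p - 1)
    return X + drift * dt + np.sqrt(dt) * rng.normal(size=N)

rng = np.random.default_rng(2)
X = rng.normal(size=1024)
for _ in range(100):                        # simulate up to T = 1, dt = 0.01
    X = rbm_step(X, dt=1e-2, p=2, rng=rng,
                 kernel=lambda x, y: -(x - y),   # mean-field attraction
                 confine=lambda x: x - x**3)
```

Each step costs $O(Np)$; with $p = 2$ this is the cheapest variant, and the $O(1)$ per-step fluctuation of the batch estimator averages out over the time steps, as described above.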
Below, we extend the random batch approximation to general coefficients, where the diffusion coefficient could also depend on $\mu$ and the coefficients could be nonlinear in $\mu$. In general, if the diffusion $\sigma$ depends on $\mu$, the quadratic variation in Itô's formula contributes a term of order $1/p$ to the mean square error (or $\sqrt{1/p}$ to the error). Moreover, if the dependence on $\mu$ is nonlinear, the random approximation will no longer be unbiased for a fixed configuration. Consequently, it also contributes an error that does not vanish as $\Delta \to 0$. All of this puts restrictions on the batch size $p$; for example, $p \sim \Delta^{-1/2}$ is needed to achieve half order for the mean square error. However, since the required batch size does not grow as the number of particles $N \to \infty$, the random batch approximation can still be beneficial.
We first consider a special case where the dependence on $\mu$ is linear, to investigate how the random batch approximation in the diffusion coefficient contributes to the error (system (4.3)). In particular, we investigate the bias introduced by the distribution dependence in the diffusion coefficient which, as we shall see, is already very different from [20] regarding the application of the RBM. Secondly, we consider drifts where the dependence on $\mu$ is nonlinear, to investigate the error introduced by the random batch approximation (system (4.4)). Here, $f: \mathbb{R}^d \to \mathbb{R}^m$ for some $m \ge 1$ is a nonlinear field. Lastly, we discuss more general cases; in principle, the analysis would be similar to the cases discussed. In this section, all the discussion of the random batch system is performed keeping time continuous. Combining with the truncated scheme of the previous section, one obtains the eventual method for numerical simulation.

The particle system and the random batch approximation
The interacting particle system (1.3) for (4.3) is given by (4.5), with initial values $X_0^i \in \mathbb{R}^d$ the same as in (1.3). The difference is whether we use $\frac{1}{N}\sum_j$ or $\frac{1}{N-1}\sum_{j \ne i}$ for the approximation. Note that there is no big difference regarding the propagation of chaos, while $1/(N-1)$ is more convenient in notation for the random batch system; we will therefore focus on (4.5). Similarly, for the case (4.4), the interacting particle system is given by (4.6). Recall that for the random batch approximation, we divide the $N = np$ particles into $n$ small batches of size $p \ge 2$ randomly. Let the random batches be $\mathcal{C}_q$, $q = 1, \ldots, n$; we use $\xi$ to represent the division into random batches, and let $q(i)$ be the index $q$ such that $i \in \mathcal{C}_q$. For the random batch approximation of (4.5), we generate two random divisions $\mathcal{A}$ and $\mathcal{B}$ at $t_m$ and run the corresponding equation (4.7), where $q_1(i)$ and $q_2(i)$ are used to distinguish the batchmates in the two different divisions $\mathcal{B}$ and $\mathcal{A}$. The initial values are $\tilde{X}_0^i = X_0^i$, the same as for the particle system (1.3). Correspondingly, the random batch approximation of (4.6) is (4.8), again with initial values $\tilde{X}_0^i = X_0^i$. Our experience tells us that if $p$ is not too small, there is no significant difference between taking $\mathcal{A}$ and $\mathcal{B}$ independent or identical. Hence, for convenience of notation and analysis, we will from here on always assume that $\mathcal{A} = \mathcal{B}$.
Namely, the random batches used for the drift and the diffusion are the same. To distinguish the batches on different time intervals, we denote the random division of batches at $t_m$ by $\xi_m$. Define the filtration $\{\mathcal{F}_m\}_{m \ge 0}$ containing the information of the batches up to $t_m$. Also, the $\sigma$-algebra generated by all the $\xi_m$ describes how the batches are generated at all time points.

The linear dependence case
In this section, we consider the special case where the dependence on the distribution is linear and analyze the random batch approximation (4.7). For this purpose, we need stronger assumptions on the coefficients than those in Assumption 2.3; these conditions are still non-globally Lipschitz, but the essential part is Lipschitz. Specifically, we investigate the problem under the following assumptions.

Assumption 4.1. $b: \mathbb{R}^d \to \mathbb{R}^d$ is one-sided Lipschitz, in the sense that $\langle x - y, b(x) - b(y)\rangle \le L|x - y|^2$ for some $L > 0$, and its derivatives have polynomial growth. $\tilde{b}: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^d$ and $\sigma: \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}^{d \times d'}$ are all Lipschitz continuous.
Remark 4.2. The coefficient $b(x, \mu) = \int \tilde{b}(x, y)\,\mu(\mathrm{d}y)$ could also have a single term depending only on $x$, like $b(x, \mu) = b_0(x) + \tilde{b}_1(x, \mu)$. In principle, $b_0$ does not have to be globally Lipschitz; we assume the Lipschitz continuity simply for the convenience of the discussion.

Proposition 4.3. Under Assumption 4.1, the moments of the particle system and of its random batch approximation are bounded: for any $q \ge 2$ with $\mu_0 \in \mathcal{P}_q$, $\sup_{0 \le t \le T} \mathbb{E}|X_t^{i,N}|^q \le C_{T,q}$ and, similarly, $\sup_{0 \le t \le T} \mathbb{E}|\tilde{X}_t^i|^q \le C_{T,q}$. In both estimates, the constant $C_{T,q}$ is independent of $N$ and of the sequence of batches.
Proof. We only show the moment control for $\tilde{X}^i$, because the moment control for $X^{i,N}$ is similar. The proof is essentially the same as that of Lemma 3.4 of [18]. Our approach is to show the boundedness of the moments for any given sequence of batches, to avoid the interplay between the randomness of the batches and the process $\tilde{X}^i$.
For $t \in [t_m, t_{m+1})$, by Itô's formula, one obtains a moment inequality. By Assumption 4.1, the one-sided Lipschitz drift term is controlled; the interaction term is controlled similarly. Again, using the Lipschitz continuity of $\sigma$ and Young's inequality, one gets the same bound as for the drift terms. Consequently, by gluing together all the estimates on the different subintervals of the form $[t_m, t_{m+1})$, one has a closed inequality for any $t$. Hence, defining the running supremum of the conditional moments, one has the integral inequality

Applying Grönwall's inequality yields the result for the conditional moments $\mathbb{E}[|\tilde{X}_t^i|^q]$.
Then, taking expectation over the random batches, the result for the moments also follows.
Below, for a given configuration $\mathbf{x} := (x^1, x^2, \cdots, x^N) \in \mathbb{R}^{Nd}$, define the random deviation in a random batch approximation with interaction kernel $\phi$:
$$\chi_i(\mathbf{x}; \xi) := \frac{1}{p-1}\sum_{j \in \mathcal{C}_{q(i)},\, j \ne i} \phi(x^i, x^j) - \frac{1}{N-1}\sum_{j \ne i} \phi(x^i, x^j). \tag{4.12}$$
The following lemma, shown in Lemma 3.1 of [20], gives the basic consistency and stability features of the RBM: for a fixed configuration, $\chi_i$ has mean zero over the random division, with a variance of order $1/(p-1)$. We remark that if $\phi$ is a matrix-valued function, the norm in this lemma is the Hilbert-Schmidt norm.

Let us briefly see how the random batch approximation in the diffusion coefficient affects the error. Define the error process $Z_t^i := \tilde{X}_t^i - X_t^{i,N}$. Intuitively, if $X_t^{i,N}$ is close to $\tilde{X}_t^i$, then the leading error would be $\mathrm{d}Z_t^i \sim \chi_i^b(\mathbf{X}; \xi)\,\mathrm{d}t + \chi_i^\sigma(\mathbf{X}; \xi)\,\mathrm{d}W_t^i$, where $\mathbf{X} := (X^1, X^2, \cdots, X^N) \in \mathbb{R}^{Nd}$. Then, by Itô's formula applied to $|Z_t^i|^2$: conditioning on $\mathcal{F}_m$, the expectation of $\chi_i$ is zero, so the term $Z_t^i \cdot \chi_i^b$ contributes a term that vanishes as $\Delta \to 0$. The second term, however, is of order $1/p$; this is the quadratic variation due to the Brownian motion, and it does not vanish as $\Delta \to 0$ if we fix $p$. Clearly, when there is distribution dependence in the diffusion coefficient, the effect of the random batch is very different from that discussed in [20].
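The consistency and stability statement of Lemma 4.4 — for a fixed configuration, the random batch deviation $\chi_i$ has mean zero but an $O(1)$ variance — can be checked directly by Monte Carlo. The bounded kernel below is a hypothetical choice:

```python
import numpy as np

rng = np.random.default_rng(3)
N, p, i = 256, 2, 0
x = rng.normal(size=N)                    # a fixed configuration
others = np.delete(np.arange(N), i)

def full_sum():
    """Full interaction (1/(N-1)) * sum_{j != i} phi(x_i, x_j)."""
    return np.sin(x[i] - x[others]).mean()

def batch_sum():
    """In-batch estimate: average phi over p-1 uniformly drawn batchmates,
    which is how a uniformly random division of batches looks from
    particle i's point of view."""
    mates = rng.choice(others, size=p - 1, replace=False)
    return np.sin(x[i] - x[mates]).mean()

chi = np.array([batch_sum() for _ in range(200000)]) - full_sum()
print(abs(chi.mean()))   # near 0: the estimator is unbiased
print(chi.var())         # O(1): the per-step deviation does not vanish
```

The sample mean of $\chi_i$ is statistically indistinguishable from zero, while its variance stays $O(1)$ in the step size; convergence therefore relies on the cancellation of these fluctuations over many steps, exactly as described above.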
The following result, Theorem 4.5, gives the basic estimate for the approximation error. Clearly, when $\sigma(x, \mu) \equiv \sigma(x)$, the result (4.16) is the same as in [20], though there boundedness of the kernel $\phi(x, y)$ was assumed. For general $\sigma$ that may depend on $\mu$, a fixed batch size $p$ clearly would not give convergence; instead, we need to take $p$ large to get convergence.

Corollary 4.6. If one takes $p = \min(\Delta^{-\gamma}, N)$ with $\gamma \le 1$, which stays finite as $N \to \infty$, the error is controlled accordingly.

Similarly, for any particle $i$, the conditional expectation over the batch division is the same; here, we make use of the symmetry. The point is that, no matter what its batchmates are, this conditional expectation is the same because the joint distribution of the particles is symmetric at $t_m$. Hence, the first claim follows. The second claim is nearly the same; the only difference is that the batch $\mathcal{C}_{q(i)}$ used will vary. Let $\mathbf{1}_{ii'}$ be the indicator that particles $i$ and $i'$ are in the same batch. Then, applying Lemma 3.2 of [21] gives the result.
Remark 4.8. This lemma is different from Lemma 3.4 of [21], where the summands are required to be independent of the random batches. Here, we do not require the independence, but use the symmetry. Nevertheless, the goal is the same: to deal with the random sum.
We rearrange the terms in the equation for the error $Z_t^i$ accordingly. Now, we can prove the main result.
Proof of Theorem 4.5. For $t \in [t_m, t_{m+1})$, by Itô's formula, one obtains a decomposition into the terms $J_1$, $J_2$ and $J_3$. For the $J_1$ term, using the one-sided Lipschitz condition on $b$ and the Lipschitz continuity of $\tilde{b}$, one gets a direct bound. The $J_3$ term can be estimated similarly; here, $J_{32}$ is the main local error term arising from $J_3$, which will not vanish as $\Delta \to 0$. The $J_{31}$ term can be estimated using the Lipschitz continuity of $\sigma$. Now, for $J_2$, due to Lemma 4.4 and the Lipschitz continuity of $\sigma$, we break this term into the following.
The terms around the brackets on the right-hand side are called $J_{2,\ell}$, $\ell = 0, 1, 2, 3$, respectively, for convenience of notation. We leave $J_{20}$ as it is. Consider $J_{21}$ first: the first term in $J_{21}$ can be controlled using Proposition 4.3, and the second term in $J_{21}$ can be estimated similarly. The $J_{22}$ term can be estimated using the Lipschitz continuity of $\sigma$, Proposition 4.3 and Itô's isometry. The $J_{23}$ term contains essentially the local error term of the random batch approximation arising from $J_2$; we may expect this to tend to zero as $\Delta \to 0$, and it can be estimated simply. Combining all these estimates, we obtain an integral inequality whose double integrals can be estimated directly. Due to Lemma 4.4, we note that the batch $\mathcal{C}_{q(i)}$ is not independent of $|Z_t^i|$. To treat this, [20, 21] considered the difference $|Z_t^i| - |Z_{t_m}^i|$; here, we make use of the symmetry instead. By Lemma 4.7, taking expectation on both sides, and noting that $\mathbf{X}(t_m)$ is independent of the random batches at $t_m$, the estimate can be closed. Finally, by iteration, one obtains a recursive bound, and applying Grönwall's inequality (Thm. 1 in [12]) yields the result, as $(m+1)\Delta^3 \le T\Delta^2$.

The nonlinear case
In this subsection, we consider the analysis of the random batch approximation (4.8) for the nonlinear dependence case.
We will establish the approximation error estimate under the following assumption. For the process $\tilde{X}^i$, conditioning on the sequence of random batches, one has uniform moment bounds as in Proposition 4.3, where $C_{T,q}$ is independent of $N$ and of the sequence of batches.
The proof is essentially the same and we omit it. Moreover, the symmetry result, Lemma 4.7, also holds in this case.
Below, let us focus on the error process $Z_t^i = \tilde{X}_t^i - X_t^{i,N}$. We write the error equation and Taylor-expand the nonlinearity $f$ around the full average, where $\theta$ is a constant in $(0, 1)$. As one may expect, the linear term is fine and can be estimated similarly as in [20, 21]. The main difference is the quadratic term. In fact, this term will be of the order of the variance, so it contributes a term of order $1/p^2$ to the mean square error (thus $O(1/p)$ to the error).
Proof. By Itô's formula, one has, for $t \in [t_m, t_{m+1})$, a decomposition. By Assumption 4.9, $J_1 + J_3$ can be controlled directly. Consider the $J_2$ term. The second term in $J_2$ is bounded simply; for the first term in $J_2$, one may again decompose $\tilde{X}_t = \tilde{X}_{t_m} + (\tilde{X}_t - \tilde{X}_{t_m})$. The treatment of $\tilde{X}_t - \tilde{X}_{t_m}$ is tedious but similar to the proof of Theorem 4.5; we omit the details. Eventually, taking expectation, and using Lemma 4.4, the symmetry (Lemma 4.7 also holds for this case, with the same proof) and Lemma 4.11, the estimate can be closed. Then, similarly to the proof of Theorem 4.5, one iterates first and applies Grönwall's inequality to obtain the final estimate.
Clearly, the error introduced by the nonlinearity is smaller than the one introduced by the random batch approximation in the noise term.

Discussion on more general cases
In this section, we discuss more general cases. In fact, the distribution dependence in the previous sections does not cover the cases where several statistical quantities are involved in the coefficients. For example, if a coefficient depends on the variance, then both the first and second moments are involved. Such cases could be written as $b(x, \mu) = b_0(x) + \cdots$, with several statistics of $\mu$ entering the coefficients. Here, the coefficient functions are all assumed to be Lipschitz and to have bounded derivatives; $b_0$ is allowed to be only one-sided Lipschitz.
The random batch approximation for such systems can be performed similarly. There is no big difference in the error estimate, though the details could be tedious. One finds that the strong error is again of the same form. In other words, if there is law dependence in the coefficient of the noise, the mean square error behaves like $O(1/p)$; the mean error arising from the nonlinearity in the drift behaves like $O(1/p^2)$. These do not vanish as the step size tends to zero. However, the estimates are independent of $N$, so the method can still save time if the required $N$ is large. In the special case when the diffusion coefficient $\sigma$ is independent of $\mu$ and the drift depends linearly on $\mu$, the RBM approximation has an error that vanishes in the $\Delta \to 0$ limit, which is exactly the case studied in [20].

We fix the time step size $\Delta = 2^{-7}$. The 'TEM' and the 'TEMwRBM' each run 1000 sample paths to obtain the average time consumption. Upon implementing the random batch technique, we can approximate the error, as stated in (4.16), using $\sqrt{1/p} \sim \epsilon$ for a given tolerance $\epsilon$. In this scenario, $p = O(1/\epsilon^2)$ remains fixed, resulting in a cost per iteration of $O(Np)$. However, when $p = 1/\Delta^\gamma$ is also required, the cost per iteration increases as the value of $\gamma$ grows. On the other hand, the computational cost presented in Table 1 corresponds to a fixed terminal time. Notably, adjusting $\gamma$ is equivalent to modifying $\Delta$ such that $\Delta = O(\epsilon^{2/\gamma})$. Increasing the value of $\gamma$ results in a larger step size, consequently reducing the number of iterations. Thus, an optimal value of $\gamma$ exists in terms of the overall computational cost. However, we have yet to fully understand this phenomenon, and further investigation would be both intriguing and worthwhile.
The conclusion (4.16) implies that we can obtain first-order convergence when approximating the interacting particle system associated with (5.3) by the random batch technique with $\theta = 1$ (i.e., batch size $p = 1/\Delta t$). At the same time, the strong convergence order of the underlying numerical method should be one. Therefore we employ the truncated Milstein scheme [14] to verify the first claim of Corollary 4.6. The numerical results illustrated in Figure 3 confirm our theoretical result, where a dashed red line with a slope of one is provided for reference. Moreover, we compared the results for varying batch sizes: taking $p = 1/\Delta t^{\theta}$ with $\theta = 1, 1/2, 1/3$, as well as a fixed batch size $p = 2$, the behavior shown in Figure 3 agrees with the theory in Corollary 4.6.
Example 5.3. For nonlinear cases, we consider the following SDE, where $c_1$, $c_2$ and $c_3$ are positive constants, together with the corresponding system of interacting particles. We set the parameters $X_0^i = 1$, $c_1 = 2.5$, $c_2 = 1$, $c_3 = 1$ for all $i \in \{1, \dots, N\}$, choose the particle number $N = 2^{12}$, and compute 1000 sample paths to estimate the mean square error at $T = 1$ with four different step sizes $\Delta t = 2^{k}$ for $-10 \le k \le -7$. Then we examine three distinct batch sizes $p = 1/\Delta t^{\theta}$. The numerical errors are illustrated in Figure 4, where three dashed red lines with slopes of $1/2$, $1/4$ and $1/6$ are provided as reference. These results align with the expected strong order of convergence stated in Theorem 4.12.
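The slopes in Figures 3–5 can be read off numerically by a least-squares fit of $\log(\text{error})$ against $\log \Delta t$. The helper below is a generic sketch of that diagnostic; the synthetic data are illustrative, not the paper's measured errors.

```python
import numpy as np

def empirical_order(dts, errors):
    """Least-squares slope of log(error) vs log(dt), i.e. the observed
    convergence order from errors measured at several step sizes."""
    slope, _ = np.polyfit(np.log(dts), np.log(errors), 1)
    return slope

# synthetic check: errors behaving like C * dt^(1/2) give slope 1/2
dts = np.array([2.0**k for k in range(-10, -6)])
order = empirical_order(dts, 3.0 * dts**0.5)
```

In the experiments the fitted slope is then compared against the reference slopes $\theta/2$ drawn as dashed lines.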
Example 5.4. In this example, we consider a three-dimensional McKean-Vlasov SDE model employed to describe neuron activity [1,10]. This is a stochastic model describing the firing dynamics of a network of spiking neurons. The integral terms in the drift and diffusion capture the influence of the network's collective behavior, commonly referred to as the mean-field limit. This model is important for understanding how neurons interact and communicate with each other in the brain, and its stochastic nature accounts for the inherent randomness observed in the behavior of individual neurons, as discussed in [1]. We choose the particle number $N = 2^{8}$ and compute 1000 sample paths to estimate the mean square error at $T = 1$ with four different step sizes $\Delta t = 2^{k}$ for $-8 \le k \le -5$. Here we choose the tamed Euler scheme with time step size $\Delta t = 2^{-13}$ and the same number of particles $N = 2^{8}$ to produce a reference solution.
In Figure 5, we examine three distinct batch sizes $p = 1/\Delta t^{\theta}$, where three dashed red lines with slopes of $1/2$, $1/4$ and $1/6$ are provided as reference. The numerical results roughly agree with the theory, with slightly larger convergence rates than predicted over the range observed. The error bars in the aforementioned figures indicate that the variance of the error data computed by our method is small.

Appendix A. Proof of Proposition 2.4

Q. GUO ET AL.

Proof. Here, we only illustrate the moment control for the particle system; the other case is similar and simpler. For (2.5), using Itô's formula,
Case 4. Three indices are the same. There are $4(p-1)(p-2)$ such terms. This situation can be treated similarly to Case 3 and is controlled by $m_3 m_1 / p^2$.

Case 5. All indices are the same. There are $p-1$ such terms. This is controlled by $m_4 / p^3$.
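The counts in Cases 4 and 5 can be checked by brute force: enumerate quadruples of indices drawn from a set of size $p-1$ and classify their coincidence patterns. The snippet below is an independent sanity check of those two counts, not part of the proof.

```python
from itertools import product

def coincidence_counts(n):
    """Among all quadruples (j, k, l, m) with entries in {0, ..., n-1},
    count those with exactly three equal entries and those with all
    four entries equal."""
    three_same = all_same = 0
    for q in product(range(n), repeat=4):
        pattern = sorted(q.count(v) for v in set(q))
        if pattern == [1, 3]:
            three_same += 1
        elif pattern == [4]:
            all_same += 1
    return three_same, all_same

# with n = p - 1 admissible indices, Case 4 has 4(p-1)(p-2) terms
# and Case 5 has p - 1 terms
for p in (3, 4, 5, 6):
    n = p - 1
    assert coincidence_counts(n) == (4 * n * (n - 1), n)
```

The factor $4$ in Case 4 is the number of ways to choose which one of the four indices differs from the repeated triple.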
These then add up to the estimate in the lemma.

Figure 1. The convergence order of the truncated EM method applied to the interacting particle system related to (5.1) without using the random batch technique. The error bars indicate the 95% confidence interval.

Figure 2. Convergence order of the truncated EM method applied to the interacting particle system related to (5.1) with the random batch technique for $\theta = 1, 1/2, 1/3$. The error bars indicate the 95% confidence interval.

Figure 4. The convergence order of the truncated Milstein method with the random batch technique for $\theta = 1, 1/2, 1/3$ in nonlinear cases. The error bars indicate the 95% confidence interval.

Figure 5. Convergence order of the truncated EM method applied to the interacting particle system associated with (5.5) with the random batch technique for $\theta = 1, 1/2, 1/3$. The error bars indicate the 95% confidence interval.
almost surely for any $t \in [0, T]$ and $i \in \{1, \dots, N\}$. Here, $W^i_t$ is the same Wiener process as in (1.3) and $X^i_0 \in \mathbb{R}^d$ are the initial values of the SDEs. Clearly, the $\bar X^i_t$ are i.i.d. with law $\mathcal{L}(X_t)$ given by the McKean-Vlasov SDE (1.1). For the convenience of the discussion, we denote the $q$-th moment of the law for $q \ge 1$ by $m_q$. The last inequality is due to Young's inequality.
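For completeness, the form of Young's inequality invoked here is presumably the standard one for conjugate exponents, together with its weighted variant:

```latex
ab \;\le\; \frac{a^{r}}{r} + \frac{b^{s}}{s},
\qquad a, b \ge 0,\quad r, s > 1,\quad \frac{1}{r} + \frac{1}{s} = 1,
```

and, for any $\varepsilon > 0$, $ab \le \varepsilon a^{r} + C(\varepsilon)\, b^{s}$ with $C(\varepsilon) = (\varepsilon r)^{-s/r} s^{-1}$, which is the usual tool for absorbing cross terms into moment bounds.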