Non-asymptotic error bounds for the multilevel Monte Carlo Euler method applied to SDEs with constant diffusion coefficient

In this paper, we are interested in deriving non-asymptotic error bounds for the multilevel Monte Carlo method. As a first step, we deal with the explicit Euler discretization of stochastic differential equations with a constant diffusion coefficient. We prove that, as long as the deviation is below an explicit threshold, a Gaussian-type concentration inequality optimal in terms of the variance holds for the multilevel estimator. To do so, we use the Clark-Ocone representation formula and derive bounds for the moment generating functions of the squared difference between a crude Euler scheme and a finer one and of the squared difference of their Malliavin derivatives.


1. Introduction
We are interested in deriving non-asymptotic error estimates for the multilevel Monte Carlo estimators introduced by Giles [3]. In this paper, as a first step, we deal with estimators of E[f(X_T)] where f : R^d → R is Lipschitz continuous with constant [ḟ]_∞, T ∈ (0, +∞) is a deterministic time horizon and X := (X_t)_{0≤t≤T} is the R^d-valued solution to the stochastic differential equation with additive noise

dX_t = b(X_t) dt + dW_t, X_0 = x_0.    (1.1)

When d = 1, this additive noise setting is not restrictive. Indeed, any stochastic differential equation dY_t = σ(Y_t) dW_t + η(Y_t) dt with multiplicative noise given by some function σ : R → R*_+ such that 1/σ is locally integrable can be reduced to (1.1) by the Lamperti transformation: for ϕ(y) = ∫_{y_0}^y dz/σ(z), X_t = ϕ(Y_t) solves (1.1) with b(x) = (η/σ − σ′/2)(ϕ^{−1}(x)). For n ∈ N*, we consider the simple Euler-Maruyama approximation X^n with time step T/n and we introduce its continuous version given by

X^n_t = X^n_{η_n(t)} + b(X^n_{η_n(t)})(t − η_n(t)) + W_t − W_{η_n(t)}, where η_n(t) = ⌊nt/T⌋ T/n.    (1.2)

When b is smooth, both the strong and the weak errors of this scheme converge to 0 with order 1 as n → ∞. According to Giles [3], the time complexity for the multilevel Monte Carlo estimator of E[f(X_T)] based on this scheme to achieve a root mean square error ε is O(ε^{−2}) in the limit ε → 0, the same as in a standard Monte Carlo method with i.i.d. unbiased samples. For positive integers m and L and sample sizes (N_ℓ)_{0≤ℓ≤L}, the multilevel Monte Carlo method approximates the expectation of interest E[f(X_T)] by

Q = (1/N_0) Σ_{k=1}^{N_0} f(X^1_{T,k}) + Σ_{ℓ=1}^{L} (1/N_ℓ) Σ_{k=1}^{N_ℓ} ( f(X^{m^ℓ}_{T,k}) − f(X^{m^{ℓ−1}}_{T,k}) ).    (1.3)

The processes ((X^{m^ℓ}_{t,k})_{0≤t≤T})_k denote independent copies of the Euler scheme with time step m^{−ℓ}T for ℓ ∈ {0, ..., L}. Here, it is important to point out that all these L + 1 Monte Carlo estimators have to be based on different, independent samples. However, for fixed k and ℓ, the simulations f(X^{m^ℓ}_{T,k}) and f(X^{m^{ℓ−1}}_{T,k}) have to be based on the same Brownian path but with different time steps m^{−ℓ}T and m^{−(ℓ−1)}T.
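To fix ideas, here is a minimal Python sketch of the estimator (1.3) in dimension d = 1. The coupling of the two schemes within a level through a shared Brownian path follows the description above; the drift b, the test function f and all numerical parameters are illustrative choices of ours, not taken from the analysis below.

```python
# Minimal sketch of the MLMC Euler estimator (1.3) for E[f(X_T)] with
# dX_t = b(X_t) dt + dW_t, d = 1. Drift, test function and parameters
# are illustrative assumptions, not the paper's.
import numpy as np

rng = np.random.default_rng(0)

def coupled_level(b, f, x0, T, m, ell, N):
    """Mean of f(X^{m^ell}_T) - f(X^{m^(ell-1)}_T) over N coupled paths.

    Fine and coarse Euler schemes are driven by the SAME Brownian path:
    each coarse increment is the sum of m consecutive fine increments.
    """
    nf = m ** ell                 # number of fine steps
    hf = T / nf
    dW = rng.normal(0.0, np.sqrt(hf), size=(N, nf))
    xf = np.full(N, x0)
    for j in range(nf):           # fine Euler scheme
        xf = xf + b(xf) * hf + dW[:, j]
    if ell == 0:
        return f(xf).mean()       # coarsest level: plain Monte Carlo mean
    nc = m ** (ell - 1)
    hc = T / nc
    dWc = dW.reshape(N, nc, m).sum(axis=2)   # coarse Brownian increments
    xc = np.full(N, x0)
    for j in range(nc):           # coarse Euler scheme, same Brownian path
        xc = xc + b(xc) * hc + dWc[:, j]
    return (f(xf) - f(xc)).mean()

b = lambda x: np.sin(x)           # smooth Lipschitz drift (illustrative)
f = lambda x: np.abs(x)           # Lipschitz test function (illustrative)
m, L, x0, T = 2, 5, 1.0, 1.0
N = [2 ** (14 - ell) for ell in range(L + 1)]  # ad-hoc sample sizes
Q = sum(coupled_level(b, f, x0, T, m, ell, N[ell]) for ell in range(L + 1))
print("MLMC estimate of E[f(X_T)]:", Q)
```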
Our main motivation is the derivation of Gaussian-type concentration inequalities for Q − E[f(X_T)], a natural question which, to our knowledge, has not been addressed in the literature. Recently, Frikha and Menozzi [2] derived concentration inequalities for f(X^n_T) − E[f(X^n_T)], which appears in the classical Monte Carlo method. However, estimating the moment generating function of the differences f(X^{mn}_T) − f(X^n_T) is a much more delicate task and adapting their approach seems to be problematic. Nevertheless, the boundedness of the Malliavin derivatives DX^n_T and DX^{mn}_T in the additive noise setting makes it possible to follow the approach of Houdré and Privault [5] based on the Clark-Ocone formula, and this is one reason why we focus on this setting. Another reason is that for stochastic differential equations with multiplicative noise, more sophisticated schemes, like the Milstein scheme in the commutative case or the Giles and Szpruch [4] scheme in the general case, are necessary to improve from one to two the order of convergence of the variance of f(X^{mn}_T) − f(X^n_T) and recover the unbiased Monte Carlo complexity.
In Section 2, when b is C², Lipschitz continuous and the Laplacians of its coordinates have an affine growth, we first derive non-asymptotic estimates of the squared error E[(Q − E[f(X_T)])²] of the multilevel Monte Carlo estimator (1.3) for a Lipschitz continuous test function f by computing explicit bounds for the bias E[f(X^{m^L}_T) − f(X_T)] and the variance Var[f(X^{m^ℓ}_T) − f(X^{m^{ℓ−1}}_T)]. Then we optimize the parameters (L, (N_ℓ)_{0≤ℓ≤L}) in order to minimize the computation cost needed to achieve a root mean square error smaller than a given precision ε. It turns out that, as ε → 0, the optimal bias is of order O(ε^{4/3}), which, to our knowledge, has not been pointed out in the MLMC literature so far. Notice that, for stochastic differential equations with a non-constant diffusion coefficient (multiplicative noise), this property remains true for the multilevel Monte Carlo estimator based on the Giles and Szpruch scheme [4], since it exhibits the same orders of convergence of the bias and the variance within a given level as (1.3).
In Section 3, we state and derive our main result, a Gaussian-type concentration inequality for (1.3). Using the Clark-Ocone formula as suggested in [5], we see that, to estimate the moment generating function of the centered estimator, it is enough to estimate the moment generating functions of the squared difference between the crude Euler scheme with n steps and the finer one with mn steps, and of the squared difference of their Malliavin derivatives. We state such estimates, which are respectively proved in Sections 5 and 6 by developing a clever decomposition of the difference between the two schemes that is of independent interest.
Notations. Throughout this paper, we shall use the following notations.
• We denote by C^∞_p(R^d, R^q) the set of all infinitely differentiable functions g : R^d → R^q such that g and all of its partial derivatives have at most polynomial growth.
• For n ∈ N*, we denote by C^n(R^d, R^q) the set of all n times continuously differentiable functions g : R^d → R^q.
• For g : R^d → R^d, we denote by ∇g the Jacobian matrix defined for all i, j ∈ {1, ..., d} by (∇g)_{i,j} = ∂g_i/∂x_j.
• The Euclidean inner product and the associated norm on R^d are respectively denoted by x · y and |x|; for a matrix M ∈ M_d, we write ∥M∥ = sup_{|x|≤1} |Mx|.
• For any adapted R^d-valued process (H_t)_{0≤t≤T} and M_d-valued process (M(t))_{0≤t≤T}, we use the natural notations for the associated stochastic integrals, where for A ∈ M_d, A^⊤ and Tr[A] denote respectively the transpose and the trace of the matrix A.

2. Non-asymptotic mean square error of the multilevel Monte Carlo estimator

2.1. Assumptions and strong error analysis. It is well known that under assumption (H_GL) we have

E[ sup_{0≤t≤T} |X_t − X^n_t|^p ] ≤ K_p(T) / n^{p/2}.

Moreover, since the diffusion coefficient is constant, the Euler scheme coincides with the Milstein scheme, and if b belongs to C²(R^d, R^d) with bounded derivatives, then the strong error estimate improves to

E[ sup_{0≤t≤T} |X_t − X^n_t|^p ] ≤ K_p(T) / n^p, with K_p(T) < ∞

(see for instance [6]). In order to get a non-asymptotic control of the bias and the variance of the multilevel Monte Carlo estimator, we are now going to state an explicit bound for the terminal quadratic strong error E[|X^{mn}_T − X^n_T|²] for (n, m) ∈ N* × N (with the convention X^{mn} = X for m = ∞) under the following assumption (R1). The constancy of the diffusion coefficient ensures that the bias can be estimated with the right order of convergence using this strong error analysis instead of the more complicated weak error analysis. In the resulting bound (Proposition 2.1), K_{2,m} is defined like K_{1,m} but with C_{(4.9)} + √T replacing C_{(4.9)}. In these estimates (and in the remainder of the paper), m only appears through ratios which have a limit as m → ∞, and when m = ∞ we consider that they are equal to this limit. The proof is postponed to Section 4.
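The order-one strong convergence specific to the constant diffusion setting is easy to observe numerically. The following sketch (ours; illustrative drift and parameters) estimates E[|X^{mn}_T − X^n_T|²] by coupling the two Euler schemes on nested grids, so one can check the n^{−2} decay.

```python
# Numerical check (ours, not from the paper) that with constant diffusion
# the terminal strong error E|X^{mn}_T - X^n_T|^2 decays like n^{-2},
# i.e. strong order 1, using coupled Euler schemes on nested grids.
import numpy as np

rng = np.random.default_rng(1)
b = lambda x: np.sin(x)          # illustrative C^2 Lipschitz drift
T, x0, m, N = 1.0, 1.0, 2, 200_000

for n in (4, 8, 16, 32):
    hf = T / (m * n)
    dW = rng.normal(0.0, np.sqrt(hf), size=(N, m * n))
    xf = np.full(N, x0)
    for j in range(m * n):       # fine scheme with mn steps
        xf = xf + b(xf) * hf + dW[:, j]
    hc = T / n
    dWc = dW.reshape(N, n, m).sum(axis=2)
    xc = np.full(N, x0)
    for j in range(n):           # coarse scheme, same Brownian path
        xc = xc + b(xc) * hc + dWc[:, j]
    print(n, ((xf - xc) ** 2).mean())   # shrinks roughly like n^{-2}
```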

2.2. MLMC parameters optimization revisited. In what follows, let us assume that assumption (R1) is in force.
Taking expectations in (1.3) leads to a telescoping summation, so that the bias reduces to E[f(X^{m^L}_T)] − E[f(X_T)], and we used Proposition 2.1 for the last inequality. On the other hand, the variance within each level is controlled, again by Proposition 2.1. Last, as X^1_T ∼ N(x_0 + b(x_0)T, T I_d), we use the logarithmic Sobolev inequality for the Gaussian measure and Herbst's argument (see e.g. Propositions 5.5.1 and 5.4.1 in [1]) to get the Gaussian concentration bound (2.2) for the coarsest level. By performing Taylor expansions as λ → 0, we easily deduce a bound on its variance. As a consequence, the following non-asymptotic estimate of the mean square error of Q holds.
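The Gaussian concentration step can be illustrated empirically. The sketch below (ours) checks a bound of the type provided by the log-Sobolev inequality and Herbst's argument, P(f(Z) − E[f(Z)] ≥ α) ≤ exp(−α²/(2T[ḟ]²_∞)) for Z ∼ N(0, T I_d), on the 1-Lipschitz function f(z) = |z|.

```python
# Empirical illustration (ours) of the Gaussian concentration bound (2.2):
# if Z ~ N(mu, T I_d) and f is Lipschitz with constant Lf, then
# P(f(Z) - E f(Z) >= a) <= exp(-a^2 / (2 T Lf^2)).
import numpy as np

rng = np.random.default_rng(2)
T, d, Lf = 1.0, 3, 1.0
Z = rng.normal(0.0, np.sqrt(T), size=(1_000_000, d))
fZ = np.linalg.norm(Z, axis=1)        # 1-Lipschitz for the Euclidean norm
fZ -= fZ.mean()                        # center empirically
for a in (0.5, 1.0, 2.0):
    emp = (fZ >= a).mean()
    print(a, emp, np.exp(-a ** 2 / (2 * T * Lf ** 2)))  # empirical <= bound
```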
Proposition 2.2. Under (R1), the mean square error E[(Q − E[f(X_T)])²] satisfies an explicit non-asymptotic bound. According to the above proposition, to achieve a root mean square error ε > 0, one should choose L subject to the constraint (2.3) below. For such a choice, one should then choose (N_ℓ)_{0≤ℓ≤L} satisfying the variance constraint (2.4) while minimizing the computation cost, which is equal to N_0 + Σ_{ℓ=1}^L N_ℓ (m+1) m^{ℓ−1}. Note that for ℓ ∈ {1, ..., L}, (m+1)m^{ℓ−1} = m^ℓ + m^{ℓ−1} is the number of grid values of the Euler schemes which are computed for each Brownian path at level ℓ. This constrained minimization problem leads to an explicit allocation, in which the total number N of simulations is chosen in order to achieve equality in (2.4).
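The Lagrange-multiplier computation behind this allocation can be sketched as follows. The even split of the squared precision between bias and variance, the variance model V_ℓ ≈ c m^{−2ℓ} (order two, consistent with Proposition 2.1) and the constant c are simplifying assumptions of ours, for illustration only.

```python
# Sketch of the constrained minimization behind (2.4): minimize the cost
# sum_l N_l C_l subject to sum_l V_l / N_l <= eps^2 / 2, which gives
# N_l proportional to sqrt(V_l / C_l) (Lagrange multipliers).
import math

def allocate(m, L, eps, c=1.0):
    C = [1.0] + [(m + 1) * m ** (l - 1) for l in range(1, L + 1)]  # cost/path
    V = [c * m ** (-2 * l) for l in range(L + 1)]   # assumed variance model
    S = sum(math.sqrt(v * cl) for v, cl in zip(V, C))
    # equality in the variance constraint: sum_l V_l / N_l = eps^2 / 2
    return [math.ceil(2 * S * math.sqrt(v / cl) / eps ** 2)
            for v, cl in zip(V, C)]

for L in (3, 5):
    print(L, allocate(m=2, L=L, eps=1e-2))
```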
Then the computation cost is given by Cost(m, m^{−L}) for an explicit function Cost.
Notice that, for fixed m, Cost(m, x) is, up to some positive multiplicative factor not depending on x, equal to a function g(x). We thus want to find L ∈ N minimizing g(m^{−L}) under the constraint (2.3), which writes m^{−L} < β_ε. We have that either g is increasing on (0, 1 ∧ β_ε], or there exists x*_ε such that g is decreasing on (0, x*_ε) and increasing on (x*_ε, 1 ∧ β_ε), and the value of L solving the constrained minimization problem belongs to {⌊−ln x*_ε / ln m⌋, ⌈−ln x*_ε / ln m⌉}. We denote by L_ε this optimal value of L, by N_ε (resp. N^ε_ℓ) the corresponding total number of samples (resp. number of samples in level ℓ) and by Q_ε the multilevel Monte Carlo estimator (1.3) with those optimal parameters. For ε small enough, we easily deduce that, as expected from [3], Cost(m, m^{−L_ε}) = O(ε^{−2}) as ε → 0 for fixed m, and N_ε = O(ε^{−2}). One could also consider minimizing m ↦ Cost(m, L_ε(m)) numerically, where we used the notation L_ε(m) to make the dependence on m explicit.
Remark 2.1. For this optimal choice, which clearly differs from the one in [3], the bias of the multilevel Monte Carlo method is not of the same order of magnitude as the precision ε but much smaller. To the best of our knowledge, such a 4/3 order of convergence of the bias does not appear in the existing multilevel Monte Carlo literature. Notice that, for stochastic differential equations with a non-constant diffusion coefficient (multiplicative noise), this property remains true for the multilevel Monte Carlo estimator based on the Giles and Szpruch scheme [4], since it exhibits the same orders of convergence of the bias and the variance within a given level as (1.3).

3. Concentration bounds for the multilevel Monte Carlo Euler method
3.1. Basic facts on Malliavin calculus. In this work, we follow the notations, definitions and results of [7]. Let (W_t)_{0≤t≤T} be a d-dimensional Brownian motion defined on the filtered probability space (Ω, F, (F_t)_{t≥0}, P). Let D denote the Malliavin derivative operator taking values in the real separable Hilbert space H = L²([0, T]; R^d), where for h ∈ H, h_i denotes the i-th coordinate of h. The operator D is closable as an operator from L^p(Ω) to L^p(Ω; H), for any p ≥ 1. Its domain with respect to the norm ∥·∥_{1,p} is denoted by D^{1,p}. We now state some essential properties which are going to be useful in the sequel.

Proposition 3.1 (Chain rule). Let φ ∈ C¹(R^q, R) with bounded first order derivatives and F = (F_1, ..., F_q) be an R^q-valued random vector with F_k ∈ D^{1,p} for k = 1, ..., q. Then φ(F) ∈ D^{1,p} and, for each i = 1, ..., d,

D^i_r φ(F) = Σ_{k=1}^q ∂_{x_k} φ(F) D^i_r F_k.

Proposition 3.2 (Clark-Ocone formula). Let F be an F_T-measurable random variable that belongs to D^{1,p} for some p ≥ 1. Then

F = E[F] + ∫_0^T E[D_t F | F_t] · dW_t.

A preliminary essential result (Lemma 3.1) concerns the boundedness of the Malliavin derivatives of the diffusion and of its Euler scheme (1.2).

Proof. Under our assumptions, for any t ∈ [0, T], the random variables X_t, X^n_t belong to D^{1,∞} (see [7], Section 2.2). The estimate of the Malliavin derivative of X_t is straightforward. We only give a proof for the estimate of the Malliavin derivative of the Euler scheme. For r, t ∈ [0, T], one differentiates the scheme along the canonical basis (e_j)_{j=1,...,d} of R^d, which yields a linear recursion for D^j_r X^n. Using the boundedness of the first order derivatives of b, we deduce the announced bound for t ≥ ⌈rn/T⌉ T/n.
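The Clark-Ocone formula of Proposition 3.2 can be checked numerically on a toy example (ours): for F = W_T² in dimension 1, D_t F = 2W_T, so E[D_t F | F_t] = 2W_t and the formula reads W_T² = T + ∫_0^T 2W_t dW_t.

```python
# Numerical illustration (ours) of the Clark-Ocone formula
# F = E[F] + int_0^T E[D_t F | F_t] dW_t for the toy choice F = W_T^2:
# D_t F = 2 W_T, E[D_t F | F_t] = 2 W_t, so W_T^2 = T + int_0^T 2 W_t dW_t.
import numpy as np

rng = np.random.default_rng(3)
T, n, N = 1.0, 200, 10_000
dW = rng.normal(0.0, np.sqrt(T / n), size=(N, n))
W = np.cumsum(dW, axis=1)
WT = W[:, -1]
W_left = np.hstack([np.zeros((N, 1)), W[:, :-1]])  # left-point evaluation
ito = (2 * W_left * dW).sum(axis=1)                # Ito sum for int 2 W dW
print(np.abs(WT ** 2 - (T + ito)).mean())          # O(n^{-1/2}) error
```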

3.2. Concentration bounds. The main result of this paper is a concentration inequality for the multilevel Monte Carlo estimator Q defined in (1.3). To prove this result, we are going to estimate the moment generating function of each centered level Q_ℓ, where Q_0 := (1/N_0) Σ_{k=1}^{N_0} f(X^1_{T,k}). For λ ∈ R, by independence, the moment generating function of Q factorizes over the levels, and we used the Gaussian concentration bound (2.2) to get the last inequality.
Remark 3.1. If, on the coarsest level, instead of 1 time step we choose n ∈ N* steps, then one can use Section 4 of [2] to bound E[exp(λQ_0)] from above. More precisely, one takes advantage of the constant diffusion coefficient to improve the bound in their Proposition 4.
Now, for ℓ ∈ {1, ..., L}, we set n ∈ N* and define Υ as the centered difference between the fine and the crude schemes within the level. For λ ∈ R, we want to obtain an estimate of E[exp(λΥ)] of the form exp(Cλ²/n²), where C is an explicit constant and 1/n² is the order of the variance of the centered random variable Υ according to Proposition 2.1. To do so, we assume that f ∈ C¹(R^d, R) with ∇f bounded, and, for j ∈ {1, ..., d}, the j-th component of the d-dimensional vector K_r is given by K_{r,j} := D^j_r f(X^{mn}_T) − D^j_r f(X^n_T). For p ∈ (0, 1), we use Hölder's inequality. Now, by the Malliavin chain rule, D^i_r f(X^n_T) = Σ_{j=1}^d ∂_{x_j} f(X^n_T) D^i_r X^n_{T,j}, where D_r X^n_T = (D^i_r X^n_{T,j})_{1≤i,j≤d} ∈ R^{d×d}. According to Lemma 3.1 and under our assumption on the boundedness of ∇f, we easily check that sup_{r∈[0,T]} |K_r| is bounded and that the relevant conditional expectation process is a martingale, which together with the choice p = 1/2, which minimizes 1/(p(1−p)), leads us to the desired bound. Applying Jensen's inequality twice, and now denoting by p a measurable positive function such that ∫_0^T p(r) dr = 1, we obtain an estimate in terms of the moment generating function of |K_r|², which we now want to control. We are first going to derive an exponential-type upper bound with the optimal rate of convergence for the moment generating function of the square of the error U_T = X^{mn}_T − X^n_T between the Euler schemes with n and mn steps.
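As a quick sanity check (ours) of the exponent choice above: p = 1/2 indeed minimizes p ↦ 1/(p(1−p)) on (0, 1), with minimal value 4.

```python
# Verify numerically that 1/(p(1-p)) is minimized at p = 1/2 on (0, 1).
import numpy as np

p = np.linspace(0.01, 0.99, 9_999)
vals = 1.0 / (p * (1.0 - p))
print(p[np.argmin(vals)], vals.min())   # -> 0.5, 4.0
```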
Theorem 3.1. Let assumption (R1) hold, (n, m) ∈ N* × N, t_k = kT/n for k ∈ {0, ..., n}, and let ρ be a constant satisfying the explicit smallness condition of the statement. Then, for all x ≥ 0, an explicit exponential-type upper bound holds for the moment generating function of max_{1≤k≤n} |U_{t_k}|². Our second main result yields a similar exponential-type upper bound, but for the moment generating function of |D^j_r U_T|², under the following reinforced assumption on b.
Assumption (R2). The function b ∈ C³(R^d, R^d) and satisfies assumption (R1). Moreover, there exist finite constants [b̈]_∞ ∈ (0, +∞) and [b̈]_lip ∈ (0, +∞) controlling the second-order derivatives of b and their Lipschitz constants. To state the result, let us introduce the quantities Φ_1 and Φ_2 defined in the proof.

Theorem 3.2. Let assumption (R2) hold, (n, m) ∈ N* × N, t_k = kT/n for k ∈ {0, ..., n}, and let ρ be a constant satisfying the explicit smallness condition of the statement. Then an exponential-type upper bound holds for the moment generating function of |D^j_r U_T|².

A careful look at the proof of this theorem shows that, in the decomposition (5.4) of D^j_r U_T, the sum D^j_r U^{(1)}_T + D^j_r U^{(2)}_T goes to 0 as r → T, whereas D^j_r U^{(0)}_T does not. This indicates that it is not optimal to combine D^j_r U^{(0)}_T with D^j_r U^{(1)}_T as in this proof. We also notice that, under the same constraint on ρ as in the theorem, one may instead combine D^j_r U^{(0)}_T with D^j_r U^{(2)}_T by replacing (3.6) by the estimate which takes (5.5) into account. One then deduces a bound valid for κ(r) ∈ (0, 1). Combining (3.5), Hölder's inequality and the convexity of the exponential function, which ensures that the exponential of the mean of d terms is not greater than the mean of the exponentials, we deduce (3.12), where p is a probability density with ∫_0^T p(r) dr = 1. We now choose κ(r) to obtain the same constraint on λ for the two expectations at time r, and then p(r) to ensure that this common constraint does not depend on r. Notice that, since the functions Φ_i are continuous on [0, T] and positive on [0, T), such a choice is possible. Hence, dealing with P(Q − E[f(X^{m^L}_T)] ≤ −α) in a symmetric way, we end up with the concentration inequality. According to Section 2, the bias satisfies the bound derived there. Hence, we proved our main result, Theorem 3.3, which we now state.
The variance-like quantity appearing in the denominator is closely related to the non-asymptotic upper bound of the variance of Q derived in Section 2.2. The only difference is the replacement of K_{1,m} by 2C_{(3.13)}. Let us now discuss the constraint on α under which we proved Gaussian-type concentration, and see that in the limit ε → 0, for the optimal parameters discussed in Section 2.2, we can choose α = O(ε^{2/3}), i.e. much larger than the root mean square error ε.
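The mechanism producing both the Gaussian tail and the threshold on α is the standard Chernoff argument: an mgf bound that is only valid on a bounded λ-range yields a Gaussian bound only for deviations below an explicit threshold. The sketch below (ours, with generic placeholder constants) makes this quantitative.

```python
# Standard Chernoff argument (our summary, generic constants): if
# E[exp(l*Y)] <= exp(C*l^2) for all 0 <= l <= l0, then
# P(Y >= a) <= inf_{0<=l<=l0} exp(C*l^2 - l*a), which equals
# exp(-a^2/(4C)) as long as the deviation a stays below 2*C*l0.
import math

def chernoff_bound(a, C, l0):
    l = min(a / (2 * C), l0)       # unconstrained optimum a/(2C), capped at l0
    return math.exp(C * l * l - l * a)

for a in (0.1, 0.5, 2.0):
    print(a, chernoff_bound(a, C=1.0, l0=0.5))
```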
Remark 3.2. Following the discussion and notations of Section 2.2, for ε small enough, we consider Q_ε, the MLMC estimator (1.3) with the optimal parameters. This choice leads to a bias of order O(ε^{4/3}), and we rewrite the above concentration inequality as follows: there exist explicit positive constants c_1, c_2 and c_3 such that for ε small enough and 0 ≤ α ≤ c_1 ε^{2/3}, the Gaussian deviation bound holds. At this stage, a natural question arises: is there an alternative choice of the parameters that increases neither the root mean square error ε nor the order in ε of the computational cost of the multilevel Monte Carlo estimator, and for which the upper bound on the deviation parameter α is larger than c_1 ε^{2/3}?
Since for each ℓ, N^{ε,β}_ℓ ≥ N^ε_ℓ, the root mean square error of Q_{ε,β} is not greater than that of Q_ε, and therefore not greater than ε. Moreover, as ε → 0, the computational cost of Q_{ε,β} is O(ε^{−2}). Further, for ε small enough, the relevant minimum can be bounded from below. Hence, for this new choice, we get the following concentration inequality: for β > 1, there exist explicit positive constants c_4, c_5 and c_6 such that for ε small enough and 0 ≤ α ≤ c_4 ln^β(1/ε), the corresponding Gaussian deviation bound holds.

4. Error expansion and moment generating function of max_{1≤k≤n} |U_{t_k}|
For (n, m) ∈ N* × N, we consider the Euler scheme X^n on the grid (t_k = kT/n)_{0≤k≤n} and the process X^{mn}, which is the Euler scheme on the finer grid (jT/(mn))_{0≤j≤mn} when m is finite and the solution to (1.1) when m = ∞. We introduce the difference U_{t_k} = X^{mn}_{t_k} − X^n_{t_k} between the two processes. We have U_0 = 0 and, since the Brownian increments cancel,

U_{t_{k+1}} = U_{t_k} + ∫_{t_k}^{t_{k+1}} ( b(X^{mn}_{η_{mn}(s)}) − b(X^n_{η_n(s)}) ) ds.    (4.1)

To deal with this induction equation, it is convenient to introduce matrices A_k so that the contribution of b(X^{mn}_{η_n(s)}) − b(X^n_{η_n(s)}) acts linearly on U_{η_n(s)}, the remaining term being ∫ ( b(X^{mn}_{η_{mn}(s)}) − b(X^{mn}_{η_n(s)}) ) ds. One can check by induction that the explicit expression (4.2) holds: indeed, both sides of (4.2) satisfy the induction equality (4.1). By Itô's formula and the integration by parts formula, for k ∈ {1, ..., n}, the remainder can be expanded with γ(s) = (η_n(s) − η_{mn}(s)). Therefore U splits into U^{(1)} and U^{(2)}, respectively giving the contributions of the stochastic and the deterministic integrals. One can take advantage of a simpler expression for U^{(2)}_T, but the analogous expression of U^{(1)}_T does not make sense because of a non-adapted factor. This is the reason why we introduced the more complicated decomposition (4.2). Notice however that if n > T[ḃ]_∞, then the matrices A_k, and therefore their products, are invertible for k ∈ {0, ..., n}, so that U^{(1)} admits the alternative expression of Remark 4.1 below. Let us state two lemmas that will be used to deal with U^{⋆(1)} and U^{⋆(2)} in the proof of Theorem 3.1. The first one follows from usual linear algebra arguments and the submultiplicative property of the matrix norm.
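The error process U and its running maximum are straightforward to simulate. The sketch below (ours; illustrative drift and parameters) exploits the cancellation of the Brownian increments in U, as in (4.1), and lets one observe the 1/n decay of E[max_k |U_{t_k}|].

```python
# Sketch (ours) of the error process U_{t_k} = X^{mn}_{t_k} - X^n_{t_k}
# of Section 4, driven by one shared Brownian path, tracking its running
# maximum. With constant diffusion, the Brownian increments cancel in U:
# only the drift mismatch b(X^{mn}_{eta_mn}) - b(X^n_{eta_n}) accumulates.
import numpy as np

rng = np.random.default_rng(4)
b = lambda x: np.sin(x)            # illustrative drift
T, x0, m, N = 1.0, 1.0, 4, 50_000

for n in (8, 16, 32):
    hf = T / (m * n)
    xf = np.full(N, x0)
    xc = np.full(N, x0)
    maxU = np.zeros(N)
    for k in range(n):             # coarse steps t_k -> t_{k+1}
        xc_frozen = xc.copy()      # X^n frozen at eta_n(s) = t_k
        dWsum = np.zeros(N)
        for _ in range(m):         # m fine sub-steps inside [t_k, t_{k+1}]
            dW = rng.normal(0.0, np.sqrt(hf), size=N)
            xf = xf + b(xf) * hf + dW
            dWsum += dW
        xc = xc_frozen + b(xc_frozen) * (T / n) + dWsum
        maxU = np.maximum(maxU, np.abs(xf - xc))
    print(n, maxU.mean())          # decays roughly like 1/n
```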
where q ∈ (0, 1) is a parameter to be optimized later. With Hölder's inequality, we deduce a splitting into two terms.
• First term. Let us first deal with the contribution of U^{⋆(1)}. Let us introduce weights (p_k)_{1≤k≤n} and notice that Σ_{k=1}^n p_k = 1, so that we have defined a probability measure. By (4.4), applying Jensen's inequality to the convex function R ∋ x ↦ exp(ρx²/q²), we deduce (4.8). Now, using the periodicity of the function γ with period t_1 = T/n and then the second assertion in Lemma A.1, we get (4.10).
• Second term. On the other hand, by (4.5) and Lemma 4.2, one has a bound where we used the periodicity of γ(t) for the first equality. Setting a constant involving [ḃ]_∞ and a_{∆b}, and using Jensen's inequality for the probability density p, the random variables in the sum on the right-hand side are independent. Setting δ appropriately and plugging this inequality into (4.11) after using the scaling property of the Brownian motion W, we deduce a product bound. Using the fact that for each k ∈ {1, ..., n}, r ↦ γ(r) is non-increasing on [t_{k−1}, t_k] while r ↦ √r is increasing, we obtain a further estimate. Applying the first assertion in Lemma A.1 with |H| = 1 and a suitable µ yields (4.12), with the same constraint on ρ for the two terms U^{⋆(1)} and U^{⋆(2)}. We conclude by combining (4.8), (4.10) and (4.12) that the announced bound holds whenever ρ satisfies the stated constraint.

Remark 4.1.
• When n > [ḃ]_∞ T, one could consider using the alternative expression of U^{(1)} mentioned above. This leads to replacing C_{(4.9)} by some constant not smaller than it.
• In the last step of the derivation of (4.8), one could choose a constant q̃ ∈ (0, 1) different from q, but, following the reasoning in the above proof, this leads to the same upper bound under a stronger constraint on ρ. Indeed, for a fixed value of qq̃, the maximal value of (1 − q)(1 − q̃) = 1 − (q + q̃) + qq̃ is attained for q̃ = q.
• On the other hand, if q(t) is some probability density on [0, T] and (Y_t)_{t∈[0,T]} an R^d-valued process, applying Jensen's inequality to the convex exponential function gives one bound, whereas applying Jensen's inequality to R^d ∋ x ↦ |x|² and then Hölder's inequality leads to the upper bound exp( ∫_0^T ln E[exp{|Y_t|²}] q(t) dt ), which is smaller when E[exp{|Y_t|²}] is not constant dt-a.e. Nevertheless, in the above proof, the repeated use of Jensen's inequality for x ↦ exp(x²) did not worsen the final estimate, because this estimate relies on bounds for E[exp{|Y_t|²}] that are uniform in t ∈ [0, T].

Proof of Proposition 2.1. By (4.3) and Lemma 4.1, the announced bound follows.
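The comparison in the last item of Remark 4.1 above can be seen numerically: for a toy non-constant profile of E[exp{|Y_t|²}] (an assumption of ours) and the uniform density on [0, 1], the geometric mean produced by the Hölder route lies strictly below the arithmetic mean produced by the direct Jensen route.

```python
# Numerical illustration (ours) of the comparison in Remark 4.1: by Jensen,
# exp( int ln E[exp|Y_t|^2] q dt ) <= int E[exp|Y_t|^2] q dt, with equality
# iff the integrand is constant. Toy profile E(t) = 1 + t, uniform q on [0,1].
import numpy as np

t = np.linspace(0.0, 1.0, 1001)
E = 1.0 + t                         # stand-in values of E[exp{|Y_t|^2}]
print(np.exp(np.log(E).mean()))     # geometric mean: smaller bound
print(E.mean())                     # arithmetic mean: larger bound
```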

The difference between the right-hand side and the definition (4.4) of U^{⋆(1)} is that in the latter, ∫_0^T γ(t)∇b(X^{mn}_t) dW_t is replaced by max_{1≤k≤n} |∫_0^{t_k} γ(t)∇b(X^{mn}_t) dW_t|. Reasoning as in the estimation of the first term in the proof of Theorem 3.1, we obtain the analogous bound for the probability measure (p_k)_{1≤k≤n} introduced in that estimation. On the other hand, by the Minkowski inequality and (4.5), and using the fact that for each k ∈ {1, ..., n}, s ↦ g(s) is non-decreasing on [t_{k−1}, t_k) while s ↦ γ(s) is non-increasing and bounded by (m−1)T/(mn), we get a bound obtained by plugging this inequality together with (4.13) (resp. (4.14)) into the previous one and optimizing over q: for a, b > 0,

min_{q∈(0,1)} ( a/q + b/(1−q) ) = (√a + √b)².    (5.9)

Hence, it follows that for all r ∈ [0, T], the first term is controlled. As assumption (R1) is satisfied under (R2), Theorem 3.1 applies and yields (5.10).
• Second term. By the second assertion of Lemma 5.1, and thanks to Lemma 4.2, we get a bound with an explicit constant. With (5.6), we deduce a further estimate. Therefore, using Jensen's inequality for the probability density p with the appropriate δ, and proceeding in the same way as for the second term of Section 4, we use the available sup-bounds. Applying the first assertion in Lemma A.1 with |H| = 1, and using the monotonicity of s ↦ √s times the relevant exponential factor, we deduce the bound with

ρ_{(5.12)} := m² / ( 2 C²_{(5.11)} T² (m − 1)² )

and a function Φ_2(r) defined accordingly.
• Third term. Let us introduce weights (p_k)_{⌈rn/T⌉<k≤n} and notice that Σ_{k=⌈rn/T⌉+1}^n p_k = 1, so that we have defined a probability measure. Therefore, by (5.7), applying Jensen's inequality to the convex function R ∋ x ↦ exp(ρx²/q̃²), we deduce the corresponding bound for all r ∈ [0, T].

Proof. The argument is based on the Dambis–Dubins–Schwarz theorem (see e.g. [8]), which ensures the existence of a one-dimensional standard Brownian motion (β_t)_{t≥0} such that, for all t ≥ 0, the continuous local martingale under consideration equals β evaluated at its quadratic variation. The concavity of the logarithm ensures that ln(1 − x) ≥ −2x ln 2 for all x ∈ [0, 1/2], where we used Jensen's inequality for the third inequality and the first statement of the Lemma for the fourth.
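Assuming (5.9) is the standard Hölder-exponent optimization, it is easy to verify numerically: for a, b > 0, min_{q∈(0,1)}(a/q + b/(1−q)) = (√a + √b)², attained at q = √a/(√a + √b).

```python
# Numerical check (ours) of the optimization (5.9), assuming it is the
# standard identity min_{q in (0,1)} (a/q + b/(1-q)) = (sqrt(a)+sqrt(b))^2.
import numpy as np

a, b = 2.0, 3.0
q = np.linspace(1e-6, 1 - 1e-6, 1_000_001)
print((a / q + b / (1 - q)).min())        # numerical minimum over the grid
print((np.sqrt(a) + np.sqrt(b)) ** 2)     # closed form: identical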
(5.11) Since U^{(0)}_T does not depend on r, it should be better to combine D^j U^{(0)}_T with D^j U^{(2)}_T, in line with the discussion after Theorem 3.2.

Theorem 3.3. Let assumption (R2) hold and let f ∈ C¹(R^d, R) be a Lipschitz continuous function with constant [ḟ]_∞ such that ∇f is also Lipschitz continuous with constant [ḟ]_lip. Then, for all deviations 0 ≤ α below the explicit threshold of the statement, the Gaussian-type concentration inequality for Q − E[f(X_T)] holds.