Two-Sided Error Estimates for the Stochastic Theta Method

Two-sided error estimates are derived for the strong error of convergence 
of the stochastic theta method. The main result is based on two ingredients. 
The first one shows how the theory of convergence can be embedded into standard 
concepts of consistency, stability and convergence by an appropriate 
choice of norms and function spaces. The second one is a 
suitable stochastic generalization of Spijker's norm (1968) that is known to 
lead to two-sided error estimates for deterministic one-step methods. 
We show that the stochastic theta method is bistable with 
respect to this norm and that well-known results on the optimal 
$\mathcal{O}(\sqrt{h})$ order of convergence follow from this 
property in a natural way.


1.
Introduction. There is a well-established theory for the strong convergence of one-step methods for stochastic ordinary equations (SODEs). In particular we mention the pioneering books of P.E. Kloeden and E. Platen [11,12], which paved the way for a fascinating research area in numerical analysis. Other references are for instance [17,18]. The purpose of this paper is to add two new aspects to this theory.
The first one is conceptual. By a proper choice of norms and function spaces we show that strong convergence results can be embedded into the standard framework of consistency, stability and convergence as it is formulated in abstract terms in the theory of discrete approximations (see [23,24,25,26], [22]). In Section 3 we will discuss why and in which sense our notion of consistency deviates from those used in [11], [17,18]. We also note that numerical stability concepts have already been developed for multistep methods, for stochastic differential algebraic equations and stochastic delay equations in [2,3,27].
Our second contribution is concerned with a special choice of norms that allows us to prove bistability in the sense of [26] and, as a consequence, to derive twosided estimates for the convergence error. We show that this can be achieved by a suitable stochastic version of the deterministic Spijker norm (see [21], [22,Ch.2.2], [6,Ch.III.8]). It also turns out that well-known results on the optimal order of convergence [4] follow in a natural way from these two-sided estimates.
In order to present the basic ideas in a simplified technical framework we consider only the semi-implicit Euler method [11] or stochastic theta method (STM) [7]. Extensions of the results to other methods (in particular higher order methods) will be published in a subsequent paper. In the following we give a more technical outline of the paper.
We consider the numerical approximation of R d -valued stochastic processes, which satisfy an ordinary Itô stochastic differential equation [1,16,19] of the form dX(t) = b 0 (t, X(t))dt + m k=1 b k (t, X(t))dW k (t), t ∈ [0, T ], where W k , k = 1, . . . , m, denote real and pairwise independent standard Brownian motions. The initial value X 0 is assumed to have finite second moment. We also assume that the drift and diffusion coefficient functions b k : [0, T ] × R d → R d fulfill the usual Lipschitz conditions such that (1) has a unique solution (see Section 2 for details).
The stochastic theta method (θ ∈ [0, 1]) on a grid is given by the recursion where h i = t i − t i−1 is the length of the i-th interval and ∆ h W k (t i ) = W k (t i ) − W k (t i−1 ) denotes the i-th increment of the Wiener process W k . We collect stepsizes and define It is well-known (see for example [11,17]) that the STM converges at least with order γ = 1 2 in the strong sense, i.e. there exists a constant C > 0 such that where X is the analytic solution and X h is the numerical solution. Note that [11,17,18] use an even stronger norm, where max occurs inside the expectation.
In the forthcoming paper [15] it will be proved that our approach can handle this norm as well. Moreover, it was shown by J.M.C. Clark and R.J. Cameron [4] that, in general, γ = 1 2 is the maximum rate of convergence for the STM (and for any method that only uses the Brownian motion at grid points).
The strong convergence in (5) is written in terms of the norm In this paper we show that the following generalization of Spijker's norm also plays an important role Writing the equations (3) as A h (X h ) = R h with a suitable operator A h and righthand side R h , one of our main results is the following bistability inequality A precise formulation and the proof will be given in Sections 3 and 5.
In Section 2 we summarize the main assumptions and collect some prerequisites from stochastic analysis. Then the main results on two-sided error estimates are stated in Section 3. In Section 4 we show that the STM is consistent with respect to the stochastic Spijker norm (7). We conclude the paper in Section 6 by showing that the results on optimal convergence rates from [4] follow directly from our two-sided error estimates.

2.
Main assumptions and some results from stochastic analysis. In this section we collect the main assumptions and some useful results from stochastic analysis.
Let (Ω, F , P ) be the underlying probability space and denote by E the expectation with respect to P . Let (F t ) t∈[0,T ] ⊂ F be the filtration, which is generated by the Brownian motions W k , k = 1, . . . , m. As in [1,16,19] we assume for the SODE (1) that the drift and diffusion coefficient functions b k : [0, T ] × R d → R d , k = 0, . . . , m, are measurable. We also assume that the following assumptions hold: (A1): The initial value X 0 is an F 0 -measurable and R d -valued random variable satisfying (A2): There exists a constant K > 0 such that Here we denote by |·| the Euclidean norm in R d . Assumptions (A1) and (A2) are sufficient to assure the existence and uniqueness of a strong Itô solution to (1) (see [1,16,19]), i.e. there exists a unique, P -a.s. continuous and (F t ) t∈[0,T ] -adapted process X which satisfies Assumption (A3) is used in [11] to prove convergence of the Euler-Maruyama scheme. We will use it here to prove consistency of the STM. We note that assumption (A3) can be replaced by an L 2 -condition on the second order Itô-Taylor coefficient function (see [15] for further details). From Theorem 7.1.2 and Remark 7.1.5 in [1] we have the following estimates for the second moment of the strong solution.
for all 0 ≤ t ≤ T and some constants C, D > 0 depending only on K and T .
In particular, using the semigroup property of X, one can prove the estimate for all t, s ∈ [0, T ] and some constant C > 0 depending only on K, T and E(|X 0 | 2 ).

Definitions and main result.
In this section we rewrite the STM as an operator equation and introduce the corresponding spaces and norms. We give precise definitions of our notions of consistency and (numerical) stability and compare them to related notions in the literature. Finally, the main convergence theorem and the two-sided error estimate are stated and proved. We note that our notions are largely motivated by the work of Stummel [26]. It will not be necessary to directly invoke results from [26], but in the remark following Definition 3.2 (see also [13,14]) we will indicate how a formal embedding of our approach into the abstract framework can be achieved.
3.1. Basic notions. We introduce an operator that represents the SODE (1). Since a unique solution X to (1) is guaranteed by assumptions (A1) and (A2) we consider the trivial operator where E := {X} and F := {Y = (X 0 , 0)} are singletons (with the second component of Y being the stochastic process which is P -a.s. equal to 0 ∈ R d ) and the operator A is given by With each grid τ h as in (2) we associate the space G h := G(τ h , L 2 (Ω, F , P ; R d )) of all adapted and L 2 (Ω)-valued grid functions. That is, for Z h ∈ G h the random variables Z h (t i ) are F ti -measurable and lie in L 2 (Ω, F ti , P ; R d ) for all t i ∈ τ h , i = 0, . . . , N . Note that G h is a Banach space with respect to the norm (6).
Next we define two sequences of restriction operators on the spaces E and F In this and the next section we consider three real parameters L, ρ, σ > 0 that are arbitrary. They will be given specific values later in Section 5 when we prove stability. The first parameter occurs in the norm for grid functions Z h ∈ G h given by where · L 2 (Ω) denotes the norm in L 2 (Ω, F , P ; R d ). As usual the exponential weight is needed for the proof of stability via a contraction argument for the Picard iteration. The weight plays no role in the proof of consistency of the STM. Whenever appropriate we abbreviate · 0 = · 0,h,L . Our underlying complete metric space is the closed ball Note also that the norms · 0,h,L and (6) are equivalent (uniformly in h).
We also introduce a second norm for Z h ∈ G h by Let us write · −1,h,L = · −1 for short and note that · −1,h,L is uniformly in h equivalent to the norm (7). From the introduction we also recall that these norms are stochastic versions of Spijker's norm [20,22]. Then our second complete metric space is the closed ball In the next step we introduce the residual mapping res h : G h → G h of the stochastic theta method (3) as follows: The definition shows that res h (Z h ) is adapted to the filtration (F ti ) ti∈τ h and as- A direct comparison shows that the stochastic theta method (3) and the residual condition res h (X h ) = 0 ∈ G h are equivalent. This leads to the idea of taking the residual res h (r E h X), where X is the solution of (9), as a measure of the local truncation error.
We are now in the position to define the sequence of operators which represent the stochastic theta method. Define the operator on its domain of definition by the relation We recall that r F h Y = (X 0 , 0, . . . , 0) by (13). Thus A h Z h and res h (Z h ) differ only at the first grid point.
Using the operators A h , the stochastic theta method (3) can equivalently be written as the equation Still we have to prove solvability of this equation and to estimate the global error r E h (X) − X h 0 . This motivates the following definitions.
holds for all grids τ h with |h| ≤ h, where X denotes the analytic solution of (1).

Note that
, and, therefore, we refer to the left hand side of (21) as the local truncation error or the consistency error. The local truncation error is meaningful even if r E h X / ∈ D(A h ), but for a consistent one-step method we know that res h (r E h X) −1 → 0 as |h| → 0 and hence r E h X ∈ D(A h ) for |h| sufficiently small. The second ingredient in a convergence theory for numerical methods is the concept of (numerical) stability. We use here the stronger notion of bistability.
holds for all Z h ,Z h ∈ D(A h ) and for all grids τ h with |h| ≤ h. Remark 1. 1. Our notions of consistency and stability are directly related to the abstract framework invented by F. Stummel [26]. In the general theory of discrete approximations he proves that bistability of a numerical method can be characterized by the equicontinuity of the operators (A h ) h and the equicontinuity of (A −1 h ) h . It is easy to see that our definition of bistability is a sufficient condition for bistability in the sense of [26]. The same is true for the consistency error: Our definition of consistency appears in [26, §2, (6)] as a sufficient condition for Stummel's notion of consistency. We refer to [13] for a derivation of the results of this paper from the theory of discrete approximations. Our current definitions turn out to be convenient for providing a direct approach to the convergence of numerical methods for stochastic differential equations. 2. Note that the definition of bistability depends on the three constants L, ρ, σ which appear in the norms in (22) and in the domain of definition D(A h ) via (15), (19). 3. One obtains an equivalent version of our notions when dividing equation (3) by h i and defining the operators res h and A h accordingly. Then the stochastic Spijker norm (16) is replaced by This form shows that the norm is of W −1,∞ Sobolev type.

3.2.
Main results. Now we formulate the main results. The proofs of the first two theorems will be deferred to the following two sections.
holds for all grids τ h with |h| ≤ h. In particular, the numerical solution X h of the stochastic theta method converges uniformly with order γ = 1 2 at each grid point to the restriction of the analytic solution X to (1).
Proof. By theorem 3.4 we know that there exists an upper step size bound h > 0 such that the operators Remark 2. 1. Note that by our choice of the norm (14) the convergence in theorem 3.5 is understood in the L 2 -sense. The L 2 -convergence of the numerical solution X h to the analytic solution X implies a good pathwise approximation for each sample path ω ∈ Ω. In addition to this notion of strong convergence there also exist the concepts of (numerical) weak convergence (see [11,17,18]) and of pathwise convergence (see [8,9,10]).
2. In our approach we avoid any interpolation and work with grid functions only. Therefore, our estimates of the mean-square error are also restricted to grid points. According to [11,Ch.10.2] one can interpolate the numerical approximation to an adapted, right-continuous stochastic process with existing left limits on the interval [0, T ] by Note that this may be difficult to evaluate in practical computations.
In the case θ = 0 this interpolation is continuous. In [11,Ch.10.2] it is also shown that this interpolation converges uniformly to X(t), t ∈ [0, T ], with the same order that holds at the grid points. The interpolation above can be considered as an a-posteriori process and formula (24) may be called a method for dense output, see [6].

3.3.
Comparison with other consistency concepts. In this subsection we compare, for the case of constant step-size h, our notion of consistency to other approaches in the literature. First, we consider the concept of consistency introduced by G.N. Milstein [17,18] that was used later, for example, by C.T.H. Baker and E. Buckwar [2] for stochastic delay equations. Then we compare with the approach from the book of P.E. Kloeden and E. Platen [11,Ch.9.6].
The use of a stochastic version of Spijker's norm (16) in our definition of the local truncation error is the key to the two-sided error estimate. The standard procedure of eliminating convergence errors by successive triangle inequalities from local errors (see "Lady Windemere's fan" diagram in [6]) is not sharp enough to produce twosided estimates. There is, in fact, a more fundamental difference between our notion of local truncation error and the concepts used in the literature.
By Definition 3.1 a one-step method is consistent if the restriction of the analytic solution X to the grid points produces a small residual for the discrete operator equation Then a stability estimate allows to measure the distance between r E h X and X h in terms of the residual res h (r E h X) = A h r E h X − r F h Y . For a comparison with G.N. Milstein's notion of consistency we have to extend the notation. By X(·; τ 0 , x 0 ), where τ 0 ∈ [0, T ] and x 0 ∈ R d , we denote the solution of (1) with initial condition X(τ 0 ; τ 0 , x 0 ) = x 0 . Similarly,X h,ti,x0 , 0 ≤ i ≤ N , denotes the solution to the discrete system Then the convergence theorem in [17, Ch.1] assumes the following conditions for some constant K > 0 and for all x 0 ∈ R d , 0 ≤ i ≤ N − 1. In addition, the exponents p 1 , p 2 have to satisfy the constraints

TWO-SIDED ERROR ESTIMATES FOR THE STM 397
Under this assumption G.N. Milstein proves that the one-step method converges with order p = p 2 − 1 2 . The idea behind (26) is that the one-step method causes a small error ( in the mean and in the mean-square) after exactly one step when compared to the analytic solution at every grid point t i and for every initial value x 0 .
The main difference between our local truncation error (21) and assumption (26) lies in the meaning of the word 'local'. While G.N. Milstein localizes the convergence error in time but considers the whole phase space, we localize along the trajectory of the analytic solution X to (1) but consider the error over the whole time interval at once. We also emphasize that we do not solve a discrete equation of the form (25) in order to define consistency. Rather, our notions of consistency and solvability are separated in the case of implicit methods (θ > 0).
However, for explicit one-step methods this difference is only minor. In [13, App.A] it is shown, that if the Euler-Maruyama method (θ = 0) satisfies assumption (26) (which is true for p 2 = 1, p 1 = 2 [17, Ch.1]) then it is also consistent in the sense of definition 3.1 with order γ = 1 2 . In the deterministic case there are well known counter examples (see [5,22]) which show that consistency in the Spijker norm does not imply a good approximation of the drift as in (26).
In [11,Ch.9.6] the authors P.E. Kloeden and E. Platen use another concept of consistency. They call a one-step method "strongly consistent" if there exists a nonnegative function c for all initial values x 0 ∈ R d and 0 ≤ i ≤ N − 1. Again, this notion focuses on the one-step method (25) started at arbitrary vectors rather than at the specific solution X as in definition 3.1. In a sense, strong consistency of a method requires that it does not differ too much from one step of the Euler-Maruyama scheme which is taken as a prototype of a strongly consistent scheme. Consequently, the Euler-Maruyama method itself is strongly consistent with c ≡ 0. This interpretation also shows that the function c is of limited use when estimating the order of convergence. Contrary to this, our notion of consistency provides a strong link to the order of convergence. As Theorem 3.5 shows, the local truncation error (21) Note that the terms for i = 0 vanish and that the exponential weight is bounded by 1. Then from the representation (9) and the triangle inequality we obtain the estimate . (29) We estimate the terms separately. The square of the first term (27)

Applying Theorem 2.1 leads to the estimate
where the constant C only depends on K, T and E(|X 0 | 2 ). Hence we complete our estimate of S 1 (i) as follows Replacing t j−1 by t j in (30) one gets the analogous result for the term in (28), i.e.
By the martingale property of the stochastic Itô integral (c.f. Corollary (5.2.1) in [1]) we find for all indices j, ℓ = 1, . . . , i with j = ℓ Hence, by the Pythagoras theorem we get

WOLF-JÜRGEN BEYN AND RAPHAEL KRUSE
where we used the Itô isometry in the last step. Again we apply (30) and obtain Combining the estimates (31), (32) and (33) we arrive at the final estimate where the constantC only depends on K, T , E(|X 0 | 2 ) and m. Thus the proof of theorem 3.3 is complete. . However, as we have shown, for the stochastic Spijker-norm (16) both types of integrals give the same order of convergence. This is the key property of the stochastic Spijker-norm that facilitates the proof of the two-sided error estimate with maximum rate of convergence.

5.
Stability. This section is devoted to the proof of bistability for the STM. The main idea is to rewrite the STM as a fixed point problem that is a discrete analog of the integral equation (9). Then we choose appropriate parameter values L, ρ, σ > 0 such that the Banach fixed point theorem applies. We define the mapping By induction on i one readily proves the following equivalence for all Y h ∈ D(A h ) and Since the inhomogeneities Z h (t j ) appear as sums in Φ h we have the following norm The following lemma is a generalization of a similar assertion proved in [26, §6, (19)] for the case of deterministic ordinary differential equations.
Lemma 5.1. For every ρ > 0 and L ≥ 2K 2 (e √ T + m) 2 there exists an h ρ,L > 0 such that the following properties hold with the settings for all grids with |h| ≤ h ρ,L For 0 ≤ i ≤ N we estimate as follows Note that this estimate does not depend on Z h . In the following we treat the terms separately. We apply Jensen's inequality to the square of the term (40) and then use assumption (A2) to obtain we complete the estimate of (40) by If we apply (43) with t j instead of t j−1 the same arguments work for the term (41) and we arrive at It remains to estimate (42). As in section 4 we use the martingale property of the Itô-integral to obtain By inserting (44), (45) and (46) into (39) we get the estimate Taking h ρ,L ≤ 1 L the contraction estimate (38) follows by the choice of L. By Theorem 3.3 the STM is consistent. After possibly reducing h ρ,L > 0 further we obtain Remark 4. One can improve the lower bound on L to L > 1 2 K 2 √ T + m 2 at the expense of larger contraction and stability constants that depend on L and T .
We are now prepared for the proof of Theorem 3.4.
Proof of theorem 3.4. Choose parameter values L, ρ, σ ρ according to Lemma 5.1. Then for every Z h ∈ F h the mapping Φ h (·, Z h ) : E h → E h is a contraction and the Banach fixed point theorem yields the existence of a unique fixed point Therefore, by the relation (35), there also exists a unique solution Y h ∈ D(A h ) (recall (19)) of the equation Hence the mapping A h : D(A h ) → F h is bijective for |h| ≤ h ρ,L . Moreover, we have the relationship With the help of (48) and Lemma 5.1 we obtain Hence

WOLF-JÜRGEN BEYN AND RAPHAEL KRUSE
which is one part of the bistability inequality (22). The second part follows from Remark 5. Note that the parameter ρ > 0 in Lemma 5.1 is arbitrary due to the global Lipschitz condition (A2). For the proof of Lemma 5.1 we only need a Lipschitz condition of the form Thus the STM is still bistable if we weaken assumption (A2) by assuming constants ρ, K ρ > 0 such that (49) holds for all s ∈ [0, T ] and all adapted random variables In this case the lower bound for the exponential weight parameter L also depends on ρ.
6. Maximum order of convergence. In this section we discuss the maximum order of convergence for the STM. J.M.C. Clark and R.J. Cameron [4] constructed an example to show that, in general, the maximum order of convergence is equal to 1 2 . This extends to all one-step methods which use only the increments W k (t i ) − W k (t i−1 ) of the driving Wiener processes. We will show that the same result follows in a natural way for the STM from the two-sided error estimate (23). Unlike e.g. the Milstein method [11,17], the STM does not use information about the Wiener processes at intermediate times s ∈ [0, T ] \ τ h . Therefore, one cannot expect the STM to give good approximations for an equation with an iterated Wiener process, i.e. with a stochastic integral of the form t 0 W 1 (s)dW 2 (s).
As in [4], we consider two real and independent (F t ) t≥0 -Wiener processes W 1 and W 2 . The two-dimensional stochastic differential equation has the analytic solution For this equation the local truncation error of the STM is As before ∆ h W (t j ) denotes the j-th increment of the two Wiener processes. Note that the local truncation error is independent of the parameter θ ∈ where we used the martingale property of the stochastic integral in the first step and the Itô-isometry in the second step. Note that we have an exact expression for the local truncation error in the Spijker norm. The Cauchy-Schwarz inequality yields Thus the local truncation error is bounded from below by a term of order O ( T   N   1 2 ). In case of an equidistant step size h = T N we have equality in (52). The two-sided error estimate (23) then reads which shows that the STM converges with the exact order γ = 1 2 .

7.
Conclusions. The bistability of a numerical discretization method is formulated in terms of two norms that allow us to estimate differences of grid functions by differences of residuals and vice versa. This property plays an important role in deriving two-sided estimates for the convergence error. In this paper we have set up a pair of norms that guarantee bistability of the stochastic theta method for an SODE satisfying Lipschitz and Hölder conditions. One of the norms is the maximum of the strong norm at grid points while the second norm is a stochastic generalization of Spijker's norm. We have also shown that upper and lower estimates of type √ h follow in a natural way from the corresponding two-sided error estimates. As a byproduct of our approach we succeeded in embedding the theory of convergence for the stochastic theta method into the standard framework of consistency, stability and convergence as developed e.g. by Stummel. Several natural questions follow from our approach. First, it is natural to ask whether the two-sided error estimates extend to higher order onestep and multistep methods (e.g. those in [11,17]). A positive answer will be given in the forthcoming paper [15]. Second, one may ask for other pairs of norms, either both stronger or both weaker than the ones considered here, that allow to prove bistability. Finally, it seems natural to ask for bistability of discretization methods applied to infinite dimensional stochastic equations, such as delay equations and partial differential equations.