Improved order 1/4 convergence for piecewise constant policy approximation of stochastic control problems

In N.V. Krylov, Approximating value functions for controlled degenerate diffusion processes by using piece-wise constant policies, Electron. J. Probab., 4(2), 1999, it is proved under standard assumptions that the value functions of controlled diffusion processes can be approximated with order 1/6 error by those with controls which are constant on uniform time intervals. In this note we refine the proof and show that the provable rate can be improved to 1/4, which is optimal in our setting. Moreover, we demonstrate the improvements this implies for error estimates derived by similar techniques for approximation schemes, bringing these in line with the best available results from the PDE literature.


Introduction
In this paper we derive improved error estimates for approximations of value functions of stochastic optimal control problems. Let (Ω, F, {F_t}_{t≥0}, P) be a complete filtered probability space, (W_t)_{t≥0} a p-dimensional {F_t}-Wiener process on (Ω, F, P), and A the set of progressively measurable processes with values in a set A ⊆ R^m. For any α ∈ A, x ∈ R^d, t ∈ [0, T] (with T > 0), let X_• = X^{α,t,x}_• be the (controlled) Itô diffusion which satisfies

dX_r = b^{α_r}(t + r, X_r) dr + σ^{α_r}(t + r, X_r) dW_r,   X_0 = x.  (1.1)

Here we use the notation ϕ^a(•, •) = ϕ(•, •, a) for any a ∈ A and function ϕ. For a given terminal cost function g and running cost f, the optimal control problem consists of maximizing over α ∈ A the expected total cost

J^α(t, x) := E^α_{t,x} [ ∫_0^{T−t} f^{α_r}(t + r, X_r) dr + g(X_{T−t}) ].  (1.2)

The indices on the expectation E indicate that the law of the process depends on the starting point and the control. Finally, the value function of the optimal control problem is defined by

v(t, x) := sup_{α∈A} J^α(t, x).  (1.3)

We consider the following set of assumptions:

Observe that under assumptions (H1) and (H2), for any α ∈ A there exists a unique strong solution of equation (1.1). For simplicity, we assume the data and coefficients to be Lipschitz continuous in space and 1/2-Hölder continuous in time, and we have included no discount factor, but it is not difficult to extend our results to include discounting and lower Hölder regularity for f and g. We aim to estimate the error introduced by approximating the set of measurable controls A by piecewise constant controls. Let h > 0 be the discretization parameter and A_h the subset of A of processes which are constant on the intervals [nh, (n + 1)h) for n ∈ N.
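For a concrete feel for the objects (1.1)-(1.2), the following sketch estimates the cost J^α(t, x) by Euler-Maruyama simulation under a fixed constant control. All coefficients b, σ, f, g and the control value are illustrative toy assumptions, not the setting of the paper.

```python
import numpy as np

# Monte Carlo estimate of the cost functional (1.2) under a fixed constant
# control, using an Euler-Maruyama discretization of the controlled SDE (1.1).
# The coefficients below are toy assumptions for illustration only.
rng = np.random.default_rng(0)

def b(t, x, a):        # drift (Lipschitz in x)
    return a - 0.5 * x

def sigma(t, x, a):    # diffusion (Lipschitz in x)
    return 0.3

def f(t, x, a):        # running cost
    return -x**2 - 0.1 * a**2

def g(x):              # terminal cost
    return -np.abs(x)

def J_constant_control(t, x, a, T=1.0, n_steps=100, n_paths=10_000):
    """Euler-Maruyama estimate of J^a(t, x) for the constant control a."""
    dt = (T - t) / n_steps
    X = np.full(n_paths, float(x))
    running = np.zeros(n_paths)
    for k in range(n_steps):
        s = t + k * dt
        running += f(s, X, a) * dt                       # left-point quadrature of the integral
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Wiener increments
        X = X + b(s, X, a) * dt + sigma(s, X, a) * dW    # Euler step of (1.1)
    return float(np.mean(running + g(X)))

print(J_constant_control(0.0, 1.0, a=0.0))
```

Maximizing such estimates over all of A is the (intractable) problem (1.3); restricting the maximization to piecewise constant controls gives the approximation studied below.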
The value function associated with this restricted set of controls is defined by

v_h(t, x) := sup_{α∈A_h} J^α(t, x).  (1.4)

Note that the definition of v_h in (1.4) under the "shifted" dynamics in (1.2) and (1.1) implies that the control discretisation is always centered at t. This will be important for establishing a dynamic programming principle. This is not, however, how one would compute v_h in practice, as discussed in the penultimate paragraph of this section.
From a probabilistic perspective, it is clear that 0 is a lower bound for v − v_h since A_h ⊆ A. Under our assumptions, an upper bound on v − v_h of order h^{1/6} is given in [8]. An indication that the order 1/6 from [8] might be improved is the fact that, under the same regularity assumptions as above, it is shown in [5] that a fully discrete semi-Lagrangian scheme applied to the corresponding HJB equation has order 1/4 in the timestep for an Euler approximation. This scheme does not distinguish between constant or other controls over individual timesteps. It would therefore be somewhat surprising if the scheme which employs further approximations were closer to the original problem than the one which only holds the policies constant over timesteps.
A slightly different angle on the problem is provided in [3], where the authors construct from (1.4) a subsolution to the HJB equation corresponding to (1.3) by a second-order local expansion in t. This results in an error bound of order 1 in the case of smooth solutions, in contrast to the order 1/2 which would be obtained in the smooth case by the method in [8] (see also Section 2.3 below). However, in the general non-regular case, the order in [3] is limited by a switching system approximation of order ε^{1/3} (for a switching cost chosen of order ε), which, combined with an error term of order h/ε^3 for the regularised system (with regularisation parameter ε), results in an error of order 1/10 after optimisation over ε.
In this paper, we combine the advantages of both methods to obtain order 1/4. The reason we can improve the error estimates of Krylov is that we use a higher-order expansion when we derive the truncation error. Our discussion (see Subsection 2.3) also shows that no further improvement can be obtained in this way: our new proof uses the maximal possible order of the truncation error.
Piecewise constant policy time stepping has been used in a numerical method for solving Hamilton-Jacobi-Bellman equations in [13], where the computational advantage comes from the fact that over the time intervals in which the policy is constant, only linear PDEs have to be solved. This has been extended to mixed optimal stopping and control problems with nonlinear expectations and jumps in [6]. A further benefit lies in the inherent parallelism: the linear problems for different controls can be solved on parallel processors. A proof of convergence is given in these works using pure viscosity solution arguments, but no rate of convergence is provided. Early results on this type of approximation can be found in [10], and an extension with "predicted" controls is proposed in [7].
In the remainder of this article, we give in Section 2 a proof of the order 1/4 convergence of the piecewise constant policy approximation, and deduce linear convergence in the case of sufficiently regular solutions and data. We then outline in Section 3 the improved orders which can be derived for approximation schemes by similar techniques.

Main result
We begin by stating the main result. Throughout the entire section we work under assumptions (H1)-(H3).
Theorem 2.1. For any s ∈ [0, T], x ∈ R^d, and h > 0, we have

0 ≤ v(s, x) − v_h(s, x) ≤ C h^{1/4},  (2.1)

where the constant C only depends on the constants in Assumptions (H2) and (H3).
A major difficulty in the proof of Theorem 2.1 is the fact that typically v and v_h are not smooth. Even in the non-degenerate case where v is C^{2+δ}, v_h is still not smooth in general. A simple example is the Black-Scholes-Barenblatt equation resulting from an uncertain volatility model (see [11]). Here, the control is of bang-bang type and the optimal control problem for piecewise constant policies reduces to taking the maximum of two smooth functions at the end of each time interval, so that for t on the time mesh, v_h(t, •) will only be Lipschitz (in the spatial argument).
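This mechanism is easy to see numerically. The sketch below performs one piecewise-constant-policy timestep in a toy uncertain-volatility model: for each of two frozen volatility values the problem is linear (a plain conditional expectation, evaluated here by Gauss-Hermite quadrature), and v_h after the step is the pointwise maximum of the two resulting smooth functions. The payoff, volatility values and h are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# One piecewise-constant-policy step for a toy uncertain-volatility model:
# each frozen volatility gives a linear pricing problem, and the value is
# the pointwise max of the two smooth results (hence only Lipschitz at the
# crossing).  All model parameters are illustrative assumptions.
nodes, weights = np.polynomial.hermite_e.hermegauss(80)  # E[phi(Z)], Z ~ N(0,1)
weights = weights / weights.sum()                        # normalize to a probability weight

def payoff(x):                  # capped straddle: convex kink at 1, capped at 0.5
    return np.minimum(np.abs(x - 1.0), 0.5)

def linear_step(x, vol, h=0.1):
    """E[payoff(X_h)] for dX = vol * X dW started at x (X_h lognormal)."""
    X_h = x * np.exp(vol * np.sqrt(h) * nodes - 0.5 * vol**2 * h)
    return float(np.sum(weights * payoff(X_h)))

def v_h_one_step(x, vols=(0.2, 0.4), h=0.1):
    """Max over the two frozen controls, and the maximizing control."""
    vals = [linear_step(x, v, h) for v in vols]
    i = int(np.argmax(vals))
    return vals[i], vols[i]

for x0 in (0.8, 1.0, 1.2):
    print(x0, v_h_one_step(x0))
```

Near the convex kink at x = 1 the larger volatility is optimal, while the maximizing control can switch elsewhere; taking the pointwise maximum at the end of each interval is exactly what destroys smoothness of v_h on the time mesh.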
Since the proof of Theorem 2.1 relies on repeated use of the Itô formula, we need to work with smooth functions, both for the coefficients and for the value functions v and v_h. This means that we need to introduce several regularization arguments and use Krylov's method of shaking the coefficients.
2.1. Background results and regularisation. In this section, we introduce Krylov's regularization and give related preliminary results. Some of the proofs are given in [8] and are not repeated here; see also [1, 2] for analogous results proved with PDE arguments. In order to apply Itô's formula twice, σ, b, f, g, v, and v_h must be regularized. Let ε > 0 and let the mollifier ρ_ε be defined as

ρ_ε(t, x) := ε^{−d−2} ρ(t/ε², x/ε),

where ρ is a smooth, non-negative function with compact support and ∫ ρ(e) de = 1.
For any function ϕ on [0, T] × R^d, we denote by ϕ^{(ε)} := ϕ ∗ ρ_ε its mollification, after extending ϕ beyond [0, T]. We can always take an extension which preserves the Hölder continuity in time and Lipschitz continuity in space of ϕ. Then standard estimates for mollifiers imply that

|ϕ^{(ε)} − ϕ| ≤ C ε,   |∂_t^m D_x^k ϕ^{(ε)}| ≤ C ε^{1−2m−k}  for 2m + k ≥ 1.  (2.3)

Let X̃_• be the solution of (1.1) with coefficients replaced by b^{(ε)} and σ^{(ε)}. Then we denote by ṽ and J̃^α the value function and cost function of the optimal control problem (1.1)-(1.3) where X_• is replaced by X̃_• and f, g by f^{(ε)}, g^{(ε)}.
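The first-order accuracy of mollification for Lipschitz functions, which drives the O(ε) terms in the estimates (2.3), can be checked numerically. The sketch below mollifies ϕ(x) = |x| with a bump kernel of radius ε on a grid; the grid sizes and ε are illustrative choices.

```python
import numpy as np

# Mollifying the Lipschitz function |x| with a smooth bump of radius eps:
# the sup error is at most Lip * eps, and the mollification is exact away
# from the kink (where |x| is linear).  Grid and eps are illustrative.
def mollify(vals, dx, eps):
    """Discrete convolution with a C-infinity bump supported in (-eps, eps)."""
    r = int(round(eps / dx))
    t = np.arange(-r, r + 1) * dx / eps
    with np.errstate(divide="ignore", over="ignore", under="ignore"):
        ker = np.where(np.abs(t) < 1.0,
                       np.exp(-1.0 / np.maximum(1.0 - t**2, 1e-300)), 0.0)
    ker /= ker.sum()                       # normalize: integral one
    return np.convolve(vals, ker, mode="same"), r

dx, eps = 0.001, 0.05
x = np.arange(-2.0, 2.0 + dx / 2, dx)
phi = np.abs(x)                            # Lipschitz constant 1, kink at 0
phi_eps, r = mollify(phi, dx, eps)
interior = slice(r, len(x) - r)            # exclude zero-padded boundary
err = np.max(np.abs(phi_eps[interior] - phi[interior]))
print(err)
```

The maximal error occurs at the kink and is below ε, while at points further than ε from the kink the convolution reproduces the linear function exactly; higher derivatives of the mollification, in turn, blow up with negative powers of ε, as quantified in (2.3).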
Proposition 2.1. There is a constant C, independent of ε, such that |v(s, x) − ṽ(s, x)| ≤ Cε for all s ∈ [0, T] and x ∈ R^d.

Proof. The result follows from the definitions of v and ṽ, since by standard continuous dependence results for SDEs and the Lipschitz and Hölder continuity of f, g, b, and σ,

|J^α(s, x) − J̃^α(s, x)| ≤ Cε

for some constant C independent of the control α.
To avoid heavy notation, we will use (f, g, b, σ) instead of (f^{(ε)}, g^{(ε)}, b^{(ε)}, σ^{(ε)}) in the rest of the paper, keeping in mind the estimates (2.3) for their derivatives. We now proceed with the regularisation of the value function v_h. Let E_h be the set of progressively measurable processes e ≡ (e_1, e_2) with values in (−ε², 0) × B_ε(0) (where B_ε(0) denotes the ball of radius ε in R^d) which are constant on each time interval [nh, (n + 1)h). Letting S = T + ε², we define for any s ∈ [0, S] and x ∈ R^d a "perturbed" value function u_h, where X_• = X^{(α,e),s,x}_• is the solution of the SDE with (mollified and) "shaken" coefficients. The function u_h inherits the Hölder and Lipschitz regularity of v_h for any t, s ∈ [0, S] and x, y ∈ R^d. Moreover, for any s ∈ [0, S − h], u_h satisfies a dynamic programming principle (DPP), and its mollification satisfies derivative bounds for any k, m ≥ 1. Moreover, u_h satisfies the following super-dynamic programming principle (2.9).

Proof. The first part follows from Proposition 2.2 and (2.3), while (2.9) follows by the definition of u_h.

2.2. Proof of Theorem 2.1.

1) Upper bound on L^a u^{(ε)}_h + f^a. By two applications of the Itô (or Dynkin) formula, we obtain an expansion (2.10) whose remainder involves the derivatives ∂_t^m D_x^k u^{(ε)}_h for 2m + k ≤ 4, where the generator L^a of the diffusion process is defined as

L^a φ := ∂_t φ + b^a · D_x φ + (1/2) tr[σ^a (σ^a)^⊤ D_x² φ].

Inserting this equality into the dynamic programming inequality (2.9) in Proposition 2.3, applying Itô once to the f^a-term, and dividing by h, we find that, by (2.3) and (2.8),

L^a u^{(ε)}_h + f^a ≤ C h ε^{−3}  in [0, T − h] × R^d.  (2.11)

2) Upper bound on ṽ − v_h for s ∈ [0, T − h] and x ∈ R^d. By Itô's formula and part 1), from (2.7) in Proposition 2.3 and the first part of Proposition 2.2, it then follows that

J̃^α(s, x) ≤ u^{(ε)}_h(s, x) + C h ε^{−3}

for a generic constant C. Since, by definition (2.4) and the regularity of u_h (Proposition 2.2), u^{(ε)}_h(s, x) ≤ v_h(s, x) + Cε, and since α ∈ A was arbitrary, by the definition of ṽ (see just before Proposition 2.1),

ṽ(s, x) ≤ v_h(s, x) + C(ε + h ε^{−3}).

3) Upper bound on ṽ − v_h for s ∈ [T − h, T]. By the definition of J̃^α (see just before Proposition 2.1), Itô's formula, the regularity of f and g, and using (2.3), there is a constant C > 0 such that for every α ∈ A and s ∈ [T − h, T],

|J̃^α(s, x) − g(x)| ≤ C h^{1/2}.

Then it follows from the definitions of ṽ and v_h that both lie within C h^{1/2} of g(x), and hence also |ṽ(s, x) − v_h(s, x)| ≤ C h^{1/2}.

4) Conclusion:
Using Proposition 2.1 and parts 2) and 3), we have that

v(s, x) − v_h(s, x) ≤ C(ε + h ε^{−3})

for s ∈ [0, T] and x ∈ R^d. Taking ε = h^{1/4} then concludes the proof of the right-hand inequality in (2.1). The left-hand inequality is immediate since A_h ⊆ A.
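The final balancing step can be sanity-checked symbolically. The sketch below minimizes a bound of the assumed form ε + hε^{−3} (regularisation error against the truncation error of step 1) over ε and reads off the h-exponent numerically; the choice ε = h^{1/4} and the resulting rate 1/4 come out, as does the rate 1/6 for a Krylov-type bound ε + h^{1/2}ε^{−2} consistent with the order 1/6 discussed in Section 2.3.

```python
import sympy as sp
import math

# Balance the regularisation error O(eps) against the truncation error:
# minimize the bound over eps and estimate the exponent of h at the optimum
# from two sample values (exact here, since the optimum is a pure power law).
h, eps = sp.symbols("h eps", positive=True)

def empirical_rate(bound):
    """Minimize the bound over eps and read off the h-exponent numerically."""
    crit = [r for r in sp.solve(sp.diff(bound, eps), eps) if r.is_positive][0]
    val = sp.lambdify(h, bound.subs(eps, crit))
    h1, h2 = 1e-4, 1e-8
    return (math.log(val(h2)) - math.log(val(h1))) / (math.log(h2) - math.log(h1))

print(empirical_rate(eps + h / eps**3))           # rate 1/4, attained at eps ~ h**(1/4)
print(empirical_rate(eps + sp.sqrt(h) / eps**2))  # rate 1/6, attained at eps ~ h**(1/6)
```

The exact forms of the two bounds are assumptions for illustration; the exponents they produce match the rates stated in the text.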

2.3. The maximal rate and comparison with [8]. If the data and value functions are smooth enough, we can adapt the proof of Theorem 2.1 to obtain the maximal rate of the approximation, which is 1. More specifically, if we assume v_h and f to be sufficiently smooth, the remainder in (2.10) is of order h uniformly in ε. Therefore, instead of (2.11), the conclusion of step 1) in the previous proof gives

L^a u^{(ε)}_h + f^a ≤ C h

for some constant C independent of a ∈ A and ε. Moreover, if we assume that b, σ and f are Lipschitz in t uniformly in x and a, and g belongs to C²_b(R^d), then by standard results u_h will be Lipschitz in t. Hence, we find in step 2) that ṽ(s, x) − v_h(s, x) ≤ C(ε + h).
Sending ε to zero then gives that ṽ converges to v, and we have the following result:

Proposition 2.4. In addition to assumptions (H1)-(H3), let b, σ and f be Lipschitz continuous in t uniformly with respect to x and a, and let g ∈ C²_b(R^d). Then

0 ≤ v(s, x) − v_h(s, x) ≤ C h.  (2.12)

This is the maximal rate that this approximation can reach. The reason is that the order obtained by applying Itô twice in step 1) of the proof cannot be improved. This can easily be checked by repeatedly applying Itô to obtain higher-order error expansions and then noting that all such expansions contain terms of order h.
Step 1) of the proof also explains why Krylov in [8] obtained a less sharp result than ours. After one application of Itô, he used the moment bound

E|X_r − x| ≤ C h^{1/2}  for r ≤ h.

This estimate requires only three derivatives in space of u^{(ε)}_h but gives the lower rate 1/2. The conclusion of step 1) of the proof then becomes

L^a u^{(ε)}_h + f^a ≤ C h^{1/2} ε^{−2}.

Completing the proof as in Section 2.2 then gives ṽ(s, x) − v_h(s, x) ≤ C(ε + h^{1/2} ε^{−2}), and optimizing with respect to ε shows that v(s, x) − v_h(s, x) ≤ C h^{1/6}. Note that there is no need for regularization of the coefficients and data since Itô is applied only once. In the case of smooth enough solutions, this approach cannot give a rate higher than 1/2.

Consequences on finite difference approximations
In this section, we outline the impact of the improved error bound for the control approximation on the achievable convergence order for numerical schemes, either by directly substituting the improved order (Section 3.1) or by applying adaptations of the steps here using higher order estimates (Section 3.2).
3.1. Improvement to Theorem 1.11 in [9]. Using the new bound for the control approximation from Section 2, one easily obtains a sharpening of the order from 1/39 in [9, Theorem 1.11] and 1/21 in [8, Theorem 5.4] to 1/15, which holds for local, monotone schemes of consistency order 1/2. Indeed, using Theorem 2.1 instead of [8, Theorem 2.3], the bound in the second inequality in the proof of [8, Theorem 5.4] (at the top of page 14 in [8]) becomes a bound in terms of δ and n, where δ > 0 is the time discretization step used in [8] for the approximation scheme for the value function, n is the number of time intervals over which the policy is constant, and v_{δ,1/n} is the obtained approximation of v. Optimizing with respect to n gives n ∼ δ^{−4/15} and an estimate of order 1/15 in δ.
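The exponent bookkeeping in the last step can be verified symbolically: with n ∼ δ^{−4/15} policy intervals, the control-approximation contribution (1/n)^{1/4} from Theorem 2.1 equals δ^{1/15}, matching the stated overall order.

```python
import sympy as sp

# Check: (1/n)**(1/4) with n = delta**(-4/15) equals delta**(1/15).
delta = sp.symbols("delta", positive=True)
n = delta**sp.Rational(-4, 15)
term = (1 / n)**sp.Rational(1, 4)
print(term)  # delta**(1/15)
```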
Assuming consistency of order 1 for the scheme instead of order 1/2 as in [9, Theorem 1.11] and [8, Theorem 5.4], in conjunction with [9, Lemma 3.2], one gets a correspondingly improved bound, and the rate improves further to 1/10.

3.2. Improvement to Theorem 5.7 in [8]. For a wide class of numerical schemes, modifications similar to those used to prove Theorem 2.1 can be performed to improve the error estimates given in [8, Theorem 5.7]. Following as much as possible the notation in [8], let us define the corresponding scheme operators for any s ≥ 0. It is easy to check by Taylor expansion that, for any smooth function φ, the estimate in [8, Lemma 5.10] for the truncation error of the generator becomes of order h, for a constant C depending only on C_1 and C_2 in assumptions (H2)-(H3) and on the bounds on the derivatives ∂_t^m D_x^k φ for 2m + k ≤ 4. Observe that conditions (3.1) are slightly stronger than (5.4) in [8], where accuracy of the moments is only assumed to order h^{3/2} instead of h² as in (3.1), so that only consistency of order 1/2 results instead of order 1 above. However, the higher-order assumptions are satisfied by very common schemes such as the classical semi-Lagrangian scheme [4, 5] corresponding to the choice (3.2). Proceeding to a perturbation and regularization of v̂_h as in [8] (the notation follows that of Section 2.2, i.e. û^{(ε)}_h is the mollification of û_h, the solution of the scheme with perturbed "shaken" coefficients), we get the inequality

L^a û^{(ε)}_h + f^a ≤ C h ε^{−3}  in [0, T − h] × R^d

for some constant C depending only on C_0, C_1 in assumptions (H2) and (H3). Arguing as in the proof of Theorem 2.1, one obtains v̂_h ≤ v + C h^{1/4}.
Similarly, an upper bound of order 1/4 for v − v̂_h can be obtained. This aligns the bounds for the scheme (3.2) with those obtained in [5] by PDE techniques.
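For illustration, the following is a minimal sketch of a classical semi-Lagrangian scheme of the type referred to above: for each control, the generator is approximated by a symmetric two-point evaluation along an Euler step of the characteristics, and the scheme takes the sup over a finite control set at every timestep. The coefficients, costs and grids below are illustrative assumptions, not the setting of [8].

```python
import numpy as np

# Semi-Lagrangian sketch: backward timestepping with a two-point
# approximation of E[v(t+h, X_{t+h})] per frozen control, then sup over
# a finite control set.  Linear interpolation makes the scheme monotone.
def sl_solve(x_grid, controls, b, sig, f, g, T=1.0, n_steps=20):
    h = T / n_steps
    v = g(x_grid)                                     # terminal condition v(T, .)
    for k in range(n_steps - 1, -1, -1):
        t = k * h
        vals = []
        for a in controls:
            # symmetric two-point Euler step along the characteristics
            up = x_grid + h * b(t, x_grid, a) + np.sqrt(h) * sig(t, x_grid, a)
            dn = x_grid + h * b(t, x_grid, a) - np.sqrt(h) * sig(t, x_grid, a)
            ev = 0.5 * (np.interp(up, x_grid, v) + np.interp(dn, x_grid, v))
            vals.append(h * f(t, x_grid, a) + ev)
        v = np.max(vals, axis=0)                      # sup over the control set
    return v

x = np.linspace(-3.0, 3.0, 301)
v_sup = sl_solve(x, controls=[-1.0, 0.0, 1.0],
                 b=lambda t, x, a: a,
                 sig=lambda t, x, a: 0.3,
                 f=lambda t, x, a: -0.1 * a**2,
                 g=lambda x: -np.abs(x))
```

By construction the value computed with the full control set dominates, pointwise, the value obtained by freezing any single control throughout, mirroring the lower bound v_h ≤ v at the discrete level.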

Discussion and conclusions
In this short paper, we show a convergence rate of 1/4 for piecewise constant control approximations to value functions of stochastic optimal control problems.This result is robust and holds for degenerate problems with non-smooth, merely Lipschitz continuous value functions.If the data and value function are smoother, we show that the approximation has rate 1 and explain why this is the maximal rate.
Our rate 1/4 in (2.1) improves both the order 1/6 in [8] and the rate 1/10 achieved in [3] by different (PDE) techniques.We also carefully explain why we can improve the result in [8].It is an interesting open question if the same rate could be obtained purely by PDE techniques.
This work also opens up the possibility of improving the error estimates for other approximation schemes as outlined in Section 3.Moreover, it enables a purely probabilistic error analysis for semi-Lagrangian schemes for HJB equations with results that are in line with the best available results by PDE methods.We refer to [12] for the details.