Approximating Value Functions for Controlled Degenerate Diffusion Processes by Using Piece-Wise Constant Policies

It is shown that value functions for controlled degenerate diffusion processes can be approximated with error of order $h^{1/3}$ by using policies which are constant on intervals $[kh^{2},(k+1)h^{2})$.


Introduction
Piece-wise constant policies play an important role in the theory of controlled diffusion processes. For instance, in [4] they are used to prove Bellman's principle in its most general form. These policies are also important from the point of view of numerical computations, since one knows how to approximate diffusion processes without control, and on each time interval where a policy is constant we are dealing with a usual diffusion process. In this connection an interesting question arises: how well does the gain for piece-wise constant policies approximate the value function?
In the framework of controlled diffusion processes this question was first addressed in article [6], where it is shown that if the intervals on which policies are constant have length $h^{2}$ and the controlled process can degenerate and has coefficients independent of the time and space variables, then the rate of approximation is not less than $h^{1/3}$. In the general case of variable coefficients the rate is shown to be not less than $h^{1/6}$. The general situation in [6] allows coefficients that are discontinuous in time, although they are still required to be Hölder continuous with respect to some integral norms. The main result of the present article, stated as Theorem 2.3 in Section 2, is that the rate is still $h^{1/3}$ if the coefficients are Hölder continuous with exponent 1/2 in the time variable. In the proof of this result we use an idea from [7] which consists of "shaking" the coefficients; in order to be sure that after "shaking" we get value functions which are close to the initial ones, we assume usual Hölder continuity instead of Hölder continuity in integral norms.
Results of this kind allow one (see, for instance, [7]) to analyze the rate of convergence of finite-difference approximations for Bellman's equations. In Section 5 we apply the results of Section 2 to improve some of the results of [7]. However, the best we can do for finite-difference approximations is still $h^{1/21}$ (with $h$ being the mesh size), and this is why in Section 5 we also discuss other methods of approximating the value function which can be used for numerical computations and have order of accuracy $h^{1/3}$. Apart from the above mentioned sections, there are also Sections 3 and 6 containing some auxiliary results and Section 4 containing the proofs of our main results.
Throughout the article $\mathbb{R}^{d}$ is a $d$-dimensional Euclidean space, $A$ is a separable metric space, $T \in (0,\infty)$, $K \in [1,\infty)$, and $\delta_{0} \in (0,1]$ are some fixed constants. By $N$ we denote various constants depending only on $K$, $d$, and $d_{1}$, where $d_{1}$ is introduced in the next section.

Main results
Let $(\Omega, \mathcal{F}, P)$ be a complete probability space and let $\{\mathcal{F}_{t}; t \ge 0\}$ be an increasing filtration of $\sigma$-algebras $\mathcal{F}_{t} \subset \mathcal{F}$ which are complete with respect to $\mathcal{F}, P$. Assume that on $(\Omega, \mathcal{F}, P)$ a $d_{1}$-dimensional Wiener process $w_{t}$ is defined for $t \ge 0$. We suppose that $w_{t}$ is a Wiener process with respect to $\{\mathcal{F}_{t}\}$, or in other terms, that $\{w_{t}, \mathcal{F}_{t}\}$ is a Wiener process.
Definition 2.1. An $A$-valued process $\alpha_{t} = \alpha_{t}(\omega)$ defined for all $t \ge 0$ and $\omega \in \Omega$ is called a policy if it is $\mathcal{F} \otimes \mathcal{B}([0,\infty))$-measurable with respect to $(\omega, t)$ and $\mathcal{F}_{t}$-measurable with respect to $\omega$ for each $t \ge 0$. The set of all policies is denoted by $A$. For any $h \in (0,1]$ let $A_{h}$ be the subset of $A$ consisting of all processes $\alpha_{t}$ which are constant on the intervals $[0,h^{2})$, $[h^{2}, 2h^{2})$, and so on.
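For readers who prefer a computational picture, the passage from a policy to its piece-wise constant version can be sketched as follows. This is a hypothetical helper for a deterministic policy viewed as a function of $t$ (the article's policies are random processes); the point is only the freezing on the intervals $[kh^{2}, (k+1)h^{2})$.

```python
def piecewise_constant(alpha, h):
    """Return the version of the (deterministic) policy `alpha` that is
    constant on each interval [k*h**2, (k+1)*h**2), taking the value of
    `alpha` at the left endpoint of the current interval."""
    dt = h ** 2
    return lambda t: alpha(int(t // dt) * dt)
```

For instance, with `alpha(t) = t` and `h = 0.5` the frozen policy equals `0.25` on the whole interval `[0.25, 0.5)`.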

Fix an integer $d \ge 1$ and suppose that on $A \times [0,\infty) \times \mathbb{R}^{d}$ we are given a $d \times d_{1}$ matrix-valued function $\sigma(\alpha, t, x)$, an $\mathbb{R}^{d}$-valued function $b(\alpha, t, x)$, and real-valued functions $c(\alpha, t, x)$, $f(\alpha, t, x)$, and $g(x)$. We assume that these functions are Borel measurable.
(i) The functions $\sigma(\alpha, t, x)$, $b(\alpha, t, x)$, $c(\alpha, t, x)$, and $f(\alpha, t, x)$ are continuous with respect to $\alpha$.

(iii) For any $\alpha \in A$, $s, t \in [0,\infty)$, and $x \in \mathbb{R}^{d}$ we have
$$\|\sigma(\alpha,t,x)-\sigma(\alpha,s,x)\| + |b(\alpha,t,x)-b(\alpha,s,x)| + |c(\alpha,t,x)-c(\alpha,s,x)| + |f(\alpha,t,x)-f(\alpha,s,x)| \le K|t-s|^{1/2}.$$

By Itô's theorem, for any $\alpha \in A$, $s \in [0,T]$, and $x \in \mathbb{R}^{d}$ there exists a unique solution $x_{t} = x^{\alpha,s,x}_{t}$, $t \in [0,\infty)$, of the equation
$$x_{t} = x + \int_{0}^{t} \sigma(\alpha_{r}, s+r, x_{r})\,dw_{r} + \int_{0}^{t} b(\alpha_{r}, s+r, x_{r})\,dr. \tag{2.1}$$
Define
$$v(s,x) = \sup_{\alpha \in A} E^{\alpha}_{s,x}\Big[\int_{0}^{T-s} f(\alpha_{t}, s+t, x_{t})\,e^{-\phi_{t}}\,dt + g(x_{T-s})\,e^{-\phi_{T-s}}\Big], \qquad \phi_{t} = \int_{0}^{t} c(\alpha_{r}, s+r, x_{r})\,dr,$$
and let $v_{h}$ be defined in the same way with $A_{h}$ in place of $A$, where, as usual, the indices $\alpha, s, x$ accompanying the symbol of expectation mean that one has to put them inside the expectation at appropriate places.
Here is our first main result.
Theorem 2.3. For any $s \in [0,T]$, $x \in \mathbb{R}^{d}$, and $h \in (0,1]$ we have
$$0 \le v(s,x) - v_{h}(s,x) \le N e^{NT} h^{1/3},$$
where the constant $N$ depends only on $K$, $d$, and $d_{1}$.
One may try to use Theorem 2.3 for approximating $v$ in real-world problems. At least theoretically $v_{h}$ can be found in the following way. The dynamic programming principle says that, for $s \le T - h^{2}$ and $x \in \mathbb{R}^{d}$,
$$v_{h}(s,x) = \sup_{\alpha \in A} G^{\alpha}_{s,s+h^{2}} v_{h}(s+h^{2}, \cdot)(x), \tag{2.2}$$
where
$$G^{\alpha}_{s,t} u(\cdot)(x) = E^{\alpha}_{s,x}\Big[\int_{0}^{t-s} f(\alpha, s+r, x_{r})\, e^{-\int_{0}^{r} c(\alpha, s+p, x_{p})\,dp}\,dr + u(x_{t-s})\, e^{-\int_{0}^{t-s} c(\alpha, s+p, x_{p})\,dp}\Big]$$
and the last expectation is evaluated for the constant policy $\alpha_{t} \equiv \alpha$. Therefore, if one knows how to compute $G^{\alpha}_{s,s+h^{2}}$, one can use (2.2) in order to find $v_{h}$ from its boundary value $v_{h}(T,x) = g(x)$ by backward iteration.
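The backward iteration just described can be sketched schematically. In the sketch below all names are hypothetical, the state is one-dimensional, the action set is finite, and the one-step expectation is replaced by a frozen-coefficient Euler step with common random numbers; the exact one-step operator would require the law of the solution of (2.1), so this is an illustration of the iteration, not of the theorem's scheme.

```python
import numpy as np

def backward_iteration(g, sigma, b, c, f, T, h, xs, actions,
                       n_mc=2000, rng=None):
    """Schematic computation of v_h on a grid xs: start from the terminal
    condition v_h(T, x) = g(x) and repeatedly apply the maximum over a
    finite action set of a one-step expectation.  The expectation uses a
    frozen-coefficient Euler step, i.e. only an approximation of the exact
    one-step operator built from equation (2.1)."""
    rng = rng or np.random.default_rng(0)
    dt = h ** 2                        # length of the constancy intervals
    v = g(xs)                          # terminal condition
    z = rng.standard_normal(n_mc)      # common random numbers
    for k in reversed(range(int(round(T / dt)))):
        s = k * dt
        best = np.full_like(v, -np.inf)
        for a in actions:
            # one Euler step with alpha, time and state frozen at (a, s, x)
            x_next = xs[:, None] + b(a, s, xs)[:, None] * dt \
                + sigma(a, s, xs)[:, None] * np.sqrt(dt) * z[None, :]
            v_next = np.interp(x_next, xs, v).mean(axis=1)
            best = np.maximum(best,
                              f(a, s, xs) * dt
                              + np.exp(-c(a, s, xs) * dt) * v_next)
        v = best
    return v
```

With all coefficients zero the iteration leaves the terminal data unchanged, which is a convenient sanity check.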
However, finding $G^{\alpha}_{s,s+h^{2}}$ requires finding the distributions of solutions of (2.1), even though only for constant $\alpha_{t}$. Since $G^{\alpha}_{s,s+h^{2}}$ only depends on solutions of (2.1) on a small time interval, it is natural to think that one can "freeze" not only $\alpha$ in (2.1) but also $s+r$ and $x_{r}$. Then one can avoid solving equation (2.1) and deal with Gaussian random variables with known mean and variance. The following theorem is an example of what one can get along these lines.
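The freezing just described can be written down in one line in any dimension. The sketch below (hypothetical helper name) draws the increment over $[s, s+h^{2}]$ from the Gaussian law whose mean and covariance are computable once $\alpha$, the time argument, and the state are all frozen at the left endpoint.

```python
import numpy as np

def frozen_step(x, alpha, s, sigma, b, h, rng):
    """One step over [s, s + h^2] with everything frozen at (alpha, s, x):
    instead of solving (2.1), the increment is exactly Gaussian with mean
    b(alpha, s, x) * h^2 and covariance h^2 * sigma @ sigma.T evaluated
    at the frozen arguments."""
    sig = sigma(alpha, s, x)                    # d x d1 matrix
    dw = h * rng.standard_normal(sig.shape[1])  # ~ N(0, h^2 * I)
    return x + sig @ dw + b(alpha, s, x) * h ** 2
```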
Then, for any $s \in [0,T]$, $x \in \mathbb{R}^{d}$, and $h \in (0,1]$, we have
$$|v(s,x) - \bar v_{h}(s,x)| \le N e^{NT} h^{1/3},$$
where the constant $N$ depends only on $K$, $d$, and $d_{1}$.

Auxiliary results
Define
$$B = A \times [-1, 0] \times \bar B_{1}, \qquad \bar B_{1} = \{x \in \mathbb{R}^{d}: |x| \le 1\}. \tag{3.1}$$
Setting, for example, $\sigma(\alpha, t, x) := \sigma(\alpha, 0, x)$ for $t < 0$, extend $\sigma$, $b$, $c$, and $f$ for negative $t$; then for a fixed $\varepsilon \in (0,1]$ and any $\beta = (\alpha, \tau, \xi) \in B$ let
$$\sigma_{\beta}(t,x) = \sigma(\alpha, t + \varepsilon^{2}\tau, x + \varepsilon\xi) \tag{3.2}$$
and similarly define $b_{\beta}(t,x)$, $c_{\beta}(t,x)$, and $f_{\beta}(t,x)$. We denote by $B$ the set of all measurable $\mathcal{F}_{t}$-adapted $B$-valued processes. As usual, starting with these objects defined on $B$, for any $\beta \in B$, $s \ge 0$, and $x \in \mathbb{R}^{d}$, we define a controlled diffusion process $x^{\beta,s,x}_{t}$ and the value functions $u_{\beta}(s,x)$ and $u(s,x)$, which we consider for $s \le S$, where $S = T + \varepsilon^{2}$. We will keep in mind that $x^{\beta,s,x}_{t}$, $u_{\beta}(s,x)$, and $u(s,x)$ also depend on $\varepsilon$, which is not explicitly shown just for simplicity of notation. Also let $B_{h}$ be the set of processes from $B$ which are constant on the intervals $[0,h^{2})$, $[h^{2}, 2h^{2})$, and so on, and let $u_{h}(s,x) = \sup_{\beta \in B_{h}} u_{\beta}(s,x)$.

Proof. By comparing the equations defining $x^{\beta,s,x}_{t}$ and $x^{\alpha,s,x}_{t}$ one can easily get (see, for instance, Theorem 2.5.9 in [4]) that the left-hand side of (3.3) is less than expression (3.6), in which the sup is taken over the corresponding parameter set. In turn, (3.6) is less than the right-hand side of (3.3) by Assumption 2.2.
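The "shaking" of the coefficients has a simple computational meaning. The sketch below (hypothetical function names) assumes the standard shift form of the perturbation, in which the shaking parameter $\beta = (\alpha, \tau, \xi)$ with $\tau \in [-1,0]$ and $|\xi| \le 1$ displaces the arguments of the original coefficient.

```python
def shaken(coef, eps):
    """'Shake' a coefficient coef(alpha, t, x): the perturbed coefficient
    evaluates the original one at shifted arguments, realizing the
    substitution (t, x) -> (t + eps**2 * tau, x + eps * xi)."""
    def coef_beta(alpha, tau, xi, t, x):
        return coef(alpha, t + eps ** 2 * tau, x + eps * xi)
    return coef_beta
```

The point of the construction is that the shaken value functions are uniformly close to the original ones when the coefficients are Hölder continuous, which is exactly where the Hölder assumption of Section 2 enters.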
One proves (3.4) and (3.5) similarly on the basis of the same Theorem 2.5.9 in [4]. The lemma is proved.
Proof. (i) It suffices to prove the first inequality. By Lemma 3.1 and Hölder's inequality
In the same way one proves (ii). The corollary is proved.

The next lemma bears on the dynamic programming principle: the assertion regarding $u$ is a particular case of Theorem 3.1.6 in [4], and the assertion regarding $u_{h}$ is a particular case of Exercise 3.2.1 in [4], a solution of which can easily be obtained from Lemma 3.2.14 and the proof of Lemma 3.3.1 of [4].
This lemma allows us to improve some of the assertions in Corollary 3.2.
Indeed, the second inequality in (i) follows trivially from (ii) and Corollary 3.2 (ii). To prove the first assertion in (i), it suffices to notice that
Similarly one proves assertion (ii).

Next, take a nonnegative function $\zeta \in C^{\infty}_{0}((-1,0) \times B_{1})$ with unit integral, and for
where $z_{x}$, $z_{xx}$, $z_{t}$ are the gradient of $z$ in $x$, the Hessian matrix of $z$ in $x$, and the derivative in $t$, respectively, and
The space $C^{2+\delta_{0}}$ is defined as the collection of all functions $z$ with finite norm $|z|_{2+\delta_{0}}$.
Proof. Observe that, owing to Corollaries 3.2 and 3.4 and the inequality
As is easy to see, the last expression is less than
Hence the inequality between the extreme terms holds for all $t, s \le T$. In the same way one gets that
and this proves the first inequality in (3.7).
To prove the second one, it suffices to notice that
The lemma is proved.

The proofs of Theorems 2.3 and 2.5

Hence we only need to prove (4.1) for $s \le T - h^{2}$, assuming without loss of generality that $T \ge h^{2}$. Denote
We restate this in the following way, for $s \le T - h^{2}$ and $x \in \mathbb{R}^{d}$. It follows from (4.3) that, for any constant policy $\alpha$, $s \le T - h^{2}$, and $x \in \mathbb{R}^{d}$, we have
By Itô's formula we further infer that
where
To estimate $N_{\alpha}$ we recall that the Hölder continuity of the coefficients of $L^{\alpha}$ is one of our assumptions, and we use Lemma 3.5. Then we easily get that $N_{\alpha} \le N e^{NT} \varepsilon^{-1-\delta_{0}}$, so that (4.4) implies
for any $\alpha \in A$, $s \le T - h^{2}$, and $x \in \mathbb{R}^{d}$. Next, from (4.6), by Itô's formula, for any $\alpha \in A$, $s \le T - h^{2}$, and $x \in \mathbb{R}^{d}$, we obtain
which by Lemma 3.5 allows us to conclude that
Furthermore, notice that
Hence, from (4.7) and the inequality $u_{h} \le v_{h} + N e^{NT} \varepsilon^{\delta_{0}}$ (see Corollary 3.2 (i)), we obtain
and this proves (4.1).

Define $\bar x_{t} = \bar x^{\alpha,s,x}_{t}$ recursively by $\bar x_{0} = x$ and, for $t \in [nh^{2}, (n+1)h^{2}]$,
$$\bar x_{t} = \bar x_{nh^{2}} + \sigma(\alpha_{nh^{2}}, s+nh^{2}, \bar x_{nh^{2}})(w_{t} - w_{nh^{2}}) + b(\alpha_{nh^{2}}, s+nh^{2}, \bar x_{nh^{2}})(t - nh^{2}).$$
Of course, $\bar x^{\alpha,s,x}_{t}$ also depends on $h$; this dependence is not shown explicitly just for simplicity of notation. It is easy to see that $\bar x^{\alpha,s,x}_{t}$ satisfies

Notice that equation (2.3) is a dynamic programming equation for the problem of maximizing $\bar v$.
Now we see that, owing to Theorem 2.3, to prove the present theorem it suffices to prove that, for any $\alpha \in A_{h}$, $s \in [0,T]$, and $x \in \mathbb{R}^{d}$,
This inequality is similar to the inequalities from Corollary 3.2, and we prove it by using again Theorem 2.5.9 of [4]. We rewrite (4.8) as
Then by Theorem 2.5.9 of [4] we get
It is easy to see that (4.9) follows from the above estimates. The theorem is proved.

Other methods of approximating value functions
Remember that the operator $L^{\alpha}$ is introduced in (4.5) and define
By definition, $v$ is a probabilistic solution of the problem
The function $v$ is also a viscosity solution of (5.1) (see, for instance, [3]). Next, we describe the approximating scheme for solving (5.1) introduced in [7]. Let $B$ be the set of all bounded functions on $\mathbb{R}^{d+1}_{+}$. For any $h \in (0,1]$ let a number $p_{h} \in [1,\infty)$ and an operator $F_{h}: u \in B \to F_{h}[u] \in B$ be defined.
We need the space $C^{2+\delta_{0}}([t, t+h^{2}])$ provided with the norm $|\cdot|_{2+\delta_{0},[t,t+h^{2}]}$. These objects are introduced in the same way as the space $C^{2+\delta_{0}}$ in Section 3 before Lemma 3.5, only this time we consider functions defined on $[t, t+h^{2}] \times \mathbb{R}^{d}$.
(iv) let $\rho = \rho(t) := e^{-2t}$; then for any constant $M \ge 0$ and $u_{1}$,

Remark 5.2. The reader can find in [7] several examples of operators $F_{h}$ satisfying Assumption 5.1. In particular, these examples include implicit and explicit finite-difference schemes. Furthermore, as is easy to see, the operators in (5.2) with $p_{h} = h^{-2}$ satisfy Assumption 5.1.
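To give a concrete feel for the kind of operator $F_{h}$ meant here, below is a hypothetical explicit finite-difference step for a one-dimensional Bellman equation: central differences for the second derivative, upwind differences for the first, and a maximum over the action set. This is only a sketch in the spirit of the schemes of [7], not their operators verbatim, and it is monotone only under a CFL-type restriction on $h^{2}/dx^{2}$.

```python
import numpy as np

def explicit_step(v_next, xs, actions, sigma, b, c, f, t, h):
    """One backward step of an explicit finite-difference scheme for a
    1-d Bellman equation on the grid xs, given the values v_next at the
    next time level t + h^2."""
    dx = xs[1] - xs[0]
    dt = h ** 2
    vp = np.roll(v_next, -1)     # v at x + dx
    vm = np.roll(v_next, 1)      # v at x - dx
    vp[-1] = v_next[-1]          # crude one-sided boundary treatment
    vm[0] = v_next[0]
    best = np.full_like(v_next, -np.inf)
    for a in actions:
        d2 = (vp - 2.0 * v_next + vm) / dx ** 2          # central v_xx
        bb = b(a, t, xs)
        d1 = np.where(bb >= 0.0, vp - v_next, v_next - vm) / dx  # upwind v_x
        cand = v_next + dt * (0.5 * sigma(a, t, xs) ** 2 * d2 + bb * d1
                              - c(a, t, xs) * v_next + f(a, t, xs))
        best = np.maximum(best, cand)
    return best
```

With zero coefficients the step reduces to the identity, which is the minimal consistency check.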
By Lemma 1.7 of [7] there exists a unique bounded function $\bar v_{h}$ defined on $[0, T+h^{2}] \times \mathbb{R}^{d}$ and solving the problem
Theorem 1.9 in [7], in our particular situation of Lipschitz continuous $c$, $f$, $g$, reads as follows.
The following is an improvement of Theorem 1.11 of [7].
Remark 5.5. If $\sigma \equiv 0$, Bellman's equation becomes the Hamilton-Jacobi equation. In this case better results can be found in Appendix A, written by M. Falcone, in [1] and in the references therein. Also notice that in the case $\delta_{0} = 1$ and $\sigma, b$ independent of $(t,x)$ we have $|v - \bar v_{h}| \le N e^{NT} h^{1/3}$, as can be seen from [5].
Remark 5.6. The rate of convergence in Theorem 1.11 of [7] is 1/39 if $\delta_{0} = 1$. Improving it to 1/21 is of course a step forward. However, we still do not know what kind of additional conditions are needed in order to get the rate 1/3 for more or less general approximating operators $F_{h}$. Even more than that, we do not know the real rate of convergence in the case when $d = 2$ and where $e_{i}$ are unit basis vectors and the $\sigma_{i}$ are smooth, say small, functions having zeros. Theorem 5.4 only says that the rate is not smaller than 1/21 if $g$ is bounded and Lipschitz. Approximations (5.2) with the same time step as in (5.3) are of order at least 1/3 by Theorems 2.3 and 2.5. However, computing the operators in (5.2) requires computing some integrals at all points of $\mathbb{R}^{d}$, whereas in (5.3) one only meets the simplest sums and one can restrict oneself to points on a grid.
Below we give some conditions which allow one to construct approximations like (5.2) with other types of random variables involved, say taking only finitely many values on a grid but still not as special as those in (5.3). As everywhere in the article, the conditions of Section 2 are assumed to hold.
Fix an $h \in (0,1]$ and assume that we are given an $\mathbb{R}^{d_{1}}$-valued random variable $\hat w$ such that
$$E\hat w = 0, \qquad E\hat w\hat w^{*} = h^{2}I, \qquad E|\hat w|^{3} \le Kh^{3}. \tag{5.4}$$
Then, for any $s \in [0,T]$ and $x \in \mathbb{R}^{d}$, we have
$$|v(s,x) - \hat v_{h}(s,x)| \le N e^{NT} h^{1/3}.$$
The proof of this theorem is based on several auxiliary results. Take a sequence of i.i.d. $\mathbb{R}^{d_{1}}$-valued random variables $\hat w_{n}$, $n = 0, 1, 2, \ldots$, having the same distribution as $\hat w$. Let $\hat{\mathcal{F}}_{t}$ be the $\sigma$-field generated by $\hat w_{n}$ for $n \le [th^{-2}]$ and let $\hat A_{h}$ be the set of all $A$-valued $\hat{\mathcal{F}}_{t}$-adapted processes. For $\alpha \in \hat A_{h}$ define $\hat x_{nh^{2}} = \hat x^{\alpha,s,x}_{nh^{2}}$ recursively by
$$\hat x_{0} = x, \qquad \hat x_{(n+1)h^{2}} = \hat x_{nh^{2}} + \sigma(\alpha_{nh^{2}}, s+nh^{2}, \hat x_{nh^{2}})\hat w_{n} + b(\alpha_{nh^{2}}, s+nh^{2}, \hat x_{nh^{2}})h^{2}.$$
Again we do not include $h$ in the notation $\hat x^{\alpha,s,x}_{nh^{2}}$ just for simplicity. Also let $\hat\varphi$ be defined correspondingly. Next we consider "shaken" coefficients, fixing an $\varepsilon \in (0,1]$. We use again objects (3.1) and (3.2), and we define $b_{\beta}(t,x)$, $c_{\beta}(t,x)$, and $f_{\beta}(t,x)$ similarly. We denote by $\hat B_{h}$ the set of all measurable $\hat{\mathcal{F}}_{t}$-adapted $B$-valued processes.
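The recursion above is easy to simulate. The sketch below (hypothetical names) uses the simplest admissible choice of $\hat w$: independent coordinates equal to $\pm h$ with probability 1/2 each, which take only two grid values yet have mean 0 and covariance $h^{2}I$, matching the first two moments of a Wiener increment over a step of length $h^{2}$.

```python
import numpy as np

def hat_w(d1, h, rng):
    """A grid-valued substitute for the Wiener increment over a time step
    h^2: independent +/-h coordinates have mean 0 and covariance h^2 * I."""
    return h * rng.choice([-1.0, 1.0], size=d1)

def hat_chain(x0, alphas, s, sigma, b, h, rng):
    """The recursion for hat-x from the text, with hat-w_n in place of
    the Wiener increments; alphas lists the (constant-on-each-interval)
    actions alpha_{n h^2}."""
    x = np.asarray(x0, dtype=float)
    for n, a in enumerate(alphas):
        t = s + n * h ** 2
        sig = sigma(a, t, x)
        x = x + sig @ hat_w(sig.shape[1], h, rng) + b(a, t, x) * h ** 2
    return x
```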
We prove this lemma in Section 6. Our next lemma is the dynamic programming principle.
In particular, for $t = s + h^{2} \le S$, we have
Finally, we need the following fact, which is easily proved by using Taylor's formula and assumption (5.4).
Take $\varepsilon$ from (4.2) and proceed as in Subsection 4.1, observing that (5.6) implies that
Hence for any $\alpha \in A$, $s \le T - h^{2}$, and $x \in \mathbb{R}^{d}$ we have
and we can finish the proof of (5.7) in exactly the same way as the proof of Theorem 2.3. Now we prove the remaining estimate. We are going to use the following lemma, which is a particular case of Theorem 2.1 in [7].

Proof of Lemma 5.8
We need the following counterpart of Lemma 3.1.
Lemma 6.1. Take $t, s, r \in [0,S]$, $x, y \in \mathbb{R}^{d}$, and $\beta = (\alpha, \tau, \xi) \in \hat B_{h}$. Then
Proof. These inequalities are absolutely standard and may be claimed to be well known. For instance, (6.1) and (6.2) appeared, probably for the first time, as Lemmas 2.2 and 2.3 in [2], where the theory of stochastic differential equations is applied to proving the solvability of the Cauchy problem for degenerate parabolic equations (a new result at that time). For completeness we outline the proofs of (6.3) and (6.4), following [2].
On the basis of this lemma and Lemma 5.9, Lemma 5.8 is proved by repeating the proofs of Corollaries 3.2 and 3.4.