Stochastic Recursive Optimal Control Problem with Time Delay and Applications

This paper is concerned with a stochastic recursive optimal control problem with time delay, where the controlled system is described by a stochastic differential delayed equation (SDDE) and the cost functional is formulated as the solution to a backward SDDE (BSDDE). When only pointwise and distributed time delays appear in the state variable, a generalized Hamilton-Jacobi-Bellman (HJB) equation for the value function in finite-dimensional space is obtained by applying the dynamic programming principle. This generalized HJB equation admits a smooth solution when the coefficients satisfy a particular system of first-order partial differential equations (PDEs). A sufficient maximum principle is derived, in which the adjoint equation is a forward-backward SDDE (FBSDDE). Under some differentiability assumptions, the relationship among the value function, the adjoint processes and the generalized Hamiltonian function is obtained. A consumption and portfolio optimization problem with recursive utility in the financial market is discussed to illustrate the applications of our results. The explicit finite-dimensional solutions derived by the two different approaches coincide.


1. Introduction. Research on many natural and social phenomena shows that the future development of many processes depends not only on their present state but also essentially on their past history. Such processes can be described by stochastic differential delayed equations (SDDEs). Many examples, such as population dynamics models in biology and models with memory or inertia in finance, can be found in Kolmanovskii and Shaikhet [19] and Mohammed [23]. In stochastic optimal control problems with time delay, the state dynamics are described by SDDEs, and one seeks an optimal control that maximizes or minimizes the corresponding cost functional. In general, such problems are practically intractable because of their infinite-dimensional nature.
However, in certain cases that are still very interesting for applications, stochastic optimal control problems with time delay can be reduced to finite-dimensional ones and solved explicitly. To the best of our knowledge, the first example of such a solvable problem is a linear delayed system with a quadratic cost functional, given by Kolmanovskii and Maizenberg [18], where only pointwise and distributed time delays are involved in the state variable (see (2) in this section). Elsanosi et al. [13] then considered optimal harvesting of systems described by SDDEs, where the value function of the problem depends on the initial path of the process in a simple way. The maximum principle approach was developed by Øksendal and Sulem [25] for optimal control of stochastic systems with delay. There, the adjoint equations are described by three backward SDDEs (BSDDEs), and one of the adjoint processes needs to be zero. A dynamic programming principle for optimal control problems of systems described by SDDEs was obtained by Larssen [20] when both the dynamics and the cost depend on the past in a general way. As applications, systems whose value function depends on the past only through some weighted average were studied. The finite-dimensional Hamilton-Jacobi-Bellman (HJB) equation for the value function of such problems was derived by Larssen and Risebro [21], and its solvability was guaranteed by a particular system of first-order partial differential equations (PDEs). Extensions of stochastic optimal control problems with time delay to jump diffusions can be found in Øksendal et al. [28], and to the infinite-horizon case in the recent work of Agram et al. [1].
Nonlinear backward stochastic differential equations (BSDEs) were first introduced by Pardoux and Peng [24]. Independently, Duffie and Epstein [10] introduced the BSDE when they presented a stochastic differential formulation of recursive utility. Later, El Karoui et al. [11] showed that the recursive utility process can be regarded as the solution to a special BSDE. A stochastic recursive optimal control problem is one whose cost functional is described by the solution to a BSDE. In this setting, the controlled systems become forward-backward stochastic differential equations (FBSDEs). This kind of optimal control problem has important applications in mathematical economics and finance; see Schroder and Skiadas [31], El Karoui et al. [12], Øksendal and Sulem [26], Wang and Wu [36], Shi and Wu [33], and Shi and Yu [35].
It is natural to study stochastic recursive optimal control problems, or forward-backward stochastic control systems, with time delay. This is done by involving time delays of the state and/or the control variables in the coefficients of the state dynamics and/or the cost functionals. In this case, the cost functional is described by the solution to a BSDDE, which is a natural generalization of the classical BSDE to the time-delayed case.

To the best of our knowledge, Fuhrman et al. [16] first considered a special case of a forward-backward stochastic control system with time delay in an infinite-dimensional framework. They proved that the value function is a mild solution to the corresponding HJB equation and established the existence of optimal controls in the weak sense. In Chen and Wu [7], a stochastic recursive optimal control problem with time delay in a general form was considered and the dynamic programming principle was presented; the value function was proved to be the viscosity solution to the corresponding infinite-dimensional HJB equation.

The optimal control problem of an infinite-horizon system governed by a forward-backward SDDE (FBSDDE) was studied by Agram and Øksendal [2]. They obtained sufficient and necessary maximum principles for optimal control under partial information and discussed an optimal consumption problem with respect to recursive utility from a cash flow with delay. However, in their paper the adjoint backward equation was described as an anticipated (time-advanced) BSDE (ABSDE) of the type of Peng and Yang [30], so no explicit solution was given. Note that in [2] a solvable special case was given only in the trivial situation without time delay. We point out that the ABSDE is another important generalization of the classical BSDE. It is very useful for representing the adjoint equation in stochastic optimal control problems, especially with time delay in the control variable; see Chen and Wu [6] and Yu [39].
However, it is in general very difficult to find explicit solutions to this kind of ABSDE in real-world problems, although some solvability and numerical results have been published for very special cases.
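In the present paper, different from all the above literature, we study the following stochastic recursive optimal control problem with time delay. Let {W(t), t ≥ 0} be a one-dimensional Brownian motion on some probability space (Ω, F, P). For s ≥ 0, we assume that the filtration F^s_t = σ{W(τ); s ≤ τ ≤ t} is augmented by all the P-null sets in F. Let 0 < T < ∞ be the fixed time horizon and 0 ≤ δ < ∞ be the constant time delay. Denote by C([−δ, 0]; R) the Banach space of continuous functions γ : [−δ, 0] → R with the supremum norm. For a given initial time s ∈ [0, T), we consider the following controlled SDDE:

$$
\begin{cases}
dX^{s,\varphi;u}(t) = b\big(t, X^{s,\varphi;u}(t), X_1^{s,\varphi;u}(t), X_2^{s,\varphi;u}(t), u(t)\big)\,dt + \sigma\big(t, X^{s,\varphi;u}(t), X_1^{s,\varphi;u}(t), X_2^{s,\varphi;u}(t), u(t)\big)\,dW(t), & t \in [s, T],\\
X^{s,\varphi;u}(t) = \varphi(t - s), & t \in [s - \delta, s].
\end{cases} \tag{1}
$$

Here the continuous function ϕ : [−δ, 0] → R is the initial path of X^{s,ϕ;u}(·). Let U ⊂ R be a nonempty convex set. The control u : Ω × [0, T] → U is an F^s_t-adapted process. The processes X_1^{s,ϕ;u}(t) and X_2^{s,ϕ;u}(t) represent given functionals of the path segment X_t^{s,ϕ;u} := {X^{s,ϕ;u}(t + τ); τ ∈ [−δ, 0]} of X^{s,ϕ;u}(·), namely

$$
X_1^{s,\varphi;u}(t) = \int_{-\delta}^{0} e^{\lambda\tau} X^{s,\varphi;u}(t+\tau)\,d\tau, \qquad X_2^{s,\varphi;u}(t) = X^{s,\varphi;u}(t-\delta). \tag{2}
$$

Here λ ∈ R is the averaging parameter, and b : [0, T] × R^3 × U → R and σ : [0, T] × R^3 × U → R are given continuous functions.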
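To make the delay structure concrete, here is a minimal Euler-Maruyama sketch of a controlled SDDE driven by the delay functionals. It assumes that X_1(t) = ∫_{−δ}^0 e^{λτ} X(t+τ) dτ (distributed delay) and X_2(t) = X(t − δ) (pointwise delay), the form used in [21]. The function name `simulate_sdde` is hypothetical. So are the coefficients b and σ, the feedback control and all numerical values in the usage lines; none of them are taken from this paper. X_1 is approximated by the trapezoidal rule over the stored path segment, and the step size is chosen so that t − δ always falls on the time grid.

```python
import numpy as np


def simulate_sdde(b, sigma, control, phi, s, T, delta, lam, n_delay=100, seed=0):
    """Euler-Maruyama sketch for the controlled SDDE
        dX(t) = b(t, X, X1, X2, u) dt + sigma(t, X, X1, X2, u) dW(t),  t in [s, T],
        X(t)  = phi(t - s),                                              t in [s - delta, s],
    with the (assumed) delay functionals
        X1(t) = int_{-delta}^0 exp(lam * tau) X(t + tau) dtau,   X2(t) = X(t - delta).
    The step h = delta / n_delay puts t - delta exactly on the time grid."""
    rng = np.random.default_rng(seed)
    h = delta / n_delay
    n_steps = int(round((T - s) / h))
    t = s + h * np.arange(-n_delay, n_steps + 1)     # grid on [s - delta, T]
    X = np.empty_like(t)
    X[: n_delay + 1] = phi(t[: n_delay + 1] - s)      # initial path on [s - delta, s]
    weights = np.exp(lam * np.linspace(-delta, 0.0, n_delay + 1))  # e^{lam * tau}
    X1 = np.empty(n_steps + 1)
    X2 = np.empty(n_steps + 1)
    for k in range(n_steps + 1):
        i = k + n_delay                               # t[i] = s + k * h
        f = weights * X[i - n_delay : i + 1]          # integrand on [t - delta, t]
        X1[k] = h * (f.sum() - 0.5 * (f[0] + f[-1]))  # trapezoidal rule
        X2[k] = X[i - n_delay]                        # pointwise delay X(t - delta)
        if k == n_steps:
            break
        u = control(t[i], X[i], X1[k], X2[k])
        dW = rng.normal(0.0, np.sqrt(h))
        X[i + 1] = (X[i] + b(t[i], X[i], X1[k], X2[k], u) * h
                    + sigma(t[i], X[i], X1[k], X2[k], u) * dW)
    return t[n_delay:], X[n_delay:], X1, X2


# Hypothetical coefficients and feedback control, for illustration only.
t, X, X1, X2 = simulate_sdde(
    b=lambda t, x, x1, x2, u: 0.05 * x + 0.1 * x1 + 0.02 * x2 - u * x,
    sigma=lambda t, x, x1, x2, u: 0.2 * x,
    control=lambda t, x, x1, x2: 0.03,
    phi=lambda r: np.ones_like(r),
    s=0.0, T=1.0, delta=0.5, lam=1.0,
)
```

Storing the whole path is the simplest way to evaluate both functionals. Because X_1 obeys an ordinary differential equation driven by X and X_2, it could also be propagated recursively instead of being recomputed by quadrature at each step.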
Problem (FBSOCPD). The forward-backward stochastic optimal control problem with time delay is to find an optimal control u*(·) ∈ U[s, T] such that
In general, the solution to Problem (SROCPD) (or equivalently, Problem (FBSOCPD)) will depend on the initial path ϕ, which lies in the infinite-dimensional space C([−δ, 0]; R). As mentioned before, we expect that in some special cases it can be reduced to a finite-dimensional problem. In this context, the crucial point is to investigate when such a finite-dimensional reduction is possible and to find conditions ensuring it. Several papers have made pioneering efforts on this topic for (non-recursive) stochastic optimal control problems with time delay; see [18], [13], [25], [21]. Motivated by this and by its potential applications, in this paper we seek conditions ensuring that Problem (SROCPD) can be reduced to a finite-dimensional one. Specifically, we show that if the system (6) takes the special form (9), then the problem can be reduced to a finite-dimensional one and its solvability is guaranteed. This holds provided that an auxiliary system of first-order PDEs involving the coefficients b_1, b_2, σ, f_1, f_2 and φ admits a solution. Although the main result of this paper is obtained for the controlled system (9), which is less general than (6), it nevertheless covers many interesting applications. This is the first main contribution of this paper, and we make this point clear in Section 2.
The other main contribution of this paper is that we study, for the first time, the relationship between Bellman's dynamic programming and Pontryagin's maximum principle approaches for stochastic recursive optimal control problems with time delay. Such a topic is of great importance in delay-free stochastic control theory; see the systematic monograph by Yong and Zhou [38]. The relationship between these two approaches is essentially the one between the derivatives of the value function and the adjoint processes along the optimal state. More precisely, it is the one between HJB equations and stochastic Hamiltonian systems and, more generally, the one between PDEs and SDEs.

For recent developments on the relationship between dynamic programming and the maximum principle for stochastic optimal control problems without delay, see Framstad et al. [15], Shi and Wu [34], Donnelly [9], Zhang et al. [42], Bahlali et al. [4], Shi and Yu [35], and Chighoub and Mezerdi [8]. These works cover jump diffusions, Markov switching, singular control and FBSDE systems. It is therefore natural to ask whether there are any relations between these two extensively used approaches for stochastic optimal control problems with time delay. The answer should be yes. However, to the best of our knowledge, results on this topic are quite lacking in the literature, except for the one by the first author [32].

One main obstacle is that the solution to a stochastic optimal control problem with time delay, that is, one with a controlled system described by an SDDE, lives in an infinite-dimensional space. Moreover, the solvability of such problems in infinite-dimensional spaces is complicated, and consequently their practical applications are largely limited. Owing to the special dependence on the past trajectory via the terms X_1(t) and X_2(t) in (2), in this paper we first prove a sufficient maximum principle when the terminal condition φ has a linear form.
Note that our result is not covered by Theorem 3.1 of [2], since they use a time-advanced FBSDE (AFBSDE) to describe the adjoint processes, while we use an FBSDDE. We then find the connections between the derivatives of the value function and the adjoint processes along the optimal state. Here we assume that the value function depends on the initial path of the process in a simple way and is smooth enough. The main result is given in Section 3.
There is a rich literature on applications of stochastic optimal control problems with time delay. For example, see:
- [19], [23] for population growth models in biology;
- [25], [6], [28] for optimal consumption problems;
- Gozzi and Marinelli [17] for advertising models;
- Federico [14] for pension fund models;
- Chang et al. [5] for portfolio optimization models;
- Arriojas et al. [3] and Mao and Sabanis [22] for option pricing models in financial markets.

However, because of the infinite-dimensional framework, in many cases no explicit solution exists, and numerical solutions are very difficult to obtain. This is one motivation for us to study controlled systems with time delay in the forms X_1(t) and X_2(t) defined in (2). In Section 4, inspired by the examples in [25] and particularly in [5], a consumption and portfolio optimization problem with recursive utility in the financial market is discussed. Another main contribution of this paper is that we obtain the explicit solution in finite-dimensional space for this problem. We investigate the PDE system that guarantees the corresponding generalized HJB equation is effective, which makes a complete discussion possible and justifies the theoretical results obtained in the previous sections.
The rest of this paper is organized as follows. In Section 2, under suitable assumptions, we investigate under what conditions on the coefficients the generalized HJB equation obtained in [7] via dynamic programming can be reduced to a finite-dimensional one. The main result is a stochastic verification theorem. In Section 3, after deriving a sufficient maximum principle for the optimal control, we obtain the relationship between the two approaches: the dynamic programming principle and the maximum principle. Under the assumption that the value function is smooth enough, the relations among its derivatives, the adjoint processes and the generalized Hamiltonian function are given. A consumption and portfolio optimization problem with recursive utility in the financial market is discussed in Section 4 to illustrate the applications of our results; the explicit finite-dimensional solutions derived by the maximum principle and dynamic programming approaches coincide. Finally, Section 5 gives some concluding remarks.

2. Preliminaries and the Generalized HJB Equation in Finite Dimension. In this section, we focus on the dynamic programming approach for Problem (SROCPD). We first present a stochastic verification theorem, in which the generalized HJB equation in Theorem 4.9 of [7] is reduced to a finite-dimensional one. This is done by assuming that the value function of our problem depends on the initial path of the state process in a simple way. We then find conditions on the coefficients b_1, b_2, σ, f_1, f_2 and φ ensuring that this reduction is effective and applicable; these conditions take the form of a system of first-order PDEs. The results in this section can be regarded as an extension of those in [21] to the recursive utility case.
For any s ∈ [0, T ), the following notations are used in this paper.
First we introduce the following assumptions. (H1) The functions b(t, x, x_1, x_2, u) and σ(t, x, x_1, x_2, u) are jointly continuous and globally Lipschitz in (x, x_1, x_2). (H2) There exists a constant C > 0 such that The following classical result can be found in [23].
We also need the following assumptions.
The functions f(t, x, x_1, x_2, y, z, u) and φ(x, x_1) are jointly continuous and globally Lipschitz in (x, x_1, x_2, y, z). (H6) There exists a constant C > 0 such that The following result can be obtained from classical BSDE theory and Lemma 2.1; see also [7] for details.

Lemma 2.2. Let assumptions (H1)∼(H6) hold. Then for any
We now introduce some preliminaries in infinite dimension, which are also used in [7, 16, 23]. Let C_b be the Banach space of all bounded uniformly continuous Define the operator P_t : We also define an operator A : here Φ belongs to the domain D(A) of A if and only if the above weak limit exists in C_b. Then one can easily obtain ([23]) For a Borel measurable function Φ : In addition, for each sufficiently smooth function Φ, we denote its first and second Fréchet derivatives with respect to ϕ ∈ C([−δ, 0]; R) by DΦ and D²Φ. Let ∂Φ/∂t, DΦ and D²Φ exist and be globally bounded and Lipschitz continuous. Then we have the following formula for the generator A, which is a slight modification of Theorem 4.2 in [7]; see also [23].
and {X^{s,ϕ;u}(t), t ∈ [s, T]} be the Markov solution process to the SDDE (1) and (2) with the initial data (s, ϕ)
Now we turn to Problem (SROCPD) via Bellman's dynamic programming. In general, the value function V(s, ϕ) defined in (8) may depend on the initial path ϕ ∈ C([−δ, 0]; R) in a complicated way. From Theorem 3.7 of [7], we know that the value function satisfies the following generalized dynamic programming principle (DPP): Here X_ŝ^{s,ϕ;u} is the map X_ŝ^{s,ϕ;u} : [−δ, 0] → R defined by X_ŝ^{s,ϕ;u}(τ) := X^{s,ϕ;u}(ŝ + τ).
The following result is an immediate corollary of Theorem 4.9 in [7]. then V(s, ϕ) solves the following PDE Note that (12) is a PDE with a terminal condition in an infinite-dimensional space. One of the main goals of this section is to find out under what conditions it can be reduced to a finite-dimensional one. Inspired by [21], one might expect that the value function V(s, ϕ) depends on ϕ only through the first two functionals x(ϕ), x_1(ϕ), that is, V(s, ϕ) = V(s, x(ϕ), x_1(ϕ)), and is independent of the third functional x_2(ϕ). If this is the case, the operator A in (10) is a differential operator and equation (12) is a second-order PDE in a finite-dimensional space. For this, we first need the following delayed Itô formula, whose proof can be found in [13]. (1) and (2); then for given u ∈ U, we have The following theorem takes the independence of the value function V from x_2 as an assumption, and states a stochastic verification theorem via the finite-dimensional counterpart of (12).
Theorem 2.6. (Stochastic Verification Theorem) Suppose that the following PDE admits a sufficiently smooth solution V which depends on (s, x, x_1) only and V(s, x, Then Furthermore, an admissible pair (X*(·), u*(·)) ≡ (X^{s,x,x_1;u*}(·), u*(·)) is an optimal pair for Problem (SROCPD) if and only if
Proof.
For any u(·) ∈ U[s, T] with the corresponding state X^u(·) ≡ X^{s,x,x_1;u}(·) and X_1^u(·) ≡ X_1^{s,x,x_1;u}(·), X_2^u(·) ≡ X_2^{s,x,x_1;u}(·) defined as in (2), applying the delayed Itô formula (14) to V(t, X^u(t), X_1^u(t)), we obtain that The third "=" in the above holds by the uniqueness of the solution to the BSDDE (3). Thus (17) holds. Next, applying the above inequality to (X*(·), u*(·)), we have The desired result follows immediately from the fact that which is due to PDE (15). The proof is complete.
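The delayed Itô formula used in this proof rests on a fact that can be checked by hand. Assume X_1(t) = ∫_{−δ}^0 e^{λτ} X(t+τ) dτ and X_2(t) = X(t−δ), following [13], [21]. Writing X_1(t) = ∫_{t−δ}^t e^{λ(r−t)} X(r) dr and differentiating gives dX_1(t) = (X(t) − λX_1(t) − e^{−λδ}X_2(t)) dt. So X_1 has no martingale part, and V(t, X(t), X_1(t)) needs second derivatives only in x. The sketch below checks this identity numerically on a smooth deterministic test path. The helper names, the test paths and the parameter values are arbitrary illustrative choices.

```python
import numpy as np


def x1_quad(path, t, delta, lam, n=2001):
    """X1(t) = int_{-delta}^0 exp(lam * tau) path(t + tau) dtau, trapezoidal rule."""
    tau = np.linspace(-delta, 0.0, n)
    f = np.exp(lam * tau) * path(t + tau)
    h = tau[1] - tau[0]
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))


def check_x1_dynamics(path, t, delta, lam, eps=1e-4):
    """Return (central difference of X1 at t, X(t) - lam*X1(t) - exp(-lam*delta)*X2(t))."""
    lhs = (x1_quad(path, t + eps, delta, lam) - x1_quad(path, t - eps, delta, lam)) / (2 * eps)
    rhs = path(t) - lam * x1_quad(path, t, delta, lam) - np.exp(-lam * delta) * path(t - delta)
    return lhs, rhs


lhs, rhs = check_x1_dynamics(np.sin, t=2.0, delta=0.7, lam=1.5)
print(lhs, rhs)  # the two values should agree to several decimal places
```

For a stochastic path the same identity holds pathwise, because X_1 is a time integral of X. This is why the finite-dimensional formula contains the term (x − λx_1 − e^{−λδ}x_2)∂V/∂x_1 but no ∂²V/∂x_1² term.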
PDE (15) is the finite-dimensional counterpart of (12). However, the coefficients b and σ of the SDDE (1), which depend on x_2, enter into the delayed Itô formula, and the generator f of the BSDDE (3) also depends on x_2. Hence the coefficients of the PDE (12) depend on x_2 as well. Consequently, we cannot a priori expect (12) to have solutions independent of x_2.
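To see concretely where x_2 enters, recall the form that the delayed Itô formula of [13] takes for a smooth function V = V(t, x, x_1). As in [13], [21], assume the delay functionals X_1(t) = ∫_{−δ}^0 e^{λτ} X(t+τ) dτ and X_2(t) = X(t−δ). With b and σ evaluated at (t, X(t), X_1(t), X_2(t), u(t)) and the derivatives of V evaluated at (t, X(t), X_1(t)), the formula reads:

$$
dV\big(t, X(t), X_1(t)\big) = \Big[\frac{\partial V}{\partial t} + b\,\frac{\partial V}{\partial x} + \big(X(t) - \lambda X_1(t) - e^{-\lambda\delta}X_2(t)\big)\frac{\partial V}{\partial x_1} + \frac{1}{2}\sigma^2\,\frac{\partial^2 V}{\partial x^2}\Big]dt + \sigma\,\frac{\partial V}{\partial x}\,dW(t).
$$

Thus x_2 appears in the drift in two ways: through b and σ, and through the term −e^{−λδ}x_2 ∂V/∂x_1. In the recursive setting it also appears through the generator f. A solution of (12) can be independent of x_2 only if all of these contributions drop out, which is what condition (19) together with the PDE system (20) in Theorem 2.7 is designed to ensure.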
In the sequel, we clarify under which conditions on the coefficients b, σ and f the PDE (12) has a solution depending only on (s, x, x_1). In other words, we seek conditions ensuring that a solution to (12) is independent of x_2, so that the generalized HJB equation in finite dimension (15) is "effective". The following theorem is our main result.
Theorem 2.7. The generalized HJB equation in finite dimension (15) and the following system of first order PDEs
Proof. First, if V = V(s, x, x_1), then from (15) and (16), V satisfies the following generalized HJB equation Inserting this into (21), it takes the form (23). Suppose that f takes the form in (19); then (23) reduces to (24). Next, suppose that σ takes the form in (19); then (24) reduces to (25).

JINGTAO SHI AND HUANSHUI ZHANG
Finally, suppose that b takes the form in (19); then (25), with the terminal condition, reduces to which is independent of x_2. Note that now (22) takes the form Then (27) states that ∂V/∂x_1(s, x, x_1) = 0 for all (s, x, x_1).

Using the initial variables in (28), we arrive at (20). The proof is complete.
That is to say, condition (19) together with the PDE system (20) guarantees the reduction of PDE (12) from an infinite-dimensional equation to its finite-dimensional counterpart (15). Although the results in Theorems 2.6 and 2.7, which correspond to (29), are less general than (6), they nevertheless cover many interesting applications. In Section 4, we present a financial example whose state dynamics take the form (29) and which satisfies conditions (19) and (20). Some discussion is also given to indicate why it is difficult to find more general examples.
3. Relationship with Maximum Principle. For stochastic optimal control problems with time delay, and for those of FBSDEs (recursive utility, without time delay), the relationships between the dynamic programming principle and the maximum principle were established in [32] and [35], respectively. In this section, a similar relationship is given between the value function V, the generalized Hamiltonian function G, and the adjoint processes p, q (see Theorem 3.1). This relies on the assumption that the value function is smooth enough and depends on the initial path of the state in a simple way, as in Theorem 2.6. The main result is Theorem 3.2, which covers many interesting applications. For this purpose, we first solve Problem (FBSOCPD) by Pontryagin's maximum principle approach. In this part, we let the initial time s = 0 and write X = X^u = X^{0,ϕ;u}, etc. Moreover, we need the following additional assumptions.

we have In the above, we have used for β = b, σ, f and ρ = x, x_1, x_2, y, z, u. By (32) we then have Thus u*(·) is an optimal control for Problem (FBSOCPD). The proof is complete.
Remark 1. Note that this sufficient maximum principle is proved for the controlled system (6) rather than its special form (9), and in the special case where φ is linear with respect to (x, x_1) (see (33)). The general case without this linearity restriction remains open, even for FBSDE problems without time delay; see [26], [33] for details.
The following is the main result in this section.

Hence, by the uniqueness of the solution to the p_2(t) part of the adjoint equation (31), a.e. t ∈ [s, T], a.s. (42) And finally, can easily be obtained by solving the forward equation for q(t) directly. The proof is complete.

4. Application to Consumption and Portfolio Optimization with Recursive Utility. In this section, we discuss a consumption and portfolio optimization problem with recursive utility in the financial market. The financial framework of this problem was introduced by Chang et al. [5] with a classical cost functional. In this paper, we generalize their model to the case with recursive utility. The optimal portfolio and consumption strategies are obtained by both the dynamic programming and the maximum principle approaches, and the relations obtained in Theorem 3.2 are illustrated along the way.

Let us first describe the financial market. Consider an investor who can invest his/her money in a risky asset and a riskless asset. The risky asset can be a stock, a mutual fund, etc. The riskless asset earns a fixed interest rate r > 0; the money invested in the riskless asset can be regarded as money deposited in a bank account. We assume that the investor can consume his/her wealth.
Let U(t) be the amount invested in the risky asset and V(t) the amount invested in the riskless asset. The total wealth is X(t) = U(t) + V(t). We consider the situation in which the performance of the risky asset has some memory (delay). Many investors look at an asset's past performance before investing in it, so good past performance tends to drive investors to invest more in the risky asset, which can push its price even higher. On the other hand, if the price has been decreasing a lot, investors tend to sell the asset and invest in other assets, which drives the price down further. To describe this phenomenon, we assume that the performance of the risky asset depends on the delay variables X_1(t) and X_2(t), defined as in (2), for any initial time s ∈ [0, T). Here λ is a constant and δ > 0 is the delay parameter, which gives the length of the past that the investor usually cares about.
Let {W(t), t ≥ 0} be a one-dimensional standard Brownian motion defined on a probability space (Ω, F, P). We assume that the filtration F^0_t = σ{W(τ); 0 ≤ τ ≤ t} is augmented by all the P-null sets in F. We assume that U(t) and V(t) follow the stochastic differential equations: where µ_0, µ_1, µ_2 and σ are positive constants, and C(t) is the consumption rate.
Adding them together and using the fact that X(t) = U(t) + V(t), we obtain the equation for the wealth X(t): where the continuous function ϕ : [−δ, 0] → R is the initial condition, that is, the information about X(t) for t ∈ [−δ, 0].
Further, instead of U(t) and C(t), we use c(t) ≡ C(t)/X(t) and u(t) ≡ U(t)/X(t) as our consumption and portfolio controls, respectively (note that X(t) > 0 a.s. is proved in Lemma 2.2 of [5]). It is easy to see that V(t) = X(t) − U(t) = X(t)(1 − u(t)). Now we can rewrite the equation for X(t) as
Now we define the admissible control space Π for the control variables u(t) and c(t): (i) (u(t), c(t)) is an F_t-adapted process; (ii) c(t) ≥ 0, ∀t ∈ [0, T]; (iii) For any t ∈ [0, T], we have where Λ_1, Λ_2 > 0 are constants.
Remark 2. The condition (iii) is sufficient to obtain the result in Lemma 2.2 of [5].
Remark 3. The recursive utility functional defined in (49), with generator f(t, x, x_1, y, z, u, c) = −βy, represents a standard additive utility of recursive type. It can easily be checked that f defined above is concave with respect to (c, y) and increasing with respect to c; these are classical properties that utility functions must satisfy. Recursive utility such as (50) is a meaningful and nontrivial generalization of the classical additive utility and has many applications in mathematical economics and mathematical finance. For more details about recursive utilities, see [31], [11, 12] and the references therein.
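A generator that is linear in y yields an additive utility in recursive form. The simplest case shows why. Take a generator g(c(t)) − βy with no z-dependence and a deterministic consumption path. Then Z ≡ 0, the BSDE becomes a backward ODE, and its initial value coincides with the discounted additive utility Y(0) = ∫_0^T e^{−βt} g(c(t)) dt + e^{−βT}ξ. In the sketch below, the utility g = log, the consumption path, β, T and the terminal value ξ are hypothetical choices, not the coefficients of the model above. The function names are likewise illustrative.

```python
import numpy as np


def recursive_utility(g, c, beta, xi, T, n=10_000):
    """Solve -dY = (g(c(t)) - beta * Y) dt, Y(T) = xi, backward in time.
    With a deterministic consumption path the BSDE has Z = 0 and reduces to an ODE;
    each implicit Euler step solves Y(t) = Y(t+h) + h * (g(c(t)) - beta * Y(t)) for Y(t)."""
    h = T / n
    y = xi
    for k in range(n - 1, -1, -1):
        t = k * h
        y = (y + h * g(c(t))) / (1.0 + beta * h)
    return y


def discounted_additive_utility(g, c, beta, xi, T, n=10_000):
    """int_0^T e^{-beta t} g(c(t)) dt + e^{-beta T} xi, by the trapezoidal rule."""
    t = np.linspace(0.0, T, n + 1)
    f = np.exp(-beta * t) * g(c(t))
    h = T / n
    return h * (f.sum() - 0.5 * (f[0] + f[-1])) + np.exp(-beta * T) * xi


def consumption(t):
    """Hypothetical deterministic consumption path."""
    return 1.0 + 0.5 * t


g = np.log  # hypothetical utility of consumption
y0 = recursive_utility(g, consumption, beta=0.1, xi=0.5, T=1.0)
u0 = discounted_additive_utility(g, consumption, beta=0.1, xi=0.5, T=1.0)
print(y0, u0)  # agree up to the O(1/n) error of the Euler scheme
```

With random consumption, the same linear BSDE gives Y(0) as the expectation of the discounted sum. A genuinely recursive utility arises once the generator is nonlinear in y or depends on z.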

4.1. Dynamic Programming Approach. In this subsection, we solve the above consumption and portfolio optimization problem by Bellman's dynamic programming approach.
Note that conditions (62) and (63) are the same as (81) and (88), respectively. Conditions (62) and (63) come from the system of first-order PDEs (20), while conditions (81) and (88) rely heavily on condition (34) on the adjoint processes. This is not a coincidence but a natural requirement for our problem to be reducible to a finite-dimensional one.
So it is easy to see that µ_1 = 0 if and only if µ_2 = 0, provided that µ_2 ≥ 0 and lim_{µ_2→∞} µ_1 = ∞. In other words, the dynamics of X(t) must depend on both X_1(t) and X_2(t) at the same time in order to obtain explicit representations of V, u* and c* in a finite-dimensional space. Otherwise, when µ_2 = 0 (and hence µ_1 = 0), the dynamic equation of X(t) does not depend on X_1(t) and X_2(t) explicitly. In that case our model reduces to the consumption and portfolio optimization model with recursive utility but without time delay.

5. Conclusion. In this paper, we have discussed Pontryagin's maximum principle, Bellman's dynamic programming and their relationship for stochastic recursive optimal control problems with time delay, when only pointwise and distributed time delays in the state variable are considered. One advantage of this kind of time delay is that the corresponding generalized HJB equation is finite-dimensional under suitable conditions on the coefficients. Under the assumption that the value function is smooth enough, its relations to the adjoint processes and the generalized Hamiltonian function are obtained. A consumption and portfolio optimization problem with recursive utility in the financial market was discussed to illustrate the applications of our results. The explicit solutions for the optimal portfolio and consumption strategies in finite-dimensional space derived by the two approaches coincide.

Potential extensions of the present work include stochastic optimal control problems with time delay under model uncertainty (Pamen [29]) and stochastic differential games (Øksendal and Sulem [27]) under model uncertainty. Problems with time delay in the control variables ([6], [39]) and with time-varying delay in the control variables (Zhang et al. [40], Zhang et al. [41], Wang and Zhang [37]) are rather challenging. These will be considered in our future research.