Linear-quadratic stochastic Volterra controls II: Optimal strategies and Riccati--Volterra equations

In this paper, we study linear-quadratic control problems for stochastic Volterra integral equations with singular and non-convolution-type coefficients. The weighting matrices in the cost functional are not assumed to be non-negative definite. From a new viewpoint, we formulate a framework of causal feedback strategies. The existence and the uniqueness of a causal feedback optimal strategy are characterized by means of the corresponding Riccati--Volterra equation.


Introduction
Linear-quadratic (LQ) control problems are special classes of optimal control problems described by a linear state dynamics and a quadratic cost functional. In the continuous-time setting, the state dynamics is assumed to be governed by a controlled differential/integral equation. In this paper, we consider the following controlled linear stochastic Volterra integral equation (SVIE):

X(t) = x(t) + ∫_{t_0}^t {A(t, s)X(s) + B(t, s)u(s) + b(t, s)} ds + ∫_{t_0}^t {C(t, s)X(s) + D(t, s)u(s) + σ(t, s)} dW(s), t ∈ (t_0, T), (1.1)

where u is a control process, x is a given deterministic function called the free term (also called the forcing term), W is a Brownian motion, A, B, C and D are matrix-valued deterministic coefficients, and b and σ are vector-valued stochastic inhomogeneous terms. The cost functional is the following quadratic functional:

J(t_0, x; u) = E[∫_{t_0}^T (⟨Q(t)X(t), X(t)⟩ + 2⟨S(t)X(t), u(t)⟩ + ⟨R(t)u(t), u(t)⟩ + 2⟨q(t), X(t)⟩ + 2⟨ρ(t), u(t)⟩) dt], (1.2)

where Q, S and R are matrix-valued deterministic functions, and q and ρ are vector-valued adapted processes. The LQ control problem for an SVIE, which we call an LQ stochastic Volterra control problem, is to minimize the quadratic cost functional J(t_0, x; u) over all control processes u subject to the state dynamics (1.1).
The controlled SVIE (1.1) is a Volterra-type extension of the controlled linear stochastic differential equation (SDE)

dX(t) = {A(t)X(t) + B(t)u(t) + b(t)} dt + {C(t)X(t) + D(t)u(t) + σ(t)} dW(t), t ∈ (t_0, T), X(t_0) = x, (1.3)

with x being a constant. LQ control problems for SDEs were first studied by Wonham [23] in 1968 and have since been investigated by many researchers; see [25, Chapter 6] and [18] for systematic studies and recent developments of LQ control theory for SDEs. In this context, there are at least two different frameworks, namely the open-loop framework and the closed-loop framework. On the one hand, in the open-loop framework, the problem is to find, for each fixed input condition (t_0, x), a control process û such that J(t_0, x; û) ≤ J(t_0, x; u) for any other control process u. Such a control process is called an open-loop optimal control. The open-loop optimal control is characterized by a coupled system of an SDE and a backward SDE (BSDE) (see [18, Section 2.3]). On the other hand, in the closed-loop framework, the problem is to find an optimal "strategy" which a controller uses to select a control action based on his/her state. More precisely, in the LQ control problem for the SDE (1.3) with the cost functional (1.2), consider a matrix-valued deterministic function Ξ and a stochastic inhomogeneous term v which are independent of the choice of input conditions (t_0, x). The pair (Ξ, v) is called a closed-loop strategy (also called a state-feedback strategy). Then, for each input condition (t_0, x), consider the following closed-loop system of the controlled SDE (1.3):

dX(t) = {(A(t) + B(t)Ξ(t))X(t) + B(t)v(t) + b(t)} dt + {(C(t) + D(t)Ξ(t))X(t) + D(t)v(t) + σ(t)} dW(t), t ∈ (t_0, T), X(t_0) = x.

We note that the above system is an equation for the state process X = X^{t_0,x}, and the control process u = u^{t_0,x} is obtained as the outcome of the strategy (Ξ, v) by inserting the solution X^{t_0,x} into the expression u^{t_0,x} = ΞX^{t_0,x} + v. In order to clarify the dependency of the outcome u^{t_0,x} on the closed-loop strategy (Ξ, v) and the input condition (t_0, x), we write u^{t_0,x} = (Ξ, v)[t_0, x].
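As a toy illustration of how the outcome of a closed-loop strategy is generated, the following sketch simulates a scalar version of the closed-loop SDE by an Euler-Maruyama scheme, inserting the control u(t) = Ξ(t)X(t) + v(t) at each step. All concrete coefficients and function names here are hypothetical choices for illustration, not taken from the paper.

```python
import math
import random

def closed_loop_sde(t0, x, Xi, v, A, B, C, D, T=1.0, N=1000, seed=0):
    """Euler-Maruyama simulation of the scalar closed-loop SDE
    dX = (A*X + B*u) dt + (C*X + D*u) dW  with u(t) = Xi(t)*X(t) + v(t).
    Returns the state path X and the outcome control path u."""
    rng = random.Random(seed)
    h = (T - t0) / N
    X, u = [x], []
    for n in range(N):
        t = t0 + n * h
        un = Xi(t) * X[-1] + v(t)           # outcome of the strategy (Xi, v)
        u.append(un)
        dW = rng.gauss(0.0, math.sqrt(h))   # Brownian increment
        drift = (A * X[-1] + B * un) * h
        diff = (C * X[-1] + D * un) * dW
        X.append(X[-1] + drift + diff)
    return X, u
```

With C = D = 0 and Ξ ≡ -A/B, the drift cancels and the state stays constant, which gives a quick sanity check of the scheme.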
The problem in the closed-loop framework is to find a closed-loop strategy (Ξ̂, v̂) such that

J(t_0, x; (Ξ̂, v̂)[t_0, x]) ≤ J(t_0, x; (Ξ, v)[t_0, x])

for any other closed-loop strategy (Ξ, v) and any input condition (t_0, x). In this case, the pair (Ξ̂, v̂) is called a closed-loop optimal strategy. The closed-loop optimality is closely related to the solvability of a Riccati (differential) equation and a BSDE (see [18, Section 2.4]). It is worth mentioning that if (Ξ̂, v̂) is a closed-loop optimal strategy, then the outcome û^{t_0,x} = (Ξ̂, v̂)[t_0, x] is an open-loop optimal control for every input condition (t_0, x). Therefore, each closed-loop optimal strategy can be seen as a state-feedback representation of an open-loop optimal control.

Optimal control problems for (non-linear) SVIEs were first studied by Yong [24]. By means of the maximum principle, he characterized the open-loop optimal control by the so-called Type-II backward stochastic Volterra integral equation (Type-II BSVIE), which is a Volterra-type extension of a BSDE. Since then, several researchers have studied optimal control problems for SVIEs in the open-loop framework; see [3,5,8,9,14,15,20,21,22]. On the other hand, in the special case of SVIEs with completely monotone and convolution-type kernels, several kinds of feedback representations of the optimal controls were investigated in [1,4,6]. Specifically, Abi Jaber, Miller and Pham [1] studied LQ stochastic Volterra control problems with completely monotone and convolution-type kernels. Based on an infinite-dimensional approach, they obtained a kind of linear feedback representation of the optimal control; see also [2] for the study of the associated integral operator Riccati equation. We emphasize that the approaches of [1,2,4,6] rely heavily on the special structure of the completely monotone and convolution-type kernels and cannot be applied to the non-convolution-type SVIE (1.1).
The purpose of this paper is to formulate and investigate the closed-loop framework of LQ stochastic Volterra control problems with general (singular and non-convolution-type) coefficients. In this framework, a difficulty comes from the definition of the "strategy". Indeed, as discussed by Pritchard and You [13] for deterministic LQ Volterra control problems, the class of state-feedback strategies of the form u^{t_0,x}(t) = Ξ(t)X^{t_0,x}(t) + v(t) is not sufficient to capture the Volterra structure of the state dynamics (see also [12] for the study of deterministic LQ Volterra control problems). In our previous paper [11], inspired by the so-called causal projection approach of [12,13], we introduced the notion of causal feedback strategies for the linear controlled SVIE (1.1). This is a feedback strategy involving not only the state process X(t), but also the forward state process defined by

Θ(s, t) = x(s) + ∫_{t_0}^t {A(s, r)X(r) + B(s, r)u(r) + b(s, r)} dr + ∫_{t_0}^t {C(s, r)X(r) + D(s, r)u(r) + σ(s, r)} dW(r)

for (s, t) ∈ △_2(t_0, T) := {(s, t) | t_0 < t < s < T}. The forward state Θ(s, t) can be seen as the causal projection of the original controlled SVIE (1.1) which is determined by the information of X and u up to the current time t. A causal feedback strategy consists of a triplet (Ξ, Γ, v) of matrix-valued deterministic functions Ξ and Γ and a stochastic inhomogeneous term v, which leads to the following closed-loop system: for each input condition (t_0, x),

X^{t_0,x}(t) = x(t) + ∫_{t_0}^t {A(t, r)X^{t_0,x}(r) + B(t, r)u^{t_0,x}(r) + b(t, r)} dr + ∫_{t_0}^t {C(t, r)X^{t_0,x}(r) + D(t, r)u^{t_0,x}(r) + σ(t, r)} dW(r), t ∈ (t_0, T),

Θ^{t_0,x}(s, t) = x(s) + ∫_{t_0}^t {A(s, r)X^{t_0,x}(r) + B(s, r)u^{t_0,x}(r) + b(s, r)} dr + ∫_{t_0}^t {C(s, r)X^{t_0,x}(r) + D(s, r)u^{t_0,x}(r) + σ(s, r)} dW(r), (s, t) ∈ △_2(t_0, T),

u^{t_0,x}(t) = Ξ(t)X^{t_0,x}(t) + ∫_t^T Γ(s, t)Θ^{t_0,x}(s, t) ds + v(t), t ∈ (t_0, T).

We say that a pair (X^{t_0,x}, Θ^{t_0,x}) satisfying the above system is a causal feedback solution of the controlled SVIE (1.1) at (t_0, x) corresponding to the causal feedback strategy (Ξ, Γ, v). This framework is different from that of [1,2,4,6] and is more reasonable in view of the (generalized) flow property and the time-consistency of the state dynamics.
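To make the causal-projection idea concrete, here is a small numerical sketch (a hypothetical construction, not from the paper) that discretizes the drift part of the state equation and of the forward state Θ(s, t) by a left-endpoint Euler rule, for a given open-loop control and illustrative scalar kernels. By construction, the forward state evaluated on the diagonal recovers the current state, Θ(t, t) = X(t).

```python
def forward_state(x, A, B, u, t0=0.0, T=1.0, N=200):
    """Left-endpoint Euler scheme for the deterministic (drift) part of the SVIE:
    X(t)       = x(t) + int_{t0}^t {A(t, r) X(r) + B(t, r) u(r)} dr,
    Theta(s,t) = x(s) + int_{t0}^t {A(s, r) X(r) + B(s, r) u(r)} dr,  s > t.
    Returns the grid, X on the grid, and theta(j, n) ~ Theta(t_j, t_n) for j >= n."""
    h = (T - t0) / N
    grid = [t0 + n * h for n in range(N + 1)]
    X = []
    for n in range(N + 1):
        t = grid[n]
        X.append(x(t) + h * sum(A(t, grid[m]) * X[m] + B(t, grid[m]) * u(grid[m])
                                for m in range(n)))
    def theta(j, n):
        # forward state at look-ahead time t_j, using information up to t_n (j >= n)
        s = grid[j]
        return x(s) + h * sum(A(s, grid[m]) * X[m] + B(s, grid[m]) * u(grid[m])
                              for m in range(n))
    return grid, X, theta
```

Running it with a singular illustrative kernel A(t, s) = (t - s)^{-0.3} shows the diagonal consistency theta(n, n) == X[n], the discrete analogue of the causal projection matching the state at the current time.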
For a more detailed theory of causal feedback strategies and the associated causal feedback solutions, see our previous paper [11]. In order to clarify the dependency of the outcome u^{t_0,x} on the causal feedback strategy (Ξ, Γ, v) and the input condition (t_0, x), we write u^{t_0,x} = (Ξ, Γ, v)[t_0, x]. Our problem is to find a causal feedback strategy (Ξ̂, Γ̂, v̂) such that

J(t_0, x; (Ξ̂, Γ̂, v̂)[t_0, x]) ≤ J(t_0, x; (Ξ, Γ, v)[t_0, x])

for any other causal feedback strategy (Ξ, Γ, v) and any input condition (t_0, x). In this case, we call the triplet (Ξ̂, Γ̂, v̂) a causal feedback optimal strategy.
The main contributions of this paper are the following two points:
(i) We show that the existence of a causal feedback optimal strategy is equivalent to the "regular solvability" of a Riccati-Volterra equation (5.1). See Theorem 5.3.
(ii) We show that the existence of a "strongly regular solution" of the Riccati-Volterra equation (5.1) is equivalent to the uniform convexity of the cost functional. These two equivalent conditions imply the existence and the uniqueness of the causal feedback optimal strategy. See Theorem 6.5.
Furthermore, we find the following interesting fact:
Fact. If the control does not enter the drift part, that is, if B = 0, then the causal feedback optimal strategy (Ξ̂, Γ̂, v̂) is of a state-feedback form in the sense that Γ̂ = 0. Moreover, if in addition the inhomogeneous terms b, σ, q and ρ are zero, then it is of a Markovian state-feedback form in the sense that Γ̂ = 0 and v̂ = 0. See Remark 5.7.
This is a surprising consequence since, even in the homogeneous case, the state process is highly non-Markovian and is not a semimartingale due to the Volterra structure. Very recently, a similar fact was also found in an independent work of Wang, Yong and Zhou [19] by a different method. In [19], they considered an LQ stochastic Volterra control problem (involving a terminal cost) in the open-loop framework, where the coefficients A, B, C and D are non-convolution-type but assumed to be regular (i.e. bounded and differentiable), the inhomogeneous terms b, σ, q and ρ are zero, S = 0, and the weighting matrices Q and R are assumed to be non-negative and strictly positive definite, respectively. By a dynamic programming method and a decoupling technique, they derived a causal feedback representation of the open-loop optimal control by means of a path-dependent Riccati equation which is different from our Riccati-Volterra equation (5.1).
Besides the fact that the coefficients of the controlled SVIE (1.1) are non-convolution-type and singular, our cost functional (1.2) is also quite general compared to [1,2,19] since we do not a priori impose any non-negativity conditions on the matrix-valued functions Q and R. In particular, under standard non-negativity assumptions on Q and R which are similar to [1,2,19], we see that the Riccati-Volterra equation is strongly regularly solvable, which implies that there exists a unique causal feedback optimal strategy (see Corollary 6.7). Our results (i) and (ii) mentioned above extend the known results [16,17] on LQ control problems for SDEs (see also [18, Section 2.4]) to our LQ stochastic Volterra control problems. Type-II EBSVIEs were introduced and investigated in our previous paper [11]. They are extensions of a class of Type-II BSVIEs introduced by Yong [24] to the framework of causal feedback solutions of controlled SVIEs. The Riccati-Volterra equation (5.1) is a coupled system of Riccati-type Volterra integro-differential equations which appears for the first time in the literature. It is closely related to the Lyapunov-Volterra equations which were also introduced in our previous paper [11].
The rest of this paper is organized as follows: In Section 2, we formulate the LQ stochastic Volterra control problems in the framework of causal feedback strategies. In Section 3, we recall the results of our previous work [11]. Specifically, we introduce Type-II EBSVIEs and Lyapunov-Volterra equations, which play fundamental roles in the present paper. In Section 4, we give a useful representation of the cost functional. In Section 5, we introduce the Riccati-Volterra equation and prove the first main result (Theorem 5.3). In Section 6, we investigate the (strongly regular) solvability of the Riccati-Volterra equation and prove the second main result (Theorem 6.5). Some auxiliary lemmas are proved in the Appendix.

Notation
(Ω, F, P) is a complete probability space, and W is a one-dimensional Brownian motion. F = (F_t)_{t≥0} denotes the P-augmented filtration generated by W. E[·] denotes the expectation. Throughout this paper, E[·]^{1/2} denotes the square root of the expectation E[·], not the expectation of the square root. For each 0 ≤ t_0 < T < ∞, we define △_2(t_0, T) := {(s, t) | t_0 < t < s < T}. For each matrix M ∈ R^{d_1×d_2} with d_1, d_2 ∈ N, |M| denotes the Frobenius norm, M^⊤ ∈ R^{d_2×d_1} denotes the transpose, M^† ∈ R^{d_2×d_1} denotes the Moore-Penrose pseudoinverse, and R(M) denotes the range. For each d ∈ N, S^d denotes the set of (d × d)-symmetric matrices. We define R^d := R^{d×1}, that is, each element of R^d is understood as a column vector. We denote by ⟨·, ·⟩ the usual inner product on a Euclidean space. I_d denotes the (d × d)-identity matrix. For each set Λ, 1_Λ denotes the indicator function.
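Since the Moore-Penrose pseudoinverse M^† plays a role later (it enters the definition of the Riccati-Volterra equation), the following minimal sketch verifies the four Penrose conditions for a hand-computed pseudoinverse of a rank-one matrix; the concrete matrix is an arbitrary illustrative choice.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def transpose(P):
    return [list(r) for r in zip(*P)]

def close(P, Q, eps=1e-12):
    return all(abs(P[i][j] - Q[i][j]) < eps
               for i in range(len(P)) for j in range(len(P[0])))

# A rank-one matrix M = u v^T has pseudoinverse M+ = v u^T / (|u|^2 |v|^2).
u, v = [1.0, 2.0], [3.0, 4.0]
M = [[ui * vj for vj in v] for ui in u]
c = sum(ui * ui for ui in u) * sum(vj * vj for vj in v)
Mp = [[vi * uj / c for uj in u] for vi in v]

# The four Penrose conditions characterizing the pseudoinverse:
assert close(matmul(matmul(M, Mp), M), M)              # M M+ M = M
assert close(matmul(matmul(Mp, M), Mp), Mp)            # M+ M M+ = M+
assert close(transpose(matmul(M, Mp)), matmul(M, Mp))  # (M M+)^T = M M+
assert close(transpose(matmul(Mp, M)), matmul(Mp, M))  # (M+ M)^T = M+ M
```

For an invertible matrix, M^† coincides with M^{-1}; the pseudoinverse is what allows the optimal-gain formulas below to be stated without assuming invertibility of the relevant weighting term.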
For each 0 ≤ t 0 < T < ∞ and d 1 , d 2 ∈ N, we define some spaces of stochastic (and deterministic) processes as follows: is the Hilbert space of R d1×d2 -valued, square-integrable and F-progressively measurable processes on (t 0 , T ).
Throughout this paper, d ∈ N represents the dimension of state processes, and ℓ ∈ N represents the dimension of control processes. We fix a finite terminal time T ∈ (0, ∞).

LQ stochastic Volterra control problems
We denote by I the set of input conditions. For each input condition (t_0, x) ∈ I and control u ∈ U(t_0, T), consider the controlled linear SVIE (1.1) and the quadratic cost functional (1.2). The following is the standing assumption of this paper.
• The coefficients:
In the standing assumption, the coefficients and the inhomogeneous terms of the controlled SVIE (1.1) are singular and of non-convolution type. For example, A(t, s) is allowed to diverge as s ↑ t, and the same is true for B, C, D, b and σ. Our framework is more general than [5] (where the coefficients are of non-convolution type, but B, C and D are essentially regular, and the inhomogeneous terms b and σ do not appear) and [1,2] (where the coefficients are singular, but they are of convolution type with completely monotone kernels, and the inhomogeneous terms are deterministic). It is also worth mentioning that the assumptions that the coefficients C and D belong to L^2(△_2(t_0, T); R^{d_1×d_2}) fit into the framework of the so-called ⋆-Volterra kernels introduced in [10]. Furthermore, we do not impose any non-negativity conditions on the weighting matrices Q and R at this stage.
The LQ stochastic Volterra control problem is stated as follows.
Problem (SVC). For each (t_0, x) ∈ I, find a control process û ∈ U(t_0, T) satisfying

J(t_0, x; û) = inf_{u ∈ U(t_0, T)} J(t_0, x; u) =: V(t_0, x). (2.1)

For each (t_0, x) ∈ I, a control process û ∈ U(t_0, T) satisfying (2.1) is called an open-loop optimal control at (t_0, x). We call the map V the value functional of Problem (SVC).
In this paper, we are interested in the closed-loop framework of Problem (SVC). More precisely, we consider the following causal feedback strategies which were introduced in our previous paper [11].
Theorem 2.4. For each causal feedback strategy (Ξ, Γ, v) ∈ S(0, T) and each input condition (t_0, x) ∈ I, the controlled SVIE (1.1) has a unique causal feedback solution (X^{t_0,x}, Θ^{t_0,x}, u^{t_0,x}). Furthermore, there exists a constant K > 0, depending only on A, B, C, D, Ξ and Γ, such that the solution satisfies the corresponding a priori estimate.

Proof. See [11, Theorem 2.4].

Remark 2.5. We emphasize that the causal feedback strategy (Ξ, Γ, v) is chosen to be independent of the input condition (t_0, x), while the causal feedback solution (X^{t_0,x}, Θ^{t_0,x}, u^{t_0,x}) depends on (t_0, x). It is worth mentioning that the causal feedback solution satisfies the (generalized) flow property with respect to the input condition (t_0, x) in a suitable sense. For more detailed discussions, see our previous paper [11].
The purpose of this paper is to investigate the causal feedback optimal strategy defined as follows.
Proof. From the definition of causal feedback optimality, (i) implies (ii), (iii), (iv) and (v). The remaining implications are straightforward, and we only need to show the implication (iv) ⇒ (v). Assume that (iv) holds. Let (t_0, x) ∈ I and u ∈ U(t_0, T) be arbitrary, and denote by (X, Θ) the corresponding state pair. Define v(t) Thus, (v) holds. This completes the proof.

Remark 2.8. From the above lemma, if (Ξ̂, Γ̂, v̂) ∈ S(0, T) is a causal feedback optimal strategy of Problem (SVC), then for any input condition (t_0, x) ∈ I, the outcome (Ξ̂, Γ̂, v̂)[t_0, x] ∈ U(t_0, T) is an open-loop optimal control of Problem (SVC) at (t_0, x). Therefore, each causal feedback optimal strategy can be seen as a causal feedback representation of an open-loop optimal control. Note that, even if a state-feedback strategy (Ξ̂, 0, v̂) ∈ S(0, T) (in which the feedback Γ̂ of the forward state process is absent) is optimal among all state-feedback strategies in the sense that J(t_0, x; (Ξ̂, 0, v̂)[t_0, x]) ≤ J(t_0, x; (Ξ, 0, v)[t_0, x]) for any (Ξ, 0, v) ∈ S(0, T) and any (t_0, x) ∈ I, it is not necessarily optimal among all causal feedback strategies in the sense that J(t_0, x; (Ξ̂, 0, v̂)[t_0, x]) ≤ J(t_0, x; (Ξ, Γ, v)[t_0, x]) for any (Ξ, Γ, v) ∈ S(0, T) and any (t_0, x) ∈ I.
We will also consider the homogeneous version of Problem (SVC), where the inhomogeneous terms b, σ, q and ρ are absent. In this case, the controlled SVIE (1.1) and the cost functional (1.2) become

X(t) = x(t) + ∫_{t_0}^t {A(t, s)X(s) + B(t, s)u(s)} ds + ∫_{t_0}^t {C(t, s)X(s) + D(t, s)u(s)} dW(s), t ∈ (t_0, T),

and

J^0(t_0, x; u) = E[∫_{t_0}^T (⟨Q(t)X(t), X(t)⟩ + 2⟨S(t)X(t), u(t)⟩ + ⟨R(t)u(t), u(t)⟩) dt],

respectively. We denote the homogeneous problem by Problem (SVC)^0 and the corresponding value functional by V^0(t_0, x). For each causal feedback strategy (Ξ, Γ, v) ∈ S(0, T) and input condition (t_0, x) ∈ I, the corresponding causal feedback solution (X^{t_0,x}, Θ^{t_0,x}, u^{t_0,x}) of the homogeneous system is defined analogously.

Preliminaries
In this section, we summarize the results of our previous work [11]. Specifically, we introduce Type-II extended backward stochastic Volterra integral equations (Type-II EBSVIEs) and Lyapunov-Volterra equations which play fundamental roles in the study of Problem (SVC). For more detailed discussions and proofs, see [11].

Optimal strategies and Riccati-Volterra equations
In this section, we characterize causal feedback optimal strategies of Problem (SVC) by means of a Riccati-type equation. We introduce the following equation (depending only on the coefficients A, B, C, D, Q, R and S): where, for each matrix M, M^† denotes the Moore-Penrose pseudoinverse (see [18, Appendix A]). In view of Definition 3.7, the above equation can be written in integral form. This is a coupled system of Riccati-type (backward) Volterra integro-differential equations for the pair P = (P^{(1)}, P^{(2)}) of matrix-valued deterministic functions, and we call it a Riccati-Volterra equation. By a solution to the above Riccati-Volterra equation, we mean a pair P = (P^{(1)}, P^{(2)}) ∈ Π(0, T) satisfying (5.1). Similarly to the study of LQ control problems for SDEs [18], we introduce the notions of regular and strongly regular solutions to the Riccati-Volterra equation (5.1).
Furthermore, we say that the solution P is strongly regular if there exists a constant λ > 0 such that R(t) + (D^⊤ ⋉ P ⋊ D)(t) ≥ λ I_ℓ for a.e. t ∈ (0, T).
The following is the main theorem of this section.
Corollary 5.5. The homogeneous Problem (SVC)^0 has a causal feedback optimal strategy if and only if the Riccati-Volterra equation (5.1) admits a regular solution P = (P^{(1)}, P^{(2)}) ∈ Π(0, T). In this case, the value functional V^0 admits an explicit quadratic representation in terms of P.

From this representation formula for the homogeneous value functional V^0(t_0, x) and Lemma 3.6, we obtain the following uniqueness result for the regular solution to the Riccati-Volterra equation.
Corollary 5.6. The Riccati-Volterra equation (5.1) has at most one regular solution.
Remark 5.7. Consider the case where the control does not enter the drift part, that is, B = 0. In this case, the Riccati-Volterra equation (5.1) takes a simplified form. Thus, if there exists a (strongly) regular solution P = (P^{(1)}, P^{(2)}) ∈ Π(0, T) to this Riccati-Volterra equation, then P^{(2)}(s_1, s_2, t) does not depend on the last parameter t ∈ (0, s_1 ∧ s_2). Furthermore, the function Γ̌ in (5.3) vanishes. Therefore, in this case, there exists a (unique) causal feedback optimal strategy (Ξ̂, Γ̂, v̂) of Problem (SVC) with Γ̂ = 0. In other words, the (unique) causal feedback optimal strategy is of a state-feedback form in the sense that it does not use the feedback of the forward state process Θ. Furthermore, in the case of the homogeneous Problem (SVC)^0, the stochastic inhomogeneous term v̂ can be taken to be zero. In this case, the (unique) causal feedback optimal strategy is of a Markovian state-feedback form in the sense that it is just a deterministic linear functional of the current state. This is a surprising consequence since, even in the homogeneous Problem (SVC)^0 with B = 0, the state process is highly non-Markovian and is not a semimartingale due to the Volterra structure.
Remark 5.8. Very recently, a fact similar to Remark 5.7 was also found in an independent work of Wang, Yong and Zhou [19], where an LQ stochastic Volterra control problem (involving a terminal cost) was studied in the open-loop framework. In [19], the coefficients A, B, C and D are non-convolution-type but assumed to be regular (i.e. bounded and differentiable), the inhomogeneous terms b, σ, q and ρ are zero, S = 0, and the weighting matrices Q and R are assumed to be non-negative and strictly positive definite, respectively. By a dynamic programming method and a decoupling technique, they derived a causal feedback representation of the open-loop optimal control by means of a path-dependent (operator-valued) Riccati equation, which is different from our Riccati-Volterra equation (5.1). Compared to [19], our problem is formulated in the closed-loop framework, and the Riccati-Volterra equation (5.1) is a system of integro-differential equations for the (finite-dimensional) kernels P = (P^{(1)}, P^{(2)}) of a self-adjoint operator P_{t_0} (see Lemma 3.5).
Strongly regular solvability of the Riccati-Volterra equation

As we have seen in the previous section, causal feedback optimal strategies of Problem (SVC) are characterized by means of the (unique) regular solution of the Riccati-Volterra equation (5.1). Also, the existence of a strongly regular solution, which is a stronger notion than that of a regular solution, implies the uniqueness of the causal feedback optimal strategy.
In this section, we prove the equivalence between the strongly regular solvability of the Riccati-Volterra equation (5.1) and the uniform convexity of the cost functional. Furthermore, we provide a sufficient condition for these two equivalent properties.

Definition 6.1. Let (H, ‖·‖_H) be a Hilbert space, and consider a functional F : H → R. We say that F is uniformly convex if there exists a constant λ > 0 such that, for any u_1, u_2 ∈ H and μ ∈ [0, 1], it holds that

F(μu_1 + (1 − μ)u_2) ≤ μF(u_1) + (1 − μ)F(u_2) − λμ(1 − μ)‖u_1 − u_2‖²_H.

Lemma 6.2. The following are equivalent:
(i) the cost functional U(0, T) ∋ u ↦ J^0(0, 0; u) is uniformly convex;
(ii) there exists a constant λ > 0 such that J^0(t_0, 0; u) ≥ λ E[∫_{t_0}^T |u(t)|² dt] for any t_0 ∈ [0, T) and any u ∈ U(t_0, T).
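In a finite-dimensional analogue (a sketch with an arbitrary illustrative matrix, not part of the paper), a quadratic functional F(u) = ⟨Mu, u⟩ on R^n satisfies the defining inequality of uniform convexity with constant λ exactly when M − λI is non-negative definite, because the convexity gap equals μ(1 − μ)⟨M(u_1 − u_2), u_1 − u_2⟩. The following checks this numerically.

```python
import random

def F(M, u):
    """Quadratic functional F(u) = <Mu, u>."""
    return sum(M[i][j] * u[i] * u[j] for i in range(len(u)) for j in range(len(u)))

def convexity_gap(M, u1, u2, mu):
    """mu*F(u1) + (1-mu)*F(u2) - F(mu*u1 + (1-mu)*u2)."""
    w = [mu * a + (1 - mu) * b for a, b in zip(u1, u2)]
    return mu * F(M, u1) + (1 - mu) * F(M, u2) - F(M, w)

M = [[2.0, 0.5], [0.5, 1.0]]   # symmetric with smallest eigenvalue (3 - sqrt(2))/2 > 0.5
lam = 0.5
rng = random.Random(1)
for _ in range(1000):
    u1 = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    u2 = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
    mu = rng.random()
    d2 = sum((a - b) ** 2 for a, b in zip(u1, u2))
    # uniform convexity inequality with constant lam (small tolerance for rounding)
    assert convexity_gap(M, u1, u2, mu) >= lam * mu * (1 - mu) * d2 - 1e-9
```

Replacing M by a matrix with an eigenvalue below λ makes the inequality fail, which mirrors the role of the uniform convexity constant λ in the strong-regularity bound on R(t) + (D^⊤ ⋉ P ⋊ D)(t).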
The following corollary gives a simple sufficient condition for the uniform convexity of the cost functional.

Corollary 6.3. Assume that the following standard condition holds for some λ > 0: Then the cost functional U(0, T) ∋ u ↦ J^0(0, 0; u) is uniformly convex.
The following is the main theorem of this section.
and let P_i = (P^{(1)}_i, P^{(2)}_i) ∈ Π(0, T) be the solution to the Lyapunov-Volterra equation (4.2) with (Ξ, Γ) = (Ξ_i, Γ_i): We observe that the above induction is well-defined by Theorem 3.10, together with the last assertion in Lemma 6.2. Furthermore, for any i ∈ N, we have R(t) + (D^⊤ ⋉ P_i ⋊ D)(t) ≥ λI_ℓ for a.e. t ∈ (0, T), and for any (t_0, x) ∈ I. We shall show that {P_i}_{i∈N} converges (in a suitable sense) to the strongly regular solution of the Riccati-Volterra equation (5.1).
Remark 6.6. The above proof shows that the sequence of solutions of the Lyapunov-Volterra equations (6.5) converges to the strongly regular solution of the Riccati-Volterra equation (5.1). This fact is useful in view of numerical approximation of the (unique) causal feedback optimal strategy and the value functional.
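The classical (finite-dimensional SDE) counterpart of this Lyapunov-to-Riccati iteration is the Newton-Kleinman scheme, where each step solves a Lyapunov equation for the current gain. The following scalar sketch (with illustrative coefficients, not the Volterra setting of this paper) shows the iteration converging to the stabilizing solution of the algebraic Riccati equation.

```python
import math

def kleinman_scalar(a, b, q, r, k0, n_iter=30):
    """Newton-Kleinman iteration for the scalar algebraic Riccati equation
        2*a*P - (b**2 / r) * P**2 + q = 0.
    Each step solves the scalar Lyapunov equation
        2*(a - b*k_i)*P_{i+1} + q + r*k_i**2 = 0
    for the current gain k_i, then updates k_{i+1} = (b / r) * P_{i+1}."""
    k = k0                                     # k0 must be stabilizing: a - b*k0 < 0
    P = None
    for _ in range(n_iter):
        a_cl = a - b * k                       # closed-loop coefficient
        P = -(q + r * k * k) / (2.0 * a_cl)    # Lyapunov step
        k = b * P / r                          # gain update
    return P

# With a = b = q = r = 1, the stabilizing Riccati solution is P = 1 + sqrt(2).
P = kleinman_scalar(1.0, 1.0, 1.0, 1.0, k0=2.0)
assert abs(P - (1.0 + math.sqrt(2.0))) < 1e-10
```

Each iterate solves only a linear (Lyapunov) problem, and the sequence converges monotonically from above; the Volterra analogue of Remark 6.6 replaces the scalar Lyapunov equation with the Lyapunov-Volterra equation (6.5).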
Combining Corollary 5.4, Corollary 6.3 and Theorem 6.5, we get the following consequence.