A maximum principle for a stochastic control problem with multiple random terminal times

In the present paper we derive, via a backward induction technique, an ad hoc maximum principle for an optimal control problem with multiple random terminal times. We then apply the aforementioned result to the case of a linear-quadratic controller, providing solutions for the optimal control in terms of a Riccati backward SDE with random terminal time. Finally, all the above results are applied to a system of interconnected banks.


Introduction
In the last decades stochastic optimal control theory has received increasing attention from the mathematical community, also in connection with several concrete applications, spanning from industry to finance, from biology to crowd dynamics, etc. In all of the above applications a rigorous theory of stochastic optimal control (SOC), under suitable assumptions on the source of random noise, has proved to be a fundamental point.
To this aim different theoretical approaches have been developed. They can be broadly divided into two classes: partial differential equation (PDE) approaches, based on the Hamilton-Jacobi-Bellman equation, and probabilistic approaches, based on the stochastic maximum principle. In the present paper we consider a controlled system of SDEs of the general form
$$dX^i(t) = \mu^i(t, X^i(t), \alpha^i(t))\,dt + \sigma^i(t, X^i(t), \alpha^i(t))\,dW^i(t)\,, \quad i = 1, \dots, n\,,$$
under standard assumptions of Lipschitz coefficients $\mu^i$ and $\sigma^i$ with at most linear growth, $\alpha^i$ being the control. The notation will be specified in detail within subsequent sections.
Then we assume that the system, instead of being stopped as soon as the stopping time $\tau$ is triggered, continues to evolve according to a new system of SDEs written as follows
$$dX^i_1(t) = \mu^i_1(t, X^i_1(t), \alpha^i_1(t))\,dt + \sigma^i_1(t, X^i_1(t), \alpha^i_1(t))\,dW^i(t)\,,$$
for some new coefficients $\mu^i_1$ and $\sigma^i_1$ again satisfying standard assumptions of linear growth and Lipschitz continuity. In particular, we will assume that, according to the triggered stopping time, the $k$-th component in equation (1) has been set to $0$, according to rigorous definitions later specified. Then, we again aim at minimizing a functional of the form
$$J_1(x_1, \alpha) = \mathbb{E}\left[\int_{\tau}^{\tau_1} L_1(t, X_1(t), \alpha_1(t))\,dt + G_1(\tau_1, X_1(\tau_1))\right]\,,$$
with the same notation used before, $\tau_1$ being a new stopping time. We repeat such a scheme for a series of $n$ stopping times. Moreover, in complete generality, we assume that the order of the random times is not known a priori, hence forcing us to consider all possible combinations of random events, each with its associated combination of driving SDEs.
The main result of the present paper consists in deriving a stochastic maximum principle, both in necessary and sufficient form, for the whole series of control problems stated above.
Clearly, we cannot expect the global optimal solution to be given by gluing together the optimal controls between consecutive stopping times. Instead, we will tackle the problem following a dynamic programming principle approach, as exploited, e.g., in [32]. In particular, we will solve the problem backward. Therefore, the case in which all stopping times but one have been triggered is considered first; then we consider the problem with two random events left, and so on, until the very first control problem. Following this scheme, we are able to provide the global optimal solution recursively, so that the $k$-th optimal control problem depends on the $(k+1)$-th optimal solution. We remark that although the backward approach has been used in the literature, see, e.g., [26,32], to the best of our knowledge the present work is the first one using such techniques where the stopping times are defined as hitting times.
After having derived the main result, i.e. the aforementioned maximum principle, we will consider the particular case of a linear-quadratic control problem, that is, we assume the underlying dynamics to be linear in both the state variable and the control, with quadratic costs to be minimized. This type of problem has been widely studied from both a theoretical and a practical point of view, since it often allows to obtain a closed-form solution for the optimal control.
In particular, one can usually write the solution to a linear-quadratic control problem in terms of the solution of a Riccati backward ordinary differential equation (ODE), hence reducing the original linear-quadratic stochastic control problem to the solution of a simpler ODE, see, e.g., [37] and [33, Section 6.6], for possible financial applications. Let us recall that, considering either random coefficients for the driving equation or a random terminal time in the control problem, the latter case being the one treated here, the backward Riccati ODE becomes a Riccati BSDE, see, e.g., [20,21,24,25].
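To fix ideas, a textbook scalar LQ problem with deterministic horizon can be sketched as follows; all symbols here ($\mu$, $\sigma$, $q$, $g$) are illustrative constants and are not part of the present paper's setting.

```latex
% Scalar LQ problem with deterministic horizon (illustrative sketch):
%   dX(t) = (\mu X(t) + \alpha(t))\,dt + \sigma\,dW(t),
%   J(\alpha) = \mathbb{E}\Big[\int_0^T \big(q X(t)^2 + \alpha(t)^2\big)\,dt + g X(T)^2\Big].
% The optimal control is the linear feedback
%   \bar\alpha(t) = -P(t)\,X(t),
% where P solves the backward Riccati ODE
%   -\dot P(t) = 2\mu P(t) - P(t)^2 + q, \qquad P(T) = g.
```

With a random terminal time, as in the present paper, the same ansatz leads instead to a Riccati BSDE, since the terminal condition is imposed at a stopping time.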
We stress that the results derived in the present paper find natural applications in many areas related to mathematical finance, and mainly to systemic risk, where, after the recent credit crisis, the assumption of possible failures has become a main ingredient in many robust financial models. Also, network models have received increasing mathematical attention in recent years, as witnessed by the development of several ad hoc techniques derived to treat general dynamics on networks. We refer the interested reader to [13,14,16] for general results on network models, and to [23] for a financially oriented treatment.
In particular, these models have proved to be particularly suitable if one is to consider a system of interconnected banks. Following the approach of [10,17,28], the results derived in the present work can be successfully applied to a system of $n$ interconnected banks, lending and borrowing money. As in [10,15], one can assume the presence of an external controller, typically called the lender of last resort (LOLR), who actively supervises the banking system and possibly lends money to actors in need. A standard assumption is that the LOLR lends money in order to optimize a given quadratic functional. Therefore, modelling the system as in [15], we recover a linear-quadratic setting allowing us to apply the results obtained in the present work.
The paper is organized as follows: in Section 2 we introduce the general setting, clarifying the main assumptions; then, Section 2.1 is devoted to the proof of the necessary maximum principle, whereas in Section 2.2 we prove the sufficient maximum principle; at last, in Section 3, we apply the previous results to the case of a linear-quadratic control problem, also deriving the global solution by an iterative scheme to solve a system of Riccati BSDEs.

The general setting
Let $n \in \mathbb{N}$, let $T < \infty$ be a fixed terminal time, and let us consider a standard complete filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in [0,T]}, \mathbb{P})$ satisfying the usual assumptions.
In what follows we are going to consider a controlled system of $n$ SDEs, for $t \in [0,T]$ and $i = 1, \dots, n$, evolving as follows
$$dX^{i;0}(t) = \mu^{i;0}(t, X^{i;0}(t), \alpha^{i;0}(t))\,dt + \sigma^{i;0}(t, X^{i;0}(t), \alpha^{i;0}(t))\,dW^i(t)\,,$$
where $W^i(t)$ is a standard Brownian motion, $\alpha^{i;0}$ being the control, taking values in $A_i \subset \mathbb{R}$, assumed to be convex and closed. In what follows we will assume the following assumptions to hold.
Assumptions 2.1. Let the coefficients be measurable functions and suppose that there exists a constant $C > 0$ such that, for any $x, y \in \mathbb{R}$, any $a \in A$ and any $t \in [0,T]$, it holds
$$|\mu^{i;0}(t,x,a) - \mu^{i;0}(t,y,a)| + |\sigma^{i;0}(t,x,a) - \sigma^{i;0}(t,y,a)| \le C\,|x - y|\,,$$
$$|\mu^{i;0}(t,x,a)| + |\sigma^{i;0}(t,x,a)| \le C\,(1 + |x|)\,.$$
We thus assume that the coefficients $\mu^{i;0}$ and $\sigma^{i;0}$, for $i = 1, \dots, n$, in equation (2) satisfy Assumptions 2.1. Therefore, there exists a unique strong solution to equation (2), see, e.g., [19,33].
Remark 2.2. In equation (2) we have considered an $\mathbb{R}$-valued SDE; nevertheless, what follows still holds if we consider a system of SDEs, each of which takes values in $\mathbb{R}^{m_i}$, $m_i \in \mathbb{N}$, $i = 1, \dots, n$.
Remark 2.4. From a practical point of view, we are considering a controller that aims at supervising $n$ different elements defining a system, up to the first time one of its elements exits from a given domain. From a financial perspective, each element represents a financial agent, while the stopping time denotes its failure time. Hence, a possible criterion to be optimized, as we shall see in Section 3, is to maximize the distance between the element/financial agent and the associated stopping/default boundary.
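As a minimal numerical illustration of one component of such a system together with its hitting time, one can sketch the following; the drift, volatility, control and boundary level are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def first_hitting(x0=1.0, boundary=0.0, T=1.0, n_steps=1000):
    """Euler-Maruyama simulation of one controlled diffusion,
    stopped at the first time it hits `boundary` or at time T."""
    dt = T / n_steps
    mu = lambda t, x, a: -0.5 * x + a   # illustrative linear drift
    sigma = lambda t, x, a: 0.3         # illustrative constant volatility
    alpha = lambda t, x: 0.1            # illustrative (constant) control
    t, x = 0.0, x0
    for _ in range(n_steps):
        a = alpha(t, x)
        x += mu(t, x, a) * dt + sigma(t, x, a) * np.sqrt(dt) * rng.standard_normal()
        t += dt
        if x <= boundary:               # the stopping time is triggered
            return t, x
    return T, x                         # terminal time reached first

tau, x_tau = first_hitting()
```

After such a hitting time, in the setting of this paper, the corresponding component would be frozen at $0$ and the surviving components would switch to the new coefficients.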
As briefly mentioned in the introduction, instead of stopping the overall control problem when the first stopping time is triggered, we assume that the system continues to evolve according to a (possibly) new dynamics. As an example, let us consider the case $\hat\tau_1 \equiv \hat\tau^{k;0}$, that is, the first process to hit the stopping boundary is $X^{k;0}$. We thus set to $0$ the $k$-th component of $X^0$, then considering the new process
$$X^k(t) = \left(X^{1;k}(t), \dots, X^{k-1;k}(t), 0, X^{k+1;k}(t), \dots, X^{n;k}(t)\right)\,,$$
with associated control $\alpha^k$, where the superscript $k$ denotes that the $k$-th component hit the stopping boundary and has therefore been set to $0$.
Then we minimize a functional of the same form as above, where $L^k$ and $G^k$ are assumed to satisfy Assumptions 2.3, while $\hat\tau_2$ is a stopping time triggered as soon as $X^k$ hits a given boundary. In particular, we define the stopping boundary and, following the same scheme as before, we denote by $\hat\tau^{i;k}$ the first time $X^{i;k}$ reaches the boundary $v^{i;k}$. It follows that, considering for instance the case in which $\hat\tau^{l;k}$ has been triggered by $X^{l;k}$, we have $\hat\tau_2 \equiv \hat\tau^{l;k}$, meaning that $v^{l;k}$ has been hit. Iteratively proceeding, we consequently define the process $X^{(k,l)}(t)$, again assuming that $X^{(k,l)}(t)$ evolves according to a system as in (3), and so on until either no nodes are left or the terminal time $T$ is reached.
As mentioned above, one of the major novelties of the present work consists in not assuming knowledge of the stopping times' order. From a mathematical point of view, this implies that we have to consider all the possible combinations of such critical events during a given time interval $[0,T]$. Let us note that this is in fact the natural setting when modelling concrete scenarios, as happens, e.g., for possible multiple failures occurring within a system of interconnected banks.
Therefore, in what follows we denote by $C_{n,k}$ the combinations of $k$ elements from a set of $n$, while $\pi_k \in C_{n,k}$ stands for one of those elements. Hence, exploiting the notation introduced above, and with the convention $\hat\tau_0 := 0$, $\hat\tau_{n+1} := T$, we define the process $X = (X(t))_{t \in [0,T]}$ as
$$X(t) = \sum_{k=0}^{n} \mathbf{1}_{[\hat\tau_k, \hat\tau_{k+1})}(t) \sum_{\pi_k \in C_{n,k}} X^{\pi_k}(t)\, \mathbf{1}_{\{\pi_k\ \mathrm{triggered}\}}\,,$$
where each $X^{\pi_k}(t)$ is defined as above and, consequently, the global control reads as follows
$$\alpha(t) = \sum_{k=0}^{n} \mathbf{1}_{[\hat\tau_k, \hat\tau_{k+1})}(t) \sum_{\pi_k \in C_{n,k}} \alpha^{\pi_k}(t)\, \mathbf{1}_{\{\pi_k\ \mathrm{triggered}\}}\,.$$
Remark 2.5. Let us underline that, within the setting defined so far, each stopping time $\hat\tau_k$ depends on the previously triggered stopping times $\hat\tau_{\pi_j}$, $j = 1, \dots, k-1$.
As a consequence, the solution $X^{\pi_k}$ in (7) also depends on the triggered stopping times as well as on their order. To simplify the notation, we have avoided writing such dependencies explicitly. By equation (7) the dynamics of $X$ is given by equation (9), where, according to the notation introduced above, $B$ and $\Sigma$ denote the resulting global drift and volatility coefficients, and we aim at minimizing the functional (11), $L$ and $G$ being defined accordingly.

Remark 2.6. It is worth mentioning that the sums stated above are taken over all possible combinations, hence implying that we are not considering the components' order, namely $X^{(k,l)} = X^{(l,k)}$. Dropping such an assumption implies that the sums in equations (7)-(8)-(10) have to be taken over the dispositions $D_{n,k}$.
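The combinatorial bookkeeping of Remark 2.6 can be made concrete with a short sketch; here $C_{n,k}$ corresponds to unordered combinations and $D_{n,k}$ to ordered arrangements, and the value $n = 3$ is purely illustrative.

```python
from itertools import combinations, permutations

n = 3  # illustrative number of nodes
# C_{n,k}: unordered sets of k triggered nodes (the paper's default convention,
# under which X^{(k,l)} = X^{(l,k)}).
C = {k: list(combinations(range(1, n + 1), k)) for k in range(n + 1)}
# D_{n,k}: ordered arrangements, needed once the triggering order matters.
D = {k: list(permutations(range(1, n + 1), k)) for k in range(n + 1)}

print(C[2])       # [(1, 2), (1, 3), (2, 3)]
print(len(D[2]))  # 6 ordered arrangements of 2 nodes out of 3
```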
In what follows we give an example of the theory developed so far, to better clarify our approach as well as its concrete applicability.
Example 2.1. Let us consider the case of a system constituted by just $n = 2$ components. Then equation (7) becomes
$$X(t) = X^{0}(t)\,\mathbf{1}_{[0,\hat\tau_1)}(t) + \mathbf{1}_{[\hat\tau_1,\hat\tau_2)}(t)\left(X^{1}(t)\,\mathbf{1}_{\{\hat\tau_1 = \hat\tau^{1;0}\}} + X^{2}(t)\,\mathbf{1}_{\{\hat\tau_1 = \hat\tau^{2;0}\}}\right) + X^{(1,2)}(t)\,\mathbf{1}_{[\hat\tau_2, T]}(t)\,,$$
where $X^0(t)$, resp. $X^1(t)$, resp. $X^2(t)$, denotes the dynamics in case neither $1$ nor $2$ has hit the stopping boundary, resp. $1$ has, resp. $2$ has.

A necessary maximum principle
The main issue in solving the optimal control problem defined in Section 2 consists in solving a series of connected optimal control problems, each of which may depend on the previous ones. Moreover, we do not assume any a priori knowledge of the stopping times' order.
To overcome such issues, we consider a backward approach. In particular, we first solve the last control problem, then proceed with the penultimate one, and so on, until the first one, via backward induction. Let us underline that assuming perfect knowledge of the stopping times' order would simplify the backward scheme, since only $n$ control problems would need to be solved, saving us from taking into account all the combinations. Nevertheless, in both cases the backward procedure runs analogously.
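The structure of the backward scheme can be sketched as follows. This is a toy stub, not the paper's actual equations: the constant `1.0` stands in for the cost of one stage, and the dictionary `value` only illustrates how each configuration of triggered nodes uses the more-triggered configurations as terminal data.

```python
from itertools import combinations

def backward_induction(n):
    """Toy backward induction over configurations of triggered nodes:
    configurations with more triggered nodes are solved first, and each
    stage uses the next stage's values as its terminal cost."""
    value = {}
    nodes = range(n)
    for k in range(n, -1, -1):                     # k = number of triggered nodes
        for triggered in combinations(nodes, k):
            if k == n:
                value[triggered] = 0.0             # all nodes stopped: nothing left
            else:
                successors = [value[tuple(sorted(triggered + (j,)))]
                              for j in nodes if j not in triggered]
                value[triggered] = 1.0 + min(successors)  # 1.0 = illustrative stage cost
    return value[()]                               # value of the initial problem

print(backward_induction(3))  # 3.0: three stages until every node is stopped
```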
Aiming at deriving a global maximum principle, in what follows we denote by $\partial_x$ the partial derivative w.r.t. the space variable $x \in \mathbb{R}^n$ and by $\partial_a$ the partial derivative w.r.t. the control $a \in A^n$. Moreover, we assume the following.

Assumptions 2.7. (i) For any $\pi_k \in C_{n,k}$, $k = 1, \dots, n$, $B^{\pi_k}$ and $\Sigma^{\pi_k}$ are continuously differentiable w.r.t. both $x \in \mathbb{R}^n$ and $a \in A$. Furthermore, there exists a constant $C_1 > 0$ bounding the corresponding derivatives for any $t \in [0,T]$, $x \in \mathbb{R}^n$ and $a \in A$.

(ii) For any $\pi_k \in C_{n,k}$, $k = 1, \dots, n$, $L^{\pi_k}$, resp. $G^{\pi_k}$, is continuously differentiable w.r.t. both $x \in \mathbb{R}^n$ and $a \in A^n$, resp. only w.r.t. $x \in \mathbb{R}^n$. Furthermore, there exists a constant $C_2 > 0$ bounding the corresponding derivatives for any $t \in [0,T]$, $x \in \mathbb{R}^n$ and $a \in A^n$.

We thus have the following result.

Theorem 2.8. Let Assumptions 2.7 hold and let $(\bar X, \bar\alpha)$ be an optimal pair for the problem (9)-(11); then the first-order condition (12) holds, where the pair $(Y(t), Z(t))$ solves the dual backward equation (13), the pairs $(Y^{\pi_k}, Z^{\pi_k})$ being solutions of a system of interconnected BSDEs, $H^{\pi_k}$ being the generalized Hamiltonian and $H$ the global generalized Hamiltonian defined in (14).

Remark 2.9. Before entering into the details of the proof of Theorem 2.8, let us underline some of its characteristics. The main idea is to find a solution iteratively, acting backward in time. Therefore, starting from the very last control problem, namely the case where a single node is left in the system, we consider a standard maximum principle. Indeed, $Y^{\pi_{n-1}}$ in (13) represents a classical dual BSDE associated with the standard stochastic maximum principle, see, e.g., [37, Th. 3.2]. Then, we can consider the second to last control problem. At this point, a naive attempt to obtain a global solution could be to first solve such a penultimate problem and then glue together the obtained solutions. Nevertheless, such a method only produces a suboptimal solution.
Instead, the right approach, similarly to what happens when applying the standard dynamic programming principle, consists in treating the solution to the last control problem as the terminal cost for the subsequent (second to last) control problem, and so on for the remaining ones. It follows that, in deriving the global optimal solution, one takes into account the cost coming from the future evolution of the system. Mathematically, this is expressed by the terminal condition with which equation (13) endows $Y^{\pi_k}$. Therefore the solution scheme results in a global connection of all the control problems we have to consider, from the very last of them backward to the first one.
Proof. Let us denote by $\bar\alpha$ the optimal control, $\alpha$ being another admissible control, and further set $\alpha^h$ as
$$\alpha^h := \bar\alpha + h\alpha\,, \quad h > 0\,.$$
Since in the present case the cost functional reads as in (11), we can choose $\alpha = \tilde\alpha - \bar\alpha$, $\tilde\alpha \in \mathcal{A}$. Then, by the optimality of $\bar\alpha$ and via a standard variational argument, see, e.g., [6,31,37], we obtain the variational inequality (15). In what follows, for the sake of clarity, we will denote by $X^\alpha$ the solution $X$ with control $\alpha$. Thus, from the optimality of $\bar\alpha$, and then, for any $\alpha \in \mathcal{A}$, by (15), we obtain (16), where $Z^{\pi_{n-1}}$ and $Z^{\pi_{n-2}}$ solve the first variation process. Applying the Itô formula to $Y^{\pi_{n-2}} \cdot Z^{\pi_{n-2}}$, and similarly to $Y^{\pi_{n-1}} \cdot Z^{\pi_{n-1}}$, and exploiting equation (16), together with equations (17)-(18), we thus have
$$\mathbb{E}\left[\int \partial_a H^{\pi_{n-2}}\big(t, \bar X^{\pi_{n-2}}(t), \bar\alpha^{\pi_{n-2}}(t), \bar Y^{\pi_{n-2}}(t), \bar Z^{\pi_{n-2}}(t)\big)\, \alpha^{\pi_{n-2}}(t)\,dt + \sum_{\pi_{n-1} \in C_{n,n-1}} \cdots \right] \ge 0$$
for all $\alpha = \tilde\alpha - \bar\alpha$, and thus we eventually obtain, for $t_0 > \hat\tau_{n-2}$, the desired local form of optimality (12). Proceeding analogously via backward induction, we derive that the same results also hold for any $\pi_k \in C_{n,k}$, hence obtaining the system (13) and concluding the proof.

A sufficient maximum principle
In this section we consider a generalization of the classical sufficient maximum principle, see, e.g., [33, Th. 6.4.6], to the present setting of interconnected multiple optimal control problems with random terminal time. To this end, we assume the following.

Assumptions 2.10. For any $\pi_k \in C_{n,k}$ the derivatives w.r.t. $x$ of $B$, $\Sigma$ and $L$ are continuous and there exists a constant $L_a > 0$ such that, for any $a_1, a_2 \in A$,
$$|B^{\pi_k}(t, x, a_1) - B^{\pi_k}(t, x, a_2)| + |\Sigma^{\pi_k}(t, x, a_1) - \Sigma^{\pi_k}(t, x, a_2)| + |L^{\pi_k}(t, x, a_1) - L^{\pi_k}(t, x, a_2)| \le L_a\,|a_1 - a_2|\,.$$

Theorem 2.11. Let Assumptions 2.10 hold and suppose further that:
(ii) the maps $(x, a) \mapsto H^{\pi_k}\big(x, a, Y^{\pi_k}, Z^{\pi_k}\big)$ are convex for a.e. $t \in [0,T]$ and for any $\pi_k$;
(iii) for a.e. $t \in [0,T]$ and $\mathbb{P}$-a.s. it holds
$$\bar\alpha^{\pi_k}(t) = \operatorname*{arg\,min}_{\tilde\alpha^{\pi_k} \in A^{\pi_k}} H^{\pi_k}\big(t, X^{\pi_k}(t), \tilde\alpha(t), Y^{\pi_k}, Z^{\pi_k}\big)\,;$$
then $(\bar\alpha, \bar X)$ is an optimal pair for the problem (9)-(11).
Proof. Let us proceed as in the proof of Theorem 2.8, namely via backward induction. For $t_0 > \hat\tau_{n-1}$ the proof follows from the standard sufficient stochastic maximum principle, see, e.g., [37, Th. 5.2].
Let us then consider the case $\hat\tau_{n-2} < t_0 < \hat\tau_{n-1}$, denoting $\Delta X^{\pi_k}(t) := \bar X^{\pi_k}(t) - X^{\pi_k}(t)$ and, for the sake of clarity, using similar notation for any other function.

The linear-quadratic problem
In the present section we consider a particular case of the control problem stated in Sections 2.1-2.2. In particular, we assume that the dynamics of the state equation is linear in both the space and the control variable. Moreover, we impose that the control enters (linearly) only in the drift and that the cost functional is quadratic and of a specific form. More precisely, let us first consider $\mu_0(t)$ as the $n \times n$ matrix defined as follows
$$\mu_0(t) := \mathrm{diag}\,[\mu^{1;0}(t), \dots, \mu^{n;0}(t)]\,,$$
that is, the matrix with entries $\mu^{i;0}(t)$ on the diagonal and null off-diagonal entries, $\mu^{i;0} \colon [0,T] \to \mathbb{R}$ being a deterministic and bounded function of time.
Let us also define the $n \times n$ matrix $\Sigma_0$, independent of the control. The same linearity assumptions hold for any other coefficients $B^{\pi_k}$ and $\Sigma^{\pi_k}$, so that, using the notation introduced in the previous sections, we consider the system
$$dX(t) = B(t, X(t), \alpha(t))\,dt + \Sigma(t, X(t))\,dW(t)\,,$$
where both the drift and the volatility coefficients are now assumed to be linear. In the present (particular) setting, both the running and the terminal cost are assumed to be suitable quadratic weighted averages of the distances from the stopping boundaries, for some given weights $\gamma^{\pi_k}$ such that $\gamma^{\pi_k} = (\gamma^{\pi_k}_1, \dots, \gamma^{\pi_k}_n)^T$.
Remark 3.1. From a financial perspective, converting the minimization problem into a maximization one, the above cost functional can be interpreted in terms of a financial supervisor, such as the one introduced in [10,15], aiming at lending money to each node (e.g., a bank, a financial player, an institution, etc.) in the system to keep it away from the corresponding (default) boundary. Continuing the financial interpretation, different weights $\gamma$ can be used to assign to each node a relative importance. This allows one to establish a hierarchy of (financial) relevance within the system, resulting in a priority scale related to the systemic (monetary) importance taken on by each node. As an example, in [15] a systematic procedure has been derived to obtain the overall importance of any node in a financial network.
In what follows, we derive a set of Riccati BSDEs providing the global optimal control in feedback form. For the sake of notational clarity, we denote by $X^{k;-k}(t)$ the dynamics when only the $k$-th node is left. Similarly, $X^{k;-(k,l)}(t)$, resp. $X^{l;-(k,l)}(t)$, denotes the evolution of node $k$, resp. node $l$, when the pair $(k,l)$ survives. Analogously, we will make use of a componentwise notation, namely $X^{i;-k}$ will denote the $i$-th component of the $n$-dimensional vector $X^{-k}$. According to such notation, we have the following.

Theorem 3.2. The optimal control problem (27), with associated costs given by (28), has an optimal feedback control solution given in terms of processes $P$ and $\varphi$, $P^{\pi_k}$ and $\varphi^{\pi_k}$ being solutions to a recursive system of Riccati BSDEs.

Proof. Let us first consider the last control problem, recalling that $H^{-k}(t, x, a, y, z)$ is the generalized Hamiltonian defined in (14), where $B^{-k}$, resp. $\Sigma^{-k}$, resp. $L^{-k}$, is given in equation (25), resp. equation (26), resp. equation (28). An application of the stochastic maximum principle, see Theorems 2.8-2.11, leads us to consider the adjoint BSDE (30), $Y^{-k}$ being an $n$-dimensional vector, whereas $Z^{-k}$ is an $n \times n$ matrix whose $(i,j)$-entry is denoted by $Z^{-k}_{i,j}$. Then, considering the particular form of the coefficients, and denoting by $\partial_{x_i}$ the derivative w.r.t. the $i$-th component of $x \in \mathbb{R}^n$, we have that the $k$-th component of the BSDE (30) takes an explicit form. Analogously, the second to last control problem is associated with a corresponding system of BSDEs, and so on for any $\pi_k$, until we reach the first control problem with its associated BSDE system.

Therefore, for $t \in [0, \hat\tau_n]$, we are left with the minimization problem for the last surviving node. Exploiting Theorem 2.8, we have that, on the interval $[\hat\tau_{\pi_{n-1}}, \hat\tau_n]$, the above control problem is associated with the forward-backward system (47). In what follows, for the sake of brevity, we will drop the index $(k;-k)$.
Therefore, until otherwise specified, we will write $X$ instead of $X^{k;-k}$, and similarly for any other coefficient. We also recall that system (47) has to be solved for any $k = 1, \dots, n$.
We thus guess the solution of the backward component $Y$ in equation (47) to be affine in the state, for $P$ and $\varphi$ two $\mathbb{R}$-valued processes to be determined. Notice that in standard cases, that is, when the coefficients are not random and the terminal time is deterministic, $P$ and $\varphi$ solve a backward ODE, while in the present case, because of the randomness of the terminal time, $P$ and $\varphi$ solve a BSDE.
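The mechanics of such an affine ansatz can be recalled in the textbook deterministic-horizon case; the scalar dynamics below, with symbols $\mu$, $\sigma$, is purely illustrative and not the system of this paper.

```latex
% Affine ansatz in the deterministic-horizon scalar case (illustrative sketch),
% with dX(t) = (\mu X(t) + \alpha(t))\,dt + \sigma\,dW(t):
%   Y(t) = P(t)\,X(t) - \varphi(t)
% gives, by the Ito formula,
%   dY(t) = \big(\dot P(t)\,X(t) + P(t)\,(\mu X(t) + \alpha(t)) - \dot\varphi(t)\big)\,dt
%           + P(t)\,\sigma\,dW(t).
% Matching this term by term against the adjoint BSDE for Y yields a Riccati
% equation for P and a linear equation for \varphi.
```

In the present random-terminal-time setting the same matching argument produces the BSDEs for $(P, Z^P)$ and $(\varphi, Z^\varphi)$ discussed next.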
Let us thus assume that $(P(t), Z^P(t))$ is the solution to
$$-dP(t) = F^P(t)\,dt - Z^P(t)\,dW(t)\,, \qquad P(\hat\tau_n) = 1\,,$$
and that $(\varphi(t), Z^\varphi(t))$ solves an analogous BSDE. From the first-order condition, namely $\partial_a H(t, x, a, y, z) = 0$, we obtain the optimal control $\bar\alpha$. An application of the Itô formula then yields an expression for $dY(t)$. Therefore, equating the left-hand side and the right-hand side of equation (39), substituting equation (40) into the left-hand side of equation (39), exploiting the first-order optimality condition (38), and equating again the two sides of equation (39), we obtain equation (41). Since equation (41) has to hold for any $X(t)$, after some computations we are led to
$$F^P(t) = P(t)^2 + \sigma^2(t) P(t) + 2 Z^P(t)\sigma(t) - 1\,.$$
Similarly, we also obtain the generator $F^\varphi$. Hence, using the particular form of the generator $F^P$, resp. $F^\varphi$, stated in equation (43), resp. equation (44), in equation (36), resp. equation (37), and reintroducing, for the sake of clarity, the index $k$, the last optimal control $\bar\alpha^{k;-k}(t)$ reads as follows
$$\bar\alpha^{k;-k}(t) = P^{k;-k}(t)\, X^{k;-k}(t) - \varphi^{k;-k}(t)\,,$$
$P^{k;-k}(t)$ and $\varphi^{k;-k}(t)$ being solutions to the BSDEs (45)-(46), where we have introduced an auxiliary function $h$. Notice that, from equation (46), $\varphi$ solves a BSDE with linear generator, so that its solution is explicitly given in terms of a process $\Gamma$ solving an associated linear equation. Moreover, by [36, Th. 5.2, Th. 5.3], it follows that equation (45) admits a unique adapted solution on $[0, \hat\tau_n]$. Therefore, iterating the above analysis for any $k = 1, \dots, n$, we obtain the optimal solution to the last control problem.

Having solved the last control problem, we can consider the second to last one. Assuming, with no loss of generality, that nodes $(k,l)$ are left, all subsequent computations have to be carried out for any possible couple $k = 1, \dots, n$, $l = k+1, \dots, n$.
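In the deterministic-terminal-time reduction mentioned above, $Z^P \equiv 0$ and the BSDE for $P$ collapses to a backward ODE with generator $F^P(t) = P(t)^2 + \sigma^2(t)P(t) - 1$ and terminal condition $P(T) = 1$. A minimal numerical sketch of its backward integration follows; the horizon and the volatility value are illustrative assumptions.

```python
# Backward Euler integration of -dP/dt = F_P(t) = P^2 + sigma^2 * P - 1, P(T) = 1.
# (Deterministic-horizon reduction of the Riccati BSDE: Z^P = 0 is an assumption.)
T, n_steps = 1.0, 100_000
dt = T / n_steps
sigma = 0.2          # illustrative constant volatility
P = 1.0              # terminal condition P(T) = 1
for _ in range(n_steps):
    F_P = P**2 + sigma**2 * P - 1.0
    P += F_P * dt    # stepping backward in time: P(t - dt) = P(t) + F_P(t) dt
P0 = P               # approximate value of P(0)
print(P0)
```

Time-dependent coefficients, or a genuinely random terminal time, require instead a numerical scheme for the full BSDE (45), which is outside the scope of this sketch.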
By Theorem 2.8, the optimal pair $(\bar X^i, \bar\alpha^i)$, $i = k, l$, satisfies, componentwise, for $i = k, l$, a forward-backward system analogous to (47); in what follows we will denote by $Z_j$ the $j$-th $n$-dimensional column of $Z$ in equation (32). Note that the only non-null entries of $Z$ are $Z_{i,j}$, for $i, j = k, l$. Also, for the sake of simplicity, we will write $X^i$, $i = k, l$, instead of $X^{i;-(k,l)}$.
Mimicking the method used earlier, we again guess the solution of the backward component $Y^i$ to be affine in the state, for $P^i$ and $\varphi^i$, $i = k, l$, $\mathbb{R}$-valued processes. Because of the particular form of equation (47), the $i$-th component of the BSDE $Y$ depends only on the $i$-th component of the forward SDE $X$, so that the matrix $P$ has null entries off the main diagonal, and similarly for $\varphi$. Let us assume that $(P^i(t), Z^{i;P}(t))$ and $(\varphi^i(t), Z^{i;\varphi}(t))$, $i = k, l$, solve the corresponding BSDEs. From the first-order condition we obtain the form of the optimal control. Then, again applying the Itô formula, we have
$$\begin{aligned} dY^i(t) = {} & \Big[-F^{i;P}(t) + P^i(t)\mu^i(t) + \sum_{j=k}^{l} Z^{i;P}_j(t)\rho_{ij}\sigma^i(t)\Big]\, X^i(t)\,dt + P^i(t)\alpha^i(t)\,dt \\ & + F^{i;\varphi}(t)\,dt + \sum_{j=k}^{l} Z^{i;P}_j(t)\rho_{ij}\nu^i(t)\,dt + P^i(t) b^i(t)\,dt \\ & + \big[Z^{i;P}_i(t) + P^i(t)\sigma^i(t)\big]\, X^i(t)\,dW^i(t) + \sum_{\substack{j=k \\ j \ne i}}^{l} Z^{i;P}_j(t)\, X^i(t)\,dW^j(t) \\ & + P^i(t)\nu^i(t)\,dW^i(t) - \sum_{j=k}^{l} Z^{i;\varphi}_j(t)\,dW^j(t)\,. \end{aligned} \tag{50}$$
Thus, substituting equation (49) into equation (50), and proceeding as for (42), we obtain the corresponding generators. Turning back, for the sake of clarity, to the extended notation dropped before, we have that $\bar\alpha^{i;-(k,l)}(t)$, $i = k, l$, is given by a feedback formula analogous to the one obtained for the last control problem, with
$$h^{i;-(k,l)}\big(P^{i;-(k,l)}(t), v^{i;-(k,l)}(t)\big) = \sum_{j=k}^{l} Z^{i;P}_j(t)\rho_{ij}\nu^{i;-(k,l)}(t) + P^{i;-(k,l)}(t)\nu^{i;-(k,l)}(t) + \gamma^{i;-(k,l)} v^{i;-(k,l)}(t) + \sigma^i(t)\nu^{i;-(k,l)}(t) P^{i;-(k,l)}(t)\,.$$