A Deterministic Linear Quadratic Time-Inconsistent Optimal Control Problem

A time-inconsistent optimal control problem is formulated and studied for a controlled linear ordinary differential equation with quadratic cost functional. A notion of equilibrium control is introduced, which can be regarded as a time-consistent solution to the original time-inconsistent problem. Under certain conditions, we constructively prove the existence of such an equilibrium control which is represented via a forward ordinary differential equation coupled with a backward Riccati--Volterra integral equation. Our constructive approach is based on the introduction of a family of $N$-person non-cooperative differential games.


Introduction - The Time-Consistency Issue
We begin with a classical optimal control problem for an ordinary differential equation (ODE, for short). Let T > 0. For any initial pair (t, x) ∈ [0, T) × R^n, consider the following controlled ODE:

    Ẋ(s) = b(s, X(s), u(s)), s ∈ [t, T],    X(t) = x,    (1.1)

where b : [0, T] × R^n × U → R^n is a given map, u(·), a function valued in some metric space U, is called a control, and X(·) is called the state trajectory. We denote by U[t, T] the set of admissible controls on [t, T]. Under some mild conditions, for any initial pair (t, x) ∈ [0, T) × R^n and any u(·) ∈ U[t, T], (1.1) admits a unique solution X(·) ≡ X(·; t, x, u(·)). Then we can introduce the following cost functional, which measures the performance of the control u(·):

    J(t, x; u(·)) = ∫_t^T g(s, X(s; t, x, u(·)), u(s)) ds + h(X(T; t, x, u(·))).    (1.3)

A frequently used variant of (1.3) involves discounting:

    J(t, x; u(·)) = ∫_t^T e^{−∫_t^s c(r, X(r), u(r)) dr} g(s, X(s), u(s)) ds + e^{−∫_t^T c(r, X(r), u(r)) dr} h(X(T)),    (1.5)

with c(·) being some map taking nonnegative values, which may be called a discount map. A special case is c(·) = δ > 0, a positive constant (which is called a discount rate). Due to its form, the term e^{−∫_t^s c(r, X(r), u(r)) dr} is called an exponential discounting. If we introduce

    Ẋ_0(s) = c(s, X(s), u(s)), s ∈ [t, T],    X_0(t) = 0,    (1.6)

and regard X_0(·) as an additional component of the state, then the state equation is augmented by one dimension and the cost functional becomes

    J(t, x; u(·)) = ∫_t^T e^{−X_0(s)} g(s, X(s), u(s)) ds + e^{−X_0(T)} h(X(T)),    (1.7)

which is of the form (1.3). Therefore, an optimal control problem with an exponential discounting can be transformed into an optimal control problem without exponential discounting. In other words, including an exponential discounting in the cost functional does not make the original problem mathematically more general.
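The equivalence of (1.5) and (1.7) is easy to check numerically. The following sketch is our own illustration, not taken from the text: it uses a hypothetical scalar system (dynamics b = u, running cost g = x^2, terminal cost h = x^2) with a constant discount rate δ = 0.5, and evaluates both forms of the cost along the same Euler-discretized trajectory.

```python
import numpy as np

# Toy data (hypothetical, for illustration only).
T, N = 1.0, 2000
dt = T / N
delta = 0.5                      # constant discount map c = delta
g = lambda s, x, u: x**2
h = lambda x: x**2
u = lambda s: np.cos(s)          # an arbitrary control

x, x0 = 1.0, 0.0                 # X(0) = 1, augmented component X_0(0) = 0
J_direct, J_aug = 0.0, 0.0
for k in range(N):
    s = k * dt
    us = u(s)
    J_direct += np.exp(-delta * s) * g(s, x, us) * dt  # discount as in (1.5)
    J_aug    += np.exp(-x0)       * g(s, x, us) * dt   # discount as in (1.7)
    x  += us * dt                # Euler step for X' = u
    x0 += delta * dt             # Euler step for (1.6): X_0' = c
J_direct += np.exp(-delta * T) * h(x)
J_aug    += np.exp(-x0)        * h(x)

print(abs(J_direct - J_aug))     # the two cost evaluations coincide
```

Since the augmented component satisfies X_0(s) = ∫_0^s c dr exactly when c is constant, the two accumulated costs agree up to floating-point error.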
However, common sense tells us that in real life the time-consistency issue is never so simple. There are two main reasons. First, as time goes by, the environment (in the broad sense) changes: new technologies are invented, limits on resource allocation shift, and so on, so the controlled system has to be modified according to the new initial pairs. Second, people keep changing their minds and objectives, which leads to changes in the cost functional. Due to these changes, one expects some dramatic changes in the formulation of optimal control problems, as well as in the solutions to these problems.
To make our statement more appealing from a mathematical point of view, let us look at a very simple illustrative example. Consider a one-dimensional controlled ODE:

    Ẋ(s) = u(s), s ∈ [t, T],    X(t) = x,

with cost functional

    J(t, x; u(·)) = ∫_t^T u(s)^2 ds + h(t) X(T; t, x, u(·))^2,

where h : [0, T] → [δ, ∞), for some δ > 0, and U = R. We pose the following optimal control problem.
Problem (C). For any given initial pair (t, x) ∈ [0, T) × R, find a ū(·) ∈ U[t, T] such that

    J(t, x; ū(·)) = inf_{u(·) ∈ U[t,T]} J(t, x; u(·)).    (1.14)

Note that the above problem looks like a simple standard linear quadratic optimal control problem (LQ problem, for short), except that the terminal weight h(t) depends on the parameter t (which is the initial time of the problem).
It is clear that for any initial pair (t, x) ∈ [0, T) × R, u(·) → J(t, x; u(·)) is convex and coercive. Thus, there exists a unique optimal control for Problem (C). We can show (see the Appendix) that the optimal control of Problem (C) is given by

    ū(s; t, x) = −h(t)x / (1 + h(t)(T − t)), s ∈ [t, T],    (1.15)

and the corresponding optimal trajectory is given by

    X̄(s; t, x) = (1 + h(t)(T − s)) / (1 + h(t)(T − t)) x, s ∈ [t, T].

We can show that, for any τ ∈ (t, T),

    ū(·; t, x)|_{[τ,T]} ≠ ū(·; τ, X̄(τ; t, x)), unless h(τ) = h(t) or x = 0.    (1.18)

This tells us that the restriction of ū(·; t, x) to [τ, T] is not optimal for Problem (C) with initial pair (τ, X̄(τ; t, x)), in general. Such a phenomenon is called time-inconsistency.
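The computation behind (1.15)-(1.18) can be replayed numerically. In the sketch below (our own illustration; the weight map h(t) = 1 + t and the numbers T, t, x, τ are hypothetical choices, not from the text), the pre-committed control computed at (t, x) disagrees with the control obtained by re-solving Problem (C) at (τ, X̄(τ; t, x)).

```python
T = 1.0
h = lambda t: 1.0 + t           # hypothetical terminal-weight map, h >= 1 > 0

t, x, tau = 0.0, 1.0, 0.5
# Pre-committed optimal control (1.15); note it is constant in s.
u_precommitted = -h(t) * x / (1.0 + h(t) * (T - t))
# State reached at time tau along the optimal trajectory.
x_tau = (1.0 + h(t) * (T - tau)) / (1.0 + h(t) * (T - t)) * x
# Re-solving Problem (C) from (tau, x_tau) uses the weight h(tau), not h(t).
u_replanned = -h(tau) * x_tau / (1.0 + h(tau) * (T - tau))

print(u_precommitted, u_replanned)  # -0.5 vs. about -0.643
```

The disagreement disappears exactly when h(τ) = h(t) or x = 0, matching (1.18).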
In general, for any given initial pair (t, x) ∈ [0, T) × R^n, we can consider the following controlled system:

    Ẋ(s) = b(t, x, s, X(s), u(s)), s ∈ [t, T],    X(t) = x,

with the cost functional:

    J(t, x; u(·)) = ∫_t^T g(t, x, s, X(s), u(s)) ds + h(t, x, X(T)).

Such a dependence on (t, x) allows us to capture situations in which people modify the control system and/or the cost functional at different initial pairs. Clearly, our setting is much more general than that of [5]. Naturally, one could pose the following optimal control problem.
It is clear that Problem (C) is a special case of Problem (N). Hence, Problem (N) is time-inconsistent, in general. Any optimal control ū(·) ∈ U[t, T] of Problem (N) is referred to as a pre-committed optimal control on [t, T]. Due to the time-inconsistency, finding an optimal control ū(·) ∈ U[t, T] for Problem (N) (assuming it exists) might not be very useful (if not useless) in the long run. Hence, Problem (N) is natural, but a little too naive.
In this paper, we will concentrate on a linear-quadratic time-inconsistent control problem. We will present a time-consistent solution via a "sophisticated" approach. The main idea comes from the works [23], [21], [20], and [7]. Here is a brief description. Take a partition ∆ : 0 = t_0 < t_1 < · · · < t_N = T of the time interval [0, T]. Consider an N-person non-cooperative differential game: the k-th player (who may be called self-k) starts the game from the initial pair (t_{k−1}, X(t_{k−1})) and controls the system on [t_{k−1}, t_k] to minimize his own cost functional. At t = t_k, the next player (the (k + 1)-th player, or self-(k + 1)) takes over, starting from the initial pair (t_k, X_k(t_k)), which is the terminal pair of the k-th player, and controlling the system on [t_k, t_{k+1}], and so on. Each player knows that the later players will do their best, and will modify their control systems as well as their cost functionals. However, in measuring the performance of the controls, each player will discount the cost/payoff in his/her own way. This is the main issue in handling the time-inconsistency, and it has to be treated this way so that the results can recover those for exponential discounting situations. It is expected that as the mesh size ‖∆‖ ≡ max{t_k − t_{k−1} : 1 ≤ k ≤ N} → 0, the Nash equilibrium strategy of the N-person differential game should approach the desired time-consistent solution of the original time-inconsistent Problem (N).
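The partition mechanics can be illustrated on the scalar example of Section 1. The sketch below is our own simplification: each player naively plays the pre-committed feedback frozen at his own initial time t_{k−1} (the actual game of this paper also lets each player anticipate the later players, leading to the Riccati-Volterra system mentioned in the abstract), and the weight map h(t) = 1 + t is a hypothetical choice.

```python
import numpy as np

T, x_init = 1.0, 1.0
h = lambda t: 1.0 + t                     # hypothetical terminal-weight map

def play(N, steps_per_interval=200):
    """Hand the system from self-k to self-(k+1) over a uniform partition:
    on [t_{k-1}, t_k], player k applies the feedback
    u(s) = -h(t_{k-1}) X(s) / (1 + h(t_{k-1}) (T - s))."""
    knots = np.linspace(0.0, T, N + 1)
    x = x_init
    for k in range(N):                    # player k+1 acts on [t_k, t_{k+1}]
        hk = h(knots[k])                  # weight frozen at this player's start
        dt = (knots[k + 1] - knots[k]) / steps_per_interval
        s = knots[k]
        for _ in range(steps_per_interval):
            u = -hk * x / (1.0 + hk * (T - s))
            x += u * dt                   # Euler step for dX = u ds
            s += dt
    return x

# Refining the partition changes the outcome: each refinement lets the
# "current self" update the frozen weight h(t_{k-1}) more often.
print(play(1), play(4), play(64))
```

With N = 1 the play reduces to the pre-committed solution; as N grows the terminal state drifts toward that of the limiting feedback in which the weight is updated continuously.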
The rest of the paper is organized as follows. In Section 2, we collect some preliminary results, mainly some careful estimates relevant to our time-inconsistent optimal control problem. Section 3 is devoted to a study of the N-person differential game. In Section 4, we discuss the convergence of the Nash equilibrium value function for the N-person differential game, as well as a sufficient condition for the existence of a time-consistent equilibrium control for Problem (N). Finally, a time-inconsistent LQ problem will be presented.

N-Person Differential Games
Consider the following linear controlled ODE parameterized by (t, x) ∈ [0, T) × R^n:

    Ẋ(s) = A(s)X(s) + B(s)u(s), s ∈ [t, T],    X(t) = x,

with the cost functional

    J(t, x; u(·)) = ⟨G(t)X(T), X(T)⟩ + ∫_t^T (⟨Q(t, s)X(s), X(s)⟩ + ⟨R(t, s)u(s), u(s)⟩) ds.

Here A, B, Q, R and G are some given suitable maps. Let ∆ be a partition of [0, T] given by

    ∆ : 0 = t_0 < t_1 < · · · < t_N = T.

We now introduce an N-person differential game associated with ∆. These N players are labeled by k = 1, 2, · · · , N. The k-th player chooses controls from U[t_{k−1}, t_k]. Let X(·) be the solution to the following:

    Ẋ(s) = A(s)X(s) + B(s)u(s), s ∈ [t_{k−1}, t_k],    X(t_{k−1}) = x_{k−1},

where x_{k−1} is the terminal state of the preceding player (with x_0 = x). The k-th player has the following cost functional, in which the weights are frozen at his own initial time t_{k−1}:

    J_k(x; u(·)) = ⟨G(t_{k−1})X(T), X(T)⟩ + ∫_{t_{k−1}}^T (⟨Q(t_{k−1}, s)X(s), X(s)⟩ + ⟨R(t_{k−1}, s)u(s), u(s)⟩) ds.    (2.5)

For any x ∈ R^n and any partition ∆ of [0, T], we now pose the following problem.
We now introduce the following assumptions (H1) and (H2) on the coefficients; in particular, (H1) imposes conditions on A(·) and B(·) (see (2.7)), while the maps G(·), Q(·, ·), and R(·, ·) are required to satisfy suitable boundedness and positivity conditions. For any partition ∆ of [0, T], we introduce the corresponding notation. Our first result is the following.
Proof. Let x ∈ R^n and ∆ : 0 = t_0 < t_1 < · · · < t_N = T be given. Let (X̄∆(·), ū∆(·)) be an equilibrium pair of Problem (LQ∆). Then the restriction ū_N(·) ≡ ū∆(·)|_{[t_{N−1}, t_N]} is the optimal control of the LQ problem of Player N, whose state equation is

    Ẋ(s) = A(s)X(s) + B(s)u(s), s ∈ [t_{N−1}, t_N],    X(t_{N−1}) = X̄∆(t_{N−1}),

and with the cost functional

    J_N(t, y; u(·)) = ⟨G_N X(t_N), X(t_N)⟩ + ∫_t^{t_N} (⟨Q(t_{N−1}, s)X(s), X(s)⟩ + ⟨R(t_{N−1}, s)u(s), u(s)⟩) ds,    (2.19)

where G_N = G(t_{N−1}). To study this LQ problem, we consider the following state equation:

    Ẋ(s) = A(s)X(s) + B(s)u(s), s ∈ [t, t_N],    X(t) = y,

for t ∈ [t_{N−1}, t_N). For such an LQ problem on [t, t_N], under (H1), there exists a unique optimal control, which must have the following form:

    ū_N(s) = −R(t_{N−1}, s)^{−1} B(s)^T P∆(s) X̄_N(s), s ∈ [t, t_N],

where P∆(·) is the unique solution to the following Riccati equation:

    dP∆(s)/ds + P∆(s)A(s) + A(s)^T P∆(s) + Q(t_{N−1}, s) − P∆(s)B(s)R(t_{N−1}, s)^{−1}B(s)^T P∆(s) = 0, s ∈ [t_{N−1}, t_N],    P∆(t_N) = G_N,    (2.23)

and X̄_N(·) is the solution to the following closed-loop state equation:

    dX̄_N(s)/ds = [A(s) − B(s)R(t_{N−1}, s)^{−1}B(s)^T P∆(s)]X̄_N(s), s ∈ [t, t_N],    X̄_N(t) = y.    (2.24)

Let Φ∆(·; t) be the solution to the following:

    dΦ∆(s; t)/ds = [A(s) − B(s)R(t_{N−1}, s)^{−1}B(s)^T P∆(s)]Φ∆(s; t), s ∈ [t, t_N],    Φ∆(t; t) = I,    (2.25)

so that X̄_N(s) = Φ∆(s; t)y and ū_N(s) = Ψ∆(s; t)y, with Ψ∆(s; t) ≡ −R(t_{N−1}, s)^{−1}B(s)^T P∆(s)Φ∆(s; t). Substituting these representations into (2.19) leads to

    ⟨P∆(t)y, y⟩ = ⟨[Φ∆(t_N; t)^T G_N Φ∆(t_N; t) + ∫_t^{t_N} (Φ∆(s; t)^T Q(t_{N−1}, s)Φ∆(s; t) + Ψ∆(s; t)^T R(t_{N−1}, s)Ψ∆(s; t)) ds] y, y⟩.    (2.28)

Since y ∈ R^n can be arbitrarily chosen, we have

    P∆(t) = Φ∆(t_N; t)^T G_N Φ∆(t_N; t) + ∫_t^{t_N} (Φ∆(s; t)^T Q(t_{N−1}, s)Φ∆(s; t) + Ψ∆(s; t)^T R(t_{N−1}, s)Ψ∆(s; t)) ds.    (2.29)

Also, by the optimality of ū_N(·), we have

    ⟨P∆(t)y, y⟩ = J_N(t, y; ū_N(·)) ≤ J_N(t, y; 0) = ⟨P∆_0(t)y, y⟩,

where X_0(·) is the solution to the following:

    Ẋ_0(s) = A(s)X_0(s), s ∈ [t, t_N],    X_0(t) = y,

and P∆_0(·) is the solution to the following Lyapunov equation:

    dP∆_0(s)/ds + P∆_0(s)A(s) + A(s)^T P∆_0(s) + Q(t_{N−1}, s) = 0, s ∈ [t, t_N],    P∆_0(t_N) = G_N,

which can be represented by the following:

    P∆_0(t) = Φ∆_0(t_N; t)^T G_N Φ∆_0(t_N; t) + ∫_t^{t_N} Φ∆_0(s; t)^T Q(t_{N−1}, s)Φ∆_0(s; t) ds,

with Φ∆_0(·; t) being the solution to the following:

    dΦ∆_0(s; t)/ds = A(s)Φ∆_0(s; t), s ∈ [t, t_N],    Φ∆_0(t; t) = I.    (2.34)

Note that Φ∆_0(·; t) can be defined for any t ∈ [0, t_N), which will be used below. Hence, P∆(t) ≤ P∆_0(t) for all t ∈ [t_{N−1}, t_N]. It is also clear that the restriction of the equilibrium pair (X̄∆(·), ū∆(·)) to (t_{N−1}, t_N] admits the following representation:

    X̄∆(s) = Φ∆(s; t_{N−1})X̄∆(t_{N−1}),    ū∆(s) = −R(t_{N−1}, s)^{−1}B(s)^T P∆(s)X̄∆(s), s ∈ (t_{N−1}, t_N].

Next, for Player (N − 1), inspired by the above, we consider the following state equation on [t, t_N], for t ∈ [t_{N−2}, t_{N−1}): on [t_{N−1}, t_N] the control is given by Player N's equilibrium feedback, and we denote the resulting pair on that subinterval by (X∆_{N−1}(·), u∆_{N−1}(·)). (2.38) Thus, (X∆_{N−1}(·), u∆_{N−1}(·)) is the optimal pair for Player N starting from the initial pair (t_{N−1}, X_{N−1}(t_{N−1})). The cost functional for the LQ problem of Player (N − 1) on [t, t_{N−1}] is taken to be

    J_{N−1}(t, y; u(·)) = ⟨G_{N−1}X(t_{N−1}), X(t_{N−1})⟩ + ∫_t^{t_{N−1}} (⟨Q(t_{N−2}, s)X(s), X(s)⟩ + ⟨R(t_{N−2}, s)u(s), u(s)⟩) ds,    (2.40)

where the effective terminal weight G_{N−1} accounts for the cost that Player (N − 1) accumulates on [t_{N−1}, t_N] along Player N's equilibrium feedback.
For such an LQ problem (on [t, t_{N−1}]), under (H1), the optimal control is given by

    ū_{N−1}(s) = −R(t_{N−2}, s)^{−1}B(s)^T P∆(s)X̄_{N−1}(s), s ∈ [t, t_{N−1}],

where P∆(·) is the solution to the following Riccati equation (with G_{N−1} the effective terminal weight from (2.40)):

    dP∆(s)/ds + P∆(s)A(s) + A(s)^T P∆(s) + Q(t_{N−2}, s) − P∆(s)B(s)R(t_{N−2}, s)^{−1}B(s)^T P∆(s) = 0, s ∈ [t, t_{N−1}],    P∆(t_{N−1}) = G_{N−1},

and X̄_{N−1}(·) is the solution to the following closed-loop state equation:

    dX̄_{N−1}(s)/ds = [A(s) − B(s)R(t_{N−2}, s)^{−1}B(s)^T P∆(s)]X̄_{N−1}(s), s ∈ [t, t_{N−1}],    X̄_{N−1}(t) = y.    (2.43)

Now, similar to (2.25), for t ∈ [t_{N−2}, t_{N−1}], let Φ∆(·; t) be the solution to the following:

    dΦ∆(s; t)/ds = [A(s) − B(s)R(t_{N−2}, s)^{−1}B(s)^T P∆(s)]Φ∆(s; t), s ∈ [t, t_{N−1}],    Φ∆(t; t) = I,    (2.46)

and set Ψ∆(s; t) ≡ −R(t_{N−2}, s)^{−1}B(s)^T P∆(s)Φ∆(s; t). Then the optimal pair (X̄_{N−1}(·), ū_{N−1}(·)) of the LQ problem associated with (2.37) and (2.39) (on [t, t_{N−1}]) is given by the following:

    X̄_{N−1}(s) = Φ∆(s; t)y,    ū_{N−1}(s) = Ψ∆(s; t)y, s ∈ [t, t_{N−1}].

Hence, the restriction of the equilibrium pair (X̄∆(·), ū∆(·)) to [t_{N−2}, t_N] admits the following representation, whose value satisfies

    ⟨P∆(t)y, y⟩ = ⟨[Φ∆(t_N; t)^T G(t_{N−2})Φ∆(t_N; t) + ∫_t^{t_N} (Φ∆(s; t)^T Q(t_{N−2}, s)Φ∆(s; t) + Ψ∆(s; t)^T R(t_{N−2}, s)Ψ∆(s; t)) ds] y, y⟩.    (2.49)

Since y ∈ R^n can be arbitrarily chosen, we have

    P∆(t) = Φ∆(t_N; t)^T G(t_{N−2})Φ∆(t_N; t) + ∫_t^{t_N} (Φ∆(s; t)^T Q(t_{N−2}, s)Φ∆(s; t) + Ψ∆(s; t)^T R(t_{N−2}, s)Ψ∆(s; t)) ds.    (2.50)

Also, by the optimality of ū_{N−1}(·), we have

    ⟨P∆(t)y, y⟩ = J_{N−1}(t, y; ū_{N−1}(·)) ≤ J_{N−1}(t, y; 0) = ⟨P∆_0(t)y, y⟩,

where X_0(·) is the solution to the following:

    Ẋ_0(s) = A(s)X_0(s), s ∈ [t, t_{N−1}],    X_0(t) = y,

and P∆_0(·) is the solution to the following Lyapunov equation:

    dP∆_0(s)/ds + P∆_0(s)A(s) + A(s)^T P∆_0(s) + Q(t_{N−2}, s) = 0, s ∈ [t, t_{N−1}],    P∆_0(t_{N−1}) = G_{N−1},

which, similar to the above, admits the following representation:

    P∆_0(t) = Φ∆_0(t_{N−1}; t)^T G_{N−1}Φ∆_0(t_{N−1}; t) + ∫_t^{t_{N−1}} Φ∆_0(s; t)^T Q(t_{N−2}, s)Φ∆_0(s; t) ds.

Then one can apply induction to complete the proof.
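The single-player building block of the proof, namely the backward solution of a Riccati equation such as (2.23) and the resulting feedback gain, can be sketched numerically. The code below is our own illustration with hypothetical constant coefficients A, B, Q, R, G (not from the text), integrating the matrix Riccati equation backward from the terminal condition with explicit Euler steps.

```python
import numpy as np

# Hypothetical constant coefficients: 2-dimensional state, scalar control.
n = 2
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(n)                      # running state weight Q(t_{N-1}, .)
R = np.eye(1)                      # running control weight R(t_{N-1}, .)
G = np.eye(n)                      # terminal weight G_N

t, t_N, steps = 0.0, 1.0, 20000
ds = (t_N - t) / steps
Rinv = np.linalg.inv(R)

# March P backward from P(t_N) = G_N using
#   P'(s) = -(P A + A^T P + Q - P B R^{-1} B^T P).
P = G.copy()
for _ in range(steps):
    dP = -(P @ A + A.T @ P + Q - P @ B @ Rinv @ B.T @ P)
    P = P - ds * dP                # step backward: P(s - ds) ≈ P(s) - ds P'(s)

# Optimal feedback gain at time t: u(s) = -R^{-1} B^T P(s) X(s).
K = Rinv @ B.T @ P
print(np.round(P, 4))
```

The computed P stays symmetric and positive definite, as expected for a Riccati solution with positive definite Q, R, G; the game of this section simply repeats this construction interval by interval with the weights re-frozen at each t_{k−1}.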

Time-Consistent Solutions
We now pose the following problem.
The following gives a weaker notion of time-consistent solutions to Problem (LQ).
Our next goal is to find the limit as the mesh size ‖∆‖ of ∆ approaches zero. For this, we need (H2).
Appendix

Recall the illustrative example of Section 1: the one-dimensional controlled ODE

    Ẋ(s) = u(s), s ∈ [t, T],    X(t) = x,

with the cost functional

    J(t, x; u(·)) = ∫_t^T u(s)^2 ds + h(t)X(T; t, x, u(·))^2.

For the corresponding LQ problem, the Riccati equation is

    Ṗ(s; t) = P(s; t)^2, s ∈ [t, T],    P(T; t) = h(t).    (A.1)

Simple calculation shows that

    P(s; t) = h(t) / (1 + h(t)(T − s)), s ∈ [t, T].

The optimal trajectory is the solution to the following closed-loop system

    Ẋ(s) = −P(s; t)X(s), s ∈ [t, T],    X(t) = x,

which is given by

    X̄(s; t, x) = (1 + h(t)(T − s)) / (1 + h(t)(T − t)) x, s ∈ [t, T],    (A.4)

and the optimal control is given by

    ū(s; t, x) = −P(s; t)X̄(s; t, x) = −h(t)x / (1 + h(t)(T − t)), s ∈ [t, T].

Now, if we let

    J(t; τ, y; u(·)) = ∫_τ^T u(s)^2 ds + h(t)X(T; τ, y, u(·))^2, τ ∈ [t, T],    (A.6)

then the optimal value function (for fixed t) is given by

    inf_{u(·)} J(t; τ, y; u(·)) = J(t; τ, y; ū(·)) = P(τ; t)y^2 = h(t) / (1 + h(t)(T − τ)) y^2, ∀(τ, y) ∈ [t, T] × R.    (A.7)

Next, let τ ∈ (t, T); we consider Problem (C) on [τ, T] with initial state y = X̄(τ; t, x). The same as above, we see that the corresponding solution to the Riccati equation is given by

    P(s; τ) = h(τ) / (1 + h(τ)(T − s)), s ∈ [τ, T].    (A.10)

However,

    J(τ, y; ū(·; t, x)|_{[τ,T]}) = ∫_τ^T ū(s; t, x)^2 ds + h(τ)X̄(T; t, x)^2 > inf_{u(·)} J(τ, y; u(·)),

unless h(τ) = h(t) or x = 0. This shows that the restriction of ū(·; t, x) to [τ, T] is not necessarily optimal for Problem (C) with initial pair (τ, X̄(τ; t, x)). Hence, Problem (C) is time-inconsistent.
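As a sanity check (our own, not part of the paper), the closed-form solution of the Riccati equation (A.1) can be verified by integrating Ṗ(s; t) = P(s; t)^2 backward from the terminal condition P(T; t) = h(t); the value h(t) = 2 below is an arbitrary choice.

```python
# Backward explicit-Euler integration of (A.1): dP/ds = P^2, P(T) = h.
T, h_val, steps = 1.0, 2.0, 10000
ds = T / steps

P = h_val                          # terminal condition P(T; t) = h(t)
for _ in range(steps):             # march from s = T down to s = t = 0
    P -= ds * P**2                 # P(s - ds) ≈ P(s) - ds * P(s)^2

closed_form = h_val / (1.0 + h_val * T)   # P(0; t) = h / (1 + h (T - 0))
print(P, closed_form)              # the two agree up to the Euler error
```

The discretized value converges to h(t)/(1 + h(t)T) as the step size shrinks, matching the closed-form formula used throughout the Appendix.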