Subgame-perfect equilibrium strategies for time-inconsistent recursive stochastic control problems

We study time-inconsistent recursive stochastic control problems. Since classical optimal controls may fail to exist, or to be relevant in practice, for this class of problems, we focus on subgame-perfect equilibrium policies. The approach followed in our work relies on the stochastic maximum principle: we adapt the classical spike variation technique to obtain a characterization of equilibrium strategies in terms of a generalized second-order Hamiltonian function defined through a pair of backward stochastic differential equations. The theoretical results are applied in the financial field to finite-horizon investment-consumption policies with non-exponential discounting.


Introduction
In this paper, we study time-inconsistent recursive stochastic control problems in which the notion of optimality is defined by means of subgame-perfect equilibrium. In a continuous-time setting, such controls were introduced in Ekeland and Lazrak (2006) and Ekeland and Pirvu (2008), later completed in Ekeland et al. (2012), and can be thought of as "infinitesimally optimal via spike variation": i.e., they are optimal with respect to a penalty represented by deviations during an infinitesimal amount of time.
In particular, in Ekeland and Pirvu (2008), the authors apply the classical (Pontryagin) maximum principle theory of Yong and Zhou (1999) to deal with the linear Merton portfolio management problem in the context of pseudo-exponential discounting, introducing the concept of subgame-perfect equilibrium policy as a notion ensuring the time-consistency of the portfolio strategy (possibly not unique). They arrive at equivalent formulations in terms of ODEs and integral equations thanks to the special form of discounting.
The first aim of this paper is to carry out a similar program for a more general control problem in the context of recursive utilities, and to apply our results in a financial setting. The approach followed in our work is inspired by Ekeland and Pirvu (2008) and Hu (2017) and relies on the stochastic maximum principle; see also Peng (1990), Peng (1993) and Briand et al. (2003). We adapt the classical spike variation technique to obtain a characterization of closed-loop equilibrium strategies in terms of a generalized Hamiltonian function H defined through a pair of backward stochastic differential equations (BSDEs). Compared with the classical one, our generalized Hamiltonian function contains the driver coefficient of the recursive utility (which has more variables than its analogue in the classical case) and involves a second-order stochastic process.
We emphasize that, as in the classical case, equilibrium strategies are characterized through both a necessary condition and a sufficient condition involving the generalized Hamiltonian function; contrary to the classical case, however, this sufficient condition works even in the absence of extra convexity assumptions. We also point out that the spike variation technique, explicitly required by the definition of equilibrium policy, applies indiscriminately both to the case in which the control domain U satisfies particular geometric conditions, such as convexity or linearity, and to more general cases; see Ji and Zhou (2006) for a different approach (a terminal perturbation method), which is applicable only to time-consistent optimal controls in the classical sense.
Later on, the theoretical results are applied in the financial field to finite-horizon investment-consumption policies with non-exponential discounting (e.g., hyperbolic). In particular, we extend the results contained in the aforementioned works Ekeland and Lazrak (2006), Ekeland et al. (2012) and Ekeland and Pirvu (2008) by introducing recursive utilities. An explicit characterization is then readily computed for the portfolio problem considered at the end of the paper. We observe here that the assumptions we adopt for the coefficients of our stochastic control system (a decoupled forward-backward stochastic differential equation) are substantially those proposed in Hu (2017), although the concept of equilibrium is not used there. We look for controls in feedback, or closed-loop, form, partially mimicking what is done in Ekeland and Pirvu (2008) (or Ekeland et al. (2012)), where explicit calculations are feasible. The shape of the recursive utility follows the classic Uzawa type (see Duffie and Epstein (1992)), but many other choices are possible. See also Gundel and Weber (2008), Imkeller and Dos Reis (2010), Yong (2012) and Penner and Réveillac (2015).
The theory of recursive optimal control problems in continuous time has attracted remarkable attention in recent years, from both a theoretical and an applied viewpoint. For the time-consistent framework, we refer in particular to the fundamental works Duffie and Epstein (1992) and El Karoui et al. (2001) (see also El Karoui et al. (1997) and references therein). Time-inconsistent problems were first analyzed through subgame-perfect equilibrium strategies by Strotz (1955) and Pollak (1968), and this line of research has been pursued by many others. We mention the series of studies by Yong (see, e.g., Wei et al. (2017)), whose approach focuses on dynamic programming, i.e., on Hamilton-Jacobi-Bellman (HJB) equations. In Björk et al. (2017), the time-inconsistent control problem is considered in a general Markov framework, and an extended HJB equation, together with a verification theorem, is derived. In the setting of hyperbolic discounting, Ekeland et al. (2012) studies the portfolio management problem for an investor who is allowed to consume and take out life insurance, and the equilibrium strategy is characterized by an integral equation. See also Hu et al. (2012), Björk and Murgoci (2014), Björk et al. (2014), Björk et al. (2017), Hu et al. (2017), Yong (2012), Yong (2014) and references therein for various kinds of problems.
Very recently, Hamaguchi (2021b) considered a time-inconsistent investment-consumption problem with random endowments in a possibly incomplete market under general discount functions. Finally, we mention Hamaguchi (2021a) for recent results concerning time-inconsistent recursive stochastic control problems where the cost functional is defined by the solution to a backward stochastic Volterra integral equation. In contrast to our approach, however, that work focuses on open-loop equilibrium controls.
The contributions of this paper are summarized as follows.
- The extension of the equilibrium concept (given in Ekeland and Pirvu (2008)) to the framework of recursive stochastic control problems, within a class of closed-loop strategies.
- The characterization of an equilibrium policy through the solution of a flow of BSDEs, together with the proof that, under sufficiently general assumptions on the coefficients, this flow of BSDEs admits a solution.
-The formulation of a necessary and sufficient condition for a closed-loop equilibrium control via variational methods (Theorem 1).
- The treatment of an illustrative example concerning an investment-consumption problem under non-exponential discounting.
The remainder of the paper is organized as follows. In Section 1, we introduce the notation. In Section 2, we formulate the notion of (subgame-perfect) equilibrium policy and the class of problems in which we are interested. In Section 3, we recall some preliminary results. In Section 4, we present and prove necessary and sufficient conditions for the existence of an equilibrium policy: the main results are in Subsection 4.1; the technical part is in Subsection 4.2. In Section 5, these results are extended to the multidimensional case. In Section 6, we analyze a significant portfolio management problem as an application of the results obtained in the previous sections. We conclude the paper in Section 7 by discussing possible future research.

Notation
Set T ∈ ]0, ∞[ as a finite deterministic horizon and let (Ω, F, P) be a complete probability space on which a one-dimensional Brownian motion, or Wiener process, W = (W(t))_{t∈[0,T]} can be defined. Let F = (F_t)_{t∈[0,T]} be the completed filtration generated by W, for which we suppose that F_T = F (system noise is the only source of uncertainty in the problem). Therefore, the filtered space (Ω, F, F, P) satisfies the usual conditions. In this regard, see, e.g., (Yong and Zhou, 1999, Chap. 1, Sect. 2).
Remark 1. For any non-empty set I of indices ı, we will keep implicit the dependence on the sample variable ω ∈ Ω for each stochastic process on I × Ω, as is usually done (and as indeed we have just done for W). We also specify that any stochastic process on I × Ω must be seen as its equivalence class under the relation ∼ of indistinguishability: i.e., for any processes X = (X(ı))_{ı∈I} and X̃ = (X̃(ı))_{ı∈I} on I × Ω, X ∼ X̃ if and only if

We introduce the following, rather familiar, notation:
• less than or equal to, up to positive multiplicative constants (independent of the quantities involved), about which we are not particularly interested in being more explicit.

Problem formulation
Take n ∈ N* and equip R^n with the Euclidean topology and the Borel σ-algebra B(R^n) with its Lebesgue measure, as will be the case for any other Euclidean space, and choose a control domain (not necessarily bounded, for now).

For an appropriate
p ∈ [2, ∞[ that we do not want to give in explicit form (see (Hu, 2017, Introduction)), we set and we call an admissible control any element u(•) ∈ U[t, T]. We call the spike (or needle) variation of ū(•) w.r.t. u(•) and E^ε_t the admissible control ū^ε(•) ∈ U[t, T] defined by setting

Remark 5. The spike variation ū^ε(•) in (2) is explicitly given, for any s ∈ [t, T], by

Notation 1. Any alphabetic letter appearing as a subscript of a prescribed set that appears explicitly as (part of) the domain of a function, such as, among others, u = (u_1, . . ., u_n)^T for should be seen as our preferred notation for the generic variable element of that domain.
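For the reader's convenience, the standard form of the spike variation used in this literature (cf. Yong and Zhou (1999)) can be sketched as follows; here we assume the perturbation set is the interval E^ε_t = [t, t + ε), which is the typical choice in the equilibrium setting.

```latex
% Spike (needle) variation of \bar u(\cdot) w.r.t. u(\cdot) and E^\varepsilon_t
% (sketch, under the assumption E^\varepsilon_t = [t, t+\varepsilon) \subseteq [t,T]):
\bar u^\varepsilon(s) \;=\;
\begin{cases}
u(s), & s \in E^\varepsilon_t,\\[2pt]
\bar u(s), & s \in [t,T] \setminus E^\varepsilon_t .
\end{cases}
```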
Fix four deterministic maps b, σ :

Remark 6. Regarding Assumption 1, we point out the following.
• The relations with ϕ = b and ϕ = σ could really depend on t through the fact that s ∈ [t, T ].
• The conditions of sublinear growth and boundedness imply something somewhat stronger than the classic conditions in (Yong and Zhou, 1999, Chap. 3, Sect. 3).
• A similar condition could work even if, relative at least to the variable u ∈ U or x ∈ I, it uses a more generic modulus of continuity ω(•) than a linear one, namely, a map that is non-decreasing, satisfies lim_{δ↓0} ω(δ) = ω(0) = 0 and quantitatively measures the uniform continuity of some (continuous) function between metric spaces.
• See Section 6 for a situation where the maps b, σ, f and h satisfy all the regularity conditions required by Assumption 1.
(where s ∈ [t, T]) and we call an admissible state process any solution X(•) of (3) that belongs to L^2_F(t, T; R).

Remark 7. Regarding Definition 3, we point out the following.
• The equation (3) is a controlled forward stochastic differential equation (FSDE) in Itô differential form, with finite deterministic horizon T and with random coefficients that depend on the sample ω ∈ Ω only through u(•) and X(•) itself; it also depends on b, σ, t and x (as well as on W).
• The term "strong formulation", which henceforth we will not repeat, alludes to the fact that the filtered space of probability (Ω, F , F, P) is fixed a priori together with W and therefore must not be sought as part of the solution of (3) (see, e.g., (Yong and Zhou, 1999, Chap. 1, Sect. 6)).
• Suppose Assumption 1 holds, at least as regards b and σ.
Then there exists a unique solution; see, e.g., Proposition 1 below.
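As a purely numerical illustration of the admissible state processes just discussed, a controlled FSDE of the form (3) can be simulated with the Euler-Maruyama scheme. The linear coefficients and the constant control below are our own illustrative choices, not taken from the text.

```python
import numpy as np

def euler_maruyama(b, sigma, u, t, T, x, n_steps, rng):
    """Simulate one path of dX = b(s, X, u(s, X)) ds + sigma(s, X, u(s, X)) dW."""
    dt = (T - t) / n_steps
    X = np.empty(n_steps + 1)
    X[0] = x
    s = t
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))        # Brownian increment
        uk = u(s, X[k])                          # feedback control value
        X[k + 1] = X[k] + b(s, X[k], uk) * dt + sigma(s, X[k], uk) * dW
        s += dt
    return X

# Illustrative linear dynamics (our choice): dX = u X ds + 0.2 X dW.
rng = np.random.default_rng(0)
path = euler_maruyama(
    b=lambda s, x, u: u * x,
    sigma=lambda s, x, u: 0.2 * x,
    u=lambda s, x: 0.05,          # a constant admissible control
    t=0.0, T=1.0, x=1.0, n_steps=200, rng=rng,
)
```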
Notation 2. We specify the dependence on the elements involved by writing and P-a.s.,

Remark 8. Under Assumption 2, the interval I may depend on T: e.g., the larger T is, the larger I may be (in the worst-case scenario, however, we can always take I = R). Moreover, if we prefer, we can imagine that the domain component in the variable x of the maps b, σ, f and h is restricted precisely to I, in such a way that the (analogue of) Assumption 1 still holds. See Section 6 for a situation where Assumption 2 is satisfied.
be an admissible state process as in Definition 3. We call a recursive (dis)utility system a backward stochastic differential equation (where s ∈ [t, T]), and we call an admissible recursive utility any process that is a pair solution of (5) and belongs to

Regarding Definition 4, we point out the following.
• The equation (5) is a backward stochastic differential equation (BSDE) in Itô differential form, decoupled from the FSDE (3) of Definition 3, on which it totally depends. Here, we have something much more general than a stochastic differential (dis)utility (SDU) in its original meaning: that is, essentially, a BSDE such as (where ξ_t ∈ L^2_T(Ω; R) and s ∈ [t, T]). See, e.g., Duffie and Epstein (1992) and references therein.
• The term "disutility" anticipates the fact that there will be something to be minimized (not maximized). This will be done in a sense more general than the classic one: precisely, in the sense of subgame-perfect equilibrium strategies.
Notation 3. We specify the dependence on the elements involved by writing

Remark 10. Y(t; t) is a deterministic constant. Indeed, since x is a deterministic constant, X(T) (see (4)) is measurable w.r.t. the completed σ-algebra F_{t,T} on Ω generated by the process and therefore Y(t; t) (see (6)) is simultaneously measurable w.r.t. the two mutually independent σ-algebras F_t and F_{t,T} (an argument of this kind is found in, e.g., Debussche et al. (2007)). Consequently, Y(t; t) is deterministic. On the other hand, it is not possible to establish in general that Z(t; t) is a deterministic constant.
Remark 11. We will prefer the notation X(•) to other possible notations, such as X(•; t), but we will retain the notations Y(•; t) and Z(•; t).
We call the combination of the two stochastic differential equations (3) and (5) the recursive stochastic control problem, i.e., (where s ∈ [t, T]) and, if (X(•), Y(•; t), Z(•; t)) is a solution of (7) that belongs to

Remark 12. The equation/system (7) is a controlled decoupled forward-backward stochastic differential equation/system (FBSDE) in Itô differential form and, of course, we could use Notations 2 and 3 for the respective components of the corresponding solution.
Regarding the recursive stochastic control problem (7) of Definition 5, the following standard existence, uniqueness and regularity result holds (see, e.g., Ma and Yong (1999)).

Proposition 1. Suppose Assumption 1 holds and fix
Then there exists a unique solution (see also Remark 10).
Remark 13. The functional J(•; t, x) in (8) is a real-valued generalized Bolza-type functional and, for any an expression in which the running, or intertemporal, utility and the terminal utility are explicitly specified (see again Remark 10). In particular, the less f depends on the variables y and z, the closer we return to the classical sphere of stochastic optimal control theory.
Definition 7 (Equilibrium policy). Suppose Assumptions 1 and 2 hold. We call a (subgame-perfect) equilibrium policy associated with T, I, U, W and b, σ, f, h any measurable map such that, for any t ∈ [0, T[ and x ∈ I, there exists a unique I-valued Itô process that is a solution of the FSDE

dX(s) = b(s, X(s), Π(s, X(s))) ds + σ(s, X(s), Π(s, X(s))) dW(s),

(where s ∈ [t, T]), belongs to L^2_F(t, T; R) and is such that, if we denote, for any s ∈ [t, T] (and P-a.s.), then we have ū(•) ∈ U[t, T] and, for any other u(•) ∈ U[t, T], where, for ε ↓ 0, ū^ε(•) is the spike variation of ū(•) w.r.t. u(•) and E^ε_t given by (see also Definition 2).
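In the notation of Definition 7, the equilibrium inequality (11) typically takes the following form in this literature (cf. Ekeland and Pirvu (2008)); we state it here as a sketch, with the minimization convention of this paper.

```latex
% Equilibrium condition (sketch, following Ekeland--Pirvu; disutility is minimized):
\liminf_{\varepsilon \downarrow 0}\,
\frac{J\bigl(\bar u^{\varepsilon}(\cdot)\,;\,t,x\bigr) - J\bigl(\bar u(\cdot)\,;\,t,x\bigr)}
     {\varepsilon}
\;\ge\; 0,
\qquad \text{for every } u(\cdot) \in \mathcal U[t,T].
```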
Remark 14. The lim inf_{ε↓0} in (11) will turn out to be an actual limit (see Lemma 2 in Section 4).
Notation 4. With respect to the notation of Definition 7, we specify the dependence on the elements involved by writing (similarly to Notations 2 and 3).
Remark 15. Regarding Definitions 7 and 8, we point out the following.
• If ū(•) as in (10) is an optimal control in the classical sense, i.e., ū(•) minimizes the objective functional J(•; t, x) over U[t, T], then ū(•) is also an equilibrium control (the converse need not hold in general).
• We could specify an equilibrium policy/control/pair/4-tuple to be strong in cases in which the inequality in (11) is strict (adjusting the entire sequel accordingly).
• In general, we cannot expect an equilibrium policy/control/pair/4-tuple to be unique, even if it exists. It might therefore be worthwhile to select one uniquely through a constraint: this will be the subject of our future studies (see also Section 7).
• The condition (11) can be rewritten as (see Definition 6 and Notation 4).
The stochastic control problem that we will deal with, for which it is essentially a matter of obtaining a (Pontryagin) maximum principle, can be stated as follows (see Definition 8).
Problem 1. Find necessary and sufficient conditions for an equilibrium 4-tuple.
Remark 16. Having thus established definitions, assumptions and purposes, we want to emphasize rigorously that our "generalized" optimization problem (Problem 1) is in general affected by time-inconsistency precisely because of the shape of the recursive utility Y(•; t) (see Definition 4), and hence of the utility functional J(•; t, x) (see Definition 6), which indeed we interpret financially as having a structure of non-exponential time discounting and, therefore, a non-constant (psychological) discount rate. That fact, on the other hand, explains why (subgame-perfect) equilibrium controls/strategies are considered (Definitions 7 and 8). For all this, we also refer to the portfolio management problem of Section 6. See, among others, Ekeland and Pirvu (2008), Ekeland et al. (2012) and Wei et al. (2017).

Some preliminary results
We recall a standard estimate for BSDEs that is decisive in Hu (2017) (on which we rely) and can be found in, e.g., Briand et al. (2003). Its meaning is essentially the continuous dependence of the pair solution on the assigned data.
• The main result underlying the whole theory of BSDEs is the classic representation theorem for square-integrable continuous martingales, and thus it is crucial that the reference filtration remain the completed filtration F generated by W.
• The two exponentiations to the powers p/2 and p concern the two deterministic integrals (the expected values of which we then compute), and not just the respective integrands (which are absolute values of differences). See also Remark 2.
We conclude the current section with a brief discussion of the classic comparison theorem for BSDEs, which, in the linear context, boils down to a simple observation (Remark 18 below) that will be important for our discussion, especially because what we will call adjoint equations will be linear BSDEs. See, e.g., El Karoui et al. (1997), from which it is also possible to extract the following starting result.

Then, for any α(•; t) ∈ L^2_F(t, T; R) and ξ_t ∈ L^2_T(Ω; R), there exists a unique pair solution (where s ∈ [t, T]) and the process Ξ(•; t) is the conditional expectation given, for any s ∈ [t, T] (and P-a.s.), by

Remark 18. Regarding Proposition 2, we point out the following.
• Since η(•; t) > 0, it follows from (15) that (and similarly with ≤ everywhere), and, moreover, the strict inequality for Ξ(•; t) holds even if only one of the two other inequalities is strict: e.g., (owing to the dependence on t of β(•; t) and γ(•; t)), and therefore we can expect that, as processes, (in which the latter term differs from Ξ(s; s) through the dependence on t of α(•; t) and ξ_t).
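To make the linear structure behind Proposition 2 concrete, consider the special case of a linear BSDE with deterministic constant coefficients and terminal datum, written under the usual convention -dY(s) = (β Y(s) + γ Z(s) + α) ds - Z(s) dW(s) (an assumption of ours, since the displayed driver is not reproduced here). With γ = 0 and deterministic data, the pair solution degenerates to (deterministic Y, Z ≡ 0), and the conditional-expectation representation reduces to a backward ODE with the closed form Y(0) = e^{βT} ξ + (α/β)(e^{βT} - 1). The sketch below, with constants chosen purely for illustration, checks this reduction numerically.

```python
import math

def linear_bsde_backward_euler(beta, alpha, xi, T, n_steps):
    """Deterministic linear BSDE: dY/ds = -(beta*Y + alpha), Y(T) = xi (Z = 0 here).
    Stepping backward in time from s = T to s = 0 with an explicit Euler scheme."""
    dt = T / n_steps
    Y = xi
    for _ in range(n_steps):
        Y = Y + (beta * Y + alpha) * dt   # one backward step of size dt
    return Y

def linear_bsde_closed_form(beta, alpha, xi, T):
    """Closed form at time 0: Y(0) = e^{beta T} xi + (alpha/beta)(e^{beta T} - 1)."""
    return math.exp(beta * T) * xi + (alpha / beta) * (math.exp(beta * T) - 1.0)

# Illustrative constants (our choice).
y_num = linear_bsde_backward_euler(beta=0.3, alpha=1.0, xi=2.0, T=1.0, n_steps=100000)
y_exact = linear_bsde_closed_form(beta=0.3, alpha=1.0, xi=2.0, T=1.0)
```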

A maximum principle: necessary and sufficient conditions
In this section, we solve Problem 1 by adapting the calculations of Hu (2017) appropriately. In particular, we do not provide heuristics for the shape of the adjoint equations/processes and their respective generalized Hamiltonian functions. We suppose that Assumptions 1 and 2 hold, and we fix (t ∈ [0, T[, x ∈ I and) an admissible 4-tuple (see Definition 5) that we regard as a candidate equilibrium 4-tuple (see Definitions 7 and 8).
Regarding the following, let us keep in mind Notation 5.
Remark 19. The process κ(•; t) in (17) is strictly positive and can be interpreted as a change of numéraire relative to the (dis)utility and corresponding to the coefficients f_y(•; t) and f_z(•; t).
Proposition 3. There exist unique pair solutions (p(•; t), q(•; t)) of (18) and (P(•; t), Q(•; t))

Remark 21. p(t; t) and P(t; t) are deterministic constants (for reasons similar to those given in Remark 10), while it is not possible to say the same in general about q(t; t) and Q(t; t).
Remark 22. For s ∈ [t, T] and u ∈ U, H(s; t) and δH(s; t, u) belong to L^1_s(Ω; R) and, moreover,

The key result is the following lemma, which we will prove in Subsection 4.2.
and let ū^ε(•) be the spike variation of ū(•) w.r.t. u(•) and E^ε_t. Then, for any s ∈ [t, T], the lim inf_{ε↓0} as in (11) is an actual limit and takes the form

Remark 23. As we will understand shortly, instead of (25), we could take, among other possibilities,

Corollary 1 (Sufficient conditions). Suppose there exists a measurable map Π : [0, T] × I → U such that, for any s ∈ [t, T] (and P-a.s.), and suppose that, for any u ∈ U (and P-a.s.), Then Π is an equilibrium policy, i.e., is an equilibrium 4-tuple.
Remark 24. Regarding Corollary 1, we point out that, if Z(t; t) and q(t; t) are deterministic constants, then the condition (28) is equivalent to

We are finally ready to present the first of our main results (see also Corollary 1 and Remark 24).
Theorem 1 (Maximum principle). Suppose there exists a measurable map Π : [0, T] × I → U such that, for any s ∈ [t, T] (and P-a.s.), Then the following three conditions are equivalent.
Under appropriate assumptions on our coefficients, we can replace H with H in Theorem 1, thus obtaining the following result, which will be used in Section 6. Let us also keep in mind Definition 11.

A proof of Lemma 2
We start with the following notational convention, borrowed from Hu (2017), about which there should be no misunderstanding in this section.
What we need to properly estimate is, for ε ↓ 0, (see (13) in Remark 15), while, in the sense of Lemma 3, we know something useful only about (see Remark 25). Therefore, the idea is to reconstruct information by working backward from this term by means of appropriate BSDEs (using the adjoint processes as in Definitions 10 and 11).
We recall below a basic calculation we use, namely, Itô integration by parts for a regular product.
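In differential notation, this product rule reads, for two one-dimensional Itô processes X_1 and X_2, as follows (here ⟨X_1, X_2⟩ denotes their quadratic covariation).

```latex
% It\^o integration by parts for a product of It\^o processes:
d\bigl(X_1(s)\,X_2(s)\bigr)
= X_1(s)\,dX_2(s) + X_2(s)\,dX_1(s) + d\langle X_1, X_2\rangle(s).
```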
Remark 30. We highlight the key difference with respect to the classical maximum principle, summarizing in a few words what has just been seen technically (for completeness, see also Remark 16).
The utility functional J(•; t, x) must be optimized in the ("weak") sense of equilibrium policies and, in particular, through the usual spike variation technique; so, on the one hand, a treatment similar to that initially formulated in (Yong and Zhou, 1999, Chap. 3, Sect. 4) is set up. On the other hand, J(•; t, x) has a precise ("strong") structure deriving from a recursive utility system, which is a BSDE; therefore, to complete the calculations, the powerful techniques of Hu (2017) are taken up ad hoc.
Therefore, Lemma 2 changes as follows.

Lemma and let ū^ε(•) be the spike variation of ū(•) w.r.t. u(•) and E^ε_t. Then, for any s ∈ [t, T], the lim inf_{ε↓0} as in (11) is an actual limit and takes the form

Consequently, Theorem 1 changes as follows (and the proof is essentially the same).
and any admissible control u(•) ∈ U [t, T ] is a portfolio strategy that can be written as (see Definition 1) and which we refer to as the investment-consumption policy.
We accept as true the usual self-financing condition, namely, that the variation in wealth over time is due exclusively to profits and losses from investing in the stock and from consumption (there is no cash flow coming in or out), and we consider investment-consumption policies in feedback form:

We assume that the agent derives utility from intertemporal consumption c(•)X(•) and final wealth X(T), which she/he tries to optimize by minimizing, in a suitable sense, a discounted expectation involving (dis)utility functions. Therefore, let υ(•) and ῡ(•) be two scalar functions of a real variable that satisfy the classical Uzawa-Inada conditions (utility functions, in fact): i.e.,

The discount rate, i.e., the rate of return used to discount future cash flows back to their present value, can be considered to be a monotonic function (and the same for ħ(•; t)). Under suitable conditions of analytical-geometric regularity, it is possible to restrict attention to the first-order part of the Hamiltonian alone, as is done with the investment-consumption policies considered in Section 6.
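As a purely illustrative sketch, assuming Merton-type self-financing wealth dynamics dX = X[(r + π(μ - r) - c) dt + π σ dW] and a hyperbolic discount h(s) = 1/(1 + βs), one can estimate by Monte Carlo the discounted expected CRRA utility of a feedback investment-consumption policy with constant proportions. All parameter values and functional choices here are our own, not the paper's.

```python
import numpy as np

def simulate_utility(r, mu, sigma, pi, c, beta, gamma, T, x0, n_steps, n_paths, seed=0):
    """Monte-Carlo estimate of E[ int_0^T h(s) U(c X(s)) ds + h(T) U(X(T)) ]
    for constant feedback proportions pi (investment) and c (consumption rate),
    hyperbolic discount h(s) = 1/(1 + beta*s), CRRA utility U(v) = v^(1-gamma)/(1-gamma)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, x0)          # wealth paths
    total = np.zeros(n_paths)         # accumulated discounted running utility
    for k in range(n_steps):
        s = k * dt
        h = 1.0 / (1.0 + beta * s)                      # hyperbolic discount factor
        U_run = (c * X) ** (1.0 - gamma) / (1.0 - gamma)  # utility of consumption c*X
        total += h * U_run * dt
        dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        X = X * (1.0 + (r + pi * (mu - r) - c) * dt + pi * sigma * dW)
    h_T = 1.0 / (1.0 + beta * T)
    total += h_T * X ** (1.0 - gamma) / (1.0 - gamma)     # terminal wealth utility
    return total.mean()

# Illustrative market and preference parameters (our choice).
val = simulate_utility(r=0.02, mu=0.07, sigma=0.2, pi=0.5, c=0.05,
                       beta=1.0, gamma=2.0, T=1.0, x0=1.0, n_steps=100, n_paths=2000)
```

Since γ = 2 makes the CRRA utility negative-valued, the estimate is a finite negative number; the sketch is only meant to show how the discounting structure enters the functional.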
With regard to possible future developments of the approach discussed in this paper, we highlight the following.
Introduce a constraint into the problem and seek at least necessary conditions for existence. Such a constraint could be defined on an expected value (which may or may not derive from a recursive utility), or it could also be an infinite-dimensional constraint such as (see, e.g., Yong (1999), El Karoui et al. (2001) and Zhuo (2018)). We emphasize in this regard that such constrained problems are still quite far from being fully developed: indeed, in the existing literature, stochastic control problems have been studied under the influence of a constraint but, generally, not with recursive utilities. We refer to Pirvu (2007) as one of the first notable works in portfolio choice theory with constant-relative-risk-aversion (CRRA) type preferences, for a convex and compact constraint defined through a (pseudo) risk measure, such as value at risk (VaR), on the wealth process at a future time instant "very close" to the present. There, the market coefficients are random but independent of the Brownian motion driving the stocks. For a generalization, see Moreno-Bromberg et al. (2013) (CRRA preferences) and also Hu et al. (2005) and Cheridito and Hu (2011) (martingale methods), among others.
In the search for a concrete equilibrium policy Π, in a practical situation such as that discussed in Section 6, the Hamilton-Jacobi-Bellman equation associated with the problem could be set up with an appropriate ansatz for the value function (see, e.g., Ekeland and Pirvu (2008) and Ekeland et al. (2012)).
Also with regard to practical applications, other portfolio management problems should be explored, with different choices of recursive utility and constraint.
Finally, extensions to the infinite-horizon case, or to random horizons τ(•) (stopping times), should be investigated as well.