THE SELF-COORDINATION MEAN-VARIANCE STRATEGY IN CONTINUOUS TIME

Abstract. The dynamic mean-variance portfolio selection problem is time inconsistent. In the literature, scholars have derived the pre-committed strategy, the time consistent strategy and the self-coordination strategy. The pre-committed strategy concerns only the global investment interest of the investor, and the time consistent strategy concerns only the local investment interests, while the self-coordination strategy balances the global and local investment interests of the investor. However, the self-coordination strategy has so far been studied only in the discrete time mean-variance setting. In this paper we study the self-coordination strategy in the continuous time mean-variance setting. With the help of a mean-field reformulation, we derive the analytical self-coordination mean-variance strategy and show that the pre-committed strategy and the time consistent strategy are special cases of the self-coordination strategy.


Introduction
In a dynamic mean-variance framework, the investor decides the global optimal strategy according to the following problem,
$$ (P_0) \qquad \max_{u(\cdot)} \; \mathrm{E}\big[X^u(T)\big] - \gamma\, \mathrm{Var}\big(X^u(T)\big), $$
where $\gamma > 0$ is the risk aversion parameter of the investor. The global optimal strategy of $(P_0)$, which is called the pre-committed strategy in this paper$^1$, can guide the investor to the best performance at time $T$ based on the information and wealth at time 0. When coming to a future time instant $t\,(>0)$, the investor may reconsider the portfolio selection problem based on the information and wealth at time $t$, and would like to choose the strategy that is optimal over $[t, T]$, which in general differs from the pre-committed strategy restricted to $[t, T]$.

We consider a financial market with two assets traded continuously. One asset is riskless, with price process $S_0(t)$ governed by $\mathrm{d}S_0(t) = r S_0(t)\,\mathrm{d}t$, where $r > 0$ is the interest rate. The other asset is a risky stock, whose price process $S_1(t)$ satisfies the following stochastic differential equation (SDE):
$$ \mathrm{d}S_1(t) = S_1(t)\big[\mu\,\mathrm{d}t + \sigma\,\mathrm{d}W(t)\big], $$
where $\mu$ is the appreciation rate, $\sigma$ is the volatility or dispersion rate of the stock and $W(t)$ is a standard Brownian motion defined on $(\Omega, \mathcal{F}_T, \{\mathcal{F}_t\}, P)$. We assume in this paper that all the given market parameters $r$, $\mu$ and $\sigma$ are deterministic constants. We now consider an investor of mean-variance type with an initial endowment $x_0 > 0$ and an investment horizon $[0, T]$. Assume that the trading is self-financed and takes place continuously, and that there are no transaction costs and no consumption during the investment process. The wealth process $X^u(t)$ then satisfies
$$ \mathrm{d}X^u(t) = \big[r X^u(t) + (\mu - r)\,u(t)\big]\mathrm{d}t + \sigma\, u(t)\,\mathrm{d}W(t), \qquad X^u(0) = x_0, \qquad (1) $$
where the portfolio $u(t)$ denotes the total market value of wealth held in the stock. We define the infinitesimal generator $\mathcal{A}^u$ as follows,
$$ \mathcal{A}^u f(t, x) = f_t(t, x) + \big[r x + (\mu - r)\, u\big] f_x(t, x) + \tfrac{1}{2}\,\sigma^2 u^2 f_{xx}(t, x), $$
where $f_t(t, x)$, $f_x(t, x)$ and $f_{xx}(t, x)$ denote the partial derivatives of $f(t, x)$ with respect to $t$ and $x$, and the second order partial derivative of $f$ with respect to $x$, respectively. At time 0, the investor seeks the pre-committed optimal strategy such that the expected value of the terminal wealth is maximized and the variance of the terminal wealth is minimized,
$$ (P_0(\gamma)) \qquad \max_{u(\cdot)} \; \mathrm{E}_{0,x_0}\big[X^u(T)\big] - \gamma\,\mathrm{Var}_{0,x_0}\big(X^u(T)\big), \qquad \text{s.t. } (X^u(\cdot), u(\cdot)) \text{ satisfies (1)}, $$
where $u(\cdot)$ is the decision variable and $\gamma > 0$ is the risk aversion parameter, which represents the trade-off between the two conflicting objectives (the larger the $\gamma$, the larger the risk aversion). Varying the parameter $\gamma$ in $(0, +\infty)$ in problem formulation $(P_0(\gamma))$ traces out the efficient frontier in the mean-variance space.
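To make the wealth dynamics (1) concrete, here is a minimal Monte Carlo sketch (our own illustration, not part of the paper's derivation) that simulates $X^u(t)$ under a generic linear feedback strategy $u(t, x) = k(t)x + l(t)$ via the Euler–Maruyama scheme; the particular coefficients $k$ and $l$ below are arbitrary placeholders:

```python
import numpy as np

def simulate_wealth(k, l, x0=1.0, r=0.05, mu=0.12, sigma=0.10,
                    T=1.0, n_steps=252, n_paths=100_000, seed=0):
    """Euler-Maruyama simulation of the wealth SDE (1):
    dX = [r*X + (mu - r)*u] dt + sigma*u dW, with u(t, x) = k(t)*x + l(t)."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0)
    for i in range(n_steps):
        t = i * dt
        u = k(t) * x + l(t)  # market value of wealth held in the stock
        dw = rng.normal(0.0, np.sqrt(dt), n_paths)
        x = x + (r * x + (mu - r) * u) * dt + sigma * u * dw
    return x

# Placeholder coefficients: a wealth-independent strategy (k = 0); the
# numbers are illustrative only.
xT = simulate_wealth(k=lambda t: 0.0, l=lambda t: 0.7)
print("mean:", xT.mean(), "variance:", xT.var())
```

Sample moments of the simulated terminal wealth can then be compared against the closed-form frontiers stated below.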
The pre-committed strategy of $(P_0(\gamma))$ can be found in [17]: with $\rho = \frac{(\mu - r)^2}{\sigma^2}$,
$$ u^{\mathrm{pre}}(t, x) = \frac{\mu - r}{\sigma^2}\left[\left(x_0 e^{rT} + \frac{e^{\rho T}}{2\gamma}\right) e^{-r(T - t)} - x\right]. $$
Moreover, the corresponding efficient frontier is expressed as
$$ \mathrm{E}_{0,x_0}\big[X(T)\big] = x_0 e^{rT} + \sqrt{e^{\rho T} - 1}\,\sqrt{\mathrm{Var}_{0,x_0}\big(X(T)\big)}. $$
Next, we provide the definition of the time consistent strategy and derive the time consistent strategy for $(P_0(\gamma))$.
Following [2] and [4], we can derive the time consistent strategy as
$$ u^{\mathrm{tc}}(t, x) = \frac{\mu - r}{2\gamma\sigma^2}\, e^{-r(T - t)}, $$
with the corresponding efficient frontier expressed as
$$ \mathrm{E}_{0,x_0}\big[X(T)\big] = x_0 e^{rT} + \sqrt{\rho T}\,\sqrt{\mathrm{Var}_{0,x_0}\big(X(T)\big)}. $$
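As a sanity check on the frontier just stated, the slope $\sqrt{\rho T}$ can be verified directly (a standard computation, in our notation): since $u^{\mathrm{tc}}(t) = \frac{\mu - r}{2\gamma\sigma^2} e^{-r(T-t)}$ is deterministic,
$$ \mathrm{E}_{0,x_0}\big[X(T)\big] = x_0 e^{rT} + \int_0^T e^{r(T-s)} (\mu - r)\, u^{\mathrm{tc}}(s)\, \mathrm{d}s = x_0 e^{rT} + \frac{\rho T}{2\gamma}, $$
$$ \mathrm{Var}_{0,x_0}\big(X(T)\big) = \int_0^T e^{2r(T-s)}\, \sigma^2 \big(u^{\mathrm{tc}}(s)\big)^2\, \mathrm{d}s = \frac{\rho T}{4\gamma^2}, $$
so that $\mathrm{E}_{0,x_0}[X(T)] = x_0 e^{rT} + \sqrt{\rho T}\,\sqrt{\mathrm{Var}_{0,x_0}(X(T))}$, and eliminating $\gamma$ traces out the stated frontier.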

Self-coordination mean-variance strategy
Before we derive the self-coordination strategy in the continuous time mean-variance setting, we first extend the two-tier planner-doer game framework of [8] to the continuous-time setting. The two-tier planner-doer game framework in discrete time can be illustrated by Figure 1.
The red person at the top is the planner, who can induce the blue doers to make decisions by announcing a planned strategy and committing that deviation from the planned strategy will incur a punishment. The blue doers at the bottom make decisions by considering both the planner's announced strategy and the future doers' strategies. Thus, there is a sequential game among the doers and a principal-agent game between the planner and the doers.
Consider a controlled Markovian process $X^u$ in continuous time on the interval $[0, T]$, defined on a filtered probability space $(\Omega, \mathcal{F}_T, \{\mathcal{F}_t\}, P)$. At time $t$, after observing $X^u_t = x$, the decision maker wants to decide the $[t, T]$ optimal strategy by maximizing a preference functional over the time horizon $[t, T]$, where $u(s, X^u_s)$ is the strategy at time $s$ and $F$ is a deterministic function that describes the objective of the decision maker at time $t$. In general, these preferences at different time instants do not satisfy Bellman's principle of optimality, which makes the decision problem under consideration time inconsistent.
To balance the global and local interests of the decision maker, a punishment scheme is proposed. More specifically, the planner, who represents the global interest of the decision maker, decides a planned strategy denoted by $\{v(t, X^u_t)\}_{t \ge 0}$, announces it to the doers, and commits that any deviation by the doer at time $t$ from the planned strategy leads to a punishment, where $m_t(s) > 0$ is the weighting function of the punishment, which depends on the applied strategy after time $t$ but does not depend on the strategy at time $t$. Apparently, the punishment is always non-negative, and when the strategies $u$ and $v$ are identical the punishment is zero, which implies that a doer who follows the planned strategy incurs no punishment. Thus, the doer at time $t$ faces the following optimization problem, $(P_{\mathrm{doer}}(t))$. As the optimization problems of the doers with punishment are still time inconsistent, the doers try to find the subgame perfect Nash equilibrium strategy in the sense of Definition 2.1, denoted $\{\hat u(t, X^u_t)\}_{t \ge 0}$. At the same time, the total amount of punishment applied by the planner will in turn penalize the planner himself. Thus, the planner faces the following optimization problem, $(P_{\mathrm{planner}}(\theta))$, where $\theta$ is the trade-off parameter between the original global objective and the total expected punishment, and represents the self-coordination willpower of the decision maker. The larger the $\theta$, the stronger the planner's dislike of the deviation of the doers' subgame perfect Nash equilibrium strategy from the planned strategy.
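For concreteness, a deviation punishment consistent with the verbal description above (non-negative, vanishing when $u \equiv v$, with weighting $m_t(s)$ that does not involve the time-$t$ strategy) can be sketched as
$$ D(t, x; u, v) \;=\; \mathrm{E}_{t,x}\!\left[\int_t^T m_t(s)\,\big(u(s, X^u(s)) - v(s, X^u(s))\big)^2\,\mathrm{d}s\right] \;\ge\; 0. $$
This is our schematic rendering, not necessarily the paper's exact functional; it equals zero precisely when the doers follow the planned strategy on $[t, T]$.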
In general, this constructed planner-doer game can be solved by the following procedure:
1. obtain the subgame perfect Nash equilibrium strategy of $(P_{\mathrm{doer}}(t))$, $\{\hat u(t, X^u_t)\}_{t \ge 0}$, for any given planned strategy $\{v(t, X^u_t)\}_{t \ge 0}$;
2. knowing the doers' responsive subgame perfect Nash equilibrium strategy, solve $(P_{\mathrm{planner}}(\theta))$ to obtain the best planned strategy $\{v^*(t, X^u_t, \theta)\}_{t \ge 0}$;
3. substitute the best planned strategy $\{v^*(t, X^u_t, \theta)\}_{t \ge 0}$ into $\{\hat u(t, X^u_t)\}_{t \ge 0}$ to obtain the applied strategy of the doers, $\{\hat u^*(t, X^u_t, \theta)\}_{t \ge 0}$.
Under equilibrium, the applied strategy of the doers is called the self-coordination strategy, denoted $\{u^{\mathrm{Coor}}(t, X^u_t, \theta)\}_{t \ge 0}$. In the following theorem, we obtain some useful results for the self-coordination strategy without solving the planner-doer game directly; to this end, we first state an important assumption on the punishment scheme. Next, we derive the self-coordination strategy in the continuous time mean-variance setting. As shown in Section 2, both the pre-committed strategy and the time consistent strategy take linear forms. Thus, we confine ourselves to linear self-coordination strategies.
Denote the doers' carried out strategy as $u(t, x) = k_u(t)\,x + l_u(t)$ and the planner's planned strategy as $v(t, x) = k_v(t)\,x + l_v(t)$. We propose the following particular form of punishment, where $k(t)$ and $l(t)$ satisfy the following ODEs, and $k'(t)$ and $l'(t)$ denote the first order derivatives of $k(t)$ and $l(t)$ with respect to $t$. It is easy to show that the weighting term is strictly positive for all $t$, so the punishment is non-negative. Furthermore, based on the ODEs, we find that $[k(t)]^2 - k'(t)$ does not depend on the current value of the coefficient at time $t$ but does depend on the coefficient values after time $t$. Under the assumption of linear strategies, the proposed punishment introduces a running cost into the doers' and the planner's problems. The running cost takes a quadratic form in the wealth state, which makes it possible to obtain the explicit self-coordination strategy. If both the planner's planned strategy and the doers' carried out strategy were general functions of wealth, an explicit self-coordination strategy might not be obtainable.
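To see why linearity keeps the punishment tractable, note that under linear strategies the squared deviation is quadratic in the wealth state. A minimal illustration, assuming a quadratic deviation penalty with a weighting $m(t)$ (our notation):
$$ m(t)\,\big(u(t, x) - v(t, x)\big)^2 \;=\; m(t)\,\big[(k_u(t) - k_v(t))\,x + (l_u(t) - l_v(t))\big]^2, $$
which is a quadratic polynomial in $x$, so the doers' and the planner's problems remain within the linear-quadratic class.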
The following proposition reveals that the proposed punishment scheme satisfies Assumption 3.1.
Proof. Under the proposed form of punishment in (4), the doers face the following optimization problem, where the wealth dynamics are characterized by (1). Next, we derive the subgame perfect Nash equilibrium strategy $\{\hat k(t)\, X^u(t) + \hat l(t)\}_{t \ge 0}$. Let us define the equilibrium value function as follows. Applying the analysis method of [3], the extended HJB equation is given as follows. Then, we have the subgame perfect Nash equilibrium strategy as displayed, where the parameters satisfy the following ODEs and the primed quantities denote the corresponding first order derivatives. As equation (6) has constructed a one-to-one mapping between $\hat u(t, X^u_t)$ and $v(t, X^u_t)$, we can simply choose the planned strategy accordingly. Remark 3.4. Proposition 3.3 shows that the subgame perfect Nash equilibrium strategy of the doers, $\hat u(t, X^u(t))$, depends on the coefficients $\hat k(t)$ and $\hat l(t)$, which in turn depend on $\{\hat u(s, X^u(s))\}_{s \ge t}$. Therefore, the subgame perfect Nash equilibrium strategy of the doers is not fully explicit, and the system should be solved numerically by a backward scheme. Once the future strategy after time $t$, $\{\hat u(s, X^u(s))\}_{s \ge t + \Delta t}$, has been obtained, we can compute $\hat k(t)$ and $\hat l(t)$ from the ODEs, and then determine the current strategy $\hat u(t, X^u(t))$.
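For orientation, we record the extended HJB system in the style of [3] for the plain mean-variance objective, without the punishment running cost carried by the version used in the proof. With the auxiliary function $g(t, x) = \mathrm{E}_{t,x}[X^{\hat u}(T)]$ and the equilibrium value function $V$, the system reads
$$ \sup_{u}\Big\{ V_t + \big[r x + (\mu - r)\, u\big] V_x + \tfrac{1}{2}\sigma^2 u^2 V_{xx} - \gamma\, \sigma^2 u^2 g_x^2 \Big\} = 0, \qquad \mathcal{A}^{\hat u} g(t, x) = 0, $$
with terminal conditions $V(T, x) = x$ and $g(T, x) = x$. The term $-\gamma \sigma^2 u^2 g_x^2$ is the variance correction that breaks standard dynamic programming; one checks directly that the system is solved by $V$ and $g$ linear in $x$, yielding the time consistent strategy of Section 2.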
Substituting the planned strategy $v(t, x) = [2\hat k(t) - k(t)]\,x + 2\hat l(t) - l(t)$ back into the punishment in (4), the punishment reduces to an explicit expression. This term measures the punishment incurred under punishment scheme (4), which induces the doers to adopt $\{k(t)\, X^u(t) + l(t)\}_{t \ge 0}$. Based on Theorem 3.2, the self-coordination mean-variance strategy can be obtained by solving the following problem parameterized in $\theta$, $(P(\gamma, \theta))$, where $k(t)$ and $l(t)$ satisfy the following ODEs. Before deriving the pre-committed strategy of $(P(\gamma, \theta))$, we discuss the punishment scheme in (4) in more detail. Based on the proof of Proposition 3.3, it is easy to check that any punishment with a positive weighting constructs a one-to-one mapping between $\hat u(t, X^u(t))$ and $v(t, X^u(t))$. Why, then, do we choose the particular form built from $[k(t)]^2 - k'(t)$? Because this punishment has an important economic meaning.
Let us first introduce the concept of instantaneous utility loss at time $t$ for a given policy $\{u(s, X^u(s))\}_{s \ge t}$. For a fixed real number $h > 0$, consider the following dynamic mean-variance problem, where the strategy $\pi_h$ is chosen on $[t, t + h]$ and coincides with $u$ afterwards. We denote the subgame perfect Nash equilibrium strategy of the problem from $t$ to $t + h$ as $\{\hat\pi(s, X^{\hat\pi_h}(s))\}_{t \le s \le t + h}$, and denote the objectives of the optimization problem under $\hat\pi_h$ and $u$ as $J^{\hat\pi_h}(t, x)$ and $J^{u}(t, x)$, respectively. The instantaneous utility loss represents the objective reduction from following the specified strategy $\{u(s, X^u(s))\}_{s \ge t}$ rather than a locally optimal strategy in an instantaneous time slot, which is quite similar to the concept of opportunity cost. Proposition 3.6. Under the punishment term in (7), the punishment is just the instantaneous utility loss for the strategy $\{k(s)\, X^u(s) + l(s)\}_{s \ge t}$.
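In symbols, and with the opportunity-cost reading above, the instantaneous utility loss can be sketched as the normalized objective gap
$$ \mathrm{IUL}(t, x; u) \;=\; \lim_{h \downarrow 0} \frac{J^{\hat\pi_h}(t, x) - J^{u}(t, x)}{h}, $$
where $\hat\pi_h$ acts locally on $[t, t + h]$ and follows $u$ thereafter. This rendering is ours and is meant only to fix ideas; the paper's formal definition is the one referenced in Proposition 3.6.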
Therefore, we can now transform the problem $(P(\gamma, \theta))$ into the following mean-field reformulation $(MF(\gamma, \theta))$. The mean-field reformulation is a separable two-dimensional linear quadratic control problem, which can be readily solved by dynamic programming.
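A minimal sketch of the mean-field device (our notation): augment the state with the mean $m(t) = \mathrm{E}_{0,x_0}[X^u(t)]$. Taking expectations in (1) gives
$$ \mathrm{d}m(t) = \big[r\, m(t) + (\mu - r)\,\mathrm{E}_{0,x_0}[u(t)]\big]\,\mathrm{d}t, \qquad \mathrm{Var}_{0,x_0}\big(X^u(T)\big) = \mathrm{E}_{0,x_0}\Big[\big(X^u(T) - m(T)\big)^2\Big], $$
so the variance becomes a terminal expectation of a squared deviation, and the pair $\big(X^u(t) - m(t),\, m(t)\big)$ evolves linearly while the objective is quadratic, which is the separable linear quadratic structure exploited here.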
Theorem 3.7. The self-coordination mean-variance strategy which solves $(P(\gamma, \theta))$, $\{u^{\mathrm{Coor}}(t, \theta)\}_{t \ge 0}$, is given as follows, where the parameters $\beta^{\mathrm{Coor}}(t)$ and $\gamma^{\mathrm{Coor}}(t)$ are defined below, the other parameters are governed by the following ODEs, and the primed quantities denote the corresponding first order derivatives. Finally, the efficient frontier is expressed as displayed.
Proof. Before solving the mean-field reformulation, we need to compute the value of $\mathrm{E}_{0,x_0}[\hat u(t)]$. The parameter in ODE (5) can be written in the following form, involving two new parameters whose dynamics follow by inspection; it is easy to check that their sum vanishes for all $t$, which further pins down $\mathrm{E}_{0,x_0}[\hat u(t)]$. Now, we solve the problem $(P(\gamma, \theta))$. To simplify the derivation, we denote the portfolio strategy and the state in centered form, as displayed. As $\mathrm{E}_{0,x_0}[Y_1(t)] = 0$, we have $\mathrm{E}_{0,x_0}[c(t)\, Y_1(t)] = 0$ for any deterministic function $c(t)$. Thus, the following auxiliary optimization problem has the same optimal strategy as problem $(P(\gamma, \theta))$.
We conjecture the solution in the following quadratic form, with terminal conditions $A(T) = 1$ and $B(T) = 0$. It is easy to see that the optimal strategy is then given as displayed. According to the dynamics of $\mathrm{E}_{0,x_0}[X^u(t)]$ specified in (8), applying Itô's lemma to (9) and taking expectations, we obtain the corresponding ODEs; substituting the equilibrium strategy in (13) completes the proof. Remark 3.8. Similar to Proposition 3.3, the self-coordination strategy is not given explicitly: it depends on a circularly coupled system of $u^{\mathrm{Coor}}$, $\beta^{\mathrm{Coor}}$ and the remaining parameters, and can be obtained by a backward numerical method.
Remark 3.9. The solution schemes in Proposition 3.6 and Theorem 3.7 remain applicable for punishment terms in which the weighting is a deterministic function of time $t$. Furthermore, when the market parameters $r$, $\mu$ and $\sigma$ are deterministic functions of time, our results still hold.
Remark 3.10. Consider a linear combination of the pre-committed strategy and the time consistent strategy,
$$ u^{\mathrm{comb}}(t, x) = w_1\, u^{\mathrm{pre}}(t, x) + w_2\, u^{\mathrm{tc}}(t, x), $$
where $w_1$ and $w_2$ are the weights on the pre-committed strategy and the time consistent strategy, respectively. The self-coordination strategy can be rewritten in a comparable form. Comparing the two strategies, we can see that they are quite different: even with time-dependent weights, the combination strategy cannot generate the self-coordination strategy.
Theorem 3.7 has shown that the self-coordination mean-variance strategy in the continuous-time setting also takes a linear feedback form, and that the corresponding efficient frontier is also a straight line in the mean-standard deviation plane. Although the calculation of the parameters in the self-coordination mean-variance strategy seems complicated, they do not depend on the risk attitude parameter $\gamma$ and can easily be computed off-line by, for example, the Euler method.
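As an illustration of such an off-line computation, the following sketch integrates a generic coupled ODE system with terminal conditions backward in time by the Euler method. The right-hand side `rhs` below is a hypothetical placeholder, not the paper's actual ODE system for the strategy coefficients; only the backward Euler scheme itself is the point.

```python
import numpy as np

def euler_backward(rhs, terminal, T, n_steps):
    """Integrate y'(t) = rhs(t, y) backward from y(T) = terminal to t = 0.

    Returns the time grid and the coefficient paths, so that strategy
    coefficients such as k(t), l(t) can be tabulated off-line.
    """
    dt = T / n_steps
    ts = np.linspace(0.0, T, n_steps + 1)
    y = np.zeros((n_steps + 1, len(terminal)))
    y[-1] = np.asarray(terminal, dtype=float)
    for i in range(n_steps - 1, -1, -1):
        # One explicit Euler step backward: y(t) ~ y(t+dt) - dt * y'(t+dt).
        y[i] = y[i + 1] - dt * rhs(ts[i + 1], y[i + 1])
    return ts, y

# Hypothetical two-dimensional system standing in for the paper's ODEs.
r, mu, sigma = 0.05, 0.12, 0.10
rho = (mu - r) ** 2 / sigma ** 2

def rhs(t, y):
    k, l = y
    return np.array([rho * k - k ** 2, (r - k) * l])  # placeholder dynamics

ts, y = euler_backward(rhs, terminal=[1.0, 0.0], T=1.0, n_steps=1000)
print("k(0) =", y[0, 0], " l(0) =", y[0, 1])
```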
The self-coordination strategy is the pre-committed optimal strategy of problem $(P(\gamma, \theta))$. It is easy to check that the pre-committed strategy and the time consistent strategy are special cases of the self-coordination strategy with $\theta = 0$ and $\theta = +\infty$, respectively. Although the pre-committed strategy and the time consistent strategy are derived by very different approaches, they can be unified in the self-coordination framework. The objective of problem $(P(\gamma, \theta))$ contains two terms: the original mean-variance objective and the punishment term. On the one hand, when $\theta = 0$ the punishment term vanishes, problem $(P(\gamma, \theta))$ reduces to the original $(P_0(\gamma))$, and the self-coordination strategy is just the pre-committed strategy of $(P_0(\gamma))$; the self-coordination strategy includes the pre-committed strategy because $(P_0(\gamma))$ is a special case of $(P(\gamma, \theta))$. On the other hand, when $\theta = +\infty$ the original mean-variance objective vanishes, and the self-coordination strategy of $(P(\gamma, \theta))$ is simply $k(t)\,x + l(t)$, which carries zero punishment. It is easy to check that $k(t) = 0$ and $l(t) = \frac{\mu - r}{2\gamma\sigma^2}\, e^{-r(T - t)}$, which implies that the self-coordination strategy is just the time consistent strategy; the self-coordination strategy includes the time consistent strategy because the punishment term forces the investor not to deviate from the time consistent strategy.
Next, we use an example to illustrate the influence of the trade-off parameter $\theta$ on the investment performance.
Example 3.11. We set the market parameters as $r = 0.05$, $\mu = 0.12$, $\sigma = 0.10$ and $T = 1$. Figure 2a shows the pre-committed strategy, two self-coordination strategies and the time consistent strategy as functions of the wealth level at time 0.3, $x_{0.3}$. These strategies share the same risk aversion $\gamma = 5$. We can see that the pre-committed strategy and the self-coordination strategies have negative slopes, while the time consistent strategy is a constant. Compared with the self-coordination strategy with $\theta = 10$, the self-coordination strategy with $\theta = 2$ has a steeper negative slope, which implies that it reacts more strongly as the wealth level changes. Furthermore, the self-coordination strategy with $\theta = 2$ is also much closer to the pre-committed strategy. The efficient frontiers achieved by these strategies are reported in Figure 2b. We can see that the efficient frontiers achieved by the self-coordination strategies lie between the two achieved by the pre-committed strategy and the time consistent strategy.
For different values of $\theta$, we also compute the corresponding Sharpe ratios, which are depicted in Figure 3. It is evident that the smaller the $\theta$, the larger the Sharpe ratio achieved by the self-coordination strategy. Note that the largest and smallest Sharpe ratios are achieved by the pre-committed strategy and the time consistent strategy, respectively.
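Since both closed-form frontiers are straight lines through $(0, x_0 e^{rT})$ in the mean-standard deviation plane, their slopes serve as Sharpe-type ratios and bound the self-coordination values from above and below. A quick check with the example's parameters, using the slopes $\sqrt{e^{\rho T} - 1}$ and $\sqrt{\rho T}$ from Section 2 (the self-coordination slope itself has no closed form and would come from the backward scheme):

```python
import math

r, mu, sigma, T = 0.05, 0.12, 0.10, 1.0
rho = (mu - r) ** 2 / sigma ** 2  # rho = 0.49

# Slopes of the efficient frontiers in the mean-standard deviation plane.
sharpe_pre = math.sqrt(math.exp(rho * T) - 1.0)  # pre-committed strategy
sharpe_tc = math.sqrt(rho * T)                   # time consistent strategy

print(f"pre-committed:   {sharpe_pre:.4f}")  # ~0.795
print(f"time consistent: {sharpe_tc:.4f}")   # ~0.700
```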
Figure 4 reveals the following interesting findings for the total punishment: 1) the stronger the willpower to conduct self-coordination (the smaller the value of $\theta$), the better the long-term investment performance and the larger the total punishment; that is, to achieve a better mean-variance efficient frontier of the terminal wealth, the investor needs to bear a larger punishment; 2) the less risk averse the investor (the smaller the value of $\gamma$), the larger the intention of holding risky assets, and the harder it is to conduct self-coordination; 3) when $\gamma = +\infty$, the total punishment is zero; that is, when the investor is fully risk averse and invests only in the riskless asset, there is no need to conduct self-coordination.

Conclusion
We have extended the two-tier planner-doer game framework of [8] to the continuous-time setting and applied the framework to analyze the continuous time mean-variance portfolio selection problem. The two-tier planner-doer game framework adds a punishment term to the objective, which makes the portfolio selection problem quite challenging. We have transformed the problem into its equivalent mean-field reformulation and derived the analytical self-coordination strategy. Our work provides a different way to resolve the time inconsistency of the continuous time mean-variance portfolio selection problem.
While we have demonstrated that the continuous-time two-tier planner-doer game framework exhibits promising theoretical properties, our study stops short of an in-depth exploration of how to select a punishment term that satisfies Assumption 3.1. The punishment term examined in this paper has sound economic meaning but introduces computational complexity when solving the associated problem.
Therefore, one avenue for future research is to identify a punishment term that retains economic relevance while lending itself to straightforward mathematical handling. Another is to extend the framework to other time-inconsistent portfolio selection problems in the continuous-time context, such as the mean-conditional value-at-risk problem. These directions hold the potential to enhance the practicality and effectiveness of the proposed framework.