TIME CONSISTENT POLICY OF MULTI-PERIOD MEAN-VARIANCE PROBLEM IN STOCHASTIC MARKETS

Abstract. Due to the non-separability of the variance operator, the optimal investment policy of the multi-period mean-variance model in Markovian markets is not time consistent. We propose a new notion of weak time consistency in stochastic markets and show that the pre-commitment optimal policy satisfies it at any intermediate period as long as the investor's wealth does not exceed a specific threshold. When the investor's wealth exceeds the threshold, weak time consistency no longer holds. In this case, by modifying the pre-commitment optimal policy, we derive a wealth interval from which we determine a more efficient revised policy. The terminal wealth obtained under this revised policy has the same mean as, and a variance no greater than, that of the terminal wealth obtained under the pre-commitment optimal policy; a series of superior investment policies can be obtained depending on how much the investor wants the conditional variance to decrease. We show that a positive cash flow can be taken out of the market during this revising process. Finally, an empirical example illustrates our theoretical results. Our results generalize existing conclusions for the multi-period mean-variance model in deterministic markets.

1. Introduction. Following the seminal work of Markowitz on the single-period mean-variance (MV) model, many models for the portfolio selection problem have been established under the return-risk framework. Usually, whenever a new risk measure is proposed, a corresponding portfolio optimization model can be constructed; see, for example, [3], [14], [15], [25], [30] and the references therein. A static model cannot accommodate an investor who has a particular requirement at a specific time point in the future. Therefore, the single-period MV model is naturally extended to the multi-period case. To overcome the non-separability of variance in the sense of dynamic programming, Li and Ng [23] solve the multi-period MV model by embedding it into a separable parametric auxiliary model and derive the analytical optimal policy. Zhou and Li [35] solve the continuous-time MV problem by adopting the same scheme. The continuous-time MV model with no-short-selling constraints is studied in [24]. As an indispensable ingredient of risk control, bankruptcy constraints are considered in [36], where a generalized discrete-time MV formulation is proposed and an analytical optimal investment policy is derived.
In the above dynamic MV models, the asset returns at different periods are assumed to be independent, which is not realistic. The dependence among the stage-wise random returns should be considered. Generally, we can assume that the distributional parameters of the returns of risky assets are governed by a stochastic process; such a setting is called a stochastic market. This idea has aroused great interest in recent years. For the multi-period MV model in stochastic markets, Çakmak and Özekici [8] assume that the stochastic return process is driven by a Markov chain. By solving an auxiliary problem, they derive the analytical optimal investment policy. Wei and Ye [33] further incorporate the bankruptcy constraint in the Markovian stochastic market and obtain the optimal investment policy by a similar method. The same technique is used to solve different forms of multi-period investment problems in [9] and [10].
It is now accepted that time consistency should be a required condition for multi-period portfolio selection problems (see, for example, [20], [28] and [31]). In general, we can discuss time consistency from two different aspects: the multi-period risk measure and the optimal investment policy. The former aims at characterizing the relationship among risk measures at individual stages, and its main idea can be described as follows: for any two investment positions X and Y, if X is riskier than Y under a specific measure at any time in the future, then X is riskier than Y under the same measure at present. Inspired by this idea, different versions of time consistency have been proposed in [16], [27] and [31], and different kinds of dynamic time consistent risk measures are constructed in papers such as [1], [11] and [26].
Compared with the time consistency of dynamic risk measures, research on the time consistency of the optimal investment policy is scarce. Inspired by the "principle of optimality" of dynamic programming, Boda and Filar [6] define the time consistency of the optimal investment policy by introducing the following two requirements: a) the policy constituted by the stage-wise optimal decisions obtained recursively by the dynamic programming method is also an optimal policy for the whole problem; b) the sub-policy of an optimal policy for the whole problem is also an optimal policy for the corresponding sub-problem, which is actually Bellman's optimality principle. At present, most studies (for example, [17] and [32]) of the time consistency of the optimal investment policy refer to the second requirement. On the other hand, Shapiro [29] proposes another type of time consistency of the optimal investment policy: the optimal decision in any state should not depend on scenarios that cannot happen in the future. This notion of time consistency is similar to Bellman's optimality principle but not the same. The time consistency of the optimal investment policy does not hold even under some popular risk measures such as VaR and CVaR [6]. In particular, the optimal investment policy of the multi-period MV problem, usually called the pre-commitment optimal policy, is not time consistent because of the non-separability of the variance operator. Hence, some researchers try to find time consistent strategies by either reformulating the multi-period MV problem or revising the pre-commitment optimal policy. With a nested conditional expectation mapping, Chen et al. [11] propose a revised multi-period MV problem with time consistency and find the explicit optimal strategy. Chen et al. [12] discuss how to model the multi-period MV problem with time consistency in Markovian markets. From another point of view, Cui et al. [17] propose a weak time consistency, i.e., time consistency in efficiency, which takes return and risk into account simultaneously, and provide a unique policy by revising the pre-commitment optimal policy. The main difference between this time consistency and the one based on Bellman's optimality principle is that, under the former definition, the trade-off parameter between risk and return can change over time. Furthermore, the multi-period MV problem is solved by using an extended Hamilton-Jacobi-Bellman equation in [4], and the time consistent optimal investment policy is derived in [2] and [18]. Björk et al. [5] study a continuous-time MV portfolio selection problem under a game theoretic framework and obtain a time-consistent optimal investment policy. However, the policy obtained in this way is not an optimal solution of the original problem.
It is assumed in [17] and [23] that the asset returns at different periods are independent. As pointed out above, the random dynamics of security markets must be considered for multi-period investment problems. In view of this, we examine the time consistency problem of the multi-period MV model in stochastic markets by extending the results in papers such as [8], [17] and [23]. Our main contributions are as follows. Firstly, compared with the research in [8], we further investigate the time consistency of the multi-period MV model, because time consistency guarantees that the investor will not regret investment decisions he/she has made before. Secondly, instead of constructing a new time consistent risk measure as in [11] and [12], we revise the pre-commitment optimal policy to make it time consistent. Such a revision of optimal portfolios is directly visible to the investor and is easy to implement in real markets. Thirdly, we extend the definition of time consistency in [17] and [31] to the stochastic market and propose a notion of weak time consistency. Here, both the optimal policy and the trade-off parameter depend on the states of the stochastic market. Such extensions can characterize real financial markets more appropriately. Fourthly, when the pre-commitment optimal policy does not satisfy time consistency, we show how to revise it, by relaxing the self-financing constraint, so that the terminal wealth obtained under the revised policy has the same mean as, and a variance no larger than, that of the terminal wealth obtained under the pre-commitment optimal policy. This result improves the relevant conclusion in [17], where the terminal wealth obtained under the revised policy has the same mean and variance as the terminal wealth obtained under the pre-commitment optimal policy.
Finally, whereas the approach in [17] provides a unique revised policy, we generalize that method in the sense that a series of more efficient revised policies can be obtained according to the level of the revised wealth, which gives the investor more flexible and attractive choices in practical investment.
The rest of this paper is organized as follows. In the next section, we introduce the fundamental market framework and the multi-period MV model in Markovian markets, and derive the corresponding pre-commitment policy. In Section 3, we demonstrate that the pre-commitment policy is not always weak time consistent and show how to revise it when necessary. The theoretical results are illustrated in Section 4 through an empirical example. Finally, we present our conclusions.

2. Multi-period MV model in Markovian markets and its solution. We consider a stochastic security market consisting of one riskless asset and $n$ risky assets within a time horizon $[0, T]$. It is natural to assume that the total return of the riskless asset varies across time periods; $r^f_t$, $t = 0, 1, \ldots, T-1$, denotes its total return during the period from time $t$ to $t+1$. The total returns of the risky assets are random, and their means, variances and covariances change with the time-varying state of the market. Let $Y_t$ denote the state of the market at period $t$. We assume that $Y = \{Y_t;\ t = 0, 1, \ldots, T-1\}$ forms a Markov chain with discrete state space $E$ and transition matrix $Q$. The random returns of the risky assets change according to this stationary Markov chain. Let $R(i) = [R_1(i), R_2(i), \ldots, R_n(i)]$ be the random return vector of the $n$ risky assets in state $i \in E$ and let $r(i) = E[R(i)]$ be the corresponding mean return vector. For any state $i \in E$, the corresponding covariance matrix is denoted by $\sigma(i) = (\sigma_{k,j}(i))_{n \times n}$, where $\sigma_{k,j}(i)$ is the covariance between $R_k(i)$ and $R_j(i)$. Without loss of generality, we assume that $\sigma(i)$ is positive definite in this paper.
With the above notation, $R^e_t(i) = R(i) - r^f_t \mathbf{1}$ is the excess return vector under state $i \in E$ at period $t$, and the corresponding expected excess return vector is $r^e_t(i) = r(i) - r^f_t \mathbf{1}$, where $\mathbf{1} = [1, 1, \ldots, 1]' \in \mathbb{R}^n$. For any given state $i$ at period $t$, we define
$$V_t(i) = E\left[R^e_t(i) R^e_t(i)'\right] = \sigma(i) + r^e_t(i)\, r^e_t(i)'. \tag{1}$$
Due to the positive definiteness of $\sigma(i)$, we can easily deduce that $V_t(i)$ is also a positive definite matrix. The above stochastic market framework is similar to that of Markovian regime-switching models for security pricing, such as [7], [19] and [34].
Let $X_t$ be the wealth at the beginning of period $t$ and $X = [X_0, X_1, \ldots, X_T]$ be the investor's wealth vector over the entire investment horizon. As usual, the initial wealth $X_0 = x_0$ is deterministic. Let $u^j_t(i)$, $j = 1, \ldots, n$, and $u_t(i) = [u^1_t(i), \ldots, u^n_t(i)]'$ represent the cash amount invested in the $j$th risky asset and the corresponding investment decision vector in state $i \in E$ at the beginning of the $t$th period, respectively. Then, the wealth dynamics can be expressed as follows:
$$X_{t+1} = r^f_t X_t + R^e_t(Y_t)'\, u_t(Y_t), \quad t = 0, 1, \ldots, T-1. \tag{2}$$
Strictly speaking, the random wealth $X_t$ varies with the state $i \in E$ and should be written as $X_t(i)$; we omit $i$ hereinafter for ease of expression, as in [8] and [33]. For any random variable $X$, let $E[X \mid Y_0 = i_0, X_0 = x_0]$ and $\mathrm{Var}[X \mid Y_0 = i_0, X_0 = x_0]$ denote the conditional expectation and the conditional variance, respectively, given the initial market state $i_0$ and the initial wealth $x_0$.
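The quantities introduced in this paragraph can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes that $V_t(i)$ is the second-moment matrix of the excess returns, $\sigma(i) + r^e_t(i) r^e_t(i)'$, and that the wealth update is $X_{t+1} = r^f_t X_t + R^e_t(Y_t)' u_t(Y_t)$. The function names and all numerical inputs are hypothetical.

```python
import numpy as np

def excess_moments(r, sigma, rf):
    """Mean excess return r^e and matrix V for one state and one period."""
    r_e = np.asarray(r, dtype=float) - rf                     # r^e = r(i) - r^f * 1
    V = np.asarray(sigma, dtype=float) + np.outer(r_e, r_e)   # assumed form of V_t(i)
    return r_e, V

def wealth_step(x, u, R, rf):
    """One step of the assumed wealth dynamics: riskless growth plus excess return."""
    R_e = np.asarray(R, dtype=float) - rf
    return rf * x + R_e @ np.asarray(u, dtype=float)

# Hypothetical two-asset inputs, chosen only for illustration.
r = [1.010, 1.006]
sigma = [[4.0e-4, 1.0e-4], [1.0e-4, 2.5e-4]]
rf = 1.002

r_e, V = excess_moments(r, sigma, rf)
x_next = wealth_step(x=1.0, u=[0.3, 0.2], R=[1.02, 0.99], rf=rf)
```

Because $\sigma(i)$ is positive definite and $r^e r^{e\prime}$ is positive semidefinite, the matrix `V` returned here is positive definite, which is what the text asserts for $V_t(i)$.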
For multi-period investment problems, the investor wants to minimize the investment risk and to maximize the terminal wealth at the same time. Here, we assume that the risk is measured by the variance of the terminal wealth. Therefore, under the above stochastic market framework, the portfolio selection problem can be formulated as the following dynamic MV model: Here λ ∈ [0, +∞) denotes the trade-off between the variance and the expected value of the terminal wealth X T . We assume that the return of the riskless asset varies over time. This is different from the assumption in [8], where the return of the riskless asset varies with the market state. It is known from [8] that problem (3) is nonseparable in the sense of dynamic programming. Nevertheless, problem (3) can be embedded into a separable parametric auxiliary problem as follows, Here ω is an auxiliary parameter. By using the similar solution method as that in Theorem 3 of [8], the optimal policy of the problem (4) can be expressed as where, for i ∈ E, x t is any realizable wealth at the beginning of period t according to the wealth dynamics equation (2). To simplify the mathematical expression, we define All the similar products hereinafter will be treated in the same way.
Problem (4) is separable in the sense of dynamic programming; the optimal policy (5) thus satisfies Bellman's optimality principle.
For any t = 0, 1, · · · , T − 1 and i ∈ E, we define for given state i t at the beginning of the tth period. Then we have Lemma 2.1. For any t = 0, 1, · · · , T −1 and any given state i t ∈ E at the beginning of period t, we have a t (i t ) + δ t (i t ) = 1.
Proof. For any t ∈ {0, 1, · · · , T − 1}, we have Furthermore, we can deduce that It is obvious to see from (6) that a t (i t ) ∈ (0, 1). Lemma 2.1 then implies that Therefore, for given state i t ∈ E at the beginning of period t, we can easily derive that holds for any t = 0, 1, · · · , T − 1.
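The coefficients $a_t(i)$ used above can be computed by a backward recursion over the Markov chain. The sketch below rests on assumptions that are not fully visible in the text at this point: it takes $h_t(i) = r^e_t(i)' V_t(i)^{-1} r^e_t(i)$, $f_t(i) = 1 - h_t(i)$ and $a_t(i) = E[\prod_{n=t}^{T-1} f_n(Y_n) \mid Y_t = i]$, which agrees with the two-period expressions $a_1(i) = f_1(i)$ and $a_0(i_0) = E[f_0(i_0) f_1(Y_1) \mid Y_0 = i_0]$ quoted in Section 3. The function name and the input data are hypothetical.

```python
import numpy as np

def compute_a(r, sigma, rf, Q):
    """Backward recursion for a[t, i] under the assumptions stated in the lead-in.

    r:     (m, n) mean total returns per state
    sigma: (m, n, n) covariance matrices per state
    rf:    length-T sequence of riskless total returns
    Q:     (m, m) transition matrix
    """
    r = np.asarray(r, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    Q = np.asarray(Q, dtype=float)
    T, m = len(rf), Q.shape[0]

    f = np.empty((T, m))
    for t in range(T):
        for i in range(m):
            r_e = r[i] - rf[t]
            V = sigma[i] + np.outer(r_e, r_e)
            h = r_e @ np.linalg.solve(V, r_e)   # assumed h_t(i)
            f[t, i] = 1.0 - h                   # assumed f_t(i)

    a = np.empty((T, m))
    a[T - 1] = f[T - 1]                         # terminal condition
    for t in range(T - 2, -1, -1):
        a[t] = f[t] * (Q @ a[t + 1])            # condition on Y_t, average over Y_{t+1}
    return a, f

# Hypothetical two-state, two-asset, two-period market.
r = np.array([[1.010, 1.006], [0.998, 1.001]])
sigma = np.array([[[4.0e-4, 1.0e-4], [1.0e-4, 2.5e-4]],
                  [[9.0e-4, 2.0e-4], [2.0e-4, 4.0e-4]]])
rf = [1.002, 1.002]
Q = np.array([[0.8, 0.2], [0.3, 0.7]])

a, f = compute_a(r, sigma, rf, Q)
```

Under these assumptions every $f_t(i)$ lies in $(0, 1]$, so each $a_t(i)$ lies in $(0, 1)$ whenever the expected excess returns are nonzero, consistent with the statement $a_t(i_t) \in (0, 1)$ in the proof.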
In the following theorem, we present the corresponding optimal policy under our assumption.
Theorem 2.2. For any t = 0, 1, · · · , T − 1 and i ∈ E, the optimal policy of the problem (3) is where i 0 and x 0 are the initial market state and the initial wealth, respectively. x t is the realizable wealth at the beginning of the tth period according to the wealth dynamics equation (2).
Proof. From the optimal policy (5) of the auxiliary problem (4), we have Noticing the statistical independence between R e t (Y t ) and X t , we take the conditional expectation to both sides of (10) under Y 0 , Y 1 , · · · , Y t and obtain Since , by applying the equation (11) recursively, we can deduce that

Consequently, at time $T$, we have Furthermore, taking the conditional expectation to both sides of (12) under the initial wealth $X_0 = x_0$ and the initial state In order to derive the expression for the variance of wealth at the terminal time $T$, we first note that Moreover, by taking the conditional expectation to both sides of (14) under By now, we can determine the conditional variance of $X_T$ as follows: where $\gamma = \lambda/\omega$. Then, it is known from (13) and (15) that Moreover, because Therefore, the minimizer of $U(\cdot)$ can be easily found by setting $dU(\gamma)/d\gamma = 0$, which gives us Substituting $\lambda/\omega$ in (5) with $\gamma^*$, we derive the optimal policy of problem (3):
The continuous-time MV model with a stochastic interest rate is also solved in [22], and a solution similar to (9) is obtained. In the following, we call the optimal policy (9) of problem (3) the pre-commitment optimal policy. To simplify the notation, let $MV^{\lambda}_{k-T}(i_k, x_k)$ denote the MV problem starting from period $k$ with the state $i_k$, the wealth $x_k$ and the trade-off parameter $\lambda$. Then, we introduce the definition of time consistency of the optimal investment policy as follows: the pre-commitment optimal policy is time consistent if, for any $t = 1, 2, \ldots, T-1$, the truncated pre-commitment optimal policy $u^*_k(i)$, $k = t, t+1, \ldots, T-1$, is the optimal investment policy of problem $MV^{\lambda}_{t-T}(i_t, x_t)$. A similar definition can also be found in [17] and [21].
Unfortunately, the pre-commitment optimal policy does not satisfy this time consistency. Concretely, suppose that we know the state $Y_k = i_k$ and the wealth $X_k = x_k$ at the beginning of period $k$. Then, the corresponding optimal investment decision problem can also be formulated as follows, min Var Similarly to the solution of problem (3), we can obtain the optimal policy of problem (16) as $i \in E$, $t = k, k+1, \ldots, T-1$, where $a_k(i_k)$ is defined in (6). In general, $a_0(i_0) \neq a_k(i_k)$. This tells us that the pre-commitment optimal policy (9) does not satisfy the time consistency of the optimal investment policy.
Considering the above trouble, we generalize the model (16) by allowing the trade-off parameter λ to change over time. Specifically, for any k = 1, · · · , T − 1, the generalized model under given state i k and wealth x k can be described as: where λ k ∈ [0, +∞) is the trade-off parameter from the kth period to the T th period. This trade-off parameter can reflect the investor's different attitudes toward risk with respect to different amounts of wealth, different states and different investment horizons.
Similarly to the solution of problem (3), we can show that the optimal policy of problem (17) is i ∈ E, t = k, k + 1, · · · , T − 1.
In the next section, we will further consider the time inconsistency of the precommitment optimal policy of problem (3) and its revision.

3. Time inconsistency and its revision.
For a multi-period investment problem, the investor usually needs to determine the whole investment strategy, that is, the pre-commitment optimal policy for the entire investment horizon, at the beginning. However, as time goes by, when the investor stands at time $t$ ($t \ge 1$), the truncated pre-commitment optimal policy may no longer be optimal for the corresponding investment problem with $T - t$ periods. According to the preceding analysis, the pre-commitment optimal policy of problem (3) does not satisfy the time consistency of the optimal investment policy. We therefore propose a weak time consistency and allow the trade-off parameter to change over time. Our new definition of weak time consistency is similar in form to the time consistency in efficiency introduced in [17], but here we consider the multi-period MV model in stochastic markets. The other extension in our definition is that the optimal policy and the trade-off parameter depend not only on the value of wealth, but also on the market state. Specifically, the new time consistency is defined as follows:
Definition 3.1. (weak time consistency) In the stochastic market, we call the optimal policy weak time consistent if, for any time $t > 0$, the truncated optimal policy $u^*_k(i)$, for any state $i \in E$ and $k = t, t+1, \ldots, T-1$, is also the optimal policy of the generalized problem $MV^{\lambda_t}_{t-T}(i_t, x_t)$ with the time-varying trade-off parameter $\lambda_t \ge 0$.
Now, we consider how to choose the value of $\lambda_k$ such that the pre-commitment optimal policy satisfies the above weak time consistency. Let
$$\Gamma_0 = x_0 \prod_{n=0}^{T-1} r^f_n + \frac{\lambda}{2 a_0(i_0)} \tag{19}$$
and define the wealth threshold $\bar{x}^*_k = \Gamma_0 \big/ \prod_{n=k}^{T-1} r^f_n$ for $k = 1, \ldots, T-1$.
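The threshold just defined, and the sign of the trade-off parameter it controls, can be illustrated with a short computation. This is a sketch under stated assumptions: the general form of $\Gamma_0$ is extrapolated from the two-period expression $\Gamma_0 = r^f_0 r^f_1 x_0 + \lambda / (2 a_0(i_0))$, and the formula used for the implied $\lambda_k$ is obtained by requiring that the sub-problem starting at period $k$ has the same target $\Gamma_0$; it is not quoted from the paper. Function names and inputs are hypothetical.

```python
import math

def gamma0(x0, lam, a0, rf):
    """Assumed general form of Gamma_0 for a T-period problem."""
    return math.prod(rf) * x0 + lam / (2.0 * a0)

def threshold(G0, rf, k):
    """Wealth threshold at period k: Gamma_0 discounted by the remaining riskless returns."""
    return G0 / math.prod(rf[k:])

def implied_lambda(G0, rf, k, x_k, a_k):
    """Assumed implied trade-off: solves prod(rf[k:]) * x_k + lam_k / (2 * a_k) = Gamma_0."""
    return 2.0 * a_k * (G0 - math.prod(rf[k:]) * x_k)

# Hypothetical three-period inputs.
rf = [1.001, 1.002, 1.003]
G0 = gamma0(x0=1.0, lam=1.0, a0=0.90, rf=rf)
x_star = threshold(G0, rf, k=1)

lam_below = implied_lambda(G0, rf, k=1, x_k=x_star - 0.1, a_k=0.95)
lam_at = implied_lambda(G0, rf, k=1, x_k=x_star, a_k=0.95)
lam_above = implied_lambda(G0, rf, k=1, x_k=x_star + 0.1, a_k=0.95)
```

In this sketch the implied parameter is positive below the threshold, zero at it, and negative above it, which is the sign pattern on which the weak time consistency condition depends.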
To guarantee the weak time consistency, we should equalize the optimal policy (9) of the problem (3) and that in (18) of the generalized problem (17), which gives us, for any 1 ≤ k ≤ T − 1, from which we can derive This expression and the nonnegativity of a k (·) and r f n imply that λ k ≥ 0 when x k ≤x * k , otherwise, λ k < 0. Therefore, the pre-commitment optimal policy of problem (3) satisfies the above weak time consistency when x t ≤x * t . However, when x t >x * t , the weak time consistency no longer holds because λ t < 0. Under this situation, we can see from the model (17) that the investor would minimize the investment mean and variance at the same time, i.e., the investor becomes irrational. To overcome this problem, we should revise the pre-commitment optimal policy so that it satisfies the following two conditions.
1) the revised policy should make the corresponding trade-off parameterλ k ≥ 0; 2) the terminal wealth obtained under the revised policy achieves the same mean as, but not greater variance than those of the terminal wealth obtained under the pre-commitment optimal policy. Remark 1. the condition 1) implies that the revised policy derived from the pre-commitment optimal policy would correspond to the optimal policy of another optimal investment decision problem. The trade-off parameter corresponding to the new investment problem,λ k ≥ 0, is called the implied trade-off parameter in what follows. The essence of the condition 1) is to make the investor rational through modifying the pre-commitment policy. The condition 2) generalizes the relevant idea in [17], where the terminal wealth obtained under the revised policy have the same mean and variance as those of the terminal wealth obtained under the pre-commitment optimal policy. Now, we concretely consider how to derive the revised policy from the precommitment optimal policy. To better illustrate the essence and superiority of our revised policy, we first consider the two period problem, M V λ 0−2 (i 0 , x 0 ). When T = 2, it is known from (6) and (19) that a 0 (i 0 ) = E f 0 (i 0 )f 1 (Y 1 )|Y 0 = i 0 , a 1 (i) = f 1 (i) for any state i ∈ E at the second period, and Γ 0 = r f 0 r f 1 x 0 + λ 2a0(i0) . In addition, h 1 (i) = 1−f 1 (i) = 1−a 1 (i). From (9), we see that the pre-commitment optimal policy can be written as With (22), we can easily calculate the conditional expectation and the conditional second-order central moment of the terminal wealth for any given state i ∈ E and x 1 , concretely, Here, we choose the wealth threshold asx * 1 = Γ0 r f 1 . Since it is not necessary to modify the decision at period 0, we haveũ * 0 (i 0 ) = u * 0 (i 0 ) and x 1 = r f 0 x 0 + R e 0 (i 0 ) u * 0 (i 0 ). At period 1, when x 1 ≤x * 1 , we set the corresponding revised decisionũ * 1 (i) = u * 1 (i). 
It is thus evident that, under this case, the terminal wealth obtained under the revised policy would achieve the same mean and variance as those of the terminal wealth obtained under the pre-commitment policy. When x 1 >x * 1 , we solve the problem M Vλ 1 1−2 (i,x 1 ) with the initial wealthx 1 , here we assume that the revised wealthx 1 is strictly less than x 1 andλ 1 is the implied trade-off parameter. The optimal policy is then given by: where Γ 1 andx 1 are parameters to be determined. The terminal wealth determined byũ * 1 (i) isX 2 = r f 1x1 + R e 1 (i) ũ * 1 (i), and the corresponding conditional expectation and the conditional second-order central moment ofX 2 are and respectively. Now, at the beginning of the first period, in order to ensure that the terminal wealth obtained under the revised policy can achieve the same mean as and, at the same time, not greater variance than those of the terminal wealth obtained under the pre-commitment optimal policy, we generalize the framework in [17]. Concretely, we consider the following restrictions: We can easily obtain from (28) that Substituting Γ 1 into (29), we have Since it is assumed thatx 1 is strictly less than x 1 , we can derive from the above inequality thatx , the terminal wealth obtained under the revised policy would achieve the same mean and variance as those of the terminal wealth obtained with the pre-commitment optimal policy, which is similar to the conclusion established in [17].
In addition, according to (26) and (27), the conditional variance of the wealth X 2 under the state i andx 1 can be computed as follows, Substituting (30) into the right-hand side of the above equation, the conditional variance can be expressed as

ZHIPING CHEN, JIA LIU AND GANG LI
is the level of the revised wealth that makes the conditional variance Var[X 2 |i,x 1 ] achieve its minimum, 0. In addition, We further explain the advantage of the above result through Figure 1. It is easy to see from Figure 1 that, whenx 1 1 <x 1 < x 1 , the conditional variance Var[X 2 |Y 1 = i, X 1 =x 1 ] corresponding to the revised policy is less than that of the pre-commitment optimal policy, Var[X 2 |Y 1 = i, X 1 =x 1 ] achieves its minimum value whenx 1 =x * 1 . In addition, the closerx 1 is tox * 1 , the closer the conditional variance Var[X 2 |Y 1 = i, X 1 =x 1 ] is to 0. Therefore, compared with the method in [17] which determines a unique revised wealth value and ensures the mean and variance of the terminal wealth unchanged after the policy revision, our framework (28)-(29) is more flexible and efficient. We can provide the investor an interval of the revised wealth value, within which the mean of the terminal wealth can keep the same after revising the investment policy, but the corresponding conditional variance can be reduced. Depending on the degree the investor wants the conditional variance to reduce, he can adaptively select the level of the revised wealth.
We now consider theλ 1 for the problem M Vλ 1 1−2 (i,x 1 ) with the initial state i and the initial wealthx 1 . In this case, Γ 1 can also be expressed as Then, we haveλ Var[X 2 |i,x 1 ] Figure 1. The relationship between the conditional variance Var[X 2 |i,x 1 ] and the revised wealthx 1 .
Substituting (30) into the above equation, we obtaiñ It can then be deduced from the above equation thatλ 1 = 0, this implies that the investor is risk-neutral. When a 1 (i) In this case, the investor would minimize the mean and variance of the terminal wealth at the same time, i.e., the investor becomes irrational. Whenx 1 <x * 1 andx 1 1 <x * 1 , the investor can choose the revised 1 ,x * 1 ). In the last two cases, we have 1 h1(i) (x 1 −x 1 ) > x 1 − x * 1 and λ 1 > 0, which shows that the investor is risk averse.
Remark 2. the above analysis shows that, as long as the revised wealthx 1 is selected from the interval [x * 1 ,x * 1 ) or [x 1 1 ,x * 1 ), the investor will be rational; moreover, depending on how much the investor wants the variance to decrease, he can make a more efficient investment decision by substituting the selectedx 1 into the expression forũ * 1 (i) in (25). These results significantly improve the relevant results in [17]. Obviously, both the conditional variance Var[X 2 |Y 1 = i, X 1 =x 1 ] and the implied trade-off parameterλ 1 are monotonously decreasing functions ofx 1 whenx 1 ∈ [x * 1 ,x * 1 ) orx 1 ∈ [x 1 1 ,x * 1 ); meanwhile, they are monotonously increasing with respect to x 1 −x 1 , which is the cash that can be taken out of the market. In other words, the more cash the investor takes out of the market, the larger the risk he has to bear and the bigger the implied trade-off parameter will become.
1 ,x * 1 ), we know from the above results that Now, we prove that the terminal wealth obtained under the revised policy can achieve the same mean as, but not larger variance than those of the terminal wealth obtained under the pre-commitment optimal policy. Actually, Inspired by the above detailed analysis for the two period problem, we now present our policy revising process for the general T (T ≥ 2) period problem.
At period 0: the revised stage decision is the same as that of the pre-commitment policy, i.e.,ũ * 0 (i 0 ) = u * 0 (i 0 ); Therefore,λ t > 0. The above results mean thatλ t is always nonnegative.
Theorem 3.3. The revised policy in (33) can generate the terminal wealth which has the same mean as, but not larger variance than those of the terminal wealth obtained with the pre-commitment optimal policy (9), i.e., Proof. We prove the theorem by induction.
It is known from the above revising process that the theorem is true for T = 2. Assume that the theorem is true for T = k with k ≥ 2, we further demonstrate that it is also true for T = k + 1. Let At period 0, we haveũ * 0 (i 0 ) = u * 0 (i 0 ). Substitutingũ * 0 (i 0 ) into (34), we obtain x 1 .
Whenx 1 ≤x * 1 , the truncated pre-commitment optimal policy u * t (i), t = 1, 2, · · · , k, is also the optimal policy of the induced problem M V λ1 Therefore, the theorem holds according to the induction assumption.
Whenx 1 >x * 1 , we can obtain the following conditional expectation and conditional second-order central moment of the terminal wealth under the truncated pre-commitment optimal policy u * t (i), t = 1, 2, · · · , k, Next, we consider the problem M Vλ 1 1−(k+1) (i,x 1 ) with the initial state i, the initial wealthx 1 , and the implied trade-off parameterλ 1 . Assuming thatx 1 is strictly less thanx 1 , then we have where Γ 1 andx 1 are parameters to be determined. To this end. we impose: It can be deduced form (44) that We substitute the above Γ 1 into (45) and obtain Sincex 1 >x 1 , we havẽ Then, the corresponding conditional variance ofX k+1 can be determined as Substituting (46) into the above equation, we have . Consequently, as long as we setx 1 ∈ [x * 1 , a 1 (i)x 1 + (1 − a 1 (i))x * 1 ) whenx * 1 >x 1 1 and, otherwise,x 1 ∈ [x 1 1 , a 1 (i)x 1 + (1 − a 1 (i))x * 1 ), we can ensure that the terminal wealth obtained with the revised policy would have a conditional variance not greater than that of the terminal wealth obtained with the pre-commitment policy. So, we have Here, the equality for the conditional expectations follows directly from (44). By using the similar argument as that for (31) and (32), we can easily deduce that Combining the above results with the conclusion for the case x 1 ≤x * 1 , we see that the theorem holds for T = k + 1.
We see from (37) that, for any $t = 1, \ldots, T-1$, a strictly positive amount of cash, namely the difference between the current wealth and the revised wealth at period $t$, can be withdrawn from the market by relaxing the self-financing constraint whenever the current wealth exceeds the threshold. Therefore, the investor obtains a positive cash flow over the investment horizon. More importantly, the above two theorems show that the properties and advantages of our new revising scheme for the two-period case (especially Remarks 1 and 2) still hold for the general multi-period problem.
4. Numerical illustration. To illustrate our theoretical results, we consider in this section a stochastic market modeled by a stationary Markov chain with three states: state 1 represents the bull market, state 2 the consolidation market, and state 3 the bear market, so the state space is $E = \{1, 2, 3\}$. We use the daily return rates of the S&P 500 from 3 January 2011 to 31 December 2013 as the in-sample data to estimate the transition probabilities. Specifically, we first compute the median of all positive (negative) sample return rates; then, for each examined day, we calculate the average of the return rates in a two-week time window centered at that day. If the average is greater (less) than the median of all positive (negative) sample return rates, we assign the examined day to state 1 (3); otherwise, we assign it to state 2. One can refer to [13] for more details about the state specification method. Finally, the state transition probabilities are determined by counting the relevant historical transitions in the in-sample period, which yields the transition probability matrix $Q$.
We randomly choose three assets from the American stock market: Apple Inc. (AAPL), Wal-Mart Stores Inc. (WMT) and Exxon Mobil Corporation (XOM). These three risky assets and one riskless asset are used to test our theoretical results.
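The state specification and transition counting described above can be sketched as follows. This is an illustration rather than the procedure of [13]: the window half-width of five trading days (to approximate "two weeks centered at that day"), the truncation of the window at the sample boundaries, and the synthetic return series are all assumptions.

```python
import numpy as np

def classify_states(returns, half_window=5):
    """Assign state 1 (bull), 2 (consolidation) or 3 (bear) to each day."""
    r = np.asarray(returns, dtype=float)
    pos_med = np.median(r[r > 0])     # median of all positive returns
    neg_med = np.median(r[r < 0])     # median of all negative returns
    states = np.full(len(r), 2, dtype=int)
    for d in range(len(r)):
        lo = max(0, d - half_window)              # window truncated at the edges
        hi = min(len(r), d + half_window + 1)
        avg = r[lo:hi].mean()
        if avg > pos_med:
            states[d] = 1
        elif avg < neg_med:
            states[d] = 3
    return states

def transition_matrix(states, labels=(1, 2, 3)):
    """Estimate transition probabilities by counting observed one-step transitions."""
    idx = {s: k for k, s in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)))
    for s, s_next in zip(states[:-1], states[1:]):
        counts[idx[int(s)], idx[int(s_next)]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Synthetic returns: 30 bull days, 30 flat days, 30 bear days (illustrative only).
demo = np.concatenate([
    np.tile([0.020, 0.018], 15),
    np.tile([0.001, -0.001], 15),
    np.tile([-0.020, -0.018], 15),
])
states = classify_states(demo)
Q_hat = transition_matrix(states)
```

Rows of `Q_hat` for states that are never left in the sample are returned as zeros rather than normalized, so a real data set should be checked for unvisited states before the matrix is used.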
For each state i ∈ E, the expected return vector of risky assets, r(i), is estimated by using the historical return rates in those days with state i in the sample period. Concretely, we get r We consider a three-period investment problem under the MV framework, here it corresponds to a three days portfolio selection problem starting from 2 January, 2014. The return rates of the riskless asset at three periods are assumed to be: We assume that the initial wealth is x 0 = 1, and the initial trade-off parameter is λ = 1. The estimated initial state is i 0 = 1. With these parameter values, we can determine V 0 (1), a 0 (1) according to (1) and (8), respectively, and the results are Sincex 1 >x * 1 , we need to revise the pre-commitment decision u * 1 . At the same time, we havex 1 1 =x 1 − 2(1 − a 1 (3))(x 1 −x * 1 ) = 16.1684 andx * 1 = a 1 (3)x 1 + (1 − a 1 (3))x * 1 = 16.2459, respectively. Obviously,x 1 1 >x * 1 . So, we can select the revised wealthx 1 from the interval [x 1 1 ,x * 1 ) = [16.1684, 16.2459). According to (33), we can obtain the revised decision at the beginning of the second period. Particularly, whenx 1 = x 1 1 = 16.1684,ũ * 1 = [0.1699, 1.0841, −0.1371] . As an illustration, we show in Table 1 the corresponding values of the implied trade-off parameter, the conditional mean, the conditional variance and the cash amount taken out of the market at the beginning of the second period under different revised wealths choosing from [16.1684, 16.2459).
It is easy to see from Table 1 that the conditional mean of the terminal wealth obtained under the revised policy is equal to that of the terminal wealth obtained under the pre-commitment optimal policy, conditional on the state $Y_1 = 3$ and the wealth at period 1, and that it does not change with the revised wealth. The implied trade-off parameter and the conditional variance are monotonically decreasing in the revised wealth over the interval [16.1684, 16.2459). However, they are monotonically increasing with respect to the difference between the current wealth and the revised wealth, which is the cash that can be taken out of the market. These results are consistent with the theoretical conclusions we have obtained.
Table 1. The values of the implied trade-off parameter, the conditional mean, the conditional variance and the cash amount taken out of the market at the beginning of the second period under different revised wealths.
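The interval arithmetic of this example can be reproduced from the two endpoint formulas quoted in the text: a lower endpoint equal to the wealth minus $2(1 - a_1)$ times its excess over the threshold, and an upper endpoint equal to $a_1$ times the wealth plus $(1 - a_1)$ times the threshold. The sketch below uses only those formulas and the reported endpoints 16.1684 and 16.2459; the value `a1 = 0.9` is hypothetical and serves only for a round-trip check, and the function name is an assumption.

```python
def revised_wealth_interval(x1, x_star, a1):
    """Half-open interval [lower, upper) for the revised wealth at period 1."""
    d = (1.0 - a1) * (x1 - x_star)
    candidate = x1 - 2.0 * d                   # lower endpoint formula from the text
    upper = a1 * x1 + (1.0 - a1) * x_star      # equals x1 - d
    lower = max(candidate, x_star)             # the threshold is used when it is larger
    return lower, upper

# Endpoints reported in the numerical example.
lower_rep, upper_rep = 16.1684, 16.2459

# Since lower = x1 - 2d and upper = x1 - d, both x1 and d follow from the endpoints.
d_implied = upper_rep - lower_rep
x1_implied = 2.0 * upper_rep - lower_rep

# Round trip with a hypothetical a1 (the estimated value is not used here).
a1 = 0.9
x_star = x1_implied - d_implied / (1.0 - a1)
lower, upper = revised_wealth_interval(x1_implied, x_star, a1)
```

The reported endpoints imply a wealth of about 16.3234 at the beginning of the second period and an interval width of about 0.0775, which is also the largest amount of cash that can be withdrawn within this interval.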

5. Conclusions. By solving a separable auxiliary problem of the original multi-period MV problem in stochastic markets, we derive the pre-commitment optimal policy, which is not time consistent. To overcome this, we propose a weak time consistency in stochastic markets by considering the conditional mean and variance of the wealth simultaneously. We then demonstrate that the pre-commitment optimal policy satisfies the weak time consistency if the investor's wealth does not exceed a certain threshold. When the wealth exceeds the threshold, the investor becomes irrational and minimizes the expectation and the variance (risk) of the terminal wealth at the same time. For the latter case, by relaxing the self-financing constraint, we revise the pre-commitment optimal policy and derive a positive cash flow that can be taken out of the market. We show that the revised policy ensures the rationality of the investor's behavior. More importantly, the terminal wealth obtained under our revised policy has the same mean as, and a variance no larger than, that of the terminal wealth obtained under the pre-commitment optimal policy; depending on how much the investor wants the conditional variance to decrease, he can flexibly choose the value of the revised wealth from the specified interval and find a more efficient investment policy.
In this paper, we study the time consistency of the optimal policy under the multi-period MV framework in stochastic markets with only the self-financing constraint. It is worthwhile to investigate time consistency when other practical constraints, such as bankruptcy constraints and no-short-selling constraints, are included. These topics are left for future research.