A HYBRID APPROACH FOR INDEX TRACKING WITH PRACTICAL CONSTRAINTS

. Index tracking is a popular way for passive fund management, which aims to reproduce the performance of a market index by investing in a subset of the constituents of the index. The formulation of index tracking with some realistic constraints always leads to an optimization problem that is very hard to solve. In this paper, we propose an approximate formulation to the original optimization problem and analyze the approximation error bound. It is shown that the approximation can be reasonably close to the original problem. We consider both cases where the mean absolute error and mean square error are used as the tracking error measurements. The mean absolute error measurement results in a mixed-integer linear programming problem and can be solved by some standard solvers, such as CPLEX. The mean square error measurement leads to a mixed-integer quadratic programming problem. An eﬃcient hybrid heuristic method is given to solve this problem. We do some numerical experiments by the use of ﬁve data sets from OR-Library. The results are promising.


1.
Introduction. There are two kinds of basic strategies adopted for portfolio management: active and passive. Active strategies rely on the assumption that excellent investors can, through their expertise and judgement, outperform the market. While passive fund managers believes that the financial markets are efficient, and it is impossible to beat the market return consistently .
In recent years, "index funds" has gained well acceptance, especially among some mutual/pension funds sponsors. An index fund is a typical passive managed fund, which aims at constructing a portfolio to simulate the performance of some specified market index, such as the S&P 500, Dow Jones indices and etc, during a predefined period of time. We refer to ( [21]) for more available indices. Compared with active managed portfolio, index funds have lower fixed fees/transaction costs, and higher diversification. It is particularly favorable in a bull market, when the active managed funds have slim chance to beat the market.
There are two different strategies to track the market index, namely, the full replication and partial replication. Full replication is done by investing all constituents of the index proportional to their shares in the tracked index. It is simple both conceptually and computationally and can achieve a perfect match. However, its disadvantages are also obvious ( [3]). It incurs high initial transactions costs, and is difficult to rebalance when changes are made to the index, for example, the issue of a new set of shares. On the contrary, in a partial replication, an investment is made in a small proportion of the shares, while attempting to match the performance of the entire index. This incurs lower initial transaction costs, and is easier to rebalance. It also allows the investor to constrain the choice of investment that is made, by insisting on the inclusion or exclusion of some companies, or by setting the proportion of the capital that is to be invested in others [20]. Consequently, in practice, the partial replication is generally preferred.
In this work, we consider the index tracking problem using partial replication strategy. Our model is to minimize the tracking error subject to some realistic constraints including the cardinality constraint, the transaction cost constraint and other constraints such as the upper and lower bound limits to the position of each asset. The tracking error in our model is defined as the measure of the difference between the return of the index and that of the tracking portfolio. There are some other tracking error measurements in the literature such as the the correlations between the tracking portfolio and the index, and the variance of the difference between the returns of the tracking portfolio and those of the index ( [15], [20]). [3] discussed index tracking problems with realistic constraints. We will propose an index tracking model that is similar to but a little different from that of [3]. The major difference between our model and that of [3] lies in the definition of portfolio/index return. We use compound interest rate, while [3] choose continuous interest rate. The proposed tracking error is a non-convex function of the decision variables. However, the use of compound interest could help us to deduce a convex approximation to the model. In particular, under mean absolute error measurement, we can use a linear programming to approximate the model, and under mean square error measurement, we can derive a convex quadratic programming approximation to the model. Our error bound analysis shows that under mild conditions, the approximation can be locally sufficiently close to the original index tracking model.
It has been theoretically proved that the index tracking problem with cardinality constraint is NP-hard ( [19]). So, heuristic method, such as simulated annealing ( [13]) and genetic algorithms ( [10]) are often suggested. In literature, heuristic methods for cardinality constrained index tracking problem has been addressed by numerous authors. [20] treated the problem of selecting the optimal subset of assets included in the tracking portfolio and the problem of determining the optimal asset amount separately. He proposed a genetic algorithm to generate the subsets, and used a quadratic programming to find the proportion of the available capital that should be invested in each asset. However, the work by [20] did not consider realistic constraints such as the position constraints and transaction cost constraints. [9] presented a threshold accepting algorithm for index tracking. They renewed the solutions by selling a random index in the current selected asset set, denoted as J, and buying a random index in the neighborhood of J. New solutions are accepted if they are no worse (by a specified threshold) than the best solution found so far. They tested their proposed algorithm with artificial indices, where the exact solution is known, but give no results on real world market problems. [3] present an evolutionary heuristic method for his proposed non-convex index tracking problems. Like the threshold accepting algorithm proposed by [9], their proposed genetic method has the advantage of imposing no restrictions on the shape of the objective function or the constraints. [19] chose to hold the weights of the assets in the portfolio constant and their proposed tracking model is a mixed-integer quadratic programming. They presented a set coded genetic algorithm to identify the optimal subset of assets and use quadratic programming to determine the optimal asset weights. However, the strategy of holding the assets weights constant is hard to manage in practice and could bring lots of transaction costs.
In this work, we present a new hybrid optimization algorithm for our proposed index tracking model. In the algorithm, we treat the work of selecting the optimal asset subset and the work of determining the optimal amount of assets alternatively. We present a simple heuristic based on simulated annealing ( [13]) to identify the optimal subset of assets. We solve a linear/quadratic program (corresponding to the mean absolute error measurement and the mean square error measurement, respectively) to find the corresponding optimal tracking portfolio with the selected asset subset. The resulting candidate solution is accepted according to the rule of simulated annealing. Simulated annealing is one of the most popular heuristic ( [13]) for optimization since it provides a mean to escape local optima by allowing hill-climbing moves. We refer to [1], [14] for details about simulated annealing.
The latter part of the paper is organized as follows. We will describe the detail of the index tracking problem in the next section. Section 3 describes our hybrid method and Section 4 presents some numerical results with artificial data and realistic data. We conclude the paper with some remarks in Section 5.
2.1. Preliminary. Suppose that there are N assets in the market from which a tracking portfolio should be found. Let the time series be defined for equally spaced intervals. We observe the performance of the N assets as well as the market index at time 0, 1, ..., T and re-balance the current portfolio at time T . We are interested in deciding the best tracking portfolio to hold for some future invest period, denoted as {t 0 , ..., t 0 + L}. The mechanism behind our work is that history would repeat itself. We are going to find the portfolio that has best tracked the index over time interval [0, T ] and expect it would do a good job during the investment period.
Let S i (t) be the price of asset i (i = 1, ..., N ) at time t, and I(t) be the value of the index at time t. The return of the index over the period [t, t + 1) is defined as Let c i (t) be the number of units of asset i in the tracking portfolio at time t that remains constant in the interval [t, t + 1). Define as the set of indices of assets appearing in the tracking portfolio. The value of the tracking portfolio at time t, denoted by v(t), is then calculated via The tracking portfolio could also be characterized by defining return r i (t) and the weights w i (t) of each of the assets in the portfolio at time t, Then the return of the portfolio at time t is Serval possible strategies can be used to replicate a financial index ( [18]). Here are some examples. The buy and hold strategies determine the quantities in each asset c i (t) and maintain these value constant throughout the tracking period. Constant mix strategies hold a constant fraction of wealth in asset. Buy assets as the market falls and sell them as it rises. There are also some other strategies such as constant proportion portfolio insurance, option based portfolio insurance, etc. In our method, we consider the buy and hold strategies. They are simple and could be followed by all investors easily.
As we have supposed that the quantity c i (t) remains unchanged through the future investment period. In the latter part of the paper, we will simply use c i instead of c i (t).

Practical Constraints.
From a practical point of view, there are a number of constraints that should be considered. In this work, we consider the restrictions on the positions to each asset, the transaction costs and the number of assets in the portfolio.
2.2.1. Transaction cost constraint. There are several kinds of transaction costs in portfolio management, such as brokerage tees, taxes, fund loads, etc. We will also consider the transaction costs C trans incurred at re-balancing the portfolio at time T . Letc=(c 1 ,c 2 , ...,c N ) be the old portfolio before re-balance. We assume that the transaction cost G i incurred in asset i is proportional to its absolute value change, that is, where f s i , f b i denote the fractional cost of selling and buying one unit of asset i at time T , respectively. A rational investor would like to limit the transaction in order to prevent unduly buying and selling. Let C be the total wealth that could be invested in the new portfolio, and γ (0 ≤ γ < 1) be the upper bound proportion limited to the transaction costs. Then we get the transaction cost constraint We see from (3) that as a function of c i , G i is discontinuous, which makes the problem be computationally very difficult. In order to make the problem be easy to handle, we are going to approximate the constraints using the idea in [6]. Specifically, we use the following set of constraints instead of constraints (3) and (4)

Position constraint.
In general, short selling is prohibited, which means c i ≥ 0. We suppose that all the capital C is invested in the portfolio, which results in the constraint In order to avoid small trades, we require for each i, ciSi(T ) C ≥ i with some prescribed constant i ∈ [0, 1]. Meanwhile, to guarantee a certain level of diversity of the tracking portfolio, we impose an upper bound of the weight for each i. Consequently, we have the following constraints In practice, we may have some capital concentration constraints in the tracking portfolio. For example, we may require 0.1 ≤ c1S1(T )+c2S2(T )+c5S5(T ) C ≤ 0.5. This is computationally easy to handle. So, in the sequel, we will neglect these constraints, without loss of generality.

Cardinality constraint.
In index tracking, the portfolio manager would like to restrict the number of assets. This results in the following cardinality constraint where #{·} denotes the size of the set {·} and K is an integer that stands for the number of assets of the portfolio. Related work about portfolio selection with cardinality constraint can be found in, for example, [3], [5] and [9].

2.3.
Objective function and approximation. The objective of our index tracking is to minimizing the tracking error. There are many kinds of tracking errors, among which, the following tracking error given by [3] is very popular where α > 0 is a constant. In this work, we consider the cases α = 1 and α = 2. The case α = 1 corresponds to the so-called mean absolute error measurement and α = 2 corresponds to the mean square error measurement, respectively. We refer to [4] for alternative tracking errors. The tracking error defined by (7) is generally not convex due to the nonconvexity of r P (t). In fact, substituting (1) and (2) to (7), we obtain

YINGJIE LI, XIAOGUANG YANG, SHUSHANG ZHU AND DONG-HUI LI
Since v(t) relies on c i , the function T R(c) above is not convex. In what follows, we are going to find a convex approximation to T R(c), so that we can solve the problem more easily. It follows from (2) that for any t ∈ {0, 1, · · · , T − 1}, .
Notice that r P (t) is expected to be very close to r I (t), t = 0, 1, ..., T − 1, near the optimal tracking portfolio. This motivates us to replace r P (t) by r I (t), which yields the approximate relation .
Observing by (4) that C trans is bounded from above by γC, we may approximate v(T ) by a constant (see also [6]) and get Sincer P (t) is a linear approximation to r P (t), we get the following convex approximation to the objective function given by (7): which is much easier to handle than the original one.

Optimization problem.
To formulate the problem in a more tractable way, we introduce an auxiliary vector z = (z 1 , z 2 , .., z N ) whose elements are binary variables. More specifically, for each i = 1, 2, . . . , N , z i = 1 if and only if asset i is included in the tracking portfolio, otherwise, z i takes the value 0. Then minimizing the tracking error defined by (10) with α = 2 results in the following mixed-integer quadratic program Replacing the objective function of (P r2 ) with yields the tracking problem with mean absolute error measure (the case for α = 1). By introducing auxiliary vector f =(f 0 , f 1 , ..., f T −1 ), the problem can be equivalently reformulated as the following mixed-integer linear program 3. Validity of the approximation. As pointed out in the last section, the models (P r1 ) and (P r2 ) are not the exact index-tracking models but their approximations.
In this section, we analyze the relationships between the approximation models and their exact versions. We first give some simple but useful relations among r I (t), v(t),v(t + 1) andr P (t). Indeed, we get from (8) thatv(t) =v(t + 1)/(1 + r I (t)). This implies We also have from (2) and (9) The following two propositions show that under some reasonable conditions, the approximations can be very close to the exact models near the solution.
where 1 > 0 is sufficiently small. Suppose further that the corresponding return sequence where 2 is a constant and C(T ) is a constant related to the time window T . Then we have . By the assumption, we have which implies 2 > 0. We get from (11) and (12) v Then we obtain for any t ∈ {0, 1, · · · , T − 1}, We also have by the definition of r P (t) andr P (t) where the first inequality follows from (14).
It follows from (13) and (16) that For the case α = 2, we can derive from (13) and (15) |r where last inequality follows from Cauchy inequality. So, we get from (17) 1 This together with (13) imply The proof is completed.
As a direct corollary of Proposition 1, we have the following results immediately.
Suppose further that the corresponding return sequence where C(T ) is a constant related to the time window T . Then we have 1 (r P (t) − r I (t)) 2 → 0.
Proposition 1 and Corollary 1 indicate that if the approximated portfolio tracks the index sufficiently, then the real portfolio will also track the index well.
Proposition 2. Suppose that the portfolio (c 1 , c 2 , ..., c N ) satisfies Suppose further that the return sequence {r P (t)} generated by the corresponding portfolio satisfies Proof. We only need to show |r P (t) − r P (t)| α → 0, α = 1 and 2.

YINGJIE LI, XIAOGUANG YANG, SHUSHANG ZHU AND DONG-HUI LI
If β T −1 j=t 1 + j 1+r P (j) < 1, then we obtain where the last inequality follows from first-order Taylor expansion. For α = 1, we get from (20) |r This implies 1 T For α = 2, we also have from (20) |r This also implies 1 T The proof is completed.
Proposition 2 shows that if there exists a portfolio that sufficiently close the index, then the associated approximate portfolio will also track the index well.

4.
A hybrid method. In this section, we propose a hybrid method to solve the proposed index tracking formulations (P r1 ) and (P r2 ). Since (P r1 ) and (P r2 ) are very similar, we only construct algorithm for solving (P r2 ). Algorithm for solving (P r1 ) can be constructed in a similar way. The mixed quadratic programming (P r2 ) is generally NP hard ( [19]). Numerical methods for the exact solution of the problem might be unaffordable. For example, if we solve the problem with the universe assets N = 225 and cardinality constraint K = 10 by exhaustive search, it will cost a PC millions of years to get a solution ( [16]). Other exact techniques like Branch-and-Bound methods ( [11]) might be used to handle this combinational problem. However, the computational complexity still grows exponentially with the size of the universe assets. Consequently, heuristics, such as the simulated annealing ( [13]) and genetic algorithms ( [10]) are practically welcome.
Heuristics contribute to a reasonable extent in solving global optimization problems, mainly combinatorial problems ( [17]). Simulated annealing (SA) is one of the most popular heuristics [13], that have been employed in a wide variety of problems.
The key points of an SA to escape from local optima relies on its acceptance rule. Under the SA acceptance rule, improved solutions are always accepted, while a fraction of non-improved (inferior) solutions are accepted in the hope of escaping local optima in the search of a global optima. The probability of accepting non-improved solutions depends on a temperature parameter, which is typically non-increasing with each iteration of the algorithm. If the temperature parameter is decreased sufficiently slow, then the sequence generated by an SA will be asymptotic convergent to a global optima ( [7]). We refer to ( [12]), [1], [14] for details about simulated annealing. However, SA, like other meta-heuristics, suffers from the slow convergence that brings about the high computational cost.
In what follows, we are going to develop a hybrid method to find some approximate solution to the problem. We propose a heuristic procedure to identify the appropriate asset set that should be invested, and use a quadratic programming solver to find the optimal tracking portfolio that invests only in the selected asset set. The idea of hybriding quadratic programming with heuristic algorithm has been successfully applied in portfolio selection ( [5], [19]). We follow this idea and propose a hybrid approach for solution of problem (P r2 ).

4.1.
A hybrid optimization method for index tracking. We see from (P r2 ) that it is the cardinality constraint that makes the problem NP hard. Once we have decided which assets to be invested, (P r2 ) reduces to a standard convex quadratic programming which could be effectively solved ( [8]). Based on this observation, we consider to treat (P r2 ) using two steps, first to choose the asset set to be invested and then to decide the investment weight of each asset. We propose a simulated annealing based heuristic to explore the investment asset set and resort to quadratic programming to find the investment weight. Figure 1 gives the flowchart of our proposed hybrid heuristic. We first use some initialization procedure to produce an initial asset set. Then quadratic programming are used to determine the best tracking portfolio that are invested only in the selected asset set. The produced tracking portfolio is accepted as the next candidate solution through some simulated annealing rule, called SA m for simplify. If the produced portfolio fails to be accepted, adjust the asset set and resolve the corresponding investment weights. The process repeats until we get some acceptable candidate. We repeat the above procedure until it satisfies some predetermined termination conditions.
In what follows, we describe our proposed hybrid method in detail.

Initialization.
At the beginning, we need to get an initial asset set. We determine the initial asset based on the following observation. First, we consider to fully replicate the market. In other words, we solve the quadratic program (P r2 ) by dropping the cardinality constraint and get a portfolio c = (c 1 , c 2 , ...., c N ). Intuitively, we think that the assets that possess large proportions in that portfolio should be selected as candidates to get into the initial asset set. The following process gives the detail to find the initial asset set.
It was obtained by ignoring the cardinality constraints in (P r2 ). Let c = (c 1 , c 2 , ...., c N ) be the solution vector obtained by solving the convex quadratic problem (21) via CPLEX . Relabel c so that it satisfies, We use the following procedure to construct an initial asset subset.
then we set J 0 = J candi and stop. As the Update procedure is essentially exhaustive search, it could always yield some well-defined initial asset set as long as (P r2 ) is feasible. In many cases, if (P r2 ) has feasible solutions, J candi = {1, 2, ..., K} could yield a feasible solution.

4.1.2.
A heuristic strategy to adjust the asset subset. Let the asset set at iteration l be denoted by J l = {i l 1 , i l 2 , ..., i l K }. Our heuristic aims to find a J l+1 by extracting some asset, say i out ∈ J l , and finding a new asset, say i in ∈ J l , from J l . That is, We first consider the choice of the out asset i out . We regard the asset with least investment amount in the current portfolio to be least important. Based on this point, the out-asset is determined according to the amount of the scalars S i (T )c(i), i ∈ J l . Asset with smaller S i (T )c(i) has more priority to be considered.
For a given out asset i out , we use a simulated annealing procedure, denoted as SA a , to generate a possible in-asset. More precisely, for J l = {j|j ∈ J l }, we assign a fitness value for each j ∈ J l , and accept asset j as a candidate in asset according to SA a one by one, through the order of its fitness value. Asset with better fitness value have more priority to be considered.
Suppose asset j is accepted as i in by SA a . Then we could get a candidate asset subset with asset i out out and asset i in in. Otherwise, if all J l fails to be accepted by SA a , we find a next candidate out asset and an in asset. This procedure continues until we could find some accepted pair of out asset and in asset, denoted by (i out , i in ).
In case the above process could not find a pair (i out , i in ), we let i 1 = arg{max S i (T )c(i), i ∈ J l }, and randomly select an asset j ∈ J l as the i in , and set J l+1 = J l ∪ {i in } \ {i 1 }. Otherwise, set J candi = J l ∪ {i in } \ {i out }, and use another simulated annealing procedure, called SA m , to accept J candi as J l+1 . This procedure repeats until some J candi could be accepted by SA m . If no generated J candi could be accepted, randomly select a asset j ∈ J l as the i in and set It should be noted that, we have used two simulated annealing procedure, SA a and SA m , in the asset adjustment process. Procedure SA a is used to explore some candidate set for J l+1 , denoted as J candi ; while procedure SA m is used to determine if J candi could be accepted as J l+1 .
We summarize the above process as the algorithm below. Algorithm (A hybrid method for (P r2 )) 0 Initialization. Use Procedure 0 to generate an initial asset set J 0 . Set l := 0. Set the current asset subset be J l = {i l 1 , i l 2 , ..., i l K } and compute the optimal tracking portfolio with assets only invested in J l to get c l c(J l ) = c(i l 1 ), c(i l 2 ), ..., c(i l K ) . Let J l = {j|j ∈ J l }, c best := c l . Set the cooling schedule parameters: the initial temperature T max , T min , cooling rate ρ in SA a procedure; and the corresponding parameters T max , T min , ρ in SA m procedure. Set the precision parameter > 0 and the iteration tolerance parameter iter max , iter best . 1 Termination Conditions. Stop the algorithm if one of the following three cases happens: (1) the number of iterations l reaches iter max ; (2) the process fails to get a better tracking portfolio after iter best many iterations; (3) the best tracking portfolio ever found satisfies: T R(c best ) ≤ . 2 Asset set update. Let p := 1 and resort J l so that it satisfies Set t := T max , t := T max . 2.1 Let i out := i p . For each j ∈ J l , let Construct a portfolio P (j) that corresponding to J candi (j) Define the fitness value of asset j by T R (P (j)) and relabel J l so that it satisfies Set q := 0. 2.2 SA a Procedure. If q ≤ N − k, set q := q + 1, t := ρt and go to step 2.2.1. Else if p < K, set p := p + 1 and go to step 2.1; if p ≥ K, go to step 2.4. 2.2.1 If t ≥ T min , then accept k q according to the simulated annealing rule with the probability exp −(T R(c l )−T R(P (kq)))/t . If k q is accepted, set i in := k q , J candi := J candi (i in ), t = T max , and go to step 2.2.3. Else, if k q is not accepted, go to step 2.2. 2.2.2 If t < T min , choose a random j ∈ J l . Let and set t = T max .

Solve (P r2 ) with
Let c(J candi ) be the corresponding optimal tracking portfolio. If it is inconsistent, go to step 2.2.
Let c(J candi ) be the corresponding optimal tracking portfolio. If it is inconsistent, go to step 2.4; else, set J l+1 = J candi , l := l + 1, and go to step 1.

Numerical experiments.
In this section, we do some numerical experiments to test the model and the proposed hybrid algorithm. We use the data set of [3], which is publicly available from OR-Libray ( [2]). This data set includes prices of assets composing the Hang Seng (31 assets), the DAX 100 (85 assets), the FTSE 100 (89 assets), the S&P 100 (98 assets) and the Nikkei index (225 assets). We coded our algorithm in C# and used the CPLEX (http://www.ilog.com/products/cplex/) to solve the quadratic programs. The program was run on a PC with 1.6GHZ CPU processor. Unless otherwise stated, the initial parameters and control parameters are listed as follows. 5.1. Test with artificial index. Since the formulation (P r2 ) is a mixed-integer quadratic programming, it is generally hardly to obtain the optimal tracking portfolio for the problems with realistic size. To provide a means to test the validity of our heuristic algorithm, we first use an artificial index. For a tracking problem with cardinality constraint K, we define the artificial index r I (t) as r i (t), t = 0, 1 · · · , T − 1. In other words, the artificial index is generated by investing equal units in the last K assets. In the case where i = 0, δ i = 1, f b i = 0, f s i = 0, ∀i and γ = 0, we could get the exact optimal portfolio, denoted by c * , with tracking error T R(c * ) = T R(c * ) = 0. We use CPLEX (version 11.0) to solve the mixed-integer linear programming formulation (P r1 ). For α = 2, we applied our heuristic 20 times on each problem and report the average results. For all artificial problems, we set = 10 −10 , iter best = 30. Detailed computational results are listed in Tables 1 and 2 (T = 290). The meaning of each column in the following two tables is stated below.
Index: the name of the asset market; K: the cardinality of the tracking portfolio; TR: the tracking error defined in (7); PTR : defined as (TR)/ (standard deviation of {r I (t), t = 0, 1, · · · , T }); Time: the CPU time used (in seconds). We see from Table 1 that our approximate formulation (P r1 ) makes high quality of approximation. The accuracy can reach up to o(10 −17 ). Meanwhile, the approximate formulation is easy to handle. In all cases, it took CPLEX less than 7 seconds to get an optimal portfolio. From Table 2, we can see that our heuristic performed quite well for the artificial index tracking problems. The average accuracy can reach to o(10 −9 ) ∼ o(10 −10 ) and the maximum cpu time used is less than 16 seconds.

5.2.
Test with real indexes. The numerical results in the last subsection has shown that the formulations (P r1 ) and (P r2 ) and the proposed hybrid solution method worked well in sample. In practice, we wish a portfolio perform well out of sample too. In order to investigate both the in sample and out of sample performance of our approach, we used the data from time 0 to time 145 as the training data, and test the tracking performance from time 146 to time 290.
We first tested the formulation (P r1 ). We solved it by CPLEX as well as our heuristic procedure. The maximum cpu time costed by CPLEX is set to be 7200 seconds. Detailed results were shown in Table 3. We then test formulation (P r2 ). For each case, we run the heuristic procedure 20 times and reported the average results in Table 4. We considered the cases γ = 0.0025, 0.005, 0.0075, 0.01, while the case γ = 0.01 was not considered in formulation (P r1 ). This is because the cpu time used by CPLEX increases dramatically as the value of γ grows. While we solved (P r1 ) with γ = 0.01 by CPLEX, we found that in most cases, we could not get an optimal solution within 7200 seconds. The results in Table 3 show that both in sample and out of sample tracking errors are decreasing with respect to γ. Meanwhile, lower in sample tracking errors generally yield lower out of sample tracking errors. This is in accordance with the expectations that "Increasing the transaction cost limit γ will reduce tracking error" and "Reducing in-sample tracking error reduces out of sample tracking error" ( [3]). We also see from the table that the cpu time consumed by the CPLEX increases dramatically with respect to the size of the problem and the value of the transaction cost limit γ. For Hang Seng index (n = 31) with γ = 0.0025, it took less than 1.0 seconds to get an optimal tracking portfolio. While, for the Nikkei index (n = 225) with γ = 0.0075, it took more than 2 hours to get an optimal solution. On the other hand, our heuristic seems less sensitive with respective to n and γ. In all cases, it took less than 15 seconds to get a tracking portfolio. In addition, for some of the problems, eg. DAX with γ = 0.0075, S&P with γ = 0.0025 ∼ 0.0075, our hybrid method could get a portfolio with smaller tracking error than that of CPLEX. This would be not surprising because we used the original tracking error (7) to guide our exploration, while the portfolio obtained by CPLEX was an optimal portfolio with respect to our approximate tracking error measurement (10).
From Table 4, we can see that the our hybrid method could get a portfolio with average tracking error o(10 −5 ). The results also in accordance with the two expectations mentioned above. To see the effectiveness of the formulation and the algorithm, we draw the tracking portfolio in Figure 1 (fixed the market price and the portfolio price to be 100 at T = 145). From Figure 1, we can see that our portfolio tracked the index pretty well both in sample and out of sample.
Next we compare our method with some existing heuristic methods. [3] and [9] also considered index tracking problems used heuristics to get the solutions. The heuristic in [3] was based on evolutionary while the heuristic in [9] was based on a threshold accepting rule. As we have shown in Section 4, our hybrid heuristic is a combination of the quadratic programming and the simulated annealing. Table  5 lists the performance of these three heuristics. As the definition of return in [3], [9] and ours are different, it is hard to say whose tracking portfolio is of the least tracking error. So, in the table, we did not list the exact value of the tracking errors, instead, we list the order of them. The results in Table 5 show that all the three method could get optimal portfolios with similar accuracy orders. Since [9] did not test their heuristic for market index, we leave the corresponding cells blank. Average cpu time costed were reported in Table 5, where "s" stands for seconds and "m" stands for minutes. As we are testing the methods under different operational platform and using different environment, we could not say which method is least expensive. However, we think that our method is competitive with others.
6. Conclusions. We formulated the index tracking problem with realistic constrains into a constrained optimization problem. We considered two cases where the mean absolute error and the mean square error were used to measure the tracking performance. The formulations are non-convex optimization problems. We then proposed a mixed-integer linear program (for mean absolute error measurement) and a mixed-integer quadratic program (for mean square error measurement)  to approximate them. Under reasonable conditions, we proved that the approximate formulation could provide a good approximation to the original non-convex tracking problem. To solve the approximation problem, we proposed a hybrid algorithm by combining numerical method for convex programs and simulate annealing approach for combinatorial optimization problems. Our experiments show that the approximate formulations works well and the proposed hybrid method is efficient.