USING GENETIC ALGORITHM TO CONSTRUCT A MOMENTUM-BASED STOCK FUND

Portfolio optimization is an important research field in modern finance, and a central goal is to maximize risk-adjusted returns. Momentum investing has gained wide acceptance among asset managers. Genetic algorithms (GA), which are based on the ideas of evolution and Darwin’s concept of natural selection, have been widely used to generate high-quality solutions to optimization problems. In this paper, we propose an approach that uses genetic algorithms to construct a momentum-based 130-30 stock fund. We use the Sharpe Ratio as the fitness function for portfolio evaluation and a Mean-Variance Model with Monte Carlo simulation to optimize the portfolio’s long and short positions. Using 2020 market data for the S&P500, our fund outperforms a variety of stock portfolios as well as the S&P500 ETF Fund SPY, as measured by Total Return, Sharpe Ratio, and Information Ratio.


Genetic Algorithms
Genetic Algorithms [1] are a class of heuristic algorithms used to solve optimization problems without relying on closed-form analytical solutions. They are based on the ideas of evolution and Darwin's concept of natural selection, i.e., "survival of the fittest". In general, a Genetic Algorithm (GA) is an iterative process that starts from a random population of chromosomes, where each chromosome contains a list of genes. In each step (generation), the genetic algorithm selects chromosomes from the population based on their fitness values to produce new offspring chromosomes. Each chromosome is measured by its fitness, which is its capability to adapt to its "environment". Fitness in a GA is measured by the value of the objective function to be optimized. As in natural selection, chromosomes with high fitness values are preserved and have a chance to pass their genes to the next generation, while chromosomes with low fitness values are replaced by offspring chromosomes. A new population is then formed from the preserved chromosomes of the last generation (those with high fitness values) and the offspring chromosomes, in the hope that it will survive better than the previous population. The number of chromosomes in the population is constant from generation to generation. When the GA process terminates, the chromosome with the highest fitness value is selected as the solution to the optimization problem. Important operations in a GA process include chromosome selection, crossover, mutation, fitness calculation, and determination of termination criteria. They are detailed as follows.

Chromosome Selection:
The main purpose of the selection process is that the better an individual is, the higher its chance of being a parent [2]. Suppose there are N chromosomes in a population. We select the M chromosomes with the highest fitness values (1 < M < N) to be parent chromosomes for crossover.

Crossover:
Crossover is implemented by selecting a random point on the chromosomes where the gene exchange happens between two parent chromosomes. The offspring chromosomes from crossover have genes chosen from both parents based on the exchange point. Such a crossover is also called one-point crossover.
Suppose there are n genes s1, s2, …, sn on each chromosome. If S1 = {s11, s12, …, s1n} and S2 = {s21, s22, …, s2n} are two parent chromosomes, select a random integer r with 1 < r < n. Two offspring chromosomes, S3 and S4, are created by crossing over genes as follows:
S3 = {s11, …, s1r, s2(r+1), …, s2n}
S4 = {s21, …, s2r, s1(r+1), …, s1n}

Mutation:
Mutation is a genetic operation that maintains genetic diversity from one generation to the next. A mutation is carried out by selecting one gene at random from a chromosome and replacing it with the corresponding gene from a separate chromosome that is randomly selected from the population.
Suppose chromosome S1 = {s11, s12, …, s1n}, and Sk = {sk1, sk2, …, skn} is another chromosome randomly selected from the population. A mutation on S1 occurs when a randomly selected gene s1r (1 ≤ r ≤ n) from chromosome S1 is replaced by the gene skr from chromosome Sk, i.e., the mutated chromosome is S1' = {s'1i | s'1i = ski if i = r, else s'1i = s1i, 1 ≤ i ≤ n}. The mutation operation described above, which makes a random change to one gene on a chromosome after crossover, is based on biological point mutation and is called single-point mutation. This mutation operation involves a probability that a gene on a chromosome will be changed. The probability is called the mutation rate or mutation probability; it lies in the range [0, 1] and is usually low, such as 3%. A common method of implementing the mutation operation is to generate a random variable for a gene on a chromosome. If the random variable is less than or equal to the mutation rate, the gene is mutated [3].
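The selection, one-point crossover, and single-point mutation operations described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper. The function names are hypothetical, chromosomes are plain lists, and the crossover convention (offspring keep the first r genes of one parent and take the remaining genes from the other) is an assumption.

```python
import random

def select_parents(population, fitness, m):
    """Return the m chromosomes with the highest fitness values (1 < m < N)."""
    ranked = sorted(population, key=fitness, reverse=True)
    return ranked[:m]

def one_point_crossover(s1, s2, r):
    """Exchange the genes after position r between two parents.

    Assumed convention: each offspring keeps the first r genes of one
    parent and takes the remaining genes from the other parent.
    """
    s3 = s1[:r] + s2[r:]
    s4 = s2[:r] + s1[r:]
    return s3, s4

def point_mutation(s1, sk, mutation_rate, rng=random):
    """Single-point mutation: with probability mutation_rate, replace one
    randomly chosen gene of s1 by the gene at the same position in sk."""
    child = list(s1)
    if rng.random() <= mutation_rate:
        r = rng.randrange(len(child))
        child[r] = sk[r]
    return child
```

For example, `one_point_crossover([1, 2, 3, 4], [5, 6, 7, 8], 2)` returns `([1, 2, 7, 8], [5, 6, 3, 4])`.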

Fitness Calculation:
A fitness function measures the strength of a chromosome's ability to survive. The value calculated from the fitness function is used to select the chromosomes that survive into the next generation. In our GA, we use the Sharpe Ratio as the fitness function.
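As an illustration of a Sharpe Ratio fitness function, the sketch below computes an annualized Sharpe Ratio from a series of daily returns. The paper does not state its exact formula, so the annualization by 252 trading days, the `risk_free_rate` parameter, and the function name are assumptions.

```python
import math
import statistics

def sharpe_ratio(daily_returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe Ratio of a daily return series.

    Assumptions: risk_free_rate is an annual rate spread evenly over the
    trading days, and the sample standard deviation is used.
    """
    daily_rf = risk_free_rate / periods_per_year
    excess = [r - daily_rf for r in daily_returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        return 0.0  # no variation in returns: ratio is undefined, report 0
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)
```

A chromosome (portfolio) with a higher value of `sharpe_ratio` would then be ranked as fitter during selection.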

Determination of Termination Criteria:
A genetic algorithm performs the operations described above, i.e., chromosome selection, crossover, mutation, and fitness calculation, in sequence for a population. After these operations are performed, a new population is created for the next generation (shown in Figure 1). Because of the nature of genetic algorithms, it is usually not clear in advance when the algorithm should stop, so the algorithm must check the termination criteria at each generation. Termination criteria are usually based on statistical information such as the maximum number of generations, or convergence to a single fitness value for the best chromosome over multiple generations.
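The generation loop and the two termination criteria mentioned above (a maximum number of generations, and no improvement in the best fitness over several generations) can be sketched as follows. This is a generic illustration, not the paper's code. The function `run_ga`, the `patience` parameter, and the simplified mutation that draws a replacement gene from a `gene_pool` are all assumptions.

```python
import random

def run_ga(init_population, fitness, gene_pool, n_parents,
           max_generations=100, patience=10, mutation_rate=0.03, seed=0):
    """Run a simple GA and return (best chromosome, best fitness, generations run).

    Stops after max_generations, or when the best fitness has not improved
    for `patience` consecutive generations (assumed convergence test).
    """
    rng = random.Random(seed)
    population = [list(c) for c in init_population]
    size = len(population)
    best, best_fit = None, float("-inf")
    stale, generations_run = 0, 0
    for _ in range(max_generations):
        generations_run += 1
        ranked = sorted(population, key=fitness, reverse=True)
        top_fit = fitness(ranked[0])
        if top_fit > best_fit:
            best, best_fit, stale = list(ranked[0]), top_fit, 0
        else:
            stale += 1
        if stale >= patience:
            break  # converged: best fitness unchanged for `patience` generations
        parents = ranked[:n_parents]  # preserved chromosomes
        offspring = []
        while len(offspring) < size - n_parents:  # keep population size constant
            p1, p2 = rng.sample(parents, 2)
            r = rng.randrange(1, len(p1))  # one-point crossover
            child = p1[:r] + p2[r:]
            if rng.random() <= mutation_rate:
                child[rng.randrange(len(child))] = rng.choice(gene_pool)
            offspring.append(child)
        population = parents + offspring
    return best, best_fit, generations_run
```

Because the parents are carried over unchanged, the best fitness found never decreases from one generation to the next in this sketch.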

Portfolio Optimization
Portfolio optimization is a process to select the "best" portfolio among a set of potential portfolios.
Modern portfolio theory [4] introduced the concept of the efficient frontier, which is the curve of the highest expected return at each given risk level (Figure 2). Using the risk-free rate, we can draw a tangent line from the expected-return axis to the efficient frontier; the tangent point locates the portfolio with the highest Sharpe Ratio, i.e., the highest return per unit of risk [5]. Therefore, in our studies, the portfolio with the highest Sharpe Ratio is selected as the "best" portfolio for back-testing. [9]. Additionally, it was found that 77% of US mutual funds that relied on momentum strategies realized significantly better performance than other funds [10]. In this paper, we use GA to optimize the selection of S&P500 component stocks and their weights in momentum portfolios. Back-testing with market data from 2020 indicates that our optimized GA portfolio achieves the best Total Return and the highest Sharpe Ratio compared with a set of momentum and mean-reversion portfolios as well as the S&P500 ETF Fund SPY.
Although the portfolio optimization problem can be solved analytically, financial reality may complicate both the objective functions and the constraints facing financial agents, making it difficult to find an analytical solution. Therefore, genetic algorithms can be used as an alternative approach [11]. For example, genetic algorithms were used to select technical rules for trading the S&P500 index, and the selected rules were found to identify periods to buy the index when daily returns were positive and volatility was low, and to sell when the reverse was true [12].

Long/Short Determination - Short Ratio
The long or short direction is determined by a factor named "Short Ratio", which is the number of shares shorted by the market divided by the Average Daily Trading Volume (http://regsho.finra.org/regsho-Index.html). In a momentum strategy, a stock with a low short ratio is expected to continue its upward momentum and so is purchased, while a stock with a high short ratio is sold short. Based on the "Short Ratio", we sort all component stocks of the S&P 500 and divide them into 10 groups, with the groups given scores from 1 to 10 as the "Short Ratio" increases. Each group contains 10% of the S&P500 component stocks. In our GA method, stocks with ranks 9 and 10 are available for selling short, while stocks with ranks 1 and 2 are available for purchase. Other stock groups are not taken into consideration. More details of our stock selection are covered in the next section (Section 2.2).
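The grouping described above can be illustrated with a short Python sketch. The function names and the input format (a dictionary mapping ticker to Short Ratio) are assumptions, and ties are broken by sort order, which the paper does not specify.

```python
def short_ratio(shares_shorted, avg_daily_volume):
    """Short Ratio: shares shorted divided by average daily trading volume."""
    return shares_shorted / avg_daily_volume

def short_ratio_scores(short_ratios, n_groups=10):
    """Map each ticker to a score 1..n_groups that rises with its Short Ratio.

    Each group receives an (approximately) equal share of the tickers.
    """
    ranked = sorted(short_ratios, key=short_ratios.get)
    n = len(ranked)
    return {t: i * n_groups // n + 1 for i, t in enumerate(ranked)}

def long_short_pools(scores, long_scores=(1, 2), short_scores=(9, 10)):
    """Split tickers into long candidates (scores 1-2) and short candidates (9-10)."""
    longs = [t for t, s in scores.items() if s in long_scores]
    shorts = [t for t, s in scores.items() if s in short_scores]
    return longs, shorts
```

With 500 component stocks, each score would cover 50 tickers, giving 100 long candidates and 100 short candidates.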

The Description of Our GA Method
Our GA has 20 populations in each generation, with 50 portfolios (chromosomes) in each population, and each portfolio (chromosome) has 10 stocks (genes).
To generate a portfolio for the 1st generation, 5 stocks randomly selected from groups 1 & 2 are purchased, while 5 stocks randomly selected from groups 9 & 10 are sold short. The long and short stocks are arranged alternately in each portfolio, i.e., odd positions hold long stocks and even positions hold short stocks. The stock selection is a random draw with replacement, i.e., a stock can be present in different portfolios in a generation.
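A first-generation population along these lines could be generated as in the sketch below. The function names are hypothetical. Stocks within one portfolio are assumed to be distinct, while the same stock may appear in different portfolios.

```python
import random

def make_portfolio(long_pool, short_pool, n_each=5, rng=random):
    """Build one portfolio of 2 * n_each stocks, alternating long and short.

    Positions 1, 3, 5, ... (list indices 0, 2, 4, ...) hold long stocks and
    positions 2, 4, 6, ... hold short stocks. Assumption: no stock is
    repeated within a portfolio.
    """
    longs = rng.sample(long_pool, n_each)
    shorts = rng.sample(short_pool, n_each)
    portfolio = []
    for long_stock, short_stock in zip(longs, shorts):
        portfolio += [long_stock, short_stock]
    return portfolio

def make_population(long_pool, short_pool, size=50, rng=random):
    """Draw `size` portfolios independently, so stocks may recur across portfolios."""
    return [make_portfolio(long_pool, short_pool, rng=rng) for _ in range(size)]
```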
Optimal stock weights are calculated via the Markowitz Mean-Variance Model [4]. The details of our stock weight calculation procedure are presented in the next section (Section 2.3). The same procedure is applied 50 times to create the optimal stock weights for each portfolio in a population.
Finally, mutation is performed to introduce a random modification to each offspring portfolio, as shown in Figure 4. To simulate mutation for an offspring portfolio, a stock is randomly selected from the portfolio and assigned a random number between 0 and 1. If the random number is less than the 3% mutation rate (as defined in Section 1.1), the stock is replaced by a stock from the same set of long or short candidates, i.e., groups 1 & 2 for longs and groups 9 & 10 for shorts. For each offspring portfolio, the mutation operation is repeated 10 times, so that the mutation probability is statistically applied to every stock in the portfolio. The mutation operation is applied to all 20 offspring portfolios in each new population.
[Figure 4, panel label: Before mutation (an offspring portfolio from crossover)]
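The portfolio mutation step described in this section can be sketched as follows. The function name is hypothetical, and the sketch assumes the alternating layout in which even list indices hold long stocks and odd list indices hold short stocks. It does not check whether the replacement stock is already in the portfolio, a detail the text leaves open.

```python
import random

def mutate_portfolio(portfolio, long_pool, short_pool,
                     mutation_rate=0.03, n_trials=10, rng=random):
    """Apply n_trials mutation attempts to a copy of the portfolio.

    Each attempt picks one position at random, draws a number in [0, 1),
    and replaces the stock if the number is below mutation_rate. Long
    positions are refilled from long_pool, short positions from short_pool.
    """
    mutated = list(portfolio)
    for _ in range(n_trials):
        pos = rng.randrange(len(mutated))
        if rng.random() < mutation_rate:
            pool = long_pool if pos % 2 == 0 else short_pool
            mutated[pos] = rng.choice(pool)
    return mutated
```

With a 3% rate and 10 attempts, most offspring portfolios pass through unchanged, and on average about 0.3 stocks per portfolio are replaced.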

Weight Determination of Portfolio Constituents and Portfolio Fitness
Our portfolios are constructed as 130-30 funds, in which 130% of the starting capital is allocated to long positions, financed by taking in 30% of the starting capital from shorting stocks. Therefore, each of our portfolios has 130% leverage built in. A 130-30 fund is a "long-short" approach often lumped into long-short equity mutual funds [13] [14]. As mentioned previously, for our portfolios, stocks with ranks 9 and 10 are sold short and stocks with ranks 1 and 2 are purchased. Stocks in other ranks are not taken into consideration.
The Markowitz Mean-Variance Model [4] is applied to determine the optimized dollar weight for each stock in a portfolio. Specifically, Monte-Carlo Simulation is used to generate 10,000 random weight combinations for the 10 stocks in a portfolio. With 10,000 weights, each portfolio has 10,000 different fitness values (Sharpe Ratios). The highest Sharpe Ratio is selected as the portfolio's fitness, and the corresponding weights are the optimal weights for the stocks in the portfolio. The same process is applied to all the 50 portfolios in each population.
The calculation proceeds as follows. Since each portfolio contains 10 stocks, a 10 × 10 matrix is built to store the covariance between every pair of stocks in the portfolio. The 10,000 random weight combinations form a 10 × 10,000 matrix, in which each column has 10 rows, one weight per stock.
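The Monte Carlo weight search described above can be sketched in plain Python. This is a sketch under stated assumptions, not the authors' implementation. Random weights are drawn uniformly and rescaled so that long weights sum to +1.3 and short weights sum to -0.3. Even list indices are long positions. The Sharpe Ratio is annualized with 252 trading days. All function names are hypothetical.

```python
import math
import random

def covariance_matrix(returns):
    """Return (mean returns, k x k sample covariance) for k daily return series."""
    n_obs = len(returns[0])
    k = len(returns)
    means = [sum(r) / n_obs for r in returns]
    cov = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            c = sum((returns[i][t] - means[i]) * (returns[j][t] - means[j])
                    for t in range(n_obs)) / (n_obs - 1)
            cov[i][j] = cov[j][i] = c
    return means, cov

def random_130_30_weights(n_stocks, rng):
    """One random weight vector: longs sum to +1.3, shorts sum to -0.3 (assumed scheme)."""
    raw = [rng.random() for _ in range(n_stocks)]
    long_total = sum(raw[0::2])
    short_total = sum(raw[1::2])
    return [1.3 * x / long_total if i % 2 == 0 else -0.3 * x / short_total
            for i, x in enumerate(raw)]

def best_weights(returns, n_sims=10_000, risk_free_rate=0.0, seed=0):
    """Try n_sims random weight vectors and keep the one with the highest Sharpe Ratio."""
    rng = random.Random(seed)
    means, cov = covariance_matrix(returns)
    k = len(returns)
    best_w, best_sharpe = None, float("-inf")
    for _ in range(n_sims):
        w = random_130_30_weights(k, rng)
        mu = sum(w[i] * means[i] for i in range(k))  # expected daily return
        var = sum(w[i] * cov[i][j] * w[j] for i in range(k) for j in range(k))
        if var <= 0:
            continue
        sharpe = (mu - risk_free_rate / 252) / math.sqrt(var) * math.sqrt(252)
        if sharpe > best_sharpe:
            best_w, best_sharpe = w, sharpe
    return best_w, best_sharpe
```

The returned Sharpe Ratio would serve as the portfolio's fitness, and the returned weights as its dollar weights. The net exposure of each weight vector is 1.3 - 0.3 = 1.0 of the starting capital.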

Our Way of Portfolio Rebalancing
The back-test period is the entire year of 2020. This is a weekly rebalanced strategy, where we open positions on the first trading day of the week and exit on the last trading day of the week.
Each week, our GA method generates 20 populations, based on the daily price data from the first trading day of 2019 to the last trading day before the rebalancing week, as described in Section 2.2. Each population has 50 portfolios, and each portfolio has 10 stocks, with 5 stocks from groups 1 and 2 as long positions and 5 stocks from groups 9 and 10 as short positions. The best portfolio from each population is selected, and 5% of our current capital is invested in each portfolio with 130% leverage. The optimized weight of each stock in each portfolio is determined by the Mean-Variance Model with Monte Carlo simulation, as described in Section 2.3. All the long and short positions are closed out on the last trading day of the week.
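The capital bookkeeping implied by this weekly scheme can be illustrated as follows. The sketch assumes that each of the 20 selected portfolios receives 5% of current capital, that a portfolio's weekly return is the weighted sum of its stocks' weekly returns (short positions carry negative weights), and that transaction costs are ignored. The function names are hypothetical.

```python
def portfolio_return(weights, stock_returns):
    """Weekly return of one portfolio: weighted sum of stock returns.

    Short positions have negative weights, so a falling stock adds to the return.
    """
    return sum(w * r for w, r in zip(weights, stock_returns))

def weekly_fund_return(portfolio_returns, allocation=0.05):
    """Fund return for one week with `allocation` of capital in each portfolio."""
    return sum(allocation * r for r in portfolio_returns)

def compound_capital(start_capital, weekly_portfolio_returns, allocation=0.05):
    """Capital path over successive weeks; positions are closed at each week's end."""
    capital = start_capital
    path = [capital]
    for week in weekly_portfolio_returns:
        capital *= 1.0 + weekly_fund_return(week, allocation)
        path.append(capital)
    return path
```

For instance, if all 20 portfolios gain 2% in the first week and lose 1% in the second, a starting capital of 100 grows to 102 and then falls to 100.98.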

Funds for Comparison
Two common and simple strategies for S&P 500 stocks are: 1) current best performers will continue to be the future best performers; 2) current worst performers will become the future best performers. Therefore, on the first trading day of each week, we select the best and worst performers based on their cumulative return over the preceding week, and hold them during the week.
In addition, we create a 130-30 fund with the 5 best-performing stocks and the 5 worst-performing stocks and rebalance the fund weekly. We call this fund "WML" (Winner Minus Loser). Therefore, we have a total of 6 investment strategies to compare:
• The S&P500 ETF fund, SPY, which serves as the benchmark for market performance.
• The long-only fund with the 10 weekly best performers, named WIN in the following diagrams and tables.
• The long-only fund with the 10 weekly worst performers, named LOS.
• The simple 130-30 fund WML described above.
• The GA momentum fund with equal dollar weights, named GA-EW-MOM.
• The GA momentum fund with optimized weights, named GA-MOM.
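The weekly selection of best and worst performers for the comparison funds can be sketched as below. The input format (a dictionary mapping ticker to the previous week's daily returns) and the function names are assumptions.

```python
def cumulative_return(daily_returns):
    """Compound a list of daily returns into one cumulative return."""
    total = 1.0
    for r in daily_returns:
        total *= 1.0 + r
    return total - 1.0

def pick_winners_losers(last_week_returns, n=10):
    """Return (n best, n worst) tickers by last week's cumulative return."""
    ranked = sorted(last_week_returns,
                    key=lambda t: cumulative_return(last_week_returns[t]),
                    reverse=True)
    return ranked[:n], ranked[-n:]
```

Calling `pick_winners_losers(data, n=10)` would give the constituents of the two long-only funds, and `n=5` the long and short candidates of the simple 130-30 fund.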

Fund Performance Evaluation
Figure 5 and Table 1 report the performance of the funds. The following notes apply to Table 1:
1. The best value in each row is highlighted.
2. Total Return is the cumulative profit over all trading days in 2020.
3. Volatility is the annualized standard deviation of daily returns over all trading days in 2020.
4. Max Drawdown is the maximum loss from a local maximum, expressed as a percentage, annualized.

Both the table and the graph (Figure 5 and Table 1) indicate that our GA-MOM fund performed the best, having the highest Total Return, Alpha, Sharpe Ratio, and Information Ratio, while the Max Drawdown and Volatility were well controlled despite the several circuit breakers in 2020. The performance of GA-EW-MOM was close to that of GA-MOM, but the simple 130-30 momentum fund built without GA performed worst among the 6 funds on almost all metrics.
For both long funds, although the LOS fund (Long-only fund with 10 weekly worst performers) lost considerably during March, it rebounded quickly afterwards, and exceeded GA-MOM several times. However, its Volatility was extremely high, with several big Drawdowns. The WIN fund (Long-only fund with 10 weekly best performers) was shown to be a poor portfolio with respect to return.
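Several of the metrics used in this evaluation can be computed from daily return series as in the sketch below. The formulas are common textbook definitions and are assumptions here, since the paper does not list them: Total Return is compounded, Volatility and Information Ratio are annualized with 252 trading days, and Max Drawdown is reported as a fraction of the running peak without annualization.

```python
import math
import statistics

def total_return(daily_returns):
    """Cumulative (compounded) return over the whole period."""
    value = 1.0
    for r in daily_returns:
        value *= 1.0 + r
    return value - 1.0

def annualized_volatility(daily_returns, periods=252):
    """Annualized sample standard deviation of daily returns."""
    return statistics.stdev(daily_returns) * math.sqrt(periods)

def max_drawdown(daily_returns):
    """Largest fractional loss from a running peak of the cumulative value."""
    value, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def information_ratio(fund_returns, benchmark_returns, periods=252):
    """Annualized mean active return divided by tracking error."""
    active = [f - b for f, b in zip(fund_returns, benchmark_returns)]
    tracking_error = statistics.stdev(active)
    if tracking_error == 0:
        return 0.0  # fund tracks the benchmark exactly
    return statistics.mean(active) / tracking_error * math.sqrt(periods)
```

As a small check, daily returns of +10%, -50%, and +20% give a total return of -34% and a maximum drawdown of 50%.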

Quarterly Performance Evaluation
Although our momentum fund GA-MOM had the best cumulative return in 2020, other funds performed better in some quarters (Figure 6 and Table 2).
In the first quarter, our momentum fund GA-MOM performed the best, with well-controlled risk and Max Drawdown. This indicates that in a steeply declining and extremely volatile market, our momentum fund can be profitable and antifragile [15].
In the second quarter, the market recovered, and SPY earned about 9%. During this time, the past losers performed the best with respect to Total Return, Alpha, Sharpe Ratio, and Information Ratio. However, their Volatility and Beta were high, which means the high return of the past losers came mainly from the market. In contrast, our momentum fund GA-MOM was the second best, with low Max Drawdown and Beta. More importantly, as shown in Table 3, our GA-MOM fund has the lowest correlation coefficient, 0.180884, and the highest p-value, which means that the correlation between the GA-MOM fund and the market is the least significant. This is an important characteristic for a momentum fund in a volatile market.
In the third quarter, no portfolio beat the market. SPY stood out among all the funds. The market stabilized with reduced volatility and generally trended higher in the 3rd quarter. We believe in such a market situation, index funds could perform better than our momentum funds, which presents an interesting area for further research.
In the fourth quarter, all portfolios performed relatively well. Although our momentum fund GA-MOM did not achieve the best performance, it still had the lowest Beta, and its risk-adjusted return beat the market.

Summary
To summarize, although our GA momentum fund was not the best performer in every quarter, it performed best when the market was in its steepest decline and most volatile state of 2020. Furthermore, compared with the other funds, it had a relatively high Sharpe Ratio in most quarters. In other words, it accumulated profits through risk-adjusted returns. Therefore, in the end, it had the highest Total Return, Alpha, Sharpe Ratio, and Information Ratio, the lowest Beta, the second-lowest Max Drawdown, and the second-lowest Volatility.
We also constructed an "Equally Dollar-Weighted Momentum" fund (GA-EW-MOM) to compare with our optimized-weight "Momentum" fund (GA-MOM). As presented in the above diagrams and tables, the trends of these two funds' returns are very similar, and most metrics are also close. However, the optimized portfolio (GA-MOM) improves the Total Return by about 20% in 2020, which demonstrates the effect of weight optimization.

CONCLUSION AND DISCUSSION
This paper has applied Genetic Algorithms in selecting and optimizing S&P500 investment portfolios for a 130-30 momentum fund. The portfolio's stock weights are optimized through Mean-Variance Model and Monte Carlo simulation. The fitness of each portfolio is measured by its Sharpe Ratio. Through portfolio selection, crossover, and mutation, the "better" portfolios are preserved through multiple generations while abandoning the "worse" portfolios. In the end, the portfolio with the highest fitness in the last generation is added to our fund. The same strategy is repeated until the 20 best portfolios are selected with 5% investment funding for each portfolio.
From the results of the back-testing, we conclude that a simple 130-30 fund fails to beat the market, while our GA-constructed 130-30 fund excels in total return as well as risk-adjusted return. This indicates that the application of Genetic Algorithms could be a useful and effective approach in portfolio optimization.
In our current studies, we have not included transaction costs for the weekly portfolio rebalancing in back-testing, as we assume the same transaction costs apply to all the funds compared in Section 3.2. We are further investigating the robustness of our results by including transaction costs of 0.1% or 0.5% of the total trade value. In addition, we are expanding the time span of our studies to 22 years, from the year 2000 to the current date, and trying longer rebalancing periods, such as monthly and every 3, 6, and 12 months.
In the future, more fitness functions and a strategy to avoid short squeezes will be tested to further increase portfolio robustness and reduce maximum drawdown in a volatile market. In addition, it would be interesting to find out under what market conditions passive investment strategies such as ETFs and index funds would perform better than our momentum fund.
All the coding is done in C++ and Python. A SQLite database is built for data persistence. All the source code and the database are available upon request.

ACKNOWLEDGMENTS
We thank Dr. Ronald T. Slivka of the Department of Finance & Risk Engineering, NYU Tandon School, for his guidance in reviewing drafts and making critical revisions of this paper. We also appreciate Evan Tang's help in editing and proofreading our manuscript.