Prediction of stock price movement using an improved NSGA-II-RF algorithm with a three-stage feature engineering process

Predicting stock prices has long been an active topic in artificial intelligence. In recent years, computational intelligence methods such as machine learning and deep learning have been explored for building prediction systems. However, accurately predicting the direction of stock price movement remains a major challenge because stock prices are driven by nonlinear, nonstationary, and high-dimensional features. Previous work has largely overlooked feature engineering, even though selecting the optimal set of features that affect stock prices is a promising solution. Our motivation in this article is therefore to propose an improved many-objective optimization algorithm integrating random forest (I-NSGA-II-RF) with a three-stage feature engineering process, in order to reduce computational complexity and improve prediction accuracy. The model optimizes in two directions: maximizing accuracy and minimizing the size of the selected feature subset. The I-NSGA-II algorithm is initialized with a population that integrates information from two filter feature selection methods, and it uses multi-chromosome hybrid coding to select features and optimize model parameters simultaneously. Finally, the selected feature subset and parameters are fed to the RF for training, prediction, and iterative optimization. Experimental results show that, compared with the unmodified multi-objective feature selection algorithm and a single-objective feature selection algorithm, the I-NSGA-II-RF algorithm achieves the highest average accuracy, the smallest optimal feature subset, and the shortest running time. Compared with deep learning models, the proposed model is interpretable, more accurate, and faster.


Introduction
Stock market forecasting is one of the most challenging research topics in finance. It is difficult to build a model or system that predicts the direction of stock prices with high accuracy because stock prices are typically nonlinear, highly noisy, and dynamically changing [1][2][3]. With recent advances in AI, many researchers have applied machine learning (ML) and deep learning (DL) models to stock price forecasting. ML methods include decision trees, back propagation (BP) neural networks, genetic algorithms (GA), support vector machines (SVM), and random forest (RF) [4,5]. DL methods include recurrent neural networks (RNN), convolutional neural networks (CNN), feedforward neural networks (FFNN), and others [6]. Multi-objective optimization algorithms such as the Non-dominated Sorting Genetic Algorithm II (NSGA-II) have also been widely used in many practical fields [7][8][9][10].
Kara et al. studied the prediction performance of SVM and BP neural network models on the stock market and found that the BP model achieved higher accuracy than the SVM model [11]. Krauss et al. compared RF, DNN, and gradient-boosted tree models for forecasting stock prices [12], and their experimental results indicated that RF performed best. Basak et al. applied two tree-based classifiers, RF and extreme gradient boosting (XGBoost), to predict stock price direction; their results showed that the tree-based models were more accurate than SVM, logistic regression, and ANN models [1,13]. Ampomah et al. proposed tree-based ensemble classifiers to forecast stock price movement [14]. RNNs have also been widely applied [15]. Long short-term memory (LSTM), a successful variant of RNN, has proven to be a prominent model for forecasting financial time series [16].
Like DL methods, XGBoost has proven to be an outstanding algorithm in numerous machine learning tasks since its introduction in 2014 [17]. Although ML and DL models have achieved strong performance, many studies that propose such models do not consider the role of feature engineering in improving prediction accuracy [18]. As mentioned above, stock prices are affected by complex, high-dimensional features, so feature engineering is necessary: it enables prediction algorithms to select optimal feature sets and thereby improve computational efficiency [17]. In stock price movement prediction, the feature engineering architecture is mainly composed of a feature set expansion module, a classifier module, and a feature selection module. According to how candidate feature subsets are evaluated, feature selection methods can be roughly divided into two categories: filter methods and wrapper methods [19]. Filter methods select features based on their statistical characteristics; they are fast to compute but relatively less accurate. Wrapper methods use a classifier to evaluate candidate feature subsets and select the optimal one; they are more accurate but computationally less efficient.
In recent years, a number of effective multi-objective approaches have been designed for optimizing feature engineering. Motivated by this, we propose an improved many-objective optimization algorithm integrating random forest (I-NSGA-II-RF) with a three-stage feature engineering process. The contributions of our study are as follows. (1) Predicting stock price direction is a complex nonlinear problem that requires considering many features and factors, as well as their interactions and influences. An algorithm that can optimize multiple objectives simultaneously, namely maximizing classification accuracy and minimizing the number of features, is therefore needed. (2) The I-NSGA-II-RF algorithm is a hybrid of a multi-objective genetic algorithm and random forest. It can search the high-dimensional feature space effectively, find optimal or near-optimal feature subsets, and simultaneously optimize the key parameters of the random forest, improving prediction performance and reducing computational complexity. (3) The I-NSGA-II-RF algorithm also adopts several improvement strategies, such as combining filter and wrapper methods, hybrid initialization, and an external archive mechanism, to improve the convergence speed of the search and the diversity and quality of its solutions.
The novelty of the proposed model is as follows. First, the original stock price datasets are denoised and technical indicators are generated through a three-stage feature engineering approach. Second, the proposed model combines the advantages of the two types of feature selection methods into a stock prediction system that balances efficiency and performance. The NSGA-II-RF algorithm is a wrapper method: it improves classification performance by removing irrelevant and redundant features and selecting a suitable feature subset. In the I-NSGA-II initialization stage, two filter methods are combined to preselect the feature population, further improving the efficiency of the algorithm. Third, the stock forecasting model combines the multi-objective optimization algorithm (NSGA-II) with the random forest algorithm (RF) to select features and predict the direction of stock price change.
The rest of this paper is organized as follows. Section 2 introduces related work. Section 3 describes the methodology and the proposed stock price direction prediction system. Section 4 provides experimental results and comparison. Section 5 concludes this study.

Related work
In the past decade, multiple methodologies have been proposed to predict stock prices, including computational intelligence methods. However, these methods still need to improve their prediction efficiency. Because stock price data are highly noisy and nonlinear, some studies focus on feature engineering to enhance the performance of computational intelligence models.
Feature selection methods each have advantages and disadvantages. Filter methods are independent of the classifier and use specific evaluation criteria to assess features. These criteria include statistical methods [20,21] (e.g., Student's t-test and the Chi-square test), information theory-based methods [20] (e.g., entropy and Kullback-Leibler divergence), other search techniques [22] (e.g., correlation-based feature selection), and the Markov blanket filter [23]. Filter methods are simple and efficient. However, they may remove features that are relevant to a class label with few instances, so their classification accuracy is relatively low [24]. Unlike filter methods, wrapper methods use specific classifiers as evaluation functions, so the approach is usually modeled as an optimization problem. Wrapper methods include the cuttlefish optimization algorithm [25], a hybrid filter/wrapper algorithm for feature selection [26], multi-objective task scheduling [27], and so on. These methods depend on the classifier and use classification accuracy as feedback to evaluate candidate feature subsets. Wrapper methods can therefore achieve very high classification accuracy [25][26][27], but they are time-consuming. In particular, wrapper methods have been applied to select features for predicting stock price direction [17]. Wrapper-based algorithms usually formulate feature selection as a single-objective optimization problem [20][21][22]. Our motivation is therefore to combine the advantages of the two types of feature selection methods and propose a stock prediction system that balances efficiency and performance.
Besides high classification accuracy, other objectives, such as finding an optimal (smaller) feature set, are also important for evaluating feature selection, but past work has ignored them. In recent years, a number of effective multi-objective approaches have been designed for optimizing feature engineering. For example, Shone et al. introduced a hybrid model integrating deep learning and shallow learning [28]. Oliveira et al. applied NSGA to feature selection [29]. Other researchers used NSGA-II to build efficient feature selection models [30,31]. Furthermore, an improved algorithm (I-NSGA-III) has been used to select optimal feature subsets with superior performance [32].
Evolutionary algorithms are robust, widely applicable global optimization methods with clear advantages in feature selection. Many studies have used evolutionary algorithms to formulate feature selection as a single-objective optimization problem [33][34][35] or a multi-objective optimization problem [36][37][38]. Many researchers have improved the performance of evolutionary algorithms by adjusting parameters or designing new solution generation methods [39][40][41]. Population initialization is an important part of an evolutionary algorithm and greatly affects the convergence speed and the quality of the final solution. Improving the population initialization method is a novel way to improve the performance of evolutionary algorithms on feature selection problems [42][43][44]. Xue et al. [45] proposed three new initialization strategies, of which the hybrid initialization strategy is the most effective. Filter methods have also been introduced into population initialization to improve the performance of evolutionary algorithms on feature selection problems [46]. Moreover, Kawamura et al. [47] proposed a method combining a filter (correlation-based feature selection) and a wrapper (binary particle swarm optimization) to select the optimal feature subset. However, these two methods simply use the filter method to evaluate and filter the features of the original data set [46,47].
Combining feature selection with multi-objective feature optimization has therefore been overlooked in past work. To tackle these weaknesses, we propose an improved many-objective optimization algorithm integrating random forest (I-NSGA-II-RF) with a three-stage feature engineering process. The advantages of our approach are as follows. (1) Improvement strategies are explored. Our scheme offers two significant additional strategies, namely an improved multi-objective scheme and a predefined multiple targeted search. As a result, the proposed algorithm achieves higher prediction accuracy and lower computational time than other DL methods. (2) Feature selection is addressed. DL methods suffer from a "black box" problem because they cannot evaluate the importance of features [41]. Our scheme uses an improved NSGA-II to handle many-objective feature selection, which identifies the most important features for subsequent research. (3) Multiple objectives are achieved. The prediction model has two objectives: higher accuracy and lower computational complexity. Feature engineering selects optimal feature sets that remove irrelevant and redundant features, reduce unnecessary computational overhead, and improve the predictive performance of the model.
We propose a stock price direction prediction system that applies a hybrid multi-objective evolutionary algorithm with an improved feature engineering mechanism. The system aims to maximize prediction performance while minimizing the number of features, thereby reducing the amount of computation. In the initialization stage of NSGA-II, an improved scheme integrating the filter and wrapper methods is used to preselect the feature population. The NSGA-II algorithm then continuously generates new solutions that adjust the feature subset and the structure of the random forest classifier. Finally, the solution size and prediction accuracy of the new solutions are used as evaluation indicators to select the optimal feature subset.

A scheme of stock price direction prediction
Forecasting the direction of stock price change can be regarded as a binary classification problem [1]. Let $X_t$ denote the set of input features on day $t$. By comparing the closing prices of day $t$ and day $t+1$, $Y_{t+1}$ is defined as a binary value 0 or 1, as expressed in Eq (1):

$$Y_{t+1} = \begin{cases} 1, & C_{t+1} > C_t \\ 0, & C_{t+1} \le C_t \end{cases} \qquad (1)$$

where $C_t$ and $C_{t+1}$ are the closing prices on two consecutive days. If the price on day $t+1$ is higher than that on day $t$, the direction is up and the value is 1. Otherwise, the direction is down and the value is 0. The predicted direction of the closing price on day $t+1$, $Y'_{t+1}$, is a nonlinear function of the input features $X_t$:

$$Y'_{t+1} = f(X_t) \qquad (2)$$

where $f(\cdot)$ is a nonlinear function that maps the input features $X_t$ on day $t$ to the predicted direction, as shown in Fig 2. The state of the input features $X_t$ is affected by the $n$ previous states of prices, which can be regarded as a Markov process. For $t > n$, the process can be expressed as:

$$p(X_t \mid X_{t-1}, X_{t-2}, \ldots, X_1) = p(X_t \mid X_{t-1}, X_{t-2}, \ldots, X_{t-n}) \qquad (3)$$

where $p(A \mid B)$ is the conditional probability of event $A$ given event $B$. A classical (first-order) Markov process assumes that the future state is conditionally independent of the past states given the present state. Therefore, only the current state affects the next state, as shown in Eq (4):

$$p(X_t \mid X_{t-1}, X_{t-2}, \ldots, X_1) = p(X_t \mid X_{t-1}) \qquad (4)$$
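As a minimal sketch, the labelling rule of Eq (1) can be written in Python with NumPy. The function name `direction_labels` is hypothetical and is not taken from the original work. The sketch assumes an array of daily closing prices ordered by date and, following Eq (1), treats an unchanged price as 0 (down).

```python
import numpy as np

def direction_labels(close):
    """Return Y_{t+1} for t = 0..len(close)-2 following Eq (1).

    Y_{t+1} = 1 if C_{t+1} > C_t, otherwise 0 (an unchanged price counts as 0).
    """
    close = np.asarray(close, dtype=float)
    return (close[1:] > close[:-1]).astype(int)

if __name__ == "__main__":
    closes = [100.0, 101.5, 101.5, 99.0, 102.0]
    print(direction_labels(closes))  # [1 0 0 1]
```

Each label pairs the features $X_t$ of day $t$ with the direction on day $t+1$, so the last trading day in a series has no label.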
However, both external and internal factors affect the current price of a stock, and stock time series are nonlinear and highly noisy. The current daily price is affected by the states of the previous $n$ days, so the five input features of day $t$ in the historical data set ('High', 'Low', 'Open', 'Close', and 'Volume') are of limited use on their own for predicting the stock price direction on day $t+1$. Therefore, the first stage of the proposed I-NSGA-II-RF stock price direction prediction system expands the feature set by generating technical indicators, as shown in Fig 2. The prediction for day $t+1$ is then treated as a nonlinear function of the day-$t$ input features, which comprise the original historical data and the technical indicators.

Feature set expansion by generating technical indicators
Given the Markovian $n$-state memory property described in Section 3.1, it is reasonable to combine historical data and technical indicators to predict the stock price direction on day $t+1$. Generating technical indicators expands the input feature set to gain "the blessing of dimensionality": a high-dimensional data set can capture the Markovian $n$ states and thus help predict the stock price direction effectively [48]. In addition, two macroeconomic indicators are used: the exchange rate and the interest rate. Because the US dollar plays the most important role in the international monetary market, the US dollar index is used as the proxy for the exchange rate in this study. For the interest rate, the interbank offered rate in each market is an appropriate proxy [49]. The federal funds rate in the USA, the Tokyo Interbank Offered Rate (TIBOR), the Hong Kong Interbank Offered Rate (HIBOR), the Shanghai Interbank Offered Rate (SHIBOR), and the Mumbai Interbank Offered Rate (MIBOR) are used as macroeconomic indicators. The details are shown in Table 1.
In total, 81 technical indicators are generated from the original 5 input features with TA-Lib to obtain the expanded feature set. TA-Lib has become popular among market traders and researchers for the technical analysis of financial time series [17]. All technical indicators in this study are therefore computed with the functions provided by the TA-Lib library (TA-Lib, 202)). Each generated technical indicator belongs to one of six function groups: Overlap Studies, Momentum Indicators, Volume Indicators, Volatility Indicators, Price Transform, and Cycle Indicators [17].
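TA-Lib computes each indicator over a look-back window. To illustrate what such an indicator is, the sketch below implements two simple ones in plain NumPy: a simple moving average (Overlap Studies) and momentum (Momentum Indicators). Both use a 14-day window, matching the look-back period mentioned in the experiments. This is not the TA-Lib code itself. The functions `sma` and `momentum` are hypothetical stand-ins that should correspond to TA-Lib's `SMA` and `MOM` for the same `timeperiod`.

```python
import numpy as np

def sma(close, period=14):
    """Simple moving average of the last `period` closes; the first period-1 values are NaN."""
    close = np.asarray(close, dtype=float)
    out = np.full(close.shape, np.nan)
    csum = np.cumsum(np.insert(close, 0, 0.0))  # csum[k] = sum of the first k closes
    out[period - 1:] = (csum[period:] - csum[:-period]) / period
    return out

def momentum(close, period=14):
    """Momentum C_t - C_{t-period}; the first `period` values are NaN."""
    close = np.asarray(close, dtype=float)
    out = np.full(close.shape, np.nan)
    out[period:] = close[period:] - close[:-period]
    return out

if __name__ == "__main__":
    prices = np.arange(20.0)
    print(sma(prices, 3)[:4])
    print(momentum(prices)[13:16])
```

Every windowed indicator produces leading NaN values during its warm-up period. This is why rows with null values have to be removed after the feature set is expanded.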

Data preparation: Wavelet transform, data cleaning and normalization
Stock price time series are highly noisy, nonlinear, and non-stationary, so data preparation is a key stage of feature engineering. In recent studies, some researchers have applied the wavelet transform (WT) to extract multi-frequency features of time series as a data preparation step for forecasting stock indexes [50,51]. WT can effectively decompose historical data into different frequency bands. The bands do not overlap one another, and together they cover the full frequency range of the original time series. Historical stock price data consist of continuous variables with different measurement units (for example, volume versus closing price), and some technical indicators are rates. Data normalization is therefore a crucial step that transforms all input data into a homogeneous numerical array, which is then fed into the I-NSGA-II-RF algorithm. Eq (5) is used to rescale the differently scaled features:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (5)$$

where $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the feature. This linear rescaling preserves the relationships among the values of each feature, so it avoids biasing the model toward features with larger numeric ranges [52].
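The text states that pywt is used for wavelet denoising but does not give the wavelet family, decomposition level, or thresholding rule. The following sketch therefore only illustrates the idea under stated assumptions. It uses a Haar wavelet written directly in NumPy (so it runs without pywt), a two-level decomposition, and soft thresholding of the detail coefficients with the universal threshold $\sigma\sqrt{2 \ln N}$, where $\sigma$ is estimated from the finest detail coefficients. With pywt, the same steps would typically be performed with `pywt.wavedec`, `pywt.threshold`, and `pywt.waverec`.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt(x):
    """One level of the orthonormal Haar transform (len(x) must be even)."""
    even, odd = x[0::2], x[1::2]
    return (even + odd) / SQRT2, (even - odd) / SQRT2  # approximation, detail

def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / SQRT2
    x[1::2] = (approx - detail) / SQRT2
    return x

def soft_threshold(c, thr):
    """Shrink coefficients toward zero by thr (soft thresholding)."""
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)

def wavelet_denoise(signal, level=2):
    """Denoise a 1-D series by thresholding its Haar detail coefficients (assumed settings)."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    pad = (-n) % (2 ** level)                       # pad so every level has even length
    x = np.concatenate([x, np.full(pad, x[-1])])    # repeat the last value as padding
    approx, details = x, []
    for _ in range(level):
        approx, d = haar_dwt(approx)
        details.append(d)                           # details[0] is the finest level
    sigma = np.median(np.abs(details[0])) / 0.6745  # robust noise estimate
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))     # universal threshold
    for d in reversed(details):                     # rebuild from the coarsest level
        approx = haar_idwt(approx, soft_threshold(d, thr))
    return approx[:n]
```

In practice, the wavelet family and level would be chosen to match the behaviour of the price series; a smoother wavelet such as a Daubechies wavelet is a common alternative to Haar.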

Optimal feature selection by the I-NSGA-II-RF algorithm
The original historical data and the generated technical indicators together constitute 81-dimensional data, which are used as input features in this study. However, high-dimensional data may slow down classification and even lower classification accuracy, so it is necessary to obtain optimal low-dimensional feature subsets from the high-dimensional space. Feature selection and feature extraction are the two commonly used methods for finding a suitable feature subset. Feature extraction transforms the existing features into new ones, and this transformation loses the interpretability and understandability of the original data. Feature selection, by contrast, only chooses the most suitable subset of the existing features. We therefore propose a stock price direction prediction system that considers both efficiency and performance. The I-NSGA-II-RF algorithm combines the advantages of filter and wrapper methods. It is a wrapper-based method, which improves the performance of the classifier by removing irrelevant and redundant features. In the initialization phase of I-NSGA-II, two filter methods are integrated to preselect the feature population, which further improves the efficiency of the algorithm.

PLOS ONE

Random forest algorithm.
Random forest is an ensemble learning method whose result is produced by the votes of many decision trees. Because the trees can be built in parallel, computation is fast, and the method is not prone to overfitting. This makes it well suited to large-scale classification problems [52]. Its basic training process is to draw multiple bootstrap samples from the data set by sampling with replacement, build a decision tree on each sample, and obtain the final prediction by majority voting over all trees. A random forest consists of a group of decision trees $h_1(x), h_2(x), \ldots, h_n(x)$. The margin function of RF is expressed as follows:

$$\mathrm{marg}(X, Y) = \mathrm{avg}_k\, I\left(h_k(X) = Y\right) - \max_{j \ne Y} \mathrm{avg}_k\, I\left(h_k(X) = j\right) \qquad (8)$$

where $\mathrm{marg}(\cdot)$ is the margin function, $I(\cdot)$ is the indicator function, $\mathrm{avg}_k(\cdot)$ averages over the $n$ trees, $j$ is a wrong category, $X$ is the input vector, $Y$ is the correct category, and $n$ is the number of trees. Eq (8) measures how far the average vote for classifying the input $X$ into the correct category $Y$ exceeds the largest average vote for any wrong category $j$. A larger margin means a more confident correct classification. The generalization error is

$$E^* = P_{X,Y}\left(\mathrm{marg}(X, Y) < 0\right)$$

By the law of large numbers, $E^*$ converges to a fixed limiting value as the number of trees increases, so adding more trees does not make the forest overfit. Because RF samples with replacement, each bootstrap sample contains about 63% of the distinct training samples. The remaining roughly 37%, which are never drawn, are called out-of-bag (OOB) data. RF uses the OOB data to evaluate generalization ability, and the resulting error is the out-of-bag error $B_{error}$.
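To make the margin function and the 63%/37% bootstrap split concrete, here is a small NumPy sketch. It computes the margin of Eq (8) from a matrix of per-tree predictions. It also gives the expected fraction of distinct samples in a bootstrap sample of size $n$, which is $1 - (1 - 1/n)^n$ and tends to $1 - 1/e \approx 0.632$. Function names are illustrative only, and the per-tree predictions in the example are made up.

```python
import numpy as np

def margin(tree_preds, y_true, n_classes):
    """Margin of Eq (8) for every sample.

    tree_preds: (n_trees, n_samples) class labels predicted by each tree h_k.
    y_true:     (n_samples,) correct class labels Y.
    """
    tree_preds = np.asarray(tree_preds)
    y_true = np.asarray(y_true)
    cols = np.arange(tree_preds.shape[1])
    # vote_frac[c, i] = avg_k I(h_k(X_i) = c)
    vote_frac = np.stack([(tree_preds == c).mean(axis=0) for c in range(n_classes)])
    correct = vote_frac[y_true, cols]
    wrong = vote_frac.copy()
    wrong[y_true, cols] = -np.inf          # exclude the correct class from the max
    return correct - wrong.max(axis=0)

def expected_in_bag_fraction(n):
    """Expected share of distinct samples in a bootstrap sample of size n."""
    return 1.0 - (1.0 - 1.0 / n) ** n

if __name__ == "__main__":
    preds = [[1, 0], [1, 0], [1, 1], [0, 1], [1, 1]]   # 5 trees, 2 samples
    print(margin(preds, [1, 0], n_classes=2))          # approximately [0.6, -0.2]
    print(expected_in_bag_fraction(1000))              # close to 1 - 1/e
    rng = np.random.default_rng(0)
    idx = rng.integers(0, 10_000, size=10_000)          # one bootstrap draw
    print(len(np.unique(idx)) / 10_000)                # roughly 0.63; the rest is OOB
```

A negative margin (the second sample) means that most trees vote for a wrong class. $E^*$ is the probability of that event.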
3.4.2 I-NSGA-II based on filtering and external archiving. Many indicators are input into the RF algorithm, which may cause high correlation among indicators and make it difficult to distinguish their importance, degrading the performance of the classifier. I-NSGA-II is used to improve RF by deleting irrelevant and redundant indicators and optimizing the combination of technical indicators. The improvements also include a better population initialization method for NSGA-II and an external archiving mechanism. These strategies maintain the diversity of solutions and improve the performance of the multi-objective evolutionary algorithm. They also optimize the key parameters of the classifier at the same time as the feature subset, which improves the prediction performance of the classifier.

Fitness evaluation.
Multi-objective optimization seeks the decision vector $X^*$ with the optimal fitness, that is, the vector that minimizes the objective function vector

$$\min F(X) = \left(f_1(X), f_2(X), \ldots, f_m(X)\right)$$

subject to the constraints on the decision vector

$$g_i(X) \le 0,\ i = 1, \ldots, p; \qquad h_j(X) = 0,\ j = 1, \ldots, q$$

where $g_i$ and $h_j$ are the inequality and equality constraints. Unlike a single-objective problem, a multi-objective problem generally has no single $X$ in the decision space that is optimal for all objectives, because the objectives often conflict with each other. In multi-objective feature selection, the optimal decision vectors are the solutions with higher classification accuracy and fewer features. Therefore, the number of selected features and the classification accuracy are the two objectives of the algorithm proposed in this study.
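The objective function for the number of selected features is expressed as follows:

$$f_1(Z) = \sum_{d=1}^{D} Z_d$$

Eq (12) gives the classification accuracy:

$$f_2(Z) = \frac{N_{Car}}{N_{All}} \qquad (12)$$

where $Z$ is the decoded solution (with $Z_d = 1$ if feature $d$ is selected and 0 otherwise), $D$ is the dimension (number of features), $N_{Car}$ is the number of correctly predicted samples, and $N_{All}$ is the total number of samples. The algorithm minimizes $f_1$ and maximizes $f_2$.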
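For the two objectives, the number of selected features (minimized) and the accuracy (maximized), Pareto dominance can be sketched as follows. One solution dominates another if it is no worse on both objectives and strictly better on at least one. The helper names are hypothetical, and the sample counts in the example are illustrative numbers, not results from the study.

```python
def objectives(feature_mask, n_correct, n_all):
    """Return (f1, f2): number of selected features (minimize) and accuracy (maximize)."""
    return sum(feature_mask), n_correct / n_all

def dominates(a, b):
    """True if solution a Pareto-dominates b, where each is (n_features, accuracy)."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better

if __name__ == "__main__":
    a = objectives([1, 0, 1, 1, 0, 0, 1, 1], n_correct=234, n_all=390)  # (5, 0.6)
    b = objectives([1, 1, 1, 1, 1, 1, 1, 1], n_correct=226, n_all=390)  # (8, ~0.579)
    c = objectives([1, 0, 0, 1, 0, 0, 1, 0], n_correct=215, n_all=390)  # (3, ~0.551)
    print(dominates(a, b), dominates(a, c), dominates(c, a))  # True False False
```

Here `a` and `c` are mutually non-dominated: `c` uses fewer features but is less accurate. This is exactly the trade-off the multi-objective search keeps on its Pareto front.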

Hybrid initialization.
The I-NSGA-II algorithm proposed in this study performs feature selection and classifier structure optimization simultaneously. Its improvement strategies accelerate convergence, shorten the running time, and enhance the performance of the algorithm.
As shown in Fig 3, multi-chromosome mixed coding is adopted. The first chromosome encodes all features, so its length equals the number of features in the data. A value of 0 means that the feature is not selected, and a value of 1 means that the feature is selected and retained. The second chromosome encodes the key parameters of the random forest classifier. In Fig 3, e is the number of trees in the random forest, f is the maximum number of features considered at each split, and d is the maximum depth. A random forest model is then trained with the selected feature subset and classifier parameters. The number of selected features and the accuracy are used to evaluate each solution and to guide the search toward better solution sets, driving the individuals in the population toward the global optimum.
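The two-chromosome encoding can be sketched as a binary feature mask plus an integer parameter vector $(e, f, d)$. The search ranges in `PARAM_RANGES` are hypothetical, because the text does not report them. The parameter names are an assumption as well: they mirror the `n_estimators`, `max_features`, and `max_depth` arguments of scikit-learn's `RandomForestClassifier`, so a decoded individual could be used as `RandomForestClassifier(**rf_params).fit(X[:, selected], y)`. The paper does not name its RF implementation.

```python
import numpy as np

# Hypothetical search ranges for the second chromosome (e, f, d); not reported in the text.
PARAM_RANGES = {"n_estimators": (10, 200), "max_features": (1, 20), "max_depth": (2, 30)}

def random_individual(n_features, rng):
    """Create one two-chromosome individual: a binary feature mask and RF parameters."""
    feature_chrom = rng.integers(0, 2, size=n_features)              # 1 = feature kept
    param_chrom = np.array([rng.integers(lo, hi + 1) for lo, hi in PARAM_RANGES.values()])
    return feature_chrom, param_chrom

def decode(feature_chrom, param_chrom):
    """Map an individual to selected feature indices and a random forest parameter dict."""
    selected = np.flatnonzero(feature_chrom)
    params = {name: int(v) for name, v in zip(PARAM_RANGES, param_chrom)}
    # f (max_features) cannot exceed the number of selected features.
    params["max_features"] = max(1, min(params["max_features"], len(selected)))
    return selected, params

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    features, params = random_individual(81, rng)
    selected, rf_params = decode(features, params)
    print(len(selected), rf_params)
```

Clipping `max_features` during decoding keeps every individual valid after crossover or mutation shrinks the feature subset, so no repair step is needed.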
Filter methods are independent of the classifier and use specific evaluation criteria to assess features. Statistical methods such as the Chi-square test can analyze the correlation between categorical variables [20][21][22]. Chi-square is therefore adopted in the initialization phase of the NSGA-II algorithm. Features are first evaluated with Chi-square. The features most relevant to the target are then extracted from all feature variables, and weakly relevant features are discarded, so that the population is initialized with better candidates [53]. The equation is as follows:

$$\chi^2 = \sum \frac{(o - e)^2}{e}$$

where $o$ is the observed (actual) frequency and $e$ is the expected (theoretical) frequency under the assumption that the feature variable $x$ is independent of the target variable $y$. First, the Chi-square value of each feature variable with respect to the target variable is computed with this equation. The Chi-square values are then sorted, and the top-ranked features are selected. These features are more strongly correlated with the target variable $y$ [54].
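The Chi-square ranking can be sketched in NumPy. To our understanding, this formulation mirrors how scikit-learn's `sklearn.feature_selection.chi2` scores non-negative features. The observed frequencies are per-class feature totals, and the expected frequencies assume that the feature is independent of the class. The sketch requires non-negative features, which holds here because all values are normalized to [0, 1]. The function name `chi2_scores` is hypothetical.

```python
import numpy as np

def chi2_scores(X, y):
    """Chi-square statistic of each non-negative feature against class labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)   # one-hot, (n_samples, n_classes)
    observed = Y.T @ X                                    # o: per-class feature totals
    class_prob = Y.mean(axis=0)                           # P(class)
    feature_total = X.sum(axis=0)
    expected = np.outer(class_prob, feature_total)        # e under independence
    return ((observed - expected) ** 2 / expected).sum(axis=0)

if __name__ == "__main__":
    X = [[1, 0], [1, 1], [0, 1], [0, 0]]
    y = [1, 1, 0, 0]
    scores = chi2_scores(X, y)
    ranking = np.argsort(scores)[::-1]   # features from largest to smallest Chi-square
    print(scores, ranking)
```

In the example, feature 0 is present only in class 1 and scores 2.0. Feature 1 is spread evenly over both classes and scores 0, so feature 0 is ranked first.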
When Chi-square is used for initialization, most of the high-scoring features and a small number of less relevant features are retained. Keeping both maintains the diversity of the initial population from the perspective of interactions between features. Based on experimental comparison, the top 80% of features are regarded as the most useful. The key questions are therefore how to choose appropriate features from the selected ones and how to determine an appropriate number of features.
Hybrid initialization is used to solve this problem [45]. Most individuals are initialized with the top 80% of the ranked features, and the rest are initialized with the bottom 20%. The hybrid initialization proceeds as follows. First, the Chi-square values are sorted. Then the 80% of the features with the highest Chi-square values are selected and saved in WR. For 80% of the individuals, a feature is retained if it is in WR and is also selected in the initial matrix (its value is 1). For the remaining 20% of the individuals, a feature is retained if it is not in WR but is selected in the initial matrix.
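The hybrid initialization steps above can be sketched with NumPy boolean masks. The sketch assumes that the "initial matrix" is a uniformly random binary matrix, because the text does not specify how it is generated. The function name and its 80%/80% default arguments follow the description, but they are otherwise hypothetical.

```python
import numpy as np

def hybrid_init(scores, pop_size, rng, top_frac=0.8, major_frac=0.8):
    """Hybrid initialization driven by filter scores (e.g. Chi-square values).

    Returns the (pop_size, n_features) 0/1 population and the boolean WR mask.
    """
    n_features = len(scores)
    order = np.argsort(scores)[::-1]                 # features by descending score
    in_wr = np.zeros(n_features, dtype=bool)
    in_wr[order[: int(round(top_frac * n_features))]] = True   # WR: top 80% of features
    # Assumed: the initial matrix is a random binary matrix.
    initial = rng.integers(0, 2, size=(pop_size, n_features)).astype(bool)
    n_major = int(round(major_frac * pop_size))
    pop = np.zeros_like(initial)
    pop[:n_major] = initial[:n_major] & in_wr        # 80% of individuals: WR features only
    pop[n_major:] = initial[n_major:] & ~in_wr       # 20% of individuals: non-WR features
    return pop.astype(int), in_wr

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pop, wr = hybrid_init(np.arange(10.0), pop_size=10, rng=rng)
    print(wr.astype(int))
    print(pop)
```

The minority of individuals built from low-ranked features keeps some diversity in the population. This allows features that score poorly alone but help in combination to survive into the search.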

Integrated learning population initialization.
To strengthen the filter stage, XGBoost is applied as a pre-training model during population initialization. XGBoost stands for extreme gradient boosting. As an embedded method, it performs well and processes data quickly by exploiting all available hardware resources [55].
The importance score of each feature can be obtained directly from XGBoost. It measures the importance of the feature within the tree structure. The score is defined for a wide range of objective functions and is similar to the impurity score used to evaluate decision trees.
For a single decision tree, the importance of a feature is calculated from how much that feature improves performance. In the XGBoost tree ensemble, the more a feature improves performance, the greater the weight assigned to it. Such a feature is selected by more boosted trees and is therefore more important. Finally, the importance values of each feature across all trees are combined by weighted summation and averaging to obtain the final importance score. XGBoost and Chi-square use different evaluation methods, so they produce different evaluation results. Integrating these two sources of heuristic information is more informative than using either one alone. The integrated learning population initialization, which combines XGBoost and Chi-square, is shown in Algorithm 2.
Algorithm 2: integrated learning population initialization.
1. Evaluate the Chi-square value of each feature and rank the features in descending order of Chi-square value.
2. Obtain the initial population a with Algorithm 1.
3. Use population a as the input of the random forest and optimize its parameters 50 times with the TPE algorithm [56]. This produces population a′, which pairs each original individual with its accuracy.
4. Evaluate feature importance with XGBoost and rank the features in descending order of importance score.
5. Obtain the initial population b with Algorithm 1.
6. Use population b as the input of the random forest and optimize its parameters 50 times with the TPE algorithm. This produces population b′, which pairs each original individual with its accuracy.
7. Sort the individuals in a′ and b′ by accuracy and select the top 50%, without their accuracy values, as the final initial population P.
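Step 7 of Algorithm 2 can be sketched as follows. The RF training and the 50 rounds of TPE tuning are not reproduced here. The sketch assumes that they have already produced one validation accuracy per individual, and the function name is hypothetical. Because half of the pooled individuals are kept, P has the same size as each input population. For example, if a′ and b′ each hold 20 individuals, P also holds 20.

```python
import numpy as np

def merge_initial_populations(pop_a, acc_a, pop_b, acc_b):
    """Step 7 of Algorithm 2: pool a' and b', rank by accuracy, keep the best 50%.

    pop_*: (n, n_features) binary feature masks; acc_*: (n,) accuracies obtained
    after TPE tuning (computed elsewhere, not in this sketch).
    """
    pop = np.vstack([pop_a, pop_b])
    acc = np.concatenate([acc_a, acc_b])
    order = np.argsort(-acc, kind="stable")   # highest accuracy first; ties keep order
    keep = order[: len(pop) // 2]
    return pop[keep]                          # accuracies are dropped, as in step 7

if __name__ == "__main__":
    P = merge_initial_populations(
        np.array([[1, 0], [0, 1]]), np.array([0.5, 0.7]),
        np.array([[1, 1], [0, 0]]), np.array([0.6, 0.4]),
    )
    print(P)
```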

External archiving mechanism.
The external archiving mechanism divides the population into two parts: the working population and the external archive that guides the search. The algorithm performs three rounds of screening. In the first round, solutions with low fitness are removed according to the dominance relationship, and the rest are added to the archive. In the second round, solutions with low fitness are further removed according to the dominance relationship within the archive, and the positions of the archived individuals in the grid are calculated. Finally, if the archive size exceeds the archive threshold, solutions are filtered according to crowding distance, and the solution information in the archive is updated. The specific steps are shown in Algorithm 3.
Algorithm 3: external archiving mechanism.
1. For a new solution x, if the archive is empty, save x in the archive. Otherwise, go to the next step.
2. If x is dominated by any solution in the archive, the archive remains unchanged. Otherwise, go to the next step.
3. If x dominates one or more solutions in the archive, delete the dominated solutions from the archive. Then go to the next step.
4. At this point, x and all solutions in the archive are mutually non-dominated. Save x and check whether the archive size exceeds the maximum. If not, the update ends. Otherwise, calculate the crowding distance of each solution in the archive, delete the solution with the minimum crowding distance, and end the update.
Fig 4 illustrates this mechanism. In each generation, the archive collects and updates the optimal or near-optimal solutions, guided by the solutions with the highest accuracy. The archived feature subsets are used as inputs to the classifier so that the number of selected features is minimized while accuracy is maximized.
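The archive update in Algorithm 3 can be sketched in plain Python. Solutions are represented as `(n_features, accuracy)` tuples. The grid positioning from the second screening round is not modeled. The crowding distance is the standard NSGA-II measure: for each objective, neighbour gaps are normalized by the objective's range, and boundary solutions get infinite distance. All function names are hypothetical.

```python
def dominates(a, b):
    """a Pareto-dominates b; solutions are (n_features, accuracy)."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])

def crowding_distance(points):
    """Crowding distance of each point over both objectives."""
    n = len(points)
    dist = [0.0] * n
    for m in range(2):
        order = sorted(range(n), key=lambda i: points[i][m])
        lo, hi = points[order[0]][m], points[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")   # keep boundary solutions
        if hi == lo:
            continue
        for k in range(1, n - 1):
            gap = points[order[k + 1]][m] - points[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist

def update_archive(archive, x, max_size):
    """Insert solution x following steps 1-4 of Algorithm 3."""
    if not archive:                                   # step 1
        return [x]
    if x in archive or any(dominates(a, x) for a in archive):
        return archive                                # step 2 (duplicates also skipped)
    archive = [a for a in archive if not dominates(x, a)] + [x]   # steps 3-4
    if len(archive) > max_size:                       # step 4: prune the most crowded
        d = crowding_distance(archive)
        archive.pop(d.index(min(d)))
    return archive

if __name__ == "__main__":
    arch = []
    for sol in [(10, 0.60), (12, 0.55), (8, 0.62), (5, 0.58), (3, 0.50), (6, 0.60)]:
        arch = update_archive(arch, sol, max_size=3)
    print(arch)
```

In the example, (6, 0.60) is non-dominated when it arrives, but it lies in the most crowded region of the front. It is therefore pruned once the archive exceeds three solutions.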

I-NSGA-II-RF algorithm. The operation process of the I-NSGA-II-RF algorithm is shown in
The initial population has 20 individuals. Under the multiple-chromosome hybrid coding, each individual carries, alongside its feature-selection genes, two chromosomes that control the three key parameters of the random forest structure. The two objective functions of the multi-objective algorithm are the accuracy of each individual in a generation, evaluated by RF training, and the number of features selected by that individual. Finally, the optimal solution is found through crossover and mutation.
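The hybrid encoding and variation operators can be sketched as follows. This is an illustrative assumption rather than the exact scheme of this study: the three RF parameters (n_estimators, max_depth, min_samples_split) and their ranges are hypothetical stand-ins for those in Table 3, the parameter genes are kept in one integer list instead of two chromosomes, and the RF accuracy used as the second objective must be supplied by the caller.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical RF parameters and ranges (the actual settings are listed in Table 3).
PARAM_BOUNDS = {
    "n_estimators": (10, 200),
    "max_depth": (2, 20),
    "min_samples_split": (2, 10),
}


@dataclass
class Individual:
    mask: List[int]    # feature-selection genes: 1 = feature kept
    params: List[int]  # parameter genes, in PARAM_BOUNDS order


def repair(mask: List[int], rng: random.Random) -> List[int]:
    """Ensure at least one feature is selected."""
    if not any(mask):
        mask[rng.randrange(len(mask))] = 1
    return mask


def random_individual(n_features: int, rng: random.Random) -> Individual:
    mask = repair([rng.randint(0, 1) for _ in range(n_features)], rng)
    params = [rng.randint(lo, hi) for lo, hi in PARAM_BOUNDS.values()]
    return Individual(mask, params)


def crossover(a: Individual, b: Individual, rng: random.Random) -> Tuple[Individual, Individual]:
    """One-point crossover applied to each part of the encoding independently."""
    cut_m = rng.randrange(1, len(a.mask))
    cut_p = rng.randrange(1, len(a.params))
    c1 = Individual(repair(a.mask[:cut_m] + b.mask[cut_m:], rng),
                    a.params[:cut_p] + b.params[cut_p:])
    c2 = Individual(repair(b.mask[:cut_m] + a.mask[cut_m:], rng),
                    b.params[:cut_p] + a.params[cut_p:])
    return c1, c2


def mutate(ind: Individual, rng: random.Random,
           p_bit: float = 0.05, p_param: float = 0.2) -> Individual:
    """Bit-flip mutation on the mask; random reset within bounds on parameter genes."""
    mask = repair([1 - g if rng.random() < p_bit else g for g in ind.mask], rng)
    params = [rng.randint(lo, hi) if rng.random() < p_param else v
              for v, (lo, hi) in zip(ind.params, PARAM_BOUNDS.values())]
    return Individual(mask, params)


def objectives(ind: Individual, accuracy: float) -> Tuple[float, float]:
    """(share of features selected, to minimize; RF accuracy, to maximize)."""
    return sum(ind.mask) / len(ind.mask), accuracy
```

Keeping features and parameters in one individual is what lets a single evolutionary run tune both at once; the repair step simply prevents an empty feature subset from reaching the classifier.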

Experimental environment and data description
All experiments were performed on a computer with the following hardware: an Intel i5-9500 3.00 GHz processor and 8 GB of RAM. The software used is as follows: ... Table 2 shows the statistical information of the Hang Seng data set. Fig 5 shows the historical closing price of the Hang Seng Index during this period.

Feature set expansion by generating technical indicators
Two expanded feature sets are created from the original 5 historical price features by generating technical indicators, as described in Section 3.2. Expanded feature set 1 adds 16 common technical indicators. Expanded feature set 2 adds 67 technical indicators created from the original input features to capture the blessing of dimensionality. As described in Section 3, the original historical values represent the current day's prices X_t, while the technical indicators in this study look back over the past 14 days of price information from day t. Therefore, the prediction for day t+1 is a nonlinear function of the input features of day t, which include the historical prices of day t and the technical indicators computed over days t-14, t-13, ..., t-1, t.

PLOS ONE
Prediction of stock price movement using an improved NSGA-II-RF algorithm
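As an illustration of the 14-day look-back window, the sketch below computes a 14-period relative strength index (RSI), a common technical indicator chosen here only as an example; the study's actual indicator list is not reproduced. It assumes the simple-average (Cutler) variant, so the value on day t uses the closing prices of days t-14, ..., t, and the first 14 days have no value.

```python
from typing import List, Optional, Sequence


def rsi(closes: Sequence[float], period: int = 14) -> List[Optional[float]]:
    """Simple-average RSI; the value on day t uses closes t-period .. t."""
    out: List[Optional[float]] = [None] * len(closes)  # no value until a full window exists
    for t in range(period, len(closes)):
        window = closes[t - period : t + 1]  # period + 1 prices -> period daily changes
        gains = losses = 0.0
        for prev, cur in zip(window, window[1:]):
            change = cur - prev
            if change > 0:
                gains += change
            else:
                losses -= change
        if losses == 0:
            out[t] = 100.0  # no down moves in the window (flat windows included, by convention here)
        else:
            # average gain / average loss: the 1/period factors cancel
            out[t] = 100.0 - 100.0 / (1.0 + gains / losses)
    return out
```

The leading None entries are the kind of null values that feature expansion produces at the start of the series and that must be dropped before training.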

Wavelet transform, data cleaning and normalization
Compared with the Fourier transform, the wavelet transform (WT) can analyze the time and frequency components of a financial time series simultaneously. Therefore, WT can effectively handle nonstationary financial time series. The pywt (PyWavelets) library is used to decompose the index price series in both the time and frequency domains, and the noise-reduced series is shown in Fig 6. A new column, "trend", is added to the expanded data set. The output of the model is the direction of the stock price movement from the previous day to the current day, so "trend" is a binary feature taking the values 0 and 1. The first 88 rows, which contain null values produced by the feature expansion, are deleted from the 2038 trading days, and the remaining 1950 trading days form the data set. The data set is divided into a training set (the first 80%, 1560 trading days) and a test set (the next 20%, 390 trading days). The test set is used to evaluate the final selected feature subset. The training set is further divided into the first 85% (1326 trading days) for training the model and the last 15% (234 trading days) for validating it. All values are mapped to [0, 1] according to Eq (5).
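The labeling and chronological splits described above can be sketched as follows. The sketch assumes that the label for day t is 1 when the next day's close is higher (ties count as 0) and that Eq (5) is standard min-max scaling; the function names are hypothetical. Integer arithmetic reproduces the split sizes stated in the text.

```python
from typing import List, Sequence, Tuple


def up_down_labels(closes: Sequence[float]) -> List[int]:
    """label[t] = 1 if the close rises from day t to day t+1, else 0 (ties -> 0)."""
    return [1 if closes[t + 1] > closes[t] else 0 for t in range(len(closes) - 1)]


def chronological_split(n: int, train_pct: int = 80, fit_pct: int = 85) -> Tuple[range, range, range]:
    """Index ranges for (model fitting, validation, test), in time order, without shuffling."""
    n_train = n * train_pct // 100
    n_fit = n_train * fit_pct // 100
    return range(0, n_fit), range(n_fit, n_train), range(n_train, n)


def min_max_fit(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-column minimum and maximum of the reference rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def min_max_apply(rows: Sequence[Sequence[float]], lows: Sequence[float],
                  highs: Sequence[float]) -> List[List[float]]:
    """Scale each value as (v - min) / (max - min); constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

For 2038 - 88 = 1950 trading days, chronological_split returns ranges of 1326, 234 and 390 days. Fitting the minimum and maximum on the training rows only, and then applying them to the validation and test rows, avoids leaking future information into the scaling; with this choice, test values can fall slightly outside [0, 1].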

Parameter setting
The parameter settings used in this experiment are shown in Table 3.

Evaluation function
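The following common classification indicators are adopted in this study to measure the performance of the classifier, where TP, TN, FP and FN denote the numbers of true positives, true negatives, false positives and false negatives, respectively.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$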
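The indicators can be computed directly from labels and predictions, as in the sketch below. AUC, which is reported in the comparison tables, is computed here with the Mann-Whitney formulation (the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, counting ties as one half); this O(n^2) version is meant only as an illustration.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = up, 0 = down)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Accuracy, precision, recall and F1 need only hard 0/1 predictions, whereas AUC needs a score per example, such as the predicted probability of an upward move.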

Comparison of different classifiers
To evaluate the performance of the I-NSGA-II-RF prediction model, each machine learning model is optimized for 200 generations using the I-NSGA-II proposed in this paper. Extremely randomized trees (Extra Trees), XGBoost, KNN, decision tree, support vector machine with an RBF kernel, and random forest models are compared in the experiment. The first 85% (1326 trading days) of the training set is used for training and the last 15% (234 trading days) for validation. The validation accuracy of the RF model is 86.54%, the highest among the ML models, which indicates that RF is a promising choice for the hybrid I-NSGA-II-RF algorithm. The experimental results on the S&P 500 index are shown in Table 4.

Comparison of different feature subsets
The Hang Seng Index data set is used to compare feature subsets. We test four inputs built from the daily data: the 5 original features (original data set), all 81 features generated through feature expansion (all-feature data set), the 18 features commonly used in other studies (other data set) [57], and the 2 features selected by the I-NSGA-II-RF proposed in this study (best-subset data set). The results are shown in Table 5. All feature sets are used as input variables to train a random forest model. The accuracy was tested after

Comparison of single-objective, multi-objective and improved multi-objective algorithms
4.8.1 Comparison of classification indicators.
The single-objective method (GA), the multi-objective method (NSGA-II) and the improved multi-objective method (I-NSGA-II) are tested on six data sets. The experimental results are shown in Table 6. I-NSGA-II and NSGA-II achieve the same results on the S&P 500 and Hang Seng index data sets, and NSGA-II and GA achieve the same results on the CSI300 index data set. I-NSGA-II obtains the best accuracy, F1 score and AUC on 4, 3 and 4 of the six stock index data sets, respectively, while NSGA-II obtains the best accuracy, F1 score and AUC on 4, 4 and 4 data sets, respectively. The numerical differences between the two are very small, so their performance can be considered equal. GA achieves the best accuracy, F1 score and AUC on 2, 1 and 2 data sets, respectively. These results show that the multi-objective algorithms perform better than the single-objective algorithm. In terms of the number of selected features, the multi-objective algorithms (I-NSGA-II and NSGA-II) select 14.5 features on average, while the single-objective algorithm (GA) selects 32.5. The multi-objective algorithms therefore greatly reduce the number of selected features, which improves the feature selection process and the performance of the classifier. The evolution of the fitness (validation accuracy) during the training of the three optimization algorithms is shown in Fig 8, Fig 9 and Fig 10.

Comparison of running time.
As can be seen from Table 7, the I-NSGA-II algorithm proposed in this study requires the least running time. Compared with the NSGA-II algorithm, its average running time is shortened by 70.85%, and compared with the single-objective GA algorithm, by 87.66%. These comparisons indicate that I-NSGA-II greatly improves computational efficiency. Moreover, the variance of the running time of I-NSGA-II is smaller, whereas the variances of NSGA-II and GA differ only slightly from each other. The stability of I-NSGA-II in running time further suggests that it performs stably and effectively under different market conditions. Although I-NSGA-II introduces an a priori population at initialization by integrating the information of two filter feature selection methods, it still spends less time than the other algorithms. The improvements come from the appropriate random forest tree structure and the selection of fewer features. These two advantages avoid a large amount of unnecessary computing overhead and enhance operating efficiency.
The average running time, number of selected features and accuracy over the experiments on the six data sets are shown in Table 8. The average accuracy of I-NSGA-II is slightly higher than that of the other two algorithms, while its advantages in the number of selected features and running time are pronounced. Fig 11 compares the average comprehensive evaluation of the single-objective, multi-objective and improved multi-objective algorithms.

Comparison of population initialization.
The I-NSGA-II algorithm proposed in this study uses two different filter feature selection methods to integrate information into an a priori population in the population initialization stage. Introducing a priori information before the main evolutionary stage helps the algorithm search toward the global optimum. Fig 12 compares the initial populations of the I-NSGA-II and NSGA-II algorithms. The horizontal axis, the solution size ratio, is the proportion of selected features to the total number of features, that is, the first objective given by Eq (13). The vertical axis is the accuracy, that is, the second objective given by Eq (14). Fig 12 shows that the initial population of I-NSGA-II contains better solutions than that of the original algorithm (points closer to the upper left of the figure are better).
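The two objectives and the upper-left criterion can be made concrete with the sketch below. It assumes Eq (13) is the selected-feature ratio |S|/D and Eq (14) is the accuracy. The 2-D hypervolume with reference point (ratio 1, accuracy 0) is an illustrative way to summarize how close a population lies to the upper left; it is not a measure used in this study.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (selected-feature ratio, accuracy)


def objectives(mask: Sequence[int], accuracy: float) -> Point:
    """Objective 1: share of features selected (minimize); objective 2: accuracy (maximize)."""
    return sum(mask) / len(mask), accuracy


def dominates(a: Point, b: Point) -> bool:
    """a has no larger ratio and no lower accuracy than b, and differs in at least one."""
    return a[0] <= b[0] and a[1] >= b[1] and (a[0] < b[0] or a[1] > b[1])


def nondominated(points: Sequence[Point]) -> List[Point]:
    """Points not dominated by any other point: the upper-left frontier."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def hypervolume_2d(points: Sequence[Point], ref_ratio: float = 1.0) -> float:
    """Area dominated by the frontier inside [ratio, ref_ratio] x [0, accuracy]."""
    front = sorted(set(nondominated(points)))  # ratio and accuracy both increase along it
    area = 0.0
    for i, (ratio, acc) in enumerate(front):
        next_ratio = front[i + 1][0] if i + 1 < len(front) else ref_ratio
        area += (next_ratio - ratio) * acc
    return area
```

A larger hypervolume means the population reaches higher accuracy with smaller feature subsets, which corresponds to points concentrated toward the upper left of Fig 12.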

Comparison with other models and algorithms
4.9.1 Comparison with benchmark models.
The algorithm proposed in this study is compared with classical machine learning models, namely support vector machine, KNN, XGBoost, random forest (RF) and decision tree (DT), and with two neural network models with different numbers of layers that are commonly used for time series. Table 9 shows that I-NSGA-II-RF has the highest accuracy, F1 score and AUC.
4.9.2 Comparison of I-NSGA-II-RF and benchmark studies.
I-NSGA-II-RF is compared with eight benchmark studies whose models are based on LSTM, GA optimization, CNN, DNN and ensemble learning. The comparisons are shown in Table 10. On the Hang Seng Index, I-NSGA-II-RF achieves the best performance on all metrics except AUC. The average accuracy of I-NSGA-II-RF equals that of the ensemble learning algorithm, and both are higher than those of the other six models.

Comparison of recent studies that employ wrapper-based feature selection methods.
We also compare recent studies that employ wrapper-based feature selection methods with our approach. The comparisons are shown in Table 11.

Comparison with deep learning
Long Short-Term Memory (LSTM) networks are an improvement on recurrent neural networks (RNNs) and have risen to prominence in financial forecasting in recent years. Their gate structure enables them to retain both long-term and short-term information in a time series and alleviates the vanishing gradient problem of standard RNNs, which gives them stronger predictive ability for complex nonlinear time series. LSTM and its hybrid forms are increasingly used in stock market prediction studies [72]. BP, LSTM and BiLSTM networks are often used in time series prediction, so these three kinds of neural networks are used as representatives for comparison with the algorithm proposed in this study.

Comparison with neural network.
In Section 4.10.1, the neural networks are trained with default parameters; the specific training parameters are shown in Table 12. In Section 4.10.2, the neural network structure and training parameters are tuned by the TPE algorithm based on Bayesian optimization. All 81 features are used as inputs in both sections. The first 80% of the inputs (the first 1560 trading days) are used as the training set and the remaining 20% (the next 390 trading days) as the test set. The last 20% of the inputs in the training set ... Fig 13, Fig 14, Fig 15. The experimental results indicate that the three-layer LSTM network achieves the best result among all nine neural network structures, with a stable validation accuracy of 86.86%. The BP neural network performs worst: none of its three structures reaches 54% validation accuracy. We attribute this to its lack of a memory structure, which makes it perform poorly in sequence prediction. The bidirectional memory structure of the BiLSTM network has no obvious advantage over LSTM, and it also causes overfitting, with the validation accuracy first increasing and then decreasing. Owing to the poor results of the BP neural network, only the experimental results of LSTM and BiLSTM are shown in Table 9.

Comparison with LSTM neural network optimized by TPE.
Deep learning is widely used in time series prediction because of its powerful predictive ability. However, the performance of the model largely depends on the structure of the neural network and the tuning of its hyperparameters. Therefore, the number of network layers, the number of neurons in each layer and the forgetting rate need to be carefully adjusted to improve prediction accuracy. In this section, the TPE algorithm is used to tune the structure of the LSTM neural network, the number of nodes and the hyperparameters. The search ranges are as follows: the number of LSTM layers {1, 2, 3}, the number of Dense layers {1, 2, 3}, the number of nodes per layer, and the forgetting rate [0.1, 0.5]. The neural network structure is optimized by TPE over 200 trials. The best structure found during the optimization, shown in Table 13, is a three-layer LSTM followed by three Dense layers, with (115, 110, 72, 57, 92, 44) nodes, a forgetting rate of 0.19 and a validation accuracy of 89.16%. Prediction performance and computational cost are two important indicators for evaluating a prediction model, and running time is particularly important in ultra-short-term trading. Therefore, running time is used here to measure computational cost. The TPE-LSTM algorithm creates 200 neural networks, each of which is trained for 200 iterations; this huge computational load greatly increases the running time. The running time and accuracy of I-NSGA-II-RF and TPE-LSTM are given in Table 14. The accuracy difference between the two algorithms is very small, but the running time of the algorithm proposed in this study is only 0.85% of that of TPE-LSTM. Fig 16 compares the running time and accuracy of the two algorithms visually.

Comparison with other multi-objective algorithms
The I-NSGA-II-RF algorithm proposed in this study is compared with other multi-objective optimization algorithms. The results of this comparison on the Hang Seng data set are shown in Table 15 and Fig 17.

Comparison with Wilcoxon's rank-sum test and Friedman rank test
To test the statistical significance of our results, we conducted Wilcoxon's rank-sum test and the Friedman rank test (the former for pairwise comparisons, the latter for comparing all three methods together) on the null hypothesis that there is no difference between the performance of our method and the baseline methods. We report the p-values and effect sizes of the statistical tests and interpret them according to commonly accepted significance levels and guidelines. The proposed method is compared with the other methods over 10 runs on each artificial dataset. Table 16 gives the p-values of the Friedman rank test and the Wilcoxon signed-rank test. The differences between the proposed I-NSGA-II-RF method and the NSGA-II-RF and GA-RF methods are all significant at the 0.05 level (95% confidence). The Friedman rank test also indicates a significant difference in performance among the three methods (statistic = 10.0, p-value = 0.006737946999085468).
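The Friedman statistic can be reproduced with the sketch below. With k = 3 methods the statistic has k - 1 = 2 degrees of freedom, and the chi-square survival function for 2 degrees of freedom is exp(-x/2), so a statistic of 10.0 gives p = exp(-5), which matches the reported value of about 0.0067. The blocks in the usage check are hypothetical, and unlike scipy.stats.friedmanchisquare this version applies no tie correction.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks 1..k within one block; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_statistic(blocks: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square statistic; each block holds one score per method."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for block in blocks:
        for j, r in enumerate(average_ranks(block)):
            rank_sums[j] += r
    return 12.0 * sum(r * r for r in rank_sums) / (n * k * (k + 1)) - 3.0 * n * (k + 1)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Chi-square upper tail for even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!."""
    if df % 2:
        raise ValueError("this closed form only covers even degrees of freedom")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

For example, if the three methods keep the same ordering in each of five hypothetical blocks, the rank sums are 15, 10 and 5, the statistic is 12 * 350 / 60 - 60 = 10.0, and chi2_sf_even_df(10.0, 2) returns exp(-5).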

Conclusion
The proposed I-NSGA-II-RF algorithm makes the following contributions.
(1) Significant novelty. Our scheme offers several improvement strategies. First, I-NSGA-II-RF initializes the population by integrating two filter feature selection methods. Second, it treats stock prediction as a multi-objective problem. Finally, it takes maximizing accuracy and minimizing the solution size as the optimization directions, optimizing the feature selection and the RF parameters synchronously. Compared with other benchmark studies, it has the fastest processing speed, the smallest solution and the highest prediction accuracy. (2) Good interpretability. DL methods suffer from a "black box" problem because the importance of their input features is difficult to evaluate. We put forward a three-stage feature engineering process. In the feature engineering stage, a total of 81 features, including the original data and the expanded feature subsets, are added, which improves the prediction accuracy from 57.83% to 89.44%. The optimization effect is substantial, and the "curse of dimensionality" is reasonably overcome. (3) Good performance. The prediction performance of I-NSGA-II-RF is the best among all the benchmark models. Compared with the original multi-objective algorithm and the single-objective algorithm, the running time of I-NSGA-II-RF is reduced by 70.85% and 87.66%, respectively. This efficiency makes it suitable for short-term trading systems. Compared with the deep learning algorithms with default parameters, its prediction performance is about 3% higher on average. I-NSGA-II-RF and TPE-LSTM with optimized parameters achieve nearly the same accuracy, but the running time of I-NSGA-II-RF is only 0.85% of that of TPE-LSTM.
The proposed method of multi-objective optimization and feature engineering also has practical implications in other domains, summarized as follows [73]. 1. Solving complex engineering problems that involve multiple conflicting criteria, such as design optimization, manufacturing and structural health monitoring. 2. Enhancing the performance and efficiency of machine learning and data mining models that require feature selection and parameter tuning. 3. Finding trade-offs and compromise solutions for decision making in various fields, such as economics, agriculture, aviation and automotive engineering. These are some possible applications of the proposed method, but more domains may benefit from it.
The limitations and future work of our study are as follows. 1. Our study focused only on one specific domain, stock price movement prediction, and does not compare or generalize the proposed method to other domains or applications. In the future, we plan to apply our method to different types of problems, including engineering problems, and evaluate its performance and applicability. 2. Our study did not provide a detailed analysis and explanation of the technical indicators used in the feature engineering process, which may result in some unreasonable or redundant indicators. 3. Our study did not provide enough details or explanations of the proposed I-NSGA-II-RF algorithm, such as its mathematical formulation, algorithmic steps and parameter settings, which may affect the algorithm's stability and generalization ability. In the future, we plan to provide more technical details and illustrations of our algorithm and its implementation. 4. Our study used only six stock indices as data sets, which may not be representative of all stock markets. In the future, we plan to design more experiments and conduct more analyses to validate and compare our method with existing methods.
Supporting information S1 File.