Finance, Markets and Valuation

In the financial literature, there is great interest in the prediction of stock prices. Stock prediction is necessary for the creation of different investment strategies, both speculative and hedging. The application of neural networks has changed the way predictive models are built. In this paper, we analyse the capacity of recurrent neural networks, in particular the long short-term memory network (LSTM), as opposed to classic time series models such as exponential smoothing (ETS) and ARIMA. These models have been estimated for 284 stocks from the S&P 500 stock market index, comparing the MAE obtained from their predictions. The results confirm a significant reduction in prediction errors when LSTM is applied. These results are consistent with similar studies on stocks included in other stock market indices, as well as on other financial assets such as exchange rates.


Introduction
The prediction of stock returns has been widely studied in the financial literature. One of the main objectives is the construction of stock portfolios. In some cases, these portfolios are constructed by applying different optimisation algorithms based on the classic Markowitz model (García, González-Bueno, & Oliver, 2015). Some optimisation algorithms, such as NSGA-II, allow the construction of these portfolios taking into account more than the two classic return-risk dimensions (W. Chen, Zhang, Mehlawat, & Jia, 2021; García, González-Bueno, Oliver, & Tamošiūnienė, 2019). Other algorithms, such as genetic algorithms together with heuristic techniques, allow good solutions to be found in an NP-hard problem (Ahn, Lee, Ryou, & Oh, 2020). Other studies try to find relations between sustainability and stock market portfolios or indexes. Thus, for example, Arribas, Espinós-Vañó, García, and Oliver (2019) analyse the composition and selection of responsible companies for the construction of portfolios and investment funds, while Espinós-Vañó analyses the so-called sustainable stock market indexes to determine whether they can be an investment alternative to traditional stock market indexes.
Stock prediction is also applied to speculate on the price evolution of financial assets. Predictive models can be classified into linear and non-linear models. To the first group belong time series models such as the autoregressive integrated moving average, exponential smoothing models and generalized autoregressive conditional heteroskedasticity models, among others. Neural networks belong to the second group. Given that stock returns are not stationary and present long-term dependence (Barkoulas & Baum, 1996), this second group of models has been shown to achieve greater accuracy and a reduction in prediction errors with respect to linear models.
In the last few decades, neural networks have evolved from the initial models of Hebb, who proposed the first rules for learning processes between neurons, Widrow and Hoff with their Adaline model, and Rosenblatt, who proposed the well-known perceptron. There are many works related to the prediction of stock prices and their trends applying models based on artificial neural networks. In Qiu and Song (2016), for example, the authors analyse the prediction of the direction of the stock price index for the Nikkei 225 stock market. In this case, they use a backpropagation neural network with two different types of inputs to determine what kind of information improves results. The result suggests that the network is able to select those variables that are suitable for the model. Moghaddam, Moghaddam, and Esfandyari (2016) apply the same type of network to the prediction of the Nasdaq stock index using short-term historical data as inputs (between four and nine days). In García, Guijarro, Oliver, and Tamošiūnienė (2018) the authors apply a neural network to predict the trend of the German Dax-30 stock index.
In recent years, recurrent neural networks have been widely used for the analysis of time series with high time dependency, such as stock returns. These types of networks are based on Rumelhart's work on error backpropagation (Rumelhart, Hinton, & Williams, 1986) and Hopfield's networks (Hopfield & Tank, 1985). Some works compare already established neural networks against recurrent neural networks, as for example Saad, Prokhorov, and Wunsch (1998). The authors compare the time delay neural network (TDNN), the probabilistic neural network (PNN) and a recurrent neural network to predict short-term stock trends. The conclusion is that the recurrent network, with its capability to dynamically incorporate past data through internal recurrence, is the most powerful network among those analysed, although it presents certain complexity regarding its implementation and memory requirements.
On the other hand, Yoshihara, Fujikawa, Seki, and Uehara (2014) compare the RNN with another machine learning methodology, the support vector machine (SVM). They analyse market trend prediction for Nikkei companies. In this case, in addition to incorporating numerical inputs into the models, information on different economic and financial events is introduced (news reported by newspapers). The advantage of deep learning is its ability to automatically construct different features from data, as well as its capacity for pattern recognition. The results from the RNN were compared with the SVM and presented the lowest error rate. This study shows that recurrent models are more effective in capturing past events that are significant with respect to long-term effects.
Other studies also confirm these results, like the paper by Rather, Agarwal, and Sastry (2015), who analyse 25 stock returns from the Bombay stock exchange, indicating that this model is capable of capturing non-linear patterns more efficiently than classical models. In this case it is concluded that the RNN learning process improves as it needs to search for smaller weights. da Silva, Spatti, Flauzino, Liboni, and dos Reis Alves (2016) analyse stocks in the Bovespa index, showing that this approach is an alternative for decision making in the financial stock markets. RNNs have also been applied to other types of financial assets. Ye (2017) focuses on forecasting exchange rates using the gradient descent method in the learning process of recurrent neural networks. Recurrence in the neurons speeds up both the weight updates and convergence, which confirms the reliability and stability of neural networks.
In Section 2, the well-known time series models, ARIMA and exponential smoothing, are described. This is followed by a more in-depth description of recurrent neural networks (RNN) and, in particular, the long short-term memory network (LSTM). Section 3 describes the main results obtained in the work. Finally, Section 4 summarizes the main conclusions.

Data and Methods
In this section, the models used in this work are discussed. First, the well-known time series models, exponential smoothing and the autoregressive moving average model, are presented in summary form. Next, the recurrent neural network model is described in more detail, in particular the long short-term memory recurrent neural network model.

Exponential smoothing model
The exponential smoothing model was proposed at the end of the 1950s (Brown, 1959; Holt, 2004; Winters, 1960). In this type of model, past observations are weighted in a way that declines exponentially the further back in time they lie.
In other words, more recent observations are associated with a greater weight while older observations have lower weights.
The model can be expressed as follows:

ŷ_{t+1} = α y_t + α(1 − α) y_{t−1} + α(1 − α)² y_{t−2} + …

where α ∈ [0, 1] is the smoothing parameter. Thus, the prediction at instant t + 1 is a weighted average of the past observations of y. The rate at which the weights decrease is controlled by the parameter α.
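The weighted average above can be computed recursively, since each new smoothed level combines the latest observation with the previous level. A minimal sketch (the function name is ours, not from the paper):

```python
def ses_forecast(y, alpha):
    """One-step-ahead forecast by simple exponential smoothing.

    The forecast for t+1 is a weighted average of past observations,
    with weights decaying geometrically at rate (1 - alpha).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    level = y[0]                       # initialise with the first observation
    for obs in y[1:]:
        # recursive form of the exponentially weighted average
        level = alpha * obs + (1 - alpha) * level
    return level
```

With α = 1 the forecast collapses to the last observation; with small α the forecast reacts slowly, averaging over a long history.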

Autoregressive moving average model
The autoregressive moving average model (ARMA) was introduced by Box, Jenkins, and Reinsel (1970). It is one of the classic models for analysing time series, and one of the most used in the financial literature. Non-stationary series can be made stationary by differencing. Generally, in most economic and financial time series, a single differencing is enough to make the series stationary and to be able to apply ARIMA models, where the "I" represents the level of differencing (integration) of the series.
Given a time series X_t, where t represents the time index, the ARMA(p, q) model is expressed as:

X_t = c + Σ_{i=1}^{p} α_i X_{t−i} + Σ_{j=1}^{q} θ_j ε_{t−j} + ε_t

where the α_i and θ_j are estimated coefficients and the ε_t are white noise errors.
The ARMA(p, q) model is built as the combination of two processes. The first is the autoregressive (AR) process, which predicts the variable using a linear combination of its past values; the order p of an autoregressive model represents the number of lagged values of the variable. The moving average (MA) part, in turn, predicts the variable from a moving average of past prediction errors; the order q of this process represents the number of lags of the prediction errors used in the model.
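Given already estimated coefficients, the one-step ARMA prediction is just the two weighted sums described above. A minimal sketch (names and interface are ours; in practice estimation and forecasting are done with a statistics library):

```python
def arma_one_step(history, residuals, ar_coefs, ma_coefs, const=0.0):
    """One-step-ahead ARMA(p, q) prediction from fitted coefficients.

    history   : past observations, most recent last (at least p values)
    residuals : past one-step prediction errors, most recent last (at least q)
    ar_coefs  : [alpha_1 .. alpha_p], weights on lagged observations
    ma_coefs  : [theta_1 .. theta_q], weights on lagged errors
    """
    p, q = len(ar_coefs), len(ma_coefs)
    # AR part: linear combination of the last p observations
    ar_part = sum(a * x for a, x in zip(ar_coefs, reversed(history[-p:])))
    # MA part: linear combination of the last q prediction errors
    ma_part = sum(t * e for t, e in zip(ma_coefs, reversed(residuals[-q:])))
    return const + ar_part + ma_part
```

For an ARMA(1, 1) with α₁ = 0.5 and θ₁ = 0.2, the forecast is 0.5 times the last observation plus 0.2 times the last error.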

Recurrent neural network: LSTM network
A recurrent neural network (RNN) is a network with backward connections between neurons; such models are generally referred to as globally recurrent networks. This type of model presents some stability problems in the training process, so it requires complex learning algorithms as well as longer training times. Locally recurrent network models, in contrast, are global feedforward networks: a structure of dynamic neuron models is designed to build a feedback network in which the connections between these neuron models are strictly feedforward, as in the Multilayer Perceptron (MLP). Figure 1 shows an example of the connections between the different layers and neurons in a recurrent neural network. It can be seen how the output obtained in one layer serves as an input for neurons of earlier layers. Each recurrent unit, formed by different neurons, computes at each time step an output y_t that depends on the current process. In the next time step, the neuron receives a new input vector x_t and additionally incorporates the previously obtained output y_{t−1}; the latter is called the recurrent input. In this way, the neuron computes the output vector from the input vector x_t and the recurrent input y_{t−1} using an activation function θ, which can be of the linear, sigmoid or tanh type, although the last is often used for time series problems:

y_t = θ(W x_t + U y_{t−1})

where W and U are weight matrices that multiply the input vector and the recurrent input vector, respectively.
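The recurrence above can be sketched directly: each step applies the same weight matrices to the new input and to the previous output. A minimal illustration with tanh activation (function names and dimensions are ours):

```python
import numpy as np


def rnn_step(x_t, y_prev, W, U, b):
    """One recurrent step: y_t = tanh(W x_t + U y_{t-1} + b)."""
    return np.tanh(W @ x_t + U @ y_prev + b)


def rnn_forward(xs, W, U, b):
    """Run a whole sequence, feeding each output back as the recurrent input."""
    y = np.zeros(W.shape[0])          # initial recurrent input
    outputs = []
    for x_t in xs:
        y = rnn_step(x_t, y, W, U, b)
        outputs.append(y)
    return outputs
```

Because the same W and U are reused at every time step, the network's parameter count does not grow with the sequence length.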
The simplest recurrent neural network, called the "vanilla" RNN, uses as recurrent input only the single output obtained in the previous time step. When the net uses previous outputs as new inputs, it can remember previously learned data. This process is important for learning short- and long-term dependencies.
The importance of the inputs and the recurrent inputs in the net depends on their corresponding weight matrices. During the learning process, the net adjusts the weights to improve the prediction, taking into account the calculated error (backpropagation). However, while in a feedforward network the backpropagation process only goes back through the hidden layers, in a recurrent neural network it is also necessary to adjust the weights of previous time steps (adjustment through time). In this type of network, if the sequence is long, a problem arises in the learning process, since with each prediction the whole backward path must be covered again. To avoid this, the sequences are split, so that backpropagation only goes backwards over the length of each subsequence; but in this case the neural network is only able to capture short dependencies. This is the so-called vanishing gradient problem: the further back in the sequence a step lies, the less it contributes to the current prediction, so long-term dependencies cannot be adequately captured.
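The vanishing effect can be illustrated numerically: backpropagating through T steps multiplies the gradient by the recurrent Jacobian once per step, so if its norm is below 1 the contribution of distant steps shrinks geometrically. A toy illustration (the 0.9 norm is an assumed value for the example):

```python
def gradient_magnitude(jacobian_norm, steps):
    """Rough size of the gradient contribution from `steps` time steps back,
    assuming each backward step scales the gradient by `jacobian_norm`."""
    return jacobian_norm ** steps


recent = gradient_magnitude(0.9, 5)     # recent past: still noticeable
distant = gradient_magnitude(0.9, 100)  # distant past: effectively zero
```

With a per-step factor of 0.9, the signal from 100 steps back is around 10⁻⁵ of its original size, which is why a vanilla RNN effectively ignores the distant past during learning.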
The Long Short-Term Memory (LSTM) network is a more complex kind of recurrent neural network, as it is able to capture long-term dependencies. This kind of neural network was proposed by Hochreiter and Schmidhuber (1997) as an evolution of the simple RNN. The network can propagate activations over long periods to process sequences that include long-distance dependencies (Kelleher, 2019), and it solves the vanishing gradient problem. In this case, the recurrent unit is extended into a block, which works like a normal recurrent unit to which an additional cell and several gates are added. The gates control the flow of information within the recurrent unit (block): at each time step they determine which information is most useful for improving the prediction and which is not. These gates (τ) are defined as follows:

τ = σ(W_τ x_t + U_τ y_{t−1} + b_τ)

where W_τ, U_τ and b_τ are gate-specific coefficients and σ is the commonly used sigmoid function (while the input activation function uses tanh). In the LSTM recurrent neural network, four gates are used, each with a different function. Table 1 summarizes the role of each of them.

Table 1. Role of each gate in the LSTM block.

Gate        Role
Forget      Drop previous information (eliminate a neuron's memory)
Update      Importance of the past
Relevance   Which past information is relevant as input
Output      Which information is used
The forget gate is used to decide whether or not to erase a neuron's memory, and therefore to forget information. The update gate indicates how much of the past is to be taken into account now. The relevance gate defines what information from the past is relevant to incorporate as input to the neuron. Finally, the output gate selects the information that is useful for the neuron in the current prediction. Each of these four gates uses different weight matrices, which are adjusted individually during the learning process. In short, these gates control the flow of information in each neuron so that it is useful for prediction at each time step.
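One step of such a block can be sketched as follows. The paper does not give the full equations, so the wiring below is one common formulation consistent with the four gates named above (the relevance gate modulating the recurrent input of the candidate cell state); all names are ours:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, y_prev, c_prev, W, U, b):
    """One LSTM block step. W, U, b are dicts keyed by gate name
    ('forget', 'update', 'relevance', 'output') plus 'cell' for the
    candidate cell state."""
    def gate(name):
        return sigmoid(W[name] @ x_t + U[name] @ y_prev + b[name])

    f = gate('forget')       # drop previous information
    u = gate('update')       # importance of the past
    r = gate('relevance')    # which past info feeds the candidate
    o = gate('output')       # which information is used
    # candidate cell state uses tanh, with the recurrent input gated by r
    c_tilde = np.tanh(W['cell'] @ x_t + U['cell'] @ (r * y_prev) + b['cell'])
    c_t = f * c_prev + u * c_tilde   # long-term memory update
    y_t = o * np.tanh(c_t)           # block output
    return y_t, c_t
```

The additive update of the cell state c_t is what lets gradients flow over long distances: as long as the forget gate stays close to 1, old information is carried forward almost unchanged.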

Results
In this work, we aim to confirm the efficiency of an LSTM neural network as opposed to some classic time series models. In this case, it is compared with an exponential smoothing model and an ARIMA model. For this purpose, each of them has been applied to a sample of 284 stocks from the S&P 500 index with daily data from the last 20 years. The sample has been divided into 70% for the estimation and training processes and 30% for validation.
For the exponential smoothing and ARIMA models, the number of differencing operations needed to obtain stationary series, which is required in this type of model, has been taken into account. As with many economic time series (McCabe & Tremayne, 1995), a single differencing was enough to achieve stationarity.
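First-order differencing replaces each observation with its change from the previous one; applying it d times gives the order of integration "I" in ARIMA. A minimal sketch (the function name is ours):

```python
def difference(series, order=1):
    """Apply order-d differencing: x_t - x_{t-1}, repeated d times.
    Each pass shortens the series by one observation."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series
```

For a price series with a roughly linear trend, one pass removes the trend, which is why a single differencing usually suffices for economic series.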
For the exponential smoothing model, the Akaike information criterion (AIC) has been used to select the appropriate lags in each case. In the case of the ARIMA models, the Phillips-Perron criterion has been used to determine the lags of both the autoregressive part and the moving average part.
In the case of the LSTM neural network model, the series has been standardised, both for training and for testing. Many works verify that standardisation improves the learning process in neural networks, such as Lachtermacher and Fuller (1995); Shen, Zhang, Lu, Xu, and Xiao (2020); Zhang, Patuwo, and Hu (1998). Figure 2 shows the type of network applied to each of the stocks. As already indicated, it is a long short-term memory recurrent network. The network processes the sequence of vectors using an LSTM layer on the input data, and the model stacks the different layers sequentially. As the model needs to know in the first layer the type of input it should expect (in the remaining layers it is inferred automatically), the number of samples per gradient update, i.e. the batch size of the inputs for the layer, is one. The dimension of the output space of this first layer is five; this parameter is subject to tuning. The output layer is a fully connected (dense) layer, configured with a batch size of one and one unit.
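The preprocessing described above can be sketched as a chronological 70/30 split followed by standardisation; since the paper does not detail its exact scaling procedure, we assume the common practice of computing the statistics on the training part only, so that no information from the test period leaks into training (function names are ours):

```python
def split_series(series, train_frac=0.70):
    """Chronological split: the first 70% for training, the rest for testing."""
    cut = int(len(series) * train_frac)
    return series[:cut], series[cut:]


def standardise(train, test):
    """Standardise both parts using mean and std from the training part only."""
    mean = sum(train) / len(train)
    # guard against a constant series (std would be zero)
    std = (sum((v - mean) ** 2 for v in train) / len(train)) ** 0.5 or 1.0

    def scale(xs):
        return [(v - mean) / std for v in xs]

    return scale(train), scale(test)
```

Standardised inputs keep the tanh and sigmoid activations of the LSTM away from their saturated regions, which is one reason scaling speeds up learning.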
The three models have been estimated for each stock with the corresponding appropriate lags according to the criteria already indicated. To evaluate the efficiency of each model, the Mean Absolute Error (MAE) has been calculated on the predicted observations. Figure 3 shows a boxplot with the errors of each model for the 3M stock. In the 284 stocks analysed, two important issues have been observed. Firstly, the exponential smoothing model and the ARIMA model present similar MAE values. For example, in the case of 3M the MAE obtained with the first model was 0.8847, while for the ARIMA model it was 0.8857. It is possible that applying other exponential models, such as Holt-Winters', could slightly improve on the ARIMA model (Maria & Dezsi, 2011). Secondly, even if other exponential models improved on the ARIMA model, they would remain far from the MAE obtained with the LSTM model. For example, for 3M the MAE obtained was 0.1823, that is, 79% less error than the other models. Table 2 describes the main statistics of the distribution of the MAE obtained over all the stocks analysed and for each of the models. It can be seen that the classic time series models (ETS and ARIMA) present a higher MAE than the LSTM model for all the quantiles of the sample. These results are in line with other studies (2018) that compare these three models for various stock market indexes such as the Nasdaq, Nikkei and Hang Seng with monthly data. The results suggest that the LSTM model obtains, on average, a reduction in prediction error of between 84 and 87 percent. Baughman, Haas, Wolski, Foster, and Chard (2018) compare the ARIMA model with the LSTM for Amazon stock, obtaining an error reduction of 95%.
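The evaluation metric and the relative error reduction quoted above are straightforward to compute; the figures below reproduce the 79% reduction reported for 3M from the paper's own MAE values (function names are ours):

```python
def mae(actual, predicted):
    """Mean Absolute Error between observed and predicted values."""
    assert len(actual) == len(predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def error_reduction(mae_baseline, mae_model):
    """Fraction by which the model's MAE improves on the baseline's."""
    return 1.0 - mae_model / mae_baseline


# 3M figures reported in the text: ETS MAE 0.8847, LSTM MAE 0.1823
reduction_3m = error_reduction(0.8847, 0.1823)   # roughly 0.79
```

Because MAE is in the same units as the series, comparing it across standardised and raw series requires rescaling predictions back before computing the error.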
However, four stocks have been detected for which the MAE obtained by the LSTM model is higher than that of either of the other two models (Table 3). Each of these stocks has been analysed in detail to detect whether this result is due to some kind of error in the sample. In all four cases there is a sufficient number of observations (several thousand), no missing or anomalous data have been detected, and the quoted prices have been visually inspected without apparently finding any errors. The first two stocks, LDOS and IRM, are listed on the NYSE, while CTSH and CHTR are listed on the NASDAQ. It can therefore be concluded that the results obtained for these four stocks are plausible. Nevertheless, the LSTM model has outperformed the classical time series models in 98.59% of the sample analysed, so recurrent neural networks are a good alternative for time series prediction in general, and for stocks and stock indices in particular. Abdoli (2020) analyses the Tehran Stock Exchange, confirming the results obtained in the present work: the LSTM outperforms the ARIMA model in terms of prediction error.

Conclusions
In this work, the efficiency of the long short-term memory recurrent neural network has been analysed in comparison with other time series models. The main conclusion is that there is a large reduction in prediction error, of more than 85%, which is in line with results from previous studies on other financial assets. Recurrent neural networks in general, and the LSTM in particular, may be an alternative to consider in the creation of stock price prediction models. However, to confirm these results, this analysis should be extended in other directions, such as the application of a larger number of fully connected intermediate layers or the tuning of other network parameters.
On the other hand, other authors have proposed other types of neural networks that also seem to offer very efficient alternatives. In M, E.A., Menon, and K.P. (2018) the authors compare several linear time series models (ARIMA) with non-linear models such as ARCH, GARCH and neural networks. In this case, they apply two types of deep networks, an LSTM model and a Convolutional Neural Network, to five stocks of the National Stock Exchange (NSE) of India. The results suggest that the Convolutional Neural Network outperforms the other models, even the LSTM model. In the same line, Y. Chen, Wei, and Huang (2018) apply a convolutional model to the prediction of the stock market in Mainland China, incorporating related corporations' information to obtain more accurate predictions. In addition, other works propose the use of a hybrid model combining the Convolutional Neural Network and the LSTM recurrent neural network (Kim & Kim, 2019).