Stock Price Prediction Using Deep Learning Models



INTRODUCTION
Analysis of financial time series and prediction of future stock prices and stock price movements have been an active area of research over a long period of time. While some researchers believe in the well-known efficient market hypothesis and claim that it is impossible to forecast stock prices accurately, propositions exist in the literature that demonstrate that it is possible to predict stock prices with a very high level of accuracy using carefully designed predictive models. It has also been found that the accuracy of a predictive model depends on the set of variables used in building the model, the algorithms deployed, and how the model has been optimized. There are propositions in the literature that focus on the decomposition of time series for stock price prediction [1][2]. Applications of machine learning and deep learning-based approaches have also been quite popular in stock price movement analysis and forecasting [3][4].
Mehtab and Sen propose a model for stock price forecasting that utilizes investor sentiment extracted from social media to augment the output of a deep learning framework, arriving at a very high level of accuracy in prediction. The proposed framework also deploys a nonlinear multivariate system built on a self-organizing fuzzy neural network (SOFNN) [5]. In two recently published works, Mehtab and Sen presented a suite of convolutional neural network (CNN)-based regression models that exhibited a very high level of accuracy and robustness in forecasting multivariate financial time series data [6][7].
Several propositions exist in the literature on the technical analysis of stock price movement patterns. Among the various indicators of price movements, moving average convergence divergence (MACD), momentum stochastics, and the meta sine wave are quite well known. These indicators provide investors with a rich set of visualization platforms and useful metrics that help them make effective investment decisions in the stock market.
In this work, we propose a suite of deep learning-based regression models for forecasting NIFTY 50 index values. For building the models, the historical values of the NIFTY 50 index for the period December 29, 2008 (which was a Monday) to December 28, 2018 (which was a Friday) have been used as the training records. The models have been tested on NIFTY 50 index values for the period December 31, 2018 (which was a Monday) to May 15, 2020 (which was a Friday). The five models that we propose in this work include two convolutional neural network (CNN)-based models and three long short-term memory (LSTM) network-based models. The models have different architectures and different structures in their input data. While all the models have univariate input data, four of them use the previous two weeks' data as their input for forecasting the open values of the NIFTY 50 index time series. The remaining CNN model uses the previous one week's data as the input for forecasting the open values of the NIFTY 50 index for the next week.
The organization of the paper is as follows. In Section II, we present a clear definition of the problem we solve in this paper. Section III presents a very brief outline of some related work in the field of stock price forecasting. In Section IV, we describe the methodology that we followed in this work. This section also presents the architectural details of all our proposed models. The results on the performance of the models are presented in Section V. Finally, in Section VI, we conclude the paper while highlighting some future research directions.

II. PROBLEM STATEMENT
Our objective is to build a robust and accurate predictive framework that contains a suite of deep learning-based regression models. We have used the historical records of NIFTY 50 index values over a period of more than eleven years for building and testing our proposed models. We have chosen a very realistic prediction horizon of one week for our proposed models. We hypothesize that the deep learning models will be able to extract a rich feature set from the past NIFTY 50 index values and will be able to forecast the future index values with a very high level of accuracy. In our past work, we proposed a suite of four CNN-based regression models to validate our hypothesis [6]. In the current work, we augment our proposition with five different deep learning-based regression models. While two of the proposed models are built on CNNs, the remaining three models are based on three variants of the LSTM network architecture.

III. RELATED WORK
Forecasting of stock price values and stock price movement patterns has been an extensive area of research. Even though numerous propositions exist in the literature on the prediction of stock prices, most of them fall into one of three broad categories. The propositions in the first category are based on simple linear regression on multivariate cross-sectional data [8][9][10]. However, in most real-world scenarios, these models fail to achieve a high level of accuracy in forecasting, since the underlying assumptions required for their best performance are not met by financial time series data. The works belonging to the second category are based on time series modeling using statistical and econometric approaches such as autoregressive integrated moving average (ARIMA), autoregressive distributed lag (ARDL), and vector autoregression (VAR) [11][12][13]. These models have been found to perform very well on time series data with a strong trend and a seasonal component. However, in the presence of a dominant random component, these models exhibit suboptimal performance. The models of the third category are built on learning approaches and on textual information from the web and social media. These models exploit various algorithms of machine learning, deep learning, and natural language processing [14][15][16][17].
Most of the existing propositions in the literature on stock price prediction suffer from a common shortcoming: if the stock price time series exhibits significant randomness, the forecast accuracy of the models drastically decreases. The models proposed in our current work have yielded a very high level of accuracy by utilizing the power of convolutional neural networks (CNNs) and long short-term memory (LSTM) networks and their ability to learn deep features from the past values of a financial time series. The learned features are used for making forecasts of the future values of the stock index. Moreover, the execution time of the models was found to be quite moderate on our target hardware architecture. The fastest model in our proposition needed, on average, only 11.17 s for model building using a training dataset consisting of 2610 records and testing on a test dataset consisting of 360 records.

IV. METHODOLOGY
As we mentioned in Section II, the main objective of this work is to build a suite of predictive models for accurately forecasting the daily open values of the NIFTY 50 index. For building and testing our proposed predictive models, we use the historical NIFTY 50 index values for the period December 29, 2008 to May 15, 2020. The stock records were downloaded in the form of a comma-separated values (CSV) file from the Yahoo Finance website [18]. The following attributes constitute the daily records of the NIFTY 50 index: (i) date, (ii) open, (iii) high, (iv) low, (v) close, and (vi) volume.
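As an illustration, daily records with these attributes can be loaded with pandas. The two rows below are illustrative placeholders, not actual NIFTY 50 values, and the in-memory string stands in for the downloaded CSV file:

```python
import io
import pandas as pd

# Stand-in for the NIFTY 50 CSV downloaded from Yahoo Finance;
# the values below are illustrative placeholders, not real index data.
csv_text = """Date,Open,High,Low,Close,Volume
2008-12-29,100.0,101.0,99.0,100.5,0
2008-12-30,100.5,102.0,100.0,101.5,0
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
open_values = df["Open"].to_numpy()  # 'open' is the response variable of our models
```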
The predictive models proposed in this work are all deep learning-based regression models. We use the variable open as the response variable, and all the other variables are used as the predictors. NIFTY 50 daily data for the period December 29, 2008 to December 28, 2018 have been used as the training data for constructing the models, while we tested the models using the data for the period December 31, 2018 to May 15, 2020. Hence, the training dataset comprised 2610 records spanning 522 weeks, while the test dataset consisted of 360 records over 72 weeks. We followed the approach of multi-step forecasting with walk-forward validation for the validation and testing of our proposed models [19]. Using this method, we build the models on the training dataset and forecast the open values of the NIFTY 50 index on a weekly basis for the records in the test dataset. As each week is over, the actual open values of the records for that week are included in the training dataset, and the open values for the following week are then forecast. The NSE of India remains operational for five days a week, Monday to Friday; hence, each round of forecasting involves predicting the open values corresponding to the five days of the upcoming week. To make our forecasting framework more robust and accurate, we build several deep learning-based regression models. In one of our previous works, we demonstrated the effectiveness and accuracy of convolutional neural networks (CNNs) in forecasting time series index values [6]. In this work, in addition to exploiting the power of CNNs, we have utilized another type of deep learning model, the long short-term memory (LSTM) network, for forecasting on a complex multivariate time series like the NIFTY 50 series.
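The weekly walk-forward validation scheme can be sketched as follows; the naive last-value forecaster is only a placeholder standing in for any of our trained models:

```python
import numpy as np

def forecast_next_week(history):
    # Placeholder model: repeat the last observed open value for the
    # five trading days of the next week (a trained model goes here).
    return np.repeat(history[-1], 5)

def walk_forward(series, n_test_weeks):
    # series: daily open values, five trading days per week
    n_train = len(series) - 5 * n_test_weeks
    history = list(series[:n_train])
    preds, actuals = [], []
    for w in range(n_test_weeks):
        preds.append(forecast_next_week(np.array(history)))
        actual = series[n_train + 5 * w : n_train + 5 * (w + 1)]
        actuals.append(actual)
        # once a week is over, its actual values join the training data
        history.extend(actual)
    return np.array(preds), np.array(actuals)

preds, actuals = walk_forward(np.arange(20.0), n_test_weeks=2)
```

Each round forecasts five open values, and the history grows by one week before the next round, exactly as in the validation procedure described above.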
A CNN consists of two major processing layers: the convolutional layers and the pooling layers [7]. The convolutional layers read the input either in the form of a two-dimensional image or as a sequence of one-dimensional data. The result of each read is projected onto a feature map that represents an interpretation of the input. The pooling layers operate on the extracted feature maps and derive the most essential features through averaging (average-pooling) or maximum (max-pooling) operations. For extracting deep features from the input sequence, the convolution and pooling layers may be repeated multiple times. The output from the last pooling layer is sent to one or more dense layers for further extraction of features from the input data.
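The one-dimensional reading and pooling operations described above can be illustrated with a minimal example; the filter here is hand-picked for demonstration, not learned:

```python
import numpy as np

def conv1d(x, kernel):
    # 'valid' cross-correlation, as computed by CNN convolutional layers
    n = len(x) - len(kernel) + 1
    return np.array([np.dot(x[i:i + len(kernel)], kernel) for i in range(n)])

def max_pool1d(x, size):
    # non-overlapping max-pooling over windows of the given size
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, size)])

week = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # five daily open values
kernel = np.array([1.0, 0.0, -1.0])          # a hand-picked difference filter
feature_map = conv1d(week, kernel)           # -> [-2. -2. -2.]
pooled = max_pool1d(feature_map, 2)          # -> [-2.]
```

A trained convolutional layer applies many such filters in parallel, producing one feature map per filter.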
LSTM is a deep neural network architecture that belongs to the family of recurrent neural networks (RNNs). RNNs have a characteristic that distinguishes them from other deep neural networks: they have feedback loops [19]. However, RNNs suffer from the vanishing and exploding gradient problem, in which the network either stops learning or continues to learn at such a high learning rate that it never converges to the point of minimum error. The architecture of LSTM networks is designed in such a way that the problems of vanishing or exploding gradients do not occur, and hence such networks are found to be suitable for modeling complex sequential data such as text and time series. These networks consist of cells that store the historical state information of the network, and gates that regulate and control the flow of information through these cells. Three types of gates are used in an LSTM network: forget gates, input gates, and output gates. The forget gates are instrumental in throwing away irrelevant past information and in remembering only the information that is relevant at the current time slot. The input gates control the new information that acts as the input to the current state of the network. The memory cells in the network aggregate the old state information received through the forget gates and the current input received through the input gates. Finally, the output gates produce the output of the network at the current time slot. This output can be considered the forecasted value computed by the model for the current time slot [19].
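The gate operations described above correspond to the standard LSTM update equations, where \(\sigma\) is the logistic sigmoid, \(\odot\) denotes element-wise multiplication, \(x_t\) is the input, \(h_t\) the output, and \(C_t\) the cell state at time slot \(t\):

```latex
\begin{aligned}
f_t &= \sigma(W_f \, [h_{t-1}, x_t] + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i \, [h_{t-1}, x_t] + b_i) && \text{(input gate)} \\
\tilde{C}_t &= \tanh(W_C \, [h_{t-1}, x_t] + b_C) && \text{(candidate state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state update)} \\
o_t &= \sigma(W_o \, [h_{t-1}, x_t] + b_o) && \text{(output gate)} \\
h_t &= o_t \odot \tanh(C_t) && \text{(output)}
\end{aligned}
```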
We present five different predictive models in this work. The models differ in their architectures, and their input data shapes are also different. The five models are: (i) a CNN model with univariate input data of the past one week, (ii) a CNN model with univariate input data of the past two weeks, (iii) an encoder-decoder LSTM model with univariate input data of the past two weeks, (iv) an encoder-decoder CNN-LSTM model with univariate input data of the past two weeks, and (v) an encoder-decoder Convolutional LSTM model with univariate input data of the past two weeks.
In the following, we briefly discuss the salient architectural details of the two CNN and three LSTM models that we have proposed in this work.
The first model is a univariate CNN model that uses the previous one week's data as the input and carries out multi-step forecasting with walk-forward validation. The model is trained using the records in the training dataset, and it is then used to forecast the open values for the five days of the next week. The shape (5, 1) of the input data to the network refers to the single attribute, i.e., the open values, of the five days of the previous week. The CNN model consists of only one convolution layer that extracts 16 feature maps from the input data with a kernel of size 3. The convolution layer enables the network to read the input data of five days in three time-steps, with each reading resulting in the extraction of 16 features. The subsampling layer following the convolutional layer performs a max-pooling operation of size 2, thereby reducing the size of the feature maps. The output of the subsampling layer is then converted into a one-dimensional vector and interpreted by a fully-connected layer before the output layer (which is also fully-connected) predicts the open values for the next five days. The rectified linear unit (ReLU) activation function has been used in the convolution and fully-connected layers. The parameters of the layers are optimized using the ADAM version of the stochastic gradient descent algorithm. We trained the model over 20 epochs, with each batch consisting of 4 inputs. For measuring the forecasting accuracy of the model, the root mean square error (RMSE) has been used as the metric. Fig. 1 depicts the architecture of the CNN model for Case I. Throughout this paper, we refer to this model as CNN#1.
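A minimal Keras sketch of the CNN#1 architecture follows; the width of the intermediate fully-connected layer (10 units here) is an assumption for illustration, not a value stated above:

```python
from tensorflow import keras
from tensorflow.keras import layers

# CNN#1: univariate input of the prior week's five open values
model = keras.Sequential([
    keras.Input(shape=(5, 1)),                # five days, one attribute (open)
    layers.Conv1D(16, 3, activation="relu"),  # 16 feature maps, kernel of size 3
    layers.MaxPooling1D(pool_size=2),         # max-pooling of size 2
    layers.Flatten(),                         # one-dimensional vector
    layers.Dense(10, activation="relu"),      # fully-connected layer (width assumed)
    layers.Dense(5),                          # open values of the next five days
])
model.compile(optimizer="adam", loss="mse")
# Training per the text: model.fit(X_train, y_train, epochs=20, batch_size=4)
```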
Our second model is also a CNN-based regression model, one that uses the previous two weeks' open values as the univariate input. This model has exactly the same architecture and the same parameters as the CNN#1 model. However, unlike CNN#1, the model in this case is fed with the previous two weeks' data (i.e., 10 records as the input). We refer to this model as CNN#2; its architecture is presented in Fig. 2.

The third model in the suite is a univariate LSTM model that consists of an encoder and a decoder sub-model and uses the previous two weeks' data as the input. The encoder sub-model is used for reading and encoding the input sequence, while the decoder sub-model is delegated the responsibility of interpreting the encoded input sequence and then making one-step predictions, one at a time. The decoder sub-model uses an LSTM network that allows it to memorize the value that was predicted in the previous round (i.e., for the previous day) of the output sequence and to store the internal state of the network while producing the predicted values. The first LSTM layer acts as the encoder sub-model, and it consists of 200 nodes. This layer reads the input sequence of shape (10, 1), indicating that only one attribute (i.e., the open value) for the previous two weeks is fed as the input to the layer. The output of the LSTM encoder layer is a 200-element vector (one element per node) that extracts deep features from the 10 input values. For each time-step in the output sequence, the input data sequence is analyzed once. With five time-steps and 200 features extracted by the LSTM encoder layer, the repeat vector layer takes the shape (5, 200). The sequence is then decoded by a decoder LSTM layer with 200 nodes [20]. The output sequence is interpreted at each time-step by a fully-connected layer before it is sent to the final output layer. The final prediction is made in a step-by-step manner, not for all the five days of a week at a time. Essentially, this implies that the same fully-connected layer and output layer are utilized for each step of forecasting. This is implemented using a TimeDistributed wrapper layer that creates a wrapped layer to be used for each time-step of the forecasted sequence [20]. As the model predicts one value at each time-step (i.e., the prediction for a single day), the weekly prediction data has the shape (5, 1).

The final model, and the third one in the LSTM suite of our proposition, is an extension of the CNN-LSTM model that performs the convolutions of the CNN (i.e., the way a CNN reads and extracts features from sequential input data) as an operation integrated into the LSTM for each time-step. We refer to this variant as a Convolutional LSTM model [20]. A pure LSTM-based regression model reads the sequential input directly and computes the internal states and state transitions to produce its predicted sequence. A CNN-LSTM model, on the other hand, extracts features from the input sequence using a CNN encoder sub-model and then interprets the output of the CNN sub-model using an LSTM decoder sub-model. In contrast to both, a Convolutional LSTM model uses the convolution operation to read a sequential input directly into a decoder LSTM sub-model. We have configured the ConvLSTM2D class of the Keras library, which supports two-dimensional ConvLSTM models, so that it can handle a one-dimensional univariate time series. With two weeks' univariate time series values as the input, the sequence can be visualized as one row with 10 columns. Although a Convolutional LSTM can read this sequence in a single time-step, a single reading and a single subsequent convolution operation are not sufficient for deep feature extraction from the input sequence. Hence, we divide the sequential data for the 10 days into two subsequences, each with a length of 5 days. The Convolutional LSTM model reads the two subsequences in two time-steps and performs a convolution operation after each time-step. This enables the model to extract more features from the input sequence. The input data in the training set was reshaped into the structure [sample no., time-steps, rows, columns, channels]. In our model design, the values of the time-steps, rows, columns, and channels were 2, 1, 5, and 1, respectively, as we considered a univariate sequence consisting of the open values only. The architecture of the model is presented in Fig. 5. We refer to this model as the LSTM#3 model.
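The input reshaping and the Convolutional LSTM reading described above can be sketched as follows; the number of ConvLSTM filters (64) and the decoder layout are assumptions for illustration, not values stated in the text:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# One two-week window of 10 daily open values (illustrative numbers),
# reshaped to [samples, time-steps, rows, columns, channels] = [1, 2, 1, 5, 1]
window = np.arange(10, dtype="float32")
x = window.reshape(1, 2, 1, 5, 1)

model = keras.Sequential([
    keras.Input(shape=(2, 1, 5, 1)),                   # two 5-day subsequences
    layers.ConvLSTM2D(64, (1, 3), activation="relu"),  # filter count is assumed
    layers.Flatten(),
    layers.RepeatVector(5),                            # one copy per forecast day
    layers.LSTM(200, activation="relu", return_sequences=True),
    layers.TimeDistributed(layers.Dense(1)),           # one open value per day
])
model.compile(optimizer="adam", loss="mse")
```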

Fig. 1. The architecture of the univariate CNN model (CNN#1) with the prior one week's data as the input

Fig. 3. The architecture of the univariate encoder-decoder LSTM model (LSTM#1) with the prior two weeks' data as the input

We have used the ReLU activation function in the two LSTM layers and the TimeDistributed wrapper layer, while the loss function and the optimizer used in the output layer were MSE and ADAM, respectively. The model is trained over 20 epochs using a batch size of 16. These are the optimum values of the hyperparameters obtained using the grid search technique. We refer to this model as LSTM#1; its architecture is presented in Fig. 3.

The fourth model, and the second one in the LSTM family that we propose, is a variant of a univariate encoder-decoder CNN-LSTM model, in which a CNN encoder sub-model reads and extracts features from the input sequence and an LSTM decoder sub-model interprets them. We refer to this model as LSTM#2.
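A Keras sketch of the LSTM#1 encoder-decoder architecture, with the 200-node encoder and decoder layers described above; the width of the TimeDistributed fully-connected layer (100 units here) is an assumption for illustration:

```python
from tensorflow import keras
from tensorflow.keras import layers

# LSTM#1: univariate encoder-decoder LSTM over the prior two weeks
model = keras.Sequential([
    keras.Input(shape=(10, 1)),                  # ten days, one attribute (open)
    layers.LSTM(200, activation="relu"),         # encoder: 200-element vector
    layers.RepeatVector(5),                      # repeat vector of shape (5, 200)
    layers.LSTM(200, activation="relu", return_sequences=True),  # decoder
    layers.TimeDistributed(layers.Dense(100, activation="relu")),  # width assumed
    layers.TimeDistributed(layers.Dense(1)),     # one open value per time-step
])
model.compile(optimizer="adam", loss="mse")
# Training per the grid search above: model.fit(X, y, epochs=20, batch_size=16)
```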

Fig. 5. The architecture of the univariate encoder-decoder Convolutional LSTM model (LSTM#3) with the prior two weeks' data as the input

TABLE I. RESULTS OF THE UNIVARIATE CNN MODEL WITH THE PREVIOUS ONE WEEK'S DATA AS INPUT (CNN#1)