Share Price Trend Prediction Using CRNN with LSTM Structure

ABSTRACT In this paper, the convolutional recurrent neural network (ConvLSTM) architecture is proposed to predict individual stock prices. The characteristics of stock data are automatically extracted through a convolutional neural network (CNN). Afterward, the extracted features are fed into a long short-term memory (LSTM) model with memory characteristics for prediction. Since CNN is a representation learning model, it is well suited for automatic feature extraction. In addition, the LSTM architecture of the recurrent neural network can effectively overcome the gradient vanishing and explosion issues of time-series data. Historical data of ten stocks were collected as the experimental data set. Furthermore, many commonly used technical indicators are calculated in advance to expand the dimension of the training samples. The experimental results attain an average RMSE (root-mean-square error) of 3.449. Graphical Abstract: The architecture of the ConvLSTM.


Introduction
Investment and financial management has gradually become one of the ways people make a profit nowadays. Common investment tools currently available include stocks, funds, public debt, options, and futures, among others. Since stocks offer high return on investment, flexible investment amounts, high liquidity, and abundant market information, they are the first choice for most investors. The future rise and fall of a stock price is correlated to a certain extent with its past movements; hence a certain degree of predictability exists. However, stocks carry many uncertainties. In consequence, a variety of forecasting tools are available on the market.
Based on different foundations, techniques, and analytical methods, various models have been proposed by many experts to predict stock prices. Fundamental analysis considers various external macroeconomic factors to reach a comprehensive reasoning about stock price changes. Approaches of this kind rely on the experience of the analyst, and their investigation of financial variables is subjective. On the other hand, technical analysis focuses on stock price, volume, and other financial facts to estimate the future development of stocks [1].
In recent years, owing to the booming development of artificial intelligence, machine learning has been introduced to various industries as a development strategy to elevate competitiveness. Consequently, the stock exchange also promotes intelligent stock investment and financial management tools. Since the rise and fall in stock prices are closely related to past prices, using the historical data of individual stocks to predict future prices is currently the most common method.
Deep learning is based on the artificial neural network. The initial concept can be traced back to the research on the perceptron published by Rosenblatt in 1957 [2], who proposed the perceptron algorithm based on the artificial neuron model. However, this algorithm could not handle nonlinearly separable problems. Later, Rumelhart, Hinton, et al. proposed the backpropagation algorithm [3], which could successfully and effectively classify nonlinear problems. The algorithm was widely used in various practical applications thereafter. However, due to the hardware limitations of the time, there was no further breakthrough. Nowadays, owing to the recent maturity of big data and high-performance computing technology, a new chapter in machine learning has commenced.
Chiang et al. used the backpropagation neural network (BPN) to predict stock prices [4]. Several technical indicators were also used, including the stochastic oscillator, relative strength index, exponential smoothing similarity moving average, directional movement index, bias ratio, and foreign capital and financing balances. The experimental results showed that positive returns could not be obtained for all stocks; moreover, proper variables must be inputted to improve the accuracy of stock price prediction. Kim and Han [5] proposed the use of bootstrap aggregating, improved by the random forest method, to predict the trend in stock prices. The experimental results showed that adding other technical indicators would enhance the accuracy.
The feature selection methods proposed by Zhang et al. [6] were helpful to improve the predictive performance, such as the principal component analysis, classification and regression tree, and least absolute shrinkage and selection operator. The causal feature selection (CFS) was also proposed to compare with the aforementioned feature selection methods. The experimental results had shown that the accuracy via CFS method was better. This hinted that CFS might have high potential in quantitative investment.
Ballings et al. [7] used the random forest method, AdaBoost, and Kernel factory to compare with neural networks, logistic regression, support vector machines, and K-nearest neighbor methods. The experiments had shown that the random forest method provided the best result.
Chang et al. [8] proposed the intelligent piecewise linear representation (IPLR) method, a piecewise linear representation (PLR) based on the genetic algorithm and applied to BPN. It predicted the rising, stable, and declining trends of different stocks. The experimental results showed that substantial profits could be obtained via the IPLR method. In summary, this article tries to predict future stock prices using today's popular deep learning algorithms.

Related Works
Deep learning is a branch of machine learning. Its two most representative models are the convolutional neural network (CNN) and the recurrent neural network (RNN). The CNN model has been widely applied to image recognition and has attained immense breakthroughs. The RNN model is suitable for learning from time-series data. One of its variants, the long short-term memory (LSTM), overcomes the gradient vanishing and explosion issues in time-series data. In consequence, these two models are introduced respectively below.

Convolutional Neural Network
LeCun et al. [9] proposed the CNN to identify handwritten characters. It is a neural network built from convolution and pooling operations, finally connected to a fully connected layer, as shown in Figure 1. The effect of CNN on feature extraction is extraordinary; many classification models in the field of image recognition are derived from CNN.

Long Short-Term Memory
The RNN is a directed-graph sequence neural network that feeds outputs back into inputs: the current input combines the previous output with the upcoming data. In addition, relevant information is recorded in the neurons, so that the current output is related to the previous output.
The LSTM is a variant of the RNNs [10]. Its special design can solve the long-term dependency issue of the RNN. There are three special gate designs in the LSTM: the input gate, the forget gate, and the output gate, as shown in Figure 2. The computation can be subdivided into block input, input gate, forget gate, cell state, output gate, and block output. The related equations are shown as follows:

(1) Block input: z_t = tanh(W_z x_t + R_z y_{t−1} + b_z)
(2) Input gate: i_t = σ(W_i x_t + R_i y_{t−1} + b_i)
(3) Forget gate: f_t = σ(W_f x_t + R_f y_{t−1} + b_f)
(4) Cell state: c_t = z_t ⊙ i_t + c_{t−1} ⊙ f_t
(5) Output gate: o_t = σ(W_o x_t + R_o y_{t−1} + b_o)
(6) Block output: y_t = tanh(c_t) ⊙ o_t

where ⊙ denotes the pointwise multiplication of two vectors, σ is the logistic sigmoid, x_t is the input at time t, W and R are the input and recurrent weight matrices, and b is the bias vector. The LSTM architecture has proved to outperform traditional RNN networks. In the ICDAR handwriting-recognition competition, a neural network using LSTM won the 2009 championship. In 2013, LSTM attained a record 17.7% error rate on the TIMIT speech dataset.
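To make the gate equations concrete, the following sketch implements a single LSTM step in NumPy; the input and hidden sizes are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def lstm_step(x_t, y_prev, c_prev, W, R, b):
    """One LSTM step following Eqs. (1)-(6); W, R, b hold the z/i/f/o blocks."""
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    z = np.tanh(W["z"] @ x_t + R["z"] @ y_prev + b["z"])   # (1) block input
    i = sigmoid(W["i"] @ x_t + R["i"] @ y_prev + b["i"])   # (2) input gate
    f = sigmoid(W["f"] @ x_t + R["f"] @ y_prev + b["f"])   # (3) forget gate
    c = z * i + c_prev * f                                 # (4) cell state
    o = sigmoid(W["o"] @ x_t + R["o"] @ y_prev + b["o"])   # (5) output gate
    y = np.tanh(c) * o                                     # (6) block output
    return y, c

rng = np.random.default_rng(0)
d, h = 18, 4                          # input and hidden sizes (illustrative)
W = {k: rng.standard_normal((h, d)) for k in "zifo"}
R = {k: rng.standard_normal((h, h)) for k in "zifo"}
b = {k: np.zeros(h) for k in "zifo"}
y, c = lstm_step(rng.standard_normal(d), np.zeros(h), np.zeros(h), W, R, b)
print(y.shape, c.shape)               # (4,) (4,)
```

Because the block output passes through tanh and a sigmoid gate, each component of y is bounded in (−1, 1), which is part of what keeps gradients well behaved over long sequences.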

Convolutional Recurrent Neural Network (ConvLSTM)
Previously, prior to training a classifier, an artificially designed feature extraction method had to be derived first. The whole process, in general, was to extract the important features of the input data via the artificially designed feature extraction method, use these features to train the classifier, and then make the final prediction. Representation learning enables the model to learn the important features from the data by itself; the most representative model is the autoencoder. This approach can derive more representative features than the earlier artificial feature extraction methods and elevates the overall classification efficiency significantly.
In this paper, a convolutional recurrent neural network (ConvLSTM) architecture is proposed. It combines a CNN model with representation learning characteristics and an LSTM model with memory characteristics. Since the rise and fall of stock prices is closely related to past prices, the stock data possesses a time-series feature. Hence the LSTM model can play a better role via its memorized feature.

Neural Network Model Design
The network architecture proposed in this paper is an end-to-end neural network combining a CNN and an RNN employing the LSTM architecture, called the ConvLSTM architecture, as shown in Figure 3. The first layer is a convolutional layer; its purpose is to extract the characteristics of the input data, such as the instantaneous characteristics of the rise and fall in stock trends. The second and third layers are LSTM layers. This design is intended to retain past data through the LSTM features, using historical information to learn and evolve into a better network. Two LSTM layers are used to deepen the network, so as to acquire more subtle patterns. Eventually, the data are connected to a fully connected layer as the output.
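As an illustration, the three-layer design can be sketched with the Keras API; the layer widths below (64 filters, 128 LSTM units) are assumptions for the sketch, since the paper's exact settings are listed in Table 1.

```python
# Sketch of the described ConvLSTM stack: Conv1D -> LSTM -> LSTM -> Dense.
# Filter and unit counts are illustrative assumptions, not the paper's values.
from tensorflow.keras import layers, models

def build_convlstm(window=5, n_features=18):
    model = models.Sequential([
        # First layer: 1-D convolution enhancing short-term rise/fall features.
        layers.Conv1D(filters=64, kernel_size=5, padding="same",
                      activation="relu", input_shape=(window, n_features)),
        # Two stacked LSTM layers retain contextual (historical) information.
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(128),
        # Fully connected output with a linear activation, so the predicted
        # closing price is unbounded.
        layers.Dense(1, activation="linear"),
    ])
    model.compile(optimizer="adadelta", loss="mse")
    return model
```

The final Dense layer keeps a linear activation, matching the later design choice of not bounding the predicted price.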

Neural Network Parameter Settings
Prior to the start of neural network training, the input data must be reshaped into a format acceptable to the first convolutional layer of the model architecture used in this paper. Suppose that 4000 historical stock records are collected in total, each containing 18 dimensions; this historical data format can be simply expressed as (4000, 18). The kernel size used in the one-dimensional convolution layer is 1 × 5. To let the convolution operation capture contextual relevance, the input data must be reshaped to (4000, 5, 18). The reshaping method is shown in Figure 4: each original input record is expanded to conform to the convolutional kernel size. When the neural network starts training, the numerical values of the convolution kernel are randomly initialized; through network iterations, the kernel slowly learns more effective feature-enhancing weights.
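One plausible reading of this expansion is a sliding window of recent records; the sketch below reshapes (N, 18) records into (N, 5, 18) samples, front-padding the earliest samples by repeating the first record. The exact scheme is an assumption here — the paper's version is shown in Figure 4.

```python
import numpy as np

def to_windows(data, window=5):
    """Reshape (N, D) stock records into (N, window, D) samples.

    Sample i holds records i-window+1 .. i; the first few samples are
    front-padded by repeating the earliest record (an assumption made
    for this sketch).
    """
    padded = np.vstack([np.repeat(data[:1], window - 1, axis=0), data])
    return np.stack([padded[i:i + window] for i in range(len(data))])

data = np.random.rand(4000, 18)   # 4000 records, 18 dimensions
samples = to_windows(data)
print(samples.shape)              # (4000, 5, 18)
```

Each window now matches the 1 × 5 kernel, so the convolution sees five consecutive records at a time.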
After the convolution operation of the first layer, the characteristics of the input data are enhanced by the properties of convolution. Afterward, the data are directly connected to the two LSTM layers of the RNN. Through the LSTM design, the neural network is able to learn the contextual relevance of the stock's historical data. After the two LSTM layers, the data are connected to a fully connected layer. The output now becomes a single value, and the linear activation function is used. The effect of linearity is equivalent to directly outputting the received numerical value, so the output has no upper or lower limit, and the predicted stock price may exceed the maximum or minimum of the input data. In order not to restrict the predicted value, the linear function is used as the activation function of the fully connected layer in this paper. The detailed network parameter settings are shown in Table 1.
The loss function used in this paper is the mean-square error (MSE). In statistics, MSE is an important indicator used to measure the performance of estimators of continuous variables, as shown in Equation (11):

MSE = (1/N) Σ_{i=1}^{N} (Y_i − Ŷ_i)²

Here N denotes the number of sample data, and Y_i denotes the correct answer of the output; in this paper, it represents the closing price at the next time step. The index i denotes the ith neuron output in the output layer, where i = 1, 2, …, o, and Ŷ_i denotes the output of the neural network prediction. The learning rate affects the training time and the performance of the model. The learning rate needs to be larger at the beginning of learning in order to achieve a large gradient descent, but as the epochs increase, the learning rate must gradually decrease. The learning rate can be adjusted through two approaches: learning rate schedules and adaptive learning rate methods.
Step decay is a learning rate schedule that reduces the learning rate every ε epochs, shown as follows:

l_e = l_0 · df^⌊e/ε⌋

where l_0 is the initial learning rate, df is a decay factor less than 1, and e is the eth epoch.
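The schedule is a one-line function; the concrete values below (initial rate 0.1, halving every 10 epochs) are purely illustrative.

```python
import math

def step_decay(l0, df, epsilon, e):
    """Step-decay schedule: multiply the initial rate l0 by the decay
    factor df (< 1) once every epsilon epochs."""
    return l0 * df ** math.floor(e / epsilon)

# Illustrative values: halve an initial rate of 0.1 every 10 epochs.
print(step_decay(0.1, 0.5, 10, 0))   # 0.1
print(step_decay(0.1, 0.5, 10, 10))  # 0.05
print(step_decay(0.1, 0.5, 10, 25))  # 0.025
```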
AdaDelta [11] is an adaptive learning rate method, and the results in Section 4 show that AdaDelta obtains better prediction results; hence AdaDelta is used as the method to adjust the learning rate in this paper. AdaDelta is shown as follows:

G_t = ρ G_{t−1} + (1 − ρ) g_t²
Δw_t = −(√(D_{t−1} + ε) / √(G_t + ε)) g_t
D_t = ρ D_{t−1} + (1 − ρ) Δw_t²

where ρ and ε are constants: ρ is the attenuation coefficient, and ε is a constant added to prevent the denominator of Δw_t from being 0. Here g is the gradient of the optimized objective function, and G_t is the accumulated squared gradient, attenuated from the previous G_{t−1}. The AdaGrad method used in the past accumulates the squared gradients directly, which causes the effective learning rate to decay toward 0 as the number of iterations grows. The D_t in AdaDelta is similar to the momentum effect in gradient descent; consequently, the Δw_t at the current time point is related both to the momentum of past time points and to the slope. Through this mechanism, AdaDelta does not need an additional learning rate to be set; it adjusts the step magnitude according to the gradient.
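A scalar sketch of one AdaDelta step, following the update rules described above, looks as follows; the constants ρ = 0.95 and ε = 1e-6 are common defaults, assumed here for illustration.

```python
import math

def adadelta_step(w, grad, state, rho=0.95, eps=1e-6):
    """One AdaDelta update (scalar sketch of the equations above).

    G accumulates a decayed average of squared gradients; D plays the
    momentum-like role on squared updates described in the text.
    """
    G, D = state
    G = rho * G + (1 - rho) * grad ** 2
    dw = -math.sqrt(D + eps) / math.sqrt(G + eps) * grad
    D = rho * D + (1 - rho) * dw ** 2
    return w + dw, (G, D)

# Minimize f(w) = w^2 (gradient 2w) from w = 1.0, with no learning rate set.
w, state = 1.0, (0.0, 0.0)
for _ in range(500):
    w, state = adadelta_step(w, 2 * w, state)
print(abs(w) < 1.0)   # the iterate has moved toward the minimum at 0
```

Note that no learning rate appears anywhere: the ratio of the two running averages sets the step size, which is the mechanism the text describes.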

Experimental Results
Usually a large amount of data is required for deep learning and neural network architectures: they are trained on large amounts of data so that the neural network model can learn the characteristics or decision patterns in the big data. The stock historical data used in this paper came from the Taiwan Stock Exchange Corporation (TWSE). The daily transaction information of each stock is fetched from TWSE, including the transaction amount, the strike price, the opening price, the bid price, the ask price, the closing price, the price fluctuation limit, the trading volume, the net buy/sell of investment trust corporations, the net buy/sell of OTC operators, and the net buy/sell of foreign investors. In addition to the depth of the data, the breadth of the data needs to be widened for the neural network. When the network only learns certain specific information, without additional relevant information, it is hard for the model to find the relevance or characteristics between the specific data and the final results, and performance could be undesirable. Thus, besides the aforementioned information, additional technical indicators are calculated and incorporated into the input data in this paper, including the Stochastic Oscillator (KD) [12], the Relative Strength Index (RSI) [13], and Bollinger Bands (BBands) [14]. Technical indicators have a certain extent of influence on investors in the technical analysis of stocks; hence, the technical indicators used by most people in the market are adopted.
Ten stocks covering the food, petrochemical, steel, electronics, financial, electrical, machinery, and construction industries are collected in this paper, as shown in Table 2.
The input data used in this paper have 18 dimensions: the transaction amount, the strike price, the opening price, the bid price, the ask price, the closing price, the price fluctuation limit, the trading volume, the net buy and sell of institutional investors, the K and D values of the Stochastic Oscillator [12], the RSI5 and RSI10 of the Relative Strength Index [13], and the upper band, moving average, and lower band of Bollinger Bands [14], as shown in Table 3.
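As an illustration of how such indicators are derived from raw closing prices, the sketch below computes an n-day RSI. The simple-average variant of gain/loss smoothing is an assumption for this sketch; RSI implementations differ (Wilder smoothing is another common choice).

```python
def rsi(closes, n=5):
    """n-day Relative Strength Index over a closing-price series.

    Uses simple averages of gains and losses over the last n price
    changes (one common variant; an assumption for this sketch).
    """
    out = []
    for i in range(n, len(closes)):
        changes = [closes[j] - closes[j - 1] for j in range(i - n + 1, i + 1)]
        gains = sum(c for c in changes if c > 0)
        losses = sum(-c for c in changes if c < 0)
        if gains + losses == 0:
            out.append(50.0)   # flat window: neutral reading
        else:
            out.append(100.0 * gains / (gains + losses))
    return out

closes = [100, 101, 102, 101, 103, 104, 103]
print(rsi(closes, n=5))
```

RSI5 and RSI10 from Table 3 then correspond to calling this with n = 5 and n = 10, and the resulting series are appended as extra input dimensions.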
The computer equipment and platform tools used in the experimental results of this paper are shown in Table 4.
The root-mean-square error (RMSE) is used as the evaluation basis in the following experiments. The RMSE is shown as follows:

RMSE = √((1/N) Σ_{i=1}^{N} (Y_i − Ŷ_i)²)

Here Y_i denotes the correct answer for the input datum X_i, and Ŷ_i denotes the predicted output of the neural network.
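The metric reduces to a few lines; the prices in the example are made up for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error between correct answers Y and predictions Y-hat."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Illustrative prices: errors of 1, 0, and 2.
print(rmse([10.0, 12.0, 11.0], [9.0, 12.0, 13.0]))  # sqrt((1 + 0 + 4) / 3)
```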
In this experiment, the first 80% of the input data is used as the training data and the last 20% as the testing data. During the experiment, the number of iterations is 1000. Data are trained in batches, with the batch size set to 200. Two strategies for adjusting the learning rate are examined: step decay as the learning rate schedule and AdaDelta as the adaptive learning rate method. Finally, AdaDelta is adopted as the learning-rate adjustment method in the following experiments. The RMSE of each stock is shown in Table 5.
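The chronological 80/20 split can be sketched as follows; note that the samples are not shuffled, so the test set is strictly later in time than the training set.

```python
def chrono_split(samples, train_ratio=0.8):
    """Split time-ordered samples: first 80% for training, last 20% for testing."""
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

data = list(range(1000))       # stand-in for 1000 time-ordered samples
train, test = chrono_split(data)
print(len(train), len(test))   # 800 200
```

Keeping the order intact avoids look-ahead leakage: the model is never trained on prices that come after the ones it is evaluated on.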
Two of the stocks have higher RMSE and are brought out for special discussion: Voltronic Power (6409) and Ennoconn (6414). The loss functions and training prediction curves of these two stocks are shown in Figure 5.
Although machine learning can predict the closing price of the future by past historical prices, it is also affected by other factors. The factors affecting the stock market price are politics, economy, industry and company operations, etc., so there are many uncertainties in the rise and fall of each stock. The effects of these factors are not necessarily predictable from historical prices, so there are few programs that fully predict future price changes.
Judging from the stock price charts, the two stocks do not have obvious extreme values, and their loss function curves likewise have no obvious extremes. The reason might be the amount of data, as shown in Table 2. The amount of data for most stocks is between 2700 and 3900; however, the data for Voltronic Power and Ennoconn collected from TWSE were fewer than 800 records each. Hence it can be concluded that the actual amount of data might seriously affect the performance of the neural network.
In some industries, stock price changes are more difficult to predict. For example, ASUS and Voltronic Power in the electronics industry have higher RMSE, but the RMSE of ASUS gradually decreases as more training data are provided. In Figure 6, the red box marks the part that makes the RMSE particularly high. Nevertheless, the closing price predicted by ConvLSTM on the test dataset still matches the long-term trend of the real closing price, and almost all of the predicted prices are lower than the real prices. Losses therefore hardly exist in the predicted prices, so the forecast provides a conservative investment reference.
Networks with and without Dropout [15] are compared in this paper. Dropout is used to avoid overfitting in network models: when a network model possesses too many neuron nodes, it can overfit the training data, leading to bad results in actual testing. In this paper, Dropout is added to each layer of the ConvLSTM architecture, with the dropout rate set to 0.2. The comparison results are shown in Table 6, where the bold values are the relatively low ones. The results show that the majority of stocks actually have lower error rates without Dropout. The reason might be that the neural network model proposed in this paper has merely a simple three-layer architecture and hence fewer parameters, which can lead to increased error rates when Dropout is used.
Pure LSTM network architectures are compared with ConvLSTM in this paper; the experimental comparison results are shown in Table 7. ConvLSTM obtains lower test errors on eight of the ten stocks. Note that the network architecture used in this paper adopts only one convolutional layer to improve the feature extraction capability; if the number of convolutional layers is increased in the future, the advantage over the pure LSTM will likely grow.

Conclusion
Historical stock data with time series are collected in this study; the collected data are daily historical transaction data. The RNN with memory characteristics, specifically the LSTM architecture, is used; it can effectively solve the long-term dependency problems of neural networks. Further, the characteristics of CNN in capturing the ups and downs of stock data are utilized. The problems of gradient vanishing and explosion in time-series data are solved by connecting the LSTM architecture of the RNN. The experimental results for each stock show acceptable error rates.
The experimental results show that the amount of data has a profound impact on the effectiveness of deep learning. In other words, more detailed information should be obtained, such as stock trading data at 30-minute or 60-minute intervals; it is believed that the accuracy would then be enhanced.
In the future, the prediction accuracy could be improved by analyzing stock fundamentals or by capturing stock news highlights into the input data; however, this still awaits further breakthroughs. From the viewpoint of pure technical analysis, the results shown in this paper achieve acceptable error values.

Disclosure statement
No potential conflict of interest was reported by the authors.