Is Deep Learning for Image Recognition Applicable to Stock Market Prediction?

Stock market prediction is a challenging issue for investors. In this paper, we propose a stock price prediction model based on a convolutional neural network (CNN) to validate the applicability of new learning methods in stock markets. When applying the CNN, 9 technical indicators were chosen as predictors of the forecasting model, and the technical indicators were converted to images of the time series graph. To verify the usefulness of deep learning for image recognition in stock markets, the predictive accuracies of the proposed model were compared to those of a typical artificial neural network (ANN) model and a support vector machine (SVM) model. From the experimental results, we can see that the CNN can be a desirable choice for building stock prediction models. To examine the performance of the proposed method, an empirical study was performed using the S&P 500 index. This study addresses two critical issues regarding the use of CNNs for stock price prediction: how to use CNNs and how to optimize them.


Introduction
Stock markets have random walk characteristics. Random walk characteristics in stock markets mean that the stock price moves independently at every point in time. Due to the random walk characteristic, stock market prediction using past information is very challenging [1]. In addition, Carpenter et al. [2] argued that the stock market can be influenced by complex factors, such as business and economic conditions and political and personal issues. There is a high degree of uncertainty in the stock market, which makes it difficult to predict stock price movements [3].
With globalization and the development of information and communication technology (ICT), however, many people are looking toward stock markets to earn excess returns in a convenient investment environment. Therefore, the study of stock market prediction has been a very important issue for investors.
Stock market prediction methods can be categorized into fundamental analysis and technical analysis [4]. Fundamental analysis is a method of analyzing all elements that affect the intrinsic value of a company, and technical analysis is a way of predicting the future stock price through graph analysis.
When fundamental analysis is applied, some problems may occur. For example, forecasting timeliness can be reduced, subjectivity can intervene, and the difference between stock price and intrinsic value can be maintained for a long time [5]. Due to these limitations of fundamental analysis, many studies related to stock market prediction using technical analysis have been conducted.
In recent years, many researchers have suggested that artificial neural networks (ANNs) provide an opportunity to achieve profits exceeding the market average by using technical indicators as predictors in stock markets [6][7][8][9]. Shin et al. [10] proposed a stock price prediction model based on deep learning techniques using open-high-low-close (OHLC) price and volume and derived technical indicators in the Korean stock market.
However, since many financial market variables are intertwined with each other directly or indirectly, it is difficult to predict future stock price movements by using technical indicators alone, even when applying a typical deep learning model.
In this study, a stock price prediction model based on a convolutional neural network (CNN) and technical analysis is proposed to validate the applicability of new learning methods in stock markets. Unlike typical neural network structures, the CNN, which is most commonly applied to analyze visual imagery, can improve learning performance through convolution and pooling processes [11]. To apply the CNN, various technical indicators, which are used for technical analysis, were generated as predictors (input variables) of the prediction model, and these technical indicators were converted to images of the time series graph. This study compared the forecasting accuracies of the proposed model with those of a typical ANN model and a support vector machine (SVM) model to verify the usefulness of deep learning for image recognition in the stock market.
The remainder of this paper is organized as follows. Section 2 describes the theoretical background for the typical ANN, SVM, and CNN. Section 3 introduces the proposed model for stock market prediction in this study. Section 4 demonstrates the empirical results and analysis. Finally, we draw conclusions in Section 5.

2.1. Typical ANN. A typical ANN model is a data processing system consisting of layers, connection strengths (weights), a transfer function, and a learning algorithm. The ANN has a structure in which relations between input and output values are learned through iterative weight adjustments. The neural network structure consists of fully connected layers, in which all neurons are connected with those of adjacent layers.
The ANN consists of perceptrons, called neurons, and the overall structure of the general ANN is given in Figure 1(a). The general ANN consists of three layers: the input layer, the hidden layer, and the output layer. In the input layer, the neurons correspond to each input variable. The neurons in the hidden layer and output layer calculate the weighted summation of the values from the previous layer. The fully connected layer structure may cause a problem, in which spatial information is lost by ignoring the shape of the data [12]. To increase the representation ability of the data in the ANN model, the number of hidden neurons is increased, or hidden layers are added. However, a vanishing gradient problem occurs when a backpropagation algorithm carries error information from the output layer toward the input layer [13].

2.2. SVM. SVM, developed by Vapnik [14], is an artificial intelligence learning method. It is a machine learning technique based on statistical learning theory and structural risk minimization. The purpose is to identify the optimal separating hyperplane that divides two or more classes of data by training on the input data. SVM is a type of supervised learning used to predict and classify items, and it is well known as a useful machine learning algorithm for classification [15].
Assume that there are $n$ data points in the eigenspace, $\{(x_1, c_1), (x_2, c_2), \ldots, (x_n, c_n)\}$, where the symbol $c_i \in \{+1, -1\}$ indicates the classification for data point $x_i$. These data points serve as the training data for the identification of the optimal separating hyperplane as
$$w \cdot x - a = 0. \quad (1)$$
The symbol $w$ denotes the vector normal to the separating hyperplane, which determines the separating margin, and $a$ is a constant. There could be multiple solutions for $w$, but the optimal $w$ is the one with the maximum margin. Equation (2) is the optimization problem whose solution gives this $w$:
$$\min_{w, a} \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad c_i (w \cdot x_i - a) \ge 1, \quad i = 1, \ldots, n. \quad (2)$$
After the learning process obtains the $w$ with the maximum margin, it is then possible to establish the classification $\hat{C}$ by using (3) on the test data that have yet to be classified:
$$\hat{C} = \operatorname{sign}(w \cdot x - a). \quad (3)$$
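The decision rule of a trained linear SVM can be illustrated with a minimal sketch. It assumes the standard hard-margin form, in which the hyperplane is w · x − a = 0 and a point is classified by the sign of w · x − a; the weight vector, constant, and data points below are hypothetical values chosen for illustration, not results from this study.

```python
def decision_value(w, a, x):
    """Signed score w . x - a of point x relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) - a


def classify(w, a, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, a, x) >= 0 else -1


def satisfies_margin(w, a, points, labels):
    """Check the hard-margin constraint c_i * (w . x_i - a) >= 1 for all points."""
    return all(c * decision_value(w, a, x) >= 1 for x, c in zip(points, labels))


# Hypothetical separating hyperplane and training data (illustration only).
w = [1.0, 1.0]
a = 3.0
train_points = [(1.0, 1.0), (0.0, 1.0), (2.0, 2.0), (3.0, 2.0)]
train_labels = [-1, -1, 1, 1]

margin_ok = satisfies_margin(w, a, train_points, train_labels)
prediction_up = classify(w, a, (2.5, 2.0))
prediction_down = classify(w, a, (0.5, 1.0))
```

Among all pairs (w, a) that satisfy the margin constraint, the SVM training procedure would select the one with the smallest norm of w, which corresponds to the widest margin.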
2.3. CNN. The CNN, as a deep learning technique, is a model that imitates the visual processing of living organisms that recognize patterns or images. The CNN has a structure in which one or more convolutional layers and pooling layers are added in front of the fully connected layers of an ANN structure.
Figure 2 shows the structure of LeNet-5, which is one of the most famous CNN architectures. According to Figure 2, a five-layer CNN was established. LeNet-5 is composed of two convolutional layers for the first two layers and three fully connected layers for the remaining three layers. First, the image of the input layer is filtered through the convolutional layer to extract appropriate features [16].
The convolutional layer is the first layer to extract features from an input image. Convolution preserves the relationship between pixels by learning image features using small squares of input data. Convolution is a mathematical operation that requires two inputs, such as an image matrix and a filter or kernel.
A convolution operation multiplies two matrices elementwise and sums the products, where one of the matrices is an image patch and the other is the filter or kernel that turns the image into something else. The output of this operation is the final convolved image. If the image is larger than the filter, the filter is moved to various parts of the image to perform the convolution operation. Each time the convolution operation is performed, a new pixel is generated in the output image.
In image processing, there are a few sets of filters that are used to perform several tasks. Convolving an image with different filters (kernels) can perform operations such as edge detection, blurring, and sharpening.

In CNNs, filters are not predefined. The value of each filter is learned during the training process [17]. Every filter is spatially small (in terms of width and height) but extends through the full depth of the input volume. During the forward pass, each filter is moved across the width and height of the input volume, and dot products are computed between the entries of the filter and the input at any position. As the filter is moved over the width and height of the input volume, a 2-dimensional feature map that gives the responses of that filter is produced at every spatial position [18]. Intuitively, the network learns filters that activate when they see some type of visual feature, such as an edge of some orientation on the first layer or eventually entire honeycomb or wheel-like patterns on the higher layers of the network. An entire set of filters is generated in each convolutional layer, and each one produces a separate 2-dimensional feature map.
Figure 3 shows the process of generating a feature map for a convolutional layer. The original image is the one on the left, and the matrix of numbers in the middle is the convolutional matrix or filter. Consider a 4 x 4 image matrix, whose pixel values are 0, 1, 2, and 3, and a 3 x 3 filter matrix, as shown in Figure 3. Convolving the 4 x 4 image matrix with the 3 x 3 filter matrix then results in a 2 x 2 feature map, as shown in Figure 3.
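The feature map computation described above can be sketched in a few lines of plain Python. The 4 x 4 image and 3 x 3 filter values below are illustrative assumptions (they use only the pixel values 0 to 3 mentioned in the text) and are not necessarily the values shown in Figure 3. As is usual in CNNs, the filter is applied without flipping, with a stride of 1 and no padding.

```python
def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding).

    At each position the overlapping entries are multiplied elementwise
    and summed, which yields one pixel of the output feature map.
    """
    image_h, image_w = len(image), len(image[0])
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    feature_map = []
    for r in range(image_h - kernel_h + 1):
        row = []
        for c in range(image_w - kernel_w + 1):
            total = 0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4 x 4 image and 3 x 3 filter (illustration only).
image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
kernel = [
    [2, 0, 1],
    [0, 1, 2],
    [1, 0, 2],
]

feature_map = convolve2d_valid(image, kernel)  # [[15, 16], [6, 15]]
```

A 4 x 4 input and a 3 x 3 filter leave only 2 x 2 valid filter positions, which is why the feature map is smaller than the input.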
The activation function of every convolutional layer and the first two fully connected layers is the ReLU (Rectified Linear Unit) shown in (4):
$$f(x) = \max(0, x). \quad (4)$$
The ReLU function is used to mitigate the vanishing gradient problem, in which the output error of the neural network is reflected less and less as it propagates away from the output layer during training [19].
Generally, the pooling layer is located after the convolutional layer. The pooling layer was introduced for two main reasons [20]. The first is to perform downsampling (i.e., to reduce the amount of computation that needs to be done), and the second is to send only the important data to the next layers in the CNN. Max pooling layers do this by taking the largest element from each region of the rectified feature map, as shown in Figure 4. The most common form is a pooling layer with filters of size 2x2 applied with a stride of 2, which downsamples every depth slice in the input by 2 along both the width and height, discarding 75% of the activations. These values are then linked to a fully connected layer, as in an ANN structure, to output the label-specific prediction probabilities.
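The ReLU activation and the 2x2 max pooling with a stride of 2 can be illustrated as follows. This is a minimal sketch on a hypothetical 4 x 4 feature map; the values are chosen only to show that one activation out of every four survives.

```python
def relu(feature_map):
    """Apply f(x) = max(0, x) to every entry of the feature map."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2: keep the largest value of each block."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [
            max(
                feature_map[r][c],
                feature_map[r][c + 1],
                feature_map[r + 1][c],
                feature_map[r + 1][c + 1],
            )
            for c in range(0, width - 1, 2)
        ]
        for r in range(0, height - 1, 2)
    ]


# Hypothetical convolution output (illustration only).
raw_map = [
    [1, -2, 3, 0],
    [0, 5, -1, 2],
    [-3, 0, 1, 2],
    [4, -6, 0, -1],
]

rectified = relu(raw_map)
pooled = max_pool_2x2(rectified)  # [[5, 3], [4, 2]]
kept_fraction = (len(pooled) * len(pooled[0])) / (len(raw_map) * len(raw_map[0]))
```

Here 4 of the 16 activations are kept, so 75% are discarded, matching the figure quoted in the text.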

CNN Architecture for Building a Stock Price Prediction Model
3.1. Input Image Generation. In this study, historical S&P 500 minute data are used, and these time series data are divided into 30-minute increments for stock price prediction.
When learning a prediction model, the closing price and technical indicators are considered as input variables, and target variables are set to values expressed as 1 or 0. If the target has a value of 0, the closing price at time t − 1 is higher than the closing price at time t, as shown in (5). In other words, the stock price prediction model proposed in this study learns the moving pattern of the independent variables for 30 minutes and forecasts the increase or decrease in the stock price after one minute.
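The windowing and labeling scheme described above might be sketched as follows. The function name `make_samples` and the toy price series are hypothetical. The sketch assumes non-overlapping 30-minute windows and assigns a target of 0 only when the price one minute after the window is strictly lower than the last price in the window, so an unchanged price is labeled 1; the treatment of ties is an assumption, since the text does not state it.

```python
WINDOW = 30  # minutes of history per sample, as stated in the text


def make_samples(closes, window=WINDOW):
    """Split minute closes into non-overlapping windows with binary targets.

    Each sample pairs `window` consecutive closing prices with a target:
    0 if the close one minute after the window is lower than the last
    close inside the window, otherwise 1.
    """
    samples = []
    for start in range(0, len(closes) - window, window):
        end = start + window            # index of the minute after the window
        history = closes[start:end]
        target = 0 if closes[end - 1] > closes[end] else 1
        samples.append((history, target))
    return samples


# Toy series of 61 minutes: rising prices with one drop at minute 30.
toy_closes = [100.0 + minute for minute in range(61)]
toy_closes[30] = 120.0                  # lower than minute 29 (129.0)

samples = make_samples(toy_closes)
targets = [target for _, target in samples]  # [0, 1]
```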
Table 1 shows the technical indicators used in this study. Nine technical indicators are selected for the prediction model (refer to [21]): simple moving average (SMA), exponential moving average (EMA), rate of change (ROC), moving average convergence divergence (MACD), fast %K, slow %D, upper band, lower band, and %B. The technical indicators calculated according to Table 1 are standardized to have values between 0 and 1 so that they can be converted to images of the time series graph. The standardized indicators are then converted to images of a time series graph, which serve as the input images of the CNN. In total, 1100 input images in the training period and 275 input images in the test period are generated. Figure 5 shows examples of the input images in the test period when applying only 3 input variables. In Figure 5, the red line, green line, and blue line indicate the closing price of the S&P 500 index, SMA 20, and EMA 20, respectively.
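A few of the indicators and the scaling step can be sketched in plain Python. The formulas below are the commonly used textbook definitions and are assumptions here, since the exact parameterization follows [21]; in particular, the EMA smoothing factor and the band factor `d` are hypothetical defaults, and all function names are illustrative.

```python
def sma_series(closes, n):
    """Simple moving average over the last n closes, for each valid position."""
    return [sum(closes[i - n + 1:i + 1]) / n for i in range(n - 1, len(closes))]


def ema_series(closes, alpha):
    """Exponential moving average: EMA(i) = C(i) * alpha + EMA(i-1) * (1 - alpha)."""
    values = [closes[0]]                 # seed with the first close (assumption)
    for close in closes[1:]:
        values.append(close * alpha + values[-1] * (1 - alpha))
    return values


def percent_b(close, moving_average, stdev, d=2.0):
    """%B: position of the close between the lower and upper bands."""
    upper = moving_average + d * stdev
    lower = moving_average - d * stdev
    return (close - lower) / (upper - lower)


def min_max_scale(values):
    """Rescale a series to the range [0, 1] before drawing it as a graph."""
    low, high = min(values), max(values)
    if high == low:                      # flat series: avoid division by zero
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


sma_example = sma_series([1, 2, 3, 4, 5], n=3)   # [2.0, 3.0, 4.0]
ema_example = ema_series([10, 20], alpha=0.5)    # [10, 15.0]
scaled = min_max_scale([2, 4, 6])                # [0.0, 0.5, 1.0]
mid_band = percent_b(close=10.0, moving_average=10.0, stdev=1.0)  # 0.5
```

Scaling every series to [0, 1] puts the closing price and the indicators on a common vertical axis, so that lines such as the close, SMA 20, and EMA 20 can be drawn in one image.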

CNN Parameter Settings for the Best Prediction Model
Architecture. In this study, the LeNet-5 architecture is used for stock price prediction. The CNN structure of this study is shown in Figure 6. The 64×64×3 input image is filtered in the first convolutional layer by 3×3×3 kernels, with a stride of 1 pixel. Then, max pooling is used in the pooling layer. The main purpose of the pooling operation is to reduce the size of the image as much as possible; a 2×2 matrix is used to minimize pixel loss and obtain the correct characteristic region [22].
The second convolutional layer filters the output of the first convolutional layer using 3×3×3 kernels, with a stride of 1 pixel. After the pooling process is performed once again, flattening is performed. That is, the pooled two-dimensional feature maps are converted into one long, continuous one-dimensional vector.
The flattened vector is then passed to the fully connected layers. The number of neurons in each of the first two fully connected layers is 512. Then, because the task is a binary classification, the connection goes through an output layer that contains only one node. The last layer uses the sigmoid activation function.
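To make the layer dimensions concrete, the following sketch traces the tensor shapes through the architecture described above. It assumes no padding in the convolutional layers and 32 filters per convolutional layer; neither value is stated in the text, so both are hypothetical, and only the 64×64×3 input, the 3×3 kernels with a stride of 1, the 2×2 pooling, the two 512-neuron layers, and the single output node come from the description.

```python
def conv_out(size, kernel, stride=1):
    """Output width or height of a convolution without padding."""
    return (size - kernel) // stride + 1


def pool_out(size, pool=2, stride=2):
    """Output width or height of a pooling layer."""
    return (size - pool) // stride + 1


def trace_shapes(input_size=64, channels=3, filters=(32, 32)):
    """List (layer name, output shape) pairs for the LeNet-5-style network."""
    shapes = [("input", (input_size, input_size, channels))]
    size = input_size
    for index, n_filters in enumerate(filters, start=1):
        size = conv_out(size, kernel=3)             # 3x3 kernel, stride 1
        shapes.append((f"conv{index}", (size, size, n_filters)))
        size = pool_out(size)                       # 2x2 max pooling, stride 2
        shapes.append((f"pool{index}", (size, size, n_filters)))
    shapes.append(("flatten", (size * size * filters[-1],)))
    shapes.append(("fc1", (512,)))
    shapes.append(("fc2", (512,)))
    shapes.append(("output", (1,)))                 # single sigmoid node
    return shapes


layer_shapes = dict(trace_shapes())
for name, shape in trace_shapes():
    print(f"{name:8s} {shape}")
```

Under these assumptions the spatial size shrinks from 64 to 62, 31, 29, and finally 14, so the flattened vector fed to the first fully connected layer has 14 × 14 × 32 = 6272 entries.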
Adaptive Optimization Methods. Stochastic gradient descent (SGD) has been widely used when training CNN models. Despite its simplicity, SGD performs well empirically across a variety of applications and also has strong theoretical foundations [23].
Training neural networks is equivalent to solving the nonconvex optimization problem
$$\min_{w} f(w),$$
where $f$ represents a loss function. The iterations of SGD can be described as
$$w_{k+1} = w_k - \alpha_k \tilde{\nabla} f(w_k),$$
where $w_k$ denotes the weights at the $k$th iteration, $\alpha_k$ represents a (tuned) step size sequence (also called the learning rate), and $\tilde{\nabla} f(w_k)$ denotes the stochastic gradient computed at $w_k$. The Adam optimization algorithm can be used instead of the classical SGD procedure to update network weights iteratively based on training data. The Adam algorithm is popular in the field of deep learning because it achieves good results quickly [24]. In the Adam update equation, $\beta \in [0, 1)$ represents a momentum parameter, and $v_0$ is initialized to 0.
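The SGD iteration can be illustrated on a toy problem. The sketch below is not the training procedure of this study: it minimizes the hypothetical one-dimensional loss f(w) = E[(w − x)² / 2] over three made-up samples, for which the stochastic gradient at w for a drawn sample x is simply w − x, and it uses a decaying step size of 1 / (k + 1) as an example of a step size sequence.

```python
import random


def sgd_mean(samples, iterations=2000, seed=0):
    """Run SGD on f(w) = E[(w - x)^2 / 2]; the minimizer is the sample mean."""
    rng = random.Random(seed)
    w = 0.0                              # initial weight w_0
    for k in range(iterations):
        x = rng.choice(samples)          # draw one training example
        grad = w - x                     # stochastic gradient at w_k
        alpha_k = 1.0 / (k + 1)          # decaying step size (learning rate)
        w = w - alpha_k * grad           # w_{k+1} = w_k - alpha_k * grad
    return w


w_final = sgd_mean([2.0, 3.0, 4.0])      # ends close to the mean, 3.0
```

Each update uses the gradient of a single randomly drawn example rather than the full dataset, which is what distinguishes SGD from ordinary gradient descent.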
Dropout. The dropout method introduced by Hinton et al. [25] is known as a very effective way to reduce overfitting when applying neural networks with many hidden layers. This method consists of setting the output of each hidden neuron in the chosen layer to zero with some probability (usually 50%). In this paper, the dropout method was applied after the pooling operations.
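The dropout operation itself is simple, as the following minimal sketch suggests. The function name and the activation values are hypothetical, and the sketch shows only the zeroing step applied during training; rescaling of the surviving activations, which many implementations add, is omitted.

```python
import random


def dropout(activations, p=0.5, rng=None):
    """Set each activation to zero independently with probability p."""
    rng = rng or random.Random()
    return [0.0 if rng.random() < p else value for value in activations]


pooled_values = [0.7, 1.2, 0.0, 3.1, 0.4, 2.2]   # hypothetical pooled outputs

kept_all = dropout(pooled_values, p=0.0)          # nothing is dropped
dropped_all = dropout(pooled_values, p=1.0)       # everything is dropped
dropped_half = dropout(pooled_values, p=0.5, rng=random.Random(0))
```

A dropout probability of 0 leaves the layer unchanged, which corresponds to training without dropout.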
Loss Function. The ANN uses the loss function as an indicator to determine the optimal weight parameters through learning [26]. In this study, the mean square error (MSE) and cross entropy error (CEE) were adopted as the objective function (loss function). Equations (10) and (11) show the MSE measure and CEE measure, respectively. In (10) and (11), $y_k$ represents the output of the neural network, and $t_k$ represents the target value.
When calculating the MSE, all neurons in the output layer are taken into account. This loss function is the most commonly used because it is simple to calculate. Basically, the difference between the output of the model and the target is used as the error. Squaring this difference amplifies large errors relative to small ones, which makes it easier to identify where the error is concentrated.
The CEE only counts the neuron corresponding to the target, which results in a larger penalty as the output moves farther from the target.
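The behavior of the two loss functions can be compared with a small numerical sketch. The forms used here, a sum of squared differences with a factor of 1/2 for the MSE and the negative log of the output at the target position for the CEE, are common textbook definitions and are assumptions; the exact normalization in (10) and (11) may differ. The outputs and targets are made-up values.

```python
import math


def mean_squared_error(outputs, targets):
    """MSE over all output neurons (with the conventional factor 1/2)."""
    return 0.5 * sum((y - t) ** 2 for y, t in zip(outputs, targets))


def cross_entropy_error(outputs, targets, eps=1e-7):
    """CEE: only neurons with a nonzero target contribute to the sum."""
    return -sum(t * math.log(y + eps) for y, t in zip(outputs, targets))


target = [0, 1]                  # one-hot target: class "1" (price goes up)
good_output = [0.1, 0.9]         # confident and correct
bad_output = [0.9, 0.1]          # confident and wrong

mse_good = mean_squared_error(good_output, target)   # about 0.01
mse_bad = mean_squared_error(bad_output, target)     # about 0.81
cee_good = cross_entropy_error(good_output, target)  # about 0.105
cee_bad = cross_entropy_error(bad_output, target)    # about 2.303
```

With these definitions, the penalty for the wrong prediction grows much faster under the CEE than under the MSE as the output moves away from the target.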
Epoch and Batch Sizes.

Empirical Studies
4.1. Experimental Settings. In this study, the empirical analysis covers a 1-month period. The dataset consists of minute data of the S&P 500 index from 10:30 pm on April 3, 2017, to 2:15 pm on May 2, 2017. The entire dataset covers 41,250 minutes. Figure 7 shows a time series graph of the S&P 500 closing price during the analysis period.
Among the entire dataset, 33,000 minutes are allocated for the training data (80% of the entire data), and 8,250 minutes are allocated for the testing data (20% of the entire data). When the time series data are converted into an image every 30 minutes, the training data consist of 1,100 input images, and the testing data consist of 275 input images.
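The split sizes quoted above follow directly from the dataset length, as the short calculation below shows. The variable names are illustrative; integer arithmetic is used so that the counts are exact.

```python
TOTAL_MINUTES = 41_250       # length of the dataset in minutes
TRAIN_PERCENT = 80           # share of the data used for training
WINDOW_MINUTES = 30          # one input image per 30 minutes

train_minutes = TOTAL_MINUTES * TRAIN_PERCENT // 100   # 33000
test_minutes = TOTAL_MINUTES - train_minutes           # 8250

train_images = train_minutes // WINDOW_MINUTES         # 1100
test_images = test_minutes // WINDOW_MINUTES           # 275

print(train_minutes, test_minutes, train_images, test_images)
```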
For experimenting with the CNN algorithm, the technical indicators used for forecasting the stock price in [21] are employed as input variables here.
To evaluate the forecasting accuracy, the following three measurements are employed: hit ratio, sensitivity, and specificity (see (12)-(14)).
In (12)-(14), $n_{0,0}$ and $n_{0,1}$ represent the number of predicted values of 0 and the number of predicted values of 1 when the actual value is 0, respectively. Additionally, $n_{1,0}$ and $n_{1,1}$ represent the number of predicted values of 0 and the number of predicted values of 1 when the actual value is 1, respectively. The hit ratio is a measure of the prediction model performance when the target variable is binary.
In this study, the four CNN models are called CNN1, CNN2, CNN3, and CNN4. Table 2 presents the input variables applied to these four models. Table 3 shows the accuracies of the four models. To determine the adaptive optimization method, all CNN parameters (except for the adaptive optimization method) are applied equally to each model. Here, the dropout probability, batch size, and epoch are fixed at 0.5, 1, and 2500, respectively. Additionally, the steps per epoch in the training and testing data were set to 250 and 50, respectively, and the loss function was the CEE.
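The three accuracy measures can be computed from the four counts defined above. The sketch below assumes the usual definitions: the hit ratio is the share of correct predictions, sensitivity is the share of actual 1s predicted as 1, and specificity is the share of actual 0s predicted as 0. The actual and predicted labels are made-up values, and the function names are illustrative.

```python
def confusion_counts(actual, predicted):
    """Count n[(a, p)]: cases with actual value a and predicted value p."""
    counts = {(a, p): 0 for a in (0, 1) for p in (0, 1)}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts


def hit_ratio(n):
    """Share of all predictions that match the actual value."""
    return (n[(0, 0)] + n[(1, 1)]) / sum(n.values())


def sensitivity(n):
    """Share of actual 1s (price increases) predicted as 1."""
    return n[(1, 1)] / (n[(1, 0)] + n[(1, 1)])


def specificity(n):
    """Share of actual 0s (price decreases) predicted as 0."""
    return n[(0, 0)] / (n[(0, 0)] + n[(0, 1)])


# Hypothetical labels for eight test images (illustration only).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 1, 0, 0, 0, 1, 0, 1]

n = confusion_counts(actual, predicted)
scores = (hit_ratio(n), sensitivity(n), specificity(n))  # (0.75, 0.75, 0.75)
```

A large gap between sensitivity and specificity indicates that a model predicts one direction far more reliably than the other, even when its hit ratio looks high.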
As shown in Table 3, when the SGD optimizer is used as the optimization method, the CNNs achieve a high level of predictive performance. CNN1, which is the prediction model without technical indicators, has the highest hit ratio among the four models. Therefore, technical indicators do not have a positive effect on CNN-based stock price forecasting. However, the large difference between the sensitivity and specificity of CNN1 indicates that an overfitting problem occurs due to considering only one input variable.
Table 4 shows the accuracies of the four models with SGD optimizers using different loss functions. From Table 4, we can see that using the MSE as the loss function yields higher predictive accuracy than using the CEE.
The accuracies of the four models with the SGD optimizer and MSE loss function using different dropout probabilities are given in Table 5. CNN1 has the highest hit ratio (0.85) when the dropout probability is 0. The results in Table 5 show that an increase in the dropout probability does not contribute to the predictive performance of the CNN, which is interesting because dropout is widely known to play an important role in deep learning architecture construction. In this experiment, however, the training images of the CNN models are simpler than those in the character recognition or text recognition tasks to which CNNs are generally applied, so the dropout option is considered to have a negative effect.
Table 6 shows the accuracies of the four CNN models with different steps per epoch when applying the SGD optimizer.
To verify the performance of the CNN models, ANN and SVM models are generated and their accuracies are evaluated. The same input variables used for the CNNs in Table 2 are applied to the ANNs and SVMs. Before exploring the ANN and SVM for stock price prediction, small preliminary experiments were performed to obtain proper parameter settings for the ANN and SVM. As a result, the number of hidden layers, the number of hidden units, and the activation function of the ANN are set to 1, 3, and sigmoid, respectively. The SVM uses a polynomial kernel to form a nonlinear classification boundary.
Based on the results shown in Table 7, when the ANN and SVM are applied, technical indicators are shown to be input variables that positively affect the stock price prediction, as opposed to when the CNN is applied. Nevertheless, the predictive performances of the ANN and SVM are lower than that of the CNN (refer to Table 5 when the dropout probability is 0). Therefore, CNNs using input images can be a useful method for stock price prediction. In practice, CNN models are good at detecting patterns in images, such as lines. CNNs can detect relationships among images that humans cannot find easily; the structure of neural networks can help detect complicated relationships among features. For example, in a CNN, color images are composed of RGB channels, and the features of the input for each channel can be extracted. This allows a CNN to extract features better than a model that uses a vectorized input, such as an ANN [28].

Concluding Remarks
In this study, we attempted to check the applicability of the CNN for stock market prediction. Previously, many researchers have suggested that ANNs offer a chance to achieve profits in financial markets. Therefore, this study compared the predictive performances of the CNN and ANN to validate the usefulness of the CNN. In addition, the SVM, which is well known as a useful classification algorithm, was employed to verify the usefulness of the CNN.
To design the CNN architecture, this study focused on two points. First, the CNN parameters were optimized. For this, the experiments were performed over the parameter range given in Table 8, and the best-performing settings were obtained. Second, technical indicators, which are well known as efficient input variables in stock price forecasting, were examined to verify whether they are suitable input images for CNNs when converted into images.
Our empirical experiments demonstrate the potential usefulness of the CNN by showing that it could improve the predictive performance more than the ANN. In this sense, the CNN appears to be a desirable choice for building stock prediction models. In addition, technical indicators were input variables that did not positively affect the stock price prediction when the CNN was implemented for the prediction model. This result arises because technical indicators are not good input variables, as they are similar to the moving pattern of the closing price. Therefore, a stock price prediction model with better performance can be expected if other factors that move opposite to the stock price, such as the gold price and interest rate, are considered as input variables for the CNN. The results of this study suggest that it is difficult to predict the stock market from technical indicators using general data mining classification techniques. Therefore, the CNN, which is a deep learning method that analyzes time series data converted into graphs, can be useful for stock price prediction.

Figure 1(b) represents the relationship between input and output values in each layer. In Figure 1(b), $x_1$, $x_2$, and $x_3$ represent input signals and have weights of $w_1$, $w_2$, and $w_3$, respectively. The net input function combines the input signals and weights linearly and converts the value through the activation function to output the signal $y$.

Figure 1: Typical ANN structure. (a) The overall structure of the general ANN. (b) The relationship between the input and output values in each layer.

Figure 3: The process of generating the feature map of the convolutional layer.

Figure 5: Example of the input image. (a) Generated input image when the closing price increases after 1 minute. (b) Generated input image when the closing price decreases after 1 minute.

Table 1: Technical indicators used for the proposed prediction model.
EMA: $(C(i) \cdot \alpha) + (EMA(i-1) \cdot (1 - \alpha))$, where $C(i)$ = the closing price at time $i$, $EMA(i-1)$ = exponentially moving average of the closing price at time $i-1$, and $\alpha$ = the percentage using the price value.
MACD: Fast MA − Slow MA, where Fast MA is the moving average (5) and Slow MA is the moving average (20).
Slow %D: SMA(Fast %K, KMA), where KMA = period of moving average used to smooth the slow %K values.
Upper Band: $MA_n + (D \cdot StDev_n)$, where $MA_n$ = n-period moving average.
Lower Band: $MA_n - (D \cdot StDev_n)$, where $D$ = factor applied to the standard deviation value.
%B: (Closing price − Lower Band) / (Upper Band − Lower Band).

An epoch consists of one full training cycle for the data. An epoch is an iteration over the entire training data and target data provided. The epochs are equal to 2500 in this study. The batch size is a term used in machine learning and refers to the number of training examples utilized in one iteration [27]. The batch size is 1 in this study. Steps per epoch indicate the number of batch iterations before a training epoch is considered finished. These steps represent the total number of steps (i.e., batches of samples) before declaring one epoch finished and starting the next epoch.

Table 2: Input variables for each CNN model.

Table 3: Accuracy comparison for CNNs with different optimizers during the test period.

Table 4: Accuracy comparison for CNNs with different loss functions during the test period.

Table 5: Accuracy comparison for CNNs with different dropout probabilities during the test period.

Table 6: Accuracy comparison for CNNs with different steps per epoch during the test period.

Table 7: Predictive accuracies of ANNs and SVMs.

From Table 6, we can see that an increase in steps per epoch causes an overfitting problem and results in a decrease in accuracy. As a result, increasing the number of steps per epoch is not effective for stock price prediction based on a CNN using technical indicators.