An Artificial Neural Network for Data Forecasting Purposes

Since markets are influenced by many external factors, stock market prediction is one of the most difficult tasks in time series analysis. The research reported in this paper investigates the potential of artificial neural networks (ANN) to solve the forecasting task in the most general case, when the time series is non-stationary. We used a feed-forward neural architecture: the nonlinear autoregressive network with exogenous inputs (NARX). The network training function used to update the weight and bias parameters corresponds to the gradient descent with adaptive learning rate variant of the backpropagation algorithm. The results obtained using this technique are compared with those produced by several ARIMA models. We used the mean square error (MSE) measure to evaluate the performance of the two models. The comparative analysis leads to the conclusion that the proposed model can be successfully applied to forecast financial data.


Introduction
Predicting a stock price index and its movement is considered one of the most challenging applications of time series prediction. According to the efficient market theory proposed in [1], the stock price follows a random path and it is practically impossible to build a long-term global forecasting model based on historical data. The ARIMA and ANN techniques have both been successfully used for modelling and forecasting financial time series. Compared with ANN models, which are complex forecasting systems, ARIMA models are considered much easier to train and to use for forecasting. An important feature of neural networks is their ability to learn from their environment and, through learning, to improve their performance in some sense. One of the new trends is the development of specialized neural architectures, together with classes of learning algorithms, providing alternative tools for solving feature extraction, data projection, signal processing, and data forecasting problems [2]. Artificial neural networks have been widely used for time series forecasting and have shown good performance in predicting stock market data. Chen et al. […]. In recent years, a series of studies has been conducted in the field of financial data analysis using ARIMA models for financial time series prediction. Meyler et al. [9] used ARIMA models to forecast Irish inflation. Contreras et al. [10] predicted next-day electricity prices using the ARIMA methodology. V. Ediger et al. [11] used an ARIMA model to forecast primary energy demand by fuel in Turkey. Datta [12] used the same Box-Jenkins methodology to forecast the inflation rate in Bangladesh. Al-Zeaud [13] used an ARIMA model for modelling and predicting volatility in the banking sector.
DOI: 10.12948/issn14531305/19.2.2015.04
The paper is organized as follows. In the second section, we briefly present the ARIMA model for prediction. Next, the nonlinear autoregressive network with exogenous inputs aiming to forecast the closing price of a particular stock is presented. The ANN-based strategy applied for data forecasting is analysed against the ARIMA model, and a comparative analysis of these models is described in the fourth section. The conclusions regarding the reported research are presented in the final part of the paper.

ARIMA Model
The Auto-Regressive Integrated Moving Average (ARIMA) model, or Box-Jenkins methodology [14], extends the ARMA class to non-stationary series. The process {X_t} is an ARIMA(p,d,q) process if Y_t = (1−B)^d X_t is a causal ARMA(p,q) process, that is

φ(B)(1−B)^d X_t = θ(B)Z_t,   {Z_t} ~ WN(0, σ²),

where φ(z) = 1 − φ_1 z − … − φ_p z^p and θ(z) = 1 + θ_1 z + … + θ_q z^q are the autoregressive and moving average polynomials respectively, B is the backward shift operator and {Z_t} is the white noise. The problem of predicting ARIMA processes can be solved using extensions of the prediction techniques developed for ARMA processes. One of the most commonly used methods in forecasting ARMA(p,q) processes is the class of recursive techniques for computing best linear predictors (the Durbin-Levinson algorithm, the Innovations algorithm etc.). In the following we describe the recursive prediction method using the Innovations algorithm [15]. Let {X_t} be a zero-mean stochastic process and K(i,j) its autocovariance function. We denote by X̂_{n+1} the best linear predictor of X_{n+1},

X̂_{n+1} = Σ_{j=1}^{n} θ_{nj} (X_{n+1−j} − X̂_{n+1−j}),   n ≥ 1,   X̂_1 = 0,   (6)

where the coefficients {θ_{nj}, j = 1, 2, …, n} and the mean squared errors {v_n} are given by the recursive scheme [15]

v_0 = K(1,1),
θ_{n,n−k} = v_k^{−1} ( K(n+1, k+1) − Σ_{j=0}^{k−1} θ_{k,k−j} θ_{n,n−j} v_j ),   0 ≤ k < n,
v_n = K(n+1, n+1) − Σ_{j=0}^{n−1} θ_{n,n−j}² v_j.   (7)
Equation (7) gives the coefficients θ_{nj} of the innovations X_{n+1−j} − X̂_{n+1−j} in the orthogonal expansion (6); the scheme is simple to use and, in the case of ARMA(p,q) processes, can be further simplified [15].
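As a concrete illustration, the recursive scheme above translates directly into code. The following sketch (function names are ours, not from the paper) computes the coefficients θ_{nj} and the mean squared errors v_n from a given covariance function K(i,j), and uses them for recursive one-step prediction:

```python
def innovations(K, n):
    """Innovations algorithm: from the covariance function K(i, j) of a
    zero-mean series, compute the coefficients theta[m][j] (j = 1..m)
    and the one-step mean squared errors v[m], m = 0..n."""
    v = [0.0] * (n + 1)
    theta = [[0.0] * (m + 1) for m in range(n + 1)]
    v[0] = K(1, 1)
    for m in range(1, n + 1):
        for k in range(m):
            s = sum(theta[k][k - j] * theta[m][m - j] * v[j] for j in range(k))
            theta[m][m - k] = (K(m + 1, k + 1) - s) / v[k]
        v[m] = K(m + 1, m + 1) - sum(theta[m][m - j] ** 2 * v[j] for j in range(m))
    return theta, v


def one_step_predict(x, K):
    """Recursive one-step predictors: xhat[m] is the best linear predictor
    of x[m] (0-based) given x[0], ..., x[m-1]; xhat[0] = 0, and the last
    entry xhat[len(x)] is the forecast of the next, unseen value."""
    n = len(x)
    theta, v = innovations(K, n)
    xhat = [0.0] * (n + 1)
    for m in range(1, n + 1):
        xhat[m] = sum(theta[m][j] * (x[m - j] - xhat[m - j]) for j in range(1, m + 1))
    return xhat, v
```

For an MA(1) process the coefficients θ_{nj} vanish for j ≥ 2 and v_n converges to the white noise variance, which provides a simple sanity check of the recursion.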

The ANN-Based Technique for Forecasting the Closing Price of a Stock
The nonlinear autoregressive network with exogenous inputs aiming to forecast the closing price of a particular stock is presented in the following. We assume that y(t) is the stock closing value at the moment of time t. For each t, we denote by u(t) the vector whose entries are the values of the indicators significantly correlated with y(t), that is, those for which the correlation coefficient with y(t) is greater than a certain threshold value. The neural model used in our research is a dynamic network. The direct method was used to build the model for predicting the stock closing value, described as follows.
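The indicator selection step described above can be sketched as follows; the threshold value and the indicator names are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def select_indicators(y, candidates, threshold=0.5):
    """Keep the candidate indicator series whose absolute correlation
    with the closing-price series y exceeds the threshold."""
    selected = []
    for name, series in candidates.items():
        r = np.corrcoef(series, y)[0, 1]
        if abs(r) > threshold:
            selected.append(name)
    return selected
```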
ŷ(t+p) = f(y(t), y(t−1), …, y(t−d), u(t), u(t−1), …, u(t−d)),

where ŷ(t+p) is the forecasted value of the stock price for the prediction period p and d is the delay expressing the number of (y, u) pairs used as input of the neural model. In our model, we consider d = 2. The considered delay has a significant influence on the training set and on the prediction process. We use the correlogram to choose the appropriate window size for our neural networks: we eliminate the lags where the Partial Autocorrelation Function (PACF) is statistically irrelevant [16]. The nonlinear autoregressive network with exogenous inputs (NARX) is a recurrent dynamic network, with feedback connections encompassing multiple layers of the network. The scheme of NARX is depicted in Figure 1. The output of the NARX network can be considered an estimate of the output of a certain nonlinear dynamic system. Since the actual output is available during the training of the network, a series-parallel architecture is created [17], in which the estimated target is replaced by the actual output. The advantages of this model are twofold. On the one hand, the inputs used in the training phase are more accurate and, on the other hand, since the resulting network has a feed-forward architecture, static backpropagation-type learning can be used. The NARX network is used here as a predictor, the forecasting formula being

y(t) = f(y(t−1), y(t−2), u(t−1), u(t−2)),

where y(t) is the next value of the dependent output variable y and u is an externally determined variable that influences y. The previous values y(t−1), y(t−2) of y and u(t−1), u(t−2) of u are used to predict y(t), the future value of y.
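The series-parallel (open-loop) training scheme described above amounts to building a static training set from tapped delay lines of y and u, with the actual past outputs fed in as inputs. A minimal sketch, assuming d = 2 and numpy arrays:

```python
import numpy as np

def narx_design_matrix(y, u, d=2):
    """Build the series-parallel NARX training set: the row for time t
    holds [y(t-1), ..., y(t-d), u(t-1), ..., u(t-d)] (each vector u(.)
    flattened), and the target is the observed output y(t)."""
    rows, targets = [], []
    for t in range(d, len(y)):
        past_y = y[t - d:t][::-1]              # y(t-1), ..., y(t-d)
        past_u = u[t - d:t][::-1].reshape(-1)  # u(t-1), ..., u(t-d), flattened
        rows.append(np.concatenate([past_y, past_u]))
        targets.append(y[t])
    return np.array(rows), np.array(targets)
```

Any static feed-forward network (and hence static backpropagation) can then be trained on the resulting input/target pairs.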
An example of this series-parallel network is depicted in Figure 2, where d = 2, n = 10 and the number of neurons in the hidden layer is 24. The activation functions of the neurons in the hidden and output layers can be defined in many ways. In our tests, we took the logistic function (12) to model the activation functions of the neurons belonging to the hidden layers, and the identity function to model the outputs of the neurons belonging to the output layer. A fixed learning rate can make training unstable or slow; to address this problem, a maximum growth factor is introduced and the learning rate is adapted during training: it is increased after steps that decrease the error and decreased when the error grows beyond the maximum growth factor. With the search direction given by the negative gradient, one obtains the updating rule of the backpropagation gradient-based algorithm with adaptive learning rate [18]. In our work, the number of neurons in the hidden layer is set according to an empirical rule [19], where m stands for the number of neurons in the output layer and N is the dimension of the input data.
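The adaptive learning rate strategy described above can be sketched as follows; the increase/decrease factors and the maximum growth factor are illustrative values, not the paper's exact constants:

```python
def adaptive_lr_descent(loss, grad, w0, lr=0.01, lr_inc=1.05, lr_dec=0.7,
                        max_growth=1.04, steps=300):
    """Gradient descent with adaptive learning rate: a step that raises
    the error beyond the maximum growth factor is rejected and the rate
    is reduced; a step that lowers the error is accepted and the rate
    is increased."""
    w, e = w0, loss(w0)
    for _ in range(steps):
        w_new = w - lr * grad(w)
        e_new = loss(w_new)
        if e_new > max_growth * e:
            lr *= lr_dec          # step rejected: shrink the rate
        else:
            if e_new < e:
                lr *= lr_inc      # good step: grow the rate
            w, e = w_new, e_new
    return w
```

The same rule applies coordinate-wise to the weight and bias vectors of the network, with loss the MSE over the training set and grad computed by backpropagation.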

Experimental Results
We tested the proposed model on a dataset of 300 samples. The results obtained using the above-mentioned technique are reported in the following. The overall forecasting error computed on the training data is 0.00035. The regression coefficient computed on the training data and the data fitting are presented in Figure 4. The network predictions versus the actual data for the training samples are illustrated in Figure 5. The overall forecasting error computed on the new, out-of-sample data is 0.0012. The network predictions versus the actual data for the new samples are illustrated in Figure 6. In order to tune the differencing parameter of the ARIMA model, the first-order and the second-order differenced series were computed. The correlogram of the first-order differenced series is presented in Figure 9. Since the values of the ACF for the first-order differenced series are quite small, we concluded that the differencing parameter of the ARIMA model should be set to 1.

Fig. 9. The correlogram of the first order differenced series
The parameters of the ARIMA model related to the AR(p) and MA(q) processes were tuned based on the following criteria: relatively small values of the BIC (Bayesian Information Criterion), relatively high values of the adjusted R² (coefficient of determination) and a relatively small standard error of regression (SER). The results of our tests are summarized in Table 1. According to these criteria, the best-fitted models are ARIMA(1,1,0) and ARIMA(1,1,1), with ARIMA(1,1,1) ranked first. The overall forecasting error computed on the new data is 0.0077 for the ARIMA(1,1,0) model and 0.0096 for the ARIMA(1,1,1) model. The results of the forecasting are illustrated in Figure 10.
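For reference, the BIC used in the model selection above can be computed, for a Gaussian model with k estimated parameters and residual sum of squares RSS on n observations, as n·ln(RSS/n) + k·ln(n); a minimal sketch:

```python
import math

def bic(rss, n, k):
    """Bayesian Information Criterion for a Gaussian model with k
    estimated parameters fitted to n observations, given the residual
    sum of squares rss; smaller values indicate a better model."""
    return n * math.log(rss / n) + k * math.log(n)
```

The criterion penalizes extra AR/MA parameters, so among models with similar fit the more parsimonious one wins.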

Conclusions
The research reported in this paper focuses on a comparative analysis of the NARX neural network against standard ARIMA models.
The study was developed on a dataset consisting of 300 historical weekly observations of a set of variables, between 3/1/2009 and 11/30/2014. The proposed neural approach yielded better results from the point of view of the MSE measure. The obtained results are encouraging and motivate future work toward extending the study to alternative neural models.

Fig. 6. The network predictions versus the actual data for the new samples

Fig. 7. The error histogram for the new samples

Fig. 8. The correlogram of the available data