Financial Time Series Modelling with Hybrid Model Based on Customized RBF Neural Network Combined With Genetic Algorithm

In this paper, authors apply feed-forward artificial neural network (ANN) of RBF type into the process of modelling and forecasting the future value of USD/CAD time series. Authors test the customized version of the RBF and add the evolutionary approach into it. They also combine the standard algorithm for adapting weights in neural network with an unsupervised clustering algorithm called K-means. Finally, authors suggest the new hybrid model as a combination of a standard ANN and a moving average for error modeling that is used to enhance the outputs of the network using the error part of the original RBF. Using high-frequency data, they examine the ability to forecast exchange rate values for the horizon of one day. To determine the forecasting efficiency, authors perform the comparative out-of-sample analysis of the suggested hybrid model with statistical models and the standard


Introduction
Predicting time series using statistical analysis started in the 60s years of 20th century.First statistical models were based on the theory of exponential smoothing originally published in [1], [2] and [3].Another breakthrough came with publishing a study from Box & Jenkins [4].In this study, they integrated all the knowledge including autoregressive and moving average models into one book.From that time, the ARIMA models have been very popular in time series modelling for a long time as O'Donovan [5] showed that these models provide better results than other models used in that time.However, in 1982 Engle [6] showed that using ARIMA models in financial series modelling is not always correct as these series usually have conditional variance instead of constant.Therefore, he suggested ARCH (autoregressive conditional variance) models for financial modeling.
One of the reason computers started to apply in time series modeling was the study of Bollershev [7] where he proved the existence of nonlinearity in financial data.The first techniques of machine learning applied into time series forecasting were artificial neural networks (ANN).As ANN was a universal approximator, it was believed that these models could perform tasks like pattern recognition, classification or predictions [8], [9], [10].Today, according to some studies [11] ANNs are the models having the biggest potential in predicting financial time series.The reason for the attractiveness of ANNs for financial prediction can be found in works of Hill et al. [12], where authors showed that ANNs works best in connection with high-frequential financial data.
While in the first application of ANNs into financial forecasting, perceptron network, the simplest feedforward neural network, was used [13], nowadays it is mainly RBF network that is being used for this as they showed to be better approximators than the perceptron networks [14].
We decided to apply our neural networks models into the market of exchange rates.Forex, which is short for the foreign exchange market, is one of the world's largest and most liquid financial markets in the world.Therefore, it is no surprise that the exchange rates forecasting has attracted the attention of many financial researchers and analysts for a long time.Very common approach investors use for trading is technical analysis (TA)."Technical analysis can be seen as a collection of algorithms and mechanical rules which attempt to aid investors in forecasting future market movements, using only historic data" [15].
In this paper, forecasts of USD/CAD exchange rate through customized RBFGA neural network with Moving Average errors (RBFGA-SMA) was performed The standard RBF model will be extended by using Moving Average for modeling the errors of RBF network.Additionally, the hybridized version of ANN will be used; we will combine the standard ANN with EC technique called genetic algorithms that will be used in the process of finding optimal parameters of the neural network.According to some scientists [16], the use of technical analysis tools can lead to the efficient profitability on the market, we decided to combine our customized RBF network with a tool of the technical analysis.Investors use a large number of technical trading tools so as to make the decision-making process easier.In the process of searching an optimal tool of TA for hybridization with RBF, we were inspired by ARIMA models where the MA part is correcting the error of the AR model.Therefore, for our experiments one of the simplest tools of TA -Moving Averages was chosen.Just like many other tools of technical analysis, Moving Averages are also based on analyzing historical data using statistics.From a forecasting point of view, it is a type of finite impulse response filter used to predict the next value in the series by creating a series of averages of different subsets of the full data set.The simple moving average was used to model the error part of the RBF network as there was a suspicion it could enhance the prediction outputs of the model.
Our machine learning application to exchange rates forecasting is novel in two ways -we use the standard neural network hybridized with simple moving averages to form a whole new hybrid model for forecasting.Nowadays, there are lots of hybrid models of neural networks.Hassan et al. [17] tested a fusion model of Hidden Markov Models (HMM), ANN and GA for stock market forecasting.Thinyane and Millin [15] use the intelligent hybrid system composed of GA and ANN to enhance the decision-making process in the field of currency trading.Sterba and Hilovska [18] combine ANN and statistical ARIMA models to create a hybrid model.Other studies of hybrid models include, for example, those by Zhuang and Chan [19], Chikhi et al. [20], Choudhry and Garg [21] or Marcek and Falat [22].However, none of the mentioned hybrid models does not focus on a combination of ANN with TA.
Moreover, we also use other than just the standard algorithm for training parameters of the neural network in order to achieve the maximal prediction accuracy of our suggested model.

2.
Models and Methods

Box-Jenkins and GARCH Models
For more than 20 years Box-Jenkins ARMA models have been widely used for time series modelling.The models published in [4] are autoregressive models (AR) and moving average (MA) models.Let y t be a stationary time series that is the realization of a stochastic process.General formula of ARMA(p,q) model is expressed as follows (for details, see [4]): The weakness of ARMA models is the inability to model non-constant variance.As this type of variance is very common in currency pairs, constant volatility is not able to capture some of the basic properties of heteroscedastic volatility present in financial time series such as stochasticity of volatility, volatility clustering, mean reversion and existence of fat tails.In [6] Engle suggested the solution by creating so-called ARCH (Autoregressive Conditional Heteroscedastic) models which assume heteroscedastic variance of ε t .

Feed-Forward Neural Network
The model of artificial neural network based on human neural system is an universal functional black-box approximator of non-linear type [23], [24], [25] which are especially helpful in modelling non-linear processes having a priori unknown functional relations or this system of relations is very complex to describe [26].
In [23] and [24] it has been showed that a neural network can approximate any continuous function into any demanded accuracy.Moreover, artificial neural network can generalize.After learning on a training set data, the network is very often able to produce good outputs on unknown inputs.
Let F be a function defined as: is a representation assigning one value y t to kdimensional input a given time period t.Let G be is a restriction of F defined as: where R train is a complement of R val to R.Then, artificial neural network is a mathematical model defined by finding the values w t so that the function in Eq. ( 1) would be minimal: when E minimal, one can say G(x t , w t ) is adapted to approximate the function F .
Multilayer perceptron (MLP) is the type of feedforward neural network usually having three layers.Input layer is composed of the input vector; the output layer is represented by just one neuron and contains the network output.Usually, there is also one or more hidden layers between inputs and output.In most cases, one hidden layer is sufficient since according to Cybenko theorem [27] the network with one hidden layer is able to approximate any continuous function.Layers are interconnected via synapses (also called weights), which represent parameters of the neural network model.

Radial Basis Neural Network
Radial Basis Function (RBF) ANN is the upgrade of MLP network.The name of this type of neural network comes from the name of its activation function.Generally, a radial basis function (RBF) is real-valued functions whose values depend only on the distance from the origin or from some other point c, called a center: Any function φ that satisfies the property in Eq. ( 7) is a radial function.The norm is usually Euclidean distance.
Hence, the biggest difference between MLP and RBF is in using the different function for activating hidden neurons.RBF neural network uses radial basis function of Gaussian type instead of the sigmoid function for activating neurons in the hidden layer.This function is defined for j th hidden neuron as: where σ 2 j is the variance of j th neuron.The network output for RBF neural network is then counted as follows:

Back-Propagation Algorithm
The most popular method for learning in multilayer networks is called back-propagation (BP).It was first invented in 1969 by Bryson and Ho [28], but was largely ignored until the mid-1980s.BP is a multi-stage dynamic system optimization method mainly used for adapting parameters of feed-forward neural networks.This algorithm, which is based on gradient descent and is a generalization of the delta rule, is a supervised learning method.Hence, the training set of desired outputs, according to which the adaptation is performed, is required.The assumption of using backpropagation is that the activation function of the neurons is differentiable.

Genetic Algorithms
Genetic algorithms, which are algorithms for optimization, are stochastic search techniques that guide a population of solutions towards an optimum using the principles of evolution and natural genetics [29].Their representation and operators characterize them.Basic genetic operators include reproduction, crossover and mutation [29].
A key concept for genetic algorithms is that of schemata, or building blocks .A schema is a subset of the fields of a chromosome set to particular values with the other fields left to vary.As originally observed in [30], the power of genetic algorithms lies in their ability to implicitly evaluate large numbers of schemata simultaneously and to combine smaller schemata into larger schemata [31].
Adopted from biological systems, genetic algorithms are based loosely on several features of biological evolution [23].In order to work properly, they require five components (way of encoding solutions, evaluation function, way of initializing population, operators for reproduction and parameter settings).For details, see [31].
When the components of the GA are chosen appropriately, the reproduction process will continually generate better children from good parents, the algorithm can produce populations of better and better individuals, converging finally on results close to a global optimum.Additionally, GA can efficiently search large and complex (i.e., possessing many local optima) spaces to find nearly global optima [31].In many cases, the standard operators, mutation and crossover, are sufficient for performing the optimization.Moreover, GAs are also capable of handling problems in which the objective function is discontinuous, non-differentiable, nonconvex or noisy.Since the algorithms operate on a population instead of a single point in the search space, they climb many peaks in parallel and therefore reduce the probability of finding local minima [30].

Hypothesis
Scientists try to incorporate other methods into RBF network to better its outputs.For example, in [32] authors use genetic algorithms for creating "Evolving" RBF -i.e. to automatically find the ideal number of hidden neurons.Montana in [31] uses genetic algorithms with multilayer perceptron neural network for weight adaptation.
First, RBF neural network will be combined with an unsupervised learning method for clustering called Kmeans.Since Kohonen [33] and Marcek [34] demonstrated that non-hierarchical clustering algorithms used with artificial neural networks could cause the better results of ANN, K-means will be used together with RBF in order to find out whether this combination can produce the effective improvement of this network in the domain of financial time series.The K-means will be used in the phase of non-random initialization of weight vector w performed before the phase of network learning.In many cases it is not necessary to interpolate the output value by radial functions, it is quite sufficient to use one function for a set of data (cluster), whose center is considered to be a center of activation function of a neuron.The values of centroids will be used as the initialization values of weight vector w.Weights should be located near the global minimum of the error function Eq. ( 4), and the lower number of epochs is supposed be used for network training.We will use adaptive version of K-means.Also, the back-propagation itself is a weakness of RBF.The convergence is slow and in addition it generally converges to any local minimum on the error surface, since stochastic gradient descent exists on the surface, which is not flat.Therefore, the back-propagation will be substituted by the genetic algorithm (GA) as an alternative learning technique in the process of weights adaptation.GAs are generally not bothered by local minima.The mutation and crossover operators can step from a valley across a hill to an even lower valley with no more difficulty than descending directly into the valley.
Finally, in this paper we also suggest a hybrid model.The reason to use this hybridization is to create a model with better forecasting properties.There exist quite a lot hybrid models in time series forecasting; for example in [35] authors combine Hidden Markov Models (HMM) with GARCH models to forecast time series, authors in [36] and [37] use a combination of Support Vector Machines (SVM) and genetic algorithms and [38] constructed a hybrid model composed of SVM and self-organizing maps (SOM) in time series prediction process.Our suggested hybrid model combines RBF neural network and moving average used for error modeling.Hence, in the next part of this paper, hypothesis that a combination of ANN and statistical model can produce a model with better properties, will be tested.

4.
Pre-Experimental Procedures  changing anymore) in order to find out the prediction power, as there is an assumption that if the model can handle to predict data from ex-post set, it will also be able to predict values of a currency pair in the future.

Box-Jenkins Analysis
Box-Jenkins analysis was performed to make a comparison between statistical models and our tested neural network models.For statistical modelling, Eviews software was used.The ADF unit root test confirmed the hypothesis that the series was non-stationary.To confirm stationarity, the series was differentiated which is a necessary condition in Box-Jenkins modeling.
By analyzing autocorrelation and partial autocorrelation functions of first differences of USD/CAD, there was no significant coefficient in the series.Therefore, we proceeded to model this series as a pure GARCH process.The suitability for using stochastic volatility model was also accepted by performed heteroscedas-ticity test.ARCH test confirmed the series was heteroscedastic since the null hypothesis of homoscedasticity was rejected at 5 % and so the residuals were characterized by the presence of ARCH effect which is quite a frequent phenomenon at financial time series.Several models of ARCH [6] and GARCH [7], were estimated.The estimation of different models was only based on 912 in-sample observations, in order to make ex-ante predictions with remaining 132 observations.Marquardt optimization procedure was used for finding the optimal values of GARCH parameters; initial values of parameters were counted using Ordinary Least Squares (OLS) method and these values were then by iterative process consisted of 500 iterations.Convergence rate was set to 0.0001.After GARCH model had completed, the standardized residuals of this series were tested with Ljung-Box Q test in order to confirm there are no significant coefficients in residuals of this model.Our assumption was confirmed, hence according to statistical tests the model was correct.The final definition of the model is:

Experiments
The estimation of all models was only based on 912 observations, in order to make further comparisons with the predictions of the 132 remaining observations.In this paper, only one-step-ahead forecast were used, i.e. horizon of predictions was equal to one day.MSE (Mean Square Error) numerical characteristic was used for assessing models.The result of a given model is from the best neuron configuration (in every model we tested number of hidden neuron from 3 to 10 to find the best output results of the network).Experiment for every model configuration was performed 12 times; the best and worst results were eliminated and from the rest the mean and standard deviation were counted.
Our created models of neural networks were then compared to standard statistical models.

RBF Neural Network
Own application of RBF neural network (implemented in JAVA with one hidden layer where we tested from three to ten processing neurons to achieve best results of network) was used.For every model, just the result with the best configuration is stated.The identity function was used as an activation function for the input layer and the output layer too.For the hidden layer the radial basis function was used as an activation function as it has been showed that it provides better accuracy than the perceptron network.In a normal case, the weights of neural network were initiated randomly -generated from the uniform distribution < 0.1).For the BP learning the learning rate was set to 0.001 to avoid the easy imprisonment in the local minimum.The number of epochs for each experiment with backpropagation was set to 5000 as this showed to be a good number for backpropagation convergence.The final results were taken from the best of 5000 epochs and not from the last epoch in order to avoid overfitting of the neural network.As raw (nonstandardized) data was used, we analyzed original nonstationary series for autocorrelation.As there was a strong dependence on the previous day (autocorrelation = 0.996), we used just one network input -the previous observation.

K-means Algorithm
K-means instead of random initialization of weights was used before they were adapted by BP.Coordinances of clusters were initiated as coordinances of randomly chosen input vector.After that, every input vector was assigned the nearest cluster.When this procedure has been done, the coordinates of clusters were recounted.This cycle was repeated 5000 times and the learning rate for the cluster adaptation was set to 0.001.The number of clusters was set to the number of hidden neurons.
In our experiments, the adaptive version of K-means will be used which is defined as follows: • Random initialization of centroids in the dimension of the input vector.
• Introduction of the input vector x i .
• Determination of the nearest from all centroids to a given input.
• Adaptation the coordinates of the centroid according to the rule: , where j is the nearest cluster to the introduced input, η is a learning rate parameter.
• Termination of the algorithm if all inputs were processed or the coordinates of the cluster are not changing anymore.

Genetic Algorithm
Our own implementation of the genetic algorithm was used for weight adaptation.The chromosome length was set according to the formula: D * s + s, where s is the number of hidden neurons and D is the dimension of the input vector.A specific gene of the chromosome was a float value and represented a specific weight in the neural network.The whole chromosome represented weights of the whole neural network.The fitting function for evaluating the chromosomes was the mean square error function (MSE).The chromosome (individual) with the best MSE was automatically transferred into the next generation.The other individuals of the next generation were chosen as follows: By tournament selection (size of the tournament equaled to 100) 100 individuals were randomly chosen from the population.The fittest of them was then chosen as a parent.The second parent was chosen in the same way.The new individuals were then created by crossover operation.If the generated value from 0.1) was lower than 0.5 the weight of the first parent at the specific position was assigned to a new individual.Otherwise, a new individual received the weight of the second parent.
The mutation rate was set to 0.01.If performed, the specific gene (weight) of a chromosome was changed to a random value.The weight initialization for the first population of chromosomes was tested too.Testing intervals −1, 1 , −10, 10 and −100, 100 it was found out that the best results are presented when using weights generation from −10 to 10.The size of the population and the number of generation for the genetic algorithm were set accordingly to the settings of back-propagation.In back-propagation there were 5000 cycles of the forward signal propagation plus 5000 cycles of backward error propagation.In GA the size of the population equalled to 1000 and 10 was the number of generations.As the operators of mutation and crossover do little computation, the computational demand is relatively the same.

Hybrid Model
Hybrid model is a combination of more independent models integrated into one model to produce only one output in specific time t.The main point in constructing hybrid models is to find out how to combine independent models in order to produce the best possible results.
In this work the following hybrid model was tested -a combination of RBF neural network and the error moving average model.The inspiration for the moving average part came from Box-Jenkins statistical models.We try to eliminate the error of the neural network by modeling the residuals of RBF just like moving averages of random part in Box-Jenkins ARIMA models.The hybridized model which was suggested is defined as follows: e RBF k−i , j = 1..s, (11) 6.

Results and Discussion
The reason why the prediction qualities were applied on the validation set (ex-ante predictions) was the fact that an ANN can become so specialized for the training set that could lose accuracy in the test set.Therefore, it is obvious that even if the network provides acceptable results on the training set, it is not sure the results will not acceptable when they are applied to new data.Therefore, the estimation of all models was only based on 912 observations, in order to make further comparisons with the predictions of the 132 remaining observations.In this paper, only one-step-ahead forecast were used, i.e. horizon of predictions was equal to one day.We used MSE (Mean Square Error) numerical characteristic for assessing models.The result of a given model is from the best neuron configuration (in every model we tested number of hidden neuron from 3 to 10 to find the best output results of the network).Experiment for every model configuration was performed 12 times; the best and worst results were eliminated and from the rest the mean and standard deviation were counted.
First of all, from Tab. 1 it can be clearly seen (RBF network, one autoregressive input) that network with BP achieved the best results when there are 4 neurons in the hidden layer.On the other hand, the advanced methods for network learning (K-means + BP, GA) achieved the best results with 4 (GA), respectively 9 neurons (K-means + BP).However, when these advanced methods are used, the number of hidden neurons seem to not play an important role as the results where comparable .Following from that one can deduce that for remembering the relationships in this time series it is enough to use a smaller number of hidden neurons (three or four).
Secondly, the standard back-propagation algorithm for weights adaptation showed to be the great weakness of the neural network.The convergence was very slow and in addition, it generally converged to any local minimum on the error surface, since stochastic gradient descent existed on the surface which was not flat.It was due to the fact, that the gradient method does not guarantee to find optimal values of parameters and imprisonment in the local minimum is quite possible.
Bearing in mind the disadvantages of BP stated in the previous part of the paper, also other methods for network adaptation were tested-K-means, that was used in the phase of non-random initialization of weight vector w performed before the phase of network learning, and GA.It is no surprise that the RBF network combined with K-means or GA for weights adaptation provided significantly better results than the standard RBF (see Tab. 1).Moreover, besides lower MSE, another advantage of using a genetic algorithm or Kmeans upgrade is the consistency of predictions.The standard deviation of these methods is incomparably lower than the SD when the standard BP is used (see Tab. 1).
As for K-means, its biggest strength is in the speed of convergence of the network.Without K-means, it took considerably longer time to achieve the minimum.However, when the K-means was used to set the weights of the network before backpropagation, the time for reaching the minimum was much shorter.Therefore, the advantage of using K-means together with backpropagation is in the speed of adaptivity rather than in better predictions.Following from that one can say that in many cases it is not necessary to interpolate the output value by radial functions, it is quite sufficient to use one function for a set of data (cluster), whose center is considered to be a center of activation function of a neuron and the values of centroids can be used as an initialization values of weight vector w.The assumption used here was that weights should be located near the global minimum of the error function the (Eq.( 4)).The advantage of this combination is that the lower number of epochs is supposed be used for network training.Moreover, Kmeans is quite simple to implement.However, one must bear in mind that Kmeans is a relatively efficient algorithm only in the domain of non-extreme values.Otherwise, other advanced non-hierarchical clustering algorithms must be used.
Having also tested GA in weights adaptation, it was found out the convergence was also considerably faster than at BP and therefore it was no surprise that sometimes the network converged only after 5 generations.
If we compare weights adaptation via GA and Kmeans plus backpropagation, the results are almost the same.Even though, K-means provided better results compared to GA, the differences are not very large.However, GA has a bigger potential to perform even better forecasts as there are more parameters needed to be optimized.Backpropagation, even though used with Kmeans, seemed to reach its global minimum as even with the higher number of epochs (we tested back-propagation up to 10000 cycles) the results were almost the same.The number of hidden neurons seemed to not play an important role as the results were comparable.Also the number of inputs coming into the network in this paper was tested; the ANN with only one input provided better forecasts than the network with three autoregressive inputs (see Tab. 1).
For the RBF-SMA hybrid ANN model, the same strategy as for the standard ANN was used.With the given number of hidden neurons (step by step we tested the model with hidden neurons from three to ten) twelve testing procedures were performed.Just like for the standard model, we firstly trained the standard network, using either GA, BP or K-means plus BP, we then checked the predictive accuracy of the standard RBF using the last 133 observations which were not used for model training.Afterwards, the hybridization of the ex-ante predictions with moving averages of previous errors was performed.What for the value of parameter of the moving average, the values from one to one hundred were tested and we experimentally found out the best values for the tested data (for the majority of testing procedures the optimal value of moving average parameter was 44).Finally, just like for the standard RBF, from the best ten out of twelve experiments, the mean and standard deviation of the best results of RBF-SMA (having the optimal value of MA parameter) were counted.For every number of hidden neurons tested, the results are stated in the Tab. 1 which contains the results of out-of-sample predictions provided by the different models and optimization techniques, respectively.
To see the effectiveness or ineffectiveness of the suggested hybrid model, comparative analysis with individual model is usually performed.We provided the comparison with standard RBF network as well as the statistical ARIMA and GARCH model in order to show the prediction power of our suggested model.Table 2 states the final results of the numerical comparison of all tested and quantified models.Table 3 states the percentual comparison of our suggested model against the standard neural network.
Deducting from the Tab. 1 and Tab. 2, the network with only one input provided significantly better results in the validation set than the network with three inputs.The standard RBF provided the best outputs when it was combined with K-means and BP.The error of prediction at this network was a little bit lower compared to the statistical model; however these two models provided almost the same results.
Comparing the numerical (Tab.2) as well as graphical results (Fig. 2 and Fig. 3), the suggested hybrid model improved the prediction power of the standard RBF considerably.Therefore, deducting from the performed experiments one can state that the application of our suggested new hybrid neural network model into the domain of exchange rates provides significantly bet-  ter results than the standard RBF neural network as well as statistical ARIMA model.

Conclusion
In this paper we performed financial time series modeling with RBF neural networks as well as our suggested hybrid model.We used USD/CAD data that was divided into training set and validation due to model checking.Except for a standard ANN, we also combined an unsupervised learning method K-means and GA into the RBF in order to achieve better accuracy of the network.Both of the algorithms were used in the process of adapting weights of the network.The reason for incorporating other algorithms into the network was that the BP is considered a weakness of the RBF.Some of the drawbacks of BP include the scaling problem, the complexity problem, the slow convergence, the conver- gence to a local minimum etc.K-means was used in the phase of non-random initialization of weight vector w before learning.
We also suggested the new hybrid neural network model in order to improve the prediction accuracy of the standard ANN.One can say we have used only the easiest and slow BP and not a more powerful version of BP.However, it is important to note that our main objective was to compare the standard neural network with our suggested hybrid model.Due to this we used only the standard BP in both models.
We performed experiments to find out that our suggested hybrid model based on RBF neural network had a significant predictive superiority over the statistical model as well as standard characteristics always overcame individual models (ANN, statistical model); the improvements were ranging from about 18 percent to more than 89 percent.Hence, the suggested hybrid showed to be a great improvement of the standard RBF neural network as we experimentally clearly proved that for the USD/CAD the hybrid model provided significantly better forecasts than the standard model of the RBF neural network and the statistical model and there was a clear benefit of better one-day-ahead forecasts.Following from these empirical findings for outof-sample one-step-ahead forecasts, we believe that this model has a great potential in time series modeling.

4. 1 .
Data and Model ValidationFor our experiments daily close price of the USD/CAD currency was chosen.The interval was from 10/31/2008 to 10/31/2012, i.e. 1044 daily observations.The data was downloaded from the website http://www.global-view.com/forex-trading-tools/forex-history.Due to va-
Tab. 1: Prediction power of RBF-SMA model (back-propagation, one input).Final comparison of predictive qualities -best configurations (out-of-sample predictions).Tab.3: Percentual improvement of the suggested hybrid model compared to the standard RBF.