New Approaches to the NARX-Based Forecasting Model. A Case Study on the CHF-RON Exchange Rate

The work reported in this paper focuses on predicting the Swiss Franc-Romanian Leu exchange rate as a function of the US Dollar-Romanian Leu rate using the NARX model. We propose two new forecasting methods based on the NARX model, considering both additional testing and network retraining in order to improve the generalization capacities of the trained neural network. The forecasting accuracy of the two methods is evaluated in terms of one of the most popular quality measures, namely the weighted RMSE. The comparative analysis, the experimental results, and concluding remarks are reported in the final part of the paper. The performance of the proposed methodologies is evaluated through a long series of tests, with very encouraging results compared to similar developments. Based on the conducted experiments, we conclude that both resulting algorithms perform better than the classical one. Moreover, the retraining method in which the network is conserved over time outperforms the one in which only additional testing is used.


Introduction
The prediction of time series is a highly topical subject with a promising future in the context of globalization and the full involvement of information technology in the financial field.
For economic processes it is essential that the solutions offered by the literature be reflected in the design of applications and in the understanding of the phenomena. The need to predict data in the form of time series is generally understood in the context of stock market speculation, more precisely in terms of potential profit or loss. Knowledge in the field is of real use in analyzing markets, stock prices, and the evolution of national economies, in order to anticipate possible collapses, scenarios, or even frauds in time. This paper aims to find a new viable solution by combining several elements of statistics, mathematics, and neural computation. The main purpose of this article is to study the variation of the Swiss franc-RON exchange rate according to the dollar-RON exchange rate. Taking into account the volume of credits granted in Swiss francs in Romania, an accurate forecast of this exchange rate as a function of the most traded currency on the international market is extremely useful.

There are two prediction categories, based on the number of values to predict: one-step-ahead prediction and multi-step-ahead prediction. In the study [1] the two categories are distinguished by the fact that long-term predicted values can be calculated only from previously forecasted values. This implies a limitation consisting of the accumulation of errors and the lack of concrete information. The time series that correspond to exchange rates are non-stationary by nature. As early as 1989, the paper [2] described how the volatility of trajectories generated by nonlinear systems makes the prediction of future values extremely difficult. In a recent study [3], the authors obtained better results using SVM (part of the Analogy paradigm) than NARX (part of the Connectivism paradigm) for stock market prediction. Another limitation is described in [4], where the authors note that the prediction performance of neural networks strongly depends on the network structure and the training procedure. At the same time, the difficulty of assessing data quality, preprocessing the data, building the network, and repeating this process until a good model is determined must not be a discouragement [4]. Ahmed [5] predicts the exchange rate of a currency from three other exchange rates using a neural approach, the results after training showing that the ANN is an effective prediction method. Also, depending on the choice of lag, the market prediction ability of the NARX network ranged between 99% (best result) and 47% (worst result) [5]. Nonlinear autoregressive exogenous networks have the advantage that they can be applied to a very large range of prediction problems, which makes them a promising approach. As can be seen from [6], a comparative analysis between ANN models and econometric models shows that MLFFNN (Multilayer Feed Forward Neural Network) and NARX (Nonlinear Auto Regressive Exogenous) have better predictive efficiency. Another conclusion of that study is that, for the NARX model, the number of hidden layer neurons and the training algorithm do not have a significant influence on the results obtained. This is an encouragement to focus on the optimizations that can be brought to ANN networks without changing their core components.

There are three ways to view the problem we propose to analyze. (A) The ideal result. The general problem consists in obtaining an artificial neural network that successfully learns the input data and captures the patterns in the data as accurately as possible, so that the test result is as close as possible to the real, new, and unknown values. (B) The limitations. Many factors limit the accuracy of predictions, for instance: mathematical limitations (the use of probability-based algorithms), computational limitations (time consumption, complex configurations, computational complexity, processing power), and statistical limitations (non-stationarity of the time series, the need for large amounts of data). (C) Our purpose. Based on the features provided by the Matlab software, we optimized the NARX network using three models. In this study a new NARX scheme is developed for forex rate prediction. Thus, three methods will be compared: (1) the standard NARX method offered by Matlab, extended with an additional test on which the network is validated; (2) the previous method with an additional feature that conserves the network if it obtains good results on training but poor results on testing; and (3) the previous method with an additional feature that makes changes to the weights while the network is conserved.
The NARX model has a flexible architecture that combines the simplicity of feed-forward neural networks with the time series prediction capability of recurrent neural networks [7]. The rest of the paper is organized as follows. The general NARX prediction model is briefly described in Section 2. The classic, most commonly used forecasting model, together with the proposed techniques developed to improve the forecasting accuracy and generalization capacities of the trained neural network, is presented in the third section of the paper. In Section 4 the quality measure used to evaluate the results and the acquisition of the data analyzed in this case study are presented. The last section is dedicated to discussing the obtained results and suggesting ideas for future work. According to [8], the convergence rate and learning ability of NARX neural networks are higher than those of other ANNs. This makes them suitable for multi-step-ahead prediction, as is the case in the present study.

NARX Model
From a formal point of view, the equation of the NARX model is

y_{t+1} = f(y_t, y_{t-1}, ..., y_{t-d1+1}, u_t, u_{t-1}, ..., u_{t-d2+1})

where y_{t+1} is the next predicted value of the dependent time series {y_t}, calculated based on past values of it and of the exogenous series {u_t}. The function f is the global operation of the neural network through which the estimate is made. The values d1 and d2 represent the delays of the target and of the input and must respect the following properties: d1 > 0, d2 ≥ 0 and d1 ≥ d2. The architecture of a NARX neural network is presented in Figure 1. To determine the number of neurons in the hidden layer, formula (6) is used. A large number of neurons will make learning too dependent on the training data and the accuracy of the predicted results will suffer; at the same time, a small number of neurons may be insufficient for learning.
where N is the size of the output and m is the size of the input. The Levenberg-Marquardt algorithm is used to train the network. It provides a numerical solution to nonlinear least-squares minimization problems [3], and experimental results show the effectiveness of ANNs trained with the LM algorithm [10].
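The NARX regression above can be sketched in a few lines. The paper's implementation uses Matlab's NARX network trained with Levenberg-Marquardt; the following Python sketch is only an illustration of how the lagged regressor matrix is assembled from {y_t} and {u_t} with delays d1 and d2, with a linear least-squares fit standing in for the network mapping f (the toy series and the helper name `narx_design` are assumptions, not part of the paper).

```python
import numpy as np

def narx_design(y, u, d1, d2):
    """Build the NARX regressor matrix: each row holds
    y[t], ..., y[t-d1+1] and u[t], ..., u[t-d2+1];
    the corresponding target is y[t+1]."""
    start = max(d1, d2) - 1
    rows, targets = [], []
    for t in range(start, len(y) - 1):
        past_y = y[t - d1 + 1 : t + 1][::-1]   # y[t] ... y[t-d1+1]
        past_u = u[t - d2 + 1 : t + 1][::-1]   # u[t] ... u[t-d2+1]
        rows.append(np.concatenate([past_y, past_u]))
        targets.append(y[t + 1])
    return np.array(rows), np.array(targets)

# toy series: y depends on its own last value and the exogenous input
rng = np.random.default_rng(0)
u = rng.normal(size=200)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.6 * y[t - 1] + 0.3 * u[t - 1]

X, target = narx_design(y, u, d1=2, d2=1)
# linear least squares as a stand-in for the trained network f
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
rmse = np.sqrt(np.mean((X @ coef - target) ** 2))
```

Since the toy relation is exactly linear in the lagged values, the fit recovers the generating coefficients; with a real NARX network, f is the nonlinear mapping learned by the hidden layer.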

Forecasting Technique
The study focuses on obtaining results through the three methods mentioned in the first section.
Figure 2 illustrates the additional test method. One way to improve the performance of the forecasting methods is to consider additional information obtained through technical analysis. Note that technical analysis uses market-based information, such as price and volume, to anticipate future price developments. Therefore, we include additional exogenous variables that, on the one hand, could help to better train the network and, on the other hand, could have the ability to describe future price movements.

Fig. 2. Extra data test method
Besides the USD series considered for prediction, our models include the following technical indicators: the Open, High and Low values of USD and the moving average of the CHF series. We take these technical indicators into account based on the studies reported in [3].
A moving average is an indicator that shows the average value of a security's price over a specified period of time. The N-period moving average of the asset's price at time t is computed by

MA_N(t) = (1/N) * Σ_{i=0}^{N-1} P_{t-i}

where P_t is the current price and N is the number of time periods. The moving average has the role of indicating the trend of the time series. In our approach N is set to 5: the value for the current day is computed as the average of the last 5 days.
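The 5-day moving average indicator above can be computed directly; a minimal sketch (the sample closing prices are illustrative, not data from the paper):

```python
import numpy as np

def moving_average(prices, n=5):
    """N-period simple moving average: for each day t >= n-1,
    the mean of the last n prices (including day t)."""
    prices = np.asarray(prices, dtype=float)
    kernel = np.ones(n) / n
    # 'valid' keeps only windows fully inside the series,
    # so the output is len(prices) - n + 1 values long
    return np.convolve(prices, kernel, mode="valid")

close = [4.0, 4.1, 4.2, 4.1, 4.0, 4.2, 4.3]
ma5 = moving_average(close, n=5)
```

Note that the first n-1 days have no moving-average value, so when the indicator is used as an exogenous input those leading rows must be dropped from the training set.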

Fig. 3. Recursive Flow for Model 1
The second proposed model consists in conserving the properties of the neural network (weights and biases) and trying to improve its generalization capabilities by retraining it. This concept is based on the idea of not losing a well-trained network that does not obtain good results on the additional test, but trying to improve it instead. Future data will be predicted after finding a NARX configuration that falls within the expected error limits. The flow is represented in Figure 4. The value of the iteration limit is chosen arbitrarily; in our case the value is 10.

Fig. 4. Recursive Flow for Model 2
The third proposed model consists in conserving most of the properties of the neural network (weights and biases) in order to re-establish the same network. This concept is based on the idea of forcing an optimal configuration by modifying a small part of a well-trained network. The way in which changes are made to a weight on the input layer is represented in Figure 5.

The computation of the training error and of the prediction error is usually done using the root mean square error (RMSE), which is proven to be an appropriate metric for time series with a sufficiently large number of observations [12]:

RMSE = sqrt( (1/n) * Σ_{i=1}^{n} (y_{p,i} − y_{r,i})^2 ) (7)

where n is the size of the time series, y_p is the current element of the learned/predicted series and y_r is the current element of the real series. This evaluation of the forecasting accuracy using RMSE is used in our study to determine the performance on unknown test data for all three methods. The development of the proposed methods includes testing their learning capabilities on the training data and testing their prediction capacities. Both of these procedures use an additional measure that assigns a more significant weight to the end of the time series, which coincides with the beginning of what will be predicted. In other words, we propose the following scenario: a particular network configuration is considered suitable if the error is within the limits set for the training and additional data, forcing the error to be reduced by increasing the weights from the beginning of the time series towards its end. The weighted RMSE formula for calculating the error is

wRMSE = sqrt( (Σ_{i=1}^{n} w_i)^{−1} * Σ_{i=1}^{n} w_i (y_{p,i} − y_{r,i})^2 ) (8)

where n, y_p and y_r have the same significance as in (7) and w is the weights vector.
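The weighted RMSE of equation (8) can be sketched as follows; the sample series and the linearly increasing weight vector are illustrative assumptions (the paper does not specify the exact weighting scheme beyond the weights growing toward the end of the series).

```python
import numpy as np

def wrmse(y_pred, y_real, weights):
    """Weighted RMSE of eq. (8): errors near the end of the series
    (larger weights) contribute more to the score."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_real = np.asarray(y_real, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sqrt(np.sum(w * (y_pred - y_real) ** 2) / np.sum(w)))

y_real = np.array([4.00, 4.05, 4.10, 4.20])
y_pred = np.array([4.01, 4.04, 4.12, 4.15])
w = np.linspace(1.0, 2.0, len(y_real))   # weights increase toward the series end
score = wrmse(y_pred, y_real, w)
```

With a large error on the last point, the weighted score exceeds the plain RMSE, which is exactly the intended bias toward the end of the series.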

Experimental results
For this study, the stationarity of the time series was checked with the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test, and the series were found to be non-stationary.
To determine the optimal delay, the PACF function was used, the results of which are presented in the table below.

The prediction obtained with the third model is more accurate overall, and we can observe a very good fit at the end of the prediction. The error decreased by 24.7% compared to the first model and by 12.3% compared to the second model.

Conclusions
NARX neural networks have real potential in learning and understanding stock market developments. Our main goal is to configure them optimally in order to create increasingly better-performing versions. One idea for future work that emerged from the present study is the ability to determine, at the level of the weights, which of them bring the best quality to a given version of the network, and to preserve them dynamically according to this criterion. Moreover, since the conducted tests showed good results, they raise the hope that further and possibly more sophisticated extensions can improve the model. Among several possible extensions, work is still in progress concerning the use of different output functions for the hidden and output neurons.

Fig. 1. NARX Architecture. Source: Beagle et al. (2017)

Figures 3 and 4 are the status charts for the algorithms in which the configuration is tested on additional data until a desired result is obtained. The software development is based on recursive functions. Considering the following components of the methodology: CN() - function to create the neural network, TRN() - function to train the neural network, and TN() - function to test the neural network, the flow corresponding to the first model is presented in Figure 3.

Fig. 5. Changes on weights level for Model 3

Error measurements

The data on which the case study is based is the history of the exchange rates CHF-RON and USD-RON. The records correspond to the period April 2016 - March 2018 and were collected from the database of the National Bank of Romania [11]. Their charts are shown in Figure 6.

Fig. 6. Exchange Rate Daily Data for Swiss Franc (a) and for US Dollar (b)

Fig. 9. Results for the third model. RMSE = 0.0199

The NARX model is part of the recurrent network class. These networks have the property of storing past values and feeding part of the hidden layer output back into the new network input:

y(t) = f(y(t−1), ..., y(t−d_y), u(t−1), ..., u(t−d_u))

where u(t) ∈ R and y(t) ∈ R are the input and the output of the network at time t, d_u and d_y are the input and the output orders, while f is the nonlinear function. When the nonlinear function f can be approximated by a multi-layer perceptron, the resulting system is called a NARX neural network.
Fig. 7. Results for the first model. RMSE = 0.0263

Model 2. The idea behind this model is to conserve the properties of the neural network (weights and biases) and to try to improve its generalization capabilities by training it again.

Fig. 8. Results for the second model. RMSE = 0.0227

The prediction obtained with the second model is more accurate overall, the error having decreased compared to the first model.