Modelling and optimisation of effective hybridisation model for time-series data forecasting

Financial time-series data exhibit non-linear and uncertain behaviour that changes over time. Therefore, the need to solve non-linear, time-variant problems has been growing rapidly. Traditional models, such as statistical and data mining approaches, are unable to cope with these issues. The main objective of this study is to combine forecasts from the autoregressive integrated moving average (ARIMA) model, the exponential (EXP) model, and the multi-layer perceptron (MLP) in a novel hybrid model. The analysis was based on financial data of the Sudanese pound/EURO exchange rate in Sudan. Simple additive combination and weighted combination methods are used to combine the linear and non-linear models into a hybrid forecast. Comparison between the benchmark models and the hybrid indicates that the hybrid model offers more accurate forecasts, with a reduced mean-absolute percentage error of around 0.82% for all models over all forecasting horizons. Moreover, the results suggest that the non-linear method can serve as an alternative to linear combining methods to achieve better forecasting accuracy. On the basis of these results, the authors conclude that further experiments to estimate the weights of the combination methods, and with more models, are needed to explore new issues in series prediction.


Introduction
Forecasting time series, especially for financial data, is of great interest to the economic world. Until now, the primary methods used for forecasting have been conventional statistical methods such as regression analysis and autoregressive integrated moving average (ARIMA) [1]. ARIMA is a very efficient method for forecasting linear time series, and its building process has been well described by Box et al. [2]. According to Zhang [3,4], the most extensively used statistical methods in time-series forecasting are linear and can therefore capture only linear patterns; however, the author added that most time-series data are either non-linear or contain non-linear properties. Machine learning methods have gained more attention from researchers as non-linear forecasting methods, in particular the artificial neural network (ANN), which has been the most commonly used method in the last few years for financial time-series forecasting, as stated by Shadbolt [5], Zhang and Wu [6], and Merh [7]. NNs have also been recognised as a successful technique in time-series forecasting [8]. Comparisons between the most used linear model, ARIMA, and ANN have concluded that ANN outperformed ARIMA in many cases, as found by Zhang [4] and Hua et al. [9]. However, some studies have suggested that ANN methods outperform ARIMA only in some situations, as the outcome depends on the behaviour of the time series, as demonstrated by Babu and Reddy [10] and Humphrey et al. [11]. Most time-series data contain both linear and non-linear patterns, so using linear methods alone to model the data is not practical, and using non-linear methods alone will also fail to model the linear patterns [3,4,12]. To overcome this problem, a new method has emerged, but it is not yet widely used. The idea is to combine the two models, linear and non-linear: the linear model helps in modelling the linear patterns and the non-linear model helps in modelling the non-linear patterns.
This hybridisation has been featured in many studies as the best approach for forecasting time-series data [7, 13-15]. According to the literature, combining models (hybridisation) has shown promising results in studies of financial time series, such as that of Zhang [4]. However, some researchers have argued that hybrid systems do not necessarily produce better forecasts, as stated by Taskaya-Temizel and Casey [13]. Thus, this study addresses this issue by investigating several different models (linear, non-linear, and a hybrid of both) for the daily exchange rate of the EURO against the Sudanese pound (SDG).
Since we are proposing an approach that many authors have reported to be more effective than a single approach, we review some of these studies here. For example, Fatima and Hussain [15] built a hybrid financial system combining linear and non-linear models (ARIMA or autoregressive conditional heteroskedasticity/generalised autoregressive conditional heteroskedasticity (ARCH/GARCH) as the linear model and ANN as the non-linear model) and concluded that the proposed hybrid system was superior to all the models studied for forecasting the daily Karachi Stock Exchange (KSE) 100 index. In their experiment, the authors relied on trial and error in building the ANN model with different inputs and hidden nodes. Another hybrid system was built by Aladag et al. [16] using ARIMA with an ERNN neural network. The authors used ERNN to form the network architecture. They suggested that a combination of ARIMA and ANN can yield a network architecture that gives better forecasting accuracy. Sallehuddin et al. [14] proposed two hybrid models for multivariate time-series analysis. However, their final proposed hybrid differed from others in the execution order of the two models: they dealt with non-linear patterns first by applying the ANN model and then executed the linear model to forecast the residuals. Their results showed that the order of execution of the hybrid system had an effect on the forecasting accuracy. Moreover, Zhang [4] suggested that combining two models (e.g. ARIMA and ANN) was better than using them separately and was an effective way to improve forecasting accuracy. The author showed in two different studies that combining linear and non-linear methods yielded a better forecast. In his first study, three different data sets were studied: Wolf's sunspot yearly data, the Canadian lynx yearly data, and weekly British pound/US dollar exchange rate data.
In the three data sets under study, the ARIMA-ANN hybrid model outperformed the two individual models in both the short and long terms. The second hybrid ARIMA-ANN study, published by Zhang [3], used three monthly time series: the US industrial production for clothing, residential utilities, and auto products. The author confirmed that a combination of the two methods gave a better forecasting result. Similarly, Javedani et al. proposed a hybrid of autoregressive fractionally integrated moving average (ARFIMA) models and fuzzy time series (FTS), the ARFIMA-FTS model, validated on common data sets, namely the Taiwan Capitalization Weighted Stock Index (TAIEX) and the Dow Jones Industrial Average (DJIA), together with exchange rate data of nine main currencies versus the USD. On the basis of the reported results, they concluded that more effective hybridised methods should be applied in financial time-series forecasting, which underlines the importance of this research field [17].
Another study that looked at the hybridisation approach was that of Koutroumanidis et al. [18]. The authors used an ARIMA-ANN hybrid model to forecast fuel wood prices in Greece. In their comparison, they concluded that using a hybrid ARIMA-ANN model is better than using either model individually. More recently, Faruk [19] constructed a hybrid system combining ARIMA and ANN to predict river water quality. The results of the hybrid model were compared with the output of each single model and showed that the proposed hybrid system outperformed both forecasting models used individually. All the studies mentioned above compared a hybrid system consisting of ARIMA and ANN with the performance of each model used alone, and all agreed that combining linear and non-linear methods such as ARIMA and ANN yielded a better forecast.
The study of time-series data is a wide field; this study is devoted to financial time series and, in particular, exchange rate forecasting using a hybrid model comprising ARIMA, the exponential model (EXP), and the multi-layer perceptron (MLP). In terms of financial time-series modelling, the SDG-EURO exchange rate is used as the case study in this research, as discussed next.

Proposed hybrid method
In this section, we consider two methods for combining the separate forecasts produced by the EXP, ARIMA, and MLP models, in order to investigate the best model for time-series forecasting.
The proposed hybrid comprises linear and non-linear components, because most time-series data, especially financial data, contain both patterns. Many methods, both linear and non-linear, have been applied to time-series forecasting, but none is capable of handling both patterns simultaneously. To overcome this problem and improve forecasting accuracy, a hybridisation of linear and non-linear methods is proposed. This approach has been applied in various studies related to financial time series, as discussed previously in the literature review. The combining methods are the additive combined method and the linear regression weighted method. Brief details of these combining methods are given below.

Additive combined method
This method comprises three stages: (i) In the first stage, the linear patterns are modelled using a statistical model (ARIMA or EXP) to forecast the future value of the exchange rate; forecasting errors (residuals) are generated from this process. (ii) In the second stage, MLP is used to forecast the residuals generated in the first stage by the statistical model. (iii) In the third stage, the value forecast by the statistical model is summed with the error forecast generated by MLP to produce the final, hybrid forecast value. The framework of the proposed model is shown in Fig. 1.
For illustration, let us assume that a given time series $y_t$ comprises a linear pattern, captured by the statistical model structure, and a non-linear pattern, left in the residuals:
$$y_t = L_t + N_t \qquad (1)$$
where $L_t$ is the linear pattern at $t$ and $N_t$ is the non-linear pattern at $t$. The statistical model will model the linear pattern $L_t$ and generate the residuals from the process, which contain the non-linear pattern. By letting $e_t$ denote the residuals, we have the following representation:
$$e_t = y_t - \hat{L}_t \qquad (2)$$
where $\hat{L}_t$ is the forecast result at $t$ from the statistical model. From ARIMA diagnostic checking, we know that the residuals should contain only non-linear patterns and no linear correlation. In ARIMA diagnostic checking, we examine the residuals for any linear correlation and make sure that none is left, in order to satisfy the ARIMA assumption of no correlation in the residuals. Thus, the residuals contain only non-linear patterns, which can be modelled with a non-linear method, in this case MLP. MLP was chosen because of its ability to approximate any function of unknown form. Here, the function that represents the residuals is unknown, and MLP can approximate it. The residual to be forecast by MLP is given as
$$e_t = f(e_{t-1}, e_{t-2}, \ldots, e_{t-n}) + \varepsilon_t \qquad (3)$$
where $f$ is the function to be approximated by MLP, $n$ is the number of lagged residuals used as inputs, and $\varepsilon_t$ is the unexplained error.
Taking the MLP forecast of $e_t$ from (3) as the estimate $\hat{N}_t$ of $N_t$, we obtain
$$\hat{y}_t = \hat{L}_t + \hat{N}_t \qquad (4)$$
where $\hat{y}_t$ is the final forecast, $\hat{L}_t$ is the linear forecast from the statistical model, and $\hat{N}_t$ is the non-linear forecast from MLP. In the literature, several studies that combined a statistical model and MLP concluded that hybridising these two methods yields better forecast results.
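The three stages of the additive method can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in this study: the linear stage is simple exponential smoothing with a fixed, hypothetical smoothing constant `alpha`, and the MLP that approximates the residual function f is replaced by a deliberately tiny stand-in (a single tanh unit on the previous residual whose output weight is fitted by least squares). The function names and the sample series are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def ses_one_step(y: Sequence[float], alpha: float) -> List[float]:
    """One-step-ahead simple exponential smoothing forecasts.

    forecasts[t] is the forecast of y[t] made from y[0..t-1].
    forecasts[0] is initialised to y[0] (an assumption of this sketch).
    """
    forecasts = [y[0]]
    for t in range(1, len(y)):
        forecasts.append(alpha * y[t - 1] + (1.0 - alpha) * forecasts[t - 1])
    return forecasts


def fit_tanh_unit(residuals: Sequence[float]) -> float:
    """Least-squares output weight c for e_t ~ c * tanh(e_{t-1}).

    Stand-in for the MLP residual model; a trained MLP would take its place.
    """
    feats = [math.tanh(e) for e in residuals[:-1]]
    targets = residuals[1:]
    den = sum(h * h for h in feats)
    if den == 0.0:
        return 0.0
    return sum(h * e for h, e in zip(feats, targets)) / den


def additive_hybrid(
    y: Sequence[float], alpha: float = 0.5
) -> Tuple[List[float], List[float], List[float]]:
    """Return (linear, nonlinear, hybrid) in-sample one-step forecasts."""
    # Stage 1: linear forecast and its residuals e_t = y_t - L_hat_t.
    linear = ses_one_step(y, alpha)
    residuals = [yt - lt for yt, lt in zip(y, linear)]
    # Stage 2: forecast each residual from the previous one.
    c = fit_tanh_unit(residuals)
    nonlinear = [0.0] + [c * math.tanh(e) for e in residuals[:-1]]
    # Stage 3: hybrid forecast = linear forecast + residual forecast.
    hybrid = [lt + nt for lt, nt in zip(linear, nonlinear)]
    return linear, nonlinear, hybrid


# Made-up values for illustration only (not the SDG-EURO data).
series = [6.10, 6.12, 6.15, 6.11, 6.18, 6.20, 6.17, 6.22]
linear, nonlinear, hybrid = additive_hybrid(series, alpha=0.5)
```

Because the residual model is fitted by least squares and a zero weight is always available, the in-sample squared error of the hybrid in this sketch cannot exceed that of the linear stage; out-of-sample behaviour carries no such guarantee.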

Linear regression combined method
In the second method, three models (ARIMA, MLP, and EXP) are combined into the hybrid model, as shown in Fig. 2. The three models are fed the same input values, and the output of each serves as an independent predictor for the hybrid model. The method comprises the following stages: (i) In the first stage, all models (ARIMA, EXP, and MLP) are fitted to forecast the future value of the exchange rate. (ii) In the second stage, the weight of each model is calculated; to estimate the contribution weight of each predictor, we applied linear regression between them. Accordingly, the combination equation is defined as a weighted linear combination of the three individual forecasts. (iii) In the third stage, the linear combined method uses these weights to form the hybrid model (Fig. 3).
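The weight-estimation step can be illustrated with ordinary least squares. The sketch below is an assumption-laden illustration rather than the procedure used in this study: it assumes the regression includes an intercept `w0`, regresses the actual values on the three individual forecasts, and solves the normal equations with a small Gaussian elimination routine so that no external library is needed. All function names are hypothetical.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: forecasts may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_combination_weights(
    actual: Sequence[float], forecasts: Sequence[Sequence[float]]
) -> List[float]:
    """OLS weights [w0, w1, ..., wk] for actual ~ w0 + sum_j wj * forecasts[j]."""
    rows = [[1.0] + [f[t] for f in forecasts] for t in range(len(actual))]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, actual)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def combine(weights: Sequence[float], forecasts_at_t: Sequence[float]) -> float:
    """Hybrid forecast at one time step from the individual forecasts."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], forecasts_at_t))
```

In use, `forecasts` would hold the ARIMA, EXP, and MLP forecast series in a fixed order, and `combine` would be applied to each new triple of forecasts. Whether an intercept should be included, and whether the weights should be constrained (for example to sum to one), are design choices that the sketch leaves open.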

Performance measures
Several statistical measures are used to identify a fitted model that minimises the error [20]. These measures are listed in Table 1. Based on the observed results, we used the mean-absolute percentage error (MAPE) as the main benchmark [21] for the aforementioned models. The following notation is used: if $y_1, \ldots, y_n$ represents a time series, then $\hat{y}_i$ represents the $i$th predicted value, where $i \le n$, and the $i$th error $e_i$ is then $e_i = y_i - \hat{y}_i$.

Results and discussion

Data set
To meet the objectives of this paper, we investigate the daily exchange rate of the EURO against the Sudanese pound (SDG) in the Sudanese market. The data were collected from the Central Bank of Sudan, Khartoum. The data cover the period from the 3rd of July of

Benchmark results
This section presents the linear models (EXP, ARIMA) and the non-linear MLP model, and their ability to explore the prediction pattern in the historical financial time-series data. These models are applied separately and in combination to demonstrate their predictive ability in a real financial time-series case study. In addition, this paper puts forward a new hybrid model based on the MLP, EXP, and ARIMA methods, which is constructed to predict next-day SDG closing prices. To establish the validity of the proposed method, the results of the single models are compared with those of the proposed hybrid models. After fitting the individual models, Fig. 2a illustrates the actual testing data set of the SDG-EURO daily closing exchange rate and the values predicted by the single models (EXP, ARIMA, and MLP). Five tests were run on the residuals to determine whether each model is adequate for the data and to make the forecasting results more stable. The simple EXP model, ARIMA (0, 1, 1), and MLP 1-5-1 were selected. Table 2 summarises the prediction values of the selected models in fitting the historical data. Each of the statistics is based on the one-step-ahead forecast errors of the model used to generate the forecasts. As can be observed from Fig. 2a, all models generated good predictions: the forecast values are close to the actual values and to one another. Among the single models, the MLP model is the best for forecasting the SDG-EURO data, with a higher fitting ability and better forecasting accuracy.
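To make the notation MLP 1-5-1 concrete, the following sketch computes the forward pass of a network with one input, five hidden units, and one output. Several details are assumptions for illustration only, since they are not stated here: the hidden activation is taken to be tanh, the output unit is linear, and the single input is taken to be the previous day's (scaled) rate. The function name and the weights in the usage example are hypothetical, not fitted values.

```python
import math
from typing import Sequence


def mlp_1_5_1(
    x: float,
    w_hidden: Sequence[float],
    b_hidden: Sequence[float],
    w_out: Sequence[float],
    b_out: float,
) -> float:
    """Forward pass of a 1-5-1 network: 1 input, 5 tanh units, 1 linear output."""
    if not (len(w_hidden) == len(b_hidden) == len(w_out) == 5):
        raise ValueError("a 1-5-1 network needs exactly five hidden units")
    # Hidden layer: each unit sees the same single input.
    hidden = [math.tanh(w * x + b) for w, b in zip(w_hidden, b_hidden)]
    # Output layer: weighted sum of the hidden activations plus a bias.
    return b_out + sum(w * h for w, h in zip(w_out, hidden))


# Hypothetical weights, for illustration only.
prediction = mlp_1_5_1(
    x=0.5,
    w_hidden=[0.8, -0.4, 1.2, 0.3, -0.9],
    b_hidden=[0.1, 0.0, -0.2, 0.05, 0.3],
    w_out=[0.5, -0.3, 0.2, 0.7, -0.1],
    b_out=0.05,
)
```

Such a network has 16 free parameters (5 input weights, 5 hidden biases, 5 output weights, and 1 output bias), which training would estimate from the data.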

Additive combination results
After fitting the additive combination technique, two hybrid models were generated. Fig. 2b illustrates the actual SDG-EURO closing price of the testing data set and the values predicted by the hybrid models (MLP + EXP and MLP + ARIMA). Similarly, it can be observed from Table 2 that the forecast values obtained from all the models are close to the actual values. Table 3 summarises the performance errors of each hybrid model in fitting the historical data. From Fig. 2b, MLP + ARIMA does not perform well when forecasting the SDG-EURO data: the MAPE increased from 1.46% for ARIMA to 1.57% for MLP + ARIMA. This may be caused by the weak forecasting stability of MLP; although ARIMA can optimise its parameters, its effect on improving this stability is weak. By contrast, the MAPE decreased from 1.76% for EXP to 1.59% for MLP + EXP. Nevertheless, the forecasting ability of MLP + ARIMA (MAPE 1.57%) remains slightly better than that of MLP + EXP (MAPE 1.59%), which suggests that MLP + ARIMA deals well with data such as the SDG-EURO time series.

Linear regression combination method results
After fitting the weighted combination technique, the (MLP + EXP + ARIMA) hybrid model was generated. Fig. 2c illustrates the actual SDG-EURO values from the data set and the values predicted by the hybrid model. The weights of the composite model were estimated by linear regression, according to the regression equation of the preferred model. The correlation coefficient (r) between the variables in the hybridised equation equals 0.83, which measures the efficiency of the composite model; the variables are thus positively correlated. From the evaluation measures in Table 3, the proposed hybrid model (MLP + EXP + ARIMA) based on the weighted method improves the forecasting accuracy, reaching a MAPE value of 0.82%. More generally, the hybrid models keep the MAPE within 2%, as shown in Fig. 2c and Table 3. The figure indicates that the hybrid model fitted to the SDG-EURO data performs well when measured by different evaluation metrics. A smaller mean-absolute error (MAE) means a higher forecasting accuracy. A lower root-mean-square error (RMSE) indicates a better fit to the daily exchange rate, and MAPE is an index for evaluating the forecasting ability of the model. At present, for the SDG-EURO data, the best standard is about 0.97%. From the average MAE over five experiments, MLP has the smallest value, indicating the best forecasting accuracy. Moreover, the smallest RMSE not only means that the hybrid model fits the SDG-EURO time series well, but also indicates that the forecasting results from the model are consistent. Compared with the single forecasting models, the hybrid model is the most suitable for forecasting the SDG-EURO time-series data, with a higher fitting ability and better forecasting capacity.

Forecasting analysis and comparisons
To compare the performance of the different models, the benchmark models (MLP, ARIMA, and EXP) were first fitted individually to forecast the exchange rates. The performance comparison of the six models (EXP, ARIMA, MLP, MLP + EXP, MLP + ARIMA, and MLP + EXP + ARIMA) according to five evaluation criteria [MSE, RMSE, MAPE, MAE, and standard deviation (SD)] is given in Table 3. From Table 2, it can also be observed that the values predicted by all the models are close to the actual values and to one another. Table 3 summarises the performance errors of each hybrid model in fitting the historical data. The empirical analysis confirms that the MAPEs of all hybrid models are within 2%, which indicates that the hybrid forecasting models perform well. In detail, the hybrid model (MLP + EXP + ARIMA) based on the weighted combination technique proposed in this paper keeps the MAPE below 2%; thus, the relative errors of this hybrid model are much smaller than those of the other models. This observation suggests that the weighted combination method can reduce the noise contained in the time series and thus enhance accuracy. It also shows a very strong fitting ability for non-linear data.
Note from Table 2 the closeness of the predicted values to the actual values in the hybrid model, which confirms that the hybrid model is a convenient and efficient model for predicting the currency exchange rate. Moreover, each method was run five times, and the SD was calculated. It can be observed that the SDs of the SDG-EURO exchange rate results for all models are relatively small, which indicates that the models are not behaving randomly.
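The evaluation criteria used in this comparison can be computed as in the sketch below, which uses the standard textbook definitions with the error taken as actual minus predicted. Two points are assumptions of the sketch rather than details stated in this paper: MAPE is expressed in percent, and the SD over repeated runs is the sample standard deviation (n - 1 in the denominator). The function names are illustrative.

```python
import math
import statistics
from typing import List, Sequence


def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Errors e_i = y_i - y_hat_i for paired actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [y - p for y, p in zip(actual, predicted)]


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error."""
    e = forecast_errors(actual, predicted)
    return sum(v * v for v in e) / len(e)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error."""
    return math.sqrt(mse(actual, predicted))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean-absolute error."""
    e = forecast_errors(actual, predicted)
    return sum(abs(v) for v in e) / len(e)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean-absolute percentage error, in percent (actual values must be non-zero)."""
    e = forecast_errors(actual, predicted)
    return 100.0 * sum(abs(v / y) for v, y in zip(e, actual)) / len(e)


def run_sd(metric_values: Sequence[float]) -> float:
    """Sample standard deviation of one metric across repeated runs."""
    return statistics.stdev(metric_values)
```

A small SD of a metric across the five runs of a model indicates that its results are stable from run to run, which is the sense in which the SD is used in the comparison above.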

Comparison of the hybrid model with models from the literature
Finally, the performance of the hybrid model was compared with that of several aforementioned models from the literature, such as [4,9,12,17,22], as shown in Table 4. Among the compared error values for all models, the proposed model (MLP + EXP + ARIMA) attains the lowest MAPE, 0.82%. Therefore, we can conclude that the proposed hybrid model outperforms the models investigated in the literature. The superior performance of the hybrid model (MLP + EXP + ARIMA) reflects its ability to capture each trend and regularity within the original time series, which significantly enhances financial series prediction with a high accuracy rate. Besides, compared with conventional MLP and ARIMA, EXP shows strong generalisation, robustness, fault tolerance, and convergence ability.

Conclusion and future works
Forecasting financial data is a major issue for time-series analysts, researchers, and everyone involved in business. Despite the numerous time-series models available, research into enhancing the effectiveness of financial time-series prediction has not stopped. To overcome the deficiencies of commonly used models and yield more accurate results, this paper has proposed two combination methods that jointly use the MLP machine learning model and the EXP and ARIMA statistical models to capture both the linear and non-linear characteristics that may be present in time-series data. The proposed methodology was applied to the SDG-EURO exchange rate case study.
The experimental results show that the proposed hybrid model (MLP + EXP + ARIMA) considerably outperforms contemporary approaches to financial modelling and prediction. It is a valuable tool for the forecasting task, particularly when higher forecasting accuracy is required. This supports the validity of the proposed forecasting methodology. We can conclude with some findings from this paper: (i) The methodological contribution of this study is an improved hybrid method for exchange rate forecasting, applied to the SDG-EURO data set and then compared with the most closely related works in Section 2.
(ii) The proposed model tries out many innovative architectures and experiments in the financial field for concerned parties and achieves suitable results. In particular, in light of previous studies, the practical application framework of the proposed model objectively identifies the weight of each model and then combines the linear and non-linear models to build a forecasting model. (iii) This study fills knowledge gaps by highlighting the importance and significance of MLP, EXP, and ARIMA as predictors, providing the rationale for the proposed model. Thus, this study makes a methodological contribution from the theoretical learning point of view. (iv) A novel contribution for researchers is that this study highlights the use of the hybrid model in financial applications, especially for the exchange rate. Moreover, the proposed models attempt to solve the weighting problem of time-series models.
(v) The evaluation measures were selected to combine statistical error measures with a goodness-of-fit test by visual observation using the empirical cumulative distribution function (CDF), to show that the proposed model outperforms the other listed models. (vi) The weighted method was shown to be the best combiner among the suggested combination methods, making it the best hybrid architecture.
Consequently, future work should revolve around new hybrid combination paradigms with different single models. Moreover, to check the robustness of the model, this study should be extended by testing with different data sets. We also suggest further experiments to estimate the weights of the combination method.