Forecasting Stock Market Volatility Using Hybrid of Adaptive Network of Fuzzy Inference System and Wavelet Functions

This study aims to model and enhance the forecasting accuracy of Saudi Arabia stock exchange (Tadawul) data patterns using daily stock price index data with 2026 observations from October 2011 to December 2019. The study employs a nonlinear spectral model, the maximal overlap discrete wavelet transform (MODWT), with five mathematical functions, namely, Haar, Daubechies (Db), Least Asymmetric (LA-8), Best Localized (BL14), and Coiflet (C6).


Introduction
Stock price movements in the stock exchange market are evaluated by volatility, which describes how the market behaves. It reflects whether the price of a stock fluctuates a lot over time (high volatility) or slowly over time (low volatility). Volatility is measured as the standard deviation of the stock price [1]. Stock market volatility is a metric of the riskiness of stocks and is relevant to both market policy makers and practitioners, mainly in emerging markets [2]. Indeed, an effective quantitative approach is needed to model stock market volatility in order to protect against unexpected price changes. Previous studies have shown that volatility in the stock market is time-varying; thus, the movements in volatility are nonrandom. Therefore, a number of time-varying volatility models have been developed by financial econometricians and other practitioners [3][4][5][6][7][8].
There are many articles that employed ANN for the prediction of the stock market. For example, the authors of [13] predicted price fluctuations using the Haar wavelet and a Takagi-Sugeno-Kang (TSK) fuzzy rule-based system. The TSK fuzzy rule-based method is used to forecast stock prices using a number of technical indices. The model successfully predicted stock price fluctuations in the Taiwan Stock Exchange market with an accuracy of up to 99.1% according to simulation results [14]. The authors proposed a forecasting fusion model that combined wavelets as a data preparation tool, fuzzy logic, and a neural network. The proposed model was trained on a dataset that covers the period from 2005 to 2010. The results indicate that this hybrid model achieves better forecasting accuracy than either of the models used separately. Similarly, the authors in [15] presented a fuzzy wavelet neural network (FWNN) for the prediction of stock prices. Daily stock prices for the last three years were used as a dataset of 1000 samples, of which 950 samples were used for training and the remaining 50 for testing. The simulation results demonstrated that the proposed FWNN system with differential evolution (DE) training achieved better performance than other models. The ANFIS uses both fuzzy logic and ANN [16,17]. The forward and backward processes comprise the ANFIS learning algorithm. The forward process passes through the five layers given in [17][18][19].
ANFIS models have been used successfully in many fields such as engineering, computer science, and chemistry. Moreover, many models have been used successfully for prediction when combined with MODWT. Note that MODWT is a mathematical model based on five functions, namely, Haar, Db, LA-8, BL14, and C6 [20]. The literature reveals that a number of research works using ANFIS have been published. The authors in [21] presented an ANFIS approach for long-term prediction of electricity consumption. They introduced ANFIS and AR models to forecast long-term natural electrical demand for some European countries. The authors in [22] applied the ANFIS model to the annual runoff forecast of the Yamadu Hydrological Station in China. The results show that the ANFIS model has better forecasting efficiency than the ANN model on the basis of relative percentage errors. The authors in [23] used ANFIS to forecast the future sales of an online shop. The sample covered 80 days of sales of 200 products.
The results show that the ANFIS model can partially improve the accuracy of time series prediction. The literature also reveals that few articles have focused on wavelets combined with fuzzy logic. In [24], weekly data were taken from January 2012 to November 2014, and a fuzzy wavelet model was used to forecast the exchange rate of IDR against USD. The fuzzy wavelet model is the combination of the fuzzy Mamdani model and the discrete wavelet transform (DWT). The authors in [25] used a fuzzy wavelet neural control scheme for a micro-electro-mechanical system (MEMS). A novel time series forecasting model based on fuzzy cognitive maps and the empirical wavelet transform is proposed in [26]. The performance of wavelet neural network (WNN) and ANFIS models was compared using small datasets in [27]. Stock market volatility is affected by macroeconomic variables such as the inflation rate, unemployment, the interest rate, gross domestic product, and oil prices. The Repo plays an important role in the macroeconomy. The Repo is the monetary policy interest rate at which the central bank lends money to banks for the short term. The impact of the Repo on the stock market is studied in [28]. Furthermore, the oil price refers to the closing price of a barrel of crude oil. The effect of oil prices on stock markets is studied in [29]. Accordingly, the financial covariates (Repo and Loil) are investigated as input variables in our study.
According to the literature review, no study over the last decade has concentrated on MODWT for modeling and enhancing the prediction accuracy in Tadawul. In terms of the study's objectives, a variety of comparative studies have been conducted over the last ten years on the use of various MODWT functions, individually as well as in combination with other MODWT models, under various methodologies. However, room remains for further investigation of comparative applications of all MODWT functions, namely Haar, Db, LA-8, BL14, and C6, in combination with a fitted ANFIS model in a single context or financial market. This study undertakes this work for Tadawul, since some researchers in the literature have used only one MODWT function. The current research aims to use MODWT functions to analyze fluctuations in Tadawul. The index refers to the average performance of firms listed on the Saudi exchange market. Additionally, the causes of stock market volatility and the modeling of variance behavior are specified to represent the accuracy of expectations and the percentage of possible risks. Furthermore, by combining MODWT functions with the ANFIS model and using statistical criteria such as MSE, RMSE, MAE, and MAPE, the forecasting accuracy is enhanced and a new forecasting model is proposed.
This study is organized as follows. Materials and methods are explained in Section 2. The research design and methodology are discussed in Section 3. The empirical results are analyzed in Section 4. The conclusions are drawn in Section 5.

Dataset.
The dataset of closing prices is obtained from the Saudi Arabia stock market (Tadawul), the Saudi Authority for Statistics, and the Saudi Central Bank [30,31]. The daily closing prices were gathered from August 2011 to December 2019. The number of observations is 2026 [20,32]. Table 1 shows the descriptive statistics of the dataset.
LSCS refers to the logarithm of the standard deviation of closing stock prices, which can be expressed as $\mathrm{LSCS} = \log(\mathrm{SD}(x))$, where $x$ is the closing stock price.
LSCS has a mean of 6.75 and a standard deviation of 0.6923, with a minimum value of 3.83 and a maximum value of 7.22. Repo has a mean of 0.70 and a standard deviation of 0.28, with a minimum value of 0.13 and a maximum value of 4.55. Loil has a mean of 4.30 and a standard deviation of 0.35, with a minimum value of 3.33 and a maximum value of 4.84.

Wavelet Transform Formula.
Wavelet transform (WT) is a mathematical formulation for transforming the original time series data into a time-scale domain. WT is an appealing option for nonstationary data, especially stock exchange market data, because of its inherent nature. WT can be categorized as continuous wavelet transform (CWT), DWT, and MODWT. These transforms demonstrate similar behavior in general. The key difference between DWT and MODWT is that DWT can be applied only to a specific number of observations (the sample size should be 2 raised to the power J), whereas MODWT can handle data of any size. In this study, we focus on MODWT due to its flexible behavior [32,33].
Theoretically, WT is an extension of the Fourier transform (FT) [34], which is built on sine and cosine functions. WT should meet the following criterion [20]:

$$C_{\varphi} = \int_{0}^{\infty} \frac{|\varphi(f)|^{2}}{f}\, df < \infty,$$

where $\varphi(f)$ is a function of frequency $f$ and known as the FT. WT is used in a variety of applications including signal processing and image analysis. It was developed to address the limitations of the FT when dealing with time, space, and frequency together. As shown in equations (2a) and (2b), the father wavelet represents the low-fluctuation (smooth) components, whereas the mother wavelet represents the high-fluctuation (detailed) components, with $j = 1, 2, 3, \ldots, J$ in the $J$-level wavelet decomposition:

$$\phi_{j,k}(t) = 2^{-j/2}\, \phi\!\left(\frac{t - 2^{j}k}{2^{j}}\right), \tag{2a}$$

$$\psi_{j,k}(t) = 2^{-j/2}\, \psi\!\left(\frac{t - 2^{j}k}{2^{j}}\right), \tag{2b}$$

where $J$ defines the maximum scale supported by the number of data points. The two forms of wavelets, father and mother wavelets, meet the following criteria:

$$\int \phi(t)\, dt = 1, \qquad \int \psi(t)\, dt = 0.$$

The general mathematical model is presented in the following equation:

$$F(t) = \sum_{k} S_{J,k}\, \phi_{J,k}(t) + \sum_{k} d_{J,k}\, \psi_{J,k}(t) + \sum_{k} d_{J-1,k}\, \psi_{J-1,k}(t) + \cdots + \sum_{k} d_{1,k}\, \psi_{1,k}(t).$$

In more detail,

$$F(t) = S_{J}(t) + D_{J}(t) + D_{J-1}(t) + \cdots + D_{1}(t),$$

with

$$S_{J}(t) = \sum_{k} S_{J,k}\, \phi_{J,k}(t), \tag{6a}$$

$$D_{j}(t) = \sum_{k} d_{j,k}\, \psi_{j,k}(t). \tag{6b}$$

In equations (6a) and (6b), $S_{J}(t)$ and $D_{j}(t)$ are the smooth and detailed coefficients, respectively, and the WT is used to measure the approximation coefficient. The detailed coefficients are used to measure the significant fluctuations of the original data, while the smooth coefficients contain the most significant features of the original data.

In general, Haar, Db, LA-8, BL14, and C6 are common transform functions in WT [32]. The following are some of the key characteristics of these functions. Except for the Haar model, the WT functions are arbitrarily regular. WT functions, with the exception of the Haar model, do not have an explicit expression. WT functions are applied to real numbers. WT functions support an arbitrary number of zero moments, orthogonal, compact, bio-orthogonal analysis, orthogonal analysis, continuous/discrete transformation, fast algorithm, and exact reconstruction.
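The contrast between DWT and MODWT described above can be illustrated with the simplest of the five filters. The following Python sketch is only an illustration under stated assumptions, not the authors' workflow: it hand-codes a level-1 Haar MODWT with circular boundary handling, the function names `haar_modwt_level1` and `haar_mra_level1` are hypothetical, and the input values are made up. It applies the transform to a series whose length is not a power of two and rebuilds the signal from its smooth and detail parts.

```python
def haar_modwt_level1(x):
    """Level-1 Haar MODWT coefficients with circular (periodic) boundaries.

    Returns (w, v): wavelet (detail) and scaling (approximation)
    coefficients, each the same length as x.
    """
    n = len(x)
    # x[t - 1] wraps around to the last sample when t == 0.
    w = [(x[t] - x[t - 1]) / 2.0 for t in range(n)]
    v = [(x[t] + x[t - 1]) / 2.0 for t in range(n)]
    return w, v


def haar_mra_level1(x):
    """Level-1 multiresolution analysis, so that x[t] == smooth[t] + detail[t]."""
    n = len(x)
    w, v = haar_modwt_level1(x)
    detail = [(w[t] - w[(t + 1) % n]) / 2.0 for t in range(n)]
    smooth = [(v[t] + v[(t + 1) % n]) / 2.0 for t in range(n)]
    return smooth, detail


# Seven observations: not a power of two, so a plain DWT would not apply.
series = [6.1, 6.4, 6.2, 6.9, 7.0, 6.8, 6.5]  # made-up values for illustration
smooth, detail = haar_mra_level1(series)
reconstructed = [s + d for s, d in zip(smooth, detail)]
```

The smooth series plays the role of the approximation component and the detail series carries the high-frequency fluctuations. Longer filters such as LA-8 follow the same pattern with more taps per coefficient.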

ARIMA Model.
The autoregressive moving average (ARMA) model is one of the most important mathematical models and is widely used in time series analysis. The ARMA model combines a moving average (MA) model and an autoregressive (AR) model. A time series $e_t$ denotes a white noise (WN) process, and $Y_t$ is a Gaussian process iff, for all $t$, $e_t$ is iid $N(0, \sigma^2)$. A time series $Y_t$, given by equation (7), follows the ARMA($p$, $q$) model of [20,35,36]:

$$Y_t = \sum_{i=1}^{p} \phi_i Y_{t-i} + e_t + \sum_{j=1}^{q} \theta_j e_{t-j}, \tag{7}$$

where $p$ and $q$ are nonnegative integers, $p$ is the order of the autoregressive (AR) part, $q$ is the order of the moving average (MA) part, and $e_t$ is the white noise (WN) process. ARIMA($p$, $d$, $q$) is an extension of the ordinary ARMA model and is given by [20]:
$$\phi(B)\,(1 - B)^{d}\, Y_t = \theta(B)\, e_t,$$

where $B$ is the backshift operator ($B Y_t = Y_{t-1}$), $\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i}$ and $\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}$ are the AR and MA polynomials, and $p$, $d$, and $q$ are the orders of autoregression, integration (differencing), and moving average, respectively. When $d = 0$, the ARIMA model reduces to the ordinary ARMA model.

ANFIS Model.
ANFIS utilizes both fuzzy logic and ANN [37], where the ANN learning algorithm is used for training. Its operations consist of forward and backward steps that collectively comprise the ANFIS learning algorithm. The forward step consists of five layers. To simplify the explanation, the fuzzy inference system under consideration is assumed to have two inputs ($x$, $y$) and one output ($z$). The input $x$ represents the oil price variable, the input $y$ represents the Repo variable, and the output $z$ represents the logarithm of the standard deviation of closing stock prices (LSCS). A standard rule base of fuzzy if-then rules for a first-order Sugeno fuzzy model can be expressed as follows: if $x$ is $A_1$ and $y$ is $B_1$, then $f_1 = p_1 x + q_1 y + r_1$, where $p$, $q$, and $r$ are linear output parameters. Figure 1 depicts the ANFIS architecture, which has two inputs and one output.

Layer 1. Every node $i$ in this layer is a square node with a node function as given by the following equations:

$$O_{1,i} = \mu_{A_i}(x), \quad i = 1, 2,$$

$$O_{1,i} = \mu_{B_{i-2}}(y), \quad i = 3, 4,$$

where $x$ and $y$ are the inputs to node $i$, and $A_i$ and $B_{i-2}$ are linguistic labels for the inputs. In other words, $O_{1,i}$ is the membership function of $A_i$ and $B_{i-2}$. Typically, $\mu_{A_i}(x)$ and $\mu_{B_i}(y)$ are selected to be bell-shaped with a maximum value of 1 and a minimum value of 0, such as

$$\mu_{A_i}(x) = \exp\!\left[-\left(\frac{x - c_i}{a_i}\right)^{2}\right],$$

where $\{a_i, c_i\}$ is the parameter set. These parameters are referred to as premise parameters in this layer. Using the Gaussian function as the shape of the membership function, the fuzzification process transforms crisp values into linguistic values.

Layer 2. Each node in this layer is a circle node labeled $\Pi$ that multiplies the incoming signals and sends out the product, as expressed by the following equation:

$$w_i = \mu_{A_i}(x) \times \mu_{B_i}(y), \quad i = 1, 2.$$

Each node output describes the firing strength of a rule. In this layer, the t-norm operator (the AND operator) is used by the inference stage.

Layer 3. Each node in this layer is a circle node called N. The $i$th node measures the ratio of the $i$th rule's firing strength to the sum of all rules' firing strengths, as given by the following equation:

$$\bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2.$$

In short, the ratio of the strengths of the rules is calculated in this layer.

Layer 4. Each node $i$ in this layer is a square node with a node function as expressed by the following equation:

$$O_{4,i} = \bar{w}_i f_i = \bar{w}_i \left(p_i x + q_i y + r_i\right),$$

where $\bar{w}_i$ is the output of layer 3 and $\{p_i, q_i, r_i\}$ is the parameter set. Parameters in this layer are referred to as consequent parameters. In short, the parameters for the consequent parts are measured in this layer.

Layer 5. A circle node called $\Sigma$ is the single node in this layer; it calculates the overall output as the summation of all incoming signals, as given by the following equation:

$$O_{5,1} = \sum_{i} \bar{w}_i f_i = \frac{\sum_{i} w_i f_i}{\sum_{i} w_i}.$$

The backward step estimates the parameter database, which consists of the membership function parameters in the antecedent part and the linear equation coefficients in the consequent part. Since the Gaussian function is used as the membership function in this process, two parameters, namely, the mean and variance of this function, are optimized. The least squares method is used to perform the parameter learning in this step.
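The five layers of the forward step can be traced in a few lines of code. The sketch below is a minimal illustration assuming two rules, Gaussian membership functions, and a first-order Sugeno consequent; the function names and all parameter values are hypothetical and are not fitted to the Tadawul data.

```python
import math


def gaussian_mf(value, center, width):
    """Gaussian membership grade in (0, 1]; equals 1 when value == center."""
    return math.exp(-((value - center) / width) ** 2)


def anfis_forward(x, y, premise_x, premise_y, consequents):
    """Forward pass of a two-input, first-order Sugeno ANFIS.

    premise_x, premise_y: one (center, width) pair per rule for A_i and B_i.
    consequents: one (p_i, q_i, r_i) triple per rule.
    Returns (output, normalized_weights).
    """
    # Layer 1: fuzzify both crisp inputs.
    mu_a = [gaussian_mf(x, c, a) for c, a in premise_x]
    mu_b = [gaussian_mf(y, c, a) for c, a in premise_y]
    # Layer 2: firing strength of each rule (product t-norm).
    w = [ma * mb for ma, mb in zip(mu_a, mu_b)]
    # Layer 3: normalize the firing strengths.
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: weighted first-order rule outputs.
    f = [p * x + q * y + r for p, q, r in consequents]
    weighted = [wb * fi for wb, fi in zip(w_bar, f)]
    # Layer 5: the overall output is the sum of the incoming signals.
    return sum(weighted), w_bar


# Hypothetical parameter values, chosen only to make the example run.
premise_x = [(3.5, 0.5), (4.5, 0.5)]
premise_y = [(0.5, 0.3), (1.0, 0.3)]
consequents = [(-1.0, 0.2, 10.0), (-1.5, 0.3, 12.0)]
z, w_bar = anfis_forward(4.3, 0.7, premise_x, premise_y, consequents)
```

Because the normalized weights sum to one, the output is always a weighted average of the individual rule outputs. In training, the backward step would adjust the premise pairs and the consequent triples rather than leave them fixed as here.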

Performance Measures.
We use a number of accuracy criteria including the mean absolute percentage error (MAPE), the mean absolute error (MAE), the mean error (ME), and the root mean squared error (RMSE) [38,39] as follows:

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{X_t - F_t}{X_t} \right|, \tag{14}$$

$$\mathrm{MPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{X_t - F_t}{X_t}, \tag{15}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| X_t - F_t \right|, \tag{16}$$

$$\mathrm{ME} = \frac{1}{n} \sum_{t=1}^{n} \left( X_t - F_t \right), \tag{17}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( X_t - F_t \right)^{2}}. \tag{18}$$

The MAPE criterion, also referred to as the mean absolute percentage deviation (MAPD), is a measure of the prediction accuracy of a forecasting method in statistics. It expresses accuracy as a percentage and is determined by equation (14), where $X_t$ represents the actual value, $F_t$ represents the forecasted value, and $n$ represents the sample size. In this equation, the absolute percentage error is summed over every forecasted point in time and divided by the number of fitted points. In addition, the MPE is defined by equation (15), the MAE is given by equation (16), and the ME is expressed by equation (17). The root mean square deviation (RMSD), also known as the RMSE, is a widely used measure of the differences between estimated and observed values. It estimates the mean error produced by the model in predicting the outcome for an observation. It is determined by equation (18), where $N$ denotes the number of observations.
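As a worked illustration of these criteria, the following Python sketch computes ME, MAE, RMSE, MPE, and MAPE from their standard textbook definitions, with errors taken as actual minus forecast. The function names and the three sample values are assumptions made for the example and are not taken from the study's data.

```python
import math


def me(actual, forecast):
    """Mean error: average of (actual - forecast); its sign indicates bias."""
    return sum(a - f for a, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mpe(actual, forecast):
    """Mean percentage error, in percent (undefined if any actual value is 0)."""
    return 100.0 * sum((a - f) / a for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (undefined if any actual is 0)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Made-up values: one over-forecast, one under-forecast, one exact hit.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
scores = {
    "ME": me(actual, forecast),
    "MAE": mae(actual, forecast),
    "RMSE": rmse(actual, forecast),
    "MPE": mpe(actual, forecast),
    "MAPE": mape(actual, forecast),
}
```

In this example the two errors cancel, so the ME is zero even though the MAE and RMSE are not, which is why signed and absolute criteria are reported together.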

Research Design and Methodology
The aim of this study is to propose a new model to forecast the closing price data from the Tadawul stock market, which cover the period from 2011 to 2019. The proposed model couples the ANFIS model with MODWT-LA8. In this connection, we have employed five MODWT functions, namely, Haar, Db, LA-8, BL14, and C6. Statistical tests are used to evaluate the accuracy of the models. Moreover, the original data were transformed into a time-scale domain using MODWT. The different phases of the MODWT forecasting mechanism are depicted in Figure 2. It should be noted that the wavelet process is applied repeatedly while the data pattern is fluctuating. The objective of preprocessing is to reduce statistical error criteria such as RMSE in the data before and after transformation. In this way, the noise in the original data can be eliminated. Essentially, the adaptive noise in the training pattern can help to minimize the risk of overfitting in the training process.
Thus, we used MODWT twice for the preprocessing of the training data in this study. Further, MODWT converts the data into two sets, namely, the detail series and the approximation series. Since financial data fluctuate significantly, we have employed these two series because they behave well on such data. This helps in anticipating the transformed data more precisely. The filtering effect of MODWT is responsible for the positive behavior of these two series.
In order to propose our new model, we designed the following methodology. Firstly, we collected the closing price data from Tadawul. Secondly, the closing price data were treated using the logarithm of the standard deviation to find LSCS. Thirdly, the LSCS data were decomposed using a MODWT function that divides the LSCS data into two partitions, namely, the details coefficient (highly fluctuating data) and the approximation coefficient (slowly fluctuating data). We employed five MODWT functions, namely, Haar, Db, LA-8, C6, and BL14. The approximation coefficient for each function contains the main features of the data and is used as the output in the forecasting model. Fourthly, the approximation coefficient (LSCS) for each function is used with the input variables (Repo and Loil) inside ANFIS to build our proposed model MODWT-ANFIS. Finally, the best MODWT-ANFIS model is compared with the other MODWT-ANFIS functions and also with traditional models (ARIMA and ANFIS models).
In order to make a fair comparison, we first applied 80% of the original data and the converted data to the proposed model and then selected the best performing model, which is further used with the other suggested models on the remaining 20% of the data. This confirms the performance of our proposed model.
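The 80/20 partition can be made concrete with a short sketch. It assumes a chronological split, in which the first 80% of observations form the training set and the rest are held out; the function name `chronological_split` is hypothetical, and integer indices stand in for the 2026 daily observations.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered series without shuffling.

    The earliest train_fraction of the points go to training and the
    remainder are held out for forecasting.
    """
    cut = int(len(series) * train_fraction)  # floor of 80% of the length
    return series[:cut], series[cut:]


# Placeholder indices standing in for the 2026 daily observations.
observations = list(range(2026))
train, holdout = chronological_split(observations)
```

With 2026 observations this yields 1620 training points, which agrees with the training sample size reported in the results, and 406 held-out points. Keeping the order intact avoids training on observations that come after the ones being forecast.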

Endogeneity Issues.
In this section, we discuss the selection of suitable variables through the removal of multicollinearity, a causality test, and multiple regression analysis.

Correlation.
In this section, we carefully select the independent variables from a larger set of candidate variables; the others are removed on the basis of statistical tests. Firstly, we removed variables because of multicollinearity among the independent variables, as shown in Table 2. No multicollinearity generally refers to the absence of perfect multicollinearity, that is, an exact (nonstochastic) linear relationship between two or more independent variables. We removed some input variables based on their strong correlation with other input variables. The correlation between the input and output variables is shown in Table 2.
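A correlation-based screen of this kind can be sketched as follows. This is an illustration rather than the authors' procedure: the cutoff of 0.8, the greedy keep-first rule, the function names, and the toy columns (including a deliberately redundant `oil_copy`) are all assumptions.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def drop_collinear(columns, threshold=0.8):
    """Keep a column only if |r| with every already-kept column is below threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson_r(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy candidate inputs; oil_copy is exactly 2 * loil, so it is redundant.
candidates = {
    "repo": [0.5, 0.7, 0.6, 0.9, 0.8, 1.0],
    "loil": [4.1, 4.4, 3.9, 4.6, 4.0, 4.3],
    "oil_copy": [8.2, 8.8, 7.8, 9.2, 8.0, 8.6],
}
selected = drop_collinear(candidates)
```

Here `repo` and `loil` are both kept because their correlation is moderate, while `oil_copy` is dropped because it is perfectly correlated with `loil`. The result of a greedy screen depends on the order in which columns are visited.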

Engle and Granger Causality Test.
Engle and Granger's test uses co-integration to represent causal relationships. Based on a static regression, it creates residuals (errors). An augmented Dickey-Fuller test or another similar test is then applied to the residuals to detect unit roots. The residuals will be almost stationary if the time series are co-integrated [30,31,39,40].
In the corresponding test equation, $Y$ is the output variable, $X$ is the input variable set, ECT is the error correction term, and $\beta$, $\alpha$, and $\phi$ are the parameters. If $\phi$ is negative or greater than 1.96, the null hypothesis of the Engle-Granger test (H0: there is no co-integration) is rejected. In more detail, the rule for hypothesis testing says that if the p value is less than or equal to the significance level, then we reject the null hypothesis. Table 3 reports the Engle and Granger test for the output and input variables and shows that all p values are less than 0.05. Accordingly, the null hypothesis is rejected for the input variables, and we conclude that there is sufficient evidence to support the claim that they are co-integrated with the output variable at the 5% significance level. This result suggests that the output variable is influenced by the input variables.
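The two steps of the residual-based procedure (a static regression followed by a unit-root check on its residuals) can be sketched in plain Python. This is a simplified illustration on simulated data, not the study's computation: it uses a single regressor, a Dickey-Fuller regression without lag augmentation, and hypothetical function names, and it returns only the test statistic, which would still need to be compared against Engle-Granger critical values (these are more negative than the standard Dickey-Fuller ones because the residuals are estimated).

```python
import random


def ols_simple(x, y):
    """Intercept and slope of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def dickey_fuller_t(u):
    """t-statistic of rho in the regression du_t = rho * u_{t-1} + error."""
    lagged = u[:-1]
    diff = [u[t] - u[t - 1] for t in range(1, len(u))]
    sxx = sum(v * v for v in lagged)
    rho = sum(v * d for v, d in zip(lagged, diff)) / sxx
    resid = [d - rho * v for v, d in zip(lagged, diff)]
    sigma2 = sum(e * e for e in resid) / (len(diff) - 1)
    return rho / (sigma2 / sxx) ** 0.5


random.seed(0)
# Simulated data (not the Tadawul series): x is a random walk, y tracks 2 * x.
x = [0.0]
for _ in range(299):
    x.append(x[-1] + random.gauss(0.0, 1.0))
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5) for xi in x]

# Step 1: static regression and its residuals.
intercept, slope = ols_simple(x, y)
residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
# Step 2: unit-root statistic on the residuals.
t_stat = dickey_fuller_t(residuals)
```

A strongly negative statistic indicates stationary residuals and therefore co-integration between the two simulated series. A statistic near zero would point to a spurious regression.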

The Results of Multiple Regressions.
Multiple regression is an extension of simple linear regression that is used when the value of a variable is to be predicted from the values of two or more other variables. In this study, the dependent variable to be predicted is LSCS, whereas Repo and Loil are the independent variables used to predict it. The multiple regression analysis is shown in Table 4. At the 5% significance level, the Repo rate and Loil are significant. In addition, R-square and adjusted R-square are approximately 46%, which implies that the independent variables explain about 46% of the variation in the output variable. The F-statistic is significant at the 1% level, which indicates that the linear regression model fits the data well.
There is a negative relationship between oil prices and volatility risk ($\beta = -1.368$), where volatility risk is measured by the standard deviation of closing stock prices. This indicates that an increase in oil prices will reduce the volatility risk in the stock market. On the other hand, the Repo rate has a positive relationship with volatility risk ($\beta = 0.264$). This implies that an increase in Repo will increase the volatility risk.
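The mechanics of such a two-regressor fit can be sketched with NumPy's least-squares solver. The data below are synthetic and noise-free: the slopes 0.264 and -1.368 are borrowed from the reported coefficients only to generate the response, while the intercept of 10.0 and the six input values are invented, so the example shows how the coefficients and R-square are obtained rather than reproducing Table 4.

```python
import numpy as np

# Invented input values for illustration.
repo = np.array([0.5, 0.7, 0.6, 0.9, 0.8, 1.0])
loil = np.array([4.1, 4.4, 3.9, 4.6, 4.0, 4.3])
# Synthetic, noise-free response; the intercept 10.0 is an assumption.
lscs = 10.0 + 0.264 * repo - 1.368 * loil

# Design matrix with an intercept column followed by the two regressors.
design = np.column_stack([np.ones_like(repo), repo, loil])
coef, _, _, _ = np.linalg.lstsq(design, lscs, rcond=None)

# Coefficient of determination from residual and total sums of squares.
fitted = design @ coef
ss_res = float(np.sum((lscs - fitted) ** 2))
ss_tot = float(np.sum((lscs - lscs.mean()) ** 2))
r_squared = 1.0 - ss_res / ss_tot
```

Because the response is generated without noise, the fit recovers the generating coefficients and R-square equals 1. With real data the residual sum of squares is positive, which is how a value such as 46% arises.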

Results and Discussion
The current study investigates the closing price data from Tadawul, which has been selected for a variety of reasons. In terms of financial market volatility, emerging markets have distinctive characteristics. Due to lack of information, stochastic trading, and unprofessional financial analysis, the Saudi sector has experienced considerable volatility. Furthermore, investors from outside the Gulf Cooperation Council (GCC) are not permitted to invest in Saudi stocks. Tadawul is the largest exchange market in the Middle East. As seen in Figure 3, the volatility data are decomposed using MODWT with the LA-8 function.
Firstly, the closing price data have been treated using the logarithm of the standard deviation to find LSCS. Secondly, the LSCS data have been decomposed using MODWT in the R statistical software. The MODWT mechanism divides the LSCS data into two parts, namely, the details coefficient (highly fluctuating data) and the approximation coefficient (slowly fluctuating data). The approximation coefficient for each function contains the main features of the data and is used as the output in the forecasting model. We have employed five MODWT functions, namely, Haar, Daubechies (Db), Least Asymmetric (LA-8), Coiflet (C6), and the best-localized function (BL14). As a result, the best function with MODWT is LA-8 (see Table 5). The best result of MODWT with LA-8 is shown in Figure 3.
MODWT-based decomposition is an effective approach for revealing variations, magnitudes, and phases of the data. According to the WT mechanism, the WT determines the levels of decomposition using the formula X = TV1 + TW1, where X is the original signal. TV1 is the plot of the approximation coefficients of the transformed data at the first level. TW1 is the plot of the first-level detail coefficients, so the fluctuation can be explained by this level. Note that 80% of the data corresponds to 1620 samples out of 2026 total samples, which are shown along the x-axis of Figure 3.
Tadawul has experienced numerous fluctuations from 2011 to 2019. The general index of the stock exchange market dropped to 6417.7 points in 2011, whereas it rose to 8535 points in 2013. Market management changed the trading system from SAXESS to X-Stream INET, and an interactive multiuser system (IFSAH) was developed to improve the market's efficiency and effectiveness [41]. The fluctuation of stock prices is one of the issues that confronted

The Result of Forecasting WT Models with ANFIS.
In order to validate our findings, forecasting is conducted on the remaining 20% of the transformed and original data with the same proposed models. The best model is ARIMA-MODWT (LA-8) with ANFIS since it has the lowest ME, RMSE, MAE, and MPE-fit, as shown in Table 6. Similar to the training phase, LSCS is used as the output variable, whereas Repo and Loil are used as input variables with MODWT to construct the ANFIS and ANFIS + MODWT models.

Conclusion
In this study, we have proposed a new model (MODWT-LA8-ANFIS). The model is used to forecast the closing price in the stock market. We selected the oil price and the Repo rate as input variables based on correlation, the Engle and Granger causality test, and multiple regression. We found a weak correlation between the input variables ($r = 0.327$). On the other hand, the correlation between the oil price and the output variable (closing price) is strong ($r = -0.673$). Moreover, the input variables have causality with the output variable based on the Engle and Granger causality test. Multiple regression is used to check whether the effect is significant. As a result, the input variables are significant at the 5% level. The output variable is collected from Tadawul from October 2011 to December 2019 with 2026 observations. The input variables in this study have been collected from the Saudi Authority for Statistics and the Saudi Central Bank. The MODWT mechanism splits variables into a details coefficient and an approximation coefficient. MODWT has five functions, namely, Haar, Daubechies (Db), Least Asymmetric (LA-8), Best Localized (BL14), and Coiflet (C6). Therefore, the output variable is split into the details coefficient (highly fluctuating data) and the approximation coefficient (which contains the main features of the data). The approximation coefficient data (MODWT-LA8) are used with the input variables to build our model MODWT-LA8-ANFIS. The MODWT-LA8-ANFIS has been evaluated using statistical criteria, namely, mean error (ME), root mean squared error (RMSE), and mean absolute percentage error (MAPE). The MODWT-LA8-ANFIS model has been compared with traditional models (ARIMA and ANFIS models) and is more accurate than them. Therefore, the new proposed forecasting model can be generalized to forecasting in other international stock markets. Furthermore, this model is sufficiently powerful to optimize business processes for the economic development of a country.

Data Availability
The data referred to in the article are publicly available online.

Conflicts of Interest
The authors declare that they have no conflicts of interest.