Hybrid fuzzy inference rules of descent method and wavelet function for volatility forecasting

  • Abdullah H. Alenezy,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft

    Affiliations Department of mathematics, College of Science, University of Ha’il, Hail, Kingdom of Saudi Arabia, School of Mathematical Science, Universiti Sains Malaysia, Penang, Malaysia

  • Mohd Tahir Ismail,

    Roles Investigation, Project administration, Resources, Supervision, Validation, Writing – review & editing

    Affiliation School of Mathematical Science, Universiti Sains Malaysia, Penang, Malaysia

  • Jamil J. Jaber,

    Roles Formal analysis, Investigation, Methodology, Validation, Visualization, Writing – original draft

    Affiliation Department of Finance, School of Business, The University of Jordan, Aqaba, Jordan

  • S. AL Wadi,

    Roles Conceptualization, Data curation, Project administration, Supervision

    Affiliation Department of Finance, School of Business, The University of Jordan, Aqaba, Jordan

  • Rami S. Alkhawaldeh

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Software, Validation, Visualization, Writing – review & editing

    r.alkhawaldeh@ju.edu.jo

    Affiliation Department of Computer Information Systems, The University of Jordan, Aqaba, Jordan

Abstract

This research uses fuzzy inference rules trained by the gradient descent method (FIR.DM) as the learning process in a nonlinear spectral model based on the maximum overlapping discrete wavelet transform (MODWT), with the aim of improving volatility prediction for daily stock market prices on Saudi Arabia’s stock exchange (Tadawul). The model combines five MODWT wavelet functions with fuzzy inference rules. The inputs are the oil price (Loil) and the repo rate (Repo), selected according to the correlation analysis, the multiple regression, and the Engle and Granger causality test (Engle, 1987). The logarithm of the stock market price (LSCS) in Tadawul is the output variable. The correlation matrix reveals no collinearity between the input variables, and the causality test demonstrates that the input variables significantly influence the output variable. According to the multiple regression, Loil has a significant negative effect on LSCS, whereas Repo has a significant positive effect. On the 80% training dataset, MODWT-LA8 (ARIMA(1,1,0) with drift) performs better than the other WT functions for the LSCS variable, with ME (0.000005), MAE (0.003214), and MAPE (0.064497). In the novel hybrid model MODWT-FIR.DM, each function’s approximation coefficient (LSCS) is combined with the input variables (Loil and Repo). We evaluate the proposed model (MODWT-LA8-FIR.DM) using several statistical measures (ME, RMSE, MAE, MPE) and compare it with the original FIR.DM and with the other MODWT-FIR.DM functions when forecasting the remaining 20% of the dataset. The outcomes show that MODWT-LA8-FIR.DM performs better than the traditional models, with lower ME (3.167586), RMSE (3.167638), MAE (3.167586), and MPE (80.860849). The proposed hybrid model may be a potential stock market forecasting model.

1 Introduction

A stock market collapse is a significant fall in stock prices, typically caused by an economic crisis. Saudi Arabia’s economy, like those of other countries, is influenced by global financial crises. The financial crisis of 2007–2008, often known as the “Subprime Mortgage Crisis”, began with the collapse of the United States housing market and eventually brought about the Great Recession, the worst global economic downturn since the Great Depression (1929). Between mid-2014 and early 2016, the world economy experienced one of the most significant drops in oil prices in modern history: prices fell by 70%, one of the three biggest declines since World War II. Volatility in the stock exchange market is a statistical measure of the dispersion of closing price fluctuations. The term “volatility” refers to the degree of risk or uncertainty connected to the amount of fluctuation in stock prices. With greater volatility, a stock’s value may be spread over a wider range of prices [1], whereas lower volatility is a sign that the stock’s value is more stable and does not fluctuate significantly. In other words, volatility indicates whether a stock’s price changes sharply over time (high volatility) or slowly over time (low volatility). The standard deviation of the closing price of a stock or market index is frequently used to measure volatility [1]. To hedge against unpredictable price movements, an effective quantitative approach is required to model stock market volatility. According to previous research, stock market volatility is time-varying. As a consequence, financial econometricians and other practitioners have created a number of time-varying volatility models [2, 3].

In finance and economics, time series prediction has a wide range of applications [4, 5]. The methods for time series prediction are classified as statistical or non-statistical. The statistical methods include the autoregressive (AR) and Autoregressive Integrated Moving Average (ARIMA) models. These methods are only applicable to linear time series and are ineffective for nonlinear time series. The non-statistical, soft computing models include fuzzy algorithms and neural networks [6, 7]. It is worth highlighting that traditional time series models can produce accurate forecasts. However, because future conditions are uncertain, it is preferable to anticipate a quantity using imprecise values such as fuzzy sets. Real-world time series are composed of both linear and non-linear components.

The MODWT is a mathematical model built on five wavelet functions: Haar, Db, LA-8, BL14, and C6. The literature review indicates that no research has specifically focused on the MODWT for modeling and enhancing forecast accuracy on the Tadawul dataset. A range of comparative studies has examined the individual MODWT functions separately. Our contribution is to use fuzzy inference rules trained by the gradient descent method (FIR.DM) as the learning method for the MODWT model within a single context, a specific financial market. The gradient descent approach is a backward process that updates the weights of the model according to the error between the predicted and actual values. The proposed model is intended to examine volatility in the Tadawul dataset. The stock index measures the average performance of the companies listed on the Saudi Stock Exchange. To illustrate the reliability of predictions and the proportion of potential risks, the sources of stock market volatility and variance modeling behavior are provided. A new forecasting model is then proposed by combining MODWT functions with the FIR.DM model, and its accuracy is assessed with statistical criteria: the mean absolute percentage error (MAPE), the mean percentage error (MPE), the mean error (ME), the mean absolute error (MAE), and the root mean squared error (RMSE). The proposed model builds on prior research and enhances the accuracy of stock price volatility forecasting by combining wavelet functions with fuzzy inference rules by the descent method. Improved stock price volatility forecasting is crucial for trading, hedging, and arbitrage purposes.

The structure of this study is as follows. Section 2 shows the previous studies. Section 3 explains the materials and processes. Section 4 discusses the research design and methodology. The empirical findings are analyzed in Section 5. The limitations and directions for future research are discussed in Section 6 followed by the conclusions in Section 7.

2 Literature review

Several researchers have used different models for time-series forecasting. Chen-Xu and Jie-Sheng [8] used an ARMA model to forecast the cash flow of a commercial bank. To create an ARMA model for forecasting stock returns, Kim investigated the symmetric maximum likelihood (ML) loss function and developed an asymmetric loss function [9]. Similarly, numerous researchers have studied ARIMA models for time series prediction [10, 11]. Many studies in time-series forecasting indicate that non-linear models are more accurate than linear models [12, 13]. As a result, nonlinear models have been used for time series forecasting. For instance, Santos and dos Santos Coelho [14] used a non-linear Multilayer Perceptron (MLP) Artificial Neural Network (ANN) with the Takagi-Sugeno fuzzy system and a Radial Basis Function Neural Network (RBFNN) to predict exchange rates. Xiao-Ming and Cheng-Zhang [15] studied 10 ANN models as base models in an AdaBoost approach to predict stock prices in the Shanghai Stock Exchange and foreign stock markets. The authors in [16] combined the Haar wavelet with a Takagi–Sugeno–Kang (TSK) fuzzy rule-based system to forecast price fluctuations. The TSK fuzzy rule-based model is used with a number of indicators to predict stock prices. According to simulation results, the model predicted stock price fluctuations in the Taiwan Stock Exchange market with an accuracy of up to 99.1%. The authors of [17] suggested a forecasting fusion model that integrated wavelets for data preprocessing, neural networks, and fuzzy logic. They trained the suggested model on data from 2005 to 2010. The findings indicate that the hybrid model predicts more accurately than any of the separate models.

Furthermore, the authors in [18] proposed a fuzzy wavelet neural network (FWNN) for forecasting stock prices. The daily dataset of 1000 samples was split into 95% for training and the remaining 5% for testing. The simulation results showed that the proposed FWNN system with differential evolution (DE) training performed better than alternative models. The authors in [4] used an ANFIS model, whereas our proposed model uses FIR.DM. The model in [4] combined an adaptive network-based fuzzy inference system (ANFIS) with five MODWT wavelet functions to enhance the forecasting accuracy for the Saudi Arabia stock exchange (Tadawul). The daily dataset of 2026 samples was split into 80% for training and the remaining 20% for testing. The proposed model performed better than traditional models.

The TSK model has a special case called fuzzy inference rules by descent method (FIR.DM) [19]. Nomura, Hayashi, and Wakami (1992) used a genetic-algorithm-based method to adjust the fuzzy partition of an input space. They developed a simplified model of fuzzy reasoning in which the real number in the consequent part of the inference rules and the membership functions in the antecedent part are tuned using the descent approach. The performance of this method is higher than that of a conventional back-propagation type neural network [19, 20]. The form of the membership function of each antecedent fuzzy set and the number of fuzzy if-then rules were determined from numerical data in [20–22]. There are numerous studies on fuzzy inference system learning. Learning approaches that use the steepest descent method (SDM) and vector quantization (VQ) are recognized to be superior to other methods [23]. The SDM-based learning approaches include methods that (1) delete fuzzy rules one by one starting from a sufficiently large number of rules, or create fuzzy rules one by one starting from any number of rules; (2) determine fuzzy systems by particle swarm optimization (PSO) and genetic algorithms (GA); (3) use single input rule modules (SIRMs) and double input rule modules (DIRMs), which are fuzzy inference systems with a small number of input rule modules; and (4) identify the initial assignment of parameters by self-organization or a vector quantization technique [23].

Volatility is a crucial component of many financial markets. It may be used to estimate the risk and reward potential of a given financial asset. The study in [24] employs long short-term memory (LSTM) models and deep neural networks (DNN), as in [25, 26], to predict the volatility of stock indices in the US stock market. The three samples are the S&P 500 Index, the Dow Jones Industrial Average Index, and the NASDAQ Composite Index, covering 23240, 31096, and 12457 trading days, respectively. The findings demonstrate that deep learning models with likelihood-based loss functions predict volatility better than deep learning models with distance loss functions and econometric models, and that the LSTM model is the better of the two deep learning models with likelihood-based loss functions. Verma [27] predicts the volatility of crude oil using hybrid models on Bloomberg data from 2003 to 2020. He combines generalized autoregressive conditional heteroscedasticity (GARCH), Glosten-Jagannathan-Runkle (GJR)-GARCH, and long short-term memory (LSTM) models to create three novel forecasting models, called GARCH-LSTM, GJR-LSTM, and GARCH-GJRGARCH LSTM. Chen uses the S&P 500 index and WTI oil prices for the period from January 1990 to December 2021 [24]. The nonlinear threshold effect of stock market shocks on oil market volatility is captured by a threshold autoregressive (TAR) model. The empirical results point to the importance of the strong threshold effects of stock volatility for predicting oil volatility.

Macroeconomic factors, including unemployment, inflation, interest rates, gross domestic product, and oil prices, have an impact on stock market volatility. In the macroeconomy, the repo rate (Repo) is a crucial factor. The Repo is the monetary policy interest rate at which the central bank lends money to banks for a short period. The study in [28] investigates the impact of the repo rate on the stock market. The other factor is the oil price, which refers to the current price of a barrel of crude oil. The study in [29] investigates the impact of oil prices on stock markets. These financial covariates (Repo and Loil) are used as input factors in our study.

3 Methodological issues

The background information for the key ideas in our study is provided in this section.

3.1 Autoregressive Integrated Moving-Average (ARIMA) model

One way to assess the strength of a dependent variable (output) in relation to other fluctuating variables (inputs) is a form of regression analysis called the ARIMA model. The purpose of the model is to estimate financial market behaviour by looking at the differences between values in a series rather than the actual values. The ARIMA model is a combination of an auto-regressive (AR) model, an Integrated (I) model, and a Moving Average (MA) model. An AR model is one in which a changing variable regresses on its own lagged, or prior, values. The I component denotes the differencing of raw observations so that the time series becomes stationary. The MA component incorporates the dependency between an observation and the residual errors from a moving-average model applied to lagged observations. A time series $e_t$ is called a white noise (WN) process, and $X_t$ is called a Gaussian process, if for all $t$, $e_t$ is iid $N(0, \sigma^2)$. A time series $X_t$ is said to follow the ARMA(p,q) model if [30]:

$$X_t = \phi_1 X_{t-1} + \dots + \phi_p X_{t-p} + e_t + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q} \tag{1}$$

where $p$ and $q$ are non-negative integers, $p$ is the order of the autoregressive (AR) part, $q$ is the order of the moving-average (MA) part, and $e_t$ is the white noise (WN) process. An extension of the ordinary ARMA model is the auto-regressive integrated moving-average model ARIMA(p,d,q), given by [30]:

$$\phi_p(B)\,(1 - B)^d X_t = \theta_q(B)\, e_t \tag{2}$$

where $p$, $d$ and $q$ denote the orders of auto-regression, integration (differencing) and moving average, respectively. Here $\phi_p(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ is a polynomial of order $p$ in $B$, and $\theta_q(B) = 1 + \theta_1 B + \dots + \theta_q B^q$ is a polynomial of order $q$ in $B$. $B$ is the time lag operator, or backward shift, so that $B^2 X_t = X_{t-2}$. When $d = 0$, the ARIMA model reduces to the ordinary ARMA model.
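To make the role of the differencing operator concrete, the following minimal sketch simulates an ARIMA(1,1,0) series with drift and then applies $(1 - B)^d$ to recover the stationary AR(1) part. The parameter values, the function names `simulate_arima_110` and `difference`, and the inclusion of a drift constant are assumptions made for illustration; they are not the fitted model from this study.

```python
import random

def simulate_arima_110(n, phi, drift, sigma=1.0, seed=0):
    """Simulate X_t with (1 - phi*B)(1 - B) X_t = drift + e_t, e_t ~ iid N(0, sigma^2)."""
    rng = random.Random(seed)
    x = [0.0]          # X_0, an arbitrary starting level (assumption)
    prev_diff = 0.0    # first difference at time 0, taken as zero (assumption)
    for _ in range(n - 1):
        diff = drift + phi * prev_diff + rng.gauss(0.0, sigma)
        x.append(x[-1] + diff)   # integrate the differenced series once
        prev_diff = diff
    return x

def difference(x, d=1):
    """Apply the operator (1 - B)^d to the series x."""
    for _ in range(d):
        x = [b - a for a, b in zip(x, x[1:])]
    return x

series = simulate_arima_110(n=200, phi=0.5, drift=0.1)
w = difference(series, d=1)   # the stationary AR(1)-with-drift series
```

Each application of `difference` shortens the series by one observation, which is why `w` has 199 values for 200 simulated points.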

3.2 MODWT model

The spectral filtering technique known as the Fourier transform (FT) has been extensively employed in industrial and scientific applications. The FT maps a complex-valued function to another function in what is known as the frequency domain. The Discrete Fourier Transform (DFT) is one type of FT, defined as follows [31]:

$$F_k = \sum_{t=0}^{N-1} X_t\, e^{-i 2\pi k t / N}, \qquad k = 0, 1, \dots, N-1 \tag{3}$$

where $X$ is the time series data, $i = \sqrt{-1}$, and $N$ is the number of discrete points. The inverse discrete Fourier transform (IDFT) is defined as:

$$X_t = \frac{1}{N} \sum_{k=0}^{N-1} F_k\, e^{i 2\pi k t / N} \tag{4}$$

The FT and inverse FT are therefore computed in practice through the DFT and IDFT of Eqs 3 and 4, respectively.
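The DFT and IDFT pair can be checked numerically. The sketch below is a direct (quadratic-time) implementation of the standard definitions, with the forward transform unnormalized and the factor 1/N placed in the inverse; that normalization convention and the toy series are assumptions for illustration.

```python
import cmath

def dft(x):
    """Forward DFT: F_k = sum_t x_t * exp(-2*pi*i*k*t/N)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(f):
    """Inverse DFT: x_t = (1/N) * sum_k F_k * exp(2*pi*i*k*t/N)."""
    n = len(f)
    return [sum(f[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

x = [1.0, 2.0, 0.0, -1.0, 3.0]   # toy series; the length need not be a power of two
spectrum = dft(x)
recovered = idft(spectrum)       # equals x up to floating-point rounding
```

The zero-frequency coefficient `spectrum[0]` is the sum of the series, and applying `idft` to `spectrum` returns the original values up to rounding error.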

A key feature of the Wavelet Transform (WT) is its ability to “zoom in” on short-lived frequency events. While the WT localizes in both the frequency (scale) and time (position) domains, the FT localizes only in the frequency domain. The WT is a mathematical function that transforms the original time series data into a time-scale domain [32]. The WT transforms the period (or frequency) of data without affecting time resolution. It can be applied to both stationary and non-stationary data, for tasks such as noise removal from time series, trend analysis and forecasting, and detection of abnormal behavior in data. It is a particularly attractive option for non-stationary data such as stock market data. Three types of wavelet transform are available: the maximum overlapping discrete wavelet transform (MODWT), the discrete wavelet transform (DWT), and the continuous wavelet transform (CWT). In general, these transforms share the same characteristics. The main difference between the DWT and the MODWT is that the DWT requires a specific number of observations (2 raised to the power J, $2^J$), whereas the MODWT can be used with data of any size.

Therefore, this article focuses on the MODWT, the most recent approach for overcoming the limitations of the Fourier Transform (FT) [33, 34]. The WT is an extension of the FT [35], which is based on sine and cosine functions. The WT satisfies the admissibility condition [30]:

$$C_\varphi = \int_0^\infty \frac{|\varphi(f)|^2}{f}\, df < \infty \tag{5}$$

where $\varphi(f)$ is the FT, a function of frequency $f$, of $\varphi(t)$. The applications of the WT include image analysis and signal processing [32]. It overcomes the problems of the FT, especially when dealing with time, space, or frequency.

The WT uses two types of wavelet. The mother wavelet $\Psi$ defines the high-frequency components (detailed data) and the father wavelet $\Phi$ describes the low-frequency components (smooth data), with $j = 1, 2, 3, \dots, J$ in the $J$-level wavelet decomposition. The general WT model is summarized as:

$$X(t) = \sum_k S_{J,k}\,\Phi_{J,k}(t) + \sum_k d_{J,k}\,\Psi_{J,k}(t) + \sum_k d_{J-1,k}\,\Psi_{J-1,k}(t) + \dots + \sum_k d_{1,k}\,\Psi_{1,k}(t) \tag{6}$$

where $S_{J,k}$ and $d_{j,k}$ are the smooth and detailed coefficients, respectively, $J$ denotes the maximum scale sustainable by the number of data points, and $\Phi_{J,k}$ and $\Psi_{j,k}$ are the two types of wavelets stated above. The smooth coefficients contain the most important features of the original data, whereas the detailed coefficients are used to detect the main fluctuations of the original data. The father and mother wavelets satisfy the following conditions:

$$\int \Phi(t)\, dt = 1, \qquad \int \Psi(t)\, dt = 0 \tag{7}$$

The MODWT commonly uses the following wavelet functions: Haar, Daubechies (d4), Coiflet (c6), Least Asymmetric (LA8), and best-localized (bl14). These functions share several main characteristics. They are arbitrarily regular and, except for Haar, have no explicit expression. Moreover, they are real-valued, orthogonal, and compactly supported, with an arbitrary number of zero moments; they possess a scaling function, allow continuous and discrete transformation, give exact reconstruction, and admit a fast algorithm. They differ in symmetry: Haar is symmetric, LA8 and d4 are asymmetric, and C6 and Bl14 are nearly symmetric.
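As an illustration of how a MODWT splits a series into a smooth part and a detail part, the sketch below implements a single decomposition level with the Haar filter, the only one of the five functions with a simple explicit form. It uses circular boundary handling and a toy series of length 7 to show that the length need not be a power of two. The function names and data are assumptions for illustration, and the LA8 filter preferred in this study would need its own eight coefficients.

```python
def haar_modwt_level1(x):
    """Level-1 Haar MODWT coefficients with circular boundary handling."""
    n = len(x)
    # MODWT Haar filters: wavelet (1/2, -1/2) and scaling (1/2, 1/2).
    w = [(x[t] - x[t - 1]) / 2.0 for t in range(n)]   # x[-1] wraps to the last value
    v = [(x[t] + x[t - 1]) / 2.0 for t in range(n)]
    return w, v

def haar_mra_level1(x):
    """Level-1 multiresolution analysis: detail and smooth series that sum to x."""
    n = len(x)
    w, v = haar_modwt_level1(x)
    detail = [(w[t] - w[(t + 1) % n]) / 2.0 for t in range(n)]
    smooth = [(v[t] + v[(t + 1) % n]) / 2.0 for t in range(n)]
    return detail, smooth

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]   # toy series of length 7
w, v = haar_modwt_level1(x)
detail, smooth = haar_mra_level1(x)
```

Two properties can be read off this sketch: the detail and smooth series add back to the original observations, and the squared coefficients in `w` and `v` together carry the same energy as the squared observations.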

3.3 FIR.DM model

FIR.DM is a special case of the TSK model and was proposed in [21]. It uses simplified fuzzy reasoning in which the membership functions in the antecedent part and the real number in the consequent part of the inference rules are tuned by means of the descent method. The learning speed and the generalization capability of this method are higher than those of a conventional back-propagation type neural network [21].

3.3.1 The conventional Fuzzy Inference (FI) model.

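The conventional fuzzy inference (FI) model using the descent method is defined in [21]. Let $x = (x_1, \dots, x_m)$ and $y$ be the input and output variables, respectively, where $x_j, y \in R$ and $R$ is the set of real numbers. Then the rules of the simplified fuzzy inference model are expressed as [21]:

$$\text{Rule } i:\ \text{if } x_1 \text{ is } A_{i1} \text{ and } \dots \text{ and } x_m \text{ is } A_{im} \text{ then } y \text{ is } w_i \tag{8}$$

where $i = 1, \dots, n$ is the rule number, $j = 1, \dots, m$ is the variable number, $A_{ij}$ is a membership function of the antecedent part $(x_1, \dots, x_m)$, and $w_i$ is a real number of the consequent part ($y$). The membership value $\mu_i$ of the antecedent part for input $x$ is expressed as

$$\mu_i = \prod_{j=1}^{m} A_{ij}(x_j) \tag{9}$$

Then the output $y$ of fuzzy reasoning is derived from

$$y = \frac{\sum_{i=1}^{n} \mu_i w_i}{\sum_{i=1}^{n} \mu_i} \tag{10}$$

If a Gaussian membership function is used, then $A_{ij}$ is expressed as:

$$A_{ij}(x_j) = \exp\left(-\frac{1}{2}\left(\frac{x_j - c_{ij}}{b_{ij}}\right)^2\right) \tag{11}$$

where $c_{ij}$ and $b_{ij}$ denote the center and the width of $A_{ij}$, respectively [21].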
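The inference step described above can be sketched in a few lines: each rule's strength is the product of its Gaussian memberships, and the output is the strength-weighted average of the rule consequents. The factor 1/2 in the exponent, the function names, and the two-rule example values are assumptions for illustration, not parameters estimated in this study.

```python
import math

def gaussian_membership(x, c, b):
    """Gaussian membership with center c and width b (1/2 factor assumed)."""
    return math.exp(-0.5 * ((x - c) / b) ** 2)

def rule_strengths(x, centers, widths):
    """mu_i: product over the inputs j of A_ij(x_j), one value per rule."""
    return [math.prod(gaussian_membership(xj, c, b) for xj, c, b in zip(x, c_i, b_i))
            for c_i, b_i in zip(centers, widths)]

def infer(x, centers, widths, weights):
    """Output y: strength-weighted average of the rule consequents w_i."""
    mu = rule_strengths(x, centers, widths)
    return sum(m * w for m, w in zip(mu, weights)) / sum(mu)

# Hypothetical system with two rules and two inputs.
centers = [[0.0, 0.0], [1.0, 1.0]]
widths = [[1.0, 1.0], [1.0, 1.0]]
weights = [0.0, 10.0]
y_mid = infer((0.5, 0.5), centers, widths, weights)   # equidistant from both centers
```

At the midpoint between the two centers both rules fire equally, so the output is the plain average of the two consequents; moving the input toward one center pulls the output toward that rule's consequent.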

3.3.2 Algorithm of self-tuning.

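The objective function $E$ measures the inference error between the desired output and the inferred output, and the self-tuning process minimizes it by the descent method. Let $p = 1, \dots, P$ index the learning data, where $P$ is the number of input-output pairs. The objective of learning is to minimize the following mean square error (MSE) [21]:

$$E = \frac{1}{P} \sum_{p=1}^{P} \left(y_p - y_p^r\right)^2 \tag{12}$$

where $y_p$ and $y_p^r$ are the inferred and desired outputs for the $p$-th pair. To minimize the objective function, each parameter $c$, $b$, $w$ is updated by the learning rule

$$z_i(t+1) = z_i(t) - K\, \frac{\partial E}{\partial z_i} \tag{13}$$

where $z_i \in \{c_{ij}, b_{ij}, w_i\}$, $t$ is the iteration number and $K$ is a learning rate constant. For a single data pair with squared error $E = \tfrac{1}{2}(y - y^r)^2$, the gradients of the objective function in Eq 13 follow from Eqs 9 to 11 by the chain rule [21]:

$$\frac{\partial E}{\partial c_{ij}} = \frac{\mu_i}{\sum_{i=1}^{n} \mu_i}\,(y - y^r)(w_i - y)\,\frac{x_j - c_{ij}}{b_{ij}^2} \tag{14}$$

$$\frac{\partial E}{\partial b_{ij}} = \frac{\mu_i}{\sum_{i=1}^{n} \mu_i}\,(y - y^r)(w_i - y)\,\frac{(x_j - c_{ij})^2}{b_{ij}^3} \tag{15}$$

$$\frac{\partial E}{\partial w_i} = \frac{\mu_i}{\sum_{i=1}^{n} \mu_i}\,(y - y^r) \tag{16}$$

The learning algorithm for the conventional fuzzy inference model is as follows: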

Algorithm 1 Learning Algorithm of the Conventional Fuzzy Inference Model

Step1: The threshold θ of inference error and the maximum number of learning time Tmax are set. Let n0 be the initial number of rules. Let t = 1.

Step2: The parameters wi, cij, and bij are set randomly.

Step3: Let p = 1.

Step4: The input-output data (x1, …, xm, yr) is inputted.

Step5: Fuzzy reasoning is performed for the input data (x1, …, xm) by using Eqs (9) and (10). The membership value μi of each inference rule and the output of fuzzy reasoning y are derived.

Step6: Parameters wi, cij, and bij are updated by Eqs (14–16).

Step7:

if p = P then

  go to Step 8

else if p < P then

 p = p + 1

 go to Step 4

end if

Step8: Let E(t) be inference error at step t calculated by Eq (12).

if E(t) > θ and t < Tmax then

 t = t + 1

 go to Step 3

else if E(t) ≤ θ then

 The algorithm terminates.

end if

Step9:

if t ≥ Tmax and E(t) > θ then

 n = n + 1

 t = 1

 go to Step 2

end if
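Steps 2 to 8 of the algorithm can be sketched as a small training loop. The version below is a simplified illustration under several assumptions: Gaussian memberships with a factor 1/2 in the exponent, a per-pair squared error of one half times the squared residual, a fixed number of rules (the rule-growing Step 9 is omitted), a floor on the widths added purely as a numerical safeguard, and toy data with target y = x. The names `forward`, `mse`, `init_params`, and `train` are hypothetical.

```python
import math
import random

def forward(x, c, b, w):
    """Rule strengths mu_i, their sum, and the inferred output y."""
    mu = [math.prod(math.exp(-0.5 * ((xj - cij) / bij) ** 2)
                    for xj, cij, bij in zip(x, ci, bi))
          for ci, bi in zip(c, b)]
    total = sum(mu)
    y = sum(m * wi for m, wi in zip(mu, w)) / total
    return mu, total, y

def mse(data, c, b, w):
    """Mean squared inference error over all learning pairs."""
    return sum((forward(x, c, b, w)[2] - yr) ** 2 for x, yr in data) / len(data)

def init_params(n_rules, n_inputs, seed=0):
    """Step 2: random initial centers, widths and consequents."""
    rng = random.Random(seed)
    c = [[rng.random() for _ in range(n_inputs)] for _ in range(n_rules)]
    b = [[0.5 + 0.5 * rng.random() for _ in range(n_inputs)] for _ in range(n_rules)]
    w = [rng.random() for _ in range(n_rules)]
    return c, b, w

def train(data, c, b, w, rate=0.05, theta=1e-4, t_max=300):
    """Steps 3-8: per-pair descent updates until E(t) <= theta or t = t_max."""
    for t in range(1, t_max + 1):
        for x, yr in data:
            mu, total, y = forward(x, c, b, w)
            for i in range(len(w)):
                g_w = (mu[i] / total) * (y - yr)          # gradient w.r.t. w_i
                common = g_w * (w[i] - y)
                for j in range(len(x)):
                    d = x[j] - c[i][j]
                    g_c = common * d / b[i][j] ** 2        # gradient w.r.t. c_ij
                    g_b = common * d ** 2 / b[i][j] ** 3   # gradient w.r.t. b_ij
                    c[i][j] -= rate * g_c
                    # Floor on the width: a safeguard added here, not in Algorithm 1.
                    b[i][j] = max(0.1, b[i][j] - rate * g_b)
                w[i] -= rate * g_w
        if mse(data, c, b, w) <= theta:
            break
    return t

# Toy learning data (hypothetical): one input, target y = x.
data = [((k / 4,), k / 4) for k in range(5)]
c, b, w = init_params(n_rules=2, n_inputs=1)
before = mse(data, c, b, w)
epochs = train(data, c, b, w)
after = mse(data, c, b, w)
```

The parameters are updated in place after every data pair, which mirrors the loop over p in Steps 4 to 7, and the error is checked once per pass as in Step 8.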

3.4 Accuracy criteria

This section presents the criteria applied to compare actual and forecasted values fairly. Five accuracy criteria have been adopted: RMSE, ME, MAE, MPE, and MAPE. The MAPE is the average of the absolute percentage errors between the values forecasted by a model and the original values of the quantity being predicted. It is expressed as follows:

$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{X_t - F_t}{X_t} \right| \tag{17}$$

where $X_t$ is the original value, $F_t$ is the forecasted value and $n$ is the sample size. The absolute value in this calculation is summed over every forecasted point in time and divided by the number of fitted points. In addition, MPE, MAE, and ME are defined as:

$$\mathrm{MPE} = \frac{100}{n} \sum_{t=1}^{n} \frac{X_t - F_t}{X_t} \tag{18}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| X_t - F_t \right| \tag{19}$$

$$\mathrm{ME} = \frac{1}{n} \sum_{t=1}^{n} \left( X_t - F_t \right) \tag{20}$$

The RMSE, also known as the root-mean-squared deviation (RMSD), is a frequently used measure of the differences between estimators [3, 29]. It measures the average error produced by the model in predicting the outcome for an observation. It is defined as:

$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} \tag{21}$$

where MSE is given in Eq 12 [4, 36].
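The five criteria can be computed together from paired actual and forecasted values. The sketch below assumes errors are taken as actual minus forecast and that percentage measures are scaled by 100; the function name `accuracy` and the three-point example are illustrative only.

```python
import math

def accuracy(actual, forecast):
    """Return ME, MAE, MPE, MAPE and RMSE for paired actual/forecast values."""
    n = len(actual)
    errors = [x - f for x, f in zip(actual, forecast)]   # actual minus forecast
    return {
        "ME": sum(errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": 100.0 * sum(e / x for e, x in zip(errors, actual)) / n,
        "MAPE": 100.0 * sum(abs(e / x) for e, x in zip(errors, actual)) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
    }

# Hypothetical values: errors of -10, +10 and 0.
scores = accuracy([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```

In this example the positive and negative errors cancel in ME, while MAE, MAPE and RMSE remain positive. This is why signed measures such as ME and MPE indicate bias rather than overall accuracy.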

4 Research design and methodology

The goal of this study is to develop a hybrid model for forecasting closing price data from the Tadawul stock market from 2011 to 2019. The proposed model combines the FIR.DM and MODWT-LA8 models. The MODWT functions considered are Haar, Daubechies (d4), Least Asymmetric (La8), Coiflet (C6), and best-localized (bl14). Performance is assessed using the accuracy measures. The MODWT is used to convert the original data into a time-scale domain. Fig 1 illustrates the stages of the MODWT forecasting model. It is important to note that the wavelet technique is applied repeatedly while the data pattern is fluctuating. The goal is to minimize statistical error criteria, such as RMSE, in the data before and after transformation. Furthermore, the MODWT divides the data into two groups: a detail series and an approximation series. When the original financial data fluctuate significantly, both groups are used because they explain the behavior of the data.

The proposed hybrid model follows a methodology with four steps: (1) gathering closing price data from Tadawul; (2) calculating LSCS from the closing price data as the logarithm of the standard deviation; (3) decomposing the LSCS data using the MODWT, which splits the data into two groups: low-fluctuation data (approximation coefficients) and high-fluctuation data (detail coefficients). Five MODWT functions are used: Haar, Db, LA-8, C6, and BL14. The approximation coefficient of each function captures the main features of the data and is used as the output in the forecasting model; (4) combining the approximation coefficient (LSCS) of each function with the input variables (Loil and Repo) within FIR.DM to form the new hybrid model MODWT-FIR.DM. Finally, a comparative study is conducted between the best MODWT-FIR.DM model, the alternative MODWT-FIR.DM functions, and the traditional models (ARIMA and FIR.DM).

To evaluate the proposed model, we use the holdout technique: 80% of the original data is used in the training phase to select the most effective model, which is then applied to the remaining 20% of the data as test data.
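A minimal sketch of such a holdout split is shown below. It assumes the split is chronological, with the first 80% of observations used for training and no shuffling, which is the usual choice for time series but is not stated explicitly here. The helper name `holdout_split` is hypothetical, and the integers stand in for the 2026 daily observations.

```python
def holdout_split(series, train_fraction=0.8):
    """Split a series chronologically into training and test parts."""
    cut = int(len(series) * train_fraction)   # truncate to a whole observation
    return series[:cut], series[cut:]

observations = list(range(2026))   # placeholder for 2026 daily values
train_part, test_part = holdout_split(observations)
```

With 2026 observations this gives 1620 training points and 406 test points, and every training observation precedes every test observation.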

4.1 Data description

The sample data for closing prices come from the stock market (Tadawul) in Saudi Arabia. The daily closing prices range from August 2011 until December 2019. The sample contains 2026 observations [30, 33]. Table 1 presents the descriptive statistics of the dataset.

Table 1. Descriptive statistic for the Saudi stock market dataset.

https://doi.org/10.1371/journal.pone.0278835.t001

LSCS refers to the logarithm of the standard deviation of closing stock prices, which can be expressed as $\mathrm{LSCS} = \log(\mathrm{SD}(x))$, where $x$ is the closing stock price. Likewise, Repo and Loil represent the repo rate and the logarithm of oil prices, respectively.

The mean and standard deviation of LSCS are 6.75 and 0.69, respectively. The skewness and kurtosis values of LSCS are -2.10 and 4.26, respectively. The mean and standard deviation of Repo are 0.70 and 0.28, whereas its skewness and kurtosis values are 2.00 and 22.80, respectively. The mean and standard deviation of Loil are 4.30 and 0.35, respectively. The skewness and kurtosis values of Loil are -0.18 and -1.11, respectively. The datasets are plotted over days in Fig 2.

Fig 2. Data description over days: (a) LSCS; (b) Loil; (c) Repo.

https://doi.org/10.1371/journal.pone.0278835.g002

4.2 Endogeneity issues

This section discusses how appropriate variables are chosen through multicollinearity elimination, multiple regression analysis, and a causality test.

4.2.1 Correlation.

In this section, we select the independent variables from a larger set of candidates by eliminating variables according to specific tests. First, we removed variables that showed multicollinearity with other independent variables, as shown in Table 2. No multicollinearity generally means the absence of perfect multicollinearity, that is, of an exact (non-stochastic) linear relationship between two or more independent variables. Variables that were strongly related to other independent variables were dropped. Table 2 gives the significant correlations between the independent and the dependent variables.

Table 2. The significant correlations between the selected variables.

https://doi.org/10.1371/journal.pone.0278835.t002

4.2.2 Engle and Granger causality test.

The Engle and Granger test uses co-integration to illustrate causal relationships. It produces residuals (errors) from a static regression and then applies an Augmented Dickey-Fuller test, or a similar test, to the residuals to check for unit roots. If the time series are co-integrated, the residuals are practically stationary [37]. The test regression can be written in error-correction form as

$$\Delta Y_t = \alpha + \beta\, \Delta X_t + \phi\, ECT_{t-1} + \varepsilon_t$$

where $Y$ is the dependent variable, $X$ is the set of independent variables, ECT is the error correction term, and $\beta$, $\alpha$, and $\phi$ are the parameters. The null hypothesis of the Engle Granger test (H0: there is no cointegration) is rejected if $\phi$ is negative or higher than 1.96. Rejecting the null hypothesis indicates that the independent factors influence the dependent variable.

Table 3 reports the Engle and Granger test for the dependent and independent variables. According to the results, there is co-integration between the independent and dependent variables at the 5% significance level. This result suggests that the independent factors drive the dependent variable.

4.2.3 The regression analysis.

Regression analysis is important in statistics because it helps researchers identify which variables matter most, which can be ignored, and how the variables relate to one another. Regression analysis is also used for forecasting and for determining the causal connection between variables. In this study, LSCS is the dependent variable, whilst Repo and Loil are the independent variables used to forecast LSCS. The multiple regression analysis is presented in Table 4. The Repo rate and Loil are significant at the 1 percent level. Additionally, R-square and adjusted R-square are close to 45%, which indicates that the independent variables account for about 45% of the variation in the dependent variable. The F-statistic is significant at the 1 percent level, which indicates that the linear regression model is appropriate for the data.

Oil prices have a negative association with volatility risk, which is measured by the standard deviation of closing stock prices. This suggests that stock market volatility will decrease as oil prices rise. The repo rate, on the other hand, has a positive association with volatility risk. This suggests that the volatility risk would rise by around 26% as a result of an increase in the repo rate. In Fig 3, the residuals approximately follow the diagonal line and appear to be normally distributed.
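The mechanics of a two-regressor least-squares fit and its R-square can be sketched as follows. The data are synthetic and noise-free, with a negative coefficient on the first regressor and a positive one on the second chosen only to mirror the signs reported above; they are not the Tadawul data, and the names `ols` and `r_squared` are hypothetical.

```python
def ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Build the augmented system [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [u - f * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

def r_squared(X, y, beta):
    """Share of the variation in y explained by the fitted values."""
    fitted = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic, noise-free data: y = 2 - 0.5*x1 + 0.25*x2 (illustrative only).
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
y = [2.0 - 0.5 * x1 + 0.25 * x2 for x1, x2 in X]
beta = ols(X, y)
fit = r_squared(X, y, beta)
```

Because the synthetic data contain no noise, the fit recovers the generating coefficients and R-square equals one; with real data, R-square falls below one, as with the roughly 45% reported in this study.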

Fig 3. The multiple regression analysis: Residuals vs Fitted; Normal Q-Q.

https://doi.org/10.1371/journal.pone.0278835.g003

5 Results and discussion

This study examines the closing price data from Tadawul, which was selected for several reasons. The history of stock market volatility in emerging economies is of interest. The Saudi market illustrates how enormous volatility may result from informational imbalances, irrational trading, and inexperienced financial analytics. Additionally, investors from nations outside the Gulf Cooperation Council (GCC) are not permitted to purchase Saudi Arabian securities. Tadawul is also the biggest stock market in the Middle East. Therefore, the volatility data is decomposed using the MODWT with the La8 function, as shown in Fig 4.

Fig 4. Decomposing the data using MODWT based on La8 function.

https://doi.org/10.1371/journal.pone.0278835.g004

First, the closing-price data were transformed using the logarithm of the standard deviation to obtain LSCS. Next, MODWT was used to decompose the LSCS data in the R statistical software. The MODWT technique splits the LSCS data into two groups: low-fluctuation data (approximation coefficients) and high-fluctuation data (detail coefficients). The approximation coefficients of each function capture the main features of the data and are used as the output in the forecasting model. Five MODWT functions were used: Haar, Daubechies (Db), least asymmetric (LA-8), Coiflet (C6), and best-localized (BL14). Of these, LA-8 is the most efficient MODWT function (see Table 5), and Fig 4 shows the decomposition obtained with LA-8.

Table 5. The WT function of output variable for 80% dataset.

https://doi.org/10.1371/journal.pone.0278835.t005

The MODWT-based decomposition is an effective technique for displaying data fluctuations and significance levels. At one decomposition level, the WT represents the series as X = TV1 + TW1, where X is the original signal. TV1 is the level-one approximation, plotted from the approximation coefficients of the transformed data. TW1 is the level-one detail, plotted from the first-level detail coefficients, and it captures the fluctuations in the data. Note that 1620 of the 2026 samples are used to represent 80 percent of the data.
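The additive identity X = TV1 + TW1 can be checked numerically with a minimal sketch. The study uses the LA-8 filter in R; the example below swaps in the two-tap Haar filter with circular boundary treatment purely to keep the code short, and it assumes that TV1 and TW1 correspond to the level-one smooth and detail series of a MODWT multiresolution analysis. The function names and the sample series are hypothetical.

```python
def haar_modwt_level1(x):
    """Level-1 MODWT coefficients with the Haar filter (circular boundary).

    Returns (w1, v1): wavelet (detail) and scaling (approximation) coefficients,
    each the same length as x because the MODWT does not downsample.
    """
    n = len(x)
    # x[t - 1] wraps to the last sample when t == 0, giving circular filtering.
    w1 = [(x[t] - x[t - 1]) / 2.0 for t in range(n)]
    v1 = [(x[t] + x[t - 1]) / 2.0 for t in range(n)]
    return w1, v1


def haar_mra_level1(w1, v1):
    """Level-1 detail and smooth series, which add back to the original signal."""
    n = len(w1)
    detail = [(w1[t] - w1[(t + 1) % n]) / 2.0 for t in range(n)]
    smooth = [(v1[t] + v1[(t + 1) % n]) / 2.0 for t in range(n)]
    return detail, smooth


# Hypothetical short series standing in for LSCS.
x = [5.0, 7.0, 6.0, 9.0, 8.0, 4.0]
w1, v1 = haar_modwt_level1(x)
tw1, tv1 = haar_mra_level1(w1, v1)

reconstructed = [s + d for s, d in zip(tv1, tw1)]
print("smooth (TV1):", tv1)
print("detail (TW1):", tw1)
print("max reconstruction error:",
      max(abs(a - b) for a, b in zip(x, reconstructed)))
```

The smooth series is a local weighted average of neighbouring samples, while the detail series holds the remaining short-term fluctuation, so their sum returns the original series. The level-1 coefficients also preserve the energy of the signal, which is one reason the MODWT is convenient for variance-based analysis.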

Tadawul experienced several fluctuations from 2011 to 2019. In 2011, the general index of the stock exchange declined to 6417.7 points, but it climbed to 8535 points in 2013. To increase the market’s efficiency and efficacy, market management changed the trading mechanism from SAXESS to X-Stream INET, and an interactive multi-user system (IFSAH) was designed [38]. Fluctuating stock prices are one of the challenges confronting economies around the world. Tadawul has been affected by both the domestic and foreign economies, so the effects of financial crises in other economies are transmitted to the domestic economy. For example, the global financial crisis of 2008 hit Tadawul heavily [39].

Table 5 shows the results of the proposed models on the first 80% of the dataset. The MODWT models are applied to the original LSCS dataset. MODWT (LA-8) is the best model in the comparison because it has the lowest ME, MAE, and MAPE values: 0.0000053, 0.003214, and 0.064497, respectively. The FIR.DM model was constructed using Repo and Loil as input variables, whereas the MODWT (LA-8) model used LSCS as the output variable.
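For reference, the accuracy measures used in the comparison can be computed as in the following sketch, which follows the standard definitions in Hyndman and Koehler [36] with the error taken as actual minus forecast. The function names and the toy numbers are hypothetical; only the sample size of 2026 and the 80/20 split come from the study.

```python
import math


def forecast_errors(actual, forecast):
    """Return ME, RMSE, MAE, MPE and MAPE (percentage measures are in percent)."""
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    # Percentage errors are undefined when an actual value is zero.
    pct = [100.0 * e / a for e, a in zip(errors, actual)]
    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MPE": sum(pct) / n,
        "MAPE": sum(abs(p) for p in pct) / n,
    }


def chronological_split(series, train_fraction=0.8):
    """Split a time series into an initial training part and a final test part."""
    cut = int(train_fraction * len(series))
    return series[:cut], series[cut:]


# The split sizes for a series of 2026 observations.
train, test = chronological_split(list(range(2026)))
print("train size:", len(train), "test size:", len(test))

# Toy values to exercise the measures.
metrics = forecast_errors([2.0, 4.0, 5.0], [1.0, 5.0, 5.0])
for name, value in metrics.items():
    print(name, round(value, 6))
```

ME and MPE keep the sign of the errors, so positive and negative errors can cancel; MAE, RMSE, and MAPE do not cancel, which is why models are usually compared on several measures at once.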

Table 6 shows the forecasting results of the WT models combined with FIR.DM. To validate the results, the table covers the remaining 20% of the original and transformed data using the same proposed models. The best model is ARIMA MODWT (LA-8) with FIR.DM because it has the lowest ME, RMSE, MAE, and MPE. As in the training phase, LSCS is used as the output variable, whereas Repo and Loil are used as input variables to construct the FIR.DM and FIR.DM + MODWT models.
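To make the role of the two inputs and one output concrete, the sketch below trains a heavily simplified fuzzy inference system by gradient descent. It is an illustration under stated assumptions rather than the FIR.DM procedure used in the study: Gaussian membership functions on a fixed 3 x 3 grid replace tunable membership functions, only the rule consequents are updated, both inputs are assumed to be rescaled to [0, 1], and the target is a synthetic function. All names and values are hypothetical.

```python
import math


def firing_strengths(x, centers, sigma):
    """One firing strength per rule: the product of Gaussian memberships."""
    return [
        math.exp(-sum((xj - cj) ** 2 for xj, cj in zip(x, c)) / (2.0 * sigma ** 2))
        for c in centers
    ]


def predict(x, centers, sigma, weights):
    """Weighted average of rule consequents, weighted by firing strength."""
    mu = firing_strengths(x, centers, sigma)
    total = sum(mu)
    y = sum(m * w for m, w in zip(mu, weights)) / total
    return y, mu, total


def mse(samples, centers, sigma, weights):
    """Mean squared error of the system over (inputs, target) pairs."""
    return sum(
        (predict(x, centers, sigma, weights)[0] - t) ** 2 for x, t in samples
    ) / len(samples)


def train(samples, centers, sigma, lr=0.5, epochs=200):
    """Tune the consequents by per-sample gradient descent.

    For E = 0.5 * (y - target)^2, dE/dw_i = (y - target) * mu_i / sum(mu).
    """
    weights = [0.0] * len(centers)
    for _ in range(epochs):
        for x, target in samples:
            y, mu, total = predict(x, centers, sigma, weights)
            err = y - target
            for i, m in enumerate(mu):
                weights[i] -= lr * err * m / total
    return weights


# Hypothetical setup: two inputs (scaled oil price and repo rate) in [0, 1].
grid = [0.0, 0.5, 1.0]
centers = [(a, b) for a in grid for b in grid]   # nine rules
sigma = 0.25
points = [i / 4.0 for i in range(5)]
samples = [((a, b), 1.0 - 0.5 * a + 0.3 * b) for a in points for b in points]

initial_mse = mse(samples, centers, sigma, [0.0] * len(centers))
weights = train(samples, centers, sigma)
final_mse = mse(samples, centers, sigma, weights)
print("MSE before training:", round(initial_mse, 6))
print("MSE after training:", round(final_mse, 6))
```

In a hybrid of the kind described above, the target passed to such a system would be the MODWT approximation series of LSCS rather than a synthetic function, and the membership function parameters would typically be tuned along with the consequents.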

Table 6. The WT function of output variable for 20% dataset.

https://doi.org/10.1371/journal.pone.0278835.t006

6 Limitations and directions for future research

This research used MODWT-LA8-FIR.DM to improve volatility prediction for Tadawul (Saudi Arabia’s stock exchange). The research has several limitations. (1) We use only the oil price and the repo rate as input variables. In the future, we intend to address this restriction by using other macroeconomic variables. (2) We consider only the Tadawul dataset. We intend to expand the experiments to include data from other stock exchanges, including the New York Stock Exchange (NYSE), National Association of Securities Dealers Automated Quotations (NASDAQ), Shanghai Stock Exchange (SSE), and Hong Kong Stock Exchange (HKSE). (3) The daily data are limited to 2011 to 2019 and do not cover the COVID-19 period. In future work, we will include daily stock price data from the COVID-19 period.

7 Conclusion

We proposed a hybrid model (MODWT-LA8-FIR.DM) with a gradient descent learning approach. The model is applied to the Saudi Arabian stock exchange (Tadawul) to predict the closing price. Based on correlation, multiple regression, and the Engle and Granger causality test, we selected the oil price and the repo rate as input variables. The results show that the input variables are weakly correlated with each other (r < 50%). On the other hand, there is a strong correlation (r ≥ 50%) between the price of oil and the output variable (closing price). Furthermore, the Engle and Granger causality test shows that the input variables are causally related to the output variable. The multiple regression test is used to determine whether the impact is significant, and the input variables are significant at the 5% level. The output variable is based on 2026 observations from Tadawul between October 2011 and December 2019. The Saudi Authority for Statistics and the Saudi Central Bank provided the study’s input variables. The MODWT technique divides variables into approximation and detail coefficients. The five MODWT functions are Haar, Daubechies (Db), least asymmetric (LA-8), best-localized (BL14), and Coiflet (C6). The output variable is thus divided into two parts: the detail coefficients (highly fluctuating data) and the approximation coefficients (weakly fluctuating data), which contain the main features of the data. Our MODWT-LA8-FIR.DM model is built using the approximation coefficient data (MODWT-LA8) and the input variables. Statistical criteria were used to assess the MODWT-LA8-FIR.DM model, including mean error (ME), root mean squared error (RMSE), and mean absolute percentage error (MAPE). Traditional models (ARIMA and FIR.DM) were compared with the MODWT-LA8-FIR.DM model and were found to be less accurate. As a result, the proposed forecasting model may be applied to other foreign stock markets.

References

  1. Hull J. Risk management and financial institutions,+ Web Site. vol. 733. John Wiley & Sons; 2018.
  2. Khera A, YADAV MP. Predicting the volatility in stock return of emerging economy: An empirical approach. Theoretical & Applied Economics. 2020;27(4).
  3. Chen J, Politis DN. Time-varying NoVaS versus GARCH: point prediction, volatility estimation and prediction intervals. Journal of Time Series Econometrics. 2020;12(2).
  4. Alenezy AH, Ismail MT, Wadi SA, Tahir M, Hamadneh NN, Jaber JJ, et al. Forecasting Stock Market Volatility Using Hybrid of Adaptive Network of Fuzzy Inference System and Wavelet Functions. Journal of Mathematics. 2021;2021.
  5. Hamilton JD. Time series analysis. Princeton university press; 2020.
  6. Darjani N, Omranpour H. Comprehensive Learning Polynomial Auto-Regressive Model based on Optimization with Application of Time Series Forecasting. International Journal of Industrial Electronics Control and Optimization. 2022.
  7. Ghoreishi SA, Khaloozadeh H. Application of Covariance Matrix Adaptation-Evolution Strategy to Optimal Portfolio. International Journal of Industrial Electronics Control and Optimization. 2019;2(2):81–90.
  8. Chen-xu N, Jie-sheng W. Auto regressive moving average (ARMA) prediction method of bank cash flow time series. In: 2015 34th Chinese Control Conference (CCC). IEEE; 2015. p. 4928–4933.
  9. Kim M. Cost-sensitive estimation of ARMA models for financial asset return data. Mathematical Problems in Engineering. 2015;2015.
  10. Banerjee D. Forecasting of Indian stock market using time-series ARIMA model. In: 2014 2nd international conference on business and information management (ICBIM). IEEE; 2014. p. 131–135.
  11. Wang X, Liu Y. ARIMA time series application to employment forecasting. In: 2009 4th International Conference on Computer Science & Education. IEEE; 2009. p. 1124–1127.
  12. Fay D, Ringwood J, Condon M, Kelly M. A comparison of linear and neural parallel time series models for short-term load forecasting in the republic of Ireland. In: Proceedings of the 3rd European IFS Workshop. Shaker Verlag; 2000.
  13. Sharma SK, Ghosh S. Short-term wind speed forecasting: Application of linear and non-linear time series models. International journal of green energy. 2016;13(14):1490–1500.
  14. Santos AAP, dos Santos Coelho L. Neural networks, fuzzy system, and linear models in forecasting exchange rates: comparison and case studies. In: The 2006 IEEE International Joint Conference on Neural Network Proceedings. IEEE; 2006. p. 3094–3099.
  15. Bai XM, Wang CZ. AdaBoost artificial neural network for stock market predicting. In: Proc. Joint Int. Conf. Artif. Intell. Comput. Eng.(AICE), Int. Conf. Netw. Commun. Secur.(NCS); 2016.
  16. Chang PC, Fan CY. A hybrid system integrating a wavelet and TSK fuzzy rules for stock price forecasting. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews). 2008;38(6):802–815.
  17. Homayouni N, Amiri A. Stock price prediction using a fusion model of wavelet, fuzzy logic and ANN. In: International conference on e-business, management and economics. vol. 25; 2011. p. 277–281.
  18. Abiyev RH, Abiyev VH. Differential evaluation learning of fuzzy wavelet neural networks for stock price prediction. Journal of Information and Computing Science. 2012;7(2):121–130.
  19. Miyajima H, Shigei N, Miyajima H. Fast learning algorithm for fuzzy inference systems using vector quantization. Proceedings, pp. 2009;1212:1215.
  20. Ishigami H, Fukuda T, Shibata T, Arai F. Structure optimization of fuzzy neural network by genetic algorithm. Fuzzy Sets and Systems. 1995;71(3):257–264.
  21. Nomura H, Hayashi I, Wakami N. A learning method of fuzzy inference rules by descent method. In: [1992 Proceedings] IEEE International Conference on Fuzzy Systems. IEEE; 1992. p. 203–210.
  22. Nomura H. A method to determine fuzzy inference rules by a genetic algorithm. The Transactions of The Institute of Electronics, Information and Communication Engineers A. 1994;77(9):1241–1249.
  23. Miyajima H, Shigei N, Miyajima H. The ability of learning algorithms for fuzzy inference systems using vector quantization. In: International Conference on Neural Information Processing. Springer; 2016. p. 479–488.
  24. Chen Y, Qiao G, Zhang F. Oil price volatility forecasting: Threshold effect from stock market volatility. Technological Forecasting and Social Change. 2022;180:121704.
  25. Alkhawaldeh RS. Arabic (Indian) digit handwritten recognition using recurrent transfer deep architecture. Soft Computing. 2021;25(4):3131–3141.
  26. Alkhawaldeh RS, Khawaldeh S, Pervaiz U, Alawida M, Alkhawaldeh H. NIML: non-intrusive machine learning-based speech quality prediction on VoIP networks. IET Communications. 2019;13(16):2609–2616.
  27. Verma S. Forecasting volatility of crude oil futures using a GARCH–RNN hybrid approach. Intelligent Systems in Accounting, Finance and Management. 2021;28(2):130–142.
  28. Alhodiry A, Rjoub H, Samour A. Impact of oil prices, the US interest rates on Turkey’s real estate market. New evidence from combined co-integration and bootstrap ARDL tests. Plos one. 2021;16(1):e0242672. pmid:33395440
  29. Murali S, Thiyagarajan S, Gopal N. Oil prices and stock market interplay in Dubai. International Journal of Management Practice. 2021;14(1):107–127.
  30. Jaber JJ, Ismail N, Ramli S, Al Wadi S, Boughaci D. Assessment OF credit losses based ON arima-wavelet method. Journal of Theoretical and Applied Information Technology. 2020;98(09):1379–392.
  31. Rao KR, Kim DN, Hwang JJ. In: Integer Fast Fourier Transform. Dordrecht: Springer Netherlands; 2010. p. 111–126.
  32. Yaacob NA, Jaber JJ, Pathmanathan D, Alwadi S, Mohamed I. Hybrid of the Lee-Carter Model with Maximum Overlap Discrete Wavelet Transform Filters in Forecasting Mortality Rates. Mathematics. 2021;9(18):2295.
  33. Gençay R, Selçuk F, Whitcher BJ. An introduction to wavelets and other filtering methods in finance and economics. Elsevier; 2001.
  34. Zhang P, Wang H. Fuzzy wavelet neural networks for city electric energy consumption forecasting. Energy Procedia. 2012;17:1332–1338.
  35. Adil IH, et al. A modified approach for detection of outliers. Pakistan Journal of Statistics and Operation Research. 2015; p. 91–102.
  36. Hyndman RJ, Koehler AB. Another look at measures of forecast accuracy. International journal of forecasting. 2006;22(4):679–688.
  37. Engle RF, Granger CWJ. Co-integration and error correction: representation, estimation, and testing. Econometrica: journal of the Econometric Society. 1987; p. 251–276.
  38. Alotaibi T, Nazir A, Alroobaea R, Alotibi M, Alsubeai F, Alghamdi A, et al. Saudi Arabia stock market prediction using neural network. International Journal on Computer Science and Engineering. 2018;9(2):62–70.
  39. Shaik A, Syed A. Intraday return volatility in Saudi stock market: an evidence from Tadawul all share index. Management Science Letters. 2019;9(7):1131–1140.