An ensemble approach based on transformation functions for natural gas price forecasting considering optimal time delays

Natural gas, known as the cleanest fossil fuel, plays a vital role in the economies of producing and consuming countries. Understanding and tracking the drivers of natural gas prices are of significant interest to the many economic sectors. Hence, accurately forecasting the price is very important not only for providing an effective factor for implementing energy policy but also for playing an extremely significant role in government strategic planning. The purpose of this study is to provide an approach to forecast the natural gas price. First, optimal time delays are identified by a new approach based on the Euclidean Distance between input and target vectors. Then, wavelet decomposition has been implemented to reduce noise. Moreover, fuzzy transform with different membership functions has been used for modeling uncertainty in time series. The wavelet decomposition and fuzzy transform have been integrated into the preprocessing stage. An ensemble method is used for integrating the outputs of various neural networks. The results depict that the proposed preprocessing methods used in this paper cause to improve the accuracy of natural gas price forecasting and consider uncertainty in time series.


INTRODUCTION
In most financial matters, the current data is usually affected by past data modeled in time series, such as time series of natural gas price. These past data must be reliable to obtain acceptable results. For example, in a sample of time series, the observation value of t−1 st must be specified to predict the value of t. Also, the observation value of t−2 st must be specified to predict the value of t−1 st and so on. The observation value of t−2 is called a time delay for t−1 st period with 1 step.
The contributions of this study are divided into three parts. First, a new approach is presented to identify optimal time delays. In previous studies, the delays were identified based on the experience of experts. On the other hand, gas price is influenced by political, economic, social, etc., factors that include uncertainty in values. In the second contribution, the fuzzy transform with different membership functions is proposed to model the uncertainties in time series. Finally, in the third contribution, the wavelet Bai et al. (2016) integrated wavelet decomposition with a backpropagation neural network (BPNN) for improving the accuracy of air pollutants forecasting. Wavelet decomposition was applied to decompose original data to different levels. Then, each level was considered input of BPNN. Yu et al. (2018) proposed a hybrid approach for short-term wind speed forecasting by the wavelet packet decomposition (WPD), density-based spatial clustering of applications with noise (DBSCAN), and the Elman neural network (ENN). The results showed the WPD-DBSCAN-ENN model was better than the others. Ji et al. (2014) introduced an approach based on the wavelet neural network for short-term forecasting of the parking guidance information systems problem. Also, an integration wavelet decomposition and ANN are presented for daily streamflow forecasting. The results showed WA hybrid models are better than classic ANNs (Guimarães Santos & Silva, 2014). In another research, Dadkhah, Rezaee & Chavoshi (2018) provided an approach integrating clustering and classification algorithms for predicting short-term power output forecasting of hourly operation based on climate factors. Also, they examined the effects of wind direction and wind speed on the accuracy of prediction results. In the following, some studies about using fuzzy models for forecasting time series are presented.
An adaptive fuzzy inference system was proposed for predicting the power plant output by considering various climate factors (Rezaee, Dadkhah & Falahinia, 2019). The authors have been applied metaheuristic algorithms for increasing the accuracy and performance of the designed system. In another study, a hybrid model has been used for chaotic time series prediction. An approach has been proposed for training secondorder Takagi-Sugeno-Kang fuzzy systems using an adaptive neuro-fuzzy inference system (ANFIS) (Heydari, Vali & Gharaveisi, 2016). A similar study for forecasting time series using a robust fuzzy time series model may be found in Yolcu & Lam (2017). In another research, for forecasting air pollution in Turkey, a novel fuzzy time series model has been used based on the Fuzzy K-Medoid clustering (Dincer & Akkuş, 2018). The proposed method handled outliers in time series. Wang, Li & Lu (2018) applied fuzzy logic to forecast air pollution in China. They combined the fuzzy time series forecasting method and data reprocessing approaches for forecasting main air pollutants.
Using new artificial neural networks are another category of a forecasting method.  introduced a semi-heterogeneous approach to forecast crude oil prices. The mentioned approach applied decomposed techniques, including wavelet decomposition, singular spectral analysis, empirical mode decomposition, and variation mode decomposition in the preprocessing stage. The autoregressive, ARIMA, support vector regression (SVR), and ANN have been used to forecast. Finally, the results of forecasting have been integrated by a new combination approach. Agrawal, Muchahary & Tripathi (2019) provided a new ensemble approach based on relevance vector machines (RVM) and boosted trees. In the first stage, the RVM has been used based on Gaussian RBF and polynomial kernels. The performance of the model has been improved by Extreme Gradient Boosting. The outputs of each model have been integrated using Elastic net regression. Rezaee, Jozmaleki & Valipour (2018) provided an approach to predict the stock market using dynamic fuzzy C-means, Data Envelopment Analysis, and multilayer perceptron (MLP). In another study, a novel ensemble method has been introduced for time series forecasting integrating various machine learning models (Adhikari, 2015). Finally, the MLP has been integrated with inverse neural network and mathematical programming for determining optimal daily amounts of fuel consumption used by a power plant .

METHODOLOGY Wavelet transform
In computational and functional analysis, a discrete wavelet transform (DWT) is a general WT for which the wavelets are discretely sampled. In this paper, a type of DWT family known as Daubechies with order 18 (Daubechies-18) has been used in the preprocessing stage (Han, Kamber & Pei, 2012).
where a ¼ 2 j is the scale Parameter and k is the time-translation parameter. Therefore, the discrete wavelet is expressed as follows: It gives the range of j and k as 0 < k < 2 JÀj À 1 and 1 < j < J.

Fuzzy transform
The basic idea of fuzzy transformation is to transform an original space into a special space for discovering information. It can use a continuous function on a fixed distance, a; b ½ 2 R (transforming the fuzzy space of functions on an open space of real numbers by a direct transformation into the real n-dimensional vectors). Their inverse conversion returns the derived n-vector to the initial or approximate function of it. Let x 1 < x 2 < Á Á Á < x n be fixed nodes in [a, b], such that x 1 ¼ a; x n ¼ b and fuzzy sets A 1 ; A 2 ; . . . ; A n have their membership functions A 1 x ð Þ; A 2 x ð Þ; . . . ; A n x ð Þ defined on [a, b], form a fuzzy partition if they satisfy the following conditions (k ¼ 1; 2; . . . ; nÞ : for all x 2 0; h ½ ; k ¼ 2; . . . ; n À 1 where all x 1 ; x 2 ; . . . ; x n are equidistant, i.e., where h is the length of support of A 1 … A n and 2h is the support of other basic functions A k , k ¼ 2; 3; . . . ; n À 1 in case of uniform partition. The membership functions A 1 ; A 2 ; . . . ; A n are called basic functions. Let C([a; b]) be the set of continuous functions on the domain [a, b], A 1 ; A 2 ; . . . ; A n are basic functions on [a; b] and let f be any function in C([a; b]). Then n-tuple vector of real numbers of ½F 1 ; F 2 ; . . . ; F n is defined by Eq. (4): Equation (4) is called the F-transform f concerning A 1 ; A 2 ; . . . ; A n . In case of discrete F-transform, f be given at nodes P 1 ; P 2 ; . . . ; P l and A 1 ; A 2 ; . . . ; A n ; n < l are basic functions which form a fuzzy partition on [a; b], then, ½F 1 ; F 2 ; . . . ; F n are given by the following formula: One of the most important issues in fuzzy systems is membership functions. These functions specify the membership value of variables. They can be defined as either discrete or continuous. If a defined variable to be is a discrete variable, the discrete membership function is used. If the variable being defined is continuous, the continuous membership function is defined. For continuous variables, membership functions may be either mathematical or real analytic functions. The fuzzy transform functions used in this research are provided in Table 1.

Multilayer perceptron
MLP is the most famous ANN applied in time series forecasting problems. A multilayer perceptron is essentially a feed-forward structure of the entrance, one or more hidden layers, and one output layer. There are some nodes on each layer that are linked through irreversible links to a forwarding layer. The relationship between inputs and outputs in a multilayer perceptron network is given by the following formula: where a j and b ij i ¼ 1; 2; . . . ; p; j ¼ 1; 2; . . . :; h ð Þ are the alliance weights, a 0 and b 0j are the bias, and F, G are the hidden and output layer activation functions, respectively. In the MLP, each input data (y 1, y 2 , y t−p ) is first multiplied in relative weights (β 11, β 21, β p1 ), then it is added by β 01, multiple in a j and enters into the transfer function (F). The output of the transformation function is found by multiplying the weights with their corresponding inputs, adding the bias (a 0 ), and entering into the activation function (G). This calculation will keep on until j ¼ h . h is the number of hidden layer neurons. The most common activation function in the MLP is the sigmoid function.

Radial basis function
RBF operates radial-base function as the activation function. The output of this network is a linear mixture of radial-base functions for input parameters and neurons. RBF networks are used in function approximation, time series forecasting, classification, and control of the system. RBF networks have a three-layer structure. The initial layer is described as the input layer, the next layer is described as the hidden layer, and the latest layer is described as the output layer. The RBF describes the potential of jth neuron in the layer on the ground of varying of the Euclidean distance obtained by the vector: where x is the k-dimensional input vector, and w j represents the weights in the hidden layer. The RBF neural network uses varying species of activation functions famous on the ground of Gaussian or RBF. The activation function for jth hidden neuron is described as: where r 2 j is the variance of the inward time series. kx À w j k is the Euclidean norm, w j and r stand for the center and variance of Gaussian function, respectively.
If the time series data in the entrance vectors are not erect, hence the activation function makes Eq. (13).
where P À1 is the reverse of the variance-covariance matrix for the entrance vectors of time series.

Group method of data handling
Group method of data handling (GMDH) has some of the layers, and each layer has a number of neurons. All neurons utilize contiguous structures in which have two entrances Table 1 The membership functions used for considering uncertainty in time series.
and one output. Without loss of generality, suppose for each neuron, five weights, and one bias to bring about processing function between input and output time series. Therefore, we have: where i ¼ 1; 2; 3; . . . ; N ð Þ , N is the number of input and output data K ¼ 1; 2; 3; . . . ; C 2 m À Á ; a; b 2 1; 2; 3; . . . ; m f g ; m is the number of the preceding layer weights computed by MSE and therefore obvious and stable values are substituted in each neuron. The number of candidates of new neuron foundation is calculated by The GMDH algorithm is capable of constructing models for complex systems with a high degree of regression. The criterion for selecting the optimal neural network model is to minimize the relationship (15). A model with less prediction error is considered a desirable model. In the relationship (15), n is the number of observations.
Adaptive neuro-fuzzy inference system ANFIS was developed in 1990. Fuzzy Inference System (FIS) is based on the nonlinear approach that delineates the input/output relationships of an actual system using a set of if-then rules. In this neural network, the Takagi-Sugeno species FIS has been used. The output of each rule is a linear mixture of entrance variables. The ultimate output is the average weighted output of each rule. Theoretically, ANFIS models include five primary sections: Input and output database, a fuzzy model creator, an FIS, and a consistent ANN. ANFIS eliminates the initial problem in the design of the fuzzy system, the parameters of the membership function, and the design of the if-then rules. The ability to train ANNs are used for creating automated fuzzy rules and parameters optimization.

Support vector regression
Support vector machines (SVMs) were first used to solve classification problems. Then, it was developed for regression problems in 1992. Support vector machines associated with modeling and prediction are called support vector regression. In the SVR, the activation functions are first mapped on the input vectors, and the kernel functions perform (.) product between the outputs of these mapped vectors. The original SVR formulation is: where f x ð Þ is the evaluated model output, w is a weight vector, and b is a bias period. The vector w is a component of the characteristic space of the issue. Hence the issue is to select the straight nonlinear maps f x ð Þ, and the fitting for excellence RBF is one of the grossest used nonlinear kernels for SVR models and is performed in this paper. In SVR, first, the input support vectors (x 1 , x 2 , x N ) and x test are considered inputs of the activation functions. Then, activation functions are mapped on the input vectors. The output of the activation function for each input vector is multiplied by the output value of the x test activation function, and each result is multiplied by the corresponding weight (a 1 , a 2 , …, a N ) and enters into the transformation function. The output of the transformation function in the form of a linear combination is aggregated with a bias (b), and the output is derived from the SVR model.

Ensemble forecasting model
Preceding studies have shown which combination models commonly lead to distinct forecasts. An ensemble method is referred to as multiple forecasts and handles linear or nonlinear integrations of the forecasting results, direct to a coherence forecast. This research checks an ensemble method to combination forecasting of natural gas price, where the forecasts are established from five models, including decomposition and transformation techniques. If there are m types of forecasting models for solving a special problem, the outcomes of them are added together. Suppose that: where Combination model is the forecast combination of each model, the output of the forecast combination model based on five individual models can be displayed as: Chu et al. (2006), according to a study at Stanford University, computed and evaluated the complexity of various Machine Learning algorithms on Multicore CPUs. Table 2 shows the complexity of the algorithms used in this paper based on the mentioned study (updated in some cases). The complexities may be changed according to the number of layers, the number of neurons, epochs/iterations, or other hyperparameters of each algorithm.
Where m ,n, p, and l are the number of data/batches, the number of neurons in each layer, number of features, and the number of rules, respectively.

PROPOSED APPROACH
Time-series data of natural gas price are gathered, which have 3,720 records, with daily data from 2004/09/09 to 2019/06/10, which were obtained from http://finance.yahoo.com. The descriptive statistics including the measures for natural gas daily prices: Minimum = 1.49, Mean = 4.6451, Maximum = 15.39, Median = 3.86, SD = 2.3406, Skewness = 1.5064, Kurtosis = 5.3989. Time delays play an important role in increasing forecasting accuracy. Determining the optimal time delay has a great impact on the training process of models in time series forecasting. If optimal time delays are not selected, the training process is hampered, and neural networks perform the learning process with low accuracy. On the other hand, overfitting occurs when the network does the learning process more than necessary and the network is not able to forecast appropriately new data.
For handling this problem, the models should adjust hyper-parameters. The values of all hyper-parameters such as learning rate, epochs, the number of inputs membership functions in ANFIS, the number of clusters in ANFIS, max layer neurons and alpha parameters in GMDH, hidden layer size in MLP, spread and max neurons in RBF, and epsilon and sigma in SVR are determined by various methods proposed in articles about this issue. Various values are examined to obtain the best conclusions in terms of the accuracy and performance of ANNs for test data (unseen data).
Therefore, in the preprocessing stage, a new method has been proposed to identify optimal time delays for time series of natural gas price. The steps of the new approach to identify optimal time delays of time series are expressed as follows: Step 1: Determining input and target vectors for each delay.
Step 2: Normalizing input and output vectors.
Step 3: Calculating the Euclidean distance between input and output vectors for each delay.
Step 4: Calculating the ratio between the consecutive distances in pairs.
Step 5: Determining larger ratios as optimal delays. Then, wavelet decomposition is used to study natural gas prices and reduce noise and increase forecasting accuracy. The mentioned time series data were decomposed by wavelet decomposition into four levels as the best number of levels (w1, w2, w3, w4).
To model the uncertainty, the fuzzy transforms with different membership functions are used. Time-series are inherently uncertain which the fuzzy approach is used for importing uncertainty to the system. In a fuzzy system, membership functions play an important role. Therefore, in this study, five different and well-known membership functions are used in a fuzzy transform called Triangular type 1, Gaussian, Trapezoidal, Triangular type 2, and Bell-shaped. F-transform is applied for decomposing the original time series, and therefore, the decomposed data are used as inputs (F1, F2, F3, F4, F5, F6). Figure 1 shows the results of the fuzzy transformation with the Triangular type 1 membership function for the time series of natural gas prices. Figures 1A-1F show the different levels of inputs based on the Triangular type 1 function.
Afterward, the decomposition techniques and combinations are integrated with MLP, RBF, GMDH, ANFIS, and SVR. The efficiency and accuracy of the neural networks are evaluated by the RMSE and R (correlation coefficient). In summary, this study uses wavelet decomposition and fuzzy transform in the preprocessing phase. Also, MLP, RBF, GMDH, ANFIS, and SVR are applied for forecasting the natural gas price. Finally, an ensemble method combines the results. These steps are shown in Fig. 2.

DATA AND RESULTS
Energy is supplied by various carriers such as oil, gas, etc. According to the effective role in the development and economics of countries and due to the limited resources, price forecasting is an essential tool for the short-term energy market. In this study, for all models, 70% and 30% of data are considered the train and test data, respectively. Also, optimal time delays are identified according to the new method presented in the previous section. According to the calculations, delay 1 has the largest ratio and is considered the optimal delay. Then, other delays with higher ratios are added to the delay 1 respectively, until there is no improvement in the neural network performance. Finally, delays 1, 2, and 5 are considered optimal delays for the natural gas price time series. The identified delays are rational because the current day price is related to one, two, and five working days ago. The same delays are used in all calculations for the gas price time series.
First, the original time series is considered the input of neural networks without wavelet decomposition and fuzzy transformation, with time delays of 1, 2, and 5. In this study, the correlation coefficient (R) is a criterion that indicates the severity of the relationship as well as the type of relationship (direct or reverse) between the two set variables (targets and outputs). According to the results, the RBF performs the best result for the original natural gas price time series. According to the results in Fig. 3, the root-meansquare error (RMSE) of the RBF for time series is 0.1483, and the correlation coefficient is 0.95731.
In the next examination, the wavelet decomposition and various neural networks are integrated.
In Table 3, the original time series of natural gas price is decomposed by wavelet decomposition into four levels. The levels (w1-w2-w3-w4) are considered the input of neural networks. Table 3 shows the results of implementing neural networks for four levels. The sum of noises that each wavelet level has derived from the original data is calculated. The sum of noises for Levels 1 to 4 is equal to 203.6893, 280.7320, 330.8307, and 435.7385, respectively. Also, the sum of the cumulative noise taken up to Level 4 is 1251. In Table 3, 20 combinations are evaluated. The W4-MLP combination has the best performance. However, its performance is very little different from the performance of other networks. According to Fig. 4, the root-mean-square error of the W4-MLP is 0.061674, and the correlation coefficient is 1.  For reducing noise and increasing the accuracy, the natural gas price time series was decomposed by wavelet transform into four levels. According to Table 4, it should be determined which of these levels are more similar to the original data. In this study, to examine the behavior and performance of the presented algorithms, the noise between original and decomposed data has been calculated and presented in Table 4. Calculating noise between original data and decomposed data has been used by two criteria (RMSE and R). Less root-mean-square error and high correlation coefficient mean less difference between the original data and decomposed data. Of course, it does not mean which the decomposed data inside itself has less noise and swing. Table 4 compares the calculated values by RMSE and R. It depicts W1-ANFIS has the most similarity to the original time series. In other words, the W1-ANFIS has the least noise toward the original data and has a closer behavior to it. Also, W1-MLP, W1-GMDH, and W1-RBF have similar behavior to the original time series. In the next stage, the fuzzy neural networks with different membership functions are used (see Fig. 2). According to this figure, first, the original time series is decomposed into six levels by fuzzy transform with five membership functions, then the fuzzy levels (F1-F2-F3-F4-F5-F6) are considered inputs of each model.
In Table 5 (second and third columns), the Triangular type 1 membership function is used. They show the results of implementing neural networks for six fuzzy levels with Triangular type 1 membership function. In these columns, 30 combinations are considered. According to the mentioned columns, the MLP has the best performance for Level 6 of the fuzzy transform. The root-mean-square error of the MLP for Level 6 is 0.00010555, and the correlation coefficient is 0.99949 (see Fig. 5).
In the fourth and fifth columns of Table 5, the Gaussian membership function is used in the fuzzy transform structure. They represent the results of implementing neural networks for six fuzzy levels with the Gaussian membership function. In these columns, Figure 4 The results of the MLP neural network for Level 4 obtained from the wavelet decomposition.
Full-size  DOI: 10.7717/peerj-cs.409/ fig-4 30 combinations are evaluated. According to the results, the SVR has the best performance for Level 1. According to Fig. 6, the root-mean-square error of the SVR for Level 1 is 0.0002834, and the correlation coefficient is 0.95448. In the sixth and seventh columns of Table 5, the Trapezoidal membership function is used in the fuzzy transform structure. They depict the performance of neural networks for six fuzzy levels with the Trapezoidal membership function. In these columns, the time series is forecasted based on 30 different combinations. According to the results, the ANFIS has the best performance for Level 6. Figure 7 presents the root-mean-square error and the correlation coefficient for Level 6 with values 0.00063896, and 0.9625 respectively.
According to the eighth and ninth columns of Table 5, the Triangular type 2 membership function is used in the fuzzy transform structure. They show the results of implementing neural networks for six fuzzy levels with Triangular type 2 membership function. In these columns, 30 different combinations are evaluated. SVR has the best performance for Level 3. According to Fig. 8, the root-mean-square error of the SVR for Level 3 is 0.00016989, and the correlation coefficient is 0.95957.
According to Table 5 (10th and 11th columns), the Bell-shaped membership function is used. These columns present the results of implementing neural networks for six fuzzy levels with the Bell-shaped membership function. According to this table, 30 combinations are considered. The SVR has the best performance for Level 3 of the fuzzy transform.  Figure 9 shows the root-mean-square error and the correlation coefficient for Level 3 with values of 0.000046212 and 0.98144. In the final stage, the combination of wavelet decomposition and fuzzy transformation is done. First, the original time series is decomposed by wavelet transform into four levels. Then, the levels decomposed by wavelet are considered the input of fuzzy transform with different membership functions and are subdivided into six levels. Continuing, these compounds are considered the input of neural networks.
In each wavelet-fuzzy combination with a distinct membership function, 120 different combinations are evaluated. Table 6 presents the best combination in terms of performance and accuracy. In this stage, 600 different combinations are made to forecast the natural gas price. In the combination of wavelet and fuzzy transform with the Triangular type 1 membership function, W1-F5-RBF is the best performing among 120 compounds. Combined with the Gaussian, Trapezoidal, Triangular type 2, and Bellshaped membership functions, W4-F5-MLP, W2-F1-MLP, W4-F5-ANFIS, and W4-F6-ANFIS have the best performance, respectively. By comparing the results of decomposition techniques, the wavelet decomposition performed better result than others. Then, the fuzzy transform with the Triangular type 1 membership function has the best performance. Similarly, the combination of wavelet decomposition and fuzzy transform separately, the Triangular type 1 and the Bell-shaped membership functions have appropriate performance. In Table 7, in the single model, the original time series is considered the input of neural networks. In this state, no decomposition technique is used. The results of the single model for each network are presented in Table 7. Then, decomposition techniques are applied. In DWT, the first original data is decomposed into four levels. These four levels are integrated and considered the input of neural networks. The results of the DWT are presented in Table 7. In the fuzzy transform, the first original data is considered the input of fuzzy transform with five membership functions. The original data is decomposed into six levels by the fuzzy transforms with various membership functions. Thirty levels are obtained from fuzzy transforms with five membership functions, and they are integrated  Table 7. According to Table 8, in the single model, original data is considered input of neural networks. Then, the output of neural networks is calculated, and these output vectors are integrated. According to the obtained output and target, the root-mean-square error and the correlation coefficient are calculated. In the combined model, different levels are integrated and considered input of neural networks. Then, the outputs of neural networks are calculated and integrated. According to the obtained target and output, the root-mean-square error and the correlation coefficient are calculated (see Table 8).
The results and accuracy of forecasting are improved when the decomposition techniques are used. In the single model, five levels are integrated, but in the combination model, 154

LIMITATIONS
Despite the good results that the proposed ensemble approach theoretically provides, in practice, price time-series forecasting is faced with challenges that limit their use and, in some cases, make it impossible. Algorithms use methods such as violet, which not only removes noise in the data but also changes the dimensionality of the data. Prices in the real world are accompanied by noise, so removing noise can change the data's nature.
Although the use of fuzzy transformations reduces the accuracy of the models, it preserves the data's nature. Also, time-series data are strongly correlated with delay 1 (see related figures in "Data and Results"). It causes the forecasting to be done with one delay phase.
To remove this problem, the smoothed price, such as moving average or exponential moving average may be used instead of the original price.

SUMMARY AND CONCLUSION
In this research, a new approach is presented to identify optimal time delays. Also, for removing the noise from data, the wavelet decomposition has been used. On the other hand, for modeling uncertainty of gas price time-series that influenced by political, economic, social, etc., factors, the fuzzy transform with different membership functions is used. Afterward, the wavelet decomposition and fuzzy transformation are combined for improving the accuracy of forecasting and modeling uncertainty. Finally, the best-selected models are integrated as an ensemble method for obtaining the best accuracy and performance. In this paper, for the time-series of natural gas prices, optimal time delays (1, 2, and 5) were identified by a new proposed approach. Then, wavelet decomposition and fuzzy transform with different membership functions are used in the preprocessing stage. In this study, four models were presented to forecast the natural gas price. In the first model with the input of the original time series, RMSE and R of RBF were 0.1483 and 0.95731 respectively as the best performance. In the second model or wavelet neural networks, the values of RMSE and R of W4-MLP have been obtained 0.00061674 and 1, respectively, as the best performance. In the third model or fuzzy neural networks, the values of RMSE and R of F6-MLP with Triangular type 1 membership function were 0.00010555 and 0.99949, respectively, as the best performance. In the fourth model or wavelet-fuzzy neural networks, the values of RMSE and R of W1-F5-RBF combination have been obtained 0.00011825 and 0.99875, respectively, as the best performance. Finally, by integrating the forecasting results for the original time series of natural gas price, the values of RMSE and R were 0.75907 and 0.95513, respectively. The combination of the forecasting results for time series decomposed by wavelet and fuzzy transforms has been performed, and the values of RMSE and R have been calculated 0.966634 and 0.99803, respectively. All four models, in terms of RMSE and R, are favorable but comparing the four models, the second model considering the correlation coefficient, performed better than the other three models.

ADDITIONAL INFORMATION AND DECLARATIONS Funding
The authors received no funding for this work.