Hybrid wavelet-neural network models for time series

The use of wavelet analysis contributes to better modeling of financial time series in both the time and frequency domains. In this study, S&P500 and NASDAQ data are separated into several components utilizing multiresolution analysis (MRA). Subsequently, each component is modeled using an appropriate neural network structure. In addition, wavelets are used as an activation function in long short-term memory (LSTM) networks to form a hybrid model. The hybrid model is merged with MRA as the method proposed in this paper. Four distinct strategies are employed: LSTM, LSTM+MRA, hybrid LSTM-Wavenet, and hybrid LSTM-Wavenet+MRA. Results show that using MRA and wavelets as an activation function together reduces the error the most.


Introduction
Plenty of time series in many fields are non-stationary and chaotic. For such complex time series, nonlinear models are more adequate than linear models (see [1]). Considering their wide variety and adaptability, neural networks are among the most used and widespread nonlinear models.
The performance rankings of the methods used in the related literature may vary according to the data type and parameter selection [4,5,14,15,18,27,35,36]. For this reason, instead of searching for the best model among classical or state-of-the-art methods, this study focuses on whether the LSTM model, one of the state-of-the-art methods, can be improved. While doing so, benchmarking against several other methods is carried out as well.
Wavelets are used in a wide variety of fields, and their scope keeps growing. For complex time series, the frequency domain is crucial in most situations. As data get more complex, examining them in both the time and frequency domains improves data analysis, modeling, and prediction [1,37,38]. Moreover, for both linear and nonlinear time series, multiresolution analysis (MRA) gives better modeling and forecasting results [1].
In this paper, making better predictions for stock market indexes is investigated to guide the buying and selling decisions of investors. Furthermore, several up-to-date applied studies and competitions on online platforms cover traditional and advanced machine learning topics. Therefore, this study contributes to both the literature and such applied work by using MRA and wavenets with LSTM networks.
A wavenet is a type of wavelet neural network (WNN) in which the translation and dilation parameters are not trainable. In other words, these parameters do not change in the activation functions. In this research, activation functions are generated using the polynomial powers of sigmoid (PPS), and model accuracy is increased by the use of MRA and LSTM.
Developing models by taking advantage of both neural networks and wavelets is a method used by many researchers. Jothimani et al. [39] combine wavelets and nonlinear models to predict stock market data. They show that the accuracies of hybrid models are higher than those of classical models. Similarly, the study of Chandar et al. [40] also illustrates that utilizing wavelets with neural networks increases the accuracy on stock market data.
Jin and Kim [41] make predictions of natural gas prices using the ARIMA model and ANNs. They conclude that using wavelets with ANN and ARIMA gives better results than using the ANN and ARIMA models alone, respectively.
A wavelet-based neural network structure for two deep learning models in time series classification and forecasting is studied by Wang et al. [42]. Results again show that the hybrid techniques improve on the outcomes of the models without wavelets.
Arévalo et al. [43] report that using wavelet analysis for high-frequency financial data increases the accuracy of each of the ARIMA, deep neural network (DNN), gated recurrent unit (GRU), and LSTM methods.
Some studies apply wavelets only for data decomposition, using the sub-series as ordinary neural network inputs [45–50], while other works utilize wavelets only in the activation functions of a WNN [51–55]. Both MRA and WNNs provide great benefits in many fields. Therefore, the motivation of this paper is to benefit from wavelet theory in multiple ways to increase the accuracy of the LSTM model for time series prediction. This paper aims to bridge the gap between these two different approaches by combining them.
The paper is organized as follows. Preliminary knowledge, the fundamental concepts used in this paper, and the essentials of the LSTM are provided in Section 2.1. Further, in Section 2.2, the significant technical points in using wavelets are addressed; the maximal overlap discrete wavelet transform (MODWT) and MRA are briefly explained. Finally, in Section 2.3, the essential details of PPS are listed, as the wavelets used in the empirical analysis are derived from PPS. In Section 3, LSTM and wavelets are used to model the S&P500 and NASDAQ financial time series using four different methods: the LSTM model without MRA, the LSTM model with MRA, the hybrid LSTM-Wavenet model without MRA, and the hybrid LSTM-Wavenet model with MRA. The results obtained by these methods are fully discussed in Section 4; in addition, the results of classical statistical and machine learning models are compared with the proposed approach. A brief conclusion as well as a possible extension of the methodology is given in Section 5. Finally, the Appendix contains the configurations of the models used in the paper.

Preliminaries and the fundamental concepts
In this section, short preliminary knowledge of LSTMs as well as wavelets is given; however, for the methodology used in the paper to be understood well, the fundamentals of MRA as well as the WNN are also recalled briefly.

LSTM
LSTM, a specific type of RNN, is used in this study for financial time series analysis. Because financial data fluctuates over both short and long time intervals, the network structure needs to possess various memories for different time gaps.
Hochreiter and Schmidhuber [56] list several advantages of LSTM, while Viswanath [57] describes differences between ANN, RNN, and LSTM. The mathematical formulation of an RNN can be described by

h_t = f_H(W_IH x_t + W_HH h_{t-1}),
y_t = f_O(W_HO h_t),

where h_t, x_t, and y_t are the hidden state, input, and output vectors; W_IH, W_HH, and W_HO are the weight matrices; and f_H and f_O are the activation functions for the hidden and output parts, respectively. An RNN can suffer from vanishing or exploding gradient problems; LSTM alleviates these obstacles by appending additional parts such as the input gate, the forget gate, and the output gate. Therefore, LSTM is a more suitable choice for time series modeling and prediction.
In Fig. 1 a one-unit LSTM is illustrated; in particular, the equations for estimating the output h_t of the memory cell at time t are given as follows:

F_t = σ(W_F x_t + U_F h_{t-1} + b_F),      (3)
I_t = σ(W_I x_t + U_I h_{t-1} + b_I),      (4)
C̃_t = tanh(W_C x_t + U_C h_{t-1} + b_C),   (5)
C_t = F_t ⊙ C_{t-1} + I_t ⊙ C̃_t,           (6)
O_t = σ(W_O x_t + U_O h_{t-1} + b_O),      (7)
h_t = O_t ⊙ tanh(C_t),                     (8)

where x_t is the input vector to the LSTM at time t, the W's and U's are weight matrices of the input and recurrent connections, and the b's are bias vectors. h_t in (8) is the output vector of the LSTM cell, C_t in (6) and C̃_t in (5) are the state and candidate state vectors respectively, F_t in (3) contains the forget gate values, I_t in (4) the input gate values, and O_t in (7) the output gate values. For further details of LSTM and how it works, one may refer to [58].
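As an illustration of Eqs. (3)–(8), a one-unit LSTM step can be written directly in NumPy; the weight shapes and the random initialization below are assumptions for demonstration only, not the paper's configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM memory-cell step mirroring Eqs. (3)-(8)."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])    # forget gate (3)
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])    # input gate (4)
    c_hat = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # candidate state (5)
    c_t = f_t * c_prev + i_t * c_hat                          # cell state (6)
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])    # output gate (7)
    h_t = o_t * np.tanh(c_t)                                  # output vector (8)
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 2, 4  # e.g. (S&P500, NASDAQ) inputs, 4 hidden units
W = {k: 0.1 * rng.normal(size=(n_hid, n_in)) for k in "fico"}
U = {k: 0.1 * rng.normal(size=(n_hid, n_hid)) for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(np.array([0.5, -0.3]), h, c, W, U, b)
```

Because h_t = O_t ⊙ tanh(C_t) and each gate output lies in (0, 1), every entry of h is bounded by 1 in absolute value.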

Wavelets
A signal can be split into high-frequency and low-frequency components over appropriate time intervals with the aid of the wavelet transform. Since financial time series exhibit high-frequency behavior over short time spans and low-frequency behavior over lengthy time periods, the use of wavelets is appropriate and of great help.

MODWT
In theory, the continuous wavelet transform (CWT) is applied to continuous functions rather than to discrete signals, as Masset remarks in [37]. Accordingly, MRA with sampled wavelets is described here by using MODWT.
MODWT has some advantages over the DWT. MODWT does not demand a dyadic-length time series, whereas DWT does; in other words, DWT restricts the data to a length of N = 2^J, where J is the scale level. For MODWT, the sizes of the wavelet and scaling coefficients are equal to the length of the original time series at every step of the transform. Besides, MODWT is time-shift invariant, while DWT is influenced by time shifts, as stated in [59]. Therefore, the variance analysis (i.e., scale-based analysis of variance) of MODWT is more productive than that of DWT, as mentioned in [60].
Let w hold the wavelet and scaling coefficients of the MODWT, where the length of w_j (wavelet coefficients) is N/2^j and the length of v_J (scaling coefficients) is N/2^J; these are consistent with the scale sizes λ_j = 2^(j-1) and λ_J = 2^(J-1), respectively, for j = 1, 2, …, J. The vector w is obtained by using high-pass and low-pass filters as mentioned in [59]:

w = W ξ,

where W = [W_1, W_2, …, W_J, V_J]^T is a (J+1)N × N matrix in which each W_j and V_J is an N × N matrix, and ξ is the underlying time series (signal).
The high-pass and low-pass filters are convolved with the time series to obtain the wavelet and scaling coefficients of the first level as follows:

w_1(t) = Σ_l h_l ξ(ṫ),   v_1(t) = Σ_l g_l ξ(ṫ),

where t = 0, 1, …, N − 1 and ṫ = t − l (mod N). The wavelet and scaling coefficients of the second level are captured by convolving v_1(t) with the high-pass filter h_l and the low-pass filter g_l. After J = log_2 N convolution iterations, the wavelet and scaling coefficients become

w_J(t) = Σ_l h_l v_{J-1}(ṫ),   v_J(t) = Σ_l g_l v_{J-1}(ṫ),

where ṫ = t − 2^(J-1) l (mod N).
As (11) and (12) are convolved with the high-pass and low-pass filters respectively, the scaling coefficients of the previous level are recovered by summing the two convolved parts [59]:

v_{J-1}(t) = Σ_l h_l w_J(ṫ) + Σ_l g_l v_J(ṫ).

This scheme is iterated down to the first level of wavelet and scaling coefficients to recover the original time series, written in the form

ξ(t) = Σ_l h_l w_1(ṫ) + Σ_l g_l v_1(ṫ),

where ṫ = t + l (mod N).
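A minimal numerical sketch of this pyramid, using the Haar filters (the DWT Haar filters rescaled by 1/√2, as MODWT requires); the explicit double loop is for clarity rather than efficiency, and the filter choice is an assumption (the empirical analysis later uses db2).

```python
import numpy as np

def modwt_level(v_prev, h, g, j):
    """One MODWT pyramid step: circular filtering with the filters
    upsampled by 2**(j-1), matching t-dot = t - 2**(j-1) * l (mod N)."""
    N, L = len(v_prev), len(h)
    w_j, v_j = np.zeros(N), np.zeros(N)
    for t in range(N):
        for l in range(L):
            tdot = (t - 2 ** (j - 1) * l) % N
            w_j[t] += h[l] * v_prev[tdot]   # wavelet coefficients
            v_j[t] += g[l] * v_prev[tdot]   # scaling coefficients
    return w_j, v_j

# Haar MODWT filters: the DWT filters (1, -1)/sqrt(2) rescaled by 1/sqrt(2)
h = np.array([1.0, -1.0]) / 2.0
g = np.array([1.0, 1.0]) / 2.0

x = np.sin(np.linspace(0, 4 * np.pi, 64))
w1, v1 = modwt_level(x, h, g, 1)   # level 1 (v_0 = x)
w2, v2 = modwt_level(v1, h, g, 2)  # level 2
# MODWT keeps the full length N at every level and preserves energy:
# ||x||^2 = ||w1||^2 + ||w2||^2 + ||v2||^2
```

Both properties claimed for MODWT above (full-length coefficient vectors at every level, energy preservation) can be checked directly on `w1`, `w2`, `v2`.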

MRA
An MRA is a sequence of closed nested subspaces {V_j} of L²(R) satisfying:

1. ⋯ ⊂ V_{-1} ⊂ V_0 ⊂ V_1 ⊂ ⋯ (the subspaces are nested),
2. ⋃_j V_j is dense in L²(R),
3. ⋂_j V_j = {0},
4. f(x) ∈ V_j ⟺ f(2x) ∈ V_{j+1} (the spaces V_j are self-similar),
5. f(x) ∈ V_0 ⟺ f(x − k) ∈ V_0 for all integers k,
6. V_{j+1} = V_j ⊕ W_j, where W_j is the jth resolution level and V_j ∩ W_j = {0}.
Scaling functions ϕ_{j,k} and wavelet functions ψ_{j,k} generate bases for the V_j and W_j subspaces by applying the dilation and translation parameters, respectively:

ϕ_{j,k}(x) = 2^{j/2} ϕ(2^j x − k),   (15)
ψ_{j,k}(x) = 2^{j/2} ψ(2^j x − k).   (16)

After defining the MRA by utilizing the subspaces in (15) and (16), the space L²(R), or any function in it, can be written as a direct sum of V_0 and the W_j subspaces, since

L²(R) = V_0 ⊕ W_0 ⊕ W_1 ⊕ W_2 ⊕ ⋯.   (17)

Now, it should be clear that any function f can be given in the form of a wavelet series expansion:

f(x) = Σ_k c_k ϕ_{0,k}(x) + Σ_{j≥0} Σ_k d_{j,k} ψ_{j,k}(x),   (18)

in which the first sum (with the scaling function) represents the smooth part and the second sum represents the detailed parts of the function. The following integrals are used to find the coefficients of the expansion of f in (18):

c_k = ∫ f(x) ϕ_{0,k}(x) dx,   d_{j,k} = ∫ f(x) ψ_{j,k}(x) dx.   (19)

In practice, the scale level J is chosen to be finite, and hence the function in (18) can be written for a time series as follows:

f(t) = Σ_k A_{J,k} ϕ_{J,k}(t) + Σ_{j=1}^{J} Σ_k D_{j,k} ψ_{j,k}(t).   (20)

Using the celebrated Mallat pyramid algorithm [64], MODWT divides (20) into smooth and detailed parts, where the component A_{J,k} keeps the average information (or trend) of the original data at the largest scale and is associated with the scaling coefficients. The components D_{j,k}, from the first scale to the last, are linked with the wavelet coefficients; they accumulate the higher-frequency information [37].
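The additive structure of the MRA (original series equals approximation plus details) can be verified with a minimal one-level Haar MODWT sketch; the Haar filter and the random-walk toy data are assumptions for illustration only.

```python
import numpy as np

def haar_modwt(x):
    """One-level Haar MODWT (filters rescaled by 1/2, energy-preserving)."""
    xm1 = np.roll(x, 1)              # x(t-1), circular boundary
    w1 = (x - xm1) / 2.0             # wavelet (detail) coefficients
    v1 = (x + xm1) / 2.0             # scaling (smooth) coefficients
    return w1, v1

def haar_mra(w1, v1):
    """Time-domain MRA components: detail D1 and approximation A1."""
    d1 = (w1 - np.roll(w1, -1)) / 2.0
    a1 = (v1 + np.roll(v1, -1)) / 2.0
    return d1, a1

x = np.cumsum(np.random.default_rng(2).normal(size=128))  # random-walk "price"
w1, v1 = haar_modwt(x)
d1, a1 = haar_mra(w1, v1)
assert np.allclose(d1 + a1, x)  # additive MRA: x = A1 + D1
```

The final assertion is exactly the decomposition used later in the paper: the model predictions for A and the D's are summed (via the inverse MODWT) to reconstruct the original series.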

WNN
The goal of this part is to benefit from both wavelets and neural networks for better modeling of time series. For complex data, such as financial time series, the WNN is functional.
Traditional activation functions such as logistic, hyperbolic tangent, rectified linear unit (ReLU), softmax, etc., or some custom linear/nonlinear activation functions are used in neural networks.
On the other hand, the activation functions of a WNN consist of wavelets in the hidden-layer neurons. The hidden-layer neurons in a WNN are named wavelons [65–67]. Wavelons have two parameters, termed translation and dilation. The single wavelon is given by

ψ_{u,v}(x) = ψ((x − u)/v),   (22)

where u is the translation (or location) and v is the dilation (or scale) parameter.
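For concreteness, here is a single wavelon with the Mexican-hat (Ricker) mother wavelet, used purely as an example mother wavelet, since the wavelets in this paper come from PPS:

```python
import numpy as np

def mexican_hat(x):
    """Ricker (Mexican-hat) mother wavelet, a common WNN choice."""
    return (1.0 - x**2) * np.exp(-(x**2) / 2.0)

def wavelon(x, u, v):
    """Single wavelon: psi((x - u) / v) with translation u and dilation v."""
    return mexican_hat((x - u) / v)

y = wavelon(np.linspace(-5.0, 5.0, 101), u=1.0, v=2.0)  # localized around x = 1
```

The translation u shifts where the wavelon responds, and the dilation v widens or narrows its support, which is why their initialization matters, as discussed next.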
Wen et al. [68] state that since wavelets are rapidly vanishing functions, it is important not to select dilation parameters that are too small. Moreover, Radhwane and Bereksi [69] point out that random initialization of the translation and dilation parameters may result in overly local wavelets.

Wavenets
Veitch [70] states that if the translation and dilation parameters given in (22) are fixed during the learning process, then the network is called a wavenet. In this study, wavenets are utilized.
The parameters specified in (22) may differ for each node in the hidden layer. These parameters need to be initialized according to the data (see Section 3). In this study, the wavelet functions ψ are acquired from PPS. Marar and Bordin [71] state that there are many limitations in the traditional backpropagation algorithm, and hence a family of polynomial wavelets generated from powers of sigmoid functions is used to eliminate these limitations.

PPS
Fernando et al. [72] declare that a group of polynomial wavelets created from the powers of the sigmoid provides robust neural networks, particularly WNNs. Consecutive powers of sigmoid functions are used to produce polynomial types of wavelet functions so that the square integrability and admissibility conditions,

∫ |ψ(x)|² dx < ∞   and   C_ψ = ∫ |Ψ(ω)|² / |ω| dω < ∞,   (23)

are satisfied, where ω is the frequency and Ψ(ω) is the Fourier transform of ψ(x).
Particularly, consider the following sigmoid function:

f(x) = 1 / (1 + e^{−αx}),   (24)

where α is the smoothness constant. To create the wavelet family from the sigmoid function in (24), the nth power of the sigmoid function and the set of all powers of the sigmoid function are used.
The polynomial wavelet function (25) is given in [73] in terms of the derivatives of the sigmoid function, where n is the order of the derivative of the sigmoid function and f^{(k+1)}(x) is the (k+1)th derivative of the sigmoid function.
In particular, the first, the second, and the third polynomial types of wavelet functions in (25) use consecutive powers of sigmoid functions. Note that each member of the resulting family of polynomial wavelets satisfies ψ_i ∈ L²(R).
Consequently, the conditions (23) hold not only for this family but also for shifted and dilated versions of this family.
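The fact underlying PPS is that f′(x) = αf(x)(1 − f(x)), so every higher derivative of the sigmoid is a polynomial in f itself. A quick numerical check of this identity (the second-derivative polynomial below is derived by hand from it, not quoted from (25)):

```python
import numpy as np

def sigmoid(x, alpha=1.0):
    return 1.0 / (1.0 + np.exp(-alpha * x))

alpha = 2.0
x = np.linspace(-3, 3, 601)
f = sigmoid(x, alpha)

# f' = alpha * f * (1 - f): a polynomial in f
num_d1 = np.gradient(f, x)                    # numerical derivative
ana_d1 = alpha * f * (1 - f)
assert np.allclose(num_d1, ana_d1, atol=1e-3)

# f'' is again a polynomial in f: alpha^2 * (f - 3 f^2 + 2 f^3)
num_d2 = np.gradient(num_d1, x)
ana_d2 = alpha**2 * (f - 3 * f**2 + 2 * f**3)
assert np.allclose(num_d2, ana_d2, atol=1e-2)
```

Since each derivative is a polynomial in the bounded, rapidly saturating f, the resulting functions are square integrable, consistent with the conditions in (23).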

Configurations and empirical results
The flowchart of the LSTM, LSTM+MRA, hybrid LSTM-Wavenet, and hybrid LSTM-Wavenet+MRA methods is illustrated in Fig. 2. A decision point determines whether MRA will be used or not; it shows whether the algorithm is working with or without MRA. The general flow for both cases includes data preprocessing, hyperparameter optimization, modeling and learning, and prediction and visualization. The main parts of the flowchart are described below.
Data preprocessing. In this step, it is decided which parts of the data will be used. The selected financial data (S&P500 and NASDAQ) are combined, and this fused data is normalized between 0 and 1. Normalization for the ith input is done via

x̃_i = (x_i − x_min) / (x_max − x_min),

where x_i is the input value and x_max and x_min are the maximum and minimum values of the data. The data is then converted to the structure of a supervised learning problem by using certain time-steps. The modified input is eventually divided into the train set, the validation set, and the test set. On the other hand, one should also notice that when MRA is used, the data is decomposed after the data selection and integration parts of preprocessing. All decomposed data levels are normalized and transformed into a supervised learning problem, and each level is subdivided into the train set, the validation set, and the test set. Later on, hyperparameter optimization is done for each subseries by Talos optimization and hand-tuning. If wavelets are used as an activation function, then the model is named the hybrid LSTM-Wavenet model with MRA; otherwise, the model is called the LSTM model with MRA. After predicting the train and test parts for each level by using Monte Carlo estimates, the inverse MODWT is applied to the predictions to reconstruct the original time series.
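A minimal sketch of these preprocessing steps (min-max normalization and conversion to a supervised sliding-window problem); the toy data and the choice of predicting the first column are assumptions for illustration:

```python
import numpy as np

def minmax_scale(x):
    """Column-wise min-max normalization to [0, 1]; also returns
    (min, max) so predictions can be denormalized later."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min), (x_min, x_max)

def to_supervised(series, timesteps):
    """Slide a window of `timesteps` rows over an (N, features) array;
    the target is the next row's first feature."""
    X, y = [], []
    for i in range(len(series) - timesteps):
        X.append(series[i : i + timesteps])
        y.append(series[i + timesteps, 0])
    return np.array(X), np.array(y)

# toy stand-in for the fused (S&P500, NASDAQ) closing prices
data = np.column_stack([np.linspace(100, 200, 50), np.linspace(1000, 2000, 50)])
scaled, (lo, hi) = minmax_scale(data)
X, y = to_supervised(scaled, timesteps=10)
# X has shape (samples, timesteps, features), as LSTM layers expect
```

In the MRA branch, exactly the same two functions would be applied to each decomposed level separately before the train/validation/test split.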
The whole analysis is done using the Python language in Anaconda. To build the ANNs, the Keras library, which uses the TensorFlow backend, is used [76].
Batch normalization is applied in each analysis between the LSTM and dense layers to produce standardized activations with a mean of zero and a standard deviation of one. In [77,78] it is claimed that batch normalization has advantages such as reducing overfitting, speeding up the training process, and improving accuracy/decreasing loss values.
The specifications of the computing environment the codes run on, as well as the error metrics used, are given in the Appendix. The S&P500 is a USA-based stock market index that tracks 500 large corporations registered on stock exchanges; it includes finance, health care, industry, energy, information technology, and many other sectors. The NASDAQ Composite is another important USA-based stock market index, one that emphasizes the information technology sector. Because the S&P500 covers almost every sector, it is less volatile than the NASDAQ; the NASDAQ is considered the riskier of the two, while the S&P500 is considered relatively safe. Portfolio diversification by investing in both less risky and riskier markets is crucial for market players and investors.

Data sets
In Figs. 3(a) and 3(b), almost 70% of the data is in the train set (length 6860), about 15% is in the validation set (length 1485), and nearly 15% is in the test set (length 1485). There are large changes in values between the training, validation, and test sets. There are also fluctuations: the minimum, maximum, mean, and standard deviation of both S&P500 and NASDAQ are quite different across the training, validation, and test sets; hence classical methods may not be appropriate choices and cannot be expected to work efficiently.

LSTM model without MRA
Firstly, the closing prices of the S&P500 and NASDAQ indices are concatenated. In each sample, index 0 corresponds to the closing price of the S&P500 and index 1 to the closing price of the NASDAQ. Using a ten-day window, the next day's closing price is predicted.
Two hidden layers are used: the first consists of LSTM nodes, and the second is a regular densely-connected neural network layer. Kernel, recurrent, and bias regularizers are used in the LSTM layer. Batch normalization and a ReLU activation function are inserted between the LSTM and dense layers. In the dense part, kernel and bias regularizers are utilized.
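The layer stack just described can be sketched in Keras; the unit count, regularization strengths, and output dimension below are placeholders, not the tuned Talos values from the Appendix.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_model(timesteps=10, n_features=2, n_units=16):
    """LSTM layer with kernel/recurrent/bias regularizers, followed by
    batch normalization + ReLU, then a regularized dense layer."""
    model = keras.Sequential([
        layers.Input(shape=(timesteps, n_features)),
        layers.LSTM(n_units,
                    kernel_regularizer=regularizers.l1_l2(1e-5, 1e-4),
                    recurrent_regularizer=regularizers.l1_l2(1e-5, 1e-4),
                    bias_regularizer=regularizers.l1_l2(1e-5, 1e-4)),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Dense(1,
                     kernel_regularizer=regularizers.l2(1e-4),
                     bias_regularizer=regularizers.l2(1e-4)),
    ])
    model.compile(loss="mse", optimizer="adam")
    return model
```

This plain-ReLU version corresponds to the LSTM model without MRA; the hybrid variants replace the LSTM activation with the PPS wavelets described below.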
After modeling the problem according to the configuration selected using Talos optimization and hand-tuning, the model is fit to the data and predicts the train/test data multiple times within a loop. Subsequently, the train/test predictions and the means of the error metrics are estimated.
Results are affected by tuning the L1 and L2 regularization in the kernel, recurrent, and bias regularizers for a fixed epoch value. The optimized Talos configuration is given in Table A.13. Training and test results obtained using the optimized parameters are given in Table 3 for S&P500 and NASDAQ.

LSTM model with MRA
In this section, the data is decomposed into two detail parts and one approximation part by applying MRA. Daubechies wavelets, particularly the ''db2=D4'' filter, and MODWT are used to decompose the time series to level two. The first detail, second detail, and approximation parts are obtained as subseries. Each subseries (length 9830) is split into the train, validation, and test sets: almost 70% of each level is used for training (length 6860), about 15% for validation (length 1485), and nearly 15% for testing (length 1485).
The same network form as built in Section 3.2 is used for the approximation level. On the other hand, a single LSTM hidden layer is used for the first and second detail levels independently. Each level is modeled by using Talos optimization and hand-tuning methods. The model is then fit to the data to predict the train and test datasets over 1000 experiments. After denormalizing the predictions, the inverse MODWT is applied to the train and test predictions separately. Lastly, the means of the error metrics are calculated and the train/test predictions are synthesized.
In Table A.14, the configurations of the first detail, second detail, and approximation parts are presented for S&P500 and NASDAQ. In Table 4, the means of 1000 reconstructed scores are given for S&P500 and NASDAQ. It is seen that using MRA improves the test results based on the RMSE, MAE, and MdAE metrics, except for the MdAE scores of S&P500.

Hybrid LSTM-Wavenet model without MRA
PPS is used to generate an internal activation function for the time series in this part. Details of the polynomial wavelets are given in Section 2.3.2. Data preparation, time-steps, and the network structure are the same as in Section 3.2.
When the polynomial wavelet function generated by the nth derivative of the sigmoid function is used for the nth node of the LSTM layer, the results are much worse than when a single polynomial wavelet function is used in each cell. Therefore, the polynomial wavelet function generated by the 6th derivative of the sigmoid function is used to create the wavelet activation function in each cell. Each activation function with index j is generated from ψ_6 by applying the translation and dilation parameters as in (22), where u is the translation (or location) and v is the dilation (or scale) parameter for j = 1, 2, …, 16, and the subscript 6 denotes the order of the derivative. The translation and dilation parameters are initialized as in (30), where α and β are, respectively, the minimum and maximum values in the training set, and n is the number of LSTM nodes.
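A sketch of how such an activation can be generated: taking α = 1 for simplicity, the nth sigmoid derivative is built as a polynomial in f via the identity f′ = f(1 − f), then evaluated at the translated and dilated argument. This is an illustrative reconstruction, not the paper's closed-form (25).

```python
import numpy as np
from numpy.polynomial import polynomial as P

def sigmoid_derivative_poly(n):
    """Coefficients (in powers of f) of the n-th derivative of the
    sigmoid f(x) = 1/(1+e^{-x}), using the identity f' = f(1 - f)."""
    p = np.array([0.0, 1.0, -1.0])          # f - f^2, i.e. f'
    for _ in range(n - 1):
        # chain rule: d/dx P(f) = P'(f) * f' = P'(f) * (f - f^2)
        p = P.polymul(P.polyder(p), np.array([0.0, 1.0, -1.0]))
    return p

def wavelet_activation(x, u, v, n=6):
    """PPS-style wavelet activation: n-th sigmoid derivative evaluated
    at the translated/dilated argument (x - u) / v."""
    f = 1.0 / (1.0 + np.exp(-(x - u) / v))
    return P.polyval(f, sigmoid_derivative_poly(n))

y = wavelet_activation(np.linspace(-10.0, 10.0, 5), u=0.0, v=1.0, n=6)
```

For example, `sigmoid_derivative_poly(2)` returns the coefficients of f − 3f² + 2f³, matching the hand derivation of f′′; the same routine with n = 6 gives the ψ_6-like shape used here.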
As a result, the initial translation and dilation parameters in (30) for the S&P500 data are u_1 = 40 and v_1 = 50. The same values are used for the initial parameters of the NASDAQ data, for the sake of identical configurations, giving the activation functions in (31), where j is the index of the activation functions. Two different approaches are used for the hybrid LSTM-Wavenet model without MRA. In the first method, the same wavelet activation function is used for all 16 LSTM nodes, with j fixed to 6 in (31). In the second strategy, different activation functions are created using (31) for j = 1, 2, …, 16. The second approach is named configuration by API, since the functional API of Keras is used to create the models.
In both approaches, the learning and prediction processes are carried out with the model created after selecting the model parameters using the Talos optimization and hand-tuning methods. Finally, the mean values of the predictions and errors are calculated.
In Table A.15, the configuration parameters of the first approach are given for S&P500 and NASDAQ. The results obtained are presented in Table 5 for S&P500 and NASDAQ. It is clear that the results are better than those of the non-hybrid methods.
The configuration parameters of the second approach (configuration by API) are given in Table A.16 for S&P500 and NASDAQ. The outcomes are shown in Table 6 for S&P500 and NASDAQ. Once again, the outcomes are superior to those of the LSTM and LSTM+MRA methods according to the RMSE, MAE, and MdAE metrics.

Hybrid LSTM-Wavenet model with MRA
MRA and the hybrid LSTM-Wavenet model are combined in this section in order to obtain the greatest possible benefit from MRA. The model structure of the proposed hybrid LSTM-Wavenet with MRA model is shown in Fig. 4. In substance, MRA is used to decompose the time series, and the wavenet form is applied to the approximation part. Finally, all outputs derived from the different decomposition levels are merged by the inverse wavelet transform.
MRA, data preparation, and selection of the time-steps are carried out in exactly the same way as in Section 3.3. However, in this case, the two different approaches mentioned in Section 3.4 are used to create activation functions by utilizing PPS.
Daubechies wavelets, particularly the ''db2=D4'' filter, and MODWT are employed to decompose the time series to level two, as selected in Section 3.3.
Each level is modeled by taking advantage of both Talos optimization and hand-tuning again. Following that, a thousand experiments are conducted for model fitting and prediction on the train/test sets. The means of the error metrics and the train/test predictions are then averaged for each wavelet level, and the mean error values of the reconstructed data are calculated. In Table A.17, the selected Talos configuration parameters of the first detail, the second detail, and the approximation parts are given for S&P500 and NASDAQ. In this configuration, j is set to 6 in (31) for the approximation part. The wavenet structure is not used in the first and second detail parts, since those levels may be considered noise, and the noise structure can be captured without the need for wavenets. In Table 7, the average error scores of the reconstructed train and test data are given for S&P500 and NASDAQ for this configuration 1.
In Table A.18, the second set of Talos configuration parameters of the first detail, the second detail, and the approximation parts are given for the S&P500 and NASDAQ time series. Here, the second approach (configuration by API) mentioned in Section 3.4 is used to create the activation functions. In Table 8, the results are given for S&P500 and NASDAQ with respect to configuration 2. Both configurations of the hybrid LSTM-Wavenet model with MRA outperform the LSTM, LSTM+MRA, and hybrid LSTM-Wavenet methods. It is seen that configuration 1 is slightly better than configuration 2 when their respective RMSE, MAE, and MdAE metrics for both S&P500 and NASDAQ are compared.

Discussion of the results and comparison with classical methods
Table 9 summarizes the results obtained for S&P500 and NASDAQ. It should not be a surprise that using MRA improves the capability of both the LSTM and the hybrid LSTM-Wavenet models. Using different dilation and translation parameters for each activation function does not change the results significantly compared to using constant dilation and translation parameters for all LSTM nodes.
The proposed method (hybrid LSTM-Wavenet+MRA) outperforms all the methods, LSTM, LSTM+MRA, and hybrid LSTM-Wavenet, for financial time series in terms of the error metrics.Hence, it is obvious that using wavelets in both MRA and activation functions improves the train and test performances.
In Section 3.1, it is noted that the training, validation, and test sets all display diverse characteristics; the differences between the training sets and the test sets are relatively notable. Since such differences occur in the financial time series dynamics, it is understandable to see a clear distinction between the RMSE values of the training set and the test set for both S&P500 and NASDAQ, although the RMSE error rates are relatively low compared to the time series values. For instance, for S&P500 the mean of the training data is 654.61 and the mean of the test data is 2283.34, while across the four methods the maximum RMSE for the training set is 15.29 and the minimum is 7.45, and for the test set the maximum RMSE is 29.06 and the minimum is 17.93. For NASDAQ, the mean of the training data is 1147.35 and the mean of the test data is 5701.76, while the four methods yield a maximum training RMSE of 36.65 and a minimum of 18.41, and a maximum test RMSE of 85.31 and a minimum of 40.62.
If the SRMSE values are examined, it is seen that the error difference between training and test sets is small for both S&P500 and NASDAQ.The reason is that the time series observations are relatively high compared to the calculated RMSE values.
The error metrics R² and EVS show how well the model fits. The RMSE, SRMSE, and MAE metrics produce average errors from the residuals. RMSE and SRMSE penalize large error values more heavily than the other error metrics, because the residuals are squared; they are mostly used for model comparison. MdAE is quite robust against outliers. Nevertheless, this is not an advantage in our case; it brings a drawback, since large errors occur at big jumps in financial time series, and these large errors, i.e., outliers, might be significant.
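The contrast drawn above is easy to see numerically; a minimal sketch of three of the metrics (SRMSE, R², and EVS are defined in Table A.12 and omitted here):

```python
import numpy as np

def rmse(y, yhat):
    """Root mean squared error: squaring emphasizes large residuals."""
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def mae(y, yhat):
    """Mean absolute error: linear in the residuals."""
    return float(np.mean(np.abs(y - yhat)))

def mdae(y, yhat):
    """Median absolute error: robust to outliers."""
    return float(np.median(np.abs(y - yhat)))

y_true = np.array([1.0, 2.0, 3.0, 10.0])
y_pred = np.array([1.0, 2.0, 3.0, 4.0])   # one big miss at the "jump"
```

Here `rmse(y_true, y_pred)` is 3.0 and `mae` is 1.5, while `mdae` is 0.0: the single jump-like error dominates RMSE but is invisible to MdAE, which is exactly the drawback noted above for financial series.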
In conclusion, nearly all error metrics indicate robustness: each model trial with a thousand repetitions gives very similar outcomes.
The RMSE and MAE test errors of the LSTM and hybrid LSTM-Wavenet methods decrease when MRA is used (for both S&P500 and NASDAQ). Likewise, if wavelets are used as an activation function, the RMSE and MAE test errors of both the LSTM and LSTM+MRA methods decrease for both time series. Consequently, the best test scores are reached by the hybrid LSTM-Wavenet+MRA method with fixed dilation and translation parameters for both the S&P500 and NASDAQ data. The use of wavelets in MRA and in the activation function effectively increases the performance for time series prediction, and using wavelets for both MRA and the activation functions improves the performance the most. The proposed method contributes to the modeling of (financial) time series that are non-stationary, non-normal, noisy, and chaotic; therefore, the recommended method is likely to apply efficiently to other time series as well.

Comparison with other classical benchmark methods
Although the main aim of this study is to combine MRA and wavelet activation functions for LSTM, the usability of the proposed method is examined by comparing it with various classical and state-of-the-art methods used in quantitative finance.
For the comparison, the test data described in Section 3.1 is used for the selected methods. In addition, one-step-ahead prediction is considered, and no model is updated after a prediction; these conditions are the same as in all LSTM experiments.
The methods used for comparison are Prophet (an additive regression model released by Facebook) for univariate (uni.) and multivariate (multi.) data, KNN, LightGBM, random forest, support vector regression (SVR) with a radial basis function (RBF) kernel, Bayesian ridge regression, LASSO, XGBoost, and SARIMA. The SARIMA model takes NASDAQ data as an exogenous variable for S&P500 and vice versa. Depending on the size of the parameter-combination space, model parameters were chosen using either a randomized search over hyperparameters or an exhaustive search across the estimator's given parameter values, based on metrics such as R², negated mean squared error, and the Akaike information criterion (AIC).
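As a hedged illustration of such a randomized hyperparameter search (synthetic lagged data and a KNN regressor stand in for the paper's actual estimators and grids):

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit
from sklearn.neighbors import KNeighborsRegressor

# toy series and a lagged feature matrix (10 lags), illustrative only
series = np.cumsum(np.random.default_rng(0).normal(size=300))
X = np.column_stack([series[i : i + 290] for i in range(10)])
y = series[10:300]

search = RandomizedSearchCV(
    KNeighborsRegressor(),
    param_distributions={
        "n_neighbors": list(range(1, 21)),
        "weights": ["uniform", "distance"],
    },
    n_iter=10,
    cv=TimeSeriesSplit(n_splits=3),      # respect temporal ordering
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
```

Using `TimeSeriesSplit` rather than shuffled folds keeps the validation data strictly after the training data, which matches the one-step-ahead setting described above.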
The proposed method has lower error than the state-of-the-art methods for both the S&P500 and NASDAQ data, for the predictions made on the test data, when evaluated with metrics including RMSE and MAE.

Conclusion
Since the literature on merging MRA and WNNs is scarce, the hybrid LSTM-Wavenet+MRA approach is recommended in this paper to address this shortage. The proposed hybridization is compared with the performances of the LSTM, LSTM+MRA, and hybrid LSTM-Wavenet methods for one-step-ahead prediction of the S&P500 and NASDAQ. The two wavelet methodologies each increase performance, and the best performance is obtained when the two wavelet techniques are used together. Moreover, comparison with other classical and state-of-the-art methods also shows that the proposed method provides a significant improvement in predictions. According to the results obtained, practicing wavelets in modeling financial time series is essential and promising; furthermore, such modeling may be much appreciated by practitioners and investors in financial markets.
The proposed method (hybrid LSTM-Wavenet+MRA) provides an original contribution to the knowledge in time series analysis by combining wavenets and MRA.As a result, this study bridges the gaps between hybrid models using MRA and hybrid models with WNNs in the related literature.
Future applications and extensions of the proposed hybridization methodology may include multi-step-ahead predictions, dynamically updating the models after each prediction, and adaptive parameter selection for datasets with different characteristics.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix. Configurations of the models
Components of the computing environment are listed in Table A.11, together with Python and the packages used.
The error metrics used in this paper are given in Table A.12, where y_i is the ith observed value, ŷ_i is the ith predicted value, y_max is the maximum observation, y_min is the minimum observation, and ȳ is the mean of all observations.
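As a sketch, the core metrics from Table A.12 can be computed directly from their definitions; the sample series below is hypothetical.

```python
import math

def mae(y, yhat):
    """Mean absolute error: average of |y_i - yhat_i|."""
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    """Root mean square error: sqrt of the mean squared residual."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def r2(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted values.
y, yhat = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]
```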
The configuration parameters of the LSTM model are shown in Table A.13 for the S&P500 and NASDAQ data. The network uses mean square error as the loss function and Adam as the optimizer, with a batch size of 1024 and 100 epochs, where the length of the time steps is 10. There are two hidden layers: the first is the LSTM layer and the second is a regular dense layer.
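For orientation, a single LSTM time step can be sketched in NumPy. This is not the paper's implementation, only an illustrative sketch of the gate structure; the `act` parameter marks the tanh slot that the hybrid model later replaces with a wavelet activation. All sizes and weights below are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b, act=np.tanh):
    """One LSTM time step; `act` is the candidate/output activation
    (tanh in a standard LSTM)."""
    n = h.shape[0]
    z = W @ x + U @ h + b      # stacked pre-activations for the 4 gates
    i = sigmoid(z[:n])          # input gate
    f = sigmoid(z[n:2 * n])     # forget gate
    o = sigmoid(z[2 * n:3 * n]) # output gate
    g = act(z[3 * n:])          # candidate cell state
    c_new = f * c + i * g
    h_new = o * act(c_new)
    return h_new, c_new

rng = np.random.default_rng(0)
n_in, n_hid = 10, 16            # e.g. a window of 10 time steps as input
W = rng.normal(size=(4 * n_hid, n_in)) * 0.1
U = rng.normal(size=(4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)
```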
The configuration parameters of the LSTM+MRA model are given in Table A.14 for the S&P500 and NASDAQ data. The data are decomposed into two detail parts and one approximation part by applying the Daubechies wavelet filter db2. As in the LSTM model, the loss function, optimizer, length of the time steps, and batch size are mean square error, Adam, 10, and 1024, respectively. Differently, the number of epochs is halved and only one LSTM hidden layer is used in each detail part.
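The decompose-model-synthesize idea can be sketched as follows. The sketch uses the Haar filter to stay self-contained (the study applies db2 through a wavelet library, which handles boundary effects for longer filters); the signal values are arbitrary.

```python
import numpy as np

S = np.sqrt(2.0)

def haar_split(x):
    """One MRA level: split x into approximation and detail parts."""
    x = np.asarray(x, dtype=float)
    return (x[0::2] + x[1::2]) / S, (x[0::2] - x[1::2]) / S

def haar_merge(a, d):
    """Inverse of haar_split: synthesize the next-finer level."""
    x = np.empty(2 * len(a))
    x[0::2], x[1::2] = (a + d) / S, (a - d) / S
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 3.0])
a1, d1 = haar_split(x)     # level-1 detail part d1
a2, d2 = haar_split(a1)    # level-2 detail d2 and approximation a2
# Each component (d1, d2, a2) is modeled by its own network; the
# component forecasts are then synthesized back, mirroring the
# perfect-reconstruction property:
x_rec = haar_merge(haar_merge(a2, d2), d1)
```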
The configuration parameters of the hybrid LSTM-Wavenet model are shown in Table A.15, where a fixed wavelet activation function is used in all LSTM nodes. On the other hand, in Table A.16, a different wavelet activation function is applied for each LSTM node j, for j = 1, . . ., 16.
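As a minimal sketch of such an activation, the Mexican-hat (Ricker) wavelet is shown below; it is one common choice in wavelet neural networks, while the paper's actual activation families are those listed in Tables A.15-A.18. Dilated and translated copies give one distinct activation per node, as in the changing-activation configuration.

```python
import numpy as np

def mexican_hat(t):
    """Mexican-hat (Ricker) wavelet, normalized to unit L2 norm."""
    c = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return c * (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)

def node_activation(a, b):
    """Dilated/translated copy psi((t - b) / a) / sqrt(a); assigning a
    distinct (a, b) pair to each LSTM node yields node-wise different
    wavelet activations."""
    return lambda t: mexican_hat((t - b) / a) / np.sqrt(a)
```

Unlike tanh, a wavelet activation is localized: it decays to zero away from its center, which is what lets each node respond to a particular scale and position in the input.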

Fig. 2. The flowchart of four different methods employed in the study.

Table 1
Descriptive statistics of S&P500 data.

Modeling and learning. If wavenets are not used, the created model is called the LSTM model. On the other hand, if wavelet activation functions are utilized, the model is called a hybrid LSTM-Wavenet model. The train set and the validation set are used to trigger the learning process for the selected model.
Prediction and visualization. After the learning process, the models are used for prediction on the test set. Furthermore, the train set is also predicted. All predictions are denormalized via x_i ← x_i (x_max − x_min) + x_min. Monte Carlo estimates are used to obtain the mean error metric scores for the train set and the test set. In addition to calculating the different error metric results, visuals of the analysis are produced.
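The denormalization step and the Monte Carlo averaging can be sketched as follows; all numeric values below are hypothetical.

```python
import statistics

x = [3201.5, 3246.6, 3298.5, 3327.8]   # hypothetical raw index values
x_min, x_max = min(x), max(x)

# Min-max normalization applied before training.
norm = [(v - x_min) / (x_max - x_min) for v in x]

# Denormalization of predictions: x_i <- x_i (x_max - x_min) + x_min.
denorm = [v * (x_max - x_min) + x_min for v in norm]

# Monte Carlo estimate: average a metric over repeated training runs
# (the study reports means over 1000 experiments).
rmse_runs = [14.2, 15.1, 13.8]          # hypothetical per-run RMSE scores
mean_rmse = statistics.mean(rmse_runs)
```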

Table 2
Descriptive statistics of NASDAQ data.

Table 3
LSTM model, configuration (S&P500/NASDAQ): mean scores for the train set and the test set by running 1000 experiments.

Table 4
LSTM model+MRA, configuration (S&P500/NASDAQ): mean scores for the synthesized train set and the synthesized test set by running 1000 experiments.

Table 5
Hybrid LSTM-Wavenet model, configuration 1 (S&P500/NASDAQ): mean scores for the train set and the test set by running 1000 experiments.

Table 6
Hybrid LSTM-Wavenet model by API structure, configuration 2 (S&P500/NASDAQ): mean scores for the train set and the test set by running 1000 experiments.

Table 7
Hybrid LSTM-Wavenet model+MRA, configuration 1 (S&P500/NASDAQ): mean scores for the synthesized train set and the synthesized test set by running 1000 experiments.

Table 8
Hybrid LSTM-Wavenet model+MRA by API structure, configuration 2 (S&P500/NASDAQ): mean scores for the synthesized train set and the synthesized test set by running 1000 experiments.

Table 9
Summary table for results (S&P500/NASDAQ): mean scores for the train set and the test set by running 1000 experiments where 'conf.' stands for configuration.

Table 10
Performance results of the classical state-of-the-art models on the test sets for S&P500/NASDAQ.

For instance, considering SARIMA, which yields the best result in Table 10, only its MAE values are lower than the corresponding values of the LSTM and LSTM+MRA models. These results demonstrate the strength and superiority of the LSTM and its hybrid versions.
The hybrid LSTM-Wavenet+MRA approach uses the configuration parameters given in Table A.17 but has wavelet activation functions in the LSTM layer of the approximation part. Similar to the LSTM-Wavenet model, a fixed wavelet activation function is used for each node in Table A.17; however, changing wavelet activation functions are used in Table A.18.