Performance improvement of machine learning models via wavelet theory in estimating monthly river streamflow

River streamflow is an essential hydrological parameter for optimal water resource management. This study investigates models used to estimate monthly time-series river streamflow data at two hydrological stations in the USA (Heise and Irwin on the Snake River, Idaho). Six types of machine learning (ML) model were tested: support vector machine-radial basis function (SVM-RBF), SVM-Polynomial (SVM-Poly), decision tree (DT), gradient boosting (GB), random forest (RF), and long short-term memory (LSTM). These were trained and tested alongside a conventional multiple linear regression (MLR). To improve estimation performance, hybrid models were designed by coupling the models with wavelet theory (W). The models' performance was assessed using root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R2), Nash-Sutcliffe efficiency (NSE), and Willmott's index (WI). A side-by-side performance assessment of the stand-alone and hybrid models revealed that the coupled models produce better estimates of monthly river streamflow than the stand-alone ones. The statistical parameter values for the best model (W-LSTM4) during the test phase were RMSE = 36.533 m3/s, MAE = 26.912 m3/s, R2 = 0.947, NSE = 0.946, WI = 0.986 (Heise station), and RMSE = 33.378 m3/s, MAE = 24.562 m3/s, R2 = 0.952, NSE = 0.951, WI = 0.987 (Irwin station).


Introduction
Precise estimation of streamflow time series data for rivers is one of the most important issues for optimal management of surface water resources, in particular for making appropriate decisions when dealing with floods and droughts. The river streamflow phenomenon is complex, non-stationary, and non-linear (Adnan et al., 2019; Bayazit, 2015; Meira Neto et al., 2018), because river streamflow time series can be influenced by a variety of parameters such as temperature, rainfall, and evaporation; these render it nearly impossible to estimate accurately. However, in the age of multi-threaded parallel computing, it is now possible to deploy powerful mathematical and machine learning approaches to the issue. In general, two different types of model are used: conceptual (physical) models, and machine learning (data-driven) models. Both have been proposed, and used by hydrologists, for estimating river streamflow (He et al., 2014; Reis et al., 2021; Zhang et al., 2016). Conceptual paradigms are usually complex models that require many hydrological and climatological parameters as inputs, many of which may be unavailable for certain locations. In addition, the inherent complexity of streamflow processes makes it challenging to use physical models; accordingly, in recent years, researchers have shown increasing interest in machine learning models to estimate these time series (Kalra et al., 2013; Mehdizadeh & Sales, 2018; Qu et al., 2021; Rasouli et al., 2012; Sahour et al., 2021; Wang et al., 2019; Xiang & Demir, 2020; Zhu et al., 2020).
The most important reason for using machine learning is that these models can estimate target parameters using relatively limited data series, without the need for additional data regarding the (potentially) complex relationships between inputs and outputs (Deng et al., 2021; Deng et al., 2022; Singh et al., 2022).
More recently, hybrid models have received attention from researchers for river streamflow estimation. For example, Kilinc and Haznedar (Kilinc & Haznedar, 2022) proposed a hybrid approach that integrated long short-term memory (LSTM) and a genetic algorithm (GA) for streamflow estimation of the Euphrates River, Turkey. Particle swarm optimization (PSO) was coupled with LSTM by Kilinc (Kilinc, 2022) to develop a hybrid PSO-LSTM for forecasting the streamflow of the Orontes River basin, Turkey. Feng et al. (2021) optimized the parameters of an LSTM with a PSO algorithm for runoff prediction in the Jiulong River Basin, China. Huang et al. (2014) successfully predicted monthly streamflow at three hydrometric stations in China using a hybrid empirical mode decomposition-support vector machine (EMD-SVM). Wang et al. (2015) increased the forecasting accuracy of a time series model using an autoregressive integrated moving average (ARIMA) variant that also leveraged ensemble empirical mode decomposition (EEMD); this was able to predict the runoff values of three reservoirs in China. Wang et al. (2013) simulated the rainfall-runoff process of a catchment in the Yellow River, China, by coupling both PSO and EEMD with a stand-alone SVM. A modified form of EMD was used by Meng et al. (2019), who combined EMD and SVM to predict streamflow in the Wei River Basin, China; the performance of this model was then compared with stand-alone SVM and ANN models. Zhao and Chen (2015) developed two hybrid models by coupling EMD and EEMD with an auto-regressive (AR) model for forecasting runoff from the Fenhe River basin, China. Rezaie-Balf et al. (2019) integrated EEMD into multivariate adaptive regression splines (MARS) and M5Tree models to forecast the daily streamflows of two river basins in South Korea. Fu et al. (2020) tested the performance of an LSTM for simulating the daily streamflow of the Kelantan River, Malaysia.
An alternative approach uses bio-inspired algorithms. For example, Yaseen et al. (2020) used PSO, GA, and differential evolution (DE) to form hybrid adaptive neuro-fuzzy inference system-based models to forecast the streamflow of the Pahang River, Malaysia. Al-Sudani et al. (2019) integrated the DE algorithm into a MARS system to estimate the streamflow of the Tigris River, Iraq. Liu et al. (2020) also proposed a hybrid model, coupling EMD with an encoder-decoder LSTM, to complete a streamflow prediction of the Yangtze River, China. A new model was proposed by Hadi et al. (2019) using a combination of extreme gradient boosting (XGB) and extreme learning machine (ELM); its capabilities were then compared with stand-alone models for streamflow predictions of the Goksu-Himmeti basin, Turkey. Aside from river streamflow estimation, other types of hybrid model have also been used in various fields, such as sediment yield estimation (Meshram et al., 2019), shield movement prediction (Lin et al., 2022), and rainfall simulation (Chen et al., 2022).
It is already well documented that coupled models can demonstrate improved performance compared with stand-alone ones. This remains true for many types of time series estimation, not just the hydrological parameters considered in this study. One method of data pre-processing is the wavelet transform, an approach that has become widely used to generate wavelet-based hybrid models. One of the main advantages of this method is that wavelets offer simultaneous localization in both the time and frequency domains. Another is that the fast wavelet transform makes computation quick. Wavelets are also able to separate fine details within a signal in a manner similar to an enhanced Fourier transform (Sifuzzaman et al., 2009). Wavelet transforms are an efficient mathematical transformation for signal processing and data pre-processing because they can decompose a signal into a set of basic signal functions. Several mother wavelet functions can be used for signal analysis, including Daubechies, Symlet, and Haar. Application of wavelet theory rests on a few basic principles: the observational time series must first be decomposed into several sub-series, and the generated sub-series are then used as new inputs for the machine learning model.
With reference to river streamflow data, for the first part of this study the monthly river streamflow time series for the Heise and Irwin hydrometric stations, located on the Snake River, USA, were estimated using six machine learning-based models. These were support vector machine-radial basis function (SVM-RBF), SVM-Polynomial (SVM-Poly), decision tree (DT), gradient boosting (GB), random forest (RF), and long short-term memory (LSTM). A conventional multiple linear regression (MLR) was also applied as a baseline, as is common in predictive studies (Adnan et al., 2019; Kadam et al., 2019). These models have all previously been applied to streamflow prediction: SVM for non-stationary streamflow (Adnan et al., 2021; Meng et al., 2019), RF (Latif & Ahmed, 2021), GB (Ni et al., 2020), and LSTM (Dong et al., 2020).
Wavelet theory is a valuable pre-processing technique, especially when hybridized with the aforementioned stand-alone models; its primary role is improving model performance. A survey of the published literature on estimating river streamflow time series shows that there are still few studies reporting or evaluating the performance of these models, particularly hybrids based on wavelet theory.

Case study and data used
Monthly river streamflow data from two gauging hydrometric stations at Heise and Irwin on the Snake River, USA, were used for this study. The data was obtained from the United States Geological Survey (USGS) and is available at https://waterdata.usgs.gov/nwis/. The Snake River is one of the largest rivers in the Pacific Northwest region of the USA. Its drainage basin covers six states; however, both stations selected for this study are located in Idaho. Heise station (USGS 13037500) is at latitude 43°36′45″ and longitude 111°39′36″, with a drainage area of 5,752 square miles; Irwin station (USGS 13032500) is at latitude 43°21′03″ and longitude 111°13′08″, with a drainage area of 5,225 square miles. The map in Figure 1 shows the relevant geographical locations.
The data used in this study comprised monthly river streamflow values from the Heise and Irwin stations for the period Oct. 1960 to Sep. 2020 (i.e. 720 monthly values). The data was divided into training and testing datasets: data from Oct. 1960 to Sep. 2005 at both stations (540 values) were utilized as the training set, while the remaining data between Oct. 2005 and Sep. 2020 (180 values) constituted the testing set. Table 1 gives an overview of some statistical information regarding the data for both locations, during both the training and testing phases. In general, similar statistics can be observed for the training and testing stages. Monthly river streamflow data for the sites are shown in Figure 2.

Methodology
Wavelet (W) theory was used as a noise removal system for the data. This was implemented as a pre-processing technique for the applied machine learning models. The models used were multiple linear regression (MLR), support vector machine (SVM), decision tree (DT), random forest (RF), gradient boosted decision trees or gradient boosting (GB), and long short-term memory (LSTM). Explanations of the applied methods follow.

Pre-processing-based wavelet theory (W)
Wavelet (W) transformation is a powerful technique used in signal processing (Starck & Murtagh, 2001), often used to de-noise, compress or decompress data (Daubechies, 2009). Some features of the wavelet transformation resemble the Fourier transformation, and in the time and frequency domains wavelet analysis can be considered a type of multi-resolution analysis. To decompose signals into different resolutions, shifted and scaled versions of a single basis function, the mother wavelet, are used (Ebrahimi & Rajaee, 2017).
Streamflow time series are accompanied by noise in the form of high-frequency signals. Wavelet transformation is used to remove these from the raw signal, and the process occurs in three main steps. First, wavelet transformation of the input signal is performed using the selected mother wavelet and level of decomposition. Second, a threshold is determined and applied to the high-frequency wavelet coefficients. Third, the de-noised time series is reconstructed from the low-frequency and thresholded high-frequency wavelet coefficients. In this study, the Daubechies wavelet (db4) was used as the mother wavelet, and the level of decomposition was found by trial-and-error.
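The three de-noising steps can be sketched in a few lines. In practice the study's db4 wavelet and multi-level decomposition would come from a library such as PyWavelets; the sketch below is a simplified illustration only, using a one-level Haar transform (the simplest mother wavelet) implemented directly in NumPy, with a hard threshold on the high-frequency detail coefficients. The function names and threshold value are hypothetical.

```python
import numpy as np

def haar_dwt(signal):
    """Step 1: one-level discrete wavelet transform (Haar).
    Returns (approximation, detail) coefficient arrays."""
    s = np.asarray(signal, dtype=float)
    a = (s[0::2] + s[1::2]) / np.sqrt(2)  # low-frequency approximation
    d = (s[0::2] - s[1::2]) / np.sqrt(2)  # high-frequency detail
    return a, d

def haar_idwt(a, d):
    """Step 3: inverse one-level Haar transform (reconstruction)."""
    out = np.empty(2 * len(a))
    out[0::2] = (a + d) / np.sqrt(2)
    out[1::2] = (a - d) / np.sqrt(2)
    return out

def denoise(signal, threshold):
    """Step 2: zero out small detail coefficients, then reconstruct."""
    a, d = haar_dwt(signal)
    d = np.where(np.abs(d) < threshold, 0.0, d)
    return haar_idwt(a, d)

# Made-up monthly flow values with small high-frequency wiggles
flow = [100.0, 102.0, 98.0, 101.0, 250.0, 248.0, 99.0, 100.0]
smoothed = denoise(flow, threshold=2.0)
```

With a threshold of zero, no coefficients are removed and the reconstruction is exact, which is a quick sanity check on any DWT implementation.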
One of the main advantages of wavelets is that they offer simultaneous localization in the time and frequency domains. A second advantage is that, using a fast wavelet transform, calculations can be made very quickly. Wavelets also have the great advantage of being able to separate the fine details in a signal. The wavelet transform is a highly efficient mathematical transformation for signal processing (data pre-processing), decomposing a signal into its basic signal functions (Singh et al., 2020).

Multiple linear regression (MLR)
MLR is implemented to identify possible relationships between independent and dependent variables. This method is often used as a tool to quantify the correlation between the inputs and outputs of a given system (Clarke et al., 1959).
Linear regression is a bivariate model used to predict a dependent variable (y) from an independent one (x). By extending the model to include more than one explanatory variable (x1, x2, ..., xp), as in MLR, a multivariate model is produced. MLR can be used to discern a linear relationship between two or more independent variables and a dependent variable. Since hydrological variables such as river streamflow can depend heavily on lagged data, river streamflow is generally taken as the dependent variable, y, and lagged streamflow data as the independent variables, x1, x2, ..., xp. Thus, the regression equation is expressed as:

y = b0 + b1 x1 + b2 x2 + ... + bp xp

where b0 is a constant, and b1, b2, ..., bp are partial regression coefficients that can be fitted using a least squares approach (Uyanık & Güler, 2013).
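Under these definitions, fitting the partial regression coefficients by least squares takes only a few lines with NumPy. The sketch below (hypothetical helper name, synthetic data) builds the lagged design matrix from a single streamflow series and recovers b0 and b1 for a series generated by a known linear rule.

```python
import numpy as np

def fit_mlr(y, p):
    """Fit y_t = b0 + b1*y_(t-1) + ... + bp*y_(t-p) by least squares.
    Returns the coefficient vector [b0, b1, ..., bp]."""
    y = np.asarray(y, dtype=float)
    # Design matrix: a column of ones, then lag-1 ... lag-p columns
    X = np.column_stack([np.ones(len(y) - p)] +
                        [y[p - k:len(y) - k] for k in range(1, p + 1)])
    target = y[p:]
    coeffs, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coeffs

# Synthetic series following y_t = 5 + 0.5 * y_(t-1) exactly
series = [20.0]
for _ in range(12):
    series.append(5 + 0.5 * series[-1])
b = fit_mlr(series, p=1)   # recovers [5.0, 0.5]
```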

Support vector machine (SVM)
Machine learning algorithms such as classifiers, regressors, and outlier detectors utilize SVMs as a set of supervised learning methods (Niu & Feng, 2021). Vapnik et al. introduced the method in 1995 for classification and regression on a set of data points (Cortes & Vapnik, 1995). Since the cost function is insensitive to training points lying within the margin, models produced using SVM rely only on the subset of training data beyond the margin. The intuition behind SVM is to obtain an optimized hyperplane that is as far as possible from the actual samples of both classes. In other words, the learning method maximizes the class margin, which may be either soft or hard: with a soft margin some misclassifications are tolerated, whereas a hard-margin SVM accepts none (Hamasuna et al., 2008). SVM uses a kernel function to map the data into a higher-dimensional feature space, which can be considered one of the main advantages of the method. The most commonly used SVM kernel functions are linear, polynomial, sigmoid, and radial basis function (RBF). This study adopts the RBF and polynomial kernels to estimate river streamflow, following previous studies (Adnan et al., 2020; Baesens et al., 2000; Leong et al., 2021) in which the efficiency and output quality of the RBF and polynomial kernels were compared across different data sets with good results.
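For reference, the two kernels adopted here have simple closed forms. The sketch below evaluates them directly in NumPy; the parameter values (gamma, degree, coef0) are illustrative defaults, not the study's tuned settings.

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - z||^2)."""
    x, z = np.asarray(x, float), np.asarray(z, float)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def poly_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, z> + coef0)^degree."""
    x, z = np.asarray(x, float), np.asarray(z, float)
    return (gamma * np.dot(x, z) + coef0) ** degree

# The RBF kernel peaks at 1 for identical inputs and decays with distance
k_same = rbf_kernel([1.0, 2.0], [1.0, 2.0])       # 1.0
k_far = rbf_kernel([0.0, 0.0], [1.0, 0.0])        # exp(-1)
k_poly = poly_kernel([1.0, 2.0], [0.0, 0.0])      # coef0**degree = 1.0
```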

Decision tree (DT)
Decision trees are another important, and frequently used, method for supervised learning. They can be used to solve both regression and classification tasks, though the latter is more common. A tree contains three types of node: (i) the root node, which represents the entire sample, is the initial node, and may split into other nodes; (ii) interior nodes, which test features of the dataset, with branches representing the resulting decision rules; and (iii) leaf nodes, which deliver the final output.
This algorithmic approach can also be implemented to solve decision-related problems. To evaluate the tree for a particular data point, the point is passed through a sequence of conditional tests (true/false) at the nodes until a leaf node is reached. The final prediction is taken as the average value of the dependent variable in that leaf node. In order to predict precise values for the given data points, the algorithm runs for multiple iterations on the tree (Safavian & Landgrebe, 1991).
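The leaf-average idea can be illustrated with a depth-1 regression tree (a "stump"), the smallest possible decision tree: one conditional test at the root, and each leaf predicting the mean of its training targets. This is a simplified sketch with hypothetical names, not the study's DT configuration.

```python
import numpy as np

def fit_stump(x, y):
    """Fit a depth-1 regression tree: choose the split threshold on x that
    minimizes squared error, with each leaf predicting the mean of its y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    best = None
    for t in np.unique(x)[:-1]:           # candidate thresholds
        left, right = y[x <= t], y[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean

def predict_stump(stump, x_new):
    """Route the point through the conditional test to a leaf average."""
    threshold, left_mean, right_mean = stump
    return left_mean if x_new <= threshold else right_mean

# Two clearly separated clusters: the split lands between them
x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
y = [5.0, 6.0, 7.0, 50.0, 51.0, 52.0]
stump = fit_stump(x, y)    # threshold 3.0, leaf means 6.0 and 51.0
```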

Random forest (RF)
Random forest regression is another supervised learning algorithm, but one that implements an ensemble learning method. Ensemble learning methods combine the predictions from multiple machine learning algorithms to make a more precise prediction than any single model (Bernard et al., 2009). A meta-estimator, which fits a series of decision trees on different subsets of a given dataset, plays a key role in this algorithm. This type of estimator benefits from averaging techniques to increase its predictive accuracy and to control the over-fitting problem. The trees in a random forest run in parallel; there is no interaction between them while they are being built. During training, multiple trees are constructed, and the prediction for a data point is then obtained by aggregating the outputs of the individual trees (Mosavi et al., 2022). The following steps are required to implement a random forest algorithm: (1) Choose k data points randomly from the training set.
(2) Build the corresponding decision tree for the given data.
(3) Repeat steps 1 and 2 to build N trees.
(4) After building N trees, each tree predicts a value of y for any new data point. The new data point is then assigned the average across all the obtained y values.
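Steps 1-4 can be sketched with a toy forest whose base learners are depth-1 trees (stumps) rather than full decision trees, using only the Python standard library; the names and parameters here are illustrative, not the study's settings.

```python
import random
import statistics

def fit_stump(points):
    """Depth-1 regression tree on (x, y) pairs: best single split, leaf means."""
    xs = sorted({x for x, _ in points})
    if len(xs) < 2:                      # degenerate bootstrap sample
        m = statistics.mean(y for _, y in points)
        return lambda q: m
    best = None
    for t in xs[:-1]:
        left = [y for x, y in points if x <= t]
        right = [y for x, y in points if x > t]
        lm, rm = statistics.mean(left), statistics.mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda q: lm if q <= t else rm

def random_forest(points, n_trees=25, seed=0):
    """Steps 1-3: draw bootstrap samples and fit one tree per sample."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(points) for _ in range(len(points))]
        trees.append(fit_stump(sample))
    # Step 4: predict by averaging the trees' outputs
    return lambda q: statistics.mean(tree(q) for tree in trees)

data = [(1.0, 5.0), (2.0, 6.0), (3.0, 7.0), (10.0, 50.0), (11.0, 51.0), (12.0, 52.0)]
predict = random_forest(data)
```

Averaging over bootstrapped trees is what distinguishes the forest from a single tree: each tree sees a slightly different dataset, and the mean of their outputs is more stable than any one of them.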

Gradient boosting (GB)
In the field of machine learning, the gradient boosting algorithm can be considered one of the most powerful algorithms. The most common sources of error in machine learning models are bias and variance (Islam & Amin, 2020). The gradient boosting algorithm can be used to decrease the bias error of a model. It can target a continuous variable (as a regressor) as well as a categorical target variable (as a classifier) (Friedman, 2001). In the former case the mean square error (MSE) is the cost function, while in the latter log loss is used.
For both regression and classification problems, GB can show notable outcomes.
Steps to implement GB are as follows: (1) Determine the average value of the target label as the initial prediction. (2) Compute the residuals using residual = actual value − predicted value. (3) Construct a decision tree fitted to the residuals. (4) Use all trees present in the ensemble to predict the target label. (5) Compute the new residuals, and repeat steps 3 through 5 until the number of iterations matches the number of estimators. (6) Make the final prediction for the target variable using all trees from the training step.
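The steps above can be sketched with depth-1 regression trees fitted to the residuals. This is a minimal NumPy illustration with hypothetical names and an illustrative learning rate and estimator count, not the study's tuned values.

```python
import numpy as np

def fit_stump(x, y):
    """Depth-1 regression tree used as the base learner (step 3)."""
    best = None
    for t in np.unique(x)[:-1]:
        lm, rm = y[x <= t].mean(), y[x > t].mean()
        sse = ((y[x <= t] - lm) ** 2).sum() + ((y[x > t] - rm) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda q: np.where(q <= t, lm, rm)

def gradient_boost(x, y, n_estimators=50, learning_rate=0.3):
    """Step 1: start from the mean; steps 2-5: repeatedly fit a tree to the
    residuals and add its shrunken prediction to the ensemble."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    base = y.mean()                         # step 1
    trees = []
    pred = np.full_like(y, base)
    for _ in range(n_estimators):
        residual = y - pred                 # steps 2 and 5
        tree = fit_stump(x, residual)       # step 3
        trees.append(tree)
        pred = pred + learning_rate * tree(x)  # step 4
    # Step 6: final prediction uses every tree from training
    return lambda q: base + learning_rate * sum(t(q) for t in trees)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0, 12.0])
model = gradient_boost(x, y)
```

Because each new tree targets what the ensemble still gets wrong, the training error shrinks steadily with the number of estimators, which is exactly the bias-reduction behaviour described above.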

Long short-term memory (LSTM)
Suitable performance for streamflow prediction has also been demonstrated by deep learning models that use an appropriate architecture (Ghimire et al., 2021; Lin et al., 2021). Recurrent neural network (RNN)-based deep learning models, especially LSTM, play a vital role in dynamic system modelling across diverse application areas such as speech recognition, image processing, manufacturing, communication, energy consumption and autonomous systems (Lipton et al., 2017). Prediction models set up on time series data can be used to predict non-linear, time-variant system outputs for many data types (Lindemann et al., 2021). LSTMs are a class of RNN that rely on memory blocks in their hidden layers, which perform the same role as neurons in hidden layers. The memory blocks contain input, output, and forget gates, used for controlling and updating the information flowing through the blocks (Hochreiter & Schmidhuber, 1997; Sainath et al., 2015). The equations for determining the gates in an LSTM are:

i_t = σ(w_i [h_(t−1), x_t] + b_i)
f_t = σ(w_f [h_(t−1), x_t] + b_f)
o_t = σ(w_o [h_(t−1), x_t] + b_o)

where i_t, f_t, and o_t are the input, forget, and output gates, respectively. σ denotes the sigmoid function (the gates in an LSTM use sigmoid activation). w_x and b_x are the weights and biases for the corresponding gate (x). x_t denotes the input at the current timestamp, and h_(t−1) is the output of the previous LSTM block at timestamp t−1.
The equations for the cell state, candidate cell state and final output are as follows:

c̃_t = tanh(w_c [h_(t−1), x_t] + b_c)
c_t = f_t ⊙ c_(t−1) + i_t ⊙ c̃_t
h_t = o_t ⊙ tanh(c_t)

where c_t represents the cell state (memory) at timestamp t, c̃_t represents the candidate cell state at timestamp t, and ⊙ denotes element-wise multiplication (Salem, 2018; Wang et al., 2020). Overall, LSTMs are an efficient, gradient-based method for handling complex, artificial long-time-lag tasks. In addition, LSTMs are the RNN variant capable of learning long-term dependencies, because their cells can retain information from previous time steps.
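The gate and state equations can be traced directly in NumPy for a single memory block. In practice a framework such as Keras or PyTorch would be used for training; this sketch, with random weights and hypothetical names, only shows how one time step combines the gates.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, h_prev, c_prev, w, b):
    """One LSTM time step. w and b hold the weights/biases for the input (i),
    forget (f), output (o) and candidate-cell (c) paths; each weight matrix
    multiplies the concatenated [h_(t-1), x_t] vector."""
    z = np.concatenate([h_prev, x_t])
    i_t = sigmoid(w["i"] @ z + b["i"])          # input gate
    f_t = sigmoid(w["f"] @ z + b["f"])          # forget gate
    o_t = sigmoid(w["o"] @ z + b["o"])          # output gate
    c_tilde = np.tanh(w["c"] @ z + b["c"])      # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde          # new cell state
    h_t = o_t * np.tanh(c_t)                    # block output
    return h_t, c_t

rng = np.random.default_rng(42)
hidden, inputs = 3, 1
w = {k: rng.standard_normal((hidden, hidden + inputs)) for k in "ifoc"}
b = {k: np.zeros(hidden) for k in "ifoc"}
h, c = np.zeros(hidden), np.zeros(hidden)
for x in [0.5, -1.0, 0.25]:      # e.g. three months of scaled streamflow
    h, c = lstm_step(np.array([x]), h, c, w, b)
```

Note how the forget gate scales the previous cell state while the input gate scales the new candidate; this additive update is what lets the cell carry information across many time steps.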

Performance evaluation of models
Five evaluation metrics are often used to determine model performance: root mean square error (RMSE), mean absolute error (MAE), coefficient of determination (R2), Nash-Sutcliffe efficiency (NSE), and Willmott's index (WI). Accordingly, these were employed in this study to determine the relative performances of the stand-alone and coupled models. These metrics are defined in the equations below, following the forms described by (Willmott et al., 2012; Yaseen et al., 2020):

RMSE = sqrt( (1/N) Σ (Q_o,i − Q_e,i)^2 )
MAE = (1/N) Σ |Q_o,i − Q_e,i|
R2 = [ Σ (Q_o,i − Q̄_o)(Q_e,i − Q̄_e) ]^2 / [ Σ (Q_o,i − Q̄_o)^2 Σ (Q_e,i − Q̄_e)^2 ]
NSE = 1 − Σ (Q_o,i − Q_e,i)^2 / Σ (Q_o,i − Q̄_o)^2
WI = 1 − Σ (Q_o,i − Q_e,i)^2 / Σ ( |Q_e,i − Q̄_o| + |Q_o,i − Q̄_o| )^2

where Q_o,i and Q_e,i describe the observed and estimated monthly river streamflow for the i-th month, respectively; Q̄_o and Q̄_e indicate the average values of the observed and estimated streamflow; and N is the total number of observations. A better model presents lower RMSE and MAE values, and higher R2, NSE, and WI.
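The five metrics translate directly into NumPy. The sketch below (hypothetical function name, made-up example values) follows the standard formulas; a perfect fit gives RMSE = MAE = 0 and R2 = NSE = WI = 1.

```python
import numpy as np

def metrics(q_obs, q_est):
    """RMSE, MAE, R2, NSE and Willmott's index for observed vs. estimated flow."""
    o, e = np.asarray(q_obs, float), np.asarray(q_est, float)
    rmse = np.sqrt(np.mean((o - e) ** 2))
    mae = np.mean(np.abs(o - e))
    r2 = (np.sum((o - o.mean()) * (e - e.mean())) ** 2 /
          (np.sum((o - o.mean()) ** 2) * np.sum((e - e.mean()) ** 2)))
    nse = 1 - np.sum((o - e) ** 2) / np.sum((o - o.mean()) ** 2)
    wi = 1 - (np.sum((o - e) ** 2) /
              np.sum((np.abs(e - o.mean()) + np.abs(o - o.mean())) ** 2))
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "NSE": nse, "WI": wi}

# Made-up observed/estimated monthly flows (m3/s)
obs = [120.0, 150.0, 90.0, 200.0]
est = [118.0, 155.0, 95.0, 190.0]
scores = metrics(obs, est)
```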

Results and discussion
This study utilizes historical monthly streamflow data lagged from 1 to 4 months as input predictors for the models, as shown in Table 2. Setting appropriate parameters for each model improves its performance; Table 3 summarizes the parameter settings for all the models used in this paper. Determination of the optimum parameter values affects the efficiency of the model design. To accomplish this, trial-and-error was used to select the optimum value of each parameter. For all experiments, the model parameters were selected independently via a 10-fold cross-validation run on the training set. The parameter ranges are shown in Table 3.
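The M1-M4 input patterns amount to pairing each month's streamflow with the previous 1 to 4 monthly values. A minimal sketch of that lagging step (hypothetical helper name, made-up flow values):

```python
def make_lagged_inputs(flow, n_lags):
    """Build (inputs, target) pairs where each target month is paired with
    the previous n_lags months of streamflow (the M1-M4 input patterns)."""
    X, y = [], []
    for t in range(n_lags, len(flow)):
        X.append(flow[t - n_lags:t])   # e.g. [Q_(t-2), Q_(t-1)] for M2
        y.append(flow[t])
    return X, y

flow = [310.0, 295.0, 280.0, 260.0, 240.0, 230.0]
X, y = make_lagged_inputs(flow, n_lags=2)   # the M2 pattern
```

Each additional lag drops one usable sample from the start of the record, which is why the M4 models train on slightly fewer pairs than M1.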
First, a standard MLR and stand-alone machine learning models using SVM-RBF, SVM-Poly, DT, RF, GB, and LSTM were established using the input configurations shown in Table 2. RMSE, MAE, R2, NSE, and WI values for the Heise and Irwin hydrometric stations are presented in Tables 4 and 5, respectively. These stand-alone models were able to estimate monthly streamflow using streamflow data of previous monthly periods. The tables clearly show that the performance of the models generally improved as the number of inputs increased (from M1 to M4). As a result, the stand-alone models exhibited the potential for estimating river streamflow in the current month using antecedent monthly streamflow data. In addition to the stand-alone models, the models can be fine-tuned to increase accuracy. To achieve this purpose, wavelet theory (W) was included, and the hybrid W-MLR, W-SVM-RBF, W-SVM-Poly, W-DT, W-RF, W-GB, and W-LSTM models were established. Daubechies (db4) is used in this study as the mother wavelet, and values of RMSE, MAE, R2, NSE and WI are used to measure the models' output. The results are shown in Tables 4 and 5 for the Heise and Irwin stations, respectively. Comparing the estimation accuracy of stand-alone and hybrid models in Tables 4(a) and 5(a) (i.e. using only the streamflow data of the preceding month), it can be observed that wavelet theory shows relatively little ability to improve the performance of the single models, and in a very few cases it even slightly reduces estimation accuracy. However, wavelets can enhance the accuracy of stand-alone models using monthly data inputs with two to four lags. This shows a dependable potential for W theory to capture the monthly streamflow time series, and the same result is observed for the other hybrid models. The most important reason for the improved performance is that the wavelet removes unwanted high-frequency signals from the raw signal, improving the output overall.
The same pattern is observed for stand-alone and hybrid models at both locations: the coupled methods perform best when utilizing longer streamflow records, which is particularly evident in the M4 models that draw on 4 months of streamflow data.
It is possible to visualize the performance of these models using scatter, time series and Taylor diagrams; these comparison diagrams are shown in Figures 3 and 4. In this regard, the superior coupled models during the test phase for both locations (i.e. W-LSTM4) and the corresponding stand-alone models (i.e. LSTM4) were considered. Dashed lines in the scatter graphs denote the perfect fit line. It can be seen in the scatter plots that there is lower dispersion in the hybrid W-LSTM4 models compared to the stand-alone LSTM4, indicating that the hybrid models offer reliability and compatibility with the data for estimating monthly river streamflow time series. The time series plots also illustrate that peak points of the streamflow data can be estimated accurately using the hybrid W-LSTM4 models; this is true for both stations and is an improvement over the LSTM4 models. Figure 5 shows the Taylor diagrams for the observed and estimated streamflow data using the best hybrid models and the most relevant stand-alone methods. A Taylor diagram combines three error measures: standard deviation, correlation, and centred RMSE. The short distance between the points of the hybrid W-LSTM4 models at both stations (marked in red) and the observational data (black) indicates the superior performance of the hybrids. To better understand the estimation accuracy of the various models, all were ranked using their RMSE values. Tables 6 and 7 show the rankings for all models used to analyse data from the Heise and Irwin stations, respectively. In this case, a lower rank denotes superior model performance for estimating streamflow. Here, the total ranking across all input combinations (i.e. M1-M4) is obtained for each station. Examining the total ranking for the models at Heise station in Table 6, it can be seen that W-RF, followed by W-GB, outperformed the other stand-alone and coupled models during the training period.
Meanwhile, W-LSTM models showed superior performance during the testing phase across all M1-M4 input patterns, with the lowest total ranking of 4. After that, W-GB, W-MLR, and W-RF had the next lowest rankings, indicating their better performance.
The same result was observed at Irwin station; the rankings are shown in Table 7. In this context, the W-RF and W-GB methods (training), as well as the W-LSTM, W-GB, and W-MLR methods (testing), showed lower rankings, demonstrating their superior performance.
It can be seen that the coupled W-GB model was one of the superior models during both training and testing of the hydrometric station data. Therefore, this is a recommended approach for precisely estimating river streamflow data. Conversely, the stand-alone SVM-Poly has the highest total ranking, showing that it offered the poorest performance. Examining Tables 6 and 7 in more detail, the stand-alone SVM-RBF and hybrid W-SVM-RBF models were superior to the SVM-Poly and W-SVM-Poly ones for both locations. Furthermore, of the tree-based models, the stand-alone RF, GB and their hybridized forms with wavelets (W-RF and W-GB) were superior to the DT and W-DT models in training and testing. It is possible to conclude that the hybrid models exhibit better river streamflow estimations than stand-alone methods. It has also been shown in several papers that coupled techniques frequently outperform single models. For example, pre-processing with EMD or EEMD can improve the performance of machine learning models for estimation of river streamflow (Huang et al., 2014; Liu et al., 2020; Meng et al., 2019; Rezaie-Balf et al., 2019; Wang et al., 2013). Different optimization algorithms have also been proposed to develop new hybrids (Kilinc & Haznedar, 2022; Yaseen et al., 2019).
The potential of wavelet theory to improve the performance of stand-alone machine learning techniques has been widely reported for river streamflow estimation (Hadi & Tombul, 2018; Ravansalar et al., 2017; Sun et al., 2019). The superiority of wavelet-based hybrid models is not limited to river flow: other authors have shown they can estimate other hydro-climatological time series parameters such as precipitation (Kumar et al., 2021; Paul et al., 2020), groundwater levels (Band et al., 2021; Yosefvand & Shabanlou, 2020), evapotranspiration (Kisi & Alizamir, 2018), and soil temperature (Mehdizadeh et al., 2020; Samadianfard et al., 2018).

Conclusion
Monthly river streamflow time series for two hydrometric stations on the Snake River, USA were estimated. Stand-alone machine learning (ML) models (SVM-RBF, SVM-Poly, DT, RF, GB, and LSTM) and a traditional MLR were used. It was found that the river streamflow for each month could be estimated using lagged monthly data as inputs, with lags of 1 to 4 months. Wavelet theory was incorporated into the stand-alone models to establish wavelet-based hybrid models. The study results indicate that the coupled models generally performed better than their corresponding stand-alone counterparts when measured by RMSE, MAE, R2, NSE, and WI. In general, the models benefitted from a larger number of inputs, especially the M4 models, which exhibited better estimates than the other input combinations. The final performance rankings showed that W-RF and then W-GB were the best-performing methods in the training phase, while the W-LSTM and W-GB models exhibited the lowest rankings during testing, indicating that they were highly dependable. The db4 mother wavelet was used in this study when hybridizing the stand-alone models; future studies could test the efficiency of other mother wavelets when estimating hydrological parameters. Given the successful application of wavelets here, many possibilities for complex estimations open up. Wavelet theory can be combined effectively with other ML-based models, and determining where it can be applied would be beneficial for researchers. Also, different ML techniques, in combination with other pre-processing methods such as empirical mode decomposition and ensemble empirical mode decomposition, or with bio-inspired optimization algorithms, may also be explored.