Application of hybrid machine learning models and data pre-processing to predict water level of watersheds: Recent trends and future perspective

Abstract The community’s well-being and economic livelihoods are heavily influenced by the water level of watersheds. Changes in water level directly affect the circulation processes of lakes and rivers that control water mixing and bottom sediment resuspension, which in turn affect water quality and aquatic ecosystems. These considerations make water level monitoring essential for protecting the environment. Hybrid machine learning models are emerging as robust tools that have been successfully applied to water level monitoring. Many models have been developed, and selecting the optimal one can be a lengthy procedure. A timely, detailed, and instructive overview of the models’ concepts and historical uses would help researchers avoid overlooking potentially suitable models and save significant time. Thus, recent research on water level prediction using hybrid machine learning models is reviewed in this article to present the “state of the art” on the subject and to provide some suggestions on research methodologies and models. This comprehensive study classifies hybrid models into four types: algorithm parameter optimisation-based hybrid models (OBH), pre-processing-based hybrid models (PBH), components combination-based hybrid models (CBH), and hybridisation of parameter optimisation-based with pre-processing-based hybrid models (HOPH). Furthermore, it explains data pre-processing in detail. Finally, the most popular optimisation methods, future perspectives, and conclusions are discussed.


Introduction
Providing enough drinkable water, especially in major cities, is a huge task. Cities have grown without proper planning, resulting in the loss of vegetative cover and soil impermeability. As a result, hydrological and meteorological problems have arisen, including changes in air temperature, evapotranspiration, and flood danger (De Souza Groppo et al., 2019). Coastal zones are dynamic and vulnerable to natural disturbances caused by environmental change and human activities. They are particularly vulnerable since they are typically densely populated, with more than 40% of the population living in shoreline areas. The level of risk associated with this can be rather substantial (Hussaini et al., 2020). Additionally, the overuse of groundwater directly affects the sea water level (Koutsoyiannis, 2020). Water level forecasting is a vital task for hydrologists, relevant authorities, and engineers in developing sustainable conceptual designs of water infrastructure. Forecasting water levels aids in flood and drought control and in practical applications such as studying the behaviour of rivers, lakes, and reservoirs (Deng et al., 2021). The watershed's temporal and spatial variability, climate variability, seasonal patterns, and local to regional-scale heterogeneity in precipitation and temperature patterns all influence water level (Yarar et al., 2009). Flood monitoring, river level monitoring, wetland studies, tidal studies, groundwater monitoring, and surface water monitoring are all typical uses for water level monitoring (Rakshitha & Maheshan, 2020).
Hurst was the first to propose a method to control the water level of the Nile River by taking a comprehensive view of the river, from its headwaters in the African Great Lakes and Ethiopian plains to the great delta on the Mediterranean, where its recurrent floods and unpredictable flows were a serious hindrance to development. Early attempts to regulate the flow by building a dam at Aswan were only partially successful (Graves et al., 2017). The Hurst phenomenon has since been identified in many key hydrological-cycle processes; for example, Dimitriadis used it to identify stochastic similarities in the marginal distribution and dependence structure of the hydrological cycle. The Hurst phenomenon is important for the generation and predictability of a process, since it expresses the high variability and uncertainty observed in a time series, which can influence predictions of both long- and short-term fluctuations.
The Hurst phenomenon has been employed to represent long-range dependence. Dimitriadis and Koutsoyiannis (Dimitriadis & Koutsoyiannis, 2018) utilised the symmetric-moving-average (SMA) scheme for the stochastic synthesis of a stationary process with any dependence structure and marginal distribution. Also taking the Hurst effect into consideration, Rozos (Rozos et al., 2021) employed a multilayer perceptron (MLP) network to stochastically synthesise time series of daily rainfall. Unlike other similar approaches, this MLP-based strategy replicates the statistical characteristics of the related historical time series at various scales (Hurst effect).
In the past, different models have been developed to increase the capabilities of water level forecasting. Machine learning (ML) techniques can learn from past experience (data) and create new (mathematical) models that can be applied to new data. ML has been applied in many areas of hydrology, for example, stream flow (Kilinc, 2022; Kilinc & Haznedar, 2022). Different types of ML are used in the field of water level prediction, such as the adaptive neuro-fuzzy inference system (ANFIS), artificial neural network (ANN), support vector regression (SVR), and a variety of hybrid models (Çimen & Kisi, 2009; B. Li et al., 2016; Tiu et al., 2021; Yarar et al., 2009). Recently, different techniques have also been used for different areas, such as Young (Young et al., 2015), Zhang (J. Zhang et al., 2022), and Chen (X. Chen et al., 2022).
Because of the great number of models available, it can be challenging for researchers to select the forecasting model best suited to their study. The literature on water level forecasting can be examined from various angles. Focusing on the supply side of the issue, Wee (Wee et al., 2021) reviewed papers on ML applications for reservoir water level prediction. The main focus was on ANN, ANFIS, BA, COA, and support vector machine (SVM), as well as their main benefits. Zhu (Zhu et al., 2020) provided a complete overview of how ML models can be used to predict water-level dynamics in lakes. Seven popular ML model types were examined among the many existing ML models: ANFIS, ANN, SVM, hybrid models, evolutionary models, extreme learning machines (ELM), and deep learning (DL). In addition, model inputs, data split, model performance criteria, and model inter-comparison were all studied. Hussaini (Hussaini et al., 2020) reviewed global water level fluctuation modelling efforts to provide evidence for further improvement in modelling. The scientific theories behind the modelling approaches were closely examined to better grasp their primary features and to present a comprehensive picture of the current water level fluctuation modelling effort. Despite numerous advances in water level forecasting systems, no single strategy consistently outperforms all other models across all research areas. Each case must be studied independently, with the effectiveness of each technique or set of techniques being assessed (De Souza Groppo et al., 2019). Hybrid models have contributed tremendously to methodological advancement in water level prediction. They have been widely used in water level forecasting during the last decade since their performance is far superior to that of standalone models.
A hybrid model combines the advantages of each distinct model, leading to high prediction performance for time series with shorter time scales and longer lead times (Fung et al., 2019). Hybridisation has emerged as a promising technique for overcoming numerous drawbacks of standalone methods while also improving prediction accuracy (Hajirahimi & Khashei, 2022). There are several types of hybrid models, e.g., the hybrid model that combines a physical model and a machine learning model (Nualtong et al., 2021; Xu et al., 2019), and the hybrid stochastic-ML model proposed by Koutsoyiannis and Montanari (Koutsoyiannis & Montanari, 2022). Papacharalampous showed that the hybrid stochastic-ML model is better than single stochastic or ML models.
To the best of the authors' knowledge, no review paper has been written specifically on hybrid machine learning models in water level forecasting. In this review paper, the hybrid models of recent years are studied in detail, together with the pre-processing of the data. The use of hybrid models has increased in recent years (Figure 1). According to the summarised articles, certain countries contributed more to the field of water level prediction than others (Figure 2). The most active countries in water level studies, for example, are China with eight articles and Malaysia with four. The results in Figures 1 and 2 are based on the water level (WL) prediction papers published between 2014 and 2021 that are considered in this study. The objective of the current study is to briefly present the data pre-processing techniques and hybrid machine learning models used in these papers, which can aid in understanding the gaps in the literature concerning the use of hybrid techniques in WL forecasting.
The paper is organised as follows: an overview of well-known models is given in the second section and data pre-processing in the third section. In the fourth section, the hybrid models are divided into four groups: pre-processing-based hybrid models (PBH), components combination-based hybrid models (CBH), hybridisation of parameter optimisation-based with pre-processing-based hybrid models (HOPH), and parameter optimisation-based hybrid models (OBH). Finally, the conclusion and future perspective are presented.

Methods of forecasting water level
There are two types of water level forecasting methods: linear and nonlinear (De Souza Groppo et al., 2019). Linear approaches include multiple linear regression (MLR) and autoregressive integrated moving average (ARIMA) methods (e.g., Kisi et al., 2012; Yarar et al., 2009). Nonlinear methods include SVM, ANN, ANFIS, and hybrid methods (e.g., Bazartseren et al., 2003; Alvisi et al., 2006; Yarar et al., 2009). The hybrid models are dealt with in particular due to their merit and superiority over traditional models and single ML models.

Artificial neural network
An artificial neural network (ANN) is a large-scale, distributed, parallel information-processing system with performance characteristics similar to those of a biological neural network in the human brain. ANNs are mathematical models inspired by human cognition and neurobiology, and they can carry out a large amount of computation in a short time (Mohammadi, 2021). The ANN tool can be particularly useful in situations where it is difficult to establish mathematically the relationship between the dependent and independent variables of a physical phenomenon. Using historical data, an ANN can produce a relatively accurate prediction of the modelled parameters.
A neural network is made up of processing elements (PEs) connected by various weights and topologies. PEs are typically laid out in layers. The input layer receives the input variables for forecasting, the output layer calculates the output, and the remaining layers (hidden layers) transform the inputs into outputs. An ANN's layers can be fully or partially connected, and the weights must be adjusted accordingly for forecasting purposes. Training algorithms are responsible for modifying the weights to reduce the prediction error (Ghalehkhondabi et al., 2017). The multilayer feed-forward neural network (FFNN) trained with the back-propagation technique, recurrent neural network (RNN), Bayesian neural network (BNN), echo state network (ESN), and long short-term memory (LSTM) are the most prominent ANN models.
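The layered structure and weight adjustment described above can be sketched in a few lines. This is only an illustrative example, not a model from any reviewed paper: the function names (`hidden_activations`, `forward`, `output_layer_step`), the tanh activation, the learning rate, and the choice to update only the output layer are all assumptions made to keep the sketch short.

```python
import math

def hidden_activations(x, w_hidden, b_hidden):
    """Hidden layer: weighted sum of the inputs followed by a tanh activation."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> one hidden layer -> single linear output (regression)."""
    h = hidden_activations(x, w_hidden, b_hidden)
    return sum(w * hi for w, hi in zip(w_out, h)) + b_out

def output_layer_step(x, target, w_hidden, b_hidden, w_out, b_out, lr=0.1):
    """One gradient-descent step on the output-layer weights only.

    Minimises 0.5 * (prediction - target)**2; the hidden weights are left
    unchanged here, whereas full back-propagation would update them too.
    """
    h = hidden_activations(x, w_hidden, b_hidden)
    error = forward(x, w_hidden, b_hidden, w_out, b_out) - target
    new_w_out = [w - lr * error * hi for w, hi in zip(w_out, h)]
    new_b_out = b_out - lr * error
    return new_w_out, new_b_out
```

In a forecasting setting, `x` would hold lagged water levels or other predictors and `target` the observed level; repeating the update over many samples is what the training algorithm does.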
The main characteristics of the ANN model are that they can train and test data series continuously to increase forecast accuracy, and can handle nonlinear data series (Z. Zhang et al., 2018).

Support vector machine
The support vector machine (SVM) technique has gained popularity as a modern kind of statistical learning over the last four decades, and it has proven to be a quick and successful tool. It is a supervised machine learning technique for classification that maximises the margin between two groups.
By transforming the data with the kernel trick, SVM can be used for classification or regression tasks. The SVR technique, in turn, is a regression or prediction method that keeps all of the fundamental characteristics of the optimum margin methodology. In a nutshell, SVR and SVM follow the same principles, with minor differences (Ibrahim et al., 2022). The SVM and SVR models rest on three main elements, which are described below.
• Separation by a hyperplane in a high-dimensional space.
• Kernel functions to handle nonlinear problems.
• Soft margins to reduce error.
One of the most important features of the SVR model is the availability of various kernel functions, which allows for multiple options (Ibrahim et al., 2022).
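To make the role of kernel functions and soft margins more concrete, the following sketch shows three commonly used kernels and the epsilon-insensitive loss found in standard SVR formulations. The function names and default parameter values (`degree`, `coef0`, `gamma`, `epsilon`) are illustrative assumptions and are not taken from the reviewed studies.

```python
import math

def linear_kernel(x, y):
    """Plain dot product: corresponds to a linear decision function."""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (x . y + coef0) ** degree."""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def epsilon_insensitive_loss(observed, predicted, epsilon=0.1):
    """SVR-style loss: errors smaller than epsilon are not penalised."""
    return max(0.0, abs(observed - predicted) - epsilon)
```

A kernel returns a similarity between two input vectors without explicitly computing their coordinates in the high-dimensional space, which is why swapping the kernel changes the shape of the fitted function.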

Adaptive neuro-fuzzy inference system
Fuzzy logic models have proven to be effective for resolving complex computational issues. They can deal with nonlinearity, uncertainty, and subjective information. The adaptive neuro-fuzzy inference system (ANFIS) is the most extensively used fuzzy logic model. The ANFIS is a multi-layer feedforward network that employs a neural network learning algorithm. It can recognise nonlinear boundaries, use fuzzy logic to differentiate nonlinear equations, and map the input-output space, achieving a highly nonlinear mapping. The stages of ANFIS are choosing the type of inference system (such as Mamdani, Sugeno, or Tsukamoto), aggregation, and defuzzification (Karaboga & Kaya, 2020).
ANFIS model's characteristics include interpreting fuzzy rules using natural language and simulating nonlinear functions with arbitrary complexity and insufficient data (Fung et al., 2019).

Random forest
In an attempt to improve the accuracy of decision trees, Breiman (Breiman, 2001) proposed an improved method named random forest (RF). The method can perform classification or regression tasks, depending on whether its base models are classification trees or regression trees. The name of the algorithm is inspired by the movement from the root of a tree to its terminal nodes, and the method has been increasingly used. Each time a node of a tree within the forest is split, the RF method chooses a random subset of the independent variables. As a result, both the predictive accuracy and the running time are relatively improved (Herrera et al., 2010; Seo et al., 2018). Readers are referred to Breiman (Breiman, 2001) for detailed information on how the RF is mathematically formulated.
RF can handle large datasets with multiple features, and modelling accuracy improves as the number of trees rises (Ibrahim et al., 2022).

Deep learning model
As artificial intelligence has quickly advanced, numerous studies have successfully used deep learning models such as the long short-term memory (LSTM), stacked autoencoder, and deep restricted Boltzmann machine. Among these deep learning techniques, the LSTM is an advanced class of ANN with feedback connections built into its architecture. It comprises memory blocks with self-connections (in the hidden layer), which can store the network's temporal state (Zhu et al., 2020). Examples of deep learning applications for WL forecasting include Kim et al. (2022), Shuofeng et al. (2021), and Xie et al. (2021).

Data pre-processing
Data pre-processing methods are regarded as critical to the process of data mining. Data pre-processing is essential to ensure that all predictors receive adequate importance during the learning phase and to speed up the procedure. These methods promote high accuracy and low computational cost during the learning phase, since noisy and unreliable information in data records negatively impacts the training stage and results in a poor model (Khudhair et al., 2022).

Normalisation
This method reduces the impact of outliers by smoothing the solution space. The continuous variables must be transformed so that the time series is normally or nearly normally distributed. If the time series of the dependent variable is not normally or nearly normally distributed, the model's results are degraded (Zubaidi, Kot et al., 2018). There are various methods for normalisation. The natural logarithm has been employed to normalise data and to decrease the effect of multicollinearity between input variables. Z-score normalisation is a traditional method of standardisation that uses the mean and standard deviation to standardise parameters. Min-Max normalisation is among the most common and widely utilised data normalisation methods. Decimal scaling moves the decimal point of the variable's values to obtain the normalised values (Alawsi et al., 2022).
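The four normalisation methods mentioned above can be illustrated with a short sketch. It is a minimal example under stated assumptions: the function names are hypothetical, the Z-score uses the population standard deviation, and Min-Max scaling maps to the range [0, 1]; individual studies may use different conventions.

```python
import math
import statistics

def log_transform(values):
    """Natural-logarithm transform; only defined for strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]

def z_score(values):
    """Standardise to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("z-score is undefined for a constant series")
    return [(v - mean) / std for v in values]

def min_max(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined for a constant series")
    return [(v - lo) / (hi - lo) for v in values]

def decimal_scaling(values):
    """Divide by the smallest power of ten that brings every |value| below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs >= 10 ** j:
        j += 1
    return [v / 10 ** j for v in values]
```

Whatever method is chosen, the scaling parameters (mean, minimum, power of ten) would normally be estimated on the training data only and then reused for the test data, so that no information leaks from the test period.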

Cleaning
Noise and outliers can have a negative impact on data analysis and on the suggested model's performance. As a result, data cleaning is required to discover and eliminate data corruption. An outlier is a case in which an extreme value on one variable, or a strange combination of scores on several variables, distorts statistics, whereas noise is an undesirable variation in the dependent variables that is estimated by covariate scores (Tabachnick & Fidell, 2013).
Each time series contains different noise components. The most efficient techniques for denoising the original time series decompose it into multiple components (for example, wavelet analysis, empirical mode decomposition, and singular spectrum analysis) (Alawsi et al., 2022).

Selection of best model input
The selection of explanatory factors as model input data is a crucial stage in creating any successful prediction model (Maier & Dandy, 2000).
Most of the previous research studies only employed one or two steps of data pre-processing. Also, several studies did not apply data pre-processing techniques (see Table 1), which may adversely affect the accuracy of the forecast's result (see Table 2).

Hybrid models classification
Researchers have developed unique models in response to the requirement for enhanced reliability, capability, and accuracy in data-driven methodologies. Hybrid models are being developed to fulfil new requirements; their main aim is to combine the benefits of two or more methods to improve on the capabilities of single models. These hybrid models are typically made up of various procedures, with one serving as the principal technique and the others as pre- or post-processing methods (Zubaidi, Ortega-Martorell, Al-Bugharbee et al., 2020). According to the reviewed papers, hybrid models can be classified into four groups: components combination-based hybrid models (CBH), pre-processing-based hybrid models (PBH), parameter optimisation-based hybrid models (OBH), and hybridisation of parameter optimisation-based with pre-processing-based hybrid models (HOPH), as in Hajirahimi and Khashei (Hajirahimi & Khashei, 2022) (see Figure 3 and Table 2).

Pre-processing-based hybrid models (PBH)
In pre-processing-based hybrid models, the input data is first pre-processed using methods such as decomposition-based, denoising-based, feature selection, dimensionality reduction, and data cleaning approaches. In general, the main motivation for decomposition-based approaches is to divide a time series into several components with varying degrees of complexity, while the primary aim of filter and denoising-based approaches is to detect and remove noise in the underlying time series. In the second step, the screened time series is forecasted by an appropriate individual model (Hajirahimi & Khashei, 2022).
Wavelet transformation can be used as a pre-treatment tool to help solve different problems in a variety of disciplines of inquiry. Wavelet analysis has gained prominence in recent years, particularly for overcoming the limitations of stochastic forecasting models such as ARIMA or classic neural network techniques. Wavelet analysis can be used in statistics to denoise data, estimate nonparametrically, and compress data (Ozan Evkaya & Sevinç Kurnaz, 2020).
The daily water levels at the northern and southern boundaries of the Bosphorus Strait are predicted using a combination of discrete wavelet transform-fuzzy (DWT-Fuzzy) and continuous wavelet transform-fuzzy (CWT-Fuzzy) models. Based on evaluation criteria (i.e., RMSE and CE), the CWT-Fuzzy model outperformed both the DWT-Fuzzy and standalone Fuzzy models for forecast lead-times up to 7 days (Altunkaynak & Kartal, 2019).
Loh (Loh, Ismail, Mustafa et al.,) used the discrete wavelet transform (DWT) with different kinds of mother wavelets (i.e., haar, sym2, sym3, db2, db3, coif1, coif2, and coif3) to denoise the data. An artificial neural network (ANN) model was then applied to simulate the Kelantan River water level. The results revealed that sym3 was the best-performing mother wavelet for the DWT and that the hybrid DWT-ANN model performed better than the standalone ANN model. Shafaei and Kisi (Shafaei & Kisi, 2015) combined the wavelet (W) method with three different prediction models, namely auto-regressive moving average (ARMA), support vector regression (SVR), and adaptive neuro-fuzzy inference system (ANFIS), to forecast monthly lake level variations. The combined W-SVR, W-ANFIS, and W-ARMA models were compared against the single ARMA, SVR, and ANFIS models based on their accuracy. According to the findings, the combined models provide improved precision in forecasting lake levels in the study region, and the W-SVR model slightly outperforms the other combined models.
Seo (Seo et al., 2015) applied two hybrid models, which are wavelet-based adaptive neuro-fuzzy inference system (WANFIS) and wavelet-based artificial neural network (WANN). A time series is decomposed into approximation and detail components using wavelet decomposition. The findings of this study showed that combining wavelet decomposition with artificial intelligence models can be a valuable tool for accurately forecasting daily water levels, and it can outperform standalone forecasting models. Also, the WANFIS yields the best performance.
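The decomposition of a series into approximation and detail components can be illustrated with a single-level Haar transform, the simplest of the mother wavelets listed earlier. This is a minimal sketch with hypothetical function names; the reviewed studies use multi-level decompositions, other wavelets, and established wavelet libraries rather than hand-written code.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal):
    """Single-level Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even for this simple sketch")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approximation = [(a + b) / SQRT2 for a, b in pairs]  # smooth, low-frequency part
    detail = [(a - b) / SQRT2 for a, b in pairs]         # fluctuation, high-frequency part
    return approximation, detail

def haar_idwt(approximation, detail):
    """Inverse of haar_dwt: rebuilds the original signal from its components."""
    signal = []
    for a, d in zip(approximation, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal
```

In a wavelet-based hybrid model, each component (or a thresholded version of the detail coefficients, when denoising) is passed to the forecasting model instead of the raw series.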
Xie (Xie et al., 2021) proposed a deep learning approach that combines a long short-term memory network with the discrete wavelet transform (WA-LSTM) and compared it with WA-ANN and WA-ARIMA models for daily water level prediction in the Yangtze River, China. A novel LSTM network is used to learn generic water level features through layer-by-layer feature granulation with a greedy layer-wise unsupervised learning algorithm. The wavelet transform is applied to decompose the time series into detail and approximation components to better understand its temporal properties. According to the results, the WA-LSTM model is stable, dependable, and extensively applicable.
The season algorithm technique can successfully remove the seasonal component from the original time series data. The additive season algorithm (ASA) and the multiplicative season algorithm (MSA) are two decomposition algorithms. The season method decomposes observed time series data into trend-cycle, seasonal index, and irregular (error term) components (Altunkaynak, 2019).
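The idea of splitting a series into trend-cycle, seasonal index, and irregular components can be sketched with a classical additive decomposition. This is an assumption-laden illustration, not the ASA as implemented by Altunkaynak: the trend is estimated here with a centred moving average, the function names are hypothetical, and the series is assumed to span at least two full seasonal periods.

```python
def centred_moving_average(series, period):
    """Trend-cycle estimate; None where the window does not fit."""
    n, half = len(series), period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 1:
            trend[t] = sum(window) / period
        else:
            # Even period: the two end points share one seasonal position,
            # so each receives half weight.
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend

def additive_decompose(series, period):
    """Return (trend, seasonal_index, irregular) with series = trend + seasonal + irregular."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full periods")
    trend = centred_moving_average(series, period)
    sums, counts = [0.0] * period, [0] * period
    for t, (y, tr) in enumerate(zip(series, trend)):
        if tr is not None:
            sums[t % period] += y - tr
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    offset = sum(raw) / period
    seasonal_index = [r - offset for r in raw]  # indices sum to zero
    irregular = [
        None if tr is None else y - tr - seasonal_index[t % period]
        for t, (y, tr) in enumerate(zip(series, trend))
    ]
    return trend, seasonal_index, irregular
```

A multiplicative variant would divide by the trend and seasonal index instead of subtracting them.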
Altunkaynak (Altunkaynak, 2019) proposed a new predictive model based on the season algorithm (SA) and multi-layer perceptron (MLP) approaches to increase prediction accuracy and extend the lead time of water level predictions. The additive season algorithm (ASA) was employed for the first time as an alternative data pre-processing technique for estimating water levels, and its performance was compared with that of the wavelet transform (WT) using the mean squared error (MSE) and the Nash-Sutcliffe coefficient of efficiency (NSE) as performance evaluation criteria. Other techniques, such as Kalman filtering (KF), have also been applied. Zhong established a hybrid ANN-Kalman filtering technique to forecast the short-term daily water level of the Yangtze River, China. The performance of the hybrid model was compared with that of the standalone ANN model. Historical daily water level data from 29 July 2012 to 31 July 2016 were used to build and assess the prediction models. The findings showed that the hybrid model simulated the water level data more accurately than the standalone ANN model.
Zhong proposed integrating the artificial neural network with the Kalman filtering algorithm (ANN-KF) to predict the short-term water level of the Yangtze River in China. The model was built and assessed using daily water level data over three years (2014-2016). The results confirmed the superiority of local Kalman filtering, and the ANN-KF technique predicted the water levels better than the traditional ANN model.
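For readers unfamiliar with Kalman filtering, the following sketch shows the predict and update steps of a scalar filter. It assumes a simple random-walk model for the water level; the function name and the variance parameters are hypothetical, and the sketch does not reproduce how the ANN and the filter are coupled in the ANN-KF studies above.

```python
def kalman_filter_1d(measurements, process_var, measurement_var, x0, p0):
    """Scalar Kalman filter for the model x_t = x_(t-1) + w_t, z_t = x_t + v_t.

    process_var is the variance of w_t, measurement_var the variance of v_t,
    x0 and p0 the initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is carried forward, its uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + measurement_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates
```

A large `measurement_var` makes the filter trust its own prediction and smooth heavily, whereas a small value makes the estimate follow the observations closely.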

The components combination-based hybrid models (CBH)
The CBH models use the exceptional capacity of an individual prediction model in various combination structures to improve forecasting accuracy (Hajirahimi & Khashei, 2022).
Gated recurrent unit (GRU) and convolutional neural network (CNN) structures are combined to create a CNN-GRU model. The GRU part learns the changing water level trend, and the CNN part learns the spatial correlation among water level data observed at adjacent water stations on the Yangtze River in China. The study employed data from multiple locations to predict the water level at the middle location. The CNN-GRU model was compared with three models based solely on GRU and with other cutting-edge techniques such as the autoregressive integrated moving average model (ARIMA), wavelet-based artificial neural network (WANN), and long short-term memory model (LSTM). The outcomes showed that the CNN-GRU model outperformed the ARIMA, WANN, and LSTM models according to three assessment criteria: Nash-Sutcliffe efficiency coefficient (NSE), mean relative error (MRE), and root mean square error (RMSE) (Pan et al., 2020).
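The three assessment criteria named above recur throughout the reviewed studies, so a short sketch of their usual definitions may be helpful. The function names are hypothetical, and the MRE is written here as the mean of absolute errors relative to the observations, which is an assumption; individual papers may define it slightly differently.

```python
import math

def rmse(observed, predicted):
    """Root mean square error: 0 for a perfect forecast."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance

def mre(observed, predicted):
    """Mean relative error; requires non-zero observations."""
    n = len(observed)
    return sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / n
```

An NSE at or below zero indicates that the model is no better than simply predicting the mean of the observations.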
Nie (Nie et al., 2021) proposed a CNN-BiLSTM water level forecasting method that includes an attention mechanism. The study employed hourly water level and rainfall data for the Pinghe basin in China from 2010 to 2020. The CNN extracts spatial characteristics from the water level data, and the BiLSTM learns temporal characteristics by integrating past and future sequence information. An attention mechanism is implemented to focus on the salient features in the sequence. This method outperforms the support vector machine (SVM), bidirectional long short-term memory network (BiLSTM), and temporal convolutional neural network (TCN) models in terms of accuracy.

Parameter optimisation-based hybrid models (OBH)
Parameter optimisation-based hybrid models rely on metaheuristics, which are solution approaches that coordinate an interplay between local improvement procedures and higher-level strategies to create a procedure that can escape local optima and conduct a rigorous search of the solution space (Ghalehkhondabi et al., 2017).

Particle swarm optimisation (PSO)
Particle swarm optimisation (PSO) is a heuristic approach for solving non-continuous and nonlinear problems. The method mimics the cooperative and social behaviour shown by birds and fish. It is a population-based algorithm in which a swarm of particles searches in parallel. The particles move through the search space, each with its own velocity, in pursuit of the best-fit solution. Each particle retains a memory of its previous optimal position. Best positions are categorised into global and local best (Shah et al., 2021). A flow diagram and the formulas of the PSO algorithm can be found in Shah (Shah et al., 2021).
Panyadee (Panyadee et al., 2017) enhanced the performance of the artificial neural network (ANN) model by combining it with the particle swarm optimisation (PSO) algorithm to predict the water level of the Mea-Bong River, Thailand. The PSO algorithm was used in this research to fine-tune the hyperparameters of the ANN model. The evaluation results reveal that the PSO algorithm provides hyperparameter values that shorten the time it takes to train the ANN. The forecast error is 1.88 percent for the training stage and 7.82 percent for the testing stage. The simulated water level from the suggested hybrid model is suitable for use in early warning systems for flash floods.
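The behaviour described above can be sketched as a compact PSO loop. This is a generic textbook-style variant rather than the algorithm of any particular reviewed study; the function name, the inertia and acceleration coefficients, and the clipping of positions to the bounds are assumptions made for illustration.

```python
import random

def pso_minimise(objective, bounds, n_particles=20, n_iter=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Minimise objective(position) inside bounds = [(low, high), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # each particle's own best position
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]     # best position of the whole swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

In a parameter optimisation-based hybrid model, `objective` would typically train the forecasting model with the hyperparameters encoded in the position vector and return its validation error.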

The firefly algorithm (FFA)
The firefly algorithm (FFA) is a swarm intelligence optimisation technique inspired by the movement of fireflies. A candidate solution to an optimisation problem can be modelled as a firefly that glows in proportion to its quality. Each brighter firefly attracts its mates, regardless of gender, making the exploration of the search space more efficient. Fireflies are drawn to the light, and the entire swarm moves towards the most brilliant firefly. The attractiveness of a firefly is therefore related to its brightness, and the brightness is determined by the agent's intensity (Ghorbani et al., 2018; Tripura et al., 2020). The formulas of the FFA algorithm are given in Yang (Yang, 2010).
Soleymani (Soleymani et al., 2016) designed a novel technique that combined the radial basis function (RBF) and the firefly algorithm (FFA) to forecast the water level of the Selangor River, Malaysia. The FFA algorithm is employed to interpolate the RBF model to estimate the best solution. The RBF-FFA technique was validated by comparison with support vector machine (SVM) and multilayer perceptron (MLP) models. The results reveal that the proposed RBF-FFA model delivers more accurate predictions than the SVM and MLP models, based on several statistical indicators. The outcomes also indicate that the RBF-FFA model can be used as an efficient technique for precise forecasting of the river's water stage.
Kisi evaluated a new methodology that coupled the support vector machine (SVM) with the firefly algorithm (FFA) to predict daily water levels for Urmia Lake, Iran. The FFA was applied to estimate the optimum SVM hyperparameters. The SVM-FFA model was validated by comparison with genetic programming (GP) and artificial neural networks (ANNs). According to the findings, the SVM-FFA technique produced better predictions with greater generalisability than the GP and ANN models for 1-day-ahead lake level forecasts.
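A minimal sketch of the attraction mechanism is given below for a minimisation problem, where a lower objective value means a brighter firefly. It follows the general form of the standard firefly algorithm, but the function name, the parameter values (`beta0`, `gamma`, `alpha`, `alpha_decay`), and the clipping to the bounds are illustrative assumptions.

```python
import math
import random

def firefly_minimise(objective, bounds, n_fireflies=15, n_iter=60,
                     beta0=1.0, gamma=1.0, alpha=0.2, alpha_decay=0.97, seed=0):
    """Minimise objective(position) inside bounds = [(low, high), ...]."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_fireflies)]
    cost = [objective(p) for p in pos]
    b = min(range(n_fireflies), key=lambda i: cost[i])
    best, best_cost = pos[b][:], cost[b]
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if cost[j] < cost[i]:  # firefly j is brighter, so i moves towards j
                    r2 = sum((a - c) ** 2 for a, c in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)  # attraction fades with distance
                    for d in range(dim):
                        lo, hi = bounds[d]
                        step = alpha * (rng.random() - 0.5) * (hi - lo)  # random walk term
                        new = pos[i][d] + beta * (pos[j][d] - pos[i][d]) + step
                        pos[i][d] = min(max(new, lo), hi)
                    cost[i] = objective(pos[i])
                    if cost[i] < best_cost:
                        best, best_cost = pos[i][:], cost[i]
        alpha *= alpha_decay  # gradually reduce the randomness
    return best, best_cost
```

As with other metaheuristics in this section, the objective in a hybrid model would usually be the validation error of the forecasting model for a given set of hyperparameters.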

Ant colony optimisation (ACO)
Ant colony optimisation (ACO) is a probabilistic strategy that uses graph minimisation to locate acceptable paths to solve numerical problems. Artificial ants are multi-agent tactics modelled after real-life ant behaviour. The pheromone-based contact of biological ants is always the most popular model. An ant colony optimisation algorithm is a type of optimisation algorithm that focuses on the behaviour of ants. The artificial' ants' identify optimal solutions by iterating over a parameter space that describes all possible solutions. When true ants are out investigating their environment, they leave behind pheromones that help them find food. The simulated 'ants' keep track of their positions and the consistency of their solutions, such that in succeeding simulation iterations, more ants find better answers (Adnan, Mostafa, Elbeltagi et al., 2021). Kucukkoc and Zhang (Kucukkoc & Zhang, 2013) include a flow diagram and the ACO algorithm's formulas.
Deng et al. (2021) combined the Elman neural network (ENN) with the ant colony optimisation (ACO) algorithm to forecast the water level in Dongting Lake, China. To ensure that the ENN-ACO model captures the nonlinear relationship between the independent and dependent variables, its results were validated against a hybrid multi-layer perceptron and genetic algorithm (MLP-GA) model and a standalone MLP model. The study used hourly water level data from 2004 to 2020.
The results show that the ENN-ACO model can forecast the hourly water level with an error of 1.2% and can provide forecasts 21 h ahead. Overall, the ENN-ACO model was more accurate than the MLP-GA and standalone MLP models.

The genetic algorithm (GA)
The genetic algorithm (GA) is one of the most widely used population-based optimisation algorithms and mimics natural evolution (natural genetics and natural selection). Its fundamental theory is founded on the Darwinian principles of selection, recombination (crossover), and mutation (Ibrahim et al., 2022). In a nutshell, the method begins by producing a random population of chromosomes representing potential solutions to a given problem. Then, the fitness function is evaluated for each chromosome, which defines its probability of being chosen in the selection stage. The crossover operation is then applied to a pair of selected chromosomes, combining the two to produce new, potentially better offspring. Next, chromosomal components (genes) at randomly selected sites are altered; this final genetic alteration is called mutation. The offspring produced by these genetic operations form the next population to be examined (Mulia et al., 2013). A flow diagram of the GA can be found in Ibrahim et al. (2022).
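The steps above (random population, fitness-proportional selection, crossover, mutation, new population) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the algorithm of Ibrahim et al. (2022) or Mulia et al. (2013): chromosomes are lists of real numbers in [0, 1], `toy_error` is a made-up objective standing in for a model's validation error, fitness is taken as `1 / (error + small constant)`, and the single best chromosome is copied unchanged into each new generation (elitism), which the description above does not mention.

```python
import random


def toy_error(chromosome):
    """Made-up stand-in for a validation error; minimum 0 when every gene is 0.5."""
    return sum((gene - 0.5) ** 2 for gene in chromosome)


def genetic_algorithm(error_fn, n_genes=4, pop_size=30, generations=60,
                      crossover_rate=0.9, mutation_rate=0.1, seed=0):
    # n_genes must be at least 2 so that a crossover point exists.
    rng = random.Random(seed)
    # 1. Random initial population of chromosomes (candidate solutions).
    population = [[rng.random() for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = min(population, key=error_fn)
    history = [error_fn(best)]

    for _ in range(generations):
        # 2. Fitness defines each chromosome's selection probability
        #    (lower error gives higher fitness).
        fitness = [1.0 / (error_fn(c) + 1e-12) for c in population]

        offspring = [best[:]]  # elitism (an assumption of this sketch)
        while len(offspring) < pop_size:
            # 3. Select two parents with probability proportional to fitness.
            parent1, parent2 = rng.choices(population, weights=fitness, k=2)

            # 4. One-point crossover combines the two parents.
            child = parent1[:]
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_genes - 1)
                child = parent1[:cut] + parent2[cut:]

            # 5. Mutation alters genes at randomly selected sites.
            for i in range(n_genes):
                if rng.random() < mutation_rate:
                    mutated = child[i] + rng.gauss(0.0, 0.1)
                    child[i] = min(1.0, max(0.0, mutated))
            offspring.append(child)

        # 6. The offspring form the next population.
        population = offspring
        best = min(population, key=error_fn)
        history.append(error_fn(best))

    return best, history


best, history = genetic_algorithm(toy_error)
print("initial best error:", history[0])
print("final best error:  ", history[-1])
```

Because of the elitism step, the best error recorded in `history` can never increase from one generation to the next.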
N. Chen et al. (2019) explored a genetic algorithm coupled with a back-propagation neural network (GA-BPNN) to predict water level. However, a conventional genetic algorithm is susceptible to becoming trapped in local optima (local convergence) when confronted with a complicated neural network. To solve this issue, they presented an improved genetic algorithm (IGA), which uses several genetic methods, coupled with a back-propagation neural network (IGA-BPNN). Weather and hydrologic data collected between 2010 and 2017 were used to estimate the water levels of the Han River. The Pearson correlation coefficient (R), Nash-Sutcliffe efficiency (NSE) coefficient, and root mean square error (RMSE) were used to assess the prediction models. The results showed that the IGA-BPNN outperformed the GA-BPNN and BPNN models according to the three statistical indicators. The IGA-BPNN technique proved suitable for water-level estimation and could improve short-term flood forecasting.
Imran et al. (2021) examined the performance of an adaptive neuro-fuzzy inference system (ANFIS) model coupled with a genetic algorithm to forecast the water levels of the Jhelum River, India. Temperature and precipitation data from two meteorological stations were used to train the model to simulate the river's water level. Several ANFIS configurations with various membership functions and optimisation approaches were examined, and the best-performing one was selected for further modification. The parameters of the chosen ANFIS model were then optimised with the genetic algorithm to obtain better results. The ANFIS-GA technique gave better results than the traditional ANFIS model.
Lineros et al. (2021) studied the use of a multi-objective genetic algorithm (MOGA) framework to design artificial neural network (ANN) models for 1-step-ahead river water level prediction. The design process is a semi-automatic approach that can split the data into datasets and find a near-optimal model with the proper topology and inputs that performs well on unseen data (data not used for model design). The study used water level data recorded every 10 minutes over eleven years for the Carrión River, in the northwest of the Iberian Peninsula. The results demonstrate that the proposed framework can produce low-complexity models with high performance on unseen data, obtaining an RMSE of 2.5 × 10⁻³, which compares favourably with the results of alternative methods.
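For readers unfamiliar with the three indicators named above, the following Python sketch computes them from their standard textbook definitions. The function names and the short `observed` and `simulated` series are invented for illustration and are not data from any reviewed study.

```python
import math


def rmse(obs, sim):
    """Root mean square error: sqrt(mean((obs - sim)^2)); 0 is a perfect fit."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - sum((obs - sim)^2) / sum((obs - mean_obs)^2).

    1 is a perfect fit; 0 means the model is no better than the observed mean.
    """
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


def pearson_r(obs, sim):
    """Pearson correlation coefficient between observed and simulated series."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    sd_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs))
    sd_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim))
    return cov / (sd_obs * sd_sim)


# Invented water levels in metres, for illustration only.
observed = [2.0, 2.5, 3.0, 2.8, 2.2]
simulated = [2.1, 2.4, 3.1, 2.7, 2.3]

print("RMSE:", rmse(observed, simulated))
print("NSE: ", nse(observed, simulated))
print("R:   ", pearson_r(observed, simulated))
```

Every simulated value in this example is off by 0.1 m, so the RMSE is 0.1 m (up to floating-point rounding). A lower RMSE and values of NSE and R closer to 1 indicate a better model.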

Grey wolf optimisation (GWO)
Grey wolf optimisation (GWO) is a nature-inspired algorithm whose core concept is based on the hunting habits and social leadership structure of grey wolves. Its advantages for nonlinear and multivariable functions include simplicity, flexibility, and the ability to avoid local optima. Based on effectiveness and decision-making power within the group, each member of a grey wolf pack is classified as α, β, δ or ω. The α wolf is normally the strongest and most dominant wolf in the pack, and the rest of the pack should obey its orders. The β wolves form the second rank and serve as counsellors: they help the α wolf by relaying its directives to the remainder of the pack. In the leadership structure, the δ wolves are below the α and β wolves and above the ω wolves. They serve as the group's guardians, sentinels, hunters, and caregivers. The ω wolves rank last in the decision-making hierarchy and must submit to all other wolves (Mohammadi et al., 2020). The GWO algorithm formulae can be found in Mohammadi et al. (2020).
Other optimisation algorithms, such as the gravitational search algorithm (GSA), the sunflower optimisation (SO) algorithm, and the grasshopper optimisation algorithm (GOA), have also been combined with machine learning to forecast water level.
Ghorbani et al. (2018) designed a new hybrid forecasting model that combines the gravitational search algorithm (GSA) with the multi-layer perceptron (MLP) to estimate water levels in two lakes. To validate the suggested model, the MLP was also coupled with two other metaheuristic optimisers, namely particle swarm optimisation (PSO) and the firefly algorithm (FFA). The study also used two stochastic models, ARMA and ARIMA. To train and assess the MLP-GSA model, monthly water level data from 1938 to 2005 and 1942 to 2011 for Lakes Winnipesaukee and Cypress were used. The results highlighted the significant efficacy of the MLP-GSA, which outperformed the other hybrid (MLP-PSO and MLP-FFA) and standalone (MLP, ARMA, and ARIMA) models.
Ehteram et al. (2021) enhanced the capacity of the adaptive neuro-fuzzy inference system (ANFIS) and multi-layer perceptron (MLP) models using a sunflower optimisation (SO) algorithm to predict lake water level. The hybrid models' performance was also compared with that obtained using the firefly algorithm (FFA) and particle swarm optimisation (PSO). The optimisation algorithms were applied to determine the optimum hyperparameters of the ANFIS and MLP models. Rainfall, temperature, and lagged water level data from 1940 to 2004 were used to predict the water level of Urmia Lake, Iran. The ANFIS-SO model was found to have lower uncertainty, as indicated by a higher percentage of responses within the confidence band and a narrower bandwidth.
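The hierarchy described above translates into a simple position update: at each iteration the three best wolves (α, β, δ) are identified and every wolf moves towards a position averaged from their guidance. The Python sketch below uses the update rule as it is commonly published for GWO. It is given as an assumption-laden illustration and has not been checked against the exact formulae in Mohammadi et al. (2020). The objective `toy_error`, the search bounds, and the function name `grey_wolf_optimiser` are hypothetical, and the best-so-far bookkeeping is a convenience added for this sketch.

```python
import random


def toy_error(position):
    """Made-up stand-in for a model's validation error; minimum 0 at (1, 1)."""
    return sum((x - 1.0) ** 2 for x in position)


def grey_wolf_optimiser(error_fn, bounds, n_wolves=12, n_iterations=50, seed=0):
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds]
              for _ in range(n_wolves)]
    best_position, best_error = None, float("inf")
    history = []

    for t in range(n_iterations):
        # Rank the pack: the three best wolves lead, the rest (omega) follow.
        ranked = sorted(wolves, key=error_fn)
        alpha, beta, delta = (wolf[:] for wolf in ranked[:3])
        if error_fn(alpha) < best_error:
            best_position, best_error = alpha[:], error_fn(alpha)
        history.append(best_error)

        # 'a' decreases linearly from 2 to 0: exploration first, then exploitation.
        a = 2.0 * (1.0 - t / n_iterations)

        for wolf in wolves:
            for d, (lo, hi) in enumerate(bounds):
                guided = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    distance = abs(C * leader[d] - wolf[d])
                    guided += leader[d] - A * distance
                # New coordinate: average of the three leaders' guidance,
                # clipped to the search bounds.
                wolf[d] = min(hi, max(lo, guided / 3.0))

    final = min(wolves, key=error_fn)
    if error_fn(final) < best_error:
        best_position, best_error = final[:], error_fn(final)
    history.append(best_error)
    return best_position, best_error, history


bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best_position, best_error, history = grey_wolf_optimiser(toy_error, bounds)
print("best position:", best_position)
print("best error:   ", best_error)
```

When GWO is used to tune a forecasting model, each coordinate of a wolf's position would represent one hyperparameter, and `toy_error` would be replaced by the model's error on validation data.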

Hybridisation of parameter optimisation-based with preprocessing-based hybrid models (HOPH)
In HOPH models, metaheuristic algorithms are integrated with PBH models to choose the optimal parameters of the forecasting model or the pre-processing technique, or to find optimised weights for aggregating the predictions of the decomposition components (Hajirahimi & Khashei, 2022).
Mohammadi et al. (2020) examined the accuracy of support vector regression (SVR) coupled with the grey wolf optimisation (GWO) algorithm in forecasting fluctuations in lake water level. Additionally, three pre-treatment methods, i.e., random forest, the relief algorithm, and principal component analysis, were applied to select the best scenario of predictors. Monthly datasets for Lake Titicaca in South America, from August 1973 to January 2017, were used to build and evaluate the model. The results show that the random forest method offers the best model input scenario, with four lags. The hybrid SVR-GWO model simulates water level better than the standalone SVR based on several statistical criteria.
Tao et al. (2021) integrated an improved grasshopper optimisation algorithm (IGOA) with both relevance vector machine (RVM) and artificial neural network (ANN) models to forecast catchment water level in Malaysia. The classical GOA and particle swarm optimisation (PSO) algorithms were employed to validate the performance of the IGOA. Lagged hourly rainfall and water level data from 1 January 2017 to 31 December 2019 were adopted to build and validate the prediction models. Considering different statistical criteria, the IGOA significantly improved the models' performance, and the RVM-IGOA was superior.
Ghorbani et al. (2017) investigated the predictive capability of a hybrid model integrating the multi-layer perceptron (MLP) with the firefly algorithm (FFA) to forecast the water level in Lake Egirdir, Turkey. Monthly data for 56 years were used to train and test the suggested hybrid MLP-FFA model. Four lagged combinations of historical data were adopted as predictors using the average mutual information technique. The results show that the MLP-FFA model outperforms the standalone MLP model on different statistical score metrics; e.g., the root mean square error was about 0.029 m for the MLP-FFA model compared with 0.102 m for the standalone MLP model.
The following observations were drawn from a survey of many articles: (1) Almost all previous studies applied signal pre-treatment only and did not address the other steps of data pre-processing.
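Several of the studies above use lagged values of the water level series as model inputs. As a hedged illustration of what that means in practice, the Python sketch below turns a series into input rows and targets for a chosen set of lags. The helper name `make_lagged_dataset`, the choice of lags 1 to 4, and the short `levels` series are assumptions made for this example. The sketch does not implement the average mutual information, random forest, or other lag-selection techniques used in the cited studies.

```python
def make_lagged_dataset(series, lags):
    """Build (inputs, targets) where each input row holds the values
    observed `lag` steps before the target, for every lag in `lags`."""
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series)):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return inputs, targets


# Invented monthly water levels in metres, for illustration only.
levels = [10.0, 10.2, 10.1, 10.4, 10.6, 10.5, 10.7]

X, y = make_lagged_dataset(levels, lags=[1, 2, 3, 4])
for row, target in zip(X, y):
    print(row, "->", target)
```

The first row is `[10.4, 10.1, 10.2, 10.0]` with target `10.6`: the four previous months are used to predict the fifth. A lag-selection method then decides which of these columns are worth keeping.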
(2) The wavelet algorithm was shown to be effective in denoising raw data, improving the results' accuracy.
(3) Different metaheuristic algorithms were integrated with machine learning models. These algorithms have proved able to tune the machine learning models and achieve significantly better scores on several statistical measures. In addition, the chances of finding optimal parameters are substantially better than with a trial-and-error procedure.
(4) Few studies have employed the hybridisation of parameter optimisation-based with preprocessing-based hybrid models (HOPH). Tao et al. (2021) suggested assessing the performance of other nonlinear input selection approaches, such as integrating REF with other ML techniques like ANN and RF. Pan et al. (2020) recommended employing clustering approaches to analyse the water levels of various river segments and to extract significant environmental parameters. Ghorbani et al. (2018) advised using the hybrid MLP-GSA model for short-term forecasting of other hydrological variables (e.g., lake water levels, rainfall, evaporation, daily discharge, drought and flood indices) because of the high accuracy of the developed hybrid model compared with the other standalone and hybrid methods applied. Seo et al. (2015) suggested developing hybrid techniques that join the wavelet decomposition method with other machine learning models and metaheuristic algorithms for predicting hydrological factors with non-stationary and nonlinear relationships. A comparative study of learning algorithms such as the Levenberg-Marquardt and conjugate gradient algorithms could also be carried out to enhance the performance of hybrid techniques. Further research could also explore model performance for input series constructed from the effective wavelet components or from all wavelet components. Farzad and El-Shafie (2016) recommended that a variety of ANN techniques be examined to identify the best fit for each cluster of data points.

Future perspectives
The literature review also allows us to summarise some future perspectives: (1) Additional data pre-treatment approaches, such as empirical mode decomposition (EMD) and singular spectrum analysis (SSA), should be applied.
(2) The choice of input variables is crucial in determining the model's performance and accuracy. More effort should therefore be put into determining the ideal combination of input variables, for example by applying other input-determination methods such as dimensionality reduction, feature extraction, and feature selection.
(3) It is recommended that all data pre-processing steps be carried out to handle outliers and noise and to identify the most dependable and accurate data to be used as predictors.
(4) The use of combined metaheuristic algorithms and machine learning techniques in WL forecasting has grown considerably in recent years. Nevertheless, there is still room to improve WL forecasts.

Conclusion
The use of machine learning to forecast water levels has expanded, as has the research and development of this technology. Optimisation models have long been used to aid decision-making. However, certain problems are complex, poorly understood and nonlinear, and have a large number of possible solutions, particularly in ecological modelling. This shows that, depending on the problem to be solved, this strategy can be used in a variety of contexts.
This paper reviews recent work on water-level forecasting, examining data pre-processing and hybrid models in detail along with a number of machine learning models. Most of these machine learning models, such as ANN, ANFIS, and SVR, deal with nonlinear data series. The papers chosen for this review demonstrate a rising trend in recent years toward using hybrid methods in WL modelling. The results reveal that no single global model or technique consistently outperforms all other methods. Comparisons show that hybrid models produce better results than standalone models. Metaheuristics have also improved the single models by selecting the best hyperparameters for the model, saving time and avoiding the lengthy and imprecise trial-and-error method. In addition, data pre-treatment methods effectively enhance data quality before the data are fed into the model, by removing noise from the time series and choosing a suitable model input scenario. The majority of previous studies used only one or two data pre-processing stages and did not apply all the pre-processing steps. Therefore, a key strength of the current study is that it offers a thorough analysis of all the aforementioned aspects.
Despite recent significant breakthroughs in machine learning models, no single strategy, hybrid or otherwise, has emerged as the best prediction method. Therefore, developing hybrid approaches for particular water level prediction applications remains an open research challenge for academics.
In the future, it is recommended to use data pre-processing approaches, given their strong ability to improve data accuracy, to employ updated metaheuristics to simulate water levels in rivers and lakes, and to use HOPH hybrid models, because they optimise both the data and the model.

Citation information
Cite this article as: Application of hybrid machine learning models and data pre-processing to predict water level of watersheds: Recent trends and future perspective, Sarah