Pan evaporation estimation by relevance vector machine tuned with new metaheuristic algorithms using limited climatic data

This study investigates the feasibility of a relevance vector machine tuned with improved Manta-Ray foraging optimization (RVM-IMRFO) for predicting monthly pan evaporation from limited climatic input data (e.g. temperature). The accuracy of the RVM-IMRFO was evaluated against RVM tuned with grey wolf optimization, RVM tuned with the whale optimization algorithm, and RVM tuned with Manta-Ray foraging optimization, using root mean square error (RMSE), mean absolute error (MAE), the determination coefficient (R2), Nash-Sutcliffe efficiency (NSE) and new graphical inspection methods. The models were assessed using data acquired from two stations in China; the data were divided into three equal parts, and the models were tested on each part in turn. The application outcomes revealed that the proposed algorithm considerably improved the accuracy of a single RVM in monthly pan evaporation prediction, with average improvements in RMSE, MAE, R2 and NSE of 27.65%, 27.53%, 8.40% and 8.63%, respectively. The proposed algorithm also showed clear dominance over the other models, improving the overall mean values of the RMSE, MAE, R2 and NSE statistics from 34.7-38.2 to 18.2-19.5, from 36.2-36.4 to 18.5-19.1, from 12.5-13.8 to 3.6-3.7, and from 12.4-14.6 to 3.6-3.9%, for the two climatic stations, respectively. Importing extraterrestrial radiation and a periodicity component (the month number of the data) into the model inputs improved the prediction accuracy of the implemented models. The outcomes revealed that the RVM-IMRFO performed better than the other methods in predicting monthly pan evaporation using only temperature data, which is essential, especially in developing countries where other climatic data are missing or unavailable. The RVM model was also compared with a standard multi-layer perceptron neural network (MLPNN), and the former was found to predict monthly pan evaporation better than the latter.


Introduction
Evaporation is an important component of the hydrological cycle that plays a crucial role in clouds' moisture budgets and consequently influences the timing, frequency, intensity, duration, and spatial occurrence of precipitation, among others (Worden et al., 2007). That is why the magnitude of total evaporation is often equivalent to total rainfall; it is therefore crucial to the water balance of a catchment or region. Changes in evaporation rates will ultimately alter hydrological regimes and affect soil moisture availability. Climate change is expected to affect water resources through changes in evaporative demand, partly due to wind speed reduction (Kirono et al., 2009; Liu et al., 2004; Roderick & Farquhar, 2004; Zhang et al., 2015).

Pan evaporation observation has conventionally been used to characterize the evaporative stress of the atmosphere when assessing crop water needs. Measurement of evaporation through pans effectively integrates the relevant physical parameters, such as temperature, radiation, humidity, and wind speed, into a single measure of evaporative demand that quantifies water release into the atmosphere (Roderick et al., 2007). The quantity of evaporation from a saturated soil surface is usually similar to that from a nearby water body (Lu et al., 2018). Although a limited moisture supply may reduce evaporation, a lack of surface moisture is not a limiting factor on Epan; thus, Epan is not necessarily related to the surface evaporation amount. Although actual evaporation decreases with decreasing available moisture, the heat flux that supplies the energy to increase potential evaporation also increases evaporation from water-filled pans (Lawrimore & Peterson, 2000). Just as actual and potential evaporation have a complementary relationship, so do actual and pan evaporation. Despite being an important hydrological component connecting the water and energy balances, many regions lack pan evaporation measurements for various technical and economic reasons (Fu et al., 2009).
When direct observation of pan evaporation (Ep) is not possible, it can be estimated indirectly from empirical equations that consider different meteorological parameters such as wind speed, relative humidity, temperature, and global solar radiation (Kumar et al., 2012; Vishwakarma et al., 2022). Data limitations in most remote areas underline the significance of using alternative methods instead of conventional ones. Remote sensing is used in many cases to estimate evaporation in regions where ground observations are missing. Nevertheless, remote sensing data of good quality in terms of resolution and temporal and spatial distribution are not available at every point of the globe. Furthermore, those data are costly and, therefore, not always affordable (Anayah & Kaluarachchi, 2014). Therefore, many researchers have proposed and tested different soft computing methods that can estimate evaporation using limited meteorological data. Kumar et al. (2021) estimated Epan considering different input parameters using several soft computing techniques, such as the artificial neural network (ANN), radial function-based support vector machine (SVM-RF), linear function-based SVM (SVM-LF), wavelet-based ANN (WANN), and multilinear regression (MLR). They found that the SVM-RF model performed better than the other proposed models; nevertheless, that depends on the input data and the testing/training sample size. Ghorbani et al. (2018) proposed a hybrid soft computing model called the Multilayer Perceptron-Firefly Algorithm (MLP-FFA), based on the FFA optimizer integrated within the MLP model, for the prediction of daily pan evaporation. They applied their proposed model to a semi-arid region and compared it with other soft computing models. They suggest that the MLP-FFA model outperformed the other models thanks to the FFA optimizer. Wu et al. (2020) coupled the extreme learning machine (ELM) with two novel metaheuristic algorithms, namely the whale optimization algorithm (WOA) and the flower pollination algorithm (FPA), to estimate monthly Epan. The proposed model was applied at four different locations using limited climatic data, and the results were compared with those of popular soft computing models. Their findings revealed that the proposed hybrid model demonstrated better accuracy and significantly decreased computation time. They also found that extraterrestrial solar radiation together with Tmax and Tmin would be sufficient to estimate monthly Ep. Abghari et al. (2012) investigated whether using mother wavelets as activation functions, instead of the frequently used sigmoid, makes a significant difference in daily pan evaporation prediction in semi-arid regions. They concluded that overall computation performance and result accuracy were significantly improved.
Similarly, Ghaemi et al. (2019) found that hybrid soft computing models performed considerably better than existing standalone models; namely, optimizers such as MODWT and Cp used as pre-processing techniques enhance prediction accuracy, and thus both are recommended for application in different climate zones. Malik et al. (2022) developed two innovative soft computing models, namely Deep Learning (DL) and the Gradient Boosting Machine (GBM), to estimate monthly pan evaporation based on maximum air temperature. They found that the DL model outperformed the GBM model regarding accuracy and computation time. In addition to the innovative soft computing models mentioned above, a handful of other models, e.g. those of Molina Martínez et al. (2006), Abed et al. (2021), Kim et al. (2015), Ehteram et al. (2022), and Lakmini Prarthana Jayasinghe et al. (2022), have been developed to predict Epan at different temporal and spatial scales and climate conditions. Although many soft computing models have been developed specifically to predict Epan, there is still room for improvement, especially regarding computation time and accuracy under limited meteorological data constraints. In recent years, hybrid soft computing methods have been preferred over standalone models and successfully applied in the field of hydrology to model different variables, e.g. for estimation of hydraulic conductivity and the longitudinal dispersion coefficient (Chen et al., 2022; Kargar et al., 2020; Singh et al., 2022), water quality modelling (Deng et al., 2021; Deng et al., 2022), drought prediction (Shamshirband et al., 2020), evaporation forecasting (Malik et al.
2020), and river water level and groundwater level modelling (Band et al., 2021; Ebtehaj et al., 2021). Therefore, this study tests several novel soft computing models, namely the relevance vector machine (RVM) and hybrid RVMs trained with grey wolf optimization (GWO), the whale optimization algorithm (WOA), Manta-Ray foraging optimization (MRFO), and the proposed improved Manta-Ray foraging optimization (IMRFO), using limited input variables (Tmin and Tmax) to predict Epan in humid climate conditions. In particular, a novel hybrid soft computing model called RVM-IMRFO is proposed, resulting from the integration of the RVM with the IMRFO algorithm. The RVM is chosen as the kernel function-based model instead of LSSVM or SVM because it gives better results. Bonakdari et al. (2019) compared the RVM with several other soft computing models to forecast water level fluctuation in a freshwater lake and found that the RVM provides good results in terms of accuracy and computation time. Ghosh and Mujumdar (2008) applied statistical downscaling based on sparse Bayesian learning and the RVM to model streamflow at the river basin scale during the monsoon period, comparing the RVM results with those of other state-of-the-art soft computing models such as the SVM. Their findings suggest that the RVM performed better than the other models. Similarly, Falke et al. (2010) applied the RVM to predict flow discharge to improve water supply for irrigation in arid regions. They concluded that the RVM model was capable of providing results close to the observed data, suggesting that it can be widely used in the field of water resources management.
For optimization, an improved version of the MRFO is employed, as it reduces computation time and errors. Nguyen et al. (2021) employed the MRFO combined with deep neural networks (DNN) for flood susceptibility mapping. They compared the results obtained using MRFO-DNN with those obtained from other models. Their findings suggest that combining the MRFO and DNN improved flood susceptibility classification precision, with an area under the curve (AUC) of 0.98. Elmaadawy et al. (2021) employed the MRFO to predict several important indicators of a wastewater treatment plant. Their findings suggest that the MRFO, especially when combined with other models, provides very accurate results. The MRFO also performs better than other state-of-the-art models in solving other complex problems, such as the operation of a photovoltaic/diesel generator/pumped water reservoir power system (Liu et al., 2021). The rest of the manuscript is organized as follows: Section 2 presents brief information about the study cases, data acquisition, and the soft computing models applied in this study; Section 3 presents the main findings regarding the existing and proposed hybrid models and discusses their relevance and applicability in the light of recent literature on the same topic. Finally, the main conclusions and recommendations are presented in Section 4.

Case study
The middle reaches of the Yangtze River Basin (YRB) are selected as the case study area, as shown in Figure 1. The YRB's middle reaches are mainly located in two key provinces of central China (Hubei and Hunan). These two provinces play a key role in the country's economic development thanks to the rice and other agricultural products produced in the area; they also host one of China's six nature reserves listed under the International Convention on Wetlands, Dongting Lake, the second-largest freshwater lake in China. The area covers a drainage basin of 60,500 km² with a subtropical monsoon climate and a population of 22 million. Due to the subtropical monsoon climate, the area receives abundant but unevenly distributed rainfall. Annual precipitation in the basin varies between 1350 and 1550 mm, and 60% of the total annual precipitation occurs from April to August. The mean annual temperature in the basin is 16.5°C, with 1580 annual sunshine hours. The monthly average temperature is 4.3°C in January and 29.2°C in July. The basin's elevation varies from north to south, ranging from 2000 m to 12 m. Hunan and Hubei frequently face floods and droughts due to the uneven rainfall distribution, which causes an annual water shortage of 2 billion m³ in the selected area. The main driver of this water shortage is the evaporation process, which significantly affects the hydrological balance. The catchment plays a key role in guaranteeing national food security: 40% of the region's area is cultivated, producing 1.47 × 10³ million tons of agricultural products, mainly rice. Water shortage also affects agricultural production, and agricultural water consumption in the region is more than 60% of the total annual water consumption of 26 billion m³. Therefore, water shortage in the basin restricts regional economic development. Drought events in the basin are closely associated with imbalance in the hydrological cycle, which is mainly affected by the evaporation process. Therefore, precise estimation of evaporation is necessary in such a highly productive, drought-prone agricultural region to better understand its climate and support water resources management practices in the basin.
One climatic station is selected from each province for modelling pan evaporation in the region, as shown in Figure 1. The Jingzhou (JZ) station was chosen from Hubei province, whereas the Yueyang (YUY) station was selected from Hunan province. For the analysis, monthly climatic data for both stations, comprising minimum temperature, maximum temperature, and pan evaporation, were collected from the China Meteorological Administration (CMA) for 468 months (39 years). For better application and performance evaluation of the selected machine learning models, a cross-validation technique is adopted in this study: the 39 years of monthly data are divided into three equal datasets of 13 years (156 months) each, i.e. M1 = 1962 to 1974, M2 = 1974 to 1987, and M3 = 1988 to 2000. During model application, one of these three datasets is used for testing while the remaining two are used for training (Table 1).
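The three-block cross-validation scheme above can be sketched as follows; this is an illustrative NumPy indexing sketch assuming the 3 × 156-month record described in the text, not the authors' actual code, and the function name `blocked_three_fold` is hypothetical.

```python
import numpy as np

def blocked_three_fold(n_months=468):
    """Split a monthly record into three equal chronological blocks (M1-M3).

    Each block serves once as the test set while the remaining two blocks
    form the training set, mirroring the scheme described in the text.
    """
    idx = np.arange(n_months)
    folds = np.array_split(idx, 3)  # three blocks of 156 months each
    for k, test_idx in enumerate(folds, start=1):
        train_idx = np.setdiff1d(idx, test_idx)
        yield f"M{k}", train_idx, test_idx

for name, train_idx, test_idx in blocked_three_fold():
    print(name, len(train_idx), len(test_idx))
```

Because the blocks are chronological rather than shuffled, each test set represents a contiguous 13-year climate window, as in Table 1.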

Relevance vector machine (RVM)
The relevance vector machine (RVM), proposed by Tipping (2001), belongs to the family of sparse probabilistic Bayesian models. It is generally considered a special case of Gaussian process regression (Niu et al., 2022). The generalization capability of the RVM is improved by introducing a so-called prior over the weights. Its major particularity compared to the support vector machine (SVM) is that a smaller number of basis kernel functions (KF), termed relevance vectors (RV), are used to construct the regression model (Figure 2). Given an ensemble of input variables (xi) and their targets (ti), the probabilistic formulation corresponds to a Gaussian distribution over tn with mean f(xn) and variance σ² (Wang et al., 2022; Zhao et al., 2022). The regression problem addressed by the RVM can then be expressed as an attempt to predict t for any given x, where W = [w1, w2, . . ., wN]T are the weights of the linear model. The general form of the RVM regression is a simple linear combination of the weighted KF:

y(x) = Σ ωi KF(x, xi) + ω0

where KF(x, xi) is a kernel function (one function is designated to each sample in the training dataset), ω = (ω1, ω2, . . ., ωN)T are the adjustable weights, and ω0 is the bias. For example, a Gaussian kernel function can be used, whose parameter γ (the width of the radial basis function) controls its spread. The likelihood of the measured training dataset can then be written accordingly (Guo & He, 2022; Wang & Gao, 2022).

Grey wolf optimizer (GWO)
Mirjalili et al. (2014) proposed the grey wolf optimizer (GWO) algorithm. Formulating the GWO mathematically requires an understanding of the social hierarchy of the grey wolf (GW), which is composed of four individuals (Figure 3a): (i) the leader, called alpha (α); (ii) the adviser, called beta (β); (iii) the hunter, called delta (δ); and (iv) the scapegoats, called omega (ω) (Mirjalili et al., 2014). According to Figure 3, the wolves can recognize the exact location of the prey and then surround it. The alpha wolves mainly govern this operation, although it has been reported that the beta and delta wolves can also help in the hunting process. As shown in Figure 3b, the search agents, i.e. the omega wolves, move by updating their positions taking into account the positions of the three other wolves: alpha, beta, and delta. The final location is randomly determined within a circle. Furthermore, from Figure 3(a), the wolf (α) leads the pack, and the (β), at the second level, becomes the leader if the (α) becomes old or passes away. Finally, the (δ) wolves dominate the (ω) but receive commands from the beta and alpha wolves (Mirjalili et al., 2014). Mirjalili et al. (2014) state that the algorithm is inspired by the hunting strategies of the grey wolf in nature and is composed of three major steps: (i) encircling the prey, (ii) hunting, and (iii) attacking the prey (the exploitation phase). In the spirit of all swarm intelligence algorithms, the GWO simulates the group hunting behaviour of the GW while respecting the social leadership hierarchy: only the first three wolves, i.e. α, β and δ, correspond to the best solutions, and each one updates its position during the training process (Figure 3b), i.e.
xα, xβ, and xδ, respectively (Yu & Wu, 2022). The GWO algorithm can be summarized as follows. Encircling the prey (PR) can be regarded as the process during which the GW surrounds the prey, written as (Mirjalili et al., 2014):

D = |C · XP(t) − X(t)| (7)
X(t + 1) = XP(t) − A · D (8)

In Equations (7) and (8), two important positions appear: XP and X, which correspond to the PR and GW positions, respectively. A and C are called coefficient vectors, and (t) is the current iteration. A and C are calculated as

A = 2a · r1 − a, C = 2 · r2

where a decreases over time from 2 to 0, while r1 and r2 are random vectors in the interval [0, 1].
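As an illustration, the encircling update can be sketched in Python; the fixed prey position and iteration budget are stand-ins for this sketch (in the actual GWO the prey is unknown and is approximated by the α, β and δ wolves), and the random seed is chosen only for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(0)

def gwo_encircle(x, x_prey, t, t_max):
    """One GWO encircling step: X(t+1) = X_P - A * |C * X_P - X|."""
    a = 2.0 * (1.0 - t / t_max)              # a decreases linearly from 2 to 0
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    A = 2.0 * a * r1 - a                      # coefficient vector A
    C = 2.0 * r2                              # coefficient vector C
    D = np.abs(C * x_prey - x)                # distance to the prey
    return x_prey - A * D                     # updated wolf position

x = np.array([5.0, -3.0])
x_prey = np.array([1.0, 1.0])
for t in range(50):
    x = gwo_encircle(x, x_prey, t, 50)
print(x)  # settles near the prey as a -> 0
```

As a shrinks, the admissible step |A · D| shrinks with it, which is exactly the exploitation mechanism described for the attacking phase below.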
During the hunting operation, it is assumed that the omega wolves only follow the three best search agents, i.e. α, β and δ, whose positions are continuously saved during the training process as follows (Figure 3b):

Dα = |C1 · Xα − X|, Dβ = |C2 · Xβ − X|, Dδ = |C3 · Xδ − X|
X1 = Xα − A1 · Dα, X2 = Xβ − A2 · Dβ, X3 = Xδ − A3 · Dδ
X(t + 1) = (X1 + X2 + X3) / 3

where Xα, Xβ, and Xδ are the current positions of the α, β and δ wolves. Finally, the hunt is closed by an attack on the PR mounted by the GW. This step is formulated by decreasing the value of a and, consequently, the value of A (Mirjalili et al., 2014). Figure 5 shows the flowchart of the GWO.

Whale optimization algorithm (WOA)
Mirjalili and Lewis (2016) proposed the whale optimization algorithm (WOA). The algorithm was developed by taking into account the hunting strategy adopted by whales, based on the emission of bubble nets at the sea surface. Overall, it proceeds in three distinguished stages: (i) encircling the prey; (ii) the bubble-net attacking stage (the exploitation stage), realized in two steps, namely the shrinking encircling mechanism and the spiral updating position; and (iii) the search for prey (the exploration phase) (Mirjalili and Lewis 2016). As in other optimization algorithms, each whale is an agent whose position is continuously updated during the complete hunting cycle in order to move closer to the best solution, which is the prey. It is therefore assumed that the best solution corresponds to the prey, and all agents update their positions towards it. The encircling prey stage can be formulated as follows (Pan et al., 2022; Qiao et al., 2021; Wong et al., 2019):

D = |C · X*(t) − X(t)|
X(t + 1) = X*(t) − A · D

In the above equations, D is the distance to be travelled by the search agent to reach the best solution (i.e. the best candidate), (t) denotes the current iteration, X* is the position of the best search agent, and X is the current position. The two vectors A and C can be estimated as (Pan et al., 2022; Wong et al., 2019):

A = 2a · r − a, C = 2 · r

where r corresponds to a random vector whose values lie in the interval from zero to one, and a is a vector whose value drops from two to zero during the iteration process (for both the exploration and exploitation stages).
Two methods are adopted for the exploitation phase, i.e. the bubble-net attacking stage. First, the shrinking encircling mechanism is realized by decreasing the value of a, which reduces the value of A; as a result, the updated position of a search agent may lie anywhere between its original position and the best agent. Second, the spiral updating position starts by estimating the distance between the whale located at (X, Y) and the prey located at (X*, Y*). A spiral equation is then formulated as follows (Mirjalili and Lewis 2016; Qiao et al., 2021):

X(t + 1) = D′ · e^(bl) · cos(2πl) + X*(t)

where D′ = |X*(t) − X(t)| is the distance of the ith agent to the prey, (b) is a constant defining the shape of the logarithmic spiral, (l) is a random number ranging from (−1) to (+1), and (·) denotes element-wise multiplication.
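The two exploitation mechanisms can be sketched together; the equal-probability switch between them follows the original WOA, while the starting point, seed and iteration budget are stand-ins for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def woa_exploit(x, x_best, t, t_max, b=1.0):
    """One WOA exploitation step: shrinking encircling or spiral update,
    chosen with equal probability as in the original algorithm."""
    a = 2.0 * (1.0 - t / t_max)               # a drops from 2 to 0
    if rng.random() < 0.5:                    # shrinking encircling mechanism
        r = rng.random(x.shape)
        A = 2.0 * a * r - a
        C = 2.0 * rng.random(x.shape)
        return x_best - A * np.abs(C * x_best - x)
    l = rng.uniform(-1.0, 1.0)                # spiral updating position
    D = np.abs(x_best - x)                    # distance D' from whale to prey
    return D * np.exp(b * l) * np.cos(2.0 * np.pi * l) + x_best

x = np.array([4.0, 4.0])
x_best = np.zeros(2)
for t in range(100):
    x = woa_exploit(x, x_best, t, 100)
print(x)
```

The spiral branch reproduces the humpback whale's helical path around the prey, while the encircling branch supplies the shrinking-radius behaviour controlled by a.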
For the exploration phase (search for prey), the update is based on the variation of A and a random search taking into account the locations of the other whales. This is considered a global search and may be formulated as:

D = |C · Xrand − X|
X(t + 1) = Xrand − A · D

where Xrand corresponds to a random whale (i.e. a random position) chosen from the whole population. More details about the WOA can be found in (Mirjalili and Lewis 2016; Pan et al., 2022; Qiao et al., 2021; Wong et al., 2019). Figure 5 shows the flowchart of the WOA.

Manta Ray Foraging Optimization (MRFO)
Zhao et al. (2020) proposed the manta ray foraging optimization (MRFO) algorithm, inspired by the behaviours of manta rays (Figure 4a). Like other metaheuristic algorithms, it is structured in several stages: chain foraging, cyclone foraging, and somersault foraging. The first stage is chain foraging. During this phase, it is assumed that the best solution is located exactly at the point characterized by a high density of plankton (i.e. the prey). According to Figure 4b, a robust chain is formed from an ensemble of individual manta rays. The overall chain moves in an organized manner toward the prey, which corresponds to the location with the highest concentration of plankton. The rays swim head to tail and update their positions at each iteration by supposing that the best position (xbest) corresponds to the position of the prey; in addition, at each iteration every individual takes into account the position of the individual in front of it (Zhao et al., 2020).

The manta rays thus start swimming by forming a foraging chain. During the swim, each agent moves toward the prey while simultaneously following the manta ray in front of it (Figure 4b). This operation can be written mathematically as follows (Fathy et al., 2020; Zhao et al., 2020):

x_i^d(t + 1) = x_i^d(t) + r · [x_(i−1)^d(t) − x_i^d(t)] + α · [x_best^d(t) − x_i^d(t)]

where x_i^d(t) is the position of the ith agent (i.e. individual) in the dth dimension at iteration (t), r is a random vector in the range [0, 1], x_best^d(t) is the prey (i.e. the plankton with the highest concentration), N is the total number of manta rays (for the first individual, the term x_(i−1)^d is replaced by x_best^d), and α is a coefficient calculated as:

α = 2r · √|log(r)|

Once the manta rays fix the position of the prey, the whole population forms an uninterrupted, spiral-shaped chain and moves toward the plankton: this is the cyclone foraging stage (Figure 4c), in which each agent again swims toward the individual in front of it. It can be formulated as follows (Houssein et al., 2021b; Zhao et al., 2020):

x_i^d(t + 1) = x_best^d + r · [x_(i−1)^d(t) − x_i^d(t)] + β · [x_best^d − x_i^d(t)]

where β is a weighting factor calculated as:

β = 2 · e^(r1 (T − t + 1)/T) · sin(2π r1)

where T is the maximum number of iterations, t is the current iteration, and r1 is a random number in the range [0, 1]. Cyclone foraging has good exploration ability and helps significantly in improving the exploration stage. Furthermore, by introducing a new random position in the overall search space, each agent can explore a position away from the current best one. This can be formulated as follows (Abd Elaziz et al., 2021; Zhao et al., 2020):

x_rand^d = Lb^d + r · (Ub^d − Lb^d)
x_i^d(t + 1) = x_rand^d + r · [x_(i−1)^d(t) − x_i^d(t)] + β · [x_rand^d − x_i^d(t)]

where x_rand^d is a random position in the search space, and Lb^d and Ub^d are the lower and upper limits of the dth dimension, respectively.
Finally, for the somersault foraging, a pivot is formed with the prey as its centre (Figure 4c). Each agent somersaults to a new position around the pivot, updating its position with respect to the best one:

x_i^d(t + 1) = x_i^d(t) + S · [r2 · x_best^d − r3 · x_i^d(t)]

where r2 and r3 are random numbers in the range [0, 1], and S is the somersault factor. More details about the MRFO can be found in (Abd Elaziz et al., 2021; Fathy et al., 2020; Houssein et al., 2021b; Zhao et al., 2020).
Figure 5 shows the flowchart of the MRFO.
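The three foraging stages can be condensed into a toy implementation; this is a simplified sketch in which the sphere objective, bounds, population size and seed are illustrative stand-ins for the model-error function actually minimized in this study.

```python
import numpy as np

rng = np.random.default_rng(2)

def sphere(x):
    """Toy objective (minimization); stands in for the model error."""
    return float(np.sum(x ** 2))

def mrfo(obj, dim=2, n=10, t_max=200, lb=-5.0, ub=5.0, s=2.0):
    """Simplified MRFO with chain, cyclone and somersault foraging."""
    X = rng.uniform(lb, ub, (n, dim))
    best = X[np.argmin([obj(x) for x in X])].copy()
    for t in range(t_max):
        for i in range(n):
            r = rng.random(dim)
            front = best if i == 0 else X[i - 1]   # individual ahead in the chain
            if rng.random() < 0.5:                 # cyclone foraging (toward best)
                r1 = rng.random(dim)
                beta = 2.0 * np.exp(r1 * (t_max - t) / t_max) * np.sin(2.0 * np.pi * r1)
                X[i] = best + r * (front - X[i]) + beta * (best - X[i])
            else:                                  # chain foraging
                alpha = 2.0 * r * np.sqrt(np.abs(np.log(rng.random(dim))))
                X[i] = X[i] + r * (front - X[i]) + alpha * (best - X[i])
        X = np.clip(X, lb, ub)
        fit = np.array([obj(x) for x in X])
        if fit.min() < obj(best):                  # best position only improves
            best = X[np.argmin(fit)].copy()
        # somersault foraging around the best position (the pivot)
        r2, r3 = rng.random((n, dim)), rng.random((n, dim))
        X = np.clip(X + s * (r2 * best - r3 * X), lb, ub)
    return best, obj(best)

best, val = mrfo(sphere)
print(best, val)
```

The head-to-tail dependence on the individual in front, the spiral-like cyclone term, and the somersault around the pivot correspond directly to the three stages formulated above.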

Improved Manta-Ray Foraging Optimization (IMRFO)
According to Liao et al. (2021), during the exploitation phase of the MRFO algorithm, the position of each agent is continuously updated with respect to the best one by modifying the values of the parameters (α) and (β), which can lead to convergence towards local optima and poor solution stability. Hence, to help the MRFO converge toward the global rather than a local solution, an improved algorithm called improved manta ray foraging optimization (IMRFO) is proposed, based on the opposition-based learning (OBL) method.
The idea behind OBL is its capacity to search in all directions of the search space. More precisely, the opposite number can be defined as follows (Houssein et al., 2021a):

x̄ = Ub + Lb − x

where x̄ is the opposite of the variable x, and Ub and Lb are the upper and lower bounds. During the optimization, the two candidates x and x̄ are compared in terms of their fitness; the better solution is retained and the other is rejected. The proposed IMRFO based on the OBL strategy can therefore be formulated as follows. It is assumed that the best solution may lie in the direction opposite to the current solution; divergence between the two is a leading reason for the failure of the algorithm to move toward the best solution. Hence, the following equation can be written (Feng et al., 2021; Fu et al., 2022; Houssein et al., 2021a):

x̄_i^d = x_i^max + x_i^min − x_i^d

where x̄_i^d is the opposite position of x_i^d, and x_i^max and x_i^min are the upper and lower bounds. Consequently, x̄_i^d replaces x_i^d if the improvement gained using x̄_i^d is better. In the second step, the OBL is used to generate the opposite vectors of the initial population (N). During the optimization phase, the two populations are compared to select the fittest individuals. Finally, the algorithm closes with the best-solution phase, an iterative process that runs until the best solution is obtained (Feng et al., 2021; Fu et al., 2022; Houssein et al., 2021a). Figure 6 shows the flowchart of the IMRFO.
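The OBL comparison step can be sketched as follows; the quadratic fitness function, bounds and seed are illustrative stand-ins, not part of the original study.

```python
import numpy as np

rng = np.random.default_rng(3)

def obl_step(X, fitness, lb, ub):
    """Opposition-based learning: compare each agent with its opposite
    position x_bar = ub + lb - x and keep whichever has better fitness."""
    X_opp = ub + lb - X                          # opposite positions
    f = np.array([fitness(x) for x in X])
    f_opp = np.array([fitness(x) for x in X_opp])
    keep_opp = f_opp < f                         # minimization: lower is better
    return np.where(keep_opp[:, None], X_opp, X)

fitness = lambda x: float(np.sum((x - 1.0) ** 2))  # toy objective, optimum at (1, 1)
X = rng.uniform(-5.0, 5.0, (6, 2))
X_new = obl_step(X, fitness, -5.0, 5.0)
print(X_new)
```

Because every row of the returned population is the better of the pair (x, x̄), the step can never degrade a solution, which is why OBL improves the stability of the MRFO search.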

Proposed IMRFO-RVM
The performance of an RVM model is mainly governed by the appropriate selection of the kernel function. The usual way to achieve this is a trial-and-error approach. However, an optimization strategy can extract the best performance from machine learning models, which is the goal of our study. Therefore, the RVM model was integrated with the IMRFO to develop the RVM-IMRFO model. More precisely, the IMRFO is used to select the optimal parameters of the Gaussian kernel function (Equation 5) of the RVM. In addition, the performance of the hybrid RVM-IMRFO was compared with that of the RVM-GWO, RVM-WOA, and RVM-MRFO models. The overall flowchart of the proposed modelling framework is depicted in Figure 5. According to Figure 5, pan evaporation was modelled using three climatic variables: Tmax, Tmin and extraterrestrial radiation. The performance of the standalone RVM model was also compared with that of the four hybrid models, each linked to one of the metaheuristics described above: GWO, WOA, MRFO and IMRFO. These metaheuristic algorithms were applied to optimize the RVM parameters, i.e. the two parameters of the kernel function in Equation (5).
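The role of the optimizer in this framework can be illustrated with a simplified stand-in: below, a random-search loop takes the place of the IMRFO, and a Gaussian-kernel ridge regression serves as a lightweight proxy for the RVM. The synthetic data, the kernel convention exp(−‖x − xi‖²/γ²), and the search range for γ are all assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)

def gauss_kernel(A, B, gamma):
    """Gaussian kernel K(x, xi) = exp(-||x - xi||^2 / gamma^2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / gamma ** 2)

def fit_predict(X_tr, y_tr, X_te, gamma, lam=1e-3):
    """Kernel ridge regression as a lightweight proxy for the RVM."""
    K = gauss_kernel(X_tr, X_tr, gamma)
    w = np.linalg.solve(K + lam * np.eye(len(X_tr)), y_tr)
    return gauss_kernel(X_te, X_tr, gamma) @ w

# synthetic two-input data (think Tmin, Tmax), for illustration only
X = rng.uniform(0.0, 1.0, (80, 2))
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=80)
X_tr, y_tr, X_te, y_te = X[:60], y[:60], X[60:], y[60:]

best_gamma, best_rmse = None, np.inf
for gamma in rng.uniform(0.05, 3.0, 50):      # random search stands in for IMRFO
    pred = fit_predict(X_tr, y_tr, X_te, gamma)
    rmse = np.sqrt(np.mean((pred - y_te) ** 2))
    if rmse < best_rmse:
        best_gamma, best_rmse = gamma, rmse
print(best_gamma, best_rmse)
```

In the actual framework, the candidate-generation line is replaced by the IMRFO update rules of the previous section, and the fitness being minimized is the prediction error of the RVM on the training scenario.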

Application and results
The feasibility of a relevance vector machine tuned with improved Manta-Ray foraging optimization (RVM-IMRFO) is investigated in modelling monthly pan evaporation using limited climatic data as input. The outcomes of the RVM-IMRFO are compared with those of the RVM tuned with grey wolf optimization (RVM-GWO), the RVM tuned with the whale optimization algorithm (RVM-WOA), and the RVM tuned with Manta-Ray foraging optimization (RVM-MRFO) with respect to the following evaluation statistics:

RMSE = √[(1/N) Σ (Yc − Yo)²]
MAE = (1/N) Σ |Yc − Yo|
R² = [Σ (Yo − Ȳo)(Yc − Ȳc)]² / [Σ (Yo − Ȳo)² · Σ (Yc − Ȳc)²]
NSE = 1 − Σ (Yo − Yc)² / Σ (Yo − Ȳo)²

where Yc, Yo, Ȳo and N are the calculated pan evaporation, the measured pan evaporation, the mean of the observed pan evaporation, and the quantity of data, respectively, and Ȳc is the mean of the calculated values. The abovementioned methods were implemented to estimate monthly Epan using data from two stations in China, considering different control parameters. The programmes were implemented in MATLAB. Table 2 provides the parameter information for the applied methods: the iteration number, population size, and number of runs were set to 30, 100 and 20 for all the algorithms. The data percentages used for training and testing are 80% and 20%.
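The four evaluation statistics can be computed as follows; the sample observed and calculated series are illustrative, not taken from the study's data.

```python
import numpy as np

def evaluate(y_obs, y_calc):
    """RMSE, MAE, R^2 and NSE between observed and calculated pan evaporation."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_calc = np.asarray(y_calc, dtype=float)
    err = y_calc - y_obs
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    r = np.corrcoef(y_obs, y_calc)[0, 1]     # Pearson correlation
    r2 = r ** 2                              # determination coefficient
    nse = 1.0 - np.sum(err ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "NSE": nse}

obs = [3.1, 4.2, 5.0, 6.3, 5.5]    # illustrative pan evaporation values
calc = [3.0, 4.5, 4.8, 6.0, 5.9]
print(evaluate(obs, calc))
```

Note that NSE, unlike R², penalizes systematic bias: a model that correlates well with the observations but is consistently offset will show a high R² and a low NSE.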

Results
Table 3 gives the training and testing statistics of the RVM method, which uses different input combinations of Tmin, Tmax, Ra and the periodicity component α to predict the monthly Epan of JZ Station for three different testing data sets. It is apparent from Table 3 that the RVM performs better in the M2 scenario compared to the other two scenarios, while the M1 scenario provides the worst Epan predictions. Among the double-input combinations, Tmin and Tmax have the best accuracy; in contrast, Tmin and Ra produce the worst model performance. Adding periodicity (α) improves the accuracy of the RVM model in the M2 scenario only. The model with Tmin, Tmax, Ra and α has the lowest RMSE (0.601 mm/month) and MAE (0.477 mm/month) and the highest R2 (0.889) and NSE (0.882) in predicting Epan in the testing period. It should be noted that involving Ra does not improve the model accuracy. Tables 4-7 report the training and testing statistics of the hybrid RVM methods in estimating monthly Epan considering the same model inputs and scenarios. According to the RVM-GWO method (Table 4), the Tmax, Ra input generally offers better accuracy among the double-input combinations, while the other hybrid RVM methods behave differently: for the RVM-WOA, RVM-MRFO and RVM-IMRFO methods, Tmin and Tmax generally have the best accuracy among the double-input combinations. It can be observed from Tables 4-7 that all the hybrid methods provide the best accuracy in modelling monthly Epan in the M2 scenario, like the single RVM method. Considering periodicity in the model inputs generally improves their accuracy in modelling monthly Epan; for example, the improvements in RMSE of the best RVM-WOA and RVM-MRFO in the M2 scenario are 5.3% and 11.6%, respectively. Comparison with the single RVM reveals that the metaheuristic algorithms considerably improve the accuracy of RVM in modelling Epan using limited climatic input data; in the M2 scenario, the improvements in RMSE, MAE, R2 and NSE of the best RVM are 19.8%, 17.2%, 3.4% and 3.1% by RVM-GWO; 21%, 22%, 5.4% and 5.6% by RVM-WOA; 44%, 42.6%, 7% and 7.3% by RVM-MRFO; and 86.1%, 84.6%, 9.1% and 9.7% by RVM-IMRFO, respectively.
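Unlike the measured temperatures, the extraterrestrial radiation Ra used as a model input requires no instrumentation; it can be computed from latitude and day of year. A minimal sketch following the standard FAO-56 formulation (an assumption; the study does not state which formulation was used):

```python
import math

def extraterrestrial_radiation(lat_deg: float, day_of_year: int) -> float:
    """Daily extraterrestrial radiation Ra (MJ m-2 day-1), FAO-56 style."""
    gsc = 0.0820  # solar constant, MJ m-2 min-1
    phi = math.radians(lat_deg)
    # inverse relative Earth-Sun distance and solar declination
    dr = 1 + 0.033 * math.cos(2 * math.pi * day_of_year / 365)
    delta = 0.409 * math.sin(2 * math.pi * day_of_year / 365 - 1.39)
    # sunset hour angle
    ws = math.acos(-math.tan(phi) * math.tan(delta))
    return (24 * 60 / math.pi) * gsc * dr * (
        ws * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(ws)
    )

# Ra in mid-July at 35 deg N is much larger than in mid-January
print(extraterrestrial_radiation(35.0, 196), extraterrestrial_radiation(35.0, 15))
```

Because Ra is a deterministic function of station latitude and date, it adds seasonal information to temperature-only inputs at no data-collection cost.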
Training and testing statistics of the single RVM method in predicting the monthly Epan of YUY Station are summed up in Table 8 for different input combinations and three testing data sets. Similar to the JZ Station, here also the M2 dataset provides better Epan predictions. At the same time, the worst models belong to the M1 dataset. Considering the three data sets M1, M2, and M3, it can be said that Tmin and Tmax produce the best efficiency in Epan prediction among the double-input combinations, while Tmin and Ra seem to be the least effective on Epan. Importing periodicity into the triple input combination (Tmin, Tmax, Ra and α) improves the model accuracy in Epan prediction. Tables 9-12 report the assessment statistics of the hybrid RVM methods in predicting the monthly Epan of YUY Station. Like the JZ Station, here also the Tmax and Ra input is generally more effective on Epan than the other double-input combinations for the RVM-GWO method (Table 9). At the same time, Tmax and Tmin generally have the best Epan prediction accuracy for the RVM-WOA, RVM-MRFO, and RVM-IMRFO. Similar to the JZ Station, the best accuracy was obtained from the M2 testing data set, whereas M1 had the worst performance. In contrast to JZ Station, adding Ra to the Tmin, Tmax input improves the models' accuracies; for example, the improvement in RMSE is 12.7% for RVM-GWO, 3.9% for RVM-WOA, 6.5% for RVM-MRFO and 5.9% for RVM-IMRFO in the M3 data set. The positive effect of periodicity on Epan can be observed for the single and hybrid RVM methods; for example, by adding α into the best hybrid RVM-WOA, RVM-MRFO and RVM-IMRFO models, the RMSE was improved by 6.2%, 3.2% and 0.9% in the test period, respectively. It is clear from Tables 8-12 that merging metaheuristic algorithms considerably improves the single RVM accuracy in Epan prediction using limited climatic inputs; in the M2 scenario, the improvements in RMSE, MAE, R2, and NSE of the best RVM are 7.3%, 9%, 4.9% and 5.4% by RVM-GWO; 25.6%, 28.4%, 6.9% and 7% by RVM-WOA; 37.4%, 40.7%, 8.1% and 8.4% by RVM-MRFO; and 64.4%, 64.3%, 10.9% and 11.6% by RVM-IMRFO, respectively.
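The four statistics reported throughout the tables can be computed directly from the observed and predicted series. A minimal sketch (the two series below are illustrative values, not data from the study):

```python
import numpy as np

def evaluation_stats(obs, pred):
    """RMSE, MAE, R2 (squared Pearson correlation) and Nash-Sutcliffe Efficiency."""
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    rmse = float(np.sqrt(np.mean((obs - pred) ** 2)))
    mae = float(np.mean(np.abs(obs - pred)))
    r2 = float(np.corrcoef(obs, pred)[0, 1] ** 2)
    # NSE: 1 minus residual variance relative to the variance of the observations
    nse = float(1 - np.sum((obs - pred) ** 2) / np.sum((obs - np.mean(obs)) ** 2))
    return rmse, mae, r2, nse

obs = [3.1, 4.8, 6.2, 7.5, 6.9, 5.0]   # illustrative observed Epan values
pred = [3.4, 4.6, 6.0, 7.8, 6.5, 5.2]  # illustrative model predictions
rmse, mae, r2, nse = evaluation_stats(obs, pred)
print(round(rmse, 3), round(mae, 3), round(r2, 3), round(nse, 3))
```

Note that R2 measures only linear association, whereas NSE also penalizes bias, which is why both are reported alongside the absolute error measures.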
In order to compare the RVM with standard multi-layer perceptron neural networks (MLPNN), Tables 13 and 14 were prepared. For training the MLPNN models, the fast Levenberg-Marquardt algorithm was employed. The sigmoid activation function was used in the hidden and output layers, and 50 iterations were set because beyond this number there was no improvement in accuracy. It is clear from Tables 13 and 14 that the MLPNN models with Tmin, Tmax, Ra and α offered the best accuracy in both stations, and periodicity generally improved the prediction accuracy. Comparison with the RVM models clearly shows that the optimal RVM models with the same inputs provide slightly better accuracy than the MLPNN in predicting monthly pan evaporation.
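The MLPNN baseline described above can be sketched as a one-hidden-layer network with sigmoid activations in both layers. The sketch below trains with plain batch gradient descent on synthetic data purely to stay self-contained; the study used the faster Levenberg-Marquardt algorithm, so this is an illustration of the architecture, not the authors' training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Synthetic stand-in data: two scaled temperature-like inputs -> scaled target
X = rng.uniform(0.1, 0.9, size=(120, 2))
y = sigmoid(1.5 * X[:, 0] + 0.8 * X[:, 1] - 1.0).reshape(-1, 1)

# One hidden layer; sigmoid in hidden and output layers, as in the study
n_hidden = 8
W1 = rng.normal(0, 0.5, (2, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.5, (n_hidden, 1)); b2 = np.zeros(1)

lr = 1.0
for epoch in range(2000):
    h = sigmoid(X @ W1 + b1)            # hidden activations
    out = sigmoid(h @ W2 + b2)          # network output
    d_out = (out - y) * out * (1 - out)   # output delta for squared error
    d_h = (d_out @ W2.T) * h * (1 - h)    # backpropagated hidden delta
    W2 -= lr * h.T @ d_out / len(X); b2 -= lr * d_out.mean(0)
    W1 -= lr * X.T @ d_h / len(X);   b1 -= lr * d_h.mean(0)

mse = float(np.mean((sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2) - y) ** 2))
print(mse)
```

A fixed iteration cap, as the authors used (50 iterations), doubles as a simple form of early stopping against overfitting.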
Figures 6 and 7 compare the observed and predicted Epan by the RVM-based models in the test period for the JZ and YUY stations. It is apparent from the scatter diagrams that the predictions of the RVM-IMRFO are less scattered than those of the other models, followed by the RVM-MRFO, while the single RVM has the lowest accuracy. Taylor diagrams of the models are provided in Figures 8 and 9 for the JZ and YUY stations. As is evident from the diagrams, the lowest RMSE and the highest correlation belong to the RVM-IMRFO, and for this model the standard deviation is the closest to the observed one in both stations; the RVM-MRFO and RVM-WOA models follow it. The distributions of the Epan predictions obtained by the considered models are compared in Figures 10 and 11 using violin-type diagrams. It is clear from the figures that the Epan predictions provided by the RVM-IMRFO are the closest to the observed ones, while those of the single RVM are the furthest. Figures 12 and 13 provide the average RMSE and MAE values of the applied models in predicting the Epan of both stations during the test period. These values were calculated by averaging the RMSE and MAE of all three data sets (M1, M2 and M3). The impact of each input parameter on Epan can be clearly observed from these graphs. The RVM-IMRFO provides the best accuracy among all RVM-based methods, and the models considering periodicity information (α) generally produce the lowest RMSE and MAE.

Discussion
The feasibility of the new algorithm, improved Manta-Ray foraging optimization, in tuning relevance vector machines for predicting monthly pan evaporation using limited climatic data was investigated. Its outcomes were compared with those of other RVM-based methods using data from two stations in China. It was observed that the periodicity component is essential and generally improves the models' accuracy in predicting monthly Epan; the improvements in RMSE and MAE were about 6% and 12%, respectively. This outcome directly agrees with the previous literature (Eray et al., 2018). Adding Ra to the Tmin, Tmax input generally improved the models' accuracy; however, in some cases considering Ra as an input deteriorated the models' efficiency, for example, in the M3 scenario of RVM-GWO, the M2 and M3 scenarios of RVM-WOA, and all scenarios of RVM-MRFO. As reported in the literature (Adnan et al., 2019; Shi et al., 2012; Zhang et al., 2019), an increasing number of inputs does not guarantee better prediction accuracy. A high number of inputs may cause poor model accuracy because it may increase the variance and complexity of the implemented model. For this reason, different input combinations should be investigated in modelling Epan using machine learning methods.
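The input-screening practice recommended above can be sketched as an exhaustive loop over feature subsets, scoring each by test RMSE. The data and the least-squares model below are illustrative stand-ins for the study's climatic series and RVM:

```python
import itertools
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-ins for the study's candidate inputs
n = 240
data = {
    "Tmin": rng.normal(10, 5, n),
    "Tmax": rng.normal(22, 6, n),
    "Ra": rng.uniform(10, 40, n),
    "alpha": np.tile(np.arange(1, 13), n // 12),  # month number (periodicity)
}
# Synthetic target driven mainly by Tmax, with a smaller Tmin effect and noise
y = 0.25 * data["Tmax"] + 0.1 * data["Tmin"] + rng.normal(0, 0.5, n)

def fit_rmse(cols, train, test):
    """Least-squares fit on train rows, RMSE on test rows (stand-in for RVM)."""
    X = np.column_stack([data[c] for c in cols] + [np.ones(n)])
    coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
    return float(np.sqrt(np.mean((X[test] @ coef - y[test]) ** 2)))

train, test = np.arange(0, 160), np.arange(160, 240)
scores = {}
for k in (1, 2, 3, 4):
    for combo in itertools.combinations(data, k):
        scores[combo] = fit_rmse(combo, train, test)

best = min(scores, key=scores.get)
print(best, round(scores[best], 3))
```

With only four candidate inputs there are 15 subsets, so exhaustive search is cheap; the point is that the best subset is found empirically rather than assumed, mirroring the M1-M3 scenario comparisons in the tables.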
The study's outcomes revealed that all methods provided the best accuracy in the M2 scenario (M2 testing data set), while the M3 scenario provided the worst accuracy. The machine learning methods used in this study are data-driven; their accuracy depends strongly on the data distribution and structure. Therefore, their accuracy may vary from one data set to another even though all data belong to the same phenomenon. For this reason, the evaluation of machine learning methods in mapping a phenomenon (e.g. pan evaporation) should consider different training-test sets; relying on only one data set may mislead the modeller. It was observed that Tmin and Tmax produced the best accuracy for some hybrid RVM methods, whereas the best accuracy of the RVM-GWO belonged to the Tmax, Ra input. This can be explained by the different flexibility of the algorithms, the different assumptions they make, and their learning styles, all of which affect the accuracy of the implemented algorithm.
Comparison outcomes indicated that the metaheuristic algorithms improved the accuracy of RVM in modelling Epan using limited climatic data, and the proposed improved MRFO provided the best improvement. This algorithm's main advantage is integrating an opposition-based learning strategy, which generates a new population by taking the opposite of the initial population. Then, by assessing the populations using the objective function, it finds the global best solution with the best fitness. Compared to the classical MRFO, this improved version is faster and better at avoiding getting stuck in local optima. The proposed method (RVM-IMRFO) is compared with previous studies in Table 15 to validate its efficiency in monthly pan evaporation modelling. The study by Ghorbani et al. (2021) used a multiple model-support vector machine for Epan prediction using climatic inputs of average air temperature, relative humidity, wind speed and sunshine hours. They applied the proposed method to three stations in Iran, and the model provided R2 ranging from 0.939 to 0.967. Wu et al. (2021) investigated the accuracy of a flower pollination algorithm-based extreme learning machine in predicting the Epan of four stations in China using climatic inputs of minimum and maximum temperatures and sunshine duration. Their model produced R2 between 0.864 and 0.924. In another study (Mohamadi et al., 2020), the efficiency of an adaptive neuro-fuzzy inference system with a shark algorithm was examined in predicting the Epan of two stations in Iran using four inputs (T, RH, W, n). The R2 and NSE of the best model were 0.92 and 0.92 for the two stations. The study by Ghaemi et al. (2019) investigated the ability of an M5 model tree with maximum overlap discrete wavelet transform in modelling the Epan of two stations in Turkey using four climatic inputs (W, T, RH, SR). Malik et al. (2017) and Wang et al. (2017) used MLP to predict the monthly Epan of several stations in India and China, respectively. It can be clearly seen from Table 15 that the statistics obtained from RVM-IMRFO are generally superior to the outcomes reported in the previous studies. It should be noted that the slightly better or similar accuracy of some studies compared to the present study is due to the use of more input variables (e.g. Ghorbani et al., 2021; Malik et al., 2017).
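The opposition-based learning strategy described in the discussion can be sketched as follows: the opposite of a candidate x within bounds [lb, ub] is lb + ub - x, and the fitter half of the merged original and opposite populations is retained. This is a generic sketch with a toy objective, not the authors' exact IMRFO implementation:

```python
import numpy as np

rng = np.random.default_rng(7)

def sphere(x):
    """Toy objective to minimize; the study minimizes RVM prediction error."""
    return float(np.sum(x ** 2))

def opposition_init(pop_size, dim, lb, ub, objective):
    """Initialize a population, add its opposite, keep the fitter half."""
    pop = rng.uniform(lb, ub, (pop_size, dim))
    opposite = lb + ub - pop               # opposition-based candidates
    merged = np.vstack([pop, opposite])
    fitness = np.array([objective(x) for x in merged])
    keep = np.argsort(fitness)[:pop_size]  # best pop_size individuals survive
    return merged[keep], fitness[keep]

pop, fit = opposition_init(20, 5, lb=-10.0, ub=10.0, objective=sphere)
print(pop.shape, float(fit[0]))
```

Evaluating both a candidate and its opposite doubles the chance that at least one starting point lies near the global optimum, which is why the improved version is less prone to stagnating in local optima than the classical MRFO.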

Concluding remarks
This study investigated the feasibility of a relevance vector machine tuned by improved Manta-Ray foraging optimization for predicting monthly pan evaporation using limited climatic data. The outcomes were compared with single and hybrid RVM-based methods involving RVM-GWO, RVM-WOA, and RVM-MRFO. Epan and temperature data obtained from two stations in China were divided into three parts, M1, M2, and M3, and the models were tested using each data set. It was observed that the new algorithm (IMRFO) considerably improved the accuracy of the single RVM in monthly pan evaporation prediction; the average improvements in RMSE, MAE, R2, and NSE of the RVM were 75%, 74%, 10% and 10.6% by implementing RVM-IMRFO, respectively. The RVM-MRFO followed the RVM-IMRFO with corresponding average improvements of 40.7%, 41.6%, 7.5%, and 7.8% in RMSE, MAE, R2 and NSE compared to the single RVM. Importing extraterrestrial radiation into the model inputs generally improved the prediction accuracy. All the methods provided the best accuracy in the M2 scenario/data set. In contrast, the M3 scenario produced the worst predictions, indicating the necessity of using different data sets for testing machine learning methods in predicting monthly pan evaporation. The overall outcomes proved the superiority of the RVM-IMRFO model in predicting monthly pan evaporation using only temperature data, which is very beneficial in developing countries where other climatic data are missing or unavailable. The overall averages of the RMSE, MAE, R2, and NSE statistics over all datasets and input combinations also endorsed the superior performance of RVM-IMRFO over the other benchmark models, improving these statistics from 34.7-38.2 to 18.2-19.5, 36.2-36.4 to 19.1-18.5, 12.5-13.8 to 3.6-3.7, and 12.4-14.6 to 3.6-3.9%, for station 1 and station 2, respectively. The RVM was also compared with the standard MLPNN method and found to have superior accuracy in predicting monthly pan evaporation (Tables 13 and 14). This study
illustrated the superiority of a new metaheuristic algorithm in tuning RVM for the prediction of monthly pan evaporation. The method was not tested for other time intervals, which can be done in future studies. The RVM-based methods were assessed using data from two stations in China; they can be further evaluated using data from other regions with different climatic characteristics. The feasibility of improved Manta-Ray foraging optimization can also be investigated by tuning other machine learning methods, such as ANN, ANFIS or ELM, for predicting pan evaporation.

Figure 4. MRFO algorithm diagram: (a) structure of a manta ray, (b) chain foraging behaviour in a 2-D space, and (c) somersault foraging behaviour in MRFO, adapted after Zhao et al. (2020).

Figure 6. Scatterplots of the observed and predicted Epan by different RVM-based models in the test period using the best input combination - JZ Station.

Figure 7. Scatterplots of the observed and predicted Epan by different RVM-based models in the test period using the best input combination - YUY Station.

Figure 8. Taylor diagrams of the predicted Epan by different RVM-based models in the test period using the best input combination - JZ Station.

Figure 9. Taylor diagrams of the predicted Epan by different RVM-based models in the test period using the best input combination - YUY Station.

Figure 10. Violin charts of the predicted Epan by different RVM-based models in the test period using the best input combination - JZ Station.

Figure 11. Violin charts of the predicted Epan by different RVM-based models in the test period using the best input combination - YUY Station.

Figure 12. Average (a) RMSE and (b) MAE of the applied models in predicting Epan for all input combinations during the test period - JZ Station.

Figure 13. Average (a) RMSE and (b) MAE of the applied models in predicting Epan for all input combinations during the test period - YUY Station.

Table 1. The statistical parameters of the applied data.

Table 2. Parameter settings of the optimization algorithms used in the study.

Table 3. Training and test statistics of the models for Epan prediction - Station 1 (RVM).

Table 4. Training and test statistics of the models for Epan prediction - Station 1 (RVM-GWO).

Table 5. Training and test statistics of the models for Epan prediction - Station 1 (RVM-WOA).

Table 6. Training and test statistics of the models for Epan prediction - Station 1 (RVM-MRFO).

Table 7. Training and test statistics of the models for Epan prediction - Station 1 (RVM-IMRFO).

Table 8. Training and test statistics of the models for Epan prediction - Station 2 (RVM).

Table 9. Training and test statistics of the models for Epan prediction - Station 2 (RVM-GWO).

Table 10. Training and test statistics of the models for Epan prediction - Station 2 (RVM-WOA).

Table 11. Training and test statistics of the models for Epan prediction - Station 2 (RVM-MRFO).

Table 12. Training and test statistics of the models for Epan prediction - Station 2 (RVM-IMRFO).

Table 13. Training and test statistics of the models for Epan prediction - Station 1 (MLPNN).

Table 14. Training and test statistics of the models for Epan prediction - Station 2 (MLPNN).

Table 15. The comparison of the presented study with previous studies. Flower pollination algorithm-based extreme learning machine, RVM-IMRFO: Relevance vector machine tuned with improved Manta Ray foraging optimization, ANFIS-SA: Adaptive neuro-fuzzy inference system with shark algorithm, MT-MODWT: M5 model tree with maximum overlap discrete wavelet transform, MLP: Multilayer perceptron, MM-SVM: Multiple model-support vector machine, n: Sunshine duration, W: Wind speed, T: Average air temperature, RH: Relative humidity, SR: Solar radiation.