Modeling and Uncertainty Analysis of Groundwater Level Using Six Evolutionary Optimization Algorithms Hybridized with ANFIS, SVM, and ANN

In the present study, six meta-heuristic schemes are hybridized with artificial neural network (ANN), adaptive neuro-fuzzy interface system (ANFIS), and support vector machine (SVM), to predict monthly groundwater level (GWL), evaluate uncertainty analysis of predictions and spatial variation analysis. The six schemes, including grasshopper optimization algorithm (GOA), cat swarm optimization (CSO), weed algorithm (WA), genetic algorithm (GA), krill algorithm (KA), and particle swarm optimization (PSO), were used to hybridize for improving the performance of ANN, SVM, and ANFIS models. Groundwater level (GWL) data of Ardebil plain (Iran) for a period of 144 months were selected to evaluate the hybrid models. The pre-processing technique of principal component analysis (PCA) was applied to reduce input combinations from monthly time series up to 12-month prediction intervals. The results showed that the ANFIS-GOA was superior to the other hybrid models for predicting GWL in the first piezometer and third piezometer in the testing stage. The performance of hybrid models with optimization algorithms was far better than that of classical ANN, ANFIS, and SVM models without hybridization. The percent of improvements in the ANFIS-GOA versus standalone ANFIS in piezometer 10 were 14.4%, 3%, 17.8%, and 181% for RMSE, MAE, NSE, and PBIAS in the training stage and 40.7%, 55%, 25%, and 132% in testing stage, respectively. The improvements for piezometer 6 in train step were 15%, 4%, 13%, and 208% and in the test step were 33%, 44.6%, 16.3%, and 173%, respectively, that clearly confirm the superiority of developed hybridization schemes in GWL modeling. Uncertainty analysis showed that ANFIS-GOA and SVM had, respectively, the best and worst performances among other models. In general, GOA enhanced the accuracy of the ANFIS, ANN, and SVM models.


Case Study and Data
The Ardebil plain, with the area of 990 km 2 , is located in the northwest of Iran between latitudes 38′3° and 38′27 and the longitudes of 47′55° and 48′20° (Figure 1). The average annual rainfall is 304 mm. The hottest month in this plain is May and the driest month is July. The average annual temperature is 9 °C. In Ardebil plain, groundwater supplies water for drinking, agricultural, and industrial purposes. There is a negative balance of about 550 million m 3 in the Ardebil aquifer. The GWL decreases by 20-30 cm per year, which is the fastest decline. The Ardebil plain has 89 villages, that use groundwater for agricultural uses. The current condition of the GWL in the Ardebil plain has negative impacts on the farmers as its main users. In this study, the following parameters were used as the input to the hybrid ANN, ANFIS, and SVM models. Then, the principal component analysis was used to select the best input combination up to 12-month lag. were selected for the current study. A total of 20% of the data set was used for testing, and 80% of the data set was used for the training, that were selected randomly. Nine observed wells (wells 6,9,10,24,11,4,7,8, and 1) were used to provide the spatiotemporal variation of GWL for different months. Each piezometer had 140 monthly data points. The measurements were made one time during each month.

ANFIS Model
The ANFIS model uses fuzzy interface systems which use fuzzy if-then rules to construct a predictive model. The ANFIS model has been widely used for predicting rainfall [33], temperature [34], runoff [35], evaporation [36], and sediment load [37]. Figure 1 shows the structure of the ANFIS model in the framework of the study. The square nodes and circle nodes show the adaptive and fixed nodes, respectively. The ANFIS model has five layers [38]. (1) The inputs are fuzzified in the first layer whose nodes are constant. The membership grade of inputs is the output of the first layer: where, 1 is the output of the first layer, ( ) and −2 ( ) are the fuzzy membership functions for the fuzzy set Ai and Bi-2, respectively. The bell-shaped member function is selected for the current study due to its smoothness and concise notation: where a, b, and c are the premise parameters (training algorithms obtain these parameters).
(2) The nodes of the second layer are labelled with M, which shows that they carry out a simple multiplier function. The fuzzy strengths of each rule are the output of the second layer: (3) The nodes of the third layer are also fixed. The fuzzy strengths from the previous layer are normalized in the third layer. The sum of weight functions is used to compute the normalization factor. The normalized fuzzy strengths are the output of the third layer: (4) The nodes of the fourth layer are adaptive and its outputs are computed as: 4 =̄=̄( + + ), = 1,2. ., where, pi, qi, and ri are the consequent parameters. (5) The output in the fifth layer is labelled with S. A fixed node is observed in this layer. This layer computes the total summation of all the incoming signals: In the classical training approach, a combination of the least square and gradient descent methods is commonly used as a hybrid learning algorithm to adjust the parameters of the ANFIS model. The consequent parameters of ANFIS model are updated by applying the least square method in the forward pass. Additionally, in the backward pass, the gradient descent method is used for updating the premise parameters. In the hybridized schemes, tuning and adjusting the consequent and premise parameters are determined by the optimization algorithms as the hybrid training scheme.

ANN Model
The artificial neural network uses behavioral patterns to provide a framework for modeling mechanisms. It consists of three layers: input, hidden, and output layers, and includes the processing units named neurons which are arranged in several layers [39]. The connection weights link the neurons of preceding layers to the neurons of the following layers. The output of the middle layer (hidden layer) is used as the input to the following layer. The input data is received by the input layer, while the last layer generates the final output of the ANN model. The middle layers receive and transmit the input data to the connected nodes in the following layers. The weighted sum of inputs is used by the hidden neurons to produce the intermediate output. The ANN model uses the activation functions to compute the outputs of the hidden and output neurons. It uses the bias values to set the output along with the weighted sum of inputs to the neuron. The process of ANN modelling has two major levels: (1) preparing the network structure, and (2) adjustment of the weights of connections. The literature review indicates that the backpropagation training algorithm is wildly used in different fields, such as water engineering [40]. First, the output of the ANN model is obtained as a response of the ANN model. In the next level, the error between observed and estimated values is minimized to find the weights of the model. If the output is different from the observed value, the modification of weights and biases will start to decrease the error values. However, the backpropagation algorithm has a slow convergence rate and to overcome its inherent weakness the meta-heuristic optimization algorithms are used in the present study. Figure 1 shows the structure of the ANN model and its hybridization with intelligence algorithms.

SVM Model
The SVM model has been widely used for predicting solar radiation [41], rainfall [42], landslides [43], and drought [44]. In the SVM model, the input data are divided into testing and training samples. The selected input vector (training sample) is mapped into a high-dimensional feature space. Then, the optimal decision function is generated [44]. Equation (7) shows the regression estimation function of the SVM model: where, ( ) is the nonlinear mapping function for mapping sample data (x) into an m-dimensional feature vector, b is the bias, and is the weight vector of the independent function. and b are computed by minimizing the following function: where, D(f) is the generalized optimal function, ‖ ‖ 2 is the complexity of the model, C is the penalty parameter, and is the error control function of . Thus, the optimization problem is defined as follows: where, and * are the relation factors. Adjusting the partial derivatives of W, b, , and * to 0 and using the Lagrangian equation, an optimization problem can be formulated as follows: where, ( , ) is the kernel function. The most popular kernel function is the radial basis function: where, is the radial basis function parameter. The SVM based model uses the grid search algorithm (GS) to find the optimal value of parameters C and . Specifically, a set of initial values is chosen for both parameters and C. To select and C using cross-validation, the available data are divided into k subsets. One subset is regarded as testing data and then assessed using the remaining k-1 training subsets. Then, the cross-validation error is computed using the split error for the SVM model using different values of C and . Various combination of parameters C and are evaluated and the one yielding the lowest cross-validation error is chosen and used to train the SVM model for the whole dataset. The structure of the SVM model is shown in Figure 2.  Reduction number of inputs using PCA Grasshoppers are regarded as pests because they damage agricultural crops. They are a group of insects that can generate large insect swarms. The mathematical function to investigate the swarming behavior of grasshoppers is demonstrated with the following equation [45]: where, is the position of the i th grasshopper, is the classical interaction, is the gravity force on the i th grasshopper, and is the wind advection. The classical interaction is simulated as follows: where, dij is the distance between the i th and j th grasshoppers, and s is a function for the definition of the strength of social forces.
The function s is computed as follows: where, f is the intensity of attraction, and l is the attractive length scale. The distance between grasshoppers ranges between 0 and 15. Repulsion is observed in the interval [0 2.079]. The grasshoppers enter the comfort zone if they are far from 2.079 units from other grasshoppers. G component is computed as follows: where, g is the gravitational constant and ̂ is a unity vector towards the center of the earth. The A parameter is computed as follows: where, u is a constant drift and ̂ is a unit vector in the direction of the wind. Finally, the new position of a grasshopper is computed using its common position, the food source position, and the position of all other grasshoppers: where, N is the number of grasshoppers. However, Equation (18) cannot be directly used for optimization because grasshoppers do not converge to a specified point. Thus, a corrected equation is used to update the grasshopper's position: where, is the upper bound; is the lower bound; ̂ is the value of the Dth dimension in the target space (optimal solution found so far); and c is a decreasing coefficient to shrink the comfort zone, repulsion zone, and attraction zone. Figure 2 shows the flowchart of GOA.

Weed Algorithm (WA)
Weeds have a very adaptive nature that converts them to undesirable plants in agriculture. Figure 3 shows the flowchart of the WA algorithm [46]. The WA starts with initializing a random population of weeds in the search space. A predefined number of weeds are randomly distributed over the entire dimensional space, indicated as a solution space. The fitness of weeds is assessed by considering its fitness function to optimize the problem. Each agent of the current population can produce some seeds via a predefined region considering its own location. In this way, the number of produced seeds relies on its fitness function in the population regarding the best and worst solutions, as observed in Figure 3. The number of seeds is computed as follows [46]: where, is the worst fitness function, is the best fitness function, is the minimum number of seeds, is the maximum number of seeds, and is i th fitness function. The distribution of seeds is random over the search space and is based on the standard deviation and zero mean. The standard deviation of the distribution of seeds varies as follows: where, is the maximum number of iterations, is the standard deviation at the current iteration, is the final value of standard deviation, is the predefined initial value of standard deviation, and n is the nonlinear modulation index. Seeds are produced by each weed and then are distributed over the space. The competitive exclusion is the final level in the WA. If a weed does not generate seeds, it will be extinct. If all the weeds generate seeds, the number of weeds increases exponentially. Therefore, the number of seeds is limited to the maximum value (Pmax). The weeds with better fitness function are allowed to reproduce. Weeds with worse fitness function are removed (see figure 4).  Recently, CSO has gained popularity among other optimization algorithms because of its exploration ability and is widely used in different fields, such as wireless sensor networks [47], robotics [48], data clustering [49], and dynamic multi-objective algorithms [50]. Chu et al. (2006) introduced the cat swarm algorithm [51]. Figure 5 shows the flowchart of CSO. The CSO uses hunting and resting skills for optimization. First, the initial population of cats is initialized randomly. The seeking mode and tracing mode are two important operation modes in the CSO model. The seeking mode demonstrates the resting ability of cats which change their position and remain alert. This mode is regarded as a local search for the solutions. The seeking memory pool (SMP), the seeking range of selected dimension (SRD), and counts of dimension to change (CDS) affect the cat's behavior. The number of duplicate cats is denoted by SMP. CDC shows that the dimensions are to be mutated and SRD denotes change value of chosen dimensions. In the seeking mode, most of the cat's time is in the resting time, even though they remain alert [52]. The seeking mode includes the following levels: • Generate replicas of the cats as per SMP.
where, , is the position of the k th cat in the d th dimension (new position of the cat), is the random number, N is the number of cats, D is the number of dimensions, and , is the position of j th cat in the d dimension (old position of the cat). • Compute the objective function for all copies and choose the best objective function value (xbest) of the cat.

•
Substitute xj,p with the best cat if the xbest is better than xj,p in terms of the objective function value.
The hunting skill of cats is represented by the tracing mode. Cats trace the objectives with high energy by changing their locations with their own velocities. The velocity is updated as follows: where, is the inertia weight, 1 is a constant, and , is the velocity of j th cat in the d dimension, and , , is the new velocity of the j th cat. The position of cats in the tracing mode is updated as follows: where, , , is the j th position of the k th cat in the d th dimension (new position of the cat). In PSO, a set of particles that are generated randomly search the best adjacent solutions for optimization. The updating equations for the new position and velocity of particles are written as [53]: where, d is the number of dominions; is the inertia weight; 1 and 2 are the random values; 1 and 2 are the acceleration coefficients; ( ) is the global best position obtained by neighbors; and is the personal best position. The particles find the solutions of optimization problems by adjusting the position and velocity of particles. The main advantages of PSO are easy implementation and computational efficiency.

Genetic Algorithm (GA)
Genetic algorithm is one of the most popular algorithms that is extensively applied for optimization problems. Each chromosome in GA is a candidate solution [19]. The genes of chromosomes simulate the variables of optimization. First, the initial population of chromosomes is randomly initialized for optimization and the selection operator is used to select the best chromosomes for the production of the next generation. The chromosomes with better fitness values have a great chance of being chosen by the selection operator. The crossover operator is used to exchange genes between two chromosomes for producing new solutions. Finally, the mutation operator is used to cause changes in the genes. The mutation operator is applied to the chromosomes of new genes to generate different solutions with new genes. If the convergence criteria are satisfied, the algorithm stops; otherwise, the algorithm runs again. The drawback of GA shows that GA requires a high number of iterations [20].

Krill Herd Algorithm (KHA)
Gandomi and Alavi [54] introduced the KHA using the krill's behavior in nature [54]. The KHA is widely used in different fields, such as text document clustering analysis [55] and structural seismic reliability [56]. The KHA acts, based on three main concepts: (1) mutation-induced, (2) foraging mutation, and (3) physical diffusion. The following formulation uses the three behaviors mentioned above [54]: where, is the location of the i th krill, Ni is the motion induced by another krill, is the foraging motion, and is the physical diffusion of the i th krill. Equation (28) describes the motion-induced by another individual krill.
where, is the maximum induced speed, , is the neighbor's local effect, , is the krill's target direction, is the inertia weight of induced motion, and , is the old motioninduced for the i th individual krill. The foraging motion can be formulated as: where, is the foraging speed, , is the food attractive, , is the effect of the best fitness of the i th krill, and , is the last foraging motion. The diffusion can be computed as: where, is the maximum diffusion speed, and is the random direction. Finally, the position of a krill is computed as follows: where, , is the value of the next individual krill location, and , represents the current position of solution number I, and is the essential constant. Figure 5 shows the flowchart of the krill algorithm.

Principal Component Analysis (PCA)
PCA is a statistical orthogonal transformation to obtain a set of values of linearly uncorrelated (principal components) from a set of observations. When the user has the number of inputs but he cannot identify the appropriate inputs, the PCA is used to reduce the number of inputs. The final data set should be able to demonstrate most of the variance of the original input data by creating a variable reduction [57]. PCA can be explained, based on the following equation [57]: where, shows the principal component, is the related eigenvector, and xi is the input variable. The information is obtained by solving Equation (34): where, is the variance-covariance matrix, I is the unit matrix, and is the eigenvalues.

Taguchi Model
The random parameters of optimization algorithms are the most important parameters affecting the outputs of the optimization algorithms. Thus, determining the appropriate values of random parameters is necessary to construct the optimization models. The Taguchi model is widely used to design different parameters of different experiments or experimental models. First, the initial level is determined for each of the random parameters in the optimization algorithms. In the Taguchi method, parameters are classified into two groups: (1) controllable, and (2) uncontrollable (noise). In the Taguchi model, each parameter combination that has a higher S (signal)/N (noise) ratio is regarded as the best combination [58].
where, n is the number of data, and is the fitness function that is obtained by the Taguchi model. For example, consider the PSO algorithm with four parameters and three levels. When the population size is at level 1, the acceleration coefficient is tested at levels 1, 2, 3, and 4. Similarly, the inertia coefficient is tested at levels 1, 2, 3, and 4.

Hybrid ANN, ANFIS, and SVM Models with Optimization Algorithms
The optimization algorithms can be used as a robust training algorithm for the ANN models. The process starts with the initialization of a group of random agents (particles, chromosomes, krill, grasshoppers, weeds, or cats). The position of agents represents the ANN weights and biases. Following this level, using the initial biases and weights (i.e., the initial position of agents), the hybrid ANN-optimization algorithms are trained, and the error between the observed and estimated value is calculated. At each iteration, the calculated error is decreased by the updating of agent locations.
The model procedure in ANFIS-optimization algorithm models starts with the initialization of a set of agents (particles, chromosomes, krill, grasshoppers, weeds, or cats) and continues with the random choice of agents and finally adjusts a location for each agent. First, the ANFIS model is trained. Then, the consequent and premise parameters are optimized by the optimization algorithms. The root mean square error (RMSE) is defined as an objective function. The aim of optimization algorithms is to minimize the objective function value with finding the appropriate values of consequent and premise parameters.
In SVM, the C parameter and kernel function parameters have significant effects on the accuracy of the SVM. The random population of agents (particles, chromosomes, krill, grasshoppers, weeds, or cats) are initialized for training the SVM parameters. The RMSE is defined as an objective function. The aim of hybrid SVM-optimization algorithm models is to minimize model errors. Figure 2 shows the developed framework of hybrid ANN, ANFIS, and SVM-optimization models for modeling groundwater level. Thus, the model parameters are considered as decision variables for optimization algorithms. The optimization algorithms aim to minimize the error function to find the optimal value of model parameters. The PCA selects the appropriate input combinations. Then the hybrid and standalone models uses the input combinations to forecast GEWL. The models uses the optimized model parameters to accurately forecast monthly GWL.

Uncertainty Analysis of Soft Computing Models
The input data and the inability of model structure are the sources of uncertainty. In this research, an integrated framework is developed to simultaneously evaluate the input data and model structure.

Input data uncertainty
The combined Bayesian uncertainty was used to compute the uncertainty contributed by input data. The input error model was used to account for the uncertainty of input data [59]:  [59].

Mode Structure Uncertainty
Bayesian model average (BMA) is used for model uncertainty. The posterior model probability and averaging over the best models were used to estimate the uncertainty of the models. The weighted average prediction of quantity of target variable is computed as follows [59]: 5-A predetermined number of GWLs for each model is provided using probabilistic parameter estimations obtained from level 2 to level 4. 6-The variance and weight of models are estimated. 7-The weights for ensemble members of models are summed to compute the weight models. 8-To the experimental soft computing models. The following indices were used to quantify the uncertainty of models: 9-where k is the number of observed data, is the upper bound of data, is the lower bound of data, is standard deviation, p is bracketed by 95% of predicted uncertainties, d is the distance between the upper and lower bounds, and ̄ is the average distance between the upper and lower bounds [59,60].

Statistical Indices for Evaluation of Different Models
In this study, the following indices were used to evaluate the performance of models: Root mean square error: Mean absolute error: Nash Sutcliffe efficiency: Percent bias (PBIAS): where, N is the number of data, H0 is the observed value, and Ps is the predicted value. RMSE and MAE show a good match between observed data and estimated values when it equals 0. The NSE shows a good match between the observed values and estimated values when it equals 1. The best value of PBIAS is zero.

Inputs Selection by PCA
In this study, 12 input variables (H(t-1), …., H(t-12)) were considered to select the input lag times of monthly GWL. As presented in the flowchart and framework of the current study in Figure 1, the first step of the model developments is the appropriate selection of time lags for GWL modelling by PCA analysis. Table 1 shows the variance contribution rate for PCAs as the principal component loadings. There are the loadings of 12 principal components versus 12 input lag times of GWL. The first four PCs variance summed up a contribution of 91%, among which the first PC variance had a contribution of 48% loadings. It was observed that the inputs H(t-1), H (t-2), H (t-3), H (t-4), and H (t-5) had higher factor loading in comparison with other inputs of the PCs. Thus, the first four PCs were selected for the hybrid soft computing models which included inputs H(t-1), H (t-2), H (t-3), H (t-4), and H (t-5) because of their higher loading factor. This loading analysis of variables reduced the raw initial input parameter numbers from 12 to 5, that decrease the model development efforts. The coefficients of more 0.75 are significant for Eigen value verifications [60].

Selection of Random Parameters by the Taguchi Model
The Taguchi model was used to find the value of random parameters rather than the classical trial and error methods. Table 2 shows the computed signal-to-noise (S/N) ratio for each random parameter in the optimization module of the hybrid training of ANFIS. Each parameter had four levels and the best level of each parameter is selected based on the S/N values. The S/N ratio was computed for each level of parameters. The best value of parameters had the highest S/N rate. For example, sensitivity analysis for different values of GOA parameters was done, as shown in Table 2. The results indicated that the population size = 300 had the highest value of S/N. Thus, the optimal size of population was 300. The maximum S/N ratio for parameter l was 1.23. Thus, the optimal value of parameter l was 1.5. The maximum S/N ratio for parameter f was 1.14. Thus, the optimal value of parameter f was 0.5.

Results of Hybrid ANN, ANFIS, and SVM Models
In this section, the results of developed hybrid models are presented and compared with each other and with the usual ANFIS, ANN, and SVM models. These models are hybridized with GOA, CSO, KA, WA, PSO, and GA meta-heuristic optimization algorithms. The results of models in three piezometers of 6, 9, and 10 as shown in Figure 6, are presented and discussed. These piezometers were selected as samples to evaluate the ability of new hybrid models.  Table 3. Among SVM models, the hybrid SVM-GOA was observed to have the lowest value of NSE and the highest values of RMSE, MAE, and PBIAS. It was important to mention that the standalone SVM, ANN, and ANFIS had worse performance than hybrid ANN, SVM, and ANFIS models that indicates the superiority of hybridization in model developments. Among PSO, CSO, GA, KA, and WA, the CSO had better results than the other optimization algorithms. The general results showed that ANFSI model was superior to the SVM and ANN models. Additionally, the ANN model had lower values of RMSE and MAE than did the SVM model. Additionally, the results of ANFIS-GOA as the best model in piezometer 6 in comparison with standalone ANFIS shows that meta-heuristic hybridizations improved the model performances in train and test steps. The percent of RMSE, MAE, NSE, and PBIAS improvements by ANFIS-GOA in train step were 15%, 4%, 13%, and 208% and these values for the test steps of ANFIS-GOA are 33%, 44.6%, 16.3%, and 173%, respectively, that clearly confirm the superiority of developed hybridization schemes in GWL modelling. Additionally, in Figure 7a, the scatter plots of training and testing steps visualize the performance of ANFIS-GOA compared to the other models. Furthermore, simulations coincide very well with the observed values and all of the data points concentrated over the y = x line with R 2 = 0.93. Furthermore, this figure shows that other hybridized models such as CSO, PSO, KA, WA, GA, and standalone ANFIS have less accuracy in high and low values of GWL, while the ANFIS-GOA over all of low to high values of GWL performed accurately in regard to the observations.

Begin
Step 1: Initialization. Set the generation counter G=1; initialize the population P of NP krill individuals randomly; set the foraging speed Vf, the maximum diffusion speed Dmax, and the maximum induced speed Nmax.
Step 2: While the termination criteria is not satisfied or G<Gmax do Sort the population/krill from best to worst. for i=1:NP (all krill) do Perform the following motion calculation. Motion induced by the presence of other individuals Foraging motion Physical diffusion Implement the genetic operators. Update the krill individual position in the search space. Evaluate each krill individual according to its position. end for i Sort the population/krill from best to worst and find the current best. G=G+1.
Step 3: end while Step 4: Post-processing the results and visualization. End. Results of hybrid models for piezometer 9 in Figure 7b and Table 4 indicated that the hybrid ANN, ANFIS, and SVM models had better performance than the standalone ANN, SVM, and ANFIS models, the same as the results for piezometer 6 in the previous subsection. Among ANFIS hybrid models, the hybrid ANFIS-GOA was confirmed to have the best performance with the smallest values of RMSE = 1.  Table 4 indicated that GOA and GA were the best and worst algorithms among other algorithms. As observed in Table  4 and Figure 7b, the evolutionary ANN models had more accuracy than the evolutionary SVM model because of lower values of RMSE, MAE, and PBIAS and higher values of NSE. However, no major differences were observed in GWL predictions of piezometers 6 and 9 predictions by ANFIS-GOA. Here the results of models in piezometer 10 are evaluated. As observed in Table 5, results indicated that the ANFIS-GOA was better in terms of minimizing RMSE, MAE, and PBIAS than the other models. ANFIS-GOA reduced RMSE error by 7.01% and 7.04% compared to ANN-GOA and SVM-GOA, respectively. The standalone ANFIS, ANN, and SVM models provided worse results than the hybrid models. The SVM model provided the worst performance among other models. The NSE of ANFIS-GOA, ANFIS-CSO, ANFIS-KA, ANFIS-WA, ANFIS-PSO, and ANFIS-GA was 0.91, 0.90, 0.89, 0.79, and 0.75, respectively. GA had the worst performance among other algorithms. As is shown in Table 5, the error in the estimated GWL by using GA was more than that of PSO, KA, WA, GA, CSO, and GOA. Overall, the percent of improvements in the ANFIS-GOA versus standalone ANFIS in piezometer 6 were 14.4%, 3%, 17.8%, and 181% for RMSE, MAE, NSE, and PBIAS in training stage and 40.7%, 55%, 25%, and 132% in testing stage, respectively. These values again confirm that all of the hybridized models performed more accurately than the stand-alone models and indicate the generality of hybridizing Taguchi with training procedure compared to the classical standalone models. Scatterplots for the soft computing models are provided in Figure 7a for the training and testing phases. It is clear that the hybrid ANFIS-GOA predictions were much closer to the measured data in the testing and training phases with a higher coefficient of determination. This result indicated a better correlation and a larger degree of statistical match between measured and predicted data of ANFIS-GOA relative to the other hybrid ANN and SVM models. The R 2 values were found to vary in the range of 0.84-0.94 and 0.79-0.91 for the ANN (hybrid ANN models and based ANN model) and SVM models (hybrid SVM models and based SVM model), respectively. The SVM model had the lowest R 2 among other models. Additionally, the ANFIS-GA, ANN-GA, and SVM-GA models had the lowest R 2 among other hybrid ANFIS, ANN, and SVM models. There is a weak agreement between the lower and higher values of the actual and estimated GWLs in this scatter plots of piezometer 6, unlike the ANFIS-GOA results.
• piezometer 9 As observed in Figure 7b Table 4 show incorporating the Taguchi and GOA in ANFIS training enhanced the R 2 values 13% in comparison with the standalone ANFIS and in all of the developed models the hybridized meta-heuristic models outperformed the single standalone models.

• Piezometer 10
The results of Figure 7c indicated that the ANFIS-GOA and SVM models produced the best and the worst results, respectively. It is clear that developed hybrid ANFIS-GOA model forecasting of GWL was less scattered and closer to the straight line of 1:1 than those the other models and it shows impressive results in regard to the other models. For training and testing phases, GA had a worse performance than CSO, PSO, KA, WA, and GOA because of the lower values of R 2 . The standalone ANFIS model had the worst performance among the ANFIS-GOA, ANFIS-CSO, ANFIS-WA, ANFIS-PSO, ANFIS-GA, and ANFIS-KA models. The ANFIS-GOA model with R 2 = 0.94 as is presented in Table 5, the values of GWL simulated by the ANFIS-GOA are almost equal to the observed values of GWL. The linear fit of the forecasted GWL and measured GWL results have a high correlation coefficient that is very close to 1.00 (R 2 =0 .97) and a perfect correlation coefficient (R 2 value) of 0.94, confirmed that the simulation model has provided a very good prediction of the observed values of GWL. Additionally, 94% of the observed GWL values accurately fit the hybrid ANFIS-GOA model predictions.

Uncertainty Analysis of Soft Computing Models
As stated in the aims of the current study, the uncertainty analysis of hybrid intelligence models is another major contribution and novelty of the present study. The same as the previous subsections, in this section the results of uncertainty analysis of hybrid models in selected three piezometers are provided and comparative evaluation between different hybrid models are presented. The hybrids of ANFIS, SVM, and ANN models with GOA, WA, KA, PSO, and GA are joined with the nonparametric Monte-Carlo Simulations (MCSs) to quantify the uncertainty of developed models in GWL simulations. The probability of model predictions in MCSs is considered as a degree of uncertainty of model results and demonstrates the probabilities in the GWL forecasting bands that enclosed the observed GWL inside these bounds of probability.
• Piezometer 6 In the trained hybrid models, the uncertainty in the model trained parameters and weights is the major source of uncertainty in model results. Here the effects of uncertainty in trained, optimization, and determination of parameters, and weights of intelligence developed hybrid models for piezometer 6 are presented. For training and testing stage, the uncertainty of the models results in piezometer 6 are provided in Figure 8a and in Table 6. The uncertainty results are quantified by the two indices of p and d and visualized by the uncertainty bounds of 95%. At first, the values of p show how many of the observed GWL values in the training and testing stages are positioned inside the 95% confidence bounds. Secondly, the d-factor as the measure of deviations should be small also. Figure 8 indicated that the highest and lowest d was obtained for SVM and ANFIS-GOA, respectively. Based on p and d indices, CSO had better performance than PSO, GA, KA, and WA. Results indicated that the standalone ANN, ANFIS, and SVM models had higher d and lower p than hybrid ANFIS, ANN, and SVM models that indicate higher uncertainty in the standalone model results. The overall comparison of the results indicated that the ANN model outperformed the SVM model. The d values of uncertainties of models in Table 6 show that in all of developed models for GWL the d value is lower than 1, that proves the superior tight bounds of developed models. The best results are derived by the ANFIS-GOA with d = 012 and p = 0.94 indicates that developed model 95% of observations are covered by the uncertainty bounds. The desired values for p in model uncertainty analysis have values greater than 80% [49].
• Piezometer 9 As presented in Table 6 and in Figure 8b, SVM-GOA and SVM had the lowest and highest d among SVM models. According to Table 6, GOA outperformed CSO and KA, but both algorithms were better than GA, PSO, and WA. The p-value of the standalone ANFIS model was increased by the optimization algorithms. GA provided lower performance in the optimization of ANN with p equal to 0.83 and d equal to 0.24, compared to WA, GOA, PSO, KA, CSO, and WA.
Again, the comparisons confirm the superiority of ANFIS-GOA in uncertainty verifications that have p = 0.94 and d = 0.16. As confirmed by these values of p in all of the developed models, all of them are satisfactory and the major part of GWL simulations are enclosed by the 95% prediction interval based on model prediction in Monte Carlo simulations. However, the d values that measure the average distance from upper and lower limits of prediction interval, for the ANFIS-GOA models are significantly and considerably smaller than those of all of the other models. In general, the benefits of ANFIS-GOA models over the other models is two-fold. At first, the GOA based models provide a more accurate prediction of GWL with fewer errors. Secondly, the confidence interval of ANFIS-GOA model results is much narrower and yet encloses almost the greatest percent of observation in MCSs.
• Piezometer 10 From Table 6 However, general results indicated that the ANFIS-GOA has the best performance among other models. Figure 9 shows the coefficient of variation for different optimization algorithms. ANFIS-GOA had a lower coefficient of variation than other models and optimization algorithms. The worst results were for GA. In general, there are three main sources that generate the uncertainty of model outputs: the first one is the data and knowledge uncertainty, the second one is the parametric uncertainty due to unknown model parameters, and the third one is the structural uncertainty due to physical complexity of phenomena. The main contribution of the current paper is the uncertainty analysis of hybrid models prediction of GWL in the form of parametric uncertainty due to regulatory parameters and weights produced in the training stage of models.

Spatiotemporal Variation of GWL
The previous section indicated that the GOA improved the performance of ANN, ANFIS, and SVM models. The results indicated that the GOA had better performance than other optimization algorithms. As shown in Figure 9, the hybrid GOA models (ANFIS-GOA, ANN-GOA, and SVM-GOA) have low variation coefficients in modeling.
Most literature reviews revealed only a few quantity comparisons. Furthermore, they did not include the spatiotemporal variation of GWL. In this section, the latitude, longitude, H(t-1), H (t-2), H (t-3), H (t-4), and H (t-5), hydraulic conductivity (HC), and specific yield of nine observed wells (well 6,9,10,24,11,4,7,8, and 1) were used to provide the spatiotemporal variation of GWL for different months. The Ardebil plain is a heterogeneous aquifer. Thus, the hydraulic conductivity and specific yield spatially vary in the Ardebil plain. HC is a measure of a material's capacity to transmit water. The specific yield is defined as the ratio of the volume of water that an aquifer will yield by gravity to the total volume of the aquifer. A pumping test method was used to obtain the value of the hydraulic conductivity and specific yield. Figure 10 shows the measured hydraulic conductivity and specific yield for the Ardebil plain. In this section, the ANFIS, ANN, and SVM models with the best algorithm (GOA) were used to provide the spatiotemporal variation of GWL. The difference between estimated GWL models and observed GWL was computed for all months of years. The RMSE was used as an error function to compare the estimated data with the observed data. From Figure 11, it was clear that the ANFIS-GOA provided more accurate estimation than ANN-GOA and SVM-GOA. It was clear that the RMSE of ANFIS-GOA varied from white (1.2 m) to dark blue (2.2), while the RMSE of ANN-GOA and SVM-GOA varied from 1.7 (yellow) to 2.7 m (light green). Thus, results indicated that ANFIS-GOA has higher accuracy for the heterogeneous aquifers. The heterogeneous aquifers are considered as complex hydraulic systems because their hydraulic parameters vary spatially and temporally. Additionally, the climate parameters, such as temperature and rainfall, can increase the complexity of prediction of GWL for heterogeneous aquifers.

Conclusions
In this study, the ANFIS, ANN, and SVM models were used to predict groundwater level. The GOA, CSO, GA, PSO, WA, and KA were used to fine-tune and integrate with the ANN, SVM, and ANFIS models. Three piezometers (6, 9, and 10) in the Ardebil plain were considered as a case study for the GWL investigation. The input combinations of time series (up to 12-month lag) were reduced using principal component analysis (PCA). For the testing phase and piezometer 6 ANFIS-GOA indicated a value of RMSE: 1.21, MAE: 0.878, NSE: 0.93, and PBIAS: 0.15 which reflected better performance than the other models. The R 2 values were found to vary in the range of 0.84-0.94 and 0.79-0.91 for the ANN (hybrid ANN models and based ANN model) and SVM models (hybrid SVM models and based SVM model), respectively. The results indicated that the SVM model had the lowest R 2 among other models. It was observed that the ANFIS-GOA yielded the most dominant performance among other models. From uncertainty analysis, the weakest model in the optimization of the ANFIS model was ANFIS-GA with a p = 0.87 and d = 0.21. However, general results indicated that the ANFIS-GOA had better performance than other models. Additionally, the results of spatiotemporal variations maps of GWL showed that ANFIS-GOA has high accuracy for the heterogeneous Ardebil aquifer. Future studies can evaluate the accuracy of these models under climate change conditions. The climate parameters such as temperature and rainfall can be simulated for future periods. Then, these parameters can be used as input to the models to simulate GWL for the future periods.

Conflicts of Interest:
The authors declare no conflict of interest.