Optimizing Rotation Forest-Based Decision Tree Algorithms for Groundwater Potential Mapping

Groundwater potential mapping is an important prerequisite for evaluating the exploitation, utilization, and recharge of groundwater. This study uses four benchmark models, BFT (best-first decision tree classifier), CART (classification and regression tree), FT (functional trees), and EBF (evidential belief function), together with three ensemble models, RF-BFT, RF-CART, and RF-FT, to map the groundwater potential of Wuqi County, China. First, sixteen groundwater spring-related variables were selected (altitude, plan curvature, profile curvature, curvature, slope angle, slope aspect, stream power index, topographic wetness index, stream sediment transport index, normalized difference vegetation index, land use, soil, lithology, distance to roads, distance to rivers, and rainfall), and a correlation analysis of these sixteen variables was carried out. Second, the parameters of the seven models were optimized, and the optimal parameters were selected for groundwater modeling in Wuqi County. The predictive performance of each model was evaluated by estimating the area under the receiver operating characteristic (ROC) curve (AUC) and statistical indices (accuracy, sensitivity, and specificity). The results show that all seven models have good predictive capabilities and that the ensemble models have larger AUC values. The RF-BFT model has the highest success rate (AUC = 0.911), followed by RF-FT (0.898), RF-CART (0.894), FT (0.852), EBF (0.824), CART (0.801), and BFT (0.784). Groundwater potential maps (GPMs) of these 7 models were obtained, and four different classification methods (geometric interval, natural breaks, quantile, and equal interval) were used to reclassify the obtained GPMs into 5 categories: very low (VLC), low (LC), moderate (MC), high (HC), and very high (VHC). The results show that the natural breaks method has the best classification performance and that the RF-BFT model is the most reliable. The study highlights that the proposed ensemble models are more efficient and accurate for groundwater potential mapping.


Introduction
Groundwater is one of the major clean water sources across the world [1]. It serves many purposes, including drinking, manufacturing, irrigation, and other domestic uses, but in recent times it has become scarce because of its restricted supply and over-exploitation [2,3]. Groundwater pollution and depletion therefore make systematic research on groundwater urgent [4-7]. In particular, groundwater is the primary [...]
Figure 1. Location of the study area.

The study area is a hilly and gully region of the Loess Plateau. The rivers in the county belong to the Yellow River system; the main stream is deep, and the tributaries are densely distributed. There are 636 rivers with drainage areas larger than 1 km², of which 516 rivers have drainage areas between 1 and 10 km², 93 rivers have drainage areas between 10 and 50 km², 33 rivers have drainage areas between 50 and 100 km², and 10 rivers have drainage areas larger than 100 km². The total length of rivers is 3255.96 km, and the river network density is 0.86 km/km².
The Luo River, which is the largest river in the study area, is a secondary tributary of the Yellow River and is the main body of the surface hydrological network in the area. The river network density of drainage areas larger than 1 km² is 0.9 km/km². The relative height difference of ravines is 120-567 m, and the average longitudinal gradient of tributaries is between 2.5‰ and 9.13‰.

Data Processing
In this study, a total of 235 springs and 235 non-springs were recorded by collecting historical groundwater-related data and conducting field investigations. In general, the occurrence and utilization of groundwater are related to various conditioning factors; sixteen conditioning factors were selected in this study, including altitude, plan curvature, profile curvature, curvature, slope angle, slope aspect, stream power index (SPI), topographic wetness index (TWI), stream transport index (STI), normalized difference vegetation index (NDVI), land use, soil, lithology, distance to roads, distance to rivers, and rainfall (Supplementary Figure S1).
Slope angle, slope aspect, plan curvature, profile curvature, curvature, altitude, SPI, TWI, and STI are topographic factors [27]. They can be extracted from a digital elevation model (DEM) with a spatial resolution of 30 m. Lithology and soil are geological factors that can be extracted from geological maps. Distances to roads and distances to rivers are extracted from topographic maps [18]. Rainfall data can be downloaded from the Met Office [28]. NDVI and land use maps are created from enhanced thematic mapper plus (ETM+) images [29], and supervised classification is carried out in ENVI software [30].
Altitude refers to the height of a point relative to the base level, which affects the local conditions of the groundwater distribution area [20]. In this study, the DEM was used to generate the elevation map, and the natural breaks method was adopted to reclassify it into six categories: 1230-1300 m, 1300-1400 m, 1400-1500 m, 1500-1600 m, 1600-1700 m, and 1700-1800 m.
Plan curvature is a topographically based variable that shows the direction in which water flows [21]. There are three types of plan curvature: concave (curvature < 0), convex (curvature > 0), and flat (curvature = 0) [35]. In this study, plan curvature is divided into [...]
Profile curvature shows the rate at which the slope changes in the direction of the steepest slope [18]. In this study, profile curvature is divided into 5 categories: (−10 [...]
Curvature affects the spatial variation of groundwater flow, soil moisture, and other hydrological conditions, which indirectly affect the recharge of groundwater [36]. In this study, the curvature was divided into 5 categories by the natural breaks method: [...]
Based on a digital elevation model (DEM), TWI mainly evaluates the impact of topographic and soil characteristics on soil water distribution. TWI is a secondary topographic index that can describe the location and size of topographic conditions in the saturated source area of surface runoff [37]. The formula is as follows:
$$\mathrm{TWI} = \ln\left(\frac{\mathrm{SCA}}{\tan\beta}\right)$$
where SCA represents the confluence area per unit contour length flowing through a point on the surface, and β represents the slope gradient of the terrain at that point. In this study, the natural breaks method is used in ArcGIS 10.5 to reclassify TWI into 5 categories: <2, 2-2.5, 2.5-3, [...]
SPI is based on the assumption that the flow rate is proportional to the specific catchment area and can be used to measure the erosion capacity of flowing water [18]. It can show potential water erosion at specific locations in the basin [21]. SPI can be defined as:
$$\mathrm{SPI} = A_s \tan\beta$$
where A_s represents the catchment area and β represents the slope.
STI is a variable that describes the erosion and deposition processes of water flow [38]. In this study, STI is reclassified into 5 categories: <10, 10-20, 20-30, 30-40, and >40.
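As a minimal sketch of how these two indices can be evaluated for a single raster cell, the following assumes the standard definitions TWI = ln(SCA / tan β) and SPI = A_s · tan β, with the slope given in degrees. The function names and the small minimum slope used to avoid division by zero on flat cells are illustrative assumptions, not part of the study.

```python
import math

# Assumption: flat cells are clamped to a small slope so tan(beta) is never zero.
MIN_SLOPE_DEG = 0.01


def twi(sca, slope_deg):
    """Topographic wetness index: ln(SCA / tan(beta))."""
    beta = math.radians(max(slope_deg, MIN_SLOPE_DEG))
    return math.log(sca / math.tan(beta))


def spi(catchment_area, slope_deg):
    """Stream power index: A_s * tan(beta)."""
    beta = math.radians(max(slope_deg, MIN_SLOPE_DEG))
    return catchment_area * math.tan(beta)


if __name__ == "__main__":
    # Hypothetical cell: specific catchment area of 100 and a 45 degree slope.
    print(twi(100.0, 45.0), spi(100.0, 45.0))
```

Steeper cells lower TWI and raise SPI for the same contributing area, which is why the two indices are usually read together.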
NDVI is a remote sensing index that reflects the status of land cover vegetation [39], and it can affect changes in groundwater level [40] and groundwater flow [41]. NDVI is determined by the near-infrared portion (NIR) and the red portion (R) of the spectrum, and the formula is as follows:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{R}}{\mathrm{NIR} + \mathrm{R}}$$
This research extracts NDVI from satellite images and reclassifies it into five categories: (−0.17)-0.10, 0.10-0.14, 0.14-0.17, 0.17-0.21, and 0.21-0.41.
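The NDVI calculation and its reclassification into the five categories listed above can be sketched as follows. The standard definition NDVI = (NIR − R) / (NIR + R) is assumed; the function names and the handling of values that fall exactly on a class boundary are illustrative assumptions.

```python
import bisect

# Upper bounds of the first four NDVI classes reported in the text.
NDVI_BREAKS = [0.10, 0.14, 0.17, 0.21]


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: cells with no signal are treated as NDVI = 0
    return (nir - red) / total


def ndvi_class(value):
    """Return the class number (1 to 5); boundary values go to the upper class."""
    return bisect.bisect_right(NDVI_BREAKS, value) + 1


if __name__ == "__main__":
    value = ndvi(0.5, 0.3)  # hypothetical reflectance values
    print(value, ndvi_class(value))
```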
Due to the hydrological and mechanical characteristics of vegetation, land use has also been regarded as an important factor affecting groundwater potential in many studies [42]. In this study, the land use map is divided into six categories: farm land, forest land, grass land, water bodies, construction land, and bare land.
Soil type and texture affect runoff characteristics, infiltration, and groundwater recharge, and they play an extremely important role in groundwater potential evaluation [43]. In this study, the soil map is divided into 5 categories: sticky black loessial soils (Type A), new soils (Type B), cultivated loessial soils (Type C), red clay soils (Type D), and alluvial soils (Type E).
Lithology can affect hydrogeological characteristics such as hydraulic conductivity, aquifer porosity, and groundwater flow [44]. In this study, lithology maps are divided into four categories: Group A, Group B, Group C, and Group D.
The existence of rivers has a significant impact on the degree of erosion and infiltration [45]. In this study, river buffer zones are divided into 5 types: <50 m, 50-100 m, 100-150 m, 150-200 m, and >200 m.
Rainfall is one of the most important factors in the evaluation of groundwater potential. Rainfall can replenish water in the aquifer [46]. In this study, the rainfall map is divided into three categories: <400 mm/year, 400-450 mm/year, and >450 mm/year.

Methodology
There are four main steps in this research (Figure 2): (1) describe the study area and prepare groundwater potential conditioning factors; (2) use GIS software to extract the main control factors that affect the potential of groundwater and draw related layers; (3) optimize the selected models and select the best parameters for groundwater potential modeling; (4) use the ROC curve to verify the generated groundwater potential maps.

Multicollinearity among Factors
Multicollinearity refers to the phenomenon where two or more impact factors are highly correlated in the regression model [47]. When there is a multicollinear relationship between the selected conditioning factors, the established model will find it difficult to accurately estimate the results. Therefore, the influencing factors with collinearity should be eliminated before modeling. It is generally determined by the variance inflation factor (VIF) and tolerance (TOL) [48,49]. VIF is the reciprocal of TOL [50]. The larger the VIF, the smaller the TOL, indicating that the collinearity is more serious [51].
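A minimal sketch of the VIF and TOL calculation is given below. It relies on the identity that the VIF of factor j equals the j-th diagonal element of the inverse correlation matrix, which is equivalent to 1 / (1 − R_j²) from regressing factor j on the others. NumPy, the function name, and the synthetic data are assumptions for illustration; the study's own data and software are not reproduced here.

```python
import numpy as np


def vif_and_tol(X):
    """Return (VIF, TOL) arrays for the columns of X (rows are samples)."""
    corr = np.corrcoef(X, rowvar=False)   # factor-by-factor correlation matrix
    vif = np.diag(np.linalg.inv(corr))    # VIF_j = [corr^-1]_jj
    tol = 1.0 / vif                       # TOL is the reciprocal of VIF
    return vif, tol


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))         # synthetic, nearly independent factors
    vif, tol = vif_and_tol(X)
    print(vif.round(3), tol.round(3))
```

Nearly independent factors give VIF values close to 1, while a factor that is almost a linear combination of the others gives a very large VIF and a TOL near 0.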

Evidential Belief Function (EBF)
The Dempster-Shafer theory, also known as belief function and evidence theory [52], is a method of mathematical evidence theory [53]. It was first proposed and introduced by Dempster in 1968, and then further improved and gradually matured by his student Shafer in 1976 [54]. It belongs to the category of artificial intelligence and was first applied to expert systems with the ability to process uncertain information [55].
The EBF model is a bivariate model based on Dempster-Shafer theory [56], which is applied to the spatial correlation evaluation of impact factors and dependent variables. It comprises four functions: belief (Bel), disbelief (Dis), uncertainty (Unc), and plausibility (Pls).
In these functions, T represents the dependent variable, A_ij represents the jth class of the ith evaluation factor, and A represents the evaluation area; N(T) represents the total number of dependent variables; N(A_ij) represents the number of evaluation units in the jth class of the ith evaluation factor; N(T∩A_ij) represents the number of dependent variables in the jth class of the ith evaluation factor; and N(A) represents the total number of evaluation units in the evaluation area. The belief weight W_{A_ij}(T) is the ratio of the conditional probability that T exists given the presence of A_ij to the conditional probability that T exists given the absence of A_ij; the disbelief weight W̄_{A_ij}(T) is the ratio of the conditional probability that T does not exist given the presence of A_ij to the conditional probability that T does not exist given the absence of A_ij.
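The following sketch shows one way to compute the four EBF functions for a single conditioning factor from the counts defined above. It assumes the conditional-probability form of the two weights described in the text, normalizes each weight over the classes of the factor to obtain Bel and Dis, and then takes Unc = 1 − Bel − Dis and Pls = 1 − Dis. Other formulations appear in the literature, so this form, the function name, and the example counts are assumptions rather than the study's exact equations.

```python
def ebf_functions(springs_per_class, cells_per_class):
    """Return (Bel, Dis, Unc, Pls) lists for the classes of one factor.

    springs_per_class[j] is N(T and A_ij); cells_per_class[j] is N(A_ij).
    Assumes every class leaves some cells and some springs outside it,
    so that all ratios below are defined.
    """
    n_t = sum(springs_per_class)   # N(T)
    n_a = sum(cells_per_class)     # N(A)
    w_bel, w_dis = [], []
    for t_ij, a_ij in zip(springs_per_class, cells_per_class):
        # P(T | A_ij) / P(T | not A_ij)
        w_bel.append((t_ij / a_ij) / ((n_t - t_ij) / (n_a - a_ij)))
        # P(not T | A_ij) / P(not T | not A_ij)
        absent = (n_a - n_t - a_ij + t_ij) / (n_a - a_ij)
        w_dis.append(((a_ij - t_ij) / a_ij) / absent)
    bel = [w / sum(w_bel) for w in w_bel]
    dis = [w / sum(w_dis) for w in w_dis]
    unc = [1.0 - b - d for b, d in zip(bel, dis)]
    pls = [1.0 - d for d in dis]
    return bel, dis, unc, pls


if __name__ == "__main__":
    # Hypothetical factor with four equally sized classes and ten springs.
    bel, dis, unc, pls = ebf_functions([4, 3, 2, 1], [25, 25, 25, 25])
    print([round(b, 3) for b in bel])
```

A class that contains no springs receives a belief weight of zero and therefore Bel = 0.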

Rotation Forest (RF)
Rotation forest is a newly developed classifier ensemble technique based on feature extraction [57]. In RF, every base classifier is trained using the principal component analysis (PCA) algorithm, and the feature set F is divided into K subsets. For each subset, the PCA algorithm keeps all principal components to avoid losing variability information. The advantage of RF is that it improves both the diversity of the ensemble and the accuracy of classification [58].
Water 2023, 15, 2287
The main steps of establishing an RF model are: (1) randomly dividing the attributes of the training data set into K subsets; (2) sampling these subsets by bootstrap and carrying out principal component analysis on each subset; (3) building the rotation matrix and training the classifier on the rotated data set; (4) combining the results of the trained classifiers to output the final class label; and (5) assigning these class labels to each pixel. Finally, a groundwater potential map is generated in the ArcGIS environment.
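Steps (1) to (3) can be sketched as below for a single base classifier. The sketch builds only the rotation matrix; training the base tree on the rotated data and combining the ensemble votes are left out. NumPy, the function name, and the 75% bootstrap fraction are assumptions for illustration and are not taken from the study.

```python
import numpy as np


def rotation_matrix(X, n_subsets, rng, sample_frac=0.75):
    """Build one rotation matrix from PCA on random feature subsets."""
    n_samples, n_features = X.shape
    # Step 1: randomly split the feature indices into K subsets.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    for cols in subsets:
        # Step 2: bootstrap the rows, then run PCA on this feature subset.
        rows = rng.choice(n_samples, size=int(sample_frac * n_samples), replace=True)
        cov = np.atleast_2d(np.cov(X[rows][:, cols], rowvar=False))
        _, components = np.linalg.eigh(cov)   # all components are kept
        # Step 3: place the loadings in the block belonging to these features.
        R[np.ix_(cols, cols)] = components
    return R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 6))              # synthetic data, 6 factors
    R = rotation_matrix(X, n_subsets=3, rng=rng)
    X_rotated = X @ R                         # a base tree would be trained on this
    print(X_rotated.shape)
```

Because every principal component is kept, each block is orthonormal and the rotation preserves all of the information in the original features; only the axes seen by each base tree change.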

Best-First Decision Tree Classifier (BFTree)
BFTree is a decision tree learning algorithm. It uses multiple classifiers to create a classification that can optimize the result and make it better than a model with only one classifier [59]. The best-first decision tree finds the split that leads to the maximal information gain, or Gini gain, in every splitting process [60]. It stops growing the tree when all instances at a node belong to a single class, when the best gain is not greater than zero, or when the tree reaches the specified number of expansions. The best-first decision tree shows good classification performance.

Classification and Regression Tree (CART)
The CART model is a binary recursive partitioning method proposed by Breiman et al. [61]. It can deal with continuous and nominal attributes as both targets and predictors. It is robust to missing data, and its variables do not need to follow a normal distribution [62]. CART is a binary tree structure based on a decision tree. It builds classification trees for categorical dependent variables and regression trees for continuous dependent variables [63]. There are two main steps in building a classification and regression tree: (1) for each predictor, all possible binary splits of the predictor values are considered, and the best split of each predictor is found; (2) the predictors are compared by their reduction of heterogeneity, and the overall best split with the largest reduction is selected. The two-step procedure is repeated until there is no meaningful reduction in the heterogeneity of the response variable, at which point the CART is finished.
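The two-step split search can be sketched as follows for numeric predictors and a categorical response. The Gini index is used here as the heterogeneity measure, which is a common choice for classification trees but is an assumption about this study; the function names and the toy data are likewise illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split_for_feature(values, labels):
    """Step 1: best threshold and impurity reduction for one predictor."""
    parent, n = gini(labels), len(labels)
    best_threshold, best_reduction = None, 0.0
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2          # midpoint between neighbours
        left = [y for v, y in zip(values, labels) if v <= threshold]
        right = [y for v, y in zip(values, labels) if v > threshold]
        reduction = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
        if reduction > best_reduction:
            best_threshold, best_reduction = threshold, reduction
    return best_threshold, best_reduction


def best_split(rows, labels):
    """Step 2: compare predictors and return (feature index, threshold, reduction)."""
    best = (None, None, 0.0)
    for j in range(len(rows[0])):
        threshold, reduction = best_split_for_feature([r[j] for r in rows], labels)
        if threshold is not None and reduction > best[2]:
            best = (j, threshold, reduction)
    return best


if __name__ == "__main__":
    rows = [[1, 5], [2, 1], [3, 4], [4, 2]]   # toy data, two predictors
    labels = [0, 0, 1, 1]
    print(best_split(rows, labels))
```

A full tree applies this search recursively to the left and right subsets until no split gives a meaningful reduction.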

Functional Trees (FT)
FT is a multi-class classification model that uses a tree structure for learning and can be applied to both regression and classification problems [64]. The main difference between FT and other hierarchical models is that, instead of dividing inputs at tree nodes by comparing the value of an input attribute with a constant, it uses logistic regression functions to divide inputs at functional inner nodes and to make predictions at functional leaves [65]. The accuracy of the FT model is usually related to the number of boosting iterations and the minimum number of instances per leaf of the functional tree [64].

Performance Evaluation of Models
The validation of the results of the established model is one of the most important tasks in groundwater potential mapping. Without verification, the prediction model has no scientific significance [66]. In this study, the predictive ability of the model was evaluated through statistical indicators and the receiver operating characteristic (ROC) curve [67]. The ROC curve is a comprehensive indicator reflecting the sensitivity and specificity of continuous variables; each point on the curve reflects the susceptibility to the same signal stimulus. The abscissa of the ROC curve is 1 − specificity, also known as the false positive rate (FPR), and the ordinate is sensitivity, also known as the true positive rate (TPR). Generally, the area under the ROC curve (AUC) can be used to evaluate the performance of the model [68,69]. For a useful classifier, the AUC ranges from 0.5 (no better than random guessing) to 1. It can intuitively describe the difference in model performance, and the closer the AUC value is to 1, the higher the accuracy of the corresponding model and the more realistic and reliable the prediction results [70]. The statistical indices can be calculated using the following formulas:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where TP (true positive) and TN (true negative) are the numbers of pixels that are accurately classified, and FP (false positive) and FN (false negative) are the numbers of pixels that are inaccurately classified. TSS is the true skill statistic; it is an index that measures the ability of predicted values to distinguish between events and non-events using all elements of the confusion matrix [71].
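As an illustration of how the AUC can be obtained from model scores, the sketch below uses the rank interpretation of the AUC: the probability that a randomly chosen spring location receives a higher score than a randomly chosen non-spring location, with ties counted as one half. This is equivalent to the area under the empirical ROC curve. The function name and the example scores are assumptions.

```python
def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical scores for two springs (1) and two non-springs (0).
    print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a data set of a few hundred points such as the one used here.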
MCC is the Matthews correlation coefficient, a measure of the quality of a binary classification [72]. The metric is the correlation coefficient between the actual and predicted classes. MCC = 1 represents a perfect prediction; MCC = 0 represents a prediction that is no better than random; and MCC = −1 represents a complete divergence between prediction and observation [73].
The F-Score is mainly used to evaluate the accuracy of the two-class model. This study also used the chi-square test to evaluate the degree of deviation between the observed value and the theoretical value in the model [74]. The chi-square value is proportional to the degree of deviation.
In addition, this study also evaluated the significance of the results by calculating the p value, which is used to determine the result of a hypothesis test. Generally, whether the result is significant is judged by the significance level α. In this study, the 95% confidence level is selected, and the significance level α = 0.05 is used. If p > 0.05, the result is not significant; if p < 0.05, the result is significant.
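The cutoff-dependent indices discussed in this section can all be derived from the four confusion matrix counts. The sketch below assumes the standard definitions TSS = sensitivity + specificity − 1, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and F-score = 2TP / (2TP + FP + FN); the function name and the example counts are illustrative.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return a dict of cutoff-dependent metrics from confusion matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    tss = sensitivity + specificity - 1.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    f_score = 2 * tp / (2 * tp + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "TSS": tss,
        "MCC": mcc,
        "F-score": f_score,
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study's tables.
    for name, value in classification_metrics(tp=40, tn=45, fp=5, fn=10).items():
        print(f"{name}: {value:.3f}")
```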

Correlation Analysis
The results of the multicollinearity analysis in this study are shown in Figure 3. As can be seen from the above figure, among the 16 groundwater conditioning factors, the highest VIF value is 1.625 and the lowest TOL value is 0.615, indicating that there is no multicollinearity among the sixteen conditioning factors.
Subsequently, the importance of each impact factor is calculated based on the ReliefF method [75]. The algorithm can handle multi-class problems and regression problems where the target attribute is a continuous value [76]. ReliefF method results show that distance to rivers has the greatest impact on groundwater potential (AM = 0.099), while curvature has the smallest influence on groundwater potential (AM = 0.006) (Figure 4).
Based on the EBF model, this study analyzed the correlation between groundwater and the conditioning factors (Supplementary Figure S2). The Bel value reflects the relationship between a groundwater influencing factor and the groundwater level, and it is directly proportional to the groundwater level: the greater the Bel value, the greater the impact on groundwater potential. For slope angle, the Bel value is largest in the 0-10° class (0.327), followed by the 10-20° class (0.271), which shows that slopes of 0-20° have the greatest impact on groundwater potential. Slopes greater than 40° (Bel = 0) have no effect on groundwater potential. For slope aspect, the north slope (Bel = 0.231) and the east slope (Bel = 0.180) have a greater impact on groundwater potential, and the flat direction has the least impact (Bel = 0). In the altitude range 1230-1500 m, the Bel value increases with altitude and reaches its maximum at 1400-1500 m; it then decreases as the altitude increases and falls to 0 at 1700-1800 m. For SPI, the classes [...] and >80 have Bel values of 0.254 and 0.239, respectively, which shows that the stronger the erosion ability of flowing water, the greater the impact on groundwater potential. For TWI, the Bel value is largest in the >3.5 and 3-3.5 classes, at 0.339 and 0.256, respectively, which shows that the greater the recharge of the groundwater aquifer, the greater the impact on groundwater potential. For STI, the Bel value is largest in the <10 class (Bel = 0.271). For NDVI, the Bel value is largest in the (−0.17)-0.10 class (Bel = 0.290), which shows that NDVI values in this range have a greater impact on groundwater potential.
In terms of land use, the Bel value of the forest land category is the largest (Bel = 0.523). For soil, Type B has the largest Bel value (Bel = 0.646), whereas Type D has Bel = 0, indicating that this type has little effect on groundwater potential. For lithology, Group D has the largest Bel value (Bel = 0.400), followed by Group B, with a Bel value of 0.318. This is because the lithology and hydrological characteristics of this group are poor, which strongly affects the flow characteristics of groundwater and the porosity and permeability of the aquifer, and therefore has a greater impact on groundwater potential. The Bel value of Group C lithology is 0, indicating that the lithology of this group has almost no effect on groundwater potential. For distance to roads, the Bel value is largest in the <50 m and 50-100 m classes, at 0.427 and 0.339, respectively. This is because roads affect the phreatic groundwater and the capillary water rising from the upper stagnant (perched) water. For distance to rivers, the Bel value is largest in the 50-100 m class (Bel = 0.390). As rivers increase the recharge and infiltration of groundwater, the further away from the river, the lower the groundwater potential. Rainfall is a crucial factor because it directly affects the hydrological characteristics of groundwater, such as runoff and recharge. Its three categories all have large Bel values; the medium rainfall class of 400-450 mm/year has the largest (Bel = 0.491), indicating that medium rainfall is more conducive to groundwater recharge.

Configuration and Training of the Models
Groundwater potential mapping models were constructed using the Weka and Matlab platforms. In the training process of the BFT model, the optimal value of the parameters is found through the heuristic test using the training data set, and the groundwater potential model is established. In this process, the BFT model selects post-pruning and pre-pruning as pruning strategies and then adjusts seed and numFoldsPruning, where seed represents the random number seed to be used and numFoldsPruning represents the number of folds for internal cross-validation.
Finally, all seed values, numFoldsPruning values, and the corresponding AUC values were imported into Matlab to establish the optimized surface of the BFT model (Figure 5a). The CART model is also optimized by adjusting the seed and numFoldsPruning parameters, and its optimized surface is shown in Figure 6.
There are three types of FT models: FT, FTLeaves, and FTInner. All model types require tuning of the parameter numBoostingIteration, which sets a fixed number of iterations for LogitBoost. The AUC value corresponding to each parameter value was recorded, and the optimization curves of the three model types are shown in Figure 7.
For hybrid models, the optimal parameters of the base models are first found, and then the relevant parameters (seed and number of iterations) of the ensemble models are adjusted. numIterations represents the number of iterations to perform. Finally, the optimized surface of each ensemble model was drawn according to the optimization parameters calculated in Weka (Figure 8). The optimized parameters of all selected models are shown in Table 1.

Model Performance and Validation
The performance of the models on cutoff-dependent metrics for the training data is given in Supplementary Table S1. The RF-CART model has the highest MCC value on the training data (0.679), indicating that, in the training data set, the correlation between the actual and predicted categories is strongest for this model. Its TSS value is also the largest (0.678), which shows that in the training data set, the fit between the actual groundwater level and the predicted groundwater level is excellent. In addition, the accuracy and F-score values of the RF-CART model on the training data are higher than those of the other models.
The validation of the models on cutoff-dependent metrics is given in Supplementary Table S2. In this table, the accuracy, F-score, and TSS values of the RF-BFT and RF-FT models on the validation data set stand out. The MCC value of the RF-BFT model on the validation data is the largest (0.489), which shows that in the validation data set, the actual and predicted classes of this model are the most consistent.
Parameters of the ROC curves with the training data set (Table 2): in the training data set, all seven models have good predictive capabilities. The RF-BFT model has the highest AUC value (0.911), followed by the RF-FT and RF-CART models, at 0.898 and 0.894, respectively. In addition, Table 3 also describes the standard errors (SE) of these 7 models and the upper and lower limits of the asymptotic 95% confidence interval. These results all show an error value within the normal range. The p values in Table 3 are all <0.0001, indicating that the results are significant.
Parameters of the ROC curves with the testing data set (Table 3): in the validation data set, the RF-CART model has the largest AUC value (0.808), followed by RF-BFT and RF-FT, at 0.807 and 0.800, respectively, which shows that the RF-CART model has excellent predictive performance. As in the training data set, the three ensemble models achieve the largest AUC values.

Comparison of the Hybrid Model with Benchmark Models
The chi-square test and p value can determine the statistical significance of differences between models [77]. Therefore, this study uses the chi-square value and p value to test whether the performance of the hybrid models differs significantly from that of the benchmark models. At the significance level α = 0.05, a large chi-square value with a corresponding p value of less than 0.05 indicates that the performance of the two models is significantly different. It can be seen that there are significant differences in the performance of all models (Table 4).

Generation of Groundwater Potential Maps
After the training and testing procedures, groundwater potential maps (GPMs) of the 7 models were obtained, and four different classification methods were used to reclassify them into 5 categories: very low (VLC), low (LC), moderate (MC), high (HC), and very high (VHC). The four classification methods are geometric interval, natural breaks, quantile, and equal interval [78]. Figure 9a shows the spatial distribution and proportion of each potential class of each model under these four classification methods. It can be seen from the figure that groundwater is more concentrated in the high potential categories and that all models have similar spatial distributions. To select the optimal classification method, this study uses the five potential classes as the abscissa and the groundwater point density (GSD) as the ordinate to draw point density line graphs for the four classification methods (Figure 10). The point density line chart intuitively reflects the distribution proportions of the five potential classes under different models. It can be seen from the figure that, with the natural breaks classification method, the higher the potential category, the larger the GSD value and the more groundwater is distributed. The natural breaks method therefore has the best classification performance, and under it the results of the RF-BFT model are the most reliable. Accordingly, this study uses the natural breaks method to reclassify the groundwater potential maps.
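Two of the four classification methods, equal interval and quantile, are simple enough to sketch directly. The code below computes the four break values that split a set of groundwater potential index values into five classes and assigns a class label to a value. Natural breaks and geometric interval need more involved optimization and are not shown. The function names, the index-based quantile rule, and the synthetic values are assumptions for illustration.

```python
import bisect

CLASS_NAMES = ["VLC", "LC", "MC", "HC", "VHC"]


def equal_interval_breaks(values, k=5):
    """Break values that divide the value range into k equal-width classes."""
    low, high = min(values), max(values)
    step = (high - low) / k
    return [low + step * i for i in range(1, k)]


def quantile_breaks(values, k=5):
    """Break values that put roughly the same number of pixels in each class."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]


def reclassify(value, breaks):
    """Return the class name of a value given ascending break values."""
    return CLASS_NAMES[bisect.bisect_right(breaks, value)]


if __name__ == "__main__":
    index_values = [i / 100 for i in range(101)]   # synthetic potential index
    breaks = quantile_breaks(index_values)
    print(breaks, reclassify(0.95, breaks))
```

For a uniformly distributed index the two methods give nearly the same breaks; they diverge when the index is skewed, which is one reason the choice of method changes the share of pixels in each potential class.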
The histogram of the spatial distribution of the five potential classes of each model obtained using the natural breaks method is shown in Supplementary Figure S3b. In this figure, the five potential classes of the RF-BFT, RF-CART, and RF-FT ensemble models have similar spatial distributions. The pixel percentage decreases as the potential category increases, while the percentage of groundwater points increases. This pattern has also been reported in other studies [79]. The EBF model shows different results: the pixel percentage in the LC and MC categories is larger, while the pixel percentage in the VLC category is relatively small.
This study uses the natural breaks method to generate the final groundwater potential maps of the seven models (Figure 11). The figure shows that the very high (VHC) groundwater potential areas in Wuqi County are mainly distributed in the lower-altitude eastern and western valleys, and the distribution ratio is consistent with the above results, which also indicates that the classification method selected in this paper is appropriate.
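The comparison of classification methods can be sketched in a few lines. The example below computes class breaks for two of the four methods (equal interval and quantile) and a per-class point density, then checks whether the density increases with the potential class. The density definition used here (share of groundwater points in a class divided by share of pixels in that class) is an assumption, as are all function names and the toy data; the geometric interval and natural breaks methods are not implemented in this sketch.

```python
from bisect import bisect_right


def equal_interval_breaks(values, k=5):
    """Return k - 1 break values that split the value range into equal widths."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def quantile_breaks(values, k=5):
    """Return k - 1 break values so each class holds about the same number of pixels."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[min(n - 1, (n * i) // k)] for i in range(1, k)]


def class_density(pixel_values, point_values, breaks):
    """Assumed density: (share of points in class) / (share of pixels in class)."""
    k = len(breaks) + 1
    pixels = [0] * k
    points = [0] * k
    for v in pixel_values:
        pixels[bisect_right(breaks, v)] += 1
    for v in point_values:
        points[bisect_right(breaks, v)] += 1
    n_pix, n_pts = len(pixel_values), len(point_values)
    return [(s / n_pts) / (p / n_pix) if p else 0.0 for p, s in zip(pixels, points)]


def increases_with_class(density):
    """True when density rises strictly from the lowest to the highest class."""
    return all(a < b for a, b in zip(density, density[1:]))


# Toy data: potential index per pixel, and the index at hypothetical groundwater points.
pixels = [i / 100 for i in range(100)]
points = [0.30, 0.45, 0.55, 0.65, 0.70, 0.75, 0.85, 0.90, 0.95, 0.99]

eq_breaks = equal_interval_breaks(pixels, 5)
density = class_density(pixels, points, eq_breaks)
```

A classification scheme whose density rises steadily from VLC to VHC, as checked by `increases_with_class`, matches the selection criterion described above.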

Discussion
This article uses the BFT, CART, and FT models as base classifiers and integrates the rotation forest model with each of them to model the distribution of groundwater in Wuqi County, China. The area under the receiver operating characteristic curve (AUC) is used to verify the accuracy and success rate on the training and test data. The results show that the AUC values of the ensemble models are all greater than those of the benchmark models. Nguyen et al. [80], Razavi-Termeh [22], and Lee and Oh [81] also discuss the integration of sub-models. Hosseinalizadeh et al. [82] integrated the Bag, RS, and RF methods with the BFT model, respectively, and their final training and verification results show that the AUC value of the single BFT model is the smallest. Some researchers have also tried to combine different base models. Pham [20] integrated three different hybrid computational intelligence models with basic decision stump classifiers to draw groundwater potential maps. Kordestani [21] implemented the evidential belief function and boosted regression tree (EBF-BRT) ensemble algorithm in groundwater potential mapping. Although the ensemble methods used by these researchers differ, the final results all indicate that a suitable combination of weak classifiers can effectively mitigate overfitting during modeling and thereby improve model performance. This agreement supports the conclusions of this study. The rotation forest algorithm used in this study has also performed well in other studies [83]. However, Naghibi et al. [15] obtained different results. They used the EBFTM data mining method as a new integrated model and compared it with the tree-based rotation forest model to establish a groundwater potential map, and found that the results obtained by the EBFTM method were better.
These different findings may be due to differences in the characteristics of the data samples and of the research objects in the selected study areas. Therefore, in future research, the integration method should be chosen according to the study area, and the selected method should, where possible, be applied to two or more study areas so that the applicability of the model can be judged more reliably.
Compared with other studies on groundwater potential, this article also uses the Weka platform to optimize the modeling parameters and applies 10-fold cross-validation to the sample data in order to obtain more reliable modeling results. In the study of Naghibi et al. [15], the prediction rate of the CART model on the training data is only 0.7870. The results of Nguyen et al. [84] show that the AUC values obtained by the RF-BFT model for training and verification are 0.891 and 0.826, respectively. In the report of Pham et al. [64], the AUC value of the FT model on the training sample is 0.849. Zhao and Chen [85] optimized the parameters of only one model (the LMT model) and found that the AUC values of the remaining, unoptimized models were lower than that of the LMT model (for example, the AUC values of the RF-FT model on the training and validation data sets are 0.839 and 0.740, respectively). Comparing these reports suggests that optimizing the parameters of the selected models is both necessary and effective. However, in the research of Yariyan et al. [86] and Chen et al. [87], the BFT model obtained a better AUC value without optimization than it did in this paper. This may be because the climate and hydrological characteristics of the study areas differ, which makes the BFT model less applicable to the present study area.
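For readers who want to reproduce this kind of evaluation outside Weka, the sketch below shows a pairwise-ranking computation of the AUC and a plain (unstratified) k-fold index split. Both functions and their names are illustrative assumptions and are not the Weka procedure used in the study, which may split and score the data differently.

```python
import random


def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for an unstratified k-fold split."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# Toy example: 1 = groundwater point, 0 = non-groundwater point.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = auc_from_scores(labels, scores)
splits = list(k_fold_indices(25, k=10))
```

Each sample appears in exactly one test fold, so averaging the per-fold AUC gives an estimate that does not depend on a single train/test partition.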
Different classification methods produce different groundwater potential maps. In this study, four classification methods (natural breaks, quantile, geometric interval, and equal interval) were used to partition the groundwater potential maps, and the reclassification results indicate that the natural breaks method is the most effective. Baeza et al. [88] also used different methods (equal interval, natural breaks, quantile, and standard deviation) to classify landslide susceptibility maps and found the natural breaks method to be the most suitable. Youssef et al. [89] applied the same classification methods as this study to partition a landslide susceptibility map, and their results indicate that the quantile method is the most accurate. One likely reason for this difference is that the models chosen in that study differ from those used here. Furthermore, the study area of that report is a basin, whereas the present study area lies on the Loess Plateau. The difference in geomorphic units means that the classification methods are applicable to different degrees.
This study assessed the weights of the classes of the selected conditioning factors based on the EBF model. When the slope is small, the surface runoff velocity is low, and the impact on groundwater potential is more obvious. In the research of Ozdemir [90] and Naghibi and Pourghasemi [91], the correlation analysis between slope and groundwater potential yielded the same result. Abd Manap et al. [92] consider that altitude has a moderate impact on groundwater potential. According to the study of Tien Bui [33], the larger the SPI value, the greater the impact on the hydrological characteristics of groundwater. Zabihi [35] and Pham [20] consider that the greater the TWI value, the greater the recharge of groundwater aquifers. Chen et al. [13] proposed that the smaller the NDVI value, the easier it is for surface water to infiltrate into the ground. The findings of these scholars are broadly consistent with this research. However, the STI factor gave unexpected results: Kordestani [21] proposed that areas with higher STI values have higher groundwater potential because of the higher groundwater level. This difference may arise because the effect of this factor depends on the geographical characteristics and groundwater generation mechanisms of the study area.
In summary, this study integrates the BFTree, CART, and FT models with the rotation forest (RF) model, and uses the BFTree, CART, FT, and EBF benchmark models together with the RF-BFT, RF-CART, and RF-FT ensemble models to map groundwater potential in Wuqi County, China. According to the multicollinearity analysis, the sixteen conditioning factors were not highly correlated. The ReliefF method was used to calculate the importance of each conditioning factor. The results show that the distance to rivers has the greatest influence on groundwater potential, while curvature has the least. Based on the EBF model, all factors contributed to groundwater potential modeling. Finally, statistical indicators and the ROC curve were used to verify the accuracy of the model results. The results show that the AUC values of the ensemble models are greater than those of the benchmark models, with higher prediction rates and better performance. Among them, the RF-BFT model has the highest prediction rate. The results obtained in this paper could therefore provide reference guidelines for the rational utilization of groundwater resources, ecological environment protection, and regional land planning.
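The statistical indicators named in this study (accuracy, sensitivity, and specificity) follow directly from the confusion matrix at a chosen cutoff. The sketch below uses the standard definitions with hypothetical counts; the function name is illustrative.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return (accuracy, sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)   # share of groundwater points correctly predicted
    specificity = tn / (tn + fp)   # share of non-groundwater points correctly predicted
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return accuracy, sensitivity, specificity


# Hypothetical counts for one model at one cutoff.
accuracy, sensitivity, specificity = confusion_metrics(tp=40, tn=35, fp=15, fn=10)
```

Unlike the AUC, these three values depend on the cutoff used to turn a potential index into a class label, so they complement the ROC analysis rather than replace it.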

Concluding Remarks
Accurate assessment and analysis of groundwater potential play a pivotal role in groundwater management, development, and utilization. This study uses decision tree algorithms based on rotation forest to model the spatial distribution of groundwater potential in Wuqi County, China. First, sixteen conditioning factors were selected in the study area. Second, the collinearity of the conditioning factors and their correlation with groundwater were analyzed, and the parameters of the selected models were optimized. The BFTree, CART, EBF, and FT models and the RF-BFT, RF-CART, and RF-FT ensemble models were then used to model groundwater potential in Wuqi County. Finally, statistical indicators and the ROC curve were used to verify the accuracy of the model results. The results show that the AUC values of the ensemble models are greater than those of the benchmark models, with higher prediction rates and better performance [93]. Among them, the RF-BFT model has the highest prediction rate. This study also reclassified the groundwater potential maps using different classification methods and chose the best one to generate the final groundwater potential map of the study area. The comparative analysis indicates that the groundwater potential maps of this study are reliable and can serve as a useful tool for the local government of Wuqi County in exploring and developing groundwater resources.
Supplementary Materials: The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/w15122287/s1, Figure S1: Spring conditioning factors; Figure S2: Conditioning factor histogram based on EBF; Figure S3: Selection of the best classification method for groundwater potential map: (a) geometrical interval, (b) natural breaks, (c) quantile, (d) equal interval; Table S1: Performance of models using cutoff-dependent metrics; Table S2: Validation of models using cutoff-dependent metrics.