Modeling and Investigating the Mechanisms of Groundwater Level Variation in the Jhuoshui River Basin of Central Taiwan

Due to nonuniform rainfall distribution in Taiwan, groundwater is an important water source in certain areas that lack water storage facilities during periods of drought. Therefore, groundwater recharge is an important issue for sustainable water resources management. The mountainous areas and the alluvial fan areas of the Jhuoshui River basin in Central Taiwan are considered abundant groundwater recharge regions. This study aims to investigate the interactive mechanisms between surface water and groundwater through statistical techniques and estimate groundwater level variations by a combination of artificial intelligence techniques and the Gamma test (GT). The Jhuoshui River basin in Central Taiwan is selected as the study area. The results demonstrate that: (1) More days of accumulated rainfall data are required to affect variable groundwater levels in low-permeability wells or deep wells; (2) effective rainfall thresholds can be properly identified by lower bound screening of accumulated rainfall; (3) daily groundwater level variation can be estimated effectively by artificial neural networks (ANNs); and (4) it is difficult to build efficient models for low-permeability wells, and the accuracy and stability of models is worse in the proximal-fan areas than in the mountainous areas.


Introduction
Due to industrial and economic development along with rapid population growth in Taiwan in recent years, more water sources are required to satisfy the increasing water demands from the development of human civilization and livelihoods. Due to its easy accessibility, low cost, and stable quality and temperature, groundwater is often considered an important water source. The interactive mechanism of groundwater level variations is rather complex and includes precipitation, tides, atmospheric pressure, and even earthquakes. Previous research has shown that rainfall can be regarded as a leading indicator of groundwater levels by applying time series to analyze the relationship between rainfall and groundwater levels [1][2][3][4][5][6][7]. However, the temporal relationship between groundwater and

Pearson Correlation Coefficient
The Pearson correlation coefficient [19] is a quantitative method used to measure the linear correlation between two variables. In this study, it is used to determine the temporal relationship between groundwater level variations and precipitation as well as stream flow. The Pearson correlation coefficient ranges between [−1,1], in which 1 denotes a positive exact linear relationship, −1 indicates a negative perfect linear relationship, and 0 indicates that no linear relationship exists between the two variables. For variables of P (precipitation) and G (groundwater) with n datasets (each dataset can be denoted by p i and g i , where i = 1, 2, . . . , n), the correlation coefficient can estimate the Pearson correlation (R) between P and G. The calculation of the correlation coefficient is given as: where P and G are the means of P and G accordingly.

Gamma Test (GT)
The Gamma test (GT) is designed to estimate noise variance by measuring the smooth relationship between the input-output dataset. The GT was initially proposed by Končar and Aðalbjörn [20,21] to determine the best input combination for a neural network [22,23]. This study implements the GT for the extraction of nonlinearity between groundwater level variations and precipitation, along with the identification of important factors affecting groundwater level variations.
Given an input-output pair denoted by (X, y) = ((x 1 , . . . , x m ), y), where X is the input items and scalar y is the output items, the observations can be described as Equation (2).
where, X i ∈ R m are the m-dimensional input variables with a dataset length of M, which is constrained to a closed bounded set C ∈ R m . The corresponding outputs y i ∈ R are scalars. The underlying relationship of the data set can be denoted: where, f is a smooth function, and r denotes a random variable for noise. The Gamma statistic (Γ) is a variance estimation of noise. Let X i,k denote the kth nearest neighbor to X i in terms of Euclidean distance. The delta function is defined: where, |· · · | is the Euclidean distance, and p is the number of neighbors. For computing the Γ, the line of least squares regression is built for p points: where, A is the gradient. The steeper the gradient is, the more complex is the model. The Γ is an index to evaluate the model performance of an input combination. A Γ closer to 0 implies a more appropriate input combination.

Back-Propagation Neural Network (BPNN)
The back-propagation neural network (BPNN), developed by Rumelhart et al. [24], is used to build a model for estimating the groundwater level variations in this study. Figure 1 shows the BPNN structure in this study.

Back-Propagation Neural Network (BPNN)
The back-propagation neural network (BPNN), developed by Rumelhart et al. [24], is used to build a model for estimating the groundwater level variations in this study. Figure 1 shows the BPNN structure in this study.  The input items comprise the streamflow and accumulated rainfall. The node number of the hidden layer is determined by trial and error. The model target is groundwater level variation. The BPNN applies the gradient steepest descent method for adjusting the neuron weights for minimizing the output error. In the learning process (calculation iterations), the neuron weights can be adjusted by an error convergence technique to approximate the model target based on the given input set. The output layer error may propagate backward to the input layer. Further details of the BPNN algorithm are found in Rumelhart et al. [24].

Adaptive Neuro-Fuzzy Inference System (ANFIS)
The adaptive neuro-fuzzy inference system (ANFIS) was proposed by Jang [25] who used the fuzzy inference system as an essential core in combination with the artificial neural network. The ANFIS preserves the learning process of ANNs for projecting input features to an output space and retains the fuzzy advantages, including if-then rules (rule layer), for describing the regional behavior of such projection. Then, the results of the inference system are acquired through the reasoning capability of fuzzy logic. The ANFIS was shown to have powerful modeling abilities in comprehensive fields such as motor fault detection and diagnosis [26], power systems dynamic load [27], forecasting systems for the demand of teachers' human resources [28] and real-time reservoir operations [29].
This study applies the Takagi-Sugeno fuzzy model [30] in a fuzzy rule layer. Here, 2 input variables, p (precipitation) and s (streamflow), and 1 output variable, g (groundwater), are taken as an example for describing the rule layer. The rule sets of the rule layer can be expressed as: Rule 1: If p is A 1 and s is B 1 then g 1 = k 1 *p + t 1 *s + r 1 Rule 2: If p is A 2 and s is B 2 then g 2 = k 2 *p + t 2 *s + r 2 where k, t and r are linear parameters in the consequent part (then-part) of the first-order Takagi-Sugeno fuzzy model.
The typical ANFIS includes 5 layers ( Figure 2). The input layer-in this layer, each node produces membership degrees that belong to each of the fuzzy sets by using appropriate membership functions. The rule layer-the AND operator is used to obtain one output that represents the result of the antecedent for that rule. The average layer-the ratio of each ith rule's firing strength to the sum of all the rules' firing strength is calculated. The consequent layer-the node function determines the contribution of each ith rule's toward the total output. The output layer-the defuzzification process transforms each rule's fuzzy results into the model output. The details of the ANFIS are found in Chang and Chang [31]. functions. The rule layer-the AND operator is used to obtain one output that represents the result of the antecedent for that rule. The average layer-the ratio of each ith rule's firing strength to the sum of all the rules' firing strength is calculated. The consequent layer-the node function determines the contribution of each ith rule's toward the total output. The output layer-the defuzzification process transforms each rule's fuzzy results into the model output. The details of the ANFIS are found in Chang and Chang [31].

Case Study
This study investigates the interactive mechanism of groundwater level variations and precipitation in mountainous areas in the Jhuoshui River basin by using statistical methods. The Jhuoshui River basin is located in Central Taiwan with a maximum elevation of 3200 m. The watershed area of Jhuoshui River is approximately 3157 km 2 . Figure 3a shows the distribution of the gauge stations and the hydrogeological information of groundwater monitoring wells in the study area. There are 2 main regions in this study area: The mountain area and the alluvial fan. The alluvial fan can be roughly divided into 3 districts: The fan-top district; the fan-mid district; the fan-tail district (Figure 3a). The G1~G4 monitoring wells belong to the mountain area, and the G5~G8 monitoring wells belong to the fan-top of the alluvial fan. As shown in Figure 3b, the fan-top district is the only district without a significant confined aquifer. In general, the hydrogeological feature of the fan-top district in this study area is recognized as a high potential groundwater recharge area because the main geological components are sandstone, phyllite and slate (the gravel of the geological structure) (Figure 3b). According to investigations of core drilling samples, the soil thickness of this study area (the depth to bedrock, including the soil layer, colluvium layer and saprolite) is deep [32]. The Central Geological Survey (CGS) (Ministry of Economic Affairs (MOEA), R.O.C.) indicated that the hydrogeological structure of the study area can be divided into several strata. Based on the depth from the land surface ( Figure 3b), these strata are as follows: Aquifer 1 (F1), Aquitard 1 (T1), Aquifer 2 (F2), Aquitard 2 (T2) and Aquifer 3 (F3). The average thicknesses of Aquifer 1, Aquifer 2 and Aquifer 3 are 42 m, 95 m and 86 m, respectively. The thickness of the gravel and sand strata in the fan-top areas can reach more than 130 m [33]. In this study area, some groundwater monitoring wells include two different depth monitoring records (in a different aquifer), such as G1, G4, G7 and G8 (Figure 3c). However, according the hydrogeological information provided by the CGS (Figure 3b), it is recognized that the shallow groundwater well in G7 belongs to Aquifer 3 (F3). The rainfall is distributed unevenly, which mainly occurs from May to September due to the unique topographical terrain and location. The total rainfall in wet periods comprises 75% of the annual rainfall, which implies rainfall differs significantly between wet and dry periods.
The groundwater level data at twelve groundwater monitoring wells and the flow data at two streamflow gauging stations were collected from the Water Resources Agency in Taiwan during 2001 and 2011, and rainfall data at seventeen rainfall gauging stations were collected from the Water Resources Agency, Central Weather Bureau and Taiwan Power Company in Taiwan in 2001  and 2011 (Tables 1-3). The missing data were infilled by using linear regression techniques based on the data of the surrounding stations. The flowchart of this study is shown in Figure 4. First, this study conducts rainfall duration analyses by investigating the effective duration of accumulated rainfall on groundwater level variations. Second, this study conducts effective rainfall analysis by identifying significant accumulated rainfall amount thresholds affecting groundwater level variations, and therefore the correlation between accumulated rainfall amounts and groundwater level variations can be estimated. Third, this study constructs estimation models of the groundwater level variations by using the BPNN and ANFIS based on rainfall, streamflow and groundwater level variation data, in which the GT is used to determine the proper input variables associated with the rainfall gauging stations as well as the streamflow gauging stations (S1), and rainfall durations for each model at individual groundwater monitoring stations. Finally, the performance of the models is compared.            The Pearson correlation coefficient and the root mean square error (RMSE) are used to evaluate the performance of the estimation model. The calculation of RMSE is given below: where, y i is the model estimation and d i is the observation, and N is the number of datasets. The RMSE is used to evaluate the accuracy of the estimations of the groundwater level variations. The lower the RMSE value is, the better is the model's performance.

Duration of Accumulated Rainfall Analysis
First, this study uses the Pearson correlation coefficient to investigate the relationship between the duration of accumulated rainfall (1-10 days) and the groundwater level variation. A high correlation coefficient indicates a strong relationship between the accumulated rainfall and the groundwater level variation. To generate a comprehensive data set, rainfall levels at each rainfall gauging station can be determined by the average rainfall over the basin obtained from the Thiessen polygon method [34]. Figure 5 shows the time series of groundwater levels at the groundwater monitoring wells and the rainfall data at R2. This study also analyzed the relationship between the groundwater level variation and the rainfall at a 1-day lag (from the current day to the previous day), and further compared the results with a duration analysis. At all groundwater monitoring stations, the results show that the correlation coefficient of groundwater level variations and accumulated rainfall with different durations (2-10 days) is higher than the rainfall at the current day and 1-day lag. In other words, the accumulated rainfall has a stronger positive relationship with the groundwater level variation than the rainfall of the previous day at all wells. As shown in Figure 6, the highest correlation between the groundwater level variation and twoday accumulated rainfall occurs at G4(1), G4(2) and G2, which might result from the gravel of the geological structure at G4 and G2. The highest correlation between groundwater level variations and three-day accumulated rainfall occurs at G1(1) and G3, which may be because of the shallow well depth of G1(1) and G3. The highest correlation between the groundwater level variation and a fiveday accumulated rainfall occurs at G1(2), which might be caused by the deep well depth of G1(2). Therefore, the accumulated rainfall over different durations is selected as an input variable in the ANN models for estimating the groundwater level variations. In addition, the differences in geological structure and depth at each groundwater monitoring well produces different relationships between the groundwater level variations and the accumulated rainfall of different durations.

Effective Rainfall Analysis
The different quantities of rainfall may have different impacts on the groundwater level variation. This study considers the average rainfalls associated with different thresholds for evaluating the rainfall amount impacts on the groundwater level variations at G1 to G4 by the use of As shown in Figure 6, the highest correlation between the groundwater level variation and two-day accumulated rainfall occurs at G4(1), G4(2) and G2, which might result from the gravel of the geological structure at G4 and G2. The highest correlation between groundwater level variations and three-day accumulated rainfall occurs at G1(1) and G3, which may be because of the shallow well depth of G1(1) and G3. The highest correlation between the groundwater level variation and a five-day accumulated rainfall occurs at G1(2), which might be caused by the deep well depth of G1 (2). Therefore, the accumulated rainfall over different durations is selected as an input variable in the ANN models for estimating the groundwater level variations. In addition, the differences in geological structure and depth at each groundwater monitoring well produces different relationships between the groundwater level variations and the accumulated rainfall of different durations. As shown in Figure 6, the highest correlation between the groundwater level variation and twoday accumulated rainfall occurs at G4(1), G4(2) and G2, which might result from the gravel of the geological structure at G4 and G2. The highest correlation between groundwater level variations and three-day accumulated rainfall occurs at G1(1) and G3, which may be because of the shallow well depth of G1(1) and G3. The highest correlation between the groundwater level variation and a fiveday accumulated rainfall occurs at G1(2), which might be caused by the deep well depth of G1(2). Therefore, the accumulated rainfall over different durations is selected as an input variable in the ANN models for estimating the groundwater level variations. In addition, the differences in geological structure and depth at each groundwater monitoring well produces different relationships between the groundwater level variations and the accumulated rainfall of different durations.

Effective Rainfall Analysis
The different quantities of rainfall may have different impacts on the groundwater level variation. This study considers the average rainfalls associated with different thresholds for evaluating the rainfall amount impacts on the groundwater level variations at G1 to G4 by the use of

Effective Rainfall Analysis
The different quantities of rainfall may have different impacts on the groundwater level variation. This study considers the average rainfalls associated with different thresholds for evaluating the rainfall amount impacts on the groundwater level variations at G1 to G4 by the use of the Pearson correlation coefficient. The duration of the accumulated rainfall adopts the result of Section 4.1 Duration of accumulated rainfall analysis: G4(1), G4(2) and G2-two days; G1(1) and G3-three days; G1(2)-five days. This study includes two types of screening: (1) Lower bound screening: Delete the data sets below the rainfall threshold The rainfall data (a total more than 3986) were screened at thresholds from 1-50 or 1-100 mm (with 1 mm increasing). Figure 7 shows the Pearson correlation between the groundwater level variations and the accumulated rainfall at different lower bound thresholds. In Figure 7a, the correlation patterns of G4(1) and G4(2) change slightly, and the correlation curve rises at the 13 mm threshold of the accumulated rainfall. This might be due to the greater structural permeability of G4 (gravel), as well as the location of G4 close to the Jhuoshui river, so that the groundwater level at G4 is affected by rainfall and the stream flow at the same time. The correlation curve of G2 rises obviously at the accumulated rainfall threshold of 26 mm. It is suspected that this causes saturation when the accumulated rainfall is 26 mm, and furthermore, the excess rainfall causes percolation to recharge the groundwater layer. Therefore, the effective rainfall thresholds affecting the groundwater level variation at G2 can be identified as 26 mm. In Figure 7b, the correlation pattern of G1(1) and G3 are uniform. At G1(1), the correlation increases gradually at the accumulated rainfall threshold of 34 mm, and the highest correlation coefficient is 0.8 at the accumulated rainfall threshold of 63 mm. In Figure 7c, the correlation pattern of G1 (2) is similar to G2. The correlation curve of G1(2) rises obviously at the threshold of 48 mm of the accumulated rainfall, which causes effective groundwater recharge activity when the accumulated rainfall rises to 48 mm in five days. (2) and G2-two days; G1(1) and G3three days; G1(2)-five days. This study includes two types of screening: (1) Lower bound screening: Delete the data sets below the rainfall threshold The rainfall data (a total more than 3986) were screened at thresholds from 1-50 or 1-100 mm (with 1 mm increasing). Figure 7 shows the Pearson correlation between the groundwater level variations and the accumulated rainfall at different lower bound thresholds. In Figure 7a, the correlation patterns of G4(1) and G4(2) change slightly, and the correlation curve rises at the 13 mm threshold of the accumulated rainfall. This might be due to the greater structural permeability of G4 (gravel), as well as the location of G4 close to the Jhuoshui river, so that the groundwater level at G4 is affected by rainfall and the stream flow at the same time. The correlation curve of G2 rises obviously at the accumulated rainfall threshold of 26 mm. It is suspected that this causes saturation when the accumulated rainfall is 26 mm, and furthermore, the excess rainfall causes percolation to recharge the groundwater layer. Therefore, the effective rainfall thresholds affecting the groundwater level variation at G2 can be identified as 26 mm. In Figure 7b, the correlation pattern of G1(1) and G3 are uniform. At G1(1), the correlation increases gradually at the accumulated rainfall threshold of 34 mm, and the highest correlation coefficient is 0.8 at the accumulated rainfall threshold of 63 mm. In Figure  7c, the correlation pattern of G1 (2) is similar to G2. The correlation curve of G1(2) rises obviously at the threshold of 48 mm of the accumulated rainfall, which causes effective groundwater recharge activity when the accumulated rainfall rises to 48 mm in five days.   (2) Upper bound screening: deleting the data sets over the rainfall threshold   (2) Upper bound screening: deleting the data sets over the rainfall threshold Figure 8 demonstrates the Pearson correlation between the groundwater level variations and the accumulated rainfall at different upper bound thresholds. All the groundwater monitoring wells show the same correlation patterns well.  (2) Upper bound screening: deleting the data sets over the rainfall threshold    With the higher upper bound rainfall threshold (which retain more heavy rainfall data), the correlation coefficient increases. This indicates that the groundwater level variations increase with more accumulated rainfall. This study can identify the lower bound threshold of the accumulated rainfall affecting the groundwater level effectively, but the authors cannot identify the upper bound threshold of accumulated rainfall.

Estimation of Groundwater Level Variations
This study built two estimation models (BPNN and ANFIS) for the groundwater level variations for each groundwater monitoring well. The learning and training efficiency of ANN models decrease if the data from all hydrology information is incorporated into the models. Therefore, the GT was used to determine the critical input factors (i.e., rainfall and streamflow) that correlated strongly with the groundwater level variations for ANN models. According to Chang et al. [35], it is better to use rainfall gauging stations as well as streamflow gauging stations as simultaneous input factors while constructing the groundwater level variation models at G1 to G4. Therefore, the study adopts the GT to evaluate the critical rainfall stations from G1 to G4, and streamflow gauging stations S1 and S2 are used as two other inputs for the ANN models. However, due to uncertainty as to whether or not the streamflow information is the critical factor of the groundwater level in the proximal-fan areas, this study adopts the GT to determine the critical rainfall gauging stations and streamflow gauging stations at G5 to G8. The GT is also used to detect rainfall durations (2-5 days) that have the highest correlations with the groundwater level variations at all wells. Table 4 shows the results of the GT analysis.  With the higher upper bound rainfall threshold (which retain more heavy rainfall data), the correlation coefficient increases. This indicates that the groundwater level variations increase with more accumulated rainfall. This study can identify the lower bound threshold of the accumulated rainfall affecting the groundwater level effectively, but the authors cannot identify the upper bound threshold of accumulated rainfall.

Estimation of Groundwater Level Variations
This study built two estimation models (BPNN and ANFIS) for the groundwater level variations for each groundwater monitoring well. The learning and training efficiency of ANN models decrease if the data from all hydrology information is incorporated into the models. Therefore, the GT was used to determine the critical input factors (i.e., rainfall and streamflow) that correlated strongly with the groundwater level variations for ANN models. According to Chang et al. [35], it is better to use rainfall gauging stations as well as streamflow gauging stations as simultaneous input factors while constructing the groundwater level variation models at G1 to G4. Therefore, the study adopts the GT to evaluate the critical rainfall stations from G1 to G4, and streamflow gauging stations S1 and S2 are used as two other inputs for the ANN models. However, due to uncertainty as to whether or not the streamflow information is the critical factor of the groundwater level in the proximal-fan areas, this study adopts the GT to determine the critical rainfall gauging stations and streamflow gauging stations at G5 to G8. The GT is also used to detect rainfall durations (2-5 days) that have the highest correlations with the groundwater level variations at all wells. Table 4 shows the results of the GT analysis.  The node numbers of the hidden layer in BPNN and the rule numbers in ANFIS were determined by trial and error. The optimal model structure is determined by the lowest RMSE value and the highest correlation coefficient in the validation phases at each groundwater monitoring station.
The observations were divided into the training, validation and testing phases in the ratio of 5:2.5:2.5 or 6:2:2. Table 5 shows the estimation results of BPNN and ANFIS at each groundwater monitoring well (G1 to G8), and Table 6 shows the statistics of the observed data.  The results indicate that the ANN models provide an effective simulation and prediction of the groundwater level variation because the observations and the estimations have high correlations and small RMSE values. When comparing Tables 5 and 6, the RMSEs are smaller than the standard deviation at G1 to G4. When comparing the two types of models, the estimations from BPNN are better than those from the ANFIS model. In particular, BPNN shows more than 23% improvement in terms of the correlation coefficient at G1(1), G2 and G5. Therefore, when comparing the two areas of this study, it is easier to predict the groundwater level variation with adequate reliability and accuracy in mountainous areas than in proximal-fan areas.

Conclusions
Water resource deficiency is a global problem, especially under severe climate change. In Taiwan, groundwater is considered an important alternative water resource due to its low cost and convenient accessibility. The improper groundwater development may result in disasters coupled with environmental and economic losses, and strategic development for groundwater conservation is therefore a critical issue. Precipitation is the main source of groundwater recharge, so this study used statistical methods to explore the relationship between precipitation (duration and amount) and the groundwater level variation. The statistical results demonstrate that the connection of precipitation and groundwater can be described by the hydrogeological features. The geological structure (potential high-porosity and high-permeability) of the gravel and the thin soil thickness (shallow monitoring wells) resulted in a high correlation between the groundwater level variation and the short precipitation duration. The results also show that the groundwater level variation in the high-permeability geological structures would have minor effects on accumulated rainfall. In the other words, the high-permeability area can be considered to be an area sensitive to precipitation.
After identifying the relationship between precipitation and the groundwater level variations, this study built an estimation model of the groundwater level variation by artificial intelligence techniques. This study demonstrates that limited data can be used to build a robust estimation model for the groundwater level variation in the potential groundwater recharge area by artificial intelligence techniques. In other words, a model estimating the groundwater level variations with less parameter bias can be built since this model needs only the streamflow and the rainfall variables (which are easier to measure) as input items. This model feature is helpful when applied to complex hydrogeological systems. Furthermore, from the authors' viewpoint, the groundwater level variation is an important issue for water resource management, especially in the potential recharge area. If information can be accurately provided about the variation in the groundwater level to water resources decision makers, they can develop more appropriate water resource allocations or water management policies. Overall, the model shows better performance for high-permeability areas (mountain regions) than the low-permeability area (the proximal-fan region). This study also adopted the Gamma test to deal with the complex correlation between the surface water and the groundwater. The model results show that neural network models can provide accurate performance with suitable data preprocessing. In summary, the results indicate that neural network-based estimation models perform well. These results can also provide valuable information for the prevention and treatment of land subsidence and can serve as an effective reference for water resource management in the alluvial fan and mountain areas. The results also demonstrate that the application of artificial neural networks in the study of groundwater is a promising avenue.