Modified DRASTIC model for groundwater vulnerability to nitrate contamination in the Dagujia river basin, China

Due to rapid economic growth and over-exploitation of groundwater, nitrate pollution in groundwater has become very serious. The main objective of this study is to modify the DRASTIC model to identify groundwater vulnerability to nitrate pollution. The DRASTIC model was first used to analyze the intrinsic vulnerability. The DRASTIC model with the inclusion of a land-use factor (DRASTIC-LU) was put forward to map the specific vulnerability of groundwater. Furthermore, the support vector machine (SVM) was introduced to avoid the drawbacks of the overlay and index methods, and the improved integrated models DRASTIC+SVM and DRASTIC-LU+SVM were built. In total, 103 groundwater samples were collected for building and validating the models. The Root Mean Squared Error (RMSE) of DRASTIC, DRASTIC-LU, DRASTIC+SVM, and DRASTIC-LU+SVM was 0.853, 0.755, 0.631, and 0.502, respectively. The DRASTIC-LU model was more accurate than the original DRASTIC model. The results also showed that the integrated models using SVM exhibited a better correlation between the vulnerability value and nitrate pollution. The study indicated that the modified models, which include the land-use factor as well as SVM in the DRASTIC model, are more suitable for assessing groundwater vulnerability to nitrate.


INTRODUCTION
flexibility criteria structure to realize the estimation.
However, the weights and ratings are either assigned a priori or depend on the experience of assessment experts, which is the major drawback of this method. To deal with this issue, various techniques have been proposed, such as changing the weights and/or ratings of the structure, removing or adding factors, using sensitivity analyses and calibration approaches, and combining DRASTIC with the analytic hierarchy process (Secunda
The Dagujia river, known by local people as the 'Mother River', is located in Yantai city. Recently, groundwater exploitation in its basin has increased enormously, and the release of municipal and industrial wastes has seriously threatened the local groundwater environment. Meanwhile, intensive agricultural activities have resulted in the leaching of nutrients. The basin is therefore a typical area of groundwater nitrate pollution; however, little attention has been paid to groundwater nitrate contamination in this area.
In this paper, integrated models based on the DRASTIC model were put forward to assess groundwater vulnerability. The DRASTIC method was first used to assess the intrinsic vulnerability of groundwater. Then, the DRASTIC-LU method was proposed to study the specific vulnerability of the aquifer to nitrate pollution. Meanwhile, to overcome the deficiencies of the index-overlay method, SVM was combined with DRASTIC and DRASTIC-LU to build novel integrated models.
In general, the main objective of this research is to produce intrinsic and specific groundwater vulnerability maps by combining the DRASTIC (and DRASTIC-LU) method with a machine learning technique, and to evaluate the performance of each model.

STUDY AREA
The Dagujia river basin is located in the north-central part of Yantai city, Shandong, China (Figure 1). The basin lies between longitudes 120°44′00″ and 121°27′00″ E and latitudes 37°02′00″ and 37°36′00″ N, covers an area of about 2,308 km², and is bounded by the Yellow Sea to the north. The area has a warm temperate monsoon climate with four distinct seasons. The average annual temperature and annual precipitation are 12.5 °C and 629 mm, respectively. The topography is generally high in the south and low in the north. The maximum elevation, 814 m, occurs in the western part, whereas the northern part is dominated by coastal areas. The main river, the Dagujia river, is fed by the inner Jia river in the west and the outer Jia river in the east. There are also three reservoirs in the watershed: the Menlou, Anli and Taoyuan reservoirs.
In the study area, groundwater is used for domestic, industrial and agricultural purposes. Groundwater types include pore water in unconsolidated sediments, fissure-karst water in carbonate rocks, bedrock fissure water, and pore-fissure water in clastic rocks. Pore water and karst water are the main targets of intensive exploitation in this area.
Increasing groundwater withdrawal is the dominant cause of the decline in groundwater level. Annual groundwater extraction has increased continually since 1976. Since the 1990s, the government has controlled groundwater exploitation, and saltwater intrusion has slowed down. Groundwater contamination caused by seawater intrusion mainly occurs near the Dagujia river estuary. However, as a result of rapid social development in recent decades, nitrate pollution of groundwater has become more serious, and the maximum nitrate concentration in groundwater has exceeded 200 mg/L. Therefore, there is an urgent need to study groundwater nitrate pollution in the river basin.

DATA AND METHODS
The classic DRASTIC model and its modified version, DRASTIC-LU, were applied, together with an SVM-based improved method. The natural breaks classification scheme determines the best arrangement of values into classes by minimizing the variance within classes and maximizing the variance between them (Yoon et al. ; Thapa et al. ). Therefore, to make the results of the different methods comparable, the natural breaks method was used throughout to divide the study area into the same four classes: very low, low, medium, and high vulnerability zones.
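The natural breaks scheme can be illustrated with the Fisher-Jenks dynamic-programming algorithm, which chooses the class boundaries that minimize the total within-class sum of squared deviations. The pure-Python sketch below is an illustration, not the GIS implementation used in the study; the names `jenks_breaks` and `classify` are hypothetical, and GIS packages may implement variants of the method. Its cost grows as O(n²k), so for a 30 m raster one would typically compute the breaks on a sample of cell values (an assumption, not stated in the text).

```python
def jenks_breaks(values, n_classes):
    """Fisher-Jenks natural breaks: return [min, b1, ..., max] (n_classes + 1 values)."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("need 1 <= n_classes <= number of values")
    # lower[i][j]: 1-based index where class j starts when the first i values
    # are split into j classes; cost[i][j]: the minimal total within-class SSD.
    lower = [[0] * (n_classes + 1) for _ in range(n + 1)]
    cost = [[0.0] * (n_classes + 1) for _ in range(n + 1)]
    for j in range(1, n_classes + 1):
        lower[1][j] = 1
        for i in range(2, n + 1):
            cost[i][j] = float("inf")
    for i in range(2, n + 1):
        s1 = s2 = w = 0.0
        ssd = 0.0
        for m in range(1, i + 1):
            start = i - m + 1              # candidate first element of the last class
            val = data[start - 1]
            s1 += val
            s2 += val * val
            w += 1
            ssd = s2 - s1 * s1 / w         # SSD of data[start - 1 : i]
            prev = start - 1
            if prev != 0:
                for j in range(2, n_classes + 1):
                    if cost[i][j] >= ssd + cost[prev][j - 1]:
                        lower[i][j] = start
                        cost[i][j] = ssd + cost[prev][j - 1]
        lower[i][1] = 1
        cost[i][1] = ssd
    # Walk back through the table to recover the class boundaries.
    breaks = [0.0] * (n_classes + 1)
    breaks[0], breaks[n_classes] = data[0], data[-1]
    k = n
    for j in range(n_classes, 1, -1):
        start = lower[k][j]
        breaks[j - 1] = data[start - 2]    # last value of the previous class
        k = start - 1
    return breaks


def classify(value, breaks):
    """Return the 1-based class of value (1 = lowest class)."""
    for k in range(1, len(breaks)):
        if value <= breaks[k]:
            return k
    return len(breaks) - 1
```

With `n_classes=4`, `classify` returns 1 to 4, which can be read as the very low, low, medium and high vulnerability classes.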

Preparation of nitrate concentration data
In this study, nitrate concentration was selected as the primary pollution parameter. To this end, 103 groundwater samples were collected from wells or piezometers during
Index (VI), which is calculated as the sum of the products of the prescribed ratings (r) and weights (w) assigned to each of the seven data layers above, is used to zone the region (Pacheco et al. ). The VI value was obtained using Equation (1):
$$VI = D_r D_w + R_r R_w + A_r A_w + S_r S_w + T_r T_w + I_r I_w + C_r C_w \quad (1)$$
Here D, R, A, S, T, I, and C are the seven parameters, and the subscripts r and w denote the corresponding ratings and weights, respectively. Table 1 presents the rating and weighting values used in the DRASTIC model, chosen on the basis of the related literature and local data conditions.
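Equation (1) is a cell-by-cell weighted sum of the rated layers. A minimal sketch, assuming ratings have already been assigned to each layer: the weights below are the commonly cited generic DRASTIC weights (D=5, R=4, A=3, S=2, T=1, I=5, C=3) and are an assumption here, since the study's own Table 1 values are not reproduced in this section. Grids are nested lists (rows of cells) to keep the sketch dependency-free; `drastic_index` and `drastic_index_grid` are hypothetical names.

```python
# Generic DRASTIC weights (assumption; the study's values are in its Table 1).
DRASTIC_WEIGHTS = {"D": 5, "R": 4, "A": 3, "S": 2, "T": 1, "I": 5, "C": 3}


def drastic_index(ratings, weights=DRASTIC_WEIGHTS):
    """Equation (1): VI = sum over the seven parameters of rating * weight."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    return sum(ratings[p] * weights[p] for p in weights)


def drastic_index_grid(rating_grids, weights=DRASTIC_WEIGHTS):
    """Apply Equation (1) cell by cell to co-registered rating grids."""
    rows = len(rating_grids["D"])
    cols = len(rating_grids["D"][0])
    return [
        [drastic_index({p: rating_grids[p][r][c] for p in weights}, weights)
         for c in range(cols)]
        for r in range(rows)
    ]


# Example cell with hypothetical ratings.
cell = {"D": 9, "R": 6, "A": 8, "S": 6, "T": 10, "I": 8, "C": 4}
print(drastic_index(cell))  # 45 + 24 + 24 + 12 + 10 + 40 + 12 = 167
```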

DRASTIC-LU method
Different land-use types represent different sources of nitrogen pollution and therefore different input intensities of nitrate. Land cover also affects the infiltration rate and surface runoff. Therefore, land use (LU) was added as a factor to build the DRASTIC-LU model for assessing specific vulnerability to nitrate contamination. The DRASTIC-LU Vulnerability Index (VIL) is obtained by adding the land-use factor, with an appropriate rating and weight, to VI according to Equation (2):
$$VIL = VI + LU_r LU_w \quad (2)$$
Here VI is the vulnerability index calculated using Equation (1), and LU_r and LU_w are the rating and weight of the land-use parameter, respectively. In this study, the weight of the land-use parameter was set to 4. Details of this model are presented in Table 1.
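Equation (2) adds a weighted land-use term to VI. In the sketch below, the land-use weight of 4 comes from the text, but the land-use classes and ratings in `HYPOTHETICAL_LU_RATINGS` are illustrative placeholders, not the values in Table 1; `drastic_lu_index` is a hypothetical name.

```python
LU_WEIGHT = 4  # land-use weight stated in the text

# Hypothetical land-use ratings for illustration only (not Table 1 values).
HYPOTHETICAL_LU_RATINGS = {"cropland": 8, "built-up": 6, "forest": 2}


def drastic_lu_index(vi, lu_rating, lu_weight=LU_WEIGHT):
    """Equation (2): VIL = VI + LU_r * LU_w."""
    return vi + lu_rating * lu_weight


# A cell with VI = 167 under (hypothetical) cropland:
print(drastic_lu_index(167, HYPOTHETICAL_LU_RATINGS["cropland"]))  # 167 + 8 * 4 = 199
```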

Support vector machine (SVM)
SVM, a relatively new machine learning method, is a supervised learning algorithm and one of the most powerful prediction methods. It is based on the structural risk minimization principle, whereas most other artificial intelligence models, such as artificial neural networks, rely on empirical risk minimization.
Therefore, the SVM method can simultaneously reduce the empirical error, the model complexity and the probability of overfitting. A training data set T is given by Equation (3):
$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\} \quad (3)$$
Here m is the number of samples in the data set, x_i is the input vector of sample i (x_i ∈ R^N), y_i is the corresponding output value (y_i ∈ R), and R^N and R are the N-dimensional and one-dimensional vector spaces, respectively.
The aim of SVM is to find the optimal separating hyperplane, which separates the different classes with the widest possible margin. The separating hyperplane is given by Equation (4):
$$w^{T} x + b = 0 \quad (4)$$
Here w is the normal vector of the separating hyperplane (w ∈ R^N), and b is the bias.
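For a linearly separable problem, finding the hyperplane can be formulated as a quadratic programming problem with the objective function in Inequality (5):
$$\min_{w,b} \frac{1}{2}\lVert w \rVert^{2} \quad \text{s.t.} \quad y_i \left(w^{T} x_i + b\right) \ge 1, \quad i = 1, \ldots, m \quad (5)$$
For nonlinear problems, a kernel function is employed to transform the nonlinear classification into a linear classification problem in a higher-dimensional feature space, in which an optimal separating hyperplane is sought. The slack variables (ξ_i) and the penalty factor (C) modify the constraints and the objective function, so that Inequality (5) becomes Inequality (6):
$$\min_{w,b,\xi} \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y_i \left(w^{T} \phi(x_i) + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, m \quad (6)$$
where φ(·) denotes the feature mapping induced by the kernel. Many SVM studies have shown that the Radial Basis Function (RBF) kernel performs better than other kernels in groundwater and hydrologic prediction (Arabgol et al. ). Therefore, the Gaussian RBF kernel was selected in this study, as given by Equation (7):
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right) \quad (7)$$
Here γ is the Gaussian parameter.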
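Equation (7) can be evaluated directly. This dependency-free sketch computes the Gaussian RBF kernel for two feature vectors and a full kernel (Gram) matrix for a small sample; `rbf_kernel` and `gram_matrix` are hypothetical helper names. A larger γ makes the kernel decay faster with distance, so each training sample influences a smaller neighborhood.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """Equation (7): K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    if len(x_i) != len(x_j):
        raise ValueError("vectors must have the same dimension")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * sq_dist)


def gram_matrix(samples, gamma):
    """Kernel matrix K[i][j] = K(x_i, x_j) for a list of feature vectors."""
    return [[rbf_kernel(a, b, gamma) for b in samples] for a in samples]


# Identical vectors give 1; the value falls towards 0 as distance grows.
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))  # exp(-1)
```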
The parameters C and γ in Inequality (6) and Equation (7), respectively, have a significant effect on the accuracy of the SVM model, and this sensitivity is regarded as the major drawback of SVM. Therefore, ten-fold cross-validation was used to select the optimal values of C and γ (Pradhan ; Naghibi et al. ).
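The ten-fold cross-validated search for C and γ can be sketched with scikit-learn's `SVC` and `GridSearchCV`. This is an illustration under assumptions, not the study's code: it treats the task as classification of vulnerability classes, uses synthetic stand-in data from `make_classification` (103 samples, 7 features, 4 classes) instead of the real samples, standardizes the inputs (a common but here assumed preprocessing step), and searches an assumed log2-spaced grid.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (hypothetical): 103 samples, 7 features, 4 classes.
X, y = make_classification(n_samples=103, n_features=7, n_informative=5,
                           n_redundant=0, n_classes=4, n_clusters_per_class=1,
                           random_state=0)

# Scaling + RBF-kernel SVM; step names are generated as "standardscaler"/"svc".
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Assumed log2-spaced search grid for the penalty C and kernel parameter gamma.
param_grid = {
    "svc__C": [2.0 ** k for k in range(-2, 11, 2)],
    "svc__gamma": [2.0 ** k for k in range(-10, 3, 2)],
}

# Ten-fold cross-validation (stratified so each fold keeps the class mix).
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

With the default `refit=True`, `search.best_estimator_` is refit on all samples using the selected C and γ and can then be applied to the raster cells.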

Data preparation
The maps of D, R, A, S, T, I, C, and LU were prepared in raster format with a resolution of 30 m in a geographic information system (GIS), and the inverse distance weighted (IDW) interpolation technique was used to convert discrete point data into continuous surfaces.
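IDW estimates each raster cell as a distance-weighted average of the observed point values, with weights 1/d^p. The section does not state the power p or the search neighborhood, so this sketch assumes p = 2 and uses all points; `idw` and `idw_grid` are hypothetical names, and a real workflow would use the GIS software's own IDW tool.

```python
import math


def idw(points, x, y, power=2.0):
    """Inverse distance weighted estimate at (x, y).

    points: iterable of (px, py, value) observations.
    power:  distance exponent; 2 is an assumed, commonly used default.
    """
    num = 0.0
    den = 0.0
    for px, py, value in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return value              # exact hit: keep the observed value
        w = 1.0 / d ** power
        num += w * value
        den += w
    if den == 0.0:
        raise ValueError("no observation points supplied")
    return num / den


def idw_grid(points, x0, y0, cell, n_rows, n_cols, power=2.0):
    """Interpolate onto a raster of square cells of size `cell` (e.g. 30 m).

    (x0, y0) is the upper-left corner; values are computed at cell centres.
    """
    return [[idw(points, x0 + (c + 0.5) * cell, y0 - (r + 0.5) * cell, power)
             for c in range(n_cols)]
            for r in range(n_rows)]
```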

Multicollinearity test
The independence of the parameters selected for the vulnerability assessment models is important for the accuracy of the results, because a strong linear relationship among two or more input variables (multicollinearity) may bias the results. To diagnose multicollinearity among the factors, two common statistics, the tolerance (TOL) and the variance inflation factor (VIF), were calculated using Equation (8):
$$TOL_j = 1 - R_j^{2}, \qquad VIF_j = \frac{1}{TOL_j} = \frac{1}{1 - R_j^{2}} \quad (8)$$
Here R_j^2 is the coefficient of determination (R-squared) obtained by regressing parameter j on all of the other parameters. When the TOL value is <0.10 and the VIF value is >5, there is high multicollinearity among the predictor variables (Choubin et al. ).
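Equation (8) requires one auxiliary regression per factor. The dependency-free sketch below regresses each factor on all of the others by ordinary least squares with an intercept, solved through the normal equations, and returns (TOL, VIF) per factor; the helper names are hypothetical, and a statistics library would normally be used for the real data.

```python
def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfect collinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(y, xs):
    """R^2 of an OLS fit of y on the columns in xs (intercept included)."""
    rows = [[1.0] + [col[i] for col in xs] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = _solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0.0:
        raise ValueError("constant column: R^2 undefined")
    ss_res = sum((yi - sum(bv * v for bv, v in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    return 1.0 - ss_res / ss_tot


def tol_vif(columns):
    """Equation (8): TOL_j = 1 - R_j^2 and VIF_j = 1 / TOL_j for each factor j."""
    out = {}
    for name, col in columns.items():
        others = [c for other, c in columns.items() if other != name]
        tol = 1.0 - r_squared(col, others)
        out[name] = (tol, 1.0 / tol)
    return out
```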
Possible multicollinearity among the seven (DRASTIC) or eight (DRASTIC-LU) conditioning factors was examined before the assessment, and the results are presented in Tables 2 and 3. No high multicollinearity was observed among the selected parameters.

DRASTIC vulnerability map
The weighted sum of the seven DRASTIC thematic layers was used to compute VI according to Equation (1). Figure 3.
The high and moderate vulnerability zones were mainly located within the alluvial deposits and covered 6.78% and 12.9% of the study area, respectively.
The lithology of the high vulnerability zone was dominated by coarse sand and sand, so both the infiltration of contaminants from the surface and their transport in the aquifer were fast. The moderate-class region was mainly located at the edges of the foothills and in areas with well-developed geological structures, where pollutants can easily infiltrate
Table 4.
In terms of the number of correctly classified samples, the specific vulnerability results were generally more consistent with the observations than the intrinsic vulnerability results. Meanwhile, the integrated models using SVM performed better than the conventional overlay and index method.
To make the results of the different models comparable, the classes were further quantified: the high, moderate, low and very low classes were assigned values of 4, 3, 2 and 1, respectively. The Root Mean Squared Error (RMSE) was used to compare the classification performance, using Equation (9):
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^{2}} \quad (9)$$
where n is the number of samples, and P_i and O_i are the predicted and observed class values of sample i. The smaller the RMSE, the better the prediction. The RMSE results are presented in Table 4.
Based on the assessment results, the high vulnerability zone was mostly distributed along the riverbeds, where the lithology is dominated by coarse sand and sand. The moderate subarea was related to the lithology of the aquifer and the unsaturated zone. The low and very low vulnerability subareas were mainly situated in the hilly region. Groundwater pollution also showed clear spatial patterns related to intensive human activities, which should be taken into consideration.
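The class quantification and the RMSE comparison of Equation (9) can be sketched as follows. The mapping of class names to 4, 3, 2 and 1 follows the text; how each sample's observed class was derived from its nitrate concentration is not stated in this section, so the observed labels in the example are hypothetical, as is the function name `rmse`.

```python
import math

# Ordinal values used to quantify the vulnerability classes (from the text).
CLASS_VALUE = {"high": 4, "moderate": 3, "low": 2, "very low": 1}


def rmse(predicted, observed):
    """RMSE between predicted and observed class labels after quantification."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    p = [CLASS_VALUE[c] for c in predicted]
    o = [CLASS_VALUE[c] for c in observed]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, o)) / len(p))


# Hypothetical validation samples: two of four are off by one class.
pred = ["high", "low", "moderate", "very low"]
obs = ["high", "moderate", "moderate", "low"]
print(rmse(pred, obs))  # sqrt(2 / 4), about 0.707
```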
Overall, the study used a supervised AI technique to build robust models that clearly improved the accuracy of the assessment. The assessment results will provide essential information for future groundwater management and land-use planning strategies.