Research article
Groundwater level prediction using a SOM-aided stepwise cluster inference model
Introduction
Groundwater is commonly the most important water resource in arid and semi-arid areas, which are often subject to water shortage. It plays a fundamental role in supplying clean and safe water to competing domestic, industrial and agricultural uses, and increasing attention is being paid to its significance for ecological integrity. However, aquifer systems are typically complex, highly nonlinear, multi-scale and stochastic as a result of frequent interactions between surface water and groundwater as well as intense human disturbance (Nourani et al., 2015). Effective modeling techniques are therefore required to support efficient groundwater management strategies. For dynamic groundwater level (GWL) prediction, physically based or conceptual models represent the hydrological variables and physical processes of real-world systems (Han et al., 2015), but their prediction accuracy is limited in practice by unavoidable discrepancies between the model and the real-world system (Adamowski and Chan, 2011, Nourani et al., 2015, Salas et al., 1990). As water resources grow scarcer while populations expand, improvements and innovations in groundwater prediction become critical. Hence, black-box or data-driven models such as Artificial Neural Networks (ANNs) have been widely employed by hydrogeologists (Chen et al., 2010, Coppola et al., 2005, Izady et al., 2013, Mohanty et al., 2010, Tapoglou et al., 2014, Yoon et al., 2011, Zahmatkesh et al., 2015).
Although various data-driven models have been developed to predict GWL fluctuations, there is no consistent agreement on how to select an appropriate and efficient model for a real case (Coulibaly et al., 2001). Considering the “multiple inputs-multiple outputs” structure of regional GWL prediction models, a promising approach is Stepwise Cluster Analysis (SCA), which has recently been widely used for flow prediction by virtue of its ability to represent nonlinear and complex relationships between various inputs and response variables (Fan et al., 2015, Huang et al., 2006, Li et al., 2015b). It has also proved effective for air-quality prediction and pilot-scale groundwater simulation (Huang et al., 2006, Qin et al., 2007, Sun et al., 2009). However, GWL prediction at a regional level remains difficult, considering the complexity of specific hydrogeological conditions and the interactions between groundwater, surface water and climatic factors (Adamowski and Chan, 2011, Dash et al., 2010, Nourani et al., 2011). Predictor selection and parameter setting during the training phase also affect the reliability and robustness of the simulations, so an optimal model configuration is key to generating reliable simulations when SCA is applied. To deal with this issue, a stepwise cluster multisite inference model based on SCA and designed specifically for accurately predicting regional GWL fluctuations is needed.
Generally, data-driven models use statistical techniques instead of numerical simulation to relate the system response to various inputs, which are termed predictors. As such, these models are able to “learn” the system behavior of interest by exploring the patterns in representative data. Accordingly, the quality of both the predictors and the training samples influences model performance, and irrelevant or redundant information may mislead the knowledge discovery process during the training phase and yield unreliable predictions (Lábó, 2012). To address this problem, on the one hand, pre-processing of the raw data has been recommended to achieve accurate forecasting, as conducted in many studies (Chen et al., 2010, Chen et al., 2011, Moosavi et al., 2013, Nourani et al., 2015). On the other hand, post-processing procedures have also been widely adopted to correct the predictions (Li et al., 2015a, Morawietz et al., 2011). Accordingly, auxiliary procedures based on pre-processing or post-processing are favorable options for obtaining more accurate GWL predictions.
Regional GWL observations usually comprise GWL time series for many piezometers, and a clustering technique may be preferred as a spatial data pre-processing tool to identify regional characteristics from several representative observations rather than from all observations involved. This reduction of dimension supports efficient and informed decisions when black-box models are used, although some loss of GWL information might occur. Intensive modeling efforts can then be focused on these representative sites, such as the centroids of the resulting homogeneous clusters. As an unsupervised machine learning technique, the self-organizing map (SOM) reduces the dimensions of high-dimensional data, and it can reveal the complex, nonlinear, statistical relationships between high-dimensional data items on a low-dimensional display so that optimal clusters can be determined (Chen et al., 2010, Kalteh et al., 2008, Kohonen, 1997, Nourani et al., 2015, Yang et al., 2012). Accordingly, the dimensionality of the input variables as well as the resulting model complexity is decreased (Hsu and Li, 2010, Hsu et al., 2002, Kalteh et al., 2008, Nourani et al., 2013, Nourani et al., 2015).
Thus, a promising approach to accurate and efficient regional GWL prediction is to combine a spatial clustering method and a data-driven model with pre- and post-processing procedures. Considering multisite representative GWL observations at the same time leads to a SOM-aided stepwise cluster multisite inference model for GWL prediction. In this study, such a model is developed to predict GWL one month ahead. The paper is arranged as follows. The study area, an arid irrigation district in the western Hexi Corridor of China, is introduced first. The proposed methodology is then presented with respect to the modeling framework, SOM, stepwise cluster multisite inference, autoregressive error models and model performance indicators. Afterwards, the results are presented and discussed, followed by summarized conclusions.
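To make the post-processing idea concrete, the following is a minimal sketch of an autoregressive error correction, assuming a first-order (AR(1)) error model fitted by least squares. The order of the model, the residual definition (observed minus predicted) and the function names `fit_ar1` and `correct_forecast` are illustrative assumptions and are not taken from the paper.

```python
def fit_ar1(residuals):
    """Least-squares estimate of rho in e_t = rho * e_{t-1} + noise.

    `residuals` are past errors (observed minus predicted) in time order.
    Assumption: a zero-mean AR(1) error model, chosen here for illustration.
    """
    pairs = list(zip(residuals[1:], residuals[:-1]))
    num = sum(cur * prev for cur, prev in pairs)
    den = sum(prev * prev for _, prev in pairs)
    return num / den if den else 0.0


def correct_forecast(raw_forecast, last_residual, rho):
    """Add the expected next error to the raw one-step-ahead forecast."""
    return raw_forecast + rho * last_residual


# Hypothetical residuals that halve every month, so rho should be 0.5.
past_residuals = [1.0, 0.5, 0.25, 0.125]
rho = fit_ar1(past_residuals)
corrected = correct_forecast(10.0, past_residuals[-1], rho)
```

In this sketch the correction only uses the most recent residual, so it needs the latest observation to be available before each one-month-ahead forecast is issued.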
Study area and data
The study area is located in the Shule River watershed of China, which covers an area of approximately 160,000 km² and is drained by major inland rivers such as the Shule River, Dang River and Shiyou River (Fig. 1). The Shule River is one of the three longest inland rivers in the Hexi Corridor, with a length of 670 km. As depicted in Fig. 1, most of the Shule River watershed is situated in Gansu Province, while almost all source waters are formed in the mountains of Qinghai
Methodology
Fig. 4 illustrates the flowchart of the proposed forecasting framework. SOM is used as a spatial clustering method to group the GWL piezometers into several clusters, the number of which is determined with the help of the non-hierarchical K-means classification method (Tarsitano, 2003), while future GWL elevations at the central piezometers are predicted with a temporal clustering method, the stepwise cluster inference model. At the first step, input pattern analysis
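The grouping step can be sketched as follows: a plain K-means pass over a set of vectors, followed by picking one representative member per cluster. This is a simplified illustration and not the authors' implementation. Taking the member nearest to each centroid as the "central piezometer" is an assumption, the toy two-dimensional points are hypothetical, and the names `kmeans` and `central_members` are illustrative.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Plain K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def central_members(points, labels, centroids):
    """Index of the member closest to each centroid (assumed selection rule)."""
    central = {}
    for j, c in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            central[j] = min(members, key=lambda i: sq_dist(points[i], c))
    return central


# Hypothetical toy data: two well-separated groups of three "wells".
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = kmeans(toy, k=2)
central = central_members(toy, labels, centroids)
```

In practice the vectors passed to `kmeans` would be the GWL series or the trained SOM weight vectors rather than two-dimensional points, and the number of clusters `k` would be varied and compared with a cluster-validity criterion.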
Results of SOM
For the SOM output layer, hexagonal discrete lattices are usually preferred for visualization (Kalteh et al., 2008). In order to accomplish spatial clustering of GWL piezometers, a lattice with 6 × 5 hexagons was utilized to illustrate the similarities of GWL observations for 30 wells in the study area (Fig. 3). In this study, a 30 × 156 matrix (i.e., 156 normalized GWL records for each piezometer from 1998 to 2010) comprised the high dimensional inputs into the SOM training process. Fig. 5(a)
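As an illustration of the training setup described above, the sketch below fits a 6 × 5 SOM to a 30 × 156 matrix of normalized series using only the Python standard library. Several points are assumptions rather than details from the paper: the series are synthetic, min-max scaling stands in for the unspecified normalization, the lattice is rectangular instead of hexagonal to keep the neighborhood distance simple, and the learning-rate and neighborhood schedules are arbitrary.

```python
import math
import random


def normalize(series):
    """Min-max scale one series to [0, 1] (assumed normalization)."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat series
    return [(v - lo) / span for v in series]


def best_matching_unit(codebook, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(codebook,
               key=lambda node: sum((w - v) ** 2 for w, v in zip(codebook[node], x)))


def train_som(data, rows=6, cols=5, epochs=10, lr0=0.5, seed=0):
    """Online SOM training on a rectangular rows x cols lattice."""
    rng = random.Random(seed)
    dim = len(data[0])
    codebook = {(r, c): [rng.random() for _ in range(dim)]
                for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs          # linear decay, an arbitrary choice
        lr = lr0 * decay
        sigma = max(sigma0 * decay, 0.5)
        for x in rng.sample(data, len(data)):  # shuffled pass over the wells
            br, bc = best_matching_unit(codebook, x)
            for (r, c), w in codebook.items():
                grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                h = lr * math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for i, v in enumerate(x):
                    w[i] += h * (v - w[i])     # pull the node toward the sample
    return codebook


# Synthetic stand-in for 30 wells x 156 monthly records (not the study data).
noise = random.Random(1)
wells = []
for k in range(30):
    phase = 0.0 if k < 15 else math.pi / 2
    raw = [math.sin(2 * math.pi * t / 12 + phase) + 0.1 * noise.gauss(0, 1)
           for t in range(156)]
    wells.append(normalize(raw))

codebook = train_som(wells)
bmus = [best_matching_unit(codebook, w) for w in wells]
```

Wells that share a best-matching unit, or that map to neighboring units, have similar normalized series. The list `bmus` is the kind of information that a hit map on the lattice would display.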
Summary
In this study, a SOM-aided stepwise cluster multisite inference model was developed to make regional GWL predictions one month ahead. To represent the various patterns of GWL fluctuation, a self-organizing map coupled with the non-hierarchical K-means classification method was applied to identify the optimal clusters of GWL observation wells as well as the central piezometers. According to the available data, a total of 38 candidate predictors were used to predict the GWL. Based on the obtained
Acknowledgement
This study was financially supported by the National Basic Research Program of China (973 Program 2013CB036402), National Key Technology Research and Development Program of the Ministry of Science and Technology of China (2013BAB05B03), Research and Development Special Fund for Public Welfare Industry of the Ministry of Water Research in China (201501028) and China Postdoctoral Science Foundation (2015M571048). Also, the authors wish to acknowledge both the editors and reviewers for their
References (41)
- et al. A wavelet neural network conjunction model for groundwater level forecasting. J. Hydrol. (2011)
- et al. Comparison of self-organizing maps classification approach with cluster and principal components analysis for large environmental data sets. Water Res. (2007)
- et al. Chance-constrained overland flow modeling for improving conceptual distributed hydrologic simulations based on scaling representation of sub-daily rainfall variability. Sci. Total Environ. (2015)
- et al. Clustering spatial–temporal precipitation data using wavelet transform and self-organizing map neural network. Adv. Water Resour. (2010)
A stepwise cluster analysis method for predicting air quality in an urban environment. Atmos. Environ. Part B. Urban Atmos. (1992)
- et al. Parameter and modeling uncertainty simulated by GLUE and a formal Bayesian method for a conceptual hydrological model. J. Hydrol. (2010)
- et al. Review of the self-organizing map (SOM) approach in water resources: analysis, modelling and application. Environ. Model. Softw. (2008)
Validation studies of precipitation estimates from different satellite sensors over Hungary – analysis of new satellite-derived rain rate products for hydrological purposes. J. Hydrol. (2012)
- et al. Wavelet-entropy data pre-processing approach for ANN-based groundwater level modeling. J. Hydrol. (2015)
- et al. Using self-organizing maps and wavelet transforms for space–time pre-processing of satellite precipitation and runoff data in neural network based rainfall–runoff modeling. J. Hydrol. (2013)