Elsevier

Journal of Environmental Management

Volume 182, 1 November 2016, Pages 308-321

Research article
Groundwater level prediction using a SOM-aided stepwise cluster inference model

https://doi.org/10.1016/j.jenvman.2016.07.069

Highlights

  • We propose a coupled clustering approach to support groundwater level predictions.

  • The spatial clusters of regional groundwater piezometers are derived through SOM.

  • Stepwise cluster inference models are developed to perform multisite predictions.

  • The approach is demonstrated in an irrigation area of the Hexi Corridor, China.

  • A post-processor such as the AR error model helps to improve the prediction accuracy.

Abstract

Accurate groundwater level (GWL) prediction helps sustain a reliable water supply for domestic, agricultural and industrial uses as well as ecological services, especially in arid and semi-arid areas. This paper presents a regional GWL modeling framework that couples spatial and temporal clustering techniques. Specifically, the self-organizing map (SOM) was applied to identify spatially homogeneous clusters of GWL piezometers, while GWL time series were forecast with a stepwise cluster multisite inference model using various predictors, including climate conditions, well extractions, surface runoff, reservoir operations and GWL measurements at previous time steps. The proposed approach was then demonstrated for an arid irrigation district in the western Hexi Corridor, northwest China. Spatial clustering analysis identified 6 regionally representative central piezometers out of 30, for which sensitivity and uncertainty analyses of the GWL predictions were carried out. Because the stepwise cluster tree provided uncertain predictions, an AR(1) error model was added to the mean prediction to forecast GWL 1 month ahead. Model performance indicators suggest that the modeling system is a useful tool to support informed groundwater resource management in arid areas, and that it has great potential to be extended to other areas or regions.
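
The AR(1) error correction mentioned above can be illustrated with a minimal sketch. It assumes the residual is defined as observed minus predicted GWL and that the autoregressive coefficient is estimated by least squares without an intercept; the function names `fit_ar1` and `corrected_forecast` and all numbers are hypothetical, and the authors' actual formulation may differ.

```python
import numpy as np


def fit_ar1(residuals):
    """Least-squares estimate of phi in e_t = phi * e_(t-1) + noise (no intercept)."""
    e = np.asarray(residuals, dtype=float)
    prev, curr = e[:-1], e[1:]
    denom = float(np.dot(prev, prev))
    # With all-zero residuals there is nothing to correct, so fall back to phi = 0.
    return float(np.dot(prev, curr)) / denom if denom > 0 else 0.0


def corrected_forecast(mean_prediction_next, last_observed, last_predicted, phi):
    """Add the expected next-step error to the base model's mean prediction."""
    last_error = last_observed - last_predicted
    return mean_prediction_next + phi * last_error


# Hypothetical training residuals (observed minus predicted GWL, in metres).
training_residuals = [1.0, 0.5, 0.25, 0.125]
phi = fit_ar1(training_residuals)  # 0.5 for this geometric series

# One-month-ahead forecast: base prediction 10.0 m, last month observed 9.0 m
# against a prediction of 8.8 m, so the correction is phi * 0.2 m.
forecast = corrected_forecast(10.0, 9.0, 8.8, phi)
print(phi, forecast)
```

A positive coefficient means that an under-prediction last month raises the forecast for the next month, which is the intended effect when the base model's errors persist over time.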

Introduction

Groundwater is commonly the most important water resource in semi-arid and arid areas, which are often subject to water shortage. It plays a fundamental role in supplying clean and safe water to competing domestic, industrial and agricultural uses, and increasing attention is also paid to its significance for ecological integrity. However, groundwater aquifer systems are typically complex, highly nonlinear, multi-scale and random as a result of the frequent interactions between surface water and groundwater as well as acute human disturbance (Nourani et al., 2015). Thus, effective modeling techniques are required to support efficient groundwater management strategies. For dynamic groundwater level (GWL) prediction, physically based or conceptual models represent the hydrological variables and physical processes in real-world systems (Han et al., 2015), but their prediction accuracy is limited in practice by unavoidable discrepancies between the model and the real-world system (Adamowski and Chan, 2011, Nourani et al., 2015, Salas et al., 1990). As water resources become increasingly scarce while populations continue to grow, improvements and innovations in groundwater prediction become critical. Hence, black box or data-driven models such as Artificial Neural Networks (ANNs) have been widely employed by hydrogeologists (Chen et al., 2010, Coppola et al., 2005, Izady et al., 2013, Mohanty et al., 2010, Tapoglou et al., 2014, Yoon et al., 2011, Zahmatkesh et al., 2015).

Although various data-driven models have been developed to predict GWL fluctuations, there is no consensus on how to select an appropriate and efficient model for a real case (Coulibaly et al., 2001). Considering the “multiple inputs-multiple outputs” structure of regional GWL prediction models, a promising approach is Stepwise Cluster Analysis (SCA), which has recently been widely used for flow prediction by virtue of its ability to represent the nonlinear and complex relationships between various inputs and response variables (Fan et al., 2015, Huang et al., 2006, Li et al., 2015b). Moreover, it has proved to be an effective method for air-quality prediction and pilot-scale groundwater simulation (Huang et al., 2006, Qin et al., 2007, Sun et al., 2009). However, GWL prediction at a regional level remains difficult, given the complexities of specific hydrogeological conditions and the interactions between groundwater, surface water and climatic factors (Adamowski and Chan, 2011, Dash et al., 2010, Nourani et al., 2011). Predictor selection and parameter setting during the training phase also affect the reliability and robustness of simulations, so an optimal model configuration is key to generating reliable simulations when SCA is applied. To deal with these issues, a stepwise cluster multisite inference model based on SCA and designed specifically for accurate prediction of regional GWL fluctuations would be indispensable.

Generally, data-driven models use statistical techniques instead of numerical simulation to relate the system response to various inputs, which are termed predictors. As such, these models are able to “learn” the system behavior of interest by exploring the patterns of representative data. Accordingly, the data quality of both predictors and training samples influences the model performance, and the introduction of irrelevant and redundant information might mislead the knowledge discovery process during the training phase and further yield unreliable predictions (Lábó, 2012). To tackle this dilemma, on the one hand, pre-processing of the raw data is recommended to achieve accurate forecasting, as conducted in many studies (Chen et al., 2010, Chen et al., 2011, Moosavi et al., 2013, Nourani et al., 2015). On the other hand, post-processing procedures have also been adopted to correct the predictions (Li et al., 2015a, Morawietz et al., 2011). Accordingly, auxiliary procedures based on pre-processing or post-processing are favorable options for obtaining more accurate GWL predictions.

Regional GWL observations usually comprise GWL time series for many piezometers, and a clustering technique may be preferred as a spatial data pre-processing tool to identify the regional characteristics from several representative observations instead of all observations involved. This reduction of dimension supports efficient and informed decisions when black box models are used, although some loss of GWL information might occur. Hence, intensive modeling efforts can be focused on these representative sites, such as the centroids of the obtained homogeneous clusters. As an unsupervised machine learning technique, the self-organizing map (SOM) reduces the dimensions of high-dimensional data, and it can reveal the complex, nonlinear, and statistical relationships between high-dimensional data items on a low-dimensional display so as to allow optimal clusters to be determined (Chen et al., 2010, Kalteh et al., 2008, Kohonen, 1997, Nourani et al., 2015, Yang et al., 2012). Accordingly, the dimensionality of input variables as well as the resulting model complexity is decreased (Hsu and Li, 2010, Hsu et al., 2002, Kalteh et al., 2008, Nourani et al., 2013, Nourani et al., 2015).

Thus, a promising approach to accurate and efficient regional GWL prediction would combine a spatial clustering method and a data-driven model with pre/post-processing procedures. With multisite representative GWL observations considered at the same time, this leads to a SOM-aided stepwise cluster multisite inference model for GWL prediction. In this study, this model is developed to predict GWL a month into the future, and this paper is arranged as follows. The study area, an arid irrigation district in the western Hexi Corridor of China, is introduced first. The proposed methodology is then presented with respect to the modeling framework, SOM, stepwise cluster multisite inference and autoregressive error models as well as model performance indicators. Afterwards, the results are presented and discussed, followed by summarized conclusions.

Section snippets

Study area and data

The study area is located in the Shule River watershed of China, which covers an area of approximately 160,000 km² and is drained by major inland rivers such as the Shule River, Dang River and Shiyou River (Fig. 1). Moreover, the Shule River is one of the three longest inland rivers in the Hexi Corridor, with a length of 670 km. As depicted in Fig. 1, most of the Shule River watershed is situated in Gansu Province, while almost all source waters are formed in the mountains of Qinghai

Methodology

Fig. 4 illustrates the flowchart of the proposed forecasting framework. Generally, SOM is used as a spatial clustering method to group the GWL piezometers into several clusters, the number of which is determined with the help of the non-hierarchical K-means classification method (Tarsitano, 2003), while the future GWL elevations for the central piezometers are predicted by a temporal clustering method, namely the stepwise cluster inference model. At the first step, input pattern analysis

Results of SOM

For the SOM output layer, hexagonal discrete lattices are usually preferred for visualization (Kalteh et al., 2008). To accomplish spatial clustering of GWL piezometers, a lattice of 6 × 5 hexagons was used to illustrate the similarities of GWL observations for 30 wells in the study area (Fig. 3). In this study, a 30 × 156 matrix (i.e., 156 normalized GWL records for each piezometer from 1998 to 2010) comprised the high-dimensional inputs into the SOM training process. Fig. 5(a)
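
As a minimal sketch of how a SOM with the dimensions reported here could be trained, the example below builds a 6 × 5 hexagonal lattice and fits it to a 30 × 156 matrix. Everything beyond those two dimensions is an assumption made for illustration: the synthetic seasonal data, the per-well min-max normalization, the Gaussian neighborhood, the linear decay schedules, and the function names `hex_grid`, `train_som` and `best_matching_units`. It is not the authors' implementation.

```python
import numpy as np


def hex_grid(rows, cols):
    """Planar node centres of a hexagonal lattice (odd rows shifted by half a cell)."""
    return np.array([(c + 0.5 * (r % 2), r * np.sqrt(3) / 2)
                     for r in range(rows) for c in range(cols)])


def train_som(data, rows=6, cols=5, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Online SOM training with a Gaussian neighbourhood (assumed schedules)."""
    rng = np.random.default_rng(seed)
    grid = hex_grid(rows, cols)
    # Initialise node weights from randomly chosen input rows.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # radius shrinks from sigma0 to 0.5
        x = data[rng.integers(len(data))]
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))  # best-matching unit
        grid_dist2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights


def best_matching_units(data, weights):
    """Index of the closest SOM node for every input row."""
    dist2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return dist2.argmin(axis=1)


# Synthetic stand-in for 30 wells x 156 monthly records (not the study data).
rng = np.random.default_rng(1)
months = np.arange(156)
raw = np.array([np.sin(2 * np.pi * months / 12 + rng.uniform(0, np.pi))
                + rng.normal(0, 0.1, 156) for _ in range(30)])
span = raw.max(axis=1, keepdims=True) - raw.min(axis=1, keepdims=True)
gwl = (raw - raw.min(axis=1, keepdims=True)) / span  # min-max normalisation per well

weights = train_som(gwl)
bmu = best_matching_units(gwl, weights)
print(bmu)
```

Wells that map to the same or neighboring nodes have similar normalized GWL patterns, which is the property a subsequent clustering step can exploit.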

Summary

In this study, a SOM-aided stepwise cluster multi-site inference model was developed to make regional GWL predictions a month ahead. To represent various patterns of GWL fluctuations, a self-organizing map coupled with the non-hierarchical K-means classification method was applied to identify the optimal clusters for GWL observation wells as well as the central piezometers. According to the available data, a total of 38 candidate predictors were used to predict the GWL. Based on the obtained

Acknowledgement

This study was financially supported by the National Basic Research Program of China (973 Program 2013CB036402), National Key Technology Research and Development Program of the Ministry of Science and Technology of China (2013BAB05B03), Research and Development Special Fund for Public Welfare Industry of the Ministry of Water Research in China (201501028) and China Postdoctoral Science Foundation (2015M571048). Also, the authors wish to acknowledge both the editors and reviewers for their

References (41)

  • W. Sun et al., A stepwise-cluster microbial biomass inference model in food waste composting, Waste Manag. (2009)

  • E. Tapoglou et al., A spatio-temporal hybrid neural network-Kriging model for groundwater level simulation, J. Hydrol. (2014)

  • A. Tarsitano, A computational study of several relocation methods for k-means algorithms, Pattern Recognit. (2003)

  • Y. Yang et al., An integrated SOM-based multivariate approach for spatio-temporal patterns identification and source apportionment of pollution in complex river network, Environ. Pollut. (2012)

  • H. Yoon et al., A comparative study of artificial neural networks and support vector machines for predicting groundwater levels in a coastal aquifer, J. Hydrol. (2011)

  • Z. Zahmatkesh et al., Uncertainty based modeling of rainfall-runoff: combined differential evolution adaptive Metropolis (DREAM) and K-means clustering, Adv. Water Resour. (2015)

  • P.J. Brockwell et al., Introduction to Time Series and Forecasting (1996)

  • L. Chen et al., Groundwater level prediction using SOM-RBFN multisite model, J. Hydrol. Eng. (2010)

  • L.H. Chen et al., Application of integrated back-propagation network and self-organizing map for groundwater level forecasting, J. Water Resour. Plan. Manag. (2011)

  • E.A. Coppola et al., A neural network model for predicting aquifer water level elevations, Ground Water (2005)