Construction of spatio-temporal coupling model for groundwater level prediction: a case study of Changwu area, Yangtze River Delta region of China

The dynamic monitoring data of groundwater level are an important basis for understanding the current state of groundwater development and for planning its sustainable exploitation. These data are typical spatio-temporal sequence data, characterized by non-linearity and strong spatio-temporal correlation. The trend of dynamic change in groundwater level is the key factor in the optimal allocation of groundwater resources. However, most existing groundwater level prediction models insufficiently consider temporal and spatial factors and their spatio-temporal correlation. Therefore, constructing a space-time prediction model of groundwater level that considers space-time factors, and improving the prediction accuracy of groundwater level dynamic changes, are of considerable theoretical and practical importance for the sustainable development and utilization of groundwater resources. Based on an analysis of the spatial-temporal characteristics of the groundwater level of pore confined aquifer II in the Changwu area of the Yangtze River Delta region of China, the wavelet transform method is used to remove the noise in the original data, and the K-nearest neighbor (KNN) algorithm is used to screen the spatial correlation of the monitoring wells in the study area and to reconstruct the spatial-temporal dataset, which is input to a long short-term memory (LSTM) network. A spatio-temporal prediction model of groundwater level that considers spatio-temporal factors, KNN-LSTM, is thereby constructed. The reliability and accuracy of KNN-LSTM, LSTM, support vector regression, and the autoregressive integrated moving average model are evaluated by a cross-validation algorithm. Results showed that the prediction accuracy of KNN-LSTM is 20.68%, 46.54%, and 55.34% higher than that of the other single prediction models.


INTRODUCTION
The dynamic change trend of groundwater level is an important basis for groundwater resources management and is of considerable importance for planning the sustainable utilization of regional groundwater resources (Stasik et al. ). Therefore, researchers have explored alternative methods, such as data-driven (statistical and machine learning (ML)) techniques, in the past two decades. Compared with traditional statistical analysis models, ML methods offer effective real-time prediction and can handle problems with multiple independent and dependent variables (Yoon et al. ). Moreover, ML methods bypass some complex physical hydrological processes and rely only on the statistical relationship between explanatory and response variables (Knoll et al. ). ML methods have been successfully applied to groundwater level prediction.
Some studies (Park et al. ) also revealed that the accuracy of groundwater level prediction models combined with different ML methods is higher than that of single ML methods. This finding is due to the effective extraction of specific patterns in the data (such as trends, cycles, and horizontal changes) by different methods. For example, wavelet transform (WT) (Wang et al. ) is a time-frequency localization method, which can extract time-varying (trend and periodicity) or multi-scale behaviors from the time sequence.
WT is usually combined with ML to predict groundwater level, and such a combination considerably improves the prediction accuracy compared with ML alone (Bhardwaj
Moreover, it is necessary to explore the regularities of groundwater level monitoring data sequences, to propose an effective spatio-temporal data mining method, and to build a nonlinear, high-precision hybrid model for groundwater level prediction that considers spatio-temporal factors.

Overview of study area
The study area is Changzhou City, Jiangsu Province, which is located in the hinterland of the Yangtze River Delta.
The study area, which has flat terrain, is located on the northwest edge of the Taihu Plain. The land around Taihu Lake is slightly higher in the north and lower in the south, with a north-south elevation difference of approximately 1-2 m (Lu et al. ). The study area covers approximately 1,793 km² and comprises the four municipal districts of Xinbei, Tianning, Zhonglou, and Wujin, which are collectively called the 'Changwu area.' The geographical coordinates are 31°20′-32°03′N and 119°40′-120°12′E.
The geographical location is shown in Figure 1.
The pore groundwater in the Changwu area mainly exists in Quaternary unconsolidated sediment layers. The pore phreatic aquifer and pore confined aquifers I to III are distributed from top to bottom (Chen et al. ). The pore phreatic water and the pore confined aquifer I water are recharged mainly by atmospheric precipitation, surface water, and agricultural irrigation water, and their main discharge modes are exploitation and evaporation. The water of pore confined aquifers II and III is mainly used for development and utilization, and pore confined aquifer II is the main mining layer of groundwater.
The pore confined aquifer II group belongs to the middle Pleistocene aquifer group of the Quaternary, which is distributed in the Qidong Formation and divided into the upper (II₁) and lower (II₂) aquifers. Among these aquifers, the II₂ aquifer is the main aquifer in the area because of its large thickness and excellent water yield. From the mid-   of adjacent water intake wells. This anomaly will reduce the prediction accuracy of the groundwater level regime. Therefore, preprocessing the original monitoring well data, for example by noise reduction, is necessary before constructing the groundwater level prediction model in order to improve its accuracy.
In statistics, the spatial stationary process indicates that continuous spatial variables do not change with the location of the random process (Nakagawa et al. ). Studying the spatial stationarity of spatio-temporal sequence of groundwater level involves the analysis of the spatial distribution trend of monitoring data. A mathematical curved surface is formed by fitting the spatial sequence to reflect the various characteristics of the groundwater level in the spatial region (Fang et al. ). On the basis of ignoring local anomalies, the overall variation trend of groundwater level in spatial dimension is revealed. Figure 3 is the second-order spatial trend map of the average water level of each monitoring point in the space-time sequence of groundwater level in the study area. Figure 3 shows that the groundwater level is generally high in the west and low in the east. The groundwater level first decreases in north-south direction and then increases from south to north. Therefore, the groundwater data of the study area are non-stationary in the spatial dimension.
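A second-order trend surface of this kind is commonly written as a quadratic polynomial in the well coordinates, fitted by least squares. The form below is the standard textbook expression and is given only as an assumption: the source does not state the coefficients or the fitting procedure, and the symbols $\beta_k$, $z_j$, and $\hat{z}$ are introduced here for illustration.

```latex
\hat{z}(x, y) = \beta_0 + \beta_1 x + \beta_2 y + \beta_3 x^2 + \beta_4 x y + \beta_5 y^2,
\qquad
\hat{\beta} = \arg\min_{\beta} \sum_{j=1}^{N} \bigl( z_j - \hat{z}(x_j, y_j) \bigr)^2
```

Here $z_j$ would be the average water level of monitoring well $j$ at grid coordinates $(x_j, y_j)$. A fitted surface that is not flat (nonzero $\beta_1, \ldots, \beta_5$) corresponds to the spatial non-stationarity described above.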
The temporal stationarity of the groundwater level monitoring sequence in the study area is tested by analyzing the change in groundwater level of each monitoring well over time. Figure 4 shows the water level changes of four selected representative monitoring wells in the past 10 years. The average water level of monitoring wells in the study area during the past 15 years has generally increased from year to year; that is, the overall water level shows an upward trend. Simultaneously, the average water level varies with location and with time, and small fluctuations are observed over the same time interval. Therefore, the monitoring sequence of groundwater level in the study area is non-stationary in time.

CONSTRUCTION OF THE SPATIO-TEMPORAL DATA PREDICTION COUPLING MODEL
The monitoring value of groundwater level is a kind of non-stationary and non-linear geographical space-time data (Khorrami ). The analysis of spatial-temporal characteristics of groundwater level monitoring data reveals the regularity and difference of the groundwater level distribution in time and space. A strong spatio-temporal correlation is found between sequence and noise.
Thus, the realization of groundwater level prediction should not only consider the historical data of the target monitoring wells and noise impact but also the influence of other monitoring well data associated with the target monitoring well. Therefore, constructing a spatio-temporal hybrid model for groundwater level prediction, which considers temporal and spatial factors, is necessary.
Brief introduction of spatio-temporal data prediction coupling model
The KNN-LSTM model proposed in this paper includes WT, the KNN algorithm considering spatial correlation, LSTM used to predict time sequence, and the cross-validation method employed to verify the accuracy of the model. The main steps are as follows.
Step 1: Data preparation phase. The spatio-temporal monitoring data of groundwater levels are sorted, outliers are eliminated, and the spatio-temporal sequence data set is compiled. The denoised sequence matrix $X_R$ of groundwater level for D continuous years is obtained through the WT of the original data set.
Step 2: Spatial correlation filtering phase. Based on the spatio-temporal sequence data set, the distance-weighted KNN algorithm is used to screen the spatial correlation of the denoised sequence matrix $X_R$. The K monitoring wells with the strongest spatial correlation with the target monitoring well are obtained, and the sequence matrix $X_{cR}$ of the spatially screened monitoring wells for D continuous years is constructed.
Step 3: Model training phase. The data of the K monitoring wells in $X_{cR}$ for D successive years are divided into training and test sets for model training and validation, respectively. The data of the first D-1 years are selected to construct the training set, which is input to the LSTM model to predict the groundwater level of the K monitoring wells in year D.
Step 4: Groundwater level prediction phase. The predicted groundwater levels of the K monitoring wells in the predicted year are weighted and fused in accordance with the weight of each monitoring well. The predicted groundwater level of the target monitoring well in the predicted year is obtained and taken as the initial prediction result.
Step 5

For a given study area, suppose that there are N groundwater level monitoring wells, that the target monitoring well is o, and that the groundwater level sequence of the i-th monitoring well in year d is defined as follows:

$$X_i^d = \left( x_{i,1}^d, x_{i,2}^d, \ldots, x_{i,n}^d \right)$$

where n is the number of groundwater level samples per year; n is 12 in this paper because the water level is monitored monthly. The original observation sequence $X_i^d$ of each monitoring well is decomposed by the WT. The decomposed groundwater level sequence of the i-th monitoring well in year d is obtained, and a threshold value is set to filter the noise in its high-frequency component:

$$X_i^d = X_{Ri}^d + X_{Ni}^d$$

where $X_{Ri}^d$ represents the true water level sequence of the i-th monitoring well in year d after noise removal, and $X_{Ni}^d$ represents the noise sequence of the i-th monitoring well in year d. In the prediction model, the groundwater level data of each monitoring well for D continuous years are selected as the experimental data. The real water level sequence matrix $X_{Ri}$ of the i-th monitoring well for D continuous years and the real water level sequence matrix $X_R$ of all groundwater level monitoring wells for D continuous years are respectively expressed as follows:

$$X_{Ri} = \left( X_{Ri}^1, X_{Ri}^2, \ldots, X_{Ri}^D \right), \qquad X_R = \left( X_{R1}, X_{R2}, \ldots, X_{RN} \right)^{T}$$
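The decomposition into a denoised sequence and a noise sequence can be illustrated with a minimal wavelet-threshold sketch. Several points are assumptions made for illustration and are not taken from the paper: the Haar wavelet is used in place of db4 so that the example needs only the standard library, the threshold is taken as P times the largest detail coefficient at each level, soft thresholding is applied, and the function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_decompose(signal, levels):
    """Multi-level Haar transform: returns (approximation, details), finest details first."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            break  # this simple sketch stops when the length cannot be halved
        pairs = [(approx[2 * i], approx[2 * i + 1]) for i in range(len(approx) // 2)]
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose."""
    out = list(approx)
    for det in reversed(details):
        nxt = []
        for a, d in zip(out, det):
            nxt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        out = nxt
    return out

def soft_threshold(coeffs, thr):
    """Shrink every coefficient toward zero by thr."""
    return [math.copysign(max(abs(c) - thr, 0.0), c) for c in coeffs]

def denoise(signal, levels=2, p=0.5):
    """Return the denoised sequence (the X_R part of X = X_R + X_N)."""
    approx, details = haar_decompose(signal, levels)
    cleaned = []
    for det in details:
        thr = p * max(abs(c) for c in det)  # assumed threshold rule, not from the paper
        cleaned.append(soft_threshold(det, thr))
    return haar_reconstruct(approx, cleaned)

# Hypothetical monthly levels (n = 12) for one well in one year.
x = [10.0, 10.4, 9.8, 10.1, 10.6, 10.2, 9.9, 10.3, 10.0, 10.5, 9.7, 10.2]
x_r = denoise(x)                           # denoised sequence
x_n = [a - b for a, b in zip(x, x_r)]      # noise sequence, so x = x_r + x_n
```

With 12 monthly samples this simplified transform can only halve the sequence twice, so it does not reproduce the five-level db4 decomposition reported for the study; a wavelet library would be needed for that.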

KNN-LSTM based on distance weighting
The spatial correlation of the real water level sequence $X_R$ of the groundwater level monitoring wells is screened.
The K monitoring wells with the strongest spatial correlation with the target monitoring well are selected by the distance-weighted KNN algorithm. The groundwater level sequences of the K selected monitoring wells are passed to the LSTM prediction model as the input data set, the spatio-temporal prediction is conducted, and the prediction error is calculated. The value of K is adjusted repeatedly, and the result with the smallest error is taken as the final result of the spatial correlation screening. The main process of the spatial correlation screening and prediction algorithm is as follows.
(1) The denoised data set $X_R$ is divided into sample and object sequences.
(2) The Euclidean distance l between each monitoring well and the target monitoring well is calculated. For any monitoring well i at $(x_i, y_i)$ and the target monitoring well o at $(x_o, y_o)$, the real distance between the two monitoring wells is evaluated from the sum of the squared differences between their kilometer grid coordinates, and the specific calculation formula is as follows:

$$l = \sqrt{(x_i - x_o)^2 + (y_i - y_o)^2}$$

(3) The monitoring wells are sorted in increasing order of the calculated Euclidean distance l, and the error is denoted by M. The initial setting is K = 1 and M = 1.
(4) The denoised groundwater level sequences of the K monitoring wells with the smallest distances are selected, and the sequence matrix $X_{cR}$ of these monitoring wells for D consecutive years after spatial screening is constructed:

$$X_{cR} = \left( X_{cR_1}, X_{cR_2}, \ldots, X_{cR_K} \right)^{T}, \qquad X_{cR_i} = \left( X_{cR_i}^1, X_{cR_i}^2, \ldots, X_{cR_i}^D \right)$$

where $X_{cR_i}$ represents the denoised water level sequence of the i-th monitoring well for the D continuous years after spatial correlation screening, and $X_{cR_i}^d$ represents the denoised water level sequence of the i-th monitoring well in year d.
where $\Phi$ refers to the prediction model used, and the LSTM model is selected in this paper; $X_{c_i}^{D}$ is the predicted value of the groundwater level of the i
ing relationship of other comparative models is f1. The experiment shows that the prediction effect is the best when the delay of the input node in the f1 training relation is 3. Therefore, the numbers of input and output nodes are 3 and 1, respectively. Table 1 shows the configuration details for the comparative model experiments in this article.
KNN-LSTM model is the best, and the corresponding curve is close to the true value. Moreover, the evaluation index of each error is the smallest, and the prediction accuracy is high. Thus, denoising the original data is necessary.
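The distance-based screening and weighted fusion described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the weights are taken as normalized inverse distances, the target well itself is excluded from its neighbours, the error is measured by RMSE, and `predict` is a hypothetical callable standing in for the trained LSTM.

```python
import math

def euclidean(a, b):
    """Distance between two wells given as (x, y) kilometer grid coordinates."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def k_nearest_wells(coords, target, k):
    """Return [(well_id, distance)] for the k wells closest to the target well."""
    origin = coords[target]
    dists = [(w, euclidean(xy, origin)) for w, xy in coords.items() if w != target]
    dists.sort(key=lambda item: item[1])
    return dists[:k]

def inverse_distance_weights(neighbours, eps=1e-9):
    """Assumed weighting: closer wells get larger weights; weights sum to 1."""
    inv = {w: 1.0 / (d + eps) for w, d in neighbours}
    total = sum(inv.values())
    return {w: v / total for w, v in inv.items()}

def fuse(predictions, weights):
    """Weighted sum of per-well predicted sequences, element by element."""
    n = len(next(iter(predictions.values())))
    return [sum(weights[w] * predictions[w][m] for w in weights) for m in range(n)]

def select_k(coords, target, predict, observed, k_max):
    """Try K = 1..k_max and keep the K whose fused prediction has the smallest RMSE."""
    best_k, best_err = None, math.inf
    for k in range(1, k_max + 1):
        neighbours = k_nearest_wells(coords, target, k)
        weights = inverse_distance_weights(neighbours)
        preds = {w: predict(w) for w, _ in neighbours}  # stand-in for the LSTM output
        fused = fuse(preds, weights)
        err = math.sqrt(sum((f - o) ** 2 for f, o in zip(fused, observed)) / len(observed))
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err
```

Keeping the prediction model behind a plain callable makes the K search independent of how the per-well forecasts are produced, so the same loop could be reused with any of the comparison models.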

MODEL APPLICATION
Data noise reduction processing
The wavelet threshold denoising method should be used to denoise the data to avoid the influence of noise on the
wavelet function is db4, threshold coefficient P is 0.5, and decomposition level is 5.
Figure 10.
The above analysis reveals that the spatio-temporal prediction of groundwater level is markedly influenced by the
However, the prediction is also affected by other associated monitoring wells outside the radius of the water level depression funnel, but the impact is small. Therefore, in addition to considering the impact of the historical time sequence, selecting the spatial correlation of monitoring wells in the study area is crucial in the study of the groundwater level prediction method.
Prediction results of groundwater level and its space-time expression

Results of groundwater level prediction
On the basis of the water level dynamic prediction of 33 monitoring wells, nine representative groundwater level dynamic monitoring wells, which run through drawdown cones, are selected. The comparison of the original water level sequence from January to December 2018 with the predicted water level sequence is shown in Figure 11.
2.0 m deep, which belongs to a good impermeable layer. In addition, the pore confined aquifer II is unaffected by the dynamic change in the water level of the lake because no recharge from the lake to the pore confined aquifer II is observed. The analysis reveals that the exploitation intensity of the pore confined aquifer II water around the X09 monitoring well changes considerably at different times. Thus, the water level fluctuation is evident.
However, the difference between the two sequences of monitoring wells X07, X09, and X17 is revealed in Figure 11.
The gap is magnified in the plot because the groundwater level values are large and their fluctuation is small. In the actual calculation of prediction error, the error of such monitoring well sequences is small, and their prediction results are still excellent. The predicted sequences of monitoring wells X04 and X33 differ noticeably from the original sequences, and their prediction accuracy is low. Inspection of the original records shows that a considerable amount of data is missing from these sequences, which affects the accuracy of the prediction results.
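The point that a visually large gap can still correspond to a small error can be checked with common error measures. The sketch below is illustrative only: the choice of RMSE, MAE, and MAPE, the `improvement` formula, and the sample values are assumptions, since this section does not name the evaluation indices or state how the reported percentage gains were computed.

```python
import math

def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)

def mape(observed, predicted):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    return 100.0 * sum(abs(o - p) / abs(o) for o, p in zip(observed, predicted)) / len(observed)

def improvement(baseline_error, model_error):
    """Relative error reduction of a model over a baseline, in percent (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Hypothetical sequence with large values and small fluctuation.
observed = [35.10, 35.12, 35.11, 35.14]
predicted = [35.16, 35.07, 35.17, 35.09]
errors = {
    "rmse": rmse(observed, predicted),
    "mae": mae(observed, predicted),
    "mape": mape(observed, predicted),
}
```

For a sequence like this one, the absolute differences are a few centimeters against levels of tens of meters, so the percentage error stays small even though a tightly scaled plot would show the two curves clearly apart.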

CONCLUSION
The KNN-LSTM prediction model proposed in this paper considers the spatio-temporal characteristics of the original data set more fully than the traditional single random prediction model. The wavelet threshold denoising method is used to remove the influence of noise in the data. The KNN algorithm is employed to screen the spatial correlation of the
in Changwu as an example. However, the prediction accuracy will change when the same model is applied to different regions. In the later stage, further exploration and research will be conducted in the field of spatio-temporal prediction of groundwater level, emphasizing the universality of the model.