Advancing AI-based pan-European groundwater monitoring

The main challenge of pan-European groundwater (GW) monitoring is the sparsity of collated water table depth (wtd) observations. The wtd anomaly (wtda ) is a measure of the increased wtd due to droughts. Combining long short-term memory (LSTM) networks and transfer learning (TL), we propose an AI-based methodology LSTM-TL to produce reliable wtda estimates at the European scale in the absence of consistent wtd observational data sets. The core idea of LSTM-TL is to transfer the modeled relationship between wtda and input hydrometeorological forcings to the observation-based estimation, in order to provide reliable wtda estimates for regions with no or sparse wtd observations. With substantially reduced computational cost compared to physically-based numerical models, LSTM-TL obtained wtda estimates in good agreement with in-situ wtda measurements from 2569 European GW monitoring wells, showing r ⩾ 0.5, root-mean-square error ⩽1.0 and Kling-Gupta efficiency ⩾0.3 at about or more than half of the pixels. Based on the reconstructed long-term European monthly wtda data from the early 1980s to the near present, we provide the first estimate of seasonal wtda trends in different European regions, that is, significant drying trends in central and eastern Europe, which facilitates the understanding of historical GW dynamics in Europe. The success of LSTM-TL in estimating wtda also highlights the advantage of combining AI techniques with knowledge contained in physically-based numerical models in hydrological studies.


Introduction
Groundwater (GW) is the dominant source of drinking water in Europe, with about 75% of European Union residents relying on GW for their water supply, and recognized as an important contributor to industry (e.g. cooling water) and agriculture (irrigation) (European Commission 2008). In addition, GW plays a critical role in sustaining surface water bodies, adapting to climate variability and supporting biodiversity (van der Gun 2020). In recent years, however, Europe has experienced several severe summer droughts and heat waves (e.g. Fink et al 2004, Stahl et al 2016, Van Lanen et al 2016, Bastos et al 2020, Boergens et al 2020, affecting all compartments of the water cycle. Droughts often affect large areas across national boundaries (Brauns et al 2020). High ambient atmospheric temperature, continuous precipitation deficits, and large evapotranspiration losses translate into delayed and attenuated soil moisture (θ) deficits, which subsequently reduce GW recharge and increase water table depths (wtd) and ultimately, cause low streamflow or dried-up rivers (Tallaksen et al 2009, Van Loon 2015, Hellwig et al 2020. More frequent and severe droughts anticipated in future changing climate will exacerbate the vulnerability of European GW systems, emphasizing the necessity of GW monitoring in European GW management (Guerreiro et al 2018, Ault 2020. The major challenge for pan-European GW monitoring is the sparsity of collated wtd observations, which is attributed to the large spatial gaps and temporal inconsistency in the wtd measurement network (Brauns et al 2020). Sparse wtd measurements pose difficulties in understanding GW dynamics at the continental scale and limit insight into extreme events (e.g. droughts) and their impacts on GW over Europe. To overcome the challenge, standardized meteorological drought indices over extended time scales and terrestrial water storage anomalies derived from the Gravity Recovery and Climate Experiment (GRACE) satellite observations have been widely used to predict and quantify GW anomalies (e.g. Bloomfield and Marchant 2013, Kumar et al 2016, Van Loon et al 2017, Boergens et al 2020. However, their reliability is often questioned, especially in small watersheds, mainly due to the unrealistic assumption about the linear translation of changes in meteorological signals (e.g. precipitation) into GW anomalies or the coarse spatial resolution (0.5 • , about 55 km) of the GRACE observations (Kumar et al 2016, Van Loon et al 2017. Physically-based numerical models enable a realistic representation of processes in the terrestrial water and energy cycles. While they may provide more accurate GW anomaly estimates than the above methods, the models require extensive physical background knowledge and become too time and computationally demanding to be applied for high-resolution longterm simulations at larger scales (Wunsch et al 2018, Hauswirth et al 2021. The wtd anomaly (wtd a ) is a measure of the increased wtd due to droughts. In this study, a methodology is proposed that offers reliable high-resolution long-term estimates of wtd a at the European scale with low computational cost, which allows effective and efficient GW monitoring over Europe.
With the emergence of deep learning (DL), increase in computational power, and availability of large datasets, the application of artificial intelligence (AI) has again been attracting considerable attention across scientific disciplines, including hydrological sciences. Rajaee et al (2019) and Tao et al (2022) summarized over 200 papers on the use of AI for GW resource modeling from 2001 to 2020. AI can produce GW predictions as accurately as physically-based numerical models with much less background knowledge and computational cost (Govindaraju 2000, Shen 2018. As a subset of AI, DL techniques such as long short-term memory (LSTM) networks have achieved great success in predicting GW changes in previous studies (e.g. Zhang et al 2018, Hauswirth et al 2021, Vu et al 2021. Specifically, LSTM networks are able to detect longshort-term dependencies between GW anomalies and input hydrometeorological forcings without explicitly defining spatially varying time lags, thereby simplifying the data preprocessing process (Hochreiter andSchmidhuber 1997, Gers et al 2000). In the field of hydrology, such DL techniques are generally trained using a supervised training algorithm with a supplementary teacher signal (here wtd a observations, wtd a,o ) to guide the training process, as shown in figure 1(a). As a result, it is challenging to implement DL techniques to estimate wtd a without sufficient wtd a,o used for training.
Transfer learning (TL), which is a machine learning technique (belonging to the broad field of AI) that applies the knowledge gained in a data-rich domain into a related data-scarce domain, can address the issue of sparse wtd a observations to train DL techniques (Tan et al 2018). The increased knowledge often significantly improves network performance in the data-scarce domain and/or accelerates computing progress (Goodfellow et al 2017), and thus, TL has become very popular in image classification (e.g. Quattoni et al 2008, Lu et al 2015, Lee et al 2017 and natural language processing tasks (e.g. Lu et al 2015). A recent study (Ma et al 2021) transferring hydrological parameters across continents has also demonstrated the usefulness of TL in improving hydrological predictions. In contrast with sparse in-situ wtd a,o (the data-scarce domain), anomalies derived from modeling results, such as modeled wtd a (wtd a,m ), are spatiotemporally continuous throughout the European continent and can be regarded as a data-rich domain and used for knowledge transfer.
Here, we propose an AI-based methodology that integrates LSTM networks with TL, LSTM-TL, to estimate monthly wtd a at the continental scale in order to facilitate pan-European GW monitoring. The methodology (illustrated in figure 1(b)) uses monthly precipitation and θ anomalies (pr a and θ a ) as input and relies on the close connection between GW and other compartments in the water cycle to produce wtd a estimates. We evaluated the obtained wtd a estimates by LSTM-TL (wtd a,LSTM-TL ) with collated in-situ wtd a,o from over 2500 European GW monitoring wells (figure 2) in order to explore the reliability of LSTM-TL. Using pr a and θ a from various observational datasets (pr a,o and θ a,o ) as input to LSTM-TL, we reconstructed monthly wtd a data at the European scale from the early 1980s to the near present and investigated GW changes over Europe in recent drought years. Finally, seasonal wtd a trends were derived for different European regions from the reconstructed data. To the best of our knowledge, these trends are shown for the first time.

LSTM-TL
The study proposes LSTM-TL to estimate wtd a at individual 0.11 • pixels over Europe in the absence of collated wtd measurements, thereby mitigating the negative impact of sparse measurements on European GW management. The methodology combines LSTM networks and TL. LSTM networks are utilized to capture the time-varying and time-lagged relationship between wtd a and input hydrometeorological variables. The use of TL is based on an  established LSTM architecture which was already successfully tested for predictions of wtd a,m over Europe (see the supplementary section). In contrast to other studies that transfer knowledge between different regions, such as Ma et al (2021), this study transfers knowledge between modeled and observed data of the same spatial region. As previously shown (Ma et al 2021a), among a number of hydrometeorological variables with large-scale remotely sensed observations and reanalysis products, the combination of pr a and θ a played the most important role in estimating wtd a,m over Europe based on the test performance of the LSTM networks. Thus, here we select pr a and θ a as input. At pixels without modeled θ a (θ a,m ) information (where surface water bodies exist), pr a is the only input variable. In contrast to typical feedforward networks, LSTM networks utilize not only single data points, but also sequences of data, in our case time series. This kind of 'memory' includes previously useful input (Shen 2018) such as antecedent pr a and θ a to estimate wtd a , although not explicitly shown in the inputs. Figure 1(b) illustrates the workflow of LSTM-TL, encompassing two steps. Firstly, the LSTM networks are constructed at individual pixels on published modeling results (Furusho-Percot et al 2019) from an uncalibrated fully coupled atmosphericland-surface-GW modeling system (i.e. the Terrestrial Systems Modeling Platform, TSMP). The internal LSTM networks are trained on modeling data from January 1996 to December 2012 (204 time steps, about 80% of the total data), validated on modeling data from January 2013 to December 2014 (24 time steps, about 10% of the total data) and tested on modeling data from January 2015 to December 2016 (24 time steps, about 10% of the total data). The reader is referred to Ma et al (2021aMa et al ( , 2021b or the supplementary section for detailed information on the network training, validation and testing processes. Secondly, without additional training, the LSTM networks trained in the previous step are utilized to estimate wtd a based on pr a,o and θ a,o , thereby transferring the modeled input-output relationship (f m ) to the observation-based estimation. The implementation of LSTM-TL is based on the assumption that f m agrees well with the observed input-output relationship (f o ). Due to the lack of monthly GW pumping data, the current application of LSTM-TL does not account for the anthropogenic impact on GW.
LSTM-TL has three salient characteristics. First, the algorithm is independent of wtd a,o to estimate wtd a , which allows its usage in large regions even without wtd observations. Second, the methodology can produce longer-term wtd a estimates than modeling data used for training, and the time length of the obtained wtd a estimates depends on input pr a,o and θ a,o . This is useful for reconstructing historical wtd a and predicting future wtd a at the continental scale. Third, once the internal LSTM networks are successfully trained, the methodology can be used without additional training, and thus, requires small computational cost to generate new wtd a estimates.

Datasets
All anomalies here are relative to the 1996-2012 period (i.e. the training period of the LSTM networks in LSTM-TL), to avoid future information from leaking into the LSTM networks during training. The wtd a was computed by equation (1) for each calendar month and pixel individually to ensure the spatial comparability and to account for the seasonality. The calculation of pr a and θ a is similar to wtd a : where wtd m is the monthly data of wtd, wtd av is the climatological average of wtd m (i.e. averages of wtd m in January, February, …, December), and wtd sd is the climatological standard deviation of wtd m . The monthly modeled pr a (pr a,m ) and θ a,m as well as wtd a,m were derived from published modeling results of TSMP (the supplementary section) from 1996 to 2016, with a spatial resolution of 0.11 • (Furusho-Percot et al 2019). The θ a,m was calculated from the modeled surface layer θ valid for 0-5 cm soil depth. The monthly pr a,o were derived from the 0.5 • gridded ERA5 bias-corrected 'Rainfall flux' and 'Snowfall flux' data from 1979 to 2019 (Muñoz Sabater 2021a), the 0.055 • gridded COSMO-REA6 'TOT_PRECIP' data from 1995 to August 2019 (Bollmeyer et al 2015) and the 0.1 • gridded ERA5-Land 'Total precipitation' data from 1981 to May 2021 (Muñoz Sabater 2021b) at the individual pixel level, and the monthly θ a,o were obtained from the 0.1 • gridded ERA5-Land 'Volumetric soil water layer 1 ′ data (i.e. volume of soil water at 0-7 cm below the land surface) from 1981 to May 2021 (Muñoz Sabater 2021b) and the 0.25 • gridded GLEAM v3.5a 'SMsurf ' (θ at 0-10 cm below the land surface) from 1980 to 2020 (Miralles et al 2011, Martens et al 2017 at the individual pixel level. Detailed information of the observational datasets is given in the supplementary section. Before calculating anomalies, all input observational datasets were re-gridded to 0.11 • × 0.11 • via the first-order conservative interpolation method (Jones 1999).
The monthly wtd a,o were derived from consecutive monthly wtd measurements at 2569 GW monitoring wells (figure 2) located in European unconfined aquifers from 1996 to 2016. The wtd measurements were obtained either from web services or by request from governmental authorities; detailed information is provided in supplementary table 2. We calculated wtd a.o from the wtd measurements to remove errors related to the differences in reference surface elevations that were used by different countries. The 0.11 • gridded wtd a,o data used for evaluation were estimated by averaging wtd a.o from all the wells that lie within the same 0.11 • pixels. The number of wells within each 0.11 • pixel varies from 1 to 49, with a median value of 2 (supplementary figure 1). The pixels are unevenly distributed in eight hydrometeorologically different regions over Europe (named PRUDENCE regions) (Christensen and Christensen 2007), and their numbers and associated well numbers are provided in supplementary table 3. The yearly averaged wtd a,o generally ranged from −0.5 to 0.5 at individual pixels for the 1996-2016 period, as illustrated in figure 2. showing reliable performance in estimating wtd a over Europe, especially when considering the generation of wtd a,LSTM-TL without involving wtd a,o . Regional differences were observed in the scores (figure 3(e)). LSTM-TL always obtained the best scores in British Isles (BI), with a median r of 0.63, a median RMSE of 0.77, a median NSE of 0.35, and a median KGE of 0.60, and the worst scores in Mediterranean (MD), with a median r of 0.37, a median RMSE of 1.15, a median NSE of −0.05, and a median KGE of 0.29. We focused on the medians of the scores to reduce the influence of outliers on the overall performance evaluation. In addition to the limitations caused by the sparse number of observation wells (supplementary  table 3), the poor performance in MD might be due to: (a) the presence of intensive GW abstraction for irrigation in MD, the impact of which on GW is neglected in the current implementation of LSTM-TL; (b) the fact that the wells investigated are predominantly located in deeper aquifers where the internal LSTM networks tended to underperform. As shown in figure 3(e), in general, the scores of wtd a,LSTM-TL were improved compared to wtd a,LSTM(m) (outputs of the LSTM networks using modeling results from TSMP as input) and wtd a,m , and were deteriorated compared to wtd a,LSTM(o) (outputs of the LSTM networks trained on observations), which is expected. The wtd a, LSTM(o) is the only estimate that involved wtd a,o in its generation. The difference in the agreement of wtd a,LSTM-TL and wtd a, LSTM(o) with wtd a,o decreased for the 2015-2016 period (i.e. the test period when the LSTM networks were applied to previously unobserved data in the training, supplementary figure 2), which lends confidence in applying LSTM-TL in areas with limited or without wtd observations. More findings and discussion on the comparison of different wtd a estimates are presented in the supplementary section.

Recent GW drought analysis using reconstructed pan-European long-term water table depth anomaly data
The time length of wtd a,LSTM-TL is determined by input pr a,o and θ a,o , which allows the generation of wtd a,LSTM-TL beyond the time period of wtd a,m used for training (i.e. the 1996-2016 period). This is an advantage of LSTM-TL over using a physicsbased model alone. Here we adopted LSTM-TL to reconstruct pan-European monthly wtd a data RD1-6 (table 1) from the early 1980s to the near present based on various pr a,o and θ a,o pairs from observational datasets. Different input pr a,o and θ a,o data result in different wtd a,LSTM-TL data. The internal LSTM networks were only trained once on modeling results and then used for data reconstruction without additional training. Figure 4 illustrates the spatial distribution of yearly averaged wtd a,LSTM-TL over the European con-  (Bastos et al 2020, Boergens et al 2020, Hari et al 2020. A recovery of GW storage took more than one year in ME, EA and FR (figure 5). In these regions, the strongest impact of GW drought (manifested by the maximum wtd a,LSTM-TL ) was observed in October and November 2018 after summer precipitation deficits, reflecting the lagged response of GW to precipitation. Here we only considered the hydroclimatic effects on GW storage. GW droughts can be intensified by increases in GW withdrawals during drought periods when surface water supplies are low, which might prolong the recovery period of GW storage after droughts.

Seasonal trends of water table depth anomalies in various European regions
We derived seasonal wtd a trends in different PRUDENCE regions by fitting seasonal averaged wtd a,LSTM-TL from the reconstructed data RD1-6 into first-order polynomials y = mx + b. The m is the change rate of wtd a [y −1 ], where m > 0 corresponds to a drying tend and m < 0 corresponds to a wetter condition. The Wald test (Hauck and Donner 1977) was utilized to determine the statistical significance of the trends, with 95% as the threshold. For verification, we compared seasonal averaged wtd a and  their trends in ME (i.e. the PRUDENCE regions with the most pixels with wtd a,o ) obtained from RD1-6 and wtd a,o for the 1996-2016 period (figure 6). The seasonal averaged wtd a,LSTM-TL from RD1-6 showed a good match with wtd a,o in ME except overestimated values in the Summer-Winter 2002 (a record flood over Europe, Ulbrich et al 2003). Both RD1-6 and wtd a,o revealed that there was no significant wtd a trend in ME for the 1996-2016 period, with r-squared values ranging from 0 to 0.13. Figure 7 maps seasonal wtd a trends at the European scale in RD1-6 for the time periods listed in table 1, with pixels without significant trends masked in gray. Inspecting the maps, ME, EA and FR tended to experience intensified GW droughts while parts of MD tended towards a wetter GW pattern during the study periods. This finding is consistent with the trend pattern observed in the GRACE terrestrial water storage data for the 2002-2017 period, that is, a distinct pattern of mid-latitude drying and of highand low-latitude wetting (Reager et al 2016, Tapley et al 2019. However, the statistical significance of the GRACE-observed trend pattern was not assessed. The seasonal wtd a trends are strongly influenced by the LSTM-TL input pr a,o and θ a,o . The reconstructed data by LSTM-TL using GLEAM θ a,o as input (figures 7(b), (d) and (f)) exhibited more pronounced and coherent wtd a trends in four seasons than the ones using ERA5-Land θ a,o as input (figures 7(a), (c) and (e)). The latter showed weaker wtd a trends in spring and winter. In general, the seasonal wtd a trends in the reconstructed data were in good agreement with the trends in the LSTM-TL input pr a,o and θ a,o data used to produce the data (supplementary figures 4-9).

Discussion
Efficient GW monitoring is essential for European GW management, especially under increasing pressure with changing climate. Yet, it is challenging to monitor GW at large scales, mainly due to the sparsity of wtd observations. To address this challenge, we propose the LSTM-TL methodology, which combines LSTM networks and TL, to produce spatiotemporally continuous wtd a estimates at the European scale in the absence of wtd observations. Thus, LSTM-TL is useful for regions with no or sparse wtd observations but spatiotemporal continuous modeling results, which can be generated over any region worldwide, in principle. Note, this work does not conclude that the role of physically-based models in the hydrometeorological sciences can be replaced with AI technologies. On the contrary, in the absence of in-situ measurements, physically-based models are required to construct f m in the application of LSTM-TL.
The core rational of the methodology is to transfer the model-based relationship (f m ) between wtd a,m and input hydrometeorological variables (here pr a,m and θ a,m ) to the observation-based estimation to predict wtd a,LSTM-TL from pr a,o and θ a,o (e.g. from satellites or reanalysis data sets) also for regions without wtd observations. The approach is based on the major assumption that the model-based relationship f m agrees with the observed one (f o ). This assumption is indeed valid, since the agreement of wtd a,LSTM-TL with wtd a,o was improved by input correction compared to the agreement of wtd a,m with wtd a,o . This comparison constitutes a strong and independent test of the underlying assumption.
Compared with physically-based numerical models, LSTM-TL requires orders of magnitude less computational resources in generating wtd a estimates (i.e. wtd a,LSTM-TL ) at the same temporal and spatial scales, thereby facilitating early warning and fast decision making for extreme GW events. The reconstructed European monthly wtd a data successfully reproduced GW droughts in recent drought years. The slightly different drought distributions derived from different reconstructed data reveal the impact of input pr a,o and θ a,o on the obtained wtd a,LSTM-TL . Based on the European monthly wtd a data reconstructed by LSTM-TL, we provided the first estimates of seasonal wtd a trends in different European regions, which show that central and Eastern Europe had significant drying trends during the study periods. This is consistent with the trend of the GRACE terrestrial water storage data presented in Reager et al (2016) and Tapley et al (2019).
In addition, in the application, limitations of the derived LSTM-TL should be recognized. First, in this study, LSTM-TL neglects anthropogenic  interventions, such as GW pumping, on GW dynamics, because the modeling results used for training the internal LSTM networks give a near-natural representation of the water cycle (Furusho-Percot et al 2019). Thus, the methodology may underestimate wtd a in regions with strong human interventions such as IP and MD (de Graaf et al 2019). Second, while the utilized modeling results guarantee good spatial coverage of the European domain, time series are limited at the individual pixel level (<300 time steps for training), which may reduce the skill of LSTM-TL in estimating wtd a . Improved performance of LSTM-TL can be expected as training data increase, which would require extending f m . Also, the 0.11 • (∼12.5 km) resolution is not high enough to resolve the topographic impact of individual hillslopes on wtd (Fan et al 2019). This can be solved by the improved resolution of the modeling results used for training. Third, the behavior of LSTM networks is more difficult to interpret than other AI techniques because of the time-varying weights and biases in the connection of neurons. Here, the input variables were selected based on the network performance i.e. the contribution of individual input variables to the improvement of test performances (Ma et al 2021a). Using more interpretable AI methods (e.g. XGBoost) to facilitate the selection of input variable may provide additional insight into the response of GW to other hydrometeorological variables, as shown in Hadi et al (2019) and Ravindran et al (2021). Last but not least, we found the LSTM networks in LSTM-TL usually had poor test performance in deep aquifers, which might be attributed to the inability to capture very long-term dependences between wtd a and input hydrometeorological variables. Some alternative DL techniques that are superior in time series forecasting can be considered, such as transformers (Vaswani et al 2017), which can effectively capture dependences within the time steps used for training.
In summary, our results show that LSTM-TL provided the best spatiotemporally continuous wtd a estimates over Europe and may serve as a useful alternative to in-situ wtd a,o until collated wtd data sets are available. In addition to data reconstruction described in this study, LSTM-TL can be deployed in online GW monitoring, which would be useful in European GW management.

Data availability statement
The TSMP-G2A data set is available online at 10. 17616/R31NJMH3 (Furusho-Percot et al 2019). The pr a,m and wtd a,m data are available online at 10.26165/ JUELICH-DATA/WPRA1F (Ma et al 2021c). The θ a,m data are available online at 10.26165/JUELICH-DATA/AMQ6NI (Ma et al 2021d).
The data that support the findings of this study are openly available at the following URL/DOI: https://doi.org/10.26165/JUELICH-DATA/ZBLDIR (Ma et al 2021e). van