Explainable AI reveals new hydroclimatic insights for ecosystem-centric groundwater management

Trustworthy projections of hydrological droughts are pivotal for identifying the key hydroclimatic factors that affect future groundwater level (GWL) fluctuations in drought-prone karstic aquifers that provide water for human consumption and sustainable ecosystems. Herein, we introduce an explainable artificial intelligence (XAI) framework integrated with scenario-based downscaled climate projections from global circulation models. We use the integrated framework to investigate nonlinear hydroclimatic dependencies and interactions behind future hydrological droughts in the Edwards Aquifer Region, an ecologically fragile groundwater-dependent semi-arid region in southern United States. We project GWLs under different future climate scenarios to evaluate the likelihood of severe hydrological droughts under a warm-wet future in terms of mandated groundwater pumping reductions in droughts as part of the habitat conservation plan in effect to protect threatened and endangered endemic aquatic species. The XAI model accounts for the expected non-linear dynamics between GWLs and climatic variables in the complex human-natural system, which is not captured in simple linear models. The XAI-based analysis reveals the critical temperature inflection point beyond which groundwater depletion is triggered despite increased average precipitation. Compound effects of increased evapotranspiration, lower soil moisture, and reduced diffuse recharge due to warmer temperatures could amplify severe hydrological droughts that lower GWLs, potentially exacerbating the groundwater sustainability challenges in the drought-prone karstic aquifer despite an increasing precipitation trend.


Introduction
Groundwater is often the primary source of water supply in semi-arid regions with limited access to surface water [1]. Vulnerability of groundwaterdependent regions to droughts under climate change is a cause for concern [2][3][4] as the reliance on groundwater for water and food security will likely increase due to more frequent and severe droughts [5]. Pumping-stress on aquifers may intensify in a warmer and drier future [6], increasing the endangerment and risk of extinction for groundwater-dependent species [7,8]. Although the precipitation and temperature averages and extremes are likely to rise in many regions around the world [9][10][11], global climate change impacts on recharge rates and groundwater levels (GWLs) at the regional aquifers remain highly uncertain [12,13].
The global precipitation during the 21st century is generally expected to follow a 'rich getting richer' pattern, in which already wet areas of the deep tropics and mid-latitudes get wetter, whereas arid and semi-arid regions in the subtropics get drier [5,[14][15][16]. Therefore, it is beneficial to investigate the global climate change impacts on the regional hydroclimatic conditions that affect groundwater availability using trustworthy predictive models. Artificial intelligence (AI) models emerged as reliable tools to predict droughts and climate change impacts on hydrological systems [17] that can be applied to answer the critical overarching question, i.e.: Under global warming and climate change, will groundwater-dependent regions likely experience a wet future with sustainable groundwater supply or a dry future with dwindling groundwater availability (figure 1)?
This paper introduces an explainable artificial intelligence (XAI) framework for enhanced understanding of the relationship between the groundwater levels (GWLs (m) above mean sea level), the projected temperature (T ( • C)), and precipitation (P (mm)) under representative concentration pathway (RCP) scenarios. These scenarios were established based on greenhouse gas (GHG) concentration trajectories adopted by the intergovernmental panel on climate change, and subsequently forecast the recurrence(s) of severe droughts under future climates. We specifically focus on hydrological droughts [18] associated with significant declines in GWLs, which could adversely impact the sustainability of consumptive water use and environmental flows in semiarid karstic aquifer regions. The novel XAI framework provides counterintuitive hydroclimatic insights to explain why groundwater depletion could occur despite increased precipitation under climate change. Researchers, water resources managers, and stakeholders across the globe must exercise caution when projecting future groundwater availability in a warming climate as the consequential regional dynamics might have counterintuitive implications for groundwater sustainability.

Using XAI to project GWL
Although the exact definition of XAI is debatable [19,20], explainability refers to fidelity (i.e. agreement with real-world processes) and interpretability (i.e. explanation is simple and transparent enough to understand) [21]. Such explanations can be used to generate testable hypotheses [22] or to confirm that the AI model resonates with real-world physical systems. We apply XAI in the semi-arid Edwards Aquifer Region (EAR) in south-central Texas to advance understanding of the recurrence(s) and groundwater impacts of severe hydrological droughts in a warming future. The XAI framework reveals how much each predictor contributes-either positively or negatively-to the predicted GWL (m) (global explanation), and the average marginal contribution of each predictor value to the corresponding predicted value (local explanations). Using these global and local explanations, we investigate how each hydroclimatic feature influences the GWL (m) under the impact of climate change and evaluate the critical climatic inflection points (triggers) that could potentially cause severe declines in GWLs. The ancillary explainability of the AI models is expected to enhance the users' confidence, and support decision-making in water resource management.

Edwards aquifer: a drought-prone water source for people and ecosystems
The Edwards aquifer system (figure 1) located in south-central Texas, United States, is one of the world's most species-rich groundwater systems, home to eight threatened and endangered endemic aquatic species (e.g. Texas blind salamander, San Marcos salamander) [8,23]. The karstic aquifer is the primary source of drinking water for the city of San Antonio and surrounding regions with a population of 2.32 million. The aquifer provides water also for recreational, ranching, irrigation, and industrial uses in the region. Based on projections from global circulation models (GCMs), the ecologically important semi-arid region would experience upward trends for both temperature and precipitation by 2100 (figure 1;  [25]. The region endured several droughts over the past century, including a seven year-long severe (historically worst) drought in the early 1950s, characterized by extremely low rainfall and high temperatures [26,27]. Unlike many other large aquifer systems, total permitted withdrawals in the EAR are capped regardless of future population and economic growth, which makes it an ideal study site to isolate the impact of climate change on groundwater resources. In addition, a habitat conservation plan with mandated reductions in permitted groundwater withdrawals during droughts have been established to protect the habitats for the endemic groundwaterdependent species at the spring outlets [23]. Groundwater pumping reductions are enforced according to a tiered critical period management (TCPM) pumping restriction plan. Five critical stages (CS1 through CS5) are designated based on GWL data from index wells and/or spring flow data to curb groundwater pumping depending on the intensity of the hydrological drought. Higher CS levels are indicative of larger depletion of the aquifer associated with lower GWL (m), and hence, mandate larger reductions in permitted groundwater withdrawals. We use the XAIbased projections of GWL (m) in the future in conjunction with the GWL (m) during the 1950s severe drought and the TCPM pumping restrictions (see supplementary figures 4-5 for details) to illustrate the implications of climate change for groundwater availability in terms of the mandated groundwater pumping reductions during drought events.  ( • C)) and weekly-aggregate precipitation (P (mm)) in the San Antonio region. The historical data were acquired from the San Antonio International Airport (www.ncdc.noaa.gov/isd/data-access) and the future projections were statistically downscaled from CMIP5 scenarios for the representative concentration pathways RCP 4.5 and RCP 8.5. Tmax ( • C) and P (mm) are projected to increase in the San Antonio region from 2021 through 2100 under different climate change scenarios. The bottom panel shows the location of the Edwards aquifer, a prolific karstic aquifer system in semi-arid south-central Texas, United States. The aquifer is the primary source of drinking water for the city of San Antonio and home to several threatened and endangered aquatic endemic species at the Comal and San Marcos springs. Groundwater flow is dominantly from north to south in the west and central portions, from west to east in the central and eastern portions of the study area and is discharged at the San Marcos and Comal springs. GWLs (m) at the J-17 Bexar index well (or J-17 in short) and flow rates at the Comal and San Marcos springs are used to determine the mandated reductions in groundwater withdrawals in the San Antonio pool during drought periods. Groundwater replenishment across the recharge zone is paramount to sustain consumptive water uses and environmental flows. The J-17 well is in the artesian portion of the aquifer in the south of the recharge zone in Bexar County. It is sensitive to aquifer pressure changes and is used to represent conditions in the entire San Antonio pool. The Comal springs are located downstream of the recharge zone in Bexar county. Based on the historical data, aquifer recharge in Bexar County region is vulnerable to droughts, and reductions in aquifer recharge result in lower GWL (m) at the J-17 well (supplementary figures 1-2 (available online at stacks.iop.org/ERL/16/114024/ mmedia)) and lower flow rates at the ecologically sensitive Comal springs (supplementary figure 3).

Climate change impacts on the hydrologic cycle
Future temperature and precipitation trends and drought assessments are dependent on GHG concentrations in the atmosphere [28,29], which are often represented by RCPs in GCMs. Under different RCPs, global temperatures were projected to increase by 2.0 • C-4.9 • C, with a median of 3.2 • C, by the end of the 21st century [30]. Increased heating from global warming may not cause droughts, but when droughts occur, they are likely to set in quicker and be more intense [31]. However, based on 1.5 • C-3 • C warming across the globe compared to the preindustrial temperatures, two-thirds of the global population is predicted to experience a progressive increase in drought exposure, whereby the current 1-in-100 year droughts could occur every five years over more than 15% of the global land. Also, the global mean drought duration could rise from 7 months to 9, 11, and 18 months under 1.5 • C, 2 • C, and 3 • C warming while water supply-demand deficits in many regions could increase up to five times under 3 • C of warming [32]. The annual average precipitation and evaporation are expected to concurrently increase in future climates [9][10][11] as global warming tends to accelerate the hydrological cycle and intensify the wet extremes [14,33]. Many climate models project a 1%-3% increase in global precipitation [9, 34] and 1.5%-4% increase in potential evapotranspiration [35] per 1 • C of warming. At the 2 • C-4 • C warming levels, the global terrestrial annual precipitation was projected to increase by an average of 0.02%-3%, while the annual potential evapotranspiration was projected to increase by 6%-15% [36]. These results suggest that warming induced increases in average global potential evapotranspiration would offset increases in precipitation and could lead to more severe droughts. However, regional changes in precipitation (P (mm)), evapotranspiration (ET (mm)), and the dynamics that drive such changes remain uncertain [37]. Although high temperatures and disruptions of P (mm) patterns in future climates are known to exacerbate atmospheric moisture demands and drought conditions, hydroclimatic conditions concurrent with increasing T ( • C) and P (mm) trends under global warming remain poorly understood in semi-arid regions. We investigate this challenging problem using the Multivariate Adaptive Constructed Analogs (MACA) projected future climate datasets from twenty CMIP5 models [38] and two representative concentration pathways-RCP 4.5 and 8.5 [39]-(supplementary note (A)). As described in supplementary figures 6-8, among these 20 models, we selected the CMIP5-MACA models that were statistically most representative of T min ( • C), T max ( • C), and P (mm) in the study area. Projections of T min ( • C), T max ( • C), and P (mm) from these CMIP5-MACA models were then used with the trained XAI models to project future GWL (m). In supplementary figure 9, we provided the rationale for excluding multi-CMIP5-MACA averaging in our analysis.

Method
This research was carried out in three phases. In the first phase, we acquired historical local climate data recorded at the nearby weather station, historical GWL (m) measured at the index well, and historical simulated climate data and future simulated climate data from GCM models, as explained in section 2.1. This step was followed by preliminary data analysis, data preprocessing, and feature engineering, as described in sections 2.2 and 2.3, to identify hydroclimatic variables to be used as predictors in the simple linear regression and the AI models. Subsequently, we developed the tree-based ensemble AI model as described in section 2.4 and applied the trained AI model to predict the GWL (m) on the test dataset, as part of a three-phase model validation scheme. The validated models were used to project weekly GWL (m) from 2021 to 2100, using downscaled weekly T min ( • C), T max ( • C), and P (mm) projections acquired from CMIP5-MACA. The validated AI model were then combined with the explainable algorithms, built on game theory-based Shapley Additive Explanations (SHAP) analysis, which transformed the AI model into an XAI model. The XAI model was then used to unravel the AI's rationale behind the predictions on the unseen testing data as described in section 2.5.

Data sources
Input data for the simple linear regression and XAI models include historical daily climate data, daily GWLs at an index well, and projected daily climate data statistically downscaled from the CMIP5 models. In general, the historical data cover the existing records up to 2020 except for when the AI models are used with the historical climate data from the MACA database, which end in 2005. Daily historical recorded climate data-including T min ( • C), T max ( • C), and P (mm) between 1946 and 2020-from the nearest international airport were acquired from the National Oceanic and Atmospheric Administration (NOAA)'s integrated surface database [40]. The modeled daily historical climate data between 1950 and 2005, and daily projected climate data from 2006 to 2100 under RCP 4.5 and RCP 8.5 scenarios were obtained from the MACA database [39]. The modeled daily historical climate data-including T min ( • C), T max ( • C), and P (mm)-were downscaled from 20 different CMIP5 models at a resolution of 4 km (supplementary note (A)) and compared with the historical recorded climate data to identify the most representative CMIP5 models for the study site in this research (supplementary figures 6-9; supplementary note (A)). The daily recorded historical GWL (m) at J-17 index well from 1934 to 2021 were obtained from the Edwards Aquifer Authority (EAA). Because daily local climate data were not available prior to 1946, GWL (m) measurements prior to 1946 were not used in the modeling process. The daily historical ET (mm) and SM (mm) data were obtained from the TerraClimate database for the study site between January 1958 and December 2019 [41].

Description of hydrological drought
The hydrological drought is described by TCPMguided mandated curtailments in groundwater withdrawals in droughts, based on daily GWL (m) recorded by the EAA at the J-17 index well since 1946. We applied the TCPM to the historical data to calculate decadal frequencies of required groundwater pumping reductions to characterize the worst historical drought at the semi-arid study site and used the decadal frequency of the highest required reductions (i.e. 40% or more) as the baseline (supplementary figures 4 and 5) to determine the likelihood of occurrences of future severe hydrological droughts under RCP 4.5 and RCP 8.5 scenarios. The details of the underlying calculations are provided in the supplementary note (B).

Data pre-processing and feature engineering
The hydroclimatic data from the sources mentioned above were resampled at a weekly timestep in which the daily T max ( • C), T min ( • C), and GWL (m) were averaged, and the daily P (mm) was aggregated. We extracted a consistent set of features-referred to as lagged features-that capture the information present in all previous timesteps to make predictions at subsequent steps in time, which means that the predictions are generated sequentially. These lagged features include the first and second lags of the weekly P (mm) and GWL (m) based on our hypothesis that the events occurring in the previous two weeks will aid in determining the event about to occur at the next point in time. For instance, recharge lag (observed as GWL (m) lag) could be caused by delay of percolation through epikarst or by added time required for accumulated runoff from the contributing zone to reach the recharge zone. We also extracted the month of the year from the timestamp as groundwater pumping rates vary monthly. The historical recorded data were partitioned into two separate sets-80% for AI model training and 20% for AI model testing. The data from September 1946 to December 2005 were reserved for training; whereas the data between January 2006 and November 2020 were used for testing the predictive capability of the trained AI models.

AI model development for GWL predictions
The climatic features have a variety of complex nonlinear interactions with the GWL (m). To capture such nonlinearities, we applied tree-based boosting [42] ensemble techniques. Furthermore, because there is a strong linear correlation between GWL lag1 (m) and GWL (m) (section 3.1), we compared the performance of an ordinary least square linear regression model with the tree-based boosting ensemble models. We found the ordinary least square linear regression model predicted GWL (m) as accurately as the tree-based boosting ensemble model using the test data between January 2006 and December 2020. We also found that the linear regression model predicted more accurately the GWL (m) between January 2006 and December 2020 under the RCP 4.5 scenario, whereas the tree-based boosting ensemble model performed better under RCP 8.5 scenario (figure 4). The mathematical descriptions of these models are provided next.

Linear regression model
The ordinary least square linear regression fits a linear model to minimize the residual sum of squares between the targets and the predictions by linear approximation. It is mathematically defined as, where Y is the target GWL vector, X is the matrix of independent (predictor) variables, W is a vector of regression coefficients, and b is a vector of biases.

Extreme gradient boosting (XGBoost) model
XGBoost is a variant of tree-based boosting algorithm. Conceptually, XGBoost learns the functional relationship f between the predictor variables X and target Y through an iterative process in which the individual trees are sequentially trained on the residuals from the previous tree. Mathematically the predictions from the trees can be expressed as, whereŶ is the predicted GWL and n is the total number of functions learned by n numbers of trees. The following regularized objective ζ (φ) is minimized to learn the set of functions f k and where l is a differentiable convex loss function that measures the difference betweenŶ i (prediction) and Y i (target). Ω is the extra regularization term that penalizes the growing of more trees in the model to prevent complexity and thus, reduce overfitting. λ is a penalty parameter, and ∥ w ∥ is the vector score on the leaves. When Ω = 0, the model becomes equivalent to the traditional gradient tree boosting.
In this research, all parameters of the XGBoost model were tuned using a three-fold cross validation process, which results in a smoother and generally better model. The tuned model was tested on an unseen portion (20%) of the original dataset and the performance was measured using the coefficient of determination (R 2 ) and the root mean square error (RMSE) metrics, in whichŶ i are the predictions, Y i are the targets, n is the total number of datapoints in the testing set, 1 ⩽ i ⩽ n, and µ is the average of the targets in the testing set.

Explaining the GWL predictions
Unraveling the underlying reasons why an AI model makes certain predictions is a key challenge in the hydroclimatic domain and is likely to encourage the adoption of AI by practitioners and stakeholders for high stake hydrological decisions geared towards long-term sustainability. Recent research showed that tree-based interpretable models integrated into an XAI framework provide new knowledge and insights about the evapotranspiration process (e.g. transition from low to high evapotranspiration as a function of combined critical values of daily solar radiation, temperature, and relative humidity) without sacrificing the predictive accuracy on structured hydroclimatic datasets [21]. XAI engenders reliability of predictions and generates new insights to decisionmakers. Although tree-based models are inherently simple and interpretable, their complexity increases when ensembled using the boosting technique. Thus, we applied a post-hoc model agnostic representation of feature importance, where the impact of each feature on the model is calculated using a tool called SHAP based on Shapley values [43][44][45][46]. In other words, SHAP integrated with the AI models yields the XAI framework to reveal how much each predictor contributes-either positively or negatively-to the predicted GWL (m) and the average marginal contribution of each predictor value to the corresponding predicted value. The XAI-generated explanations can be categorized as global (i.e. summarized relevance of the input features in the model) or local (i.e. based on individual predictions). The Shapley value is the average marginal contribution of each feature value across all possible combinations of features. The features with large absolute Shapley values are deemed important. To obtain the global feature importance, the absolute Shapley values are averaged for every feature across the data and sorted in decreasing importance prior to plotting. Each point on the plot represents a Shapley value for individual features and instances. The position on the x-and the y-axis is determined by the Shapley values and the feature importance, respectively, and a color scale is used to depict the feature values from low to high. Thereby, enhancing the model's explainability by reporting the degree of importance of the features and the impact of the individual feature values on the model's predictions. We also provide a rich summary in figure 2 by combining local explanations from the SHAP tree explainer algorithm to interpret the output of ensemble tree-based AI models across the entire dataset.

Modelling historical data
Analysis of the historical data demonstrated that the semi-arid study site experienced the worst drought in hydrologically recorded history in the 1950s [27], which resulted in large effective precipitation (P (mm)-ET (mm)) deficiencies for the aquifer recharge, causing a severe hydrological drought (supplementary figures 1-3). A baseline linear regression model and the XAI, trained with weekly hydroclimatic data from September 1946 to December 2005, are capable of accurately predicting (with R 2 : 0.97 and 0.95, respectively) the weekly GWL (m) from January 2006 to November 2020 during the model testing phase and also disclosing the influence of the hydroclimatic features on the GWL (m) predictions (figures 2(A) to (F)). The XAI revealed that the first lag of the GWL (GWL lag1 ) (m), first lag of the P (P lag1 ) (mm), T max ( • C), and P (mm) are the key determinants of the GWL (m) at the next point in time (figure 2(F)). The XAI further provides information on the average marginal contribution of each predictor value (for the key features) to the corresponding predicted GWL (m) value (figures 2(G)-(J)). The average marginal contribution of GWL lag1 (m)-denoted by the SHAP values on the y-axis-is linearly related to the GWL lag1 (m) values on the x-axis, whereas the average marginal contributions of P lag1 (mm), T max ( • C), and P (mm) are nonlinearly related to their corresponding values (figures 2(G)-(J)). Interestingly, the critical inflection point for the T max ( • C) is ∼30 • C, at which the corresponding SHAP values start following a decreasing trend (figure 2(I)). From this, we infer that declines in GWL (m) are more likely when the weekly-averaged T max ( • C) is ⩾30 • C. The 30 • C threshold appears to be well-representative for the declines in GWL (m) at the semi-arid study site as high T max ( • C) typically occurs in May through September, which results in high ET (mm) while the aquifer is pumped more heavily under typical precipitation conditions from May through July to meet irrigation and municipal demands [47].

Uncertainty estimation
We employed a three-phase validation process (figure 3) to ensure that the predictive GWL (m) models are accurate and trustworthy. First, to ensure that the downscaled MACA data can be reliably used to predict the future, we compared the historical MACA data (D1) with the recorded historical climate data obtained from NOAA (D2) between January 1950 and December 2005. As illustrated in figure 3(A), the historical MACA dataset is statistically representative of the semi-arid study site (with R 2 values of 0.999, 0.998, and 0.996 for T min ( • C), T max ( • C), and P (mm), respectively). Secondly, to ensure that the model can predict the future with high accuracy, the recorded climate data (D2) and recorded GWL (m) data (D3) over the second overlapping time period from September 1946 to November 2020 were used to develop the predictive XAI models, in which the data between 2006 and 2020 were hidden initially during model training and later used to test the predictive performance on the unseen test data. The results revealed that the XAI models accurately predicted the overall trend of the recorded GWL (m) in the test dataset ( figure 3(B)). The accuracy of the predicted values on the unseen test data (R 2 = 0.97 with linear regression and 0.95 with XGBoost) shows that the models can predict the future GWL (m) with high accuracy and fidelity. Lastly, after integrating the GWL models with the MACA projected climate data, the predicted GWL (m) data and the measured GWL (m) data (D3) for the overlapped period from January 2006 through December 2020 were compared. The predicted weekly GWL (m) under the RCP 8.5 scenario shows better correlation (R 2 = 0.95 with XGBoost) with the recorded GWL (m) data from January 2006 to December 2020 than under the RCP 4.5 scenario (R 2 = 0.9 with linear regression and 0.822 with XGBoost)-( figure 3(C)). We also notice that the XGBoost is excellent at predicting the GWL (m) in the lower ranges whereas, the linear regression model has over-predictive tendencies ( figure 3(C)). This three-phase validation process provides us with a good understanding of the uncertainties in the model and input data while ensuring that the predictive models are trustworthy.

Future GWL projections and predicted reductions in groundwater withdrawals
The input data used for the projection of weeklyaverage GWL (m) for the period of January 2021 through December 2100 include the first lag of GWL (m) and weekly average projected climate data statistically downscaled from CMIP5 models through the MACA method. The projected climate variable involves weekly total P (mm) and its first lag, in addition to weekly-average T min ( • C) and T max ( • C) for the period of January 2021 through December 2100. These input variables were chosen based on SHAP analysis in figure 2(F). Using these input data, GWL (m) projections were performed using the validated and trained linear regression and XGBoost models (figure 3) for the period of 2021-2100, following the same procedure in the phase 3 validation.  (table 2). Moreover, based on the TCPM and projected GWL (m) under RCP 4.5 and RCP 8.5, the 1950s hydrological drought (or worse) would likely Table 2. Comparison between the descriptive statistics of the historical and projected weekly P (mm), Tmax ( • C), and GWL (m). The associated uncertainties based on the root mean squared error are computed to be within 1.51 m (using linear regression) and 1.07 m (using XGBoost) under the RCP 4.5 and RCP 8.5 scenarios, respectively. The summarized GWL projections from December 2020 to December 2099 under RCP 4.5 and RCP 8.5 are obtained from the linear regression and XGBoost models, respectively.   supplementary note (B)). Interestingly, although increased mean and median P (mm), relative to historical data, are projected under RCP 8.5, future hydrological droughts under RCP 8.5 are more likely and may have longer duration and higher frequency than the historical records if mitigation and adaptation measures are not considered. Therefore, the notion that increased P (mm) under global warming and climate change may eliminate the risk of hydrological droughts could be misleading. It is necessary to examine concurrent hydroclimatic conditions in warm-wet futures in different regions across the world, especially in the semi-arid regions to develop robust adaptive water resources management plans that are informed by likely future climate impacts on groundwater recharge. As compared to RCP 4.5, likelihood of more frequent 40% or more reductions in groundwater withdrawals (CS4 and CS5 under the TCPM) at the semi-arid study site under RCP 8.5 in 1-5 future decades could pose great risk for the sustainability of the groundwater consumptive use and environmental flows in the absence of mitigation and adaptation measures (figures 4(B)-(H); supplementary note (B)). Specifically, 40% or more mandated reduction in withdrawals 2.5%-71% of the times in each decade from 2020 to 2100 under RCP 8.5 using the XAI model was found to be significantly higher than the historical decadal frequencies calculated using the TCPM (supplementary figure 11). Such high-frequency cutbacks in the groundwater withdrawals (supplementary note (B)) could exert increased stress on the water and food security, natural capital, and economic welfare at the semi-arid study site, which has already experienced a 350% population growth since the 1950s (supplementary figure 12) and continues to grow. Hence, these projections underscore the need to reevaluate the adaptation and mitigation strategies, including groundwater and habitat conservation plans [21], to sustain growing water demands while maintaining environmental flows to protect the biodiversity of groundwater-bound species under future climates. The mandatory groundwater and habitat conservation plans currently in effect at the study site are not typically in effect in many large aquifers across the globe, making those aquifers highly vulnerable to climate change.

Exploring the reasons behind the severe drought projections
Using the XAI, we can interpret potential reasons behind the counterintuitive result of severe future droughts predictions despite higher P (mm) based on historical TerraClimate data between January 1958 and December 2019. This phenomenon likely occurs due to higher ET (mm) and lower soil moisture (SM (mm)) under a much warmer future ( figure 5(A)). An initial question when considering the results is, 'Why does the XAI framework project future severe droughts comparable to the 1950s severe drought event although higher P (mm) has been projected for the EAR region from the downscaled CMIP5 MACA data?' The XAI projected future severe droughts are different from the historical 1950s drought, in which hydrological drought occurred during a seven-year long period of low P (mm). The SHAP analysis demonstrates that T max ( • C), which is a key influencer of the GWL (m), is assigned positive SHAP values by the XAI for T max ( • C) below ∼30 • C and large negative SHAP values with T max ( • C) over ∼30 • C ( figure 5(B)). Thus, indicating that high T max ( • C) can result in significantly lower GWL (m) in semiarid regions, especially under the RCP 8.5 scenario.
In an ideal environment, P (mm)-ET (mm) determine the percolating water volume potentially recharging the aquifer. However, higher P (mm) (input) is generally associated with higher ET (mm) (output) in figure 5(A). Simply put, the projected ET (mm) is large enough to offset the effects of higher P (mm) in the future. In a more complex environment, another consequence of a high T max ( • C) that must be considered is a lower SM (mm). Instead of the P (mm) fully percolating through the soil to raise the GWL (m), some residual P (mm) is held by the dry soil until soil field capacity (i.e. upper limit of water capacity of soil) is reached. As the T max ( • C) increases further, the residual SM (mm) evaporates, resulting in Figure 5. SHAP analysis of the XAI models with historical data (January 1958-December 2019) from TerraClimate that illustrates why the XAI framework projects future severe droughts. (A)-Higher precipitation (P (mm))-along the x-axis-events accompanied by higher evapotranspiration (ET (mm)) depicted by the red dots, which suggests that the ET (mm) is large enough to offset the effects of higher P (mm) and thus, leading to lower GWL (m) in the future. (B)-Higher temperature (Tmax ( • C))-along the x-axis-leading to lower soil moisture (SM (mm)) depicted by blue dots and consequently lower GWL (m), which implies that instead of the P (mm) fully percolating through the soil to raise the GWL (m), some residual P (mm) is held by the dry soil until soil field capacity is reached.
even drier soil with the potential to absorb increasingly more P (mm) and thus, lowering the GWL (m) ( figure 5(B)). The double-devil effect of increased ET (mm) and reduced SM (mm) can be expected to occur in hydro-climatically similar groundwaterdependent regions around the world [48], potentially exacerbating the water sustainability challenges in a warm-wet future.

Discussion and conclusions
Knowledge of temporal distribution and the likelihood of recurrence of severe hydrological droughts is crucial to our understanding of groundwater depletion to plan effective adaptation and mitigation strategies in the face of future climatic uncertainties. We introduce two different modeling techniques for hydrological drought predictions: (a) a simple, yet interpretable linear regression that can predict the GWL with desirable accuracies on the testing datasets; and (b) a relatively complex tree-based ensemble model (XGBoost) integrated with an XAI tool that performs adequately well to forecast the long-term GWL (m) in addition to unveiling new insights about the linear and nonlinear dependencies and interaction between the independent and dependent variables, which enhances the fidelity and interpretability of the model. Here, we do not claim that the XGBoost model integrated with an XAI tool is better than the linear regression model or vice-versa, but the use of these two models with high prediction accuracies is a promising approach to constrain the uncertainties associated with the projected hydrological droughts. For example, the models project severe hydrological droughts (comparable to or worse than the 1950s drought at the semi-arid study site) in 0-3 decades-depending on the model (linear regression or XGBoost)-under RCP 4.5 in the absence of additional adaptation and mitigation measures. Under RCP 8.5, however, severe droughts are more likely in 1-5 decades due to comparably higher T max ( • C) in the second half of the 21st century, although the average P (mm) is projected to be higher in the warming future compared to the past. Under the worstcase scenario (RCP 8.5) and without consideration of the effects of mitigation requirements, permitted groundwater withdrawals would be required to be reduced by at least 40% for >50% of the time in 3 of the next 8 decades under the current TCPM strategy.
Thus, under this scenario as modeled, projections where T max exceeds 30 • C for extended times may represent a transition to a tipping point after which sustainability of groundwater for consumptive use is at much greater risk. This analysis underscores the need for a more thorough examination of climate change in semi-arid regions and the consideration of additional mitigation strategies and water sources to build groundwater resilience. It would be at the discretion of water managers which model and scenario to consider for long-term future decision-making to ensure the sustainability of precious groundwater resources and protect the fragile groundwaterbound ecosystems. For conservative decision making (in terms of climate-resilient sustainability without relying on political motivation), they may consider the projections and interpretations based on the XGBoost model integrated with an XAI tool under RCP 8.5. Alternatively, a less risk-averse trajectory can be planned in accordance with the linear regression projections under RCP 4.5. In either case, the results from these models correspond to the baseline cases to further improve, develop, and test the efficacy of groundwater and habitat conservation plans to mitigate climate change-induced stress on groundwater resources in large karstic aquifer systems.
Although noninterpretable AI models (e.g. deep learning models) were reported to be useful for the environment and water management decision-making [49], their internal mechanics, calculations, and underlying basis for predictions and decision-making are not transparent to users; therefore, often considered as 'black-box' models [19], which could limit their use in practice. On the other hand, unraveling the underlying reasons why an AI model makes certain predictions is a key challenge in the hydroclimatic domain and is likely to encourage the adoption of AI by practitioners and stakeholders for high stake hydrological decisions geared towards long-term sustainability. XAI engenders reliability of predictions and generates new insights to decision-makers, including the critical T max ( • C) inflection point beyond which groundwater depletion is triggered despite increased average P (mm). The framework also helps illustrate the nonlinear relationships between P (mm) and ET (mm) and T max ( • C) & SM (mm), facilitating a better understanding of the hydroclimatic process, which are otherwise impossible to unveil even with the inherently interpretable linear regression methods. Although focused aquifer recharge along the streams or depressions typically occurs during major storm events, the XAI indicates that diffuse recharge (distributed recharge over large areas between the streams) occurs when weekly-average T max ( • C) is below 30 • C. Diffuse aquifer recharge, however, is expected to decrease in a warming future when the frequency of weeklyaverage T max ( • C) > 30 • C increases. Based on the data used in our analysis, future extreme P (mm) events (P max ), especially under RCP 4.5, could result in higher maximum GWL (m) than the historical maximum GWL (m), which can be attributed to larger focused recharge along the streams during big storms. However, because the mean and median GWL (m) under RCP 4.5 and RCP 8.5 in the warming future are projected to decrease, the increased focused recharge under extreme P (mm) events would not be sufficient to counterbalance the decreases in diffuse recharge in a warming future with more frequent weekly-average T max ( • C) > 30 • C. These non-linear GWL-climate dynamics call for climate change-informed development of the ecosystemcentric groundwater management plans to continue to protect the threatened and endangered endemic groundwater-bound species into the future. The XAI model was shown, in this paper, to be a trustworthy and accountable predictive tool to capture such nonlinear dynamics.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.