Effects of climate and irrigation on GRACE-based estimates of water storage changes in major US aquifers

Understanding climate and human impacts on water storage is critical for sustainable water-resources management. Here we assessed climate and human drivers of total water storage (TWS) variability from Gravity Recovery and Climate Experiment (GRACE) satellites compared with drought severity and irrigation water use in 14 major aquifers in the United States. Results show that long-term variability in TWS tracked by GRACE satellites is dominated by interannual variability in most of the 14 major US aquifers. Low TWS trends in the humid eastern US are linked to low drought intensity. Although irrigation pumpage in the humid Mississippi Embayment aquifer exceeded that in the semi-arid California Central Valley, a surprising lack of TWS depletion in the Mississippi Embayment aquifer is attributed to extensive streamflow capture. Marked storage depletion in the semi-arid southwestern Central Valley and south-central High Plains totaled ∼90 km³, about three times the capacity of Lake Mead, the largest US reservoir. Depletion in the Central Valley was driven by long-term droughts (⩽5 yr) amplified by switching from mostly surface water to groundwater irrigation. Low or slightly rising TWS trends in the northwestern US (Columbia and Snake Basins) are attributed to dampening of drought impacts by mostly surface water irrigation. GRACE satellite data highlight synergies between climate and irrigation, resulting in little impact on TWS in the humid east, amplified TWS depletion in the semi-arid southwest and south-central US, and dampened TWS depletion in the northwest and north-central US. Sustainable groundwater management benefits from conjunctive use of surface water and groundwater, inefficient surface water irrigation promoting groundwater recharge, efficient groundwater irrigation minimizing depletion, and increasing managed aquifer recharge. This study has important implications for sustainable water development in many regions globally.


Introduction
Water sustainability is a critical issue globally because of the importance of water security for humans and ecosystems [1,2]. Water sustainability is strongly linked to human water use and climate extremes (droughts and floods) [3,4]. Water-related disasters have impacted ∼4.2 billion people since 1992, representing the most economically destructive (∼1.3 trillion US dollars, 63% of all damages) of all natural disasters [5]. Unsustainable groundwater (GW) development with human GW use exceeding recharge rates occurs in regions with ∼1.7 billion people [3]. One of the overarching goals of the United Nations is 'Securing Sustainable Water for All' with particular emphasis on sustainable GW development [3,6]. Irrigation is the dominant water user globally, accounting for ∼70% of water withdrawal and ∼90% of water consumption, with heavy reliance on GW in (semi)arid regions [7,8].
The sustainability of water resources can be evaluated using monitored or modeled water fluxes and/or water storage. However, monitoring data and regional modeling are limited in most regions, including parts of the US. Water storage changes reflect the balance of fluxes:

$$\text{Water Storage Change} = \text{Input flux} - \text{Output flux} \qquad (1)$$

Water storage declines or unsustainable development can result from decreasing input fluxes, increasing output fluxes, or both, considering natural fluxes (e.g. climate forcing) and human fluxes from various processes (e.g. irrigation).
The Gravity Recovery and Climate Experiment (GRACE) satellites have revolutionized water storage monitoring at regional to global scales, providing data on vertically integrated terrestrial total water storage (TWS) changes. These composite TWS values include snow storage, surface water (SW) storage, soil moisture (SM) storage, and GW storage (GWS) [9]. Many studies emphasize GRACE-derived GWS variability, which requires separate estimation of storage changes in the other components. Previous studies have related changes in GRACE TWS to climate variability/change, GW use, or both in 34 regions globally that have large trends [9]. GRACE data have been used to delineate GW depletion in the North China Plain [10], NW India [11,12], and US aquifers [13,14]. Hydraulic connections between SW and GW in many regions underscore the importance of managing both of these conjunctively to maintain streamflow for aquatic ecosystems [15,16].
A number of factors can contribute to water storage changes: (a) climate (arid versus humid) and climate extremes (droughts and floods), and (b) human intervention through water use (often dominated by irrigation), source of water use (SW, GW), and surface reservoir management.
Climate controls hydrologic systems with GW discharging to SW in humid regions, whereas SW often recharges GW in arid regions. Climate extremes are generally more prevalent in arid areas with many longer-term droughts than in humid regions. If there is no human intervention in a region, then any water storage changes should reflect climate variability. Climate can impact water storage changes directly through fluxes (precipitation (P), evapotranspiration (ET), runoff (R_off), surface infiltration (I), and GW recharge (GWR)) or indirectly through changes in human water use in response to climate extremes. Floods generally increase (droughts generally decrease) storage (SW, SM, and GWS) through increased (decreased) fluxes (P, R_off, I, and GWR), although these generalizations may not apply everywhere.
Indirect linkages between climate and GW storage can occur through human water use, particularly when climate extremes lead to changes in irrigation water demand or sources of irrigation (SW or GW), resulting in amplification or dampening of direct climate-driven storage changes. Irrigation is the dominant water use in the US, accounting for 63% of freshwater use (excluding power generation) [17]. The source of irrigation water, GW or SW, is critical because GW irrigation amplifies drought impacts by reducing GWS, whereas SW irrigation generally dampens drought impacts by increasing GW recharge from irrigation return flows. However, this is not necessarily the case from a producer's perspective, as access to GW can mitigate drought impacts for producers. Humans can also affect TWS variability by constructing and managing surface reservoirs and through managed aquifer recharge (MAR). Difficulties in attributing causes of water storage changes arise in regions where both climate extremes and human intervention are prevalent. A recent study emphasized the importance of climate on GWS trends in the US with GW use contributing less than 25% to GWS trends [18]. Synergies between climate extremes and human water use have been recognized in some recent studies emphasizing the importance of climate-driven human water use [19,20].
The objective of this study was to address the following questions: (a) What are the factors controlling TWS variability as estimated from GRACE data? Natural climate variability, human intervention approximated by irrigation water use, or both? (b) How can we use insights from GRACE data to better inform sustainable management of water resources, particularly GW resources?
A flow chart describes the data sources and approaches used (figure 1). This is the first detailed analysis of linkages between GRACE TWS variability, climate, and irrigation water use for 14 major aquifers throughout the US (figures 1, 2 and S1). Novel aspects of this study include: (a) detailed analysis of the severity of TWS trends in the major US aquifers using different metrics; (b) comparison of TWS variability to climate variability based on precipitation and US Drought Monitor (USDM) data; (c) in-depth evaluation of impacts of irrigation water use and source (SW and GW) on TWS variability during dry and wet climate cycles; and (d) extension of water storage records over several decades using output from regional GW models.

Figure 1 (caption, partially recovered): […] [21]. Climate forcing was based on the US Drought Monitor (USDM). Irrigation water use included the volumes at 5 yr intervals and irrigation sources (surface water, SW, and groundwater, GW) and irrigation efficiency (table S15). Management strategies to increase sustainability include conjunctive use of SW and GW, with inefficient SW irrigation (mostly flood irrigation) and efficient GW irrigation, MAR, and irrigation demand reduction for systems with only access to GW.
In addition, this study leverages a recent study that assessed the reliability of GRACE-derived GWS variability through detailed comparisons with GW-level monitoring data and regional and global models in major US aquifers [21]. The results of this companion study show that TWS and GWS time series plot very close to each other for most aquifers, indicating that GWS is the dominant contributor to long-term variability in TWS in most systems, with limited contribution from snow, SM, and reservoir storage, except for the Powell and Mead reservoirs in Arizona. There was good correspondence between GWS trends from GRACE and those from regional models for most aquifers, with the exception of the Mississippi Embayment aquifer. This companion study forms the foundation for the current study, which examines the causes of long-term TWS variability, emphasizing climate variability and human water use, particularly irrigation. While we focus on the current climate in this analysis, we recognize the importance of climate change, with megadroughts projected for the Southwest and High Plains regions of the US in the latter half of the 21st century [22,23]. We examined various approaches to more sustainable water management, particularly GW management, based on insights from GRACE data with implications for critically stressed aquifers globally.

Materials and methods
We selected 14 major aquifers throughout the US that are generally intensively monitored and modeled by the US Geological Survey (figure 2). These aquifers are described in supporting information (SI), section 1 (available online at stacks.iop.org/ERL/16/094009/mmedia). Water storage changes reflect the balance of fluxes at the land surface in regional models, as follows:

$$\Delta\mathrm{TWS} = P + \mathrm{Irrig.} + Q_{\mathrm{on}} - Q_{\mathrm{off}} - ET - \mathrm{GWP} \qquad (2)$$

where P is precipitation, Irrig. is irrigation return flow, Q_on and Q_off represent surface and subsurface (GW) flow into and out of the system, respectively, ET is evapotranspiration, GWP is total GW pumpage, including irrigation, and ∆TWS is the change in TWS [25].

Water storage from GRACE satellite data
Raw time series of ∆TWS from GRACE (TWSA_Raw) were disaggregated into long-term (linear trend + interannual variability), annual, and residual (mostly sub-annual) variability using Seasonal Trend decomposition using Loess (STL) (SI, section 2.1) [26]:

$$\mathrm{TWSA}_{\mathrm{Raw}} = \mathrm{TWSA}_{\text{Long-term}} + \mathrm{TWSA}_{\mathrm{Annual}} + \mathrm{TWSA}_{\mathrm{Residual}} \qquad (3)$$

Linear trends were fit to the long-term variability (TWSA_Long-term) using a nonparametric regression tool (e.g. Sen slope) [27], and the remaining long-term signal reflects interannual variability (equation (4)):

$$\mathrm{TWSA}_{\text{Long-term}} = \mathrm{TWSA}_{\text{Linear trend}} + \mathrm{TWSA}_{\mathrm{Interannual}} \qquad (4)$$

Figure caption (partially recovered): […] figure S5. The GRACE data are provided in table S7, CPA data in table S10, and drought data in table S12.

This study focuses on long-term (trend + interannual) variability in TWS based on the ensemble mean of CSR-M and JPL-M solutions (figure 2, tables 1 and S1(c)). While many GRACE studies emphasize linear trends, the 15 yr GRACE record is relatively short for estimating trends [28]. We assessed the robustness of the GRACE apparent trends against current and historical natural variability at interannual and multidecadal scales using two metrics: (a) the goodness of fit of the linear trend (coefficient of determination of the TWS trend, R²) relative to current interannual variability and (b) the severity of the current TWS trend relative to historical natural multidecadal variability (1901-2014) (trend to interannual variability ratio, TIVR) [29,30]. The TIVR was calculated from the GRACE TWS annual trend multiplied by the GRACE period (15.25 yr) and divided by the standard deviation (SD) of the reconstructed climate-driven TWS variability (1901-2014) using precipitation and temperature forcing data [29]. TIVRs ranging from ±2 to ±3 (i.e. trends greater than 2-3 SD of interannual variability) are considered extreme, whereas TIVRs outside ±3 are considered exceptional and very unlikely to reflect natural climate variability [30].
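The trend metrics described above can be sketched in a few lines of Python. This is a minimal illustration rather than the processing used in the study: the function names, the use of the sample standard deviation, the way R² is computed from a Sen-slope line, and the label for TIVRs below the ±2 threshold are all assumptions made here for clarity.

```python
from itertools import combinations
from statistics import mean, median, stdev


def sen_slope(times, values):
    """Sen (Theil-Sen) slope: the median of all pairwise slopes."""
    slopes = [
        (v2 - v1) / (t2 - t1)
        for (t1, v1), (t2, v2) in combinations(zip(times, values), 2)
        if t2 != t1
    ]
    return median(slopes)


def trend_r2(times, values):
    """R^2 of a Sen-slope line (assumed definition, not from the paper).

    The intercept is taken as the median of (value - slope * time).
    """
    slope = sen_slope(times, values)
    intercept = median(v - slope * t for t, v in zip(times, values))
    fitted = [intercept + slope * t for t in times]
    ss_res = sum((v - f) ** 2 for v, f in zip(values, fitted))
    v_bar = mean(values)
    ss_tot = sum((v - v_bar) ** 2 for v in values)
    return 1.0 - ss_res / ss_tot


def tivr(annual_trend, period_years, reconstructed_tws):
    """Trend to interannual variability ratio.

    Annual trend times record length, divided by the SD of the
    reconstructed climate-driven TWS series (sample SD assumed).
    """
    return annual_trend * period_years / stdev(reconstructed_tws)


def classify_tivr(value):
    """Apply the +/-2 and +/-3 thresholds quoted in the text."""
    magnitude = abs(value)
    if magnitude > 3:
        return "exceptional"
    if magnitude >= 2:
        return "extreme"
    return "below extreme threshold"  # hypothetical label


# Hypothetical numbers, for illustration only.
ratio = tivr(annual_trend=-2.0, period_years=15.25,
             reconstructed_tws=[-10.0, 0.0, 10.0])
print(round(ratio, 2), classify_tivr(ratio))
```

A Sen slope is less sensitive to single outlying months than an ordinary least-squares fit, which is one reason a nonparametric estimator suits a short record.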

Climate data
Monthly precipitation data were derived from the PRISM (Parameter-elevation Regressions on Independent Slopes Model) climate data (www.prism.oregonstate.edu/). The gridded (4 km) PRISM precipitation data were aggregated to the aquifer scale using aquifer polygons. The precipitation time series was analyzed using STL, similar to the TWS variability. Anomalies were calculated by subtracting the long-term mean over the GRACE period, and the anomalies were accumulated between 2002 and 2017 to determine the cumulative precipitation anomaly (CPA) (SI, section 3). Drought data were derived from the USDM using aquifer polygons, with details in SI, section 3. TWS interannual variability was compared with CPA and USDM data using Pearson correlation.
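The CPA and correlation steps can be illustrated with a short sketch. It assumes the input is an aquifer-averaged precipitation series covering the period of interest; the function names and the sample values are hypothetical.

```python
from itertools import accumulate
from math import sqrt
from statistics import mean


def cumulative_anomaly(precip):
    """Cumulative precipitation anomaly (CPA).

    Subtract the mean over the record from each value, then
    accumulate the anomalies through time.
    """
    p_bar = mean(precip)
    return list(accumulate(p - p_bar for p in precip))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x_bar, y_bar = mean(x), mean(y)
    cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    var_x = sum((a - x_bar) ** 2 for a in x)
    var_y = sum((b - y_bar) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical precipitation values (mm), for illustration only.
cpa = cumulative_anomaly([50, 70, 30, 50])
print(cpa)
```

Because the anomalies are taken relative to the mean of the same record, the CPA returns to approximately zero at the end of the record; its shape in between is what gets compared with interannual TWS variability.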

Regional groundwater models
Regional GW models were used to assess GWS variability over much longer time periods than the GRACE record to put the GRACE data within a longer-term context. The water balance applied to aquifers is as follows:

$$\Delta\mathrm{GWS} = R - D - \mathrm{GWP} \qquad (5)$$

where R is GW recharge, D is GW discharge to streams, springs, and ET, GWP is total GW pumpage, and ∆GWS is change in GWS [32]. Regional GW models have been developed for seven of the 14 major aquifers, including the California Central Valley Hydrologic Model (CVHM) [33,34], Columbia Plateau Regional Aquifer System (CPRAS) [35], Eastern Snake River Plain Aquifer [36], Northern High Plains [37], Southern High Plains [38], Mississippi Embayment Regional Aquifer System (MERAS 2.1) [39], and a portion of the Texas Gulf Coast, the Houston Area Groundwater Model (HAGM) [40] (SI, section 4). These comparisons allow us to further evaluate the persistence of GRACE-derived TWS trends.
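As a rough illustration of the aquifer water balance, the sketch below assumes the balance takes the form ∆GWS = R − D − GWP, with pumpage (GWP) included as an output term. Both that form and all numbers are assumptions for illustration, not output from any of the regional models.

```python
from itertools import accumulate


def storage_change(recharge, discharge, pumpage):
    """Annual change in groundwater storage (assumed form: R - D - GWP)."""
    return recharge - discharge - pumpage


def cumulative_storage(recharge, discharge, pumpage):
    """Accumulate annual storage changes into a storage anomaly series."""
    annual = [
        storage_change(r, d, p)
        for r, d, p in zip(recharge, discharge, pumpage)
    ]
    return list(accumulate(annual))


# Hypothetical fluxes (km^3 per year), for illustration only.
# Year 1: pre-development, recharge balances discharge.
# Year 2: pumping starts and is supplied from storage.
# Year 3: discharge to streams falls by the pumped amount (capture),
#         so storage stops declining.
print(cumulative_storage(
    recharge=[10, 10, 10],
    discharge=[10, 10, 6],
    pumpage=[0, 4, 4],
))
```

The third year shows how pumpage that is offset by reduced discharge to streams can leave storage unchanged, which is the capture mechanism invoked for the Mississippi Embayment.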

Irrigation water use
Irrigation water use data were obtained from the US Geological Survey (USGS) National Water-Use Science Project (NWUSP) that compiles data on water use (withdrawals) for different sectors by county in the US, every 5 yr since 1985 (table S15). Additional details are provided in SI, section 5.

Results and discussion
The main findings from this study are as follows, with more details provided in later sections. Long-term variability (trend + interannual variability) in GRACE-derived TWS is the focus of this study and is dominated by interannual variability in most aquifers (figures 2 and 3; table 1). Linear trends are generally within ±2-3 SDs of reconstructed interannual TWS variability (1901-2014) in 10 out of the 14 aquifers (TIVRs ⩽ 2-3), indicating that the calculated apparent trends in most of these aquifers reflect natural interannual variability and are unlikely to persist into the future (table 1, figure 3). Linear trends do not provide a good fit for TWS variability in many aquifers (low R² values) (table 1).
There are distinct differences in TWS variability between the humid east and semi-arid west, with the 98th meridian often considered the boundary between these regions (figure 3) [41]. The humid east is characterized by large interannual TWS variability with stable or slightly increasing apparent linear trends related to low drought intensities (Accumulated Drought Severity and Coverage Index, ADSCI, mostly 37-64; table 1, figures 2 and S5) and variable GW irrigation pumpage that can capture flow from extensive surface water networks, such as in the Mississippi Embayment aquifer [32,42]. In the semi-arid western US, TWS trends differ between the southwest and northwest. Large decreasing trends were restricted to the semi-arid southwest and south-central US, markedly exceeding ±2-3 SDs of interannual variability (TIVRs: −3.5 to −5.6; figure 3), with moderate to high TWS trend R² values (∼0.5-∼0.7) (table 1, figure 2). Impacts of intense droughts in the southwest (up to 5 yr long) were amplified by switching from mostly SW irrigation (wet periods) to increased GW irrigation (drought) in the Central Valley, resulting in long-term net declines in TWS, consistent with regional modeling studies (figure 4) [33]. In contrast, apparent trends in TWS in the remaining aquifers in the northwest and north-central US […] (table 1). Impacts of lower drought intensities in the NW were dampened by more widespread SW irrigation and recent MAR, resulting in limited TWS trends, mostly within ±2-3 SDs of interannual variability (table 1). In summary, GW use in humid regions is generally more sustainable than in arid regions, and GW sustainability can be enhanced in arid regions through conjunctive use of SW and GW and MAR.

Table 1 (caption, partially recovered): Mean annual precipitation (P) in mm yr […]. […] [31] was estimated between interannual TWS variability and cumulative precipitation anomaly (CPA, defined in SI, section 3.1), and between interannual TWS variability and the US Drought Monitor (USDM: D0 through D4). Detailed comparisons of TWSA and different USDM drought categories are included in table S13. Statistically insignificant correlations are bolded based on p value <5% and a 99% confidence interval. Irrig. is the total irrigation from surface water and groundwater derived from US Geological Survey county data in 2010 and 2015 [17]. Supporting data are provided in supporting information. More detailed information is provided in table S1 and in SI.

Human intervention is variable in these humid regions (figure 4). Irrigation water use in the Mississippi Embayment was high, surprisingly similar to or 50% greater than that in the California Central Valley, and was mostly sourced from GW (∼84%-88%). This level of irrigation and GW source would be expected to greatly reduce TWS, as suggested by regional GW models (∼−120 km³ over the 15 yr GRACE period) [39]. The lack of irrigation impact on GRACE TWS trends may result from 90% of GW irrigation pumpage being derived from the shallow Mississippi River Valley alluvial aquifer [43], which is likely well connected to a dense stream network in this humid region (SI, section 6). While storage depletion may have occurred prior to GRACE monitoring (from ∼1980s on), the shallow aquifer may have reached a quasi-equilibrium status with irrigation pumpage linked to water capture (SW, ET) rather than storage depletion. Preliminary results from the new regional model suggest up to 10× less GW depletion than in the earlier Mississippi Embayment model, likely linked to increased recharge and stream capture [44]. The new model has a much denser stream network (∼1000s of streams) relative to the original model (43 of the largest streams) (SI, section 6). Irrigation water use is likely derived from stream baseflow, induced stream recharge, and/or reduced ET rather than GWS. Stream baseflow reduction and drying up of some streams support the interpretation that irrigation pumpage is derived from capture rather than storage [32,42]. Inefficient irrigation, which refers primarily to flood irrigation rather than sprinkler or drip irrigation, accounted for 43%-54% of the irrigated area in the Mississippi Embayment and should also reduce irrigation impacts on TWS changes (table S15(b)) [17].

Western US
Semi-arid regions west of the 98th meridian include a number of aquifers with varying climate forcing and irrigation amounts and sources. Many aquifers receive large water inputs from outside of the aquifers (Central Valley, Arizona Basin and Range, Columbia Plateau, and Snake River Plain), whereas groundwater in the High Plains aquifer is derived primarily from precipitation over the plain. As a result, TWS variability differs markedly between the southern and northern aquifers in the semi-arid western US (figure 2).

[…] (figure S6). The droughts generally ended in floods; for example, a 5 yr drought that began in 2012 ended with severe flooding from atmospheric rivers in early 2017 [46]. Surprisingly, droughts did not increase water demand: irrigation withdrawal remained stable (Sacramento) or decreased by ∼30% (San Joaquin/Tulare) in the 2015 drought year relative to the 2010 wet year, which was related to more than doubling of land fallowing (figure 4) [47]. However, droughts changed the irrigation water source in the San Joaquin/Tulare Basins from mostly SW (64% of total in the 2010 wet year) to mostly GW (74% of total in the 2015 dry year), which amplified drought impacts on TWS trends by decreasing GW storage (table S15). SW diversions from the humid northern Central Valley to the semi-arid southern region were greatly reduced during drought and were replaced with GW irrigation [33].

Southwest and south-central US
Drought impacts also extended into the Arizona Alluvial Basins to the east, with almost continuous drought since 2000 and overall drought intensity (ADSCI: 190) similar to that in the southern Central Valley (table 1) […] (figures 2 and 4). The Central Arizona Project (CAP) aqueduct delivered Colorado River water for irrigation in many Arizona basins. CAP water also contributed to MAR in Active Management Areas, increasing GWS [48]. The CAP SW deliveries totaled ∼25 km³ over the GRACE record (2002-2017) [49] and were fairly stable during the GRACE period, unlike those in the Central Valley. These deliveries contributed to reduced drought impacts in the Arizona Alluvial system. Reservoir management also reduced drought effects, contributing 24% to TWS variability in the Arizona Alluvial system (figure S6). Water was transferred from Lake Powell (Upper Colorado) to Lake Mead in response to the 'Fill Mead First' strategy, and a portion of that stored water was ultimately transmitted to GWS in the Arizona Alluvial Basins [49]. TWS would have declined much more without these management strategies.

[…] (table 1). GW supplied ∼97% of irrigation in the region (figure 4, table S15). TWS declines in the central and southern High Plains were driven primarily by GW pumpage because there is almost no SW available for irrigation. Low recharge rates in these aquifers preclude direct connections between climate and TWS variability [50]. Interannual variability in TWS change is attributed to indirect linkages between climate and TWS variability through variations in irrigation water demand and GW pumping linked to climate variability, as shown in earlier studies in the central High Plains [51]. Previous studies show that GW depletion exceeded recharge by ∼10× in the Central High Plains [52]. TWS declines in Texas aquifers south of the High Plains (Edwards Trinity and Gulf Coast aquifers) were moderate, partly because irrigation was only ∼5%-25% of that in the High Plains aquifer. Trends in these Texas aquifers are within ±2-3 SD of interannual variability (TIVR < 2-3), and low TWS trend R² values (0.03-0.3) indicate predominantly natural interannual variability (table 1). The high correlation between TWS variability and drought (R = 0.62-0.79) indicates that climate is the primary driver of TWS variability in these Texas aquifers.

Northwest and north-central US
In contrast to declining TWS trends in the southwest and south-central US, apparent TWS trends were stable or slightly rising in the northwest (Snake River Plain, 4.4 km³; Columbia Plateau, 9.7 km³, 2002-2017) (figure 2). The TIVR is 3.5 in the humid Columbia Plateau but only 0.7 in the Snake River Plain, indicating that the latter trend may mostly reflect interannual variability (table S1). Drought intensity was low in the Columbia Plateau (ADSCI, 91) and higher in the Snake River Plain (143), but less severe than in the southwest US. Correlation between TWS and precipitation was low in both the Columbia and Snake basins (R = ±0.2) (table 1). Widespread flooding in 2011 may have contributed to increasing TWS [53].
SW irrigation accounted for 68%-72% of total irrigation in the Columbia Plateau and Snake River Plain (table S15(a)), likely dampening drought impacts on TWS changes because of GW recharge from flood irrigation and transmission losses along unlined canals, partially disconnecting storage changes from climate variability (figure 4, table S15). However, irrigation pumpage from deeper confined basalts in the Columbia River Basalt Group is disconnected from the shallow system and would not benefit from SW irrigation. In contrast, the Snake River Plain aquifer is unconfined with strong interconnections between shallow and deeper systems.
Although drought intensity was high in the Upper Colorado (ADSCI, 172), similar to the Central Valley, irrigation was derived primarily from SW (∼97%) and likely dampened drought impacts, resulting in very low apparent TWS trends, similar to interannual variability (TIVR, −0.03) with low TWS trend R² values (0.01) (table 1, figures 2-4).
Further east, the Northern High Plains shows moderately high interannual variability (SD: 42 mm) correlated with the USDM data (R = 0.73) (table 1). The apparent TWS trend (∼22 km³) primarily reflects natural interannual variability because the trend is within ±2 SD of interannual variability (TIVR: 1.5, figure 3) and the TWS trend R² is low (0.3) (table 1). Climate effects on TWS variability may have been partially dampened by SW irrigation, which accounted for ∼20%-30% of total irrigation, with inefficient flood irrigation representing ∼80% of total irrigated area (figure 4, table S15) and recharging aquifers adjacent to the Platte and other rivers, as shown by long-term GW level monitoring [54]. Differences in TWS trends between the Northern (22 km³) and Central and Southern (−40 km³) High Plains may be explained by two factors. First, droughts were shorter in the north (2012-2013) than further south (2011-2015). Second, some SW irrigation and sandier soils in the Northern High Plains (e.g. Nebraska Sand Hills) result in higher recharge and a more dynamic storage response to climate variability, whereas the absence of SW irrigation and more clay-rich soils further south limit GW recharge [55].

Long-term system evolution from regional groundwater models
GRACE-derived TWS changes are restricted to the recent 15 yr period; however, trends in GWS were evaluated over much longer decadal timescales using regional GW models supported by GW level monitoring for seven of the 14 US aquifers (figures 5 and S8). Results from a previous analysis show that GRACE-derived GWS variability compares favorably with regional models for many aquifers, except the Mississippi Embayment, although the overlap period of GRACE and models is limited [21].
In the Central Valley, results from the regional GW model are consistent with GRACE results from this study, showing drought-driven TWS changes amplified by switching from predominantly SW irrigation during wet periods to increased GW irrigation pumpage during drought. Modeled GWS declined by ∼15-20 km³ during each short-term drought (1976-1977; 1999-2003; 2007-2009) and by ∼40 km³ during a 5 yr drought (1987-1992), with only partial recovery during wet periods in the early 1980s and late 1990s [33,34]. The model also shows the impact of the irrigation source shifting from up to ∼70% SW irrigation during wet periods to up to ∼70% GW irrigation during dry periods, amplifying drought impacts on GWS. The model emphasizes the importance of conjunctive use of SW and GW, with pipeline development (⩽1000 km) transferring SW from the more humid north to the semi-arid south, resulting in GWS recoveries in some regions [33]. The importance of conjunctive SW and GW use is highlighted by recent new land subsidence linked to irrigation expansion in areas relying entirely on GW without access to SW [34].
In the Northern High Plains, the regional model shows no net change in GWS from ∼1980 to the mid-2000s, similar to the GW level monitoring data (figure S8(d), table S17) [37,54]. The importance of conjunctive use of GW and SW is evident in modeled and monitored GW-level rises from SW irrigation near rivers (e.g. Platte River) and modeled reductions in baseflow to some streams by up to 50% [56]. The regional model of the Southern High Plains shows ∼350 km³ of GW depletion since the 1950s related to intensive GW irrigation greatly exceeding GW recharge rates [38]. The much greater depletion relative to the Northern High Plains is attributed to ∼10× lower recharge relative to irrigation pumpage, lack of SW for irrigation and related recharge (return flow and leakage from distribution systems), and lower permeability soils in the Southern High Plains [55].
In the northwestern US aquifers, losses to the aquifer from SW irrigation increased GWS by ⩽∼20 km³ from ∼1940 to ∼1970 in the Columbia Plateau, with ∼70%-80% from inefficient flood irrigation [35], and by ⩽∼20 km³ from 1912 to 1950 in the Eastern Snake River Plain aquifer [36]. Increasing GW-based irrigation in the Snake River Plain depleted GWS from an excess of ∼20 km³ in the early 1950s down to ∼6 km³ in the mid-2010s. Recent MAR has partially replenished GWS.
Re-evaluation of the original Mississippi Embayment regional model [44] suggests that the fraction of GW pumpage derived from storage depletion may have been substantially overestimated while the amount derived from capture of stream baseflow, induced stream recharge, and ET, may have been greatly underestimated, particularly in recent decades [39,57]. These GW/SW modeling issues are not as prevalent in other systems where interactions between GW and SW can be monitored or bounded.

Study limitations
The large regional-scale output provided by GRACE is considered an advantage when conducting aquifer- to continental-scale water storage analyses. However, the low spatial resolution of GRACE data (∼100 000 km²) is often viewed as a limitation by hydrologists when evaluating water storage in smaller-scale aquifers and river basins. These regional-scale GRACE data may mask local-scale variations in water storage. The coarse resolution of GRACE data could be partially overcome in the future by supplementing GRACE TWS changes with ground-based gravity monitoring that has much higher spatial resolution (∼100 m), as shown in previous studies [58,59].
This study focused on current and historical climate extremes; however, climate change is also a critical issue with projected megadroughts in the US Southwest and Plains regions that should be addressed in future studies [22,23]. Comparison of GRACE data with climate extremes in this study focuses primarily on droughts and benefits from the detailed data available from the US Drought Monitor. Conversely, comparable data are not available for flooding in the US. The Dartmouth Flood Observatory relies primarily on subjective reporting rather than independent monitoring data. Therefore, it is much more difficult to compare GRACE data with floods than droughts in the studied aquifers.
Irrigation water use is one of the primary drivers considered in this study; however, the data are based primarily on estimates of SW and GW use for irrigation that are provided once every 5 yr for most aquifers.
These are some of the primary limitations of this study; however, they do not impact the main findings of this analysis.

Implications for sustainable water management in the US
GRACE satellites may be extremely valuable in assessing the sustainability of future water management projects designed to resolve spatial and temporal disconnects between water supplies and demands caused by climate extremes, irrigation, and SW availability.
Low regional storage changes in the humid eastern US underscore the importance of high precipitation, low to moderate drought intensities, and extensive perennial stream networks that can be captured even by GW irrigation resulting in more sustainable GW management. However, impacts of GW pumpage on streams need to be considered to maintain environmental flows for healthy ecosystems.
SW irrigation (mostly flood irrigation) has been extremely valuable in recharging GW and increasing aquifer storage in the northwest US, as shown by GW level rises during irrigation development in the early to mid-1900s and by regional models, and is consistent with stable or slightly increasing TWS trends in GRACE data (figure 5) [21]. In Idaho, up to 0.5 km³ yr⁻¹ of Snake River water has been transferred to the Eastern Snake River Plain aquifer within the past few years to promote recharge in unlined canals and adjacent spreading basins (MAR) [60]. In the northern High Plains, pilot studies are transporting excess SW during wet periods in unlined irrigation canals to promote recharge [61]. Although MAR has been practiced in some parts of the Central Valley since the 1960s, water volumes transferred from SW to GW were low (∼14 km³ from 1960 to 2013) and impacts were generally localized [48]. More recent studies have been applying flood MAR in California, capturing excess SW using irrigation infrastructure to flood cropped and fallow fields in winter to promote aquifer recharge [62]. Flood irrigation and MAR were also effective in recharging GW in Arizona Alluvial Basins sourced from the Colorado River [48].

Figure 5 (caption, partially recovered): […] (table S17). The references for these models are listed in table S18.
These approaches counter the mantra of 'more crop per drop' to maximize irrigation efficiency because the latter fails to recognize that losses from inefficient SW irrigation (mostly from flood irrigation) actually recharge GW systems and are somewhat similar to MAR. Some recent studies in California and Texas have estimated how much high-magnitude streamflow (⩾90th-95th percentiles) that would otherwise discharge to the ocean could be captured to recharge depleted aquifers [63,64].
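As a rough illustration of how a high-magnitude streamflow volume might be quantified, the sketch below sums the flow above a percentile threshold in a daily discharge record. It is not the method of the cited studies [63,64]: the function name `high_flow_volume`, the nearest-rank percentile definition, and the choice to count only the flow in excess of the threshold are all assumptions made for illustration.

```python
import math


def high_flow_volume(daily_flow_m3s, percentile=90.0):
    """Return (threshold_m3s, excess_volume_km3) for a daily flow record.

    Assumptions (illustrative only):
    - flows are daily mean discharges in m^3/s;
    - the threshold is the nearest-rank percentile of the record;
    - only flow above the threshold counts as potentially capturable.
    """
    if not daily_flow_m3s:
        raise ValueError("daily_flow_m3s must not be empty")
    if not 0.0 < percentile <= 100.0:
        raise ValueError("percentile must be in (0, 100]")

    ordered = sorted(daily_flow_m3s)
    # Nearest-rank percentile: smallest value with at least
    # `percentile` percent of the record at or below it.
    rank = max(1, math.ceil(percentile / 100.0 * len(ordered)))
    threshold = ordered[rank - 1]

    seconds_per_day = 86_400
    excess_m3 = sum(
        (q - threshold) * seconds_per_day
        for q in daily_flow_m3s
        if q > threshold
    )
    return threshold, excess_m3 / 1e9  # 1 km^3 = 1e9 m^3


if __name__ == "__main__":
    # Hypothetical record: five low-flow days and five high-flow days.
    flows = [10.0] * 5 + [20.0] * 5
    threshold, volume_km3 = high_flow_volume(flows, percentile=50.0)
    print(f"threshold = {threshold} m3/s, excess = {volume_km3:.5f} km3")
```

In practice the capturable volume would also be limited by conveyance capacity, water rights, and environmental flow requirements, none of which this sketch represents.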
Historical GW depletion provides subsurface reservoirs to complement surface reservoirs. The GRACE data show TWS depletion of ∼90 km³ in the southwest and southcentral US (2002-2017; Central Valley, Arizona, and Central and Southern High Plains). Previous studies show that TWS in these systems is dominated by GW [21] and this level of depletion would provide reservoir storage almost 3× the capacity of Lake Mead. Estimated depletion of US aquifers over approximately the last century (1900-2008) from modeling and monitoring data totals ∼1000 km³ [3,65]. Not all of this depleted aquifer storage would be available as some storage is permanently lost because of aquifer compaction (e.g. ∼20% in the Central Valley) [33]. However, additional subsurface storage may be available from natural deep water tables in aquifers in semi-arid regions.
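The storage comparison above can be checked with simple arithmetic. The sketch below is illustrative only: the Lake Mead capacity of 32 km³ is an assumed round value, not a figure from this section, and applying the ∼20% Central Valley compaction loss uniformly to the national total is a simplifying assumption rather than a result of this study.

```python
# Values quoted in the text.
GRACE_DEPLETION_KM3 = 90.0        # southwest and southcentral US, 2002-2017
CENTURY_DEPLETION_KM3 = 1000.0    # US aquifers, 1900-2008

# Assumed values (not from this section; for illustration only).
LAKE_MEAD_CAPACITY_KM3 = 32.0     # assumed round capacity
COMPACTION_LOSS_FRACTION = 0.20   # Central Valley example, applied uniformly


def reservoir_equivalents(depletion_km3, capacity_km3):
    """Express a depleted volume as a multiple of a reservoir capacity."""
    if capacity_km3 <= 0:
        raise ValueError("capacity_km3 must be positive")
    return depletion_km3 / capacity_km3


def recoverable_storage(depletion_km3, loss_fraction):
    """Depleted volume still available after permanent compaction loss."""
    if not 0.0 <= loss_fraction <= 1.0:
        raise ValueError("loss_fraction must be between 0 and 1")
    return depletion_km3 * (1.0 - loss_fraction)


if __name__ == "__main__":
    ratio = reservoir_equivalents(GRACE_DEPLETION_KM3, LAKE_MEAD_CAPACITY_KM3)
    usable = recoverable_storage(CENTURY_DEPLETION_KM3, COMPACTION_LOSS_FRACTION)
    print(f"GRACE-era depletion is about {ratio:.1f} x the assumed Lake Mead capacity")
    print(f"Storage remaining after the assumed compaction loss: about {usable:.0f} km3")
```

Under these assumptions the ratio is a little under 3, in line with the 'almost 3×' stated above, and the upper bound on reusable subsurface storage would be lower than the total historical depletion.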

Implications for other systems globally
Net increases in TWS in the northwest US from GRACE data in this study are consistent with net increases in GWS modeled by the WaterGap Global Hydrologic Model (WGHM) in this region and also in other regions globally where SW irrigation recharges GW (e.g. NW India, SE Asia) [66]. Although numerous GRACE studies delineated GW overexploitation in NW India [12], analysis of earlier data indicates that GW levels in some regions of the Indo-Gangetic Basin rose by median values of ∼20-30 m from leaky SW irrigation canals (1900s-1950s, 1960s), with recent net depletion since the 1980s of ⩽10 m, highlighting the importance of considering recent GRACE data within a longer-term context [67]. Future water management in NW India could move towards more sustainable development by conjunctive use of SW and GW. More recent analysis on the River Ganges in Bangladesh indicates that GW pumping may be enhancing capture of SW by inducing recharge, similar to what may be occurring in the Mississippi Embayment aquifer [68,69]. To increase environmental flows in the Murray Darling Basin, the Australian Government spent almost 6 billion dollars on water infrastructure (e.g. lining irrigation canals and piping irrigation water) [70]. However, failure to monitor and account for irrigation return flows resulted in little improvement of river flows in the basin [70]. These limited examples of the importance of SW irrigation in sustainable development are similar to the recent expansion of MAR in many regions.

Application of findings for sustainable water management
The results of this study indicate that GRACE satellites can be extremely valuable in monitoring regional water storage changes to evaluate sustainable management approaches. GW irrigation pumpage is the primary driver of GRACE TWS declines in the central and southern High Plains. Climate is a major driver of TWS variations in many of the other aquifers, resulting in high interannual water storage variability in response to wet and dry climate cycles. GW irrigation amplifies GRACE TWS changes in response to drought (e.g. Central Valley) but SW irrigation dampens TWS changes (e.g. NW US aquifers) (figures 2 and 5). Therefore, sustainable GW management would benefit from irrigation sourced by SW because it is more renewable than GW, recognizing that inefficient SW irrigation (mostly flood irrigation) can contribute to GW recharge (e.g. Columbia Plateau Aquifer). The use of SW irrigation needs to ensure that it does not negatively impact environmental flow regimes in streams that include preservation of extreme/peak flows. Inter-basin SW transfers to semi-arid regions with limited SW increase opportunities for GW recovery and more sustainable management (e.g. US Central Valley and Arizona Alluvial Basins). Conjunctively managing SW and GW is also beneficial, optimizing their use considering floods and droughts. Inefficient SW irrigation (e.g. flood) and efficient GW irrigation (e.g. drip) should be optimal from a water storage perspective; however, energy consumption for SW pumping and potential contamination during water transfer for inefficient SW irrigation should also be considered. In the past, inefficient SW irrigation (e.g. flood irrigation) recharged aquifers unintentionally. More recently, excess SW is recharged using MAR from high magnitude stream flows, as quantified in California and Texas, in large depleted GW reservoirs that are a legacy of previous overexploitation. 
Systems with little or no SW, such as the Central and Southern High Plains, may decrease aquifer overexploitation and extend aquifer lifespan by maximizing irrigation efficiency and reducing pumpage.