Groundwater recharge rate estimation using remotely sensed and ground-based data: A method application in the mesoscale Thur catchment

Study region: The mesoscale Thur catchment located in the north-eastern part of Switzerland is a study site in which a range of hydrological research has been conducted. Study focus: Groundwater recharge is challenging to quantify due to the complexity of hydrogeological processes and limited observations. Because of its spatiotemporal availability, remotely sensed data present an attractive water management tool. Its application in mesoscale catchments (10 – 104 km2) however, remains limited. This study investigated the use of satellite products used in conjunction with ground-based data to determine spatiotemporal variations in water available for groundwater recharge. Gridded components from readily available precipitation, evapotranspiration, and hydrological discharge data were used to generate spatiotemporal maps over a 20-year period. New hydrological insights for the region: Closure of the water balance found that monthly data displayed a moderate correlation, with improved correlations of obtained seasonal and annual intervals, suggesting that over the long-term the Thur catchment is in a steady state. Maps were generated and trends for the 20-year period assessed. Examination of the gridded water balance components for different hydrological years emphasized the limiting effect of precipitation on recharge. This study highlights the value of remotely sensed data in groundwater recharge estimates, but emphasises the importance of continued ground-based monitoring; the lack of which is a limiting factor in water management where mesoscale catchments are concerned.


Introduction
One of the most important functions of a catchment is its ability to store and release water; a characteristic that can buffer against severe weather events, seasonal changes, and climate variability (Berghuijs et al., 2016;Datry et al., 2017;Staudinger et al., 2017). Groundwater recharge indicates the existence of renewable groundwater resources and is therefore an important component of a catchment's hydrological functioning (Döll and Fiedler, 2008;Jasechko et al., 2014;Mohan et al., 2018). Groundwater recharge varies in space and time making it difficult to measure directly (Minnig et al., 2018;Scanlon et al., 2002;von Freyberg et al., 2015). Climate change, evolving land utilization, abstraction, and anthropogenic impacts are a growing concern when characterizing groundwater bodies and predicting their sustainability (Burri et al., 2019;Condon and Maxwell, 2019;Han et al., 2017;Oki and Kanae, 2006;Sridhar et al., 2014).
In most places, groundwater recharge is an integral part of a catchment's water balance, therefore, monitoring and understanding a catchment's water input and output (or water balance), is essential to the sustainable management of water resources. However, both where field and model based estimates are concerned, groundwater recharge is often considered a residual product of the water balance assessment, accumulating the uncertainties of all other components of the budget (Reinecke et al., 2021), thus reliable estimates are important. In a steady-state system, where net water input equals net water output, the components of a catchment's water balance can be related through an equation, which can vary in its complexity (Healy and Scanlon, 2010). The dominant components which make up a catchment's water balance include precipitation (P) as primary input, and evapotranspiration (ET) as predominant output (Crosbie et al., 2015;Dhungel and Fiedler, 2016;Reitz et al., 2017). However, in mountainous regions, stream or river discharge (Q) which can be further divided into quick, event-based surface flow (or quickflow, Q q ) and typically slow subsurface flow (or baseflow, Q b ) contributions, often makes up the majority of water output (Spreafico and Weingartner, 2005;Viviroli et al., 2007a;Zappa et al., 2017). Discharge (Q), an approximated value of river flow and velocity, typically represents high-resolution measurements at a specific point. However, quantifying subsurface contributions of Q is often problematic as knowledge of hydraulic parameters is compromised due to subsurface heterogeneity and the limited number and depths of measuring locations (Alley et al., 2002).
The water balance components vary in space and time, and although linked through large-scale interrelationships, have very different spatiotemporal characteristics (Creutzfeldt et al., 2014). However, according to Healy and Scanlon (2010), "the universal concept of mass conservation of water implies that water balance methods are applicable over any space and time scale" (pp.15). The major net loss component from a catchment's water balance is generally ET; the sum of evaporation (from ground surface) and transpiration (from plant surfaces). The complex interacting components which result in ET are difficult to quantify, whether via field-based measurements, or via remotely sensed technology (Zhang et al., 2016). As a result, ET typically has the greatest uncertainty of the water balance components (Velpuri et al., 2013), and remains one of the most difficult components to measure accurately (Ferguson et al., 2010;Hirschi et al., 2017).
Over longer time scales, changes in a catchment's water storage (ΔS) are often assumed to be small relative to the volumes of the other water balance components (Han et al., 2020). For catchments with no additional inflow from or outflow to adjacent catchments, via transfer schemes or sub-surface flows, ΔS would equal zero, and the catchment's water balance can be assumed to be in a steady state (Healy and Scanlon, 2010). Subsequently, both within the fields of research and water resource management, it is common to estimate recharge using some form of a water balance equation (Dages et al., 2009;Healy et al., 2007;Rodríguez-Huerta et al., 2020;Tilahun and Merkel, 2009;WFD-CIS, 2016).
In the past, the measurement of water balance components was traditionally limited to point or plot scales (~10 -2 m 2 to ~10 1 m 2 ) measurements of single components (Ruth et al., 2018). Currently, remotely sensed products, including ET derived from computational surface-energy balance models that stem from multispectral satellite-sensed characteristics (i.e. net radiation, surface temperature, and vegetation properties), offer perhaps one of the most attractive global estimates of water balance measurements. As a result, this type of data has been used to improve model predictions and to close the water balance leading to a significant reduction in uncertainty (Anderson et al., 2011;Bastiaanssen et al., 1998;Irmak et al., 2012;Nishida et al., 2003;Tang et al., 2010;Wang and Xie, 2018). Although comprehensive, the available remotely sensed ET data are not without their limitations, and performance of the products can be adversely affected by factors such as topography, vegetation type, input datasets, estimation methods, spatiotemporal scales, etc. (e.g. Li et al., 2018;Lu et al., 2019;Zhao and Liu, 2014).
While temporal and spatial scaling, whether up-or down-scaling, presents an additional challenge in estimating a catchment's water balance (Anderson et al., 2007;Cui et al., 2018;Hong et al., 2009;Kalma et al., 2008), when coupled with ground-based data, remotely sensed observations have been shown to provide a more complete understanding of the hydrological system (Becker, 2006;Gleason and Durand, 2020;Pavelsky, 2014;Vereecken et al., 2015). Both surface and subsurface movement and storage of water are dependent on precipitation, temperature fluctuations, vegetation type and distribution, anthropogenic land and water utilization, as well as topographic, lithological, and soil characteristics (Moeck et al., 2020;Sophocleous, 2002;Wada et al., 2010). As such, remotely sensed (RS) data provides an attractive tool in hydrological studies, particularly because of the long temporal and large spatial availability of many RS products (Alem et al., 2021;Becker and Nemec, 1987;Falalakis and Gemitzi, 2020;Jódar et al., 2018;Martín-Arias et al., 2020;Ollivier et al., 2021Ollivier et al., , 2020Sarrazin et al., 2018).
However, due to the pixel resolution of most RS products, the use of RS data in hydrological studies of mesoscale catchments (typically 10 -10 4 km 2 ) or smaller, is limited (Armanios and Fisher, 2014;Sun et al., 2018). In addition, although RS components for an entire water budget calculation are readily available, reproducible methods for handling the necessary data are not prominent in the literature (e.g. Rajib et al., 2018), hampering the ease of data application. This study investigates the application of satellite derived ET products, used in conjunction with ground-based discharge data, using open source software to determine spatiotemporal water distribution and estimate the volumes of water available for potential groundwater recharge (hereon referred to as groundwater recharge or R) within the mesoscale Thur River catchment in Switzerland.
The Thur River represents a dynamic system free of any major natural or artificial reservoirs, and with its mesoscale catchment (~1700 km 2 ) provides a unique opportunity to explore the potential of using readily available RS data to evaluate monthly, seasonal and annual R for the years 2000 -2019. Previous research in the Thur catchment focused either on localized field-based studies (Chittoor Viswanathan et al., 2016;Kurth et al., 2015;Kurth and Schirmer, 2014;Paillex et al., 2017Paillex et al., , 2005Schirmer et al., 2013;Schneider et al., 2011;Vogt et al., 2011), or involved the use of lumped modelling approaches to simulate its large-scale processes (Abbaspour et al., 2007;Dal Molin et al., 2020;Doulatyari et al., 2017;Rössler et al., 2019;Viviroli et al., 2009). The aims of this study were as follows: i) assess the spatiotemporal variability of gridded water balance components in the Thur catchment for the years 2000 -2019 using open source software and readily available RS ET data in combination with ground-based data, ii) identify variability in environmental processes during different hydrological years (e.g. wet vs. dry years) in the mesoscale Thur catchment, iii) determine water availability in the Thur catchment for potential groundwater recharge, iv) explore the results in terms of spatiotemporal distribution in the Thur catchment, and examine method uncertainties.

Study area and its hydrogeological setting
The Thur River, located in the north-eastern part of Switzerland (Fig. 1), has an approximate catchment size of 1700 km 2 . The Thur River (a tributary to the Rhine) is the longest river in Switzerland free of any major natural or artificial reservoirs along the length of its course (~130 km), resulting in preannounced seasonality streamflow variability. Three major tributaries: the Murg, the Necker and the Sitter flow into the Thur River, and based on available data from active stream discharge stations, the Thur catchment can be divided into nine (9) sub-catchments (Fig. 1a).
The headwaters of the Thur River arise in the southern, glacier-free, limestone-dominated, pre-alpine region of the catchment near Mount Säntis, where vegetation is sparse and soils are generally shallow. Here, productive groundwater occurrences are confined largely to small fluvio-glacial gravel and sand deposits hosted largely within valley bottoms, and fractured-rock (Gurtz et al., 1999). Average depth to groundwater in this southern region is 3.5 m. Moreover, karstic units, can represent complex groundwater flow path Fig. 1. Characteristics of the Thur catchment's a) major drainage lines, gauging stations, and sub-catchments, b) topographic variability (m), c) mean precipitation values (mm), and d) mean actual evapotranspiration rates (mm) (Federal Office of Environment, FOEN; Federal Office of Topography; MeteoSwiss; Running, et. al., 2019). White areas in figure e) represent no data areas for MOD16 ET data. (Schürch et al., 2007). The Pleistocene molass-sandstones, marls, and conglomerates located predominantly in the northern region of the Thur catchment are highly productive in terms of groundwater, and host one of the largest groundwater systems in Switzerland where average depth to groundwater is 1.4 m (Abbaspour et al., 2007;Keller, 1992). Abstraction rate estimates from the largest aquifer in the Thur catchment are in the order of 16 million m 3 /y (https://umwelt.tg.ch/wasser/wassernutzungen/zahlen-und-fakten.html/ 2158 last accessed: 11 September, 2020), which amounts to approximately 9.4 mm of water abstracted from the groundwater annually. Although it is important to monitor the local effects of groundwater abstraction, which can have strong impacts on groundwater flow, data is often limited or completely lacking (Creutzfeldt et al., 2014). When compared to the mean annual precipitation values (see below), the abstracted groundwater from the largest aquifer amounts to < 1% of the annual water input into the Thur catchment.
Land use in the Thur basin is dominated by agriculture (~ 60%), with large areas of pasture. Forests make up 30% of the land surface, and the remaining ~10% includes barren land, surface waters, and urban areas. Elevation in the Thur catchment varies from 2502 to 363 m asl. (Fig. 1b), with an average slope inclination of 7.9º (Melsen et al., 2016). The hydrological regime of the Thur catchment, with its warm-summer humid continental or sub-montane climate (Gurtz et al., 1999;Peel et al., 2007), is characterized by its variable morphological and climatic elements. Streamflow in the Thur River can fluctuate by up to two orders of magnitude in the space of a few hours, with recorded discharge rates at the Andelfingen (An) discharge station ranging from 3 to 1130 m 3 /s (mean discharge of 47 m 3 /s) (Doulatyari et al., 2017;Gurtz et al., 1999).
On average, precipitation in the Thur catchment varies from 700 mm/y in the low elevation northern region, to 2700 mm/y in the southern mountainous region (Fig. 1c). The lowest average evapotranspiration rates (~400 mm/y) are found in regions of high elevation, with higher average ET is associated with the lower reaches of the Thur valley (~500 mm/y), (Fig. 1d). The highest average evapotranspiration rates (over 1000 mm/y) are associated with the central region of the Thur catchment (~ 700 m asl.), here land use is associated with mixed pasture, agriculture, and forestry.

Data and methods
This section describes in detail the gridded and ground-based data employed in this study, as well as the methods applied to calculate groundwater recharge in the Thur catchment. Through employing open source software for the gridded water balance computations, the presented workflow is anticipated to be readily transferrable and applicable to other catchments.

Precipitation product
Precipitation, being the primary flux in most hydrological cycles, requires high resolution data in order to reliably estimate water availability and distribution within a catchment. Although an array of remotely sensed rainfall data is available (e.g. Global Precipitation Climatology Project (GPCP), Tropical Rainfall Measuring Mission (TRMM), Global Precipitation Measurement (GPM), etc.), these all have a relatively low resolution (~10 -250 km 2 ) when considering a mesoscale catchment with a high topographic variability . Thus, long-term gridded precipitation data, interpolated from 73 Swiss national stations located within or close to the Thur catchment, was used for this study (©MeteoSwiss, 2016).
The MeteoSwiss product consists of continuous high-quality measurements, interpolated from a network of ground based radar and gauge data using the Reduced Space Optimal Interpolation (RSOI) method; specifically developed for regions with a complex orography (Begert et al., 2007;Schiemann et al., 2010). The interpolation errors for the Swiss Plateau region has a relative standard error of + /-20% (MeteoSwiss, 2013), and the resulting gridded data has a 2.3 km x 2.3 km resolution. The gridded MeteoSwiss annual (January -December) and monthly rainfall data (RhiresY v1.0 and RhiresM v1.0) represents accumulated precipitation, including both rainfall and snowfall equivalent (in mm). The MeteoSwiss data was resampled using bilinear interpolation to match the resolution of the MODIS data and used as input in the water budget calculation (refer to Section 3.4).
Although a substantial portion of the Thur catchment is under agriculture, high annual precipitation rates resulting in wet environmental conditions in the catchment, renders the irrigated amounts as relatively insignificant when compared to the relatively high surface water availability (Kanton Thurgau Amt für Umwelt et. al., 2008). Therefore, although seasonal water shortage (e.g. summer drought periods) are recorded locally (discussed in Section 4.3), it is assumed that the effect of irrigation is negligible in the sense of the Thur catchment's total water balance. Additionally, relatively little variation in precipitation characteristics, such as season, frequency, and duration of dry and wet days, appears across the Thur's different sub-catchments (Dal Molin et al., 2020).

Evapotranspiration product
Calculations of AET are generally local measures from Eddy Covariance (EC) towers, Large Aperture Scintillometers (LAS), or lysimeters. These methods are often regarded as the most accurate and reliable determination of AET (Baldocchi, 2003;Brotzge and Crawford, 2003;Rana and Katerji, 2000;Schrader et al., 2013;Xu and Chen, 2005). However, due to the spatial heterogeneity of AET, upscaling of point information to a regional scale is challenging, and models have been employed to compensate for the lack of observations and fill spatial gaps (de Graaf et al., 2017;Döll and Fiedler, 2008;Orth and Seneviratne, 2015;Wada et al., 2016).
Remotely sensed products offer feasible alternatives to obtaining AET measurements at regional scales (Bhattarai et al., 2016;Zhang et al., 2008). The Moderate Resolution Imaging Spectroradiometer (MODIS) on board the Terra satellite, has been acquiring 36 spectral bands (wavelengths) of the globe every 8 days since December 1999 (https://terra.nasa.gov/about/mission). The MODIS sensor was designed to detect electromagnetic bands which include spectral signatures of atmospheric water vapour as well as vegetation and land cover, from which continuous biophysical variable including land surface temperature, albedo, soil moisture (Running et al., 2019(Running et al., , 1994. From these variables, a regional composite ET product is estimated, at a 500 m x 500 m resolution, using the Mu et al. (2011) improved ET algorithm which is based the Penman-Monteith (PM) equation (Monteith, 1965). When dealing with ET values, it must be noted that through the process of hydraulic redistribution (HR) by plant roots, groundwater itself may be contributing to the determined value. According to Luo et al. (2016), where depth to water table ranges between 3 and 6 m, the percentage water redistributed may range between 14.5% and 23% for coniferous forests, depending on the soil type. However, were depth to water table is relatively shallow (between 1 m and 2 m), the effect of groundwater transpirated via HR is often not evident, as a roots would be in constant contact with water at this depth (Luo et al., 2016).
Although remotely sensed ET products are currently available with higher pixel resolutions (e.g. Guzinski et al., 2020), the time series for these products are currently still limited (Sen-ET from ESA is available as of 2016). The MODIS ET product has well-documented quality assessment protocols, reproduces basin-scale AET response with acceptable uncertainty (Khan et al., 2018;Velpuri et al., 2013), has good temporal resolution, and is freely and easily accessible. In light of this, the improved gap-filled annual (January -December) and 8-day (MOD16A3GF and MOD16A2GF respectively) MODIS (hereon referred to as MOD16) ET data was used as the evapotranspiration component in this study. The full series was obtained from the Land Processes Distributed Active Archive Centre (LP DAAC) for the years 2000-2019 (https://lpdaac.usgs.gov/) [Accessed on 2019-11-19]. The MOD16 image tiles were pre-processed and clipped to the Thur catchment for ease of use (refer to Section 3.4).

Discharge product
While the MeteoSwiss precipitation and MOD16 ET products are in gridded spatiotemporally varying quantities, discharge (Q) is measured at a point as a flux through a stream channel (Tang et al., 2010). Ten (10) federally operated gauging stations are located in the Thur catchment ( Fig. 1a; Table 1). All stations are located along free-flowing systems and have continuous data for the period 2000-2019 (with the exception of the Halden (Ha) station, where October -December 2019 was absentrefer to Section 1.2 in Supplementary Information for details). Available stream discharge time series were obtained from the Swiss Federal Office for the Environment (FOEN) and quality checked prior to use.
The Andelfingen (An) site represents the discharge for the entire Thur catchment, with 9 sub-catchments located upstream of it. The next biggest sub-catchment is represented by the Halden (Ha) site, which drains approximately half of the Thur catchment. For the 2000 -2019 period, the high elevation sub-catchment Appenzell (Ap) displayed the highest average Q value (1375.5 mm), while the low elevation Frauenfeld (Fr) station displayed the lowest average discharge value (556.89 mm) ( Table 1). For the years 2000 -2019, hourly discharge (m 3 /s) was converted to monthly, seasonal, and annual (January -December) volumes of water (mm), by aggregating the mean hourly discharge and dividing it by the discharge station's upstream area. Seasonal data was based on winter (December, January, and February), spring (March, April, and May), summer (June, July, and August), and autumn (September, October, and November) aggregates.

Correction of MODIS data
The Mosnang (Mos) sub-catchment ( Fig. 1a), also known as the Rietholzbach research catchment, is a well-instrumented catchment with a long history of both atmospheric and hydraulic data collection. Readily available long-term AET data from studies conducted in the Mos sub-catchment for the years 1976-2015 (Hirschi et al., 2017;Seneviratne et al., 2012) were used in this study to evaluate the accuracy of MOD16 ET data used in calculating the Thur catchment's water budget. A large weighing lysimeter setup ( Fig. 1a) in the Mos sub-catchment is the only independent long-term AET data available for the Thur catchment.
The weighting lysimeter, with a depth of 2.5 m and a diameter of 2 m, is located in a grassland setting next to the valley bottom of the Rietholzbach in the upper part of the Mos catchment (Hirschi et al., 2017). The lysimeter is mainly filled with gley-brown Cambisol soil type from the same location including a gravel filter layer at the bottom Moeck et al., 2018). At the bottom of Table 1 Gauging stations located in the Thur catchment with site names and abbreviated IDs used in this study, along with number of catchments located upstream from station, area of upstream catchments, elevation of stations, and average stream discharge (Q) for the years 2000 -2019. the lysimeter, water outflow is measured with a tipping bucket. As groundwater is shallow at the site and the average rooting depth is around 0.3 m (Germann, 1981), lysimeter seepage can be assumed to be a reliable indicator of actual vertical groundwater recharge (Ghasemizade et al., 2015;Seneviratne et al., 2012;von Freyberg et al., 2015) Despite scale discrepancy and uncertainties due to the setup of the lysimeter, Seneviratne et al. (2012) show that the Rietholzbach lysimeter seepage and catchment runoff display very similar monthly dynamics, which suggests that the lysimeter is representative for the water balance of entire Mos catchment. The largest discrepancies between lysimeter seepage and catchment runoff values were found during March; most likely linked to the higher spatial variability of hydrological processes taking place in that month (e.g. snowmelt and the onset of the growing season) . At the surface, the lysimeter is covered with a grass species composition which imitates the surrounding conditions, including the same cutting scheme (Hirschi et al., 2017). The manner in which the lysimeter is setup (i.e. the weighing of the lysimeter), also allows for the estimation of AET: a well-established technique (Gebler et al., 2015;Ghasemizade et al., 2015;Goss and Ehlers, 2009;Rana and Katerji, 2000;Schrader et al., 2013;Seneviratne et al., 2012, among many others). Hirschi et al. (2017) demonstrated that for the Mos sub-catchment, the lysimeter and eddy covariance (EC) measurements are in good agreement, in particular where the annual timescale is concerned. Moreover, a good agreement was seen when comparing the lysimeter AET values with the Mos catchment's long-term water balance values (Hirschi et al., 2017). These findings emphasize the representativeness of the site-level lysimeter for the entire Mos catchment, despite its comparatively small source area. Here, we follow Vereecken et al. (2015), who argued that improved description of soil hydrological fluxes at the local-scale are fundamental to reduce large uncertainties, which are still present in large-scale models (or remote sensed data) used to predict these fluxes. As such, and in spite of this area representing only a fraction of the entire Thur catchment, MOD16 ET values for the entire Thur Catchment were bias corrected using the Mos sub-catchment lysimeter values. Taking into account that Seneviratne et al. (2012) found that the Rietholzbach catchment is representative of a number of hydroclimatic regions in the Swiss eastern plateau, in particular for the Thur-river basin, it seems to be reasonable to make the bias correction based on the lysimeter values. A more detailed discussion about this assumption and the possible implications for the water balance calculation can be found in Section 5.1.
When comparing the original MOD16 ET values for the Mos sub-catchment to lysimeter values, an overestimate in the MOD16 ET data was evident (Fig. 2a). Where the lysimeter AET values ranged between 1.20 and 134.20 mm/month for the period from 2000 and 2015, the MOD16 ET values ranged from 15.52 to 132.79 mm/month for the same period. In order to account for this, the MOD16 ET values were adjusted (hereon referred to as AET corr ) by a correction factor of 0.71. This factor was estimated by inversely fitting the original MOD16 ET data to the measured lysimeter values, while seeking to reduce the daily mismatch. Overall, fitted results are in good agreement with the observed lysimeter data (Fig. 2b). Furthermore, a comparison between the AET corr values and modelled AET values (after Zappa et al., 2017), show a very strong, positive (after Evans, 1996) correlation (Pearson correlation = 0.97: refer to the Supplementary Fig. S1). Although the Pearson's correlation coefficient (R) can be overly sensitive to outliers, in light of the large data set, the method was deemed suitable as a basic comparison between the original MOD16 and the AET corr values (Dessu et al., 2018;Legates and McCabe, 1999). In addition, when looking at the expected MOD16 ET values as described by Mu et al. (2011) at a latitude of approximately 47.5⁰ (the latitude of the Thur catchment), these values are in line with the lysimeter values, and subsequently the AET corr .
Additionally, P values (from the MeteoSwiss data) were compared to precipitation data from the meteorological station Büel located at the outlet of the Mos sub-catchment and in close proximity of the above described lysimeter located in the Mos subcatchment (data available from https://iac.ethz.ch/group/land-climate-dynamics/research/rietholzbach/data.html). Findings show a very strong positive correlation (after Evans, 1996) between the gridded P values from the MeteoSwiss data with those from the Büel station (Pearson correlation = 0.93) (refer to Fig. S2 in the Supplementary Information).

Baseflow from total discharge values
Baseflow separation is a method whereby total stream discharge (Q) is separated into precipitation event-based surface discharge components often referred to as quickflow (Q q ), and baseflow components (Q b ) (Eq. 1). Baseflow (Q b ) is usually associated with subsurface processes, cannot be attributed to a single precipitation event, and has been taken as a quantitative variable of groundwater discharge to rivers by many authors (Blume et al., 2007;Duncan, 2019;Fendeková and Fendek, 2012;Hall, 1968;Healy and Scanlon, 2010;Hellwig and Stahl, 2018;Nathan and McMahon, 1990;Reitz et al., 2017;Sutcliffe et al., 1981). Baseflow represents a part of the groundwater which returns to the surface water; a process through which many rivers and lakes are fed, and aquatic ecosystems are maintained during dry periods (Aeschbach-Hertig and Gleeson, 2012;Gurdak, 2017). Infiltration to recharge subsurface storage increases baseflow, but due to AET baseflow can also be reduced as temperature gradients and trees absorb water from the ground. Using a digital filter method available in the EcohydRology R package (R Core Team, 2018), with three passes and filter parameter set to 0.925 as proposed by Nathan and McMahon (1990), base-and quickflow was estimated from the available Q data in order to estimate baseflow from total streamflow in the Thur catchment: (1) A baseflow index was generated for each sub-catchment (refer to Supplementary Table S1 and Fig. S3). In addition, a baseflow separation method comparison was conducted (after Zomlot et al., 2015) using the free online automated Web-Based Hydrograph Analysis Tool (WHAT) (Lim et al., 2010(Lim et al., , 2005, to compare three separation methods: 1) the Eckhardt recursive digital filter (Eckhardt, 2005;Lim et al., 2005) with filter parameter set to 0.98 and 0.925, 2) the local minimum method, and 3) the one parameter method. On average, the ammount of baseflow contribution derived from the different methods varied by up to 22% (refer to Supplementary  Fig. S4). Using a recursive digital filter method with the filter parameter set to 0.925 was the most conservative approach, resulting in an average monthly baseflow contribution of 43%. In order not to over-estimate recharge values in this study, Q q was deterimined using the method most conservative with respect to Q b .

Spatially gridded discharge
In order to evaluate the discharge volumes (in mm) in the Thur Catchment on a spatial basis as part of the water balance, point measurements of stream volumes, weighted based on a topographically-based top-down flow accumulation (FA) algorithm generated using SAGA GIS, were used (Conrad et al., 2015). Refer to Supplementary Fig. S5 for a comparison of FA weight ranking methods. The FA raster was based on a flow routing algorithm (after Seibert and McGlynn, 2007;Tarboron, 1997) generated from a 25 m raster DEM (Source: Federal Office of Topography) which was pre-processed using Wang and Liu's, 2006 fill sink process (Wang and Liu, 2006). The FA raster was resampled using bilinear interpolation to match the 500 m x 500 m raster resolution of the MOD16 ET data, and then normalized and inverted (FA norm ) (after de Lavenne et al. (2019) to create a weighting factor between 0 and 1 with the greater part of the Q q volume weighted to the upstream reaches of the Thur River: before being multiplied by the temporally aggregated quickflow (Q q ) values of each sub-catchment, in order to represent spatially distributed quickflow (Q q-dis ) for the Thur catchment: A comparison of the FA values with the inverted FA values is presented in Fig. S5; which indicates that the inverted FA (FA norm ) more closely represent the measured baseflow values (Q b ). The resulting product, although based on an algorithm which laterally connects adjoining downstream pixels, was used purely to represent discharge volumes spatially as input into the water budget calculation (Eq. 6; Fig. 3). As such, each pixel value does not represent actual measured Q q at the pixel location, but rather each Q q-dis pixel merely represents a portion of the precipitation into that cell which ultimately produces the measured downstream Q q response. However, results show a very strong, positive correlation (Pearson correlation = 0.94; after Evans, 1996) with modelled results (Viviroli et al., 2007b(Viviroli et al., , 2007c) (refer to Supplementary Fig. S6).

Recharge estimates from the water balance equation
In cases where a catchment represents a steady-state system, the terrestrial water budget is equal to all inputs into minus all outputs out of the system. This implies that over long time periods (e.g. annual), catchment storage change (ΔS) calculated from Eq. 4 does not vary, and baseflow can be defined as representative of effective groundwater recharge (Reitz et al., 2017;Schilling et al., 2021;Wolock, 2003): where P represents input in the form of precipitation, ET and Q represent output in the form of evapotranspiration and discharge (as a sum of Q q and Q b ), and ΔS is the change of storage. Discharge (Q) was separated into its quick-and baseflow components using a Q b -conservative digital filter method (refer to Section 3.4.4), where Q b would represent the minimum effective groundwater recharge: Bearing in mind the inherent uncertainty in all of these variables (e.g. Coxon et al., 2015;Li et al., 2018;Lu et al., 2019;MeteoSwiss, 2013;Zhao and Liu, 2014) and assuming acceptable or constant component errors (refer to Section 3.5), the water available for recharge, referred to here as potential recharge (R), in the Thur catchment was described using the independent gridded water balance component AET corr and Q q-dis values as follows: where the gridded spatiotemporal R values represent the value of water which would result in the modelled Q b response (after Eq. 1) derived from measured Q values.
According to Creutzfeldt et al. (2014), the spatiotemporal variability and availability of water in a catchment is controlled by underlying processes (e.g. spatially changing boundary conditions and temporal fluxes in catchment parameters). Any change in trend (representing both the direction and rate of change) can be determined by the slope of a linear regression model (De Jong et al., 2011). Gridded values of R were assessed using a pixel-wise ordinary least-squares linear regression for the 20 years from 2000 to 2019, at the 95% significance level; determining changes in recharge characteristics in the Thur catchment. Additionally, the difference in mean R values estimated over the first 10 years (2000 -2009) was compared to the mean R values estimated over the second 10 years (2010 -2019) of the total study period (refer to Section 4.3.1).

Workflow
Using the open-source statistical program R (R Core Team, 2018), grid-based computations were conducted for the Thur Catchment at the 500 m x 500 m resolution, with each independently estimated input variable representing a water balance component (in rates per unit area) for the years 2000 -2019. The work process for the data processing is illustrated as a flow diagram in Fig. 3. This method relates the water balance components P, AET corr and Q q-dis in a spatiotemporal manner to estimate catchment-wide R over time. No lateral transfer was considered between the individual pixels, but rather mapping the vertical exchange of water entering (via precipitation) and leaving (via evapotranspiration and discharge) each pixel to estimate the potential recharge component of each cell.

Sensitivity of gridded water balance components
Although using independent water balance components reduces the propagation of errors (Healy and Scanlon, 2010), the accuracy of recharge derived from a water budget calculation depends entirely on the uncertainties of each component used as input into the calculation. For example, previous studies (including our study, see Section 3.4.1) have shown that in some cases MOD16 ET products result in an over estimation when compared to ground-based values (Miranda et al., 2017;Ruhoff et al., 2013;Velpuri et al., 2013;Zhao and Liu, 2014, also see Section 3.4.1). Terrain complexity, the distribution and number of measuring stations, along with time series availability and measurement frequency can affect the precision and representativity of measurements taken (Coxon et al., 2015;Schiemann et al., 2010;Turnipseed and Sauer, 2010). This is particularly true when measuring precipitation, where amounts can vary greatly in both seasonal and spatial distribution, especially over complex terrain (Schiemann et al., 2010;Sun et al., 2018).
Although the input components stem from independent sources, they are by nature highly inter-dependant. However, in assuming  that they are unbiased and normally distributed, the variance (σ 2 ) of the individual components can be calculated, and the degree of error stemming from each component estimated: Here, confidence intervals (CI) were calculated and explored for recharge estimate using 100,000 random component values generated within a realistic range within which each component was varied by a fixed amount based on estimated data errors. The range of precipitation value error was based on a + /-20% relative standard error suggested by MeteoSwiss (2013), while the error range for AET was set at + /-25% (based on the findings of Miranda et al., 2017;Ruhoff et al., 2013;Velpuri et al., 2013;and Zhao and Liu, 2014) as well as our own assessment (refer to Section 3.4.1). The specified error for measured Q values is + /-3.2% (after Spreafico and Weingartner, 2005), with higher errors indicated during peak flows. In order to account for the high streamflow variability of the Thur River and its tributaries, the relative standard of error for Q q-dis was increased to 5% for this study. However, the error stemming from the gridded distribution of Q q-dis (after Section 3.4.3) was not considered in this study, and thus Q q-dis related errors should be considered as being conservative and only related to measurement errors.

Results
This section evaluates the individual water balance components, some of which were derived from ground-based data, and presents results from the Thur catchment's gridded water balance. Spatiotemporal recharge over a 20 year period was investigated, and the monthly, seasonal, and annual variability in the water balance components in the Thur catchment were examined.

Baseflow from total streamflow
The results of the digital filter baseflow separation were plotted as monthly data at seasonal intervals (Fig. 4a). When considering the ratio of Q b to total streamflow (Q), the significance of Q b becomes evident for all of the sub-catchments in the Thur catchment, with average contributions of Q b ranging from 31% to 57% (refer to Supplementary Table 1 for baseflow ratios). Along steeper slopes, flow velocities are higher and recharge capacity is generally limited, favouring surface runoff and instream discharge, resulting in elevated Q q values. Whereas gentler slopes experience both slower flow velocities as well as greater storage capacities, resulting in a higher contribution of Q b to total streamflow (Anderson et al., 1978;Lavenne et al., 2019;Moeck et al., 2020;Van Loon and Laaha, 2015). The high elevation sub-catchments of Ap, He, Mos, Mog, and Stg displayed a greater proportion of Q q relative to Q b , while the sub-catchments Wae and Fr, located in the lowland region of the Thur catchment, displayed a relatively high proportion of Q b in relation to Q q . Although the entire Thur catchment experienced a predominance of Q q during the winter, summer, and autumn months (with the highest Q q values occurring during the summer), on average Q b continued to make up 45% of the total discharge value and displayed a predominance of 53% of the total streamflow during the spring months. When lumping seasonal Q q and Q b values on a yearly basis, elevated contributions of Q b during the dry summers of 2003, 2015, and 2018 are highlighted; making up as much as 55%, 65%, and 60% respectively of total flow recorded at the end of the dry (summer) months in the Thur River (Fig. 4b). Please refer to Supplementary Fig. S3 for monthly Q q and Q b values from all of the sub-catchments. Twenty years of hourly quickflow values were fed into the gridded workflow (refer to Section 3.4.5), in order to evaluate the spatiotemporal R values (after Eq. 6) in the Thur catchment and its sub-catchments. A very strong correlation between the modelled discharge values (after Viviroli et al., 2007cViviroli et al., , 2007b, and Q q-dis and Q b values as used in this study, was found (refer to Supplementary Fig. S6).

Water balance
With water balance components used in this study derived from independent sources, the extent of water balance closure in the Thur catchment (after Eq. 6) was determined using mean monthly, seasonal and annual values of input component (P) correlated with the sum of output components AET corr and the observed Q values (separated into Q q , and Qb after Eq. 1). Monthly values indicated a very strong correlation, with Pearson correlation (R) equal to 0.81 (Fig. 5a). The correlations improved further were seasonal (Fig. 5b) and annual (Fig. 5c) data was used, with Pearson correlation (R) equal to 0.89 and 0.94 respectively. This suggests that over longer time periods (e.g. seasonal to annual), the assumption that the Thur catchment is in a steady-state, and ΔS negligible, becomes more valid. Please refer to Table S2 of the Supplementary Information for a comparison of goodness-of-fit measures.
The lack of complete closure might be attributed either to time-lags in groundwater to surface water, or to measurement errors and uncertainties in the water balance components. For instance, precipitation can have a relative standard error of + /-20% as suggested by MeteoSwiss (2013). Moreover, in the high elevation reaches of the catchment uncorrected solid precipitation amounts (e.g. snow) can have a strong impact (Dal Molin et al., 2020;Schmucki et al., 2014;Zappa et al., 2003), and potentially limit the complete closure of the water balance. For example, although some grouping was evident in the monthly data (Fig. 5a), with spring and summer values plotting predominantly above and autumn and winter values predominantly below the input vs. output correlation line, an improved grouping was evident in the seasonal values (Fig. 5b). The impact of snow storage and melt is further advocated with most spring and summer data plotting predominantly above the correlation line (Franz et al., 2010;Meeks et al., 2017). However, a more systematic investigation is required to understand this process in detail and thus this outcome should not be over-interpreted in the context of this study.

Variation in gridded water balance components
The spatial distribution of mean monthly P values over the 20 year period from 2000 to 2019 were highest in the high elevation regions to the south of the Thur catchment, and lowest in the northern reaches of the catchment (Fig. 6a left). Monthly P values (Fig. 6a bottom right) ranged between 1 and 282 mm/month over the 20 year period, with a monthly mean of 115 mm. Annual values (Fig. 6a top right) show high (>1500 mm/year) P values for 2001 and 2002 (nationally recorded wet years), and low (<1250 mm/year) P values in 2003 which experienced record-breaking heatwaves throughout Europe (Casty et al., 2005;Schär et al., 2004), 2015 and 2018 which are also nationally recorded drought years (BAFU et. al., 2020). Additional dry years in the Swiss north-eastern pre-Alps were experienced during and 2011(MeteoSwiss, 2011. From 2003-2009, and again in 2011, mean annual P values remained at or below the 20 year average of 1371 mm. Low (< 40 mm) monthly AET corr values were associated with exposed high elevation regions to the south of the Thur catchment, as well as with the lowland regions to the north where land use is more intense, while the highest AET values were associated with the mid-latitude reaches of the Thur catchment (Fig. 6b left). Annual AET corr values showed relatively little variation, with a maximum value of 544 mm/year for the year 2011, a minimum of 486 mm/year for 2001, and an overall mean of 514 mm/year for the 20 year period (Fig. 6b top right). Monthly AET corr values oscillated between ~15 and ~90 mm/month, with a mean monthly value of 60 mm ( Fig. 6b bottom right). The distributed mean monthly Q q-dis values were greatest in association with the valley bottoms where water accumulates, with minimum Q q-dis associated with the narrow alpine valleys to the south of the catchment (Fig. 6c left). Over the 20 year period monthly Q q-dis values ranged between 1.53 and 135 mm, with a mean value of 40 mm/month (Fig. 6c bottom right). Mean annual Q q-dis values were highest in 2002 (693 mm), which is in line with a nationally recorded wet year (BAFU et. al., 2020), while the years 2012 and 2013 also presented relatively high (>530 mm) Q q-dis values ( Fig. 6c top right). Low Q q-dis values were associated with the drought years of 2003, 2015, and 2018 (302, 417 and 363 mm respectively), with the 2005, 2009, 2014, and 2017 Q q-dis values also falling below the 20 year average of 480 mm. The water budget components, P, AET corr , and Q q-dis , appear to be influenced by the catchment's topographic variations (refer to Fig. 1b).

Spatiotemporal groundwater recharge over a 20 year period
Gridded recharge maps were generated by closing the water balance after Eq.6. The mean R values for the Thur catchment between the years 2000 -2019 ranged from smaller values (<500 mm) in the central and northern regions, to large values (>1000 mm) in the southern regions (Fig. 7a); this is strongly controlled by a difference in precipitation gradient between north and south. The standard deviation of the estimated R for the 20 year period, indicated a slightly higher degree of deviation in absolute values associated with the mountainous southern regions (Fig. 7b). The gridded R values are a conservative spatial estimate of water available for recharge in the Thur catchment area, while the net spatial change of R values assessed using a pixel-wise linear regression is a measure of how much R (in mm) has changed per pixel over the 20 year period. Long-term change in R was calculated using annual values (Fig. 7c). At the 95th percentile level, areas within the Thur catchment experiencing significant change in R values over the 20 year period comprised 4.4% (or 74.98 km 2 ) of the total area, and all significant change observed was negative. These R changes were restricted to the lower reaches of the Thur catchment, parts of the Fr/Wae sub-catchments and the lower reaches of the Jon sub-catchment (refer to Fig. 1a for sub-catchment locations). Potential masking of the actual areas experiencing change in R values may result from the use of annual data as in Fig. 7c. Moreover, comparing the difference in pixel values from the first ten years (2000 -2009) with those from the second ten years (2010 -2019) of estimated R, shows areas that have changed without the constraint of a calculated trend with the pixel-wise regression (Fig. 7d). Areas shown to have experienced negative R rates over the 10-year period (Fig. 7d), correspond well with areas of significant change in R using a pixel-wise linear regression (Fig. 7c).
Over the 20 year study period, the mean temporal recharge values for the entire Thur catchment displayed an average monthly (short-term) mean R value of 34 mm/month (Fig. 8a), while annual (long-term) mean R pixel values ranged between 208 and 578 mm/year, with a 20-year mean of 410 mm/year (Fig. 8b). Seasonal (medium-term) values displayed a mean R of 121 mm during the winter months, 85 mm during spring, 93 mm during summer, and 110 mm during the autumn months (Fig. 8c). It must be noted that, due to the use of a conservative baseflow separation method with respect to Q b (refer to Section 3.4.2 and Supplementary Fig. S4 where differences of up to 22% are indicated depending on the separation method), resulting R values are conservative, and it is likely that the study area is capable of experiencing greater recharge rates. As discussed by Immerzeel et al. (2020) and Viviroli et al. (2007), mountains can be described as the world's water towers; where precipitation and subsequent surface runoff and discharge potentials are high. Although monthly data suggests that over short periods the Thur catchment did on occasion experience water shortage, especially during the spring and early summer months, as indicated by the negative values (14.6% of the monthly data values were ≤ 0), over medium (seasonal) to long-term (annual) periods, a sustained net surplus of R was evident in the Thur catchment over the 20 years period. Fig. 8a and b show short-(monthly) and long-term (annual) temporal trends in R, where above-average (blue) or below-average (red) R values are indicated with respect to monthly and annual mean values for the 20 year period. Although short-term (monthly) R values were on occasion negative (Fig. 8a), in particular during the spring of 2007, 2009 and 2018, a mean value of 34 mm/month suggests that on average R is positive in the Thur catchment. Medium-term (seasonal) intervals show that mean seasonal R values were at their lowest during spring, while the highest mean seasonal R values occurred during the winter months (Fig. 8c). Although R values were never negative over medium-and long-term intervals, Fig. 8b indicates that periods of notably smaller annual R values (values Fig. 8. a) Mean monthly, b) annual, and c) seasonal potential recharge values (mm) centred around the short-medium-and long-term mean values for the years 2000 -2019. Red indicates potential recharge below and blue potential recharge above the subsequent long-term mean values. below the long-term annual mean of 410 mm/year) in the Thur catchment are in line with national drought years (2003, 2015, and 2018), while the years 2001 and 2002 display R values of above long-term mean potential recharge (410 mm/year. Comparing the intensity of R values below the long-term average across the different national drought years, the below long-term R values of the 2003 drought is seen as having being the least extreme (307 mm), and the 2018 drought the most extreme (208 mm), highlighting the importance of the initial condition(this point is further discussed in Section 5). From 2003 until 2011, mean annual R values remained continuously below the long-term mean. Concerning soil moisture variations, Orth and Seneviratne (2012) showed that, rather than evapotranspiration or runoff, soil moisture memory effects depend on both the initial soil moisture state and the subsequent accumulation of precipitation. Considering the above-average precipitation values spanning the duration from 2001 to 2002 (refer to Fig. 6a), initial soil moisture would likely have been elevated prior to 2003; a possible explanation for the lowered R values displaying a relative mildness in spite of the intensity of the 2003 drought. The subsequent droughts of 2015 and 2018 followed a period of continued R deficit (e.g. from 2003 to 2011), during which average or below average precipitation rates were experienced, and display and increasing severity in terms of below-average R values. Similar findings by Van Loon and Laaha (2015) suggest that both drought duration and deficit is governed by average catchment wetness via precipitation and catchment storage capacity.

Sensitivity assessment of water balance components
Uncertainties in input components to the Thur catchment's water balance were assessed and the variance of each component compared after Eq. 7. Annual values of both P and Q q-dis were approximately normally distributed, while AET corr was bimodal and displayed the highest degree of variance (Fig. 9a). A stochastic perturbation was added to the variance of the individual variables P, AET corr , and Q q-dis , variables combined, via the introduction of noise which was then compared to the variance of R. Tendencies show that P, although having a lower variance than AET corr , displays the greatest spread in values over the period from 2000 to 2019 (Fig. 9b). Subsequently, P represents the component to which R is most sensitive. The spread of AET corr was smaller, followed by Q q-dis , suggesting greater certainty in these data sets, particularly in Q q-dis . These results highlight some of the primary uncertainties and sensitvities in the water budget calculation.

Discussion
This study investigated the feasibility of estimating spatiotemporal variations in R using a combination of satellite image products in conjunction with ground-based discharge data in a mesoscale catchment over a 20 year period. The use of RS data is of particular interest in regions where the availability of direct measurements are either limited or non-existent.  (Miranda et al., 2017;Ruhoff et al., 2013;Velpuri et al., 2013;Zhao and Liu, 2014). A few reasons may lead to the MOD16 ET product overestimating AET values in the Thur catchment, including a lack of any topographic correction of the product (see Zhao and Liu, 2014), in addition to other factors such as spatiotemporal scale etc. (Lu et al., 2019). In light of both the mesoscale size of the Thur catchment, and its variable topography, a correction of the MOD16 ET values was deemed necessary.

Actual evapotranspiration
Where the monthly values are concerned, inversely fitting the MOD16 ET data to the measured lysimeter values resulted in a very strong, significant positive correlation (Pearson correlation = 0.95, p < 0.05) (see Section 3.4.1). Concerning the correction, if more sophisticated statistical techniques were used and/or a different strategy to weight the values individually were applied, slight variations in the satellite data may have been better quantified. However, this could also have resulted in an overfitting of the data . As the lysimeter data only represents a single point-scale measurement, this study did not seek to achieve a perfect match between MOD16 ET and lysimeter AET. As mentioned previously, Seneviratne et al. (2012) show that the Rietholzbach lysimeter seepage and catchment runoffs display very similar monthly dynamics, suggesting that the lysimeter is representative for the water balance of entire Mos and many eastern Swiss catchments, in particular the Thur River catchment. Moreover, Hirschi et al. (2017) demonstrated that for the Mos sub-catchment, the lysimeter and eddy covariance (EC) measurements are in good agreement, with the Mos sub-catchment's long-term water balance values. All of the mentioned findings emphasize the representativeness of the site-level lysimeter for the entire Thur catchment, despite its comparatively small source area.
However, additional ground-based ET data might be required to up-scale and validate the correction of the MOD16 ET using observed values. Although the lysimeter dynamics and rates has been shown to be in good agreement with measurement for a wide range of hydroclimatic region in eastern Switzerland, and in particular for the Thur River basin , without additional ground-based AET data the correction of the MOD16 ET and subsequent upscaling remains uncertain. Notwithstanding this simple approach, AET corr values compare very strongly with previous AET studies conducted in the Thur catchment Zappa et al., 2017;Spreafico and Weingartner, 2005). Thus, a simple single correction factor for MOD16 ET data was deemed suitable in scaling the MOD16 ET values to within a suitable range for the Thur catchment.

Baseflow separation
Although a well-studied concept, baseflow remains difficult to define as demonstrated in this study (see Section 3.4.2 and Supplementary Fig. S4) and elsewhere (Duncan, 2019;Healy and Scanlon, 2010;Nathan and McMahon, 1990). Significant errors may be introduced where baseflow is used as a primary indicator of recharge as this represents only a portion of a catchment's total drainage (Sophocleous, 2002). However, this study assumes little or no groundwater loss via inter-catchment exchange, from abstraction or via evapotranspiration. In addition, the baseflow separation method is also often only appropriate in humid and sub-humid conditions (Healy and Scanlon, 2010). At larger spatiotemporal scales Q b characteristics can provide information on groundwater status and seasonal low flows which are integral to instream ecology (Duncan, 2019). Furthermore, Q b has been ascribed as an integrated characteristic of a catchment's storage potential and response time (Van Loon and Laaha, 2015), and studies have equated both Q b and baseflow-index values as being equal to recharge (e.g. Lee et al., 2006;Reitz et al., 2017). With average contributions of baseflow ranging from 31% to 57% of total streamflow, the importance of subsurface discharge is highlighted; a process which is often neglected in studies solely based on surface data (Hoffmann, 2002). In particular during the dry summers of 2003, 2015, and 2018 elevated baseflow would have made an important contribution to the Thur River and its tributaries; highlighting the importance of groundwater in maintaining aquatic ecosystems during dry periods (Moeck et al., 2020).

Closing the water balance
Input P values over seasonal and annual time scales correlated very strongly with aggregated output values of AET corr , Q q , and Q b , indicating a trend towards total closure of the Thur catchment's water balance, and supporting the assumption that over longer timescales the Thur catchment represents a steady-state system (Spreafico and Weingartner, 2005;Zappa et al., 2017). Although minor fluctuations might occur in the Thur's ΔS value as shown in Zappa et al. (2017), over longer time scales (> 10 years), and considering the humid climatic conditions prevalent in the Thur catchment, ΔS variability would generally be very small relative to the volumes of the other water balance components (Han et al., 2020;Reitz et al., 2017). However, the lack of complete water balance closure might be attributed to both small fluctuations in ΔS, especially over shorter time intervals.

Spatiotemporal recharge
Over the 20 year period the Thur catchment experienced a total of eleven years with below-average R values (refer to Fig. 8b), nine of which were consecutively lower than the annual mean. In spite of this, a linear pixel-wise regression indicated that only 4.4% of the Thur catchment experienced significant overall change in R (at the 95th percentile). Spatially significant change assessed over the 20 year period indicated a predominantly negative trend in R values, primarily in association with the lower reaches of the Thur catchment and in regions associated with the Fr/Wae and lower Jon sub-catchments located in middle reaches of the catchment (Fig. 7c). This negative trend could be attributed to the lower average P values associated with the affected regions (refer to Fig. 6a), as well as the total 11 years of below-average R values experienced in the Thur catchment between 2003 and 2011.
The years 2003, 2015, and 2018 were national drought years in Switzerland and the impact thereof is evident in the long-term temporal P, Q q-dis , and R values. While fluctuations in the AET corr values showed relatively little response to changes in P over the 20 year period, the limiting effect of P on R is evident from the spatiotemporal assessment, which is in line with other findings in the Thur catchment (Dal Molin et al., 2020). The relationship between P and ET has been shown to be one of long-term equilibrium, dependent on larger-scale climate fluctuations (e.g. El Nino), rather than the monthly, seasonal, yearly time steps investigated here (Zhang et al., 2013). Although 2003 is known as a year of record-breaking heatwave throughout Europe, compared to the 2015 and 2018 drought years the extremity of the below-average R values in 2003 appears to have been positively buffered by higher than average precipitation during 2001 and 2002. This suggests that the long-term R behaviour in the Thur catchment is in line with findings by Orth and Seneviratne (2012) concerning soil moisture variations.
Concerning the increasing extremity of below-average R values from the 2003, to the 2015, and finally the 2018 drought years in this study, highlights the effects of hydrological wet years preceding hydrological drought years in terms of R values. As such, average catchment wetness and soil moisture effects and memories could be an important factor to consider as part of a water management strategy within the Thur catchment. Advances in remotely sensed soil moisture studies may aid in this endeavour (Dorigo et al., 2016;Nicolai-Shaw et al., 2015), and further investigation along this line is recommended for the Thur catchment.
As mentioned in Section 4.3.2 over short time intervals R values were occasionally negative. This might be as a result of a time lag in the R and Q q-dis values, or due to the inherrent errors in the input components (refer to Section 4.4). In other studies, where the sum of the AET corr and Q q-dis values were greater than corresponding P values, a common approach has been to adjust one of the components to ensure water balance closure. For example, Reitz et al. (2017) estimated gridded annual recharge, quickflow, and ET for the contiguous U.S., but in 15% of all map pixels the sum of ET and quickflow were higher than precipitation and adjustments were made to close the water balance. The majority of these adjustments are made to the ET values, as ET is the most difficult water balance component to accurately quantify (Bhattarai et al., 2016;Hulsman et al., 2020;Reitz et al., 2017;Zhao and Liu, 2014). To draw attention to the difficulties of estimating recharge from a relatively simple water budget calculation, no attempt was made to close the balance in this study (see Section 4.2). As such, estimated spatiotemporal R values determined in this study serve to gauge the minimum rate at which the groundwater table should be re-supplied in the Thur catchment.
On average, when considering the total water balance of the Thur catchment and assuming negligible change in subsurface storage, recharge (R) accounted for 29%, quick flow (Q q-dis ) for 34%, and AET (AET corr ) for 37% of total precipitation. Groundwater R estimates are in line with modelled values for the region after studies from Abbaspour et al. (2007). In light of the method uncertainties in estimating quick-and baseflow from total discharge value (a maximum difference of 22% was observed between the different methods; refer to Supplementary Fig. S4), the recharge estimates for the Thur catchment may in fact, be greater (and consequently quick flow smaller). However, this potential for greater recharge, may again be negated in regions where the groundwater table intersects rooting depth. A study by von Freyberg et al. (2015), reported recharge in the Mos sub-catchment as 71% of precipitation. However, these recharge estimates were lysimeter-based, where no or only little surface run occurred due to the lysimeter setup. The sum of R and Q q-dis values comprise 63% of precipitation in the Thur catchment, increased recharge estimates depending on the baseflow separation method used. When comparing water balance components for the different sub-catchments of the Thur catchment, the amount of R was seen to increase only slightly with higher the elevation sub-catchments (Fig. 10a), and greater changes were observed in the proportion of decreasing evapotranspiration with respect to increasing quickflow (Fig. 10b). Refer to Supplementary Fig. S7 and Table S2 for percentage error compared to total available water in terms of P in the Thur catchment. Although not considered in detail in this study, the geological setting and lithological variances, along with anthropogenic alterations, might also further control the Fig. 10. Average annual groundwater recharge (R), quickflow (Q q-dis ), and actual evapotranspiration (AET corr ) in the Thur's sub-catchments as a function of a) elevation, and b) as percentage (%) of precipitation, with An representing the entire Thur catchment. baseflow contribution (Dal Molin et al., 2020;Weatherl et al., 2021).
The aforementioned available seepage data from the lysimeter located in the Mos sub-catchment (Hirschi et al., 2017;Seneviratne et al., 2012) was compared to the gridded R values determined in this study (refer to Supplementary Fig. S8). It was assumed that no overland flow forms on the top of the lysimeter and that all the rainfall infiltrates into the lysimeter storage due to the existence of a 10 cm steel edge surrounding the top of the lysimeter (Ghasemizade et al., 2015). Moreover, the lysimeter is located in the grassland setting next to the valley bottom. Thus, a substraction of Q q was seen to be necessary to make a comparision between gridded R values and lysimeter seepage. When the quickflow (Q q ) values were subtracted from the lysimeter seepage values (Lysimeter-Q q ), a strong correlation (after Evans, 1996) was found between the lysimeter seepage values and the gridded R values (Pearson correlation = 0.55). More importantly, the Lysimeter-Q q values show a stronger 1:1 relation to R values; representative of effective recharge in the Mos sub-catchment ( Supplementary Fig. S8). Refer to Table S4 of the Supplementary Information for an evaluation of goodness-of-fit measurements of lysimeter values vs. gridded R values.

Limitations
Although long-term changes in subsurface storage are possible, assuming that storage change is small relative to the volumes of P, AET corr , and Q q-dis , 100% closure constraint was used for time span (20 years) of this study. No attempt was made in this work to describe interpixel transfer of groundwater flow. Rather, the contributing factor of P entering a pixel cell to become the potential recharge component R, as expressed by a conservative Q b value at the discharge station, was mapped. Previous studies indicated a groundwater residence time of 20 days or more for groundwater sources located at distances > 300 m from the Thur River (Hoehn and Scholtis, 2011). Moreover, flowpaths associated to karstic structures cannot be identified with our approach and might bias locally the obtained results. However, closer of the water balance for each sub-catchment indicate that this has likely only a smaller impact. Although knowledge concerning the lateral connection between pixels could improve the temporal understanding of groundwater recharge in the Thur catchment, considering the relatively short residence time in the major valley bottoms, temporal groundwater travel-time-knowledge would likely only be applicable over monthly or shorter time periods. Even though the approach used in this study disregards the lag-time from groundwater recharge to discharge, in view of the monthly, seasonal, and annual scales considered in this study, recharge values after Eq. 6 are believed to represent reliable average estimates in line with findings by the likes of Abbaspour et al. (2007) and von Freyberg et al. (2015). Chemical and isotope data can provide further insight into specific recharge areas, surface-groundwater interactions, and groundwater ages associated with the Thur catchment (Cirpka et al., 2007;Hoehn and Scholtis, 2011;Moeck et al., 2017).
While it is important to monitor the local effects of groundwater abstraction, considering the volumes abstracted relative to the error of P in the Thur catchment ( +/-20%; refer to Section 3.5), the current groundwater abstraction rates of 0.65% of total input fall well below the estimated input error, and were therefore omitted from this study. Furthermore, any groundwater abstracted is utilized locally within the Thur catchment, predominantly for the production of potable water and in parts for irrigation and industry. Abstracted water used for irrigation would flow directly back into the aquifer (minus the losses incurred via evapotranspiration), while the industrial and drinking water portions would flow back to the rivers via the wastewater treatment plants; ultimately contributing to the Thur catchment's baseflow and quickflow (Han et al., 2017).
Further discrepancies in the water budget may arise from errors or uncertainties in the calculated components (refer to Section 3.5), and or inadequate spatial and temporal accounting of these components (Coxon et al., 2015;Healy and Scanlon, 2010;Khan et al., 2018;Sun et al., 2018). Although the use of satellite image products in conjunction with ground-based discharge data provides evidence of closure of the Thur catchment's water balance, comparing spatial data over time as binary plots, although a common practice, is reductionist in that a single value (e.g. mean pixel value) is meant to represent characteristics of an entire region. This is potentially reductionist, especially in catchments with variable terrain. The propagation of data and method error, although not considered in this study, leads to further uncertainty. However, good closure of the water budge (refer to Fig. 10), along with good correlations with modelled findings (please refer to the Supplementary Figs. S1, S2, S6, and S8), suggests that in spite of these uncertainties, values in this study are representative of the Thur catchment's water balance.
Terrain complexity, spatial resolution, the distribution and number of measuring stations, along with time series availability and measurement frequency can affect the quality and representativity of measurements taken (Coxon et al., 2015;Schiemann et al., 2010;Turnipseed and Sauer, 2010). For example, in spite of the MeteoSwiss product being of a high standard and in spite of having a lower variance than AET corr , perturbations of P displayed the greatest component sensitivity (Fig. 9b). Precipitation (P) was the only input into the water budget calculation, and therefore small variations or uncertainties in P strongly affect the output water balance components, including discharge and recharge. The uncertainty of AET corr , although smaller, may also be affected by errors which arise as a result of terrain complexities as outlined above, in addition to product specifics such as lack of terrain correction, and the use of the PM equation (Lu et al., 2019;Zhao and Liu, 2014). Discharge values used in this study represent both high quality as well as high temporal resolution data. This is evident in the low variance and small spread of perturbed Q q-dis values ( Fig. 9a and b). The conservative approach used in this study to determine Q b implies that resulting values from Eq. 6 represent minimum possible R values. In spite of the uncertainty in the values, the use of independent data sets (remotely sensed and ground-based), resulted in a good closure of the Thur catchment's long-term water balance (refer to Supplementary Fig. S7 and Table S2).
Overall, the results of this study are in line with findings by Schädler and Weingartner (2002), highlighting the effect of terrain complexity on major contributing components when calculating a water budget. Uncertainties in the MOD16 ET product where within range (10-30%) of the reported ET uncertainties and studies have shown that the PM-based MOD16 ET product is suitable for regions with spatially heterogeneous land cover, such as those found in the Thur catchment (Cleugh et al., 2007;Mu et al., 2011;Running et al., 2019). However, the PM-based method only applies to active vegetation, which may result in skewed values in regions where snow cover is present during the winter months (Mdaghri-Alaoui and Eugster, 2001). Due to errors relating to leaf and cloud shadowing effects, Srivastava et al. (2017) recommend a regional-scale standardization of the MOD16 ET product, which was done by correcting the MOD16 ET product by a factor of 0.71. Additionally, the spatiotemporal resolution of the MOD16 ET product, although suitable for characterizing processes in mesoscale catchments such as the entire Thur catchment, might be too coarse for use in small catchments such as the Mos and He sub-catchments (both of which are smaller than 17 km 2 ). This could further explain the uncertainty in the AET corr values, as these were derived from the MOD16 ET product corrected to the lysimeter data collected in the Mos sub-catchment. Lastly, it must also be noted that 20 years provide a relatively small window of observation where the Thur's long-term water balance is concerned.

Conclusion
This study demonstrates that the approach of remote sensing products used in conjunction with ground-based data to estimate spatiotemporal groundwater recharge has the potential to be useful both for local and regional water availability assessments, as well as long-term water resource planning. The use of remotely sensed data as presented here offers a method of determining a basin's water balance at the mesoscale through the integration and analysis of observed data, reconciled within the water budget equation.
Gridded water balance components P, AET corr , and Q q-dis were able to adequately capture the spatiotemporal variability of potential groundwater recharge in the Thur catchment over a 20-year period. The pixel-wise linear regression suggests a decline in R from 2000 to 2019, in particular within the lower reaches of the Thur catchment. However, knowledge of spatiotemporal characteristics remains a factor, and full comprehension of the intrinsic heterogeneities and complexities of specific basins may never be entirely possible. Temporal data of the water balance components emphasized the limiting effect of P on R, and stressed the importance of hydrological wet years preceding hydrological drought years in terms of long-term below average recharge potential. In light of this, the inclusion of soil moisture data is recommended in future studies to provide additional insight into spatiotemporal groundwater recharger behaviour in the Thur catchment or other (mesoscale) catchments.
Although reductionist, a remotely sensed water budget calculation may go some way towards a holistic starting point for monitoring recharge in mesoscale catchments, especially in poorly monitored or remote catchments where the lack of ground-based observations could be augmented. For example, although the 500 m x 500 m gridded resolution used in this study was found to be a limiting factor when dealing with the Thur's smaller sub-catchments, freely available MOD16 ET data was corrected locally using longterm ground-based lysimeter measurements from a single site within the Thur catchment. By comparing variance with other modelled water budged terms it was shown that, although the MOD16 ET product tended to overestimate AET the overestimation was consistent and larger uncertainties were associated precipitation values. The use of the MOD16 ET data was shown to be valuable in determining AET for the mesoscale Thur catchment, and it is believed that it is applicable to regions with relatively little or no ground-based AET measurements, remaining cognisant of potential over or under estimation.
For many catchments long-term observations are either completely missing or, where monitoring networks do exists, these are often falling into disrepair due to a lack of finances and/or upkeep (Montanari et al., 2013;Sivapalan et al., 2003). Spatiotemporal groundwater recharge estimates are useful for water resource managers who want to estimate trends in the water balance and track these changes over time. Remotely sensed data can be a useful asset in the advancement of hydrological understanding, even where gauged basins are concerned. This provides significant advantages when dealing with poorly gauged basins and establishing long-term data records for such basins in terms of their water availability (Gleason and Durand, 2020). However, even with additional methodological innovation in the field of RS, this study underscores the current need for continued, and in some places new, ground-based monitoring of hydrogeological components, where mesoscale or smaller catchment sizes are concerned.

Funding
The work described in this paper was funded by the Swiss National Science Foundation (grant agreement No 200021_169003).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper. data. The authors thank the Federal Office of Meteorology and Climatology (MeteoSwiss) for the meteorological data and the Federal Office for the Environment (FOEN) for the streamflow data. We thank the group of S. Seneviratne (Land-Climate-Dynamics), Institute for Atmospheric and Climate Science (IAC), Swiss Federal Institute of Technology Zurich (ETHZ), who provided meteorological data and lysimeter seepage time series from the Rietholzbach research catchment. The MOD16A3GF and MOD16A2GF data were developed by Dr S.W. Running and his team from the Numerical Terradynamic Simulation Group at the University of Montana, supported by NASA/EOS.