Quantifying uncertainty in groundwater depth from sparse well data in the California Central Valley

Groundwater is a critical freshwater resource for irrigation in the California Central Valley, particularly in times of drought. Groundwater depth has dropped rapidly in California’s overdrafted basins, but irregular monitoring across space and time limits the accuracy of the groundwater depth projections in the Groundwater Sustainability Plans required by the California Sustainable Groundwater Management Act (SGMA). This work constructs a Bayesian hierarchical model for predicting groundwater depth from sparse monitoring data in three Central Valley counties. We apply this model to generate 300 m resolution monthly groundwater depth estimates for drought years 2013–2015, and compare our smoothed groundwater depth map to smoothed rasterized maps published by the CA Department of Water Resources. Finally, we quantify uncertainty in groundwater depth predictions that are made by imputing missing well data and interpolating predictions across the study domain, which is helpful in directing future sampling efforts towards areas with high uncertainty. The BHM model accurately captures the spatiotemporal pattern in groundwater depth, as evidenced by 94.54% of withheld test samples’ true depth being covered by the 95% prediction interval drawn from the BHM posterior distribution. The model converged despite a very sparse dataset, demonstrating broad applicability for evaluating changes in regional groundwater depth as required by SGMA. Depth prediction intervals can also help prioritize future groundwater depth sampling activity and increase the utility of groundwater depth maps in total storage predictions by enabling sensitivity analysis.


Introduction
Groundwater is essential to the California Central Valley's (CV) highly productive agricultural economy [1], serving as a critical buffer against both seasonal variations in water availability and short-term droughts [2,3]. Groundwater withdrawals peak during late summer months when agricultural water demand is highest and surface water availability is limited [2,4]. Groundwater also serves as the primary water source for perennial crop growers during periods of extreme drought. For example, during the 2012-2016 drought, groundwater increased from 43% to 70% of agricultural water supply in the CV [5].
As a result of the California CV's extensive dependence on groundwater, the region has suffered from serious groundwater overdraft and its associated consequences [6]. Groundwater storage in the CV aquifer system has been depleted at an average rate of 1.85 km 3 per year since 1962 [7], and during the 2012-2016 drought, groundwater depletion was roughly quadruple that rate [8]. This level of groundwater overdraft causes a multitude of interconnected consequences, including permanent reductions in groundwater storage potential, land subsidence, seawater intrusion, and increased groundwater pumping costs for growers [5,9].
In 2014, California lawmakers passed the Sustainable Groundwater Management Act (SGMA), creating the state's first enforceable groundwater management guidelines and providing a framework for basin-level groundwater decision making [5,6,10]. Under SGMA, basins deemed high-or medium-priority must achieve long-term groundwater sustainability over a 20-year period, halt overdraft, and bring groundwater basins into balanced pumping and recharge [11][12][13]. This must be accomplished without causing undesirable side effects such as significant and unreasonable groundwater storage reduction or land subsidence [11][12][13][14]. SGMA also grants authority to Groundwater Sustainability Agencies (GSAs) and, if necessary, the State to monitor groundwater, develop and implement groundwater sustainability plans, and enforce these plans through groundwater use fees and limits on extraction.
The success of SGMA will depend on the accuracy of groundwater monitoring and modeling, but existing tools suffer from limited generalizability across basins and poor quantification of data and model uncertainty. Well-level spatial and temporal monitoring of depth-to-groundwater provides accurate readings proximal to the well, but well-level monitoring over large spatial regions or temporal periods requires costly large-scale data collection [10]. California's most comprehensive groundwater monitoring effort, the California Statewide Groundwater Elevation Monitoring Program (CASGEM), provides depth-to-groundwater measurements collected by the Department of Water Resources (DWR) and local monitoring parties [14,15]. DWR produces interpolated maps of groundwater depth in the most sampledense geographic regions for the months of March and October, when sampling efforts peak [15]. As such, these maps provide limited geographic coverage and exclude data from less frequently sampled months that could be used to inform temporal trends. Additionally, the interpolation techniques used to generate CASGEM maps cannot differentiate between real changes in groundwater depth and fluctuations in data availability from non-uniform spatiotemporal sampling patterns [16]. Finally, existing CASGEM maps do not report uncertainty in depth to groundwater estimates. This impedes policy decision making and efficient identification of data gaps to improve well sampling efficiency [17]. Neither conventional interpolation models nor sophisticated simulation models estimate groundwater levels with the high spatial and temporal resolution relevant to GSA, while also providing robust uncertainty quantification [18].
Bayesian hierarchical models (BHMs) have gained traction for environmental science applications due to their ability to translate spatial and temporal information into conditional models on a wide variety of scales and to propagate the uncertainty of missing values into subsequent analysis [19][20][21][22]. The popularity of BHMs has coincided with rapid developments in computational technology that allow for greater model complexity [23][24][25]. For example, Manago et al (2019) developed a BHM for groundwater depth in urban Los Angeles, offering depth prediction that is more robust to varying frequency of observed well data than traditional interpolation methods [16]. BHMs allow for multiple imputation of missing well data and, therefore, reflect the uncertainty of the missing values in subsequent analyses such as spatial prediction and risk analyses [16,26]. While the BHM's hierarchical approach harnesses simple models to capture a complex posterior distribution [27,28], the computational complexity of specifying a BHM can become impractical as data density decreases. A model offering unambiguous uncertainty estimation, such as a BHM, is useful in prioritizing sampling locations to improve prediction uncertainty when environmental sampling is cost-limited.
Motivated by the need for locally-relevant and near-real-time groundwater level monitoring with quantifiable uncertainty, this work develops a BHM that is subsequently used for multiple imputation and interpolation to generate monthly groundwater depth maps in California's Central Valley at 300 m spatial resolution. The model addresses current gaps in depth-to-groundwater predictions by incorporating the uncertainty due to both imputing missing data and making predictions at new locations. This approach maintains higher temporal resolution of publicly available depth-to-groundwater reports than existing simple interpolation models, while also quantifying uncertainty in those predictions. We expand upon the model presented by Manago et al (2019) [16], modifying their BHM formulation for a substantially larger spatial area across the Central Valley counties of San Joaquin, Stanislaus, and Merced. Finally, we describe a process for constructing a BHM with sparse data, probing computational limits of the BHM and extending the model's applicability to large, sparse spatiotemporal datasets. Our model offers computational efficiency improvements such as task parallelization, allowing for greater uncertainty handling over a larger spatial domain. This framework can be applied to other groundwater monitoring areas under SGMA and to other water-stressed areas outside of California, offering insight into when and where additional data collection would be most useful to reducing uncertainty in groundwater depth predictions.

Methodology
To estimate monthly groundwater levels across San Joaquin, Stanislaus, and Merced in 2013-2015, we proceed through the following modeling framework. First, we collected, cleaned, and aggregated groundwater depth data from the California DWR into monthly, 300 m resolution space-time observation cells. After withholding a randomly selected 10% of observed cells for out-of-sample testing, we initialized the BHM with vague priors and iterated its Markov chain algorithm for 11 000 steps to yield a posterior distribution for each parameter in the model. Next, we used out-of-sample data to internally validate the BHM. For monthly predictions of groundwater depth across the study area, multiple imputation of missing cells was performed prior to Ordinary Kriging. We then generated groundwater maps of the predicted mean and 95% prediction interval. Finally, we compared modeling results from our BHM's monthly interpolation products with DWR's groundwater depth maps, where available and intersecting.

Data and study area
The study area is the 20 km-buffered intersection of San Joaquin, Stanislaus, and Merced counties with CA Bulletin 118 groundwater basins as defined by the California DWR (figure 1(a)). We obtain groundwater measurements for 2013-2015 from the DWR's CASGEM program. The data represent the distance from the ground surface to the water table level, so a larger distance corresponds to a lower groundwater level.
Wells reporting groundwater depth over the 2013-2015 period are heterogeneously dispersed across the study area ( figure 1(b)). Depth-togroundwater data are also heterogeneous across time, with both the sampling frequency and the spatial distribution of sampling varying widely from month to month. After performing quality control on the data to remove surface-water level readings, wells were spatially aggregated into 300 m resolution cells. We also explored spatial aggregation of the wells at 150 m and 600 m resolution. Cells with an in-cell, in-month standard deviation of >9 m were assumed erroneous due to measurement or input error and were removed from the dataset. The total percentage of data removed was 2.91% across the three years.
The 300 m-resolved spatial cells were consolidated into monthly temporal bins. We averaged each monthly temporal bin's groundwater readings by applying equal weight to each well in the cell. This avoided over-weighting more frequentlysampled wells. Some cells contain well(s) reporting only a few times a year, while other cells contain groundwater depth data for as many as 33 months over the 36-month study period. In order to remove overly sparse spatial locations and facilitate model convergence, we removed cells containing less than three monthly depth readings over the study period from the dataset. Cells with higher temporal reporting density were clustered in the northern quadrant of the study area (figure 1(c)).
After data cropping, cleaning, and aggregation, there were 690 cells reporting data across 36 monthly time steps. The mean distance between monitoring wells wasD = 83.2 km. The mean and median missingness in the dataset were 86.39% and 90.58%, respectively, indicating that the dataset is very sparse. The sparseness of this dataset exceeds that of prior BHM implementations for groundwater depth by nearly an order of magnitude [16]. Only about 10% of our cells had data presence each month annually, with peaks in March and October near 40%. Supplementary information (SI) section 1 (https://stacks.iop.org/ERL/15/084029/mmedia) provides accessory illustrations of spatial and temporal well reporting frequency.

Bayesian hierarchical model implementation
We implement a four stage BHM to impute missing groundwater data [16]. The BHM is structured into the following levels: A full specification of the structure, parameters, and hyperparameters of the data model, process models, and parameter model is provided in SI section 2. SI section 2 also reports results validating the model assumptions for the study's large, sparse dataset and details of the sampling and imputation algorithm. Prior distributions of hyperparameters were selected to be vague at the scale of the data and conjugate when possible. The BHM iteratively updates parameter distributions for 11 000 steps. Internal validation of the BHM with out-of-sample data is available in SI section 3, including an assessment of the model's ability to fit data with a parameter recovery experiment.

Multiple posterior sampling and interpolation
The BHM generates posterior distributions of hyperparameters, which can be used to predict cells' missing groundwater levels multiple times across every time step. The BHM posterior distribution provides 10 000 predictions for each missing data point. For each time step, 100 sets of fully imputed datasets are created by randomly selecting from the 10 000 possible depth predictions for that location and time step.
We computed ordinary kriging interpolations of each of these sets of complete cell imputations on a 300 m grid overlaying the study area, producing 100 interpolated groundwater maps. Using this map manifold, we calculate each grid cell's mean, 95% prediction interval from 2.5% and 97.5% empirical quantiles, and variance. The variance includes the uncertainty due to both imputation of missing data and prediction at new locations. We also calculated depth to groundwater and variance at 150 m and 600 m resolutions.

External comparison of modeling results
California's DWR produces bi-annual 'depth below ground surface' interpolated layers, produced in March and October to coincide with their heavily sampled collection periods [29]. The color ramp layer, hereafter termed the DWR layer, provides a smoothed approximation of the groundwater elevation 'surface' based on interpolations of the measurement data [29]. These interpolated maps are used as a 'status quo' interpolated map for coinciding months of the BHM's interpolated maps. We use this product as a point of visual comparison for our model's final interpolated layer.
We first aggregated the DWR layer into the same 300 m grid as described in the previous section. We then take the difference between the BHM mean prediction and the DWR layer for each cell. We completed external comparison of modeling results for all 6 months that the DWR produced these layers in the 2013-2015 period.

Bayesian hierarchical model convergence
We used Markov chain Monte Carlo (MCMC) for parameter estimation, such that the Markov chain's stationary distribution composes the parameters' posterior distribution [30]. Per iteration, our MCMC algorithm (i) drew most parameters by means of Gibbs steps from their full conditional distributions, (ii) drew parameters controlling the strength of spatial and temporal dependence by means of Metropolis-Hastings steps, and (iii) imputed missing groundwater depth values using parameters' updated conditional distributions [31]. Parameters converged within the first 1000 iterations, indicating that increased sparsity of our dataset did not inhibit parameter convergence despite longer computational times. From every numerically generated realization of the Markov chain beyond this burn-in period, we constructed posterior distributions of the parameters (N = 10 000). SI section 3 includes the complete description of the MCMC algorithm and a Markov chain illustration.
The majority of the data model parameters' posterior distributions were normally distributed. One of the two seasonal parameters' distribution is centered around zero, which indicates that the majority of seasonal fluctuation in the groundwater readings were encompassed in a single parameter of the Fourier pair. The spatial dependence strength parameter (θ s ) was normally distributed with a mean value of 16.6, which indicates that cells beyond 56 km away from each other are statistically spatially uncorrelated. The temporal dependence strength parameter (α) was very high, demonstrating the utility of harnessing temporal short-term memory for groundwater depth predictions within each cell.

Monthly multiple posterior sampling and interpolation
After sampling from the posterior parameter distributions to generate multiple groundwater depth predictions at each 300 m grid cell across the study area, we calculated the mean, 95% prediction interval, and variance of each 300 m cell to generate descriptive map sets for each month. Figure 2(a) shows the mean predicted groundwater depth from 100 samples of fully imputed data based on the posterior distribution for October 2013. We observe a consistent spatial pattern that persists across all months. This demonstrates that the vector of spatial random effects (s) in The difference between intersecting DWR and BHM mean depth, mapped in (c) and (a) respectively, shows strong agreement for the majority of the region; deviation on the edges is likely due to DWR's lack of data use beyond within-month observed spatial edge. the spatial process model effectively captured strong spatial patterns from depth to groundwater training data and propagated these spatial patterns when imputing missing data. The depth to groundwater mean maps across 150 m, 300 m, and 600 m resolutions were nearly identical, with predictions consistent within one meter across >98% of the study area (SI section 4).
We quantify the prediction variance due to uncertainty associated with imputing missing data and uncertainty associated with spatial prediction and present the sum of six consecutive cell-by-cell variance maps (October 2013 through March 2014) in figure 2(b). Regions with high summed variance, such as the southeast corner of Merced county, are high value candidates for additional groundwater depth sampling. SI section 4 includes cell-by-cell variance for each month constituting figure 2(b), and provides an additional month (March 2014) of groundwater depth maps and external validation. Within-month variance distributions were particularly high for months with sparse groundwater depth sampling, such as January 2014. While monthly BHM predictions clearly offer the benefit of short-term memory in the temporal process model, it appears that well sampling within the prediction month remains critical to maintaining low prediction variance.
We observe that the variance of depth estimates increases with increasing resolution when comparing depth to groundwater variance maps across 150 m, 300 m, and 600 m resolutions. At coarser resolution, more wells are averaged together in-cell during the aggregation step pre-BHM and variance decreases. The spatial pattern of highest variance locations remains relatively constant across all three resolutions of analysis, demonstrating the BHM's consistent identification of data gaps for additional groundwater depth sampling efforts.

External comparison of modeling results
We compare our modeling results with the biannual interpolated DWR layers produced under the CAS-GEM program. As seen in figure 2(c) for October 2013, the DWR layer covers only the fraction of the study area for which uncertainty is acceptably low. Our prediction layer's bounds exceed that of the DWR layer because our model uses spatiotemporal information beyond within-month sampling. This approach is supported by strong spatial dependence persisting for at least 60 km across the area and very strong temporal dependence when verifying BHM assumptions (SI section 2). Figure 2(d) plots the per-pixel difference between the DWR layer and our model's mean prediction for the given month. We define exact match-up as when the difference between the two pixels is 0 m. For October 2013, 94.7% of 300 m-resolved differences were within 5 m of exact match-up. For March 2014, 94.0% of the cells were within 5 m. Thus, we can conclude that the interpolated product from DWR and our model's multiply interpolated mean product have strong consistency. We see the strongest disagreement between the two maps on the edges of the DWR layer. We hypothesize that the deviation between the models is primarily due to shortcomings in the local averaging method used to generate the DWR layers (details of DWR layer generation available in SI section 5). The DWR method does not account for well depth-to-groundwater readings beyond the map's extent or exploit readings beyond its subjectively specified time period. In contrast, the BHM model and subsequent ordinary kriging were performed using data that includes a data-driven selection of a 40 km buffer zone around the map area and used observed temporal patterns in the data to make predictions at cells lacking depth reporting in a month. While the DWR layer's resolution was originally 30 m and our modeling layer's resolution is 300 m, our model offers a clear improvement in terms of groundwater depth map temporal frequency. Our model also benefits from an ability to easily scale spatial resolution to 30 m, though higher resolutions require longer computational times.

Discussion
We demonstrate the utility of BHM techniques for estimating depth-to-groundwater in the California CV. Previous efforts to apply BHMs to estimating GW depth only tested the validity of BHM models on high density datasets. Our model application and validation demonstrates parameter convergence even as computational times per Markov chain iteration increased. This model framework is generalizable to basin-by-basin analysis across all high-and mediumpriority basins subject to the 2014 SGMA policy on a monthly basis, offering groundwater depth evaluation at a high spatial resolution in shorter monthly time updates. This combination of model fidelity and transparency in prediction uncertainty make the Bayesian hierarchical modeling approach particularly useful for informing future sampling campaigns. Uniformly increasing spatial and temporal density of depth-togroundwater across an entire study area can be very costly [10]. However, one of CASGEM's goals is to identify and address data gaps in their groundwater monitoring program in accordance with Water Code section 10 933(a) [32]. Uncertainty quantification using this BHM could aid in this assessment. For example, wells with highly-variable groundwater imputations would benefit the most from additional sampling wells. In addition, large variations in month-to-month predictions present high-value opportunities for increased frequency of data collection.
Depth-to-groundwater models are critical for informing policy in California, and the probabilistic and high resolution attributes of our BHM-enabled groundwater maps make them well-suited for this application. SGMA program performance evaluation may require incorporation of a variety of key metrics, including depth-to-groundwater. For instance, GSAs may experiment with various groundwater restoration approaches, such as on-farm groundwater recharge using water capture and storage services provided by natural landscape features [33]. Our approach can be used to both forecast groundwater depth in a future month and update groundwater depth predictions when public data for that future month becomes available. In this example, comparison of the mapped groundwater depth forecast and prediction can be used to assess the efficacy of basin recharge policies. The simplicity of uncertainty propagation from our framework can be a valuable instrument for communicating confidence in such policy evaluation.
This BHM framework enables sensitivity analysis of groundwater depth in further models or analyses. For instance, depth-to-groundwater is a key uncertain variable in a variety of agronomic analyses. Furthermore, groundwater flow models seeking to compare results to empirical depth-to-groundwater data would benefit from uncertainty intervals around empirical groundwater depth for internal validation. A variety of methods for propagating uncertainty through a model or other analysis, such as stochastic simulation, would be well-suited to using this depthto-groundwater product that quantifies uncertainty.

Conclusions
This study employs a BHM coupled with multiple imputation and spatial prediction to produce monthly groundwater depth maps in California's Central Valley at 300 m resolution. This method allows for tractable uncertainty estimation of predicted groundwater depth across policy-relevant space and time domains. As compared with existing approaches for monitoring groundwater levels in California, our estimates offer quantitative estimates of model uncertainty without sacrificing spatial and temporal resolution relevant to local policymakers and policy enforcers.
Groundwater in California is inextricably tied to national food security [3], and SGMA has recently created the first set of enforceable groundwater management guidelines for replenishing overdrafted groundwater basins across the state. This work, when expanded to other high-and medium-priority groundwater basins in California, can fill a critical gap for GSAs as they complete their groundwater monitoring tasks. Our model is generalizable and constructed from easily-collected data, offering recently-empowered local groundwater policy enforcers options both as a planning tool and a policy evaluation tool.