Assessment of groundwater well vulnerability to contamination through physics-informed machine learning

Contamination from anthropogenic activities is a long-standing challenge to the sustainability of groundwater resources. Physically based (PB) models are often used in groundwater risk assessments, but their application to large scale problems requiring high spatial resolution remains computationally intractable. Machine learning (ML) models have emerged as an alternative to PB models in the era of big data, but the necessary number of observations may be impractical to obtain when events are rare, such as episodic groundwater contamination incidents. The current study employs metamodeling, a hybrid approach that combines the strengths of PB and ML models while addressing their respective limitations, to evaluate groundwater well vulnerability to contamination from unconventional oil and gas development (UD). We illustrate the approach in northeastern Pennsylvania, where intensive natural gas production from the Marcellus Shale overlaps with local community dependence on shallow aquifers. Metamodels were trained to classify vulnerability from predictors readily computable in a geographic information system. The trained metamodels exhibited high accuracy (average out-of-bag classification error <5%). A predictor combining information on topography, hydrology, and proximity to contaminant sources (inverse distance to nearest upgradient UD source) was found to be highly important for accurate metamodel predictions. Alongside violation reports and historical groundwater quality records, the predicted vulnerability provided critical insights for establishing the prevalence of UD contamination in 94 household wells that we sampled in 2018. While <10% of the sampled wells exhibited chemical signatures consistent with UD produced wastewaters, >60% were predicted to be in vulnerable locations, suggesting that future impacts are likely to occur with greater frequency if safeguards against contaminant releases are relaxed. Our results show that hybrid physics-informed ML offers a robust and scalable framework for assessing groundwater contamination risks.


Introduction
Groundwater contamination by human activities remains a persistent challenge in the quest towards sustainable resource utilization and equitable access to safe drinking water [1][2][3]. From a management perspective, these contamination risks are often conceptualized through a source-pathway-receptor approach. Sources include any type of activity or industry that can release contaminants into the environment. Once released, contaminants are transported and potentially influenced by biogeochemical reactions along hydrologic pathways that connect the source to groundwater receptors. The receptor may be defined as the water table, a water supply well, a groundwater-fed stream, or some other point within the hydrogeologic system. Following this framework, groundwater vulnerability to contamination is 'the tendency or likelihood for contaminants to reach a specified position in the groundwater system after introduction at some location above the uppermost aquifer' [4]. Predicting vulnerability is often operationalized by evaluating contaminant travel times or concentrations at the receptors of interest or estimating contaminant residence time distributions within the groundwater system [5][6][7].
Given that groundwater is 'hidden' beneath the Earth's surface, contamination risk and groundwater vulnerability are often assessed using models. Physically based (PB) models, which encapsulate the current scientific understanding of hydrogeological processes, have a long history in groundwater contamination risk assessment [8], vulnerability analysis [7], and decision support [9]. At present, however, the application of PB simulation models at large regional scales remains difficult due to a tradeoff between process complexity, spatial resolution, and computational demand. In particular, applications involving contaminant transport simulation often need to adopt simplified process representation or coarse spatial discretization to circumvent prohibitively long run times.
Spurred by the 'big data' revolution, machine learning (ML) approaches have emerged as popular alternative tools for prediction across the geosciences [10]. In particular, the well-known random forest (RF) algorithm has been applied from local to global scales to predict the probability of exceeding threshold concentrations and vulnerability to a variety of contaminants [11][12][13][14]. ML models have several advantages, including high accuracy in matching observations, low computational requirements relative to PB models, and capacity to generate new predictions almost instantaneously. Nevertheless, the large number of observations needed by purely datadriven ML approaches may be too expensive, timeconsuming, or difficult to acquire, especially when the events of interest are rare. The prediction-centric nature of ML has also raised questions over model interpretability, with models often characterized as black boxes [15]. In such models, spurious relationships or unreasonable predictions violating conservation laws may not be readily recognized.
Hybrid approaches that combine the strengths of both PB and ML models are rapidly developing under a new paradigm known as physics-informed ML [16], or more broadly, theory guided data science [17]. Physics-informed ML methods enforce scientific theory from PB models while preserving the computational efficiency and scalability of datadriven ML models [18]. One method of implementing such integration is metamodeling [19], wherein ML models are trained to learn functional inputoutput relationships from PB models. Metamodeling and other physics-informed ML approaches have been shown to learn generalizable relationships and successfully make predictions for conditions not covered by the training dataset [19][20][21][22][23][24]. Metamodels are therefore potentially valuable tools for studying large unmonitored areas, alternative scenarios (e.g. future climate change or environmental disturbance), and rare events (e.g. transient groundwater contamination). However, applications of metamodels to regional scale contaminant transport problems are limited [25], and their use in quantifying vulnerability to contamination from multiple sources has yet to be investigated [26].
The objective of this research is to test the appropriateness of the metamodeling approach for groundwater vulnerability assessment. In particular, we employ a novel combination of a source-pathwayreceptor PB model [27] and ML to evaluate the vulnerability of drinking water wells to contamination from unconventional oil and gas development (UD). We trained metamodels to classify the vulnerability of receptors using spatial metrics readily computable in a geographic information system. The trained metamodels, which exhibit high accuracy, were subsequently used to predict the vulnerability of domestic water wells sampled in 2018 that were distributed across a large geographic region (∼2900 sq km) that hosts UD. We utilized these predictions alongside geochemical indicators of contamination, violation reports, and historical groundwater quality records to elucidate the potential influence of UD in our study sites. Our findings demonstrate the utility of physics-informed ML for vulnerability assessment and contaminant source attribution. Because the approach is scalable and readily deployable, metamodeling is a practical tool to scientists and water resource managers engaged in informing policies intended to safeguard drinking water supplies against UD and other sources of contamination.

Background: groundwater quality in the context of unconventional oil and gas development
The risks of groundwater contamination from UD continue to concern communities that rely on shallow aquifers for drinking water [28]. Local groundwater impairment has been linked to UD-related incidents [29]. Surface spills of drilling fluids, hydraulic fracturing fluids, flowback and produced waters are of particular concern [30][31][32], especially because these fluids contain chemicals that are detrimental to human health [33]. Produced water is continuously generated throughout the operational lifespan of a UD well, and risks of spills are exacerbated by limited options for proper treatment and disposal [34].
Details on the materials released, volumes, and durations of spill events remain sparse, thus precluding the comprehensive assessment of groundwater contamination risks and potential public health implications [35]. Produced waters typically contain high concentrations of B, Ba, Br, Ca, Cl, Li, Na, Ra, and Sr, and hence their co-occurrence at anomalously elevated concentrations in shallow groundwater may be a useful indicator of contamination by UD [36][37][38][39]. Organic chemicals used in hydraulic fracturing fluid additives and derived from hydrocarbonbearing formations are also present in UD wastewaters and have been detected in groundwaters near UD sites [40][41][42][43][44][45]. Many inorganic and organic contaminants associated with UD have natural or other anthropogenic sources. Mass (or molar) ratios of inorganic ions, as well as isotopes and noble gas composition, are informative for differentiating UDimpacted waters from non-impacted waters or waters contaminated from alternative sources, but the isotopic and noble gas analyses can be costly and are thus not routinely performed [46][47][48][49][50]. With UD projected to play a significant role in the U.S. economy and energy portfolio for the next few decades [51], new approaches are needed to better characterize the environmental risks and public health implications from industry activities. In this study, we explore the potential role of UD in groundwater contamination through a novel combination of physics-informed ML modeling and geochemical analysis.

Study area, field sampling, and related data collection
Northeast Pennsylvania is a UD center for natural gas production from the Marcellus Shale. Land use in this region is predominantly forested, but the majority of the local watersheds contain some form of UD infrastructure [52]. More than half of the rural population relies on domestic groundwater wells that typically tap into fractured bedrock aquifers composed of interbedded sandstones, siltstones, shales and mudstones of Upper Devonian formations [53]. Groundwater flow is topographically dominated, with recharge from uplands discharging to adjacent valleys along local flow systems and to major rivers along intermediate and regional systems [54].
We collected groundwater samples from 94 homes (90 domestic wells and 4 private springs) in Bradford and Tioga counties (figure 1) in the summer of 2018 (PA2018 samples) as part of the Water and Energy Resources Study. Homeowner participation in the study was voluntary. Groundwater samples were collected upstream of any household treatment system upon stabilization of pH, temperature, dissolved oxygen and specific conductance. Sample storage and laboratory analyses, which included a suite of inorganic and organic analytes, followed standard protocols [55]. A summary of the laboratory results was mailed back to the homeowners.
To identify potential UD impacts, we utilized six inorganic mass ratio frameworks to identify samples exhibiting geochemical signatures consistent with UD produced water. For these samples, we analyzed violation records [56] for indications of nearby contaminant releases. We also examined past groundwater quality records near these sampling locations at three periods: 1980s [53], 2010s [57], and 2016 [58]. The 2010s data are 'pre-drill' records collected prior to the drilling of new UD wells to establish a baseline in case of future alleged impairment. However, most of the pre-drill samples were already surrounded by older UD wells at the time of sampling [59]. Despite these multiple data sources, attribution of impacts to UD contamination remains ambiguous. Models are required to elucidate whether hydrologic pathways connect contaminant sources with well water receptors.

Physics-informed machine learning approach: vulnerability classification
Metamodels were trained on a PB, high resolution, three-dimensional model of groundwater flow and solute transport in a 190 sq km domain in southeastern Bradford county (dashed box in figure 1) [27]. The PB model used was Hydrogeosphere, a finite element hydrologic simulator [60]. The calibrated model successfully replicated observations of hydraulic head determined from USGS monitoring wells and drillers' well logs and estimates of groundwater discharge from regression equations [27]. The PB model utilizes the concept of capture probability to operationalize vulnerability. Capture probability is the probability that a parcel of water at a given location will arrive at a receptor within a specified time, and vulnerability is defined as the union of capture probability at potential contaminant source locations (text S1 and figure S1 (available online at https://stacks.iop.org/ERL/16/084013/ mmedia); see [27] for a detailed discussion). Note that vulnerability of a receptor (water well) does not necessarily equate to its actual contamination, as the latter is also dependent on the occurrence of releases (e.g. spills) at the source locations (UD well pads) inside the capture zone. Capture probability and vulnerability to contamination at 25 years were evaluated for 316 domestic groundwater wells extracted from the Pennsylvania Groundwater Information System [61] that were within the PB model boundaries. The 25 year time horizon corresponds to the expected operational lifespan of UD wells and was selected to provide insights on risks from the early stages of drilling up to well plugging [51]. Note that the 316 receptors within the PB model domain are distinct from the 94 PA2018 wells that are distributed across a larger geographic region (∼2900 sq km). Because PB modeling with sufficient spatial resolution at this large scale remains computationally intractable, we used metamodeling to gain insights on vulnerability and to identify simple predictors that reflect the underlying physics of contaminant transport processes.
For our metamodeling approach, we posed the question of vulnerability to contamination from UD as a classification problem: a domestic well is classified as Vulnerable (Class V) if V ⩾ 0.001 and as Non-vulnerable (Class NV) otherwise. This conservative threshold accounts for the dominance of low capture probability regions in domestic well capture zones arising from uncertainty in the streamlines induced at typical household pumping rates while also minimizing the effect of numerical errors [27,62,63]. The objective of the metamodeling is to predict the vulnerability class of a domestic well given a set of input predictors describing the hydrology, terrain, and proximity to UD at a domestic well location (table S1). Interactions between topography and proximity to UD were previously shown to be important in predicting groundwater quality in a large dataset, but the reasons for such associations are poorly understood [64]. We hypothesize that metamodels linking these predictors to the hydrologic concepts of capture probability and well vulnerability will elucidate mechanisms of contamination from UD and explain observed geochemical signatures. The predictors were assigned to wells using ArcGIS 10.7.
Many predictors, such as the various proximity metrics, are inherently correlated with one another.
Hence, we used a supervised ML algorithm called conditional inference forest (CIF), a variant of the RF algorithm that employs unbiased recursive partitioning through Chi-square tests in its tree-building process to identify the predictors most strongly associated with the response variable [65]. This approach helps the algorithm learn robust relationships even when predictors are correlated [66]. As in the case of RF, random subsets of predictors and bootstrap samples of the data are used for training each tree. The algorithm uses up to twothirds of the dataset for training each tree, and the remaining 'out-of-bag' sample is used for testing and for computing honest performance measures of the model.
The ML training/testing dataset consisted of PB model simulations for the original 316 domestic wells described in the previous section ( figure S2(a)). In order to encompass a wider ensemble of field conditions, we augmented this dataset with simulations for 834 'seed' wells [19] that were equally divided but randomly distributed inside uniform 2 × 2 km grid cells superimposed on the PB model domain. One hundred CIF model realizations (random restarts), each consisting of 3000 trees, were generated to assess the stability of the metamodel predictions. The synthetic minority over-sampling technique [67] was used to ensure a balanced dataset for each realization. Dominant predictors were identified using conditional permutation importance [68], and The predictors describe the topography, hydrology, and proximity to UD at a given receptor location. Definitions: idups-inverse distance to nearest upgradient UD source, d_elev-difference in surface elevation between nearest UD source and the groundwater well, mtpi-multiscale topographic position index, dnrst-distance to nearest UD source, idw2_1 km/2 km/3 km-inverse distance squared weighted count of UD sources within 1, 2, or 3 km of the groundwater well, idw_2 km/3 km-inverse distance weighted count of UD sources within 2 or 3 km, n_1 km/2 km-number of UD sources within 1 or 2 km, wtd-water table depth, dsd-distance from stream to divide, mvdev-deviation of the cell elevation from the mean elevation, dwetland-distance to nearest wetland. Detailed descriptions are in table S2. relationships between input predictors and log odds of predicted vulnerability were visualized with accumulated local effects plots [15]. The predictors' effects on vulnerability were also assessed qualitatively for consistency with hydrologic principles. The trained metamodels were then used to predict the vulnerability class of the 94 PA2018 sampling locations in order to help elucidate the influence, if any, of UD on the sample chemistry. All ML analyses were conducted in R-3.6.3.

Metamodel training and evaluation of important predictors of vulnerability
For each metamodel realization, the classification was determined at the point where model sensitivity, the proportion of Class V wells in the training dataset correctly classified as Class V, and specificity, the proportion of Class NV wells in the training set correctly classified as Class NV, are equal. This balance reduces both Type I and Type II errors while maximizing predictive accuracy [69]. The average out-of-bag classification accuracy of the metamodels with equal sensitivity and specificity was 97.1%, indicating good predictive capability of the metamodels. This performance is on par with other RF variants and other ML algorithms [10,66]. The high accuracy also indicates that the metamodels successfully emulated the behavior of the PB model.
Variables characterizing hydrology and proximity to UD sources were important in predicting vulnerability class (figure 2). Kruskal-Wallis and post-hoc analysis of the conditional permutation importance across all realizations showed that the top predictors were idups (inverse distance to the nearest upgradient UD source), d_elev (difference in surface elevation between nearest UD source and the groundwater well receptor), and mtpi (multi-scale topographic position index). The first predictor in this top group, idups, is a measure of the total flow path distance of a water well from UD well pads. This predictor utilizes the D-infinity algorithm in TauDEM [70] to delineate flow paths from UD source locations along the surface of the digital elevation model. Surface flow paths are a reasonable and readily delineable approximation of subsurface flow paths in this region, in accordance with previous research showing topographic control of the shallow groundwater flow regime [54]. Distances from each UD source along each flow path and from each flow path to groundwater receptors are summed, and the inverse is taken such that idups = 0 indicates the absence of upgradient UD sources. A Python script and sample files illustrating the idups calculations are provided in the SI. The second top predictor, d_elev, approximates the difference in potential energy that drives groundwater flow when the water table follows the surface topography [71]. The third most important predictor, mtpi, describes the relative location along a hillslope, with large negative values representing major valleys and large positive values representing hilltops or catchment divides [72]. Common proximity metrics [73,74] such as the (Euclidean) distance to the nearest UD source, dnrst, and variants of inverse distance-weighted UD well counts within specified radii (1, 2, and 3 km) from groundwater wells are also important predictors of vulnerability class. However, our results demonstrate that proximity metrics alone are insufficient and that hydrologic drivers of solute transport must be considered for accurate assessment of drinking water vulnerability to contamination from UD. Other important predictors are described in table S2.
The dominant predictors have nonlinear effects on the log odds of classification into Class V (figure 3). Proximity metrics generally indicate that groundwater wells closer to UD are more likely to be Vulnerable (figures 3(a)-(d)). This insight is consistent with capture probability being highest in the immediate vicinity of a groundwater well and decreasing to zero as one moves further away. The most common UD proximity metric, dnrst, has an inverse relationship with the log odds up to around 2 km, where the effect appears to stabilize ( figure 3(b)). This finding implies that domestic well capture zones in the study area can extend up to 2 km and thus some likelihood of impairment by weakly sorbing, recalcitrant UD contaminants is preserved at this distance. Contaminants that are susceptible to degradation or that sorb strongly may be removed, transformed, or retained on aquifer solids at smaller distances. Our results also show that the log odds of being in Class V increase sharply when there is at least one UD source within 1 km (n_1 km) of a groundwater well, but more sources within this distance do not translate to higher odds ( figure 3(d)). While seemingly counterintuitive, this finding is consistent with our threshold condition for the classification problem: one source inside the probabilistic capture zone will generally suffice to meet V ⩾ 0.001. The marginal decline in odds is due to interaction between the topography in the training region and the locations of UD well pads. Given the terrain in the PB model domain, areas intercepting more than one well pad in 1 km tend to be located near topographic divides and thus often fail to intercept flow paths (figure S3).
Groundwater wells are more likely to be in Class V if they are proximal to a flow path that extends downgradient from a UD source (idups > 0) ( figure 3(a)) and if the nearest UD source is at a higher elevation (d_elev > 0) (figure 3(e)), consistent with the topographically driven flow in our study region. Wells located in valleys (mtpi < 0) are also more likely to be in Class V (figure 3(f)), and slightly more so near stream confluences (characterized by small dsd [75]) ( figure 3(g)). These areas correspond to discharge zones in the groundwater flow system. Locations with a shallow water table (characterized by small wtd) are more likely to be Vulnerable ( figure 3(h)). This behavior is consistent with the popular DRASTIC groundwater vulnerability rating system [76], where a shallow water table implies faster infiltration and less degradation of contaminants derived from the surface. These qualitative interpretations collectively suggest that the metamodel predictions are hydrologically sensible.

Predicted vulnerability classification of the groundwater sampling locations
We used the metamodels to classify the vulnerability of the groundwater receptors where the PA2018 samples were collected. Fifty-seven of the 94 locations (60.6%) were predicted to be in Class V based on the mode across all metamodel realizations ( figure  S2(b)). This proportion is considerably larger than the 13.6% of the original 316 domestic wells with V ⩾ 0.001 simulated inside the PB model domain ( figure  S2(a)). This discrepancy reflects the greater proximity of our samples to UD sources-the median dnrst for the PA2018 locations was about 1 km less than the median for wells within the PB model domain and Bradford county in general ( figure S4). Homeowners who volunteered to join our study tended to live closer to UD, and correspondingly, were more likely to be in locations classified as Vulnerable to contamination. This observation is consistent with research indicating that geographic proximity to UD infrastructure affects one's level of concern towards UD risks [77]. Studies with similar voluntary recruitment strategies should consider this self-selection when making generalizations about the prevalence of UD impacts.
All 94 sampling locations had ⩾60% agreement in their predicted class across all metamodel realizations. Eighty-two had ⩾90% and 68 had 100% classification agreement, indicating general robustness of the predictions for majority of our samples ( figure  S2(b)). Samples near 60% agreement may potentially fall in a different mode class with additional realizations. These locations likely have predictor values with counteracting effects, such that the random predictor subsetting in the CIF algorithm can lead to a different classification with each metamodel run. This underscores the importance of evaluating the stability of the metamodels and their predictions.

Vulnerability and other lines of inquiry about UD contamination
In this section, we probe the potential influence of UD on the PA2018 samples through multiple complementary approaches: vulnerability metamodeling, analysis of inorganic and organic constituents, and inspection of violation reports and historical groundwater quality records. First, we compared the chemical composition of samples across Vulnerable and Non-vulnerable classes, focusing on inorganic chemicals typically present in high concentrations within produced waters, as well as organic compounds that have been reported in UD wastewaters [36,38,43,45]. Taking the 82 wells with ⩾90% agreement, we found elevated median concentrations of several constituents-Ba, Br, Cl, Li, Na, Sr, total dissolved solids (TDS), and gasoline range organics (GRO)in Class V samples relative to Class NV samples, with Ba, Br, and Li being statistically significant at p < 0.05 (table S3). This result remains even when considering all samples (table S4) or only the 68 samples with  100% classification agreement (table S5), with Cl, Na, and GRO also attaining statistical significance in the latter. This consistency indicates the robustness of these results and supports the need to further assess the Vulnerable sampling locations. Of the constituents evaluated, only Ba has a health-based EPA maximum contaminant level (2000 µg l −1 ). This MCL was exceeded in one Class NV sample (Sample ID 076) and five Class V samples (032, 046, 048, 049, and 065).
Although produced water inputs would raise concentrations of the identified constituents in nonimpacted fresh groundwater, our observation of higher median concentrations within Class V samples is insufficient to establish a link with UD activities because these chemicals have other potential sources. Inorganic ion ratios have been used in groundwater source attribution studies and to identify shallow groundwaters that have been altered by mixing with produced water [46,47,49,50,78]. Six frameworks that use inorganic ion ratios show that the majority of our samples plot within non-impacted groundwater, while some samples are geochemically similar to produced waters (figure 4). Eight samples (six Class V: 032, 036, 046, 052, 054, 063, and two Class NV: 076, 092, ⩾90% classification agreement) plot within the oil and gas produced water region in three or more frameworks. Note that oil and gas produced waters have inorganic ion ratios similar to Appalachian Basin brines that are known to mix naturally with shallow groundwater beneath valleys of our study area [79,80]. We therefore explored additional sources of information, including data on organic chemistry, to probe whether the produced water signature in these eight samples could be attributed to UD. Samples 032, 046, 054, and 063 (Class V) exceed the median GRO concentration of the full dataset, with the GRO concentration in 054 more than twice the median GRO (table S6). Samples 046, 054, 063 (Class V), as well as sample 092 (Class NV), exceed the median diesel range organics (DRO), with the 046 and 054 concentrations exceeding the median DRO by at least a factor of two. Sample 046 also exhibits the highest measured DRO in the dataset. Compound-specific analysis of this sample [55] revealed the presence of bis-2-ethylhexyl phthalate, a compound previously detected in Marcellus flowback waters [43], and N-Ndimethyltetradecylamine, a suspected surfactant used in the industry [81].
Well-water impairment (i.e. impact) is contingent on both vulnerability and an actual release of contaminants. Thus, we examined UD releases in the vicinity of the eight potentially impacted samples to strengthen our inferences on source attribution. We found violation reports indicating spills, defective casings, and inadequate waste storage facilities at UD well pads within 2 km of five of the six Class V samples (table S7). The 2 km distance is notable because it corresponds to the dnrst predictor that emerged from the metamodeling, further suggesting the pertinence of this distance to the capture zones of domestic wells in the study area. Sample 032 and the . Mass ratio frameworks for differentiating non-impacted from potentially UD-impacted waters. Open symbols indicate that a constituent concentration used in a framework was reported as below detection limit and substituted with DL/2, and filled symbols indicate that all concentrations were reported above the DL. Boundaries for frameworks (a) to (d) were taken from the literature [46,47,49,50]. Boundary for Conventional Hydrocarbon in framework (c) proposed by [49] is not shown because conventional activity is negligible in the study area. Boundaries for frameworks (e) and (f) were defined in this study as discussed in text S2. Individual sample ID's plot within the produced water region in ⩾3 frameworks. Shale gas produced water samples from the Marcellus Shale in Pennsylvania were extracted from the USGS National Produced Water Geochemical Database [82].
two Class NV samples were linked to violations indicating releases beyond 2 km. The absence of potential UD releases closer to sample 032 and the two Class NV samples may point to a non-UD source of the geochemical signatures in these samples, such as naturally upwelling brines. The absence of known releases may also simply reflect deficiencies and difficulties in documenting such events. None of the violation reports contained sufficient information on the material, volume, and duration of the releases necessary for the detailed modeling of such events. Contamination via mechanisms not represented in our current models, e.g. UD-related roadside truck spills, also cannot be ruled out as possible explanations for the produced water signatures, although transportation spills are estimated to have much lower probabilities than spills at well pads [83].
Temporal data would further reinforce the plausibility of contamination from UD, particularly if samples from a given locale shift from the nonimpacted region to the oil and gas produced water region in the mass ratio frameworks. Such change would indicate water quality impairment over the intervening period. On the other hand, if the groundwater quality is consistently similar to produced water, then the sustained geochemical signature is likely derived from a non-UD source, such as mixing with natural brines. Given the absence of continuous time series monitoring data, we examined neighboring records from historical groundwater quality datasets (1980s, 2010s, and 2016) to establish temporal context near the eight potentially impacted sites. For six of the sites, we found historical records within 10 m to 1.6 km (mostly within 200 m) that had sufficient concentration data to calculate the inorganic ion ratios (table S8). No historical neighbors with sufficient data were found for sample 092.
Historically non-impacted records near Class V samples 032, 046, and 063 and Class NV sample 076 suggest that the produced water signatures in the corresponding 2018 samples are recent developments. UD sources are thus mechanistically supported, though not proven, for the three Class V samples. The recent shift in the chemistry in Class NV sample 076 may be from non-UD sources or UD pathways not currently represented in our models. On the other hand, three historical samples-two 2010's records near Class V samples 036 and 054, and the 2016 record near Class V sample 052-exhibited produced water signatures (table S8). The presence of produced water signatures in these three locations both in the historical samples and in the 2018 samples possibly indicates a source that is sustained over time, such as natural brine upwelling in these areas. This explanation, however, is confounded by the presence of neighboring non-impacted historical records. The three historical samples with produced water signatures were also collected after potential releases at surrounding well pads; therefore, the possibility that the sampled geochemistry reflects a UD influence cannot be excluded. Additionally, the 2016 sample near 052, which had non-impacted neighbors prior to the onset of UD (table S8), was flagged by the USGS for possible anthropogenic contamination due to the detection of man-made organic compounds and radioactive constituents. The agency, however, did not definitively attribute this contamination to UD [58].
In summary, 57 out of our 94 PA2018 sampling locations were predicted to be Vulnerable to contamination from UD. Eight of the 94 samples exhibited geochemical similarity to oil and gas produced waters. Six of these were predicted to be Vulnerable, and the remaining two were Non-vulnerable. Contamination of the Vulnerable wells from UD is plausible from a hydrologic standpoint, and violations indicating UD releases, together with recent changes in groundwater chemistry, provide further support for a potential UD link (table S9). Non-UD sources of the produced water signatures (e.g. natural upwelling of brines in valleys) or anthropogenic sources not represented in the PB model (e.g. transportation spills) cannot be ruled out, particularly for the two Non-vulnerable sampling locations. We emphasize that the majority of our samples are non-impacted, consistent with previous studies of groundwater contamination by UD in the region [44,46,59,84]. With the exception of Ba concentrations, which can be elevated from natural sources in this region, concentrations of analytes reported in tables S3-S5 did not exceed relevant health-based standards at either Vulnerable or Nonvulnerable sites. Despite the currently low prevalence of observed contamination, the large proportion of Vulnerable wells suggests that drinking water impacts will occur with greater frequency if decisions are made to relax regulatory and operational safeguards intended to prevent environmental releases of UD wastewaters and other fluids.

Conclusions
Our analysis elucidates important interactions between groundwater quality, proximity to UD, and topography through the hydrologic concepts of capture probability and well vulnerability. Metamodels successfully emulated predictions from a detailed PB groundwater flow and transport model, and enabled the efficient generation of robust insights on groundwater vulnerability across a larger geographic area. We found that inverse distance to nearest upgradient UD source (idups) was highly informative in classifying vulnerability to contamination in our topographically controlled flow system. This predictor and the other spatial metrics that highly influence vulnerability can serve as functional proxies in initial assessment of groundwater contamination risks. We found that vulnerability clarified the interpretation of samples exhibiting chemical similarity to oil and gas produced waters. Our results show that the physics-informed ML approach of vulnerability classification serves as a useful line of inquiry in investigating suspected cases of contamination, particularly when more extensive and costly methods are infeasible. Metamodeling of vulnerability can also function as a systematic framework for optimizing locations for groundwater quality monitoring and for probing water-related contaminant exposure pathways relevant to public health. Lastly, the approach provides a science-based platform for designing policies to protect groundwater within the context of UD (e.g. by minimizing new development in areas near clusters of vulnerable water wells) and other anthropogenic sources of contamination.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).
has not been formally reviewed by EPA. The views expressed in this document are solely those of the authors and do not necessarily reflect those of the Agency. EPA does not endorse any products or commercial services mentioned in this publication. M A Soriano was also supported by the Yale Institute for Biospheric Studies Small Grants Program and the Geological Society of America Graduate Student Research Grants program. C J Clark was funded in part by the National Institute of Environmental Health Sciences under the National Institutes of Health [F31ES031441]. We thank the three anonymous reviewers for their thoughtful comments that led to improvements in this manuscript. We are also grateful to Dr Joshua Warren for initial insights on ML, and to Keli Sorrentino, Julie Plano, Livia Nock, Emma Ryan, Rebecca Brenneis, Nicholas Hoffman, and Nicolette Bugher for project assistance. We also thank the Yale Analytical and Stable Isotope Center and the Cary Institute of Ecosystem Studies for use of laboratory facilities, and the Yale Center for Research Computing for use of the high-performance computing infrastructure.