Contamination of domestic groundwater systems by verotoxigenic Escherichia coli (VTEC), 2003-2019: A global scoping review

Verocytotoxin-producing E. coli (VTEC) are important agents of diarrhoeal disease in humans globally. Because VTEC is a recognized waterborne pathogen, research has emphasized its occurrence in surface waters, which are readily susceptible to microbial contamination. Conversely, the status of VTEC in potable groundwater sources, generally regarded as a "safe" drinking-water supply, remains largely understudied. This investigation therefore presents the first scoping review seeking to determine the global prevalence of VTEC in groundwater supply sources intended for human consumption. Twenty-three peer-reviewed studies were identified and included for data extraction. Groundwater sample and supply detection rates (estimated at 0.6% and 1.3%, respectively) indicate that VTEC is infrequently present in domestic groundwater sources. However, where generic (fecal indicator) E. coli are present, the VTEC to E. coli ratio was found to be 9.9%, representing a latent health concern for groundwater consumers. Geographically, extracted data indicate higher VTEC detection rates in urban (5.4%) and peri-urban (4.9%) environments than in rural areas (0.9%); however, this finding is confounded by the predominance of research studies in lower income regions. Climate trends indicate that local environments classified as 'temperate' (14/554; 2.5%) and 'cold' (8/392; 2%) accounted for the majority of supply sources with VTEC present, with similar detection rates among supplies sampled during periods typically characterized by 'high' precipitation (15/649; 2.3%). The proposed prevalence figures may find application in preventive, risk-based catchment and groundwater quality management, including the development of Quantitative Microbial Risk Assessments (QMRA). Nevertheless, a large geographical disparity in available investigations, a lack of standardized reporting, and bias in source selection restrict, to an extent, the transferability of research findings.
Overall, the mechanisms responsible for VTEC transport and ingress into groundwater supplies remain ambiguous, representing a critical knowledge gap and indicating a distinct lack of integration between hydrogeological and public health research. Key recommendations and guidelines are provided for prospective studies, directed at increasingly integrative and multi-disciplinary research. © 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license ( http://creativecommons.org/licenses/by/4.0/ )


Introduction
At present, six E. coli pathotypes, collectively known as diarrhoeagenic E. coli, are recognized as clinically important. Verotoxigenic E. coli (VTEC), or Shiga toxin-producing E. coli (STEC), are characterized by the production of verocytotoxins (Stx1, Stx2) similar to AB5-type Shiga toxins, and include enterohaemorrhagic E. coli (EHEC) ( Baranzoni et al., 2016 ). VTEC enteritis comprises a wide range of symptoms, from mild uncomplicated infection in healthy adults to severe haemorrhagic diarrhea and colitis among vulnerable sub-populations ( Newell and La Ragione, 2018 ). Potential sequelae include haemolytic uraemic syndrome (HUS), renal failure, and thrombotic thrombocytopenic purpura (TTP), all of which can prove fatal in a minority (3-10%) of cases ( Rahal et al., 2015 ).
VTEC transmission is often zoonotic, occurring via the fecal-oral route, with cattle the most frequently reported animal reservoir, although other domesticated animals and wildlife may also act as reservoirs ( Farrokh et al., 2012 ;Ahmed et al., 2015 ;Penakalapati et al., 2017 ). The organism has a relatively small infectious (threshold) dose (ID50 < 100 cells), with human infection in developed regions typically associated with consumption of contaminated water or food ( Croxen et al., 2013 ;Saxena et al., 2015 ). Accordingly, VTEC enteritis represents a major global public health concern, although the global human health burden remains largely unknown owing to a lack of comprehensive confirmed infection data, attributed to limited and/or resource-constrained surveillance systems in developing regions as well as underdiagnosis among healthy populations ( Croxen et al., 2013 ;Rivas et al., 2016 ;Newell and La Ragione, 2018 ;Delahoy et al., 2018 ). Available estimates, which are conservative and likely a significant underestimate, place the global burden of VTEC infection at 2.8 million cases per annum, including 3890 cases of HUS, 270 cases of renal disease, and 230 deaths ( Majowicz et al., 2014 ).
Since being recognised as an etiological agent in waterborne outbreaks in the early 1990s ( Dev et al., 1991 ;Swerdlow et al., 1992 ), VTEC, and particularly serogroup O157, have been implicated in both sporadic cases and outbreaks of waterborne gastrointestinal infection via consumption of contaminated drinking water ( Muniesa et al., 2006 ;Luna-Gierke et al., 2014 ;Saxena et al., 2015 ;Garvey et al., 2016 ;ECDC, 2019 ). Surface water is generally considered significantly more susceptible to pathogen ingress than groundwater ( Moreira and Bondelind, 2017 ). Conversely, groundwater resources are often, traditionally and erroneously, perceived as an inherently (microbiologically) 'safe' source of water for domestic usage, owing to natural attenuation processes afforded by overlying contiguous (sub-)soil layers ( Bain et al., 2014 ;Murphy et al., 2017 ). However, over the past decade, new emphasis has been placed on the importance of groundwater as a transmission pathway for waterborne enteric infection ( Bradford and Harvey, 2017 ). VTEC strains have been reported in groundwater supplies and linked with multiple groundwater-related outbreaks ( Muniesa et al., 2006 ;Hynds et al., 2014a ;Guzman-Herrador et al., 2015 ;Saxena et al., 2015 ;Murphy et al., 2017 ;Moreira and Bondelind, 2017 ). For example, the multi-etiological Walkerton (Ontario, Canada) outbreak was positively associated with a contaminated municipal groundwater supply, causing 2300 acute clinical cases and 7 deaths, with E. coli O157:H7 identified as one of the two pathogens responsible ( Hrudey et al., 2003 ).
To date, no 'global' effort has been undertaken to ascertain the extent of VTEC prevalence in 'domestic' (i.e., intended for human use/consumption) groundwater supplies and subsequent human exposure. These data are fundamental to accurately determining the exposures and public-health burden attributable to groundwater sources contaminated by VTEC ( Murphy et al., 2017 ;Newell and La Ragione, 2018 ). An improved understanding of groundwater-borne VTEC is critically important considering our reliance on groundwater systems as a water resource (~95% of global freshwater reserves, ~2.2 billion daily consumers) ( Bradford and Harvey, 2017 ;Murphy et al., 2017 ;Cuthbert et al., 2019 ). Accordingly, the current study sought to conduct a comprehensive analysis of international peer-reviewed literature to: ( i ) identify the global occurrence of VTEC in 'domestic' groundwater supplies and ( ii ) identify and categorize (extra-)local risk factors (e.g., infrastructural, environmental, socioeconomic) associated with VTEC contamination. Study findings may be used to improve the accuracy, and consequently the efficacy, of predictive (environmental fate) modeling and of human exposure/risk assessment for water consumed from groundwater sources.
In particular, robust figures of groundwater VTEC prevalence are fundamental to the formulation of Quantitative Microbial Risk Assessments (QMRA) (cf. Haas et al., 1999 ), a tool employed to assess public-health risks associated with exposure to pathogenic microorganisms. As a multi-disciplinary toolkit, QMRA integrates metrics and models of pathogen mobilization through exposure pathways (e.g., natural, engineered), eventual human exposure, and the ensuing risk of disease ( WHO, 2016 ). Today, QMRA represents one of the leading methods for characterizing consumer health risks associated with domestic water supplies ( Owens et al., 2020 ). Notwithstanding, QMRAs rely on empirical (real-life) data to improve the validity of the estimates generated and the ensuing risk assessments.
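To make the role of a VTEC:E. coli ratio within the QMRA chain (exposure, dose, risk of infection) more concrete, the following minimal Python sketch computes a daily VTEC dose and the corresponding infection risk. It assumes, as earlier QMRAs using a fixed pathogenic fraction of E. coli have done, that a detection ratio can be treated as the fraction of E. coli cells that are VTEC. It uses an exponential dose-response model whose parameter `R_HYPOTHETICAL` is a placeholder, not a published VTEC value; all function names and inputs are illustrative.

```python
import math

def daily_vtec_dose(ecoli_per_100ml: float, vtec_fraction: float,
                    litres_per_day: float) -> float:
    """Mean number of VTEC cells ingested per day.

    Assumes VTEC concentration equals the generic E. coli concentration
    multiplied by a fixed fraction (an assumption, see lead-in).
    """
    ecoli_per_litre = ecoli_per_100ml * 10.0  # convert per-100 mL count to per litre
    return ecoli_per_litre * vtec_fraction * litres_per_day

def p_infection_exponential(dose: float, r: float) -> float:
    """Exponential dose-response model: P(infection) = 1 - exp(-r * dose)."""
    return 1.0 - math.exp(-r * dose)

def annual_risk(p_daily: float, days: int = 365) -> float:
    """Probability of at least one infection over independent daily exposures."""
    return 1.0 - (1.0 - p_daily) ** days

R_HYPOTHETICAL = 0.01  # placeholder dose-response parameter, NOT a published VTEC value

if __name__ == "__main__":
    # Hypothetical inputs: 1 E. coli per 100 mL, the pooled 9.9% ratio, 1 L/day consumed
    dose = daily_vtec_dose(ecoli_per_100ml=1.0, vtec_fraction=0.099, litres_per_day=1.0)
    p_day = p_infection_exponential(dose, R_HYPOTHETICAL)
    print(f"dose = {dose:.2f} cells/day; daily risk = {p_day:.2e}; "
          f"annual risk = {annual_risk(p_day):.2e}")
```

With these placeholder inputs the implied dose is 0.99 cells/day. In an actual assessment, the dose-response model, its parameters and the exposure inputs would be drawn from the published literature rather than the values assumed here.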

Primary research question and literature review protocol
The literature review protocol was developed based upon several previous investigations ( Sargeant et al., 2006 ;Graham and Polizzotto, 2013 ;Hynds et al., 2014a ;Andrade et al., 2018 ;Chique et al., 2020 ), with the following research question established to direct the review protocol:

"What is the global prevalence of verotoxigenic E. coli (VTEC) in 'domestic' groundwater supply sources and what (extra-)local risk factors are associated with VTEC contamination?"
Scopus and Web of Science databases were employed as the primary bibliographic sources for identification of relevant studies. All database literature searches were conducted on July 1st, 2019, employing Boolean ("AND", "OR") and proximity ("ADJ") operators in conjunction with custom search terms (Table S1). Search terms were nested according to specific categories derived from the Population-Agent-Outcome (PAO) model ( Hynds et al., 2014a ). Trial (mock) search iterations were used to assess the validity and accuracy of the search terms; these showed that including 'outcome' terminology (e.g., 'ingress', 'contamination', 'pollution') markedly restricted the volume of retrieved database records (~40). Accordingly, a modified version of the PAO model excluding the 'outcome' category and associated search terms was adopted.
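The nesting described above (synonyms combined with OR within each category, categories combined with AND) can be sketched in Python as follows. The example terms are hypothetical placeholders, not the search terms listed in Table S1, and the sketch omits database-specific syntax such as field tags and the ADJ proximity operator.

```python
def or_block(terms: list[str]) -> str:
    """Join synonyms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"

def build_query(categories: dict[str, list[str]]) -> str:
    """AND together one OR-block per non-empty category, in insertion order."""
    return " AND ".join(or_block(terms) for terms in categories.values() if terms)

# Hypothetical placeholder terms, NOT the terms from Table S1.
pao = {
    "population": ["groundwater", "well", "borehole"],
    "agent": ["VTEC", "STEC", "Shiga toxin"],
    "outcome": [],  # left empty: the modified PAO model drops the outcome category
}

if __name__ == "__main__":
    print(build_query(pao))
    # (groundwater OR well OR borehole) AND (VTEC OR STEC OR "Shiga toxin")
```

Leaving the 'outcome' list empty reproduces the modification described above: the category is simply skipped, so it no longer narrows the result set.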

Screening phases, article selection and identification of additional literature
Four primary phases were employed for record screening, suitability assessment and study exclusion/inclusion ( Fig. 1 ). In total, 444 records were retrieved from bibliographic database searches, with subsequent de-duplication decreasing this number to 302 (Phase 1). A set of eligibility criteria (Table S2) guided the subsequent record "exclusion" phases (Phases 2-3). In the first instance, each record was subject to title and abstract screening, with article selection based on eligibility criteria (Phase 2). Subsequently, Phase 3 consisted of full-text screening of articles forward-selected from Phase 2, with exclusion of non-original and non-full-text research articles (e.g., literature reviews, conference proceedings) and of those identified as unsuitable under additional eligibility criteria. In Phase 3, uncertainty regarding article inclusion ( n = 13/112; 11.6%) was resolved jointly by all researchers. Based on established aims and objectives, the review protocol focused exclusively on investigations analysing "typical" infrastructure used to source groundwater (e.g., boreholes, artesian wells, drilled wells). Specifically, exclusion criteria included: ( i ) short communications and other non-original research articles (e.g., literature reviews), ( ii ) articles wholly or partially based on previously published data, ( iii ) investigations failing to explicitly describe VTEC groundwater occurrence and/or not reporting the number of contaminated groundwater samples or sources, ( iv ) studies analysing water supplies in which groundwater is mixed with any other water source(s) (e.g., surface water), and ( v ) studies (either in -situ or ex -situ ) incorporating experimental methodologies (e.g., tracer, soil column studies), or based on non-typical groundwater infrastructure (e.g., river-bank filtration, infiltration galleries), which fail to reflect 'natural' VTEC groundwater prevalence.
Additional relevant articles not identified through the developed protocol were captured through manual screening of the bibliographies of selected records incorporated into the review protocol (Phases 1-3) (i.e., a "snowball" approach) ( Fig. 1 ). This included bibliographic screening of academic reviews and gray literature identified (independently or in Phases 1-2), as well as of full-text articles analysed in Phase 3. Furthermore, the "Cited by" tool provided by Google Scholar was used to identify all articles citing each of the records previously screened in Phase 3. All newly identified records ( n = 19) were then subject to the screening procedures described for Phases 2-3 ( Fig. 1 ).
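The Phase 1 de-duplication of records retrieved from two databases can be illustrated with a short Python sketch. It assumes, hypothetically, that each record carries a title and an optional DOI. Records are treated as duplicates when their DOIs match or, if a DOI is absent, when their titles match after case and punctuation normalization. This matching rule is an illustration, not the procedure reported by the authors.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Record:  # hypothetical minimal bibliographic record
    title: str
    doi: Optional[str] = None

def dedup_key(rec: Record) -> str:
    """Prefer a lower-cased DOI; otherwise fall back to a normalized title."""
    if rec.doi:
        return "doi:" + rec.doi.strip().lower()
    norm = re.sub(r"[^a-z0-9 ]", "", rec.title.lower())  # drop punctuation
    return "title:" + " ".join(norm.split())              # collapse whitespace

def deduplicate(records: list[Record]) -> list[Record]:
    """Keep the first occurrence of each key, preserving input order."""
    seen: set[str] = set()
    unique = []
    for rec in records:
        key = dedup_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

One limitation of this rule is that a record indexed with a DOI in one database and without it in the other would not be matched. This is one reason why automated de-duplication is usually followed by manual checking.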

Data categories and field extraction
Six primary data field categories were established to compartmentalize data extraction. A list of the data extraction fields comprising each of the six categories is provided in Table S3 ( n = 41 fields). In all instances, a "not reported" field label was assigned if relevant data were absent and/or ambiguously reported (Table S3).
A modified version of the World Health Organization (WHO) regional classification system ( WHO, 2019 ) was used to allocate studies to a corresponding geographical region ( n = 7). Similarly, mean income level and climate were classified following guidelines provided by the United Nations (UN, R. 2017 ) and Peel et al. (2007) (Köppen-Geiger system), respectively. Where reported, climate allocation was based on the specific study location(s), with large-scale (e.g., regional, national) studies spanning multiple climates assigned an "unknown" classification. Where possible, reported sampling periods were compared with averaged monthly precipitation records from the nearest available weather station. The World Meteorological Organization (WMO) global database was used to identify representative weather stations ( http://worldweather.wmo.int ) ( WMO, S. 2020 ), with each investigation allocated a tentative category (i.e., high, low) relative to averaged local precipitation statistics. As such, precipitation classifications were relative to local monthly records. To gain insights into the potential influence of seasonal livestock management (e.g., summer grazing) on VTEC prevalence, investigations from higher latitudes (i.e., 'temperate' and 'cold' climates) were categorized separately where 'summer' sample collection was specified in their experimental designs. Investigations based in 'tropical' climates were omitted from 'summer' classifications and reported as "not available". Local settlement patterns, classified as 'rural', 'urban' and 'mixed' (i.e., peri-urban), were based on available descriptions and the specific study location(s) provided. Additionally, the 'rural' category was sub-divided according to the reported primary agricultural focus: ( i ) fresh produce (e.g., fruits, vegetables, grains) and ( ii ) livestock (dairy) production. Descriptions of groundwater supply type and accompanying infrastructure were generally ambiguous.
Accordingly, an attempt was made to tentatively classify groundwater sources into two main categories based on available groundwater supply descriptions and inferred levels of construction, integrity and protection (viz. Bain et al., 2014 ;Chique et al., 2020 ). These comprised hand-dug (unimproved) groundwater supplies, typically associated with lower levels of construction/design, and boreholes (protected), which are presumed to afford higher levels of structural protection.
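The relative precipitation classification described above can be sketched in Python. The text does not state the exact rule, so this sketch assumes that a sampling period is labelled 'high' when the mean of the averaged monthly totals for the sampled months is at or above the station's mean monthly total, and 'low' otherwise. The function name, the 12-value input layout and the station values are hypothetical.

```python
from statistics import mean

def classify_precipitation(monthly_mm: list[float], sampled_months: list[int]) -> str:
    """Label a sampling period 'high' or 'low' relative to local monthly records.

    monthly_mm: 12 averaged monthly totals (Jan..Dec) from the nearest station.
    sampled_months: months covered by sampling, numbered 1-12.
    """
    if len(monthly_mm) != 12:
        raise ValueError("expected 12 monthly values")
    if not sampled_months:
        raise ValueError("no sampled months given")
    period_mean = mean(monthly_mm[m - 1] for m in sampled_months)
    return "high" if period_mean >= mean(monthly_mm) else "low"

if __name__ == "__main__":
    # Hypothetical station with a wetter second half of the year (mm per month)
    station = [40, 35, 45, 50, 60, 70, 95, 100, 90, 80, 60, 45]
    print(classify_precipitation(station, [7, 8]))  # high
    print(classify_precipitation(station, [1, 2]))  # low
```

Because the threshold is each station's own mean, the same absolute rainfall can be 'high' at an arid site and 'low' at a humid one. This matches the statement that classifications were relative to local monthly records.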
Each investigation was classified according to the reported sampling point, namely ( i ) the direct groundwater 'source' (e.g., well tap) or ( ii ) the point of 'use' (e.g., household tap). A number of investigations ( n = 5) collected groundwater samples of varying volume; in each such instance, the sample volume was recorded as the minimum of the reported volume range. VTEC detection methods included amplification of verocytotoxin gene markers ( stx1, stx2 ), culture-based (presumptive) E. coli O157 detection, or a combination of both, i.e., presumptive detection followed by molecular confirmation. Depending on the specific PCR assay, a range of additional virulence genes/markers (e.g., eae, hlyA, rfbE ) were also targeted and employed in VTEC sample detection. Overall, the prevalence figures provided herein are based on a combination of PCR-based VTEC and presumptive E. coli O157 identification. Where reported, the numbers of positive groundwater samples for both 'generic' E. coli and VTEC were extracted from reviewed studies, allowing VTEC:E. coli detection ratios to be calculated for relevant data extraction categories (e.g., study location, sampling strategy). Investigations seeking to link potential environmental VTEC reservoirs with groundwater contamination using strain typing tools were also incorporated into the data extraction protocol. The two techniques employed were pulsed-field gel electrophoresis (PFGE) and multiple-locus variable-number tandem repeat analysis (MLVA). Groundwater contamination mechanisms were inferred from available study descriptions (if any), with four main groundwater contamination categories employed ( Lee, 2005 ;Hynds et al., 2012 ;Andrade et al., 2018 ;Chique et al., 2020 ): ( i ) direct surface ingress, consisting of contaminants entering groundwater supply units via surface structural components (wellhead); ( ii ) groundwater recharge, i.e., filtration/migration of surface-borne contaminants through (sub-)soil layers and their eventual deposition in groundwater; ( iii ) direct underground migration, i.e., groundwater contamination originating from sources below the soil surface (e.g., septic tanks); and ( iv ) inter-aquifer exchange, comprising contamination of groundwater exclusively through hydraulic interconnectivity. Similarly, contamination sources were grouped into two main categories: human (e.g., domestic wastewater) and animal.
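Pooled, category-specific VTEC:E. coli ratios of the kind described above (VTEC-positive samples over generic E. coli-positive samples, grouped for example by sampling strategy) can be computed as in the following Python sketch. The study records are invented for illustration and `None` stands for the "not reported" label. Our reading of the handling of incomplete reporting is that such studies are excluded from both the numerator and the denominator for that category; this is an assumption.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical study-level counts: (category, E. coli-positive samples, VTEC-positive samples).
# None marks a "not reported" value. These numbers are invented, not review data.
studies: list[tuple[str, Optional[int], Optional[int]]] = [
    ("repeat", 12, 2),
    ("repeat", 8, 1),
    ("one-off", 40, 2),
    ("one-off", None, 1),   # E. coli count not reported -> excluded
    ("one-off", 15, None),  # VTEC count not reported -> excluded
]

def pooled_ratios(rows) -> dict[str, tuple[int, int, float]]:
    """Return {category: (VTEC-positive, E. coli-positive, VTEC:E. coli ratio in %)}."""
    vtec: dict[str, int] = defaultdict(int)
    ecoli: dict[str, int] = defaultdict(int)
    for category, n_ecoli, n_vtec in rows:
        if n_ecoli is None or n_vtec is None:
            continue  # incomplete reporting: drop from numerator and denominator
        ecoli[category] += n_ecoli
        vtec[category] += n_vtec
    return {c: (vtec[c], ecoli[c], 100.0 * vtec[c] / ecoli[c])
            for c in ecoli if ecoli[c] > 0}

if __name__ == "__main__":
    for category, (v, e, pct) in pooled_ratios(studies).items():
        print(f"{category}: {v}/{e} = {pct:.1f}%")
```

Pooling counts, rather than averaging study-level percentages, weights each study by the number of E. coli-positive samples it contributes. This is how pooled values of the form "15/152" are constructed.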

Included studies, geography, local environments and inconsistent reporting
The review protocol identified 23 studies that complied with all inclusion criteria ( Fig. 1 , Table S1). A synthesis of extracted and collated data across data field categories is provided in Table S3. Inconsistent and/or ambiguous reporting was observed across all established data extraction categories, and was particularly prevalent in structural descriptions of the groundwater supply sources surveyed (Table S3). Specifically, 65.2% of studies (15/23) included no extractable data for any pre-established infrastructural data field. Overly generic structural descriptions of supply units (e.g., well depth only) were observed in 17.4% (4/23) of studies. Only two studies explicitly reported comprehensive infrastructural data for the groundwater supply units analysed ( Pitkänen et al., 2011 ;Ferguson et al., 2012 ). Similarly, absent and/or ambiguous reporting was a consistent feature with regard to (hydro-)geological setting, with just 5 studies providing a relevant description ( Table 1 ).

Study design, climate and seasonality
The dataset largely comprised investigations focusing on groundwater VTEC "prevalence" (21/23; 91.3%), in contrast to those prompted by outbreaks of infection (2/23; 8.7%) ( Table 1 ). The two outbreak-related investigations derived from outbreaks associated with ( i ) consumption of bagged spinach in the USA and Canada ( Jay et al., 2007 ) and ( ii ) drinking water at a school camp in South Korea ( Park et al., 2018 ). Studies more frequently originated from high latitudes ( Fig. 2 ), resulting in the predominance of 'temperate' (10/23; 43.5%) and 'cold' (4/23; 17.4%) climatic classifications ( Table 1 ). 'Single' or 'one-off' sampling regimes accounted for 15/23 (65.2%) of studies and 1638/2471 (66.3%) of analysed samples.
Table 1 Summary of key characteristics extracted from included studies ( n = 23). Extracted data are based on a total of n = 2471 groundwater samples and n = 1998 groundwater supplies comprising the dataset.

Groundwater supply characteristics and detection methods
Approximately half of the identified studies (11/23; 47.8%) focused on groundwater supplies classified as serving an individual household (i.e., privately owned) ( Table 1 ). Most studies (18/23; 78.3%) analysed groundwater supplies classified as boreholes (i.e., protected), accounting for 1518/2471 (61.4%) of the samples analysed. Groundwater supplies categorized as hand-dug wells featured in only two investigations. Few investigations ( n = 3) reported VTEC analysis of groundwater samples that had been subject to treatment (e.g., chemical, physical), while several ( n = 10) did not describe whether a treatment system was employed. As shown ( Table 1 ), investigations tended to collect samples at the 'source', with supplies sampled directly in 11/23 (47.8%) investigations. Conversely, collection of samples from a domestic tap, i.e., following distribution, was reported in 4/23 (17.4%) studies. Approximately half of reviewed studies based VTEC analysis on sample volumes of 100-250 ml (11/23; 47.8%), with membrane filtration the preferred sample concentration method (16/23; 69.6%). PCR-based molecular identification was the favoured detection method (19/23; 82.6%). Quantitative PCR was applied in only 3/19 (15.8%) investigations. In total, 14/19 (73.7%) investigations employed PCR on E. coli previously isolated through culture/enrichment, and a subset of PCR-based studies (5/19; 26.3%) applied PCR to all groundwater samples collected. Overall, strain typing tools to identify potential environmental VTEC sources were infrequently employed (3/23; 13%) ( Table 1 ).

Generic E. coli and VTEC detection rates
VTEC was cultured and/or genetic markers ( stx1, stx2 ) were identified in 9/23 (39.1%) studies, with overall sample- and source-specific detection rates of 0.6% (16/2471) and 1.3% (25/1998), respectively. The unexpectedly higher number of positive supply sources relative to groundwater samples stems from inconsistent reporting in the review dataset, with several investigations failing to specify source/sample VTEC detection rates (Table S3). Among PCR-based studies, 8/19 (42.1%) reported positive detection, with target genes detected in 0.8% (4/477) of groundwater samples and 6.9% (34/493) of supplies analysed ( Table 1 ). A single study employing culture-based methods reported positive identification of (presumed) E. coli O157 in an unspecified number of groundwater sources/samples. Based on pooled (i.e., category-specific) values from the review dataset, groundwater sample detection rates of 'generic' E. coli and VTEC are compared by study sampling strategy ('one-off' and 'repeat') in Fig. 5 . Pooled study data indicate a 'generic' E. coli groundwater sample detection rate of 315/1926 (16.4%) ( Fig. 5 A). Notwithstanding, 7/23 (30.4%) investigations did not explicitly report the study-specific generic E. coli detection rate. This lack of reporting was associated either with ambiguous/incomplete data collation and/or presentation (4/23; 17.4%) or with investigations not employing E. coli as a groundwater fecal indicator organism (FIO) (i.e., direct VTEC detection) (3/23; 13%). Overall, a VTEC to 'generic' E. coli sample detection ratio of 15/152 (9.9%) was estimated ( Fig. 5 A). Additionally, VTEC:'generic' E. coli ratios of 16.7% and 7.5% were estimated for 'repeat' and 'one-off' investigations, respectively ( Fig. 5 ). Among geographic VTEC:'generic' E. coli detection ratios, Europe (19.2%) and North America (17.5%) exhibited the highest values ( Table 3 ).
Pooled VTEC detection rates of 0.7% (16/2230) and 1.3% (25/1949) were calculated for groundwater samples and groundwater supply sources, respectively. As with 'generic' E. coli, several investigations failed to report sufficient data to calculate VTEC detection rates. Specifically, 5/9 (55.6%) VTEC-positive studies failed to specify the number of contaminated groundwater samples ( Fig. 5 A), and 3/9 (33.3%) investigations did not report the number of sampled groundwater supplies.
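As an arithmetic check, the headline percentages in this section follow directly from the reported counts, rounded to one decimal place. The Python snippet below recomputes them. The dictionary labels are descriptive names chosen for the snippet, and the counts are copied from the text above.

```python
# (positive, total) counts as reported in the text above.
reported = {
    "VTEC, all samples": (16, 2471),
    "VTEC, all supplies": (25, 1998),
    "stx genes, PCR samples": (4, 477),
    "stx genes, PCR supplies": (34, 493),
    "generic E. coli, samples": (315, 1926),
    "VTEC:generic E. coli, samples": (15, 152),
    "VTEC, pooled samples": (16, 2230),
    "VTEC, pooled supplies": (25, 1949),
}

def percent(positive: int, total: int) -> float:
    """Detection rate (or ratio) in percent, rounded to one decimal place."""
    return round(100.0 * positive / total, 1)

if __name__ == "__main__":
    for label, (pos, tot) in reported.items():
        print(f"{label}: {pos}/{tot} = {percent(pos, tot)}%")
```

The difference between the 0.6% and 0.7% sample rates comes entirely from the denominator (2471 versus 2230 samples). It reflects the exclusion of studies with incomplete reporting, not a different count of positives.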

Potential drivers of VTEC in groundwater
Geographically, the highest VTEC groundwater detection rates were reported in Sub-Saharan Africa (10/152; 6.6%) ( Table 2 ). This pattern is clearly reflected in country-specific data ( Fig. 2 ), with South Africa reporting the highest VTEC supply detection rate (8.5%). A South African study ( Abia et al., 2017 ) reported the highest VTEC prevalence in the dataset with respect to groundwater supplies (9/18; 50%). In turn, Won et al. (2013) reported the highest sample detection rate (7/180; 3.9%), among dairy farm environments in Ohio, USA. High groundwater supply detection rates in Sub-Saharan Africa drive the high values observed in the 'upper-middle' income category (10/130; 7.7%) ( Table 2 ; Fig. 3 ). Extracted data also indicate that 'urban' and 'peri-urban' environments had higher supply VTEC contamination rates, at 5.4% and 4.9%, respectively. Conversely, 'rural' settings had lower pooled VTEC supply/sample detection rates of 0.7% and 0.9%, respectively. Climate trends indicate that environments classified as 'temperate' (14/554; 2.5%) and 'cold' (8/392; 2%) accounted for the majority of contaminated supply sources; these estimates are likely constrained by the high-latitude focus of studies comprising the dataset ( Table 1 ). Conversely, no VTEC were reported in the single (data-reporting) investigation based in an 'arid' environment. With respect to supply regulation/management, public supplies ( n = 2) exhibited no contamination, in contrast to those under private or alternate (i.e., 'mixed') administration ( Table 2 ). Overall, private (i.e., unregulated) groundwater samples (15/800; 1.9%) and supplies (15/631; 2.4%) were characterised by the highest VTEC detection rates. Moreover, while the number of investigations explicitly focusing on hand-dug supplies was low ( n = 2), higher supply detection rates were calculated for these (9/40; 22.5%) than for supply sources classified as "protected" (11/1137; 1%).
Fig. 5. Schematic of 'generic' E. coli and VTEC detection rates and incidence ratios. VTEC detection rates and ratios specific to sampling strategy categories ('one-off' and 'repeat') are also provided. The total number of investigations in each sampling strategy category is provided at the top of each panel. The single study in the dataset with a "mixed" (one-off/repeat) sampling strategy was incorporated into the categories shown in panels B and C, resulting in a total of 24 investigations across categories. The number of investigations failing to report the E. coli and VTEC data needed to calculate detection rates is provided in text boxes. * Includes (pooled) groundwater samples from all investigations and corresponding sampling strategy categories as indicated.
Assessing the potential nexus between seasonality and sampling design, investigations focusing sampling efforts on periods of (normally) 'high' precipitation reported higher detection rates for both groundwater samples (15/683; 2.2%) and supplies (15/649; 2.3%) relative to 'low' precipitation periods, during which no detection was reported across samples or sources ( Table 2 ). The two investigations based on temporally limited sampling campaigns (< 1 month) reported the highest sample/supply VTEC prevalence (6% and 7.8%, respectively). Similarly, VTEC source detection rates were higher (10%) in studies with a more limited scope in terms of the number of sampled groundwater supplies ( n < 10). Pooled data indicate comparable groundwater source detection rates among investigations focusing on 10-20 (3.1%) and 51-150 (3.7%) groundwater samples. Investigations incorporating 'repeated' sampling regimes reported higher VTEC supply detection rates (13/324; 4%) than 'one-off' investigations (12/1420; 0.8%) ( Table 2 ). Similarly, as shown in Fig. 5 , higher sample detection rates for 'generic' E. coli were reported in 'repeated' (151/711; 21.2%) compared with 'one-off' (176/1440; 12.2%) investigations. An overall adjusted VTEC:'generic' E. coli detection ratio of 5/30 (16.7%) was calculated to account for the potential effects of 'repeat' sampling on groundwater sample detection.
Table 2 Pooled data synthesis based on reported groundwater samples ( n = 2230) and sources ( n = 1949) with positive VTEC detection among selected study characteristics. The VTEC sample/source detection column displays the number of investigations reporting extractable data in terms of samples (left) and supply sources (right) analysed and the corresponding (pooled) VTEC contamination rates, with values in bold indicating full (equivalent) VTEC contamination reporting in the dataset.
Overall, identification and attribution of contamination mechanisms, pathways and sources were often absent and/or very ambiguously reported. Only 3/9 (33.3%) investigations reporting positive VTEC detection described potential ingress mechanism(s) ( Table 4 ). Specifically, groundwater recharge and direct surface ingress were the primary ingress mechanisms suggested, each featuring in two investigations, albeit largely based on tentative attribution ( Table 4 ). Reported presumptive contamination sources were evenly distributed among 'animal' and 'human' categories (5/6; 83.3%), with cattle the only specific source potentially linked to VTEC groundwater supply contamination (3/5; 60%) ( Table 4 ). In turn, human contamination sources were often vaguely associated with domestic waste effluents, with pit latrines explicitly linked with VTEC supply ingress in 2/5 (40%) investigations. Contamination source attribution was primarily based on source proximity/adjacency. Of the three investigations employing environmental VTEC strain typing, two reported the presence of VTEC. Through comparison of PFGE patterns obtained from human stool samples and drinking water isolates, Park et al. (2018) identified groundwater as the most likely source of a (multiple) diarrhoeagenic E. coli outbreak (VTEC/EPEC). Similarly, Schets et al. (2005) compared E. coli O157 PFGE patterns from groundwater isolates with collated records from different potential regional sources, identifying cattle as the likely contamination source. Beyond select investigations focusing on specific agricultural settings ( n = 7), most investigations adopted a 'broad', i.e., landscape-wide, approach to VTEC detection without any explicitly reported contamination sources ( Tables 1-2 ). In total, 5/8 (62.5%) PCR-based investigations with positive VTEC detection specified the serogroups identified, with only O157 ( n = 4) and O103 ( n = 1) reported ( Table 4 ).

Generic E. coli and VTEC detection ratios
To the authors' knowledge, the current study represents the first attempt to globally quantify VTEC incidence in groundwater supplies. As such, one of the key deliverables of this investigation is the set of estimated VTEC:'generic' E. coli detection ratios ( Fig. 5 ), with potential (bespoke) applications in groundwater management. The presented figures may be applicable to the development of increasingly accurate QMRA focusing specifically on 'domestic' groundwater supplies ( Haas et al., 1999 ), primarily as a pathogen 'contribution' or 'loading' metric (i.e., model input) specific to groundwater-borne VTEC. To date, QMRA for estimating the burden of groundwater-borne VTEC has largely relied on E. coli O157:E. coli ratios estimated from a range of environmental waters (e.g., Haas et al., 1999 ;Soller et al., 2010 ). Hence, it may be contended that the presented ratios provide a more specific representation of VTEC incidence in groundwater environments. Notwithstanding, it is critical to highlight the potential influence of selective sampling on estimated values ( Fig. 5 ). Key data trends (e.g., sampling period, number of supplies analysed) suggest that studies with more "inclusive" (i.e., random rather than targeted) sampling designs frequently report lower levels of VTEC detection ( Table 2 ). The likelihood of 'bias' introduced by targeting susceptible supplies (i.e., "proof of concept") is therefore emphasized, and is potentially compounded by publication bias (i.e., failure to publish negative results). Because such bias has been identified as a key feature in a range of recent literature compilations with similar scope ( Hynds et al., 2014a ;Andrade et al., 2018 ;Chique et al., 2020 ), there is sufficient evidence to suggest that sampling "bias" is somewhat ubiquitous within the microbial groundwater contamination literature, a factor which may compromise the integrity of the estimates presented herein.
Arguably, the ratio estimated from studies adopting 'repeat' sampling (16.7%) may provide a more realistic estimate of 'generic' E. coli presence and concurrent VTEC incidence in groundwater. As such, the 'repeat' value likely represents a more robust metric for estimating human VTEC exposure over time. Exposure is expected to exhibit natural temporal variation, and, based on the reported distribution of sampling periods (Table 2), the 'repeat' ratio is perhaps best interpreted as an approximate "annual" mean. Generic E. coli:VTEC ratios derived from 'repeat' investigations are also less compromised by incomplete reporting, which potentially reinforces the cogency of the estimated values (Fig. 5). Previous QMRAs focusing on domestic water distribution systems (e.g., Howard et al., 2006; Katukiza et al., 2013; Machdar et al., 2013; Abia et al., 2016) have employed an E. coli:VTEC detection rate of 8% (viz. Haas et al., 1999), which has also been employed for E. coli:EPEC (Shresta et al., 2017) and is comparable with the lower range of ratios presented in this investigation (7.5-9.9%; Fig. 5). In turn, Hynds et al. (2014b) used the lower-bound probabilistic range of 1-16% reported by Soller et al. (2010) for untreated drinking water, which shows some agreement with the 16.7% ratio estimated from 'repeat' sampling (Fig. 5B). While available figures are comparable, the likelihood that estimated E. coli and VTEC values are inflated as a result of selective sampling must be highlighted. The lack of clear and consistent reporting of both E. coli and VTEC groundwater sample detection (Fig. 5; Table 2) represents a critical limitation among identified studies.
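The sensitivity of an exposure estimate to the choice of ratio can be sketched by comparing the point ratios discussed above (8%, 9.9% and 16.7%) with a probabilistic draw from the 1-16% range. The uniform distribution, the hypothetical concentration of 100 E. coli per litre, the random seed and the function name are assumptions made for illustration only; they are not taken from Hynds et al. (2014b) or Soller et al. (2010).

```python
import random
import statistics

# Point estimates of the VTEC fraction of E. coli discussed in the text
POINT_RATIOS = {
    "8% convention (Haas et al., 1999)": 0.08,
    "pooled review ratio (15/152)": 15 / 152,
    "'repeat' sampling ratio": 0.167,
}


def simulate_vtec_per_litre(ecoli_per_litre, low=0.01, high=0.16, n=100_000, seed=1):
    """Monte Carlo draws of VTEC per litre; the ratio is drawn uniformly from [low, high] (assumed)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [ecoli_per_litre * rng.uniform(low, high) for _ in range(n)]


ecoli_per_litre = 100.0  # hypothetical concentration

for label, ratio in POINT_RATIOS.items():
    print(f"{label}: {ecoli_per_litre * ratio:.1f} VTEC per litre")

draws = simulate_vtec_per_litre(ecoli_per_litre)
p95 = statistics.quantiles(draws, n=20)[-1]  # last of 19 cut points = 95th percentile
print(f"uniform 1-16%: mean {statistics.fmean(draws):.1f}, "
      f"95th percentile {p95:.1f} VTEC per litre")
```

Drawing the ratio rather than fixing it propagates uncertainty about the VTEC fraction into the exposure estimate: under the assumed uniform 1-16% range the mean sits near the 8% point value, while the upper percentiles approach the 'repeat' estimate.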

VTEC groundwater incidence and geographical data trends
Pooled review data were used to establish VTEC groundwater sample and supply detection rates of 0.7% and 1.3%, respectively. The unexpectedly lower detection rate for groundwater samples relative to supplies is an artefact of inconsistent reporting of VTEC detection among investigations (Table S3). In light of the potential effects of selective sampling and the likelihood of inflated figures, it is argued that values derived from investigations employing 'repeat' sampling designs (1-4%; Table 2) provide more "robust" VTEC prevalence estimates, potentially representative of global contamination 'baselines'. Notably, a recent investigation (Stokdyk et al., 2020), not included in the review dataset because it was published outside the review period, analysed a comprehensive number of groundwater samples (n = 834) and supplies (n = 145) and reported a VTEC sample detection rate (0.4%) comparable to those presented here. These comparable values lend some support to the presented VTEC detection rates; however, that study was geographically restricted and focused exclusively on 'public' supplies, so any comparison requires careful interpretation.
Overall, the low number of investigations identified (n = 23) and the relatively limited number of groundwater samples/supplies comprising the review dataset (Table 1) represent both a key finding and a research gap, given (i) the importance of VTEC as a globally significant waterborne pathogen and its concomitant clinical burden, and (ii) reported links between groundwater resources and risk of human infection (Muniesa et al., 2006; Majowicz et al., 2014; Guzman-Herrador et al., 2015; Moreira and Bondelind, 2017). Similarly, the lack of robust data on VTEC serotype prevalence in groundwater wells (Table 4) constitutes another important finding, particularly considering the increasingly reported global incidence and clinical burden of non-O157 serotypes and the potential for regional serotype variability (Gould et al., 2013; Luna-Gierke et al., 2014; Baranzoni et al., 2016). Geographically, the review dataset was characterized by key regional knowledge gaps (Fig. 2). This lack of regional data is a crucial limitation given both the reported geographical variation in notified VTEC infections and the potential influence of local/regional risk factors (Croxen et al., 2013; Newell and La Ragione, 2018). Regional E. coli and VTEC incidence data are essential for deriving geographically specific estimates and distributions. For example, Murphy et al. (2016) pooled available data from two North American studies to produce custom QMRA health-risk assessments for Canadian groundwater supplies. Despite the poor geographical distribution encountered and prospective bias in source selection, some of the collated regional figures and generic E. coli:VTEC ratios may find useful application in future investigations (Tables 2-3; Fig. 2).

Local environments, settlement patterns and land-use
The importance of local risk factors is underscored by the higher VTEC groundwater supply detection rates in 'urban' (5.4%) and 'mixed' (4.9%) settings (Table 2). To an extent, these findings diverge from available geostatistical evidence linking VTEC prevalence with 'rural' environments in high-income regions (Schets et al., 2005; Denno et al., 2009; ÓhAiseadha et al., 2017; Brehony et al., 2018). Notably, the majority of reviewed investigations based in 'urban' and 'mixed' settings were concentrated in lower income regions in Asia and Sub-Saharan Africa (Fig. 3). Available evidence from Sub-Saharan Africa highlights the intensity of (peri-)urban animal rearing, the prevalence of domestic wastewater sources, the incidence of zoonotic VTEC strains, and the high contamination burden on groundwater resources (Kulabako et al., 2007; Adelana et al., 2008; Braune and Xu, 2010; Lupindu et al., 2014; Lapworth et al., 2017). In terms of cattle management practices, barn/feedlot confinement, as opposed to traditional 'pastoral' (i.e., free-roaming) rearing, has been associated with lower environmental VTEC prevalence in developing countries (Callaway et al., 2009; Smith et al., 2011). Considering spatial contiguity, the presented results suggest that a diffuse livestock (source) distribution, and the associated potential for close interaction with groundwater supplies (receptor), increase the likelihood of pathogenic supply contamination and human VTEC exposure. The highlighted geographical and land-use trends may also reflect the regional importance of manure application (land spreading) in agriculture, a potentially key conduit for environmental VTEC dissemination (Fremaux et al., 2008; Ferens and Hovde, 2011; van Elsas et al., 2011). However, detailed global data on manure application rates are not available, so any inference about the geographical influence of manure application on estimated VTEC prevalence remains speculative.
Investigations based in agricultural environments focusing on fresh produce (n = 4) reported no positive VTEC detections (Table 2), with a calculated (FIO) E. coli sample detection rate of 11.7% (18/154). These results are relevant in light of global evidence for microbial contamination in produce-growing regions and the potential for pathogen propagation via fresh produce (e.g., the 2011 German sprout outbreak) (Buchholz et al., 2011; Pachepsky et al., 2011; Nguyen-the et al., 2016). In particular, owing to its zoonotic origin, enhanced survival capacity in soil/water, and high prevalence in farming environments, VTEC is a leading etiological agent of outbreaks linked to the consumption of fresh produce (Olaimat and Holley, 2012; Lal et al., 2012; Jung et al., 2014). Conversely, studies based in rural environments with an explicit focus on cattle rearing exhibited higher VTEC detection rates than other rural sub-categories (Table 2). However, two investigations of supplies located adjacent to Concentrated Animal Feeding Operations (CAFOs) (Economides et al., 2012; Li et al., 2015) reported positive (FIO) E. coli groundwater sample detection (8.6%) but no VTEC contamination. Similarly, Joung et al. (2013) analysed groundwater wells in relation to (confined) livestock carcass burial sites; all samples analysed (n = 627) were negative for VTEC, while generic E. coli contamination was detected in 7.3% of samples. These results appear to support the earlier inference that spatial confinement of potential VTEC sources decreases the likelihood of groundwater supply contamination.
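Zero detections do not imply zero prevalence. As a hedged illustration, the sketch below computes the exact one-sided 95% upper confidence bound on sample-level VTEC prevalence for 0 positives in 627 samples (Joung et al., 2013), alongside the 'rule of three' approximation (3/n), and reproduces the 11.7% (18/154) E. coli rate for fresh-produce settings. It assumes samples are independent; repeated samples from the same wells are likely correlated, so the true bound may be wider. The function names are hypothetical.

```python
def zero_positive_upper_bound(n_samples: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on prevalence when 0 of n samples are positive.

    Solves (1 - p) ** n = 1 - confidence for p (the Clopper-Pearson bound for x = 0).
    Assumes independent samples.
    """
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_samples)


def rule_of_three(n_samples: int) -> float:
    """Approximate 95% upper bound (3/n) for a zero numerator."""
    return 3.0 / n_samples


n = 627  # Joung et al. (2013): all samples negative for VTEC
exact = zero_positive_upper_bound(n)
approx = rule_of_three(n)
print(f"exact 95% upper bound: {exact:.2%}, rule of three: {approx:.2%}")  # both 0.48%

ecoli_rate = 18 / 154  # generic E. coli in fresh-produce settings
print(f"generic E. coli sample detection rate: {ecoli_rate:.1%}")  # 11.7%
```

Even with 627 negative samples, a sample-level VTEC prevalence of up to roughly 0.5% cannot be excluded at 95% confidence, which is of a similar order to the pooled sample detection rate reported in this review.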

Potential influence of climate and seasonality
Although based on limited data (in both frequency and scope), summary statistics point to higher rates of VTEC supply incidence (2-2.5%) in 'temperate' and 'cold' climatic settings (Table 2). Pooled data may (at times subtly) reflect the cumulative effects of lower temperatures, reduced ultraviolet (UV) solar radiation, and higher precipitation on VTEC survival, transport and, thus, environmental prevalence. There is substantial evidence in the literature supporting enhanced extra-host E. coli and VTEC survival at lower temperatures (<10 °C) in environmental and potable water, soils, plant material, cattle feces and derived effluents (e.g., manure, slurry) (John and Rose, 2005; Fremaux et al., 2008; Ma et al., 2011; van Elsas et al., 2011). Generally, the increased stress and energy expenditure associated with higher temperatures limit E. coli persistence in the environment, with large-amplitude temperature oscillations (>±7 °C) particularly impairing E. coli survival (Semenov et al., 2007). Previous experiments conducted on natural well water indicate increased VTEC survival at 5-10 °C (Rice et al., 1992; Watterworth et al., 2006). Collated data trends may also partially reflect reduced solar UV radiation in 'temperate' and 'cold' settings, increasing VTEC prevalence in soils, surface water and manures prior to mobilization to subsoils and groundwater (LeJeune et al., 2001; Yaun et al., 2003, 2004; Fremaux et al., 2008). Additionally, the effects of high rainfall on the release and propagation of fecal material and bacteria through enhanced landscape hydrological connectivity (e.g., overland flow) are well substantiated (Fremaux et al., 2008; Hofstra, 2011; McCarthy et al., 2012; Blaustein et al., 2016). Unsurprisingly, VTEC outbreaks have frequently been linked with periods of heavy antecedent rainfall (Muniesa et al., 2006; O'Dwyer et al., 2016). These effects are consistent with the higher VTEC sample/supply detection rates (~2%) calculated from investigations focusing on periods of 'high' precipitation (Table 2). E. coli have also been shown to benefit from the anaerobic and microaerobic conditions provided by moist or waterlogged environments often found in (wet) 'temperate' settings (Fremaux et al., 2008; Brennan et al., 2010; van Elsas et al., 2011).
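Because the climate-category counts are small, interval estimates help judge whether the differences between categories are meaningful. The sketch below computes Wilson score 95% confidence intervals for the supply counts reported in this review for 'temperate' (14/554), 'cold' (8/392) and 'high' precipitation (15/649) settings. It pools supplies across studies as if they were independent draws, ignoring between-study heterogeneity and selective sampling, so the intervals should be read as optimistic; the Wilson method is a choice made for this illustration, not the authors' stated method.

```python
import math
from typing import Dict, Tuple


def wilson_interval(positives: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = positives / total
    denom = 1.0 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Supplies with VTEC present / supplies sampled, as reported in this review
CLIMATE_COUNTS = {
    "temperate": (14, 554),
    "cold": (8, 392),
    "high precipitation": (15, 649),
}

intervals: Dict[str, Tuple[float, float]] = {}
for label, (pos, tot) in CLIMATE_COUNTS.items():
    lo, hi = wilson_interval(pos, tot)
    intervals[label] = (lo, hi)
    print(f"{label}: {pos}/{tot} = {pos / tot:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The 'temperate' and 'cold' intervals (roughly 1.5-4.2% and 1.0-4.0%) overlap substantially, consistent with the caution that these trends rest on limited data. The precipitation category classifies supplies by sampling period, so it overlaps with the climate categories and is not an independent comparison.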

Contamination sources, pathways and hydrogeology
Reviewed investigations also alluded to the potential relevance of 'human' contamination sources, in particular latrines. With the exception of one study (Won et al., 2013), all investigations attributing contamination to sources of 'human' origin were conducted in lower income settings (n = 4), reinforcing the potential underlying role of socioeconomic drivers (Graham and Polizzotto, 2013). Just three investigations reported potential ingress mechanisms (Table 4), all largely grounded in inconclusive evidence, which represents a critical knowledge gap. A major methodological limitation precluding accurate identification of VTEC contamination sources was the lack of molecular VTEC strain typing (e.g., PFGE, MLVA) in reviewed studies (Table 1). A significant majority of studies tentatively identified VTEC sources based on proximity rather than employing DNA typing to link 'source(s)' and 'receptor(s)', which limits the applicability of reported data. As such, while cattle are generally reported as a major source of environmental VTEC, their importance as a reservoir in developing regions remains uncertain in the absence of data (Mainga et al., 2018). Accordingly, other key animal (or human) VTEC reservoirs may remain unaccounted for (Ferens and Hovde, 2011; Ahmed et al., 2015).

Table 4. Summary of selected characteristics within the "VTEC contamination" data extraction category, based on studies with positive detection of VTEC in groundwater (n = 9). The study number column is based on the number of investigations analysing or reporting relevant data (e.g., reported source/pathway). Pooled synthesis data among selected characteristics are also provided. Columns: Characteristics; Studies n (%); Positive samples n (%); Positive sources n (%).

In general, an integrative research approach aimed at identifying VTEC contamination sources, pathways and potential risk factors is often superseded by a focus on quantifying VTEC environmental "prevalence". This feature is also reflected in the recurrent omission of local (hydro-)geological characteristics (Table 1). Because parent materials influence key (sub-)soil characteristics (e.g., porosity, permeability) and, in turn, E. coli survival and transport (e.g., retention, filtration), the hydrogeological setting of a study area is a pivotal driver of contamination (Fremaux et al., 2008; Bolster et al., 2009; van Elsas et al., 2011). However, only 5/23 (21.7%) investigations provided relevant hydrogeological descriptions. Ferguson et al. (2012) associated local geology with VTEC groundwater contamination by providing evidence-based links between shallow (unconfined) sandy aquifers, rainfall intensity and efficient groundwater recharge. Similarly, Li et al. (2015) attributed a high pollutant attenuation capacity, and hence mitigation of VTEC transport to groundwater, to unsaturated alluvial sediment layers (3-30 m). Additional investigations reporting hydrogeological data did not directly address its relevance to VTEC incidence. Incorporation of hydrogeological parameters as potential risk factors in future investigations is thus strongly advised.

Groundwater supply type, infrastructure and administration
With noted exceptions (e.g., Ferguson et al., 2012), a significant majority of reviewed investigations failed to describe the sample collection protocols and wellhead characteristics (e.g., tap, pump) that would allow discrimination of the effects (if any) of supply design/construction on VTEC incidence. Moreover, descriptions of source structural components (e.g., well casing, seal) and other key features (e.g., depth, age) were also largely absent from reviewed investigations (Table S3). This impedes identification of risk factors relating to VTEC supply ingress, including inappropriate or faulty structural components. Ferguson et al. (2012) identified increased supply depth as a determining factor limiting VTEC contamination, attributing this to effective pathogen attenuation with increasing soil depth. Despite its potential importance, only 3/23 (13%) studies explicitly reported groundwater supply depth. Li et al. (2015) failed to detect VTEC irrespective of groundwater well depth, while Won et al. (2013) found no association between well depth and VTEC contamination, citing inadequate (private) supply maintenance as a possible mechanism for VTEC ingress.
The influence of varying levels of supply maintenance, regulation and surveillance is potentially reflected in the higher VTEC detection values estimated for 'private' supplies (Table 2). In contrast to public/municipal supplies, which are generally managed by local/regional regulatory authorities and often include water treatment systems, private supplies are largely unregulated and untreated and, as such, more likely to be subject to inadequate maintenance and microbiological contamination (Hexemer et al., 2008; Kreutzwiser et al., 2011; Daniels et al., 2016; Fox et al., 2016). Review findings thus contribute to a growing body of evidence emphasizing the potential transmission of enteric pathogens and human exposure through private groundwater supplies (Hynds et al., 2014a; Wallender et al., 2014; Murphy et al., 2017). Higher VTEC detection rates among private supplies may also point to the prevalence of "direct" source contamination rather than widespread or aquifer-wide groundwater pollution. Pooled data based on (inferred) levels of supply construction and integrity indicate that hand-dug supplies had elevated VTEC detection rates (22.5%) (Table 2). However, this finding requires careful interpretation, as it largely reflects detection levels reported in a single study (Abia et al., 2017). Only two investigations, both from Sub-Saharan Africa, explicitly analysed hand-dug groundwater supplies. Accordingly, any assessment of risk factors associated with supply type and level of protection is precluded by (i) vague descriptions of groundwater supplies, (ii) the tentative nature of the supply classifications employed and (iii) the low number of identified investigations explicitly analysing 'unimproved' supplies (n = 2).

Conclusion and recommendations
Presented herein is the first global scoping review of the presence of verotoxigenic E. coli (VTEC) in groundwater sources intended for human consumption. Overall, a VTEC to 'generic' E. coli sample detection ratio of 15/152 (9.9%) was derived from reviewed investigations, providing a 'baseline' for VTEC in E. coli-contaminated groundwater sources. Given the relatively low number of identified investigations, the prevalence of inconsistent reporting, and the limited amount of data comprising the review dataset, a major recommendation of this investigation is the pressing need for additional field-based studies/data on VTEC prevalence in groundwater systems. Additional data are essential to improve established VTEC groundwater contamination 'baselines' and to accurately identify (extra-)local contamination risk factors, which are, in turn, necessary to inform relevant multi-disciplinary stakeholders, including groundwater consumers, public-health authorities and regulatory bodies, environmental managers, and hydrogeologists. Similarly, investigations from geographically diverse settings are required to obtain insights/data applicable within different (hydro-)geological, climatic and local settlement (urban/rural) contexts. The following bespoke guidelines/recommendations are proposed to improve the scope, integrity, and insights obtained from future VTEC-groundwater investigations:

- Explicit reporting of (basic) summary statistics and the methodology implemented, particularly: (i) incidence of E. coli and VTEC in groundwater analysed (n per sample/supply), (ii) length of the field sampling period (e.g., month, year), and (iii) detailed descriptions of the groundwater sampling protocols employed (e.g., point of sampling).
- Investigations employing more temporally inclusive (i.e., 'repeat') sampling designs, aimed at providing more robust estimates of VTEC groundwater prevalence over time.
- Identification and description of the groundwater supply type(s) analysed (e.g., hand-dug well, driven well), key infrastructural components (e.g., cap, casing), associated structural parameters (e.g., depth, age), and relevant local hydrogeological descriptions.
- Implementation of molecular (VTEC) strain typing techniques (e.g., PFGE, MLVA), enabling VTEC contamination sources, pathways and ingress mechanisms to be linked across the catchment-groundwater supply continuum.
- More investigations employing tools that allow identification of the VTEC serotypes present within 'domestic' groundwater systems.
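To make the reporting recommendations concrete, the following sketch defines a hypothetical per-study record in Python that captures the minimum items listed above (counts per sample and supply, sampling period and point, supply type, components, depth, hydrogeology, typing method and serogroups), with basic consistency checks and the derived detection rates and VTEC:E. coli ratio. The class, field names and example values are illustrative assumptions, not a schema proposed in the reviewed literature.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GroundwaterVtecRecord:
    """Hypothetical per-study reporting record reflecting the recommendations."""

    study_id: str
    supply_type: str              # e.g. "hand-dug well", "driven well"
    sampling_point: str           # e.g. "tap", "pump", "wellhead"
    sampling_period: str          # e.g. "2016-03/2017-02" (start/end)
    repeat_sampling: bool         # True if supplies were sampled more than once
    samples_analysed: int
    samples_ecoli_positive: int
    samples_vtec_positive: int
    supplies_analysed: int
    supplies_ecoli_positive: int
    supplies_vtec_positive: int
    depth_m: Optional[float] = None
    components: List[str] = field(default_factory=list)   # e.g. ["cap", "casing"]
    hydrogeology: Optional[str] = None                     # local aquifer description
    typing_method: Optional[str] = None                    # e.g. "PFGE", "MLVA"
    serogroups: List[str] = field(default_factory=list)   # e.g. ["O157", "O103"]

    def __post_init__(self) -> None:
        # Totals must be positive; positive counts must lie between 0 and the total.
        checks = [
            ("samples_ecoli_positive", self.samples_ecoli_positive, self.samples_analysed),
            ("samples_vtec_positive", self.samples_vtec_positive, self.samples_analysed),
            ("supplies_ecoli_positive", self.supplies_ecoli_positive, self.supplies_analysed),
            ("supplies_vtec_positive", self.supplies_vtec_positive, self.supplies_analysed),
        ]
        for name, positives, total in checks:
            if total <= 0 or not 0 <= positives <= total:
                raise ValueError(f"{name}={positives} is invalid for a total of {total}")

    def sample_detection_rate(self) -> float:
        return self.samples_vtec_positive / self.samples_analysed

    def supply_detection_rate(self) -> float:
        return self.supplies_vtec_positive / self.supplies_analysed

    def vtec_to_ecoli_sample_ratio(self) -> Optional[float]:
        # Undefined when no E. coli-positive samples were reported.
        if self.samples_ecoli_positive == 0:
            return None
        return self.samples_vtec_positive / self.samples_ecoli_positive


# Hypothetical example values, for illustration only
record = GroundwaterVtecRecord(
    study_id="hypothetical-01",
    supply_type="driven well",
    sampling_point="tap",
    sampling_period="2016-03/2017-02",
    repeat_sampling=True,
    samples_analysed=40,
    samples_ecoli_positive=10,
    samples_vtec_positive=1,
    supplies_analysed=10,
    supplies_ecoli_positive=4,
    supplies_vtec_positive=1,
    depth_m=12.0,
    components=["cap", "casing"],
    typing_method="PFGE",
    serogroups=["O157"],
)
print(f"sample rate {record.sample_detection_rate():.1%}, "
      f"supply rate {record.supply_detection_rate():.1%}, "
      f"VTEC:E. coli {record.vtec_to_ecoli_sample_ratio():.1%}")
```

Reporting raw counts at both sample and supply level, rather than percentages alone, would allow ratios such as 15/152 to be pooled directly across studies.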
The importance of efforts to understand and prevent microbial contamination of groundwater cannot be overstated. To minimize public health risk, consolidation of robust research approaches and systematic design of sampling programmes is not only warranted but necessary. Nonetheless, the findings of this study offer valuable insights into the extent and significance of groundwater as a potential source of VTEC infection and provide guidance for future research to inform policy development, action plans and remediation efforts/technologies that safeguard public health.

Author contributions
J.OD and P.H conceptualised the study. C.C performed the initial review of literature. All authors (C.C, P.H, L.B, M.R., D.M, J.OD) determined final inclusion. Analyses were carried out by C.C. in consultation with J.OD and P.H. All authors were involved in the writing of the manuscript, with J.OD and P.H providing final approval for submission.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.