Human influence and biotic homogenization drive the distribution of Escherichia coli virulence genes in natural habitats

Abstract
Cattle are the main reservoirs of Shiga-toxin-producing Escherichia coli (STEC), the only known zoonotic intestinal E. coli pathotype. However, other intestinal pathotypes can also cause disease in humans, and their presence has seldom been investigated. Our aim was therefore to identify the effects of anthropic pressure and of wild and domestic ungulate abundance on the distribution and diversity of the main human E. coli pathotypes and nine of their representative virulence genes (VGs). We used quantitative real-time PCR (qPCR) for the direct detection and quantification of the genus-specific gene uidA, nine E. coli VGs (stx1, stx2, eae, ehxA, aggR, est, elt, bfpA, invA), and four genes related to the O157:H7 (rfb O157, fliC H7) and O104:H4 (wzx O104, fliC H4) serotypes in animal feces (deer, cattle, and wild boar) and in water samples collected in three areas of Doñana National Park (DNP), Spain. Eight of the nine VGs were detected; invA, eae, and stx2 were the most frequently detected, followed by stx1, aggR, and ehxA. In quantitative terms (gene copies per mg of sample), stx1 and stx2 gave the highest values. Significant differences in VGs were found among the three sampled areas for each of the three animal species. The serotype-related genes were found in all but one sample type. In general, VGs were more diverse and abundant in the northern part of the Park, where surface waters are more contaminated by human waste and farms. In this study, we demonstrated that human influence is more relevant than host species in shaping the spatial pattern and diversity of E. coli VGs in DNP. In addition, wildlife could be potential reservoirs for pathotypes other than STEC; however, further isolation steps would be needed to fully characterize these E. coli.


| INTRODUCTION
A large number of infectious agents, including those most important to the microbiological safety of food and water, have been identified in domestic animals and wildlife. Food-borne bacterial pathogens evolve in response to environmental changes, developing new virulence properties and occupying new niches (Newell et al., 2010). Bacterial pathogens have acquired their pathogenic capability by incorporating different genetic elements through horizontal gene transfer (Koonin, Makarova, & Aravind, 2001); thus, the ancestors of virulent bacteria, as well as the origin of virulence determinants, most likely lie in the environmental microbiota (Martinez, 2013). The ubiquitous enterobacterium Escherichia coli (E. coli) is naturally present in the lower intestinal tract of humans and warm-blooded animals. E. coli can survive for long periods in the environment, where so-called "naturalized" populations may coexist with strains of vertebrate origin (Ishii & Sadowsky, 2008). The E. coli genotypes present in ecosystems are also influenced by environmental factors such as temperature and hydrology, and by anthropogenic factors such as proximity to urban areas and livestock production systems, with higher numbers and greater diversity of E. coli genotypes closer to settlements and farms (Lyautey et al., 2010). The public health risks posed by livestock and wild animals carrying pathogenic E. coli depend on the prevalence, incidence, and magnitude of pathogen carriage in the animal hosts, and on the degree of interaction between these animals and humans (Jay et al., 2007). Ungulates are among the most common reservoir species for Shiga-toxigenic E. coli (STEC), a zoonotic pathotype for which cattle are considered the main reservoir (Hancock, Besser, Lejeune, Davis, & Rice, 2001). In addition, E. coli O157:H7 and non-O157 STEC are present in a large variety of other ungulates such as deer, sheep, goats, and pigs (Doane et al., 2007).
With regard to wildlife, the most abundant species in a given region are of greatest concern in terms of pathogen shedding, since they pose the highest risk of fecal contamination. In studies of free-ranging deer, the fecal prevalence of E. coli O157:H7 was estimated to range from zero to less than 3% (Branham, Carr, Scott, & Callaway, 2005; Dunn, Keen, Moreland, & Alex, 2004; Fischer et al., 2001; Renter, Sargeant, Hygnstorm, Hoffman, & Gillespie, 2001; Sargeant, Hafer, Gillespie, Oberst, & Flood, 1999), whereas in feral pigs in California, USA, 23% of fecal samples were positive for E. coli O157 (Branham et al., 2005).
To date, some studies on the detection of STEC in large game animals such as red deer (Cervus elaphus) and Eurasian wild boar (Sus scrofa) have been carried out (Miko et al., 2009; Sanchez et al., 2009) […] (Chandran & Mazumder, 2013; Li et al., 2013), and thus there is a lack of epidemiological data on their distribution, which would be especially relevant at the wildlife/livestock/human interface.
With a set of quantitative real-time PCRs (qPCRs) for the direct detection and quantification of nine E. coli virulence genes (VGs), we used Doñana National Park (DNP) as a natural experiment to identify the effects of anthropic pressure and of wild and domestic ungulate abundance on the distribution and abundance of human pathogenic E. coli genotypes and VGs. We expected that increased ecological overlap would lead to higher interspecies transmission of E. coli (Barasona et al., 2014; Barasona et al., 2015; Goldberg, Gillespie, Rwego, Estoff, & Chapman, 2008), and that the spatial distribution of pathogenic VGs in the environment and in hosts would be shaped by the distribution of humans, livestock, and wildlife. We hypothesized that E. coli VGs would be more diverse and abundant near human settlements and waste than in natural habitats, with human influence being more relevant than host species in shaping their spatial pattern.

| Study area
DNP (37°0′ N, 6°30′ W; approximately 54,000 ha, with the highest level of environmental protection in Spain), located in the south-west of the Iberian Peninsula, is considered one of the most important European wetlands in terms of biodiversity. It is a flat region of sandy soils, with altitudes ranging from 60 m above sea level (asl) to 0 m asl in the southern marshland. It contains the largest wetland in Western Europe, an intricate matrix of marshlands (270 km²). Natural flooding takes place between October and March, mostly from rainfall in the drainage watershed. Under natural conditions, most of the water comes from precipitation, from streams in the north-west (La Rocina, El Partido, and Las Cañadas, which is included in our study area), and from rivers in the east (Guadalquivir and Guadiamar). These rivers are now diverted, and their inflow enters through the Guadalquivir estuary in the east, outside our study area (Aldaya, García-Novo, & Llamas, 2010).
Traditional farming is being progressively abandoned, and greenhouse farming and rice paddies, together with tourist resorts, have become the most productive activities around DNP (Haberl et al., 2009).
Aside from the temporary marshland, DNP has a large number of small, more or less permanent water bodies and watercourses (Figure 1). Some streams flow from the higher ground in the north-west and drain southward into the marshland. The water quality of these streams has not improved significantly in the last two decades despite the construction of wastewater treatment plants (Serrano et al., 2006). DNP has a Mediterranean climate, generally classified as dry subhumid, with marked seasons. In the wet season (winter and spring), the marshland is flooded, and wild and domestic ungulates graze in the more elevated scrublands. The hardest season for ungulates in DNP is summer (July to September), when herbaceous vegetation, wetlands, and water bodies in most habitats dry up and only a few meadows remain green at the ecotone between the upper scrublands and the lower marshes (Braza & Alvarez, 1987). DNP is a unique setting where wildlife and cattle share habitat along a gradient of proximity to human settlements toward the park boundary. Local variation in wildlife abundance and cattle distribution, together with the seasonal aggregation of livestock and wildlife at water points, makes DNP ideal for research on indirectly transmitted disease agents (Green & Silverman, 1994).

| Sample collection
A survey was carried out in June–September 2012, when water availability is critical and livestock and wild ungulates therefore aggregate more around water sites. The sampling strategy was designed to represent the north–south gradient of DNP (from the sites where stream water pours into the marshes to the dry dune habitats) and the east–west gradient (from the marsh to the woodlands). Samples were collected using disposable sterile material and containers, and sampling sites were georeferenced by Global Positioning System (GPS).
We collected 14 water samples of variable volume in sterile containers: nine from surface water (creeks and waterholes) and five from septic tanks. We also collected 68 pools of fresh feces from the ground (3 to 7 individual fecal samples per pool) in sterile plastic bags, from red or fallow deer (29 pools, n = 148 fecal samples), wild boar (20 pools, n = 92), and cattle (19 pools, n = 87) (Figure 1). All samples were sent refrigerated to the laboratory on the day of collection and frozen immediately upon arrival for further analysis.
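As a quick consistency check of the sampling design described above, the short sketch below recomputes the totals and mean pool sizes from the reported counts. The dictionary layout and names are illustrative choices, not part of the original workflow.

```python
# Pools and individual fecal samples per host, as reported in the text.
DESIGN = {
    "deer": (29, 148),       # (pools, individual fecal samples)
    "wild boar": (20, 92),
    "cattle": (19, 87),
}
WATER_SAMPLES = 14           # 9 surface water + 5 septic tanks
MIN_POOL, MAX_POOL = 3, 7    # individual fecal samples per pool


def summarize(design):
    """Return mean pool size per host plus pool and fecal-sample totals."""
    mean_pool_size = {}
    for host, (pools, feces) in design.items():
        # Feasibility check: the feces must fit into `pools` pools of 3 to 7 each.
        if not MIN_POOL * pools <= feces <= MAX_POOL * pools:
            raise ValueError(f"inconsistent pooling for {host}")
        mean_pool_size[host] = round(feces / pools, 2)
    total_pools = sum(p for p, _ in design.values())
    total_feces = sum(f for _, f in design.values())
    return mean_pool_size, total_pools, total_feces


mean_pool_size, total_pools, total_feces = summarize(DESIGN)
print(mean_pool_size)                # mean individual samples per pool
print(total_pools, total_feces)      # fecal pools, individual feces
print(total_pools + WATER_SAMPLES)   # samples tested by qPCR
```

The 68 pools plus the 14 water samples give the 82 samples used as denominators in the results (e.g., 45/82 for invA).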

| Laboratory analyses
Water samples and pooled fecal samples were analyzed with a previously described qPCR assay to detect a set of nine VGs (see Table S1) characteristic of different E. coli enteric pathotypes (stx1, stx2, eae, invA, ehxA, est, elt, bfpA, aggR), four serotype-related genes (rfb O157, fliC H7, wzx O104, fliC H4), and one genus-specific gene (uidA) (Cabal et al., 2013; …). Pooled fecal samples were processed in a 1/3 proportion of phosphate-buffered saline. Briefly, 400 mg of each fecal pool was used for DNA extraction with a commercial kit (QIAamp DNA stool mini-kit, Qiagen, Hilden, Germany), and the extracted DNA was used directly in the qPCR. Water samples were concentrated by double centrifugation at 16 relative centrifugal force for 15 min. Supernatants were then mixed with the sediment to a final volume of 400 μl per sample, and DNA was extracted with the same commercial kit.
Finally, qPCRs were performed as described previously.
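The text does not describe how qPCR signals were converted into gene copies per mg. A common approach, shown here only as a hedged sketch, inverts a standard curve of Cq against log10 copies and scales the result from a single reaction to the whole DNA extract. The slope, intercept, elution volume, and template volume below are hypothetical; only the 400 mg input mass comes from the protocol above.

```python
def copies_from_cq(cq, slope, intercept):
    """Invert a standard curve Cq = slope * log10(copies) + intercept."""
    return 10 ** ((cq - intercept) / slope)


def efficiency(slope):
    """Amplification efficiency implied by the slope (1.0 means 100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_per_mg(cq, slope, intercept, elution_ul, template_ul, sample_mg):
    """Scale copies in one reaction to the whole extract, then per mg of sample."""
    per_reaction = copies_from_cq(cq, slope, intercept)
    per_extract = per_reaction * elution_ul / template_ul
    return per_extract / sample_mg


# Hypothetical standard curve and volumes (not taken from the study).
SLOPE, INTERCEPT = -3.32, 38.0
print(f"efficiency: {efficiency(SLOPE):.3f}")
print(copies_per_mg(cq=28.04, slope=SLOPE, intercept=INTERCEPT,
                    elution_ul=200.0, template_ul=5.0, sample_mg=400.0))
```

On this hypothetical curve, a Cq of 28.04 corresponds to about 1,000 copies per reaction and therefore about 100 copies per mg; a slope near −3.32 implies roughly 100% amplification efficiency.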

| Statistics
Kruskal-Wallis and Mann-Whitney U nonparametric tests were used to compare the number of uidA copies per mg of feces, considered an indicator of the overall E. coli load in each sample, among host species and zones.

FIGURE 1 Map of Doñana National Park (DNP) and surroundings. Environmental features, sample types, sampling sites, and study areas are shown. Watercourses in the north represent the entry of water from outside the park. The habitat east of the three study areas is marsh.
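The Kruskal-Wallis comparison of uidA copies among host species or zones can be illustrated with a dependency-free sketch. It ranks the pooled values (averaging ranks for ties), computes the tie-corrected H statistic, and refers it to a chi-square distribution with k − 1 degrees of freedom. The copy numbers are hypothetical, and the study itself used SPSS rather than this code.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail chi-square probability via the regularized lower gamma series."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = total = 1.0 / a
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(*groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square (k - 1 df) p-value."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3.0 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    h /= 1.0 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical uidA copies per mg for three host species.
cattle = [1.2e3, 2.5e3, 3.1e3, 4.0e3, 5.5e3]
deer = [6.0e3, 7.2e3, 8.8e3, 9.1e3, 9.9e3]
wild_boar = [1.1e4, 1.3e4, 1.8e4, 2.2e4, 2.9e4]
h, p = kruskal_wallis(cattle, deer, wild_boar)
print(f"H = {h:.2f}, p = {p:.4f}")
```

With three completely separated groups of five, H is 12.5; because a chi-square variable with 2 degrees of freedom has survival function exp(−x/2), p = exp(−6.25) ≈ 0.0019.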
The proportion of samples positive for given VG combinations was compared between sample types using Fisher's exact test.
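For a 2 × 2 table of sample type (for example, water vs. animal) by result (combination positive vs. negative), Fisher's exact test can be sketched with the hypergeometric distribution. This version uses the common definition of the two-sided p-value as the sum over all tables with the same margins that are no more likely than the observed one; other two-sided definitions exist, so its output need not match values produced by other software exactly.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric P(top-left cell == x) given fixed row/column totals.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance keeps floating-point "ties" in the sum.
    return sum(p for p in map(prob, range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))


# Rows: water, animal; columns: combination positive, negative.
print(fisher_exact_two_sided(6, 8, 8, 60))
print(fisher_exact_two_sided(3, 1, 1, 3))
```

For the small table [[3, 1], [1, 3]] the function returns 34/70 ≈ 0.486. The first call uses the water-versus-animal counts for the rfb O157/fliC H7 combination reported in the results (6/14 vs. 8/68).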
Explanatory covariates were chosen after reviewing the landscape and animal factors known to regulate E. coli presence; based on the information available for DNP, we selected 16 potential predictors (see Table S2), derived from a geographic information system (GIS) of the study area built in Quantum GIS version 1.8.0 Lisboa (QGIS Development Team, 2012). In a first step, we screened out collinear covariates using |r| = .6 as the cut-off value (Hosmer & Lemeshow, 2000). In a second step, the noncollinear variables retained (host species, distance to the nearest surface water entrance to DNP, proportion of riparian habitat, distance to the nearest permanent water point, distance to the nearest marsh-shrub humid ecotone, ungulate abundance per sampling area, and water conservation status at the nearest water point) were included as explanatory variables in generalized linear models (GLMs) for each VG and host (wild boar, deer, and cattle) (Green & Silverman, 1994). In this second step, we tested the final predictors affecting the presence of E. coli VGs using a binomial error distribution (0 = negative, 1 = positive) and a logit link function. Distances, abundances, water status, and land cover proportions were treated as continuous variables, and host species as a categorical variable (see Table S2). For VG diversity (defined as the number of different VGs present, ranging from 1 to 8), we used a Poisson error distribution and an identity link function. All statistical analyses were performed in SPSS Statistics 18 for Windows (IBM®, Armonk, NY, USA).
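The first modelling step, discarding covariates whose pairwise correlation reaches |r| = .6, might look like the sketch below. The text does not say which member of a correlated pair was retained, so this version simply keeps the covariate listed first (an assumption); the covariate names and values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_collinear(covariates, threshold=0.6):
    """Keep a covariate only if |r| < threshold with every covariate kept so far.

    covariates maps a name to its values across sampling sites. Names are
    visited in insertion order, so of two correlated covariates the one
    listed first is kept (an assumption, not stated in the source).
    """
    kept = []
    for name, values in covariates.items():
        if all(abs(pearson_r(values, covariates[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical site-level covariates (names and values are illustrative).
covariates = {
    "dist_water_entrance_km": [0.5, 1.2, 2.0, 3.5, 5.0, 7.5],
    "dist_permanent_water_km": [0.6, 1.1, 2.2, 3.4, 5.2, 7.0],
    "ungulate_abundance": [12, 30, 8, 25, 15, 20],
}
print(screen_collinear(covariates))
```

Here the two distance covariates are almost perfectly correlated, so only the first is retained, while ungulate abundance passes the screen.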

| Descriptive epidemiology
All samples except one cattle pool (18/19 positive, 94.7%), one deer pool (28/29, 96.6%), and one wild boar pool (19/20, 95.0%) tested positive for the genus-specific gene uidA, including all nine surface water and all five septic tank samples (Table 1). The mean number of uidA copies per mg of sample is shown in Table 2. The number of uidA copies differed significantly by sample type, with higher values in environmental than in animal samples (Mann-Whitney U test, p < .05). No significant differences were found between septic tanks and surface waters (Mann-Whitney U test, p = .79). Differences in VGs by sample type, species, and zone are shown in Table 2. The number of samples positive for each VG varied widely by sample type, host species, and area (Table 1), and in some cases the qualitative results differed from the quantitative ones (Table 2). Overall, VG diversity decreased from north to south, particularly in the southernmost part of the park, which has little anthropogenic influence (MAR area, Figure 2; Kruskal-Wallis tests, p < .05 for each of the three host species). The qualitative analyses revealed that the EIEC-associated VG invA was the most frequently detected gene across all samples and areas (45/82), followed by eae (41/82) and stx2 (36/82). stx1 (24/82), aggR (22/82), and ehxA (21/82) were detected at moderate frequencies (Figure 3).
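The Mann-Whitney U comparisons reported above (for example, septic tanks against surface waters) can be sketched as follows. U is computed by counting pairwise wins, and the two-sided p-value uses the tie-corrected normal approximation without continuity correction; with groups as small as 5 and 9 samples an exact test would be preferable. The values are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test with a tie-corrected normal approximation.

    Returns (U for x, p). The approximation is rough for very small groups.
    """
    n1, n2 = len(x), len(y)
    # U counts pairs in which the x value exceeds the y value; ties count 1/2.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    n = n1 + n2
    counts = {}
    for v in list(x) + list(y):
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12.0 * ((n + 1) - tie_term))
    z = (u - n1 * n2 / 2.0) / sigma
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p


# Hypothetical uidA copies per ml in septic tanks (5) and surface waters (9).
septic = [2.1e4, 3.5e4, 8.0e3, 1.2e5, 4.4e4]
surface = [1.5e4, 2.8e4, 9.5e3, 6.1e4, 3.3e4, 7.7e3, 5.0e4, 2.2e4, 1.1e5]
print(mann_whitney_u(septic, surface))
```

Swapping the two groups gives U' = n1·n2 − U and the same two-sided p-value.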
The STEC-associated VGs (stx1, stx2, ehxA, and eae) were present in every sample type/host combination in at least one of the sampled areas.
In contrast, the ETEC- and typical EPEC-associated VGs (est, elt, and bfpA) were absent or present in very few samples (Table 1). All VGs were found in deer and wild boar samples. The VGs invA and stx2 were the most frequently detected in ruminants, followed by eae (deer and cattle) and aggR (deer), while in wild boar the most frequently found genes were eae and invA. Two VGs, eae and ehxA, were often detected in septic tanks, and invA was the most frequent and abundant VG in surface water. Interestingly, est was present in wild ungulates but absent in cattle and water samples. The VGs aggR and est were not detected in the southern third of DNP, further away from anthropogenic influences (Table 1).
The serotype-related genes rfb O157 and fliC H7 were detected simultaneously in 4 (21.1%) of 19 cattle samples, 1 (3.4%) of 29 deer samples, and 3 (15.0%) of 20 wild boar samples. In contrast, these genes were detected in three of five septic tanks and in three of nine surface water samples. Samples positive for rfb O157/fliC H7 that were also positive for typical STEC VGs were even less frequent (cattle: 15.8%, wild boar: 5%, deer: 0%). In the southern third of DNP, this combination was found only in a septic tank (Figure 3). The probability of detecting this combination was higher in water samples (surface water or septic tank) than in animal samples (6/14 vs. 8/68; Fisher's p = .006). The serotype-related genes wzx O104 and fliC H4 were found together in 1 (5.3%) cattle, 3 (10.3%) deer, and 1 (5%) wild boar samples. This combination was not observed in septic tanks and was present in only one surface water sample. Three of these six detections corresponded to the northernmost sampling sites in DNP. One of the deer samples also carried aggR, but all wzx O104/fliC H4-positive samples were negative for stx2.
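Co-detection statements such as "rfb O157 and fliC H7 detected simultaneously" reduce to set logic on per-sample qPCR calls. In the sketch below the sample names and calls are hypothetical, and "typical STEC VGs" is read as at least one stx gene, which is an assumption. Co-detection in a pooled DNA extract does not show that the genes are carried by the same bacterial cell, which is why isolation would be needed to confirm a serotype.

```python
# Hypothetical per-sample qPCR calls: the set of targets detected in each sample.
calls = {
    "cattle_pool_01": {"uidA", "rfbO157", "fliCH7", "stx2", "eae"},
    "cattle_pool_02": {"uidA", "rfbO157", "fliCH7"},
    "deer_pool_01": {"uidA", "rfbO157", "invA"},
    "boar_pool_01": {"uidA", "eae"},
}

O157_H7 = {"rfbO157", "fliCH7"}
STX = {"stx1", "stx2"}  # assumption: "typical STEC VG" read as any stx gene


def count_combination(calls, required, any_of=None):
    """Count samples carrying every gene in `required` (and one of `any_of`)."""
    hits = 0
    for detected in calls.values():
        if required <= detected and (any_of is None or detected & any_of):
            hits += 1
    return hits, len(calls)


for label, any_of in (("rfbO157 + fliCH7", None),
                      ("rfbO157 + fliCH7 + stx", STX)):
    k, n = count_combination(calls, O157_H7, any_of)
    print(f"{label}: {k}/{n} ({100 * k / n:.1f}%)")
```

A sample carrying only one of the two serotype markers (deer_pool_01 here) is not counted as combination-positive.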
Mean quantities of each VG are shown in Table 2. Briefly, the highest values were found for stx1 and stx2 (>10⁵ gene copies per mg or ml), followed by est and invA (>10⁴). By sample source, the highest values in cattle and deer were found for stx2 (>10³ and >10⁶, respectively), while in wild boar stx1 gave the highest mean values (>10³). In the septic tanks, stx1 and invA were found at high levels (>10³), and in surface water stx1 and stx2 gave the highest values (>10⁵).
In cattle, deer, and wild boar, significant differences were found among the study areas for some VGs. Significant differences were also observed in stx1, stx2, and ehxA mean values by sample origin. Finally, significant differences were detected in stx2 mean values among animal species (Table 2). […] factors, no association was found between the presence of any VG, gene combination, or total number of VGs detected and any particular host species. The models showed that the closer a sampling site was to the entry of surface water into the park, the higher the probability that a sample tested positive for stx2, eae, and invA (stx1 was marginally significant), as was also the case for the rfb O157/fliC H7 combination. Accordingly, proximity to the water entrance was significantly associated with a higher number of different VGs (diversity) in fecal samples. The VG aggR was significantly more prevalent as the distance to the marsh increased. The VG stx2 and the rfb O157/fliC H7 combination were significantly more frequent at higher ungulate abundances.
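The binomial GLM with a logit link behind these associations can be illustrated for a single covariate. The sketch below fits logit(P(positive)) = b0 + b1·distance by Newton-Raphson, which for this model is equivalent to iteratively reweighted least squares; exp(b1) is the odds ratio per additional km, so a value below 1 means detection becomes less likely farther from the water entrance. The data are hypothetical, and the actual models included several covariates at once.

```python
import math


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Fit logit(P(y = 1)) = b0 + b1 * x by Newton-Raphson (IRLS).

    Assumes the outcomes are not perfectly separated by x; otherwise the
    maximum-likelihood estimates do not exist and b1 diverges.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)           # binomial variance = IRLS weight
            g0 += yi - p                # score (gradient of log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                    # Fisher information X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det   # solve the 2x2 system I @ d = g
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Hypothetical data: distance (km) to the park's surface-water entrance and
# stx2 detection (1 = positive) in the corresponding fecal pool.
distance_km = [1, 2, 3, 4, 5, 6, 7, 8]
stx2_positive = [1, 1, 0, 1, 0, 1, 0, 0]
b0, b1 = fit_logistic(distance_km, stx2_positive)
print(f"slope = {b1:.3f}, odds ratio per km = {math.exp(b1):.3f}")
```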

| DISCUSSION
Animals are considered the main source of certain pathogenic E. coli strains (mainly STEC), while humans are the only known reservoir of all the other pathotypes (Nataro & Kaper, 1998). Consequently, most studies of animal samples have focused on the detection of STEC, especially O157:H7 (Miko et al., 2009; Sanchez et al., 2009). However, animals can also host other pathotypes and VGs that could eventually lead to the emergence of new strains, such as the EHEC/EAEC O104:H4 strain that caused the 2011 German outbreak (Bielaszewska et al., 2011; Nyholm et al., 2015), even though the origin of such strains remains unclear. We therefore evaluated the presence of VGs characteristic of the intestinal pathotypes of E. coli in livestock, wildlife, and environmental samples collected in epidemiological settings that differ in anthropogenic contamination. The qPCR assay proved to be a fast and reliable tool for assessing the frequency and quantity of each VG in both water and fecal samples, and an alternative to time-consuming methods such as traditional bacteriology, which has been regarded as less suitable for characterizing a whole E. coli population (Lleo et al., 2005). Even though the simultaneous detection of a given set of VGs […] with different VGs is variable. Our data generally confirmed the initial hypothesis that E. coli VGs would be more diverse and abundant near human settlements and waste than in natural habitats.
Anthropic drivers were more influential than host species in shaping the spatial pattern and diversity of E. coli VGs. VG abundance was generally higher in the northern part of DNP, closer to human influence, while some VGs (aggR and est) were not detected in the southern third of DNP, far from anthropogenic influences.
All these findings indicate that water, both surface water and effluents from human dwellings, plays a key role in E. coli epidemiology in DNP. Several studies have detected and quantified E. coli in water intended for human consumption or for agricultural and recreational use (Khan et al., 2007), but information on total E. coli numbers or their VGs in untreated wastewater or in nonpotable water in natural parks is scarce. Our results support the view that human waste may be an important source of microbial exposure for livestock and wildlife through water, a route that has also been linked to high levels of antimicrobial resistance even within protected areas (Pesapane, Ponder, & Alexander, 2013).
Although the small number of water samples precludes definitive conclusions, other studies have also reported VGs typical of human pathotypes in strains recovered from surface waters, suggesting that potentially pathogenic isolates are widespread in aquatic ecosystems. The higher proportion of samples positive for different VGs compared with culture-based studies of water samples (Carlos et al., 2011; Ramirez Castillo et al., 2013) is not surprising, given the higher sensitivity of direct-detection approaches relative to culture-based techniques (Khan et al., 2007).
[…] probability of isolating one single colony is extremely low, as previously reported (Dunn et al., 2004; Renter et al., 2001; Sargeant et al., 1999). The genes rfb O157 and fliC H7 were detected together more often in septic tanks (60%) and surface water samples (33%) than in animal fecal pools (3.4%–21.1%). In addition, the high mean values for stx1 in the septic tanks (which contain wastewater of human origin) relative to those of stx2 agree with the fact that human STEC strains usually carry only stx1, O157:H7 being an exception (Guth, Prado, & Rivas, 2010). In contrast, surface water contained higher stx2 values than the septic tanks, and higher stx2 than stx1 values, as reported previously (Sidhu, Ahmed, Hodgers, & Toze, 2013). These findings probably reflect contamination of animal origin.
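The low probability of recovering a target strain by culture can be made concrete with a simple binomial argument. Assuming, hypothetically, that a fraction f of E. coli colonies carries the target genes and that colonies are picked independently at random, the chance that k picked colonies include at least one target is 1 − (1 − f)^k. The fraction used below is illustrative, not a value measured in this study.

```python
import math


def p_at_least_one(fraction, colonies):
    """Chance that `colonies` randomly picked colonies include >= 1 target."""
    return 1.0 - (1.0 - fraction) ** colonies


def colonies_needed(fraction, confidence=0.95):
    """Smallest number of picked colonies giving P(>= 1 target) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - fraction))


# Hypothetical: 1 in 1,000 E. coli colonies carries the target genes.
print(round(p_at_least_one(0.001, 10), 4))
print(colonies_needed(0.001, 0.95))
```

With f = 0.001, picking ten colonies gives less than a 1% chance of recovering the target, and about 3,000 colonies (2,995 by this formula) would be needed for 95% confidence, whereas direct qPCR detection does not depend on picking the right colony.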
Although aggR was detected at low levels, the animal samples with the highest values came from the EBD sampling area. This suggests that animals in this area may have had a higher chance of acquiring EAEC than animals from the other two sampling areas. In addition, among the water samples, aggR was detected only in the septic tank located in the EBD area, suggesting a differential degree of exposure to EAEC for animals in this area. By contrast, surface waters may not have been heavily polluted with fecal contamination of human origin, as aggR was not detected in them. Interestingly, other authors have reported the EAEC pathotype not only in sewage water from treatment plants (Carlos et al., 2011; Omar & Barnard, 2010) but also in domestic animals such as pigs, cattle, and chickens (Kagambega et al., 2012). None of the wildlife or livestock samples contained the full set of typical VGs of the EAEC/EHEC O104:H4 German outbreak strain (stx2/aggR/wzx O104/fliC H4). This contrasts with our previous report, in which those characteristic VGs were detected simultaneously in samples from German cattle farms located near the outbreak area.
Although the detection rate for invA in DNP animal samples was higher than expected, given that EIEC is considered a human pathogen (Kaper, Nataro, & Mobley, 2004), the mean values obtained in the quantitative analysis for this VG (~10² gene copies per mg) could indicate that it was present at low quantities in animals.
In contrast, higher mean values were found in water samples (10³–10⁴ gene copies per ml). The presence of EIEC markers such as invA may indicate pollution of surface waters and animal foraging grounds with human feces (…; Sidhu et al., 2013). Shigella, which shares the same genetic background as EIEC, has not been detected in animals, which suggests that the invA-positive samples in the current study are most likely linked to EIEC.
Similarly to invA, the low mean values (or absence) of the typical EPEC gene bfpA suggest a limited degree of human fecal contamination with this pathotype. Animals, together with humans, have been described as reservoirs of atypical EPEC (Moura et al., 2009).
Thus, the eae mean values seen in the septic tanks, together with the absence of bfpA, could also indicate the presence of this pathotype, as eae occurs in both STEC and EPEC strains (Nataro & Kaper, 1998).
Finally, ETEC markers were detected at low frequencies, although est levels in deer from CdR were high. Hybrid ETEC/STEC strains carrying est may also be present, as seen in our previous work on cattle and other animal species. Such hybrids have been described previously and associated with the carriage of stx2g (Sidhu et al., 2013).

| CONCLUSIONS
Biotic homogenization is the process by which species invasions and extinctions increase the taxonomic, genetic, or functional similarity of multiple locations over a specified time interval (Olden, Leroy Poff, Douglas, Douglas, & Fausch, 2004