Implementation and Integration of Microbial Source Tracking in a River Watershed Monitoring Plan

Fecal pollution of water bodies poses a serious threat to public health and ecosystems. Microbial source tracking (MST) using host-specific bacteria is used to identify the source of this pollution so that it can be managed more effectively at the source. In this study we tested 12 molecular MST markers to track human, ruminant, sheep, horse, pig and gull pollution and to determine their usefulness for the effective management of water quality. First, the potential of the selected markers to track the source was evaluated using fresh fecal samples. Subsequently, we evaluated their performance in a catchment with different impacts, considering land use and environmental conditions. All MST markers showed high sensitivity and specificity, although none achieved 100% for both. Although some of the MST markers were detected in hosts other than the intended ones, their abundance in the target group was always several orders of magnitude higher than in the non-target hosts, demonstrating their suitability to distinguish between sources of pollution. The MST analysis matched the land use in the watershed, allowing an accurate assessment of the main hazards and sources of pollution, in this case mainly human and ruminant pollution. Correlating environmental parameters such as temperature and rainfall with the levels of the MST markers provided insight into the dynamics of the pollution along the catchment. The levels of the human-associated marker showed a significant negative correlation with rainfall in human-polluted areas, suggesting dilution of the pollution, whereas in agricultural areas the ruminant marker increased with rainfall. There were no seasonal differences in the levels of the human marker, indicating that human pollution is a constant pressure throughout the year, whereas the levels of the ruminant marker were influenced by the seasons, being more abundant in summer and autumn.
Performing MST analysis integrated with land uses and environmental data can improve the management of fecal polluted areas and set up good practices.

Detection of the MST markers by PCR was performed using primers described previously (S1 Table). Five host-specific markers targeted the 16S rRNA gene of Bacteroidales species: the human marker HF183, the ruminant marker CF128 [4], the horse marker HoF597 and the pig markers PF163 [23] and Pig-2-Bac [24].

The 16S rRNA gene of Catellicoccus marimammalium was amplified using primers GullF and GullR (S1 Table) [9]. The reaction mixture was incubated at 95 °C for 2 min followed by 35 cycles of 94 °C for 30 s, 63 °C for 30 s, and 72 °C for 35 s, which were followed by a final incubation at 72 °C for 5 min.
The reaction mixtures were incubated at 95 °C for 2 min, followed by 35 cycles consisting of 94 °C for 40 s, 55 °C for 50 s, and 72 °C for 45 s, followed by a final 5 min extension.

The universal primers F63 and 1389R [26,27] were used (0.4 µM) to amplify the 16S rRNA gene as a positive control, to rule out the presence of PCR inhibitors in samples causing false negative results. The reaction mixture was incubated at 95 °C for 2 min, [...]

[...] data was obtained from the Ordnance Survey Ireland (OSi). These layers included river segments, catchment and subcatchment areas and digital elevation models.

Sensitivity and specificity for the eight microbial source tracking markers were calculated as described previously [30]. In order to obtain a 95% confidence interval for the estimates of sensitivity and specificity, at least 20 fecal samples of target species were analyzed.
Sensitivity (r) and specificity (s) are defined as r = a/(a + c) and s = d/(b + d), where a is the number of fecal DNA samples that are positive for the PCR marker of their own target (true positives); b is the number of fecal DNA samples that are positive for a PCR marker of another target (false positives); c is the number of fecal DNA samples that are negative for the PCR marker of their own target (false negatives); and d is the number of fecal DNA samples that are negative for a PCR marker of another target (true negatives).
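
The definitions of r and s above can be illustrated with a minimal sketch. The counts used here are hypothetical and are not data from this study; the names MarkerCounts, sensitivity and specificity are introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkerCounts:
    """Outcome counts for one marker tested on fecal DNA samples."""
    a: int  # target samples positive for the marker (true positives)
    b: int  # non-target samples positive for the marker (false positives)
    c: int  # target samples negative for the marker (false negatives)
    d: int  # non-target samples negative for the marker (true negatives)


def sensitivity(m: MarkerCounts) -> float:
    """r = a / (a + c)."""
    return m.a / (m.a + m.c)


def specificity(m: MarkerCounts) -> float:
    """s = d / (b + d)."""
    return m.d / (m.b + m.d)


# Hypothetical counts: 20 target samples and 60 non-target samples.
example = MarkerCounts(a=18, b=3, c=2, d=57)
print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
```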

The positive predictive value or conditional probability that a particular source of fecal contamination was present when a water sample tested positive for the corresponding MST marker was calculated using Bayes' Theorem [7].
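
As a sketch of how such a conditional probability can be obtained, the standard form of Bayes' Theorem for a binary test is shown here. It assumes the prior is the background probability that the source is present; the exact formulation used in [7] may differ, and the input values are hypothetical rather than results from this study.

```python
def positive_predictive_value(sens: float, spec: float, prior: float) -> float:
    """P(source present | marker detected) by Bayes' Theorem.

    sens  -- sensitivity r of the marker
    spec  -- specificity s of the marker
    prior -- assumed background probability that the source is present
    """
    for name, value in (("sens", sens), ("spec", spec), ("prior", prior)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    true_positive = sens * prior
    false_positive = (1.0 - spec) * (1.0 - prior)
    if true_positive + false_positive == 0.0:
        raise ValueError("the marker is never detected under these inputs")
    return true_positive / (true_positive + false_positive)


# Hypothetical inputs, not values reported in this study.
ppv = positive_predictive_value(sens=0.90, spec=0.95, prior=0.50)
print(f"positive predictive value = {ppv:.3f}")
```

With a fixed sensitivity and specificity, the result falls as the prior falls, which is why the prevalence of each marker in the catchment matters for interpreting a positive result.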

The quantitative values of the qPCR markers were log10-transformed to achieve normality.
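
A minimal sketch of this transformation follows. The text does not say how samples below the detection limit were handled; returning None for non-positive values is an assumption made only for this illustration, and the input values are hypothetical.

```python
import math
from typing import List, Optional


def log10_levels(gene_copies: List[float]) -> List[Optional[float]]:
    """Log10-transform qPCR marker levels (for example gc per 100 ml).

    Values that are zero or negative cannot be log-transformed; they are
    returned as None here (an assumption, not the study's stated approach).
    """
    return [math.log10(x) if x > 0 else None for x in gene_copies]


# Hypothetical marker levels, not data from this study.
levels = [1e5, 0.0, 1e3]
print(log10_levels(levels))
```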

It was shown previously that significant geographical variability exists in the sensitivity and specificity of MST markers [15,30]. The performance of these MST markers was therefore evaluated using feces from local animals and humans, prior to deploying these in the field.

[...] markers and total Bacteroidales (AllBac) were measured by qPCR in fecal samples from different sources (Figure 2, Table 2). The number of positive samples, as determined by qPCR, was higher than by end-point PCR. The samples that were positive for qPCR but negative for end-point PCR contained very low target DNA concentrations and in many cases were below the quantification limit.

To verify whether significant differences existed between target and non-target samples, both in the qualitative end-point PCR results and in the host-marker levels determined by qPCR, a Chi-squared test and an independent-samples Kruskal-Wallis test were used. The differences between the target and non-target samples were statistically significant for all the markers evaluated in the study (P < 0.05). Thus, the MST markers tested are potentially useful to discern among different fecal samples in Ireland.
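
The Chi-squared comparison of end-point PCR results can be sketched for a single marker as a 2x2 table (marker positive or negative against target or non-target samples). This is an illustration only: the text does not state which software or settings were used, no continuity correction is applied here, the counts are hypothetical, and the Kruskal-Wallis test is not shown.

```python
import math
from typing import Tuple


def chi_squared_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson Chi-squared test for a 2x2 table, without continuity correction.

    a -- target samples, marker positive      b -- non-target, marker positive
    c -- target samples, marker negative      d -- non-target, marker negative
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row_pos, row_neg = a + b, c + d
    col_target, col_other = a + c, b + d
    denominator = row_pos * row_neg * col_target * col_other
    if denominator == 0:
        raise ValueError("every row and column total must be non-zero")
    statistic = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts, not data from this study.
statistic, p_value = chi_squared_2x2(a=18, b=3, c=2, d=57)
print(f"chi-squared = {statistic:.2f}, significant at 0.05: {p_value < 0.05}")
```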

The Dargle catchment is characterized by a highly varied land use, including pristine, agricultural, forested and urban areas (Fig 1), which were analyzed using the Arc Hydro tools of ArcGIS with CORINE land cover data. This informed the location of sampling stations within the catchment, with a view to capturing the impact of land use on water quality. The catchment was sampled at 10 stations (Fig 1). In addition to MST markers, the levels of fecal indicator bacteria (FIB), i.e. E. coli and Enterococci, were determined to evaluate the level of fecal pollution in the sub-catchments of the Dargle.

The highest levels of FIB (Fig 3A, Fig 3B) [...] Table 3). The ruminant marker was less prevalent than the human marker in water samples. CF128 was detected in more than 60% of the sampling stations surrounded by agricultural areas: DRG1, DRG2, DRG3, KGH, CBR, GCU and GCR, but was virtually absent in the SWN (~20%), which flows through an almost exclusively urban area (Figure 3F, Table 3). While the horse marker (HoF597) was rarely observed in general, its presence stood out at the KMG2 and CBR sampling stations (more than 20% of samples tested positive). There are several horse riding stables located upstream from these sampling sites.

Bayesian statistics was used to determine the probability of detecting feces from the target host within a given station using the end-point PCR results. Given the prevalence of the three MST markers in the catchment, as well as the sensitivity and specificity of these markers as determined using fecal samples (Table 1), the conditional probability of detecting the biological source of pollution was evaluated [7]. The conditional probability of having human pollution when the HF183 marker is detected was 0.95. The conditional probabilities for the ruminant marker (CF128) and horse marker (HoF597) were 0.93 and 0.75, respectively.

The levels of the human MST marker (qHF183) were high (10⁵ to 10⁶ gc 100 ml⁻¹) in sampling sites with anthropogenic impact: SWN, KMG2, DRG3 and the CSO. Intermediate levels of the human marker, between 10³ and 10⁴ gc 100 ml⁻¹, were detected in sampling sites with medium anthropogenic impact: KGH, CBR, GCU, KMG1 and DRG2. The levels of the human marker were below or at the detection limit at DRG1 and GCR (Fig 3E and Table 3).

There was a fairly even distribution of the levels of the ruminant marker along the different sampling points of the Dargle catchment (Fig 3F and [...]

The level of the general Bacteroidales marker (AllBac) was assessed in order to evaluate the potential use of total Bacteroidales as a fecal indicator. The levels of AllBac were higher than 10⁴ gc 100 ml⁻¹ in all samples (Fig 3D) [...]

[...] [14,37-39]. In this study we tested eight MST markers using end-point PCR and four markers using qPCR, to determine their usefulness for the effective management of water quality. The selected MST markers have been described elsewhere [4,9,24,25,28,40].

The variability of genetic marker abundance and prevalence in populations from other geographical locations suggests that MST markers developed in one geographical area require a priori characterization of assay performance at each watershed of interest before being implemented [15,41-43]. Thus, in the first instance the selected MST markers were tested in fecal samples from known sources. All the PCR markers showed a high sensitivity and specificity, although none of them achieved 100% for both parameters. Although some of the MST markers were detected in hosts other than the intended ones, their abundance in the target group was always significantly higher than in the non-target hosts, demonstrating their suitability to distinguish between sources of pollution. These MST markers are diluted once the fecal matter is introduced into the environment. The lower levels reported in non-target samples are therefore likely to be below the detection limit once the fecal matter has entered the watershed.

The levels of the human marker (HF183) were variable in individual human stool samples. However, when the marker was tested in raw and treated sewage samples, 100% sensitivity and fairly constant marker levels were observed, which is in agreement with earlier observations by Seurinck et al. [28]. These authors report levels of 10⁵ to 10⁹ gc g⁻¹ of wet human feces and 10⁹ to 10¹⁰ gc L⁻¹ of sewage, similar to the results of this study. The human gut harbors three predominant enterotypes, dominated by either Bacteroides, Prevotella or Ruminococcus, which could be related to, for example, diet [44-46], and which might explain the high degree of variability in marker levels among individuals. However, the main sources of human pollution are sewer misconnections, effluents from WWTP or CSO, and septic tanks, rather than individuals. The two human assays used in this work, HF183 by end-point PCR and HF183 with SYBR Green for qPCR, were classified as the most sensitive and specific using binary analysis in a multi-laboratory method evaluation study [47].

The effect of diet on the composition of the microbiota has also been reported for cattle, suggesting a strong correlation between bacterial microbiota structures and feeding practices.
Prevotella spp. were the dominant group when animals were fed with unprocessed grain, while Ruminococcaceae were the main group when other feeds were used [14]. In this study we tested the ruminant marker CF128 by end-point PCR [4] and, since it showed good specificity, a SYBR Green assay with this primer was developed. The probe-labeled assay targeting the cow-specific marker CowM3 [29] was used to discriminate between cattle and other ruminants (especially sheep and deer). The CF128 marker was detected in cows, sheep, deer and goat by PCR and qPCR and showed a cross reaction mainly with horse and pig. However [...]

In this study, markers targeting mitochondrial DNA from pig and sheep showed a high specificity and sensitivity (Table 2). Just a few positives were observed in human fecal samples and in raw and treated sewage, which might be due to recent ingestion of pork and sheep products.
Other studies analyzing mitochondrial DNA detected the bovine marker in individuals who had eaten beef 24 hours before the sample collection; however, the levels were two orders of magnitude lower than those of the human marker [51]. In this study we did not perform a quantitative analysis for these markers, which might have shown a much lower concentration of pig or sheep mitochondrial DNA in human feces. Therefore, mitochondrial DNA may prove to be a good option to use in combination with bacterial markers to help discriminate among animals with similar microbiota such as cow, sheep and deer [50,52].

Two Bacteroidales pig markers were also evaluated in this work. Whereas PF163 showed a higher sensitivity than Pig-2-Bac, the latter achieved a higher specificity, suggesting the use of a combination of markers to increase source tracking resolution [53].
Finally, the gull marker showed a very high specificity, not being detected in other avian samples such as swans or peacocks.

The HF183 and CF128 markers were evaluated in several European countries, including Ireland [30], which showed 88% sensitivity and 100% specificity for HF183 and 100% sensitivity and 96% specificity for CF128, which is slightly higher than what was observed in this study. Both markers were also evaluated in Galway (West of Ireland), which allows us to compare regional geographic variability [54]. For HF183 a sensitivity of 12% for individual fecal samples and 70% for sewage was reported, with 100% specificity, whereas 94% sensitivity and 95% specificity were observed for CF128. For the latter, cross-reaction was observed mainly with pig fecal samples, as is the case in this study. These markers therefore perform in a consistent manner in different areas and at different times.

The levels of the host-specific Bacteroidales markers (qCF128) in the target host were several orders of magnitude higher, ranging from 10⁷ to 10⁸ gc 100 mg wet wt⁻¹, than in other hosts, similar to what was observed previously using samples taken in California [55]. The very high levels of these markers in their target hosts mean that they remain above the detection limit following release into a waterbody.

[...] and Enterococci in general showed a strong seasonal pattern and a high correlation with maximum temperature, with higher levels detected in warmer periods. A lower persistence of these markers at higher temperature has also been extensively described [62-64], suggesting that the high levels are mainly related to major pollution inputs during such periods. Moreover, there was a positive correlation between FIB and rainfall, with the exception of DRG3, where a negative correlation was found due to the dilution of the source of the pollution (the WWTP effluent), and CBR, where no correlation was found. In the case of the MST markers analyzed in this study, such a correlation was only observed at a few sites, suggesting a potential of other fecal input in the catchment with rainfall events. These results confirm that BMP of run-off could improve water quality.

The MST toolbox evaluated in this study shows good potential for improving the management and assessment of fecal pollution. However, it is advisable to evaluate the markers on fecal samples collected in the area where they will be applied, in order to establish their specificity, sensitivity and prevalence and to understand their patterns in environmental samples.
The combined use of FIB, MST markers, environmental data and a Geographical Information System to integrate land use analysis with visual exploration achieves an appreciably enhanced description of the potential main causes of fecal pollution in a river catchment area. These data may be applied to develop hydrological models integrating bacterial data to facilitate the application of measures to eliminate or reduce the levels of fecal pollution at the source.