Viruses in Nondisinfected Drinking Water from Municipal Wells and Community Incidence of Acute Gastrointestinal Illness

Background: Groundwater supplies for drinking water are frequently contaminated with low levels of human enteric virus genomes, yet evidence for waterborne disease transmission is lacking. Objectives: We related quantitative polymerase chain reaction (qPCR)–measured enteric viruses in the tap water of 14 Wisconsin communities supplied by nondisinfected groundwater to acute gastrointestinal illness (AGI) incidence. Methods: AGI incidence was estimated from health diaries completed weekly by households within each study community during four 12-week periods. Water samples were collected monthly from five to eight households per community. Viruses were measured by qPCR, and infectivity assessed by cell culture. AGI incidence was related to virus measures using Poisson regression with random effects. Results: Communities and time periods with the highest virus measures had correspondingly high AGI incidence. This association was particularly strong for norovirus genogroup I (NoV-GI) and between adult AGI and enteroviruses when echovirus serotypes predominated. At mean concentrations of 1 and 0.8 genomic copies/L of NoV-GI and enteroviruses, respectively, the AGI incidence rate ratios (i.e., relative risk) increased by 30%. Adenoviruses were common, but tap-water concentrations were low and not positively associated with AGI. The estimated fraction of AGI attributable to tap-water–borne viruses was between 6% and 22%, depending on the virus exposure–AGI incidence model selected, and could have been as high as 63% among children < 5 years of age during the period when NoV-GI was abundant in drinking water. Conclusions: The majority of groundwater-source public water systems in the United States produce water without disinfection, and our findings suggest that populations served by such systems may be exposed to waterborne viruses and consequent health risks.

qPCR controls. Every batch of PCR reactions included the following negative controls: 1) Negative extraction control, which was FCSV created from a blank filter using the same elution and secondary concentration steps as a real sample; 2) Negative RT master mix; and 3) Negative PCR master mix. If any of the negative controls were positive, the data were omitted, the source of the contamination was identified and corrected, and the analysis batch was repeated.
Every batch of PCR reactions included the following positive controls: 1) Positive extraction control, which was the same as the enterovirus reference control seeded into "blank" FCSV matrix; and 2) Positive reference control for each virus group tested. The standard of each virus that resulted in a crossing point near 34 was aliquoted and stored frozen to be used subsequently as the reference control. Reference controls for noroviruses GI and GII, rotavirus, and HAV were in the form of cDNA, the reference control for adenovirus was extracted DNA, and the enterovirus reference control was intact virus because this control also served as the nucleic acid extraction positive control for the entire analysis batch. New reference controls were created at the same time as the standard curves. Positive reference controls were required to be within ± 0.5 cycles of the original crossing point measured when the standard curve was created in order for the measurements of the unknown samples to be acceptable. An analysis batch was repeated if the reference control fell outside this range.
proportion or multiplier that is necessary for calculating the final virus concentration.
Step 1: Number of genomic copies measured in PCR reaction
Step 2: ÷ (volume of RT reaction added to the PCR reaction / RT reaction volume)
Step 3: ÷ (volume of nucleic acid extraction added to the RT reaction / nucleic acid extraction volume)
Step 4: ÷ (volume of FCSV extracted / FCSV volume)
Step 5: × dilution factor to mitigate inhibition (factor = 1 if no inhibition)
Step 6: ÷ water sample volume filtered in liters
Step 7: = Number of virus genomic copies/liter
Adenovirus and enterovirus serotyping. All enteroviruses and adenoviruses in qPCR-positive samples were serotyped by nucleotide sequencing. For enteroviruses, a separate PCR targeting a 656 bp region encoding one-third of the 5' UTR (untranslated region), the entire VP4 region, and one-third of VP2 was performed using primers OL68-1 and EVP4 (Ishiko et al. 2002). For adenoviruses, the 263 bp product from the qPCR that targeted the hexon gene was sequenced.
Flasks were rocked for 90 min at room temperature, the inoculum was decanted, the cell layer was washed with pre-warmed PBS containing 2% fetal bovine serum, and then 10 mL of Eagle minimal essential medium with 2% fetal bovine serum was added to each flask. Incubation was at 37°C; inoculated cell cultures were replenished with fresh maintenance media every seven days. Each set of inoculated flasks included a sterile PBS negative control and positive controls inoculated with poliovirus or adenovirus 41. Cultures were examined with an inverted microscope for the appearance of cytopathic effects (CPE) daily for three days and then every other day for two weeks. Cultures that were CPE negative after two weeks were freeze-thawed three times, after removing the maintenance media and adding 1 mL sterile H2O, to release any potentially present virus. The freeze-thaw lysate (0.2 mL) plus sterile 2x PBS (0.2 mL) was passed into a new 25 cm² flask containing the same cell line (60% to 80% confluent) and observed for another two-week period.
After four weeks, if still CPE negative, a third passage was performed and the culture was observed for two more weeks. All cultures of water samples in this study were passaged three times and observed for six weeks.
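The seven-step concentration calculation listed earlier in this section can be sketched in a few lines of Python. This is only an illustration. It assumes that each of Steps 2 to 4 divides by a proportion (volume carried forward over total volume), and that the two volumes within each proportion share the same units. The function name, parameter names, and example values are hypothetical and do not come from the study.

```python
def genomic_copies_per_liter(
    copies_in_pcr,
    rt_volume_in_pcr, rt_total_volume,
    extract_volume_in_rt, extract_total_volume,
    fcsv_volume_extracted, fcsv_total_volume,
    water_volume_liters,
    dilution_factor=1.0,
):
    """Scale copies measured in one PCR reaction to copies per liter of water."""
    # Steps 2-4: each proportion is the share of the previous stage's
    # material that was carried into the next stage (ASSUMED reading).
    fraction_rt = rt_volume_in_pcr / rt_total_volume
    fraction_extract = extract_volume_in_rt / extract_total_volume
    fraction_fcsv = fcsv_volume_extracted / fcsv_total_volume
    copies_in_sample = copies_in_pcr / fraction_rt / fraction_extract / fraction_fcsv
    # Step 5: correct for any dilution used to mitigate inhibition.
    # Step 6: normalize by the volume of water filtered.
    return copies_in_sample * dilution_factor / water_volume_liters


# Hypothetical values chosen only to make the arithmetic easy to follow.
example = genomic_copies_per_liter(
    copies_in_pcr=4,
    rt_volume_in_pcr=5, rt_total_volume=20,           # 1/4 of the RT reaction
    extract_volume_in_rt=10, extract_total_volume=40,  # 1/4 of the extract
    fcsv_volume_extracted=0.5, fcsv_total_volume=2.0,  # 1/4 of the FCSV
    water_volume_liters=512,
)
```

With these made-up inputs, 4 copies in the reaction scale to 256 copies in the whole sample, or 0.5 genomic copies/L after dividing by 512 L.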

Statistical Models Interpretation
All Poisson regression modeling results are reported in Supplemental Material, Table S3. The interpretation of the fixed virus effect differs somewhat for the unadjusted versus adjusted models. The unadjusted models are known as marginal or population-average models (Kaufman 2008). Corresponding virus effects are the average effect pooled over communities and reporting periods. Inference is limited to the communities and reporting periods within the current study.
The random intercept adjustment implies that the 14 study communities and four time periods are random samples from populations of similar communities and time periods. This permits inference to similar communities and time periods with different underlying levels of AGI incidence than those in the present study. The adjusted models are referred to as subject-specific or cluster-specific models (Kaufman 2008). The virus effect is that of a 'typical' or 'average' community and reporting period from the relevant populations, where 'typical' or 'average' is operationalized by setting the random intercepts to their mean value of zero. We opted not to adjust for multiple comparisons in the analyses. Rather, our approach was to evaluate each association in the context of available relevant information (Savitz and Olshan 1995).
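To make the 'typical' community interpretation concrete, the sketch below evaluates a log-linear Poisson model with a random intercept. It assumes the form given in the Table S3 caption (daily incidence equal to the exponential of intercept plus beta times the exposure measure), with an added community/period offset. The coefficient values are hypothetical placeholders, not estimates from Table S3.

```python
import math


def daily_agi_incidence(intercept, beta, exposure, random_intercept=0.0):
    """AGI episodes/person-day under a log-linear model.

    random_intercept = 0.0 corresponds to the 'typical' community and period,
    i.e., the random intercepts set to their mean value of zero.
    """
    return math.exp(intercept + random_intercept + beta * exposure)


def incidence_rate_ratio(beta, exposure, reference=0.0):
    """IRR comparing a given exposure level with a reference level."""
    return math.exp(beta * (exposure - reference))


# Hypothetical coefficients for illustration only.
INTERCEPT, BETA = -6.0, 0.25

typical_daily = daily_agi_incidence(INTERCEPT, BETA, exposure=1.0)
typical_annual = 365.25 * typical_daily  # episodes/person-year

# Under this form, the IRR within a cluster does not depend on that
# cluster's random intercept: the offset cancels in the ratio.
shifted_ratio = (daily_agi_incidence(INTERCEPT, BETA, 1.0, random_intercept=0.4)
                 / daily_agi_incidence(INTERCEPT, BETA, 0.0, random_intercept=0.4))
```

The random intercept shifts the baseline incidence of a community and period, while the virus effect, expressed as an IRR, is the same for every cluster under this model form.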
Certain assumptions are necessary to generalize the models presented here to estimate AGI risk from qPCR-measured virus levels in other environmental settings. Foremost is the sampling timeframe for characterizing virus exposure. In a new setting, the sampling timeframe from which these measures are determined must be assumed to be no different than the 12-week aggregate exposure measures used to construct the models. A second key assumption is that the sampling, secondary concentration, nucleic acid extraction, and qPCR methods used to measure viruses in another setting would yield the same virus concentrations and detection frequencies as obtained in the present study. Another consideration is that the qPCR virus measurements in the present study were from non-chlorinating systems. Therefore, any condition that would completely inactivate a virus while leaving its nucleic acid amplifiable would result in the models overestimating AGI incidence. Other assumptions are no different than those necessary for extending a dose-response relationship obtained from a human feeding trial to a QMRA for a different population and location.

Quantitative Microbial Risk Assessment
The following steps were carried out for each iteration of the Monte Carlo simulations:
1) N single-sample virus concentration values were randomly selected from the data set of tap water samples collected during the periods when the UV disinfection intervention was absent from a community (number of samples in the data set = 618). The QMRA conducted with only the period 1 tap water data included 136 samples in the data set. N was also randomly selected at each iteration to be within the range of the actual number of tap water samples collected from a study community during a 12-week period (between 17 and 24 samples, uniform distribution). The data set included both zero and non-zero values, empirically representing the temporal and spatial variability in virus contamination observed during the study;
2) The arithmetic mean of the N concentration values was calculated to obtain a 12-week mean virus concentration, consistent with the level of time aggregation used as the predictor variable in the virus exposure–AGI response models;
3) The mean concentration was input into the exposure-response relationship (Eq. 1), along with an error term randomly drawn from a normal distribution with a mean of zero and variance $\sigma^2$ (Eq. 2). Model coefficients and corresponding variance/covariance estimates are reported in Supplemental Material, Table S3. In Eq. 1, the error term is distributed $N(0, \sigma^2)$, with
$\sigma^2 = \mathrm{Var}(\text{intercept}) + \text{Concentration}^2 \cdot \mathrm{Var}(\text{beta}) + 2 \cdot \text{Concentration} \cdot \mathrm{Covar}(\text{intercept}, \text{beta})$ (Eq. 2)
4) To obtain a realization of the baseline AGI incidence from other sources ($I_B$), not related to drinking water contamination, a concentration value of zero was input into the exposure-response relationship (Eq. 1), again along with a random error term (Eq. 2);
5) To obtain a realization of the AGI incidence rate difference when viruses were absent compared with viruses present in non-disinfected drinking water, the baseline incidence estimated in step 4 was subtracted from the total incidence estimated in step 3.
Steps 1-5 were repeated $2 \times 10^5$ times to obtain the frequency distribution of the AGI incidence rate difference from tap waterborne viruses (one realization for each iteration i, where i = 1…N and N = number of Monte Carlo iterations). The simulation was carried out in MATLAB® R2011a.
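The five simulation steps can be sketched as follows. The study's simulation was run in MATLAB; this Python version is only an illustrative sketch under stated assumptions. It assumes Eq. 1 has the log-linear form quoted in the Table S3 caption with the error term added inside the exponent, that samples are drawn with replacement, and that the error terms in steps 3 and 4 are drawn independently. All data and coefficient values below are hypothetical placeholders, and the iteration count is reduced for speed.

```python
import math
import random


def sigma_squared(conc, var_intercept, var_beta, cov_intercept_beta):
    """Eq. 2: variance of the error term at a given mean concentration."""
    return var_intercept + conc ** 2 * var_beta + 2.0 * conc * cov_intercept_beta


def incidence_realization(conc, coef, rng):
    """One random draw of daily AGI incidence at a mean concentration.

    ASSUMPTION: Eq. 1 is taken to be exp(intercept + beta * conc + error).
    """
    variance = sigma_squared(conc, coef["var_intercept"], coef["var_beta"], coef["cov"])
    error = rng.gauss(0.0, math.sqrt(max(variance, 0.0)))
    return math.exp(coef["intercept"] + coef["beta"] * conc + error)


def simulate_rate_differences(concentrations, coef, iterations, seed=0):
    """Return one AGI incidence rate difference per Monte Carlo iteration."""
    rng = random.Random(seed)
    differences = []
    for _ in range(iterations):
        # Step 1: choose N uniformly in 17..24, then draw N concentrations.
        # ASSUMPTION: sampling is with replacement.
        n = rng.randint(17, 24)
        sample = rng.choices(concentrations, k=n)
        # Step 2: 12-week mean concentration.
        mean_conc = sum(sample) / n
        # Step 3: total incidence at the mean concentration.
        total = incidence_realization(mean_conc, coef, rng)
        # Step 4: baseline incidence at zero concentration
        # (ASSUMPTION: an independent error draw).
        baseline = incidence_realization(0.0, coef, rng)
        # Step 5: incidence rate difference.
        differences.append(total - baseline)
    return differences


# Hypothetical inputs for illustration only; these are not study data.
tap_water_concentrations = [0.0] * 80 + [0.5, 1.2, 3.0, 0.1, 7.5] * 4
coefficients = {"intercept": -6.0, "beta": 0.25,
                "var_intercept": 0.01, "var_beta": 0.004, "cov": -0.002}

differences = simulate_rate_differences(tap_water_concentrations, coefficients,
                                        iterations=2000)
mean_annual_difference = 365.25 * sum(differences) / len(differences)
```

The list `differences` plays the role of the frequency distribution described above. Multiplying by 365.25 converts daily rates to episodes/person-year, following the Table S3 convention.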
Supplemental Material Table S3. Poisson regression modeling results. Regression coefficients and corresponding variance/covariance estimates from the linear (in the log of the AGI incidence) fits and incidence rate ratio (IRR) (i.e., relative risk) information from the spline fits for each model by participant age group, virus type, and virus exposure measure. Coefficients are for daily AGI incidence, i.e., AGI episodes/person-day = $e^{(\text{intercept} + \text{beta} \times \text{virus exposure measure})}$. Multiply by 365.25 for annual incidence.
Supplemental Material Figure S2. Spline fit depicting the influence of an outlier on the association between AGI incidence, all ages, and all-viruses mean concentration. The outlier is a mean virus concentration value from one community that had unusually high NoV-GI concentrations during period 1. The model is unadjusted. The data are the same as Figure 2, panel A, in the manuscript except for inclusion of the outlier. Note the difference in the horizontal axis scales.
Supplemental Material Table S6. Virus types, frequencies, and concentrations by qPCR and frequencies of culturable adenovirus and enterovirus by ICC-qPCR for the well water samples collected immediately following UV disinfection before the water entered the distribution system (n = 191). These data represent the potential contribution of viruses from UV-treated well water to the tap water virus measurements.
a The median and 75th percentile concentrations for all sample groups were zero; therefore, the 95th percentile is reported.
b ICC-qPCR was performed only on qPCR-positive samples.
c This number is less than the sum of virus types because some samples were positive for two or more viruses.
Supplemental Material Figure S3. Association between adult AGI incidence (episodes/person-year) and enterovirus mean concentration in tap water with the analysis restricted to surveillance periods 3 and 4 only; the models are unadjusted for community and period. Top plot: Linear (in the log of the AGI incidence) fit derived from Poisson regression. Each data point represents a community and period. Bottom plot: AGI incidence rate ratio (IRR, a measure of relative risk) based on a spline fit, with the vertical red dashed line indicating the virus exposure threshold above which AGI risk was significantly elevated. Blue dashed lines in both top and bottom plots are the lower and upper 95% confidence limits. Enterovirus concentration is reported as genomic copies/L. Regression coefficients are provided in Supplemental Material Table S3.

Statistical Models Sensitivity Analyses
We conducted two post hoc sensitivity analyses for the models highlighted in Figure 2  The results of the confounding analysis are reported in Supplemental Material Table S7.
Among the nine models examined, the percent change in the IRR when UV status was included in the model ranged between -0.8% and 2.3%. We conclude that the confounding effect of UV disinfection on the virus exposure–AGI incidence associations was minimal.
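The percent change in the IRR used in this confounding check can be written as a one-line calculation. The sketch below assumes the change is expressed relative to the IRR from the model without UV status. The source does not state the direction of the comparison, so this is an assumption, and the example values are hypothetical.

```python
def percent_change_in_irr(irr_without_uv, irr_with_uv):
    """Percent change in the IRR after adding UV status to the model.

    ASSUMPTION: the model without UV status is the reference.
    """
    return 100.0 * (irr_with_uv - irr_without_uv) / irr_without_uv


# Hypothetical IRRs for illustration only; not values from Table S7.
change = percent_change_in_irr(irr_without_uv=1.25, irr_with_uv=1.30)
```

A value near zero indicates that adding the covariate barely moves the estimated association, which is the sense in which the confounding effect is described as minimal.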
We also conducted analyses where outcome and exposure data were aggregated at the level of calendar month within community and surveillance period. The first surveillance period The outcome and exposure data exhibited substantially more variability when aggregated at the level of the month within community and surveillance period as compared to the primary analyses where data were aggregated at the level of 12-week surveillance periods. This was manifested in higher p-values and dampened incidence rate ratios (Supplemental Material Table S8). In the monthly analyses, one of the subgroups highlighted in Figure 2  In the spline analyses for this subgroup, the estimated incidence rate ratio was ≥ the null value of 1.0 throughout the range of virus concentration values, but did not achieve statistical significance at the 0.05 level. It is possible that the less stable monthly data are more accurately represented by splines. For reasons stated in the manuscript, we feel strongly that aggregation of the exposure data at the level of 12-week surveillance periods provides the most objective and accurate representation of virus levels in the communities.

Supplemental Material Table S8. Poisson regression results with AGI and virus data aggregated at the level of community and calendar month. Post hoc analyses restricted to those models shown in Figure 2 of the manuscript. As in the primary analyses, UV disinfection status is not included in the models.