Bacterial and viral fecal indicator predictive modeling at three Great Lakes recreational beach sites

Coliphage are viruses that infect Escherichia coli (E. coli) and may indicate the presence of enteric viral pathogens in recreational waters. There is an increasing interest in using these viruses for water quality monitoring and forecasting; however, the ability to use statistical models to predict the concentrations of coliphage, as often done for cultured fecal indicator bacteria (FIB) such as enterococci and E. coli, has not been widely assessed. The same can be said for FIB genetic markers measured using quantitative polymerase chain reaction (qPCR) methods. Here we apply least-angle regression (LARS) modeling to previously published concentrations of cultured FIB (E. coli, enterococci) and coliphage (F+, somatic), along with newly reported genetic marker concentrations measured via qPCR for E. coli, enterococci, and general Bacteroidales. We develop site-specific models from measures taken at three beach sites on the Great Lakes (Grant Park, South Milwaukee, WI; Edgewater Beach, Cleveland, OH; Washington Park, Michigan City, IN) to investigate the efficacy of a statistical predictive modeling approach. Microbial indicator concentrations were measured in composite water samples collected five days per week over a beach season (~15 weeks). Model predictive performance (cross-validated standardized root mean squared error of prediction [SRMSEP] and R²PRED) was examined for seven microbial indicators (using log10 concentrations), with water/beach parameters collected concurrently with water samples used as covariates. The highest predictive performance was seen for qPCR-based enterococci and Bacteroidales models, with F+ coliphage consistently yielding poorly performing models. Influential covariates varied by microbial indicator and site. Antecedent rainfall, bird abundance, wave height, and wind speed/direction were most influential across all models. Findings suggest that some fecal indicators may be more suitable for water quality forecasting than others at Great Lakes beaches.


Introduction
Statistical models have seen increased use for predicting water quality in recreational waters in recent years (Christensen et al., 2021; Francy et al., 2020; Jones et al., 2013). This trend is partly due to the realization that persistence models (using water quality measures taken one day to predict quality on the following day) are often inaccurate (Whitman and Nevers, 2008), and waiting 8-24 h for cultured microorganisms to be counted (USEPA, 2021) reduces the level of public health protection in recreational waters (Wymer et al., 2021). Statistical predictive models could allow for public advisories to be issued quickly enough to inform beach attendance on the same day of use (Brooks et al., 2013).
Recreational water quality fecal indicator data, such as cultured enterococci and E. coli, are often modeled using least squares fitting via a multiple linear regression framework (de Brauwere et al., 2014). Least-angle regression with Lasso (LARS-lasso; Efron et al., 2004) modifies the linear regression approach by constraining the sum of absolute regression coefficients (i.e., L1 regularization), providing a method for identifying important and unnecessary covariates, filtering highly collinear covariates, and reducing the overfitting of training data. Linear regression-based methods, although sometimes outperformed by machine learning techniques like random forests and boosted models (Brooks et al., 2016; Thoe et al., 2012), may be less susceptible to overfitting with smaller datasets (n < 100) that are typically available for recreational water quality forecast modeling.
Although cultured fecal indicator bacteria (FIB) predictive modeling is well established, research suggests that viruses are a more likely causative agent of many recreational waterborne illnesses compared to bacterial pathogens (Begier et al., 2008; Sinclair et al., 2009; Soller et al., 2010). As a result, scientists are investigating the use of coliphage as an alternative fecal indicator for water quality testing. Coliphage (F+ and somatic) are viruses that infect coliform bacteria, including E. coli, and may be effective indicators of the presence of human fecal contamination and the associated risk from enteric viruses because they are consistently found in municipal sewage and are similar in size and structure to some human enteric viruses (Havelaar et al., 1993; King et al., 2011; McMinn et al., 2017a). Coliphage are found in the digestive systems of humans and other warm-blooded animals (McMinn et al., 2017a) and are routinely identified in sewage (Ewert and Paynter, 1980; Gantzer et al., 1998; Korajkic et al., 2020; Lucena et al., 2004). Coliphage are also accepted metrics for microbial monitoring of groundwater sources of drinking water (USEPA, 2006).
In addition, there is a growing interest in using genetic methods for recreational beach monitoring. The primary advantage of these methods over culture-based protocols is the ability to provide water quality information within a few hours (Griffith and Weisberg, 2011). Several epidemiological studies report a significant relationship between the incidence of swimming-related illnesses and genetic enterococci concentration estimates determined by quantitative polymerase chain reaction (qPCR) methods in both marine and Great Lakes recreational waters (Colford et al., 2012; Wade et al., 2008; Wade et al., 2010). Based on these findings, the United States Environmental Protection Agency (EPA) has suggested Beach Action Values associated with qPCR-based EPA Method 1611 for enterococci (USEPA, 2012a, 2012b). A health relationship has also been demonstrated for qPCR estimates of total Bacteroidales by EPA Method B (USEPA, 2010c) in marine waters (Wade et al., 2010).

EPA Author Manuscript
This study implements LARS-lasso models and uses cross-validation to compare the predictive performance of previously reported (Wanjugi et al., 2018) coliphage (F+ and somatic), cultured FIB (E. coli and enterococci), and newly presented (this study) qPCR-based genetic marker (E. coli, enterococci and Bacteroidales) concentrations for three Great Lakes sites across a single recreational beach season (15 weeks). Covariates include previously reported (Wanjugi et al., 2018) paired measurements of common water and beach area parameters routinely used to describe and model water quality conditions in recreational settings, ranging from water temperature to rainfall (Francy, 2009; Francy et al., 2013; USEPA, 2010a, 2010b). Emphasis is placed on the comparative predictive performance of each fecal indicator response variable utilizing a standardized modeling approach. In addition, the most influential covariates are identified to reveal potential trends that could inform future recreational water quality sample testing and predictive modeling efforts. Findings suggest that some fecal indicators may be more suitable for water quality forecasting than others at Great Lakes beaches.

Site descriptions
Sites included Edgewater Beach near Cleveland, OH, Grant Park in South Milwaukee, WI, and Washington Park in Michigan City, IN (Fig. 1, modified from Wanjugi et al., 2018). All are routinely monitored and, based on historical monitoring, have yielded FIB concentrations that exceed the USEPA's recommended Beach Action Values (USEPA, 2012b) in 10-30% of samples. At these sites, potential fecal pollution sources are a mixture of nearby wastewater treatment facilities, stormwater runoff, tributary inflows, and combined sewer overflows, with secondary influence from beachgoers, wildlife, and agricultural runoff. For additional details, see Wanjugi et al. (2018).

Water sampling
Water samples for microbial indicator testing were collected from late May to early September of 2015. Each sampling event produced a 6 L composite created from six 1 L grab samples collected in a transect area (three shin-deep and three waist-deep). As described in Wanjugi et al. (2018), water samples were collected via standard methods recommended in Section 9060 of Standard Methods for the Examination of Water and Wastewater (APHA, 2005). Samples were collected at approximately 8:30 am on each sampling day (Monday through Friday). The total number of sampling events was 71 at Grant Park, 67 at Washington Park, and 67 at Edgewater Beach.

Cultured bacteria and viral fecal indicator datasets
This study uses previously published data on the concentrations of cultured FIB (enterococci and E. coli) and coliphage (F+ and somatic) from Wanjugi et al. (2018) (Haugland et al., 2014; USEPA, 2015a, 2015b). All test sample results used in this study were reported per 100 mL of water sample, with the log10 copy/reaction quantitative estimates generated by the E. coli workbook scaled accordingly.

Data acceptance metrics
Each of the Excel data analysis workbooks referenced above performed automatic checks on standard curves, positive and negative controls, and test sample data quality, and for unacceptable matrix interference in test samples. In the enterococcus and total Bacteroidales method workbooks, these checks included: (1) an analysis of covariance (ANCOVA) with an acceptance criterion of p > 0.05 for slopes and intercepts of the individual standard curves contributing to the composite curve; (2) target organism and Sketa22 assay Ct measurements for each of the calibrator or positive control sample analyses performed in each test sample run within +/− 3 standard deviations of the means determined for these assays in preliminary analyses; (3) Ct values of duplicate Sketa22 assay analyses of test samples within 3 units of the mean from calibrator sample analyses; (4) Ct values of duplicate IAC assay analyses of test samples within 1.5 units of the mean from negative control sample analyses; and (5) average target organism CSE estimates for the negative control (filter blank) samples analyzed in each test sample run below the lower limit of quantification (LLQ) value of 720 prior to Sketa22 assay delta Ct adjustments. Undetected Ct measurements were assigned values of 40 for purposes of averaging. The LLQ value established for the EPA enterococci method is 568 CSE (USEPA, 2013b); however, the 720 CSE/sample value was selected as the LLQ for this study based on the analyses of plasmid DNA standards with a lowest concentration of 6 copies/reaction and 1/120th of total DNA extract volumes analyzed. Similar data acceptance metrics were applied in the E. coli method workbook with the following differences: (1) target organism and Sketa22 assay Ct measurements for each of the positive control sample analyses performed in each test sample run within acceptance bounds established from a multiple laboratory evaluation study of the method (Sivaganesan et al., 2019); (2) mean Ct measurements for the negative control (filter blank) samples analyzed in each test sample run greater than the LLQ Ct estimate of 35.09 established from the composite standard curve in the workbook (corresponding to 679 copies/sample); and (3) standard deviation of duplicate EC23S857 Ct measurements for test samples that were > LLQ within 1.414 (Sivaganesan et al., 2019).
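The range-type checks described above can be illustrated with a minimal Python sketch. It is not the logic of the Excel workbooks; the function names (`mean_ct`, `within_sd_bounds`, `within_units`) and the use of `None` to mark an undetected Ct are assumptions made for illustration. Only the value of 40 for undetected Ct measurements and the form of the bounds (+/− 3 standard deviations, or a fixed number of Ct units from a reference mean) come from the text.

```python
from statistics import mean

UNDETECTED_CT = 40.0  # value assigned to undetected Ct measurements for averaging


def mean_ct(cts):
    """Average replicate Ct values; None (hypothetical marker) means undetected."""
    return mean(UNDETECTED_CT if ct is None else ct for ct in cts)


def within_sd_bounds(value, ref_mean, ref_sd, k=3.0):
    """True if value lies within ref_mean +/- k standard deviations."""
    return abs(value - ref_mean) <= k * ref_sd


def within_units(value, ref_mean, max_offset):
    """True if value lies within max_offset Ct units of a reference mean."""
    return abs(value - ref_mean) <= max_offset


if __name__ == "__main__":
    # Made-up numbers: duplicate Sketa22 Ct values for one test sample,
    # compared with a made-up calibrator mean using the 3-unit criterion.
    sample_ct = mean_ct([24.8, 25.4])
    print(within_units(sample_ct, ref_mean=24.0, max_offset=3.0))
```

A sample would pass a given check only when the corresponding function returns True; how the workbooks combine the individual checks is not specified here.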

Covariate data
Ancillary water and site characteristics were measured as potential statistical covariates. These measurements are the same data as presented in Wanjugi et al. (2018). Instrumentation devices and measurement protocols are described in Table S1 of Wanjugi et al. (2018). All measured covariates are routinely collected for statistical modeling of water quality (Francy et al., 2013; USEPA, 2010a). Water parameters included: water temperature (°C), turbidity (NTU), dissolved oxygen (mg/L), conductivity (μmhos/cm), pH, ultraviolet absorbance in the water column at 254 nm (UV_254, 1/m), and dissolved organic carbon (DOC, mg C/L). Site parameters included: wind speed (km/h), wind direction (°), air temperature (°C), wave height (m), relative humidity (%), cumulative rainfall (24 h, 48 h, and 72 h; mm), photosynthetically active radiation (PAR, mol/m²-s), and counts of humans, birds and dogs at the time of sampling. At Edgewater Beach, the discharge from the Cuyahoga River was available from a permanent gauging station at its mouth (USGS site 04208000). For this analysis, wind speed, wind direction and beach orientation angle were converted into alongshore (Wind-A) and onshore/offshore (Wind-O) wind components using sine and cosine functions (Cyterski et al., 2013). A correlation analysis identified multiple highly collinear covariate combinations (r ≥ 0.8): UV_254 and DOC (r = 0.81), 24 h and 48 h cumulative rainfall (r = 0.84), and 48 h and 72 h cumulative rainfall (r = 0.89). To minimize collinearity for regression coefficient estimation, the UV_254 and 48 h rainfall covariates were excluded from further analyses.
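The two covariate preparation steps described above, the wind decomposition and the collinearity screen, can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of Cyterski et al. (2013): the angle and sign conventions in `wind_components` are assumed, and all function names and example values are hypothetical. Only the use of sine and cosine functions and the r ≥ 0.8 threshold come from the text.

```python
import math


def wind_components(speed, wind_dir_deg, beach_orientation_deg):
    """Split wind into alongshore (A) and onshore/offshore (O) components.

    Assumed convention: the angle is wind direction minus beach orientation;
    cosine gives the alongshore part and sine the onshore/offshore part.
    """
    theta = math.radians(wind_dir_deg - beach_orientation_deg)
    return speed * math.cos(theta), speed * math.sin(theta)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every covariate pair with |r| >= threshold."""
    names = sorted(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


if __name__ == "__main__":
    # Made-up covariate columns, for illustration only.
    data = {
        "x": [1.0, 2.0, 3.0, 4.0, 5.0],
        "y": [2.0, 4.1, 5.9, 8.2, 10.0],
        "z": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
    for a, b, r in collinear_pairs(data):
        print(a, b, round(r, 2))
```

Whichever convention is used, the two wind components preserve the wind speed (the sum of their squares equals the squared speed), so no information is lost by replacing speed and direction with Wind-A and Wind-O. Which member of a collinear pair to drop remains a modeling decision.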
Data analyses
Modeling scenarios and data transformations
Regression models were generated for the seven microbial indicators: F+ coliphage, somatic coliphage, cultured E. coli, cultured enterococci, qPCR-based E. coli, qPCR-based enterococci, and qPCR-based Bacteroidales. Prior to any data analyses, cultured FIB and qPCR-based concentrations (C) were log10 transformed due to the large dynamic range of these datasets. Coliphage concentrations were transformed as log10(C + 1); the constant was added to prevent negative log10 values, as some coliphage concentrations were <1.0. A value of ½ the detection limit (coliphage) or LLQ (qPCR) was used for microbial indicator concentrations under the detection limit or LLQ, respectively. All covariate data were standardized (subtracting the mean and dividing by the standard deviation).
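The transformations described above can be sketched in a few lines of Python. The function names are hypothetical, `None` is used as an assumed marker for a non-detect, and the use of the sample standard deviation in `standardize` is an assumption, since the text does not state which form was used.

```python
import math
from statistics import mean, stdev


def log10_conc(c, limit):
    """log10 transform for cultured FIB or qPCR data; values under the limit become limit/2."""
    if c is None or c < limit:
        c = limit / 2.0
    return math.log10(c)


def log10_coliphage(c, detection_limit):
    """log10(C + 1) transform for coliphage; values under the limit become limit/2."""
    if c is None or c < detection_limit:
        c = detection_limit / 2.0
    return math.log10(c + 1.0)


def standardize(values):
    """Subtract the mean and divide by the (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(log10_conc(1000.0, limit=720.0))
    print(log10_coliphage(None, detection_limit=1.0))
    print(standardize([18.5, 21.0, 23.5]))
```

Standardizing the covariates places their regression coefficients on a common scale, which is what allows coefficient magnitudes to be compared across covariates later in the analysis.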

Microbial indicator measurement correlations
Pearson correlation coefficients (r) were used to examine the strength of associations between paired microbial indicator measurements by site. To account for potential variability due to the occurrence of non-detects in some data sets, correlation analyses were repeated 100 times for each paired measurement combination and an average r was calculated. For each of the 100 coefficients, microbial indicators below the detection limit (coliphage and cultured FIB) or LLQ (qPCR targets) were assigned a unique uniform random number between zero and the respective detection limit/LLQ. Significance of the correlation coefficients was assessed using a t-test (Kendall and Stuart, 1973).
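The repeated-imputation correlation procedure above can be sketched as follows. Several details are assumptions: `None` marks a value below the detection limit/LLQ, the correlation is computed on the scale of the supplied values (the text does not state at which step any log10 transform was applied), and `t_statistic` uses the standard form t = r·sqrt((n − 2)/(1 − r²)) with n − 2 degrees of freedom, which may differ in detail from the test actually applied. All names and example data are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def impute_below_limit(values, limit, rng):
    """Replace each None with its own uniform random draw between 0 and limit."""
    return [rng.uniform(0.0, limit) if v is None else v for v in values]


def averaged_r(x, y, limit_x, limit_y, n_rep=100, seed=0):
    """Average Pearson r over n_rep random imputations of below-limit values."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        xi = impute_below_limit(x, limit_x, rng)
        yi = impute_below_limit(y, limit_y, rng)
        total += pearson_r(xi, yi)
    return total / n_rep


def t_statistic(r, n):
    """t statistic for H0: rho = 0 (assumed standard form, n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


if __name__ == "__main__":
    # Made-up paired measurements; None marks a value below the limit.
    x = [5.0, None, 20.0, 40.0, 80.0, None]
    y = [3.0, 9.0, None, 35.0, 70.0, 2.0]
    r_bar = averaged_r(x, y, limit_x=1.0, limit_y=1.0)
    print(round(r_bar, 3), round(t_statistic(r_bar, len(x)), 3))
```

When a pair has no values below the limit, every repetition yields the same coefficient, so the averaged r reduces to the ordinary Pearson r.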
Model formulation
All regression models were developed using the "lars" package (version 1.2, Efron et al., 2004) in R (version 4.1.3, R Core Team, 2021). This package implements an iterative fitting technique (LARS-lasso, Tibshirani, 1997), where the linear regression coefficients are manipulated across successive steps, and Cp (Mallows, 1973) is tracked. The "optimum" model is determined by the step where Cp is minimized. This implementation is an L1 regularization technique, in which it is possible for regression coefficients for specific covariates to shrink to zero, in essence removing them from the model (i.e., there is no evidence that they are useful).
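The way L1 regularization shrinks a coefficient exactly to zero can be seen in the simplest possible case, a single standardized covariate, where the lasso solution is the soft-thresholded least squares coefficient. The Python sketch below illustrates only that special case; it is not the LARS algorithm or the "lars" package, and the function names, the penalty symbol `lam`, and the example data are assumptions.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_single_covariate(x, y, lam):
    """Lasso coefficient b minimizing (1/(2n)) * sum((y - b*x)^2) + lam * |b|.

    Assumes x has mean 0 and mean square 1 and y is centered, so the
    least squares coefficient is sum(x*y)/n and the lasso solution is
    its soft-thresholded value.
    """
    n = len(x)
    ols = sum(xi * yi for xi, yi in zip(x, y)) / n
    return soft_threshold(ols, lam)


if __name__ == "__main__":
    # Made-up standardized covariate and centered response.
    x = [-1.0, 1.0, -1.0, 1.0]
    y = [-0.5, 0.5, -0.3, 0.3]
    for lam in (0.0, 0.1, 0.5):
        print(lam, lasso_single_covariate(x, y, lam))
```

With a small penalty the coefficient is only reduced in magnitude; once the penalty exceeds the magnitude of the least squares coefficient, the covariate drops out of the model entirely. With several correlated covariates there is no such closed form, which is where a path algorithm such as LARS-lasso is used.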
Model predictive performance evaluation
A cross-validation approach was used to assess the predictive performance of each site-specific microbial indicator model. For each model, corresponding data were randomly split into ten folds. Each fold was withheld while the other nine folds were used to train a sub-model. Each sub-model was then used to predict microbial indicator measurements in the withheld data fold. Thus, ten sub-models were developed, resulting in a single prediction for every data point, as each data point occurs in a withheld data fold only once. Predicted microbial indicator values from each set of ten sub-models were used to compute a standardized root mean squared error of prediction (SRMSEP):

$$\mathrm{SRMSEP}_m = \frac{1}{C_m}\sqrt{\frac{1}{n_m}\sum_{i=1}^{n_m}\left(P_i^m - O_i^m\right)^2}$$

For each model m, n_m is the number of observations, P_i^m is the ith prediction, O_i^m is the ith observed value, and C_m is the log10 average concentration of the modeled microbial metric.
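The cross-validation scheme above can be sketched in Python as follows. Two substitutions are made for brevity and should be read as assumptions: a one-covariate ordinary least squares fit (`fit_simple_ols`) stands in for the LARS-lasso sub-model, and SRMSEP is taken to be the root mean squared error of prediction divided by C_m, with C_m computed as the mean of the observed log10 values. All function names are hypothetical.

```python
import math
import random


def make_folds(n, k=10, seed=0):
    """Randomly assign indices 0..n-1 to k folds; each index appears once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_simple_ols(x, y):
    """Stand-in sub-model (NOT LARS-lasso): intercept and slope for one covariate."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def cross_validated_predictions(x, y, k=10, seed=0):
    """One out-of-fold prediction per observation from k sub-models."""
    preds = [None] * len(y)
    for fold in make_folds(len(y), k, seed):
        held = set(fold)
        train = [i for i in range(len(y)) if i not in held]
        a, b = fit_simple_ols([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = a + b * x[i]
    return preds


def srmsep(pred, obs):
    """RMSE of prediction divided by the mean observed value (assumed form of C_m)."""
    n = len(obs)
    rmsep = math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)
    return rmsep / (sum(obs) / n)


if __name__ == "__main__":
    # Made-up standardized covariate and log10 response with deterministic noise.
    x = [i / 10.0 for i in range(67)]
    y = [2.0 + 0.3 * xi + 0.1 * math.sin(7.0 * xi) for xi in x]
    print(srmsep(cross_validated_predictions(x, y), y))
```

Because every observation is predicted by a sub-model that never saw it, the resulting error reflects out-of-sample performance even though no separate test set is held back.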
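Along with SRMSEP, a cross-validated R²PRED was computed for each model. For the covariate influence analysis, the third influence metric was a Total Score (the sum of metrics 1 and 2, i.e., the summed absolute regression coefficients and the frequency of non-zero regression coefficients). In this analysis, the influences of 24 h and 72 h cumulative rainfall were summed and called "Rainfall." In the same way, the influences of Wind-A and Wind-O were summed, and this summed influence was called "Wind Speed/Direction."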
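A minimal Python sketch of the covariate influence scoring is given below. The covariate names and coefficient values are made up, and one detail is an assumption: grouped covariates (for example, the two rainfall windows) are merged within each model by adding their absolute coefficients, so a group counts as non-zero when either member is non-zero. The text states only that the influences were summed.

```python
GROUPS = {  # hypothetical covariate names
    "Rainfall": ("Rain24", "Rain72"),
    "Wind Speed/Direction": ("Wind-A", "Wind-O"),
}


def merge_groups(coefs, groups=GROUPS):
    """Map covariate -> absolute coefficient, summing members of each group."""
    member_of = {m: g for g, members in groups.items() for m in members}
    merged = {}
    for name, b in coefs.items():
        key = member_of.get(name, name)
        merged[key] = merged.get(key, 0.0) + abs(b)
    return merged


def influence_scores(models):
    """Return covariate -> (summed |coefficients|, proportion non-zero, total score)."""
    n_models = len(models)
    sums, counts = {}, {}
    for coefs in models:
        for name, mag in merge_groups(coefs).items():
            sums[name] = sums.get(name, 0.0) + mag
            counts[name] = counts.get(name, 0) + (1 if mag != 0.0 else 0)
    return {
        name: (sums[name], counts[name] / n_models, sums[name] + counts[name] / n_models)
        for name in sums
    }


if __name__ == "__main__":
    # Two made-up models' regression coefficients on standardized covariates.
    models = [
        {"Rain24": 0.2, "Rain72": 0.0, "pH": 0.0, "Birds": -0.3},
        {"Rain24": 0.0, "Rain72": 0.0, "pH": 0.0, "Birds": 0.1},
    ]
    for name, score in sorted(influence_scores(models).items()):
        print(name, score)
```

Adding a magnitude-based metric to an occurrence-based one rewards covariates that are both frequently retained and strongly weighted; a covariate available at only one site would need its proportion computed over that site's models rather than all of them.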

FIB qPCR measurements
Overall percentages of water sample measurements that gave < LLQ concentration estimates for the Entero1a (enterococci), EC23S857 (E. coli), and GenBac3 (Bacteroidales) assays were 27.2%, 8.1% and 6.2%, respectively. Positive and negative control sample acceptance criteria were met in all 50 instrument runs of the test samples for each method from the study (data not shown). Each sample was also evaluated for suitable DNA recovery and absence of amplification interference. These quality controls indicated that the percentages of test sample analyses that failed to meet acceptance criteria in the enterococci, E. coli and total Bacteroidales methods were 2.25%, 2.62% and 3.00%, respectively. Composite standard curve performance metrics are summarized in Table 1. Individual qPCR measurements for the three study sites are shown in Fig. 2.

Microbial indicator summary statistics and correlations
Microbial indicator measurements targeting two coliphage (somatic and F+), two cultured FIB (enterococci, E. coli), and three FIB genetic markers (enterococci, E. coli, and Bacteroidales) were used as response variables in LARS models. Table 2 shows summary statistics of each data set, including: number of total samples, number of samples that were non-detects or below the detection limit (coliphage and cultured FIB) or LLQ (qPCR), number of samples that failed quality controls, and descriptive statistics excluding non-detects [minimum, maximum, mean, standard deviation, and coefficient of variation]. Pearson correlation coefficients (r) between microbial measures varied by site (Fig. 3). Overall, Edgewater Beach showed the most significant correlations (11) of the site-specific datasets, while Washington Park showed the fewest (5). Only a few coefficients were significant at every site: cultured E. coli paired with qPCR-based E. coli, cultured E. coli paired with qPCR-based enterococci, and qPCR-based E. coli paired with qPCR-based enterococci.

Predictive Model Performance
Predictive performance metrics for each model (n = 21) are given in Table 3. The top performing model (Fig. 4

Predicting bacterial and viral fecal indicator concentrations in Great Lakes recreational waters
This study reports the use of LARS-lasso modeling to predict concentrations of coliphage (F+ and somatic), cultured FIB (E. coli and enterococci), and qPCR-based genetic markers (E. coli, enterococci and Bacteroidales) using water and beach site covariate measurements collected five days per week over an entire beach season from three Great Lakes recreational sites. Due to the limited size of recreational beach season data sets, including those reported here, the conventional practice of parsing a dataset into training and testing subsets was not feasible. Instead, a cross-validation approach was used to compare model predictive performance. Findings identified multiple trends. First, a clear difference in predictive model performance was observed between F+ and somatic coliphage types. F+ coliphage models consistently resulted in poor predictive performance, ranking the lowest of all microbial fecal indicators at each recreational beach site. The reduced performance of F+ coliphage models could be due, in part, to a higher incidence of non-detects (25.1% for F+ compared to 2.5% for somatic) and overall lower concentrations in water samples compared to somatic coliphage. As a result, increasing sample volumes could help alleviate this issue. However, working with larger volumes (>1 L) presents additional logistical and expense challenges, potentially making this solution impractical for routine water quality monitoring. In addition, there is a growing body of evidence suggesting that these virus types exhibit different occurrence patterns in untreated sewage (Korajkic et al., 2020), animal fecal samples (McMinn et al., 2014), and across different surface water types (riverine compared to lake beach) (Wanjugi et al., 2018). In contrast, somatic coliphage predictive modeling exhibited good Overall Performance (Table 3) at Grant Park (1.39) and Edgewater Beach (1.61), suggesting that this viral indicator could be an important fecal indicator tool for future water quality forecasting applications. Second, enterococci and Bacteroidales qPCR outperformed E. coli qPCR regardless of site. Unlike F+ coliphage, E. coli qPCR exhibited a reasonable frequency of non-detects across samples (5.9%), suggesting a different explanation. One possible hypothesis is the presence of naturalized E. coli populations in soils and beach sand, a phenomenon that has been reported in multiple Great Lakes studies (Ishii et al., 2007; Ishii et al., 2006). The persistence and propagation of naturalized E. coli populations could obscure any links between measured covariates and the occurrence of this fecal indicator. Additional research is warranted to investigate the mechanisms resulting in poor performance of F+ coliphage and E. coli qPCR fecal indicators.

Microbial measurement correlation trends
New qPCR findings reported here for E. coli, enterococci, and Bacteroidales add to the previously published data set consisting of paired measurements of cultured FIB and coliphage, providing the opportunity to evaluate correlation trends between seven different microbial fecal indicator water quality metrics. Results are particularly useful because all measurements were generated from the same water sample grabs in the same laboratory utilizing standardized protocols for each methodology. Correlation analyses identified several trends providing potential insights into the co-occurrence, or lack thereof, between these water quality microbial measures. Average correlation coefficients (r) between cultured E. coli and qPCR measurements ranged from 0.64 (Edgewater Beach) to 0.87 (Grant Park) and were significant (p < 0.05) regardless of sampling site. In contrast, a significant correlation between enterococci measurements was only observed at one site (Edgewater Beach, r = 0.90, p < 0.001). Previous studies investigating FIB culture and qPCR paired measurements at Great Lakes beach sites report similar findings, suggesting that the degree of correlation is likely influenced by site-specific conditions (Lavender and Kinzelman, 2009; Shrestha and Dorevitch, 2019; Whitman et al., 2010). While correlations between culture and qPCR measurements of E. coli and enterococci are well documented, little is known regarding the level of agreement between coliphage and qPCR FIB metrics. In this study, coliphage exhibited markedly lower correlations with paired qPCR fecal indicator measurements, with most comparisons yielding non-significant results (p > 0.05), suggesting that the occurrence of these viral and bacterial fecal indicators in Great Lakes recreational beach waters is governed by different conditions. Potential factors might include different animal source shedding patterns or variable fate and transport behaviors.
A recent study comparing coliphage with E. coli qPCR paired measurements in untreated sewage samples collected across the contiguous United States reported a similar trend, where somatic coliphage did not significantly correlate with E. coli qPCR (r = 0.21; p = 0.15), but F+ coliphage exhibited a weak correlation (r = 0.41, p = 0.003) (Korajkic et al., 2020). Further research could help elucidate factors contributing to these different occurrence patterns.

The influence of predictive model covariates
Many covariates were used to predict bacterial and viral fecal indicator concentrations, resulting in several notable trends. Of the covariates used in all models, rainfall was the most influential (Table 4), echoing findings from other microbial water quality modeling efforts focused on cultured FIB at recreational beach sites (Nevers and Whitman, 2005; Whitman and Nevers, 2008). Findings also lend support to the development of a potential core set of physical and beach parameter measurements for future water quality forecasting applications in the Great Lakes basin, as approximately 25% of the covariate data sets exerted minimal influence on microbial indicator predictive models (Table 4). Reducing the number of covariate measurements needed to predict water quality would streamline future applications as well as lower data collection costs. The analysis of covariate influence also provided useful clues about potential fecal sources of pollution at sites. For example, bird abundance was the second most influential covariate across all models (Table 4, Fig. 5), notably at Grant Park and Edgewater Beach. Information on potential sources of fecal pollution can help managers identify sites for future microbial source tracking analyses. Findings also demonstrated the utility of measuring discharge in nearby lotic hydrologic elements (e.g., the importance of Cuyahoga discharge at Edgewater Beach). Both Grant Park and Washington Park are also likely influenced by nearby lotic inputs (Oak Creek and Trail Creek, respectively), but discharge in these systems was not measured in this study due to the absence of permanent gauge stations. The absence of this potentially useful covariate could, in part, have contributed to lower model predictive performance at these two sites versus Edgewater Beach. Additional research is needed to confirm covariate importance trends identified in this study.

Conclusion
LARS-lasso predictive modeling with cross-validation was used to compare the predictive performance of models of coliphage (F+ and somatic), cultured E. coli and enterococci, and qPCR-based E. coli, enterococci, and Bacteroidales measures using a suite of environmental covariates at three recreational beach sites in the Great Lakes basin. Key findings include:
• Models yielded highly variable SRMSEP and R²PRED measures, indicating that some microbial measures may be more amenable to statistical modeling approaches than others.
• Somatic coliphage models performed at a similar level or better compared to cultured and qPCR FIB, while F+ coliphage models consistently performed poorly.
• Enterococci and Bacteroidales qPCR outperformed E. coli qPCR regardless of beach site.
• Rainfall, bird abundance, wave height, and wind speed/direction were the most influential covariates across all models.
• Approximately 25% of covariates exerted minimal influence on predictive models, suggesting a potential core set of physical and beach parameters may be optimal for future water quality forecasting applications in the Great Lakes basin.
Additional research is warranted to further characterize the suitability of statistical predictive models for recreational water quality forecasting of viral and bacterial fecal indicators and to confirm trends observed in this study. Findings also provided useful insights on water quality forecasting in the Great Lakes, demonstrating a challenging reality: there can be a large degree of variability from one site to another. While LARS allowed for the successful comparison of different microbial indicator predictive performance and identified important differences between coliphage types and qPCR-based fecal indicators, future studies with larger sample sizes could be amenable to alternative approaches, such as machine learning techniques, that could further improve water quality forecasting.

Water Res. Author manuscript; available in PMC 2023 September 01.

Cyterski et al.
Table 2. Summary statistics for microbial indicator log10 concentrations measured at the three beach sites. ND = non-detect. Units of measurement for each response and detection limits/lower limit of quantification (LLQ) are given below the table.
Table 3. Predictive performance metrics for each microbial indicator and site model.
Table 4. The influence of each covariate across all models.

Fig. 1. Three Great Lakes beaches sampled for microbial water quality: Edgewater Beach in Cleveland, OH (A), Grant Park in South Milwaukee, WI (B), and Washington Park in Michigan City, IN (C). Modified from Wanjugi et al. (2018).

Fig. 2. Raincloud plots showing individual qPCR measurements (log10 concentration) for Edgewater Beach (Panel A), Grant Park (Panel B), and Washington Park (Panel C) study sites. Open circles show measurements below, and shaded circles measurements above, the respective detection limit/lower limit of quantification (LLQ). The right/left boundaries of each box show the 25th/75th percentiles of the data distribution. The box's whiskers extend to any observation within 1.5 times the interquartile range. Shaded curves represent the respective density distributions.

Fig. 3. Average Pearson correlation matrices (r) for microbial measures at each study site (A = Edgewater Beach, B = Grant Park, C = Washington Park). Averaged r values appear in the lower left portion of each panel; t-test derived p-values are given in the corresponding upper right cells. Significant coefficients (p < 0.05) are bolded. Standard deviations for these coefficients (calculated from 100 simulations where concentrations below detection/LLQ were replaced by random uniform numbers) never exceeded 0.004 and were primarily < 0.001.

Fig. 5. Heat maps of least-angle regression (LARS) coefficients for seven microbial indicator predictive models at Edgewater Beach (Panel A), Grant Park (Panel B), and Washington Park (Panel C). Covariates are displayed in alphabetical order on the y-axis and microbial measures are shown on the x-axis. Positive and negative coefficient values are denoted by red and blue shading, respectively. White/pale cells indicate coefficients near or at zero (statistically uninfluential). Cells shaded in black denote no covariate data available for that site.
SRMSEP: standardized root mean squared error of prediction.

qPCR-based bacterial fecal indicator measurements
Water filtration and DNA extraction
Water filtration
(Sivaganesan et al., 2018) composited Ct measurement data generated from analyses of the same multi-target plasmid DNA standards used for the Entero1a and GenBac3 assays (Sivaganesan et al., 2018) using a prototype of the automated Excel analysis workbook presented by Lane et al.
Adjustments from the Sketa22 assay were further used in the E. coli workbook to adjust for DNA recovery in the test sample extracts as described by Aw et al. (2019). Enterococci CSE estimates were converted in the Method 1611.1/1609.1 workbook to calibrator cell equivalents (CCE) for comparisons with published EPA BAVs.

To assess covariate influence, a single LARS-lasso model was generated for each site and response variable combination using all available measurements for each data set. From this model output, only the regression coefficients were used; models were not used to evaluate predictive performance. For each site, regression coefficients for each microbial indicator and covariate combination (7 microbial indicators, 16 covariates) were displayed as heatmaps ("image" function in base R with a color gradient created via the "RColorBrewer" package). The overall influence of each covariate was then evaluated across all microbial indicator and site combinations (n = 21) based on the following metrics: (1) sum of regression coefficients (absolute values used); (2) frequency of non-zero regression coefficients; and (3) Total Score (sum of metrics 1 and 2).

Discharge from the Cuyahoga River at Edgewater Beach was the only covariate with regression coefficients not equal to zero regardless of microbial indicator model (this covariate was only available for Edgewater Beach). The somatic and F+ coliphage models at Washington Park were the only instances where all covariate regression coefficients were equal to zero. In contrast, the enterococci qPCR model for Edgewater Beach was the only occurrence where all covariates yielded regression coefficients not equal to zero. The aggregate influence of each covariate across all models is summarized in Table 4. Of the covariates available across all three sites, rainfall exerted the most influence, with non-zero regression coefficient values in 52% (11 of 21) of models. Bird abundance, wave height, and wind speed/direction were the next most influential covariates across all microbial indicator models (Total Score ≥ 1.53). Five covariates exerted minimal influence on predictive models (Total Score ≤ 0.74), with pH contributing the lowest (Total Score = 0.29).

Table 1
Composite qPCR standard curve performance metrics.
Non-detects were set to ½ the detection limit (shown at the bottom of the table) for calculations.

Table 3
Summed Coefficients are the sum of the absolute values of LARS regression coefficients in each model. Proportion Occurrence denotes the proportion of 21 models where a regression coefficient ≠ 0. Total Score is the sum of 'summed coefficients' and 'proportion occurrence'.
*Values calculated from seven models, not 21, as this covariate was only measured at Edgewater Beach.