Relationship between SARS-CoV-2 in wastewater and clinical data from five wastewater sheds

The Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has resulted in a global pandemic starting in 2019 with nearly 500 million confirmed cases as of April 2022. Infection with SARS-CoV-2 is accompanied by shedding of virus in stool, and its presence in wastewater samples has been documented globally. Therefore, monitoring of SARS-CoV-2 in wastewater offers a promising approach to assess the pandemic situation covering pre-symptomatic and asymptomatic cases in areas with limited clinical testing. In this study, the presence of SARS-CoV-2 RNA in wastewater from five wastewater resource recovery facilities (WRRFs), located in two adjacent counties, was investigated and compared with the number of clinical COVID-19 cases during a 2020-2021 outbreak in United States. Statistical correlation analyses of SARS-CoV-2 viral abundance in wastewater and COVID-19 daily vs weekly clinical cases was performed. While a weak correlation on a daily basis was observed, this correlation improved when weekly clinical case data were applied. The viral fecal indicator Pepper Mild Mottle Virus (PMMoV) was furthermore used to assess the effects of normalization and the impact of dilution due to infiltration in the wastewater sheds. Normalization did not improve the correlations with clinical data. However, PMMoV provided important information about infiltration and presence of industrial wastewater discharge in the wastewater sheds. This study showed the utility of WBE to assist in public health responses to COVID-19, emphasizing that routine monitoring of large WRRFs could provide sufficient information for large-scale dynamics.


Introduction
The novel coronavirus SARS-CoV-2 (Severe Acute Respiratory Syndrome-Corona Virus-2) has led to the spread of the COVID-19 worldwide with nearly 500 million confirmed cases and over 6 million deaths, as of the beginning of April 2022 ( World Health Organization, 2022 ). The main reasons for the wide spread of the initial outbreak ( Alpha strain) are the long incubation period (up to 14 days; median: 4-5 days) and simultaneous viral shedding and transmission by asymptomatic infected individuals precluded from clinical testing data Xu et al., 2020 ). Wastewater-based epidemiology (WBE) has pre-COVID-19 been utilized as a successful surveilling and early warning tool to monitor poliovirus and SARS-CoV-1 ( Jafferali et al., 2020 ;Michael-Kordatou et al., 2020 ) in the wastewater shed (catchment area popula-tion of wastewater resource recovery facilities, WRRFs) ( Xagoraraki and O'Brien, 2020 ). Studies over the last decades have shown that sewage monitoring of pathogens can be effectively used to determine the presence and relative abundance of the pathogen within a community ( Larsen and Wigginton, 2020 ;Peccia et al., 2020 ) thus informing public health interventions ( Asghar et al., 2014 ). In addition, sewage, or wastewater monitoring can help ascertain whether the transmission of a pathogen (e.g., a virus) is declining or increasing ( Larsen and Wigginton, 2020 ). Specifically, wastewater surveillance has been used to monitor both the prevalence and severity of outbreaks by analyzing viral RNA in wastewater systems throughout a variety of community types ( D'Aoust et al., 2020 ;Jafferali et al., 2020 ).
Wastewater surveillance of viral RNA and the trends, also known as wastewater-based epidemiology (WBE), provides unique insights not af-  forded by clinical testing alone. Due to a combination of limited equipment or capabilities in addition to a high level of an infected population being asymptomatic ( Gandhi et al., 2020 ), reported numbers of infected individuals may be below the true infected count ( Lu et al., 2020b ). WBE can therefore provide complementary information as it is not procedure-intensive and also can detect RNA from asymptomatic individuals ( Michael-Kordatou et al., 2020 ;Wu et al., 2020b ). An additional benefit is that WBE is unbiased as it is centered around a community or select population, rather than particular individuals, and there is no discrepancy for those with underlying or pre-existing conditions and includes also those individuals not reporting their symptoms or seeking healthcare ( Wu et al., 2020b ). WBE can detect low viral concentrations in large wastewater volumes, although this can depend on environmental conditions, and sensitivity of the analytical procedures ( Lu et al., 2020a ;Wu et al., 2020a ). As domestic wastewater is prone to the impact of dilution (e.g., via rainfall, snow melt, stormwater, industrial wastewater), fecal indicator viruses such as the Pepper Mild Mottle Virus (PMMoV) are used to normalize analytical results from wastewater samples ( Medema et al., 2020 ). Currently, the U.S. Centers for Disease Control and Prevention (CDC) utilizes PMMoV as one of its two primary fecal indicators, specifically for the normalization of SARS-CoV-2 levels in wastewater ( Centers for Disease Control and Prevention (CDC), 2022 ). PMMoV is one of the most abundant viruses in the human gut ( Kitajima et al., 2018 ;Rosario et al., 2009 ). Therefore, it has often been used by researchers to determine the amount of fecal solids in a sample ( D'Aoust et al., 2020 ) and can be quantified and used to adjust reported SARS-CoV-2 values to account for the relative extent at which the sample has been diluted Medema et al., 2020 ). Additionally, PMMoV is found to be stable in wastewater and shows low seasonal variations ( Kitajima et al., 2018 ) Another benefit is that PMMoV, despite possessing slight diet-based variability, is present in the vast majority of human fecal samples ( Hsu et al., 2022 ;Kitajima et al., 2018 ).
The first objective of this study was to evaluate the feasibility of selected methodologies for wastewater data analysis and to enhance analytical laboratory procedures for wastewater analysis of viral RNA. The second objective was to evaluate trends of SARS-CoV-2 abundance compared to observed clinical surveillance data for five differently-sized wastewater sheds serving the five WRRFs. Previous studies have indicated that correlations are often observed as trends, being compared on a weekly basis ( Stadler et al., 2020 ;Weidhaas et al., 2021 ). However, some studies have had success in generating correlations utilizing daily clinical cases with frequent wastewater testing, albeit often with a slight time lag ( Feng et al., 2021 ;Karthikeyan et al., 2021b ). Therefore, this study investigates whether there is an improvement in correlation when shifting from a data-point based comparison to a weekly analysis. The third objective of this study was to investigate the benefit of data normalization with PMMoV in obtaining improved correlations between wastewater data and clinical viral cases of SARS-CoV-2.

Materials and Methods
Wastewater samples were collected and analyzed from five WRRFs, located in two adjacent counties in the same geographical region of an urban nature. Samples were collected during the period of October 15, 2020 to April 15, 2021 by operators at the WRRFs ( Table 1 ).
Samples were collected primarily as flow-dependent composite samples (24-hr) collected in volumes of either 400-500 mL or 900-1000 mL. All samples were stored at 4°C during and post-sampling and transported to the UMD laboratory on ice for processing. and often processed immediately. Portions of the samples were transferred to two 50 mL Falcon conical centrifuge tubes (Corning, Corning, NY, USA) at a fill volume of 45 mL per tube after shaking. In the event that samples could not be processed immediately, samples were stored at 4°C for up to an additional 48 hours.

Sample concentration and quantification
All samples were concentrated from 90 mL (as two 45 mL aliquots) to approximately 1 mL using the following method, as initially determined by Kaya et al. (2022) as the preferred procedure.

Centrifugation
The 50 mL Falcon conical tubes were centrifuged in a benchtop centrifuge (Allegra X-22, Beckman-Coulter, Brea, CA, USA) with a swingingbucket rotor, at 3,400 G for 20 min to remove the solid fraction and large particles. The supernatant was transferred to new 50 mL Falcon centrifuge tubes and apportioned over three centrifugal cycles into 15 mL Amicon Ultra-15 Centrifugal Filter Devices with a cut-off of 100 kDa (Millipore, Amsterdam, the Netherlands) for 15 min at 3,400 G. If the liquid did not pass through the device during the allotted runtime, additional centrifugation time was provided. Additionally, if a sample would have a higher solids content (a rare occurrence), the sample would have to undergo further centrifugation as to avoid noted inhibition of the RT-PCR signal.

Concentration
Once centrifugation yielded a total residual liquid (two 45 mL aliquots combined) of less than 1.5 mL, the duplicate samples were combined, and the concentrated sample was adjusted to 1 mL with RNasefree water. After rinsing and agitation, the sample was transferred to sterile 1.5 mL RNase-free Microfuge tubes (Invitrogen, Carlsbad, CA, USA) for subsequent RNA extraction.

RNA extraction
Quick-RNA TM Miniprep kits (Zymo Research, Irvine, CA, USA) were used to extract RNA. Specific modification included utilizing 0.3 mL of each sample and 0.6 mL of Lysis buffer. As per the protocol, final RNA volume for each sample was 0.1 mL. Reverse transcription was performed as outlined in the publication by Kaya et al. (2022) , section 2.8. A noted exception was the use of a smaller volume of reverse transcriptase (1 μL rather than 1.25 μL), in order to preserve stock (see Table 2 for full details). This modification was made for most samples and no discrepancies were observed by this reduction.

RT-qPCR
Protocols were predicated upon CDC recommendations ( Corman et al., 2020 ;Lu et al., 2020c ). SARS-CoV-2 gene (N) plasmids (2019-nCoV RUO Kit -Cat. No:10006625, IDT) were linearized via manufacturer's protocol prior to use as an RT-PCR standard. Standard curves for assays were prepared by performing serial dilutions (10-fold) Reaction mix 2 added and samples incubated at 42°C for 1hr. Enzyme inactivated at 70°C for 20 min and kept at 4°C until tubes removed from machine. * Some October and early November samples used 5 μL of buffer and 1.25 μL of transcriptase. in order to verify quantitative results. Additional specifications used are outlined in Kaya, et al. (2022) . RT-PCR was conducted using a CFX Connect Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA), using standard and calibration curves to determine the viral concentration. TaqMan® (Promega, Madison, WI, USA) Master primer mix was used along with specific primers and probes ( Table 3 ). Positive control was established via standard curves run on each plate, when performing RT-PCR. Specifically, linearized standards of the relevant plasmid (e.g., N or PMMoV) were added in triplicate to each well plate. N-plasmids, used for N1 and N2 analysis, ranged from 10 1 -10 5 gc/L in concentration. De-ionized water was used as a method/extraction blank and RNAse-free water was used as negative controls in each PCR well. These were done either in duplicate or triplicate. Both the N1 and N2 primer mixes were diluted to preserve stock, and it was confirmed that there was no loss, nor any significant change compared to undiluted stock. The initial viral load (RNA) in the original wastewater sample was calculated. Data was measured via RT-PCR utilizing either the N1 or N2 gene, with both genes being analyzed for comparison.

Process control -Bovine respiratory syncytial virus (BRSV)
Bovine Respiratory Syncytial Virus (BRSV) (Inforce 3 Cattle Vaccine TM , Zoetis, Parsippany, NJ, USA) was used as a surrogate (process control), as it was used to determine the level of recovery. BRSV was chosen due to its inherent stability ( Bivins et al., 2021 b) and availability. Additionally, BRSV was found to functional well as a surrogate due to its morphological similarities to the SARS-CoV-2 virus and its documented use as an extraction control for drinking and wastewater samples (  For each 45 mL aliquot of sample, a 1:1000 ratio ( 45 μL:45 mL ) of BRSV stock concentration (median concentration of 10 11 ± 10 1 gc/L) was added prior to all sample processing, including centrifugation. The reported recovery determined via BRSV analysis was approximately 8.5% and most values ranged between 0.5-20% consistent with findings in similar studies ( Alygizakis et al., 2020 ;Bivins et al., 2021 b;Prado et al., 2021 ), Several studies ( Li et al., 2021 ;Maestre et al., 2021 ) reported difficulty in utilizing BRSV as a normalization approach and obtaining high recoveries. However, in this study it was a beneficial surrogate to determine if a WRRF sample was negative (i.e., below the threshold), that it was not due to a procedural error, but rather either that it was a true value or that the low value was due to an error with the N-gene primer mix or standards.

Normalization -Pepper Mild Mottle Virus (PMMoV)
PMMoV RNA was quantified via RT-PCR in an identical manner to SARS-CoV-2 and BRSV primers. The specific primers used to detect the Pepper Mild Mottle Virus (PMMoV) can be seen in Table 3 . A series of 10-fold dilutions were used to establish a proper calibration curve (5-point) and all samples were analyzed at least in duplicate, utilizing at least one negative and positive control. PMMoV values ranged from 2 × 10 4 gc/mL to 4 × 10 7 gc/mL but typically were in the 10 5 -10 6 gc/mL range. The average Ct value was 27.54 ± 2.01 min.
Start ing qt y . determined by PCR ( gc ∕ μl ) * Vol . of cDNA used for PCR rxn ( uL ) Total PCR vol . in well * total cDNA vol . ( mL ) vol . of RNA used ( mL ) * final vol . of all RNA ext ract ed vol . of concent rat e used in RNA ext ract ion * total vol . of concent rat e total initial sample vol .
* 1000 uL mL 1000 mL L = gc ∕L (1) The results indicated that the presence of SARS-CoV-2 varied between wastewater sheds but the variation was not statistically significant with standard deviations ranging from 2.2 -7.2 × 10 6 gc/L for N1 values and from 0.38 -1.5 × 10 6 gc/L for N2 values. WRRF3 had the highest average value over the six-month monitoring period (4.4-5.2 × 10 6 gc/L and 8.8-9.9 × 10 5 gc/L for N1 and N2 values, respectively), as well as the greatest standard deviation, but fell within the margin of error. WRRF5 had the lowest average values (1.4-2.3 × 10 6 gc/L and 3.2-4.5 × 10 5 gc/L for N1 and N2 values, respectively), and also the lowest maximum values and standard deviations for both N1 and N2. This was in addition to WRRF5 having the greatest incidence of values falling below the threshold. These lower values of WRRF5 may be due to flow-related dilution rather than demographic or clinical reasons. Overall, there was a strong correlation between the reported N1 and N2 values (see SI for full comparison), and the average values over time for N1 values with and without normalization ( Fig. 2 ).

Correlations between WRRF data and Covid-19 cases
The results obtained from each WRRF were compared with the reported Covid-19 cases for all zip codes contained within the service area of the individual wastewater sheds. Correlations were observed both visually (graph-based overlaps) and statistically. In the former instance, the number of new cases reported over time within a set of zip codes was overlaid with the measured wastewater data for that respective WRRF ( Fig. 3 ). Overall, there was an insignificant improvement in correlations between the detected N1 gene SARS-CoV-2 numbers and clinical case data marked by the use of PMMoV normalization ( Fig. 3 ) for WRRF 2-5. However, WRRF1 showed the greatest improvement by normalization, which may be due to a much smaller population size and geographical area of WRRF1 in comparison to the other plants, thus being prone, perhaps, to greater volatility. The normalization with PMMoV corrected unusually large peaks of SARS-CoV-2 presence during the monitoring period and provided trends that were not observed before normalization (for more detail see SI.) Spearman's rank correlation analysis was used to evaluate the temporal relationship between the viral data in wastewater and the new clinical case numbers in the sewer sheds (all p < 0.05). To evaluate the temporal effect, the viral load was paired with the daily and the  weekly averaged new cases, respectively, for the sewer sheds to calculate Spearman's rank correlation coefficients (Spearman's Rho). As data points from the wastewater were collected, on average once or twice per week, it was expected that there would be little correlation with these values and the clinical data, which was collected daily. However, it was uncertain, despite the expected poor correlation, if the use of PMMoV normalization would nonetheless have a significant impact on these results.
With the exception of WRRF1 (which had a slightly negative correlation) Spearman rank coefficients ranged from 0.137 to 0.177. This confirmed the presence of a poor correlation between wastewater and clin-ical data values. In agreement with other studies ( Al-Faliti et al., 2022 ;Fitzgerald et al., 2021 ;Rusiñol et al., 2021 ), the large size WRRFs have a better correlation (Spearman's Rho = 0.64) between the viral loads and the daily clinical cases than the medium and small size WRRFs. The use of PMMoV normalization decreased Spearman coefficients for WRRFs 2, 3 and 4 but increases were observed for WRRF1 and WRRF5. WRRF1, as noted previously, had the greatest fluctuations in values of all WRRFs due to its small population size thus many values fell below the detection limit. WRRF5 had the lowest suspended solids content as well as the greatest flow and population size/area. For WRRF5, the Spearman rank coefficient improved from 0.158 to 0.220. Combined average val- 29.5 * * 140.7 1.46 × 10 9 * Values reported are for Raw Influent (source for samples). * * These values reported are flow + recycle.
ues from all five WRRFs did not statistically change with the PMMoV normalization.
When the correlation was performed based on the total number of clinical cases reported in the week for which the sampling was done at the WRRFs was considered, the correlations improved for WRRFs 1-4. The Spearman rank coefficients ranged from 0.434 to 0.599 with WRRFs 2, 3 and 4 having the greatest similarity. WRRF5, possibly due to it having the greatest dilution, was the only WRRF positively impacted by the use of PMMoV normalization. WRRF3 and WRRF4 had no notable change, while WRRF1 and WRRF2 had negative impacts from the use of normalization. These findings demonstrate that wastewater can be used to monitor dynamic trends of COVID-19 infections in a community, regardless of the population size and magnitude of confirmed cases.
Many studies have claimed ( Feng et al., 2021 ;McLerran, 2022 ;Stadler et al., 2020 ) that normalization based on the presence of PM-MoV does not often significantly increase correlations significantly and can sometimes even make them worse. However, PMMoV and similar viral RNA are used to normalize wastewater data in an effort to validate the data obtained ( Li et al., 2022a ), and has an additional benefit, as it can also be used as an additional recovery tool ( Feng et al., 2021 ). Therefore, use of PMMoV should be performed together with analysis of additional parameters such as rain fall data to assess the risk of infiltration and the presence of industrial wastewater contributions, where PMMoV is unlikely to be present.

The effects of flow rate, suspended solids, and dilution
To better understand the potential benefit of PMMoV, the effects of flow rate, suspended solids and dilution were evaluated further ( Table 4 ). WRRF1 covered a smaller geographical area and also represented a comparatively smaller population thus it was more prone to fluctuation. At times, samples from this WRRF were collected on different dates than the other WRRFs due to the geographic location. Additionally, the flow rate for WRRF1 was approximately one order of magnitude lower than the other four WRRFs, but its population served was as much as fifty times less than the other WRRFs, thereby having the possibility of higher dilution. For WRRFs 1, 2 and 5, the reported flow also included recycling of the influent prior to the measurement. Average suspended solids values for each WRRF were also assessed and WRRF4 and WRRF5 had the lowest values. This also corresponded with the measured PMMoV values, as WRRF4 and WRRF5 showed the lowest PMMoV values ( Table 4 ).
The use of the PMMoV and suspended solids data showed that WRRF5 samples were more diluted than the other large WRRFs and therefore it is likely that more values fell below the detection threshold. Although the solids content was similar for WRRF4 and WRRF5, the increased flow for WRRF5 also impacted the SARS-CoV-2 dilution. Another factor impacting dilutions, was a period in mid-February with prolonged snow accumulation followed by melting. This led to dilution, for both the N1 and N2 genes, where many values on Feb. 15 th and 18 th fell below or close to the threshold ( Fig. 2 ). The usefulness of the PMMoV normalization can be seen in Fig. 2 for dates in early to mid-February, during which there was frequent snow and inclement weathers. The data illustrate how normalization can correct for these dilutions. The strong correlation between PMMoV values and suspended solids is well studied and although prone to variation and impact by non-fecal solids, there is often a strong relationship between the two ( Kim et al., 2022 ). The PMMoV data obtained in this study was compared to the suspended solids for each WRRF of the studied time period ( Figure 4 ).
Although the relationship between PMMoV and the fecal solids or "fecal strength " is well documented ( Graham et al., 2020 ;Kim et al., 2022 ;Wolfe et al., 2021 ), there are few papers specifically correlating the suspended solids of WRRFs to PMMoV levels, as is indicated in Fig. 4 . This relationship, nevertheless, was not always a good tool for normalization as it only sometimes improved the Spearman Rank correlation (see above). This is concurrent with discrepancies in the literature. For example, Zhan et al. (2022) noted an increased correlation between wastewater and clinical cases through the use of PMMoV normalization (r s = 0.35 → r s = 0.73). They noted that this success occurred when clinical cases where low and the effects of normalization were more pronounced ( Zhan et al., 2022 ). In contrast, Feng et al. (2021) found PMMoV normalization to either cause nominal improvement or decrease correlations. They posit this is due to the complex nature of wastewater and differences in how viruses interact in sewer environments. Most similar to this study, Duvallet et al. (2022) found that in some instances PMMoV increased correlation, while in others, it decreased correlations or had no effect. They attribute this to the variability of both the wastewater matrices and of the treatment plant design ( Duvallet et al., 2022 ). Padilla-Reyes et al. (2022) noted that irrespective of normalization, weekly or biweekly correlations were stronger than daily correlation but that correlations also often had to be adjusted for time discrepancies (lags) between peaks in WW versus clin- ical detection. The data presented in this WRRF study did not have correlations that could be improved through lag time or other timeshift adjustments, but more frequent sampling could lead to that being observed.

Conclusions
Detection of SARS-CoV-2 using the genes N1 and N2 overall showed a good correlation between the results, but in some instances only one of the two genes was detected, which could be due to degradation of the target RNA. N1 values overall demonstrated a better fit and correlated better than N2 values with the clinical data based on zip code. This analysis did not find a strong association between the viral loads in the wastewater and the daily clinical case numbers from the sewer sheds. However, a correlation was observed between wastewater data and weekly clinical cases for all the WRRFs, which is likely due to the delays in analyses and reporting of clinical data. This is in agreement with many studies, namely, that a weekly average or similar analysis provides a stronger correlation ( Fahrenfeld et al., 2022 ;Halwatura et al., 2022 ;Li et al., 2022b ;Shah et al., 2022 ). This study was also able to successfully use the PMMoV to normalize the data but did not lead to consistent increases in correlation between the wastewater data and the clinical data. The data that were normalized with PMMoV, however, did show a strong correlation with suspended solids levels for the wastewater influents thus supporting the accuracy of normalization to determine the proportion of fecal solids in a particular sample.
Variations in average numbers of SARS-CoV-2 were observed between WRRFs but these were not statistically significant. This may be due to the variability of wastewater and the overall large population sizes of the catchment areas. Improved understanding of these factors affecting the SARS-CoV-2 RNA concentration is needed to improve such prediction models. Overall, the findings of the analysis demonstrated that wastewater can be used to monitor the dynamic trend of COVID-19 infections in a community. Although correlations and predictions can strongly vary, WBE can be used as an early warning to signal, to determine rising cases in a given area, especially if these individuals are not clinically tested. This study, like others, Karthikeyan et al., 2021a ;Randazzo et al., 2020 ). finds that the use of wastewater analysis could be a cost-effective and sensitive strategy to successfully detect community transmission of the virus earlier than clinical reporting.

Supplementary data
Further data to support the results and conclusion driven in this paper is available in the Supplementary Information file.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.