Evaluation of a Wastewater-Based Epidemiological Approach to Estimate the Prevalence of SARS-CoV-2 Infections and the Detection of Viral Variants in Disparate Oregon Communities at City and Neighborhood Scales

Background: Positive correlations have been reported between wastewater SARS-CoV-2 concentrations and a community’s burden of infection, disease or both. However, previous studies mostly compared wastewater to clinical case counts or nonrepresentative convenience samples, limiting their quantitative potential. Objectives: This study examined whether wastewater SARS-CoV-2 concentrations could provide better estimations for SARS-CoV-2 community prevalence than reported cases of COVID-19. In addition, this study tested whether wastewater-based epidemiology methods could identify neighborhood-level COVID-19 hotspots and SARS-CoV-2 variants. Methods: Community SARS-CoV-2 prevalence was estimated from eight randomized door-to-door nasal swab sampling events in six Oregon communities of disparate size, location, and demography over a 10-month period. Simultaneously, wastewater SARS-CoV-2 concentrations were quantified at each community’s wastewater treatment plant and from 22 Newport, Oregon, neighborhoods. SARS-CoV-2 RNA was sequenced from all positive wastewater and nasal swab samples. Clinically reported case counts were obtained from the Oregon Health Authority. Results: Estimated community SARS-CoV-2 prevalence ranged from 8 to 1,687/10,000 persons. Community wastewater SARS-CoV-2 concentrations ranged from 2.9 to 5.1  log10 gene copies per liter. Wastewater SARS-CoV-2 concentrations were more highly correlated (Pearson’s r=0.96; R2=0.91) with community prevalence than were clinically reported cases of COVID-19 (Pearson’s r=0.85; R2=0.73). Monte Carlo simulations indicated that wastewater SARS-CoV-2 concentrations were significantly better than clinically reported cases at estimating prevalence (p<0.05). In addition, wastewater analyses determined neighborhood-level COVID-19 hot spots and identified SARS-CoV-2 variants (B.1 and B.1.399) at the neighborhood and city scales. 
Discussion: The greater reliability of wastewater SARS-CoV-2 concentrations over clinically reported case counts was likely due to systematic biases that affect reported case counts, including variations in access to testing and underreporting of asymptomatic cases. With these advantages, combined with scalability and low costs, wastewater-based epidemiology can be a key component in public health surveillance of COVID-19 and other communicable infections. https://doi.org/10.1289/EHP10289


Introduction
Wastewater-based epidemiology (WBE) has emerged as an effective and sensitive approach for monitoring COVID-19 presence in a community through the detection of the novel coronavirus (SARS-CoV-2) shed by infected individuals into wastewater. [1][2][3][4][5] Although methods for COVID-19 WBE are still being refined, particularly with respect to optimizing sampling and virus concentration methods, [6][7][8][9] this approach has shown promise, with wastewater SARS-CoV-2 concentration trends mimicking those of clinically reported COVID-19 cases. 5 In addition, in some cases, SARS-CoV-2 has been detected in wastewater prior to the reporting of clinical COVID-19 cases. 5,10 With the ability to noninvasively monitor an entire community with a single wastewater sample from a wastewater treatment plant (WWTP), WBE also has clear advantages in terms of cost (compared with traditional surveillance methods) and in areas where clinical testing is limited or residents are hesitant to participate. [1][2][3][4][5]
What has remained elusive is the quantitative relationship between viral concentrations in wastewater and community infection rates, as well as the representativeness of community viral genotype profiles from wastewater sequencing. These limitations are due to the biological variability of SARS-CoV-2 infections, physical uncertainties of wastewater sampling, and inherent variability in case reporting. Biological variability encompasses latent variation in the magnitude and duration of viral shedding in both symptomatic and asymptomatic infected individuals. [11][12][13] For instance, asymptomatic individuals have viral loads similar to those of symptomatic individuals but tend to have shorter durations of viral shedding into fecal material. 14,15 Physical uncertainties include representativeness of the wastewater samples, virus concentration and extraction methodologies, molecular detection methods and polymerase chain reaction (PCR) inhibition, as well as RNA persistence and decay rates in sewage conveyance systems. 6,8,[16][17][18] In addition, it is uncertain how the integrity of the SARS-CoV-2 virome affects PCR and sequencing efforts, with both intact and degraded SARS-CoV-2 envelopes and RNA being present in the wastewater. [19][20][21] Finally, inherent variability in case reporting results from underreporting of infections owing to limited testing capacity, barriers in access to testing, testing avoidance, self-isolation of individuals with mild symptoms, and widespread asymptomatic transmission of the virus. 22,23
In this paper, the SARS-CoV-2 burdens on communities of diverse size, location, climate, and demography were examined through four lines of evidence: wastewater SARS-CoV-2 concentrations, prevalence data estimated via testing nasal swabs of residents participating in random door-to-door sampling events, 24 clinically reported COVID-19 cases (which were obtained from the Oregon Health Authority and included cases not identified through the door-to-door sampling events), and sequence data obtained from nasal swabs and wastewater samples. From these data sets, wastewater SARS-CoV-2 concentrations and reported COVID-19 cases were each compared with the estimated prevalence of SARS-CoV-2 infection (including both symptomatic and asymptomatic individuals) to assess their accuracy. In addition, the ability of wastewater SARS-CoV-2 concentrations to identify neighborhood-level COVID-19 hotspots, as well as the community SARS-CoV-2 variant profile, was determined.

Community Nasal Swab Sampling and Prevalence Estimates
The prevalence of SARS-CoV-2 in six Oregon communities (some at multiple time points) was estimated using a two-stage cluster sampling scheme as a part of Oregon State University's Team-based Rapid Assessment of community-level Coronavirus Epidemics (TRACE) project. 24 In the first stage, 30 clusters (each comprising one or more census blocks) were randomly selected in each community, with probabilities proportional to the number of housing units (i.e., clusters with a large number of occupied housing units had a higher probability of being selected). Each cluster contained a minimum of 50 occupied housing units. The TRACE project received approval from Oregon State University's institutional review board to conduct this work.
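The first-stage selection with probability proportional to size can be illustrated with a short sketch. It assumes a systematic draw over cumulative housing-unit counts, which is one common way to implement such a design; the selection procedure actually used by the TRACE project is not specified in this section, and the function name and cluster sizes below are hypothetical.

```python
import random

def pps_systematic(sizes, n_clusters, rng):
    """Select cluster indices with probability proportional to size.

    Assumption: systematic PPS over cumulative totals. A cluster larger
    than the sampling step can be selected more than once.
    """
    total = sum(sizes)
    step = total / n_clusters
    start = rng.random() * step  # random start in [0, step)
    selected = []
    index, cumulative = 0, 0.0
    for i in range(n_clusters):
        point = start + i * step
        # Advance to the cluster whose cumulative range contains the point.
        while index < len(sizes) - 1 and cumulative + sizes[index] <= point:
            cumulative += sizes[index]
            index += 1
        selected.append(index)
    return selected

# Hypothetical community of 200 clusters with 50 to 400 occupied housing units.
rng = random.Random(2020)
cluster_sizes = [rng.randint(50, 400) for _ in range(200)]
chosen = pps_systematic(cluster_sizes, 30, rng)
```

Larger clusters span a wider part of the cumulative range, so they are more likely to contain one of the 30 equally spaced selection points.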
Starting in 2021 with the last two community sampling events (in Redmond and Corvallis), results from community wastewater testing were used to inform the first stage of sampling. These samples represent a refinement of the TRACE methodology based on evidence that estimates of SARS-CoV-2 concentration in wastewater correlated well at the micro-sewershed level with prevalence within the community. For these two samples, clusters were first grouped into approximately five strata based on micro-sewershed boundaries. Then wastewater data collected approximately 2-3 wk prior to household sampling were used to allocate the 30 sampled clusters to the strata, using optimal allocation to minimize the anticipated standard error of the prevalence estimate. This standard approach in sampling methodology facilitated oversampling from higher-prevalence areas in a principled, design-based manner and still allowed approximately unbiased estimates of prevalence to be obtained. With this approach, a total of 30 clusters were still selected; however, a set number of them were required to be from each stratum based on wastewater data. Within each stratum, the clusters selected for door-to-door sampling were still chosen with probability proportional to the number of occupied housing units in the cluster.
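The allocation of the 30 clusters across wastewater-defined strata can be sketched as follows. This is only an illustration under stated assumptions: it uses the textbook Neyman allocation (clusters proportional to stratum size times the anticipated standard deviation of a binary outcome), it assumes an anticipated prevalence per stratum has already been derived from the wastewater data, and it rounds with a largest-remainder rule. The exact allocation formula used in the study is not given in the text, and all names and numbers below are hypothetical.

```python
import math

def neyman_allocation(stratum_sizes, anticipated_prevalence, total_clusters=30):
    """Allocate clusters to strata in proportion to N_h * sqrt(p_h * (1 - p_h))."""
    scores = [
        size * math.sqrt(p * (1.0 - p))
        for size, p in zip(stratum_sizes, anticipated_prevalence)
    ]
    total = sum(scores)
    raw = [total_clusters * s / total for s in scores]
    allocation = [math.floor(r) for r in raw]
    # Hand the clusters lost to rounding to the largest fractional remainders.
    leftover = total_clusters - sum(allocation)
    by_remainder = sorted(
        range(len(raw)), key=lambda i: raw[i] - allocation[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        allocation[i] += 1
    return allocation

# Five hypothetical strata of equal size with increasing anticipated prevalence.
sizes = [1000, 1000, 1000, 1000, 1000]
prevalence = [0.001, 0.002, 0.005, 0.01, 0.02]
allocation = neyman_allocation(sizes, prevalence)
```

Strata with a stronger wastewater signal receive more clusters, which is the oversampling of higher-prevalence areas described above; a stratum can receive zero clusters under this simple rounding, which a real design would likely guard against.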
In the second stage, field teams used systematic sampling with a random start (i.e., the housing unit at the northwest corner of the cluster) to identify 12 housing units within each of the 30 clusters. The sampling interval (k) used for this stage was specific to each cluster and was calculated by dividing the total number of housing units in the cluster by 12. Field teams were provided with the sampling interval and starting location in advance of data collection. Teams proceeded through their assigned cluster(s) in a serpentine fashion, adhering to the sampling interval. Teams revisited housing units in which no one was home two additional times before the housing unit was replaced. Housing units in which no one agreed to participate were also replaced.
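The second-stage interval calculation can be illustrated with a short sketch. It assumes housing units are indexed along the serpentine route, with index 0 at the northwest corner, and that a fractional interval is handled by truncating each running position; both are illustrative assumptions rather than documented field rules.

```python
def systematic_sample(n_units, n_target=12, start=0):
    """Return the route positions of the housing units to visit.

    The sampling interval k is the number of housing units in the
    cluster divided by the target number of units (12 in the text).
    """
    k = n_units / n_target
    return [int(start + i * k) % n_units for i in range(n_target)]

# A hypothetical cluster at the stated minimum size of 50 housing units.
positions = systematic_sample(50)
```

Replacement of non-responding housing units, which the field teams performed after two revisits, is not modeled here.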
All residents of the selected households, regardless of age, were invited to participate. After obtaining informed consent, field teams interviewed each participant to collect information including name, date of birth, contact information, symptoms, previous positive test result(s), COVID-19 vaccination status, and demographics. Field teams provided test kits to participants and assisted them in self-collecting a nasal swab specimen. Nasal swab specimens were transported to the laboratory for processing and analysis. Test results were mailed to all participants and available to participants via a secure website if they opted for this method. REDCap software was used to securely collect and track data participation and disseminate results of the COVID-19 tests to the participants. 25 In total, the TRACE project collected 4,136 self-administered nasal swab samples from residents in 2,521 randomly selected Oregon households in Bend, Corvallis, Eugene, Hermiston, Newport, and Redmond from 30 May 2020 to 14 March 2021. An average of 517 ± 78 individuals from 315 ± 43 households (60% ± 14% average household participation rate) participated in each random door-to-door nasal swab sampling event. The prevalence of SARS-CoV-2 within the community at the time of sampling was estimated using one of two approaches based on the number of positive cases detected.
When at least one positive case was detected, a design-weighted approach accounting for the corresponding multistage sampling design (with or without wastewater strata) was used. In this approach, the sampling unit is the housing unit, the sampling weight depends on factors of the design, and the response is the observed positivity among participants in that housing unit. For example, in the original approach without incorporating the wastewater information, the sampling weight of housing unit j in cluster i is shown in Equation 1, where M_0 is the total number of housing units in the population; M_i is the total number of housing units in cluster i; m_i is the number of housing units sampled from cluster i (ideally 12); and h_ij is the number of participants from housing unit j in cluster i.
These weights are used to estimate prevalence using a standard Horvitz-Thompson estimator. 26 When zero positive cases were identified through door-to-door sampling in the whole community, a Bayesian approach was taken that combined the observed data with active case counts within the community at the time of sampling. In particular, a beta-binomial model was used in which the count of positive individuals was assumed to follow a binomial (n, p) distribution, and the prior for p was assumed to be beta (a, b). Values for a and b were obtained via optimization by setting the fifth quantile of the beta distribution equal to the number of cases reported to the Oregon Health Authority in the previous 2 wk divided by the population size, and the 50th quantile equal to the fifth quantile divided by the proportion of cases thought to be symptomatic based on previous data.
A noninformative (uniform [0,1]) prior was also explored, and the approach was not sensitive to the choice of prior. The posterior distribution for p also takes the form of a beta and can be used to estimate prevalence. Imperfect test sensitivity and specificity were accounted for in both approaches. Prevalence estimates were quickly shared with the local public health officials partnering with the TRACE project.
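The conjugate update behind the beta-binomial model can be written in a few lines. The sketch below covers only the posterior calculation for given prior parameters; the quantile-matching optimization used to obtain a and b, and the adjustment for imperfect test sensitivity and specificity, are not reproduced. The prior values and sample size are hypothetical.

```python
def beta_binomial_posterior(a, b, n_tested, n_positive):
    """Posterior beta parameters after observing n_positive out of n_tested."""
    return a + n_positive, b + (n_tested - n_positive)

def beta_mean(a, b):
    """Mean of a beta(a, b) distribution."""
    return a / (a + b)

# Hypothetical informative prior and a sampling event with zero positives.
post_a, post_b = beta_binomial_posterior(2.0, 998.0, n_tested=500, n_positive=0)
informative_estimate = beta_mean(post_a, post_b)

# The same data under a uniform prior, which is beta(1, 1).
flat_a, flat_b = beta_binomial_posterior(1.0, 1.0, n_tested=500, n_positive=0)
flat_estimate = beta_mean(flat_a, flat_b)
```

With zero positives the posterior mean stays above zero in both cases, which is what allows a prevalence estimate to be reported when no infections are found in the sample.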

Clinically Reported COVID-19 Cases
Weekly COVID-19 case data for the participating communities were obtained from ZIP code reported data from the Oregon Health Authority website. 27 For weeks when <10 cases occurred in a given ZIP code, the Oregon Health Authority reported the case count as "1-9 cases." In these instances, to minimize error, a value of 5 cases was substituted in the calculations.

Comparison of Wastewater Concentration Methods
Several wastewater concentration methods were tested to determine which method resulted in the highest viral recovery, as estimated by the maximum mean SARS-CoV-2 signal recovered from replicate composite WWTP samples. The WWTP sample was collected from Newport, Oregon, during the June 2020 sampling event. Two concentration methods, electronegative membrane filtration (EMF) and centrifugal ultrafiltration (CU), were evaluated. In addition, four iterations of EMF with various amendments were tested (EMF Methods A-D). All methods were conducted in triplicate.
For the EMF method, 30 mL of WWTP influent composite sample was mixed gently on a stir plate with a magnetic stir bar. Each of the EMF method iterations differed in the second step, when amendments were made to the sample. In EMF Method A, 0.01 N hydrochloric acid (HCl) was added to titrate the pH below 3.5 and magnesium chloride (MgCl2) was added to make a final concentration of 25 mM. In EMF Method B, 0.01 N HCl was used to titrate the pH below 3.5 but no MgCl2 was added. In EMF Method C, MgCl2 was added for a final concentration of 25 mM, but the sample was not acidified. In EMF Method D, no additions were made.
All samples were gently mixed for an additional 5 min. After mixing, the entire sample was vacuum filtered through a 0.45-µm pore size, 47-mm diameter, mixed cellulose ester electronegative membrane (Catalog no. 7141-104; Whatman). The filter was stabilized in 1 mL of DNA/RNA Shield (Zymo Research, Irvine, CA) and stored at −20°C until RNA extraction and reverse transcriptase droplet digital PCR (RT-ddPCR) could occur.
In the CU method, 40 mL of wastewater was added to a Centricon Plus-70 centrifugal ultrafiltration device (Catalog no. UFC701008; Merck Millipore) with a 10-kDa cutoff. The ultrafiltration device was centrifuged for 20 min at 3,220 × g. The filtrate was discarded, and the concentrate spun out of the ultrafilter back into the concentrate cup at 1,000 × g for 2 min. The concentrate was removed and placed in 1 mL of DNA/RNA Shield.

WWTP Sampling
All WWTP influent samples comprised 24-h time-weighted composites taken prior to primary treatment. The characteristics of each WWTP are given in Table 1. Samples were collected at the time of each random door-to-door nasal swab sampling event, as well as an additional one to three times per week for 6-11 months from April 2020-May 2021.
Twenty-two pump stations serving Newport, Oregon's Vance Avery WWTP sewershed were sampled hourly for 24 h from 20-21 June and 11-12 July 2020 (Figure 1). Some pump stations were sampled twice within a single weekend, for a total of 52 pump station samples. The total population of the Newport sewershed was 10,853, with the populations served by each micro-sewershed ranging from 15 to 9,426 persons. The characteristics of each micro-sewershed and the distribution of samples are given in Table 2.
For both WWTPs and pump stations, the 24-h composites consisted of hourly samples and were kept on ice during sampling. Samples collected on or before 31 July 2020 were frozen in 200-mL aliquots and stored at −80°C for up to 33 d prior to concentration (median 6 d). After thawing in cold water baths, samples were concentrated using electronegative filtration, as previously described. 28 Briefly, all samples collected on or before 10 July 2020 were acidified to a final pH of 3.5 and MgCl2 was added to a final concentration of 25 mM. Samples (30-40 mL) were vacuum filtered through a 0.45-µm pore size, 47-mm diameter mixed cellulose ester electronegative filter (HAWP; Millipore).
Influent samples collected after 10 July 2020 were filtered with no amendments given that preliminary data showed improved viral recovery with unamended wastewater (Figure S1). In addition, influent samples collected after 31 July 2020 were neither frozen nor amended prior to filtration. In these samples, filtration occurred within 8 h of sample collection. Once filtration was complete, the electronegative membranes were placed into 2-mL tubes containing either 0.7-mm garnet or 0.5-mm glass beads, stabilized in 1 mL of DNA/RNA Shield (Zymo Research), and frozen until analysis. Preliminary experiments showed no difference between these homogenization methods (Figure S2). Field blanks of deionized water were processed with every batch of samples.

Molecular Analysis: Nasal Swabs
Participant nasal swab samples were analyzed at the Oregon Veterinary Diagnostic Laboratory using the TaqPath COVID-19 Combo Kit (Applied Biosystems), in accordance with the manufacturer's instructions for use as required by the Emergency Use Authorization under strict biosafety level 2 (BSL2) conditions. Nucleic acid isolation was performed using the MagMax Viral/Pathogen II Nucleic Acid Isolation Kit (Applied Biosystems/ThermoFisher). Briefly, 200 µL of transport medium from the swab sample was added to a single well of a KingFisher Deepwell 96-well plate containing 5 µL of Proteinase K. Each 96-well plate held 94 participant samples and 1 negative control well containing water. The last well was left empty to allow for the positive control to be added during the RT-PCR detection step. After sample addition, nucleic acid magnetic beads were resuspended, and 10 µL was added to 265 µL of binding solution and then added to the wells. MS2 phage control (5 µL) was added to all wells as an extraction control and processed on a KingFisher Flex magnetic particle processor. Purified nucleic acids were eluted in 50 µL of MagMax elution solution. Eluted nucleic acid was stored at −80°C unless RT-PCR was run within 2 h of extraction.
Detection of SARS-CoV-2 viral RNA was performed using the TaqPath RT-PCR COVID-19 Kit on a 7500 Fast Real-Time PCR Instrument (Applied Biosystems). Reactions were run in multiplex with primers and probes specific for three gene sequences specific to SARS-CoV-2: ORF1ab, N Protein, and S Protein. A primer and probe set was included to detect the MS2 phage added during the initial sample processing as an internal control to verify RNA extraction. The thermal protocol for the one-step real-time PCR was as follows: 2 min at 25°C uracil N-glycosylase incubation, 10 min at 53°C reverse transcription, 2 min at 95°C activation, followed by 40 cycles of 3 s at 95°C denaturation and 30 s at 60°C anneal/extension/detection. Results were analyzed using SDS Software (version 1.5.1) and interpreted using COVID-19 Interpretive Software (version 1.2; Applied Biosystems).

Molecular Analysis: Wastewater
Wastewater samples were homogenized with either 0.7-mm garnet beads or 0.5-mm glass beads in DNA/RNA Shield using either a Qiagen TissueLyser (Qiagen Inc.) or a BioSpec Mini-Beadbeater 16 (BioSpec Products, Inc.) for 2 min. Beads and debris were pelleted by centrifugation at 12,000 rcf for 1 min. The lysate was transferred from each tube to a 96-well plate, and 200-400 µL was extracted using the MagMAX Viral/Pathogen kit on a KingFisher Flex automated instrument (ThermoFisher Scientific), as described above. Purified RNA was eluted in 50 µL of MagMAX Elution Solution. Recovery from the RNA extraction step was quantified with a commercial standard (Exact Diagnostics), and extraction blanks were included with every run. RNA was stored at −80°C until analysis.
Two SARS-CoV-2 RNA targets (N1/N2) and an internal control (Human RNase P) were measured via RT-ddPCR using a commercial triplex assay (2019-nCoV CDC ddPCR Triplex Probe Assay; Catalog no. 12008202; Bio-Rad) using the One-Step RT-ddPCR Advanced Kit for Probes and run on a QX-200 ddPCR system (Bio-Rad). The primer and probe sequences were published previously. 29 RT-ddPCR was chosen over RT-qPCR owing to its superior sensitivity, 30,31 robustness against inhibitors, 32 and its wide application in WBE methodologies. 17,33,34 Reactions were partitioned into droplets using an automated droplet generator (ADG). Twenty-two microliter reactions were prepared with 5.5 µL of template RNA, whereas the ADG partitioned only 20 µL, yielding an effective template volume of 5 µL. Each reaction had an average of 12,657 droplets [standard deviation (SD) = 1,783]. Commercially prepared RNA standards and negative controls were included on each extraction plate and ddPCR plate (cat. no. COV019 and COV000; Exact Diagnostics). All samples and controls were analyzed in duplicate.
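The reaction volumes above, and the way droplet counts translate into copies per reaction, can be made concrete with a short sketch. The effective-template calculation follows directly from the volumes in the text. The Poisson conversion is the standard droplet digital PCR calculation, and the 0.85-nL droplet volume is an assumed nominal value; the instrument software performs this step in practice, so the function is illustrative only.

```python
import math

DROPLET_VOLUME_UL = 0.00085  # assumption: nominal droplet volume of 0.85 nL

def effective_template_ul(template_ul=5.5, prepared_ul=22.0, partitioned_ul=20.0):
    """Template volume that actually enters the partitioned reaction."""
    return template_ul * partitioned_ul / prepared_ul

def copies_per_reaction(positive_droplets, total_droplets, partitioned_ul=20.0):
    """Poisson-corrected target copies in the partitioned reaction volume."""
    if positive_droplets >= total_droplets:
        raise ValueError("reaction is saturated; dilute and rerun")
    # Mean copies per droplet, from the fraction of negative droplets.
    copies_per_droplet = -math.log(1.0 - positive_droplets / total_droplets)
    return copies_per_droplet / DROPLET_VOLUME_UL * partitioned_ul

template = effective_template_ul()
# Three positive droplets in a reaction with the reported average droplet count.
at_threshold = copies_per_reaction(3, 12657)
```

Converting copies per reaction into gene copies per liter additionally requires the elution, extraction, and filtered sample volumes, which vary by sample and are not modeled here.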
The one-step thermal cycling conditions were as follows: reverse transcription at 50°C for 60 min; enzyme activation at 95°C for 10 min; 40 cycles of denaturation at 94°C for 30 s followed by annealing/extension at 55°C for 60 s; enzyme inactivation at 98°C for 10 min; and, last, a 4°C hold for droplet stabilization, for a minimum of 30 min to a maximum of overnight. Finally, the amplification in the droplets was determined using the Bio-Rad droplet reader. All assay conditions were performed as specified in the Bio-Rad assay protocol. 35

RT-ddPCR Quality Control
The quality controls for the wastewater RT-ddPCR method included field blanks, extraction blanks, negative control reactions [containing only human genomic DNA (gDNA)], positive control reactions (containing synthetic RNA of SARS-CoV-2 assay targets and human gDNA), and no-template controls (NTCs). The results of all quality control (QC) reactions are summarized in Table S1. Only positive detections which passed quality assurance/QC were included in this study.

RT-ddPCR Data Analysis
Wells with <6,000 droplets were omitted. Sample data with positive reactions were accepted only if the corresponding extraction blank and field blanks, as well as the PCR negative and no-template controls, were all negative for the N1/N2 targets. When averaging sample data across replicates, a value of one-half the sample-specific limit of detection was substituted for nondetects. Reactions were regarded as positive if three or more droplets per well amplified in either target. Droplet clusters were manually called for each target using the QuantaSoft Analysis Pro software (version 1.4; Bio-Rad). All other analyses were conducted in R (version 4.1.0; R Development Core Team) with RStudio Desktop (version 1.4.1717). 36 Spatial graphics were created using the sf, 37 ggplot2, 38 gtable, 39 and ggmap 40 packages.
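The acceptance rules in this section can be summarized as a small decision sketch. The thresholds come from the text; the function names and the use of None to mark a nondetect are hypothetical conventions, and the blank and control checks are assumed to have been applied upstream.

```python
def call_reaction(total_droplets, n1_positive, n2_positive,
                  min_droplets=6000, droplet_threshold=3):
    """Classify one well as 'omitted', 'positive', or 'negative'."""
    if total_droplets < min_droplets:
        return "omitted"
    if n1_positive >= droplet_threshold or n2_positive >= droplet_threshold:
        return "positive"
    return "negative"

def mean_with_nondetects(values, lod):
    """Average replicates, substituting one-half the LOD for nondetects (None)."""
    filled = [lod / 2.0 if value is None else value for value in values]
    return sum(filled) / len(filled)

# Duplicate reactions for one hypothetical sample with a sample-specific LOD of 40.
average = mean_with_nondetects([100.0, None], lod=40.0)
```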
The N1 and N2 markers exhibited generally good agreement in the wastewater samples. The markers were concordant in 91.4% of reactions: 4.1% of reactions (n = 954) had a positive detection in N1 only (using a threshold of three positive droplets per reaction), and 4.5% of reactions were positive in N2 only. Quantitatively, the markers were also reasonably well aligned according to a simple linear model [slope = 0.94, with N2 as the response variable, adjusted R2 = 0.97, root mean square error (RMSE) = 0.58 copies per reaction]. Accordingly, the N1 and N2 data were averaged together for all reported concentration values. For limit of blank (LOB) determination, 104 reactions were run with the triplex assay across three plates using the Exact Diagnostics Negative Control as a template. Due to the nonnormal distribution of the results, the LOB was determined using a nonparametric (rank order) method with a false-positive probability (α) of 0.05.

Table 2. Newport, Oregon, micro-sewershed characteristics: the pump station name, size (area and population) and the percentage of the area that is residential for the 22 Newport micro-sewersheds (neighborhoods) sampled in this study.

The limit of detection (LOD) was predicted for each target (N1 and N2) using Equation 2, where stDev(CopiesPerReaction_LOB) is the standard deviation of the copies per reaction from the LOB assay. The predicted LOD was subsequently tested by running 60 test reactions at concentrations of 4 and 12 copies per reaction of target using the Exact Diagnostics Standard for SARS-CoV-2.
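The nonparametric (rank order) LOB determination can be sketched as follows. This assumes one common rank convention, in which the LOB is the blank result at rank position 0.5 + n(1 − α) with linear interpolation between neighboring ranks; the exact convention used in the study is not stated, so the function is illustrative only and the blank values are placeholders.

```python
import math

def nonparametric_lob(blank_values, alpha=0.05):
    """Rank-order limit of blank from replicate blank reactions."""
    ordered = sorted(blank_values)
    n = len(ordered)
    position = 0.5 + n * (1.0 - alpha)   # 1-based rank position
    lower = min(max(math.floor(position), 1), n)
    upper = min(lower + 1, n)
    fraction = position - math.floor(position)
    # Interpolate between the two neighboring ranked results.
    return ordered[lower - 1] + fraction * (ordered[upper - 1] - ordered[lower - 1])

# Placeholder blank results 1..100, chosen only to make the ranks easy to follow.
lob = nonparametric_lob(list(range(1, 101)))
```

For the 104 blank reactions described above, this convention places the LOB between the 99th and 100th ranked results.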

Bovine Coronavirus Process Recovery Control
To determine the viral recovery efficiency of the wastewater processing method used in this study, a spike-in experiment was performed using bovine coronavirus (BCoV) as a surrogate for SARS-CoV-2. BCoV solution was prepared from freeze-dried Calf Guard cattle vaccine (Zoetis). After rehydrating in 3 mL of sterile diluent provided by the manufacturer, the BCoV solution was divided into 100-µL aliquots and stored at −20°C. To use, the BCoV stock solution aliquot was thawed on ice and vortexed thoroughly; each aliquot was used for a maximum of two freeze-thaw cycles. BCoV stock solution was spiked into wastewater samples at a ratio of 1/1,000 (vol/vol) before the concentration step.
To determine the concentration of the BCoV stock solution, 10 µL of the stock was spiked into 390 µL of phosphate-buffered saline (PBS), and 200 µL of that mixture was extracted following the same protocols used for wastewater sample extractions as described in the main text. One extraction blank, prepared with PBS, was included on each plate as an RNA extraction contamination control. The extracted RNA was then serially diluted (1:10) in nuclease-free water for six dilutions and run in duplicate using a previously established BCoV assay, following the one-step RT-ddPCR procedure. 41 Stock concentration of BCoV was approximately 230,000 gene copies (gc)/µL.
Process efficiency (i.e., viral recovery) was calculated by dividing the final quantity of BCoV measured in wastewater samples by the quantity of BCoV spiked to each wastewater sample before concentration (Equation 3). Nonspiked wastewater samples were also quantified for BCoV to assess background concentrations.
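The recovery calculation is a simple ratio, and a short sketch makes the spike arithmetic explicit. The stock concentration and spike ratio are taken from the text; the 30-mL sample volume and the measured quantity are hypothetical values chosen for illustration.

```python
def spiked_copies(stock_gc_per_ul, sample_volume_ml, spike_ratio=1.0 / 1000.0):
    """Gene copies of BCoV added to a sample at the given vol/vol ratio."""
    spike_volume_ul = sample_volume_ml * 1000.0 * spike_ratio
    return stock_gc_per_ul * spike_volume_ul

def process_efficiency(measured_copies, spiked):
    """Fraction of the spiked BCoV recovered after processing."""
    return measured_copies / spiked

# A 30-mL sample spiked 1/1,000 (vol/vol) receives 30 µL of stock.
spiked = spiked_copies(230_000, sample_volume_ml=30.0)
# Hypothetical measured quantity, in gene copies, after processing.
efficiency = process_efficiency(3.933e6, spiked)
```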

cDNA Library Preparation and SARS-CoV-2 Sequencing
For cDNA synthesis, 11 µL of lysate from positive wastewater or participant samples was used for single-strand cDNA synthesis using the Thermo Superscript IV kit with the following modifications: No host gDNA/RNA removal steps were performed, no RNase H step was performed, and the reverse transcriptase incubation step at 50°C was increased from 10 min to 30 min.
The cDNA was used for amplification and sequencing, using the Swift Amplicon SARS-CoV-2 Panel (AL-COV48) together with Swift Amplicon Combinatorial Dual Indexed Adapters (AL-S1A96, AL-S1BA96, AL-S2A96, AL-S2B96). The Swift Biosciences protocol was followed, except that, after optimization experiments, the volume of the G1 reagent was reduced to 25% of the recommended level. This resulted in an increase in read coverage for wastewater samples of 2- to 8-fold for the experiments described in this study (with the release of version 2 of the Swift primer set, the volume of this reagent was no longer reduced). Individual libraries (30-96 samples) were quantified by fluorescence, normalized, and pooled for 2 × 150 bp sequencing on a lane of an Illumina HiSeq3000 sequencer.
Except for some initial sequencing experiments, libraries from individuals were prepared on different days and sequenced in different lanes than samples from wastewater to reduce the possibility of contamination of sequences from low-titer wastewater samples with sequences from high-titer individual samples. Pooled libraries from wastewater samples were often run on two to three lanes of the HiSeq3000 to increase read depth.
Wastewater sequencing on the HiSeq3000 produced a median percentage of reads mapped of 0.86% compared with a median of 21.7% for nasal swab samples. In general, wastewater samples with an RNA concentration >4.0 log10 gene copies of N1/N2 per liter of wastewater (gc/L) reliably produced usable amounts of sequence. Samples with lower RNA concentrations produced variable amounts of sequence, and those <3.0 log10 gc/L were routinely unsuccessful.

Multi-Locus Sequence Typing
After demultiplexing, Illumina primer sequences were trimmed using BBDuk (v.38.96), sequences were aligned to the reference sequence (NC_045512.2; Wuhan-Hu-1) using BWA-MEM (v.0.7.17-r1188), and the SARS-CoV-2 primers were removed using the Swift Biosciences Primerclip package. Genome Analysis Toolkit software (GATK) (v.4.2.0.0) was used to identify variants compared with the reference sequence and to count the numbers of reference and variant reads at each single nucleotide polymorphism (SNP) site. The ploidy was set to 4 and the down-sampling function was not employed. By not using the down-sampling function, all sequence reads that passed standard QC parameters were used in making variant calls. When required, the Integrative Genomics Viewer (IGV) was used to manually inspect sequence alignments and variant calls.
To identify viral genotypes represented in wastewater RNA by multi-locus sequence typing, sequences from individual samples from Oregon were used to define multi-locus genotypes as a set of polymorphic sites that were unique to each variant and were not shared with any other variant known at the time (Table S2).
To estimate the fraction of each variant present in the wastewater RNA sequences, the numbers of reads supporting each variant were summed across all variant-specific SNP sites and divided by the total number of reads spanning those SNP sites. At SNP sites where the total number of reads was >100, both the variant read number and total read number were scaled down to a total read number of 100 prior to the estimation calculation. At SNP sites with ≤100 total reads, the actual number of reads was used, thus decreasing the weight of those counts proportionally to the read coverage.
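The read-capping step can be illustrated with a short sketch: sites with more than 100 total reads are scaled down to 100, and lower-coverage sites keep their raw counts, so that no single deeply sequenced site dominates the estimate. The function name and the input layout, a list of (variant reads, total reads) pairs for one variant, are hypothetical; the study's own implementation was written in R.

```python
def variant_fraction(snp_sites, cap=100):
    """Estimate one variant's fraction from its variant-specific SNP sites.

    snp_sites: iterable of (variant_reads, total_reads) pairs.
    """
    variant_sum = 0.0
    total_sum = 0.0
    for variant_reads, total_reads in snp_sites:
        if total_reads == 0:
            continue  # a site with no coverage carries no information
        if total_reads > cap:
            # Scale both counts so the site contributes exactly `cap` reads.
            variant_reads = variant_reads * cap / total_reads
            total_reads = cap
        variant_sum += variant_reads
        total_sum += total_reads
    return variant_sum / total_sum if total_sum else 0.0

# One deep site (500 of 1,000 reads) and one shallow site (10 of 20 reads).
fraction = variant_fraction([(500, 1000), (10, 20)])
```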
After this estimation was conducted for each variant, the fractions were summed for all identified variants. If the sum was >0.7, it was assumed that the identified variants comprised all of the RNA molecules present and each fraction was divided by the total to normalize the total to 1.0. If the initial sum was <0.7, it was assumed that the difference was comprised of RNA molecules from unidentified variants, and the fraction of RNA attributable to the unidentified variants was set to the difference between the total and 1.0. If the sum was >1.5, the SNP data were manually reviewed to identify and remove artifacts (miscalled 1-bp indels) and remove common SNPs that were shared between the variants present to prevent double-counting. The custom software to conduct these calculations was implemented in R and packaged into a SnakeMake pipeline. 42 All individual sequences were deposited in the data sharing platform Global Initiative on Sharing All Influenza Data (GISAID; see Table S3 for accession numbers) and all wastewater sequences were deposited in NCBI's Sequence Read Archive, under BioProject PRJNA719837. 43,44
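The normalization rules can be expressed as a small decision function. The 0.7 and 1.5 thresholds are from the text; how a sum of exactly 0.7 is handled is not stated, so treating it as the unidentified-variant case is an assumption, as are the function name and the use of an exception to signal manual review.

```python
def normalize_fractions(fractions, low=0.7, high=1.5):
    """Apply the sum-based normalization rules to per-variant fractions.

    fractions: dict mapping variant name to its estimated fraction.
    """
    total = sum(fractions.values())
    if total > high:
        # The text calls for manual review of artifacts and shared SNPs.
        raise ValueError("sum exceeds 1.5; review SNP data manually")
    if total > low:
        # Identified variants are assumed to account for all RNA present.
        return {name: value / total for name, value in fractions.items()}
    # Otherwise the shortfall is attributed to unidentified variants.
    result = dict(fractions)
    result["unidentified"] = 1.0 - total
    return result

# Hypothetical fraction estimates for the two variants named in the abstract.
normalized = normalize_fractions({"B.1": 0.6, "B.1.399": 0.3})
partial = normalize_fractions({"B.1": 0.5})
```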

Monte Carlo Simulations
To compare the accuracy of SARS-CoV-2 wastewater concentrations and clinically reported COVID-19 cases as estimators of community prevalence, Monte Carlo simulations were performed to account for the uncertainty in the point estimates for each sampling event. Clinically reported COVID-19 cases included cases identified through the TRACE random door-to-door sampling events as well as cases reported through standard health surveillance efforts by the Oregon Health Authority. The Monte Carlo simulations accounted for the uncertainty in the point estimates for each sampling event with the underlying assumption that the community prevalence based off the TRACE random door-todoor sampling events were the closest to the ground truth.
For each simulation, a new wastewater concentration (log10 gc/L) was redrawn for each community sample from a Gaussian distribution with mean equal to the point estimate and standard deviation equal to the standard error of the point estimate. Similarly, a new prevalence was redrawn for each community from a method-specific distribution: a truncated Gaussian distribution with mean equal to the point estimate, standard deviation equal to the standard error of the point estimate, and lower truncation bound equal to zero when the design-based estimator was used; or a Beta distribution with shape and scale parameters estimated using optimization to fit the posterior mean and 95% credible interval when the Bayesian estimator was used. 45 The reported COVID-19 case numbers were not redrawn because these were the publicly available standard. For each simulation, a simple linear regression model was fit using the new wastewater concentration draw to estimate the log10 of the new community prevalence draw, and a separate model was fit using the log10 of the observed COVID-19 case count to estimate the log10 of the new community prevalence draw. For each model, the slope, intercept, and R² value were recorded. In total, 10,000 simulations were performed.
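The simulation loop described above might look roughly like the following Python sketch. It is a simplified illustration under stated assumptions, not the study's code: only the truncated-Gaussian branch for prevalence is shown (the Beta branch for the Bayesian estimator is omitted), truncation is done by rejection sampling, and all function and argument names are hypothetical.

```python
import math
import random
import statistics


def draw_truncated_gaussian(rng, mean, sd, lower=0.0):
    """Rejection-sample a Gaussian truncated below at `lower`.

    Draws must be strictly above `lower` so that log10 is defined.
    """
    while True:
        x = rng.gauss(mean, sd)
        if x > lower:
            return x


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r2).
    """
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx, (sxy * sxy) / (sxx * syy)


def simulate(ww, ww_se, prev, prev_se, cases, n_sim=10_000, seed=0):
    """Return the median R2 of the wastewater and reported-cases models.

    ww, ww_se: wastewater point estimates and standard errors (log10 gc/L).
    prev, prev_se: prevalence point estimates and standard errors.
    cases: observed reported case counts (not redrawn, as in the text).
    """
    rng = random.Random(seed)
    log_cases = [math.log10(c) for c in cases]
    r2_ww, r2_cases = [], []
    for _ in range(n_sim):
        ww_draw = [rng.gauss(m, s) for m, s in zip(ww, ww_se)]
        log_prev = [
            math.log10(draw_truncated_gaussian(rng, m, s))
            for m, s in zip(prev, prev_se)
        ]
        r2_ww.append(fit_line(ww_draw, log_prev)[2])
        r2_cases.append(fit_line(log_cases, log_prev)[2])
    return statistics.median(r2_ww), statistics.median(r2_cases)
```

Keeping the per-simulation R² lists, rather than only their medians, would allow the two distributions to be compared afterward.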

LOB, LOD, and BCoV Process Recovery
The LOB for N1 and N2 was 2.0 and 4.2 copies per reaction, respectively. Both of these values were below the three-droplet threshold; only reactions with one or two droplets yielded copy numbers at that level, which provided confidence in the chosen threshold for calling positive reactions. All LOB reactions were below the positive threshold for N1, and only 4/104 nontarget reactions had three or more droplets in N2, corresponding to a false-positive rate of 4%.
The predicted LODs based on the LOB results were 4 and 12 copies per reaction for N1 and N2, respectively. For an LOD estimate to be valid, >95% of test reactions at the predicted LOD value need to amplify above the LOB. The LOD of N2 was confirmed to be 12 copies per reaction, as 58/60 (97%) test reactions at that concentration had copy numbers above the N2 LOB. The N1 LOD was somewhere between 4 and 12 copies per reaction: at 12 copies per reaction, all 60 reactions amplified above the N1 LOB, but at 4 copies per reaction, 13/60 (22%) reactions amplified below the N1 LOB. Using a parametric method (which is an imperfect estimate in this case because the test reaction data were not normally distributed), the N1 LOD was estimated to be 8 copies per reaction. The process efficiency based on BCoV recoveries was 57 ± 4%. BCoV was not detected in nonspiked wastewater samples.
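The LOD validity check stated above (more than 95% of test reactions amplifying above the LOB) reduces to a simple proportion. The Python sketch below applies it to the counts reported in this section; the function name and return layout are illustrative assumptions.

```python
def lod_check(n_above_lob, n_reactions, required=0.95):
    """Return (detection_rate, is_valid) for a candidate LOD.

    A candidate LOD is treated as valid when the share of test
    reactions above the LOB exceeds `required` (>95% in the text).
    """
    rate = n_above_lob / n_reactions
    return rate, rate > required


# Counts taken from the text:
n2_at_12 = lod_check(58, 60)       # N2 at 12 copies/reaction: 58/60 above LOB
n1_at_12 = lod_check(60, 60)       # N1 at 12 copies/reaction: all 60 above LOB
n1_at_4 = lod_check(60 - 13, 60)   # N1 at 4 copies/reaction: 13/60 below LOB
```

With these inputs, the N2 candidate at 12 copies and the N1 candidate at 12 copies pass, whereas the N1 candidate at 4 copies does not, which is consistent with the N1 LOD lying between 4 and 12 copies per reaction.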

Community SARS-CoV-2 Prevalence
Over the course of 10 months, from May 2020 through March 2021, eight sampling events were conducted in six Oregon communities by the TRACE project to determine the COVID-19 prevalence within those communities. The selected communities represented a diverse cross-section of Oregon, ranging from a small coastal commercial fishing community (Newport), to mid-sized and large university communities in the temperate Willamette Valley (Corvallis and Eugene), mid-sized and large arid high desert communities (Redmond and Bend), and a small agricultural community in eastern Oregon (Hermiston).
Community prevalence results are summarized in Table 3. The response rates ranged from 38% to 71%, with an average response rate of 60 ± 14%. The estimated SARS-CoV-2 prevalence, including both symptomatic and asymptomatic infections, ranged from 8/10,000 to 1,687/10,000 persons. The SARS-CoV-2 concentrations in the influent wastewater corresponding to the door-to-door nasal swab samples ranged from 2.92 to 5.13 log10 gene copies per liter of wastewater (gc/L).
The majority of the prevalence sampling events took place in the absence of significant precipitation (Table S4). The only exception was in Newport on 20 June 2020. During this 24-h period, 3.56 mm of rain fell, which increased the wastewater flow rate by 5% compared with 11 July 2020, a 24-h period in which no rain fell.
The SARS-CoV-2 concentrations in the influent wastewater of these same cities over a 6- to 11-month period ranged from nondetect to 5.58 log10 gc/L (Figure S3). The wastewater SARS-CoV-2 concentrations showed a moderate positive correlation (Pearson's r = 0.71) when compared with the log10 of reported cases per 10,000 persons (to normalize for differences in population size). The accuracy of estimating reported COVID-19 cases using wastewater SARS-CoV-2 concentrations was also moderate [root mean square logarithmic error (RMSLE) = 0.14 and mean absolute percentage error (MAPE) = 0.29], with wastewater concentrations differing by up to 62-fold representing the same number of reported COVID-19 cases per 10,000 persons (Figure 2A; Tables S6 and S8). Similar correlation strengths and levels of accuracy were observed with each city when analyzed individually (Figure S4).
When the wastewater SARS-CoV-2 concentrations were compared with the TRACE-informed prevalence estimates that include both symptomatic and asymptomatic cases (Figure 2B, Table 3), the positive correlation was much stronger (Pearson's r = 0.96), and the accuracy was much higher, as demonstrated by the lower RMSLE (0.03) and MAPE (0.17) values. This correlation also suggests that the detection limit of our WBE method was 3 infections/10,000 persons. Compared with wastewater SARS-CoV-2 concentrations, reported COVID-19 cases were more weakly correlated with estimated prevalence (Pearson's r = 0.85) and had a lower accuracy (RMSLE = 0.05 and MAPE = 0.31) (Figure 2C; Tables S6 and S8).
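For readers unfamiliar with the three summary statistics used above, the following Python sketch shows one common way to compute them. The exact formulas used in the study are not given in this section, so the definitions below (in particular RMSLE computed on log(1 + value) and MAPE expressed as a fraction rather than a percentage) are assumptions, and the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmsle(observed, predicted):
    """Root mean square logarithmic error (assumed log(1 + value) form)."""
    sq = [(math.log1p(p) - math.log1p(o)) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(sq) / len(sq))


def mape(observed, predicted):
    """Mean absolute percentage error, returned as a fraction (0.17 = 17%)."""
    errs = [abs(o - p) / abs(o) for o, p in zip(observed, predicted)]
    return sum(errs) / len(errs)
```

Lower RMSLE and MAPE values indicate that the fitted values track the reference values more closely, which is how the comparisons in this section are read.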
In addition, Monte Carlo simulations demonstrated that wastewater SARS-CoV-2 concentrations were significantly better than reported COVID-19 case counts at estimating COVID-19 prevalence, even after accounting for uncertainty inherent in the wastewater and prevalence estimates (Figure 3; Tables S6 and S8). The median R² for the wastewater SARS-CoV-2 concentration model was 0.82 compared with 0.71 for the reported-cases model, based on the Monte Carlo simulations. A Wilcoxon rank-sum test, which determines whether the two distributions for R² have a location shift of 0 vs. the alternative that the wastewater SARS-CoV-2 concentration R² distribution has a positive shift, gave a p-value of <0.0001. Thus, even after accounting for the uncertainty inherent in the wastewater SARS-CoV-2 concentration and community prevalence estimates, the difference between the wastewater SARS-CoV-2 concentration and reported-cases median R² values was significant, and the wastewater SARS-CoV-2 concentration had a larger median R².
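A one-sided Wilcoxon rank-sum comparison of two R² distributions can be sketched as follows. This Python example uses the large-sample normal approximation without a tie correction to the variance, which is an assumption of the sketch and not necessarily what the study's software did; the function name is hypothetical.

```python
from statistics import NormalDist


def rank_sum_one_sided(a, b):
    """Test H0: no location shift vs. H1: `a` is shifted above `b`.

    Returns (U, z, p) using average ranks for ties and the normal
    approximation to the distribution of U (no tie correction).
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(pooled)
    rank_sum_a = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    u = rank_sum_a - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5
    z = (u - mean_u) / sd_u
    return u, z, 1.0 - NormalDist().cdf(z)
```

Here `a` would hold the simulated R² values for the wastewater model and `b` those for the reported-cases model; a small p-value supports a positive shift of `a` relative to `b`.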
For each simulation, the two nonnested regression models were directly compared using Vuong's test, which is a likelihood-ratio-based test for model selection using the Kullback-Leibler information criterion. 46,47 Vuong's test revealed that the wastewater SARS-CoV-2 concentration model fit significantly better than the reported-cases model (p < 0.05) in 53% of the simulations. The reported-cases model fit significantly better in only 2% of the simulations; the models could not be distinguished in the remaining simulations. Although Vuong's test cannot determine whether the preferred model is the true model, the high R² value (0.91 using the observed data) and low RMSLE (0.03) and MAPE (0.17) are indicative of a good fit.
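As a rough illustration of how Vuong's test compares two nonnested simple linear regressions of the same response, the Python sketch below computes the standard z statistic from pointwise Gaussian log-likelihoods. It assumes Gaussian errors with maximum-likelihood variance estimates and applies no parameter-count correction (both models have the same number of parameters); the function names are hypothetical and this is not the study's implementation.

```python
import math
from statistics import NormalDist, fmean


def pointwise_loglik(x, y):
    """Fit y = a + b*x by least squares and return per-observation
    Gaussian log-likelihoods at the maximum-likelihood estimates.

    Assumes the fit is not perfect (residual variance > 0).
    """
    mx, my = fmean(x), fmean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    intercept = my - slope * mx
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(e * e for e in resid) / len(resid)
    return [-0.5 * math.log(2 * math.pi * sigma2) - e * e / (2 * sigma2) for e in resid]


def vuong_z(y, x1, x2):
    """Return (z, two-sided p). z > 0 favors the model using x1."""
    m = [a - b for a, b in zip(pointwise_loglik(x1, y), pointwise_loglik(x2, y))]
    mean_m = fmean(m)
    omega = math.sqrt(fmean([(mi - mean_m) ** 2 for mi in m]))
    if omega == 0:
        raise ValueError("models give identical pointwise likelihoods")
    z = math.sqrt(len(m)) * mean_m / omega
    return z, 2 * (1 - NormalDist().cdf(abs(z)))
```

In the setting described above, `y` would be the log10 prevalence draw, `x1` the wastewater concentration draw, and `x2` the log10 reported case counts; a significantly positive z favors the wastewater model, a significantly negative z favors the reported-cases model, and otherwise the two cannot be distinguished.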

Localized Micro-Sewershed Surveillance (Newport, Oregon)
To identify COVID-19 "hotspots" at the neighborhood scale in Newport, the wastewater SARS-CoV-2 concentrations were quantified from wastewater samples collected at 22 pump stations located throughout the small community of 10,853 persons (Figure 1). These pump stations divided the community into neighborhood-scale micro-sewersheds with estimated populations ranging from 15 persons at the 10th St. pump station up to 9,426 persons at the Northside pump station (Table 2). These wastewater samples were collected during the same time period as the two Newport random nasal swab prevalence sampling events.
During the peak of a Newport outbreak in mid-June 2020, the random nasal swab sampling event estimated a community COVID-19 prevalence of 3.4% (Table 3). During this time, SARS-CoV-2 was detected in wastewater samples from 13/22 micro-sewersheds, with concentrations ranging from 3.16 to 5.25 log10 gc/L (Figure 4; Table S5). Three weeks later, in July 2020, the estimated COVID-19 prevalence in Newport had decreased to 0.6%. The number of micro-sewersheds with detectable levels of SARS-CoV-2 in their wastewater also decreased to 5, with concentrations ranging from 3.36 to 4.71 log10 gc/L. In addition to decreased micro-sewershed wastewater SARS-CoV-2 concentrations, the decrease in community prevalence also corresponded with marked decreases in nasal swab positivity rates and clinically reported COVID-19 cases (Figure 4; Tables S5 and S7).

Newport Community SARS-CoV-2 Genotype Profiling
Using Multi-Locus Sequence Typing, 48 two distinct viral variants, a B.1.399 lineage variant designated NA and a B.1 lineage variant designated NB, were detected both in the wastewater and among the individuals who tested positive during the June 2020 Newport sampling event (Table S2). The B.1.399/NA consensus most closely matched sequences found in Europe and then in California in March and April 2020, respectively. The B.1/NB consensus sequence most closely matched that from Yakima County, Washington, on 29 April 2020, and sequences found in Europe in March 2020.
During the June sampling event (from 569 samples), B.1.399/NA was detected in 77% of the individuals who tested positive (10/13), whereas B.1/NB was detected in 23% of the individuals who tested positive (3/13) (Figure 5, Table 4). In the July sampling event, two individuals tested positive (from 550 samples). Although both samples yielded low-coverage sequence data, they were identified as B.1.1.291 by the international database GISAID and, thus, were different from the variants found in individuals during the June sampling event.
Table footnotes: Prevalence estimates were calculated using design-weighted estimators appropriate to the respective community sampling design when positive cases were identified or using a beta-binomial Bayesian model when zero positive cases were found. b Represents the lower bounds of 95% intervals.
In the wastewater samples, the fraction of viral RNA accounted for by each variant was estimated from the total fraction of variant reads summed across all SNP positions specific to that variant. During the June sampling event, the viral variant distribution in the WWTP influent was dominated by B.1.399/NA, accounting for 70% of the reads, whereas B.1/NB accounted for a minority of the reads at 4% (Figure 5, Table 4). This mirrors the observations made through the random nasal swab assays (77% B.1.399/NA and 23% B.1/NB).
B.1.399/NA was also dominant in 11/12 positive micro-sewershed wastewater samples collected. During the June sampling event, B.1.399/NA accounted for 30-98% of the total viral sequence reads across the micro-sewersheds, whereas during the July sampling event it accounted for 48-92% of the viral sequence reads across the micro-sewersheds (Table 4). In addition, although the abundance of B.1/NB was always in the minority, ranging from 0% to 52% of total virus sequence reads during the June sampling event and 0% to 11% of total virus sequence reads during the July sampling event, it was always detected in the wastewater of micro-sewersheds in which B.1/NB was detected among individuals via random nasal swabs (Figure 5, Table 4). The only exception to this relationship was the Bayfront micro-sewershed. In Bayfront, B.1/NB was not detected in the wastewater, although the single positive individual discovered in Bayfront carried B.1/NB.
Finally, from the weeks of 8 June to 30 November 2020, five SARS-CoV-2 variants were detected consistently in wastewater samples from the Newport WWTP (Figure 6A), representing at least 5% of the sequence reads in samples from at least 3 wk. [...] (Table S2B), was most abundant, and it was also detected during the weeks of 29 June and 3 August. From 3 August onward, a B.1.2 subvariant, designated FF, was the most abundant variant detected. The detection of these variants in Newport broadly mirrored trends across the state of Oregon observed in individuals whose positive samples were sequenced and deposited in GISAID (Figure 6B; Figure S5).

Discussion
Estimating COVID-19 Prevalence
WBE has been widely used to quantify wastewater SARS-CoV-2 concentrations during the COVID-19 pandemic to provide a relative sense of viral burden in a community and track its trend over time. 2,49,50 However, the moderate correlation between wastewater SARS-CoV-2 concentrations and reported COVID-19 cases, similar to that observed in this study, has resulted in the perception that WBE is limited in its ability to estimate community COVID-19 infections. 51 It has been hypothesized that this moderate correlation can be attributed to a variety of factors related to the wastewater sample, including virus decay in sewage conveyance systems, variability in sampling and concentrating techniques, and variability in the magnitude and duration of shedding by infected individuals. 6,11,16 It should be noted that rainfall events were largely absent during the wastewater collection periods of this study. Thus, dilution of wastewater virus concentrations by rainfall events was not an important source of the observed variability in wastewater SARS-CoV-2 concentrations.
Although the correlation between wastewater SARS-CoV-2 concentrations and COVID-19 cases (as reported by health departments) has been moderate, 51,52 this study demonstrated a very strong correlation between wastewater SARS-CoV-2 concentrations and estimated COVID-19 prevalence. With an average response rate of 60 ± 14%, the response rate of this study was in the upper end of the range reported by other studies doing a door-to-door health survey (20-50%), lending strength to the correlation analyses. 53-55 In addition, Monte Carlo simulations indicated that wastewater SARS-CoV-2 concentrations were significantly better (p < 0.05) than reported COVID-19 cases at estimating COVID-19 prevalence in a community. The COVID-19 prevalence estimates minimized many of the uncertainties inherent to reported COVID-19 cases, including asymptomatic individuals and limited access to testing. This suggested that the weaker correlations between wastewater SARS-CoV-2 concentrations and reported COVID-19 cases had less to do with the inherent variability within the wastewater measurements and more to do with the inherent variability in the reporting of COVID-19 cases. In addition, the inherent variability in reported COVID-19 cases is expected to become even more complex with the anticipated unreliability of self-reporting positive results from at-home testing kits.
Figure caption notes: See Table 3 for corresponding numeric values and upper and lower bounds on 95% intervals for estimated prevalence per 10,000 values. See Table S6 for corresponding wastewater SARS-CoV-2 concentration numeric values and standard errors. See Table S8 for corresponding case count numeric data. Note: gc, gene copies.
Thus, when compared with a metric with much smaller uncertainties, such as the prevalence estimates presented in this study, wastewater data were quantitative and could accurately estimate COVID-19 prevalence. This improved correlation may also explain why wastewater has shown significantly better estimation capabilities when compared with COVID-19 hospitalizations and deaths. 2,5,49 These metrics also have lower uncertainties than reported COVID-19 cases, which are likely to be more strongly impacted by unequal distribution of testing availability, asymptomatic individuals, and individuals who do not seek testing for various reasons, including being vaccinated.

Identifying Neighborhood COVID-19 Hotspots and SARS-CoV-2 Variants
In addition to estimating community-wide prevalence with high precision, this study demonstrated that quantifying wastewater SARS-CoV-2 was a powerful method for detecting infection "hot spots" at the micro-sewershed (i.e., neighborhood) level. It should be noted that owing to the hierarchical flows between pump stations, the Northside micro-sewershed was receiving flow from the Bayfront micro-sewershed, which contains the Samaritan Pacific Communities Hospital (Figure 1). Thus, the Northside micro-sewershed wastewater results may have captured individuals who would not be linked to the Northside micro-sewershed by case reporting or random household surveillance (e.g., infected individuals at the Samaritan Pacific Communities Hospital).
Similar to the observations at the community level, wastewater SARS-CoV-2 concentrations taken from Newport micro-sewersheds correlated slightly more strongly with nasal swab positivity rates (taken during the prevalence sampling event) than with reported COVID-19 cases at the neighborhood level. Thus, wastewater SARS-CoV-2 concentrations were generally more accurate at identifying hot spots of COVID-19 infections within a community than reported clinical cases. Other studies have used similar approaches to demonstrate the utility of wastewater monitoring at the building level on college and university campuses. 56
Figure caption notes: See Table S5 for corresponding wastewater SARS-CoV-2 concentration (WBE) numeric values and 95% CIs. See Table S7 Table 4 for corresponding numeric data. Map tiles by Stamen Design, under CC BY 3.0. Basemap data by OpenStreetMap, under ODbL. Note: SE, standard error; WWTP, wastewater treatment plant.
WBE was also successful at identifying the dominant and minor SARS-CoV-2 variants (B.1.399/NA and B.1/NB, respectively) at both the micro-sewershed and city levels in Newport. At the city WWTP, as well as at three of the four micro-sewersheds that contained positive nasal swab samples during the random door-to-door sampling events (Big Creek, Nye Beach, and Northside), the SARS-CoV-2 variants identified in the wastewater were the same as those identified via nasal swab analyses and at similar relative proportions.
The lone exception was the Bayfront micro-sewershed whose wastewater sample was dominated by the B.1.399/NA variant, whereas the nasal swab samples from that area were dominated by the B.1/NB variant. However, this micro-sewershed was home to the Samaritan Pacific Communities Hospital, which may have skewed the micro-sewershed variant information with inputs from hospitalized individuals who would not be normally associated with that micro-sewershed.
WBE data were also able to capture the change in SARS-CoV-2 variants present in the Newport community over time. The five major SARS-CoV-2 variants identified in the WBE data from 8 June to 30 November 2020 mirrored the primary SARS-CoV-2 variants identified through clinical samples in the state of Oregon during that same time period. Interestingly, spikes observed in the WWTP influent SARS-CoV-2 concentrations often corresponded to the appearance of a new dominant variant in the wastewater sequences. This suggests that rises in viral RNA concentrations in wastewater samples may signal the appearance of a new dominant variant.
The agreement between wastewater sequence data and clinical sequence data at both the micro-sewershed and city levels supports the reliability of wastewater sequence data in representing the SARS-CoV-2 variant distribution in a city or neighborhood. To our knowledge, this is the first study of variant abundance in wastewater at the neighborhood level, although others have sequenced SARS-CoV-2 in wastewater at the community level. 58-60

Limitations of WBE Methodologies
Although the results of this study have demonstrated the power of WBE methodologies, care should be taken when comparing quantitative results across studies. A large interlaboratory study found that the quantified SARS-CoV-2 concentrations in a single wastewater sample can vary by several orders of magnitude, depending on the concentration methods (e.g., polyethylene glycol precipitation, ultrafiltration, direct extraction, or HA membrane filtration), pretreatments (e.g., removal of solids and/or pasteurization of the wastewater samples), and PCR platforms (e.g., digital or quantitative) employed. 18 However, this same study also reported very low variability in replicate samples from each participating laboratory, as well as between participating labs that used the exact same methodologies (SD < 0.2 log10 gc/L). This finding suggests that when the same methodologies were employed, consistent SARS-CoV-2 concentrations both within and across laboratories were achievable. Further research to optimize and standardize the wastewater surveillance process is still needed to unify the results from national and global wastewater surveillance efforts.
Table 4. SARS-CoV-2 variant relative abundances in Newport, Oregon: relative abundances of variants detected in samples from individuals and wastewater across micro-sewersheds (neighborhoods) during the 20-21 June 2020 prevalence sampling event in Newport. The number of individuals counted for each micro-sewershed includes not only those residing within the micro-sewershed itself but also those residing within upstream micro-sewersheds flowing into the micro-sewershed.
b SE was calculated on untransformed fractions. c Measurements were from a) a single assay, b) assays of two independent water samples, c) two assays of RNA from one sample and one assay of a second independent sample.