Wastewater-Based Estimation of the Effective Reproductive Number of SARS-CoV-2

Background: The effective reproductive number, Re, is a critical indicator to monitor disease dynamics, inform regional and national policies, and estimate the effectiveness of interventions. It describes the average number of new infections caused by a single infectious person through time. To date, Re estimates are based on clinical data such as observed cases, hospitalizations, and/or deaths. These estimates are temporarily biased when clinical testing or reporting strategies change. Objectives: We show that the dynamics of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) RNA in wastewater can be used to estimate Re in near real time, independent of clinical data and without the associated biases. Methods: We collected longitudinal measurements of SARS-CoV-2 RNA in wastewater in Zurich, Switzerland, and San Jose, California, USA. We combined this data with information on the temporal dynamics of shedding (the shedding load distribution) to estimate a time series proportional to the daily COVID-19 infection incidence. We estimated a wastewater-based Re from this incidence. Results: The method to estimate Re from wastewater worked robustly on data from two different countries and two wastewater matrices. The resulting estimates were as similar to the Re estimates from case report data as Re estimates based on observed cases, hospitalizations, and deaths are among each other. We further provide details on the effect of sampling frequency and the shedding load distribution on the ability to infer Re. Discussion: To our knowledge, this is the first time Re has been estimated from wastewater. This method provides a low-cost, rapid, and independent way to inform SARS-CoV-2 monitoring during the ongoing pandemic and is applicable to future wastewater-based epidemiology targeting other pathogens. https://doi.org/10.1289/EHP10050


Table of Contents
Managing PCR inhibition in San Jose sludge samples Table S1. Primer and Probes. Table S2. The match between the daily and further subsampled R ww traces for Zurich. Table S3. The match between the daily and further subsampled R ww traces for San Jose. Table S4. Parameters of the optimal shedding load distribution from symptom onset. Figure S1. Representative results from inhibition titration experiments with two samples from San Jose. The x-axis shows the concentration of solids (wet weight) added to the DNA/RNA Shield diluent in units of mg/mL. The y-axis shows the concentration of the three SARS-CoV-2 RNA targets in the solids in units of gene copies/g dry weight. Each panel represents a single sample analyzed in ten wells, and error bars represent standard deviations across these wells, resulting from the total error of the dPCR instrument's software. Figure S2. Fluorescence plots from the Stilla Naica Crystal Droplet PCR used for the wastewater samples from Zurich collected after 23 September 2020 from positive (top) and negative (bottom) experimental results. Droplets positive for N1 (blue) and N2 (brown) and both N1 and N2 markers (purple). RPP30 markers, detectable as elevated fluorescence in Green channel, are included in the commercial assay used, but were not further analyzed. Figure S4. Rww estimation from San Jose (USA) sludge measurements, assuming zero delay between symptom onset and testing: (A) Measured RNA concentrations of the N, S and ORF1a genes (blue, green, yellow respectively) between 15 November 2020 and 19 March 2021 (125 samples, of which 3 were excluded based on quality control). (B, C) Confirmed cases and testingadjusted cases in Santa Clara county (purple; B, red; C) during the same time period. The testingadjusted cases describe the number of positive tests / total number of tests per day, normalized by the mean number of tests per day in Santa Clara county during the study period. In contrast to Fig.  2 in the main text, the delay between symptom onset and testing was assumed 0, rather than the same as for case reporting. (D) The estimated infection incidence per day from normalized RNA concentrations (top; the N, S, and ORF1a genes are indicated by blue horizontal-hatched, green \hatched, and yellow /-hatched bands respectively), case reports (middle) and testing-adjusted cases (bottom). The gene copies (gc) per gram dry weight were normalized by the lowest measured value (N*M = 2663.7 gc/g-dry weight per infection/day). The ribbons indicate the mean ± standard deviation across 1000 bootstrap replicates. (E) The estimated R ww compared to R cc from confirmed cases (top; purple solid band) and testing-adjusted cases (bottom; red solid band). The colored line indicates the point estimate on the original data, and the ribbons the 95% confidence interval across 1000 bootstrap replicates. were excluded based on quality control). (B, C) Confirmed cases and testing-adjusted cases in Santa Clara county (B, purple; C, red) during the same time period. In contrast to Fig. 2 in the main text, the missing cases on Dec. 30th were set to 1000 in panel (B). The testing-adjusted cases describe the number of positive tests / total number of tests per day, normalized by the mean number of tests per day in Santa Clara county during the study period. (D) The estimated infection incidence per day from normalized RNA concentrations (top; the N, S, and ORF1a genes are indicated by blue horizontal-hatched, green \-hatched, and yellow /-hatched bands respectively), case reports (middle) and testing-adjusted cases (bottom). The gene copies (gc) per gram dry weight were normalized by the lowest measured value (N*M = 2663.7 gc/g-dry weight per infection/day). The ribbons indicate the mean ± standard deviation across 1000 bootstrap replicates. (E) The estimated R ww compared to R cc from confirmed cases (top; purple solid band) and testing-adjusted cases (bottom; red solid band). The colored line indicates the point estimate on the original data, and the ribbons the 95% confidence interval across 1000 bootstrap replicates. Figure S6. The estimated Rww based on N gene measurements in San Jose (black line, grey 95% confidence intervals) compared to different Rcc estimates for Santa Clara county from the California COVID assessment tool (CalCat) [3][4][5][6][7] . The R ww estimate from Fig. 2D was compared to 6 methods to estimate R cc , indicated by the color and panel title. "ETH" refers to the pipeline of Huisman et al. applied to the daily confirmed case data from Santa Clara county, i.e. the same comparison as shown in Fig. 2D. We excluded the SEIR model estimates included on the CalCat website (since these performed visibly worse than the other methods).   We calculated R ww from the measured N1 marker indicating the N gene copy loads, normalized in four different ways: by 1x10 12 gene copies (gc) per infection (the order of magnitude of the lowest measured concentration), by 5x10 10 gc per infection (roughly rescaled to case incidence), by 1x10 5 gc per infection (to cover the space of possible orders of magnitude), no normalization. (A) After normalization the measurements were deconvolved and rescaled back to the original magnitude (multiplied by the normalization factor) to illustrate differences in the inferred infection incidence.

Figure S10. Coverage between Rcc and Rww for different shedding load distributions (SLDs).
We scanned across different parameter pairs (mean, standard deviation in days) for the SLD from time since infection. For the city of Zurich, the R ww from N1 loads in wastewater was compared to that of confirmed cases in the catchment. For San Jose, we compared S gene concentrations to confirmed cases in Santa Clara County. The contour lines show SLD parameter pairs with equal coverage, in steps of 10% of the optimum value. In each comparison (grid-point) the 95% confidence intervals of R cc and R ww are based on 50 bootstrap replicates. The numeric data corresponding to this figure can be found in Excel table S8.

Figure S11. Mean average percentage error (MAPE) between case-based Rcc and wastewater-based Rww for different shedding load distributions (SLDs).
We scanned across different parameter pairs (mean, standard deviation in days) for the SLD from time since infection. For the city of Zurich, the R ww from N1 marker loads in wastewater was compared to that of confirmed cases in the catchment. For San Jose, we compared S gene concentrations to confirmed cases in Santa Clara County. The contour lines show SLD parameter pairs with equal MAPE, in steps of 10% of the optimum value. The numeric data corresponding to this figure can be found in Excel table S8.  We scanned across different parameter pairs (mean, standard deviation in days) for the SLD from time since symptom onset. For the city of Zurich, the Re from N1 marker loads in wastewater was compared to that of confirmed cases in the catchment. For San Jose, we compared S gene concentrations to confirmed cases in Santa Clara County. The incubation period is assumed gamma distributed with mean 5.3, and standard deviation 3.2. The contour lines show SLD parameter pairs with equal RMSE, in steps of 10% of the optimum value.

References Additional File-Excel Document
Supplemental Methods:

Managing PCR inhibition in San Jose sludge samples:
We diluted the solids in DNA/RNA Shield (Zymo Research, Irvine, CA, USA) (hereafter referred to as "diluent") prior to RNA extraction, inhibitor removal, and RT-dPCR, as described in the main text. This dilution was necessary as we have previously shown that direct RNA extraction from the solids resulted in a product containing inhibitors, so dilution of the RNA template prior to RT-dPCR was required 1 .To determine an optimal concentration of solids to add to the diluent (mg wet weight / mL diluent) to reduce inhibition, but retain reasonable sensitivity, we sludge quantified SARS-CoV-2 RNA in solutions containing DNA/RNA shield with solids concentrations from 7.5 mg/mL to 150 mg/mL, where we used the exact methods described in the main text. We aimed to identify the highest concentrations of solids in solution that did not show inhibition. Representative results from two different samples from San Jose are shown in Figure S1 where concentrations of SARS-CoV-2 RNA per g dry weight of solids is shown for different starting concentrations of solids in diluent. Note that the numbers on the y-axis are directly comparable between solutions, as we account for the different mass of solids in the dimensional analysis to derive the SARS-CoV-2 RNA concentration. The relatively lower concentrations for SARS-CoV-2 RNA for solutions of 150 mg/mL suggest the presence of inhibition at this relatively high solids concentration. Concentrations are relatively similar among the lower concentration solutions suggesting that starting at 75 mg/mL, inhibition is mostly alleviated. Only one gene target from one of the samples (the N gene in sample A shown in the top graph) showed slightly higher concentrations in more dilute solutions. Given these results we therefore moved forward using 75 mg/mL for the method.
Supplemental Tables: Note: the terms "PMMoV" and "BCov" denote pepper mild mottle virus, and bovine coronavirus respectively. The other primers and probes target severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).    Figure S1. Representative results from inhibition titration experiments with two samples from San Jose. The x-axis shows the concentration of solids (wet weight) added to the DNA/RNA Shield diluent in units of mg/mL. The y-axis shows the concentration of the three SARS-CoV-2 RNA targets in the solids in units of gene copies/g dry weight. Each panel represents a single sample analyzed in ten wells, and error bars represent standard deviations across these wells, resulting from the total error of the dPCR instrument's software.      [3][4][5][6][7] . The Rww estimate from Fig. 2D was compared to 6 methods to estimate Rcc, indicated by the color and panel title. "ETH" refers to the pipeline of Huisman et al. 8 applied to the daily confirmed case data from Santa Clara county, i.e. the same comparison as shown in Fig. 2D. We excluded the SEIR model estimates included on the CalCat website (since these performed visibly worse than the other methods).   11 . The error bars (A; mean ± standard deviation) and 95% confidence intervals (B) are based on 50 bootstrap replicates per condition. We calculated Rww from the measured N1 marker indicating the N gene copy loads, normalized in four different ways: by 1x10 12 gene copies (gc) per infection (the order of magnitude of the lowest measured concentration), by 5x10 10 gc per infection (roughly rescaled to case incidence), by 1x10 5 gc per infection (to cover the space of possible orders of magnitude), no normalization. (A) After normalization the measurements were deconvolved and rescaled back to the original magnitude (multiplied by the normalization factor) to illustrate differences in the inferred infection incidence. (B) The resulting Rww estimates differ both in the mean (due to the deconvolution illustrated in A) and width of the uncertainty interval. The results are compared for four shedding load distributions (SLDs; see Methods): the Benefield SLD upon symptom onset (Incubation + Benefield) 9 , the Han SLD upon symptom onset (Incubation + Han) 9,10 , shedding only on the day of symptom onset (Incubation only), shedding only on the day of death (Incubation + Death) 11 . The 95% confidence intervals (B) are based on 50 bootstrap replicates per condition.

Fig. S10: Coverage between Rcc and Rww for different shedding load distributions (SLDs).
We scanned across different parameter pairs (mean, standard deviation in days) for the SLD from time since infection. For the city of Zurich, the Rww from N1 loads in wastewater was compared to that of confirmed cases in the catchment. For San Jose, we compared S gene concentrations to confirmed cases in Santa Clara County. The contour lines show SLD parameter pairs with equal coverage, in steps of 10% of the optimum value. In each comparison (grid-point) the 95% confidence intervals of Rcc and Rww are based on 50 bootstrap replicates. The numeric data corresponding to this figure can be found in Excel table S8.

Fig. S11: Mean average percentage error (MAPE) between case-based Rcc and wastewater-based Rww for different shedding load distributions (SLDs).
We scanned across different parameter pairs (mean, standard deviation in days) for the SLD from time since infection. For the city of Zurich, the Rww from N1 marker loads in wastewater was compared to that of confirmed cases in the catchment. For San Jose, we compared S gene concentrations to confirmed cases in Santa Clara County. The contour lines show SLD parameter pairs with equal MAPE, in steps of 10% of the optimum value. The numeric data corresponding to this figure can be found in Excel table S8.  We scanned across different parameter pairs (mean, standard deviation in days) for the SLD from time since symptom onset. For the city of Zurich, the Re from N1 marker loads in wastewater was compared to that of confirmed cases in the catchment. For San Jose, we compared S gene concentrations to confirmed cases in Santa Clara County. The incubation period is assumed gamma distributed with mean 5.3, and standard deviation 3.2. The contour lines show SLD parameter pairs with equal RMSE, in steps of 10% of the optimum value.