Real-time sewage surveillance for SARS-CoV-2 in Dhaka, Bangladesh versus clinical COVID-19 surveillance: a longitudinal environmental surveillance study (December, 2019–December, 2021)

Summary
Background Clinical surveillance for COVID-19 has typically been challenging in low-income and middle-income settings. From December, 2019, to December, 2021, we implemented environmental surveillance in a converging informal sewage network in Dhaka, Bangladesh, to investigate SARS-CoV-2 transmission across different income levels of the city compared with clinical surveillance.
Methods All sewage lines were mapped, and sites were selected with estimated catchment populations of more than 1000 individuals. We analysed 2073 sewage samples, collected weekly from 37 sites, and 648 days of case data from eight wards with varying socioeconomic statuses. We assessed the correlations between the viral load in sewage samples and clinical cases.
Findings SARS-CoV-2 was consistently detected across all wards (low, middle, and high income) despite large differences in reported clinical cases and periods of no cases. The majority of COVID-19 cases (26 256 [55·1%] of 47 683) were reported from Ward 19, a high-income area with high levels of clinical testing (123 times the number of tests per 100 000 individuals compared with Ward 9 [middle-income] in November, 2020, and 70 times the number of tests per 100 000 individuals compared with Ward 5 [low-income] in November, 2021), despite containing only 19·4% of the study population (142 413 of 734 755 individuals). Conversely, a similar quantity of SARS-CoV-2 was detected in sewage across different income levels (median difference in high-income vs low-income areas: 0·23 log10 viral copies + 1). The correlation between the mean sewage viral load (log10 viral copies + 1) and the log10 clinical cases increased with time (r=0·90 in July–December, 2021 and r=0·59 in July–December, 2020). Before major waves of infection, viral load quantity in sewage samples increased 1–2 weeks before the clinical cases.
Interpretation This study demonstrates the utility and importance of environmental surveillance for SARS-CoV-2 in a lower-middle-income country. We show that environmental surveillance provides an early warning of increases in transmission and reveals evidence of persistent circulation in poorer areas where access to clinical testing is limited.
Funding Bill & Melinda Gates Foundation.


Supplementary Material
Supplementary Methods
Table S1. Sequences of primers and probes.
Table S2. Sample assessment of the availability of clinical testing at the ward level at two different one-week time periods throughout the pandemic.
Figure S6. The correlation between sewage viral load and logged case data from July 2020-December 2021 by weekly lag in the study area.
Figure S7. The correlation between sewage viral load and logged case data from July 2020-December 2020 by weekly lag in the study area.
Figure S8. The correlation between sewage viral load and logged case data from January 2021-June 2021 by weekly lag in the study area.
Figure S9. The correlation between sewage viral load and logged case data from July 2021-December 2021 by weekly lag in the study area.
Table S3. Comparing the correlation between case data and SARS-CoV-2 viral load in environmental surveillance in the study area.
Figure S10. Cross-correlations comparing the correlation between COVID-19 clinical case data and SARS-CoV-2 viral load in environmental surveillance data from July 2020-December 2021 from the study area.

Site development
Once the sewage lines were mapped and digitized to shapefiles, Novel-T, a mapping company, built interactive maps for our study area using the sewer line shapefiles, WorldPop data, 1 2m resolution DTM maps (AW3D, Tokyo, Japan), and digital elevation models at 2m resolution. 2 We identified prospective environmental surveillance (ES) sites based on the catchment area coverage and catchment population using the Watershed Tool and the Population Selector Tool built into the interactive maps. We then selected an ES site if it was accessible throughout the year and away from industrial waste, and if its wastewater had a pH of around 7 and high total dissolved solids (>250 mg/L).
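The selection criteria above can be expressed as a simple screening rule. The following is a minimal sketch, not the study's actual tooling: the `CandidateSite` fields and the `is_eligible` function are hypothetical names, and the pH tolerance of ±1 is an assumption, since the text only says "around 7". The catchment population threshold of more than 1000 individuals is taken from the Summary.

```python
from dataclasses import dataclass


@dataclass
class CandidateSite:
    """Hypothetical record for one prospective ES site."""
    name: str
    catchment_population: int
    accessible_year_round: bool
    near_industrial_waste: bool
    ph: float
    tds_mg_per_l: float  # total dissolved solids, mg/L


def is_eligible(site, min_population=1000, ph_tolerance=1.0, min_tds=250.0):
    """Apply the stated criteria; ph_tolerance is an assumed value."""
    return (
        site.catchment_population > min_population
        and site.accessible_year_round
        and not site.near_industrial_waste
        and abs(site.ph - 7.0) <= ph_tolerance
        and site.tds_mg_per_l > min_tds
    )


# Made-up candidates for illustration only.
site_a = CandidateSite("A", 5000, True, False, 7.2, 400.0)
site_b = CandidateSite("B", 800, True, False, 7.0, 400.0)
print([s.name for s in (site_a, site_b) if is_eligible(s)])
```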

Blue line tracing and digitization of informal and formal sewage lines
For blue line tracing, multiple field teams went to the field sites to walk and cover every km of the study area. They traced all the informal and formal sewage lines and marked the flow direction on a physical map. At the end of each day, these lines were manually digitized using QGIS, a free and open-source Geographic Information System (https://www.qgis.org/en/site/index.html), to create shapefiles (appendix).

Environmental sample collection
Aquaread probe sensors detected and recorded conductivity, date, depth, dissolved oxygen, GPS, pH, oxidation reduction potential, resistivity, salinity, sea water specific gravity, temperature, time, total dissolved solids, and turbidity.

BMFS grab sample collection and processing
Six-litre grab samples of wastewater were collected using the collection bag, sealed, and placed in clean buckets for transportation back to a field office. At the field office, we filtered each sample through a ViroCap filter until the full 6 L had passed through the filter or 45 min had elapsed. The ViroCap filter housing was then placed on ice packs to maintain the cold chain during transport to the International Centre for Diarrhoeal Disease Research, Bangladesh (icddr,b) laboratory for further processing.

Virus elution, concentration, and total nucleic acid extraction
The virus was eluted using an eluent solution of 1.5% beef extract and 0.05 M glycine at pH 9.5, and further concentrated using skim milk flocculation. 3,4 Total nucleic acid (TNA) was extracted from the skim milk pellet using the QIAamp Stool Mini Kit (Qiagen, Gaithersburg, MD, USA) following the manufacturer's protocol with slight modifications. 18 One extraction blank was included per extraction batch to monitor for contamination. The TNA was stored at -80°C until further testing.

RT-qPCR for SARS-CoV-2
The PCR primers were designed to amplify the nucleocapsid (N) gene region of the SARS-CoV-2 genome. The PCR probes were designed to capture specific amplicon(s). The primers and probes were designed by the Centers for Disease Control and Prevention (sequences of the primers and probes are shown in Table S1). 5 The primer probe mixes for the N1 and N2 amplicons were part of the 2019-nCoV CDC EUA Kit (Integrated DNA Technologies, Inc., Coralville, IA).

Table S1. Sequences of primers and probes.
Each 20-μl RT-qPCR for N1 included 5 μl of 4X qScript XLT One-Step RT-qPCR Toughmix (Quanta Biosciences, Beverly, MA), 1.5 μl of primer probe mix N1 (Integrated DNA Technologies, Inc., Coralville, IA), 8.5 μl of nuclease free water, and 5 μl of total nucleic acid or positive control or nuclease free water (no template control). Each 20-μl RT-qPCR for N2 included 5 μl of 4X qScript XLT One-Step RT-qPCR Toughmix (Quanta Biosciences, Beverly, MA), 1.5 μl of primer probe mix N2 (Integrated DNA Technologies, Inc., Coralville, IA), 8.5 μl of nuclease free water, and 5 μl of total nucleic acid or positive control or nuclease free water (no template control). 2019-nCoV-N Positive Control (Integrated DNA Technologies, Inc., Coralville, IA), a plasmid positive control, was diluted to a final working concentration of 200 copies/µL and included as a positive control template for every PCR plate while nuclease free water was included as a no template control (negative control). Cycling conditions included 10 min of reverse transcriptase enzyme activation and cDNA synthesis at 50°C, 3 min of initial denaturation and enzyme activation at 95°C, and 40 cycles of 5 s at 95°C and 30 s at 60°C. The PCR was performed on the CFX96 (Bio-Rad, Hercules, CA) and the results were analysed with CFX Maestro software v1.1 (Bio-Rad, Hercules, CA). Threshold cycles (Cts) were determined after setting the baseline threshold.
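As a quick arithmetic check on the reaction set-up described above, the component volumes should sum to the stated 20 μl, and the 4X Toughmix should end up at 1X. The sketch below only restates the volumes given in the text; the helper names `total_volume` and `final_concentration` are illustrative and not part of any kit software.

```python
import math


def total_volume(components_ul):
    """Sum the component volumes (in microlitres) of one reaction."""
    return sum(components_ul.values())


def final_concentration(stock_conc, stock_volume_ul, reaction_volume_ul):
    """Dilution rule C1 * V1 = C2 * V2, solved for C2."""
    return stock_conc * stock_volume_ul / reaction_volume_ul


# Volumes for one N1 (or N2) reaction, as listed in the text.
rt_qpcr_mix = {
    "toughmix_4x": 5.0,
    "primer_probe_mix": 1.5,
    "nuclease_free_water": 8.5,
    "template_or_control": 5.0,
}

volume = total_volume(rt_qpcr_mix)
toughmix_x = final_concentration(4.0, rt_qpcr_mix["toughmix_4x"], volume)
print(f"reaction volume: {volume} ul, Toughmix: {toughmix_x}X")
```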

Analytical performance of real-time RT-qPCR N1 and N2 gene assays
Standard curves for the N1 and N2 gene assays were generated by testing 10-fold serial dilutions of the SARS-CoV-2 RUO Plasmid Controls (IDT, Coralville, IA) in triplicate (concentration ranged from 200,000 to 0.2 copies/µL). Serial dilutions of SARS-CoV-2 RUO Plasmid Controls (IDT, Coralville, IA) showed that the limit of detection (defined as the lowest dilution where 10/10 samples were detected) was the same for N1 and N2 (10 copies/μl). The PCR efficiencies calculated from the slopes of the standard curves were 98% and 103% for the N1 and N2 assays, respectively.

qPCR for faecal indicator organisms HF183 and CrAssphage
Each assay was run in a 25 µl reaction, which included 12.5 µl of 2x TaqMan Environmental master mix (Thermo Fisher Scientific, Inc., Waltham, MA), 0.25 µl each of 100 µM forward and reverse primers (final concentration 1 µM), 0.02 µl of 100 µM probe (final concentration 0.08 µM), 6.98 µl nuclease free water, and 5 µl total nucleic acid template. Synthetic fragments of HF183 and CrAssphage (Integrated DNA Technologies, Inc., Coralville, IA) and nuclease free water were included in every plate as positive and no-template controls, respectively. The qPCR was performed on the CFX96 (Bio-Rad, Hercules, CA); cycling conditions were initial denaturation at 95°C for 10 min, followed by 40 cycles of denaturation at 95°C for 15 s and annealing/extension at 60°C for 1 min.
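For the N1 and N2 standard curves described under analytical performance, PCR efficiency is conventionally derived from the slope of Ct against log10 concentration as E = 10^(-1/slope) - 1, so a slope near -3.32 corresponds to roughly 100%. The sketch below illustrates that calculation under stated assumptions: the Ct values are synthetic, generated from an assumed ideal line rather than taken from the study, and the function names are illustrative.

```python
import numpy as np


def fit_standard_curve(copies_per_ul, ct):
    """Least-squares line Ct = slope * log10(copies) + intercept."""
    x = np.log10(np.asarray(copies_per_ul, dtype=float))
    y = np.asarray(ct, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    return float(slope), float(intercept)


def efficiency_from_slope(slope):
    """Amplification efficiency as a fraction (1.0 means 100%)."""
    return 10.0 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate copies per microlitre."""
    return 10.0 ** ((ct - intercept) / slope)


# 10-fold dilution series; Ct values are SYNTHETIC (assumed ideal line).
dilutions = np.array([200000.0, 20000.0, 2000.0, 200.0, 20.0, 2.0])
synthetic_ct = 40.0 - 3.3219 * np.log10(dilutions)

slope, intercept = fit_standard_curve(dilutions, synthetic_ct)
efficiency = efficiency_from_slope(slope)
print(f"slope={slope:.3f}, efficiency={efficiency * 100:.1f}%")
```

In practice the fit would use the measured triplicate Ct values at each dilution, and unknown samples could then be quantified with `copies_from_ct`.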

Human faecal indicators, rainfall, and catchment area population size
We assessed the correlation between the mean log10 viral load of CrAssphage and HF183 in wastewater and the log10 mean catchment area population size (scaled to per 1000 persons) from four weeks in March 2021, May 2021, and July 2021 to determine whether measured concentrations of SARS-CoV-2 could be normalized to the population contributing to each sewage sample. The months were chosen to ensure data were collected during the dry (March) and rainy (July) seasons. We proceeded with 1:100 dilutions of HF183 and CrAssphage as they were detected in every sample and had low Ct values. We also assessed, via linear regression, the correlation between the mean log10 viral load of HF183 and CrAssphage in wastewater and rainfall (mm) 6 from four weeks in March 2021, May 2021, and July 2021. We used linear regression models to examine the correlations at the site, ward, and overall levels with the previous day's rainfall and with the mean rainfall over the previous three days, adjusting for site using fixed effects.
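The rainfall regression described above can be sketched as an ordinary least-squares fit with one indicator column per site, which is one common way to implement site fixed effects. This is a minimal illustration with made-up numbers, not the study's code: the function names are hypothetical, and excluding the sampling day from the "previous days" window is an assumption.

```python
import numpy as np


def mean_previous_days(daily_rain_mm, day_index, n_days):
    """Mean rainfall over the n_days before day_index (sampling day excluded)."""
    if day_index < n_days:
        raise ValueError("not enough preceding days of rainfall data")
    return float(np.mean(daily_rain_mm[day_index - n_days:day_index]))


def fit_with_site_fixed_effects(log10_load, rainfall_mm, site_ids):
    """OLS of log10 load on rainfall with a separate intercept per site."""
    y = np.asarray(log10_load, dtype=float)
    x = np.asarray(rainfall_mm, dtype=float)
    sites, index = np.unique(np.asarray(site_ids), return_inverse=True)
    dummies = np.zeros((y.size, sites.size))
    dummies[np.arange(y.size), index] = 1.0  # one indicator column per site
    design = np.column_stack([x, dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    site_effects = {s.item(): float(c) for s, c in zip(sites, coef[1:])}
    return float(coef[0]), site_effects


# Made-up example: two sites, three samples each.
site_ids = ["A", "A", "A", "B", "B", "B"]
rain = [0.0, 10.0, 20.0, 0.0, 10.0, 20.0]
load = [5.0, 4.5, 4.0, 6.0, 5.5, 5.0]
rain_slope, site_effects = fit_with_site_fixed_effects(load, rain, site_ids)
print(rain_slope, site_effects)
```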

Analysis of case data and correlation with sewage data
We used the Shapiro-Wilk normality test to decide whether the environmental sample data on the log10 scale should be compared with clinical case data on the untransformed or the log10 scale. The logged case data had a normal distribution (p=0.10) while the unlogged case data did not (p<0.001). Therefore, we used logged clinical case data. Next, we assessed the Pearson correlation between the viral load of environmental samples and the logged clinical case data from the study region, where each week of environmental sample data was compared against clinical case data from the same week (0 lag), the next week (1-week lag), and 2 weeks out (2-week lag) for the entire study and in six-month time periods (July 2020-December 2020, January 2021-June 2021, July 2021-December 2021). Case data before July 2020 were excluded because of inadequate testing during that period. Additionally, we estimated the cross-correlation between sewage viral load and clinical cases in the study region from July 2020-December 2021 via a linear model with generalized estimating equations adjusting for ward and temporal autocorrelation. 7 Logged case data were smoothed using a 7-day running average, in which we averaged the number of cases three days before, the day of, and three days after, to account for a lack of or reduction in laboratory testing on weekends. We then estimated the cross-correlations for the entire study period and in six-month time periods (July 2020-December 2020, January 2021-June 2021, July 2021-December 2021) to examine whether the association between the logged clinical cases and viral load in the ES samples changed throughout the pandemic.
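The smoothing and lagged-correlation steps above can be illustrated with a short sketch. It covers only the centred 7-day running average and the plain Pearson correlation at 0, 1, and 2 week lags; the generalized estimating equation model is not reproduced here. All values are made up, the function names are illustrative, and dropping days without a full window at the series edges is an assumption, since the text does not describe edge handling.

```python
import numpy as np


def centered_running_mean(daily_values, half_window=3):
    """Average each day with the half_window days before and after it.

    Days without a full window at either end are dropped (an assumption).
    """
    values = np.asarray(daily_values, dtype=float)
    width = 2 * half_window + 1
    if values.size < width:
        raise ValueError("series is shorter than the smoothing window")
    kernel = np.ones(width) / width
    return np.convolve(values, kernel, mode="valid")


def lagged_pearson(sewage_weekly, cases_weekly, lag_weeks):
    """Pearson r between sewage in week t and cases in week t + lag_weeks."""
    sewage = np.asarray(sewage_weekly, dtype=float)
    cases = np.asarray(cases_weekly, dtype=float)
    if sewage.shape != cases.shape:
        raise ValueError("series must cover the same weeks")
    if lag_weeks < 0:
        raise ValueError("lag_weeks must be zero or positive")
    if lag_weeks > 0:
        sewage = sewage[:-lag_weeks]
        cases = cases[lag_weeks:]
    return float(np.corrcoef(sewage, cases)[0, 1])


# Made-up weekly series in which cases trail sewage by exactly one week.
sewage = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 8.0, 7.0]
cases = [0.5] + sewage[:-1]
for lag in (0, 1, 2):
    print(lag, round(lagged_pearson(sewage, cases, lag), 2))
```

With series constructed this way, the correlation peaks at the 1-week lag, which is the kind of pattern the lag comparison is designed to reveal.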
