Detection of SARS-CoV-2 variants by genomic analysis of wastewater samples in Israel

Investigation of SARS-CoV-2 spread and identification of variants in sewers has been demonstrated to accurately detect prevalence of viral strains and is advantageous to clinical sampling in population catchment size. Herein, we utilized an established nationwide system of wastewater sampling and viral concentration approaches to perform large-scale surveillance of SARS-CoV-2 variants in nine different locations across Israel that were sampled from August 2020 to February 2021 and sequenced (n = 58). Viral sequences obtained from the wastewater samples had high coverages of the genome, and mutation analyses successfully identified the penetration of the B.1.1.7 variant into Israel in December 2020 in the central and north regions, and its spread into additional regions in January and February 2021, corresponding with clinical sampling results. Moreover, the wastewater analysis identified the B.1.1.7 variant in December 2020 in regions in which non-sufficient clinical sampling was available. Other variants of concern examined, including P.1 (Brazil/Manaus), B.1.429 (USA/California), B.1.526 (USA/New York), A.23.1 (Uganda) and B.1.525 (Unknown origin), did not show consistently elevated frequencies. This study exemplifies that surveillance by sewage is a robust approach which allows to monitor the diversity of SARS-CoV-2 strains circulating in the community. Most importantly, this approach can pre-identify the emergence of epidemiologically or clinically relevant mutations/variants, aiding in public health decision making.


H I G H L I G H T S
• Wide wastewater surveillance across Israel was used to detect SARS-CoV-2 variants. • Detected penetration of B.1.1.7 into Israel corresponded with clinical data. • Variant B.1.1.7 was detected in a region with insufficient clinical sampling. • Wastewater surveillance is important for pre-detection of variants of concern.
G R A P H I C A L A B S T R A C T a b s t r a c t a r t i c l e i n f o

Introduction
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the cause of an ongoing global pandemic of corona virus disease Science of the Total Environment 789 (2021) 148002 (Zhou et al., 2020), has affected at least 223 countries/territories with 114 million confirmed cases and almost 2.6 million deaths reported worldwide as of beginning of March 2021 (World Health Organization, n.d.). While the majority of infections result in no apparent symptoms or mild ones, some progress to acute respiratory disease, multi-organ failure, and death (Huang et al., 2020). Community transmission of the virus, as well as anti-viral treatments, can engender novel mutations in the viral genome, potentially resulting in more virulent strains with higher mortality rates or emergence of strains resistant to treatment (Sanjuán and Domingo-Calap, 2016). Recent increased sequencing efforts worldwide contributed to the discovery of SARS-CoV-2 variants of concern (VOC) with rapid spread and increased transmissibility, including B.1.1.7 (UK)  and B.1.351 (South Africa) (de O. Houriiyah Tegally et al., 2020). The impact of these variants' mutations on the effectiveness of vaccines is being explored, and travel bans are implemented on passengers arriving from afflicted countries. In Israel as well, entry of foreigners has been minimized and returning citizens are quarantined in specialized facilities or at home.
Systematic tracking of demographic and clinical patient information, and especially of the viral genome, has been shown to be necessary to effectively combat transmission of the disease (Wohl et al., 2016). Numerous countries have recently increased their viral genome sequencing capacity in aim to identify and control known and novel viral variants in a timely manner (Furuse, 2021). However, estimating viral diversity from clinical samples may be challenging in terms of representative sampling and funds. Wastewater-based epidemiology (WBE) has been shown to be a useful tool to track virus circulation and variants such as Poliovirus (Asghar et al., 2014). Indeed, wastewater sampling and viral concentration is established in Israel and routinely used for polio surveillance, including molecular characterization of Poliovirus strains (Shulman et al., 2016).
SARS-CoV-2 is primarily detected in respiratory tract tissues, however it has been shown that the virus can also be detected in the gastrointestinal tract, with viral RNA shedding in feces of around 50% of the infected patients, often for longer periods than for nasal swabs (Xiao et al., 2020;Wu et al., 2020a). Accordingly, since the start of the COVID-19 pandemic, wastewater real-time quantitative PCR (RT-qPCR) has been used to quantify the amount of SARS-CoV-2 RNA in sewage across many different regions globally, where generally a correlation is found between the RNA levels and the numbers of COVID-19 confirmed cases (Wu et al., 2020b(Wu et al., , 2020cAhmed et al., 2020a;Ahmed et al., 2020b;Medema et al., 2020). Few recent studies investigating the spread of SARS-CoV-2 variants in the community via wastewater showed accurate identification and prevalence of viral strains known to exist in the region sampled, and in addition demonstrated evidence for recent introductions of viral lineages before they are detected by local clinical tests (Nemudryi et al., 2020;Martin et al., 2020;G. Izquierdo-Lara et al., 2021;Crits-Christoph et al., 2021;Jahn et al., 2021;Rimoldi et al., 2020).
In this study, whole SARS-CoV-2 genomes were sequenced from RNA extracted from wastewater sampled across different regions Israel in August 2020-February 2021 to identify frequencies of VOC such as the B.1.1.7 (UK) variant in Israel. Successful identification of this variant in different regions, including regions in which sufficient clinical testing was not available at the time, exemplify the importance of combining sewage-based surveillance with clinical testing for early and robust identification of worrisome SARS-CoV-2 variant spread.

Wastewater sample collection and processing
The wastewater sampling sites (n = 9) chosen for this study are covering different regions of Israel with catchment population of approximately 50%. These samples were collected from waste water treatment plants (WWTP) once a month from each site during August 2020-February 2021 (n = 58 samples). For some sampling sites/ times, data was not available due to malfunctions in the sampling process: Jerusalem -November; Natanya -December and February; Tzfat -November; Elhamra -January. 250 ml of untreated wastewater was collected every 30 min for 24 h, by external automatic composite samplers. A cold temperature (~4°C) was maintained during sample transport to the laboratory. Virus was concentrated from each wastewater sample as previously described (Ahmed et al., 2020b), with a minor change: 25 ml of wastewater was centrifuged (4696g) for 5 min at 4°C . 20 ml supernatant was collected into 50 ml tube and 0.5 g of MgCl 2 was added, continued with gentle shaking for 5 min to obtain a final concentration of 25 mM MgCl 2 . Following, sample was passed through 0.45-μm pore-size, 47-mm diameter electronegative MCE membranes (Merck Millipore Ltd). The membrane was immediately inserted into a 50 ml tube containing 3 ml external lysis buffer (NucliSENS easyMAG;REF 280134). Virus trapped in the membrane was recovered and inactivated by gentle shaking for 30 min.

Viral RNA extraction
Total nucleic acid (NA) were extracted using the NucliSENS easyMAG system (bioM‫י‬rieux, Marcyl'Etoile, France) according to the manufacturer's instructions. Briefly, 3 ml of lysis buffer containing the concentrated sewage was extracted using the easyMAG extractor. Extracted NA were eluted in 55 μl elution buffer and stored at −70°C until sequencing.

Real-time PCR (q-PCR)
Detection of the SARS-CoV-2 RNA presence in wastewater was initially performed using RT-qPCR targeting the SARS-CoV-2 E gene as described by Corman et al. (2020). This reaction was combined in a multiplex that contained two additional reactions: hCoV OC-43 for pre-extraction spike control, and MS-2 phage for post extraction spike control.
The protocol was adjusted for the use with the SensiFast reaction mix (Bioline, https://www.bioline.com/us/), thereby shortening the time of the reaction, while maintaining its sensitivity and specificity. The reaction mix was assembled according to the manufacturer's instruction in a final volume of 30 μl. the following primers and probe concentrations were used: E-sarbeco primers and probe -0.3 μM each and 0.2 μM, respectively, OC-43 primers and probe -0.6 μM each and 0.4 μM, respectively, and MS-2 primers and probe -0.6 μM each and 0.4 μM, respectively. The reaction thermal profile was as follows: 15′ at 45°C, 2′2″ at 95°C, 45× [5″ at 95°C, 42″ at 60°C].
Positive control for the SARS-CoV-2 E gene was generated by In vitro transcription of RNA fragments. A region corresponding to nucleotide positions 26067-26441 in SARS-CoV-2 reference sequence MN908947.2 was amplified by PCR and transcribed to RNA using the Thermofisher T7 MEGAscript kit, according to the manufacturer's instructions (www.thermofisher.com/). Transcribed RNA was then purified from the reaction mix and quantified. The resulting RNA was diluted 10-fold, and the number of copies was calculated. The average Cq values and standard deviation for each dilution are detailed in Supplementary Table S1. Determination of the analytical limit of detection (LOD) was performed by running nine serial dilutions, in triplicates, and plotting the resulting regression curve (Supplementary Fig. S1). Based on the calculated regression formula, the LOD was determined to be 18.5 copies per reaction.

Library preparation and sequencing
RNA in extracted nucleic acids was reversed transcribed to single strand cDNA using SuperScript IV (ThermoFisher Scientific, Waltham, MA, USA) as per manufacturer's instructions. SARS-CoV-2 specific primers designed to capture SARS-CoV-2 whole genome were used to generate double strand cDNA and amplify it via PCR using Q5 Hot Start DNA Polymerase (NEB) (https://artic.network/ncov-2019). Briefly, each sample underwent two PCR reactions with primer pool 1 or 2 and 5X Q5 reaction buffer, 19 mM dNTPs and nuclease-free water. Resulting DNA was combined and quantified with Qubit dsDNA BR Assay kit (ThermoFisher Scientific) as per manufacturer's instructions and 1ng of amplicon DNA in 5 ml per sample was taken into library preparation. Libraries were prepared using NexteraXT library preparation kit and NexteraXT index kit V2 as per manufacturer's instructions (Illumina, San Diego, CA, USA). Libraries were purified with AMPure XP magnetic beads (Beckman Coulter, Brea, CA, USA) and library concentration was measured by Qubit dsDNA HS Assay kit (Thermo Fisher Scientific, Waltham, MA, USA). Library validation and mean fragment size was determined by TapeStation 4200 via DNA HS D1000 kit (Agilent, Santa Clara, CA, USA). The mean fragment size was~400 bp, as expected. The library mean fragment size and concentration molarity was calculated and each library was diluted to 4 nM. Libraries were pooled, denatured and diluted to 10pM and sequenced on MiSeq with V3 2 × 150 bp run kit (Illumina).

Bioinformatic analysis
Fasq files underwent quality control using FastQC (www. bioinformatics.babraham.ac.uk/projects/fastqc/) and MultiQC (Ewels et al., 2016), and low-quality sequences were filtered using trimmomatic (Bolger et al., 2014). Mapping was performed to the SARS-CoV-2 reference genome (NC_045512.2) using the Burrows-Wheeler aligner (BWA) (Li and Durbin, 2009), followed by filtering unmapped reads, sorting and indexing of the resulting BAM files using the SAMtools suite . BAM files were imported to R and the frequency of each nucleotide or gap in each position along the genome was obtained for each sample using the Rsamtools Bioconductor package (Morgan et al., 2020). Lists of mutated positions along the genome, i.e., positions that have a nucleotide that is different from the reference genome, with a frequency > 1%, were summarized for each sample. Specific positions of the SARS-CoV-2 VOC were obtained from relevant publications de O. Houriiyah Tegally et al., 2020;Zhang et al., 2021;Bugembe et al., 2021;Sabino et al., 2021;PANGO lineages, n.d.;Anthony et al., 2021). Visualizations were done using the ggplot2 Bioconductor package (Wickham, 2016).

Sequencing of wastewater samples from representative locations across Israel
Nine wastewater samples from different WWTP were collected once a month during August 2020 -February 2021 from different regions and catchment populations in Israel (Fig. 1). The regions chosen included population from various sectors, e.g. Jews, Arabs, Bedouin, orthodox. All samples (n = 58) tested positive to SARS-Cov-2 using RT-qPCR of the E gene, with Cq values ranging from 27 to 35. SARS-CoV-2 whole genome sequencing of samples was successful, despite the high Cq values, with an average breadth of 90 ± 10% and an average depth of >1000 (Supplementary Table S2).

Surveillance of SARS-CoV-2 variants
Mutations associated with known worldwide VOC were examined in each region/time (Fig. 2). A mutation was determined where >1% of the nucleotides mapped to a certain position were different from the reference and the frequency was set to the % of mutated nucleotides in the position out of the total number of nucleotides mapped to the position. Results show the penetration of the B.1.1.7 (UK) variant into Israel in December 2020 into the Shafdan and Tzfat regions, and its spread into the Beer-Sheva, Ashdod and Jerusalem regions in January 2021, and into the Rahat and Haifa regions in February 2021. In these occurrences, most mutational positions associated with the B.1.1.7 have elevated frequencies, including the spike protein mutation N501Y and the ORF1 protein deletions 3675-3677, which are associated with multiple VOC. Indeed, the B.1.1.7  was first identified in Israel via clinical sampling and sequencing in December 2020. By January it was already widespread in~30% of sequenced random samples, and in February it had spread to 60.5% (Israel ministry of

Discussion
Investigation of SARS-CoV-2 spread and identification of variants in wastewater has been demonstrated to accurately detect prevalence of viral strains. The approach is based on pooled samples and can only detect mutations per each position in the genome rather than strains per individual. Nevertheless, it holds numerous advantages over human clinical sampling, mainly in terms of ethical issues and population catchment, where only few wastewater samples provide an overall view of viral diversity compared to thousands of samples in humans. Herein, we utilized an established nationwide system of wastewater sampling and viral concentration approaches to perform large-scale surveillance of SARS-CoV-2 variants via next generation sequencing of SARS-CoV-2 genomes, sampled each month in 9 different WWTP across Israel, from August 2020 to February 2021.
Sequencing of SARS-CoV-2 from wastewater was successful, with high coverages of the genome obtained for most samples and sufficient depth to call mutations, despite the high Cq values that are characteristic of such samples. These results facilitated robust surveillance of variants throughout Israel, including regions in which low morbidity has been reported by clinical sampling.
The first detection of the B.1.1.7 variant in Israel, under the national program for SARS-CoV-2 sequencing that was recently established to track VOC in clinical samples, was in the end of December 2020. By February 2021, the variant was identified in 60.5% of randomly sequenced samples (Israel Ministry of Health, data not published) and is estimated by models to dominate 80% of current infections (data not shown). Mutation analyses of SARS-CoV-2 in wastewater samples successfully identified the penetration of the B.1.1.7 variant into Israel in December 2020 in the Shafdan and Tzfat regions, and its spread into additional regions by February 2021, corresponding with clinical sampling results. Some areas show later penetrance of the B.1.1.7 variant due to different reasons, such as being located remotely (e.g. ElHamra) or a missing sampling point (e.g. Natanya in February). Moreover, the identification by wastewater sampling of the B.1.1.7 variant in Tzfat in December 2020, a region in which non-sufficient clinical sampling was available, exemplifies that surveillance by wastewater is a robust approach that can cover large areas by few samples, alerting ahead regarding the penetrance and spread of VOC. Although, additional work is required to assess the correlations between clinical and wastewater sampling systems.
To summarize, this study illustrates the advantages in sequencing of wastewater, and its utilization as robust approach to monitor the diversity of SARS-CoV-2 strains circulating in a community, alerting regarding the emergence of epidemiologically or clinically relevant mutations or variants and aiding in public health decision making.

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.