City-wide wastewater genomic surveillance through the successive emergence of SARS-CoV-2 Alpha and Delta variants

Genomic surveillance of SARS-CoV-2 has provided a critical evidence base for public health decisions throughout the pandemic. Sequencing data from clinical cases has helped to understand disease transmission and the spread of novel variants. Genomic wastewater surveillance can offer important, complementary information by providing frequency estimates of all variants circulating in a population without sampling biases. Here we show that genomic SARS-CoV-2 wastewater surveillance can detect fine-scale differences within urban centres, specifically within the city of Liverpool, UK, during the emergence of Alpha and Delta variants between November 2020 and June 2021. Furthermore, wastewater and clinical sequencing match well in the estimated timing of new variant rises and the first detection of a new variant in a given area may occur in either clinical or wastewater samples. The study's main limitation was sample quality when infection prevalence was low in spring 2021, resulting in a lower resolution of the rise of the Delta variant compared to the rise of the Alpha variant in the previous winter. The correspondence between wastewater and clinical variant frequencies demonstrates the reliability of wastewater surveillance. However, discrepancies in the first detection of the Alpha variant between the two approaches highlight that wastewater monitoring can also capture missing information, possibly resulting from asymptomatic cases or communities less engaged with testing programmes, as found by a simultaneous surge testing effort across the city.


Introduction
Genomic surveillance has been a significant feature in the public health response to the SARS-CoV-2 pandemic (Zhu et al., 2020) because of its ability to detect the emergence of and track new variants of concern (VOC) (Robishaw et al., 2021). Important therapeutic response or reduced vaccine effectiveness (Robishaw et al., 2021). Thus, while vaccination currently provides substantial protection against all known VOC, continued genomic surveillance is essential to mitigate and contain their threat to public health. It informs the implementation and assessment of non-pharmaceutical interventions (e.g., social distancing, lockdowns, and regional, national, and international restrictions) and targeted surge testing. It also serves as an early warning system for the emergence and spread of novel variants (Tegally et al., 2021).
Genomic surveillance of SARS-CoV-2 has primarily been driven by whole genome sequencing of clinical isolates, typically using residual RNA from diagnostic RT-qPCR tests. One million SARS-CoV-2 genomes were sequenced worldwide by April 2021, rising to over seven million by January 2022 on the GISAID database (Elbe and Buckland-Merrett, 2017). This has provided unprecedented insight into the joint evolution and epidemiology of the SARS-CoV-2 pandemic (Harvey et al., 2021; Ward et al., 2021). Nevertheless, the cost of clinical sequencing to generate these data has been and continues to be substantial: 10-35 GBP per sample for consumables (Tyson et al., 2020), excluding similar costs for staff, logistics and data infrastructure. It may be unsustainable at the levels required to adequately inform public health authorities as SARS-CoV-2 becomes endemic and threatens public health for the foreseeable future, even in developed nations.
Wastewater-based surveillance is a complementary, cost-effective approach to clinical sequencing, which has gained significant attention throughout the COVID-19 pandemic (Jahn et al., 2021; Mishra et al., 2021; Peccia et al., 2020; Rios et al., 2021; Smyth et al., 2022). Given that SARS-CoV-2 is shed in faeces by more than 50% of infected people (Foladori et al., 2020), it can be recovered from wastewater, its RNA extracted, and its presence and quantity in a wastewater catchment determined using RT-qPCR (Farkas et al., 2020), with trends generally tracking the rise and fall of corresponding clinical cases (Peccia et al., 2020). This can be achieved for entire populations by sampling at the inlet of wastewater treatment plants, or at much finer spatial scales, such as across cities, by sampling within the sewer network.
More recently, the recovery of SARS-CoV-2 genomes from wastewater has opened up the possibility of detecting and tracking circulating SARS-CoV-2 variants (Jahn et al., 2021; Peccia et al., 2020; Rios et al., 2021; Smyth et al., 2022). Such an approach is particularly attractive for population-level insights during periods of high prevalence, especially if capacity constraints reduce the proportion of sequenced positive RT-qPCR tests. Furthermore, it can detect asymptomatic cases and is proposed to capture communities under-represented by clinical testing, particularly in urban centres (Green et al., 2021; Polo et al., 2020).
Nevertheless, moving from detecting and quantifying SARS-CoV-2 in wastewater by RT-qPCR to characterisation by genome sequencing is challenging. The low abundance of SARS-CoV-2 means enrichment through RNA concentration methods is necessary, simultaneously enriching PCR inhibitors and contaminating bacterial, viral, and human nucleic acids (Peccia et al., 2020). SARS-CoV-2 genomes in wastewater are also highly degraded and fragmented. In combination, this can result in poor and inconsistent amplification of target amplicons and, thus, patchy genome coverage. Even if amplification and sequencing are successful, data interpretation can be difficult. Wastewater harbours a mixed SARS-CoV-2 population. Therefore, sequences are derived from a pool of fragments, removing much of the phase information between polymorphic sites on the genome used to assign phylogeny and lineage. However, by reference against clinically-derived genomes of known SARS-CoV-2 lineages, wastewater data has the potential to detect and quantify polymorphisms characteristic of defined lineages and VOC in particular (Fontenele et al., 2021;Jahn et al., 2021).
Our study demonstrates the utility of wastewater-based genomic surveillance of SARS-CoV-2 using longitudinal data collected from multiple locations in a single city (Liverpool, UK) between November 2020 and June 2021. During this time, Liverpool was the subject of a pilot study evaluating lateral flow tests for rapid asymptomatic testing. This pilot noted the link between social inequalities and testing uptake, with social deprivation and digital exclusion as significant factors limiting uptake (Green et al., 2021). Wastewater-based epidemiology (WBE) can provide valuable insight into some of the communities or areas of Liverpool that may be less accessible to conventional testing. This period in the UK also saw the emergence and establishment of the Alpha (B.1.1.7) and, subsequently, the Delta (B.1.617.2) SARS-CoV-2 variants. We show that wastewater genomic surveillance reliably detected the emergence of both and their subsequent rise across a city.

Sample collection, concentration and RNA extraction
Wastewater grab samples (1 L per sample) were collected from eight locations across Liverpool's sewer network and from the main wastewater treatment plant (WWTP) at Sandon Docks between the 2nd of November 2020 and the 21st of June 2021, as part of the ongoing Environmental Monitoring for Health Protection programme (EMHP, part of NHS Test & Trace, now the UK Health Security Agency) in England (Fig. 1; see Table S1 for further catchment details). In addition, concurrent samples from four WWTPs in the southeast of England were collected as a control group between the 2nd of September 2020 and the 17th of January 2021. Samples were transported and subsequently stored at 4-6 °C until analysis, minimising RNA degradation. Within 24 h of collection, all samples were centrifuged (10,000 x g, 4 °C, 10 min) in sterile polypropylene tubes to remove suspended solids. The supernatant (50 mL) was transferred to 250 mL polycarbonate PPCO bottles containing 19-20 g of ammonium sulfate (Sigma-Aldrich, Cat. No. A4915). After the ammonium sulfate had dissolved, the samples were incubated at 4 °C for 1 h before further centrifugation (10,000 x g, 4 °C, 30 min) and supernatant removal. The pellet was resuspended in 200-500 μL of PBS.
Concentrates were stored at 4 °C until nucleic acid extraction. Nucleic acids were extracted from concentrates using NucliSens lysis buffer (BioMerieux, Marcy-l'Étoile, France, Cat. No. 280134 or 200292) and the NucliSens extraction reagent kit (BioMerieux, Cat. No. 200293), either manually or using the Kingfisher 96 Flex system (Thermo Scientific, Waltham, MA, USA) according to the manufacturer's instructions (Kevill et al., 2022), generating RNA extracts of 50-100 µL in volume. Extracts were stored at -80 °C until further processing. Genome copies per litre (gc/l) of wastewater were calculated using one-step RT-qPCR for the SARS-CoV-2 N1, Phi6 and MNV targets using an RNA Ultrasense One-step RT-qPCR system (Life Technologies, Carlsbad, CA, USA, Cat. No. 11732927), on a QuantStudio Flex 6 (Applied Biosystems Inc., Waltham, MA, USA) as previously described (Kevill et al., 2022). Data were not subsequently normalised by flow rate, chemical composition, etc., since we were interested in the contribution of a variant to the proportion of viral RNA in a sample, not absolute case numbers.
To aid the identification of VOC and variants under investigation (VUI) at low frequencies in wastewater samples, we adopted a recently described amplicon-level co-occurrence approach (Jahn et al., 2021). Briefly, co-occurring mutations were called from BAM files using CoOccurrence adJusted Analysis and Calling (COJAC) (Jahn et al., 2021). COJAC identifies signature mutations that co-occur on the same sequencing read, that is, a read or read pair derived from the same amplicon and thus from one SARS-CoV-2 virion. This greatly improves confidence in variant detection, especially at low frequencies, since co-occurring mutations are less likely to arise through sequencing error than individual SNPs (Jahn et al., 2021). We extracted co-occurrence signature mutations of the B.1.1.7 (VOC-20DEC-01, Alpha) and B.1.617.2 (VOC-21APR-02, Delta) lineages. Since several signature mutations are shared amongst VOC/VUI, not all variants have a unique set of co-occurring mutations. B.1.1.7 has unique pairs of co-occurring mutations on amplicon 146 (genome positions 27972 (Q27*) and 28048 (R52I)) and amplicon 147 (genome positions 28111 (Y73C) and 28280 (D3L)), while B.1.617.2 has only one pair of co-occurring mutations, on amplicon 121 (genome positions 22917 (L452R) and 22995 (T478K)), and this pair is not unique, i.e., it is shared with other variants.
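The co-occurrence idea can be illustrated with a minimal Python sketch. This is not COJAC's implementation: the read representation (a dictionary from genome position to a flag saying whether the read carries the signature mutation) and the function name `cooccurrence_counts` are hypothetical and chosen only for illustration. The two positions are those given above for amplicon 147.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical read representation (an assumption, not COJAC's data model):
# genome position -> True if the read carries the signature mutation there,
# False if it carries the reference allele. Positions missing from the dict
# are not covered by the read.
Read = Dict[int, bool]


def cooccurrence_counts(reads: Iterable[Read], pos_a: int, pos_b: int) -> Tuple[int, int]:
    """Return (reads carrying both mutations, reads covering both positions)."""
    both = 0
    covering = 0
    for read in reads:
        if pos_a in read and pos_b in read:
            covering += 1
            if read[pos_a] and read[pos_b]:
                both += 1
    return both, covering


# Toy reads spanning amplicon 147 (positions 28111 and 28280).
reads = [
    {28111: True, 28280: True},    # both signature mutations on one read
    {28111: True, 28280: False},   # a single mutation is not a co-occurrence
    {28111: False, 28280: False},  # reference allele at both sites
    {28111: True},                 # second site not covered, so not informative
]
both, covering = cooccurrence_counts(reads, 28111, 28280)
print(both, covering)
```

Only reads that cover both sites enter the denominator, which is why a signal based on a linked pair is harder to produce by independent sequencing errors than a signal based on a single SNP.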

Statistical analyses
We used R version 4.1.1 (R Core Team, 2021) for all statistical analyses and ggplot2 (Wickham, 2016) for visualisations.
Prior to analysis, all unique signature mutations (SNPs/indels) of a given variant (Fig. 2) were identified in each sample. Mutations with a read depth <10 were removed, and frequencies of 1.0 and 0 were rescaled to 0.99 and 0.01, respectively, for beta regression compatibility. We modelled the relationship of the mean frequency of each variant's signature SNPs/indels with location (i.e. differing network sites) and time during respective variant emergences with beta linear regression (betareg, "betareg" v.3.1.4; Cribari-Neto and Zeileis, 2010), given that allele frequencies lie in the standard unit interval [0, 1]. To do so, we set the mean frequency of the unique signature SNPs/indels of a given variant as the dependent variable and wastewater site, date, and their interaction as predictor variables. All models were fit by maximum likelihood using the logit link function, logit(p), with p the probability of observing the (variant) data and logit being the inverse of the standard logistic function. Models included site as an additional regressor for the precision parameter when it improved the model fit (see Table S4 for final model structures), as indicated by the Akaike Information Criterion (AIC) and likelihood ratio tests (Cribari-Neto and Zeileis, 2010). To account for missing data in SNP/indel frequencies, a weighting factor was applied using the number of used signature SNPs/indels (weights = n). We assessed model validity by visual checks of homoscedasticity of the standardised weighted residuals and linearity of the model fit (Fig. S2). We then extracted likelihood ratio tests of estimated marginal means for each predictor variable (joint_tests, "emmeans", Table S4).
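The per-sample preparation described above (depth filter, boundary rescaling, mean frequency and weight n) can be sketched as follows. The analysis itself was run in R; this Python sketch is illustrative only. The function name `summarise_sample` and the input layout are hypothetical, and applying the rescaling to each mutation before averaging is an assumption, since the text does not state the order of operations.

```python
from statistics import mean
from typing import List, Optional, Tuple

MIN_DEPTH = 10          # mutations with read depth < 10 are removed (from the text)
LOW, HIGH = 0.01, 0.99  # replacements for frequencies of 0 and 1.0 (from the text)


def summarise_sample(calls: List[Tuple[float, int]]) -> Optional[Tuple[float, int]]:
    """Summarise one sample for one variant.

    calls: one (frequency, read depth) pair per signature mutation.
    Returns (mean frequency, number of mutations used), or None when no
    mutation passes the depth filter.
    """
    kept = []
    for freq, depth in calls:
        if depth < MIN_DEPTH:
            continue  # too few reads to trust this frequency
        # Assumption: boundary values are rescaled per mutation so that the
        # response stays strictly inside (0, 1), as beta regression requires.
        if freq <= 0.0:
            freq = LOW
        elif freq >= 1.0:
            freq = HIGH
        kept.append(freq)
    if not kept:
        return None
    return mean(kept), len(kept)


# Toy sample: four signature mutations, one below the depth threshold.
sample = [(0.0, 50), (1.0, 30), (0.5, 8), (0.25, 12)]
print(summarise_sample(sample))
```

The second element of the result corresponds to the weight n, so a sample whose mean rests on a single mutation contributes less to the fit than one supported by many.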
We also compared the frequency of detected VOC/VUI signature SNPs/indels in wastewater samples to the frequency of VOC/VUI identified in clinical cases by the COVID-19 Genomics (COG-UK) Consortium between the 2nd of November 2020 and the 21st of June 2021 across Liverpool. We extracted counts of genomically confirmed cases for all circulating lineages from the CLIMB platform (Nicholls et al., 2021) on the 26th of October 2021 and then filtered and grouped them by the outer postcodes covered by the catchment areas of the WWTP and the eight sewer network sites. For the Delta variant, confirmed clinical cases of the B.1.617.2 lineage and its subvariant AY.4 were combined. Where outer postcodes spanned multiple wastewater catchments, we included clinical cases in counts for all those sites, divided by the number of overlapping wastewater catchments. Additionally, we obtained total daily infection numbers for the upper-tier local authority of Liverpool from UK Government statistics (https://coronavirus.data.gov.uk, Fig. S5).
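The apportioning of clinical cases from postcodes that span several catchments can be sketched in Python as below. The postcode and site labels are placeholders, the counts are invented toy values, and the function name `apportion_cases` is hypothetical; only the rule of dividing by the number of overlapping catchments comes from the text.

```python
from collections import defaultdict
from typing import Dict, List


def apportion_cases(cases_by_postcode: Dict[str, int],
                    sites_by_postcode: Dict[str, List[str]]) -> Dict[str, float]:
    """Split each postcode's case count equally across the catchments it overlaps."""
    totals: Dict[str, float] = defaultdict(float)
    for postcode, n_cases in cases_by_postcode.items():
        sites = sites_by_postcode.get(postcode, [])
        for site in sites:
            # A postcode overlapping k catchments adds n_cases / k to each.
            totals[site] += n_cases / len(sites)
    return dict(totals)


# Placeholder labels and toy counts, not real Liverpool data.
cases = {"PC1": 12, "PC2": 9, "PC3": 5}
catchments = {
    "PC1": ["site_A"],
    "PC2": ["site_A", "site_B", "site_C"],
    "PC3": ["site_B"],
}
print(apportion_cases(cases, catchments))
```

Equal splitting keeps the city-wide total unchanged, although it implicitly assumes cases are spread evenly across the overlapping catchments.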
We modelled the frequency change of variants in clinical data over time with beta linear regression in the same way as for wastewater variant frequencies. To test the time match between a respective variant's frequency in wastewater and clinical samples, we used Spearman's rank correlation with a series of possible time lag settings to find the time frame shift with the best match for each sampling area.
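A minimal Python sketch of such a lag scan is given below. It is not the authors' R code. The series layout (dictionaries keyed by day number), the function names, the minimum of three overlapping days and the sign convention (a positive shift means wastewater leads clinical data) are all assumptions made for illustration; Spearman's correlation is computed here as the Pearson correlation of average ranks.

```python
from typing import Dict, List, Optional, Tuple


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out


def spearman(x: List[float], y: List[float]) -> Optional[float]:
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None  # undefined when one series is constant
    return cov / (vx * vy) ** 0.5


def best_lag(wastewater: Dict[int, float], clinical: Dict[int, float],
             max_shift: int = 5) -> Optional[Tuple[int, float]]:
    """Scan shifts from -max_shift to +max_shift days and keep the highest rho.

    Assumed convention: shift s pairs wastewater day d with clinical day d + s,
    so s > 0 means the wastewater signal leads the clinical signal by s days.
    """
    best = None
    for shift in range(-max_shift, max_shift + 1):
        pairs = [(w, clinical[d + shift]) for d, w in wastewater.items()
                 if d + shift in clinical]
        if len(pairs) < 3:
            continue  # too few overlapping days to rank meaningfully
        rho = spearman([p[0] for p in pairs], [p[1] for p in pairs])
        if rho is not None and (best is None or rho > best[1]):
            best = (shift, rho)
    return best


# Toy data: the wastewater series is the clinical series seen two days early.
clinical = {d: ((d * 7) % 13) / 13 for d in range(30)}
wastewater = {d: clinical[d + 2] for d in range(28)}
print(best_lag(wastewater, clinical))
```

Because the correlation uses ranks, the scan is insensitive to differences in scale between wastewater and clinical frequencies and responds only to how well their ordering over time agrees.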

The rise of Alpha variant
Across all catchments, we observed a significant increase in the mean frequency of Alpha (B.1.1.7) signature SNPs/indels between the 2nd of November 2020 and the 28th of February 2021 (F 1 = 13667, P < 0.001, Fig. 3, Table S4). This closely corresponds with the observed rise in Alpha clinical cases across Liverpool for the same period (Fig. 4) and wastewater data from four WWTPs in the southeast of England (F 1 = 13829, P < 0.001, Table S4, Fig. S3). For most sites, the rise of the Alpha variant began in mid to late December, with peak frequencies observed in late January and early February (Figs. 3 and 4). As defined by co-occurrence analysis, the earliest wastewater Alpha variant detection preceded clinical detections in five of the nine sites by up to 55 days (Fig. 4). The contrary was observed in the remaining four sites, with clinical samples picking up Alpha up to 26 days earlier (Fig. 4). The best time match between Alpha frequencies in wastewater and clinical samples also depended on catchment. The closest match varied from a 5-day lead to a 2-day lag in the wastewater when testing a range of 5-day lag to 5-day lead of wastewater frequencies (Fig. 4, Table S5). However, this does not match the relative pattern of the earliest detection in the two sample types.
Figure caption (partial): ... (Table S4). Point shape indicates the number of unique Alpha-specific mutations used in the mean calculation for a given sample: empty circles: 1 mutation, crossed square: 2 to 5 mutations, filled circles: >5 mutations.
We detected local differences in the rise of the Alpha variant between wastewater catchments (date: site interaction, F 8 = 32.8, P < 0.001, Table S4). This was most notable in Strand SSO (STS), where we observed a high frequency of Alpha signature SNPs from four samples in early November (Fig. 3), though this signal diminished before further detections in early January. Similarly, we observed Alpha signature SNPs at a low to moderate frequency at Fazakerley High (FZH) as early as mid-November, while they were barely detected until late December in Mersey Road (MRD, Fig. 3). This suggests Alpha spread through parts of the north of the city earlier than through the south (Fig. 1), a finding corroborated by co-occurrence analysis but not clinical data (Fig. 4).

The decline of Alpha and rise of Delta variant
From the 15th of March to the 26th of June 2021, we observed a significant increase in the frequency of Delta signature SNPs in all wastewater catchments (F 1 = 964.1, P < 0.001, Table S3, Fig. 5), with variation in temporal trends across the city (date: site interaction, F 8 = 32.4, P < 0.001, Table S4, Fig. 5). This coincides with a rise in clinical cases of the same variant (B.1.617.2 and AY.4, Fig. S4) and Alpha's decline (Figs. 3 and 4). The best time match between Delta frequencies in wastewater and clinical samples again varied by catchment from a 4-day lead to a 5-day lag in the wastewater. In all catchments, a significant correlation of frequencies in wastewater and clinical samples was confirmed (Fig. S4, Table S6).
It is noteworthy that the observed transition from Alpha to Delta in wastewater was abrupt (Figs. 4, 5 and S4). From April to early June, infection numbers were low across Liverpool (Fig. S5), and wastewater SARS-CoV-2 concentrations were consequently low (Fig. S6). This is reflected in the observed reduction in mapped reads and genome coverage for this period (Fig. S1). Indeed, the detection of Alpha and Delta signature SNPs was more sporadic during this period (Figs. 4, 5 and S4). Lower SARS-CoV-2 concentrations, and the associated lower data quality, probably also contribute to the lower estimates of Alpha frequency in wastewater relative to clinical data from around March 2021.
Co-occurrence analysis was less discriminatory for Delta than Alpha and, thus, a less reliable indicator of its presence in wastewater catchments. We note an apparent co-occurrence signal for Delta, i.e., the detection of L452R and T478K co-occurring on amplicon 121 of our primer panel, as early as November for all wastewater catchments (Fig. S4). This was too early to be Delta, its sub-lineages (e.g., AY.2 and AY.3) or known VUI B.1.629, B.1.630 and B.1.633 (first detected globally in February and March 2021), which also carry this pair of co-occurring mutations. The detected signal, thus, must reflect other circulating variants carrying these SNPs or false positives due to reliance on co-occurrence in a single amplicon.

Discussion
Genomic surveillance of wastewater has already shown great promise throughout the unfolding SARS-CoV-2 pandemic (Farkas et al., 2020; Polo et al., 2020), including the detection of VOC (Fontenele et al., 2021), in some cases prior to clinical detection (Jahn et al., 2021). Here, we have demonstrated that wastewater monitoring can also reveal fine-scale, local differences in the spread of VOC across urban centres. Spatiotemporal differences in variant frequencies across Liverpool were recoverable throughout the rise of the Alpha variant (B.1.1.7) in early winter 2020 and, despite lower quality data, the rise of the Delta variant (B.1.617.2/AY.4) in spring 2021. This clear reflection of the rise of Alpha and Delta variants, respectively, in clinical and wastewater genomic data, demonstrates the reliability of this approach.
Figure caption (partial): Points show the mean frequency of unique Alpha-specific mutations for a given wastewater sample (blue) and the frequency of Alpha variant clinical cases from a given date (yellow). Coloured lines show the respective local polynomial regression fit including shaded 95% confidence intervals. Vertical lines indicate the first confirmed clinical case of Alpha variant (yellow) and the first wastewater detection of co-occurring Alpha-specific mutations on amplicon 147 (blue). The strongest correlation between a 5-day lead and 5-day lag of wastewater data respective to clinical data is reported at the bottom of each graph.
As seen for the Alpha variant here and by Jahn et al. (2021), genomic surveillance of wastewater can detect VOC earlier than clinical testing. In both instances, co-occurrence analysis improved confidence in (early) low-frequency variant detection by identifying multiple linked mutations from the same virion instead of solely relying on single signature mutations. This requires the co-occurrence of mutations unique to a given variant on amplicons of the used sequencing scheme. When no unique mutation set is available, as for the Delta variant and the NimaGen SARS-CoV-2 whole-genome sequencing kit used here, reliable variant detection via co-occurrence analysis is not possible. The software developers have acknowledged this limitation, and the design of primers to create appropriate co-occurrence amplicons for relevant sequencing schemes is suggested as a workaround (Jahn et al., 2021). It is important to note that even in cases where co-occurrence analysis is applicable, our fine-scale local data highlighted that wastewater monitoring sometimes detects new variants earlier than clinical testing, but not always. The reasons for this are yet unclear. It is likely that the inherent variability of wastewater detections, due to variations in viral shedding rates and dilution from rainfall (Polo et al., 2020), and the increasing stochasticity of clinical detection with decreasing population size play a role. It is also worth noting that these samples were almost entirely grab samples, which likely sample fewer clinical infections than composite samples. Certainly, the relationships between population size, wastewater flow variation and SARS-CoV-2 variant detection warrant further investigation.
The mixed pattern of wastewater Alpha variant detections preceding confirmed clinical cases in some parts of Liverpool, yet vice versa in others, highlights the complementarity of the approaches. Wastewater monitoring has the notable advantages of being more cost-effective per unit of population and less biased by testing frequencies in different communities (Polo et al., 2020), while sequencing of clinical samples provides greater specificity and the opportunity for contact tracing. Indeed, our finding that the Alpha variant was detected in wastewater in North Liverpool much earlier than clinical cases had indicated corresponds well with findings from a large-scale asymptomatic testing campaign, which found that testing uptake was lower in North Liverpool, yet the rate of positive tests was higher (Green et al., 2021). Clearly, a combination of genomic surveillance of clinical cases and wastewater is most likely to detect new variants as early as possible and provides the most precise picture of unfolding variant dynamics to inform public health measures.
Intriguingly, when comparing peak Alpha and Delta variant frequencies in corresponding clinical and wastewater data, we find that each variant, in turn, reaches complete dominance in clinical but not wastewater samples, with estimated maximum frequencies of 100% and ~75% in clinical and wastewater samples, respectively. Correspondingly, correlations between clinical and wastewater data appear stronger in sampling areas with higher maximum frequency estimates in wastewater. It is notable that viral concentrations were low from March to May 2021, which corresponded to public health restrictions in the UK to contain infection numbers, and which led to lower sequence data quality for these samples. Improved viral concentration methods may mitigate this limitation (Kevill et al., 2022). Equally, better statistical methods may be required to estimate lineage frequencies from pooled sequencing data, as produced from wastewater samples (Amman et al., 2022; Karthikeyan et al., 2022). Here we have relied on a relatively crude estimation by taking the mean of signature SNP/indel frequencies. However, genetic variation may mean that a SNP/indel may not be present on all branches within a lineage, whereas a single fully phased viral genome from a clinical sample would be reliably assigned to a lineage. Following methods for analysing metagenomic amplicon data (Grubaugh et al., 2019; Quince et al., 2021; 2011), the development of statistical methods to infer lineage proportions from multiple amplicons while controlling for sequencing error may be productive.
Figure caption (partial): ... (Table S4). Point shape indicates the number of unique Delta-specific mutations used in the mean calculation for a given sample: empty circles: 1 mutation, crossed square: 2 to 5 mutations, filled circles: >5 mutations.
The limited quality of sequences obtained from wastewater during April and May 2021 also highlights the current limits of variant detection via wastewater sequencing when case numbers, and hence SARS-CoV-2 concentrations, are low. While the rise of Delta was evident in our results (Fig. 5), the transition from Alpha to Delta, compared to the gradual emergence of Alpha, was less visible and more abrupt. If new, more transmissible variants are associated with more rapid emergence and a quicker rise in frequency, this would highlight the need to develop more sensitive and accurate methods of variant detection and quantification within wastewater. It is, however, anticipated that the increased adoption of wastewater-based epidemiology will drive innovation in wastewater sampling, concentration and RNA extraction, improving viral qPCR and sequencing sensitivity (Kevill et al., 2022; Polo et al., 2020).

Conclusions
• We show that wastewater genomic sequencing can reliably detect the emergence and rise of new SARS-CoV-2 variants.
• Variant frequency estimates from wastewater sequencing correspond well with those obtained through genomic sequencing of clinical samples.
• In some cases, variants are observed in wastewater before clinical detections, which may be particularly useful in areas or communities with low testing uptake.

Author contributions
MRB and SP conceived and designed the study. ARJ developed laboratory methods for data collection. JLK, KF, EC and CW collected the wastewater data. COG-UK provided clinical and community testing data. FSB and MRB processed and analysed the data. IB, HB, MSK, RvA and SP provided bioinformatic support for processing the data. MJW, DLJ and SP provided supervision of the work. FSB, MRB, and SP drafted the manuscript. All authors reviewed and edited the manuscript. All authors approved the final version of the report.

Data accessibility statement
Wastewater sequencing data is publicly available on the European Nucleotide Archive under Study ID PRJEB53325 (ERP138109). The clinical case data used in this study are visualised at https://www.cogconsortium.uk/tools-analysis/public-data-analysis-2/. A filtered, privacy-conserving version of the lineage-LTLA-week dataset is publicly available online (https://covid19.sanger.ac.uk/downloads) and gives access to almost all used data, despite a small number of cells having been suppressed to conserve patient privacy.

Ethics statement
Use of surplus nucleic acid derived from routine diagnostics and associated patient data was approved through the COG-UK consortium by the Public Health England Research Ethics and Governance Group (R&D NR0195).

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability
Data is available according to the Data Availability statement in the manuscript