Novel Nested-Seq Approach for SARS-CoV-2 Real-Time Epidemiology and In-Depth Mutational Profiling in Wastewater

Considering the lack of effective treatments against COVID-19, wastewater-based epidemiology (WBE) is emerging as a cost-effective approach for real-time population-wide SARS-CoV-2 monitoring. Here, we report novel molecular assays for sensitive detection and mutational/variant analysis of SARS-CoV-2 in wastewater. Highly stable regions of SARS-CoV-2 RNA were identified by RNA stability analysis and targeted for the development of novel nested PCR assays. Targeted DNA sequencing (DNA-seq) was applied for the analysis and quantification of SARS-CoV-2 mutations/variants, following hexamers-based reverse transcription and nested PCR-based amplification of targeted regions. Three-dimensional (3D) structure models were generated to examine the predicted structural modification caused by genomic variants. WBE of SARS-CoV-2 revealed to be assay dependent, and significantly improved sensitivity achieved by assay combination (94%) vs. single-assay screening (30%–60%). Targeted DNA-seq allowed the quantification of SARS-CoV-2 mutations/variants in wastewater, which agreed with COVID-19 patients’ sequencing data. A mutational analysis indicated the prevalence of D614G (S) and P323L (RdRP) variants, as well as of the Β.1.1.7/alpha variant of concern, in agreement with the frequency of Β.1.1.7/alpha variant in clinical samples of the same period of the third pandemic wave at the national level. Our assays provide an innovative cost-effective platform for real-time monitoring and early-identification of SARS-CoV-2 variants at community/population levels.


Introduction
A cluster of severe pneumonia cases of unknown origin, linked to the Huanan seafood wholesale market in Wuhan, Hubei Providence, China, were reported by the Chinese Health Authorities on 31 December 2019. Sequencing-based analysis of lower respiratory tract samples (bronchoalveolar lavage fluid) identified a novel beta-coronavirus sharing >85% sequence similarity with a bat severe acute respiratory syndrome (SARS)-like coron-Bioinformatic analysis with the ScanFold server provided an informative overview regarding the highly predicted stable regions of SARS-CoV-2 RNA. Specifically, the analyses led to the identification of 300 nt sequences bearing negative z-scores, which were predicted to possess low MFE native values, thus, being more stable. Sequences with notable negative z-scores were found to distribute across the SARS-CoV-2 genome ( Figure  1).

Figure 1.
Schematic illustration of the in silico SARS-CoV-2 RNA stability analysis with the ScanFold algorithm, indicating the highly predicted regions targeted by our nested PCR/real-time PCR detection assays (NSP3, helicase, ORF3a, and N detection assays).
Using the CDC/2019-nCoV_N1-based assay, detection of SARS-CoV-2 RNA was achieved in up to five cDNA copies/PCR reaction (weak PCR product). The application of the nested PCR/real-time PCR assays improved the detection performance in up to two cDNA copies/PCR reaction for N, helicase, and ORF3a assays. Furthermore, the NSP3 assay detected SARS-CoV-2 RNA in up to five cDNA copies/PCR reaction ( Figure S1). Similar results regarding the performance and the limit of detection (LOD) on the SARS-CoV-2 RNA control sample were highlighted by the nested real-time PCR assays ( Figure S2). Figure 1. Schematic illustration of the in silico SARS-CoV-2 RNA stability analysis with the ScanFold algorithm, indicating the highly predicted regions targeted by our nested PCR/real-time PCR detection assays (NSP3, helicase, ORF3a, and N detection assays).
To investigate the impact of our novel assays on SARS-CoV-2 detection sensitivity, serial dilutions of SARS-CoV-2 complete genome RNA control, covering nine orders of magnitude (from 10 3 to 2.5 RNA copies/reverse transcription) were analyzed by the novel nested PCR/real-time PCR assays, and a newly in-house developed assay using the CDC proposed "2019-nCoV_N1" set of primers and probe (CDC/2019-nCoV_N1-based assay) (https://www.cdc.gov/coronavirus/2019-ncov/lab/rt-pcr-panel-primer-probes. html accessed on 1 July 2020).
Using the CDC/2019-nCoV_N1-based assay, detection of SARS-CoV-2 RNA was achieved in up to five cDNA copies/PCR reaction (weak PCR product). The application of the nested PCR/real-time PCR assays improved the detection performance in up to two cDNA copies/PCR reaction for N, helicase, and ORF3a assays. Furthermore, the NSP3 assay detected SARS-CoV-2 RNA in up to five cDNA copies/PCR reaction ( Figure S1). Similar results regarding the performance and the limit of detection (LOD) on the SARS-CoV-2 RNA control sample were highlighted by the nested real-time PCR assays ( Figure S2).

Effective SARS-CoV-2 Detection in Wastewater Requires Amplification of Multiple Targets
Our novel nested PCR/real-time PCR assays along with the CDC/2019-nCoV_N1based assay were applied to 30 wastewater samples (S1-S30) obtained from Wastewater Treatment Plant (WWTP) of Athens from August (n = 2), September (n = 16), and October (n = 12) 2020. The cycle threshold (C T ) values of the positive samples per assay are presented in Table 1.  (13/30, 43.3%). Among the 17 positive samples, the CDC/2019-nCoV_N1-based assay was positive in five samples (sensitivity 29.4%). In this study, the developed novel nested real-time PCR assays resulted in the following detection of SARS-CoV-2 RNA: (a) N assay, 10 samples (sensitivity 58.8%); (b) NSP3 assay, 9 samples (sensitivity 52.9%); (c) helicase assay, 7 samples (sensitivity 41.2%); and (d) ORF3a assay, 5 samples (sensitivity 29.4%). As expected, the application of the novel nested real-time PCR assays resulted in significantly lower CT values vs. the CDC/2019-nCoV_N1-based assay (Table 1), thus, highlighting their improved analytical performance. The analysis of the samples with the nested PCR assays (end-point analysis) led to the same results for most of the samples ( Figure 2). Specifically, the agreement of nested PCR/real-time PCR results was 100% for N and ORF3a assays. A notable discrepancy was observed for one sample (S20), which was found negative by the NSP3 nested PCR assay, as well as for three samples (S1, S8, and S15) which showed a weak positive signal in the helicase nested PCR assay as compared with the same nested real-time PCR assay. In this regard, the overall agreement of nested PCR/real-time PCR assays was 96.7%. Most importantly, in the majority of the positive samples (10/17, 58.8%), SARS-CoV-2 RNA was detected by a single assay, while five (5/17, 29.4%) and one (1/17, 5.9%) samples were scored positive by three or four assays, respectively. In this regard, a significantly improved detection specificity was achieved by the following combinations: (a) N and NSP3 assays, 14/17, sensitivity 82.4%; (b) N, NSP3, and helicase assays, 16/17, sensitivity 94.1%. Our results clearly highlight the significantly improved sensitivity by the combination of more than one assays, and that the detection of SARS-CoV-2 in wastewater is genomic region dependent, as different results are being obtained by targeting different regions of SARS-CoV-2 genome.

Mutational Analysis of SARS-CoV-2 in Wastewater Using Targeted DNA-seq
Five well-characterized missense mutations, D614G (23403A>G)-S gene, Q57H (25563G>T)-ORF3a gene, P323L (14408C>T)-ORF1ab/RdRP gene, R203K (28881G>A)-N gene, and G204R (28883G>C)-N gene, were initially targeted as proof-of-principle of our methodology to perform the analysis and quantification of SARS-CoV-2 mutations/strains in wastewater samples. To perform DNA-seq, novel nested PCR assays, against the above-mentioned point mutations, were designed and applied in September Most importantly, in the majority of the positive samples (10/17, 58.8%), SARS-CoV-2 RNA was detected by a single assay, while five (5/17, 29.4%) and one (1/17, 5.9%) samples were scored positive by three or four assays, respectively. In this regard, a significantly improved detection specificity was achieved by the following combinations: (a) N and NSP3 assays, 14/17, sensitivity 82.4%; (b) N, NSP3, and helicase assays, 16/17, sensitivity 94.1%. Our results clearly highlight the significantly improved sensitivity by the combination of more than one assays, and that the detection of SARS-CoV-2 in wastewater is genomic region dependent, as different results are being obtained by targeting different regions of SARS-CoV-2 genome.

Mutational Analysis of SARS-CoV-2 in Wastewater Using Targeted DNA-seq
Five well-characterized missense mutations, D614G (23403A>G)-S gene, Q57H (25563G>T)-ORF3a gene, P323L (14408C>T)-ORF1ab/RdRP gene, R203K (28881G>A)-N gene, and G204R (28883G>C)-N gene, were initially targeted as proof-of-principle of our methodology to perform the analysis and quantification of SARS-CoV-2 mutations/strains in wastewater samples. To perform DNA-seq, novel nested PCR assays, against the above-mentioned point mutations, were designed and applied in September (n = 5) and October/November (n = 8) 2020 samples. The PCR products were used to gener-ate DNA-seq barcoded libraries corresponding to September and October/November 2020 samples. The genomic variation profiling (list of existing SNVs or indels) of SARS-CoV-2 is summarized in Table 2, and visualized in Figures 3 and 4. The analysis highlighted that the G614 strain of SARS-CoV-2, originating from the D614G (23403A>G) point mutation in S gene, was exclusively detected (>99.9%) over the D614 strain. This finding is in line with the genomic data of the GISAID (https://www. gisaid.org/ accessed on 1 May 2021) and Nexstrain (https://nextstrain.org/sars-cov-2 accessed on 1 May 2021) databases, highlighting the significant prevalence of the G164 strain in percentage of >98% and~100% worldwide and in Europe, respectively. Similarly, the P323L (14408C>T) substitution in the ORF1ab/RdPR gene was also prevalent,~99.9%, in both time periods of sampling, also in agreement with the genomic epidemiology data (~99% worldwide and in Europe). Interestingly, the Q57H (25563G>T) mutation in the ORF3a gene was solely found observed in October/November samples at a percentage of 47%, which is in accordance with the growing trend observed worldwide during the last months (11/2020,~43%). In addition to the characterized D614G mutation, a previously unknown point mutation within S gene, H625R (23436A>G), was found at a frequency of 5.7% in September samples. The 23436A>G substitution results in the change of histidine to arginine at position 625 of the spike protein. Even though two similar amino acids are substituted, based on the in silico protein structure analysis, H625R leads to subtle alterations in the spike protein folding ( Figure 5). As a result, the H625R-mutant spike protein may exhibit differential biochemical properties, which should be further investigated, since they may have a severe impact on the functionality of the protein, making the virus more transmissible and/or infectious. Structures of Spike trimeric are publicly available from the Protein Data Bank (PDB) database (https://www.rcsb.org/ accessed on 01/05/2021). Moreover, a novel point substitution, A54V (25553C>T), was also detected at a percentage rate of ~9% of September samples, resulting in the change of alanine to valine at position 54 of the ORF3a polypeptide. The amino acids both represent aliphatic, nonpolar neutral residues, and thus are not expected to induce critical structural alterations in ORF3a polypeptide functionality.
Focusing on the N gene, a declining trend was observed for the missense point mutations 28881G>A (R203K) and 28883G>C (G204R), as well as the synonymous substitution 28882G>A, from ~90% in September to ~70% in October/November samples. Our data agree with the declining trend of the above-mentioned substitutions worldwide (09/2020, 45% and 11/2020, 28%), although their absolute percentage in Greek samples remain significantly higher. Interestingly, a point substitution 28884G>T, which has been observed in ~1% worldwide, was overrepresented in our datasets, 09/2020, ~70% and 10-11/2020, ~35%. More importantly, our analysis highlighted the significant correlation of 28884G>T and 28883G>C point mutations, resulting in a novel amino acid substitution from glycine In addition to the characterized D614G mutation, a previously unknown point mutation within S gene, H625R (23436A>G), was found at a frequency of 5.7% in September samples. The 23436A>G substitution results in the change of histidine to arginine at position 625 of the spike protein. Even though two similar amino acids are substituted, based on the in silico protein structure analysis, H625R leads to subtle alterations in the spike protein folding ( Figure 5). As a result, the H625R-mutant spike protein may exhibit differential biochemical properties, which should be further investigated, since they may have a severe impact on the functionality of the protein, making the virus more transmissible and/or infectious. Structures of Spike trimeric are publicly available from the Protein Data Bank (PDB) database (https://www.rcsb.org/ accessed on 1 May 2021). Moreover, a novel point substitution, A54V (25553C>T), was also detected at a percentage rate of~9% of September samples, resulting in the change of alanine to valine at position 54 of the ORF3a polypeptide. The amino acids both represent aliphatic, nonpolar neutral residues, and thus are not expected to induce critical structural alterations in ORF3a polypeptide functionality.
Focusing on the N gene, a declining trend was observed for the missense point mutations 28881G>A (R203K) and 28883G>C (G204R), as well as the synonymous substitution 28882G>A, from~90% in September to~70% in October/November samples. Our data agree with the declining trend of the above-mentioned substitutions worldwide (09/2020, 45% and 11/2020, 28%), although their absolute percentage in Greek samples remain significantly higher. Interestingly, a point substitution 28884G>T, which has been observed in~1% worldwide, was overrepresented in our datasets, 09/2020,~70% and 10-11/2020, 35%. More importantly, our analysis highlighted the significant correlation of 28884G>T and 28883G>C point mutations, resulting in a novel amino acid substitution from glycine to leucine in position 204 (G204L) as compared with the single 28883G>C (G204R) or 28884G>T (G204V) substitutions. Moreover, the sequencing data revealed the existence of a simultaneous 4nt deletion following position 28880 (28881_28884del) and a 4 nt insertion at position 28885 (28885_28886insACAT). These two simultaneous indels lead to the R203K and G204H missense mutations of the nucleocapsid protein. The findings obviously indicate that R203K co-exists with G204R, G204L, or G204H variations. Although these mutations are located on the linker region (LKR) of the nucleocapsid phosphoprotein, which spans from position 175-254aa, only R203K belongs to the LKR's crucial Ser/Arg (SR)-rich motif that contains putative phosphorylation sites [17]. Consequently, the absence of R203 residue is expected to have an immediate impact on the N protein folding and functionality, while any of the variations in position 204 of the protein (G204R, G204L, or G204H) lead only to subtle modifications ( Figure 5).
Finally, a significant growing trend from 7% in September to 27% in October/November samples was revealed for the missense mutation S194L (28854C>T), in line with a similar trend observed worldwide (09/2020, 13% and 11/2020, 21%). Similar to R203K, the S194L mutation is also located on the SR-rich motif of the N protein and involves substitution of the hydroxylic neutral serine with the aliphatic neutral leucine. Since this region regulates the N protein oligomerization upon phosphorylation, the S194L could have a significant impact on protein function; this notion is in accordance with the dramatic changes in the predicted protein structure, and therefore merits further study.    To further validate our methodology and to support the need for in-depth epidemiological analysis of existing and/or new emerging variants of SARS-CoV-2, we targeted the mutational analysis of the Spike (S) gene of SARS-CoV-2. In this regard, novel nested PCR assays, against the whole S gene (~4 Kb), were designed and applied in wastewater samples obtained daily in March 2021. The PCR products were used to generate a DNA-seq barcoded library corresponding to March 2021. The analysis led to the conduct of 3.13 million sequencing reads ( Figure S3), with a median read length of 307 bp. The findings of these analyses are summarized in Table 3 and Figure 6.

Discussion
Considering the lack of effective treatments against COVID-19 [6], accurate, massive, and representative at the community level screening along with in-depth epidemiological analysis of existing or new emerging variants, are currently the only ways for an evidencebased approach of applying restriction measures in the near future. To this end, WBE of SARS-CoV-2 has been raised as a modern approach for real-time population-wide surveillance of SARS-CoV-2. Indeed, concentrations of SARS-CoV-2 in wastewater seemingly correlate with COVID-19 infection rates, and precede epidemic expansion and molecular testing at general population levels [14,18]. Moreover, SARS-CoV-2 genetic analysis in human samples could provide a severity-stratification tool for COVID-19 patients, as well as a much-needed approach for the early identification of newly emerging SARS-CoV-2 variants; nonetheless, these approaches remain particularly costly and time-consuming. Thus, mutational profiling of SARS-CoV-2 in wastewater could represent an innovative, cost-effective approach for the monitoring of existing variants and an early-warning system for new emerging ones, at community/population levels.
The current analytical protocols for WBE of SARS-CoV-2 suffer from reduced sensitivity and false negative results, due to, for example, low viral loads, RNA degradation, and PCR inhibition [16]. To overcome these drawbacks, we performed RNA stability analysis of the SARS-CoV-2 RNA genome and identified highly predicted stable regions. This knowledge was exploited for the design of novel in-house methods that combining random hexamers-based reverse transcription and nested PCR/real-time PCR amplification against four highly stable regions of SARS-CoV-2 RNA. The evaluation of our novel assays highlighted the improved LOD (up to two copies/PCR reaction) as compared with one-step RT-PCR methods [19,20], and the significantly improved sensitivity as compared with in-house CDC/2019-nCoV_N1-based assay. Interestingly, more than half of the positive samples were detected by using only one assay, highlighting the anticipated on-going degradation of SARS-CoV-2 RNA in wastewater and clearly demonstrates that SARS-CoV-2 RNA detection in wastewater is genomic region dependent.
Previously, nested PCR approaches have been successfully applied for the detection of SARS-CoV-2 in wastewater [21][22][23][24]. We have documented, here, that the detection of SARS-CoV-2 in wastewater is assay dependent. In this regard, the reduction of false negative results in WBE requires the targeting of more than one SARS-CoV-2 genomic region. Based on our findings, the detection sensitivity of a single assay ranged from~30% to 60%, while the combination of two or three assays improved sensitivity to 82% and 94%, respectively.
Mutation rates and genetic diversity of RNA viruses are significantly high, providing selective advantage to evolve and adapt to dynamic changes of environments and hosts [25]. Despite the presence of genetic proofreading machinery in SARS-CoVs [26], genetic diversity of SARS-CoV-2 is ever-growing, highlighting the unique position of CoVs in the RNA virus world. Thus, specific amino acid changes in SARS-CoV-2 encoded polypeptides could alter SARS-CoV-2 life cycle, infectivity, and/or antigenicity resulting in weakening the on-going vaccination programs and COVID-19 treatment efficacy.
As the SARS-CoV-2 Spike protein prevails as the main target of COVID-19 vaccines [27], mutations of S gene are frequently reported and studied [8,28]. In this regard, the D614G (23403A>G) missense mutation, which was initially identified in Europe, has emerged as the dominant pandemic form, likely due to a significant fitness advantage. The D614G mutation has been strongly associated with higher upper track viral loads and higher rates of younger hosts' infection, as well as with increased replication and higher pseudotyped viral titers ex vivo [7,29,30]. Moreover, recent studies have confirmed the enhanced infectiveness and transmission of G614 strain in vivo [31,32]. Additionally, mutations of RNA-dependent RNA polymerase (RdRP), which is targeted by the anti-viral nucleoside analogues remdesivir [33] and favipiravir [34], are also in the spotlight. Interestingly, the P323L (14408C>T) mutation has been reported to co-evolve with D614G worldwide; this adaptation of the virus might strengthen SARS-CoV-2 G614 strain replication rates and infectivity [35].
In this regard, we have initially targeted five well-characterized missense mutations spanning different genomic regions of SARS-CoV-2, i.e., D614G (23403A>G), P323L (14408C>T), Q57H (25563G>T), R203K (28881G>A), and G204R (28883G>C), and specific nested PCR amplicons were sequenced using DNA-seq. This approach allowed the quantification of SARS-CoV-2 mutations/variations, and our data highlighted the significant prevalence (>99%) of D614G and P323L mutations in wastewater samples obtained from WWTP of Athens, Greece, during September-November 2020, in line with worldwide data based on COVID-19 patient samples. Additionally, the reported worldwide growing trend of the Q57H mutation was confirmed in our samples, with a percentage of~47% in samples collected during October/November 2020.
Interestingly, a previously unknown missense mutation in the S gene, H625R (23436A>G), was identified in~6% of September samples. Although H625R involves the substitution of two amino acids with positively charged polar side chains, the in silico structure analysis suggested significant changes in S protein folding. In this regard, specific monitoring of H625R along with other newly emerging mutants in COVID-19 patients and wastewater is necessary to conclude on their potential selective advantage and possible association with SARS-CoV-2 infectivity and effect on existing vaccines.
Focusing on N gene, the declining trend of 28881G>A, 28882G>A, and 28883G>C substitutions was confirmed in our samples. Interestingly, the 28884G>A substitution, which reportedly has a~1% prevalence worldwide, was overrepresented in Athen's samples (09/2020, 70% and 11/2020, 35%) and correlated significantly with 28883G>C, resulting in the G204L substitution of nucleocapsid protein. Moreover, a novel variant originating from a simultaneous 4 nt deletion (28881_28884del) and a 4 nt insertion at position 28885 (28885_28886insACAT), was observed in our samples, leading to the R203K and G204H missense mutations of nucleocapsid protein.
Finally, to facilitate the in-depth epidemiological analysis of existing and/or new emerging SARS-CoV-2 variants, we have expanded our methodology towards the mutational profiling of the whole S gene of SARS-CoV-2. The analysis of March 2021 samples as proof-of-principle of the methodology, highlighted that the B.1.1.7/alpha variant was dominant in wastewater samples from WWTP of Attica, a finding in accordance with the prevalence of B.1.1.7/alpha variant during the third pandemic wave in Greece.

SARS-CoV-2 RNA Stability Analysis
The in silico stability analysis of SARS-CoV-2 RNA was carried out with the ScanFold algorithm [36]. The Wuhan-Hu-1 reference genome (NC_045512.2) was analyzed with ScanFold-Scan, using a 300 nt window with a 150 nt nucleotide step size, resulting in 198 analyzed windows. Each window was analyzed using the RNAfold algorithm that is included in the ViennaRNA package. For each window the minimum free energy (MFE) ∆G • structure and value was predicted using the Turner energy model at 18 • C.

Wastewater Sampling
The 24 h composite influent wastewater samples were collected from the WWTP of Athens (serves a population equivalent of 5,200,000 inhabitants). Influent wastewater samples were collected in precleaned high-density polyethylene (HDPE) bottles, transported on ice to the laboratory and stored at 4 • C. All the collected samples were analyzed immediately after the arrival at the laboratory.

Sample Concentration and RNA Extraction
Sample concentration was performed immediately after arrival using Polyethylene Glycol 8000 (PEG 8000, Promega Corporation, Madison, WI, USA) precipitation. In particular, 50 mL of an influent wastewater was centrifuged at 4750× g for 30 min at 4 • C to remove debris, bacteria, and large particles. The supernatant was transferred in a clean centrifuge tube, containing 3.5× g PEG and 0.8 g NaCl, mixed at ambient temperature until completely dissolved, and centrifuged at 10,050× g for 2 h, at 4 • C. Most of the supernatant was discarded without disturbing the pellet and the tube was centrifuged at 10,050× g for 5 min, at 4 • C. Finally, the pellet was reconstituted by 500 µL nuclease-free water.

First-Strand cDNA Synthesis
Total RNA template from wastewater samples was reverse transcribed in a 20 µL reaction containing 5.0 µL RNA, 1.0 µL of 10 mM dNTPs mix (Jena Bioscience GmbH, Jena, Germany), 100 U SuperScript III reverse transcriptase (Invitrogen, Carlsbad, CA, USA), 50 U RNaseOUT recombinant ribonuclease inhibitor (Invitrogen), and 1.0 µL of 50 µM random hexamers (Invitrogen). The mixture of total RNA, dNTPs, and random hexamers was incubated at 65 • C for 5 min, while the reverse transcription took place at 25 • C for 10 min, followed by 50 • C for 50 min. Enzyme inactivation was performed at 70 • C for 15 min. The AMPLIRUN SARS-CoV-2 RNA control (Vircell S.L., Granada, Spain) was used as the SARS-CoV-2 complete genome control.

Detection of SARS-CoV-2
Novel nested PCR and nested real-time PCR assays were designed and validated against the identified highly stable regions of SARS-CoV-2 RNA. The Wuhan-Hu-1 reference genome (NC_045512.2) was used for the in silico analysis and design of SARS-CoV-2 specific primers and fluorescent probes (Table S1).
A Veriti 96-well fast thermal cycler (Applied Biosystems, Carlsbad, CA) was used for the nested PCR assays. The 25 µL of the reaction consisted of 5.0 µL cDNA template (1st PCR) or 2.0 µL PCR product (2nd PCR), 1.0 µL of 10 mM dNTPs mix (Jena Bioscience GmbH), 500 nM of each forward/reverse primer, and 1 U of Kapa Taq polymerase (Kapa Biosystems, Inc., Woburn, MA, USA). The thermal protocol consisted of polymerase activation step at 95 • C for 3 min, followed by 15 cycles (1st PCR) or 40 cycles (2nd PCR) of denaturation at 95 • C for 30 s, primer annealing at 60 • C for 30 s and extension at 72 • C for 1 min, followed by a final extension step at 72 • C for 5 min. After the completion of the 2nd reaction, 10 µL of PCR product was electrophoresed on 1.5% w/v agarose gel, visualized with ethidium bromide staining, and photographed under UV light.
The probe fluorescent-based real-time PCR assays were performed in a 7500 Fast Real-Time PCR System (Applied Biosystems). The PCR product of the 1st conventional PCR, as described above, were used as a template for the real-time PCR assay (2nd reaction). The 20 µL reaction consisted of 2.0 µL PCR product, 10 µL Kapa Probe Fast Universal (2X) qPCR Master Mix (Kapa Biosystems), 500 nM of each of the forward, reverse primers, and 125 nM of fluorescent probe. The thermal protocol included an initial polymerase activation step at 95 • C for 3 min, followed by 40 cycles of denaturation at 95 • C for 15 s, and finally the primer/probe annealing and extension step at 60 • C for 1 min.

Targeted DNA-seq for the Mutational Analysis of SARS-CoV-2 in Wastewater
In-house developed targeted DNA-seq assays, using semi-conductor sequencing technology were performed for the analysis of SARS-CoV-2 variations/mutations in wastewater samples.
The Veriti 96-well fast thermal cycler (Applied Biosystems) was used for the nested PCR assays. The 25 µL of the reaction consisted of 5.0 µL cDNA template (1st PCR) or 2.0 µL PCR product (2nd PCR), 1.0 µL of 10 mM dNTPs mix (Jena Bioscience), 500 nM of each forward/reverse primer, and 1 U of Kapa Taq polymerase (Kapa Biosystems). The thermal protocol consisted of polymerase activation step at 95 • C for 3 min, followed by 20 cycles (1st PCR) or 40 cycles (2nd PCR) of denaturation at 95 • C for 30 s, primer annealing at 60 • C (most assays) or 57 • C (assays 5, 6, 10, and 14 of S gene analysis, Table S3) for 30 s and extension at 72 • C for 1 min, followed by a final extension step at 72 • C for 5 min. After the completion of the 2nd reaction, 10 µL of PCR product were electrophoresed on 1.5% w/v agarose gel, visualized with ethidium bromide staining, and photographed under UV light.
An Ion Xpress Plus Fragment Library Kit (Ion Torrent, Thermo Fisher Scientific Inc., Waltham, MA, USA) was employed for the construction of the DNA-seq library, using 1 µg of purified PCR product mix as input. Adapter ligation, nick-repair, and purification of the ligated DNA were carried out, based on the manufacturer's guidelines. The adapter-ligated library was quantified using the Ion Library TaqMan Quantitation Kit (Ion Torrent) in an ABI 7500 Fast Real-Time PCR System (Applied Biosystems).
The sequencing template was generated with emulsion PCR on an Ion OneTouch 2 System (Ion Torrent), using the Ion PGM Hi-Q View OT2 kit (Ion Torrent). Next, the Ion OneTouch ES instrument (Ion Torrent) was used for the downstream template enrichment procedure. Ultimately, semi-conductor sequencing methodology was carried out in the Ion Torrent PGM system for the sequencing of the amplicons.

Bioinformatics Analysis
Alignment of the sequencing reads to the reference genome was carried out using the Burrows-Wheeler Aligner (BWA-MEM) [38]. The BAM files were analyzed by the Integrative Genomics Viewer (IGV) v.2.8.12 software for the visualization and assessment of the alignment results.
To efficiently call variants from the derived NGS datasets, we used the iVar algorithm [39], which is designed to detect virus genomic variations (SNVs or indels) from amplicon-based sequencing assays, using the default parameters; iVar was also used to identify the corresponding codons and translate variants into amino acids, based on the SARS-CoV-2GFF file.

In Silico 3D Protein Folding Analysis
To examine the predicted structural modification caused by each detected genomic variant, 3D structure models were generated with the I-TASSER v.5.1 server [40]. For the detection of the structural differentiations between the mutated and corresponding "wild-type" polypeptides, only the 3D structure with the highest confidence score was taken into consideration.

Conclusions
We report, here, that SARS-CoV-2 detection in wastewater is assay/genomic region dependent, and that significantly improved detection specificity was achieved by the combination of novel nested PCR/real-time PCR assays against highly stable SARS-CoV-2 genomic regions. Furthermore, we adopted our nested PCR approach for SARS-CoV-2 mutational profiling and variant quantification in wastewater by targeted DNA-seq. Our analysis indicated the high prevalence of the D614G and P323L missense mutations within S and RdRP genes, respectively; the growing trend of Q57H mutation in ORF3a; and the overrepresentation of R203K with G204R or G204L mutations in nucleocapsid phosphoprotein. Moreover, our approach was successfully applied for the mutational profiling of the SARS-CoV-2 S gene in wastewater samples from March 2021, highlighting the prevalence of the B.1.1.7/alpha variant, in line with the fact that the B.1.1.7/alpha variant was dominant in clinical samples during the same period of the third pandemic wave in Greece. Further mutational/variant profiling of wastewater samples for a long period of time could clearly shed light on the value of a wastewater-based mutational/variant analysis serving as a leading indicator of SARS-CoV-2 variants spread at community/population levels. We anticipate that our novel methods will further support the global efforts for monitoring SARS-CoV-2 spreading, as well as for detecting the emergence of new viral variants.