SARS-CoV-2 whole-genome sequencing using reverse complement PCR: For easy, fast and accurate outbreak and variant analysis.

During the course of the SARS-CoV-2 pandemic, reports emerged of mutations affecting viral spread and vaccine effectiveness. Large-scale mutation analysis using rapid SARS-CoV-2 Whole Genome Sequencing (WGS) is often unavailable but could support public health organizations and hospitals in monitoring transmission and the rising prevalence of mutant strains. Here we report a novel WGS technique for SARS-CoV-2, the EasySeq™ RC-PCR SARS-CoV-2 WGS kit. By applying a reverse complement polymerase chain reaction (RC-PCR), an Illumina library is prepared in a single PCR, which saves time and resources and facilitates high-throughput screening. Using this WGS technique, we evaluated SARS-CoV-2 diversity and possible transmission within a group of 173 patients and healthcare workers (HCW) of the Radboud university medical center during 2020. Because of the emergence of variants of concern, we also screened SARS-CoV-2 positive samples from 2021 to identify mutations and lineages. With the EasySeq™ RC-PCR SARS-CoV-2 WGS kit we obtained reliable results that confirmed outbreak clusters and additionally identified new, previously unassociated links, using a considerably simpler workflow than current methods. Furthermore, various SARS-CoV-2 variants of interest were detected among the samples and validated against an Oxford Nanopore sequencing amplicon strategy, which shows that this technique is suitable for surveillance and for monitoring currently circulating variants.


Introduction
In December 2019, China reported a group of patients with a severe respiratory illness caused by a previously unknown coronavirus. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was identified as the causative agent [1]. Since then, the outbreak has evolved into a pandemic with more than a hundred million infections and almost 4 million deaths worldwide by June 2021 [2]. Healthcare systems, governments and society as a whole are under pressure, working to reduce the spread of SARS-CoV-2 through large-scale testing and vaccination. The start of vaccination coincided with reports of new SARS-CoV-2 variants carrying specific mutations in the spike protein that have been associated with either increased infectiousness or a possible reduction in vaccine effectiveness [3–7]. Defined variants of concern or interest are already being detected in multiple countries, and the proportion of mutants in the population is increasing. This poses a new challenge on top of the existing large-scale community testing. Current testing is based on RT-PCR detection of SARS-CoV-2 in naso- or oropharyngeal swabs. People who test positive are instructed to self-isolate at home, and source finding and contact tracing are performed. In a hospital setting, the same procedures are in place to identify patients and personnel at risk of infection. Contact tracing is time-consuming both inside and outside a hospital setting, and when infection numbers are high, public health capacity reaches the limits of what thorough source and contact tracing investigations can achieve [8]. Routine sequencing of the SARS-CoV-2 genome from positive samples provides crucial insights into viral evolution and can support outbreak analysis [9,10]. However, current Whole Genome Sequencing (WGS) workflows often require cumbersome preparation and are laborious to scale to high-throughput screening, which complicates widespread implementation.
The EasySeq™ RC-PCR SARS-CoV-2 WGS kit (NimaGen, Nijmegen, The Netherlands) is based on the Reverse Complement Polymerase Chain Reaction (RC-PCR). It integrates tiled target amplification with Illumina library preparation and has a simple workflow with minimal hands-on time.
The current study evaluates this technique and demonstrates its application in the detection of variants of interest. Additionally, a set of epidemiologically linked cases was used to illustrate its added value in detecting potential transmission events in public health and hospital settings.

Material and methods
In this study we assessed the performance of the novel RC-PCR sequencing technology applied to SARS-CoV-2, EasySeq™ RC-PCR SARS-CoV-2 WGS kit (NimaGen, Nijmegen, The Netherlands).

March–September 2020
Naso- and oropharyngeal swabs (173 SARS-CoV-2 positive and 15 negative samples) collected in UTM or GLY medium were obtained from healthcare workers and patients at the Radboud university medical center. These samples included 6 outbreak clusters defined by our hospital infection prevention and control (IPC) team. Sixty-four samples were collected and tested on behalf of the local public health service; these came from people living in the defined public health region surrounding our hospital.

January–May 2021
Naso- and oropharyngeal swabs collected in UTM or GLY medium from patients and healthcare workers who tested SARS-CoV-2 positive at the Radboud university medical center from January to May 2021 were included (n = 171) to determine the lineage and the presence of variants of interest within our hospital population.

Variant panel
Seven cultured SARS-CoV-2 samples of various lineages, previously sequenced by the national public health authority of the Netherlands (RIVM) using an Oxford Nanopore Technologies amplicon strategy [10], were included.
All personal data of patients, HCW and public health service samples were anonymized. Cluster information was provided anonymously by the IPC team and the regional public health service.
Detailed descriptions of the included samples can be found in the Supplementary Data.

Real-time polymerase chain reaction (RT-PCR)
SARS-CoV-2 RT-PCR was performed on all samples during routine diagnostics. RNA was isolated on the Roche COBAS 4800 (Roche Diagnostics Corporation) with a CT/NG extraction kit according to the manufacturer's protocol. RT-PCR with primers targeting the envelope (E) gene, as described by Corman et al., was performed on a LightCycler 480 (Roche Diagnostics Corporation) using Roche Multiplex RNA Virus Mastermix [11].

Reverse complement polymerase chain reaction (RC-PCR)
For all samples, RNA isolation was repeated on the MagNA Pure 96 (Roche Diagnostics Corporation) using the Small Volume isolation protocol with 200 µl of sample, eluting the isolated RNA in 50 µl. cDNA synthesis was performed using either Multiscribe RT (Applied Biosystems, USA) or the LunaScript® RT SuperMix Kit (New England Biolabs, USA) with 5 or 6 µl of RNA input, respectively. Whole genome sequencing (WGS) was performed in 6 independent runs using the EasySeq™ RC-PCR SARS-CoV-2 WGS kit (NimaGen, Nijmegen, The Netherlands). Figs. 1 and 2 illustrate the technology, in which two types of oligonucleotides are used to start the targeted amplification. The universal sequence hybridizes with the SARS-CoV-2 target-specific primer, creating the RC-PCR primer, which combines the SARS-CoV-2-specific primer sequence with Unique Dual Index (UDI) and adapter sequences. This contrasts with other techniques, in which multiple steps are needed to add sequencing adapters and UDIs. Thus, a regular PCR system can be used to produce SARS-CoV-2-specific amplicons that are ready for sequencing. The kit uses 155 (v1) or 154 (v2–v3) newly designed primer pairs with a tiling strategy as previously implemented in the ARTIC protocols [12]. The primer pairs are divided into two pools, A and B: Pool A contains 78 (v1) or 77 (v2–v3) primer pairs and Pool B contains 77 primer pairs. This strategy requires two separate RC-PCR reactions but minimizes the chance of forming chimeric sequences or other PCR artifacts (Fig. 2). After the PCR, the samples of each plate are pooled into one Eppendorf tube per primer pool, resulting in two tubes (pool A and pool B). These are cleaned individually using the AmpliClean™ Magnetic Bead PCR Clean-up Kit (NimaGen, Nijmegen, The Netherlands). The pools are then quantified with the Qubit double-stranded DNA (dsDNA) High Sensitivity assay kit on a Qubit 4.0 instrument (Life Technologies) and combined.
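The two-pool design can be illustrated with a small sketch. It assumes, as in ARTIC-style tiling schemes, that amplicons are numbered in genome order, that each amplicon overlaps only its direct neighbours, and that odd-numbered amplicons go to pool A and even-numbered ones to pool B; the kit's actual pool assignment is not described here, and the function names are hypothetical. Under that assumption the pool sizes match those reported above (78/77 for v1, 77/77 for v2–v3) and no two overlapping amplicons share a reaction.

```python
# Sketch of an ARTIC-style two-pool split for a tiled amplicon scheme.
# Assumption (hypothetical): amplicons are numbered 1..n in genome order,
# each overlaps only its direct neighbours, and odd/even numbers alternate pools.


def assign_pools(n_amplicons):
    """Map each amplicon number to pool 'A' (odd) or pool 'B' (even)."""
    return {i: ("A" if i % 2 == 1 else "B") for i in range(1, n_amplicons + 1)}


def pool_sizes(pools):
    """Count how many amplicons end up in each pool."""
    counts = {"A": 0, "B": 0}
    for pool in pools.values():
        counts[pool] += 1
    return counts


def neighbours_share_pool(pools):
    """True if any two adjacent (overlapping) amplicons are in the same pool."""
    numbers = sorted(pools)
    return any(pools[a] == pools[b] for a, b in zip(numbers, numbers[1:]))


if __name__ == "__main__":
    for version, n in (("v1", 155), ("v2-v3", 154)):
        pools = assign_pools(n)
        print(version, pool_sizes(pools), neighbours_share_pool(pools))
```

Keeping overlapping neighbours in separate reactions is what prevents a forward primer of one amplicon from pairing with the reverse primer of the next inside the same tube.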
The amplicon fragment size in the final library is approximately 435 bp, including a 298 bp SARS-CoV-2 genomic insert. Next Generation Sequencing (NGS) was performed on an Illumina MiniSeq® using a Mid Output Kit (2 × 149 or 2 × 151 cycles) (Illumina, San Diego, CA, USA), loading 0.8 pM on the flow cell. The first two sequencing runs were performed with version 1 of the EasySeq™ RC-PCR SARS-CoV-2 WGS kit on samples covering a wide range of Ct values (Ct 16–41), using the standard protocol provided by NimaGen. The subsequent sequencing runs were performed with version 2 or version 3 of the EasySeq™ RC-PCR SARS-CoV-2 WGS kit, using a balanced library pooling strategy based on estimated cDNA input according to the manufacturer's protocol.
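To relate a Qubit reading to the 0.8 pM loading concentration, the library molarity can be estimated from the mass concentration and the approximately 435 bp fragment size. The sketch below uses the common approximation of 660 g/mol per base pair of dsDNA; the input concentration in the example is hypothetical, and the single overall dilution factor ignores the denaturation and stepwise dilution steps of the actual Illumina loading protocol.

```python
# Sketch: estimate library molarity from a Qubit dsDNA reading.
# Assumption: ~660 g/mol per base pair of double-stranded DNA.

AVG_MASS_PER_BP = 660.0  # g/mol per bp (common approximation)


def library_nM(conc_ng_per_ul, fragment_bp):
    """Molarity in nM of a dsDNA library with the given mean fragment length."""
    # X ng/ul = X mg/l; dividing by the molar mass (g/mol) gives mmol/l,
    # and multiplying by 1e6 converts mmol/l to nmol/l.
    return conc_ng_per_ul / (AVG_MASS_PER_BP * fragment_bp) * 1e6


def dilution_factor(stock_nM, target_pM):
    """Overall fold dilution needed to go from a stock (nM) to a target (pM)."""
    return stock_nM * 1000.0 / target_pM


if __name__ == "__main__":
    stock = library_nM(10.0, 435)  # hypothetical 10 ng/ul pool, ~435 bp fragments
    print(f"{stock:.2f} nM, dilute {dilution_factor(stock, 0.8):.0f}-fold for 0.8 pM")
```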

Technical evaluation: mean sequence depth plots
The mean sequence depth across the SARS-CoV-2 genome is plotted for 3 versions of the EasySeq™ RC-PCR WGS kit, with two runs per version (Fig. 3). Mean depths are centered around 2–3 log10. Version 1 (v1) of the EasySeq™ RC-PCR kit did not amplify all coding regions: positions 6280–6407 (amplicon 35) and 9525–9737 (amplicon 51), both located in ORF1ab, were missed (Fig. 3). For version 2 (v2), new amplicons were designed and added, resulting in coverage of all coding regions (Fig. 3). In version 3 (v3), another region with low coverage can be observed, corresponding to the ORF1ab:S3675_F3677- deletion found in the B.1.1.7 (Alpha) variant, which was dominant at the time (Run1_v3). In v3, the primers covering the Spike HV69-70 deletion were redesigned to detect this region more accurately, and S:HV69-70- is clearly visible in Fig. 3 (v3). To better represent coverage of the Spike gene, mean sequence depth plots specific to the Spike gene were generated (Fig. 4).
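Mean depth plots of this kind, and the dropout regions they reveal, can be derived from per-position depth tables. In this minimal sketch, each sample's depth is assumed to be a list of read counts per genome position (for example, exported from an aligner); a pseudocount of 1 is added before taking log10 so that zero-depth positions remain defined, and the threshold used to call a region "low" is an arbitrary illustrative choice, not the study's criterion.

```python
# Sketch: log10 mean depth per position and detection of low-coverage runs.
import math


def log10_mean_depth(depth_tables):
    """Mean depth per position across samples, as log10(mean + 1).

    depth_tables: list of equal-length lists, one per sample (read depth per position).
    The +1 pseudocount keeps zero-depth positions defined on the log scale.
    """
    n_samples = len(depth_tables)
    means = [sum(column) / n_samples for column in zip(*depth_tables)]
    return [math.log10(m + 1) for m in means]


def low_regions(values, threshold, start=1):
    """Return (first, last) positions of consecutive runs where value < threshold.

    Positions are numbered from `start` (1 for 1-based genome coordinates).
    """
    regions, run_start = [], None
    for pos, value in enumerate(values, start=start):
        if value < threshold and run_start is None:
            run_start = pos
        elif value >= threshold and run_start is not None:
            regions.append((run_start, pos - 1))
            run_start = None
    if run_start is not None:  # run extends to the end of the genome
        regions.append((run_start, start + len(values) - 1))
    return regions


if __name__ == "__main__":
    # Two hypothetical samples with a dropout at positions 3-4.
    tables = [[100, 100, 0, 0, 100], [1000, 1000, 0, 0, 1000]]
    print(low_regions(log10_mean_depth(tables), threshold=1.0))
```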
The mean coverage plots of the Spike gene show a mean sequence depth of around 2–3 log10 for v1 and v2, and on average one log10 higher for v3. In the v3 coverage plot, three regions with lower coverage can be observed. The first two regions correspond to S:HV69-70- and S:Y144-, known deletions of the Alpha (B.1.1.7) variant, which was dominant during the time of screening [23]. The third, larger region (around position 23,431) has lower coverage because amplicon 123 is amplified less efficiently. The Alpha variant was absent from the runs using v1 and present in only a limited number of samples in the runs using v2 (Fig. 4).
Additionally, the effect of viral load on SARS-CoV-2 genome coverage was examined (Fig. 5; Cluster 1, 2, 3, 6, and the HCW). In four out of six outbreak clusters (Cluster 3, 4, 5, 6) defined by the infection prevention team, sequencing results supported the previously identified epidemiological information. However, some samples within these epidemiologically defined clusters were excluded based on phylogenetic placement; for instance, one sample of Cluster 5 is not part of lineage B.1.8 (Fig. 6).
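The relation between viral load and genome coverage can be summarized by computing, for each sample, the percentage of genome positions reaching a minimum depth and averaging this per Ct range. The minimum depth of 10 reads and the 5-Ct bin width below are hypothetical parameters chosen for illustration; the study's own coverage definition is not stated in this section.

```python
# Sketch: genome coverage per sample, summarized per Ct range.
# Hypothetical parameters: min_depth=10 reads, Ct bins of width 5.


def genome_coverage(depths, min_depth=10):
    """Percentage of genome positions with at least `min_depth` reads."""
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


def coverage_by_ct_bin(samples, bin_width=5, min_depth=10):
    """Mean genome coverage (%) per Ct bin.

    samples: iterable of (ct_value, per_position_depths) tuples.
    Returns a dict keyed by bin labels such as "15-20", ordered by bin.
    """
    bins = {}
    for ct, depths in samples:
        low = int(ct // bin_width) * bin_width
        bins.setdefault(low, []).append(genome_coverage(depths, min_depth))
    return {f"{low}-{low + bin_width}": sum(vals) / len(vals)
            for low, vals in sorted(bins.items())}
```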
Within the 64 community samples, samples from nine people clustered together (Fig. 6, part of lineage B.1.22). The public health service confirmed a cluster within this group of samples. [23]. Furthermore, the Spike mutation S:D614G was observed in 96.4% of all samples of the Alpha variant (Fig. 7A). In addition, S:D614G was found in all circulating lineages.
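Genomic support for an epidemiological cluster is often also checked with pairwise SNP distances between consensus genomes. The sketch below shows that simpler, swapped-in technique (threshold-based single-linkage clustering), not the phylogenetic analysis used in this study; it assumes the consensus sequences are already aligned to the same length, ignores N and gap positions, and uses a hypothetical default threshold of 2 SNPs.

```python
# Sketch: threshold-based single-linkage clustering on pairwise SNP distances.
# This is an illustrative alternative, not the phylogenetic method of the study.
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Number of differing positions, ignoring ambiguous (N) and gap (-) sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    ignore = {"N", "-"}
    return sum(1 for a, b in zip(seq_a.upper(), seq_b.upper())
               if a != b and a not in ignore and b not in ignore)


def snp_clusters(sequences, max_snps=2):
    """Group samples linked by chains of pairwise distances <= max_snps.

    sequences: dict {sample_name: aligned consensus sequence}.
    Returns a sorted list of sorted member lists.
    """
    parent = {name: name for name in sequences}  # union-find forest

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(sequences, 2):
        if snp_distance(sequences[a], sequences[b]) <= max_snps:
            parent[find(a)] = find(b)

    groups = {}
    for name in sequences:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())
```

Single linkage chains samples together, so two cases can end up in one cluster even when they differ by more than the threshold themselves; phylogenetic placement, as used in the study, does not have this artifact.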

SARS-CoV-2 lineage and variant detection verification
We tested seven samples to validate whether lineage determination and Spike mutation detection agree with another widely used sequencing method. These SARS-CoV-2 samples had been sequenced by the national public health authority of the Netherlands (RIVM) using Oxford Nanopore Technologies (ONT) sequencing with an amplicon strategy as described by Oude Munnink et al. [10]. Results show 100% concordance in lineage assignment and identical detection of all lineage-specific Spike gene mutations (Fig. 7B).
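The concordance check can be expressed as a comparison of per-sample calls from the two methods. In this sketch, each method's results are assumed to be stored as a lineage name plus a set of Spike mutation labels per sample; the data layout and function names are hypothetical.

```python
# Sketch: agreement of lineage and Spike mutation calls between two methods.
# Hypothetical data layout: {sample: (lineage, set_of_spike_mutations)}.


def concordance(calls_a, calls_b):
    """Fractions of shared samples with identical lineage and identical Spike calls."""
    shared = sorted(set(calls_a) & set(calls_b))
    if not shared:
        raise ValueError("the two call sets have no samples in common")
    same_lineage = sum(calls_a[s][0] == calls_b[s][0] for s in shared)
    same_spike = sum(calls_a[s][1] == calls_b[s][1] for s in shared)
    return same_lineage / len(shared), same_spike / len(shared)


def spike_differences(calls_a, calls_b):
    """Per shared sample, the Spike mutations reported by only one method."""
    return {s: calls_a[s][1] ^ calls_b[s][1]
            for s in sorted(set(calls_a) & set(calls_b))
            if calls_a[s][1] != calls_b[s][1]}
```

Comparing whole mutation sets per sample is stricter than counting matching mutations overall: a single missed or extra call makes that sample discordant, and `spike_differences` shows which call caused it.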

Discussion
This study describes the first application of Reverse Complement PCR, implemented in the EasySeq™ RC-PCR SARS-CoV-2 WGS kit, to sequence the SARS-CoV-2 genome. This novel method combines target amplification and indexing in a single procedure, directly creating a sequencing-ready Illumina library. Using this method, epidemiological clusters from the hospital and the community were supported by phylogenetic outbreak analysis. Additionally, circulating SARS-CoV-2 lineages and defined variants of concern could be identified and monitored. Using RC-PCR, samples with Ct values up to 30, as determined by RT-PCR, could be sequenced with high SARS-CoV-2 genome coverage. Further optimization of the protocols, the bioinformatic analysis and the kit itself is expected to improve performance. This was already seen with the switch from Multiscribe RT to LunaScript RT for reverse transcription, the primer changes between kit versions and the optimization of the bioinformatic analysis, which together resulted in higher genome coverage at higher Ct values and more accurate mutation detection.
Previous studies showed the benefit of WGS of SARS-CoV-2 for outbreak investigation and for studying transmission routes [10,24–28]. Several methods have been optimized for this purpose. The ARTIC Illumina method, a tiling multiplex PCR approach, was the first to enable WGS of SARS-CoV-2 on Illumina sequencers [29]. The technique has since been optimized, and analyses, albeit of small numbers of samples, concluded that it delivers sufficient quality for phylogenetic analysis [30–32]. It has also been combined with targeted and random RT-PCR screening of the population followed by sequencing to study the spread of SARS-CoV-2 through the community [24]. Sikkema et al. showed the use of SARS-CoV-2 sequencing in healthcare-associated infections and identified multiple introductions into Dutch hospitals through community-acquired infections [9].
During this pandemic, many advances have been made in WGS of SARS-CoV-2 [33]. It should be noted that two of the primer pairs in v1 of the EasySeq™ RC-PCR SARS-CoV-2 WGS kit were suboptimal. Versions 2 and 3 of the kit brought improvements, increasing the maximum genome coverage to 98.9% and 99.5%, respectively. EasySeq™ can currently retrieve nearly 100% genome coverage, and our study shows that the technology is very useful for phylogenetic analysis and for mutation and variant detection of SARS-CoV-2. The EasySeq™ RC-PCR SARS-CoV-2 WGS kit uses 154 or 155 amplicons, depending on the version, which makes it susceptible to amplicon dropouts (regions without sequence coverage) caused by mutations accumulating at primer-binding sites; this was observed in this study and has also been reported for other amplicon designs [34–36]. The introduction of the SARS-CoV-2 B.1.1.7 (Alpha) variant hampered proper detection of S:HV69-70- with v1 and v2. This limitation has been solved by the new primer design in v3. Given the mutation rate of SARS-CoV-2, further adjustments to the RC-PCR primers will probably be needed to ensure retrieval of full SARS-CoV-2 genome sequences, a limitation inherent to amplicon-based assays [34]. Conversely, the high number of amplicons also limits the size of each dropout, making the assay less vulnerable to losing a large portion of the SARS-CoV-2 genome.
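Whether an amplicon is at risk of dropping out can be screened by checking observed mutations against primer-binding coordinates. The sketch below assumes primer sites are available as 1-based, inclusive genome intervals per amplicon and that deletions are listed as individual deleted positions; the coordinates in the example are hypothetical and do not correspond to the EasySeq™ primer design.

```python
# Sketch: flag amplicons whose primer-binding sites overlap observed mutations.
# Assumptions: 1-based inclusive primer coordinates; deletions listed per position.


def primers_hit(primer_sites, mutation_positions):
    """Return {amplicon: sorted mutated positions that fall inside its primer sites}.

    primer_sites: {amplicon: [(start, end), ...]} genome intervals of its primers.
    mutation_positions: iterable of genome positions carrying a mutation.
    """
    positions = sorted(set(mutation_positions))
    hits = {}
    for amplicon, sites in primer_sites.items():
        inside = [p for p in positions if any(start <= p <= end for start, end in sites)]
        if inside:
            hits[amplicon] = inside
    return hits


if __name__ == "__main__":
    # Hypothetical primer coordinates and mutation positions, for illustration only.
    sites = {"amp_A": [(100, 123), (380, 401)], "amp_B": [(350, 372), (640, 662)]}
    print(primers_hit(sites, [110, 500, 660]))
```

A hit does not guarantee a dropout, since mismatches near the 5' end of a primer are often tolerated, but it points to the amplicons to inspect first when coverage falls for a new variant.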
Regardless of high or low infection rates, real-time sequencing of SARS-CoV-2 positive samples could be used to target infection prevention measures nationally and locally [37]. Applications range from incidental cluster analysis, to support or reject epidemiological links between cases, to real-time surveillance in the community or in healthcare institutions. Such surveillance strategies have recently been implemented in the Netherlands because of the emergence of new variants of interest, especially with regard to infectiousness, clinical outcome and vaccine effectiveness [38,39].
With this study we evaluated the performance of the EasySeq™ RC-PCR SARS-CoV-2 WGS kit. Although this gives a good impression of how well this novel RC-PCR technology works, we want to emphasize that this evaluation has limitations. It is difficult to compare the sensitivity of the assay with that of other available SARS-CoV-2 WGS strategies. Other studies also report performance by relating genome completeness to Ct values; however, one-on-one comparison of these values is limited [33]. Viral load expressed as DNA copies or viral copies would be more valuable, but such data are difficult to obtain.
In conclusion, this study shows the first application of RC-PCR in the field of medical microbiology and infectious diseases. Results confirm the robustness of the method, which requires less hands-on time than current sequencing methods and can be used for high-throughput sequencing of SARS-CoV-2. WGS of SARS-CoV-2 combined with bioinformatic analysis supports the identification of chains of transmission and the spread of different lineages, including mutation profiles and variant detection. This enables a rapid, targeted and adaptive response to an ongoing outbreak that has great impact on public health and society.

Data availability
The tailored variant analysis pipeline can be found at https://github.com/JordyCoolen/easyseq_covid19. SARS-CoV-2 metadata and GISAID information are available in the Supplementary Data.

Funding/support
The EasySeq™ RC-PCR SARS-CoV-2 WGS version 1 kit was supplied by NimaGen, and sequencing of the first two Illumina libraries was performed by NimaGen. Further sequencing was performed by the Department of Medical Microbiology at the Radboud university medical center to implement the technology in routine diagnostics and to support the national surveillance program. Therefore, no other funding was applied for.

Role of funder/sponsor
NimaGen had no role in the design and conduct of the study; the collection, management or analysis of the data; or the preparation or approval of the manuscript.

Declaration of Competing Interest
The authors have no conflict of interest to disclose.