A robust, cost-effective and widely applicable whole-genome sequencing protocol for capripoxviruses

Highlights • Robust method for the genomic characterization of all Capripoxviruses.• Pre-sequencing enrichment method based on targeted long-range PCR amplification.• Method applicable to low titre samples such as blood samples and vaccine batches.• Viral DNA enrichment compatible with various sample types and sequencing platforms.• Complete coding genome sequencing evaluated on three different sequencing platforms.


Introduction
Capripoxviruses (CaPV) represent one of the nine genera currently recognized within the Chordopoxvirinae subfamily of the Poxviridae family. The genus Capripoxvirus is comprised of lumpy skin disease virus (LSDV), goatpox virus (GTPV), and sheeppox virus (SPPV). All CaPVs are responsible for economically important diseases in domestic ruminants that threaten the livelihoods of smallholder farmers and poor rural communities in endemic regions (Tuppurainen et al., 2017a;Tuppurainen and Oura, 2012) and recently also farming communities in free regions (Tuppurainen et al., 2017b). Due to their potential for rapid spread and substantial economic impact, the World Organization for Animal Health (OIE) has categorized all CaPV diseases as notifiable.
Sheeppox (SPP) and goatpox (GTP) are endemic in small ruminants in Central and North Africa, the Middle East, and various parts of Asia. In cattle, lumpy skin disease (LSD) was initially confined to the African continent and the Middle East but recently spread to Europe and Asia (Roche et al., 2020;Tuppurainen et al., 2017a). All three diseases have made several incursions in southern Europe but were successfully controlled by stamping out or mass vaccination. In 2019, LSD reached the Eastern part of Asia being reported in India, Bangladesh and China (Gupta et al., 2020;Lu et al., 2020) and in 2020 the disease spread further to South-East Asia (Flannery et al., 2021;Roche et al., 2020;Tran et al., 2021). The recent spread of LSDV into disease-free areas in the Eurasian continent has brought renewed attention to these neglected diseases.
The CaPV genome consists of a linear double-stranded DNA molecule that is about 150 kb in length and encodes between 147 and 156 open reading frames (ORFs). The nucleotide composition is approximately 75 % A + T and uniformly distributed along the entire genome. The genome structure is similar to that of other chordopoxviruses with the central region containing the more conserved genes that are involved in basic viral replicative processes. By contrast, genes that are species-or hostspecific are distributed toward the end of the genome (Gubser et al., 2004). Two identical inverted terminal repeats (ITRs) are located at both ends of the genome. Capripoxvirus genomes are very similar to each other with a sequence identity of no less than 96 % across their entire length (Tulman et al., 2002(Tulman et al., , 2001. As the different CaPV species cannot be distinguished morphologically or serologically, CaPVs were originally classified according to the host species from which they were isolated. Although most SPPV or GTPV isolates cause more severe disease in either sheep or goats, some isolates are equally pathogenic in both animal species (Babiuk et al., 2008). Moreover, some wild ruminants are known to be susceptible to LSDV (Greth et al., 1992;Le Goff et al., 2009;Young et al., 1970), which further complicates this host-based classification system. Reliable differentiation between the three CaPV species is therefore only possible at the molecular level. Several real-time PCR (qPCR) assays have been developed to detect and differentiate CaPV species. All of these assays are based on the detection of species-specific single nucleotide polymorphisms (SNPs) but target different regions within the genome, namely the G-protein-coupled chemokine receptor (GPCR) gene (Lamien et al., 2011), the 30 kDa RNA polymerase subunit (RPO30) gene (Gelaye et al., 2013) or the homolog of the variola virus B22R gene (Chibssa et al., 2019). Although these qPCR assays allow to identify the different CaPV species, they lack the discriminatory power to distinguish closely related strains (Mathijs et al., 2016c). Differentiation at strain level requires a much higher resolution that can only be obtained by comparing whole genome sequence data.
Unfortunately, CaPVs are difficult to sequence due to their large genome size, skewed nucleotide composition, and the presence of multiple homopolymeric and repetitive sequences. Although clinical samples may contain high viral loads, viral DNA represents only a small fraction of the total DNA that is present in these samples. To increase the signal (viral genome) to noise (host genome) ratio, most CaPV sequencing protocols require high concentrations and relatively large volumes of viral DNA that can be obtained only by virus isolation. Similar to other poxviruses, the first CaPV genomes were obtained using a shotgun sequencing approach. Viral DNA was isolated from infected cell cultures and randomly fragmented by incomplete enzymatic digestion. DNA fragments within a predefined size range were cloned and sequenced using Sanger's dideoxy chain termination method and a primer walking strategy (Kara et al., 2003;Tulman et al., 2002Tulman et al., , 2001. With the advent of high-throughput sequencing, the labor-intensive cloning step could be omitted which greatly reduced the cost and time needed to obtain complete genomes. To increase the number of viral reads, the viral DNA is enriched before sequencing by purifying and/or concentrating the virus particles (Biswas et al., 2020;Douglass et al., 2019;Flannery et al., 2021;van Schalkwyk et al., 2020;Wolff et al., 2020;Zeng et al., 2014). Although these protocols allow to measure genetic variation on a genome-wide scale, they still rely on virus isolation to obtain sufficient viral-enriched DNA. Unfortunately, virus isolation is time consuming and requires appropriate cell-culture and biocontainment facilities that are not available to all laboratories.
This study aimed to develop a robust, cost-effective and widely applicable method that allows to generate (nearly) complete CaPV genomes directly from clinical samples or commercial vaccine batches. Taking advantage of the genetic similarity of CaPVs, a set of pan-CaPVs long-range PCRs (LR-PCRs) was developed that spans the entire genome using only a limited number of amplicons. The resulting PCR amplicons can be sequenced on all currently available high-throughput sequencing platforms to reconstruct (nearly) complete genomes in less than a week.

Samples
All three CaPV species and different sample matrices (cell culture, vaccine batch, and clinical sample) were included in this study (Table 1). The attenuated LSDV strain SA-Neethling was kindly provided by Dr. E. Tuppurainen (at that time working at The Pirbright Institute, United Kingdom). The isolate was passaged three times in OA3.Ts cells (ATCC: CRL-6546) grown in Dulbecco's Modified Eagle's Medium (Life Technologies) supplemented with 5% foetal calf serum (Life Technologies), 0.10 mg/l streptomycin, and 100 000 IU/l penicillin. Other samples consisted of a skin lesion from an experimentally infected cow with the attenuated LSDV strain SA-Neethling (bioethical approval # 20150605− 01), as well as cell culture or clinical samples from recent LSDV field strains and CaPV vaccine batches from different manufacturers (Table 1).

DNA purification
Sample treatments prior to DNA extraction differed according to sample type. Twenty-five milligrams of skin sample were homogenized in 180 μL phosphate-buffered saline (PBS) and 25 μL Puregene proteinase K using a TissueLyser (Qiagen) followed by an overnight incubation at 56 • C at 200 rpm on a ThermoMixer C (Eppendorf). The freezedried vaccine pellets were simply resuspended in 3 mL PBS and used immediately.
To eliminate host DNA, the skin homogenate, PBS dissolved vaccine pellets, and cell culture supernatant were treated with Baseline-ZERO DNase (Epicenter) according to the manufacturer's instructions. The remaining DNA was extracted using a Puregene extraction kit (Qiagen) according to instructions provided by the manufacturer. The viral load of the purified DNA samples was assessed using a probe-based qPCR assay designed to detect all CaPVs in the conserved D5R region (Haegeman et al., 2013).

Primer design for long-range PCR amplification
Nine full-length genome sequences representing all 3 CaPV species (Supplementary Table 1) were downloaded from GenBank and aligned using MAFFT v7.305b (Katoh and Standley, 2013). At first, the alignment was used to design 30 primer pairs that generate tiled amplicons of about 5.0-5.5 kbp in size (Supplementary Table 2). At a later stage, the number of primer pairs was reduced so that the entire genome could be covered by only 23 overlapping amplicons of about 7.5 kbp in size (Table 2).

Long-range PCR amplification
LR-PCR reactions were set up as follows (all mentioned concentrations are final concentrations): 1x Q5 High-Fidelity DNA Polymerase reaction buffer (New England Biolabs), 1 M betaine, 0.5 μM of both forward (pf) and reverse (pr) primer, 0.4 mM CleanAmp dNTPs (TriLink Biotechnologies), 1 U of Q5 High-Fidelity DNA polymerase, 5 μL of DNA template, and distilled water up to a volume of 50 μL. Cycling conditions were as follows: 98 • C for 3 min (initial denaturation), 35 cycles of 10 s at 98 • C, 30 s at 63 • C and 7 min at 72 • C (amplification), followed by 2 min at 72 • C (final elongation). The PCR products were stained with Midori Green Direct (Nippon Genetics) before being visualized on 1% agarose gels. Only single products of the expected size were purified using the Agencourt AMPure XP system (Beckman Coulter) according to the manufacturer's instructions. Purified DNA was quantified on a Quantus Fluorometer (Promega) using the QuantiFluor dsDNA System (Promega).

Library preparation and sequencing
To assess the robustness and wide applicability of the protocol, 3 sequencing platforms relying on different technologies were evaluated in the present study (Fig. 1).

Illumina (ILL) sequencing
In order to distinguish the short sequence reads from both ITRs, two libraries (Lib1− 12 and Lib12− 23), each comprising a pool of 12 LR-PCR amplicons corresponding to half of the CaPV genome, were prepared using a Nextera XT DNA Library Preparation Kit (Illumina) according to the manufacturer's instructions. An additional library (Lib-DNA) was prepared directly from 1 ng of purified DNA (i.e. without PCR enrichment). The libraries were quantified with a Kapa Library Quantification Kit (Roche) and the insertion size was verified on a Bioanalyzer 2100 system using a High-Sensitivity DNA Kit (Agilent Technologies). Twelve femtomoles of each barcoded library were pooled prior to sequencing on a MiSeq System using the MiSeq Reagent Kit version 3, 2 × 300 bp (Illumina). Sequencing was performed at the Neuromics Support Facility -VIB Genomics Core (UAntwerp, Belgium).

PacBio (PB) sequencing
The LR-PCR amplicons covering the entire CaPV genome were pooled in equimolar quantities to a final amount of 2 μg for singlemolecule real-time (SMRT) sequencing (PacBio). SMRTbell libraries were prepared using the SMRTbell Template Prep Kit 1.0 (PacBio). To filter out smaller fragments and SMRTbell dimers, the libraries were size selected on a BluePippin 0.75 % Gel cassette (Sage Science) using the "High Pass V3 ′′ protocol (collection protocol settings: 5.5 kb -9.5 kb). P6-C4 sequencing was performed using a single SMRT cell on a PacBio RSII sequencer with a movie time of 240 min (PacBio) at the Genomics Core Leuven (KU Leuven, Belgium).

PacBio RSII sequence data
For the PacBio RSII data analysis, quality filtering was performed using SMRT Portal v2.3.0 (PacBio) with the following settings: minimum subread length = 3000, minimum polymerase read quality = 80, and minimum polymerase read length = 3000. Consensus sequences for each PCR amplicon were obtained de novo using the "RS_Long_Amplicon_Analysis.1 ′′ protocol included in SMRT Portal v2.3.0. with default parameters except for the Maximum Number Of Subreads which was set to 4000.The resulting amplicon sequences were assembled into a single contig using iAssembler software v1.3.2 (Zheng et al., 2011).

Comparison of assembled sequence data
The quality of the de novo assemblies obtained from the different sequencing strategies was assessed and compared using Quast v5.0.2 (Mikheenko et al., 2018). Coverage depth across the SA-Neethling genome was plotted in R using the ggplot2 package (v4.0.2) (Wickham, 2016).

Sequencing of genome termini and Sanger sequencing confirmation
Like all poxviruses, CaPV genomes contain a stretch of nucleotides, called functional resolution sequence (FRS), that is situated between the telomeres and the first tandem repeat (Merchlinsky, 1990). This highly conserved sequence was used to amplify and sequence the 5 ′ and 3 ′ termini of the CaPV genome with primer pair: FRS TTTTA-TAGGCTTAAAAAAAAGTATAATATTG and ORF1 ATTTTAGCAA-GAGCAGCAGAATATTGG. The PCR reaction was set up as follows (final concentrations): 1x Q5 High-Fidelity DNA Polymerase reaction buffer (New England Biolabs), 1 M betaine, 0.5 μM of both FRS and ORF1 primers, 0.4 mM CleanAmp dNTPs (TriLink Biotechnologies), 1 U of Q5 High-Fidelity DNA polymerase, 5 μL of DNA template, and distilled water up to a volume of 50 μL. Cycling conditions were as follows: 98 • C for 3 min (initial denaturation), 35 cycles of 10 s at 98 • C, 30 s at 63 • C and 1 min at 72 • C (amplification), followed by 2 min at 72 • C (final elongation).
Discrepancies between the full-length consensus sequences and previously published vaccine-associated Neethling genomes (n = 7; accession numbers: AF409138, MN636838− 43) were confirmed or refuted by dideoxy chain terminator sequencing (Sanger sequencing). Sequencing was performed on an Applied Biosystems 3130 Genetic Analyzer Sequencer using the BrightDye Terminator Cycle Sequencing kit (Nimagen) according to the manufacturer's instructions.

Genome annotation
Open reading frames were predicted using both NCBI ORF-Finder (http://www.ncbi.nlm.nih.gov/orffinder/) and GATU relative to the Neethling vaccine LW 1959 sequence (AF409138). The annotated full genome sequence can be accessed at GenBank accession number MW435866.

Primer design and long-range PCR amplification
Due to the AT-rich nature of the CaPV genome, finding suitable regions for primer design was challenging. Nevertheless, taking advantage of the high genetic relatedness between CaPV species, we were able to select primer sets sharing similar annealing temperatures (63 • C) and generating similar amplicon lengths (5.5 kbp) ( Table 2, Supplementary  Table 2). The LR-PCR protocol was optimized to obtain only single amplicons of the expected size that overlap each other by at least 1 kb and cover the entire coding CaPV genome (Mathijs et al., 2016a(Mathijs et al., , 2016bVandenbussche et al., 2016). Crossing point (Cp) values obtained from extracted DNA in the CaPV-D5R qPCR assay ranged between 13.43 and 29.52. To further reduce costs and hands-on-time, the PCR protocol was modified by increasing the amplicon lengths from 5.5 kbp to 7.5 kbp, which greatly reduced the number of amplicons needed to cover the entire genome. Twenty-three single amplicons of 7.5 kbp were obtained for each of the CaPV DNAs and different sample types (LSDV attenuated SA-Neethling isolate, Fig. 2; GTPV Caprivac vaccine batch, Supplementary Fig. 1; SPPV Jovivac vaccine batch, Supplementary Fig. 2; LSDV attenuated SA-Neethling skin sample, Supplementary Fig. 3). DNA from LSDV field strains Evros/GR/15, Kubash/KAZ/16, 210LSD-249/BUL/16, 20L42_Quyet-Thang/VNM/20, 20L43_Ly-Quoc/VNM/20, 20L70_Dinh--To/VNM/20, 20L81_Bang-Thanh/VNM/20 was amplified similarly (Agianniotaki et al., 2017;Mathijs et al., 2020a, , submitted for publication). In total, successful Capripoxvirus genome PCR enrichment was demonstrated for six skin lesion samples, one blood sample, two cell culture supernatants, and six vaccine batches (Table 1). For all CaPV samples, the minimum required amount of purified DNA per amplicon (250 ng) was obtained allowing the preparation of equimolar amplicon pools for both short (MiSeq-Illumina) and long-read sequencing (RSII--PacBio, MinION-ONT) platforms. The coding sequences of the different vaccine batches and clinical samples from Greece, Kazakhstan, and Bulgaria were already reported previously (Agianniotaki et al., 2017;Mathijs et al., 2020b,). The sequencing results from the pooled 7.5 kbp amplicons and extracted DNA of the attenuated LSDV SA-Neethling isolate were further detailed in this study to evaluate the impact of the different sequencing strategies.

Data output from the different sequencing strategies
The data output obtained from the different sequencing strategies for the attenuated LSDV SA-Neethling isolate after quality filtering/trimming is summarized in Table 3. MiSeq sequencing of the Nextera XT (Illumina) libraries Lib-DNA, Lib1− 12, and Lib12− 23 generated reproducible read numbers and read lengths. The raw sequencing data for these libraries have been submitted to the Short Read Archive under the BioProject number PRJNA661421. As reported earlier for MiSeq Reagent Kit version 3 (2 × 300 bp), the quality of the second read (R2) was systematically lower than that of the first read (R1), which resulted in shorter reads after quality trimming. The data generated by the SMRTcell loaded with a single SMRTbell (PacBio) library is in line with expectations with over 70,000 reads with an average length of 14 kb. The R9.4.1 MinION Flow Cell (Oxford Nanopore Technologies) generated 1,241,611 base-called reads from which 579,266 were left after quality trimming/filtering. Due to the poor quality of the majority of the reads generated, only 5173 out of the 78,844 Flongle reads remained after quality filtering (Table 3).
As expected, over 99.9 % of the reads obtained after PCR enrichment (ILL-PCR, PB and ONT) map to the CaPV genome whereas only 9.3 % of

Read assembly, assembly quality assessment, and genome coverage
To assess the advantages and limitations of the different sequencing platforms, the resulting de novo assemblies were evaluated for accuracy and completeness using QUAST (Table 4, Gurevich et al., 2013). Mismatches and insertions/deletions (InDels) of the de novo assemblies were determined relative to the reference sequence MN636842 (van Schalkwyk et al., 2020). Amplicon sequencing data from the different sequencing platforms (ILL-PCR, PB, ONT-R9.4.1, ONT-FLG) could be assembled into a single, near full-length consensus sequence whereas the de novo assembly of data generated without PCR enrichment (ILL-DNA) only yielded a multitude of contigs (Table 4). Nevertheless, the ILL-DNA dataset allowed to cover 95.86 % of the mapped reference genome. The most accurate consensus sequences were obtained for the ILL-PCR and PB assemblies, both in terms of number of mismatches and number of InDels per 100 kb. Due to the low basecalling accuracy in homopolymer regions, both the ONT-R9.4.1 and ONT-FLG assemblies contained a markedly higher number of InDels with no less than 110.45 and 56.56 indels per 100 kb, respectively. Fig. 3 depicts the depth of coverage of the SA-Neethling genome for all sequencing platforms that were used in this study. The coverage depths were highly variable between platforms but mainly rely on the sequencing effort per sample. For ILL-DNA and ILL-PCR, the average depth of coverage was 95.5 and 4676, respectively. For ILL-DNA, 24.6 % of the bases were covered by less than 20 reads whereas the minimum coverage for ILL-PCR was 67. High variation in coverage depth across the genome was observed for ILL-DNA and ILL-PCR due to the enzymatic tagmentation step during Nextera XT library preparation (Fig. 3 A, B). As there is no fragmentation required for sequencing on third-generation platforms (PB and ONT), all 23 amplicons and their overlaps can be clearly identified on the coverage plots (Fig. 3 C, D, E).

Genome finishing, genome annotation and comparison with published Neethling strain genomes
Genome finishing was performed on the consensus sequence that was generated from the ILL-PCR and PB contigs. The genome termini and discrepancies with closely related Neethling strains were analysed by Sanger sequencing. When aligned to publicly available full-length LSDV genomes, the SA-Neethling consensus sequence shared over 99.9 % Table 3 Sequencing data obtained for the cell-cultured attenuated SA-Neethling LSDV strain with the different sequencing strategies applied in this study.

Table 4
De novo assembly statistics obtained from the data generated by the different sequencing strategies used to determine the coding genome of SA-Neethling. Statistics are based on contigs of size > = 5000 bp, unless otherwise noted (e. g., "# contigs (> = 0 bp)" include all contigs).  nucleotide identity with LSDVs in subgroup 1.1, which contains both Neethling-type vaccine viruses and virulent vaccine-associated viruses from South Africa (van Schalkwyk et al., 2020). Only 53 variable sites (23 SNPs and 29 InDels) were observed between the SA-Neethling consensus sequence and the Neethling vaccine LW1959 (AF409138). All variable sites and their impact on ORF lengths or amino acid changes are given in Supplementary Table 3. Ninety-one percent (n = 21) of the SNPs were located in coding regions of the genome with 76 % of them (n = 16) resulting in an amino acid change. Similarly, most of the InDels (82 %, n = 24) were located in one of the ORFs. Compared to the Neethling vaccine LW1959, nine InDels caused a frameshift that disrupted the stop codon and resulted in elongation of the ORFs LW019, LW026, LW086, LW087, LW131, LW134, and LW144. Two internal amino acids were inserted in ORF LW022, while 3 amino acids were deleted in ORF LW076.

Discussion and conclusions
In contrast to many RNA viruses, CaPVs display only low levels of genetic diversity both within and between species. Although sequencing specific regions (e.g. RPO30 and GPCR) allows to discriminate vaccine from field strains, none of these regions contain sufficient information to study possible lines of transmission within an outbreak or epidemic (Mafirakureva et al., 2017;Molini et al., 2018;Sudhakar et al., 2020). Comparison of the whole genome sequences of recent LSDV isolates from the 2015-2016 epidemic in southern Europe revealed only a limited number of point mutations between the isolates (Agianniotaki et al., 2017;Mathijs et al., 2020b,). Moreover, the characterization of full-length LSDV genomes isolated from the field exhibit probable recombination events between wild-type and vaccine LSDV isolates (Sprygin et al., 2020(Sprygin et al., , 2018. Whole genome sequencing is therefore essential when comparing closely related CaPV strains in outbreak or epidemic investigations and the identification of recombinant CaPVs as it allows to capture all genetic variation at once. The advent of HTS technologies has enabled researchers to measure genetic variation on a genome-wide scale at single nucleotide resolution (Mathijs et al., 2016c). Theoretically, these novel technologies are ideal to study large DNA viruses such as CaPVs. Unfortunately, the proportion of viral nucleic acids in clinical samples is usually very small compared to the host-derived nucleic acids. As a consequence, direct sequencing of all nucleic acids in a sample (i.e. metagenomics) rarely generates sufficient viral reads to reconstruct the entire genome or study the diversity within the viral population. The results obtained with the unenriched sample (Lib-DNA) are therefore not surprising. Although we were able to cover 95 % of the genome, the depth of coverage was highly variable across the genome, ranging from 1 to 1152 reads (Fig. 3A). However, it should be noted that a high titre cell cultured viral stock was used in the present study. Less coverage will be expected for clinical samples or vaccine batches (low titre cell culture). A viral enrichment step is therefore indispensable if one wishes to distinguish variable sites reliably. Over the years, several enrichment strategies have been developed to focus sequencing efforts on the viral nucleic acids of interest. These include (i) virus isolation and concentration, (ii) host nucleic acid depletion, (iii) target capture using virus-specific probes and (iv) virus genome amplification using conserved primer sets.
We specifically chose a genome amplification strategy as it is a wellestablished method that is relatively straightforward and both highly specific and sensitive. In contrast to other CaPV sequencing methods that are based on virus isolation and concentration, our genome amplification strategy does not require specialized equipment or technical skills and can be easily performed in any laboratory that is equipped with a thermocycler. Thanks to the high genetic similarity of CaPVs, we were able to design a set of pan-CaPV LR-PCRs that cover nearly the entire genome using only 23 amplicons. To demonstrate the robustness of our method, we successfully tested it on SPPV, GTPV and LSDV isolates, clinical samples and vaccine batches. A similar tiling amplicon approach has been used previously for the amplification of several other viral genomes (Freed et al., 2020;Gardner et al., 2014;Grubaugh et al., 2019;Li et al., 2012;Quick et al., 2017;Yu et al., 2020) and has also been developed by the ARTIC network to sequence the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Although this strategy is in theory applicable to any virus species, most tiling amplicon protocols target relatively small RNA viruses such as Ebola virus, Zika virus or West Nile virus. As far as we know, no tiling amplicon sequencing protocols have been described for large DNA viruses such as CaPVs. Due to the size (150 kb) and complex structure (repetitive regions) of CaPV genomes, we opted for an amplicon size of 7.5 kb instead of the much smaller sizes that are generally used in tiling amplicon protocols. As explained by Quick et al. (2017), the likelihood of finding large genome fragments in a sample is determined mainly by the type of virus (RNA/DNA virus, high/low viral load, etc.). Fortunately, CaPV virions are relatively stable in the environment and can be found at very high concentrations in clinical samples such as skin lesions. Routine analyses in our laboratory have shown that the LR-PCR strategy is very robust and can be applied to a wide variety of sample matrices including isolates, clinical samples and vaccine batches. By increasing the amplicon size, we were able to reduce the number of amplicons to 23 and, consequently, also decrease the cost and time needed to amplify an entire CaPV genome. Moreover, the larger amplicon sizes allow to cover the entire ITR in a single amplicon, which facilitates downstream data analysis. In contrast to most tiling amplicon protocols that use only 2 primer pools, we decided to amplify each region individually to facilitate the identification of amplicon dropouts and, at the same time, assess the yield of each reaction individually prior to sequencing. Although this singleplex approach is more expensive, it guarantees a more uniform coverage across the entire genome that cannot be achieved using highly multiplexed tiling PCRs. For instance, several studies have described highly uneven genome coverage when using the ARTIC SARS-CoV-2 amplification protocol (Cotten et al., 2021;Itokawa et al., 2020;Lam et al., 2021;Tyson et al., 2020).
As amplicon-based enrichment strategies provide both enrichment and amplification at the same time, our CaPV LR-PCR can be easily coupled to any of the currently available sequencing platforms. Although a detailed comparison between the different sequencing platforms is beyond the scope of this study, the observed results are in line with previous studies. The most accurate consensus sequences were obtained for the ILL-PCR and PB assemblies, which yielded identical sequences. Despite its markedly lower raw read accuracy, the uniform distribution of errors across PB reads allows to perform a consensus correction step using all available long reads. Unfortunately, PB platforms are expensive and not always easily accessible due to scarcity of service providers. As reported previously, both ONT assemblies suffered from numerous InDel errors that are a well-known problem of nanopore sequencing (Van der Verren et al., 2020). It was shown that nearly half of the errors are due to homopolymers, especially those longer than 4 (Delahaye and Nicolas, 2021). These errors also explain the noise observed in the ONT coverage plots. Although the N50 read length of the ONT data was approximately 7500 bp, large drops in coverage depth were observed across the entire length of the amplicons due to the inaccurate calling of homopolymer stretches. Due to these limitations, despite being the most accessible sequencing technology, ONT nanopore sequencing is presently less suitable for the accurate characterization of CaPV genomes. However, as the sequencing chemistry and the data analysis are evolving continuously, the accuracy of nanopore sequencing is expected to improve over the following years. For instance, new flow cells, sequencing kits, basecalling algorithms and polishing tools are being developed to tackle these InDel artefacts, particularly in homopolymer regions. Additional experiments will be needed to assess the impact of the continuous developments in nanopore sequencing and data analysis.
In conclusion, we have developed a robust, cost-effective and widely applicable method to sequence complete CaPV coding genomes. The described LR-PCR strategy allows to enrich viral target DNA of any CaPV species directly from a variety of samples, including clinical samples (tissue lesions, blood), vaccine batches, and virus isolates. The enriched DNA can be sequenced on all currently available sequencing platforms without additional amplification, yielding (nearly) complete genomes in less than a week. As over 99 % of the DNA is CaPV-specific, only low sequencing efforts are needed. As a consequence, high numbers of CaPV samples can be multiplexed on a single sequencing run, which greatly reduces the sequencing cost per sample. In our experience, 50− 100 K paired 250 bp reads on a MiSeq run are sufficient for complete genome sequencing, which represents only a fraction of the sequencing effort that is needed to cover the entire CaPV genome without amplicon-based enrichment. We hope that the method described in this study will aid researchers worldwide that need to study closely-related CaPVs (e.g. outbreak investigations).

Data availability
We have reviewed published papers, most of which have not yet made their raw EEG data available.
Data will be made available on request.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.