A generic assay for whole-genome amplification and deep sequencing of enterovirus A71

Enterovirus A71 (EV-A71) has emerged as the most important cause of large outbreaks of severe and sometimes fatal hand, foot and mouth disease (HFMD) across the Asia-Pacific region. EV-A71 outbreaks have been associated with (sub)genogroup switches, sometimes accompanied by recombination events. Understanding EV-A71 population dynamics is therefore essential for understanding this emerging infection, and may provide pivotal information for vaccine development. Despite the public health burden of EV-A71, relatively few EV-A71 complete-genome sequences are available for analysis and from limited geographical localities. The availability of an efficient procedure for whole-genome sequencing would stimulate effort to generate more viral sequence data. Herein, we report for the first time the development of a next-generation sequencing based protocol for whole-genome sequencing of EV-A71 directly from clinical specimens. We were able to sequence viruses of subgenogroup C4 and B5, while RNA from culture materials of diverse EV-A71 subgenogroups belonging to both genogroup B and C was successfully amplified. The nature of intra-host genetic diversity was explored in 22 clinical samples, revealing 107 positions carrying minor variants (ranging from 0 to 15 variants per sample). Our analysis of EV-A71 strains sampled in 2013 showed that they all belonged to subgenogroup B5, representing the first report of this subgenogroup in Vietnam. In conclusion, we have successfully developed a high-throughput next-generation sequencing-based assay for whole-genome sequencing of EV-A71 from clinical samples.


Introduction
Enterovirus A71 (EV-A71) belongs to the Enterovirus A species of the family Picornaviridae, and is genetically divided into three genogroups (A, B, and C). The latter two are further divided into pulmonary edema or hemorrhage) (Khanh et al., 2012;Nguyen et al., 2014). There was a strong association between the detection of EV-A71 and severity and death in this outbreak (Khanh et al., 2012;Nguyen et al., 2014). Although phase III trials of three different inactivated EV-A71 vaccines from China were recently completed with 95% protection rates against EV-A71 associated HFMD (Zhu et al., 2014;Zhu et al., 2013;Li et al., 2014), there are currently no vaccines or specific antiviral drugs available for EV-A71.
Large EV-A71 outbreaks have been associated with (sub)genogroup switches, sometimes accompanied by recombination events (Solomon et al., 2010;McWilliam Leitch et al., 2012;Tee et al., 2010). Similarly, sequence data from Vietnam obtained between 2005 and 2011 show a subgenogroup switch from C5 to C4 in 2011 (Khanh et al., 2012;Tu et al., 2007). Of note, this C4 subgenogroup was also associated with the epidemic in China in 2008 and with a recent outbreak in Cambodia (Xing et al., 2014;Tan et al., 2011;Seiff, 2012). Interestingly, phylogenetic analyses and molecular clock dating suggest that strain replacement commonly occurs in EV-A71 (Tan et al., 2011), and that emerging subgenogroups may circulate cryptically in the community at low prevalence for years and cause mild infection prior to outbreak emergence (Tee et al., 2010). Critically, it is unclear whether viral evolution is the driver of emergence and large outbreaks of HFMD in Southeast Asia, or a consequence of the greater number of infected hosts. The former would have profound implications for vaccine development, necessitating active surveillance and periodic vaccine updates. Taken together the available data highlight the importance of understanding EV-A71 evolution and population dynamics within and between endemic countries, which may be essential for understanding and controlling this emerging infection, including vaccine development.
Despite the public health burden of EV-A71, relatively few (∼500) EV-A71 complete genome sequences are available for analysis and from limited geographical localities, compared to ∼4500 sequences of A/H3N2 human influenza virus alone (Viboud et al., 2013). Hence, the availability of an efficient procedure for whole-genome sequencing would be an important step toward stimulating scientific effort to generate more viral sequence data. Herein, we report the development of a complete genome sequencing protocol for EV-A71 directly from clinical specimens using RT-PCR amplification of three overlapping amplicons and nextgeneration sequencing technology.

Patients and Clinical specimens
Samples used in this study were derived from patients enrolled in an ongoing three-year prospective observational study of HFMD patients of all severities (including out-and in patients) admitted to the outpatient clinics, infectious diseases wards, and the paediatric intensive care units in three major referral hospitals in Ho Chi Minh City, Vietnam. This study utilized 65/82 consecutive EV-A71 RT-PCR positive throat swabs collected in viral transport medium from patients enrolled between July 2013 and December 2013 at the Hospital for Tropical Diseases (HTD) and Children's Hospital 2. For the purpose of assay evaluation, the swabs with a wide range of viral loads were selected based on real time RT-PCR crossing point (Cp) values of between 24 and 36 (i.e. from high to low viral load).

Virus isolates
An EV-A71 subgenogroup C4 isolate from a child with HFMD admitted to HTD during the 2011 outbreak, and eight different isolates each belonging to a different subgenogroup (Table 2) obtained from the Department of Biomedical Science, University of Malaya, Malaysia were used for the initial assessment of assay performance.

Primer design
Overlapping primer pairs were designed for the amplification of three amplicons spanning the entire EV-A71 genome. The primers were derived either from previous publications (Khanh et al., 2012) or newly designed based on the alignment of 279 complete genome sequences of all EV-A71 subgenogroups available in GenBank (Genogroup A, n = 1; B0, 2;B1, 14;B2, 4;B3, 8;B4, 11;B5, 18;C1, 17;C2, 33;C3, 3;C4, 165;C5, 2, and undefined, 1;accessed in April 2013). Sequence alignment was performed using MUSCLE available within the Genieous 7.1.3 software package (http://www.geneious.com). The resulting alignment was used to identify conserved regions for selection of PCR primers. The location of primers was chosen to generate PCR fragments with an overlap of 400 nucleotides (nt) or more, allowing for proper sequence assembly. To avoid compromise of PCR sensitivity, a maximum of two degenerate nucleotides were introduced per primer. As a consequence, in some occasions, two or more primers targeting one genomic region were selected for amplification of diverse EV-A71 subgenogroups. Primer sequences, genomic locations, amplified fragment lengths, and alignment profiles of primer binding sites are shown in Table 1 and Supplementary Appendix Fig. 1.

Nucleic acid isolation
Viral RNA was extracted from 140 l of culture supernatant or clinical material using the QIAamp viral RNA kit (QIAgen GmbH, Hilden, Germany), recovered in 50 l of elution buffer (provided with the kit) and was either immediately subjected to RT-PCR amplification or stored at −80 • C for subsequent use.

RT-PCR
RT-PCR was performed using SuperScript III One-Step RT-PCR system with Platinum Taq DNA High Fidelity (Invitrogen, Carlsbad, CA, USA) in a 25 l reaction consisting of 800 nM of each primer (Table 1), 12.5 l of 2× reaction buffer (provided with the kit), 0.5 l of SuperScriptIII RT/Platinum Taq High Fidelity mix and 4 l of template RNA. RT-PCR reaction was performed in a Mastercycler (Eppendorf, Hamburg, Germany). Cycling conditions are specified in Table 1.

Genome sequencing, sequence assembly and minor variation detection
For each sample, PCR products were quantified by a fluorescence-based dsDNA quantification method using the Quant-iT dsDNA Assay Kit in a Qubit fluorometer (Invitrogen) and then pooled with an equal quantity of each individual PCR amplicon. One nanogram of pooled DNA from individual samples was then subjected to library preparation using the Nextera XT DNA sample preparation kit (Illumina, San Diego, CA, USA), in which each sample was assigned to a unique barcode sequence using the Nextera XT Index Kit (Illumina). Sequencing of the prepared library was carried out using the Miseq reagent kit v2 (300 cycles, Illumina) in an Illumina Miseq platform. A total of 24 samples were sequenced in a single run. The reads obtained were processed to remove PCR primers using CLC Genomics Workbench (QIAgen).
Sequence assembly was performed using the Genieous 7.1.3 software package utilizing a reference-based mapping tool (i.e. the consensus sequence was obtained by mapping individual reads of each sample to a reference sequence). Finally, screening of minor (sub-consensus) variants was performed using the SNP detection

Sequence analysis of the obtained consensuses
Sequence alignment of the consensus genomes obtained in this study was performed using MUSCLE (Edgar, 2004), and the phylogenetic relationships among these data and 25 representative sequences taken from GenBank were estimated using the maximum likelihood (ML) PhyML method available within the Geneious package. The ML phylogenetic analysis utilized the general time reversible (GTR) nucleotide substitution model (Guindon and Gascuel, 2003), and support for individual nodes was assessed using a bootstrap procedure (1000 replicates).

Sequence accession numbers
The sequences of EV-A71 obtained in this study were submitted to NCBI (GenBank) and assigned accession numbers KP691643-KP691666.

Ethical statement
The studies from which clinical specimens were selected for use in this study were reviewed and approved by the Institutional Review Boards of the sites of enrolment in Ho Chi Minh City, Vietnam and the Oxford Tropical Research Ethics Committee (OxTREC), University of Oxford, Oxford, United Kingdom.

Whole-genome RT-PCRs
After examining the complete genome sequence alignment of EV-A71, primer pairs for three overlapping RT-PCRs spanning the whole genome were designed. To assess their performance, we first tested the assays on RNA derived from culture supernatants of different EV-A71 subgenogroups. Notably, all three RT-PCRs were able to amplify viral RNA from each sample ( Table 2).
The overlapping PCRs were further tested on 65 EV-A71 RT-PCRpositive throat swab samples collected from children with HFMD. All three fragments were successfully amplified in 50/65 (77%) samples (Fig. 1). Eleven out of 15 samples that were not amplified by the three overlapping PCRs were due to failure of the 3.2Kb PCRs (details in Supplementary Appendix Table 1). A significantly higher viral load (as suggested by Cp values generated by an EV-A71 realtime RT-PCR (Khanh et al., 2012)) was observed among samples that were amplified by all three RT-PCRs compared to samples that were not (Fig. 2). Accordingly, clinical samples with a Cp value of 30 or less allowed the successful amplification of the three genomic fragments.

Genome sequencing using the Illumina platform
Amplified products from 23 clinical samples and an EV-A71 isolate were sequenced in one batch. The run resulted in an output of 4.9 Gb, of which 60% could be successfully assembled. Over 92% of the assembled bases had Phred quality scores of ≥30 (i.e. a probability of ≥99.9% that the single bases were correctly sequenced). The remaining bases had Phred quality scores of between >10 and 29 (i.e. >90% of accuracy). Twenty-three complete genome sequences (including 22 from clinical samples and one from virus culture material) and one near-complete genome sequence of EV-A71 were successfully assembled. A coverage of ≥500-fold was achieved in 22 samples (including 21 clinical samples and the virus isolate) and  Phylogenetic analysis suggested that all viruses derived from throat swabs collected in 2013 belonged to EV-A71 subgenogroup B5 (Fig. 3), while the virus from culture material belonged to subgenogroup C4. This is in agreement with our initial typing result based on sequencing of VP1 region (data not shown).

Minor variant detection
Using the criteria described in the Methods section, we identified a total of 107 positions in the EV-A71 genome carrying minor variants (ranging from 0 to 15 variants per sample), of which 15 (14%) were nonsynonymous. Interestingly, 73% (11/15) of the nonsynonymous variants were present in the nonstructural protein coding region, which also contained 65% of the total 107 variants observed, with mutations in the 2 C protein being the next most frequent (38/107, 35.5%). Most variants (101) represented minor variants. However six variants were found to be either minor or dominant (i.e. >50%) variants in different samples, suggesting that they may impact viral fitness in some way, and hence merit additional investigation. More details on the minor variants are presented Supplementary Appendix Table 3, including the frequency of the six variants shared between viruses found in the data set of 279 genome sequences used in the present study.

Discussion
We have successfully developed a high-throughput Illumina Miseq-based generic assay for direct whole-genome sequencing of EV-A71 from clinical samples. Although primers for wholegenome amplification of EV-A71 have been proposed previously (Shih et al., 2000;Cordey et al., 2012;Huang et al., 2009;Yoke-Fun and AbuBakar, 2006;Chang et al., 2012;Zhang et al., 2013), these primer sets were either large (i.e. at least eight combinations were used) and/or not tested on diverse EV-A71 subgenogroups. The procedure described here only requires three overlapping PCRs to amplify the entire virus genome directly from clinical specimens. In addition, the entire procedure from nucleic acid extraction and PCR amplification to obtaining a total of between 24 and 96 virus genomes takes approximately 5 days (Flowchart 1). The total reagent-associated cost is $120 USD per sample if 24 samples are multiplexed in one run (date of cost assessment: July 2014). This can be reduced to $70 USD if 96 samples are combined. In addition, with the great sequencing depth (i.e. a single nucleotide is sequenced at least 500 times), the extent   and pattern of intra-host genetic diversity data can be explored in addition to the consensus sequence. It should however be noted that although the current cost per genome is affordable, the total cost per run (∼$3000 USD) together with hardwareassociated cost combined with bioinformatics challenges may remain the major obstacles preventing Illumina based wholegenome sequencing assay from being widely used, in particular in developing countries including parts of Southeast Asia where EV-A71 and HFMD are endemic. Alternatively, generating full-length virus sequence can be achieved using Sanger-sequencing-based strategies (e.g. DNA walking of the obtained PCR amplicons described in the present study and/or multiple small overlapping PCRs spanning the whole virus genome). While the associated cost per genome of such strategies might be comparable with that of Illumina Miseq one, generating full-length virus genome using Sanger sequencing technology is a laborious procedure as it not possible to multiplex between 24 and 96 samples per run. Likewise, with Sanger-sequencing data, exploring the pattern of intra-host diversity of the pathogen at a level of frequency of 5% is unachievable.
Using our procedure we were able to sequence viruses of subgenogroup C4 and B5. Likewise, RNA from culture materials of diverse EV-A71 subgenogroups belonging to both genogroup B and C ( Table 2) that have been associated with outbreaks of HFMD in the Asia-Pacific region since the 1990s was successfully amplified. Assuming that a Cp value of ∼30 is the threshold of assay limit of detection (Fig. 2), our observational data on Cp values obtained from 824 throat swabs from HFMD patients in Vietnam (data not shown) show that 75% of viruses in RT-PCR positive throat swabs can be sequenced by our assay. Of note, among 65 tested swabs, in the majority of cases the procedure failure was attributed to the RT-PCRs that generated the largest amplicon (RT-PCR 1) and/or targeted at the genetically variable regions of the virus genome (RT-PCR 2) (Supplementary Appendix Table 1), which may suggest that fragment size and/or sequence variations play a part in addition to the viral-load factor.
We conservatively chose cut-off values of 500-fold coverage and a frequency of 5% for detection of minor variants. This is well above the next-generation sequencing error rate of 0.1% (Archer et al., 2012), while a previous study suggested that a coverage of >400fold is required for reliable detection of minor variants present at 1% in next generation sequencing data (Wang et al., 2007).
Although an association between EV-A71 subgenogroups and disease severity has not been identified, previous studies have shown potential associations between single nucleotide variants and clinical severity (Cordey et al., 2012;Zhang et al., 2014). In addition, a recent study of C4, B1, B2 and B5 sequences found that nonsynonymous and synonymous substitutions occurred more frequently in the nonstructural than the structural protein-coding region, suggesting that the former was a major fitness determinant . We similarly observed a majority of minor variants (including nonsynonymous ones) in the nonstructural protein-coding region, although the function of these mutations is uncertain. As a consequence, these data further emphasize the need to study viral diversity and evolution at the whole-genome level, rather than in the VP1 protein alone which has been the main focus to date. Likewise, exploring the association between specific viral genotypes/single nucleotide polymorphisms and clinical phenotype is clearly an area of importance, although beyond the scope of the present study.
Our analysis of EV-A71 strains sampled in 2013 showed that they all belonged to subgenogroup B5, representing the first report of this subgenogroup in Vietnam, while subgenogroup C4 caused the large outbreak of HFMD that occurred in Vietnam in 2011-12 (Khanh et al., 2012). Subgenogroup switches commonly occur in other countries of Asia-Pacific region where EV-A71 and HFMD are endemic (Solomon et al., 2010), although the evolutionary and epidemiological processes responsible for these switches are still unclear. Taken together, these data highlight the importance of studying spatial and temporal dynamics of EV-A71 across endemic countries, which may in turn inform the development of intervention strategies, including vaccine development and implementation, and can now be facilitated by the availability of a high throughput/cost-effective whole-genome sequencing procedure.
In conclusion, we have successfully developed a highthroughput next-generation sequencing-based generic assay for direct whole-genome sequencing of EV-A71 from clinical samples that provides important insights into viral diversity and evolution.

Funding
The research leading to these results has received funding from the Wellcome Trust of Great-Britain (101104/Z/13/Z and 089276/Z/09/Z), and the Li Ka Shing Foundation-University of Oxford Global Health Program strategic award (LG23). ECH is supported by an NHMRC Australia Fellowship. OKC is supported by a High Impact Research Grant (H20001-E00004) from the Ministry of Education, Malaysia Government and University of Malaya Research Grant (RG480/12HTM). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Conflict of interest
No conflict of interest.