The use of next generation sequencing in the diagnosis and typing of respiratory infections

Highlights
• We compared current viral respiratory diagnostic techniques with NGS.
• NGS is able to detect respiratory viruses in clinical diagnostic samples.
• With the current sample preparation method, NGS is less sensitive than RT-PCR.
• NGS provided additional sequence and typing information compared with RT-PCR.


Background
Virus-specific molecular assays such as real-time PCR (RT-PCR) are now considered the gold standard in the diagnosis of viral respiratory tract infections. They are rapid, relatively inexpensive and offer increased sensitivity and specificity over prior techniques such as virus culture and direct immunofluorescence. Assays can be developed quickly to detect novel/emerging pathogens and can be combined to identify multiple microbiological pathogens in a single test. Yet there is a limit to the number of targets, usually up to four, which can be included in an in-house test before compromising test sensitivity. As a result, diagnostic laboratories must develop a panel of multiplex tests in order to detect the whole range of pathogens. Also, as for all PCR-based assays, detection is based on targeting conserved regions of the pathogen genome, and mutations can lead to reduced sensitivity or false negative results. Furthermore, only the targeted pathogens included in the assay will be identified; atypical or emerging pathogens will therefore generally evade detection by PCR. Although commercial PCR-based tests [1] are available that overcome some of the pitfalls associated with in-house tests, they remain PCR-based technologies and as a result suffer from the same sequence-based pitfalls outlined above.
Introducing NGS into a diagnostic setting may revolutionize the investigation of respiratory infections. Combining sequence-independent amplification with NGS will potentially detect viral and non-viral pathogens within a clinical specimen without actively targeting them, while simultaneously analyzing the genetic sequence. NGS is established in virus discovery, whole genome studies and metagenome studies [2][3][4]; thus the simultaneous detection of multiple different pathogens with this technique is possible. However, the efficacy and feasibility of employing such techniques in a diagnostic setting require further study.
Abbreviations: RT-PCR, real-time polymerase chain reaction; NGS, next generation sequencing; NPS, nasopharyngeal swab; VTM, viral transport medium; HRV, human rhinovirus; IFA, influenza A; IFB, influenza B; RSV, respiratory syncytial virus; ADV, adenovirus; hMPV, human metapneumovirus; PIV-1-4, parainfluenza virus 1-4; HCoV, human coronavirus; WoSSVC, West of Scotland Specialist Virology Center; HEV, human enterovirus; Ct, cycle threshold; BLAST, basic local alignment search tool; TRT, turn-around time.

Objectives
Here we present a pilot study that compares a current diagnostic technique, namely RT-PCR, with NGS in the detection of RNA viruses in respiratory samples from individuals with symptoms of a respiratory illness.

Samples
Eighty-nine nasopharyngeal swabs (NPS) were collected from adults with upper respiratory tract infections between May 2010 and October 2011. Samples were collected as part of the VIDARIS trial, a random subset of which were used in this study. It should be noted that over half of the participants in this trial were vaccinated against influenza. Ethical approval was provided by the Upper South B Regional Ethics Committee. All participants provided written informed consent [5]. Swabs were stored in viral transport medium (VTM) at −80 °C until testing. The VTM was thawed at 37 °C and centrifuged at 1500 × g for 10 min to remove debris. Total nucleic acids were extracted from 200 µl of the supernatant (Mag-Jet Viral DNA and RNA kit, Thermo Scientific) and eluted in 100 µl of water.

Next generation sequencing method
A 20 µl aliquot of the extract was treated with 4 U of DNase (Turbo DNase, 2 U/µl, Life Technologies) for 30 min at 37 °C. RNA was purified from the reaction using RNAClean XP beads (Agencourt), eluted in 15 µl of water and reverse transcription carried out using Maxima Minus H (Thermofisher) at 50 °C for 60 min with 0.2 pM primer FR26RV-N (5′-GCC GGA GCT CTG CAG ATA TCN NNN NN-3′). Second strand cDNA was synthesised (NEBNext mRNA 2nd Strand Synthesis, New England Biolabs) and the reaction purified with Ampure XP beads (Agencourt). Sequence-independent single primer amplification (SISPA) was carried out with the Advantage 2 PCR kit (Clontech) and 0.2 pM primer FR20RV (5′-GCC GGA GCT CTG CAG ATA TC-3′). The PCR product was purified with Ampure XP beads (Agencourt) and quantified (Qubit HS DNA, Life Technologies). 1 ng of cDNA was used to prepare barcoded sequencing libraries with the Nextera XT DNA Sample Prep kit (Illumina) and indices from the Nextera XT Index Kit as per the manufacturer's instructions. Up to 24 sample libraries were pooled per sequencing run and 151 bp paired-end reads were generated on the Illumina MiSeq.

Bioinformatic analysis
Sequencing adapters and low quality sequencing reads were removed (Trim Galore!, Babraham Bioinformatics) and low-complexity reads filtered out (PrinSeq [6]). High quality paired-end sequences were retained for downstream analyses. These sequences were mapped to a database containing a human genome and cDNA references, to remove host sequences.
Unmapped sequences were entered into the Metamos pipeline [7], which employs multiple de novo assemblers with k-mer optimisation to assemble contigs. The contigs from the most effective assembly were then taxonomically classified using the Basic Local Alignment Search Tool (BLAST) against the GenBank nucleotide and non-redundant databases (cut-off E value 0.001). Identical sequences between samples were removed using BedTools and unique sequences were retained for further analysis. Sequenced reads were then mapped back to the top taxonomic hit for each sample and visualized using Tablet [8], to quantify viral reads within each sample and generate a consensus sequence. Where reference genome coverage was greater than 90%, the consensus sequences were aligned with known reference sequences and phylogenetic analysis was carried out using MEGA6 [9]. Taxonomic hits were compared with the results of the diagnostic qRT-PCR (Table 1).
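The classification step can be illustrated with a minimal sketch. It assumes the BLAST results were exported in the standard 12-column tabular layout (-outfmt 6), which the text does not state, and it treats "top taxonomic hit" as the highest bit score per contig among hits at or below the E value cut-off of 0.001. The function name and the example identifiers are hypothetical.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6); the export
# format is an assumption, it is not stated in the methods.
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]

E_VALUE_CUTOFF = 0.001  # cut-off given in the text


def top_hits(blast_tsv, max_evalue=E_VALUE_CUTOFF):
    """Return {contig: (subject, evalue, bitscore)} for the best-scoring
    hit of each contig, ignoring hits above the E value cut-off."""
    best = {}
    reader = csv.DictReader(
        io.StringIO(blast_tsv.strip()),
        fieldnames=BLAST_COLUMNS,
        delimiter="\t",
    )
    for row in reader:
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if evalue > max_evalue:
            continue  # hit fails the cut-off
        current = best.get(row["qseqid"])
        if current is None or bitscore > current[2]:
            best[row["qseqid"]] = (row["sseqid"], evalue, bitscore)
    return best


# Hypothetical rows, for illustration only.
example = "\n".join([
    "contig1\tvirusA\t98.5\t500\t7\t0\t1\t500\t1\t500\t1e-120\t900",
    "contig1\tvirusB\t90.0\t400\t40\t0\t1\t400\t1\t400\t1e-80\t600",
    "contig2\tseqC\t80.0\t50\t10\t0\t1\t50\t1\t50\t0.5\t30",
])
hits = top_hits(example)
```

In this toy input, contig2 is left unclassified because its only hit exceeds the cut-off.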

Next generation sequencing
The average number of sequences generated per sample was ∼660,640 (range 30,872-1,278,122) after quality trimming and filtering. Viral contigs were found in 53/89 samples but following removal of duplicate reads this was reduced to 46/89. In a subset of samples (n = 8), there were fewer than 10 unique viral reads detected by the NGS assay alone. Due to the low number of reads we deemed these to be negative by NGS. The viral sequences detected in the remaining 38 samples belonged to the Picornaviridae, Coronaviridae, Paramyxoviridae and Orthomyxoviridae (Table 1). No mixed infections were detected by NGS.
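The read-count rule applied above can be written as a short sketch. The threshold of 10 unique viral reads is taken from the text; the function name and the per-sample counts are hypothetical and serve only to show how the rule behaves.

```python
MIN_UNIQUE_VIRAL_READS = 10  # samples with fewer unique viral reads are called negative


def call_ngs_result(unique_viral_reads, min_reads=MIN_UNIQUE_VIRAL_READS):
    """Classify a sample as NGS positive or negative from its unique viral read count."""
    return "positive" if unique_viral_reads >= min_reads else "negative"


# Hypothetical unique viral read counts per sample, for illustration only.
example_counts = {"sample_A": 4, "sample_B": 10, "sample_C": 15230}
calls = {name: call_ngs_result(n) for name, n in example_counts.items()}

# Counts reported in the text: 46 samples with viral reads after
# de-duplication, 8 of which fell below the threshold.
ngs_positive_samples = 46 - 8
```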
All viruses identified by NGS were confirmed by qRT-PCR. However, in eleven cases, virus was identified by RT-PCR only. These were ADV (1/11), PIV-2 (1/11), hMPV (1/11), RSV (2/11), HCoV (3/11) and HRV (3/11). One sample was found by RT-PCR to contain a mixture of both ADV and HRV. The NGS method failed to detect the ADV in this sample. Where NGS confirmed the findings of the RT-PCR assay the Ct values were sig- Further examination of the relationship between the numbers of viral sequenced reads and the threshold cycle (Ct) value provided by the RT-PCR assay is shown in Fig. 4. The log of the percentage of sequenced reads correlates with the Ct value, indicating that a higher viral load is associated with a greater proportion of viral reads.
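The relationship between the log of the percentage of viral reads and the Ct value can be examined along the following lines. This is a sketch only: the text does not state which correlation statistic was used, so Pearson's r is an assumption, and the read counts and Ct values below are synthetic values chosen to show the calculation, not study data.

```python
import math


def log10_percent_viral(viral_reads, total_reads):
    """log10 of the percentage of a sample's reads that are viral."""
    return math.log10(100.0 * viral_reads / total_reads)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic example (NOT study data): viral reads fall tenfold for every
# 5-cycle increase in Ct, with one million total reads per sample.
ct_values = [20.0, 25.0, 30.0]
viral_reads = [100_000, 10_000, 1_000]
log_percent = [log10_percent_viral(v, 1_000_000) for v in viral_reads]
r = pearson_r(ct_values, log_percent)
```

A negative coefficient corresponds to the pattern described above: lower Ct values (higher viral load) go with a larger share of viral reads.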
Despite sample treatment to enrich for viral RNA, a substantial number of bacterial transcript sequences were generated; however, partial genome amplification of bacteria only allows classification to the level of order/family (data not shown).

Discussion
As a pilot study designed to assess the utility of NGS as a method for the detection of respiratory RNA viruses, we compared an in-house NGS method with an established in-house RT-PCR test using 89 respiratory samples. A viral pathogen was detected by the NGS method in 38 samples, with the RT-PCR confirming these and detecting a further 11 viruses. Overall, based on these results and compared to RT-PCR, the NGS assay had a sensitivity of 77.55% (95% CI 63.37-88.21%) and a specificity of 80.49% (95% CI 65.13-91.15%), or 100% (95% CI 91.31-100%) if samples with fewer than 10 unique reads are considered negative. It is possible that the sensitivity of NGS detection could vary between virus groups; however, the number of detections in this cohort was too small to assess each group individually.
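The sensitivity estimate follows directly from the counts given in the text: 38 viruses detected by both methods out of 49 detected by RT-PCR (38 + 11). The sketch below computes the proportion together with an exact binomial (Clopper-Pearson) interval. Which interval method was used for the reported confidence intervals is not stated, so the choice of Clopper-Pearson is an assumption, and the function names are illustrative.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect_decreasing(f, target, iterations=100):
    """Find p in [0, 1] with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(successes, n, alpha=0.05):
    """Return (estimate, lower, upper) using the Clopper-Pearson interval."""
    estimate = successes / n
    if successes == 0:
        lower = 0.0
    else:
        # Lower limit: P(X >= successes | p) = alpha / 2
        lower = _bisect_decreasing(
            lambda p: binom_cdf(successes - 1, n, p), 1 - alpha / 2
        )
    if successes == n:
        upper = 1.0
    else:
        # Upper limit: P(X <= successes | p) = alpha / 2
        upper = _bisect_decreasing(lambda p: binom_cdf(successes, n, p), alpha / 2)
    return estimate, lower, upper


# Counts from the text: 38 of 49 RT-PCR detections were also found by NGS.
sensitivity, sens_low, sens_high = exact_ci(38, 49)
```

The same function applies to specificity once the numbers of true negatives and RT-PCR negative samples are supplied.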
The 11 NGS negative/RT-PCR positive samples contained virus at higher Ct values (i.e., lower viral loads) than those positive by both assays, suggesting the current NGS method has a cut-off in the region of Ct 32, approximately 1-2 logs less sensitive than the qRT-PCR method (data not shown). This finding is similar to that outlined by Prachayangprecha et al. At present, this level of sensitivity is not sufficient for NGS to replace qRT-PCR in diagnostic services. Increasing the depth of sequencing could improve the sensitivity of NGS, either by reducing the multiplexing of samples (i.e., reducing the number of samples processed per sequencing run) or by using an alternative platform with greater capacity such as the Illumina NextSeq or HiSeq. Reducing host DNA contamination through host DNA depletion holds the greatest promise [11] for increasing the depth of viral and microbial sequence, as we found that up to 99% of reads obtained per sample were derived from the host.
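The effect of multiplexing and host depletion on usable depth can be approximated with simple arithmetic. The sketch below assumes that reads are shared equally between pooled libraries and that the host fraction is the same for every sample, neither of which holds exactly in practice. The mean of 660,640 reads per sample, the 99% host fraction and the pool size of 24 come from the text; the total run output is a hypothetical figure.

```python
def reads_per_sample(run_reads, samples_per_run):
    """Mean reads per sample, assuming equal pooling of libraries."""
    return run_reads / samples_per_run


def expected_non_host_reads(sample_reads, host_fraction):
    """Reads left for viral and microbial sequence after host reads are set aside."""
    return sample_reads * (1.0 - host_fraction)


# Figures from the text: mean reads per sample and worst-case host fraction.
non_host = expected_non_host_reads(660_640, 0.99)

# Hypothetical run output of 24 million reads: halving the pool from 24
# to 12 samples doubles the mean depth per sample.
depth_24 = reads_per_sample(24_000_000, 24)
depth_12 = reads_per_sample(24_000_000, 12)
```

Under these assumptions, a sample at the mean depth with 99% host reads retains only a few thousand non-host reads, which illustrates why host depletion is expected to have a larger effect than modest changes in multiplexing.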
One of the 11 pathogens not detected by the NGS method was adenovirus. It is unclear whether this was due to the sensitivity cut-off discussed above (Ct of 35.1) or due to the initial DNase step outlined in the method. In future, if this method is to be adapted for the detection of DNA viruses and other non-viral causes of respiratory infection, then other ways of enrichment will need to be sought. Target enrichment via hybridization is one option, but again this may encounter the issue of missing novel or changed viruses.
Although the NGS method failed to detect all the positive samples, it did offer several advantages over the RT-PCR method. For example, NGS provided more detailed typing information for the detected viruses, including subtype data for RSV and hMPV as well as serotype data for HRV and HEV.
Such information is highly informative. As well as enabling real-time diagnostic data to be produced, NGS could simultaneously provide resistance testing (e.g., H275Y mutation of influenza A, conferring oseltamivir resistance) and typing data, as demonstrated here and previously [12,13]. This could be used to inform public health of circulating strains and could allow rapid detection of the emergence of novel subtypes (e.g., EV68 and ADV 14) or highlight potential outbreaks. In future, it may also detect viral polymorphisms associated with disease severity [14,15].
Similar to other studies, we also found a correlation between NGS sequence reads and RT-PCR cycle thresholds [16,17]. This was particularly strong for rhinoviruses, mainly because these represented the majority of the pathogens detected. This suggests that sequence reads might be a suitable proxy for viral/target copies, which would enable laboratories/clinicians to interpret the clinical relevance of results [18]. Such data can also be used to infer prognosis or treatment response.
Viral co-infection detection by NGS could not be assessed in this cohort as only a single co-infection episode was detected by RT-PCR. The population under study were normally healthy adults who did not require health care interventions. There is little information on co-infections in such individuals but evidence suggests that viral co-infections occur more commonly in hospitalised individuals [19]. The detection of influenza was also lower than may have been expected, which probably relates to the level of vaccination among the cohort. Further studies including children and unvaccinated individuals would be of use.
The NGS approach did not detect positive results that were missed by the RT-PCR. Although this panel was small, our data support other studies which have concluded that current respiratory panels are appropriate for the detection of the main causes of viral respiratory disease.
A major barrier to the introduction of NGS as a routine diagnostic test is its cost and turnaround time. The current cost of sequencing is prohibitive for a diagnostic test when compared to RT-PCR, but it has fallen in recent years, a trend which will probably continue. Although not designed to detect bacteria, the current method still detected bacterial genomes in nearly all samples; this finding shows that syndromic testing is a possibility. If viral, bacterial and fungal diagnostic tests can be combined into a single assay, this would benefit the cost effectiveness of developing NGS as a diagnostic tool.
The turnaround time of the above process was in the region of seven days, including preliminary data analysis, whereas the turnaround time for RT-PCR methods is usually just a few hours. As a result, NGS is unlikely to become a routine diagnostic test in the near term. However, there are many steps that could be accelerated with automation. The development of kit-based library preparation methods has also resulted in a condensed process, reducing hands-on time. Technical advances that allow a greater depth of sequencing could obviate the need for enrichment processes. Furthermore, third generation sequencers, such as the PacBio RS II, offer the potential for even more rapid sequencing. Taken together, these advances are likely to improve the TRT of NGS significantly.

Competing interests
None declared.

Funding
This study was funded by the UK Medical Research Council (Grant reference number G0801822).

Ethical approval
The study was approved by the Upper South B Regional Ethics Committee, New Zealand (URB/09/10/050) and all participants provided written informed consent.