Complete assembly of a dengue virus type 3 genome from a recent genotype III clade by metagenomic sequencing of serum

Background: Mosquito-borne flaviviruses, such as dengue and Japanese encephalitis virus (JEV), cause life-threatening diseases, particularly in the tropics. Methods: Here we performed unbiased metagenomic sequencing of RNA extracted from the serum of four patients and the plasma of one patient, all hospitalized at a tertiary care centre in South India with severe or prolonged febrile illness, together with the serum from one healthy control, in 2014. Results: We identified and assembled a complete dengue virus type 3 sequence from a case of severe dengue fever. We also identified a small number of JEV sequences in the serum of two adults with febrile illness, including one with severe dengue. Phylogenetic analysis revealed that the dengue sequence belonged to genotype III. It has an estimated divergence time of 13.86 years from the most highly related Indian strains. In total, 11 amino acid substitutions were predicted for this strain in the antigenic envelope protein, when compared to the parent strain used for development of the first commercial dengue vaccine. Conclusions: We demonstrate that both genome assembly and detection of a low number of viral sequences are possible through the unbiased sequencing of clinical material. These methods may help ascertain causal agents for febrile illnesses with no known cause.


Introduction
Acute undifferentiated febrile illness refers to a sudden onset of high fever without localized organ-specific clinical features 1 . Although the majority of patients recover over a few days, some can develop severe illnesses, resulting in high morbidity and even death in many parts of the world. Among the many causes of febrile illness, some of the most important across Asia are mosquito-borne viruses such as dengue virus [1][2][3][4][5][6] . In addition, novel agents associated with acute febrile illness continue to be discovered 7-9 .
Current molecular diagnostic techniques, such as polymerase chain reaction, are pathogen-specific and therefore pose limitations, as they may fail to detect co-infections and novel agents not commonly associated with the disease syndrome 10 . The unbiased metagenomic sequencing of clinical material from patients with acute fever can overcome these limitations 3,11 .
Mosquito-borne viruses of the family Flaviviridae, which include dengue virus and Japanese encephalitis virus (JEV) are known to co-circulate in India and other parts of Asia 12 . Dengue viruses are a major cause of acute febrile illness in Asia, with recurrent outbreaks having occurred 13 . JEV, on the other hand, is better known as a cause of acute encephalitis 14 . Although JEV has been noted as an agent that causes acute fever in Southeast Asia, it is not routinely tested as a cause of fevers in India 5,6 . There are four distinct serotypes of dengue viruses (DENV1-DENV4), with their small RNA genomes (approximately 10.8 kbp) making them amenable for characterization by deep sequencing of infected mosquitoes or clinical material from infected individuals 15 . Sequencing dengue genomes is important for tracking virus evolution, given that they frequently mutate 15,16 . Outbreaks of severe dengue disease associated with serotype switches or the introduction of a novel strain into the population have been reported from several different countries, including Sri Lanka, Pakistan and Singapore 17-22 . Recent analysis suggests an influenza-virus-like pattern for dengue virus evolution, where strain-specific differences underlie antibody neutralization 23 . Pre-existing antibodies to circulating dengue strains can therefore contribute to disease severity by inadequate neutralization of the virus or by antibody-mediated enhancement, which facilitates virus infection 24-28 . This is supported by in vitro studies, which found that changes to the envelope (E) protein of DENV3 were sufficient to alter antibody binding 26 . Multiple dengue vaccines are currently in various stages of development, and a tetravalent vaccine (CYD-TDV; Dengvaxia®, Sanofi Pasteur) has been approved for use in several countries 29,30 . This vaccine has been shown to induce the expression of broadly neutralizing antibodies to multiple strains and all serotypes of dengue viruses 31 . The results of a phase III trial of this vaccine suggest that both the immune state (with respect to dengue viruses) and circulating viruses may influence vaccine effectiveness 29 . This underscores the need to characterize both the sequence evolution and antibody response of circulating dengue strains.
Here we used an unbiased sequencing/metagenomic approach, in order to determine both the identity and sequences of viruses associated with febrile illness. In particular, based on previous studies of sequencing data from the serum of febrile individuals, we expected that medium-depth sequencing (about 10-20 million sequence reads per sample) was necessary and sufficient to provide complete sequences of small viral genomes from clinical material 2,9 . To test this, we sequenced RNA extracted from the serum of four individuals and the plasma of another presenting with febrile illness at a tertiary care hospital in Bangalore, India and one healthy control from the same hospital, during the dengue season of 2014. We recovered the complete coding sequence of DENV3 clustering into a recent genotype III clade.

Results
We sequenced RNA extracted from the serum of four patients hospitalized with severe febrile illness and from one plasma sample from a patient hospitalized with prolonged febrile illness (Table 1). We included serum from a healthy individual and water as controls. Approximately 10×10 6 sequence reads were recovered from each sample, with the water control yielding a lower number of reads ( Figure 1A).
A BLAST 32 similarity search, mapping all sequenced reads to a database of NCBI reference viral sequences (Table 1), identified 19,120 DENV3 sequence reads and 14 JEV sequence reads in sample F2, and 12 JEV sequence reads in sample F5. A single DENV3 read was detected in sample F3. No animal viruses were confirmed by BLAST in the controls or in other samples (Table 1 and Figure 1B).
On the basis of World Health Organization guidelines for the classification of dengue cases 33 , F2 was classified as a case of severe dengue, as the presenting symptoms included respiratory distress (bilateral pleural effusions in chest X-ray) hypotension and elevated liver enzymes ( Table 1).
The serum sample from this individual was positive for both the non-structural protein 1 antigen and dengue IgM, and we were able to obtain a complete DENV3 genome sequence from this sample. Genomes were assembled both by de novo (87.05% coverage) and mapping-based (99% coverage) assembly (Table 2 and Table 3, Supplementary File 1) and found to be identical (Supplementary File 2). Mapping revealed good coverage across the genome, with an average depth of 231.45 ( Figure 1, Table 2). The genome is missing 76 bp at the 5'-UTR and 28 bp at the 3'-UTR compared to the NCBI RefSeq (NC_001475.2) DENV3 genome.
The mapping-based assembly was used for phylogenetic analysis and submitted to GenBank, with accession number KX855927.

Amendments from Version 1
Revisions have been made to the main manuscript and to Figure 3 based on comments from the reviewers. Modifications have been made in -i) Page 7, discussion, paragraph 1 ii) page 3 -introduction, paragraph 3 iii) in the legend and results section for Figure 3.   The degree of nucleotide identity between this strain and the reference DENV3 genome (NC_001475.2) was 96.32%, and with the closest DENV3 strain from India, 98.75%.

REVISED
Phylogenetic analysis was carried out with BEAST2 using the coding sequence of KX855927 and 79 sequences selected as being similar to KX855927, using the BLAST search against dengue genomes in the Virus Pathogen Database and Analysis Resource 34 (Supplementary File 3). The strain clusters with recent DENV3 sequences from India, China and Singapore ( Figure 2). This clade split from other DENV3 and other DENV3 genotype III strains around 15 years ago. The branch length of KX855927 is longer than most others in the tree, with an estimated divergence time of 13.86 years (with the 95% highest posterior densities between 12.94 and 14.83 years) from the closest Indian strain ( Figure 2). A maximum likelihood tree showed the same topology as the consensus tree from BEAST, although many clades had low bootstrap support (Supplementary File 4).
Both synonymous and non-synonymous substitutions were predicted throughout the genome, as compared to the DENV3 reference sequence (Supplementary File 5). We aligned the E protein of all the complete genomes from Indian strains against the parent strain used to derive the tetravalent dengue vaccine (CYD-TDV; Dengvaxia®, Sanofi Pasteur) ( Figure 3). Multiple amino acid substitutions were predicted throughout the envelope protein and two additional stop codons (at amino   acid positions 58 and 168) were observed in the DENV3 KX855927. Most of the amino acid substitutions were shared among all the Indian strains, while a E361D substitution was unique to the DENV3 strain reported here ( Figure 3A). Of the substitutions, 9 out of 11 were mapped onto the surface of the E protein. Of these, six are in key antigenic sites, with three sites known to influence antibody binding ( Figure 3B).
The sequencing reads mapping to JEV from Sample F2 and F5 were assembled into contigs and used to check for potential alignment to other genomes in the NCBI nucleotide sequence database. A BLAST search revealed that the JEV sequences we identified were specific to JEV (100% identity, 100% coverage of read) (Supplementary File 6). The sequences were found to match non-structural protein 5 of JEV. A specific search against the dengue database for the contigs from the sample containing DENV3 sequences showed no similarity for contig 1 and some similarity to a dengue virus 2 sequence for contig 2 (83% identity, 97% coverage; Supplementary File 6).
The single DENV3 sequencing read found in sample F3 was identical to a sequencing read occurring with high frequency in sample F2. Therefore, we did not carry out any further analysis with this sequence read as we suspect it to be a contamination.

Discussion
Here we sequenced a complete dengue genome from a clinical case of severe dengue fever, without the need to culture the virus, and in an unbiased manner. We believe that, in the future, the sequence-based -enrichment of viral sequences using conserved sequences, will enable the recovery of complete genomes from routine clinical samples even with by lower-depth sequencing 35 .
We identified a low number of reads mapping specifically to JEV. JEV is known to cause fevers 5,6,36 . Further systematic analysis using a combination of polymerase chain reaction and IgM testing is required to ascertain how much JEV contributes to the acute fever burden in India. The low number of JEV reads obtained in both samples in which reads mapping to JEV were found suggests there was not much active viral replication occurring. There are previous reports of the detection of JEV sequences many months after infection 37 . The sequences we found could therefore be remnants of a previous infection or may be the result of an infection from a mosquito bite that was checked by the immune system. The low number of reads in these cases mapped to the same gene (nonstructural protein 5) (Supplementary File 6). This could reflect the higher stability of some parts of the JEV RNA genome.
The results of metagenomic sequencing, however, do need to be interpreted with caution owing to issues related to contamination 10,11 . Contamination can occur in every step of the procedure, starting from sample collection, processing, sequencing and, when multiple indexed samples are sequenced together, de-multiplexing (the process in which reads get assigned to a sample). This needs to be taken into consideration, particularly when the number of sequences supporting the presence of a pathogen are low, when there is incomplete genome information, or when the same sequence is present in all the samples, including the controls. We have tried to mitigate this partially by the use of controls-serum from a healthy individual collected at the same time and place and a water sample processed in the same way as the clinical samples. However, we believe that independent methods are required to confirm novel/unexpected findings by this method.
DENV3 has been shown to be re-emerging in India, and has been responsible for severe outbreaks in other geographic regions, including in South America and Cuba 27,38,39 . The full-length DENV3 (KX855927) we describe here clusters into a clade containing DENV3 viruses from India and is related to an Indo-China-Singapore clade. We observed a longer branch length for this particular strain, which could be the result of incomplete sampling of this clade or could indicate that this lineage is showing accelerated rates of molecular evolution 40 . This can be resolved in future studies by the addition of more sequence information, as more full-length dengue sequences from India become available in the databases.
While both synonymous and non-synonymous changes were observed throughout the DENV3 (KX855927) genome compared to the DENV3 reference sequence (NC_001475.2), the changes in the antigenic E protein are of particular interest. Neutralizing antibodies have been described against the envelope protein that target particular epitopes 26,41 . Critical amino acid residues that change antibody binding have also been described by others 26 . The results from our phylogenetic analyses are consistent with previous work tracing the emergence of new clade of DENV3 genotype III strains in India 39 . The ability of a dengue vaccine to elicit neutralizing antibodies against locally circulating DENV3 strains therefore needs to be evaluated in this light.

Description of samples
In total, samples from five patients (two diagnosed with dengue fever (serum; F1 and F2), two with Rickettsial fever (serum; F3 and F5) and one with unknown fever (plasma; F4) presenting with febrile illness, and one healthy control (serum; C1) at St. John's Medical College and Hospital (SJMCH), Bangalore, were assessed in this study. Isolation of RNA RNA was extracted using the Qiagen All-Prep kit, using 300-500 µl of serum/plasma and lysed using 1 ml of lysis buffer. The remaining protocol was performed as recommended by the manufacturer. Eluted RNA was concentrated and used for sequencing reactions.

Sequencing
Sequencing libraries were prepared using the Ion Proton library preparation protocol. Indexing was performed using the IonXpress RNA Seq Barcode kit (Thermo Fisher Scientific, Inc.). Samples F1-4 and C1 were run on the same chip; sample F5 was run on a separate chip. Libraries were pooled to give equimolar concentrations of 10 pM. This was used in templatepreparation steps and RNA sequencing was performed using the Ion PI sequencing kit on the Ion Proton platform using the Ion PI™ ChipV2 and Ion PI™ Sequencing Kit V3 (Thermo Fisher Scientific, Inc.).

Analysis of sequences
We aligned the sequencing reads to a database of all known viruses using the SNAP alignment tool (snap-1.0beta.16-linux) 42 . All hits were verified using nucleotide BLAST sequence search and visualized using tools from the Dark Matter project. Reads aligning to the human genome, human mRNA, rRNA large subunit and rRNA small subunit from the SILVA database were removed 43 . The aligned sequences were used as the input for assembly. De novo assembly was performed using the SPAdes (v3.10.1) tool 44 . Quality assessment of the assembly was performed using the QUAST tool 45 . MIRA 4.0.2 was used for mapping based assembly, with the GenBank sequence NC_001475.2 for DENV3 as the backbone for assembly and NC_001437 as the backbone for JEV 46 . Contigs were subjected to nucleotide BLAST using the online BLAST tool. The mapping based assembly of DENV3 obtained using MIRA was manually checked for regions with low confidence using Gap5 (staden-2.0.0b11-2016-linux-x86_64) 47 . Fewer than 30 nucleotides were found to have low confidence, of which 22 were in the 3'-UTR end region. The files from the MIRA assembly, together with the contributing reads, are provided as Supplementary File 1. This sequence was submitted to GenBank with the accession number KX855927. An earlier version of this work can be found on bioRxiv (https://doi.org/10.1101/204503).

Data availability
The raw files from sequencing are not provided in their entirety as these are metagenomic datasets that contain identifying host information. Therefore we have used only sequences not aligning to the human genome for our research. This data has been uploaded in fastq format on OSF (see below). As our experiments were designed to identify pathogens, we do expect the accompanying human data to be free from biases involving sampling, storage and handling. However, under the conditions that the samples remain de-identified, and the work is not directly on human genetics, approval for data sharing of the complete data from the RNA sequencing experiment, which includes any human sequences, can be sought with the The funders had no role in the in any steps of experimental design, data collection or analysis or the decision to publish. 1.

5.
6. The report is straight-forward and presents a successful application of metagenomics sequencing to help diagnose febrile illnesses with no known causes. The assembled DENV3 sequence will be very useful for epidemiology studies and to track the evolution of DENV3. Specific items for the authors to consider: OPTIONAL: On page 3, the authors mention Dengvaxia. It will actually strengthen the authors' projected use of metagenomics sequencing if they mentioned the problems in the Philippines and elsewhere with this vaccine. Will it help to know before vaccines are used which virus strains are in circulation? And will that sequence knowledge be useful in the future for evaluating the possible side-effects of vaccines before the vaccines are administered? The former is typically used for whole blood. How was the eluted RNA concentrated? Why? OPTIONAL: The authors make no mention of the costs/time commitments/ personnel/training/facilities to accomplish the type of work they performed. Though the approach is very good (and clearly the wave of the future), it is at present very costly and not practical for most medical laboratories, even in developed countries. How can this be remedied? OPTIONAL: The authors might comment further on this: It will be curious to many readers why JEV sequences were also detected. Is this a finding of major importance? Has this been observed in other studies (persistence of JEV sequences)? Could the severity of illness in patient F2 have been affected by a "smoldering" JEV infection? More explanation would be useful regarding how the sequencing libraries were made. For example, flavivirus RNA is not polyadenylated and is capped. Do the authors have a specific approach to capture these types of RNA? *OPTIONAL: Some possible clarifications/comments from the authors:

Open Peer Review
(1) Next generation sequencing definitely has a role diagnostics, phylogeographic studies, etc. On page 3, the authors state "We hypothesized that …". Some readers will interpret that to mean the authors claim they conceived this concept, and this is the first time the approach has been used. The authors might reword that statement. reword that statement.
(2) The authors did not comment on the specificity of the DENV IgM test.
(3) Many NGS-derived virus sequences deposited in data banks have errors. Thus, whereas the NGS-generated data is useful for the diagnosis, indels etc. are not always verified in the submitted sequences, the general understanding among users being that errors may be present. Also, often, the 5' and 3' UTRs are not included. The authors might discuss these issues. In particular, there are artefacts associated with Ion Proton sequencing associated with repeats of the same nucleotide (example: TTTTT).

If applicable, is the statistical analysis and its interpretation appropriate? Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Yes
No competing interests were disclosed.

Competing Interests:
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
the RefSeq DENV3 strain are included in Supplementary File 5.

Considering that DENV virus protein translation starts as a polyprotein formation from a single ORF, identifying two internal stop codons at positions 58 & 168 seems to be curious. Have the authors done a manual verification of the corresponding raw read alignments? What could be the ? implications of this finding
We have realigned the sequence and manually verified this. Both changes are well supported. Our experiment cannot distinguish between defective viral particles and viable ones. Also, we are potentially putting together sequences from different viruses to make the consensus. These two factors limit our ability to interpret these findings.
3.The authors have identified two missing regions-one each in the 5' and 3' UTR. Are they internal gaps or terminal truncation in the sequence? Could this be a failure to sequence these regions by the technique or is it possible that the DENV-3 strain itself is lacking these regions? Since the UTRs contain major replication regulatory elements, discussing this observation seems to be important.
Given the importance of the UTRs in viral replication, we strongly suspect that our approach has limitations in capturing the ends of the genome.
4.In the result section, the unique substitution is mentioned as D361E, where as in the fig.3A it is shown as E361D. The reader gets confused on the order of the sequences that are being compared.
The textchas been corrected in line with the figure.
Most of the amino acid substitutions were shared among all the Indian strains, while a D361E substitution was unique to the DENV3 strain reported here ( Figure 3A). to Most of the amino acid substitutions were shared among all the Indian strains, while a E361Dsubstitution was unique to the DENV3 strain reported here ( Figure 3A). Fig.3b(ii), the authors have highlighted the amino acid residues that are important in antibody binding and antigenicity. But how many of these changes are functionally important, based on the chemical nature of the amino acid substitutions observed? Also, the authors should indicate the wild type as well as the mutant amino acids along with the position number in the figure.

In
The figure and accompanying legend have been modified.
6.Was there any specific reason in using plasma sample from one patient and serum from others? It would be good to indicate, if available, the duration of fever in F1, and treatment and outcome in F4, for reasons of consistency.
In this study, we used samples remaining after routine testing and plasma sample was available for that patient. The manuscript by Dias describes the use of metagenomic sequencing of RNA from serum/plasma et al samples of febrile patients using an Ion Proton system to identify viral sequences. Using this approach they could identify presence and complete sequence of DENV-3 in one of the samples tested. They could also detect presence of Japanese Encephalitis virus sequences in the sample from the same patient and also in another sample. The work is interesting and has shown the possibility of using this approach in clinical samples for detection of pathogen sequence signatures.
This reviewer had the following queries: 1.New Generation sequencing approaches are attractive options to identify viral quasispecies in patients. In the present article, the authors have presented a common consensus final sequence of DENV-3 and compared it with reference sequence and the closely related GWL-25 genome. Did the authors had a chance to look at the raw read clusters, and identify possible variations that could represent quasi species by looking at sites/ sequence regions showing consistent variability? A comparison table showing the non-synonymous sites could have been included.
2. Considering that DENV virus protein translation starts as a polyprotein formation from a single ORF, identifying two internal stop codons at positions 58 & 168 seems to be curious. Have the authors done a manual verification of the corresponding raw read alignments? What could be the implications of this finding?
3.The authors have identified two missing regions-one each in the 5' and 3' UTR. Are they internal gaps or terminal truncation in the sequence? Could this be a failure to sequence these regions by the technique or is it possible that the DENV-3 strain itself is lacking these regions? Since the UTRs contain major replication regulatory elements, discussing this observation seems to be important.
4.In the result section, the unique substitution is mentioned as D361E, where as in the fig.3A it is shown as E361D. The reader gets confused on the order of the sequences that are being compared. Fig.3b(ii), the authors have highlighted the amino acid residues that are important in antibody binding and antigenicity. But how many of these changes are functionally important, based on the chemical nature of the amino acid substitutions observed? Also, the authors should indicate the wild type as well as the mutant amino acids along with the position number in the figure.

5.In
6.Was there any specific reason in using plasma sample from one patient and serum from others? It would be good to indicate, if available, the duration of fever in F1, and treatment and outcome in F4, for reasons of consistency.
Is the work clearly and accurately presented and does it cite the current literature? Yes

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate?

Not applicable
Are all the source data underlying the results available to ensure full reproducibility? Yes

Are the conclusions drawn adequately supported by the results? Partly
No competing interests were disclosed.

Competing Interests:
Referee Expertise: Molecular Virology, Dengue, Chikungunya, Host-pathogen interaction, Virus evolution We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.