Complete genome sequence of T’Ho virus, a novel putative flavivirus from the Yucatan Peninsula of Mexico

We previously reported the discovery of a novel, putative flavivirus designated T’Ho virus in Culex quinquefasciatus mosquitoes in the Yucatan Peninsula of Mexico. A 1358-nt region of the NS5 gene was amplified and sequenced, but an isolate was not recovered. Here, the complete genome of T’Ho virus was sequenced using a combination of unbiased high-throughput sequencing, 5′ and 3′ rapid amplification of cDNA ends, reverse transcription-polymerase chain reaction and Sanger sequencing. The genome contains a single open reading frame of 10,284 nt, which is flanked by 5′ and 3′ untranslated regions of 97 and 556 nt, respectively. Genome sequence alignments revealed that T’Ho virus is most closely related to Rocio virus (67.4% nucleotide identity) and Ilheus virus (65.9%), both of which belong to the Ntaya group, followed by other Ntaya group viruses (58.8–63.3%) and Japanese encephalitis group viruses (62.0–63.7%). Phylogenetic inference is in agreement with these findings. This study furthers our understanding of flavivirus genetics, phylogeny and diagnostics. Because the two closest known relatives of T’Ho virus are human pathogens, T’Ho virus could be an unrecognized cause of human disease. It is therefore important that future studies investigate the public health significance of this virus.


Background
The genus Flavivirus (family Flaviviridae) contains more than 70 viruses, most of which are transmitted to vertebrates by arthropod vectors such as mosquitoes and ticks [23]. The genus is divided into at least 14 groups on the basis of nucleotide (nt) and deduced amino acid sequence data, antigenic relatedness and other characteristics. Two groups within the genus are the Ntaya and Japanese encephalitis (JE) groups. According to the Ninth Report of the International Committee on Taxonomy of Viruses, the Ntaya group consists of six vi-
All flaviviruses possess a single-stranded, positive-sense RNA genome of approximately 11 kb [29]. The genome encodes a major open reading frame (ORF) that is flanked by 5′ and 3′ untranslated regions (UTRs) of ~100 and ~400-700 nt, respectively. The ORF encodes a polyprotein that is co- and post-translationally cleaved to generate three structural proteins, designated the capsid (C), premembrane/membrane (prM/M) and envelope (E) proteins, and at least seven nonstructural (NS) proteins in the gene order: 5′-C-prM(M)-E-NS1-NS2A-NS2B-NS3-NS4A-2K-NS4B-NS5-3′. Some viruses in the JE group utilize efficient −1 ribosomal frameshifting to produce a larger NS1-related protein (NS1') [18,33].
Previously, we provided evidence that a novel flavivirus (designated T'Ho virus) occurs in the Yucatan Peninsula of Mexico [16]. The putative virus was identified in a pool of Culex quinquefasciatus mosquitoes collected in 2007 at the Merida zoo, Yucatan State. A 1358-nt region of the NS5 gene was amplified and sequenced by reverse transcription-polymerase chain reaction (RT-PCR) and Sanger sequencing using flavivirus-specific primers. BLASTn analysis revealed that the sequence is genetically equidistant to the corresponding regions of St. Louis encephalitis virus (SLEV; 72.6% identical), Ilheus virus (ILHV; 72.2%), Japanese encephalitis virus (JEV; 72.1%), Usutu virus (USUV; 71.8%), Rocio virus (ROCV; 71.4%), Murray Valley encephalitis virus (MVEV; 71.3%), West Nile virus (WNV; 71.1%) and Bagaza virus (BAGV; 70.1%). Although we successfully amplified T'Ho virus RNA, we were not able to obtain an isolate in African green monkey kidney (Vero) cells or Aedes albopictus (C6/36) mosquito cells, or by suckling mouse brain inoculation. In this study, the complete genome sequence of T'Ho virus was determined and its genetic relatedness to other flaviviruses was assessed.

High-throughput sequencing
Trizol Reagent (Invitrogen, Carlsbad, CA, USA) was used to extract total RNA from the pool of Cx. quinquefasciatus previously shown to contain T'Ho virus RNA. Protocols used for the collection, identification and homogenization of mosquitoes have been described elsewhere [16]. RNA extracts were reverse transcribed using SuperScript III (Thermo Fisher, Waltham, MA, USA) with random hexamers. The complementary DNA (cDNA) was RNase-H treated prior to second-strand synthesis with Klenow Fragment (NEB, Ipswich, MA, USA). The resulting double-stranded (ds) DNA was sheared to an average fragment size of 200 bp using the manufacturer's standard settings (Covaris focused-ultrasonicator E210; Woburn, MA, USA). Sheared products were purified (Agencourt Ampure DNA purification beads, Beckman Coulter, Brea, CA, USA) and libraries were constructed. Sheared nucleic acid was end-repaired, dA-tailed, ligated to sequencing adapters (NEBNext modules, NEB), PCR amplified (Phusion High-Fidelity DNA polymerase, NEB) and quantitated by Bioanalyzer (Agilent, Santa Clara, CA, USA) for sequencing. Sequencing on the Illumina HiSeq 2500 platform (Illumina, San Diego, CA, USA) resulted in an average of 180 million reads per lane. Samples were de-multiplexed using Illumina software and FastQ files were generated. Data were quality filtered and trimmed (Slim-Filter) and de novo assembled using Dwight assembler at custom settings [20]. The generated contiguous sequences (contigs) and unique singleton reads were subjected to homology search using BLASTn and BLASTx against the GenBank database.

5′ and 3′ rapid amplification of cDNA ends
The extreme 5′ and 3′ ends of the T'Ho virus genome were determined by 5′ rapid amplification of cDNA ends (RACE) and 3′ RACE, respectively. In the 5′ RACE reactions, total RNA was reverse transcribed using a T'Ho virus-specific primer. Complementary DNAs were purified by ethanol precipitation and oligo(dC) tails were added to the 3′ ends using 15 units of terminal deoxynucleotidyl transferase (Invitrogen, Carlsbad, CA, USA). Tailing reactions were performed at 37°C for 30 min and then terminated by heat-inactivation (65°C for 10 min). Oligo(dC)-tailed cDNAs were purified by ethanol precipitation and then PCR-amplified using a consensus forward primer specific to the C-tailed termini (5′-GACATCGAAAGGGGGGGGGGG-3′) and a reverse primer specific to the T'Ho virus cDNA sequence. In the 3′ RACE reactions, polyadenylate [poly(A)] tails were added to the 3′ ends of the T'Ho virus genomic RNA using 6 units of poly(A) polymerase (Ambion, Austin, TX, USA). Tailing reactions were performed at 37°C for 1 h and then terminated by heat-inactivation (65°C for 10 min). Poly(A)-tailed RNA was reverse transcribed using a poly(A) tail-specific primer (5′-GGCCACGCGTCGACTAGTACTTTTTTTTTTTTTTTTT-3′). Complementary DNAs were PCR amplified using a forward primer specific to the T'Ho virus cDNA sequence and a reverse primer that matched the 5′ half of the poly(A)-specific reverse transcription primer (5′-GGCCACGCGTCGACTAGTAC-3′). PCR products generated from the 5′ and 3′ RACE reactions were inserted into the pCR4-TOPO cloning vector (Invitrogen, Carlsbad, CA, USA) and ligated plasmids were transformed into competent TOP10 Escherichia coli cells (Invitrogen, Carlsbad, CA, USA). Cells were grown on Lysogeny broth (LB) agar containing ampicillin (50 μg/ml) and kanamycin (50 μg/ml), and colonies were screened for inserts by PCR amplification.
An aliquot of each PCR product was examined by 1% agarose gel electrophoresis and several PCR products were purified using a QIAquick spin column (Qiagen, Valencia, CA, USA) and sequenced using a 3730xl DNA sequencer (Applied Biosystems, Foster City, CA, USA).
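The relationship between the RACE primers described above can be checked with a short script. The sketch below is illustrative only: the primer sequences are copied from the text, the 17-residue T run assumes that the space printed inside the reverse transcription primer is a typesetting artifact, and the helper names (`reverse_complement`, `trailing_run`) are hypothetical.

```python
# Primer sequences are copied from the text (5'->3'). The run of 17 T residues
# assumes the space inside the printed primer is a typesetting artifact.
ADAPTER = "GGCCACGCGTCGACTAGTAC"          # reverse PCR primer used in 3' RACE
RT_PRIMER = ADAPTER + "T" * 17            # poly(A) tail-specific RT primer
C_TAIL_PRIMER = "GACATCGAAAGGGGGGGGGGG"   # forward PCR primer used in 5' RACE

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def trailing_run(seq: str, base: str) -> int:
    """Length of the homopolymer run of `base` at the 3' end of `seq`."""
    return len(seq) - len(seq.rstrip(base))


# The reverse PCR primer should match the 5' portion of the RT primer.
print(RT_PRIMER.startswith(ADAPTER))
# Length of the oligo(dT) stretch that anneals to the added poly(A) tail.
print(trailing_run(RT_PRIMER, "T"))
# Length of the G stretch that anneals to the oligo(dC) tail in 5' RACE.
print(trailing_run(C_TAIL_PRIMER, "G"))
# Sequence that the RT primer pairs with (tail first, then adapter complement).
print(reverse_complement(RT_PRIMER))
```

Because the reverse PCR primer is identical to the adapter portion of the reverse transcription primer, every cDNA made from a tailed template carries its binding site regardless of the viral sequence upstream.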

Nucleotide and amino acid sequence alignments
The genomic and predicted amino acid sequences of T'Ho virus were aligned to all other sequences in the GenBank database by application of BLASTn and BLASTp, respectively [1]. Sequence alignments using Clustal Omega (available at http://www.ebi.ac.uk/Tools/msa/clustalo/) were performed to calculate percent nucleotide and amino acid identities between select sequences.
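To make the identity values reported here concrete, the following minimal sketch computes percent identity from two rows of a pairwise alignment. It is not the authors' procedure: the function name `percent_identity` is hypothetical, the toy sequences are invented, and the choice to exclude gapped columns from the denominator is an assumption that may differ from the convention used by Clustal Omega.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: columns where either sequence has a gap are excluded from
    the denominator. Other conventions exist and give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (hypothetical sequences, not T'Ho virus data):
# 9 ungapped columns, 8 of which match.
print(round(percent_identity("ATGC-ATGCA", "ATGCTATGGA"), 1))  # 88.9
```

The denominator matters when alignments contain long gaps, so identities from different tools are comparable only if the same convention is used.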

Genomic organization and BLAST analysis
The T'Ho virus genome consists of 10,937 nt (GenBank Accession No. EU879061.2) and contains a single 10,284-nt ORF flanked by 5′ and 3′ UTRs of 97 and 556 nt, respectively. The ORF encodes the three structural and seven nonstructural proteins common to all known flaviviruses. The position and length of each gene and untranslated region are shown in Table 1. The complete genome and deduced polyprotein amino acid sequences of T'Ho virus were aligned to those of all other flaviviruses for which complete genome data are available. The nucleotide sequence alignments indicated that T'Ho virus is most closely related to ROCV (67.4% identity) and ILHV (65.9%), followed by other Ntaya group viruses (58.8-63.3%) and JE group viruses (62.0-63.7%) (Table 2). The polyprotein of T'Ho virus has greatest amino acid identity to ROCV (72.2%) and ILHV (70.6%), followed by other Ntaya group viruses (57.1-66.3%) and JE group viruses (63.2-66.3%). The genome sequence of T'Ho virus was inspected for potential overlapping genes but none were identified.
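The reported lengths can be cross-checked arithmetically. The sketch below uses only the figures given in the text; the function name `check_layout` is hypothetical, and the derived ORF coordinates assume that the ORF begins immediately after the 5′ UTR. Whether the 10,284-nt figure includes the stop codon is not stated here, so the codon count is not converted into a polyprotein length.

```python
# Lengths in nucleotides, taken from the text.
UTR5, ORF, UTR3 = 97, 10_284, 556
GENOME = 10_937


def check_layout(utr5: int, orf: int, utr3: int, genome: int) -> dict:
    """Check that the segment lengths add up and that the ORF is in frame."""
    return {
        "segments_sum_to_genome": utr5 + orf + utr3 == genome,
        "orf_is_multiple_of_three": orf % 3 == 0,
        "orf_codons": orf // 3,
        "orf_start": utr5 + 1,   # 1-based position of the first ORF nucleotide
        "orf_end": utr5 + orf,   # 1-based position of the last ORF nucleotide
    }


for key, value in check_layout(UTR5, ORF, UTR3, GENOME).items():
    print(f"{key}: {value}")
```

The three segments sum to the full genome length (97 + 10,284 + 556 = 10,937), and the ORF length is a multiple of three, as expected for a single uninterrupted reading frame.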

Organization of the polyprotein
Cleavage sites in the T'Ho virus polyprotein are shown in Table 3. These sites were predicted by aligning the polyprotein sequence of T'Ho virus to those of select other flaviviruses for which cleavage sites have previously been predicted and in some instances experimentally verified. The predicted cleavage sites for T'Ho virus are in agreement with the rules previously established for flaviviruses. The C/prM, prM/E, E/NS1 and 2K/NS4B polyprotein junctions of most known flaviviruses conform to predicted signalase cleavage sites [6] and similar sites were identified at these locations for T'Ho virus (Table 3). The NS1/NS2A junctions of most known flaviviruses occur after a Val-X-Ala site that fulfills the '−1, −3' rule for a signalase site but lacks an upstream hydrophobic domain [6]. The predicted NS1/NS2A junction of T'Ho virus fulfills these requirements. For most flaviviruses, the predicted pr/M junction occurs after an Arg-X-Lys/Arg-Arg or Arg-X-X-Arg motif. T'Ho virus adheres to this rule. The flavivirus virion C/anchor, NS2A/NS2B, NS2B/NS3, NS3/NS4A, NS4A/2K and NS4B/NS5 junctions are commonly cleaved after two basic amino acid residues (KR, RR or RK) [5-7] and sites in T'Ho virus are consistent with that rule. Potential N-linked glycosylation sites were identified using the NetNGlyc 1.0 Server (available at http://www.cbs.dtu.dk/services/NetNGlyc/) with the consensus sequence defined as Asn-X-Ser/Thr (where X is not Pro) in the context of specific surrounding sequences. Fourteen Asn-X-Ser/Thr motifs were identified in the T'Ho virus polyprotein sequence. Eight motifs are predicted to be utilized by N-linked glycans: these are located at prM 15

Table notes: Subtype of ILHV, MVEV and WNV, respectively. n/a, not available (genome has not been fully sequenced).
Table 3 notes: ... [27]; [32]). Most of the predicted cleavage sites for ILHV are described in the corresponding GenBank entry (accession no. AAV34155) and the remainder determined here. Underlined sequences have been experimentally verified. Cleavage events are mediated by (1) the viral NS2B/NS3 serine protease [5-7], (2) a host signal peptidase [6], (3) the cellular furin protease [6,35,41] and (4) a membrane-bound host protease in the endoplasmic reticulum [15].
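The Asn-X-Ser/Thr consensus used for the glycosylation analysis can be located with a simple pattern search. The sketch below is a motif scan only and does not reproduce NetNGlyc, which additionally scores the surrounding sequence context to predict whether a motif is used; the function name `find_sequons` and the toy peptide are hypothetical.

```python
import re

# Asn-X-Ser/Thr where X is any residue except Pro. The lookahead allows
# overlapping motifs (for example "NNTS") to be reported separately.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of Asn, motif) for every N-X-S/T motif."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein.upper())]


# Hypothetical toy peptide, not a T'Ho virus sequence.
toy = "MKNGTAANPSLLNNTSQ"
print(find_sequons(toy))  # [(3, 'NGT'), (13, 'NNT'), (14, 'NTS')]
```

In the toy peptide the motif at position 8 (Asn-Pro-Ser) is skipped because Pro occupies the X position, which mirrors the exclusion stated in the consensus definition.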

Phylogenetic analysis
A maximum likelihood (ML) phylogenetic tree was constructed with RAxML v8.2.8 using the complete ORF sequences of T'Ho virus and all 30 mosquito-borne flaviviruses (species and subtypes) listed in the Ninth Report of the International Committee on Taxonomy of Viruses (Fig. 1). Cell fusing agent virus (an insect-specific flavivirus) was used as the outgroup. Phylogenetic trees were also constructed using Bayesian, neighbor-joining (NJ) and maximum parsimony (MP) methods (data not shown). In all trees, T'Ho virus was most closely related to ROCV and ILHV. The Bayesian posterior support for this grouping was strong (100%), in agreement with similarly strong bootstrap support (100% for ML, 100% for MP, 94% for NJ). These viruses cluster within a larger clade that contains the Ntaya group viruses with the exception of Zika virus (ZIKV), although in phylogenies based on short conserved polymerase motifs used in taxonomic classification ZIKV appears more closely related to the Ntaya group viruses [24]. The Bayesian posterior support for this topological arrangement is 100%, but bootstrap support is weaker, ranging from 49% for MP to 79% for NJ. Trees generated by all methods agree on the composition of all clades shown in Fig. 1 with 98% or more bootstrap support, but the relative placement of these clades is uncertain (data not shown).
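Bootstrap support values such as those quoted above express how often a grouping is recovered across resampled data sets. The sketch below illustrates that idea under simplifying assumptions: each replicate tree is summarized as a set of rooted clades rather than parsed from tree files, the replicate data are invented, and the function name `clade_support` is hypothetical. It does not reproduce the output of RAxML or the other programs used.

```python
def clade_support(clade: set[str], replicate_trees: list[set[frozenset[str]]]) -> float:
    """Percentage of replicate trees that contain `clade` as a group."""
    target = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if target in tree)
    return 100.0 * hits / len(replicate_trees)


# Each hypothetical replicate tree is summarized by the clades it contains.
replicates = [
    {frozenset({"T'Ho", "ROCV", "ILHV"}), frozenset({"ROCV", "ILHV"})},
    {frozenset({"T'Ho", "ROCV", "ILHV"}), frozenset({"T'Ho", "ROCV"})},
    {frozenset({"T'Ho", "ROCV", "ILHV"}), frozenset({"ROCV", "ILHV"})},
    {frozenset({"ROCV", "ILHV"}), frozenset({"T'Ho", "SLEV"})},
]

# The three-virus grouping appears in 3 of the 4 invented replicates.
print(clade_support({"T'Ho", "ROCV", "ILHV"}, replicates))  # 75.0
```

A value near 100% means the grouping is recovered in almost every replicate, whereas values such as the 49% reported for MP indicate that the arrangement is recovered in only about half of them.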

Discussion
We report the complete genome sequence of T'Ho virus, a novel putative flavivirus discovered in Cx. quinquefasciatus from the Yucatan Peninsula of Mexico. The closest known relatives of T'Ho virus are ROCV and ILHV, both of which belong to the Ntaya group of mosquito-borne flaviviruses, with 67.4% and 65.9% nt identity, respectively. Based on genetic criteria, we propose that T'Ho virus be classified as a new species within the genus Flavivirus. It has been suggested that flaviviruses with >84% nucleotide sequence identity should be classified within the same species [28]. The genome of T'Ho virus has only 58.8-67.4% nt identity (mean: 63.4%) to the seven viral species and subtypes currently assigned to the Ntaya group, and this amount of genetic relatedness is similar to that observed between group members (≥57.9% identity; mean: 65.9%). Phylogenetic analysis of full polyprotein sequences revealed that T'Ho virus forms a distinct clade with Ntaya group viruses, except for ZIKV. Based on previous sequence analysis of a 1358-nt region of the NS5 gene of T'Ho virus that indicated it to be genetically equidistant to ILHV, SLEV, WNV, ROCV and JEV, we speculated that T'Ho virus could also serologically resemble WNV [16]; however, the NS5 gene is one of the most conserved regions of the flavivirus genome [6], and sequence alignments and phylogenetic studies performed using relatively short, conserved sequences are not as robust as those performed with complete genomes.
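The species-level argument above is a threshold comparison, which can be written out explicitly. The sketch below uses the >84% criterion [28] and the two identity values quoted in the text; the function name `exceeds_species_threshold` is hypothetical, and a real classification would weigh further criteria beyond this single number.

```python
SPECIES_THRESHOLD = 84.0  # percent nt identity criterion cited in the text [28]

# Genome-wide nt identities of T'Ho virus to its two closest relatives (from the text).
identities = {"ROCV": 67.4, "ILHV": 65.9}


def exceeds_species_threshold(identity: float, threshold: float = SPECIES_THRESHOLD) -> bool:
    """True if the identity is high enough to place two genomes in one species."""
    return identity > threshold


for virus, identity in identities.items():
    gap = SPECIES_THRESHOLD - identity
    print(f"{virus}: {identity}% identity, {gap:.1f} points below the threshold, "
          f"same species: {exceeds_species_threshold(identity)}")
```

Even the closest relative falls well short of the threshold, which is the basis for proposing a separate species.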
The two closest known relatives of T'Ho virus are recognized human pathogens. ROCV was responsible for several epidemics of severe encephalitis in Brazil in the 1970s [10,11,17,44]. A case fatality rate of 10% and long-term sequelae in 20% of the surviving patients were reported. ILHV has been sporadically isolated from humans in Central America, South America and the Caribbean, and symptomatic infections are usually characterized by fever, headache, myalgia and arthralgia, although central nervous system manifestations have also been reported [25,37,39,40,45]. Because T'Ho virus is most closely related to known human pathogens, it too could be a cause of human disease. Most other Ntaya group viruses are also recognized pathogens; ZIKV causes febrile illness and neonatal microcephaly and has been linked to Guillain-Barré syndrome [47]. BAGV, Israel turkey meningoencephalitis virus (ITV) and Tembusu virus (TMUV) are known avian pathogens [4,9,30]. Most JE group viruses are also serious pathogens of humans and other vertebrates [21].
An isolate of T'Ho virus is not available; thus, experimental infection studies cannot currently be performed to assess the vector and reservoir competence of mosquito and vertebrate species likely to be involved in its amplification. However, probable vectors and reservoir hosts of T'Ho virus can be inferred from the information available for its closest known relatives. The amplification cycles of ROCV and ILHV are not well defined but most isolations have been made from Aedes and Psorophora spp. mosquitoes (particularly Ps. ferox) with birds implicated as principal reservoir hosts [2,12,17,19,31,34]. Most other Ntaya group viruses are primarily maintained in transmission cycles between Culex spp. mosquitoes and birds [13,22,43], the notable exception being ZIKV which cycles between Aedes spp. mosquitoes and primates [47]. JE group viruses are primarily maintained in transmission cycles between Culex spp. mosquitoes and birds [21]. It is therefore likely that the principal amplification vectors of T'Ho virus are Culex, Aedes and/or Psorophora spp. mosquitoes and the principal reservoir hosts are birds.

Conclusion
In conclusion, we describe a novel species of the genus Flavivirus, T'Ho virus, whose closest known relatives are human pathogens; thus, T'Ho virus may be an unrecognized cause of human disease in the Yucatan Peninsula of Mexico. Our report directly enables the development of specific PCR diagnostic assays. However, serological cross-reactivity is common among flaviviruses [3], as exemplified in the ongoing ZIKV pandemic, where the differential diagnosis of ZIKV and dengue virus infections is difficult [46]. Accordingly, virus isolates or a recombinant virus will be needed for serosurveillance studies in Latin America, where several other flaviviruses, including dengue virus, yellow fever virus, ZIKV, WNV, ROCV and ILHV, may co-circulate in overlapping geographic areas.