In Silico Characterisation of a Type D Betaretrovirus from Malayan Flying Fox, Pteropus vampyrus

Bats are well known for their role as reservoirs of viral diseases. Previous studies have reported the presence of endogenous gamma and betaretrovirus in diverse species of bats. Here, we characterize one of these lineagesPteropus vampyrus endogenous betaretrovirus namely PvEB in depth describing the genomic organization, age estimation and phylogenetic approaches to examine the origin and history of this virus. Phylogenetic analysis of gag, pol and full genome sequence has placed PvEB robustly within the betaretrovirus genus. Subsequently, PvEB was estimated to invade the bat genome since 1.5-8.8mya. The results of bioinformatics analysis were verified through amplification of the PvEB reverse transcriptase gene with the size of 700bp. With these findings together with the previous report, it is evident that PvEB is a member of type D betaretrovirus group.


Introduction
Retroviruses that are coming from the Retroviridae family infect vertebrates and are characterized by a replication strategy in which their RNA genomes are reverse transcribed into a DNA copy and integrated into the genome of the infected cell.Occasionally, this process occurs in germ line cells, such that integrated retrovirus DNA can be vertically inherited as genomic loci called endogenous retroviruses (ERVs) [1].ERVs can reach fixation in their host species, however, most are defective and incapable of producing infectious particles [2].Many mammalian species harbor ERVs are closely related to the exogenous gamma and betaretrovirus genera including rodents and bats.
Bats are the second most species-rich group of mammals (after rodents) and are broadly divided into two groups, Microchiroptera and Megachiroptera.The latter group comprises single family, called Pteropodidae that includes all flying foxes and Old World fruit bats.The Microchiroptera and Megachiroptera are estimated to have diverged approximately 45-50.2 million years ago (mya) -within the range of available fossil data for bats [3,4].The lineage that gave rise to the contemporary Megachiropteran species, Pteropus vampyrus is estimated to have diverged from the main Megachiropteran branch approximately 31.9 mya [3].
Bats are known for gregarious roosting lifestyle which makes them as an ideal host for numerous pathogens [5] including Hendra virus [6], Nipah virus [7,8] and Ebola virus [9] that conveyed significant health and economic impact as humans and livestock are jeopardized by these viral pathogens.Previous study has discovered bats to harbor diverse endogenous gammaretroviruses [10] and endogenous betaretroviruses [11].Hayward et al. has revealed the integration time of endogenous retrovirus into bat genome which was based on the number of nucleotide variations between 5' and 3' LTR of each betaretrovirus.It was estimated to take place within 3.2mya to 36.3mya with one element closely to PvEB, called PvERV-βJ was likely to integrate at 6.3mya.
In addition, Betaretroviruses can be further divided into Type B and Type D based on distinctive morphologies of its retroviral particle [12] and genomic organisation.
Under the genomic organization, some of these betaretroviruses, could be infectious such as Type B Mouse Mammary Tumor Virus (MMTV) which possess superantigen (sag) protein as a virulence factor and Type D (Mason Pfizer Monkey Virus, MPMV and Jaagsiekte Sheep Retrovirus, JSRV) which encode oncogenic gene at its envelope region.In this study, we analyze PvEB with phylogenetic analysis of their viral genes (gag, pol, env), dating the invasion time of PvEB into the Pteropus vampyrus genome and verified its presence using PCR amplification of its reverse transcriptase gene.
The LTRs (5' and 3') were searched based on sequence similarity between the upstream of gag and downstream of env.The difference between the pair of the LTRs were observed and calculated in percentage in order to predict whether it is a recent integration of retrovirus into the host genome.The LTRs were also searched for TATA box, primer binding sites (PBS) at its 5' and polypurine tract (PPT) at its 3' .

Phylogenetic tree reconstruction
In order to determine the phylogenetic relationships between PvEB and other groups of retroviruses, maximum likelihood (ML) trees were constructed from the ClustalW alignments of gag, pol, env and fulllength sequences.Using MEGA 5.1 software [13], the ML trees were constructed using selected substitution model [14].The best model substitution for each gene together with the AIC and log likelihood value were listed in Table 1  Whelan and Goldman (WAG) substitution matrix [15] was selected as the best fit model for gag and env whereas General Reverse Transcriptase (rtREV) model [16] was selected for pol and whole genome sequences (Table 1).These models were selected based on lowest AIC value and high log likelihood value [13] when compared to other listed models.This is expected since different genomic regions may have different selective pressures and evolutionary constraint [17].Apart from that, all positions containing gaps were eliminated.In order to determine the strength of support for the placement of PvEB and other groups of retroviruses, 100 of bootstrap replication were used for each viral gene.Thirty two sequences of established retroviruses genera plus two sequences of established endogenous betaretrovirus (bat and guinea pig) were used as ingroup taxa whereas one gypsy retrotransposon was used as outgroup [18].

Estimation of integration time of invasion to host genome
The integration time of PvEB in bats was estimated using LTR analysis with the following formula T= (D/R)/2 depicted from [19] and [11].The differences between the nucleotides of 5' and 3' LTR was abbreviated as 'D' whereas the genomic substitution rate was indicated as 'R' .Both LTRs of PvEB were assumed to be indistinguishable at the time of integration.Through comparisons of both LTRs, the pair differed by 0.007 substitutions per site whereas the rate of neutral substitution for bat has been estimated at ~3.968 × 10 -10 [11].

DNA extraction and PCR amplification
We used polymerase chain reaction (PCR) approach to amplify the reverse transcriptase gene of PvEB and confirm its presence in the Pteropus vampyrus genome.P. vampyrus tissue sample (MZU/M/199) was obtained from Universiti Malaysia Sarawak (UNIMAS) Zoological Museum.Since the tissue was highly degraded, we extracted the genomic DNA using modified method of CTAB extraction and PCR purification [20].Tissue sample was ground together with CTAB extraction buffer (7.5 g CTAB powder, 1 M Tris, 0.5M EDTA, 5M NaCl and distilled water) and 20 µl of Proteinase K was added into microcentrifuge tube.The sample was incubated at 55˚C for 30 minutes.The extraction solution was centrifuged at 13000rpm for 5 minutes and later the supernatant was transferred into new tube.The supernatant was then treated as per manual in QIAquick PCR Purification Kit.The eluate was used as template in PCR amplifications.
PCR products were generated from 4 ul of DNA template, 1 × PCR buffer, 200 µM dNTPs and 0.25U Taq DNA polymerase (OneTaq).The primers used were designed from pol region of PvEB with its forward primer 5'TTTGAGGGCTTACTTGATTCG3' and reverse primer 5'TTACATGGATGATATTCTTGTAGC3' making the final volume of 25 µl.Amplifications were performed on BioRAD thermocycler under the following conditions: 1 cycle of 95˚C for 2 minutes, 35 cycles of 94˚C for 15 seconds, 63˚C for 20 seconds and 72˚C for 2 minutes and 1 cycle of 72˚C for 10 minutes.The PCR product was then separated on 1% agarose gel in 1 × TAE buffer with 1 kb DNA ladder (NorgenBiotek) as DNA marker.Genomic DNA extraction was performed according to the method described in [20] with few modifications since the tissue sample was ancient and has been preserved in 95% ethanol for quite some time.

PvEB is a type-D endogenous betaretrovirus based on genomic organisation
To obtain further insight into the genetic structure of PvEB, we analyzed the full sequence of PvEB from our previous study [21] and showed the gag, pol and env genes into a summarized schematic diagram (Figure 1) based on their location in open reading frames.All viral genes present in PvEB were included together with their length as in Figure 1a and b.Secondly, dUTPase and protease genes are located upstream of pol gene of different frame.The location of dUTPase in PvEB is similar to other type D betaretroviruses such as MPMV and SMRV which is upstream of pol gene.In comparison, lentiviral dUTPase genes are located in the pol coding domain in between the genes encoding reverse transcriptase and integrase.Studies on the varying dUTPase acquisitions also showed that retroviral dUTPase genes found in protease frame have a monophyletic origin and were acquired by alpha-like retroviruses earlier in evolution, before or during the formation of betaretroviruses, between reverse transcriptase and integrase [22].In previous study by Baldo and McClure, (1999), betaretroviruses was shown to possess dUTPase at the upstream of protease while the lentiviruses possessed dUTPase in between reverse transcriptase and integrase of pol gene.Additionally, the motifs found in dUTPase were almost congruent to the motifs described in by McGeoch (1990) of dUTPase of betaretroviral families [23].The five dUTPase motifs found in PvEB are SGTIL, GRAS, SVVDADYEGEIK, RIAQ and RGASNPGSS.dUTPase in betaretrovirus has a fundamental role in DNA metabolism and repair [24], viability and infectivity of viruses [25].Additionally, non-primate lentiviruses and betaretroviruses possess dUTPase gene at distinct genomic location where betaretroviruses dUTPase gene located at the upstream of the protease gene [26] which is in congruent with the result discussed above [27].
Thirdly, as retroviruses were previously categorized based on the specific tRNA that anneals to their PBS required for initiation of reverse transcription, we determined the specific tRNA used by PvEB.It was found to harbour a PBS complementary to tRNALys2 sequence typical of a type D betaretrovirus.
Other features of PvEb include the pol and env retroviral elements have several stop codons and are thus expected to yield truncated gene product.However, the open reading frames (ORFs) for gag and pro reading frames are intact, suggesting that they might code for functional proteins.The gag ORF contains a predicted retroviral matrix protein (gag p10; 73.3% identity; E-value 1.2e-22) and core nucleocapsid protein (gag p24; 134% identity; E-value 3.9e-40) while the protease region has a trimeric dUTP diphosphate domain (dUTPAse; 61.7% identity; E-value 4.1e-17).

PvEB close to mammalian endogenous type D betaretrovirus based on phylogenetic tree analysis
PvEB grouped within the type D betaretrovirus clade from mammalian (i.e.MPMVand SMRV) with 68% and 99% bootstrap support respectively in phylogenies constructed using both gag and pol sequences (Figure 2a and 2b).Moreover, these findings are consistent with the findings in [11] that both genes are closely related to SMRV and MPMV, a type D betaretroviruses.Similarly, additional analysis using the full-length sequence of PvEB with other retroviruses also showed significant bootstrap support (74%) for the inclusion of PvEB within the betaretrovirus clade (Figure 2c).However, phylogenetic tree of env gene showed that PvEB is located within the gammaretroviruses with 84% bootstrap support (Figure 2d).This slight different grouping pattern is expected since the env region is highly divergent in nature [19].This could also be due to recombinant origin, that possess gamma-beta type env [28].In addition, the relationship of PvEB with other betaretroviruses is in parallel to the findings in [19] which reported that common vampire bat (Desmodus rotundus) was related to primate and rodent betaretroviruses.

PvEB has diverged from its host since 8.8mya
The time of PvEB in bats was estimated using LTR analysis with the following formula T= (D/R)/2 depicted from [19] and [11].The differences between the nucleotides of 5' and 3' LTR was abbreviated as 'D' whereas the genomic substitution rate was indicated as 'R' .Both LTRs of PvEB were assumed to be indistinguishable at the time of integration.Through comparisons of both LTRs, the pair differed by 0.007 substitutions per site whereas the rate of neutral substitution for bat has been estimated at ~3.968 × 10 -10 [11].Therefore, the estimated age of PvEB genome invasion is approximately 8.8mya.Though the exact invasion of PvERV-βJ occurred more recent that is 6.3mya, the divergence time of PvEB is still congruent to the findings of [11], whom reported the invasion of endogenous betaretrovirus in the origins of modern bats have occurred within 3.2 to 36.3mya.Nevertheless, when applying the average mammalian neutral substitution rate, 2.292 × 10 -9 [29], PvEB was estimated to have a minimum integration date of 1.5mya.This finding extends the range of time since divergence of bat endogenous betaretrovirus that is from 1.5-36.3mya.

Estimated size of amplified reverse transcriptase gene is 700bp
As shown in Figure 3, the estimated size of PCR product was 700bp.The PCR primer was designed from domain 1 to V of reverse transcriptase gene from our previous bioinformatics work on this sample.This PCR product is in congruent to most endogenous retrovirus and retroelements obtained from previous study.

Conclusion
PvEB possesses a typical type D betaretrovirus genome organization and phylogenetic tree analysis of gag, pol and full-length sequence data placed PvEB robustly within a type D betaretrovirus group.It was estimated that the invasion of bat genome by PvEB dated back from 1.5 to 8.8 mya.PCR amplification of the PvEB reverse transcriptase gene confirmed its presence in the Pteropus vampyrus genome.
Therefore, the findings of PvEB using bioinformatics analysis is supported by the phylogenetic analysis and later, the molecular work verified the presence of a type D betaretrovirus in the Pteropus vampyrus genome.

Figure 1 :
Figure 1: (A) Genomic structure of PvEB.Panel 1 shows complete genomic organization of PvEB; 5'LTR-gag-dUTPase-pol-env-orf1-3'LTR.Orf 1 was also present in PvEB.Panel II shows the presence of TATA box, Primer binding site (PBS) at the 5' end and polypurine tract (PPT) at the 3' end.(B) Open reading frame map of PvEB.Full length bar indicates stop codon and half length bar indicates start codon.

Figure 2 :
Figure 2: Phylogenetic analysis of (a) gag, (b) pol, (c) whole genome sequence and (d) env of seven groups of endogenous and exogenous retroviruses.PvEB and other betaretroviruses were labeled with red triangle whereas members of gammaretroviruses were labeled as blue circle.

Figure 3 :
Figure 3: DNA band indicates successful amplification of the reverse transcriptase region in Pteropus vampyrus genome.The estimated size of the DNA band is 700bp.Lane1-DNA ladder 1 kb, Lane 3-amplified reverse transcriptase gene of Pteropus vampyrus.
Best model for each major viral gene and whole genome sequence.

Table 2 :
Comparison of viral gene sizes between PvEB and other Betaretrovirus.