Construction of a Preliminary Three-Dimensional Structure of Simian Betaretrovirus Serotype-2 (SRV-2) Reverse Transcriptase Isolated from an Indonesian Cynomolgus Monkey

Simian betaretrovirus serotype-2 (SRV-2) is an important pathogenic agent in Asian macaques and a potential confounding variable in biomedical research. Compared with other retroviruses, SRV-2 also provides a valuable viral model for understanding many aspects of retrovirus-host interactions and immunosuppression, infection mechanisms, retroviral structure, and antiretroviral and vaccine development. In this study, we isolated the gene encoding the reverse transcriptase (RT) enzyme of SRV-2 that infected an Indonesian cynomolgus monkey (Mf ET1006) and predicted its three-dimensional structure using the Iterative Threading ASSEmbly Refinement (I-TASSER) computational program. SRV-2 RT Mf ET1006 consists of 547 amino acids, encoded at nucleotide positions 3284–4925 of the SRV-2 genome. The polymerase active site is located in the finger/palm subdomain and is characterised by three conserved catalytic aspartates (Asp90, Asp165 and Asp166) and a highly conserved YMDD motif (Tyr163, Met164, Asp165 and Asp166). The estimated accuracy of the SRV-2 RT Mf ET1006 model is a template modelling score (TM-score) of 0.90 ± 0.06 and a root mean square deviation (RMSD) of 4.7 ± 3.1 Å, indicating that the model can be trusted; its accuracy is reflected in the folding of its tertiary structure. SRV-2 RT Mf ET1006 was superposed on Human Immunodeficiency Virus-1 (HIV-1) RT to predict structural details and to optimise the fits for illustration. The SRV-2 RT Mf ET1006 model has the highest homology to HIV-1 RT (2B6A.pdb), with an estimated accuracy of TM-score 0.911, RMSD 1.85 Å and coverage 0.953. This preliminary structure modelling study of SRV-2 RT Mf ET1006 is intriguing and provides initial information for exploring the molecular characteristics and biochemical mechanism of this enzyme.


INTRODUCTION
Retroviral reverse transcriptase (RT) is a multifunctional enzyme that catalyses the formation of double-stranded DNA from the single-stranded retroviral RNA genome. This complex process, called reverse transcription, is a critical step in the life cycle of all retroviruses and is responsible for viral genome replication. The newly synthesised DNA is then integrated into the host genome by integrase, another retroviral enzyme (Telesnitsky & Goff 1997). Retroviral RT has multiple functions, including RNA-dependent DNA polymerase, DNA-dependent DNA polymerase, DNA-directed RNA cleavage, strand transfer and strand-displacement synthesis (Herschhorn & Hizi 2010). The enzyme has two enzymatic activities, DNA polymerase and RNase H, localised in two separate protein domains. The DNA polymerase can use either DNA or RNA as a template, and the RNase H hydrolyses the RNA strand within an RNA/DNA hybrid. Both the polymerase and RNase H activities are essential for viral replication (Coté & Roth 2008; Telesnitsky & Goff 1997).
RTs from different groups of retroviruses share similar catalytic activities, but they differ significantly in particular parameters, such as structure and subunit composition, molecular weight, catalytic properties, biochemical and biophysical characteristics, and sensitivity to different inhibitors (Herschhorn & Hizi 2010). The RT of Human Immunodeficiency Virus type 1 (HIV-1) has been the most intensively studied of these enzymes over the last decades (Herschhorn & Hizi 2010). HIV-1 RT is a major target for the development of antiretroviral therapy against HIV-1, the virus that causes acquired immunodeficiency syndrome (AIDS), and over half of the current FDA-approved therapeutics against HIV infection target this enzyme (Ilina et al. 2012; Sarafianos et al. 2009). As a major target for anti-HIV therapy, RT has been the subject of extensive research using crystal structure determination and biochemical assays (Sarafianos et al. 2009). HIV-1 RT has been characterised structurally: structures of its complexes with dsDNA (Ding et al. 1998), with dsDNA and the incoming nucleotide (Huang et al. 1998), and with an RNA/DNA hybrid (Sarafianos et al. 2001) are available.
Simian betaretrovirus serotype-2 (SRV-2) is a causative agent of simian acquired immunodeficiency syndrome in Asian macaques and accounts for high morbidity and mortality in research and breeding facilities (Marx et al. 1984; Gardner et al. 1988; Lerche 2010; Montiel 2010; Lerche & Osborn 2003). The SRV-2 genome contains four genes organised in the order 5'-gag-prt-pol-env-3', and the sequenced proviruses are identical in size, at 8105 bp (Marracci et al. 1995; Marracci et al. 1999). Like other retroviruses, SRV-2 transcribes its RNA genome into double-stranded DNA through the action of a reverse transcriptase enzyme. The resulting double-stranded DNA is inserted randomly into a host-cell chromosome, from which the genetic information of the retrovirus can be expressed (Marracci et al. 1999). Because the SRV-2 genome encodes an RT enzyme, SRV-2 RT could potentially be studied further and used as a model enzyme for developing anti-RT drugs against other retroviruses, especially HIV.
In this study, we have amplified the gene encoding the RT enzyme of the SRV-2 isolated from an infected Indonesian cynomolgus monkey and predicted the three dimensional (3D) structure model of this SRV-2 RT enzyme using I-TASSER (Iterative Threading ASSEmbly Refinement) computational program. I-TASSER is a computational method that has been successful in accurately modelling protein structures (Zhang 2008;Roy et al. 2010;Yang et al. 2015). I-TASSER uses a combinatorial approach, employing all three conventional methods for structure modelling: comparative modelling, threading, and ab initio modelling (Roy et al. 2010).

Phylogenetic Tree Analysis
The obtained nucleotide sequences were analysed using the BioEdit program (Hall 1999), and alignments against database sequences were performed with BLAST (Altschul et al. 1997) (https://blast.ncbi.nlm.nih.gov/Blast.cgi). Nucleotide sequences were translated into amino acid sequences using the EMBOSS Transeq program (Madeira et al. 2019) (http://www.ebi.ac.uk/Tools/emboss/transeq). Amino acid sequences were aligned using ClustalW2 (Madeira et al. 2019) (https://www.ebi.ac.uk/Tools/msa/clustalw2/) and then used to construct a phylogenetic tree in MEGA 6.0 with the Neighbour-Joining method (Tamura et al. 2013). Genetic distances were estimated using Kimura's two-parameter method (Kimura 1980), and bootstrap analysis with 500 replicates was performed to assign confidence to tree nodes. RT amino acid sequences of other retroviruses were used as references: Avian
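Kimura's two-parameter model separates transitions (A↔G, C↔T) from transversions. The following is a minimal Python sketch of the pairwise distance formula, d = -½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q), where P and Q are the proportions of transitional and transversional differences. It assumes pre-aligned nucleotide sequences and simply skips gapped or ambiguous sites, which is only one gap-handling choice (MEGA offers complete- and pairwise-deletion options), so it illustrates the formula rather than reproducing the MEGA 6.0 analysis. The function name is hypothetical.

```python
import math

# Purines and pyrimidines: a transition stays within a class, a transversion crosses classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_2p_distance(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned nucleotide sequences.

    Sites with a gap or a non-ACGT character in either sequence are skipped.
    Raises ValueError if the sequences are too divergent for the formula.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # P: proportion of transitional differences
    q = transversions / compared  # Q: proportion of transversional differences
    # d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q); math.log raises ValueError if an argument is <= 0
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For example, `kimura_2p_distance("AAAAAAAAAA", "GAAAAAAAAA")` has one transition in ten sites (P = 0.1, Q = 0) and returns -½ ln 0.8 ≈ 0.1116.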

Functional Structure and Three Dimensional Model of SRV-2 RT Enzyme
Amino acid sequences were aligned using ClustalW2, and secondary-structure predictions based on sequence and structural motifs were made using I-TASSER (https://zhanglab.ccmb.med.umich.edu/I-TASSER/). Conserved domains and active-site amino acid positions were refined manually based on the predicted secondary structure and on a multiple sequence alignment with the previously defined HIV-1 RT (1HYS, chain A) (Sarafianos et al. 2001) and SRV-2 pol (AAD43256) (Marracci et al. 1999) sequences. The I-TASSER program was used to infer the function of the query protein by structurally matching the predicted 3D models against proteins of known structure and function in the Protein Data Bank (PDB). Functional analogs from the global search are ranked by the conserved structural patterns present in the model, using a scoring scheme that combines the template modelling score (TM-score), root-mean-square deviation (RMSD), sequence identity, and coverage of the structural alignment. The TM-score assesses the topological similarity of a pair of protein structures and takes values in the range (0, 1], with a higher score indicating a better structural match. The results from the global and local searches are combined to give a comprehensive list of functional analogs (Zhang 2008; Roy et al. 2010). Matching structures were analysed using PyMOL (Delano 2002) (http://www.pymol.org), which allowed us to further evaluate possible structural and functional domains of the SRV-2 RT protein.
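The TM-score and RMSD used for ranking can be illustrated with a short Python/NumPy sketch. It assumes the model and native C-alpha coordinates are already superposed and already paired residue by residue (N × 3 arrays), and it evaluates the standard TM-score sum, (1/L_target) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24 ∛(L_target − 15) − 1.8, for that single superposition only. I-TASSER and TM-align additionally search for the alignment and superposition that maximise this value, which the sketch does not do; the 0.5 Å floor on d0 is an assumed guard for very short chains, and the function names are hypothetical.

```python
import numpy as np


def rmsd(coords_a: np.ndarray, coords_b: np.ndarray) -> float:
    """RMSD between two already-superposed (N, 3) arrays of paired C-alpha atoms."""
    diff = coords_a - coords_b
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def tm_score_fixed(model_ca: np.ndarray, native_ca: np.ndarray, l_target: int) -> float:
    """TM-score sum for ONE given superposition of paired residues.

    model_ca and native_ca are (N_aligned, 3) arrays; l_target is the length of
    the target (native) chain used for normalisation. The full TM-score is the
    maximum of this quantity over superpositions, which is not searched here.
    """
    # d0 = 1.24 * cbrt(L - 15) - 1.8, floored at 0.5 angstrom (assumed guard for short chains)
    d0 = max(1.24 * float(np.cbrt(l_target - 15)) - 1.8, 0.5)
    d = np.linalg.norm(model_ca - native_ca, axis=1)  # per-pair distances d_i
    return float(np.sum(1.0 / (1.0 + (d / d0) ** 2)) / l_target)
```

Identical coordinates give RMSD 0 and TM-score 1; shifting every atom of a hypothetical 100-residue trace by 1 Å gives RMSD 1.0 Å but a TM-score of about 0.93, showing how the TM-score down-weights small deviations relative to d0.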

PCR Amplification and Sequencing of SRV-2 Reverse Transcriptase Gene
We recently isolated and characterised SRV-2 from wild Indonesian cynomolgus monkeys; sequence analysis of the viral envelope region revealed high homology (97%-98% similarity) to SRV-2 previously reported in captive macaques at US primate research centres (Iskandriati et al. 2010). In this research, we analysed the RT-encoding gene of this Indonesian SRV-2 (SRV-2 RT Mf ET1006), because no previous report has focused on SRV-2 RT. This preliminary research may therefore provide a better understanding of the structure and mechanism of retroviral SRV-2 RT. Amplification of the SRV-2 RT Mf ET1006 gene yielded a 1641-bp amplicon (Fig. 1) at positions 3284-4925 of the SRV-2 proviral genome, encoding the 547-amino-acid RT enzyme. The SRV-2 RT Mf ET1006 amino acid sequence has 98%-99% similarity to other SRV-2 isolates from captive macaques at primate research centres in the US (AF126467, AF126468 and M16605) and 79%-87% similarity to other SRVs (SRV-1, 3, 4, 5, 8 and SERV) (Table 1). The RT amino acid sequences therefore appear to be conserved among SRVs.
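Percentage values such as those in Table 1 depend on how aligned columns are counted, and the normalisation used is not stated here. As a hedged illustration of one common convention, the Python sketch below computes percent identity over columns where neither aligned sequence has a gap (the "-" gap character is assumed). Similarity scores would additionally count conservative substitutions, which this sketch does not; the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns")
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)
```

For example, `percent_identity("YMDD-LLG", "YVDDALLG")` returns about 85.7, since 6 of the 7 ungapped columns are identical. Dividing by the full alignment length or by the shorter sequence instead would give a different number for the same alignment, which is why the convention should be reported alongside the value.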

Phylogenetic Tree Analysis of SRV-2 RT
We multiple-aligned the amino acid sequences and constructed a phylogenetic tree to determine the genetic relationships among the RT enzymes of the retrovirus family (Fig. 2). The phylogenetic analysis showed that the SRV-2 RT Indonesian isolate (SRV-2 RT Mf ET1006) is most closely related to SRV-2 D2/RHE/OR (AF126467 and AF126468), first isolated from a rhesus monkey at the Oregon Regional Primate Research Centre, US, in 1999 (Marracci et al. 1999). SRV-2 RT Mf ET1006 clustered with endogenous SERV, SRV-1, SRV-3, SRV-4, SRV-5, SRV-8 and MMTV in the betaretrovirus genus. This group is most closely related to the alpharetroviruses (AMV), and these two clusters together branch with the deltaretroviruses (HTLV-1). The alpha-, beta- and deltaretroviruses then branch with the lentiviruses, which include HIV-1, the most investigated retrovirus in the biomedical field. The groups most distantly related to the SRV cluster were the spumaviruses and the gammaretroviruses; the latter include MMLV, whose RT is widely used in RT-PCR. These results are consistent with the pol-based phylogeny of 55 retroelements reported by Li et al. (1995), in which SRV-1 is closely related to MPMV (SRV-3) and evolved after SRV-2. Because the pol gene region is highly conserved, it has been a common target of phylogenetic studies at both the nucleotide and amino acid levels (Li et al. 1995).
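The clustering described above comes from a Neighbour-Joining tree. The Python sketch below shows the core of the Neighbour-Joining algorithm on a distance matrix: at each step it joins the pair minimising Q(i, j) = (n − 2)d(i, j) − r_i − r_j, where r_i is the sum of row i, computes branch lengths to the new internal node, and finally returns an unrooted Newick string. It is a minimal sketch, not MEGA's implementation; bootstrap resampling and rooting are omitted, and the function name and example matrix are hypothetical.

```python
def neighbour_joining(names: list[str], dist: list[list[float]]) -> str:
    """Minimal Neighbour-Joining on a symmetric distance matrix; returns unrooted Newick."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)
    d = [[float(x) for x in row] for row in dist]
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Find the pair (i, j) minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j
        best_q, bi, bj = None, -1, -1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - r[i] - r[j]
                if best_q is None or q < best_q:
                    best_q, bi, bj = q, i, j
        # Branch lengths from the new internal node to the two joined nodes
        li = 0.5 * d[bi][bj] + (r[bi] - r[bj]) / (2 * (n - 2))
        lj = d[bi][bj] - li
        keep = [k for k in range(n) if k not in (bi, bj)]
        # Distance from the new internal node to every remaining node
        new_row = [0.5 * (d[bi][k] + d[bj][k] - d[bi][bj]) for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[idx]] for idx, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[bi]}:{li:.3f},{nodes[bj]}:{lj:.3f})"]
    # Three nodes left: join them at a single central node
    a, b, c = nodes
    la = 0.5 * (d[0][1] + d[0][2] - d[1][2])
    lb = 0.5 * (d[0][1] + d[1][2] - d[0][2])
    lc = 0.5 * (d[0][2] + d[1][2] - d[0][1])
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"
```

For a hypothetical additive matrix over taxa A-D, `neighbour_joining(["A", "B", "C", "D"], [[0, 3, 9, 10], [3, 0, 10, 11], [9, 10, 0, 7], [10, 11, 7, 0]])` returns `(C:3.000,D:4.000,(A:1.000,B:2.000):5.000);`, recovering the tree that generated the distances.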

Functional Structure Prediction of SRV-2 RT Enzyme
In this study, we compared the functional structure of SRV-2 RT Mf ET1006 with that of HIV-1 RT. The comparison was based on the HIV-1 RT crystal structures in the PDB (2B6A and 1HYS) and on the SRV-2 pol sequence previously reported in GenBank (AAD43256.1). Amino acid sequences of SRV-2 RT Mf ET1006 (positions 1-547) and HIV-1 RT (positions 1-553) were aligned using ClustalW, STRAP2 (Gille & Frömmel 2001) and I-TASSER (Fig. 3). Comparison between HIV-1 RT p66 and SRV-2 RT Mf ET1006 showed 32% similarity over 237 residues in the finger/palm subdomain, 25.6% over 82 residues in the thumb subdomain, 12% over 133 residues in the connection subdomain, and 27% over 118 residues in the RNase H subdomain. Over the full length, SRV-2 RT Mf ET1006 and HIV-1 RT share 25.5% similarity over 553 residues; when conservative substitutions (such as Ala/Val or Leu/Ile) are also counted, similarity rises to 43%. The polymerase active site, located in the finger/palm subdomain, is characterised by three conserved catalytic aspartates: Asp90, Asp165 and Asp166 in SRV-2 RT Mf ET1006, and Asp110, Asp185 and Asp186 in HIV-1 RT (Table 2). In HIV-1 RT, this aspartate triad (the last two residues of which are part of the YXDD motif) plays a significant role in catalysis by binding divalent Mg²⁺ cations (Sharma et al. 2005). This suggests that the catalytic active site of SRV-2 RT Mf ET1006 likewise coordinates two divalent Mg²⁺ ions. The YXDD motif, where X is a variable amino acid, is highly conserved and plays a critical role in various viral RNA-dependent DNA polymerases. In both SRV-2 RT Mf ET1006 and HIV-1 RT this motif is YMDD (Tyr, Met, Asp, Asp), at positions 163-166 in SRV-2 RT Mf ET1006 and 183-186 in HIV-1 RT.
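Residue correspondences such as Asp90/Asp110, Asp165/Asp185 and Asp166/Asp186 are read off a pairwise alignment. A minimal Python sketch of that bookkeeping is shown below: it walks two aligned sequences (gap character "-" assumed), numbers residues from a chosen starting position (useful when an aligned segment begins partway into a sequence), and maps each residue of the first sequence to its aligned partner in the second. Residues aligned to a gap are left unmapped. The function name and toy sequences are hypothetical.

```python
def residue_map(aligned_a: str, aligned_b: str, start_a: int = 1,
                start_b: int = 1, gap: str = "-") -> dict[int, int]:
    """Map residue numbers of sequence A to the aligned residue numbers of B.

    Numbering starts at start_a / start_b; columns with a gap in either
    sequence advance only the non-gapped counter and produce no mapping.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    mapping: dict[int, int] = {}
    pos_a, pos_b = start_a - 1, start_b - 1
    for x, y in zip(aligned_a, aligned_b):
        if x != gap:
            pos_a += 1
        if y != gap:
            pos_b += 1
        if x != gap and y != gap:
            mapping[pos_a] = pos_b
    return mapping
```

For example, `residue_map("KL-YMDD", "KLAYMDD", start_b=18)` returns `{1: 18, 2: 19, 3: 21, 4: 22, 5: 23, 6: 24}`: the inserted Ala at position 20 of the second sequence has no partner, so residues after it shift by one.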
SRV-2 RT Mf ET1006 has a YMDD motif, like HIV-1, SIV and FIV, which differs from the YVDD motif of MMLV and FeLV (Sharma et al. 2005). Many recent studies have focused on the YXDD motif because of its significance in NRTI resistance, RT fidelity, the polymerisation process and viral replication (Sharma et al. 2005). The dNTP-binding site is also located in the polymerase active site and contains conserved residues: Lys64, Arg72, Asp110, Asp113, Gln151 and Asp186 in HIV-1 RT, and Lys45, Arg52, Asp90, Asp93, Gln131 and Asp166 in SRV-2 RT Mf ET1006. The RNase H active sites that interact with the DNA/RNA hybrid contain a conserved DED motif (Asp, Glu, Asp) at positions 443, 478 and 498 in the RNase H of HIV-1 RT, and at positions …, 465 and 486 in the RNase H of SRV-2 RT (Table 2 and Fig. 3).

Figure 3: Amino acid sequence alignment and secondary-structure motif prediction for SRV-2 RT Mf ET1006 and HIV-1 RT. The amino acid sequences of SRV-2 RT Mf ET1006 (positions 1-547) and HIV-1 RT (positions 18-553) were aligned based on sequence and structural motifs using ClustalW, STRAP2 and I-TASSER. Motifs are indicated as α-helices (H), β-sheets (S) and coils (C). Active-site amino acids are shown in red with red highlighting; DNA-binding residues in green; dNTP-binding residues in blue; NNRTI-binding residues in brown; and RNA/DNA hybrid (RNase H) residues in purple with purple highlighting. YXDD sequences are overscored with a red line, and vertical lines mark the boundaries of the finger/palm, thumb, connection and RNase H subdomains.
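Locating the YXDD motif (YMDD in SRV-2 RT Mf ET1006 and HIV-1 RT, YVDD in MMLV and FeLV) in a translated sequence is a simple pattern search. The Python sketch below uses a regular expression with a lookahead so that overlapping matches are also reported, and returns 1-based positions to match the residue numbering used in the text. The function name and the toy sequence in the example are hypothetical.

```python
import re

# YXDD: Tyr, any residue, Asp, Asp. The lookahead (?=...) allows overlapping hits.
YXDD_PATTERN = re.compile(r"(?=(Y[A-Z]DD))")


def find_yxdd(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, motif) for every YXDD occurrence."""
    return [(m.start() + 1, m.group(1)) for m in YXDD_PATTERN.finditer(protein.upper())]
```

For the toy sequence `"GKLYMDDLYVDDG"`, `find_yxdd` returns `[(4, "YMDD"), (9, "YVDD")]`. Applied to the SRV-2 RT Mf ET1006 sequence, a hit at position 163 would correspond to the Tyr163-Met164-Asp165-Asp166 motif described above.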

Three Dimensional Structure Model of SRV-2 RT Mf ET1006
The three-dimensional structure prediction of full-length SRV-2 RT Mf ET1006 is depicted in Fig. 4(A). The models were generated using the I-TASSER method and visualised using PyMOL. Five models of SRV-2 RT Mf ET1006 were generated computationally based on pairwise structural similarity; Model 1 had the highest C-score (1.33), whereas Models 2 to 5 had C-scores below -0.1. The C-score is a confidence score that ranges from -5 to 2, with higher values indicating higher confidence in the model, and it correlates strongly with the quality of I-TASSER models (Roy et al. 2010).
In general, models with C-score > -1.5 are expected to have a correct fold. RMSD and TM-score are both well-known measures of topological similarity between a model and the native structure; TM-score values range from 0 to 1, where a higher score indicates a better structural match (Roy et al. 2010). Model 1 of SRV-2 RT Mf ET1006 has an estimated accuracy of TM-score 0.90 ± 0.06 and RMSD 4.7 ± 3.1 Å. Because Model 1 has the best C-score, well above the -1.5 threshold, it can be trusted, and its accuracy is reflected in the folding of its tertiary structure (Fig. 4). Superpositions of HIV-1 RT and SRV-2 RT Mf ET1006 were performed to predict structural details and to optimise the fits for illustration. Table 2, Fig. 3 and Fig. 4(B) show the structural motifs used in the superposition comparisons between SRV-2 RT Mf ET1006 and HIV-1 RT. The HIV-1 RT structures with the most consistent and well-defined motifs for our study were 1HYS (Sarafianos et al. 2001), 2HMI (Ding et al. 1998) and 2B6A (Roth et al. 1997). The SRV-2 RT Mf ET1006 model has the highest homology to HIV-1 RT (2B6A), with an estimated accuracy of TM-score 0.911, RMSD 1.85 Å and coverage 0.953, as measured with TM-align from the Zhang Lab (Zhang & Skolnick 2005). Models of SRV-2 RT Mf ET1006 in complex with 18/19-mer DNA/RNA nucleic acid substrates, stretching from the polymerase active site to the RNase H active site, are depicted in Fig. 5. SRV-2 RT Mf ET1006 was superposed on HIV-1 RT (1N6Q) (Sarafianos et al. 2002), selected from the I-TASSER template proteins with a similar binding site. This model was assessed using the TM-align server (http://zhanglab.ccmb.med.umich.edu/TM-align), giving TM-score 0.838, sequence identity 0.258 and BS-score 1.48. The BS-score is a measure of local similarity (sequence and structure) between the template binding site and the predicted binding site in the query structure.
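The superpositions described above can be illustrated with the Kabsch algorithm, which finds the rotation and translation minimising the RMSD between two sets of paired C-alpha coordinates. This is a swapped-in technique for illustration only: TM-align searches for a residue alignment and a superposition that maximise the TM-score rather than simply minimising RMSD, so this Python/NumPy sketch would not reproduce the reported 1.85 Å value. It assumes the residue pairing is already known and both inputs are N × 3 arrays; the function name is hypothetical.

```python
import numpy as np


def kabsch_superpose(mobile: np.ndarray, target: np.ndarray) -> tuple[np.ndarray, float]:
    """Superpose `mobile` onto `target` (paired (N, 3) arrays) by least squares.

    Returns the transformed mobile coordinates and the RMSD after superposition.
    """
    mobile_centroid = mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    p = mobile - mobile_centroid  # centre both point sets on the origin
    q = target - target_centroid
    h = p.T @ q  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so the result is a proper rotation, not a reflection
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    moved = p @ rotation.T + target_centroid
    rmsd = float(np.sqrt(((moved - target) ** 2).sum(axis=1).mean()))
    return moved, rmsd
```

If `mobile` is an exactly rotated and translated copy of `target`, the returned RMSD is zero to numerical precision; for two different proteins, the residual RMSD measures how well the paired atoms can be brought together by a rigid-body fit.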
Based on large scale benchmarking analysis, BS-score > 1.1 reflects a very good local match between the predicted and template binding site (Roy et al. 2010).
A model of the interaction between the SRV-2 RT Mf ET1006 active site and nucleic acids was developed based on the available crystal structure of HIV-1 RT (Huang et al. 1998) (Fig. 6). Of the HIV-1 RT residues that interact with the incoming dNTP, namely Lys65, Arg72, Asp113, Ala114, Tyr115 and Gln151 (Huang et al. 1998), four are conserved in SRV-2 RT Mf ET1006 as Lys45, Arg52, Asp93 and Gln131, whereas Ala114 and Tyr115 of HIV-1 RT are replaced by Cys94 and Phe95 in SRV-2 RT Mf ET1006. In HIV-1 RT, Tyr115 interacts directly with the deoxyribose of the incoming triphosphate, and its substitution with Phe has a small deleterious effect on catalytic activity (Boyer et al. 2000). We predict that the first overhanging template nucleotide, which pairs with the incoming dNTP, interacts with the side chain of Leu54 and packs against the backbone of Gly132 (Figs. 6(A) and (B)). The second and third nucleotides of the 5'-template overhang pack against Trp7, Pro8 and Phe42. Bases in the minor groove interact directly with Pro137, Met164, Leu74 and Tyr163. The incoming dNTP pairs with the templating base, and its triphosphate interacts with Lys45, Arg52, Asp93 and Cys94. Meanwhile, the 3'-OH of the dNTP is lined by the side chains of Asp93, Phe95 and Gln131 and by the peptide backbone between residues 93 and 95.

Because no experimental structure is currently available for SRV-2 RT Mf ET1006, the predicted three-dimensional structure from this study should be informative and useful for understanding its structural and functional characteristics: a protein structure provides precise molecular details that often facilitate experimental characterisation of an expected function. Developing and validating computational methods that predict protein structure with relatively high accuracy, and thereby facilitate functional annotation, biochemical analysis and biological characterisation, is therefore a high priority.
Our model provides an initial structural framework for understanding the functions of the SRV-2 RT Mf ET1006 domains, its enzymatic active sites and its interaction with the RNA/DNA hybrid.
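Residue lists like those above (for example, residues contacting the incoming dNTP or the template overhang) are commonly derived from a distance cutoff between protein atoms and nucleic-acid atoms in a structural model. The text does not state how the contacts were assigned, so the Python/NumPy sketch below is only an assumed approach: it reports every residue with at least one atom within a chosen cutoff (4.0 Å here, also an assumption) of any ligand atom. The function name, residue labels and coordinates are hypothetical.

```python
import numpy as np


def contact_residues(residue_ids, residue_coords, ligand_coords, cutoff=4.0):
    """Return IDs of residues with any atom within `cutoff` angstroms of any ligand atom.

    residue_coords[i] is an (n_atoms_i, 3) array-like of atom coordinates for
    residue_ids[i]; ligand_coords is an (m, 3) array-like of ligand atom coordinates.
    """
    lig = np.asarray(ligand_coords, dtype=float)
    hits = []
    for rid, atoms in zip(residue_ids, residue_coords):
        atoms = np.asarray(atoms, dtype=float)
        # All pairwise distances between this residue's atoms and the ligand atoms
        dists = np.linalg.norm(atoms[:, None, :] - lig[None, :, :], axis=-1)
        if dists.min() <= cutoff:
            hits.append(rid)
    return hits
```

With one hypothetical residue atom at 3 Å and another at 7 Å from a single ligand atom, only the first residue is reported. The cutoff choice matters: a larger value picks up second-shell residues that do not make direct contacts.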
Figure 6: Interaction of amino acids in the polymerase active site of SRV-2 RT Mf ET1006 with the terminal portion of the RNA/DNA duplex. DNA is shown as red and blue sticks and RNA as red, white and blue sticks; positions in the RNA/DNA duplex are labelled -1, -2 and -3, and the overhanging RNA nucleotide is labelled +1 (rounded-edge sticks). (A) The SRV-2 RT Mf ET1006 active site shown as a green surface, with the conserved amino acids Asp165, Gln131, Gly132 and Arg58 shown as green sticks; (B) SRV-2 RT Mf ET1006 active-site amino acids shown as green sticks.

CONCLUSION
This preliminary study of three-dimensional structure modelling of monomeric SRV-2 RT Mf ET1006, compared with heterodimeric HIV-1 RT, is intriguing: it reveals similarities and differences in amino acid sequence that suggest each subdomain performs the same function in both enzymes. Using structural models to explore the molecular characteristics and biochemical mechanism of SRV-2 RT Mf ET1006, and to understand its catalytic mechanism, would provide valuable information. Many RT structures have recently been determined and deposited in the Protein Data Bank (PDB), and these available structures inspired us to explore SRV-2 RT Mf ET1006 through three-dimensional structure models. This work encourages us to expand our knowledge of SRV-2 RT Mf ET1006 as a virus model for HIV and to develop new, improved therapeutic agents.