HIV-1 Rev interacts with HERV-K RcREs present in the human genome and promotes export of unspliced HERV-K proviral RNA

Background The HERV-K (HML-2) viruses are the youngest of the human endogenous retroviruses. They are present as several almost complete proviral copies and numerous fragments in the human genome. Many HERV-K proviruses express a regulatory protein Rec, which binds to an element present in HERV-K mRNAs called the RcRE. This interaction is necessary for the nucleo-cytoplasmic export and expression of HERV-K mRNAs that retain introns and plays a role analogous to that of Rev and the RRE in HIV replication. There are over 900 HERV-K RcREs distributed throughout the human genome. Thus, it was of interest to determine if Rev could functionally interact with selected RcRE elements that map either to HERV-K proviruses or human gene regions. This interaction would have the potential to alter the expression of both HERV-K mRNAs and cellular mRNAs during HIV-1 infection. Results In this study we employed a combination of RNAseq, bioinformatics and cell-based functional assays. Potential RcREs were identified through a number of bioinformatic approaches. They were then tested for their ability to promote export and translation of a reporter mRNA with a retained intron in conjunction with Rev or Rec. Some of the selected elements functioned well with either Rev, Rec or both, whereas some showed little or no function. Rev function on individual RcREs varied and was also dependent on the Rev sequence. We also performed RNAseq on total and cytoplasmic RNA isolated from SupT1 cells expressing HIV Rev, with or without Tat, or HERV-K Rec. Proviral mRNA from three HERV-K loci (4p16.1b, 22q11.23 and most significantly 3q12.3) accumulated in the cytoplasm in the presence of Rev or Tat and Rev, but not Rec. Consistent with this, the 3′ RcRE from 3q12.3 functioned well with HIV-Rev in our reporter assay. In contrast, this RcRE showed little or no function with Rec. Conclusions The HIV Rev protein can functionally interact with many RcREs present in the human genome, depending on the RcRE sequence, as well as the Rev sequence. This leads to export of some of the HERV-K proviral mRNAs and also has the potential to change the expression of non-viral genes.


Background
With the recent advances in DNA sequencing, it is now realized that human endogenous retroviruses (HERVs) constitute about 8% of the human genome. This is a result of retroviral integration after infection of the germ line, in combination with numerous subsequent transposition events. More than 200,000 copies of HERVs have been identified, making up more than thirty distinct groups (1,2). The majority of these proviruses are silenced through chromatin modifications and the accumulation of multiple point mutations, deletions, or insertions in their open reading frames (ORFs) or regulatory elements (3)(4)(5). Nevertheless, multiple endogenous retroviral copies are still almost complete and many fragmented HERV sequences provide functional cis-acting signals, such as promoters, enhancers, alternative splice sites, and poly-A signals that can alter human gene expression (6)(7)(8).
The youngest HERV family, HERV-K(HML-2), hereafter referred to as HERV-K, is thought to have integrated into the germ line between 200,000 and 5 million years ago (9,10). HERV-K viruses are named for their utilization of a lysine tRNA to prime reverse transcription and the Human MMTV-Like (HML-2) designation refers to the fact that these viruses are similar to mouse mammary tumor virus (MMTV) (11). The copies of these viruses in the human genome have accumulated the fewest number of mutations among the HERVs and several have retained ORFs (8,11). Two different types of HML-2 proviruses have been identified, based on their potential to produce different viral regulatory proteins. Type I HERV-K viruses encode a protein named Np9, while type II viruses encode the regulatory protein Rec. Type I viruses are the result of a 292bp deletion in the genome that shifts the rec ORF into another reading frame in a second coding exon, making the sequence of Np9 almost completely different from Rec. The function of Np9 remains unknown, although some studies suggest that it may have oncogenic potential (11)(12)(13). Rec functions to facilitate the export of incompletely spliced HERV-K mRNAs from the nucleus to the cytoplasm (see below).
To date, over 90 copies of intact or partially intact HERV-K proviruses have been identified. Although replication competent HERV-K molecular clones have been created in the laboratory (14)(15)(16), none of the sequences found in the human genome appear to produce a replicating, full-length integration-competent virus, despite the fact that some have complete open reading frames and two Long Terminal Repeats (LTRs) (14). Furthermore, many of these viruses are polymorphic in the human population, and new copies are still being discovered, so it remains possible that some of them are still capable of active retro-transposition (17,18). In addition to the proviral copies, more than 900 solo HERV-K LTRs, which have arisen from recombination events, are also present (14). Many of these solo LTRs are located within intergenic regions of the genome, but they are also found within protein coding human genes (19). Since these LTRs contain sequences that can function as regulatory elements of transcription, they have the potential to influence cellular gene expression within a gene or up to 200kb either upstream or downstream of a neighboring gene (20).
The LTR of the HERV-K provirus also contains a sequence element, called the RcRE, that is present near the 3' of all of the HERV-K mRNAs. The interaction of this element with the HERV-K Rec protein forms a complex, together with Crm-1 and Ran-GTP (21)(22)(23), that promotes nuclear export and expression of the viral mRNAs that retain introns (24). Complex exogenous retroviruses, such as HIV (25,26), HTLV (27,28), JSRV (29) and MMTV (30) also utilize similar mechanisms. For example, the HIV viral Rev protein acts in conjunction with the Rev Responsive Element (RRE) (25,26) to perform this task. Early studies, reported about 15 years ago, examining putative functional RcREs from unknown locations in the genome, showed that the HIV Rev protein could promote export and expression of mRNA with retained introns from reporter constructs containing these RcREs. It was also reported that although Rev functioned to some extent with RcREs, the prototype Rec proteins failed to function in conjunction with the HIV-1 RRE (23, 31). As discussed above, since hundreds of HERV-K LTRs are widely distributed throughout the human genome, potentially functional RcREs are also present in many human RNA transcripts. This raises the distinct possibility that when Rev is expressed during HIV infection, mRNA isoforms with retained introns, which would normally be found only in the nucleus, could be transported to the cytoplasm to alter the transcriptome. They could potentially also be translated and alter the proteome.
Several recent studies have examined the effect of HIV infection on host cell gene expression. A recent RNAseq study reported that HIV infection in primary CD4 T-cells led to accumulation of RNAs with a significant amount of intron retention in ribosomal protein and other genes (32). The sequence analysis also detected induction of novel RNAs from retrotransposons and endogenous retroviruses. This study did not provide an explanation for the observed increase in these mRNA species, although an effect of expression of Rev was discussed as a potential explanation.
Several reports have also demonstrated expression of HERV-K mRNAs and proteins from proviruses in HIV-1 infected patients (33), as well as in cultured cells infected with HIV (34). HERV-K virus-like particles have also been detected in the plasma of HIV-1 infected individuals (35). However, in a recent report, increased HERV-K virion production in HIV-1 infected individuals was not detected, although there was increased transcription from some HERV-K LTRs (36). Other reports have suggested that HERV-K transcripts and proteins can be induced by the HIV-1 viral proteins Tat and Vif, as a result of activation of transcription from the HERV-K LTR (37,38).
Based on these studies, we decided to investigate whether putative functional RcREs found within known human genes, as well as in a HERV-K provirus (HERV-K 3q12.3) that was shown to be active during HIV infection, could functionally respond to HIV Rev and HERV-K Rec proteins. The sequences were chosen based on their degree of homology to the prototypical RcREs that had been tested to date. Since our previous studies showed that Rev/RRE pairs from different HIV primary isolates displayed striking differences in functional activity (39), we also analyzed if different Rev proteins functioned differentially on different RcREs and how well the Revs worked in comparison to the prototypical Rec. Here we report that the selected HERV-K RcREs functioned with varying efficiencies in response to different HIV-1 Rev proteins and that some in fact functioned better with some of the Rev proteins than with Rec. We also show that the 3'LTR of HERV-K 3q12.3 functioned very well with Rev, while the 5' LTR did not and that the prototypical Rec protein showed almost no function on either element. Furthermore, expression of NL4-3 Rev, but not Rec, in SupT1 cells resulted in a significant increase in cytoplasmic, unspliced mRNA from this HERV-K provirus, and to a lesser extent, in at least two other ones.

Selection of genomic RcRE sequences for functional testing
We initially performed BLAST searches in human RNA databases using one of the previously analyzed RcREs (which was named prRcRE for prototype RcRE) (23). Surprisingly, no sequences were found that showed 100% identity with the prRcRE or in our subsequent searches of the human genome. Based on these searches, we selected four sequences that showed high sequence similarity to the prRcRE, present in RNAs expressed from the following genes (percent identity with the prRcRE indicated in parenthesis): C11orf72 (98%), RP4linc (97%) cFLAR (95%) and U9linc (94%) (see Figure 1, top 5 lines). To determine if the selected RcREs could function with either Rev or Rec similarly to the prRcRE, the sequences were cloned into the pCMV-Gag-Pol reporter vector (40). This vector produces a GagPol mRNA containing the coding region within an intron, similarly to the way it is arranged within the HIV provirus (40). Because of this, the mRNA cannot be exported from the nucleus to the cytoplasm and translated into protein without the insertion of a cis-acting element capable of binding to a trans-acting export factor. Gag and GagPol proteins are only produced in the presence of both of these factors. These proteins assemble into HIV particles and are released into the medium, where they can be easily measured by a p24 ELISA. This vector has been used by us in multiple previous publications to measure the activity of many cis-acting elements and trans-acting factors that mediate export of mRNA with retained introns (40)(41)(42)(43).
After cloning of these elements into the vectors, they were transfected into 293T/17 cells together with plasmids expressing either HIV NL4-3 Rev or prRec (39,43). The results of these experiments are shown in Figure 2. Without the co-expression of an export protein, most of the elements failed to elicit any p24 production, although low level activity was observed with RP4linc, cFLAR and U9linc. The element located within C11orf72 functioned significantly better than all of the others and both Rev and Rec functioned in conjunction with this element, although Rec showed higher activity than Rev (Figure 2A). The element located within the RP4linc gene also gave significant, albeit lower activity, with both Rev and Rec (Fig 2A), whereas the other two elements showed little change in functional activity in conjunction with either Rev or Rec. We also tested the GagPol vector containing the RRE, which confirmed previous results that indicated that Rec does not function in conjunction with the HIV-RRE ( Figure 1C). In contrast, both NL4-3 Rev and Rec functioned on the prRcRE, consistent with previous data. These experiments indicate that some human mRNAs and lncRNAs contain elements that function as RcREs, capable of working with both Rec and Rev proteins. However, they also showed that even though many sequences may show high sequence similarity to functional RcREs, even a few nucleotide changes can cause significant activity differences.

Selection of RcRE sequences in the human genome mapping to annotated gene regions that are common to datasets from HIV-1 infected cells and cells expressing Rec proteins.
Since our results demonstrated that sequence similarity was not sufficient to identify whether a specific element was functional with Rev or Rec, we employed a multi-step approach to increase the probability of finding functional elements. First, we performed a BLAST search with the prRcRE against the Ensembl DNA human genomic GRCh38.87 database (44). This resulted a data set of 198 sequences with greater than 94% sequence similarity to the prRcRE. Within this dataset, 88 potential unique functional RcRE sequences were found to be present in 98 annotated gene regions, as some of these regions had duplicate or identical RcRE sequences. We then queried our dataset to find genes that had been previously shown to have altered expression after HIV-1 infection (32) and/or genes that were differentially expressed in human embryonic carcinoma cells (hECC) expressing exogenously added Rec (45) (Figure 3). Genes were considered differentially expressed in these data sets if the false discovery rate (FDR) adjusted p-values were <0.01.
Three genes, (CEBPZ, SLC44A5 and SLC3A2) were common to all three datasets. A fourth annotated gene, TMEM64, was shared between our data set and the Rec dataset, but was absent from the HIV dataset. The four potential RcREs were highly similar, showing 97-98% identity to the prRcRE ( Figure 3). All of these elements are located within regions that have been designated as introns and are present in the anti-sense orientation relative to the transcription of the respective gene. The RcRE in the SLC44A5 gene region is located within the 3' LTR of a partial HERV-K provirus, 1p31.1. The TMEM64 gene region contains two solo direct repeat LTRs (451 and 452) with identical RcREs.
We also identified genes that were common to only the RcRE similarity and HIV datasets. The ten identified genes, (MAP4K3, TPRG1, TULP2, CNTLN, CCDC18, MMP24, KLRB1, FN3K, ZNF44 and CD48) all contained elements located within introns that are similar (95.4-98.2%) to the prRcRE ( Figure 1). Two of the genes, CNTLN and ZNF44, contained elements that are sense relative to the transcription of the gene. The remaining eight genes, (MAP4K3, TPRG1, TULP2, CCDC18, MMP24, KLRB1, FN3K and CD48) contained elements that are anti-sense to the transcription of the gene. The element in CD48 is located within the 3' LTR of HERV-K provirus, 1q23.3. The locations of these elements are shown in Table 1 with the LTR annotation adopted from Subramanian et al (14).

Functional testing of potential RcRE elements in the identified gene regions
We decided to first test the four potential RcREs that were common to both the Rec dataset and our data set.
To do this, synthetic DNA fragments containing the RcRE sequences were cloned into the HIV-1-Gag-Pol reporter construct. The resulting plasmids were then transfected into 293T/17 cells together with either pCMV-Rev or pCMV-Rec. The results of these experiments demonstrated that all of these elements functioned well with both Rev and Rec ( Figure 4A). Interestingly, the different RcREs responded differentially to Rev and Rec.
In the case of the RcREs from CEBPZ, SLC44A5 and TMEM64, higher levels of p24 were obtained with Rec than with Rev. In contrast, the RcRE from the SLC3A2 gene region showed significantly higher levels of p24 expression with Rev compared to the levels obtained with Rec. SLC3A2 also showed differential expression after infection with HIV.
We next tested seven of the ten additional potential RcRE elements (CCDC18, MMP24, TPRG1, CNTLN, ZNF44, TULP2 and MAP4K3) that were identified in the overlap of our RcRE database and the dataset of genes that were differently regulated after HIV infection. Two of the elements, MAP4K3 and TULP2, did not produce detectable levels of p24 above background with either Rev or Rec and MMP24 responded very weakly ( Figure 4B). Four of the seven elements (CCDC18, TPRG1, ZNF44 and CNTLN), showed increased activity with Rev ( Figure 4B), but only CCDC18 and ZNF44 functioned with Rec. Taken together, these results demonstrate that multiple RcREs within human genes can function with both Rev and Rec and that small sequence differences in the RcREs can result in significant functional differences and different activity with Rev and Rec. The fact that many functional RcREs were found in genes that were differentially expressed in HIV infected cells and that some of these functioned better with Rev than with Rec suggests that Rev could alter expression of the genes containing these RcREs after HIV infection.

Functional testing of RcRE sequences with different HIV-1 Rev proteins
Rev sequences derived from primary isolates of subtypes A, G and the recombinant virus CRF02 AG were previously shown to have differential activity in conjunction with the pNL4-3 RRE. To test how some of the selected RcRE sequences would function with these different Rev proteins, eight HIV-1 Rev sequences were tested with reporter vectors containing the CEBPZ, SLC44A5, SLC3A2 and TMEM64 RcREs (39). All of the Rev proteins were expressed from pCMV vectors that were identical except for the Rev sequences. We also included the prRev and the prototype Rec protein in these experiments and a plasmid expressing SEAP was also added to each transfection as a normalization control. Figure 5A all of the Rev proteins functioned on the prRcRE, however their activity varied greatly, and they functioned in a different rank order compared to their function on the NL4-3 RRE, as shown in our previously published report (39). They also demonstrated varying levels of activity on all of the RcREs ( Figure   5B). In order to more easily compare how a particular Rev protein behaved with each of the RcREs, we normalized the activity of each Rev/prRcRE to 1 and plotted the activity of each Rev/RcRE combination relative to the normalized value ( Figure 5C). In this way, it can be easily seen that some Rev proteins function more efficiently with certain RcREs. For example, 6AG Rev functioned better with the CEBPZ RcRE and less well with the TMEM64 RcRE compared to its activity in conjunction with the prRcRE, whereas 1A Rev functioned about the same on the CEBPZ RcRE and less well with the TMEM64 RcRE compared to its activity with the prRcRE. On the other hand, 8G Rev functioned less well with the CEBPZ RcRE, and hardly at all with the TMEM64 RcRE compared to its activity with the prRcRE. We then tested seven additional RcRE sequences (CCDC18, MMP24, TPRG1, CNTLN, ZNF44, TULP2 and MAP4K3) with three of the HIV-1 Revs (1A, 6AG and 8G) that showed high, medium and low p24 expression on the HERV-K prRcRE. Again, the different RcREs functioned with differential activity with the different Revs and Rec. However, in these cases, the rank order of activity of all of the RcREs tested remained nearly the same with all of the Rev proteins, with minor variations compared to Rec ( Figure S1A and S1B). Taken together, these results indicate that different RcRE sequences respond differentially to different Rev proteins, suggesting that Rev proteins from different HIV primary isolates will likely influence host cell gene expression in different ways.

Identification of HERV-K transcripts present in total and cytoplasmic RNA in cells transduced with retroviral vectors expressing HIV Rev and/or Tat or HERV-K Rec
To directly analyze if expression of regulatory HIV proteins (Tat and Rev) and the HERV-K Rec protein could Trimmed reads were aligned to the hg19 version of the human genome using HISAT2. The BAM files were then filtered to obtain only the reads that mapped uniquely to one proviral location. To be able to count reads emanating from HERV-K proviruses, we created a Gene Transfer Format (GTF) file integrating all known HERV-K proviral genomic coordinates into the Ensembl GTF, hg19 build (44). This enabled us to identify the HERV-K loci that were transcriptionally active (14,17).
When total RNA was analyzed, expression from 9 HERV-K loci (1q21.3, 3q12.3, 4p16.1b, 6p21.1, 6q25.1, 11q12.3, 19q13.12b, 20q11.22 and 22q11.23) was identified in all samples with only very slight differences between the samples transduced with the different vectors. Some reads also mapped to HERV-K locus 11q22.1, but only in the sample that received the Rec vector ( Figure 6A). These reads appeared to originate from the exogenously expressed Rec, since the sequence of the reads were a perfect match to the sequences of Rec that are in the vector. When cytoplasmic RNA was analyzed a very different picture emerged. The samples that were expressing Rev or both Tat and Rev contained an increased number of reads for HERV-K proviruses 3q12.3, 4p16.1b and 22q11.23, with the greatest differences being seen for the 3q12.3 virus ( Figure   6B).
A quantitative analysis, using DESeq2, of the read counts mapping to 3q12.3, is presented in Figure 6. In this virus, the reads mapped across the entire proviral genome ( Figure 7A). Cytoplasmic samples from cells transduced with a Rev-expressing construct, had 4.2 fold (p value = 1.13E-08) more reads uniquely mapping to this locus compared to the cells transduced with the empty vector and the cytoplasmic samples from cells transduced with both Tat and Rev had 5.4 fold (p value = 4.10E-10) fold more reads ( Figure 7C). The fact that the increase in read counts maps across the entire proviral locus strongly suggests that Rev has facilitated the nucleocytoplasmic export of the full length unspliced HERV K genomic RNA in the transduced cells.
Interestingly, cells transduced with HERV-K Rec, showed only a small increase in reads (1.4 fold). The differences in the number of reads mapping between total and cytoplasmic RNA samples in the other 2 proviruses were also quantified but were much less dramatic (1-2 fold). ( Figure S2) As a control, we also examined the reads mapping to the gene encoding NEAT1 lncRNA. This gene is known to encode a full length unspliced RNA (NEAT1_2) that remains nuclear, localizing to paraspeckles, as well as a shorter form whose localization is less clear (NEAT1_1) (46). Figure S3 shows that reads derived from total RNA mapped across the whole NEAT1 gene region. However, in cytoplasmic RNA, only reads mapping to the region corresponding NEAT 1_1 were present. This serves as a control to confirm that our cytoplasmic RNA prep was not contaminated with RNA that should remain nuclear.

Rec
Since transcribed HERV-K 3q12.3 mRNA was efficiently exported to the cytoplasm in the presence of Rev, it was of interest to analyze whether the 3q12.3 provirus contains RcRE sequences that function in conjunction with Rev. Since the RcRE is present in the U3 region of the HERV-K LTR, only the 3' RcRE would appear in RNA transcribed from the provirus. Nevertheless, we tested both the 5' and 3' RcRE sequences in this provirus by cloning them into the Gag-Pol reporter construct and testing them with the different Rev proteins that we used in the experiments described above.
As shown in Figure 8A, all of the HIV-1 Rev proteins functioned well on the 3' RcRE from this provirus. In contrast, very little function was seen in conjunction with HERV-K Rec. This correlated well with the RNAseq data that showed that only Rev significantly promoted the export of the HERV-K proviral RNA containing this RcRE. The different Rev proteins functioned in a somewhat different rank order than they did on the prRcRE, highlighting the fact that different Rev proteins show different activity on the RcREs due to unknown factors that affect their interactions. As in shown in Figure 8B, the 5' RcRE from 3q12.3 was completely unresponsive to any of the proteins tested in spite of the fact the two RcREs are highly similar, showing 96% sequence identity to each other and 96.7% (3') and 95.6% (5') sequence identity to the prRcRE ( Figure 8C). (33)(34)(35)(36)47). There have also been reports of immune responses to HERV-K in HIV infected individuals, and, in a recent report, it was shown that antibodies directed against HERV-K Gag were present in elite controllers (48).

Many studies have described increased HERV-K expression in HIV infection and HIV infected cells
It has also been shown that HIV Tat can enhance HERV-K transcription from several proviral loci (37,38).
However, there have been no previous studies that have analyzed the effects of Rev expression on HERV-K expression, although studies conducted many years ago reported that Rev can enhance expression of unspliced RNA from reporter vectors through direct interactions with prototype RcREs cloned from unknown locations in the human genome (23,31). Based on the advances in sequencing that led to the publishing of the human genome, we now know that there are several HERV-K proviruses that contain potential functional RcREs (14,17). In addition to these, there are many solo HERV-K LTRs that are likely to contain functional RcREs (14).
Curiously, the previously studied functional RcREs do not show an exact sequence match to any RcREs identified in the current versions of the human genome. However, here we show that Rev can indeed function in conjunction with many of the HERV-K RcREs that are present in the human genome and that many of them are in protein-coding gene regions. Some of the RcREs that we identified and tested have the same sense as the coding region of the gene they reside in. Others are in the antisense orientation. Recently it has been shown that many genes encode overlapping antisense RNA (49,50), which can sometimes change regulation of a protein coding gene on the opposite strand. Thus, it is possible that Rev interactions with a RcRE, irrespective of its orientation (19), could modulate gene expression. Whether this occurs for any of the genes where we have identified RcREs (see Table 1) requires further study.
Our studies clearly show that Rev can act to promote export endogenously expressed HERV-K RNA.
Interestingly, the provirus that showed the most significant response to Rev was the provirus located at 3q12.3. This provirus has previously been shown to be transcriptionally active in many cells, including HIV infected cells (14,51,52). Our results showed that expression of Rev led to an increase of the full-length mRNA in cytoplasmic RNA and also that expression of Tat and Rev together led to an increase of this RNA in total RNA. This suggests that Tat increased transcription, generating mRNA that could be exported by Rev.
Our data also clearly show that different Rev proteins function with different efficiencies in conjunction with different RcREs and that some RcREs function better with some Rev proteins than with Rec. This result dovetails nicely with our previous finding that Rev proteins and RREs from different primary isolates can have significantly different levels of activity.
The fact that HIV-1 Rev promotes the nucleo-cytoplasmic export of unspliced mRNAs expressed from the Type Whereas the effects of Rev on proviral HERV-K expression may be the most important finding in our study, we started by identifying potential RcREs in HERV-K solo LTRs present in non-viral protein coding and non-coding genes. We originally based our selection of sequences to test based simply on how similar they were in sequence to the prototype RcRE that was analyzed in previous studies. While this did indeed lead to the identification of RcREs that functioned with varying efficiencies with both Rev and Rec, it became clear that primary sequence is not a good indicator of RcRE function. This may be best illustrated by fact that the RcRE located within the 3' LTR from the 3q12.3 HERV-K provirus was shown to function extremely well with some HIV Revs, but not with others and also rather poorly with Rec (see Figure 8). In contrast, the 5' RcRE from this provirus did not function with either Rev or Rec, in spite of the fact that there were only a few primary sequence differences between the 5' and 3' RcRE and the prototype RcRE. Additionally, the RcRE which mapped to the SLC3A2 region clearly functioned better with the prototype Rev compared to its activity with the prototype Rec.
These results highlight that functionality cannot currently be predicted simply by an RcRE sequence analysis.
Thus, functional activity experiments using reporter assays like the one we describe here are essential to evaluate the extent to which a specific RcRE will function in conjunction with either a specific Rev or Rec sequence.
Our RNAseq analysis also identified two more HERV-K proviruses (4p16.1b and 22q11.23) where there was an increase in the amount of unspliced mRNA in the cytoplasm, when HIV-1 Rev or Tat and Rev were present, although to a much lesser extent then seen for 3q12.3. While 3q12.3 is a full-length intact provirus with a functional 3' RcRE, the 4p16.1b and 22q11.23 proviruses have truncated 3' LTRs with only partially intact RcREs. It remains to be determined how Rev may act to promote nucleo-cytoplasmic export of the mRNA expressed from these two proviruses.
Previously published reports demonstrate that transcription for both 3q12.3 and 4p16.1b is initiated in their respective 5' LTRs, while the transcription for 22q11.23 is through a separate LTR, located a little over 500bp upstream of the provirus (53). Notably, it has been reported that the 5' LTR from the provirus 3q12.3 has evolved to include a duplication of the HOX-PBX transcription binding site as well as a SNP in the RFX3 transcription factor binding site (54). This may explain why this provirus is highly transcribed in many different cell types and why it no longer is able to function as a RcRE.
Most reading frames in HERV-K 22q11. 23 (36,48,55,56). A recent article reporting on increased HERV-K capsid antibody levels in elite controllers suggest that immune responses to HERV-K structural proteins could contribute to HIV immune control (48). Thus it will be of clear interest to further define the RcREs present in HERV-K mRNAs that are expressed in HIV infected individuals, especially those with strong anti-HERV-K immune responses, and to determine whether the Rev proteins expressed in these individuals can function in conjunction with these RcREs. It has also been reported that the 3q12.3 provirus is actively transcribed in patients with Amyotropic Lateral Sclerosis (ALS) (57) and although this remains controversial, it has been proposed that HERVs plays a pathophysiological role in this lethal disease (for a review see (58). Some HIV-infected patients develop neurological manifestations that resemble classical ALS, although it occurs at a younger age and there has been reports of improvement following the initiation of antiretroviral therapy (59). In view of the results presented here, it may be of interest to address whether HIV Rev expression leads to increased expression of HERV-K Gag proteins in the brain and immune responses to HERV-K in these patients.
The mechanism(s) responsible for the difference in relative HIV-1 Rev or HERV-K Rec activity when paired with different HERV-K RcREs was not examined in this study. It has been previously proposed that different stem-loops in the prototypical HERV-K RcRE sequence may be involved in binding to Rev and Rec (31,60,61). For HIV-1 Rev, it has been suggested that there may be two discrete binding sites that lie within stem-loop 2 and 6, while for HERV-K Rec, binding between the protein and RNA may be much more complex, involving folding and three-dimensional structure rather than discrete binding sites within the HERV-K RcRE. Our results in this study demonstrate that the RcREs that are found within SLC3A2, TPRG1 and CNTLN displayed higher functional activity levels with HIV-1 Rev than with HERV-K Rec. Studying additional HERV-K RcREs in other areas of the genome and their functional activity as well as performing SHAPE analysis may be helpful in further pinpointing areas of the RcREs that are important for HIV-1 Rev binding.

Conclusions
During this study we identified HERV-K RcRE sequences that functioned with varying efficiencies in response to different HIV-1 Rev proteins. Some of the HIV-1 Rev proteins functioned better with several of the selected HERV-K RcREs than with HERV-K Rec. Additionally, we showed that the RcRE found within the 3' LTR of HERV-K 3q12.3 functioned well with many different Rev proteins, but hardly at all with the prototypical Rec protein. Consistent with this, we showed that expression of Rev in SupT1 cells resulted in a significant increase in cytoplasmic, unspliced viral mRNA from the HERV-K 3q12.3 provirus which has a complete open gag reading frame. HIV Rev could thus change HERV-K protein expression, resulting in novel immune responses to these viruses. Finally, our results indicate that effects of HIV Rev on HERV-K expression in infected individuals will differ depending on variation in HERV-K expression and the HIV subtype.

Identification of genomic HERV-K RcRE sequences
A list of sequences highly similar to the HERV-K RcRE was generated through a BLAST search using the prototypical HERV-K RcRE sequence (23) to search the Ensembl DNA, non-coding RNA, and cDNA human genomic databases (GRCh38.87) (44). Sequences with greater than 94% sequence similarity were identified and listed.
Two other gene lists were retrieved from previously published studies identifying differential gene expression during HIV-1 infection (32) and differential gene expression when HERV-K Rec was expressed in hECC (62).
Genes were considered differentially expressed if the false discovery rate (FDR) adjusted p-value was p<0.01.

Plasmids
All plasmids utilized were given the name pHRXXXX for identification and to facilitate requests. Synthetic (3q12.3_5'_RcRE). pCMV-GagPol -RRE (pHR3442) has been used previously to measure Rev function (43). It was constructed in the same way as the pCMV-GagPol-RcRE plasmids described above, except that a DNA fragment containing the RRE, amplified by PCR from pNL4-3, was inserted into the backbone by traditional cloning methods rather than using the NEBuilder HiFi DNA assembly kit. The sequence of each inserted gBlock or PCR fragment in each plasmid was confirmed using Sanger DNA sequencing. Plasmids expressing Rev from several different HIV subtypes and the viruses from which they are derived have been previously described (40,44). These included pHR5209 (subtype A-1), pHR5182(subtype A-2), pHR5183(subtype AG-3), pHR5185(subtype AG-4), pHR5187(subtype G-9), pHR5202(subtype AG-5), pHR5203(subtype AG-6), pHR5204(subtype G-8)). The plasmid expressing the prototypical Rev sequence (prRev), pHR30, which was derived from a clade B isolate, has been used previously described (64).

Cell Culture and Transient Transfections
Twenty-four hours prior to transfection, 1.7 x 10 5 293T cells were seeded into 12-well tissue culture plates in 1ml medium (DMEM, 10% BCS, 50mg/ml gentamicin). To test the activity of the HIV-1 RRE and HERV-K RcREs, each well was transfected with 50ng of either pCMVRev (pHR30) or pCMVRec (pHR5313), 2µg of pCMV-Gag-Pol-RRE(pHR3442) or pCMV-Gag-Pol-RcRE(pHR5338) and 100ng of a plasmid expressing secreted placental alkaline phosphatase (SEAP) (pHR1831) using Turbofect™ transfection reagent (ThermoFisher). Cell supernatants were collected 72 hours post-transfection, and levels of p24 were determined by ELISA as previously described (65) The SEAP levels of the supernatants were determined by a chemiluminescent method (Roche) and were used to normalize the differences in transfection efficiencies.

Retroviral vectors, production and transduction
The MSCV retroviral vector backbone was utilized to make vectors that express Rev, Rec or Tat

RNA isolation and sequencing
Total and cytoplasmic RNA from three biological replicates of SupT1 cells transduced with the empty MSCV vector (EV) or MSCV vectors expressing Rev, Tat, Tat and Rev and Rec together with the fluorescent proteins, were isolated from cells sorted by flow cytometry. The methods used for total and cytoplasmic RNA extraction were previously described (69), with the following modifications. For the total RNA extraction, the sorted SupT1 cells were pelleted, then washed twice with 5ml of cold phosphate-buffered saline (PBS). After the final wash, the cells were lysed in 500ul of lysis buffer containing 0.2 M Tris-HCL (pH 7.5), 0.2M NaCl, 1.5mM MgCl 2 , 2% sodium dodecyl sulfate (SDS) and 200µg of proteinase K per ml, and incubated for 1 to 2 hours at 45°C. For cytoplasmic RNA extraction, the sorted SupT1 cells were pelleted then washed twice with 5ml of ice-cold phosphate-buffered saline (PBS). After the final wash, the cells were pelleted and resuspended in reticulocyte standard buffer (RSB) (10mM Tris-HCL (pH 7.4), 10mM NaCl, 1.5mM MgCl 2 ) followed by the addition of an equal volume of RSB with 0.1% IgePal™. Nuclei were cleared by two sequential centrifugations at 13,000 rpm in an Eppendorf centrifuge at +4C, prior to the addition of 2X PK buffer (200mM Tris-HCL (pH7.5), 25mM EDTA, 300mM NaCl, 2% SDS, 400µg of proteinase K per ml) to the cleared supernatant. The lysates were then incubated at 37°C for 30 minutes.
Sodium acetate (3M, pH 5.5) was then added to the final extractions for a final concentration of 0.3M prior to precipitation with 2.5 volumes of 200 proof ethanol (-20°C). The RNA was pelleted for one hour at 4°C 3000 RPM (3014g) in a Baxter Cryofuge 6000 centrifuge, then washed twice with 70% ethanol and air dried prior to the addition of RNase/DNase free water. Total RNA was DNase digested (RQ1 RNase-free DNase, Promega) then extracted and precipitated as before. Final RNA concentrations were determined with a Qubit RNA HS assay kit (Qubit 3.0 fluorometer, ThermoFisher).

NGS Sequencing
Stranded libraries were prepared by Novogene (Beijing, China) from total and cytoplasmic RNA using reagents from New England Biolabs (NEB). First, polyA+ RNA was isolated from total or cytoplasmic RNA using poly-T magnetic beads. The polyA containing RNA was then fragmented with fragmentation buffer (NEB)and first strand cDNA was synthesized using random hexamers and M-MuLV Reverse Transcriptase (RNase H-).
Second strand cDNA synthesis was performed using DNA Polymerase I and RNase H while substituting dTTP with dUTP. The double stranded cDNA was purified using AMPure XP beads (Beckman Coulter, Beverly, USA). To ensure that the library was stranded, the second strand cDNA was digested by USER (Uracil-Specific Excision Reagent, NEB) prior to selecting for fragments approximately 150-200bp in length with AMPure XP beads. The final library was amplified by PCR, then re-purified with AMPure XP beads prior to sequencing on the Illumina platform at Novogene.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Competing interests
The authors declare that they have no competing interests   The data set named RcRE contained sequences in gene regions that were highly similar to the HERV-K prRcRE. The data set named HIV, contained a list of genes that were shown to be changed after HIV-1 infection (32). The data set named Rec contained genes that were differentially expressed in human embryonic carcinoma cells (hECC) expressing HERV-K Rec (45). Three genes, CEBPZ, SLC44A5 and SLC3A2, were common to all three datasets. A fourth annotated gene, TMEM64, was shared between the Rec and RcRE datasets, but was absent from the HIV dataset. Ten additional annotated genes from the HIV dataset contained RcREs, however, these genes were absent from the Rec dataset. The RcREs located within CNTLN and TPRG1 are in the same transcriptional orientation as the gene, while the remaining RcREs are anti-sense to the transcriptional orientation of their associated gene.       After normalization of the data, the fold difference in the number of unique reads mapping to the 22q11.23 (left panel) or 4p16.1b (right panel) loci were quantified using DESeq2 for total or cytoplasmic RNA from the Rev, Tat, Tat and Rev or Rec transduced samples, compared to the samples transduced with the empty vector.

Figure S3
Visualization of NEAT1 reads from total and cytoplasmic RNAseq data. Total and cytoplasmic DESeq2 normalized read counts for NEAT1 were visualized with IGV. Note that DNA reads from total RNA map across the entire NEAT1 gene region and include both the NEAT 1_1 and NEAT 1_2 RNA isoforms. In contrast, most of the reads from cytoplasmic RNA map only to the region corresponding to the NEAT1_1 RNA isoform.