Homologous SV40 RNA trans-splicing: Special case or prime example of viral RNA trans-splicing?

To date, Simian Virus 40 (SV40) is the only proven example of a virus that recruits the mechanism of RNA trans-splicing to diversify its sequences and gene products. In this process, two identical viral transcripts are efficiently joined by homologous trans-splicing, triggering the formation of a highly transforming 100 kDa super T antigen. Sequences of other viruses, including HIV-1 and human adenovirus type 5, were reported to be involved in heterologous trans-splicing towards cellular or viral sequences, but the significance of these events remains unclear. We computationally and experimentally investigated molecular features associated with viral RNA trans-splicing and identified a common pattern: viral RNA trans-splicing occurs between strong cryptic or regular viral splice sites and strong regular or cryptic splice sites of the trans-splice partner sequences. The majority of these splice sites are supported by exonic splice enhancers. Splice sites that could compete with the trans-splicing sites for cis-splice reactions are weaker or non-existent. Finally, all but one of the trans-splice reactions seem to be facilitated by one or more complementary binding domains of 11 to 16 nucleotides in length, which, however, occur with a statistical probability close to one for the given length of the involved sequences. The chimeric RNAs generated via heterologous viral RNA trans-splicing either did not lead to fusion proteins or led to proteins of unknown function. Our data suggest that distinct viral RNAs are highly susceptible to trans-splicing and that heterologous viral trans-splicing, unlike homologous SV40 trans-splicing, represents a chance event.

Viruses are the smallest replicating entities. Mammalian viruses are mainly composed of the viral DNA or RNA genome, protein coats protecting the genetic material, and, in some cases, lipid envelopes surrounding the coat in the extracellular stage. The size of a viral particle correlates with the size of its genome, and for most viruses economy of genome size is a common theme. To express a maximum number of proteins, viruses use an array of mechanisms that do not significantly enlarge the genome, such as multiple promoters, alternative open reading frames, translational frame shifting, stop codon read-through, and alternative splicing. Likewise, viruses use virus-encoded small interfering (si)RNAs and antisense transcription to regulate gene expression. Recently, we reported strong evidence that viruses, namely simian virus 40 (SV40), also recruit the mechanism of RNA trans-splicing to generate the full repertoire of protein functions [1]. In mammalian cells, RNA trans-splicing represents a special form of alternative splicing in which two distinct RNA transcripts are joined in trans by the spliceosome, leading to the formation of chimeric RNA molecules that can code for novel functions or nonsense proteins. In the case of homologous SV40 RNA trans-splicing, two identical early viral transcripts are joined, triggering the formation of a 100 kDa super tumour antigen (sT-ag). This 100 kDa protein was found to have superior cell transformation activity compared with the regular 94 kDa large T antigen (T-ag). The SV40 trans-splice reaction was described to be supported by various SV40-intrinsic molecular helper functions that render this reaction highly efficient, with about 50 to 70% of the viral pre-mRNA transcripts being involved in trans-splicing.
Sequences of other viruses including HIV-1 and the human adenovirus type 5 were reported to be involved in trans-splicing as well [2,3]; however, these events have not been investigated in depth and the question as to whether these reactions represent chance events or are of relevance to the viruses remains open. We used SV40 RNA trans-splicing as a model and computationally screened sequences reported to be involved in heterologous viral RNA trans-splicing for common trans-splicing-associated molecular features and signatures.
As opposed to conventional cis-splicing, the efficiency of trans-splicing depends on the probability that two separate RNA precursor molecules encounter each other in the nuclear compartment. This probability increases with high sub-cellular RNA concentrations, e.g. as a consequence of productive viral RNA transcription and/or when the [4,5]. For homologous SV40 RNA trans-splicing, the following trans-splice helper functions were reported: (1) the early SV40 transcript harbours two binding domains (BDs) of 13 and 10 nucleotides (nts) in length, respectively; (2) SV40 RNA trans-splicing was enhanced by mutational weakening of the T-ag 5′ donor splice site (ss), which impaired cis-splicing in favour of the alternative trans-splice reaction towards (3) a strong cryptic 5′ss within a second pre-mRNA molecule. As a consequence of the T-ag 5′ss mutation, only 4 instead of 5 nts of the 9-nt 5′ splice donor can base-pair with the U1 snRNA of the spliceosomal U1 snRNP, likely impeding its recruitment to this 5′ss. The alternatively used cryptic 5′ss can pair with 6 of its 9 nts to the U1 snRNA and hence represents a much stronger splice donor site. (4) Finally, the alternative cryptic 5′ss was further strengthened by an upstream purine-rich exonic splice enhancer (ESE). SV40 trans-splicing was confirmed by systematically excluding, using Southern hybridization and PCR, alternative mechanisms that might generate the observed chimeric RNA, such as direct coding from integrated rearranged SV40 DNA and alternative cis-splicing of a long pre-mRNA transcript that might originate from genomic SV40 tandem integration. The coincidence of various trans-splicing helper functions led us to conclude that SV40 RNA trans-splicing is certainly not a chance event but instead a mechanism intrinsic to the virus that diversifies viral gene expression [6].
We used computational and manual analyses to screen the adenovirus type 5 exon 2 and the HIV-1 nef sequences, as well as the reported trans-splicing partner sequences, for trans-splicing helper functions deduced from the SV40 system. The adenoviral pre-mRNA was reported to splice in trans with various cellular host pre-mRNAs, including the v-abl Abelson murine leukaemia virus proto-oncogene homolog 2 transcript variant a, zinc finger protein 41 exon 3, armadillo protein P0071 5′UTR, GTP-binding protein (RAB5) 1st coding exon, GAPDH exons 3 to 7, β-actin exon 3, T-plastin exon 1, and the 1st coding exon of the interferon-inducible gene (IFI)-54K, during the natural course of infection in human (U521MG, JEG3) cell lines [3]. Adenovirus type 5 is considered to be a non-integrating virus, which rules out that the chimeric RNAs are directly encoded by DNA or generated by alternative RNA cis-splicing; thus, the described fusion RNAs are most likely the product of a trans-splice process. The HIV-1 nef mRNA was reported to splice in trans with the SV40 pre-mRNA, KIAA1454, and Cercopithecus aethiops genes 1, 2, and 3 in the cell lines CV1 and Cos7 [2]. The setup in these HIV-1 experiments did not resemble a natural infection: the HIV-1 nef sequence was either stably integrated into the host cell genome to monitor trans-splicing with cellular targets or microinjected as in vitro transcribed RNA together with an SV40 transcript. Thus, whilst hybrid RNAs formed with cellular sequences could originate from mechanisms other than trans-splicing, the chimeric RNAs formed between HIV-1 nef and SV40 could only have been generated by RNA trans-splicing.
We (i) screened the above-mentioned sequences for the presence of potential antisense BDs, (ii) investigated the strength of the splice donor and acceptor sites involved in RNA trans-splicing, (iii) assessed the alternative cis-splicing options available to the respective sequences, and (iv) searched for putative ESEs supporting either the trans-splice or the regular cis-splice sites (Table 1).
As BDs, we considered complementary stretches of ten or more consecutive nucleotides, which we regarded as suitable for holding two encountering RNAs together under physiological conditions. For all sequence pairs except adenovirus::IFI-54K, we identified between one and nine BD candidates. Each of these pairs can recruit at least one 11mer BD, with the longest interacting domains being 15 and 16 nts in length (adenovirus::GAPDH). Although the identified BDs can support the trans-splice reactions, complementary stretches of 12 nts or shorter occur with a probability close to one between sequences of considerable length. The expected number N_exp(L_BD) of BDs of a given length L_BD (nt) is, to a good approximation, given by Eq. (1), with L_A and L_B as the respective lengths of the trans-splicing RNAs A and B. The probability P to observe a BD of length L_BD is then given by with N_obs(L_BD) as the actually observed frequency of BDs with length L_BD. Accordingly, most if not all viral pre/mRNAs have the potential to bind to mammalian, including human, pre-mRNAs and thus harbour an important, albeit not sufficient, molecular feature for splicing in trans. The identified potential interactions between viral sequences and their targets as listed in Table 1 are summarised in Table 2.
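To illustrate why BDs of 11 to 12 nts are expected by chance, the sketch below uses a simple random-sequence estimate. It assumes that both RNAs are uniformly random and that windows are independent, so it is not necessarily the authors' Eq. (1). It counts overlapping windows of length L_BD in RNA A and RNA B that are perfectly complementary (per-position pairing probability 1/4 for Watson-Crick pairs only, or 3/8 if G·U wobble pairs are also allowed) and uses a Poisson approximation for the probability of observing BDs. The function names are hypothetical.

```python
import math


def expected_bds(len_a: int, len_b: int, l_bd: int, wobble: bool = False) -> float:
    """Expected number of complementary windows of length l_bd between RNAs A and B.

    Assumption: uniform random sequences, so each pair of windows (one in A,
    one in B) is fully complementary with probability p ** l_bd.
    """
    if l_bd > min(len_a, len_b):
        return 0.0
    # Watson-Crick only: 4 of the 16 base combinations pair (p = 1/4).
    # Adding G.U wobble: 6 of the 16 combinations pair (p = 3/8).
    p = 6 / 16 if wobble else 4 / 16
    windows = (len_a - l_bd + 1) * (len_b - l_bd + 1)
    return windows * p ** l_bd


def poisson_prob(n_obs: int, n_exp: float) -> float:
    """Poisson probability of observing exactly n_obs BDs when n_exp are expected."""
    return n_exp ** n_obs * math.exp(-n_exp) / math.factorial(n_obs)


def prob_at_least_one(n_exp: float) -> float:
    """Poisson probability of observing one or more BDs."""
    return 1.0 - math.exp(-n_exp)


if __name__ == "__main__":
    # Two hypothetical 5,000-nt pre-mRNAs.
    for length in (10, 11, 12, 16):
        n = expected_bds(5000, 5000, length)
        print(length, round(n, 3), round(prob_at_least_one(n), 3))
```

For two hypothetical 5,000-nt pre-mRNAs, the Watson-Crick estimate is roughly 6 complementary 11mers and roughly 1.5 complementary 12mers, and the Poisson probability of at least one 11mer is about 0.997; allowing wobble pairs raises these numbers considerably. Because overlapping windows are counted, a 12-nt stretch contributes two 11-nt windows, so the estimate somewhat overstates the number of distinct BDs.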
The chance that two nuclear RNAs encounter and bind to each other via BDs directly correlates with their nuclear abundance, and high RNA abundance has to be considered an essential condition for trans-splicing. Hence, it was not surprising that Kikumori and colleagues [3] detected trans-splicing of adenoviral RNA with the highly abundant cellular pre-mRNAs of β-actin and GAPDH when searching for it.
To investigate the strength of splice donor sites, we considered the number of consecutive base pairs and hydrogen bonds (hb) that can be formed between the respective donors and the spliceosomal U1 snRNA. A very strong donor can form up to 9 bp with the U1 snRNA, including canonical Watson-Crick as well as wobble base pairs. To assess the strength of splice acceptor sites, we counted matches, base pairs, and hydrogen bonds between the U2 snRNA and the branch point consensus sequence, as well as sequence matches with the poly U/C stretch consensus sequence, which is recognised by the spliceosomal factor U2AF65. To additionally consider the base sequence, i.e. its compliance with the respective splice site consensus sequence, we used the algorithms Alternative Splice Site Predictor (ASSP) and Splice Site Prediction (SSP) NNSPLICE version 0.9 to predict the strength of each splice site, reflected by a numeric score [7,8]. ASSP scores of 10 and above and SSP scores close or equal to 1 were considered high. Finally, we searched for purine (A/G)-rich sequences of at least 7 nts that might act as ESEs [9]. The molecular features of the SV40 trans-splice reaction are included in the upper part of Table 1 for reference. The involved strong cryptic splice donor has no suitable downstream acceptor and splices in trans with the regular SV40 splice acceptor site. Whilst the wild-type T-ag 5′ss efficiently competes with the cryptic splice donor for cis-splicing, trans-splicing is strongly enhanced by mutational weakening of the T-ag 5′ donor ss. In the case of the adenoviral transcript, a strong regular viral splice acceptor trans-splices with a variety of mostly strong splice donor sites within cellular pre-mRNAs. The corresponding regular cellular cis-splice acceptor sites are mostly weak, and the viral cis-splice donor site that belongs to the trans-splicing acceptor was also assigned a lower score. In the case of HIV-1 nef, two alternative strong cryptic splice donors were described to trans-splice with either SV40 RNA or two cellular mRNAs. These cryptic donor sites have no corresponding HIV-1 nef acceptor sites, and the recruited SV40 acceptor represents a rather strong ss. In summary, our data point towards a general pattern associated with viral RNA trans-splicing, which is schematically depicted in Fig. 1: a strong viral cryptic or regular donor or acceptor ss splices in trans with a rather strong regular or cryptic acceptor or donor site. Notably, splice sites that could compete with the trans-splicing sites by cis-splicing within the respective transcripts are mostly weaker or non-existent. As expected, the majority of splice sites involved in trans-splicing are supported by ESEs, which can be identified in proximity to many mammalian splice sites.
Notes to Table 1: a Trans-splicing is enhanced by an A-to-C mutation in the 5′ss of exon 1. b The cryptic 5′ss at aa 66 trans-splices with SV40, KIAA1454, and C. aethiops; the cryptic 5′ss at aa 77 trans-splices with SV40 and C. aethiops. ss = splice site, alt = alternative, reg = regular, cry = cryptic, hb = hydrogen bonds, aa = amino acid.
Table 1 (residual entries): 5′-GUGAGCCUGCA-3′/3′-CACUCGGACGU-5′; C. aethiops exons 1, 2, 3: 11mer, 5′-GAAGUGUUAGA-3′/3′-CUUCACAAUCU-5′.
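The purine-rich ESE criterion used above (A/G runs of at least 7 nts) can be sketched as a simple scan. The helper names and the 50-nt proximity window are hypothetical choices for illustration; motif-based ESE predictors score specific sequence motifs rather than plain purine runs.

```python
import re


def find_purine_runs(seq: str, min_len: int = 7) -> list[tuple[int, int, str]]:
    """Return (start, end, run) for maximal A/G runs of at least min_len nts.

    Coordinates are 0-based and end-exclusive; DNA input (T) is converted to RNA (U).
    """
    seq = seq.upper().replace("T", "U")
    # A greedy character-class match yields maximal runs, scanned left to right.
    pattern = re.compile(r"[AG]{%d,}" % min_len)
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(seq)]


def purine_runs_near_site(seq: str, boundary: int, side: str,
                          window: int = 50, min_len: int = 7) -> list[tuple[int, int, str]]:
    """Keep purine runs lying entirely within `window` nts on the exonic side of a splice site.

    boundary: index of the first intronic nt for a donor (side="upstream"),
    or of the first exonic nt for an acceptor (side="downstream").
    The default 50-nt window is an arbitrary, illustrative choice.
    """
    runs = find_purine_runs(seq, min_len)
    if side == "upstream":      # the exon lies 5' of a donor site
        return [r for r in runs if r[0] >= boundary - window and r[1] <= boundary]
    if side == "downstream":    # the exon lies 3' of an acceptor site
        return [r for r in runs if r[0] >= boundary and r[1] <= boundary + window]
    raise ValueError("side must be 'upstream' or 'downstream'")
```

For a hypothetical exon ending in CCAGAGGAAGCC followed by the donor GUAAGU, the scan reports the 8-nt run AGAGGAAG upstream of the exon/intron boundary.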
To experimentally investigate our hypothesis of ss selection/preference in RNA trans-splicing, we set up an artificial test system comprising a mini-gene composed of three exonic segments (E1 to E3) joined by two introns (I1 and I2), and two trans-splicing RNAs, one for 5′ and one for 3′ exon replacement, harbouring exons E1* or E3*, respectively (Fig. 2). Whilst I1 harbours a very strong donor (D1) and an acceptor ss (A1) of medium strength, I2 was flanked by a strong donor (D2) and a strong acceptor (A2). Both trans-splicing RNAs were equipped with strong donor (D) or acceptor (A) sites, which can be considered cryptic splice sites since partnering cis-splice sites are not available, and with BDs guiding the trans-splicing sequences to the respective 5′ or 3′ terminal intron. Consequently, the 5′ or 3′ splice sites of the trans-splicing RNAs could choose either of the available acceptor or donor splice sites, though one would expect the target splice sites in proximity to the binding site to be preferred [10]. HepG2 cells were co-transfected with plasmids expressing the mini-gene and one of the trans-splicing RNAs, total RNA was isolated 24 h post transfection, and trans-spliced RNAs were quantified using real-time reverse transcriptase PCR (rtRT-PCR) with β-actin mRNA as internal standard. For both trans-splicing RNAs, the fold change values indicate that alternative trans-splicing towards the more distant target ss was favoured over the expected trans-splicing with the respective proximal ss. In detail, D-to-A2 trans-splicing between the very strong donor D and the strong acceptor A2 was favoured (2.2-fold) over D-to-A1 trans-splicing from D to the weaker acceptor A1; similarly, D1-to-A trans-splicing between the very strong donor D1 and the very strong acceptor A was favoured (7.2-fold) over D2-to-A trans-splicing between A and the strong donor D2, which is weaker than D1.
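The text does not state how the fold changes were calculated. A common choice for rtRT-PCR data normalised to an internal standard such as β-actin is the comparative Ct (2^−ΔΔCt) method; the sketch below assumes that method and equal amplification efficiencies for all assays. The function names are hypothetical.

```python
import math


def relative_to_reference(ct_target: float, ct_reference: float, base: float = 2.0) -> float:
    """Amount of a target relative to the internal standard (e.g. beta-actin).

    `base` is the assumed per-cycle amplification factor (2.0 = 100 % efficiency),
    taken to be identical for the target and reference assays.
    """
    return base ** -(ct_target - ct_reference)


def junction_fold_change(ct_distal: float, ct_actin_distal: float,
                         ct_proximal: float, ct_actin_proximal: float,
                         base: float = 2.0) -> float:
    """Fold preference for the distal over the proximal trans-splice junction (2^-ddCt)."""
    return (relative_to_reference(ct_distal, ct_actin_distal, base)
            / relative_to_reference(ct_proximal, ct_actin_proximal, base))


def ddct_for_fold(fold: float, base: float = 2.0) -> float:
    """Delta-delta-Ct (in cycles) that corresponds to a given fold change."""
    return math.log(fold, base)
```

Under these assumptions, the reported 2.2-fold and 7.2-fold preferences correspond to ΔΔCt values of about 1.14 and 2.85 cycles, respectively (log2 2.2 and log2 7.2). If the junction-specific primer pairs amplify with different efficiencies, an efficiency-corrected ratio would be needed instead.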
Notably, the alternative cis-splice partner sites D2 and A1 of the favoured trans-splice target sites A2 and D1 were weaker than the partner sites D1 and A2 of the disfavoured trans-splice target sites A1 and D2, respectively. Using qualitative agarose gel electrophoresis and by monitoring the melting profiles of the rtRT-PCR products, we excluded that longer products such as E1*/E2/E3 or E1/E2/E3* were generated and co-detected during the quantification of distal E1*/E3 or E1/E3* exon joining. These data support our model of ss selection during RNA trans-splicing, indicating that (i) stronger target splice sites are favoured over weaker sites, (ii) in particular when the alternative cis-splice partner site of the target is weaker than the ss of the trans-splicing RNA.
The chimeric RNAs generated in the course of heterologous viral RNA trans-splicing either (1) did not lead to the translation of fusion proteins, because (1a) multiple stop codons only allow the formation of a truncated cellular protein, as in the case of the adenoviral trans-splice to the v-abl proto-oncogene homolog, or because (1b) the translational start site is lost upon trans-splicing, as in the case of the Armadillo protein with the adenoviral RNA, or (2) yielded fusion proteins of unknown function. Therefore, the highly transforming 100 kDa SV40 super T antigen remains the only example of a functional protein whose generation is triggered by homologous viral RNA trans-splicing.
Fig. 1. Schematic illustration of a trans-splicing reaction between a viral pre/mRNA and an interacting pre-mRNA molecule. a, A strong viral cryptic or regular 3′ acceptor ss trans-splices with a rather strong regular splice donor (SDr), cryptic exonic splice donor (SDe), or cryptic intronic splice donor (SDi). Splice sites that could compete with the trans-splicing sites for cis-splicing are mostly weaker or non-existent. b, In the case of the HIV-1 nef RNA, one of two possible strong cryptic splice donor sites (at aa positions 66 and 74) trans-splices with a splice acceptor site of a cellular target. a and b: One or more potential binding domains (BDs) of 11 nts or longer can support the trans-splice reaction. Notably, to allow antisense binding via the BDs, one of the two strands has to partly turn around, as indicated by the crossing arrows. WD: weak donor; SD: strong donor; WA: weak acceptor; SA: strong acceptor; Cry: cryptic ss; *: ESE, exonic splice enhancer.
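Whether a chimeric RNA can encode a fusion protein, as in outcomes (1a), (1b), and (2) described above, can be checked by following the reading frame across the trans-splice junction. The sketch below is a deliberately crude classifier: it assumes that translation starts at the first AUG of the chimera (ignoring Kozak context and upstream open reading frames), and the 30-codon threshold for calling a fusion ORF is a hypothetical parameter.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def classify_chimera(five_prime: str, three_prime: str, min_fused_codons: int = 30) -> str:
    """Crude coding outcome of an RNA joined at a trans-splice junction.

    five_prime: sequence upstream of the junction (5' trans-splice partner)
    three_prime: sequence downstream of the junction (3' trans-splice partner)
    Assumption: translation starts at the first AUG of the chimera.
    """
    chimera = (five_prime + three_prime).upper().replace("T", "U")
    junction = len(five_prime)
    start = chimera.find("AUG")
    if start == -1:
        return "no start codon"
    if start >= junction:
        # The 5' partner contributes no start codon, cf. outcome (1b).
        return "start codon only in 3' partner"
    orf_end = len(chimera)  # the ORF runs off the end unless a stop codon is found
    for i in range(start, len(chimera) - 2, 3):
        if chimera[i:i + 3] in STOP_CODONS:
            orf_end = i
            break
    fused_codons = max(0, (orf_end - junction) // 3)
    if fused_codons == 0:
        # In-frame stop at or before the junction: only the upstream protein part is made.
        return "no codons past junction"
    if fused_codons < min_fused_codons:
        # Early in-frame stop after the junction yields a truncated protein, cf. outcome (1a).
        return "truncated after junction"
    return "fusion ORF"
```

A chimera classified as "fusion ORF" corresponds to outcome (2); whether such a protein is functional cannot be inferred from the sequence alone.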
Thus, our analyses suggest that the reported cases of heterologous viral RNA trans-splicing, unlike homologous SV40 trans-splicing, reflect chance events which may occur in the course of active viral RNA transcription but which are meaningless for the involved viruses. On the other hand, both our analyses and the frequency of reported viral trans-splicing events emphasise the high competence of distinct viral sequences for RNA trans-splicing. This assumption is further supported by trans-splicing events detected between SV40 RNA and sequences of the murine polyomavirus or adenoviruses (J Eul & V Patzel, unpublished data). This may indicate that HIV-1, adenovirus type 5, and possibly other viruses use the mechanism of trans-splicing in the same way as SV40 to diversify their sequences and gene products. This raises the question why such trans-splicing has not yet been observed for these viruses, although extensive cis-splicing was reported, e.g. among the transcripts of HIV-1 [11]. A plausible explanation might be that the study designs were not suitable to detect trans-splicing or that researchers have not considered potential trans-splice events. Targeted investigations of viral trans-splicing might answer this question in the future. RNA trans-splicing is increasingly considered for therapeutic applications, and researchers might exploit the features of highly efficient viral RNA trans-splicing for the design of therapeutic trans-splicing RNAs.

RNA sequences
Nucleotide sequences of the analysed RNAs were obtained from NCBI GenBank.

Analysis of splice site strengths
The numbers of consecutive base pairs and hydrogen bonds (hb) that can be formed between splice donor sites and the spliceosomal U1 snRNA, as well as matches, base pairs, and hydrogen bonds between the U2 snRNA and the branch point consensus sequence, and sequence matches with the poly U/C stretch consensus sequence, were counted manually. For the count of hydrogen bonds, canonical Watson-Crick (G ≡ C = 3 hb; A = U = 2 hb) as well as wobble base pairs (G · U = 2 hb) were considered. Splice site strength was further investigated using the software Alternative Splice Site Predictor (ASSP), http://wangcomputing.com/assp/overview.html [7]. The software provides ab initio recognition of probable splice sites based on the position and strength of the ss, length, GC content, regulatory elements, and consensus polypyrimidine tract and branch point regions. For the analysis of possible trans-splicing between the viruses and the targets, the default cut-off values of the software (2.2 for acceptors and 4.5 for donors) were applied to the full-length RNA sequences. The tabulated output, listing putative splice sites with their position, score, GC content, and confidence, was compared with the viral sequences involved in trans-splicing. As a second software, the Berkeley Drosophila Genome Project Splice Site Prediction (BDGP SSP), a neural network based program, http://www.fruitfly.org/seq_tools/splice.html [8], was used to predict the strength of splice signals at each intron/exon boundary using the default cut-off value of 0.4 (range: 0-1). The software predicts the probability of splice site usage based on the consensus motifs involved in splicing.
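The manual donor count described above can be sketched as follows. The sketch assumes the textbook 5′ end of human U1 snRNA, whose nucleotides 3 to 11 (5′-ACUUACCUG-3′) pair antiparallel with the 9-nt 5′ss spanning exon positions −3 to −1 and intron positions +1 to +6. Because the text does not specify whether "consecutive base pairs" means the total number of pairs or the longest uninterrupted run, the hypothetical function `u1_pairing` returns both, together with the hydrogen-bond count using the values given above (G≡C = 3, A=U = 2, G·U = 2).

```python
# Assumed 5'-end region of human U1 snRNA (nts 3-11, written 5'->3') that pairs
# with the 9-nt 5' splice site; verify against the U1 snRNA sequence you use.
U1_5SS_REGION = "ACUUACCUG"

# Hydrogen bonds per base pair: Watson-Crick plus G.U wobble, as in the text.
HB = {("G", "C"): 3, ("C", "G"): 3, ("A", "U"): 2, ("U", "A"): 2,
      ("G", "U"): 2, ("U", "G"): 2}


def u1_pairing(donor9: str) -> tuple[int, int, int]:
    """Return (paired nts, longest consecutive run, hydrogen bonds) for a 9-nt donor.

    donor9: exon positions -3..-1 followed by intron positions +1..+6, 5'->3'.
    """
    donor9 = donor9.upper().replace("T", "U")
    if len(donor9) != 9:
        raise ValueError("expected 9 nts: exon -3..-1 plus intron +1..+6")
    # Antiparallel pairing: the donor read 5'->3' faces the U1 region read 3'->5'.
    partner = U1_5SS_REGION[::-1]
    paired = longest = run = hbonds = 0
    for d, u in zip(donor9, partner):
        bonds = HB.get((d, u), 0)
        if bonds:
            paired += 1
            run += 1
            hbonds += bonds
            longest = max(longest, run)
        else:
            run = 0  # a mismatch interrupts the consecutive run
    return paired, longest, hbonds
```

For the consensus donor CAG|GUAAGU, all nine positions pair, giving 22 hydrogen bonds; a G at intron position +3 (CAG|GUGAGU) pairs by wobble and yields the same count under these hydrogen-bond values.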

Analysis of dimerization domains
Complementarity between trans-splicing RNA sequences was investigated by sequence alignments (considering one sequence as reverse complement) using the BLASTn software (NCBI). Possible wobble base pairing was considered manually.
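As an alternative to BLASTn alignment with manual wobble inspection, complementary stretches can be enumerated exactly with a small dynamic-programming scan over all antiparallel diagonals. The sketch below is not the procedure used in the text; the function name is hypothetical, G·U wobble pairs are allowed by default, and its O(L_A × L_B) running time is only practical for sequences of a few thousand nucleotides in pure Python.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def find_binding_domains(a: str, b: str, min_len: int = 10,
                         wobble: bool = True) -> list[tuple[int, int, int]]:
    """Maximal antiparallel complementary stretches between RNAs a and b.

    Returns (start_a, start_b, length): a[start_a:start_a + length] read 5'->3'
    pairs with b[start_b:start_b + length] read 3'->5'. Positions are 0-based.
    """
    a = a.upper().replace("T", "U")
    b = b.upper().replace("T", "U")
    allowed = (WATSON_CRICK | WOBBLE) if wobble else WATSON_CRICK
    la, lb = len(a), len(b)
    prev = [0] * (lb + 1)  # run lengths for row i-1; prev[lb] is a zero sentinel
    hits = []
    for i in range(la):
        cur = [0] * (lb + 1)
        for j in range(lb):
            if (a[i], b[j]) in allowed:
                # a[i] pairs with b[j]: extend the run in which a[i-1] paired with b[j+1].
                cur[j] = prev[j + 1] + 1
        for j in range(lb):
            run = cur[j]
            if run >= min_len:
                # Report only maximal runs: a[i+1] must not also pair with b[j-1].
                extendable = i + 1 < la and j > 0 and (a[i + 1], b[j - 1]) in allowed
                if not extendable:
                    hits.append((i - run + 1, j, run))
        prev = cur
    return hits
```

For example, the 11mer listed for the C. aethiops exons, 5′-GAAGUGUUAGA-3′, and its partner 3′-CUUCACAAUCU-5′ (written 5′→3′ as UCUAACACUUC) are reported as a single 11-nt BD starting at position 0 of both strings.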