Occurrence, function and evolutionary origins of ‘2A-like’ sequences in virus genomes

2A is an oligopeptide sequence mediating a ribosome ‘skipping’ effect, producing an apparent ‘cleavage’ of polyproteins. First identified and characterized in picornaviruses, ‘2A-like’ sequences are found in other mammalian viruses and a wide range of insect viruses. Databases were analysed using a motif conserved amongst 2A/2A-like sequences. The newly identified 2A-like sequences (30 aa) were inserted into a reporter polyprotein to determine their cleavage activity. Our analyses showed that these sequences fall into two categories. The majority mediated very high (essentially complete) cleavage into separate proteins, whereas a few mediated cleavage with lower efficiency, generating appreciable levels of the uncleaved form. Phylogenetic analyses of 2A-like sequences and RNA-dependent RNA polymerases (RdRps) indicated multiple independent acquisitions of these sequences at different stages during virus evolution. Within a virus family, 2A sequences are (probably) homologous, but diverge due to other evolutionary pressures. Amongst different families, however, 2A/2A-like sequences appear to be homoplasic.

In the case of foot-and-mouth disease virus (FMDV), the 2A oligopeptide is post-translationally 'trimmed' from the C terminus of the upstream protein 1D by the virus-encoded 3C proteinase (3Cpro), 'delineating' 2A as just 18 aa. Residues that were not critical, but enhanced the cleavage activity, mapped to a somewhat longer sequence extending ~30 aa upstream of the 2A/2B cleavage site (Donnelly et al., 1997, 2001b). This length is consistent with our model of the cleavage mechanism, in which 2A is proposed to interact with the exit tunnel of the ribosome to conformationally restrict the peptidyl-tRNA ester linkage, precluding it from nucleophilic attack by prolyl-tRNA in the A site of the ribosome (Ryan et al., 1999; Donnelly et al., 2001a).
A motif comprising the seven C-terminal residues of 2A and the N-terminal proline of protein 2B (the final P) is conserved (-DxExNPGP-, where 'x' = any amino acid). With this motif, databases were analysed using PATTINPROT (Pôle BioInformatique Lyonnais) and PSI-BLAST (NCBI; http://www.ncbi.nlm.nih.gov). The positions of 2A/2A-like sequences (2As) in a number of RNA viruses are shown in Fig. 1(b), their sequences are shown in Fig. 2, and GenBank accession numbers are listed in Supplementary Table S1 (available in JGV Online).
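As a rough illustration of what a motif search looks for, the consensus can be written as a regular expression and scanned across a protein sequence. This is a local sketch using Python's `re` module, not PATTINPROT's pattern syntax or a PSI-BLAST search; the function name `find_2a_sites` and the toy sequence are hypothetical. It also returns the ~30 aa window upstream of the putative 2A/2B junction, the length of 2A used in the reporter constructs.

```python
import re

# Consensus -DxExNPGP- written as a regex; '.' plays the role of 'x'.
# The zero-width lookahead lets overlapping occurrences be reported.
MOTIF_2A = re.compile(r"(?=(D.E.NPGP))")


def find_2a_sites(protein: str, window: int = 30) -> list[tuple[int, str]]:
    """Scan a protein sequence for the 2A/2B consensus motif (hypothetical helper).

    Returns (index, upstream) pairs where `index` is the 0-based position of
    the proline that would begin 2B (the residue after the G|P junction) and
    `upstream` is up to `window` residues ending at the motif glycine.
    """
    sites = []
    for match in MOTIF_2A.finditer(protein.upper()):
        # Motif offsets: D=0, x=1, E=2, x=3, N=4, P=5, G=6, P=7
        index = match.start() + 7
        sites.append((index, protein[max(0, index - window):index]))
    return sites


if __name__ == "__main__":
    # Toy sequence for illustration only, not a real viral polyprotein.
    toy = "A" * 10 + "LLKLAG" + "DVESNPGP" + "FAAA"
    print(find_2a_sites(toy))
```

A plain pattern match like this only finds candidates; as the following paragraphs show, whether a hit actually mediates cleavage has to be tested experimentally.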
To study the activity of these 2As, plasmids were constructed to encode a single ORF consisting of green fluorescent protein (GFP), a longer (30 aa) version of 2A and β-glucuronidase (GUS; Donnelly et al., 2001b). Oligonucleotide primers used are listed in Supplementary Table S2 (available in JGV Online). The cleavage activity of these new 2As was determined using a rabbit reticulocyte lysate in vitro translation system (TNT T7 Quick Coupled Transcription/Translation System; Promega). Proteins synthesized de novo were labelled with [35S]methionine (5 μCi, 185 kBq) and reactions were incubated at 30 °C for 90 min. Translation products were analysed by 10 % SDS-PAGE (Fig. 1c) and the distribution of radiolabel was quantified by phosphorimaging. 'Cleavage' activities were calculated as described previously (Donnelly et al., 2001b) and are the mean of three independent translation reactions.
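Because the label is [35S]methionine, band intensities scale with each product's methionine content, so some normalization is needed before products can be compared on a molar basis. The sketch below shows one plausible way to do this; it is an assumption for illustration, not necessarily the exact calculation of Donnelly et al. (2001b). The methionine counts in `EXAMPLE_MET` are hypothetical, and averaging the [GFP-2A] and GUS signals is a labelled simplification.

```python
from statistics import mean

# Hypothetical methionine counts per product; real values depend on the
# exact GFP, 2A and GUS sequences in each construct.
EXAMPLE_MET = {"GFP-2A-GUS": 12, "GFP-2A": 2, "GUS": 10}


def molar_signal(counts: float, n_met: int) -> float:
    """Scale band counts by Met content so values are proportional to moles."""
    if n_met <= 0:
        raise ValueError("a [35S]Met-labelled product needs at least one Met")
    return counts / n_met


def cleavage_percent(uncleaved: float, gfp_2a: float, gus: float,
                     met: dict[str, int]) -> float:
    """Percentage 'cleavage' for one translation reaction (sketch)."""
    u = molar_signal(uncleaved, met["GFP-2A-GUS"])
    # Assumption: each cleavage yields one [GFP-2A] and one GUS, so the two
    # normalised signals are averaged to estimate the number of cleavages.
    c = (molar_signal(gfp_2a, met["GFP-2A"]) + molar_signal(gus, met["GUS"])) / 2
    return 100.0 * c / (c + u)


def mean_cleavage(reactions, met=EXAMPLE_MET) -> float:
    """Mean over independent reactions, each given as (uncleaved, gfp_2a, gus)."""
    return mean(cleavage_percent(*r, met) for r in reactions)
```

In practice the [GFP-2A] and GUS signals are often not equimolar, so a real analysis may weight them differently; the averaging step is the main design choice to revisit.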
In these in vitro systems, we typically observed three products: (i) low levels of uncleaved [GFP-2A-GUS] product, (ii) GUS and (iii) [GFP-2A] cleavage products. However, in picornavirus-infected cells, no proteins spanning the 2A/2B cleavage site were detected (data not shown). Here, the longer versions of 2A more closely reflected the cleavage activities observed in vivo [~99 % with sequences from the family Picornaviridae such as FMDV, equine rhinitis B virus 1 (ERBV-1), SAF-V and LV; Fig. 1(c)]. A variant of the consensus motif (-DVESNLGP-) reported in FMDV was found to be inactive (data not shown), consistent with analyses of site-directed mutants at this position (Donnelly et al., 2001b). Interestingly, a rare substitution within this region (Ser→Pro; -DVEPNPGP-; Oem et al., 2004; Carrillo et al., 2005) cleaved highly efficiently (~99 %; data not shown).
In type C rotaviruses, 2A links the ssRNA-binding protein NSP3 to a dsRNA-binding protein (dsRBP). Rotavirus mRNAs do not bear poly(A) tails, and NSP3 circularizes rotaviral mRNAs (Piron et al., 1999; Jayaram et al., 2004). The dsRBPs downstream of 2A sequester viral dsRNA (>11–16 nt, without apparent sequence specificity) from the cellular sensors of dsRNA, counteracting activation of the cellular antiviral interferon system (Langland et al., 1994). When segment 6 of porcine type C rotavirus was expressed, both in vitro and in COS-1 cells, three proteins were observed, as in our in vitro analyses: a small amount of full-length [NSP3-2A-dsRBP] product and nearly equimolar amounts of the [NSP3-2A] and dsRBP cleavage products (Langland et al., 1994). Furthermore, [NSP3-2A-dsRBP] was detected in infected cells and was shown to bind dsRNA. Notably, NSP3 forms dimers, which may add a further level of complexity, since NSP3 could form heterodimers with [NSP3-2A-dsRBP]. The incomplete cleavage produced by 2A allows type C rotaviruses to generate a complex array of products at relatively high levels. No other translational control mechanism can produce this outcome.
Members of the family Totiviridae are non-segmented dsRNA viruses. The N-terminal domain of the IMNV polyprotein ORF1 encodes non-structural proteins with two 2As (Fig. 1b) that are highly active (~99 %; Fig. 1c). Interestingly, although segment 6 of type C rotavirus encodes a different protein from segment 5 of ADRV-N (NSP3 and NSP1, respectively), the protein downstream of 2A, a dsRBP, is the same in both cases. This dsRBP forms the N terminus of IMNV ORF1, followed by 2A1. In this case, therefore, the dsRBP is 'cleaved' from ORF1 as a [dsRBP-2A] protein.
The cleavage activity of 2As not completely matching the -DxExNPGP- motif was also determined. The iflaviruses VDV-1, KV and DWV contain the motif -MDNPNPGP- in the N-terminal region of their polyproteins. The VDV-1 2A was chosen for analysis, and no cleavage activity was observed (data not shown). The -DLESNPPP- sequence of the unclassified picorna-like virus APV was modified (Pro→Gly at the seventh position) to resemble the consensus sequence more closely (-DLESNPGP-). No cleavage activity was observed with either form of this sequence (data not shown).
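The deviations discussed above can be tabulated by comparing each sequence with the consensus at its fixed (non-'x') positions. This hypothetical helper, `motif_mismatches`, simply lists where each octet differs; it is not a predictor of activity. The examples are the motif variants quoted in the text, and they show why: the modified APV sequence matches the consensus perfectly yet was inactive, while a single change at a fixed position (-DVESNLGP-) abolished FMDV activity.

```python
CONSENSUS = "DxExNPGP"  # 'x' = any amino acid


def motif_mismatches(octet: str) -> list[tuple[int, str, str]]:
    """List (position, expected, observed) at the fixed positions of the motif.

    Positions are 1-based; wildcard 'x' positions never count as mismatches.
    """
    if len(octet) != len(CONSENSUS):
        raise ValueError("expected an 8-residue motif")
    return [(i, want, got)
            for i, (want, got) in enumerate(zip(CONSENSUS, octet.upper()), start=1)
            if want != "x" and want != got]


# Motif variants quoted in the text (octets only, not full 2A sequences).
for name, octet in [("FMDV variant", "DVESNLGP"), ("FMDV Ser->Pro", "DVEPNPGP"),
                    ("VDV-1", "MDNPNPGP"), ("APV", "DLESNPPP"),
                    ("APV Pro->Gly", "DLESNPGP")]:
    print(name, motif_mismatches(octet))
```

Zero mismatches is therefore at most a screening criterion; residues upstream of the motif, within the ~30 aa 2A tract, evidently matter as well.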
Analyses of 2A-mediated cleavage suggest that it is of broadly two types. In most cases, very low levels of protein spanning the 2A tract are observed in our in vitro translation analyses. However, with CrPV, PrV 2A2 and type C rotaviruses, appreciable levels (~10 %) of uncleaved polyprotein are observed in vitro. Currently, no data are available from CrPV- or PrV-infected cells.
Phylogenetic analyses of viruses containing 2As were performed to determine their evolutionary relationships, by aligning the RNA-dependent RNA polymerases (RdRps) of 40 members of the picornavirus 'supergroup' and 19 other RNA viruses using CLUSTAL W (Thompson et al., 1994). All viruses with a functional 2A sequence were included in this analysis. Related viruses (without 2A) from the same families were included to produce a comprehensive phylogenetic tree (Fig. 3). RdRp sequences from the family Tetraviridae were, of necessity, excluded, since major domains of the tetravirus RdRp are 'shuffled' in comparison with other RNA viruses (Gorbalenya et al., 2002) and could not be aligned. Optimal alignments were obtained with the gap opening penalty set to 3 and the gap extension penalty set to 0.1. Phylogenetic trees were created using CLUSTAL_X 1.81 (neighbour-joining algorithm, Kimura substitution model) with the 'exclude positions with gaps' and 'correct for multiple substitutions' options, and were visualized with the NJPlot module. Phylogenetic relationships of the viruses were verified against previously published data. This analysis showed four major clusters: two of segmented dsRNA reoviruses (cypoviruses and rotaviruses), one of non-segmented dsRNA totiviruses and one comprising the RdRps of all positive-strand ssRNA viruses, with separate branches formed by picornaviruses, iflaviruses and dicistroviruses (Fig. 3).
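The two tree-building options named above can be sketched directly: drop every alignment column that contains a gap in any sequence, then convert the observed proportion of differing residues, p, into a corrected distance using Kimura's empirical formula for proteins, d = -ln(1 - p - 0.2p²). We believe this is the protein correction CLUSTAL applies, but the code below is an independent illustration, not CLUSTAL's implementation; the helper names and the toy alignment are hypothetical, and the neighbour-joining step that would consume the matrix is not shown.

```python
import math
from itertools import combinations


def drop_gap_columns(aln: dict[str, str]) -> dict[str, str]:
    """Remove every alignment column that contains a gap in any sequence."""
    seqs = list(aln.values())
    keep = [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in aln.items()}


def kimura_protein_distance(a: str, b: str) -> float:
    """Kimura's empirical correction for multiple amino acid substitutions."""
    if not a:
        raise ValueError("no ungapped columns left to compare")
    p = sum(x != y for x, y in zip(a, b)) / len(a)
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        # Very divergent pairs fall outside the formula; not handled here.
        raise ValueError(f"p = {p:.3f} is too large for this approximation")
    return -math.log(arg)


def distance_matrix(aln: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise corrected distances, the input a neighbour-joining step would use."""
    trimmed = drop_gap_columns(aln)
    names = list(trimmed)
    dist = {(n, n): 0.0 for n in names}
    for x, y in combinations(names, 2):
        dist[(x, y)] = dist[(y, x)] = kimura_protein_distance(trimmed[x], trimmed[y])
    return dist


if __name__ == "__main__":
    # Toy, already-aligned sequences (hypothetical, not real RdRp fragments).
    toy = {"A": "MKV-LLE", "B": "MKVALLD", "C": "MRVA-LE"}
    for pair, d in sorted(distance_matrix(toy).items()):
        print(pair, round(d, 4))
```

Excluding gapped columns globally keeps every pairwise distance on the same set of positions, at the cost of discarding information from regions where only some sequences have indels.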
2As were aligned using CLUSTAL_X 1.81 (Thompson et al., 1997). Since 2A functions co-translationally (within the ribosome exit tunnel), we aligned these sequences such that no gaps were introduced by the algorithm (gap opening penalty = 50; Fig. 2a). 2A sequences from related viruses do not necessarily form clusters corresponding to those obtained using RdRp sequences, but are distributed throughout various branches of the tree (Fig. 2b–e).
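Setting a very high gap opening penalty effectively forces an ungapped alignment. For 2A tracts, a simple way to imitate that outcome is to anchor all sequences at their C terminus (the 2A/2B junction) and compare residues position by position. The sketch below assumes this anchoring, which is our illustrative choice rather than the CLUSTAL_X procedure; the helper names and toy sequences are hypothetical.

```python
def anchor_at_cleavage_site(seqs: dict[str, str]) -> dict[str, str]:
    """Right-justify 2A sequences so their C-terminal glycines line up.

    Only N-terminal padding is added, so no internal gaps are introduced.
    """
    width = max(len(s) for s in seqs.values())
    return {name: s.rjust(width, "-") for name, s in seqs.items()}


def percent_identity(a: str, b: str) -> float:
    """Identity over columns where neither sequence is padding."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("sequences share no aligned positions")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy 2A-like fragments ending at the motif glycine (hypothetical).
    aligned = anchor_at_cleavage_site({"a": "LLKLAGDVESNPG", "b": "KLAGDIESNPG"})
    print(aligned, round(percent_identity(aligned["a"], aligned["b"]), 1))
```

Anchoring on the junction reflects the co-translational argument in the text: residues are compared at equal distances from the peptidyl-transferase centre, which gaps would disrupt.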
Capsid and replication proteins are separated in picornaviruses by three means: 3Cpro, 2Apro and the type of 2A that forms the subject of this paper. This region appears to be highly mutable: a recombinational hot-spot in entero- and aphthoviruses (Lukashev, 2005; Heath et al., 2006). Since the latter form of 2A is present in many genera, either 2A has been acquired/lost on multiple occasions, or 2A was acquired at an early stage of evolution and subsequently replaced with a proteinase in the entero-/rhinovirus lineage (Fig. 3).
Whilst 2A appears to have been acquired at a relatively early stage in picornavirus evolution, the reverse seems to be the case in the dicistroviruses, assuming a single acquisition event in the branch comprising DCV, CrPV, Solenopsis invicta virus 1 (SINV-1), IAPV, KBV and ABPV. It appears that SINV-1 has lost 2A; indeed, alignments show that SINV-1 has a large deletion at the N terminus of ORF1 (Valles et al., 2004).
Similarly, acquisition of 2A appears to have occurred at a relatively late stage in the evolution of members of the family Reoviridae. In cypoviruses, only the CPV-1 and -18 lineages possess 2As, while closely related viruses do not (e.g. …). In rotaviruses, a [2A-like/dsRBP] 'module' has been acquired by different RNA segments/proteins, diverging into two forms: low cleavage (type C rotaviruses) and high cleavage (ADRV-N). Similarly, among members of the family Totiviridae, only IMNV possesses a [2A-like/dsRBP] module (plus another downstream 2A).
A more complex pattern is observed in the iflaviruses. Here, analyses of both the 2A and polymerase sequences show that IFV is much more distantly related to PnPV/EoPV (Figs 2c and 3). Two explanations seem equally plausible: (i) an early acquisition accompanied by divergence of 2A (between IFV and PnPV/EoPV), acquisition of a second 2A in PnPV/EoPV and loss of 2A from the other lineages; or (ii) two independent acquisitions of 2A, one in IFV and another in the PnPV/EoPV lineage.
Our analysis suggests that 2As emerged independently at least six times amongst the RNA viruses analysed. Whilst some 2A sequences are clearly homologous, our data also strongly indicate homoplasy: a common function arising from multiple, independent evolutionary origins. This is not surprising given their short length and the location of these sequences in known recombinational hot-spots.

[Figure caption] Polymerase domains were aligned by using CLUSTAL_X and phylogenetic trees visualized by using NJPlot. Virus groups are indicated (shaded areas) with those viruses possessing 2As indicated in boxes. Virus names and sequence GenBank accession numbers are given in Supplementary Table S1 (available in JGV Online).