Transcriptional slippage in the positive-sense RNA virus family Potyviridae

The family Potyviridae encompasses ∼30% of plant viruses and is responsible for significant economic losses worldwide. Recently, a small overlapping coding sequence, termed pipo, was found to be conserved in the genomes of all potyvirids. PIPO is expressed as part of a frameshift protein, P3N-PIPO, which is essential for virus cell-to-cell movement. However, the frameshift expression mechanism has hitherto remained unknown. Here, we demonstrate that transcriptional slippage, specific to the viral RNA polymerase, results in a population of transcripts with an additional “A” inserted within a highly conserved GAAAAAA sequence, thus enabling expression of P3N-PIPO. The slippage efficiency is ∼2% in Turnip mosaic virus and slippage is inhibited by mutations in the GAAAAAA sequence. While utilization of transcriptional slippage is well known in negative-sense RNA viruses such as Ebola, mumps and measles, to our knowledge this is the first report of its widespread utilization for gene expression in positive-sense RNA viruses.


Viruses and plasmids
The following viruses and viral constructs were used: TuMV-GFP based on isolate UK1 (GenBank accession EF028235; [1]); BCMNV isolate PV 0413 (GenBank accession HG792063), and BCMV isolate PV 0915 (GenBank accession HG792064). TuMV infections used Nicotiana benthamiana as the host and BCMNV/BCMV infections used Phaseolus vulgaris as the host. Nucleotide and amino acid coordinates for constructs and primers are relative to the RNA sequences of these accessions. Mutagenesis of constructs was carried out using overlap extension PCR with mutagenesis primers and standard cloning methods. To enable agroinfiltration, the 35S-TuMV-GFP-NosT cassette was cut out from its original backbone with SmaI-EheI and ligated into blunted KpnI-SacI sites in the vector pGreenII. A V5 tag sequence was inserted near the 5' end of the region encoding P3 to produce the protein sequence ...VG|GTGKPIPNPLLGLDSTGTEW... (V5 tag sequence in bold, "|" indicates the P3 N-terminal processing site). Mutations made at the GAA_AAA_A conserved site are described in Fig 3A. Both the V5 tag and conserved site mutations were introduced into the vector by exchanging the KpnI-SnaBI fragment. To generate the ∆GDD mutant, nucleotides 9014-9022 (encoding the GDD motif in the RdRp) were deleted by mutagenesis PCR and the fragment introduced back to the plasmid using AvrII-SalI restriction sites.

Inoculation
Nicotiana benthamiana plants were grown under a 16 h photoperiod at 22 °C and 60% humidity. Three- to four-week-old plants were inoculated biolistically using the PDS-1000/He system (Bio-Rad) with 0.8-1.5 µm gold particles (Alfa Aesar) coated with plasmids according to the manufacturer's instructions. For cell-to-cell movement analysis, fully expanded younger leaves roughly 3-4 cm in diameter were removed from plants, biolistically inoculated as described above, and kept in a closed container to prevent the tissue from drying out prior to further analysis.

Agroinfiltration
Agrobacterium tumefaciens GV3101 containing the desired constructs was grown in LB medium at 30 °C. Bacteria were pelleted by centrifugation at 2,500 g at 4 °C for 15 min, resuspended in 10 mM MgCl2 and pelleted again. Bacterial cells were then suspended in 0.2 mM acetosyringone in 10 mM MgCl2, incubated on ice for 30 min, pelleted and resuspended in the same solution. The OD600 of bacterial suspensions was adjusted to 1.0 for agroinfiltration.

Western analysis
For protein detection, leaf disks were frozen in liquid nitrogen and homogenized. Ten volumes (v/w) of -20 °C 100% acetone were added to powdered tissue, mixed (optionally incubated at -20 °C) and centrifuged at 18,000 g for 15 min at 4 °C. The pellets were washed 3 times with the initial volume of -20 °C 100% acetone, centrifuging between each step, and dried under vacuum. The pellets were reconstituted in SDS loading buffer, denatured by boiling for 5 min and analysed on 12% NuPAGE bis-tris gels. Proteins were blotted to nitrocellulose membrane and blocked with 5% non-fat milk for 1 h in PBS. Membranes were then probed in PBStw with anti-V5 (Life Technologies) or anti-CP (DSMZ, AS-0132) antibodies for 1 h and washed three times with PBStw, followed by an incubation of 1 h with IRdye680- or IRdye800-conjugated secondary antibodies (Licor). The blots were washed twice in PBStw before visualisation with an Odyssey infrared scanner (Licor).

Reverse transcription
Leaf disks were frozen in liquid nitrogen, homogenised and total RNA was extracted as described by . Reverse transcription was carried out on 2 µg of RNA using SuperScript III (Life Technologies) at 50 °C according to the manufacturer's instructions. Negative-strand-specific RT was performed as described by Purcell et al. [3]. Briefly, total RNA was treated with DNase and checked for DNA contamination using PCR. Reverse transcription was done as described above using a tagged primer (ggcagtatcgtgaattcgatgcCATCAGGGTGGACAGCAACG, tag in lowercase; TuMV-GFP nt 3556-3576 in uppercase). Excess primer was removed by adding 10 U of exonuclease I (NEB) and incubating at 37 °C for 30 min, followed by inactivation at 70 °C for 15 min. PCR was performed using 0.5 µl of RT reaction as template for 27 cycles with primers for the tag (ggcagtatcgtgaattcgatgc) and virus (ATGTGATTCGCCTCGGCAGT; complementary to TuMV-GFP nt 4186-4206) using Phusion DNA polymerase (NEB); cycling conditions were: denaturation 20 s at 98 °C, annealing 20 s at 69 °C and extension 40 s at 72 °C. For detecting the positive strand, the latter primer was used in reverse transcription, accompanied by an untagged primer (CATCAGGGTGGACAGCAACG, TuMV-GFP 3556-3576) in PCR.

Virus purification
Virions were purified as described by Baratova et al. [4] with modifications, from systemically infected leaves harvested 3 to 4 weeks after inoculation. Leaves were ground to powder in liquid nitrogen and 0.5 M potassium phosphate buffer, pH 7.5, containing 0.01 M DIECA, 0.005 M EDTA, and 1% (w/v) sodium sulfite (2 ml of buffer per g of leaf tissue) was added. Debris was pelleted at 8,000 g at 8 °C for 20 min and the supernatant filtered through cloth. The supernatant was stirred for 1 h at 4 °C with 1% (v/v) Triton X-100, then PEG 6000 and NaCl were added to final concentrations of 5% and 1.2%, respectively. The mixture was stirred at 4 °C for 2 h. The precipitate was sedimented at 8,000 g at 8 °C for 20 min and resuspended in 0.05 M potassium phosphate buffer, pH 7.5. Cleared supernatant was layered onto a 20% (w/v) sucrose cushion and ultracentrifuged at 150,000 g at 5 °C for 2.5 h. The pellet was then resuspended in 0.05 M potassium phosphate buffer, pH 7.5, and the ultracentrifugation step with cushion was repeated. For RNA extraction, virions were incubated with 1% (w/v) SDS at room temperature for 15 min, and RNA was extracted by standard phenol-chloroform extraction followed by sodium acetate precipitation.

High-throughput sequencing
Targeted high-throughput sequencing was performed on viral sequences containing the conserved site, or host controls with an identical GA6 sequence. Recent N. benthamiana transcriptome datasets [7] were searched for transcripts containing GA6 heptanucleotides and these were verified against other datasets (http://solgenomics.net); accession numbers below are according to Nakasugi et al. [7]. Two transcript sequences were selected: one encoding a predicted ubiquitin-conjugating enzyme E2 36 (Nbv5tr6378450), and the second encoding a predicted eukaryotic translation initiation factor 5B (Nbv5tr6430098). The latter had two separate GA6 sites. PAGE-purified primers containing the sequencing adapter and target sequence were purchased from Integrated DNA Technologies. RT-PCR with primers containing the sequencing adapters was used to produce amplicons with the following target sequences (omitting sequencing adapters and flanking sequence included in primers): TuMV: TCCATTTTGGAAAAAAGTTA (TuMV-GFP 3824-3843, total amplified fragment 3809-3859); BCMV: AGATGGAAAAAATCTATA (BCMNV 3281-3298, total fragment 3264-3315); BCMNV: TGTGTCGGAAAAAATTTATGCAAA (BCMV 2942-2961, total fragment 2922-2978); Ubi E2: AAAAGAAAAAGAAAAAAGAT (Nbv5tr6378450 569-588, total fragment 554-607); eIF5B-1: CCTTTGGTAAGAAAAAAGGCAAGAA (Nbv5tr6430098 394-418, total fragment 379-433); and eIF5B-2: TAAGATGAAGAAAAAAGGGGCTG (Nbv5tr6430098 1058-1080, total fragment 1044-1097). Due to differing target RNA abundances, reverse transcription was carried out using: 2 µg of total RNA for TuMV WT, M1 and M2; 5 µg of total RNA for mutants P and FSko, and for BCMV and BCMNV; 10 µg of total RNA for host and ∆GDD controls; 400 ng of T7 in vitro transcribed RNA for pT7-667; 400 ng of virion-derived RNA; and 2 µg of polysome-associated RNA. 10 µl of reverse transcription reaction was used for PCR in a final volume of 70 µl using Q5 High-Fidelity DNA Polymerase (NEB) with appropriate primers.
For the DNA control, 1 ng of plasmid was used as template. Libraries were amplified for 17 cycles (20 cycles for ∆GDD and host controls) of: denaturation at 98 °C for 20 s; annealing for the first five cycles at 44 °C and the rest at 60 °C for 20 s; and extension at 72 °C for 30 s. After amplification, libraries were separated on 1× TBE 10% PAGE (Life Technologies), and target fragments were cut from the gel and purified. The libraries were then quantified fluorometrically with the Qubit dsDNA HS kit (Life Technologies), normalised and sequenced using the NextSeq500 platform (Illumina). Reads were checked for quality, clipped for tailing adapter sequence and preprocessed using the FASTX Toolkit (Hannon lab). The following were not included in the analysis: reads containing 'N's, reads that were too short, contaminating reads from other libraries (errors in indexing), reads with alternative transcript sequence (only detected for Ubi E2, 2-4% of reads) and reads less abundant than 1/10,000 of the most abundant read (i.e. below 0.01%). Reads were processed using custom scripts according to variability in the conserved site, followed by manual verification.
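The read filtering and slip-site classification described above can be illustrated with a minimal Python sketch. This is not the custom script used in the study; the function names, the minimum read length and the flank-based classification are assumptions made purely for illustration, and the example uses the TuMV target sequence given above.

```python
from collections import Counter


def filter_reads(reads, min_len=20, min_fraction=1e-4):
    """Count unique reads after dropping reads with 'N', short reads,
    and sequences rarer than min_fraction of the most abundant read.
    min_len is an assumed cut-off; the text does not state one."""
    kept = [r for r in reads if "N" not in r and len(r) >= min_len]
    counts = Counter(kept)
    if not counts:
        return counts
    threshold = max(counts.values()) * min_fraction
    return Counter({seq: n for seq, n in counts.items() if n >= threshold})


def classify(seq, left, right):
    """Label a read by the A-run between two flanks, e.g. 'A6' or 'A7'.
    Anything that is not a pure A-run between the flanks is 'other'."""
    start = seq.find(left)
    end = seq.rfind(right)
    if start == -1 or end == -1 or end < start + len(left):
        return "other"
    core = seq[start + len(left):end]
    if core and set(core) == {"A"}:
        return f"A{len(core)}"
    return "other"


def summarize(counts, left, right):
    """Return the fraction of reads in each slip-site class."""
    totals = Counter()
    for seq, n in counts.items():
        totals[classify(seq, left, right)] += n
    total = sum(totals.values())
    return {label: n / total for label, n in totals.items()}


# Toy data built from the TuMV target TCCATTTTGGAAAAAAGTTA.
wt = "TCCATTTTGGAAAAAAGTTA"
ins = "TCCATTTTGGAAAAAAAGTTA"  # one extra 'A' in the GA6 run
reads = [wt] * 98 + [ins] * 2 + ["NNNN" + wt, "ACGT"]
fractions = summarize(filter_reads(reads), "TCCATTTTGG", "GTTA")
```

In this toy input, the read containing 'N' and the four-nucleotide read are discarded before counting, so `fractions` reports only the A6 and A7 classes.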
For whole-genome TuMV sequencing, RNA treated with RiboZero (Epicentre) from systemically infected leaves or untreated RNA purified from virions was used to prepare libraries with the TruSeq Stranded mRNA Library Prep Kit (Illumina) according to the manufacturer's protocol. Libraries were sequenced using the NextSeq500 platform, and reads were checked for quality and adapter sequences trimmed using the FASTX Toolkit. The resulting 25- to 76-nt reads were submitted to the ENA databank (study accession PRJEB9490) and used in the analysis. Reads were mapped to the reference virus genome with BWA [8] and positions of insertions and read coverage were determined with custom scripts utilizing the reported CIGAR and TAG fields. Background single-nucleotide insertion rates were calculated by dividing the observed number of insertions (excluding those at the pipo slip site) by the product of mean read depth and genome length.
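As an illustration of the background-rate calculation, the following Python sketch locates insertions from SAM-style POS and CIGAR fields and divides the number of single-nucleotide insertions outside the slip site by the product of mean read depth and genome length. It is a simplified stand-in for the custom scripts, not a reproduction of them: the function names are hypothetical, read depth is assumed to be counted from M, = and X operations only, and the slip-site window and coordinates in the example are invented.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def insertions_from_cigar(pos, cigar):
    """Yield (reference position, length) for each insertion.
    pos is the 1-based leftmost mapped position (SAM POS field);
    an insertion is reported at the reference base preceding it."""
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "I":
            yield ref - 1, length
        elif op in "MDN=X":  # operations that consume reference bases
            ref += length


def aligned_bases(cigar):
    """Number of read bases aligned to the reference (assumed: M, =, X)."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "M=X")


def background_insertion_rate(alignments, genome_length, slip_site):
    """alignments: iterable of (pos, cigar); slip_site: (start, end), 1-based.
    Single-nucleotide insertions at or next to the slip site are excluded."""
    n_insertions = 0
    total_aligned = 0
    for pos, cigar in alignments:
        total_aligned += aligned_bases(cigar)
        for ins_pos, ins_len in insertions_from_cigar(pos, cigar):
            at_slip_site = slip_site[0] - 1 <= ins_pos <= slip_site[1]
            if ins_len == 1 and not at_slip_site:
                n_insertions += 1
    mean_depth = total_aligned / genome_length
    return n_insertions / (mean_depth * genome_length)


# Toy example with invented coordinates.
alignments = [(1, "50M")] * 99 + [(10, "20M1I29M"), (100, "10M1I40M")]
rate = background_insertion_rate(alignments, genome_length=1000, slip_site=(105, 111))
```

Because the product of mean read depth and genome length equals the total number of aligned bases, the result can be read as insertions per sequenced nucleotide.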

In vitro translation constructs
For producing an RNA containing conserved RNA regions of the TuMV genome, a T7 promoter was inserted directly upstream of the TuMV-GFP sequence. Subsequently the following rearrangements were made: nucleotides 445 and 3347 (P1 codon 105 / P3 start) were joined with a KpnI site inserted between; nucleotides 4014 and 1220 (end of PIPO / GFP sequence to extend the product) were joined with a NcoI site inserted between; and a TAG stop codon followed by a BglII site were inserted after nucleotide 1939 (GFP C-terminus) and joined to nucleotide 9881. This resulted in a construct, named pT7-667 (Appendix Fig S2A), which contained the following elements, in order: T7 promoter, 5'-UTR, 105 N-terminal codons of P1, P3(PIPO with a C-terminal GFP extension and stop), 3' UTR. Mutagenesis of the conserved site was carried out as described above and mutations were introduced by exchanging the KpnI-NcoI fragment in pT7-667. For testing frameshifting in vitro in a virus-unrelated context, a TTG_GAA_AAA_AGT sequence, or the same fragment with mutations as described in Fig 3A, was inserted between XhoI-BglII sites of vector pDluc [9] to produce pDluc-12 and its derivatives. In-frame controls for pT7-667 and pDluc-12 contained CTC_GAG_AAG_GT (+1 IFC) or CTC_GAG_AAG_T (+2 IFC) in place of the WT sequence TTG_GAA_AAA_AGT. All constructs were verified by sequencing.

In vitro translation
Templates were linearised before transcription using SalI and FspI for pT7-667 and pDluc-12 based constructs, respectively. Capped transcripts were in vitro transcribed using the mMESSAGE mMACHINE T7 Transcription Kit (Life Technologies) according to the manufacturer's instructions. Transcripts were purified on NucAway spin columns (Life Technologies), verified for integrity by agarose gel electrophoresis and quantified. Transcripts were translated in wheat germ extract (Promega) according to the manufacturer's instructions using 125 mM potassium acetate and 400 ng of RNA in 10 µl reactions containing [35S]methionine for 1.5 h at 25 °C. Reactions were terminated by addition of an equal volume of 100 µg/µl ribonuclease A and incubation at 37 °C for 20 min. Two-microlitre samples were heated with the same volume of SDS-PAGE sample buffer and products analysed on 10% SDS-PAGE. The gels were dried and visualised using either autoradiography or, for quantification, phosphor storage screens and a Molecular Imager Typhoon (GE) system. Quantification of products was done using ImageQuant TL (GE) software.

Generation of shuffled ORF sequences
Polyprotein ORF sequences were shuffled so that the original amino acid sequence and the original total numbers of each of the 61 codons were maintained, but synonymous codons were randomly shuffled between the different sites where the corresponding amino acid is used in the original sequence. The three codons containing the bona fide pipo slip site were excluded from the shuffling and motif counting.

Positions of expected zero-frame and frameshift products are indicated with arrows. Molecular weight markers are shown on the right-most lane. A product corresponding to a shift into the +2 frame is observed for the WT, M1 and M2 constructs. For the same constructs, a product corresponding to a shift into the +1 frame is also observed. With mutants P and FSko, neither frameshift product is detected.
B As in (A), but using a 12-nt insert containing the conserved GAA_AAA_A sequence, mutants thereof, or in-frame controls in an unrelated dual reporter construct. The constructs behave similarly to those with a 667-nt insert, except that the P and FSko mutations fail to completely eliminate products corresponding to a shift into the +2 frame.
C The efficiency of +2 frame expression based on the ratio of methionine-normalized densitometric volumes of +2 frame and 0 frame products. Error bars show standard deviations for 3 technical replicates.
The small amount of +2 frame expression observed with the P and FSko mutants for the constructs with a 12-nt insert may result from low-level translational frameshifting at the termination codon of the upstream reporter in the in vitro system, which would result in a product co-migrating with the product of a shift in reading frame occurring at the GA6 sequence. Such shifting could also explain the faint band migrating slightly ahead of the +2 frame product in the constructs with the 667-nt insert. Excepting these faint bands, in view of the high-throughput sequencing analysis of T7 slippage on GA6 (Fig 5), it would appear that the frameshift products observed in vitro can be explained by T7 polymerase slippage. The distal sequences included in the Fig S2A constructs were not found to stimulate translational frameshifting and are of limited relevance to an assessment of transcriptional frameshifting given the exotic polymerase (T7) and template (DNA).

Figure S3 - Assessment of transcriptional slippage in Plum pox virus.
NCBI SRA accession numbers ERX013141 and ERX013142 contain samples ERR034801-06 and ERR034807-12, respectively. Frequencies for reads with an additional 'A' in the conserved site (i.e. GA6 to GA7) are presented.

Figure S4 - Selection against A6 and U6 sequences in potyvirus genomes.
GAm, An, Un, UmC, GA5, A6, U6, GA4, A5, U5 (m > 6, n > 7) sequences were counted in the polyprotein ORF (positive-sense; any reading frame) of 99 genus Potyvirus NCBI RefSeqs (red bars; see Appendix Dataset S1 for accession numbers). Except for the first four motifs, only homopolymeric runs of the exact length stated were counted (e.g. the count for A5 sequences excludes A5 sequences that occur as part of A6 sequences). The polyprotein ORF of each of the 99 sequences was randomly shuffled 1000 times while maintaining amino acid sequence and codon bias (see Supplementary Methods), and the frequencies of the above motifs were counted in each shuffled sequence (purple bars). Three arbitrary non-homopolymeric heptanucleotides were included for comparison. The three codons containing the pipo slip site were excluded from these analyses. Error bars indicate standard deviations over the 1000 randomizations. Small differences between the observed and expected values (e.g. for ACCUCCC) may partly stem from dinucleotide biases which were not explicitly maintained in the randomization protocol.

The 'other' sequences from the TuMV virion samples have a 'G' insertion instead of an 'A' insertion (i.e. U4G3A6G instead of U4G2A7G).
c The 'other' sequences from the TuMV T7 transcription samples are either U5G2A6G (81%) or U4G3A6G (19%) instead of U4G2A7G.
d Technical repeats.
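
The codon shuffling described under "Generation of shuffled ORF sequences" and the exact-length motif counting used for Figure S4 can be sketched in Python as follows. This is an illustrative sketch only, not the code used in the study: the function names are hypothetical, the input is assumed to be a DNA-alphabet ORF (T rather than U) whose length is a multiple of three, and the standard genetic code is assumed.

```python
import random
import re
from collections import defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def shuffle_synonymous(orf, protected=(), rng=random):
    """Permute synonymous codons among the sites of each amino acid.
    The protein sequence and the total count of every codon are kept.
    protected: codon indices (0-based) left untouched, e.g. the slip site."""
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    sites = defaultdict(list)
    for idx, codon in enumerate(codons):
        if idx not in protected:
            sites[CODON_TABLE[codon]].append(idx)
    shuffled = list(codons)
    for idxs in sites.values():
        pool = [codons[i] for i in idxs]
        rng.shuffle(pool)
        for i, codon in zip(idxs, pool):
            shuffled[i] = codon
    return "".join(shuffled)


def count_exact_runs(seq, base, length):
    """Count maximal homopolymer runs of exactly `length` (any frame),
    so an A5 inside an A6 run is not counted as A5."""
    return sum(1 for m in re.finditer(f"{base}+", seq) if len(m.group()) == length)


# Toy ORF (E K K A A A E E); the first three codons are protected.
orf = "GAAAAAAAAGCTGCCGCAGAGGAA"
shuffled = shuffle_synonymous(orf, protected={0, 1, 2}, rng=random.Random(1))
a6_count = count_exact_runs(shuffled, "A", 6)
```

Repeating the shuffle many times (1000 in the analysis above) and counting motifs in each shuffled sequence gives an expected distribution against which the observed counts can be compared.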