A new subclass of exoribonuclease resistant RNA found in multiple Flaviviridae genera

Viruses have developed innovative strategies to exploit the cellular machinery and overcome the host antiviral defenses, often using specifically structured RNA elements. Examples are found in flaviviruses; during flaviviral infection, pathogenic subgenomic flaviviral RNAs (sfRNAs) accumulate in the cell. These sfRNAs are formed when a host cell 5’ to 3’ exoribonuclease degrades the viral genomic RNA but is blocked by an exoribonuclease resistant RNA structure (xrRNA) located in the viral genome’s 3’untranslated region (UTR). Although known to exist in several Flaviviridae genera the full distribution and diversity of xRNAs in this virus family was unknown. Using the recent high-resolution structure of an xrRNA from the divergent flavivirus Tamana bat virus (TABV) as a reference, we used bioinformatic searches to identify xrRNA in the Pegivirus, Pestivirus, and Hepacivirus genera. We biochemically and structurally characterized several examples, determining that they are genuine xrRNAs with a conserved fold. These new xrRNAs look superficially similar to the previously described xrRNAs but possess structural differences making them distinct from previous classes of xrRNAs. Our findings thus require adjustments of previous xrRNA classification schemes and expand on the previously known distribution of the xrRNA in Flaviviridae, indicating their widespread distribution and illustrating their importance. IMPORTANCE The Flaviviridae comprise one of the largest families of positive sense single stranded (+ssRNA) and it is divided into the Flavivirus, Pestivirus, Pegivirus, and Hepacivirus genera. The genus Flavivirus contains many medically relevant viruses such as Zika Virus, Dengue Virus, and Powassan Virus. In these, a part of the virus’s RNA twists up into a very special three-dimensional shape called an xrRNA that blocks the ability of the cell to “chew up” the viral RNA. Hence, part of the virus’ RNA remains intact, and this protected part is important for viral infection. This was known to occur in Flaviviruses but whether it existed in the other members of the family was not known. In this study, we not only identified a new subclass of xrRNA found in Flavivirus but also in the remaining three genera. The fact that this process of viral RNA maturation exists throughout the entire Flaviviridae family makes it clear that this is an important but underappreciated part of the infection strategy of these diverse human pathogens.


INTRODUCTION
Viruses face continuous evolutionary pressure to evolve innovative strategies that exploit the host cell′s biological machinery and overcome its antiviral defenses. Often, these are based on specifically structured RNA elements, which is not surprising given RNA′s functional diversity and ability to influence virtually every cellular process. Important examples are found in positive sense single stranded RNA (+ssRNA) viruses such as arthropod-borne flaviviruses, which use several structured elements within their RNA genome to direct or regulate important processes during infection. (1)(2)(3)(4)(5)(6)(7)(8)(9)(10)(11)(12)(13).
Within plant viruses, xrRNAs are associated with both non-coding and coding subgenomic RNAs (36).
Detailed three-dimensional crystal structures of xrRNAs from several flaviviruses provide insight into how an xrRNA resists the action of 5′-3′ exoribonucleases (1,13,37). Specifically, all xrRNAs fold into a compact structure containing a ring-like feature that wraps around the 5′ end of the resistant RNA element. Biochemical analyses suggest that this topology acts as a molecular brace against the surface of the exoribonuclease approaching from the 5' side, preventing it from progressing past a defined point (38). This ring is formed through specific structural motifs, including a three-way junction, pseudoknots and other long-range tertiary interactions, non-Watson-Crick base pairs, complex base stacking sequences in our alignment contain any nucleotide between P2 and P3, a peculiarity that was discussed in the report of the TABV xrRNA structure (37). The one sequence that does contain a nucleotide at that position comes from Rodent hepacivirus isolate rn-1 (RHV ; Table S1). This putative xrRNA has a U in this position, which cannot be otherwise accommodated within P2 or P3. Similarly, other Hepacivirus isolates belonging to sequences such as the Rodent hepacivirus isolate RrMC-HCV (RtMC, Table S1) comprise sequences that depart from the subclass 1b pattern, potentially leading to non-Watson-Crick pairs at the base of P2 or within a somewhat altered three-way junction (sequences marked by ′!!′ in Table S1). These sequences are rare and found mainly in members of the Hepacivirus genera.
Another subset of sequences deviates from the covariance model at the base of P1. They comprise putative ISFV xrRNA sequences that are found in the more downstream copies of putative xrRNA (xr3, xr4, etc.). Among them are xr4 from the Parramatta River virus isolate 92-B115745 xr4 (PARV_xr4; Table S1), xr3 from Cell fusing agent virus strain Galveston (CFAV_xr3; Table S1), xr1 from Mercadeo virus isolate ER-M10 (MECDV_xr1 ; Table S1) and xr1 from Menghai flavivirus isolate MHAedFV1 (MFV_xr1 ; Table S1). For example, only a two base-pair P1 can be proposed for PaRV_xr4, with no equivalent to A48 in TABV. Further testing will be required to determine whether these sequences indeed correspond to xrRNAs, and if so, whether they possibly form a subtype within subclass 1b.

Bioinformatically identified subclass 1b xrRNAs are exoribonuclease resistant
To test whether computationally identified subclass 1b xrRNAs were resistant to Xrn1 and therefore comprised authentic xrRNAs, we subjected representative sequences from Pestiviruses, Szucs, M. J. et al 9 Pegiviruses and Hepaciviruses to in vitro Xrn1 resistance assays (Fig. 3). Briefly, eight putative subclass 1b xrRNAs (marked by * in Fig. 2A) were transcribed in vitro with a 22 nt leader sequence, which allows Xrn1 to load onto the 5′ end of the transcribed RNA. Some of the previous work from our laboratory showed that a wild-type leader sequence of an arbitrary length could generate false negative results in Xrn1 resistance assays, likely due to RNA misfolding. Hence, for our assays of subclass 1b xrRNAs, the leader sequence contained the same normalization hairpin (xxxx-GAGUA-xxxx) and spacer sequences used for the chemical probing experiments described in the next section (Methods; see Table S3 in the supplemental material). This adjustment also contributed to consistency between resistance assays and probing experiments.
RNAs were treated with RNA 5′-Pyrophosphohydrolase (RPPH), leaving a mono-phosphorylated 5′ end (50). When an RNA is resistant to Xrn1, subsequent incubation with recombinant Xrn1 leads to a defined but smaller size product when resolved by polyacrylamide electrophoresis, as the enzyme has loaded, partially degraded the RNA, but then stopped. As a positive control, we used the in vitro transcribed TABV xrRNA (xr1) ( Table S3). An RNA with a tRNA-like structure that is not resistant to Xrn1, with appended normalization hairpins, was used as a negative control (51). When challenged with Xrn1, all putative xrRNA sequences identified in our computational searches were resistant, indicated by the appearance of the shorter but stable RNA product (Fig. 3). This result confirmed that even sequences with very short ( structure. Even the xrRNA from HCP, which displays a putative A.A pair at the base of P2, is resistant to Xrn1 (see Fig. S1 in the supplemental material). Thus, the bioinformatically identified subclass 1b xrRNAs that we tested are authentic exoribonuclease-resistant elements and this result suggests that the untested sequences from our sequence alignment likely are also true xrRNAs.

Secondary structure validation and chemical probing of subclass 1b xrRNAs
Since the secondary structures proposed on the basis of sequence alignments and covariation are predictions, we tested them experimentally through chemical probing with dimethyl sulfate (DMS; modifies the Watson-Crick side of unpaired A and C) and N-methylisatoic anhydride (NMIA; modifies the 2′ hydroxyl group of nucleotides not involved in Watson-Crick pairs) (Fig. 4) (52, 53).
We first probed (using NMIA only) the wild-type sequence of the TABV xrRNA, which expands on observations based on the crystal structure by providing a ′fingerprint′ of the chemical probing pattern for this type of fold (Fig. 4A). Most of the nucleotides were unreactive to NMIA (including in L3 which appears as ′unpaired′ in the secondary structure diagram, except for Pk2), supporting a compact and stable fold. Highly reactive positions in the exposed loop L2 were consistent with the TABV xrRNA crystal structure (Fig. 1C). Likewise, strong reactivities at U43 and A48 can also be rationalized by the structure: U43 is flexible and the ribose of A48 adopts a reactive C2′ endo pucker (54, 55). Although chemical probing was performed with the wild-type TABV sequence, the resulting data are consistent with the structure of the sequence variant engineered for crystallization (37). In previous studied, removal of the P4 stem has did not affect Xrn1 resistance (37, 56), thus we did not examine it as part of our analysis but the full chemical probing construct with the P4 stem is displayed in the supplementary information (Fig.

S2)
We performed chemical probing of the eight Pegivirus, Pestivirus, and Hepacivirus xrRNAs tested for Xrn1 resistance to compare the probing 'fingerprint' to that of TABV. The resultant patterns are consistent with the pattern from TABV xrRNA, indicating that they adopt a similar secondary structure ( Fig. 4). In seven cases, the RNAs were similarly non-reactive overall, except for L2 and the nucleotide equivalent to U43 in TABV ( Fig. 4B-H). GBVB showed the same pattern but was overall more reactive, likely indicative of a less stable in vitro fold (Fig. 4I). Several of the RNAs contained >30 nucleotide expansions in the P2 regions, and in all cases the probing indicated they were folded as elongated stem loops, in some cases with internal loops as seen in HPgV2 (Fig. 4B,C,E,G). In some of these, parts of L2 may be involved in additional intramolecular interactions, as suggested by the absence of reactive positions within L2 for NRPeV, RPeV and HPgV2 (Fig. 4B, C, E). Probing revealed that some sequences had two (NRPeV, RPeV, HPgV2, GBVB) or even three (APPeV) reactive nucleotides between Pk1′ and P1′, regardless of the potential for Pk1 to contain more than three base pairs, which could be interpreted Intramolecular interactions involved in proper ring folding, such as the equivalent position to the strongly reactive A48 in TABV, were reactive only in APPeV (G57), SPgV (A54) and GBVB (A47) (Fig.   4D,F,I). Finally, reactive positions to DMS but not NMIA in L3′ (the 3′ side of L3) for NRPeV, RPeV, SPgV, HCK, HCGu, and GBVB implied the Watson-Crick face of these residues remained available in an otherwise structured region of stacked bases, which is compatible to the availability of the A34, A35, and C36 in the the TABV crystal structure. Overall, the chemical probing data support the predicted secondary structure models that indicate these xrRNAs share the same TABV xrRNA-like fold, consistent with their inclusion in subclass 1b.

Structural analysis of the subclass 1b xrRNA
Based on the sequence covariation observed and our structural analysis, xrRNAs from subclass 1b are more compact than those from subclass 1a, except for the P2 stem (Fig. 2B). Subclass 1b xrRNAs are distinguished from subclass 1a by four structural features observed in the TABV xrRNA crystal structure. The first is the presence of generally two non-Watson-Crick interactions at the 5′ end of the P1 stem (Fig. 5A, 5C). Over half of the sequences in our alignment have the two A...C interactions observed in the TABV structure, followed by two Watson-Crick pairs before the three-way junction ( Fig.   2A, ; see also Table S1 in the supplemental material). The interactions between A4...C46 and A5...C45 seen in the TABV structure help create a sharp bend in the structure associated with a break in base stacking between Pk1 and P2 (Fig. 5A). For the remainder of the sequences, one of the two As is missing, probably the equivalent to A5, because A4 is involved in stabilizing interactions at the ring closure (Fig. 5A). In 28% of sequences, A4...C46 is replaced by A...A/G interactions. This change is accompanied by the presence of A-U or G-C instead of A5...C45, suggesting a third Watson-Crick pair could form in P1 in that case. A common Hoogsteen/sugar edge configuration at A...G and Watson-Crick at A-U or G-C would lead to a shift of the pairing partner of both A bases towards the minor groove. In any event, the length of P1 is constrained to 3-4 base pairs by the compactness of the fold, a structural Szucs, M. J. et al 12 feature which is supported by our comparative sequence alignment ( Fig. 2A). Conversely, the subclass 1a P1 stem is highly conserved with only two base pairs covarying out of the six possible Watson-Crick base pairs. Also, the subclass 1b lack almost absolutely-conserved sequences that are found in the subclass 1a xrRNAs, for example in the P1 stem, making them distinct (39).
The L3 loop comprising the Pk2 pseudoknot is the second defining feature of the subclass 1b xrRNA. The L3 region exists in a 3-3-3 (86%), 4-3-3/3-3-4 (13%), or a 4-3-4 (1%) nucleotide configuration (the number of nucleotides refers to the length of the 5′ side of L3, of Pk2, and of the 3′ side of L3 or L3′; Fig. 2A, 5C, Table S1). Sequence patterns of L3 can be rationalized based on the TABV structure: (i) the 5′ side generally comprises three pyrimidines (90% U at the position preceding Pk2; following Pk2 generally comprises two purines (A34 and A35 in TABV) which help support stacking of Pk2 on P3, followed by a pyrimidine (C36) which could accommodate the guanine present on the 5′ side of L3 in 45% sequences ( Figure 2; see also Table S1 in the supplemental material). Notably, the first adenines following Pk2 do not get modified by NMIA, although they are reactive to DMS (Fig. 4), which is consistent with them being conformationally constrained but with available Watson-Crick edges, as revealed by the TABV xrRNA structure ( Figure 5B).
Next, a defining structural feature of subclass 1a lacking in subclass 1b is the presence of a conserved cytosine between P2 and P3. In subclass 1a xrRNA, this nucleotide is important for tertiary interactions that support folding and the formation of the ring-like structure around the 5′ end of the xrRNA. Studies have shown that mutating this C to any other nucleotide disrupts the ability of xrRNA to resist Xrn1 (1). In recent studies with TABV it was shown that its fold cannot tolerate the addition of a nucleotide in this region (37). Because the ring of subclass 1b TABV xrRNA is stabilized by a different set of long-range interactions compared to the subclass 1a ZIKV xrRNA, this results in a more compact fold that can no longer accommodate the C. Although this C is critical for subclass 1a xrRNAs, our present work shows that its absence is a shared feature among subclass 1b xrRNAs. Finally, the covariation pattern of the Pk1 is another key feature of the subclass 1b xrRNA, although less distinctive between the two subclasses. According to sequence alignment and probing data, this region accommodates either two (11% of the sequences identified) or three base pairs (88.3% of the sequences) (Fig. 5B, 5C ; see also Table S1 in the supplemental figures). Out of the eight sequences we probed, five reveal that the Pk1 predicted to have three base pairs (based on sequence) actually only forms two base pairs (Fig. 4B, E, F, H, I). Together with the crystal structure of TABV (where Pk1 comprises two base pairs), these results suggest that two base pairs are sufficient to support a resistant fold, although we cannot exclude that Pk1 may have an additional pair on its 5′ side in some other cases, or that this pair is dynamic (Fig. 1C). The second of the two Pk1 base pairs (which is always formed) is >97% a G-C (Fig. 2B), as in subclass 1a ( Fig. 2A; reference to Kieft RNA Biology 2015). The first nucleotide immediately downstream of Pk1′ is strongly reactive to NMIA, except for HCK (Fig. 4G).
This feature is consistent with the flipped-out conformation of U42 in the TABV structure (Fig. 1B). In the subclass 1a the PK1 region is never more than two base pairs, but it can which can be up to three for the subclass 1b.

DISCUSSION
In this study we have identified and structurally characterized xrRNAs in all genera of Flaviviridae, one of the largest families of RNA viruses, expanding on the known distribution of these RNA structures within Flaviviridae (23). Based on our computational, structural, and biochemical data, we propose that the characteristics of these new xrRNAs require a division of the previously proposed class 1 xrRNA (38) into two distinct subclasses: subclass 1a (comprising MBFV, in particular MVEV, ZIKV, WNV, YFV, etc.), and subclass 1b (comprising TABV, ISFV, pegi-, pestiand hepaciviruses). Subclasses 1a and 1b deviate in (i) the sequence patterns of their P1 and P3 stems, (ii) the nucleotide patterns of the L3 loop, (iii) the Pk2 region, and (iv) whether a particular nucleotide is present between P2 and P3. In addition, the pattern of almost absolutely-conserved nucleotides that is indicative of subclass 1a (39), is not present in subclass 1b. Thus, all of the identified xrRNAs in Flaviviridae fall into three distinct groups: subclasses 1a and 1b, and class 2 (which remains as described in the literature) (38).
Within known Flaviviridae sequences, subclasses 1a and 1b do not appear in the same viral species. In other words, no viral sequence within the phylogenetic groups characterized as having subclass 1a xrRNAs possesses a subclass 1b xrRNA and vice-versa. When several xrRNAs are present in the same 3′ UTR (57), all belong to the same subclass. This observation supports the hypothesis of the evolution of these structural elements from a common ancestor (58), as opposed to horizontal gene transfer. This scenario is further supported by our observation that hepaciviruses -one of the most distantly related genera within Flaviviridaealso contain the most divergent subset of subclass 1b xrRNAs, possibly branching out further into distinct subcategories. Moreover, hepaciviruses include viruses like Hepatitis C, which do not have xrRNAs (59). In short, xrRNAs are evolving with the rest of the attached viral genome and not actively 'hopping' across genera.
Although subclass 1b xrRNAs are resistant to Xrn1 in vitro, their function in the context of viral infection and pathogenesis remains unclear. A key functional role of xrRNAs from subclass 1a is to enable formation of sfRNAs, which have key roles for virus survival within the vector and host cells (23, 27, 60). Whether the subclass 1b xrRNAs from pesti-, pegi-and hepaciviruses have a similar function in the generation of viral sfRNA is currently unknown. In particular, not every virus with a subclass 1b xrRNA generates sfRNAs. As an example, we show that GBV-C has a subclass 1b xrRNA (Fig. S1), although previous studies indicated it may not produce any sfRNA (3). Functions other than generating subgenomic RNAs were also expected from our discovery that plant virus xrRNAs could be located upstream of coding regions (36). Systematically testing of viruses for formation of sfRNAs would be a first step in pinpointing alternative functions of subclass 1b xrRNA. Whether subclass 1b xrRNAs participate in sfRNA generation or not, evolutionarily they maintain the necessary interactions for proper folding into a structure that can resist exoribonucleases.
In addition to motivating systematic characterization of sfRNA formation within all Flaviviridae genera, this study also reinforces the importance of combining bioinformatic, biochemical, and structural techniques to derive meaningful conclusions about viral structured RNAs. Bioinformatics are a powerful tool to develop xrRNA secondary structure predictions, but this approach requires the availability of complete 3′ UTR sequences. We estimate that >50% of the available viral genome sequences for Pesti-, Pegi-and Hepaciviruses end at the last codon of the coding region. In addition, biochemical experiments are needed to not only validate computational predictions, but also to refine the proposed models. As an example, the APPeV xrRNA was predicted to have a six-base pair Pk1, based solely on the sequence alignment and our covariance model. This possibility was refuted -in the context of Xrn1 resistance at least -after the chemical probing experiments revealed that Pk1 consisted of three base pairs (Fig. 4D).
We also bioinformatically identified two putative xrRNAs in HPgV2 whose sequences were overlapping ( Fig. S2). The Xrn1 resistance assay enabled us to determine which sequence corresponded to the true xrRNA. Similarly, further testing and structural mapping of the putative but still ambiguous xr3-5 elements for some of the ISFVs (CSFV and PaRV) should reveal whether they are actual xrRNAs, and if so, whether they might represent another subgroup within subclass 1b. Overall, this study expands on the widespread presence of xrRNAs in nature, now within all Flaviviridae viruses, thereby further supporting the prevalence of this particular structure in the viral world.

Subclass 1b bioinformatic searches and covariance model analysis
An initial subclass 1b alignment was created starting from a total of 20 sequences of insect specific flaviviruses (see Table S1, sequences with a '+' symbol) that were manually aligned with a combination of RALEE v. 0.8 and a text editor, using the TABV secondary structure as a reference. Using Infernal v.

Plasmid construction
Each xrRNA construct (Table S3) was designed as a double stranded DNA gBlock (IDT) and was subsequently cloned into the EcoRI and BamHI sites of pUC-19. Cloned plasmids were amplified in competent E. coli DH5α cells and purified with Qiagen miniprep kit (Qiagen). The recovered plasmid stocks were verified through sequencing (Eton Bioscience).

In vitro RNA transcription
Template DNA was amplified by PCR using custom DNA primers (Table S3)  Amicon Ultra spin Concentrators (Millipore-Sigma) were used to concentrate the eluted RNA to 2.5 mg/mL, which was then stored at -20°C in DEPC-treated H2O until use.

In vitro Xrn1 resistance assay
A total of 5 µg RNA was incubated at 90°C for 3 minutes and 20°C for 5 minutes in 40 µL of refolding buffer (100 mM NaCl, 50 mM Tris pH 7.5, 10 mM MgCl2, and 1 mM DTT). Next, 3 µL of recombinant RppH (0.5 µg/µL stock) was added to the mixture which was then aliquoted into two volumes of 20 µL.
To one of the aliquots, 1.5 µL of recombinant Xrn1 (0.8 µg/ul stock) was added and the reaction mix was incubated in a thermocycler at 37°C for 2 hrs. 10 µL of RNA from each reaction (+/-Xrn1) was resolved by a 7M urea 8% dPAGE and visualized with ethidium bromide.

Quantitation of chemical probing by reverse transcription
The RNA was reverse transcribed on beads with 2. The outlined reverse transcription protocol was followed as described above.

NMIA Reactivities
Simian Pegivirus Pk2'  Table S1: Complete subclass1b sequence alignment Table S2: General information about subclass1b sequences in alignment Table S3: In vitro transcribed subclass1b construct information Figure S1: subclass1b in vitro degradation assay gel images Figure S2: The following pages contain several tables: Table S1: Complete subclass 1b xrRNA sequence alignment. This sequence alignment was used to generate the subclass 1b covariance model (Fig. 2B). Constructs used for chemical probing analysis are designated by a star. Labels are formatted with the name of the virus, the accession number, the location of the xrRNA within the accession number, and the relative position of the xrRNA, when several are found in series (i.e., xr1 is the most 5′ xrRNA element). Sequences ending in a "+" were used in the initial sequence alignment as input for Infernal searches.    Figure 3B: SPgV D. Figure 3B: HPgV2 E. Figure 3A: NRPeV H. Figure    Rodent Pestivirus (KY370101.1)   Norway Rat Pestivirus

Replicates
Sequence Position

Replicates
Sequence Position

Replicates
Sequence Position

G G A G G CG UC G A G U A G A C GC C A A A U U C G GG C A A G GC A U G C GA G A U A A A A A G G GU C U C G U A U G A G GG C G UG GC A A C C C C UC C C C C CC AG CU G C GG CG GC A C A A A A G CG U C UC GC GU G U C G UC G A A A G G A G UC G A G U A G A C UC C
Reactivity

Replicates
Sequence Position

Replicates
Sequence Position