It’s the Little Things (in Viral RNA)

Chemical modifications of viral RNA are an integral part of the viral life cycle and are present in most classes of viruses. To date, more than 170 RNA modifications have been discovered in all types of cellular RNA. Only a few, however, have been found in viral RNA, and the function of most of these has yet to be elucidated. Those few we have discovered and whose functions we understand have a varied effect on each virus. They facilitate RNA export from the nucleus, aid in viral protein synthesis, recruit host enzymes, and even interact with the host immune machinery.

Viruses are a phylogenetically diverse group of obligate intracellular parasites, and it is estimated that there are approximately 10^31 of them in the world today (1). The Baltimore system divides viruses into seven (originally six) main categories based on their genome form: (i) double-stranded DNA (dsDNA); (ii) single-stranded DNA (ssDNA); (iii) double-stranded RNA (dsRNA); (iv) positive single-stranded RNA (+ssRNA); (v) negative single-stranded RNA (-ssRNA); (vi) positive single-stranded RNA retroviruses (ssRNA-RT); and (vii) double-stranded DNA retroviruses (dsDNA-RT) (2,3). In successful completion of their life cycles, however, these viruses have several things in common. One of them is the creation of viral RNA, whether it be genomic RNA, mRNA, or only an intermediate RNA. As viruses do not have their own translational machinery, they must hijack a host cell apparatus in order to replicate, and they have developed various strategies for this purpose (4-6). Viruses also need to evade host immunity, facilitate RNA export from the nucleus, and improve their RNA stability, translational efficiency, packaging, etc. while avoiding cellular processing of viral RNA. One of the ways they achieve all this is through the use of RNA modifications (7,8).
Chemical modifications of RNA have been known for over 50 years (9,10). They affect a wide range of processes, from RNA stability to translational efficiency (11,12). To date, more than 170 RNA modifications have been identified (13), and our understanding of their functions has improved greatly and been the topic of numerous reviews (14-17). Most of the known chemical modifications are present in rRNA and tRNA. Moreover, the discovery that modifications of mRNA are dynamic and reversible (18) led to the establishment of the new field of epitranscriptomics. Unfortunately, the minuscule amount of regulatory RNA and mRNA, in which these modifications potentially affect the function of the entire RNA molecule, represents a limitation that is difficult to overcome even with the techniques available today (19). The actual number [...] viral protein expression and viral release (Fig. 1B). On the other hand, based on experiments demonstrating a significant decrease in viral replication upon METTL3 and METTL14 inhibition, along with an increase in viral replication upon ALKBH5 depletion, it seems that m6A plays an important role in the regulation of the HIV life cycle (25). This effect may be caused by the HIV-1 Rev protein preferentially binding to a Rev response element (RRE) containing the m6A modification and promoting nuclear export of the viral mRNA to the cytosol (Fig. 1A) (51). The effect of m6A in the RRE is still under debate, as some studies have shown that m6A has a minimal impact on the structure and stability of the RRE. Though the impact of methylation seems marginal, recent small molecule microarray screens have revealed that the change is sufficient for selective recognition by Rev (49,51-54). The differences in results from studies of m6A in HIV can be attributed to several factors, such as the cell type used, the phase of the viral life cycle, and the method used to detect m6A.
These discrepancies are thoroughly discussed in a review focusing specifically on m6A (25). It is also important to note that viral infection can change the abundance of m6A in cellular RNA (51). For example, the binding of the CD4 receptor to the HIV-1 envelope glycoprotein gp120 increased the amount of cellular m6A severalfold, though the mechanism remains unclear (55).
In comparison to HIV-1 infection, the role of m6A during flaviviral infection is less ambiguous. Based on current studies, m6A has an inhibitory effect on flaviviruses such as the hepatitis C virus (HCV) and the Zika virus (ZIKV). An increase in the amount of m6A in the viral RNA hinders viral replication and, consistent with this effect, a lowering of m6A leads to an increase in viral production (56). In HCV, the E1 gene region showed an ability to bind YTHDF proteins. In infected cells, these proteins subsequently relocalize to the lipid droplets, where viral particle assembly takes place, and inhibit the packaging of the virus. When YTHDF is overexpressed, it binds to the RNA and hinders viral production. When m6A is absent, viral production accelerates due to an increase in HCV core protein binding to the E1 site (57). The YTHDF proteins had a similar effect on the Zika virus. Their knockdown through small interfering RNA (siRNA) led to an increase in ZIKV replication, while their overexpression inhibited the virus, pointing to a conserved mechanism among flaviviruses (58). It is important to note that when a virus infects a cell, it also affects the cell's immune response by modifying the amount of m6A present in cellular RNA (34).
m5C. 5-Methylcytosine was first discovered as a chemical modification of DNA more than 70 years ago (59). Its presence in RNA was detected in the late 1970s. Then, 3H labeling was used to confirm the presence of m5C in the mRNA of hamster cells infected with Sindbis virus (60). More specifically, the viral 26S mRNA coding for viral structural proteins is substantially modified by m5C. The 42S mRNA also possesses several m5C sites, but significantly fewer than 26S mRNA (61).
In viral RNA, m5C has been shown to affect the host innate immune response: the modified RNA binds to the pattern recognition receptor RIG-I but fails to induce the conformational change needed to trigger the antiviral signaling cascade (62). The ability of m5C to modify viral RNA properties has led some to believe that this modification could facilitate the transfer of viroidal RNA into cellular nuclei or chloroplasts. The presence of m5C in viroids, however, was ruled out through bisulfite sequencing (63).
The m5C modification has also recently been linked to enhanced retroviral gene expression. In general, the methyltransferase NSUN2 is responsible for the methylation of cytosine in tRNA and mRNA (64), and can also add m5C to retroviral transcripts and thus affect their life cycle (65). The genomic RNA of the murine leukemia virus contains as many as 40 m5C sites, and their removal inhibits viral replication. Specifically, downregulation of the m5C writer NSUN2 through RNA interference (RNAi) caused an overall decrease in Gag protein expression, indicating that the presence of m5C positively regulates viral replication (66). It has recently been reported that m5C is also present in the genomic RNA of SARS-CoV-2 (67). Although the effect of m5C on the viral life cycle is clearly visible, more research is necessary to elucidate the mechanisms by which it acts.
Inosine. Inosine (I) is an essential modification created through the deamination of adenosine in a process called RNA editing (68). This is done by specific deaminases termed ADAT for tRNA and ADAR for noncoding RNA and mRNA (69). Inosine has been detected in several types of viral RNA, including dsRNA viral transcripts of human herpesvirus 8, negative-sense ssRNA viruses such as human orthopneumovirus, and even virusoids like hepatitis delta virus (HDV) (70-72). The ADARs and the modification itself have been shown to affect the viral life cycle through several mechanisms, either directly through interactions involving the modification or through the inhibition of an immune response against the virus (73,74). One specific isoform of ADAR1, known as p150, is generated from an interferon (IFN)-inducible alternative promoter, meaning that p150 is part of a direct antiviral response (75). ADAR1 has also been shown to positively regulate viral replication by binding and inhibiting protein kinase R (PKR), which acts as an inhibitor of translation by phosphorylating the α subunit of eukaryotic initiation factor 2 (eIF2α). Phosphorylation of eIF2α stops cellular mRNA translation and thus prevents the viral mRNA from being translated as well (76). HIV-1 is another example of a virus that actively uses ADAR1; in fact, ADAR1 can bind to the HIV-1 p55 Gag protein and is readily incorporated into the virion, pointing to an even more important role of this enzyme and, potentially, inosine in the viral life cycle (77).
In the viral RNA of the human orthopneumovirus (also called the respiratory syncytial virus [RSV]), inosine acts as an innate immune recognition element. An in vitro-prepared RNA containing this modification also elicits an immune response. The ssRNA with this modification induces a stronger inflammatory cytokine response (Fig. 2A). It facilitates the release of interferon β (IFN-β), tumor necrosis factor α (TNF-α), and interleukin 6 (IL-6) through binding with a scavenger receptor class-A molecule. This receptor then activates the mitogen-activated protein kinase (MAPK) pathways through toll-like receptor 3 (TLR3). The authors have also demonstrated that inosine-RNA decreases replication of RSV in epithelial cells in vitro (72). It has been suggested that the changes in RNA secondary structures associated with inosine are detected through TLR7 and TLR8. These changes also lead to an increase in TNF-α production, which would mean that inosine serves as a molecular pattern to be recognized in the phagocytosed RNA, as these receptors are mainly expressed by antigen-presenting dendritic cells (78).
Inosine has been shown to play a major role in the life cycle of hepatitis delta virus (HDV). HDV is a negative-strand RNA virusoid that exists in the form of a satellite associated with hepatitis B virus (HBV) (79). HDV produces its two proteins, called the small delta antigen (HDAg-S) and the large delta antigen (HDAg-L), from the same open reading frame (ORF) (80). HDV uses host ADARs, which edit an A to I in the HDAg-S amber codon (UAG) in the antigenomic RNA. The resulting UIG codon is thus transcribed as a UGG codon, and the resulting mRNA is translated as the HDAg-L (Fig. 2B) (81,82). The general ability of inosine to change the ORF in the viral RNA enables the virus to compress more genetic information into the same sequence. The possibility that other viruses utilize the same feature should be entertained, along with the potential immunogenic effects of this modification.
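The consequence of editing the amber codon can be illustrated with a short sketch. It assumes a toy ORF (a hypothetical sequence, not the real HDV antigenome), a deliberately incomplete codon table, and it collapses the editing and transcription steps into one by simply decoding inosine as guanosine.

```python
# Minimal codon table: only the codons used in this toy example.
CODON_TABLE = {
    "AUG": "M", "GAU": "D", "CCU": "P",
    "UGG": "W",              # tryptophan
    "UAG": "*", "UAA": "*",  # stop codons (UAG is the amber codon)
}


def decode_inosine(rna: str) -> str:
    """Inosine (I) pairs with cytidine, so it is decoded as guanosine (G)."""
    return rna.replace("I", "G")


def translate(rna: str) -> str:
    """Translate codon by codon and stop at the first stop codon."""
    rna = decode_inosine(rna)
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical ORF: the third codon is the amber codon UAG.
unedited = "AUGGAUUAGCCUGAUUAA"
edited = "AUGGAUUIGCCUGAUUAA"   # A-to-I editing turns UAG into UIG

short_product = translate(unedited)  # terminates at UAG
long_product = translate(edited)     # reads through UIG as UGG (W)
```

In this toy case the unedited sequence yields the shorter product and the edited one a longer product from the same ORF, which mirrors the HDAg-S/HDAg-L relationship described above.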
2′-O-methylations. 2′-O-methylations (Nm) (the addition of a methyl group to the 2′-OH of a ribose) of RNA have been a subject of interest since the 1960s, when they were discovered and observed by means of radioactive labeling of RNA (83,84). Based on their position within the RNA molecule, 2′-O-methylations can be divided into two categories. The first category includes methylation of eukaryotic mRNA at the first and second nucleotide after the 5′ cap (85). These structures, called Cap1 and Cap2 based on the position of the methylated nucleoside (86), are responsible for efficiency of processing, translation, overall stability, and susceptibility to degradation of mRNA (87,88). Some viruses, such as coronaviruses, flaviviruses, orthomyxoviruses, and picornaviruses, rely on such a cap-dependent mechanism of translation. They can either use the host cell capping apparatus, snatch the caps from host mRNA (e.g., influenza), or code for their own capping machinery (89,90). While the aforementioned viruses produce 5′ capped RNA, the lack of 2′-O-methylation may still alert the cell to their presence (91).
In fact, the absence of 2′-O-methylation on the first nucleotide of the 5′ cap (Cap0) is strongly immunogenic. The cytoplasmic pattern recognition receptor Mda5 is activated through binding to the Cap0 RNA (92). The activated Mda5 interacts with the mitochondrial antiviral signaling proteins (MAVS) through its N-terminal caspase activation and recruitment domains (CARDs). Working in a multiprotein complex, the MAVS recruit the inhibitor of nuclear factor kappa-B kinase subunit epsilon (IKKε) and TANK-binding kinase 1 (TBK1). This leads to the phosphorylation and transport of interferon regulatory factors 3 and 7 (IRF3 and IRF7) into the nucleus, where they activate the transcription of type I interferon genes IFN-α and IFN-β (Fig. 2) (93-97).
Another type of molecule capable of recognizing the Cap0 structure belongs to the IFIT family (interferon-induced proteins with tetratricopeptide repeats) (98). These molecules serve not only as detectors but also as effectors capable of inhibiting the viral life cycle (99). In particular, IFIT1 competes with the eukaryotic initiation factor 4E (eIF4E), which is part of the eukaryotic initiation factor 4F (eIF4F) complex that binds to the 5′ cap of mRNA (100). eIF4E has a higher affinity than IFIT1 for the Cap1 and Cap2 structures, whereas IFIT1 has the higher affinity for the Cap0 structure. By binding to the viral RNA, IFIT1 aborts viral translation (101). It also inhibits the formation of the 43S-mRNA complex and blocks the recruitment of eIF3 to the ternary complex (102).
The second type of 2′-O-methylation is present in internal RNA sites. In HIV-1, these methylations are added to the viral RNA by hijacking the cellular methyltransferase FTSJ3. Cells in which the FTSJ3 methyltransferase is knocked down produce HIV-1 RNA with fewer methylations, and the virus induces higher expression of IFN-α and IFN-β (103).
It has been suggested and tested both in vitro and in mouse models that some RNA viruses may be attenuated by creating mutants lacking the Nm modification. This is done by creating recombinant viruses with a specific defect in the S-adenosylmethionine (SAM) binding site of the methyltransferase responsible for 2′-O-methylation. An infection with this recombinant virus elicits strong humoral and cellular immune reactions (104,105). One of the model viruses used for this type of vaccine research was the severe acute respiratory syndrome coronavirus (SARS-CoV). Mutations introduced into nonstructural protein 16 (nsp16) created an attenuated virus by preventing it from creating the 2′-O-methylation (106). This work also conclusively proved that viral Nm is an integral RNA modification necessary for a successful viral life cycle, and that viruses may utilize host methylating machinery to hide from the immune system. Given the recent outbreak of the disease COVID-19 associated with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), it is definitely worthwhile to continue examining the modifications possessed by this virus and to discover new ways of exploiting them for our benefit.
Pseudouridine. Pseudouridine (Ψ), often called the fifth nucleotide, is created by the isomerization of uridine. It is present in all RNA and in very high quantities in noncoding RNA (107). While the detection and mapping of pseudouridine in viral RNA are still in their infancy, and pseudouridine has yet to be detected in viral RNA, it has recently been reported that the enzyme pseudouridine synthase PUS7L is integral to the life cycle of HCV (108). An in vitro-prepared part of the polyU/UC RNA domain of HCV has been shown to act as a pathogen-associated molecular pattern that activates the pattern recognition receptor RIG-I and leads to IFN-β production (109). The complete replacement of uridine with pseudouridine in this transcript drastically decreased IFN-β production, even though the RNA motif still had a high affinity for the RIG-I molecule. Specifically, the RNA binds to RIG-I but fails to trigger the conformational change associated with the activation of the molecule, thus disrupting the IFN-β immune response at an early stage (62). As it may be an essential part of the viral life cycle and the evasion of the host immune response, further research into the effect of pseudouridine in viral RNAs is warranted.
Other RNA modifications. Recently, a study based solely on liquid chromatography-mass spectrometry (LC-MS) analysis of viral RNA from ZIKV, dengue virus, HCV, poliovirus, and HIV-1 reported that the genomic RNA of these viruses contains, respectively, 32, 39, 42, 41, and 36 different chemical RNA modifications. Apart from numerous RNA modifications that were never before reported in mammalian systems, N1-methyladenosine (m1A) was also detected in all the tested viruses (110). Shortly thereafter, a study mapping m1A in RNA from the viral particle of HIV-1 showed that all the detected m1A comes from tRNA copackaged in the viral particle, indicating that HIV-1 genomic RNA does not contain m1A (22). This is in line with the finding that m1A is a typical tRNA modification and its presence in other types of RNA is somewhat rare (111).
Using ultraperformance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) analysis of the purified genomic viral RNA of two retroviruses, HIV-1 (65) and murine leukemia virus (MLV) (66), it was recently shown that m1A, m1G, m5C, and m7G are present in viral RNA. These modifications, however, are present in tRNA(Lys) and tRNA(Pro), which serve as primers for the start of the reverse transcription of HIV-1 and MLV, respectively (13). It can be assumed that these tRNAs bind tightly to the genomic viral RNA. The sequencing libraries were prepared using the protocol for HIV-1 packageome analysis (112), which does not recover short RNAs (<50 nucleotides [nt]) or the highly structured tRNA. Even though the control sequencing analysis of the UPLC-MS/MS samples did not show any contamination from cellular RNA, the bound tRNAs could have been present and overlooked. It is important to note that, for example, 2-methylthio-N6-threonylcarbamoyladenosine (ms2t6A) causes a complete abortion of reverse transcription, and all the cDNA reads from tRNA(Lys) have an approximate length of only 36 nucleotides and thus are not included in the library (22). Therefore, it would be useful to map m1A or m1G in retroviral genomic RNA with a profiling technique to confirm the presence of these modifications. It is also important to note that modifications such as m1A and m1G would affect the function of the viral RNA because, unlike m6A or m5C, they disrupt traditional Watson-Crick base pairing. For example, they may weaken complementary pairing of the molecule and change its coding capacity.
Detection techniques. Before a thorough study of the functions of a particular RNA modification, its existence and sequence position must be determined. Known viral RNA modifications and the methods of their detection are summarized in Table 1. Prior to the era of transcriptome sequencing (RNA-seq)-based techniques (113), mass spectrometry (MS) and radioactive labeling were commonly used to discover new RNA modifications. Even today, MS remains a very important tool capable of confirming the presence of almost all the chemical modifications (114). Nevertheless, it does not allow for the determination of the exact position of an RNA modification within the RNA sequence. The common procedure comprises the isolation of very pure target RNA material, followed by its digestion into the form of nucleosides or nucleotides. Analysis by means of MS usually requires a larger amount of starting material (isolated RNA) compared with RNA-seq-based methods. Although MS is a direct method and does not suffer from amplification bias created during library preparation, its main limitation lies in the purification of a particular RNA. Because rRNA represents about 85%, and tRNA about 12%, of cellular RNA (115), contamination of mRNA or viral RNA with these very abundant RNAs sometimes causes false positives when detecting RNA modifications.
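The contamination problem can be made concrete with a small calculation. The sketch below assumes a simple linear mixing model, and every number in it is hypothetical, chosen only to show the arithmetic: the measured level of a modification is the purity-weighted average of its level in the target RNA and in the contaminating RNA.

```python
def apparent_modification_level(purity: float,
                                target_level: float,
                                contaminant_level: float) -> float:
    """Modification level measured by MS in a mixed RNA sample.

    purity            -- fraction of the digested nucleosides that come
                         from the target (e.g., viral) RNA, between 0 and 1
    target_level      -- true level in the target RNA (per 1,000 nt)
    contaminant_level -- level in the contaminating RNA (per 1,000 nt)
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie between 0 and 1")
    return purity * target_level + (1.0 - purity) * contaminant_level


# Hypothetical case: the viral RNA carries none of the modification, but
# 2% of the sample is tRNA carrying it at 10 residues per 1,000 nt.
measured = apparent_modification_level(purity=0.98,
                                       target_level=0.0,
                                       contaminant_level=10.0)
# measured is about 0.2 per 1,000 nt: a signal with no viral origin at all.
```

Under these assumed values, a modification that is entirely absent from the viral RNA would still appear in the spectrum, which is the kind of false positive described above.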
In contrast, RNA-seq methods allow for the determination of the exact position of RNA modifications within the entire transcriptome. The main disadvantage is the necessity of developing a specific capture/profile technique for every RNA modification. Once such a method is available, captured RNA is reverse transcribed into cDNA (which does not contain any modifications) and then amplified. The majority of methods rely on selective antibodies against m6A (m6A-seq [20], MeRIP-seq [18], and miCLIP [116]) or m5C (m5C-RIP [117]). The main issue with antibody-based methods is the lack of specificity and efficiency of the antibodies used. Nonspecific binding of the antibodies often introduces significant bias into the results, so a careful approach is required (118). To overcome this problem, alternative techniques combined with next-generation sequencing have been developed for m6A profiling, such as employment of RT-Klentaq DNA polymerase (119) or, more recently, MAZTER-seq (120). Other techniques for the detection of m6A, such as SCARLET, require prior knowledge of the position of m6A (121). Another antibody-independent method is called DART-seq (deamination adjacent to RNA modification targets). This uses the m6A-binding domain YTH fused to the cytidine deaminase APOBEC1. The C nucleotides next to the m6A are deaminated into U, which is subsequently recognized using RNA-seq (122). The development of selective chemical techniques for m6A profiling, which would overcome all the disadvantages of the previous methods, is limited by the similar chemical structure and similar reactivity of m6A to canonical adenosine.
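The readout logic of a deamination-based method such as DART-seq can be sketched in a few lines. This is an illustration only, not the published analysis pipeline: it assumes a single read already aligned to the reference without gaps, and it assumes that the informative C lies immediately 3′ of the modified A. The function name and the sequences are hypothetical.

```python
def candidate_m6a_sites(reference: str, read: str) -> list[int]:
    """Return 0-based positions of A residues that sit directly 5' of a
    reference C that appears as U (or T in cDNA) in the aligned read."""
    if len(reference) != len(read):
        raise ValueError("reference and read must be aligned to equal length")
    sites = []
    for i in range(1, len(reference)):
        deaminated = reference[i] == "C" and read[i] in ("U", "T")
        if deaminated and reference[i - 1] == "A":
            sites.append(i - 1)
    return sites


# Hypothetical sequences: the C at position 3 was deaminated to U,
# while the C at position 8 was left untouched.
reference = "GGACUGGACA"
read = "GGAUUGGACA"
sites = candidate_m6a_sites(reference, read)
```

Here only the A at position 2 is reported as a candidate. A real analysis would aggregate many reads and compare against a control lacking a functional YTH domain or deaminase.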
While there is an antibody-based approach for the detection of m5C, bisulfite sequencing is also frequently used. It relies on the selective chemical reaction of canonical cytidine and 5-methylcytidine and is a functional alternative to the aforementioned antibody-based techniques (123-126). Relatively harsh reaction conditions, however, may destroy fragile RNA molecules, and other modifications (such as N4,2′-O-dimethylcytidine) are sometimes mistaken for m5C. Moreover, the standard RNA bisulfite protocol has been shown to generate false-positive results when working with highly structured RNA (63). Nevertheless, the method has been used effectively to detect m5C in mouse embryonic and brain polyA RNA (127) and, recently, a modified bisulfite sequencing method called RBS-seq was introduced for simultaneous detection of Ψ, m1A, and m5C in a transcriptome-wide manner (128). To avoid the drawbacks of the aforesaid techniques, metabolic labeling methods were developed to detect both m6A (PA-m6A-seq [129] or metabolic propargyl labeling [130]) and m5C (Aza-IP [131] and miCLIP [132]). In general, the main disadvantage of metabolic labeling using propargyl, 5-azacytidine, or 4-thiouridine is that it introduces a major type of stress to the cell, such that the results do not represent the state of a healthy system.
Even though there are currently no antibody-based methods to detect inosine, a comparison of genomic sequences with the corresponding cDNA reveals its position within the RNA molecule. Inosine pairs with cytidine, and the cDNA thus contains a guanosine in its place (133). However, the method is prone to false positives, as it does not distinguish mapping errors, alignment errors, or single-nucleotide polymorphisms (134,135). As an alternative, a chemical method called ICE-seq (RNA-seq based on the selective cyanoethylation of inosine) was developed in 2010 (136,137).
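The genome-versus-cDNA comparison for inosine can be sketched as follows. The example assumes two sequences of equal length that are already aligned without gaps and are written in the same orientation; the function name, the sequences, and the SNP list are all hypothetical. Filtering known polymorphisms addresses only one of the false-positive sources mentioned above.

```python
def candidate_inosine_sites(genomic: str, cdna: str,
                            known_snps: frozenset = frozenset()) -> list[int]:
    """Return 0-based positions where the genome has A and the cDNA has G,
    skipping positions listed as known single-nucleotide polymorphisms."""
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for pos, (g_base, c_base) in enumerate(zip(genomic, cdna)):
        if g_base == "A" and c_base == "G" and pos not in known_snps:
            sites.append(pos)
    return sites


# Hypothetical aligned sequences with A-to-G mismatches at positions 2 and 6.
genomic = "TTAGCAAT"
cdna = "TTGGCAGT"

all_candidates = candidate_inosine_sites(genomic, cdna)
filtered = candidate_inosine_sites(genomic, cdna, known_snps=frozenset({6}))
```

Without the SNP list both mismatches are reported; with position 6 annotated as a polymorphism only position 2 remains. Mapping and alignment errors would still need separate handling, for example through replicate support or read-depth thresholds.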
There are no antibodies against 2′-O-methylation, and the modification is fairly unreactive. There are also two types of 2′-O-methylations distinguished by their position within the RNA molecule, where one is a part of the 5′ cap (85) and the other in internal RNA sites. The 2′-O-methylation within the cap structure can be detected mainly using UPLC-MS/MS (138). Identification of methylated internal sites is more complicated, and the developed techniques rely on a higher stability of the methylated position under basic conditions (RiboMeth-seq [139,140]) or during oxidation (NaIO4) and β-elimination (RiboOxi-seq [141] and Nm-seq [142]). It was discovered, however, that in the case of less abundant RNA, Nm-seq is prone to mispriming and false positives (143).
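The principle behind a protection-based readout such as RiboMeth-seq can be illustrated with a simplified score. The sketch assumes that fragmentation under basic conditions produces read ends at every unprotected position, so a methylated position shows a dip in read-end counts relative to its neighbours. The scoring formula, the function name, and the counts are illustrative assumptions, not the scores used in the published method.

```python
def protection_scores(end_counts: list[int], flank: int = 2) -> list[float]:
    """For each position, compare its read-end count with the mean count
    of up to `flank` neighbours on each side. A score near 1 means strong
    protection from cleavage; a score of 0 means no protection."""
    n = len(end_counts)
    scores = []
    for i in range(n):
        lo, hi = max(0, i - flank), min(n, i + flank + 1)
        neighbours = [end_counts[j] for j in range(lo, hi) if j != i]
        mean = sum(neighbours) / len(neighbours) if neighbours else 0.0
        if mean == 0:
            scores.append(0.0)
        else:
            scores.append(max(0.0, 1.0 - end_counts[i] / mean))
    return scores


# Hypothetical read-end counts with a clear dip at position 3.
counts = [100, 95, 105, 5, 98, 102, 100]
scores = protection_scores(counts)
best_position = scores.index(max(scores))
```

In this toy profile the dip at position 3 gives the highest score. With low-abundance RNA, sparse counts would make such ratios unstable, which is one reason coverage thresholds matter in practice.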
In the future, all the problems caused by classical next-generation sequencing-based methods might be overcome with direct nanopore sequencing. This technique has the potential to detect RNA modifications directly without the need for reverse transcription and amplification. It has already been used in the sequencing of influenza virus RNA (144). In nanopore sequencing, the RNA moves through a pore and disrupts the electric current around it, causing so-called squiggles. Theoretically, every base and every modification disrupts the current differently and can thus be identified (145). Problems with alignment and current intensity changes, however, have prevented the creation of a successful detection algorithm. This issue can be circumvented by analyzing base-calling errors for some modifications, such as m 6 A, by comparing the target RNA with a nonmodified (or severely depleted) control RNA, or by employing artificial intelligence (AI) (146,147).
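The control-comparison strategy can be sketched as a per-position difference in base-calling error rates. This is a minimal illustration under stated assumptions: per-position mismatch counts and read depths have already been extracted from the alignments, and the thresholds, names, and numbers are hypothetical rather than taken from any published tool.

```python
def differential_error_sites(sample_errors: list[int],
                             sample_depth: list[int],
                             control_errors: list[int],
                             control_depth: list[int],
                             min_delta: float = 0.2,
                             min_depth: int = 20) -> list[int]:
    """Return 0-based positions where the base-calling error rate in the
    target RNA exceeds that of the unmodified control by at least
    `min_delta`, considering only positions with sufficient depth."""
    sites = []
    columns = zip(sample_errors, sample_depth, control_errors, control_depth)
    for pos, (s_err, s_dep, c_err, c_dep) in enumerate(columns):
        if s_dep < min_depth or c_dep < min_depth:
            continue  # too few reads to estimate an error rate
        delta = s_err / s_dep - c_err / c_dep
        if delta >= min_delta:
            sites.append(pos)
    return sites


# Hypothetical counts: position 1 is miscalled far more often in the sample.
flagged = differential_error_sites(
    sample_errors=[2, 30, 3], sample_depth=[100, 100, 100],
    control_errors=[3, 4, 2], control_depth=[100, 100, 100],
)
```

Only position 1 is flagged in this example. A fixed threshold is the simplest possible choice; a statistical test per position would be the more defensible option on real data.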
Conclusion and outlook. While internal RNA modifications in mRNA or viral genomic RNA and mRNA do exist, they are not as diverse and abundant as many believed in 2012, when the field of epitranscriptomics was established. Nevertheless, the chemical modifications of viral RNA described above clearly play a role in the viral life cycle, in interactions with host innate immunity, and in the distinction between self and nonself RNA. Although several attempts have been made to exploit one of these modifications in order to create an attenuated vaccine in several viruses, to the best of our knowledge, no attempts to target the other modifications are currently being made. In comparison with internal modifications, new discoveries of modifications, such as the 5′ diphosphate termini in reoviruses, have shown that there is still much to be learned about the 5′ RNA moieties. The 5′ diphosphate RNA is recognized by RIG-I-like receptors in the cytoplasm and starts an antiviral cascade like the one described for the Cap0 structure (107). Recently, a new mass spectrometry detection method called CapQuant has been used to identify several new types of RNA caps in purified dengue virions, including FAD (flavin adenine dinucleotide); UDP-Glc; UDP-GlcNAc; and m7Gpppm6A (108). We have recently described a new class of RNA caps in bacteria; dinucleoside polyphosphates were discovered using a combination of biochemical methods together with mass spectrometry (148). Given that some dsDNA viruses (e.g., poxviruses) encode their own Nudix enzymes that may process the 5′ dinucleoside polyphosphate RNA caps, the existence of such alternative caps in viral RNA cannot be ruled out (149,150). The roles of most of these modifications and whether they are present in viral RNA remain to be determined.
There is also a need for a careful and controlled approach when preparing RNA samples in order to generate reproducible and trustworthy data, or when applying a particular detection technique. Although new detection methods are constantly being developed based on ingenious new techniques, such as mutated enzymes that specifically interact with given modifications (120,122), AI analysis of collected data, together with nanopore sequencing, seems the most promising. It is clear that viral RNA and the roles played by its modifications still hold a number of secrets. Thanks to the high abundance of viral RNA molecules in infected cells, they may well be a crucial model for understanding the role of similarly modified cellular RNA and further expanding our knowledge in the field of RNA modifications.