Epitranscriptomics: Toward A Better Understanding of RNA Modifications

Since the first RNA nucleoside modification was characterized in 1957 [1], more than 100 distinct chemical modifications have been identified in RNA [2]. Most of these modifications were characterized in non-coding RNAs (ncRNAs), including tRNA, rRNA, and small nuclear RNA (snRNA) [3]. Studies over the past few decades have located various modifications in these ncRNAs and revealed their functional roles [3]. For instance, N1-methyladenosine (m1A), which is typically found at position 58 in the tRNA T-loop of eukaryotes, stabilizes tRNA tertiary structure [4] and affects translation by regulating the association between tRNA and polysomes [5]. Pseudouridine (Ψ) in snRNA can fine-tune branch site interactions and affect mRNA splicing [6], whereas Ψ in rRNA is required for binding to the internal ribosome entry site (IRES) and hence for ensuring translational fidelity [7]. 5-methylcytidine (m5C) in tRNA can stabilize the secondary structure and maintain the anticodon stem-loop conformation [8,9]. Many of these modifications in ncRNAs are conserved across species, further indicating their biological significance [10]. Interestingly, N6-methyladenosine (m6A) was found in mRNA several decades ago [11], and its transcriptome-wide distribution has been mapped recently [12,13] (Figure 1, Table 1). m6A in mRNA has been shown to regulate gene expression through multiple pathways, including effects on the stability, translation, splicing, and secondary structure of RNA molecules [54–58]. Besides m6A, several other modifications, including N6,2′-O-dimethyladenosine (m6Am), inosine (I), m5C, Ψ, m1A, and 5-hydroxymethylcytidine (hm5C), have also been identified and mapped in mRNA and long ncRNA (lncRNA) [3,58,59] (Figure 1, Table 1). Recent studies have uncovered several potential biological roles for these mRNA modifications, such as Ψ-mediated translational read-through [50], m1A-associated translational regulation [51], and inosine-induced recoding [42] (Table 1). Taken together, studies on RNA modifications have led to the emergence of a new field, "epitranscriptomics", which aims to identify modifications in the transcriptome and unravel their regulatory roles in biological processes [60].

Studies in the field of epitranscriptomics have revealed the functional importance of RNA modifications, yet many questions remain to be answered. While recent progress in epitranscriptomics has been summarized in many excellent reviews [3,54–58,61,62], we choose to discuss future research directions that could further advance our understanding of the field (Figure 1).

Improvement of epitranscriptome mapping technology
Interest in the functional studies of RNA modifications has been revived in recent years. The majority of these studies depend heavily on existing transcriptome-wide mapping methods that use next-generation sequencing (NGS) technologies [59]. While the practicability of current modification sequencing technologies has been demonstrated by the success of recent NGS-based epitranscriptomics studies [3,54–58,61,62], improving these technologies may further boost the study of modifications in mRNA and lncRNA. Firstly, technologies that rely solely on antibody-based RNA immunoprecipitation (RIP) cannot achieve single-base resolution [62]. With the mapping technologies for hm5C and m1A, for example, most modified sites can only be located to within a window wider than 50 nucleotides [38,51,52]. Knowing the exact position of a modification allows us to deduce its influence on the local RNA structure and to investigate its downstream effects, such as alterations in protein binding [23]. Single-base resolution mapping also facilitates interrogation of the potential influence of mRNA modifications on biological processes like translational recoding and microRNA (miRNA) binding. Secondly, technologies that allow absolute stoichiometric quantification are still needed. In general, biologically essential modified sites, such as m1A at position 1322 of 28S rRNA and at position 58 of tRNA(Met) in humans, are modified with high stoichiometry [4,63]. With stoichiometric information for the modified sites, researchers will be able to evaluate their biological significance and prioritize highly modified sites for investigation. Of note, Molinie et al. developed a RIP-based method, namely m6A-level and isoform-characterization sequencing (m6A-LAIC-seq).
m6A-LAIC-seq is capable of detecting m6A in a quantitative manner [29], indicating that quantitative information can be extracted by a RIP-based approach when an antibody with high specificity and sensitivity is available and the immunoprecipitation procedure is sufficiently optimized for full-length RNA pull-down. Thirdly, technologies for detecting the "co-existence" of modifications (either identical or different) in one transcript would help reveal potential interplay among different modifications. However, this can hardly be achieved with current NGS-based technologies, mainly because of the unavoidable RNA fragmentation step in NGS sample preparation. Fourthly, current methods for transcriptome-wide modification mapping generally need more than 5 μg of mRNA [12,33,46–48,52,53]. If the required amount of starting mRNA could be reduced to the nanogram level, epitranscriptomic studies would become feasible in samples with limited availability (e.g., clinical samples). Besides the widely used NGS-based approaches, the rapidly evolving third-generation sequencing (TGS) may provide a better approach for transcriptome-wide modification mapping. Single-molecule technologies, including PacBio's single molecule real-time (SMRT) sequencing and Oxford Nanopore's nanopore sequencing, have the potential to directly detect modifications in the original nucleotide molecule [64,65]. These technologies have already been applied to profile DNA modifications in mammalian and bacterial systems [66,67].
Since the signal for every single nucleotide can be captured by TGS technologies, researchers would be able to detect all the modifications (either identical or different types) at single-base resolution within an RNA molecule. Moreover, because PCR amplification, which may introduce bias [66], is no longer required, quantitative information for the modified sites can also be retrieved. However, TGS technologies are still under development, and their applicability to RNA modification detection remains to be seen.

Table 1 (partial)
m6A
  ... [17–21]
  Distribution: enriched in 3′ UTR and near stop codon [12,13]; last exon [22]
  Function: affecting RNA structure [23]; reducing mRNA stability [24]; enhancing mRNA translation [25]; accelerating mRNA export [20]; affecting pre-mRNA processing [26]
m6Am
  Formation: first methylated by 2′-O-MTase (CMTR1, CMTR2) to form Am [27], then further methylated at the N6 position by an unidentified nucleocytoplasmic methyltransferase [28]
  Abundance: m6Am/all nucleosides: ~0.003% [29]
  Distribution: exclusively TSS [30]
  Function: enhancing mRNA stability [31]
m1A
  Distribution: enriched in 5′ UTR and near start codon [51,52]
  Function: enhancing mRNA translation [51]; affecting RNA structure [53]
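The need for stoichiometric quantification discussed in this section can be illustrated with a toy calculation. The sketch below assumes a RIP-style experiment in which both the antibody-bound (eluate) and unbound (supernatant) fractions are sequenced, and in which equal amounts of a spike-in RNA are added to each fraction so that read counts can be put on a common scale. The class and function names are hypothetical, and this is not the published m6A-LAIC-seq pipeline; it only shows how a per-transcript modification level could be estimated from such counts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FractionCounts:
    """Read counts for one transcript in one fraction (hypothetical container)."""
    transcript_reads: int
    spike_in_reads: int  # reads from a spike-in assumed added equally to each fraction


def normalized_abundance(counts: FractionCounts) -> float:
    """Scale transcript reads by spike-in reads to make fractions comparable."""
    if counts.spike_in_reads <= 0:
        raise ValueError("spike-in reads must be positive to normalize")
    return counts.transcript_reads / counts.spike_in_reads


def modification_level(eluate: FractionCounts,
                       supernatant: FractionCounts) -> Optional[float]:
    """Estimate the modified fraction of a transcript as E / (E + S).

    Returns None when the transcript is undetected in both fractions.
    """
    e = normalized_abundance(eluate)
    s = normalized_abundance(supernatant)
    total = e + s
    if total == 0:
        return None
    return e / total


# Usage: 300 reads in the eluate and 100 in the supernatant, same spike-in depth.
level = modification_level(FractionCounts(300, 100), FractionCounts(100, 100))
print(level)
```

The estimate is only as good as the assumptions behind it: it presumes near-complete pull-down of modified full-length transcripts and negligible non-specific binding, which is why the text stresses antibody specificity and an optimized immunoprecipitation procedure.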

Development of bioinformatics tools
Most transcriptome-wide modification studies generate a large amount of sequencing data, so bioinformatic tools are greatly needed to aid data analysis. In particular, RIP-based transcriptome-wide mapping technologies are commonly used in epitranscriptomic studies. However, most of the currently available peak calling algorithms, such as the zero-inflated negative binomial algorithm (ZINBA) [67], model-based analysis of ChIP-seq (MACS) [68], and HPeak [69], were designed for ChIP-seq or DNase-seq data [70,71]. Unlike such DNA sequencing data, RNA sequencing data generated from RIP assays are biased by read depletion at both transcript termini as a result of fragmentation [66]. Moreover, the alignment of epitranscriptomic data can be complicated by the multiple isoforms transcribed from a single genomic locus. Therefore, peak calling tools specifically designed for RIP data are needed. Of note, exomePeak is an RNA-seq based approach for detecting and visualizing antibody-based epitranscriptome data that takes these issues into account [72]. However, for certain antibody-based technologies, the modifications cause misincorporation or truncation during reverse transcription, providing additional information that can improve the resolution [30,51,52,73]. Thus, an optimized algorithm integrating both peak detection and reverse transcription signature capture could further facilitate modification mapping analysis. In addition, several modifications, including m6A, Ψ, and m1A, affect Watson-Crick base pairing and hence influence the local RNA structure [23,48,53]. Given that RNA structure can affect protein binding and alter the fate of RNA molecules [23], software for evaluating the local structure and energy changes caused by modifications would be useful as well.
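As a minimal sketch of what integrating peak detection with reverse transcription signatures might look like, the Python example below first flags windows where immunoprecipitation (IP) coverage is enriched over input, and then narrows each window to the position with the highest mismatch rate. All function names, thresholds, and the fixed-window fold-enrichment test are illustrative assumptions, not the method of exomePeak or any other published tool; a real implementation would also need library-size normalization, a statistical model, and isoform-aware coordinates.

```python
from typing import List, Optional, Tuple


def call_enriched_windows(ip_cov: List[int], input_cov: List[int],
                          window: int = 5, min_fold: float = 2.0,
                          pseudocount: float = 1.0) -> List[Tuple[int, int]]:
    """Return non-overlapping (start, end) windows where IP/input >= min_fold.

    Coverage vectors are assumed to be already normalized for library size.
    """
    if len(ip_cov) != len(input_cov):
        raise ValueError("coverage vectors must have the same length")
    windows = []
    for start in range(0, len(ip_cov) - window + 1, window):
        end = start + window
        fold = (sum(ip_cov[start:end]) + pseudocount) / \
               (sum(input_cov[start:end]) + pseudocount)
        if fold >= min_fold:
            windows.append((start, end))
    return windows


def refine_with_mismatches(peak: Tuple[int, int], ip_cov: List[int],
                           mismatches: List[int], min_rate: float = 0.1,
                           min_cov: int = 10) -> Optional[int]:
    """Pick the position in a peak with the highest mismatch rate, if any.

    Returns None when no position passes the coverage and rate thresholds,
    in which case the site stays at peak-level resolution.
    """
    start, end = peak
    best_pos, best_rate = None, 0.0
    for pos in range(start, end):
        if ip_cov[pos] < min_cov:
            continue
        rate = mismatches[pos] / ip_cov[pos]
        if rate >= min_rate and (best_pos is None or rate > best_rate):
            best_pos, best_rate = pos, rate
    return best_pos


# Usage on a toy 15-nt transcript with one enriched window.
ip = [2, 2, 2, 2, 2, 20, 30, 40, 30, 20, 2, 2, 2, 2, 2]
inp = [2] * 15
mm = [0] * 15
mm[6], mm[7] = 3, 12  # mismatch counts observed in IP reads
peaks = call_enriched_windows(ip, inp)
sites = [refine_with_mismatches(p, ip, mm) for p in peaks]
print(peaks, sites)
```

Keeping the two steps separate means that a peak without a convincing mismatch signature is still reported, only at lower resolution, which mirrors the situation described above for antibody-based methods.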

Identification of writer, eraser, and reader proteins of modifications
For modification studies, it is important to identify the enzymes responsible for the installation and removal of a modification (known as "writer" and "eraser", respectively), as well as the modification-binding proteins ("readers"). By manipulating the expression of the writer or eraser (if any) for a modification, we can observe the global change in the modification and further investigate the downstream biological effects. For example, many functional studies of m6A, including its regulation of mRNA stability, translation efficiency, and exon inclusion, relied on manipulating the methyltransferases METTL3/METTL14/WTAP [55–59]. Of note, these regulatory roles of m6A were found to be mediated by its readers, mainly the YT521-B homology (YTH) domain-containing proteins [12,55–59]. Thus, for other newly mapped mRNA modifications besides m6A, identifying their writer, eraser, and reader proteins would be important for exploring their functions and revealing the underlying mechanisms. The initial discovery of m6A readers relied on an RNA pull-down assay using synthesized oligonucleotides containing m6A [12]. A similar approach may be adapted to identify readers for other modifications. As for the screening of writer proteins, purifying the fractions of a cell extract that are necessary for modifying activity could be a practical approach. Enzymatic activity can be evaluated through in vitro incubation of an unmodified RNA oligonucleotide with different fractions of the cell extract. This strategy proved successful in discovering METTL3 (also known as MT-A70), one component of the m6A writer [74]. In addition, modification enzymes may also be identified de novo through bioinformatic prediction based on the conservation of the protein domains involved, followed by experimental validation using assays such as knock-down or overexpression [15].
However, for a modification like Ψ, which has multiple writers, functional study can be quite complicated because of the redundancy and potential interplay of the modifying enzymes. In the case of m5C, whose writer can target both mRNA and tRNA, it can be difficult to distinguish effects caused by modifications in mRNA from those caused by modifications in tRNA. Under such circumstances, assays like photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) or individual-nucleotide resolution CLIP (iCLIP) could be very useful for identifying the global targets of a writer.

Establishment of comprehensive database for epitranscriptomics
A comprehensive database for curating and sharing epitranscriptomic data should be established. The National Institutes of Health (NIH) has built the "Roadmap Epigenomics Mapping Consortium" to integrate public resources for epigenomics studies [75]. This consortium also aims to standardize experimental and computational procedures to close the gaps between different studies. Accordingly, as more important roles of RNA modifications are revealed and the area receives broader attention, a similar project or database should be established for epitranscriptomics in the near future. This would help to facilitate and standardize the study of RNA modifications, hence accelerating the development of this area.

Investigation on the functions of modifications
Since m6A is the first RNA modification to be profiled transcriptome-wide and to have its writer, eraser, and reader identified, the functional roles of this methylation have received extensive attention in the field. Hopefully, other mRNA modifications will come to share the limelight. In fact, the potential functions of a modification can be hinted at by its distribution pattern. For example, m1A is highly enriched in 5′ UTRs and around start codons in both humans and mice [51,52], suggesting that m1A may be involved in translational regulation. The intrinsic chemical properties of a modification can also provide clues for generating new hypotheses. Modifications occurring on the Watson-Crick edge, such as m5C and m1A, would affect normal base pairing, hence altering the local RNA structure or causing recoding of the translated peptides [35,53]. On the other hand, besides the widely used cultured cells, animal or disease models that are deficient in writers, readers, or erasers also need to be established. These systems could facilitate investigation of the physiological roles of a particular modification, such as its relation to fertility, differentiation, and pathogenicity [16,20,76–78]. Hopefully, with the help of these model systems, we can achieve a deeper understanding of the molecular mechanisms underlying pathogenesis and develop therapies for the diseases associated with these RNA modifications. Furthermore, the dynamic pattern of a particular modification may have the potential to serve as a biomarker for monitoring disease status, provided that a correlation between the disease and the modification can be demonstrated. In fact, DNA modifications have already been used as biomarkers for the early detection of cancer [79].
As m6A has been suggested to be connected with cell differentiation and cancer formation [80], the possibility of using mRNA modifications as biomarkers could be tested in the near future.

Integral study of different types of modifications
The study of the epitranscriptome could evolve toward an integrated approach that comprehensively considers the effects of multiple types of modifications in the transcriptome. Most current studies of the epitranscriptome focus on a single modification. However, various modifications in RNA may interact with each other, forming "networks" that modulate physiological pathways. For instance, multiple types of modifications, including m6A, m1A, and Ψ, have been identified in MALAT1 [23,45,51,52], a lncRNA localized to nuclear speckles; yet it remains unknown whether these different types of modifications affect the installation of each other, or whether they work together to maintain the structure or stability of MALAT1. With the development of transcriptome-wide mapping technologies in the foreseeable future (especially the TGS technologies), integrated studies of RNA modifications could come into play.
Collectively, the functional studies of m6A have brought an initial period of prosperity to the emerging field of epitranscriptomics; in the meantime, several other modifications have also been identified and mapped in mRNAs and lncRNAs. We envision that with more effort invested in this field, researchers will gain a more comprehensive understanding of RNA modifications and their functions.