YTH Domain: A Family of N6-methyladenosine (m6A) Readers

Like proteins and DNA, different types of RNA molecules undergo various modifications. Accumulating evidence suggests that these RNA modifications serve as sophisticated codes to mediate RNA behaviors and many important biological functions. N6-methyladenosine (m6A) is the most abundant internal RNA modification found in a variety of eukaryotic RNAs, including but not limited to mRNAs, tRNAs, rRNAs, and long non-coding RNAs (lncRNAs). In mammalian cells, m6A can be incorporated by a methyltransferase complex and removed by demethylases, which makes the m6A modification reversible and dynamic. Moreover, m6A is recognized by the YT521-B homology (YTH) domain-containing proteins, which subsequently direct different complexes to regulate RNA signaling pathways, such as RNA metabolism, RNA splicing, RNA folding, and protein translation. Herein, we summarize the recent progress made in understanding the molecular mechanisms underlying m6A recognition by YTH domain-containing proteins, which should shed new light on m6A-specific recognition and provide clues to the future identification of reader proteins of many other RNA modifications.


Introduction
The central dogma explains how genetic information is transferred from DNA to RNA to protein [1]. It is well known that epigenetic marks on the nucleosome, including histone modifications and DNA methylation (5-methylcytosine), play important roles in gene regulation by mediating gene transcription events [2,3]. In addition to DNA and protein, RNA molecules can also be modified. To date, more than 100 modifications have been identified in different types of eukaryotic RNAs, including mRNAs, tRNAs, and non-coding RNAs (ncRNAs) [4]. In contrast to the well-studied epigenetic marks, the exact biological roles of most of the identified RNA modifications are largely unknown. N6-methyladenosine (m6A), which was discovered in a wide range of cellular RNAs in the 1970s [5][6][7], is the most prevalent internal RNA modification and is present in a GAC or AAC motif within almost all types of eukaryotic RNAs examined [8] as well as viral RNAs [9][10][11][12][13][14]. On average, there are 3-5 m6A sites in each mRNA molecule [15]. m6A has been attracting considerable attention because of its important roles in gene regulation [16], genome stability maintenance [17], as well as cell renewal and differentiation [18]. Recent advancements in crosslinking and immunoprecipitation (CLIP) technologies have made it possible to accurately locate this specific mark in cellular RNAs [19].
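As a rough illustration of the sequence context mentioned above, the following minimal Python sketch lists every GAC or AAC trinucleotide in an RNA sequence and reports the position of the central adenosine. The function name and the example sequence are hypothetical, and a motif match only marks a candidate site; it does not indicate that the adenosine is actually methylated.

```python
import re

# Sequence contexts named in the text: the candidate adenosine is the
# middle base of GAC or AAC. A lookahead is used so that overlapping
# occurrences are all reported.
M6A_CONTEXT = re.compile(r"(?=([GA]AC))")


def candidate_m6a_sites(rna: str) -> list[tuple[int, str]]:
    """Return (0-based index of the central A, motif) for each GAC/AAC hit.

    Illustrative helper only: a hit is a candidate context, not evidence
    of methylation.
    """
    seq = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start() + 1, m.group(1)) for m in M6A_CONTEXT.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical example sequence, not taken from any real transcript.
    print(candidate_m6a_sites("GGACUAAACUGACC"))
```

Experimental mapping, for example by the CLIP-based approaches cited above, is still required to decide which of these candidate positions carry the modification.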
Similar to other epigenetic modifications, m6A is dynamic and reversible, established mainly by the METTL3-METTL14 methyltransferase complex [20,21] and removed by demethylases including the fat mass and obesity-associated protein (FTO) [22] and AlkB homolog 5 RNA demethylase (ALKBH5) [23]. Although both METTL3 and METTL14 adopt a canonical fold similar to that of other methyltransferases [20], only METTL3 can bind to the methyl donor S-adenosyl methionine (SAM or AdoMet), whereas METTL14 acts to modulate the activity of METTL3 and binds to the RNA substrate instead [20,24,25]. FTO and ALKBH5 are the only two known m6A demethylases found in humans and they both belong to the α-ketoglutarate-dependent dioxygenase family [23,26]. Interestingly, both FTO and ALKBH5 discriminate single-stranded RNA (ssRNA) from double-stranded RNA (dsRNA), by a unique insertion in the case of FTO and by a loop rigidified by a disulfide bond in the case of ALKBH5 [27][28][29][30]. In addition, ALKBH5 displays comparable activities toward m6A-modified ssRNA and N6-methyldeoxyadenosine (6mA)-modified ssDNA [30]. Although the in vivo biological relevance of 6mA ssDNA demethylation by ALKBH5 remains unknown, 6mA has been identified in eukaryotic genomes by several groups [31][32][33].
The regulatory role of m6A on RNA molecules is similar to that of epigenetic marks on chromatin [34], and could be achieved in two ways, i.e., cis and trans. In the cis mode, the effect of m6A on the RNA structure is similar to that of epigenetic marks on the nucleosome. Incorporation of the methyl moiety at the N6 atom of adenosine renders the m6A-U pair energetically unfavorable [35], which may destabilize the stem loop where it resides and lead to further global conformational rearrangement of the RNA [8]. In addition, m6A can mediate RNA functions in a trans mode through the recruitment of specific proteins or protein complexes [8].
The YT521-B homology (YTH) domain serves as the module for recognizing m6A in a methylation-dependent manner [36][37][38]. There are five YTH domain-containing proteins in humans, namely, YTHDC1, YTHDC2, YTHDF1, YTHDF2, and YTHDF3. YTHDF2 was the first protein whose m6A-associated function was well studied [33]. After being targeted to a specific site via m6A recognition, YTHDF2 recruits the CCR4-NOT deadenylase complex to destabilize and further decay target mRNAs (Figure 1) [37,39]. Binding of YTHDF1 to m6A-modified mRNA increases the translation efficiency of the mRNA independent of the m7G cap (Figure 1) [40]. YTHDC1, the only known m6A reader in the nucleus, has been reported to be involved in exon selection during gene splicing (Figure 1) [17]. YTHDC2 is a putative RNA helicase [41,42] that forms a complex with the meiosis-specific coiled-coil domain-containing protein (MEIOC) to regulate RNA levels during meiosis through recognizing m6A by its YTH domain (Figure 1) [42]. Thus, by targeting different complexes to specific sites via direct binding to m6A, the YTH domain-containing proteins participate extensively in post-transcriptional regulation by regulating splicing, translation, localization, and lifetime of RNAs (Figure 1) [43]. By reading and interpreting the m6A mark, these proteins play important roles in gene regulation, DNA repair, and cell fate determination [44].
The specific m6A recognition mode of the YTH domain remained largely unknown until the structure of the first human YTH complex, the YTHDC1 YTH domain with the 5-mer GG(m6A)CU RNA, was solved in 2014 [38]. Immediately thereafter, several structures of human YTH domain-containing proteins were reported, including the YTH domains of YTHDF1 and YTHDF2 with their respective m6A-modified RNA ligands, the structure of the YTHDF2 YTH domain alone, and one nuclear magnetic resonance structure of the YTHDC1 complex [45][46][47][48]. Besides the YTH family proteins, other RNA-binding proteins (RBPs), such as heterogeneous nuclear ribonucleoproteins A2/B1 (HNRNPA2B1) [49], embryonic lethal, abnormal vision-like protein 1 (ELAVL1) [50], and insulin-like growth factor 2 mRNA-binding proteins 1-3 (IGF2BP1-3) [51], have also been suggested to be potential m6A-binding proteins, albeit awaiting further confirmation. We aim to summarize the progress made in unraveling the structural features of the YTH family proteins, including the m6A-binding specificity and sequence selectivity. Furthermore, we also provide mechanistic insights into the search for new m6A reader proteins based on known rules of m6A recognition.

Human YTH domain-containing proteins
The YTH domain is present in 174 different proteins and is evolutionarily conserved across eukaryotic species [52]. Early functional studies of YTH domain-containing proteins, such as YT521-B [53] and Mmi1 [54,55], implied potential roles in RNA metabolism. Although YTH domain-containing proteins are putative RBPs, their exact binding ligands remained unknown until two reports showed that mammalian YTH family members are candidate m6A readers [36,37]. A search of the human genome identifies five YTH domain-containing proteins, namely, YTHDF1-3 and YTHDC1-2, all of which are conserved in mammalian genomes (Figure 2A). On the basis of their primary sequences and domain organizations, these five human YTH domain-containing proteins can be classified into three categories: YTHDC1 (DC1 family), YTHDC2 (DC2 family), and YTHDF1-3 (DF family) (Figure 2A). YTHDC1 is a nuclear protein involved in gene splicing, whereas YTHDF1-3 are cytoplasmic m6A readers [37]. YTHDC2 is a putative RNA helicase that, aside from the YTH domain, contains the helicase domain, ankyrin repeats, and a DUF1065 domain (Figure 2A), and may act as a scaffold molecule in regulating spermatogenesis [41].

Structural features of the YTH complexes
Although all five human YTH domain-containing proteins share a homologous YTH domain, the biological functions of these proteins remained unknown until YTHDF2 was reported to affect the lifetimes of mammalian mRNAs through recognizing m6A by its YTH domain [37], indicating that the YTH domain serves as the m6A-binding module. Subsequent determination of the crystal structures of the YTHDC1 YTH domain, alone and together with GG(m6A)CU RNA, helped unravel the mechanisms underlying m6A recognition and sequence selectivity [38]. The YTH domains share a conserved α/β fold (Figure 2B), which consists of four or five α helices and six β strands [38]. These six β strands form a β barrel, with the α helices packed against the β strands to stabilize the hydrophobic core (Figure 3) [38].
In the YTHDC1-m6A complex, the RNA molecule lies in the positively charged groove of the protein, with m6A buried in a deep cleft formed by three hydrophobic residues, W377, W428, and L439 (Figure 3A) [38]. Specifically, the methyl-π interactions between the methyl group of m6A and the rings of the two tryptophan residues constitute the basis of m6A-specific recognition, consistent with the fact that the YTHDC1 YTH domain exhibits binding affinity toward m6A-modified RNAs, but not unmodified RNAs [38]. The m6A-binding mode of the YTH domain is somewhat similar to that of methyllysine recognition by Royal family domains, which also utilize an aromatic cage pocket to accommodate the methyllysine residue [56]. In addition to the methylation-dependent interactions, m6A also forms base-specific hydrogen bonds with N363, N367, and S378 of YTHDC1 (Figure 3A) [38]. Of note, the m6A-binding pocket of YTHDC1 can accommodate m6A, but not N6,N6-dimethyladenosine (m6,6A), since introducing another methyl group at N6 would not only disrupt the hydrogen bond between S378 and N6 of m6A but also cause a steric clash with the backbone of S378 (Figure 3A). Besides m6A-specific binding, electrostatic interactions between YTHDC1 and the RNA molecule also contribute to formation of the complex, such as the hydrogen bond between the guanosine at the −2 position (G−2) and D476 of YTHDC1, the cation-π interaction between the cytosine following m6A (C+1) and R475 of YTHDC1, as well as several hydrogen bonds between YTHDC1 and sugar-phosphate backbone atoms of the RNA [38].
Following the elucidation of the YTHDC1-m6A complex, two other complexes, the YTH domains of YTHDF1 and YTHDF2 with their respective m6A-modified RNA ligands, have also been reported [45,46]. In both complexes, m6A is recognized in a manner similar to that observed in the YTHDC1-m6A complex (Figure 3B). m6A is positioned into a positively charged pocket of YTHDF1, formed by the side chains of W411, W465, and W470. The methyl group of m6A points to the ring of W465 and is positioned between the rings of W411 and W470 [46]. The methyl-π interactions between m6A and the three tryptophan residues constitute the methylation-dependent recognition mode. Furthermore, the YTH domain of yeast Pho92, the only YTH domain-containing protein in Saccharomyces cerevisiae, adopts the canonical YTH fold and possesses the m6A-binding pocket (Figure 3C) [46], which is formed by W177, W231, and Y237, suggesting a conserved m6A recognition mode in eukaryotes (Figure 3C). Of note, in all of these m6A-binding pockets, the two tryptophan residues corresponding to W377 and W428 of YTHDC1 (W411 and W465 of YTHDF1) are absolutely conserved in all human YTH domains, whereas the third residue could be tryptophan, tyrosine, or leucine (Figure 2B), indicating that the YTH domains described above not only adopt a common architecture but also share a conserved m6A-binding pocket.
Comparison of the binding affinity between YTHDC1 and m6A with that between YTHDF1 and m6A shows that the YTH domain of YTHDC1 binds to the 5-mer m6A-modified RNA ~10-fold more strongly than does that of YTHDF1. Detailed structural analysis indicates that YTHDC1 utilizes N367 to form a hydrogen bond with N1 of m6A, whereas the corresponding residue in YTHDF1 is D401 (Figure 3A and B). Under neutral or basic pH conditions, N1 of m6A cannot serve as the hydrogen donor to form a hydrogen bond with an aspartic acid residue; instead, it serves as the hydrogen acceptor and is hydrogen bonded to an asparagine, such as N367 of YTHDC1. Only under acidic pH conditions might the protonation of N1 make it possible for m6A to form a hydrogen bond with D401 of YTHDF1 (Figure 3B). Further work is required to investigate the pH-dependent interactions between YTHDF1 and m6A-modified RNA, which might explain the apparently weak binding of YTHDF1 to short m6A-modified RNAs.
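To put the roughly 10-fold affinity difference on an energy scale, the short worked estimate below converts a ratio of dissociation constants into a difference in binding free energy. It assumes that the fold difference refers to the ratio of dissociation constants (K_d) and that the temperature is 298 K; neither assumption is stated in the text, and the symbols are introduced here only for illustration.

```latex
% Assumptions (not from the source): the ~10-fold difference is a ratio of
% dissociation constants, K_d(DF1)/K_d(DC1) \approx 10, and T = 298 K.
\begin{align*}
\Delta G_{\mathrm{bind}} &= RT \ln K_d \\
\Delta\Delta G &= \Delta G_{\mathrm{bind}}^{\mathrm{DC1}} - \Delta G_{\mathrm{bind}}^{\mathrm{DF1}}
  = RT \ln \frac{K_d^{\mathrm{DC1}}}{K_d^{\mathrm{DF1}}}
  = -RT \ln 10 \\
&\approx -(0.592\ \mathrm{kcal\,mol^{-1}})(2.303)
  \approx -1.4\ \mathrm{kcal\,mol^{-1}}
\end{align*}
```

Under these assumptions, the difference is on the order of 1.4 kcal/mol, which is comparable in magnitude to the contribution of a single hydrogen bond and therefore compatible with, although not proof of, the N367/D401 explanation discussed above.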
Despite the common m6A-binding pocket, the YTH domains display different binding preferences. The YTH domain of YTHDC1 prefers a guanosine residue at the position preceding m6A (G−1), as confirmed by binding experiments using both photoactivatable-ribonucleoside-enhanced CLIP (PAR-CLIP) and isothermal titration calorimetry (ITC) [36,38]. In the YTHDC1-m6A complex, the G−1 residue stacks with L380 and M438 of YTHDC1, and forms hydrogen bonds with V382 and N383 (Figure 4A) [38]. G-to-A substitution at the −1 position would disrupt the base-specific hydrogen bonds and lead to a steric clash between A−1 and the backbone of V382; therefore, adenosine at the −1 position is not favored [38]. Collectively, hydrophobic interactions and hydrogen bonds confer the binding preference for guanosine at the −1 position on YTHDC1 [38]. Other human YTH domains do not seem to contain the G−1 binding pocket, and neither does the YTH domain of yeast Pho92, indicating that the Pho92 YTH domain might be evolutionarily more similar to that of YTHDF1 than to that of YTHDC1. Furthermore, the difference in sequence selectivity between the YTH domain of YTHDC1 and other human YTH domains may reflect the differential m6A-binding demands in the nucleus and cytoplasm [38,46].

The pocket of the YTH domain governs m6A-specific recognition
In contrast to Pho92, Mmi1 from fission yeast contains a YTH domain that does not exhibit m6A-specific binding toward RNA ligands, although the Mmi1 YTH domain also adopts the canonical YTH fold [57]. Structural comparison of the YTH domains of YTHDC1 and Mmi1 indicates that while YTHDC1 contains a large positively charged groove to position the m6A-modified RNA (Figure 4B), the corresponding surface of Mmi1 is negatively charged, which impairs the binding of its YTH domain to negatively charged RNA backbones (Figure 4C). In addition, superimposing the m6A-binding pocket of YTHDC1 with that of Mmi1 shows that the key m6A-binding residues of YTHDC1 are not conserved in Mmi1, with N367 of YTHDC1 replaced by an alanine residue in Mmi1, which would disrupt the base-specific hydrogen bond (Figure 4C). Moreover, W428 of Mmi1 rotates its ring plane by ~90° to avoid a potential clash with the Mmi1 P419, which completely blocks the entry of the m6A-modified nucleotide into this pocket (Figure 4D). Mmi1 can bind to an unmodified RNA motif named the DSR core motif (5′-U(U/C)AAAC-3′) [57]. Recent structural studies of the Mmi1 YTH domain in complex with the DSR motif have revealed that Mmi1 binds to the RNA via a positively charged groove formed by its α4 helix, as well as the β3 and β4 strands, which is distinct from the YTHDC1-m6A surface [57,58]. Of note, the methylation of adenosine within the DSR motif would weaken rather than enhance the binding [57]. The diversity in the RNA-binding ability of the two yeast YTH domains implies that they have diverged from each other during evolution, albeit without altering the fold.
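The DSR core motif written as 5′-U(U/C)AAAC-3′ translates directly into a regular expression. The minimal Python sketch below locates every occurrence of this hexamer in an RNA sequence; the function name and the example sequence are hypothetical, and a match indicates only the presence of the sequence motif, not Mmi1 binding in vivo.

```python
import re

# DSR core motif from the text: 5'-U(U/C)AAAC-3'.
# The lookahead reports every start position, including overlapping ones.
DSR_CORE = re.compile(r"(?=(U[UC]AAAC))")


def find_dsr_core(rna: str) -> list[tuple[int, str]]:
    """Return (0-based start index, matched hexamer) for each DSR core hit.

    Illustrative helper only; it checks sequence, not binding.
    """
    seq = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group(1)) for m in DSR_CORE.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical example sequence containing both motif variants.
    print(find_dsr_core("GGUUAAACGGUCAAACUU"))
```

The character class `[UC]` encodes the (U/C) alternative at the second position, so both UUAAAC and UCAAAC are reported.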
A search within the Dali database reveals other proteins containing domains with an architecture similar to that of the YTH domain, such as MJECL36 of Methanocaldococcus jannaschii (PDB ID: 2P5D; Figure 5A and B). In MJECL36, an aromatic cage is formed by the rings of W25, F79, and F90, which could be superimposed with the aromatic cage of YTHDC1 that accommodates the m6A-modified nucleotide (Figure 5C). However, residues at the C-terminal end of β1 in MJECL36, T11 and N12, deviate from those of YTHDC1, N363 and N364, potentially disrupting the backbone hydrogen bond formed between N3 of m6A and N363 of YTHDC1 (Figure 5C). Consistently, MJECL36, albeit with a YTH-like fold, does not exhibit any detectable m6A-binding affinity [46]. It seems that the size of the pocket, as well as the aromatic residues that reside therein, confers the m6A-binding ability on the YTH domain. The similar topology and distinctive functions of the YTH domain and MJECL36 indicate that they may have originated from a common ancestor, but the YTH domain in higher eukaryotes has probably acquired adaptive functions after a long period of evolution.

Other potential m6A binders
Besides the YTH domain-containing proteins, eukaryotic initiation factor 3 (eIF3) has also been reported as an m6A reader [59]. The cap-binding protein eIF4 is essential for translation initiation [59]. However, eIF4-independent translation initiation can occur in case of eIF4 loss of function or viral mRNA translation [60]. Meyer et al. [59] report that eIF3 facilitates eIF4-independent translation of mRNAs depending on the m6A modification in their 5′ UTRs. HNRNPA2B1, an RBP that contains the RNA recognition motif (RRM) domain, has also been reported as an m6A reader. Alarcón et al. find that HNRNPA2B1 binds to m6A-rich sites in the transcriptome [49]. However, a recent study on the complex structure of HNRNPA2B1 with RNA shows that no aromatic pocket is found in the RRM domain of HNRNPA2B1, which prefers unmodified RNA ligands [61]. Therefore, HNRNPA2B1 might not bind to m6A directly, although we cannot rule out the possibility that the binding occurs via other effectors. Another RRM domain-containing protein, embryonic lethal, abnormal vision-like protein 1 (ELAVL1), can be pulled down by m6A-containing RNAs. ELAVL1 contains three RRM domains, all of which are homologous to the determined RRM structures and do not seem to contain the m6A-binding pocket [50]. Although we cannot exclude the possibility that ELAVL1 might recognize m6A via other regions, it is also possible that ELAVL1 binds to other sequences rather than the m6A site itself, as suggested by the difference between the sequence of the m6A site and the ELAVL1-binding motif [50]. Very recently, IGF2BPs have been reported to enhance mRNA stability and mediate translation in an m6A-dependent manner [51]. IGF2BPs contain tandem KH domains; the KH domain is a conserved ssRNA-binding domain that usually appears as tandem repeats in proteins [62]. Whether the tandem KH domains of IGF2BPs serve as the reader of m6A requires further investigation.
One possibility is that some intrinsically disordered regions flanking the KH domains may undergo a conformational change that enables m6A binding by providing additional contacts. For example, although the RGG motif from the human fragile X mental retardation protein (FMRP) alone is disordered, it becomes ordered after binding to the major groove of the G-rich RNA duplex-quadruplex junction [63].

Concluding remarks and outlook
In the past decades, the roles of histone modifications and DNA methylation have been well studied. In contrast, although >100 RNA modifications have been discovered in vivo, their exact roles remain elusive. As the hallmark of RNA epigenetics, m6A mediates the functions of eukaryotic RNAs extensively [64]. The YTH domain represents a family that recognizes the m6A mark directly. By recruiting different complexes to target m6A sites, the YTH domain-containing proteins, as well as other potential m6A-binding proteins, contribute to post-transcriptional gene regulation in many aspects, such as splicing, translation, localization, and lifetime.
Despite the progress made in understanding the m6A effectors in the past few years, some questions remain to be answered. Are there more proteins that recognize m6A directly? How should we go about discovering the reader proteins of many other RNA modifications, such as N1-methyladenosine (m1A) [65], 5-methylcytosine (m5C) [66], N6,2′-O-dimethyladenosine (m6Am) [67], and pseudouridine (Ψ) [68]? Could a better understanding of m6A-binding proteins facilitate our search for readers of other RNA modifications? Is it possible that the YTH domain serves as a reader of other modified RNAs?
Detailed structural analysis has revealed that the m6A base fits into the YTH domain pocket and forms several base-specific hydrogen bonds with the YTH domain residues. Therefore, it is unlikely that the same pocket of the YTH domain could recognize modified bases other than m6A. Even for m6Am, introducing a methyl moiety would disrupt the hydrogen bond between the C2′-ribosyl hydroxyl oxygen and the side chain of N363 in YTHDC1. It is possible, however, that YTHDC1 could accommodate m6Am by changing the conformation of N363. Whether the YTH domain-containing proteins possess m6Am-binding ability requires further examination both in vitro and in vivo. In the epigenetic field, the structural information of known mediators or readers of histone acetyllysine and methyllysine has been used to guide the design and development of chemical probes [69]. Interestingly, some of the small molecules designed serve as inhibitors of protein-protein interactions rather than inhibitors of enzymes [70]. Considering the similar characteristics of the methyllysine-binding pocket and the m6A-binding pocket [71], we believe that it is plausible to design small molecules to modulate the functions of RNAs through disrupting m6A recognition by YTH domains. Some human YTH domain-containing proteins have been associated with human diseases, such as cancer or viral infection. For instance, YTHDC1 is associated with endometrial cancer [38], while YTHDF1-3 can recognize m6A in the RNA of human immunodeficiency virus 1 (HIV-1) and suppress HIV infection [12]. Structural studies on human YTH domains and other identified m6A binders should help to address the unanswered questions and provide insights into the development of chemical probes and future drug therapies.