Identification of putative reader proteins of 5-methylcytosine and its derivatives in Caenorhabditis elegans RNA

Background: Methylation of carbon-5 of cytosines (m5C) is a conserved post-transcriptional nucleotide modification of RNA with widespread distribution across organisms. It can be further modified to yield 5-hydroxymethylcytidine (hm5C), 5-formylcytidine (f5C), 2´-O-methyl-5-hydroxymethylcytidine (hm5Cm) and 2´-O-methyl-5-formylcytidine (f5Cm). How m5C, and especially its derivatives, contribute to biology mechanistically is poorly understood. We recently showed that m5C is required for Caenorhabditis elegans development and fertility under heat stress. m5C has been shown to participate in mRNA transport and maintain mRNA stability through its recognition by the reader proteins ALYREF and YBX1, respectively. Hence, identifying readers for RNA modifications can enhance our understanding of the biological roles of these modifications.
Methods: To contribute to the understanding of how m5C and its oxidative derivatives mediate their functions, we developed RNA baits bearing modified cytosines in diverse structural contexts to pull down potential readers in C. elegans. Potential readers were identified using mass spectrometry. The interaction of two of the putative readers with m5C was validated using immunoblotting.
Results: Our mass spectrometry analyses revealed unique binding proteins for each of the modifications. In silico analysis for phenotype enrichments suggested that hm5Cm unique readers are enriched in proteins involved in RNA processing, while readers for m5C, hm5C and f5C are involved in germline processes. We validated our dataset by demonstrating that the nematode ALYREF homologues ALY-1 and ALY-2 preferentially bind m5C in vitro. Finally, sequence alignment analysis showed that several of the putative m5C readers contain the conserved RNA recognition motif (RRM), including ALY-1 and ALY-2.
Conclusions: The dataset presented here serves as an important scientific resource that will support the discovery of new functions of m5C and its derivatives. Furthermore, we demonstrate that ALY-1 and ALY-2 bind to m5C in C. elegans.


Introduction
Nucleoside chemical modifications are a common feature in RNA molecules. More than 160 post-transcriptional modifications have been reported since the discovery of pseudouridine in 1957 (Boccaletto et al., 2018; Cohn, 1960; Davis & Allen, 1957). However, the molecular and physiological functions of most RNA modifications remain unknown. Discovered in 1958 (Amos & Korn, 1958), the methylation of carbon-5 of cytosines in RNAs (m5C) is now known to be a conserved and widely distributed feature in biological systems. m5C formation is catalysed by tRNA aspartic acid MTase 1 (DNMT2) and RNA methyltransferases from the Nop2/Sun domain family (NSUN1-7 in humans), and m5C has been shown to be involved in tRNA stability, ribosome fidelity and translation efficiency (reviewed in García-Vílchez et al., 2019). Previous studies showed that m5C is subject to hydroxylation and oxidation by alpha-ketoglutarate-dependent dioxygenase ABH (ALKBH1), forming 5-hydroxymethylcytidine (hm5C) and 5-formylcytidine (f5C).
The identification of readers, i.e. proteins that specifically, or preferentially, interact with RNA molecules bearing a certain modification, has proven to be an important step towards the understanding of a modification's biological function. For example, the discovery of N6-methyladenosine (m6A) readers added new layers of complexity to this pathway, revealing context- and stimuli-dependent functions previously not appreciated (Shi et al., 2019). To date, only two reader proteins have been identified for m5C. The Aly/REF export factor (ALYREF) was identified in an RNA pulldown experiment and shown to interact with NSUN2-modified CG-rich regions and regions immediately downstream of translation initiation sites. NSUN2-mediated mRNA methylation is interpreted by binding of ALYREF to modulate nuclear-cytoplasm shuttling of transcripts.
More recently, the Y-box binding protein 1 (YBX1) was shown to recognise m5C in mRNAs and enhance their stability through the recruitment of ELAVL1 in human urothelial carcinoma cells (Chen et al., 2019). In zebrafish, YBX1 was shown to stabilise m5C-modified transcripts via Pabpc1a during the maternal-to-zygotic transition (Yang et al., 2019).
Recently, we used Caenorhabditis elegans as a model to engineer the first organism completely devoid of m5C in RNA. We mapped m5C with single-nucleotide resolution in different RNA species and found that m5C is required for physiological adaptation and translation efficiency at high temperatures (Navarro et al., 2020). Here, to further expand our knowledge of the m5C pathway in this organism, we produced the first list of putative readers of m5C and its metabolic derivatives hm5C, hm5Cm and f5C. The putative readers are enriched in proteins with roles in germline development and RNA surveillance. We show that the C. elegans ALYREF homologues ALY-1 and ALY-2 preferentially bind m5C in vitro and identify other proteins that contain a conserved RNA recognition motif potentially involved in m5C binding.

Methods
Genetics
C. elegans strains were grown and maintained on NGM plates (3 g/L NaCl, 17 g/L agar, 2.5 g/L peptone, 1 mM CaCl2, 5 µg/mL cholesterol, 1 mM MgSO4, 25 mM KPO4) seeded with Escherichia coli and transferred to fresh plates regularly to avoid starvation (Brenner, 1974). The strains were kept at 20°C, unless otherwise indicated. E. coli strain HB101 was grown in B broth (10 g/L Bacto-tryptone, 5 g/L NaCl) and used as food source (Caenorhabditis Genetics Center, University of Minnesota, Twin Cities, MN, USA). Bristol N2 was used as the wild-type strain.

Protein extraction
Nematodes were grown to the gravid adult stage on 140 mm NGM plates seeded with concentrated HB101 E. coli, then harvested and washed twice in M9 buffer. A final wash was performed in lysis buffer (20 mM HEPES pH 7.5, 150 mM KCl, 1.5 mM MgCl2, 0.1% IGEPAL, 0.5 mM DTT) and the animals were resuspended in 4 ml of lysis buffer before the addition of a protease inhibitor cocktail (SIGMAFAST protease inhibitor tablets, Sigma). The animals were pelleted by centrifugation at 2,000 rpm for 2 min (Eppendorf 5810) and the supernatant was removed. Samples were frozen drop-wise in liquid nitrogen using a Pasteur pipette. Frozen droplets were ground to powder in metallic capsules for 25 sec in a mixer (Retsch MM 400 Mixer Mill). The powder was stored at -80°C until required. The powder was defrosted at 4°C and the lysate was sonicated for 10 cycles of 20 sec, with breaks of 20 sec in between, in a Bioruptor Pico (Diagenode). The sample was centrifuged in an Eppendorf 5430R microcentrifuge at maximum speed for 15 min at 4°C and the protein concentration of the supernatant was determined using a Bio-Rad Bradford protein assay (Bio-Rad 5000006) according to the manufacturer's instructions.

RNA pulldown
The RNA pulldown protocol was adapted from Dominissini et al., 2012. Magnetic beads (Dynabeads MyOne Streptavidin C1, Invitrogen) were washed twice in 100 mM Tris-HCl pH 7.5, 10 mM EDTA, 1 M NaCl, and 0.1% Tween-20, followed by two washes in lysis buffer. To reduce non-specific protein binding to the RNA bait matrix, lysates were pre-cleared by incubation with washed beads for 1 h at 4°C with constant rotation prior to the pulldown. The following pulldown mix was then prepared: 2 mg of protein lysate, 50 U of RNase inhibitor Superase-IN (Invitrogen) and 500 pmol of 25-mer biotinylated RNA bait, made up to a final volume of 250 µl in lysis buffer. The mixture was incubated with constant rotation for 2 h at 4°C and then added to 20 µl of streptavidin-conjugated beads for pulldown for another 2 h at 4°C. The beads were washed 3-5 times in 1 ml of lysis buffer and the bound proteins were eluted for mass spectrometry in 50 µl of 4% SDS, 100 mM TEAB at 95°C for 10 min.

Sample preparation for protein mass spectrometry
Samples were processed and analysed as described in Bensaddek et al., 2016. Briefly, samples were reduced using 25 mM tris(2-carboxyethyl)phosphine (TCEP) for 30 min at room temperature, then alkylated in the dark for 30 min using 50 mM iodoacetamide. Protein concentration was quantified using the EZQ assay (Thermo Fisher). The lysates were diluted four-fold with 100 mM triethyl ammonium bicarbonate (TEAB) for the first digestion with endoprotease Lys-C (Fujifilm Wako Chemicals). The lysates were further diluted 2.5-fold before a second digestion with trypsin. Both digestions used a 1:50 (w/w) enzyme:substrate ratio and were carried out overnight at 37°C. The digestion was terminated with 1% (v/v) trifluoroacetic acid (TFA). C18 Sep-Pak cartridges (Waters) were used to desalt the peptides. Desalted peptides were dried and dissolved in 5% formic acid (FA).

Reverse-phase liquid chromatography-MS
RP-LC was performed using a Dionex RSLC nano HPLC (Thermo Fisher Scientific). Peptides were injected onto a 75 µm × 2 cm PepMap-C18 pre-column and resolved on a 75 µm × 50 cm RP-C18 EASY-Spray temperature-controlled integrated column-emitter (Thermo Fisher Scientific). The mobile phases were: 2% ACN incorporating 0.1% FA (Solvent A) and 80% ACN incorporating 0.1% FA (Solvent B). A gradient from 5% to 35% solvent B was used over 4 h with a constant flow of 200 nL/min. The spray was initiated by applying 2.

Mass spectrometry data analysis
Putative readers were ranked based on the median of the intensity-based absolute quantification (iBAQ) values (Schwanhäusser et al., 2011). Briefly, for normalisation, the iBAQ value of each protein was divided by the total sum of intensities of its sample, and the median of the normalised values across biological replicates was calculated. A value of 1 × 10⁻⁹ (100 times smaller than the smallest value in the dataset) was then added to all median values before fold changes were calculated.
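
The normalisation and fold-change steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis code used in the study: the function names, the protein labels (protA, protB) and all numbers are hypothetical, and treating a protein that is missing from a replicate as having an intensity of 0 is an assumption the text does not spell out.

```python
from statistics import median

PSEUDOCOUNT = 1e-9  # value stated in the text, added to every median


def normalise(sample):
    """Divide each protein's iBAQ value by the summed intensity of its sample."""
    total = sum(sample.values())
    return {protein: value / total for protein, value in sample.items()}


def median_normalised(replicates):
    """Median of normalised iBAQ values across replicates.

    Assumption: a protein missing from a replicate counts as 0 there.
    """
    normalised = [normalise(rep) for rep in replicates]
    proteins = set().union(*normalised)
    return {p: median(n.get(p, 0.0) for n in normalised) for p in proteins}


def fold_change(bait, control):
    """Ratio of bait to control medians, each offset by the pseudocount."""
    proteins = set(bait) | set(control)
    return {
        p: (bait.get(p, 0.0) + PSEUDOCOUNT) / (control.get(p, 0.0) + PSEUDOCOUNT)
        for p in proteins
    }


# Hypothetical iBAQ values for three biological replicates per condition.
m5c_reps = [
    {"protA": 30.0, "protB": 70.0},
    {"protA": 40.0, "protB": 60.0},
    {"protA": 20.0, "protB": 80.0},
]
unmodified_reps = [
    {"protA": 10.0, "protB": 90.0},
    {"protA": 20.0, "protB": 80.0},
    {"protA": 15.0, "protB": 85.0},
]

fc = fold_change(median_normalised(m5c_reps), median_normalised(unmodified_reps))
print(fc)
```

In this toy example protA has a normalised median of 0.3 in the m5C pulldown and 0.15 in the unmodified control, so its fold change is close to 2. The pseudocount only matters for proteins with a median of zero in a control, where it keeps the ratio finite.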

SDS-PAGE
Samples were prepared and validation pulldowns were performed as described in "Protein extraction" and "RNA pulldown", respectively. Proteins bound to RNA baits were eluted in 50 µL of sample buffer (Bio-Rad, 1610737) and boiled for 5 min at 95°C in a heat block (Eppendorf 5355). Boiled samples were then spun at 13,000 rpm in a microcentrifuge (Eppendorf 5430R). Pulldown eluates (100%) and input controls (5%) were loaded alongside a protein ladder (PageRuler Plus pre-stained protein ladder 10-250 kDa, Thermo Fisher Scientific 26619) in a 4-12% Bis-Tris gel and run in NuPAGE MOPS SDS running buffer (Invitrogen) at 200 V for 50 min in an XCell SureLock electrophoresis system (Thermo Fisher Scientific) connected to a Bio-Rad PowerPac Universal Power Supply (Bio-Rad 1645070).

Western blotting
Proteins resolved by SDS-PAGE were transferred onto a nitrocellulose membrane (Hybond ECL, Amersham) using NuPAGE Transfer Buffer (Invitrogen) for 2 h at 250 mA and 4°C in a Mini Trans-Blot Cell (Bio-Rad). Membranes were blocked in 3% non-fat dry milk in TBS-T buffer for 30 min at room temperature. Membranes were incubated with primary antibody (mouse monoclonal anti-FLAG M2, Sigma F1804) at a 1:1,000 dilution in 3% milk-TBS-T overnight, rotating at 4°C. Following three washes in TBS-T for 10 min each, membranes were incubated with goat polyclonal anti-mouse IgG conjugated to HRP (1:10,000; Invitrogen, A16078) diluted in milk/TBS-T for 1-2 hours at room temperature. Following three washes in TBS-T, bands were detected using Immobilon Western Chemiluminescent HRP Substrate (Millipore) according to the manufacturer's instructions. Medical X-ray films (Super Rx, Fuji) were exposed to the luminescent membrane for the time required and the films were developed on a Compact X4 automatic X-ray film processor (Xograph Imaging Systems Ltd).

Results
To increase the chances of identifying bona fide modification readers, we used modified RNA baits with biologically relevant characteristics. Considering that m5C is deposited in tRNAs in a structure-dependent manner, we designed baits bearing modifications in three structural contexts: single-strand, double-strand, and loop (Figure 1A). m5C, hm5C, hm5Cm and f5C monomers were synthesised and incorporated into oligonucleotides tethered to triethylene glycol (TEG)-biotin (Tanpure & Balasubramanian, 2017).
The RNA pulldown experiments were performed in independent biological triplicate. We incubated the RNA baits with pre-cleared whole worm lysates prepared from synchronised populations of gravid adults. To increase the stringency of our screen, we included two controls: beads only and an unmodified RNA bait. Following the pulldown experiments, proteins were eluted from the RNA baits and identified by mass spectrometry (Suen, 2022).
Proteins were ranked according to their enrichment in modified RNA pulldowns in comparison to the controls. Proteins were considered candidate readers if they preferentially bound to the modified RNA bait, i.e., if they were at least 2.0- and 1.5-fold enriched compared with the beads-only and unmodified C controls, respectively (Figure 1B).

Figure 1. (a) Molecular baits were designed bearing modifications in three structural contexts: single-strand, double-strand, and loop. m5C, hm5C, hm5Cm and f5C monomers were synthesised and incorporated into oligonucleotides tethered to TEG-biotin. Blue circles indicate modified positions. Pre-cleared protein lysate from synchronised adult populations was incubated with unmodified or modified biotinylated RNA baits. Streptavidin-conjugated magnetic beads were used to capture proteins that interact with the RNA. Following washes and elution, proteins were processed and identified by RP LC-MS. (b) Scatter plot of proteins bound to modified versus unmodified RNA baits. Medians of the normalised iBAQ values were plotted. Proteins uniquely enriched in m5C, hm5C, hm5Cm and f5C pulldowns are highlighted in mustard. Proteins were considered candidate readers whenever the fold-change of iBAQ values over the "beads only" and "unmodified C" controls was >2.0 and >1.5, respectively. n = 3 independent biological replicates.

To narrow down the number of candidate readers, we focused on unique binders, i.e. proteins that bound specifically to a single modified RNA bait. Using these criteria, we found 143, 62, 81 and 33 putative unique readers for m5C, hm5C, hm5Cm and f5C, respectively (Figure 2A). We used the WormBase Enrichment Suite to perform phenotype enrichment analysis on the lists of unique candidates (Angeles-Albores et al., 2016). This tool analyses a list of genes according to phenotypes that have been reported by researchers to WormBase upon knockout or RNAi, thus suggesting a potential involvement of such genes in specific biological processes. Unique binders of m5C were more numerous than those of its oxidised derivatives, likely due to the higher levels of this modification in RNA (Huber et al., 2015; Huber et al., 2017). This subset is enriched mostly for germline development phenotypes, and a few RNA surveillance processes. hm5Cm-unique binders show specific enrichment for RNA processing phenotypes, whereas hm5C and f5C show exclusively germline-related phenotypes (Figure 2B).
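
The two-step selection described above (fold-change thresholds against both controls, followed by restriction to proteins found with a single modification) can be illustrated with plain Python sets. This is a minimal sketch under assumptions: the function names, protein labels and fold-change values are hypothetical, and the strict ">" comparison follows the Figure 1 legend rather than a confirmed implementation detail.

```python
BEADS_MIN_FC = 2.0       # threshold over the beads-only control
UNMODIFIED_MIN_FC = 1.5  # threshold over the unmodified C bait


def candidate_readers(fc_vs_beads, fc_vs_unmodified):
    """Proteins whose fold change exceeds both control thresholds."""
    return {
        protein
        for protein, fc in fc_vs_beads.items()
        if fc > BEADS_MIN_FC
        and fc_vs_unmodified.get(protein, 0.0) > UNMODIFIED_MIN_FC
    }


def unique_binders(candidates_by_modification):
    """Keep, for each modification, only candidates found with no other modification."""
    unique = {}
    for mod, candidates in candidates_by_modification.items():
        others = set().union(
            *(c for m, c in candidates_by_modification.items() if m != mod)
        )
        unique[mod] = candidates - others
    return unique


# Hypothetical fold changes for one bait: only protA passes both thresholds.
passing = candidate_readers(
    {"protA": 3.0, "protB": 2.5, "protC": 1.9},
    {"protA": 1.6, "protB": 1.2, "protC": 4.0},
)

# Hypothetical candidate sets: protB binds two baits, so it is not unique.
candidates = {
    "m5C": {"protA", "protB"},
    "hm5C": {"protB", "protC"},
    "hm5Cm": {"protD"},
    "f5C": set(),
}
unique = unique_binders(candidates)
print(passing, unique)
```

A consequence of this design is that a protein recognising more than one modification, such as protB here, is excluded from every unique list, which is why the full candidate lists remain useful alongside the unique ones.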
To validate our dataset, we verified whether ALYREF homologues could be found among our ranked m5C reader candidates. Yang et al. identified amino acid residues on ALYREF required for binding to m5C (Yang et al., 2019). Protein alignments show that some of these residues are conserved in ALY-1 and ALY-2 (Figure 3A). Indeed, ALY-1 and ALY-2 are 1.8-fold enriched across all triplicates of m5C pulldowns. We used transgenic strains expressing either ALY-1 (OP502) or ALY-2 (OP217), tagged at the C-terminus with TY1::eGFP::3xFLAG, to confirm these findings (Sarov et al., 2012). We performed pulldown experiments under the same conditions as before and confirmed using western blotting that both ALY-1 and ALY-2 preferentially bind m5C in vitro (Figure 3B; Suen, 2022). Some m6A readers use common RNA-binding domains to identify modified RNA (Shi et al., 2019). Amino acid residues shown to be important for m5C binding in ALYREF also reside in the RNA recognition motif (RRM) domain, which is a known RNA-binding domain. The C. elegans orthologues ALY-1 and -2 are predicted to have a similar protein fold in the same region (Figure 3C).
Having shown that ALY-1 and ALY-2 preferentially bind m5C-modified RNA, we wondered if other RRM-containing proteins were also identified as m5C unique binders. We found that three other proteins with one or more RRMs, RNP-4, SQD-1 and PAB-2, bind to m5C baits preferentially. RRM domains consist of a four-stranded antiparallel β-sheet with two α-helices packed against it (Maris et al., 2005). Sequence alignments of the RRM domains in ALY-1, ALY-2, RNP-4, SQD-1, PAB-2 and ALYREF show that a high degree of conservation exists across these proteins in the two ribonucleoprotein (RNP) consensus sequences that are important for nucleic acid binding. Interestingly, a phenylalanine residue outside of the two RNPs is highly conserved in all six proteins (Figure 3D).
In summary, we have identified putative readers for m5C, hm5C, hm5Cm and f5C in C. elegans using proteomics. To verify the relevance of this proteomic dataset, we confirmed with an orthogonal method (western blotting) that ALY-1 and ALY-2 interact with m5C. Finally, we identified three candidate proteins that share a conserved RRM domain with ALY-1/2 (RNP-4, SQD-1 and PAB-2), highlighting these as potential bona fide m5C readers.

Discussion
This work presents a list of candidate readers of the m5C oxidative derivatives hm5C, hm5Cm and f5C. Stable isotope labelling experiments suggest that hm5Cm is a stable modification (Huber et al., 2017). In addition, it has been shown that the ratios between hm5C, hm5Cm and f5Cm in the tRNA Leu-CAA pool vary between organisms and tissues, suggesting a dynamic regulation of these marks in different physiological contexts (Kawarada et al., 2017). This could suggest that these modifications carry out cellular functions themselves, rather than being metabolic by-products of a demethylation pathway. Considering the reported function of f5C in mitochondrial protein synthesis (Takemoto et al., 2009), it is expected that the oxidation of m5C in tRNAs might affect their role in cytoplasmic translation. The identification of readers provides a first step to elucidate these functions. Importantly, pathogenic mutations have been mapped to NSUN2, NSUN3 and FTSJ1, suggesting that imbalances in these modifications might contribute to the pathogenesis of human diseases.
The genome of C. elegans encodes three members of the Aly/REF family. It has been shown that simultaneous downregulation of these three genes by RNAi compromises neither viability nor development. In addition, no defects in mRNA export were observed upon simultaneous knockdown of ALY-1, ALY-2 and ALY-3, suggesting that these proteins mediate alternative processes in the nematode, or that their role in transcript export is redundant with other genes (Longman et al., 2003). This is in agreement with reports showing that in Drosophila ALY proteins are not required for general mRNA transport (Gatfield & Izaurralde, 2002). In contrast, ALY-1 and ALY-2 have been implicated in nuclear retention of a specific mRNA. In C. elegans, sex is initially specified by the ratio of X chromosomes to autosomal chromosomes, i.e. XX animals are hermaphrodites and X0 are males.
Female fate requires the genes tra-1 and tra-2, while male cell fate requires inhibition of tra-2 activity. Kuersten et al. showed that binding of NXF-2 and ALY-1/2 inhibits nuclear export of tra-2. In support of these findings, RNAi against ALY proteins led to 5-7% female occurrence in the progeny (Kuersten et al., 2004). It remains to be determined whether m5C acts as an intermediate in this process, for example by enhancing binding affinity of ALY proteins to tra-2 mRNA. Notably, our previous report demonstrated that m5C is absent or occurs very rarely in the coding transcripts in C. elegans, suggesting that ALY-1/2 could be recognising m5C in other molecules (Navarro et al., 2020). In addition, the existence of multiple orthologues of ALY genes in C. elegans raises the possibility that these proteins could act redundantly or bind different sets of modified residues in RNA.

Figure 3. [...] showing enrichment of ALY-1 and ALY-2 following pulldown using m5C-modified RNA oligos and relative quantification (right). Mean ± SD, quantification performed using ImageJ, n = 2 independent biological replicates. (c) Overlay of ALY-1 (yellow) or ALY-2 (pink) with ALYREF (grey) structures. ALY-1 and -2 structures were predicted using the SWISS-MODEL platform and aligned against an ALYREF crystal structure (PDB: 3ULH). The overall RRM folds are conserved in ALY-1 and -2. (d) Sequence alignment of RRM in m5C unique binders with ALYREF using Muscle. Black squares highlight consensus sequences RNP1 and RNP2 that are found in RRM. "*" represents identical amino acids while ":" and "." represent amino acids with strongly and weakly similar properties, respectively. Amino acids highlighted in pink are identical to the reference sequence (ALYREF).

Conclusions
Our study presents the first comprehensive investigation of putative readers of m5C and its derivatives in a whole organism. This dataset represents an important resource for the discovery of new functions of the RNA m5C pathway.

Anna Y Zinovyeva
Kansas State University, Manhattan, Kansas, USA
Jeffrey Medley
Division of Biology, Kansas State University, Manhattan, Kansas, USA

RNA modifications have gained much interest in recent years as they have become associated with important biological functions. 5-methylcytosine (m5C) is an abundant and conserved post-transcriptional modification found on a variety of RNA molecules. Thus far, much effort has been placed on identifying m5C modification sites and unraveling the biological significance of these modifications through "writer" loss-of-function experiments. More is currently known about the "writers" and "erasers" of the m5C modifications than the "readers", which could introduce specificity and unique biological effects through combinatorial modification plus target context.
In "Identification of putative reader proteins of 5-methylcytosine and its derivatives in Caenorhabditis elegans RNA", Navarro et al. take advantage of C. elegans, with its excellent genetic tractability, to identify potential readers of the m5C modification and its derivatives. Identifying proteins that are responsible for reading specific RNA modifications is an important and valuable first step towards understanding the functional significance of distinct RNA modifications. This study aims to do this by identifying C. elegans proteins that preferentially interact with synthetic RNA baits containing different chemical modifications. The mass spec identification of bait RNA-associated proteins is a great first-order approach and is thus well-conceived, with appropriate unmodified RNA oligonucleotides and beads-only controls. However, addressing two major areas would be needed to elevate this manuscript to its intended form as a community resource.
1. The first key issue that needs to be addressed is demonstrating to the readers that the provided lists of putative modified oligo interactors are a solid starting point for follow-up work. This can be achieved through addressing the following comments:

Additional mass-spec data analysis should be presented to increase the scientific readers' confidence in the identified lists of putative m5C, etc. "readers". Usage of the intensity-based absolute quantification (iBAQ) median across three (3) replicates could be wildly misleading if there is a high degree of variability among the replicates and if pairing of the data exists. Use of a median essentially reduces the analysis to a single replicate comparison between the experimental conditions. One way to bolster the confidence in the data would be to provide a principal component analysis (or analyses) of the proteomics data to demonstrate data consistency between control and m5C (and derivatives) experimental conditions.

Figure 1 legend and manuscript text: Please define what you mean by "Proteins uniquely enriched in pulldowns". Does that mean (i) proteins that were enriched 1.5- or 2-fold over control in each experiment, (ii) proteins that were unique to modification type but not enriched over control, or (iii) proteins that were 1.5- or 2-fold enriched over control and unique to m5C pulldown vs. other modifications? The reader assumption is that it is likely the last of these; however, Figure 1B highlights "uniquely enriched proteins" that appear to fail the 1.5- or 2-fold enrichment threshold. Please address this discrepancy. In addition, use of 1.5- or 2-fold lines on Figure 1B plots, as well as color-coding fold enrichment vs. uniqueness in the plots, would further improve the clarity of the data analysis. Labeling highly changed/high confidence candidate readers on the plots (as well as ALY-1, ALY-2, etc.) would also provide context of these proteins' enrichment for the reader. As shown, Figure 1B provides limited and unclear information.

The expectation inherent to the described experimental approach is that the "reader" proteins are uniquely and specifically able to discriminate between m5C RNA modification and its derivatives hm5C, hm5Cm, and f5Cm. A discussion addressing the justifications for this expectation is important to set the stage for the reader to understand the appropriateness of the "uniquely binding" criteria of the co-precipitating proteins. A protein appears to be disqualified from being considered a candidate "reader" if it is not unique to any given modification derivative. Is there a precedent for this assumption and could this assumption perhaps erroneously eliminate "readers" capable of reading m5C and some of its derivatives? Further discussion on this is needed to ensure the manuscript audience can evaluate the validity of the approach used to qualify any given putative interactor as a potential "reader".

In a similar vein of thought, the manuscript does not address the justifications for using the specific oligos and the three modification sites within each oligonucleotide. Thus, the manuscript would greatly benefit from a discussion of what previous data drove the design of the RNA oligonucleotides used in this study, as well as the justification for using the three (3) specific locations for m5C within the oligos. Some discussion about how the baits were designed and potential limitations would be helpful to understand the significance of the findings. Given that many m5C modifications occur on tRNAs, were the baits designed to mimic the tRNA structure (since the structural context is important)? Is there evidence that RNAs contain these specific modifications at multiple sites? If so, is it known whether modification "readers" can differentiate between molecules containing multiple modifications and single modifications? If single-modified baits were used, how might that impact the identified readers? It seems plausible that some "readers" might differentiate between single-stranded, double-stranded and loop modifications. An expanded discussion of some of these points would help clarify the significance and limitations of this study, necessary to identify potential avenues for further analysis.

In Figure 3A, the residues required for RNA binding are highlighted, with the authors referring to Yang et al., 2017: "residues in ALYREF required for m5C binding previously identified by Yang et al. are highlighted in red". However, these residues do not appear to be defined in that study. According to Yang et al., 2017, the same residues are described as being "the relatively conserved amino acids used for constructing mutants" (and have not each been validated as important for RNA binding as suggested in this study, with one potential exception of K171). The authors should also clarify the relevance of the Yang et al., 2019 citation, as it appears to have been referenced in error.

In Figure 3A and 3D, numbering of amino acid positions would help orient manuscript readers to positions of functional domains and understand the take-home message. In Figure 3A, the highlighted residues appear to be different from the RNP1 and RNP2 motifs highlighted in Figure  3D. What is the basis for RNP1 and RNP2 being highlighted in Figure 3D as RNA binding domains? This should be clarified in both the manuscript text and figure legend. Further, these reviewers are not convinced that the provided alignment shows a "high degree of conservation", as shown in Figure 3D. Usage of protein isoforms identical for the shown region can skew the Muscle analysis and more information is needed for why the authors deem the highlighted regions (RNP1 and 2) in Figure 3D as "important for nucleic acid binding". Finally, the language should be softened to propose the highlighted "RNP" regions as potentially important, with qualifications that the highlighted regions have not been demonstrated to bind nucleic acids. Justifications for highlighting the proposed regions should still be provided.
ALY-1/2 pulldowns/Western findings would be bolstered by providing all replicates in Supplemental data. In addition, language within the manuscript should be softened to avoid stating that pulldowns and Western unambiguously confirm ALY-1 and ALY-2 as m5C readers.

Additional comments:
These reviewers appreciate the list of putative interacting proteins provided as a resource in Supplemental Table 1. However, the practical usability of that resource would be dramatically improved by including the C. elegans gene names. This would also allow the readers to independently assess the nature of the putative interacting proteins beyond the provided GO term analysis.
Figure 3C: The legend should state that only the RRM domains of the proteins are shown. This will prevent manuscript readership from misinterpreting the shown predicted structures as structurally conserved across the entire protein length.
Based on GO-term analysis, the authors claim that hm5C and f5C modifications show "exclusively germline related phenotypes". However, one of the phenotypes for hm5C was sluggish, which seems unlikely to result from a germline defect.
Please state in the Methods section how much protein was used per pulldown experiment going into mass spectrometry. As written, it appears that only 2 mg of protein has been used, an inadequate amount for most mass spectrometry experiments. Alternatively, please reference the literature that supports that 2 mg of protein is sufficient for RP-LC mass spectrometry.

[...] modification in C. elegans. Only two m5C readers have been identified to date in any organism, so identification of more candidate proteins is an important question. To this end, Navarro et al. synthesize RNA baits modified with m5C and derivatives, incubate the baits with RNA lysate, and following pulldown, perform mass spec for interacting proteins. A significant number of candidate readers were identified, including orthologs of ALYREF, a previously known m5C reader, which were then further validated by m5C RNA pulldown and western blot. Further analysis will be needed to parse the false positives from the true m5C readers and to determine the biological function of each protein.

Comments -
○ RNA baits were designed in three structural contexts (single-strand, double-strand and loop), but it is not discussed in the results whether the identified m5C candidate readers were found using one or all three of these structures. Were unmodified baits also used in all three structural contexts?
○ The text indicates that candidates were limited to those that bound only one of the modified RNA baits, but it is not discussed whether the known readers are unable to bind the m5C variants. Is it possible that interesting candidates would have been eliminated at this step? At a minimum, the full list of mass spec interactors, including those found binding multiple m5C variants, should be provided.
○ Figure 3A and text "Protein alignments show that some of these residues are conserved in ALY-1 and ALY-2": Of the residues highlighted in the red box, several are weakly conserved or not conserved at all. It is not discussed whether the loss of these critical residues will likely affect the ability of these proteins to bind the m5C modification.
○ In the last sentence of the results, the word "bona fide" should be removed since these interactions have not yet been validated.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Partly

Are the conclusions drawn adequately supported by the results? Yes
Besides mapping chemical modifications in RNA, the vast majority of epitranscriptomics studies currently report associations between the function of a gene (writer, reader, eraser) and a multitude of observable phenotypes. These approaches are usually very simplistic and classically involve gene knockdowns or even knockouts, followed by a description of phenotypic changes. Many of these changes seem rather pleiotropic, which gives little insight into which RNA substrates of a particular writer, reader or eraser protein contributed to the observed changes. Interestingly, even though mapping studies have provided positional clues about the occurrence of particular modifications, there seems to be little incentive to follow up by assigning the functional impact of specifically modified nucleotides in specific RNAs to a particular trait and/or phenotype.
It is accepted that RNA modifications are "written" onto RNA sequences by a staggering number of dedicated enzymes. While non-coding RNAs (e.g., tRNAs) are often stoichiometrically modified with a great number of different RNA modifications, mRNAs carry only a few different RNA modifications, and only at sub-stoichiometric levels at specific nucleotides. It follows that modified nucleotides in mRNA are not only rare within a transcript (1-3 modified nucleotides per 1000 nucleotides when averaged over all mRNAs for the most abundant mRNA modification, m6A) but also stoichiometrically low (e.g., about 20% m5C can be detected at a given position based on RNA bisulfite sequencing). This is an important and seemingly overlooked conundrum in the field, since it raises the question of how a few nucleotides within thousands of nucleotides, and those in only every 5th to 10th mRNA molecule, should account for a physiologically meaningful impact on the biology of a cell.
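The arithmetic behind this point can be made explicit with a minimal sketch. It assumes a hypothetical 2,000-nt transcript and, purely for illustration, combines the site density and per-site stoichiometry quoted above; the function names are illustrative and not taken from any cited study.

```python
def molecules_per_modified_copy(stoichiometry: float) -> float:
    """Average number of transcript copies per copy that is modified at a given site."""
    return 1.0 / stoichiometry


def expected_modified_sites(length_nt: int, sites_per_kb: float, stoichiometry: float) -> float:
    """Average number of modified nucleotides carried by a single transcript molecule."""
    return length_nt / 1000 * sites_per_kb * stoichiometry


if __name__ == "__main__":
    # Assumed, illustrative inputs: 10-20% stoichiometry per site.
    for stoich in (0.10, 0.20):
        print(f"stoichiometry {stoich:.2f}: 1 modified copy per "
              f"{molecules_per_modified_copy(stoich):.0f} molecules")

    # Assumed, illustrative inputs: 2,000-nt mRNA, 1-3 sites per 1000 nt, 20% stoichiometry.
    for density in (1, 3):
        sites = expected_modified_sites(2000, density, 0.20)
        print(f"{density} site(s) per kb: {sites:.1f} modified nucleotides per molecule")
```

Under these assumptions an average molecule carries fewer than two modified nucleotides, which is the scale at which any reader-based mechanism would have to operate.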
The concept of reader proteins offers a solution to this. It has been put forward by epitranscriptomic influencers borrowing heavily from the notion of epigenetic readers. If there are only a few RNA modifications within a sea of transcripts, then proteins binding preferentially to these modified nucleotides (readers) might make the difference. Is this concept correct? Do experimentally validated data support the assumption that a few RNA modifications will make a difference through interpretation by particular proteins?
In recent years, especially the m6A-focused field has reported the identification of various m6A readers. A few attempts to identify readers of other RNA modifications, including m5C, also exist in the public record. Notably, much of the current identification and characterization of these RNA modification readers has relied on in vitro pull-down experiments, using either defined (synthetic) RNA baits (rather than those identified by epitranscriptomic mapping strategies) combined with mass spectrometry, or RNA-IPs (and derivatives of the method) to identify RNA substrates of specific reader proteins. From these reports one can deduce that all readers of specific RNA modifications also bind to unmodified RNA sequences. This raises the question of whether readers bind modified RNAs with a lower Kd than unmodified RNAs. Curiously, very little has been published on actual interaction parameters for specific readers and their substrates, although reporting such parameters is common when studying enzymes or protein-substrate interactions. Instead, fixed end-point assays with readouts based on RNA sequencing or western blotting currently substitute for the quantitative and kinetic parameters that would be needed to better understand the biology of putative readers of RNA modifications.
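Why an end-point enrichment cannot stand in for an affinity measurement can be sketched with the standard single-site binding isotherm, fraction bound = [P] / (Kd + [P]), which assumes free protein in excess over RNA. The Kd values and protein concentrations below are hypothetical and chosen only for illustration; they are not measurements for any reader protein.

```python
def fraction_bound(protein_nM: float, kd_nM: float) -> float:
    """Fraction of RNA bait bound at equilibrium for simple 1:1 binding."""
    return protein_nM / (kd_nM + protein_nM)


def apparent_enrichment(protein_nM: float, kd_modified_nM: float, kd_unmodified_nM: float) -> float:
    """Ratio of bound fractions (modified over unmodified bait) at one protein concentration."""
    return fraction_bound(protein_nM, kd_modified_nM) / fraction_bound(protein_nM, kd_unmodified_nM)


if __name__ == "__main__":
    # Hypothetical affinities: a fourfold lower Kd for the modified bait.
    KD_MODIFIED_NM = 50.0
    KD_UNMODIFIED_NM = 200.0

    for protein_nM in (10.0, 100.0, 1000.0, 10000.0):
        ratio = apparent_enrichment(protein_nM, KD_MODIFIED_NM, KD_UNMODIFIED_NM)
        print(f"[protein] = {protein_nM:>7.0f} nM -> apparent enrichment {ratio:.2f}")
```

With a fixed fourfold difference in Kd, the apparent enrichment approaches fourfold at low protein concentration and approaches one at saturating concentration, so a single end-point measurement does not determine the underlying affinities.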
This manuscript by Navarro et al., 2022 is therefore welcome for attempting to identify reader proteins for m5C as potential mediators of RNAs modified with this chemical modification.
Assuming that C. elegans contains readers for the m5C modification and its oxidized derivatives, the authors use an in vitro RNA pull-down approach with differently modified RNA baits to identify proteins differentially binding to m5C, hm5C, hm5Cm and f5C. Setting rather tight cut-offs for enrichment of controls, they report on numbers and GO annotations of the identified proteins.
Since they found the orthologues of a previously identified human m5C reader protein (ALYREF) in their data, they repeated the pull-down experiment with m5C-modified oligos and found that both worm orthologues of ALYREF bind preferentially to m5C-modified oligos.
This reviewer accepts that Wellcome Open Research publishes articles reporting any original research that has been funded (or co-funded) by Wellcome, without editorial bias, including papers reporting single findings, replication studies, and null results or negative findings. This reviewer will therefore only assess the validity of the article content, rather than the novelty or interest levels.
Comments:
1. This manuscript cannot be disconnected from Navarro et al., 2020¹, which reported on only 188 positions in the worm transcriptome as putative m5C sites. Their final conclusion about m5C in C. elegans: "In summary, we found no evidence of a widespread distribution of m5C in coding transcripts. To investigate whether the presence of common characteristics could support a subset of the aforementioned 188 positions as bona fide methylated sites, we performed gene ontology, motif search, genomic localisation and secondary structure analyses on these transcripts and sites; however, no significant shared features were found. While our data do not completely rule out the existence of m5C methylation in mRNAs, it demonstrates that this mark cannot be detected in high stoichiometry in C. elegans, as observed in tRNAs and rRNAs." These published results hint that putative readers of m5C in worms will likely read this modification in tRNA and rRNA but not in mRNA. This reviewer suggests placing a reminder of these findings in the introduction of the manuscript, since the current version will be read by many scientists, eager to put meaning into exceptions, as proof that C. elegans contains m5C in mRNAs which affect its adaptability to high temperatures.
2. The RNA bait is depicted as a secondary structure in Figure 1a. Please specify in the M&M how you achieved formation of such structures before performing pull-down experiments with protein extracts.
3. PAGE 4: "To increase the chances to identify bona-fide modification readers…" Since synthetic and not naturally occurring modified cytosine-containing RNA oligos were used, this reviewer would not call the "to-be-identified readers" bona fide (meaning genuine, true).
4. "Protein alignments show that some of these residues are conserved in ALY-1 and ALY-2 (Figure 3A)." Yang et al. reported on exactly one (1) amino acid (K171) that seemed to be required for binding of ALYREF to m5C-modified RNA oligos. Please correct this sentence. Figure 3a shows that exactly this lysine (human K171) is a glutamine (Q) in both worm aly-1 and -2. Even though a few other residues in the shown sequence are conserved, none really matters, since the lysine within the sequence AMKQ in human ALYREF is the only residue shown to be important, as published by Yang et al., 2019. Spoiler alert for readers of this review: in S4G of Yang et al., 2019, exactly at the level of ALYREF (with the K171A mutation), the binding to the 5mC oligo is obscured by a blind spot (air bubble?), which makes it questionable whether the signal is really reduced.

5. Figure 3a: "*" represents identical amino acids while ":" and "." represent amino acids with strongly and weakly similar properties, respectively." Please indicate which protein substitution/similarity index was used to support this statement.
6. Figure 3b: "Anti-FLAG western blotting (left) showing enrichment of ALY-1 and ALY-2 following pulldown using m5C-modified RNA oligos and relative quantification (right). Mean ± SD, quantification performed using ImageJ, n = 2 independent biological replicates." The raw blots provided at https://data.mendeley.com/datasets/k6z3c5xftr/3 do not show replicates of the pull-down followed by aly-2-FLAG identification. Please provide them, or state that the data come from a single experiment. A comment for interested readers: quantification of overexposed inputs is notoriously difficult and likely not possible. Hence, either a titration curve on input protein or a lower exposure would be needed to convince this reviewer of the stated enrichment factors based on density measurements.

7. "Amino acid residues shown to be important for m5C binding in ALYREF also reside in the RNA recognition motif (RMM) domain, which is a known RNA-binding domain (Yang et al., 2017³)." Same overstatement as before. Yang et al., 2017 only reported on K171, which lies at the border of the RRM and is not conserved in any of the proteins shown in Figure 3.
8.