A plant-like mechanism coupling m6A reading to polyadenylation safeguards transcriptome integrity and developmental genes partitioning in Toxoplasma

Correct 3' end processing of mRNAs is regarded as one of the regulatory cornerstones of gene expression. In a parasite that must meet the high regulatory requirements of its multi-host lifestyle, there is a great need for additional means to partition the distinct transcriptional signatures of closely and tandemly arranged stage-specific genes. In this study, we report in T. gondii an m6A-dependent 3' end polyadenylation that serves as a transcriptional barrier at these loci. We identify the core polyadenylation complex within T. gondii and establish CPSF4 as a reader of m6A-modified mRNAs via a YTH domain within its C-terminus, a feature that is shared with plants. We provide evidence of the specificity of this interaction both biochemically and by determining the high-resolution crystal structure of the T. gondii CPSF4-YTH in complex with an m6A-modified RNA. We show that the loss of m6A, at the level of either its deposition or its recognition, is associated with an increase in aberrantly elongated chimeric mRNAs emanating from impaired transcriptional termination, a phenotype previously noticed in the plant model Arabidopsis thaliana. Using Nanopore direct RNA sequencing, we show that transcriptional read-through breaches into downstream repressed stage-specific genes in the absence of either CPSF4 or the m6A RNA methylase components, in both T. gondii and A. thaliana. Taken together, our results shed light on an essential regulatory mechanism that couples m6A metabolism directly to the cleavage and polyadenylation processes and that seems to serve, in both T. gondii and A. thaliana, as a guardian against aberrant transcriptional read-through.

Highlights
m6A is recognized in apicomplexans and plants by CPSF4, a member of the cleavage and polyadenylation complex machinery.
The structural basis for the specificity of m6A binding by the CPSF4 YTH domain is revealed by high-resolution crystal structures.
The m6A-driven 3' end polyadenylation pathway protects transcriptome integrity by restricting transcriptional read-through and RNA chimera formation in apicomplexan parasites and plants.


A member of the phylum Apicomplexa, Toxoplasma gondii is an obligate parasite that develops and proliferates inside a surrogate host cell and causes toxoplasmosis, a usually mild disease in immunocompetent humans that can turn into a major threat to the unborn and to immunocompromised people, e.g. those with acquired immunodeficiency syndrome or under chemotherapy and graft rejection therapies (Milne et al., 2020). T. gondii has evolved dynamic and robust mechanisms for adapting and regulating its genetic

Apicomplexans seems to be held by complex mechanisms, to which can be attributed the apparent episodic absence of correlation between the levels of mRNAs and their corresponding proteins at a given stage (Holmes et al., 2017).

AAUAAA, could themselves be methylated, as a nanopore-based analysis indicated an enrichment of PAS motifs around m6A motifs (Parker et al., 2020). A link between the presence of an m6A site and the overrun of the respective proximal PAS by the 3' end processing machinery was briefly implied in plants (Parker et al., 2020). Moreover, the fact that chimeric mRNAs were generated in plants deficient in the CPSF30L isoform hints at transcriptional readthrough events taking place, and at the involvement of the YTH domain of CPSF30 in this process (Pontier et al., 2019). While a link between m6A-related proteins and 3' end processing players has been proposed, the mechanistic and functional outcomes of such a crosstalk between these two pathways, as well as their evolution across species, have not yet been fully explored.

Here, we describe the T. gondii homolog of CPSF4 and demonstrate by mass spectrometry that the CPSF4-YTH protein is part of the core CPSF complex; we also provide the overall composition of the latter through the use of endogenously tagged and purified putative CPSF subunits. More importantly, we provide in vitro evidence for the ability of the T. gondii YTH domain to recognize m6A-modified RNAs, which we corroborate with comparable data in Arabidopsis thaliana. We were also able to determine the crystal structure of the T. gondii YTH domain in complex with a short 7-mer m6A-modified RNA. Finally, our native RNA sequencing analysis allowed us first to identify putative m6A sites, and second to shed light on an essential regulatory mechanism coupling the pathways of m6A metabolism with the polyadenylation processes, one that interestingly seems to serve, in

to their respective predicted molecular weights. For instance, the CPSF4 subunit, with its theoretical mass of 68 kDa, is most abundant within the band at 62 kDa. Relatively high quantities were detected of two as yet unknown proteins that may constitute apicomplexan-specific subunits of the CSTF or CPSF complexes, namely TGME49_261960, which carries an RNA recognition motif, and TGME49_254210, which carries a C2H2 zinc finger domain. Similarly, the CPSF complex core components were also pulled down during the purification of the Fip1 subunit, albeit to a lesser extent, as can be seen from the mass spectrometry-based proteomic characterization (Fig 1C and S1 Data). It is worth noting that the mostly unstructured nature of the Fip1 protein suggests that some degradation events may account for the relatively poor peptide representation of this subunit in the CPSF1 immunoprecipitation profiling. It must be noted that PAP was not detected in any of these subunit immunoprecipitations, even though we managed to purify the PAP-FLAG protein separately; the exception was the WDR33 interactome analysis, in which only very small amounts of PAP were detected (Fig 1B and S1 Data). This could be explained either by a highly transient binding mode of PAP, or by weaker interactions within the complex that could have been disrupted during the stringent salt washing conditions (up to 500 mM KCl) of the purification steps. Beyond the fact that all of the identified CPSF subunits seemingly share a nuclear localization, their tachyzoite-based fitness assessment suggests that they are all essential for the survival of the parasite (S1C

T. gondii CPSF4 harbors a YTH domain in addition to the conserved zinc fingers, an architecture also found in plants
Within the CPSF complex, the T. gondii CPSF4 subunit stands out for its unique architecture, which interestingly is shared with the plant CPSF4 family and consists of the co-occurrence of three zinc fingers and a conserved YTH domain (Fig 1C). In comparison, the metazoan and fungal counterparts display five distinctive and evolutionarily conserved CCCH-type zinc fingers, in addition to a zinc knuckle, but none of them presents a YTH domain within the same protein (Fig 1C). It must be noted that this architecture is found in one of the two isoforms of the CPSF homolog (CPSF30 gene At1g30460) in Arabidopsis thaliana, namely CPSF30L, while the short version, CPSF30, lacks the YTH domain (S1D

following assessment, the proteins of metazoans and fungi can be set against those of plants, Apicomplexa, and also chromerids, which constitutes one of the last common ancestors between the two. These proteins can be compared in view of the ability of the human CPSF4 zinc finger motifs to recognize the canonical polyadenylation signal (PAS), consisting of the hexamer motif AAUAAA, the binding of which is sufficient to recruit poly(A) polymerase through the binding of CPSF30 to Fip1 (Fig 1D) (Clerici et al., 2018; Sun et al., 2018). A close-up view of the CCCH-type zinc fingers highlights a strong conservation between ZNF2 from metazoans and fungi and ZNF1 from plants, chromerids and Apicomplexa (S1E Fig), suggesting that the function established for the metazoan CPSF4, namely the ability to recognize nucleotides A1 and A2 (Fig 1D), might be conserved in these counterparts. Similarly, ZNF3 from plants, chromerids and Apicomplexa is highly conserved with ZNF5 from metazoans and fungi, a motif involved in Fip1

The presence of a YTH domain within T. gondii CPSF4 was intriguing and prompted us to explore its link to the m6A modification, as this domain is recognized as a reader of the latter. First, we checked for the corresponding methyltransferases (writers) in T. gondii. As with many Apicomplexa, the T. gondii genome has retained the genes encoding METTL3 and METTL14, which together are known to form a core catalytic complex,

bioinformatic-based analysis, we failed to detect in apicomplexan genomes the auxiliary proteins that are usually found in the identified human complexes and that are thought to aid the catalytic core components in correct m6A deposition (S2A

To further explore how the enzymes partner in vivo, we generated knock-in parasite lines expressing tagged versions of METTL3, METTL14 and WTAP. Immunofluorescence analysis of intracellular parasites revealed an almost exclusively nuclear staining for METTL3, METTL14 and WTAP (Fig 2A-B). Intense punctate foci were detected, similar to their human counterparts, which accumulate as condensates within nuclear speckles (Ping et al., 2014), the latter being phase-separated membrane-less organelles enriched in pre-mRNA splicing factors. In addition to its nuclear staining, the WTAP protein displayed a diffuse staining throughout the cytoplasm, hinting at the ability of this protein to shuttle between the nucleus and the cytoplasm (Fig 2A).

In order to validate the predicted association between METTL3, METTL14 and WTAP, and in the hope of identifying auxiliary proteins, even divergent ones, we opted for a biochemical approach that allowed us to define the interactome of each of the catalytic core subunits, using the respective endogenously HA-FLAG-tagged knock-in parasites. Western blotting of the FLAG eluates revealed a single band at the expected size for each protein, with the exception of METTL3, which exhibited lower substoichiometric forms that may result from a sensitivity to degradation (S2B Fig). Coomassie staining of the FLAG eluates suggested that all three proteins bind to multiple partners under highly stringent wash conditions (0.5 M NaCl and 0.1% NP-40; Fig 2C and S1 Data). These partnerships were subsequently resolved by mass spectrometry-based proteomics, which identified METTL3 and METTL14 as an intact dimeric RNA methyltransferase core complex (Fig 2C) with an apparent molecular weight of 400-500 kDa by size exclusion chromatography (Fig 2D). While METTL14 was not detected in the eluates of WTAP and vice versa, WTAP was found in the METTL3 pull-down in significant quantities despite the stringent washing conditions (S1 Data). Additional partners included RNA-binding proteins displaying multiple RRM (TGME49_291930) or KH (TGME49_235930) domains, ATP-dependent RNA helicases involved in pre-mRNA splicing (DHX15 and DDX17) and, notably, the PAP enzyme (Fig 2C). Interestingly, our experiments also revealed two new partners, the uncharacterized proteins TGME49_226660 and TGME49_275990, which were detected abundantly in WTAP eluates but also, in smaller quantities, in the METTL3 and METTL14 pull-downs (Fig 2C).
Having identified the complexes putatively acting as m6A methyltransferases, we next tested this assumption by depleting METTL3, as it is thought to carry the catalytic activity of the core complex. To this end, we employed the auxin-inducible degron (AID) system for an acute and reversible depletion of METTL3, given that the latter is essential for the fitness of the tachyzoite (Fig 3A and S2C Fig). The m6A mark is detected mostly within the cytoplasm by immunofluorescence staining in tachyzoites (Fig 3B). The m6A staining was reduced following the knockdown of METTL3, whose expression was specifically abolished at 24 hours post treatment with indole-3-acetic acid (IAA) (Fig 3A). However, the m6A staining was far from fully cleared at that point; only after a longer METTL3-KD induction period of about 48 hours in total could a more drastic decrease be detected in the cellular m6A levels in T. gondii (Fig 3B). It should be noted that, along with this post-translational loss of METTL3, a CRISPR-based transient genetic inactivation of this protein resulted in a similarly significant drop in m6A levels (Fig 3C).

CPSF4 (predicted from residues 434 to 598) we undertook to recombinantly express the domain on its own in E. coli with an N-terminal TEV-cleavable 8*His tag and with minimized extremities so as to limit disordered regions (Fig 4A). We then used isothermal titration calorimetry to titrate a chemically synthesized 7-mer RNA with a consensus m6A site (5'-GAACAUU-3') possessing or lacking the m6A modification (Fig 4B).
As measured, binding to the RNA substrate has a relatively high dissociation constant (Kd) but is entirely dependent on the presence of the m6A modification, as almost no binding is measured with the unmodified RNA (Fig 4B). The same is also true for the YTH module (residues 277-445) of Arabidopsis thaliana CPSF4 (Fig 4C), confirming that this ability to bind m6A is an evolutionary feature shared between apicomplexans and plants.

(Table 1) which all diffracted to high resolution (up to 1.45 Å for the RNA-bound form), sometimes using fully automated upstream crystal harvesting (using the EMBL crystal direct technology) and automated crystal diffraction (using MASSIF-1 at the ESRF). Molecular replacement (using YTHDC1, PDB 4r3i) rapidly found phasing solutions in all cases. Overall, the CPSF4 YTH domain folds into a well-structured domain (Fig 5A) featuring six alpha helices (α1-6) and five beta sheets (β1-5). The m6A binding site involves residues in or close to helices α1/α2/α3 and beta sheet 1 (Fig 5A, S3A Fig). For convenience, we depict the m6-adenosine as an adenine: although the crystals were co-grown with m6-adenosine, the ribose electron density is poorly visible in the crystal structure, whereas the adenine electron density is unequivocal (S3B Fig).

When comparing the CPSF4 YTH with its closest homologue structure (human YTHDC1), overall conservation of the domain is observed (Fig 5B), with both sharing a sequence identity of 38% and having most of their secondary structure features conserved as well (Fig 5C), apart from α3 and β1, which are unique to CPSF4 YTH.
Although strongly conserved, the aromatic cage recognizing the m6A displays a notable difference in the region between residues 519 and 526 (residues 428 to 439 in human YTHDC1), with the absence of a methionine residue (M434 in YTHDC1) and the presence of an additional valine (V522). Finally, visible angular differences in the planes of the m6A base between human YTHDC1 and CPSF4 YTH are seen in the m6A co-crystals, but they do not reflect a biological reality, as the m6A-modified RNA with CPSF4 YTH adopts a plane comparable to that of YTHDC1.

The m6A RNA / CPSF4 YTH structure reveals a conserved RNA binding mode and no sequence specificity outside of m6A recognition
With no prior information on the potential sequence specificity of CPSF4 YTH towards m6A-modified RNA in T. gondii, we undertook to crystallize the Tg-YTH with the canonical m6A-modified short RNA used in the isothermal titration experiment. Although we used a 7-mer GA-m6A-CAUU RNA, we can only visualize the electron density of the m6A followed by two nucleotides downstream (Fig 6A). The RNA is bound within a clearly positively charged groove, which is followed by a potential secondary groove. In this structure, as in others for YTH domains, the m6A-modified base is twisted inward compared to the other bases. Although the m6A base electron density is clearly visible, the following cytosine and adenosine have poor electron density for the bases, which are solvent exposed, the sugar and phosphate backbone is

In order to answer this question, we proceeded by first exploring the transcriptional outcome of depleting the CPSF4 protein by employing the auxin-inducible degron

Furthermore, the sequencing is stranded, going from 3' to 5', so 3' ends are sequenced first.
Therefore, the fact that, following the depletion of CPSF4, the same aberrant transcription termination seen with Illumina RNA-seq is detected by nanopore DRS on single full-length transcripts provides evidence that the initial 3' end polyadenylation site has been overrun and that this process has now shifted to employ an alternative PAS within the downstream gene (bottom of Fig 7D), thus the display of these

3' end processing zinc fingers, with the YTH that we structurally proved as an m6A reader, it seemed only logical to assess the weight of this RNA modification on the termination defects that we observed in the context of the CPSF4 KD. For this purpose, we assessed the outcome of diminishing this mark at the level of its deposition by employing the previously described METTL3 KD cell line (Fig 3A) to generate nanopore-sequenced RNA data at 24 hours post induction of the knockdown, a time that is short enough to discriminate primary from secondary transcriptomic effects (GEO GSE…).

We assessed the average distribution of these chimeric transcripts at a genome-wide level using ChimerID scripts (Parker et al., 2020), and we concluded that the formation of RNA chimeras following the depletion of CPSF4 and METTL3 occurred in a global and frequent manner (Fig 8A). These chimeras displayed different patterns of alternative splicing, which can be exemplified as follows: i) fusion transcripts covering two loci, each retaining the same splicing patterns as annotated, with an unspliced, intact intergenic region (e.g. mRNA-ch1 in Fig 8C and mRNA-ch2 in S10C

an increase in the number of reads of a set of genes, when compared to their repressed state in the WT samples (Fig 9A-B).
This mutation-specific increase was evidently caused by a readthrough of an upstream gene, whose transcription did not terminate but instead read into the adjacent gene and terminated at the PAS of the latter (Fig 9A-B, S17 Fig and S18 Fig). This was occasionally accompanied by a differential splicing state of the resulting elongated transcripts. Our nanopore-based data thus

KD) to map significantly modified error sites. Using such an approach to compare UT vs IAA-dependent METTL3-KD DRS datasets allowed us to presume that most of the detected differential error sites are m6A sites (Fig 10A). We then analysed the motifs around which the most significant peaks of error sites were mapped, which revealed a high and significant enrichment of a motif consisting of ARACW (R = A/G, W = A/T/G) (Fig 10B). This resembles the RGAC core motif, which is the established m6A consensus sequence identified in P. falciparum (Baumgarten et al., 2019), A. thaliana (Parker et al., 2020), humans (Linder et al., 2015) and yeast (Schwartz et al., 2013). We were also able to confirm, using our nanopore data, the m6A signature that was identified in Arabidopsis (Parker et al., 2020) (Fig 10B). About 65% of the error sites mapped at the RRAC consensus motif, which seems to be evolutionarily conserved across canonical strains of T. gondii, as shown by the evaluation of individual methylation sites, suggesting that mRNA methylation is a cis-regulatory feature conserved at the gene level (Fig 10C-F).

believe that the choice of this site is initially regulated by the m6A modification. In fact, the majority of the differential error sites that were detected coincided with the previously depicted transcription termination defects. This could be explained by an initial overlap between the overrun PAS sites and m6A sites, or even by methylation of the adenosines of the PAS themselves, as was observed in plants (Parker et al., 2020).

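The ARACW and RRAC motifs reported for the differential error sites can be matched against sequence windows with a few lines of code. The following is only a minimal sketch, assuming the degenerate codes given in the text (R = A/G, W = A/T/G; note that this W is broader than the IUPAC W, which denotes A/T only). The function names and example sequences are hypothetical and are not part of the analysis pipeline used in this study.

```python
import re

# Degenerate codes as defined in the text (assumption: W = A/T/G, as stated there,
# which is broader than the IUPAC meaning of W).
CODES = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "W": "[ATG]"}


def motif_regex(motif):
    """Compile a degenerate motif (e.g. 'ARACW') into a regular expression.

    The pattern is wrapped in a lookahead so that overlapping hits are reported.
    """
    body = "".join(CODES[base] for base in motif)
    return re.compile("(?=(" + body + "))")


def find_motif(sequence, motif="ARACW"):
    """Return (0-based start, matched k-mer) for each motif hit in a sequence."""
    seq = sequence.upper().replace("U", "T")  # accept RNA or DNA alphabets
    return [(m.start(), m.group(1)) for m in motif_regex(motif).finditer(seq)]


def fraction_with_motif(windows, motif="RRAC"):
    """Fraction of sequence windows (e.g. around error sites) containing the motif."""
    if not windows:
        return 0.0
    hits = sum(1 for w in windows if find_motif(w, motif))
    return hits / len(windows)
```

For example, `find_motif("GGAAACTTT")` reports a single ARACW hit (`AAACT`) starting at position 2, and the 7-mer used for calorimetry, GAACAUU, contains the RRAC core (`GAAC`) at position 0.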
We believe that, in the natural WT case, the m6A site guides the polyadenylation machinery via the ability of the CPSF4 YTH to bind this modification, thus allowing the recognition of the respective proximal PAS and proper termination at this locus. This explains why, in the absence of this mark, we could no longer detect these transcripts, as they were either non-polyadenylated or consequently degraded.

The readthrough of the upstream gene, hereafter referred to as gene1, invading the transcriptional unit of the downstream gene, hereafter referred to as gene2, seemed to occur at loci exhibiting a distinctive pattern. In fact, the gene2 loci that now displayed higher amounts of nanopore reads in the context of the KD of METTL3 and CPSF4 were mostly, if not all, initially repressed and represented developmentally regulated genes that happen to be adjacent to expressed tachyzoite genes. The mRNA analysis of the set of genes targeted by this readthrough phenotype illustrated their developmentally regulated nature (Fig 11A). Interestingly, many of these genes are recognized targets of the MORC repressor complex, and their expression is upregulated following the KD of MORC (Fig 11A) (Farhat et al., 2020).
However, a detailed look at the nanopore-derived reads in the context of the MORC KD, when compared with those of CPSF4 and METTL3, provided enough proof that the read-through phenotype occurs following the KD of both CPSF4 and METTL3, but not of MORC, which only resulted in a conventional promoter-dependent upregulation of the initially repressed genes (Fig 8C, Fig 10F, Fig 11B-D, S9A Fig, S10B Fig, and S11B

formation of chimeric RNAs (Fig 11D), the data generated in the context of MORC KD helped to identify the mis-annotation of certain genes, such as the example shown for TgME49_227630 (Fig 8C), thus avoiding any misinterpretation of the elongated transcripts.

The recurrence of the dual expression pattern between gene1 and gene2, the first being specific to tachyzoites and the second being repressed and only expressed in stages other than the tachyzoite, suggests that the m6A-dependent polyadenylation of gene1 and the m6A-independent polyadenylation of gene2 are at the core of an essential mechanism aiding the tight transcriptional regulation of developmental stage-specific genes in T. gondii.
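The gene1/gene2 readthrough pattern can be made concrete with a small sketch that flags single reads covering both loci. This is a simplified illustration and not the ChimerID approach used in the study: the `Interval` type, the function names and the 50-base `min_overlap` threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def overlap(a, b):
    """Number of bases shared by two intervals (strand is ignored here)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def is_readthrough(read, gene1, gene2, min_overlap=50):
    """True if a single read spans gene1 and extends into gene2.

    Only the strand of gene1 is required to match the read, because a
    readthrough may enter gene2 in either orientation.
    """
    return (
        read.strand == gene1.strand
        and overlap(read, gene1) >= min_overlap
        and overlap(read, gene2) >= min_overlap
    )


def readthrough_fraction(reads, gene1, gene2, min_overlap=50):
    """Share of gene1-derived reads that also cover gene2."""
    from_gene1 = [
        r for r in reads
        if r.strand == gene1.strand and overlap(r, gene1) >= min_overlap
    ]
    if not from_gene1:
        return 0.0
    chimeric = sum(is_readthrough(r, gene1, gene2, min_overlap) for r in from_gene1)
    return chimeric / len(from_gene1)
```

Comparing `readthrough_fraction` between untreated and knockdown read sets at a given tandem locus would give a rough per-locus measure of the phenotype; a real analysis would also need to handle splicing and annotation errors, which this sketch ignores.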

An illustrative example of this observation is ROP35 (Fig 11B-D), a rhoptry gene that displays a tachyzoite-specific expression and lies upstream of a repressed gene, TgME49_304730, whose expression is known to be specific to the late sexual and early oocyst stages (EES5 and oocyst D0) (Fig 11A). The mRNA levels of ROP35 were unaltered following the KD of METTL3 or of CPSF4, based on both Illumina-seq and nanopore-seq data. However, the expression of the downstream TgME49_304730 was clearly induced following the KD of CPSF4, as illustrated by Illumina-seq (Fig 11B, top). Similarly, direct RNA sequencing displayed a higher level of reads at this locus in the context of the KD (Fig 11B, bottom). The analysis of the reads generated at these loci provides evidence that this upregulation is caused by an overrun of the ROP35 PAS and a readthrough breaching the transcriptional unit of the downstream gene2 (TgME49_304730), with the polyadenylation machinery terminating at this alternative PAS, which can now be referred to as an m6A-independent PAS, as evidenced by the earlier results and the error-based identification of m6A sites (Fig 11C).

The fact that the parasite has adopted and evolved such an unconventional 3' end processing mechanism suggests a role for the latter in the tight regulation of gene expression displayed by this highly adaptive organism.
The m6A-dependent polyadenylation was detected mostly at the ends of a set of tachyzoite-specific genes that are, first, highly expressed and, second, adjacent to repressed developmental stage-specific genes (Farhat et al., 2020), hence at sites where a highly efficient barrier is needed to partition the distinct transcriptional signatures of these tandem genes, thus preventing any aberrant readthrough of the polymerase that is actively transcribing the upstream tachyzoite gene.

of transcription that is witnessed in this stage, dampens down in the other stages, these latter being either slow in their proliferation or even quiescent, which might explain why the parasite has favored a large set of tachyzoite genes with this m6A-related transcriptional barrier at their 3' ends. Also, the fact that most of the m6A-related enzymes and most of the 3' end processing factors were found to be less expressed in latent stages than in highly replicative ones is in accordance with the requirement for this mark at these stages in particular (Fig S1B, S2C).

Although this differential concentration might hint at some level of stage-specific upstream regulation for this mark, the fact that no m6A erasers have been detected in T. gondii and that the mark seems to have a relatively long half-life hints at a low level of dynamism for this mark. It seems that the crucial requirement for this m6A-dependent barrier in tachyzoites might not extend to other stages. When transcribed, the transcripts of gene2 loci (as referred to in the text: the downstream gene in a tandem, which belongs to stages other than tachyzoites) would terminate in an orthodox manner, independently of m6A.

It must be noted that the PAS sites which we referred to as being m6A-dependent were not sequenced, thus we do not claim that T. gondii harbors the canonical AAUAAA.

between the adjacent genes makes it so that RNA pol II would still be able to scan the downstream poly(A) signal site and use it to efficiently terminate the transcripts, thus allowing us to detect the latter by nanopore and to assess the phenotype of the KD of CPSF4 at these loci.

In addition, employing nanopore DRS allowed us to witness events that we could not have captured through Illumina-seq. Apart from the genome assembly artefacts emanating from the inability of conventional techniques to accurately read the repetitive elements that exist broadly in this genome, there seems to be a fairly large number of loci that are mis-annotated or even non-annotated in the genome of T. gondii. For instance, nanopore-based DRS allowed us to align the transcripts of some unannotated genes (e.g. Fig S11). The nanopore data also allowed us to distinguish the direction in which transcription is taking place; for instance, in the context of the CPSF4-KD-dependent readthrough, the readthrough breaching into the TGME49_212275 gene (Fig S13) occurs in a direction opposite to the strand on which the gene is predicted, while the transcription of this same gene follows its predicted direction in the context of its MORC-KD-dependent induction; a similar behavior is observed in Fig 10F. The ability to detect the orientation of the transcription occurring at a gene also allowed us to predict instances of steric hindrance between molecules of actively transcribing polymerases, as is the case at a gene that was initially transcribed in WT but then had fewer reads mapping at its locus when the adjacent repressed gene was undergoing a CPSF4-KD-dependent readthrough (Fig S6d).

Toxoplasma strains that were used in this study (listed in Supplementary Table 2) were maintained in vitro by serial passage on monolayers of HFFs. The cultures were free of mycoplasma, as determined by qualitative PCR.

Endogenous tagging of CPSF, WTAP and METTL subunits
A list of the plasmids and primers for genes of interest (GOIs) that were used in this study is provided in Supplementary Table 1. To construct the vector pLIC-GOI-HAFlag, the coding sequence of the GOI was amplified with the primers LIC-GOI-Fwd and LIC-GOI-Rev, using T. gondii genomic DNA as a template. The resulting PCR product was cloned into the pLIC-HA-Flag-dhfr or pLIC-(TY)2-hxgprt vectors using the ligation-independent cloning method. The plasmid pTOXO_Cas9-CRISPR was described previously. We cloned 20-mer oligonucleotides corresponding to specific GOIs using the Golden Gate strategy. In brief, the primers GOI-gRNA-Fwd and GOI-gRNA-Rev, containing the sgRNA targeting the GOI genomic sequence, were phosphorylated, annealed and ligated into the pTOXO_Cas9-CRISPR plasmid linearized with BsaI, leading to pTOXO_Cas9-CRISPR-sgGOI. The same approach was also used to build pLIC-GOI-HAFlag-mAID vectors, as already described (Farhat et al., 2020).

Reagents.
The following primary antibodies were used in the immunofluorescence, mouse anti-

MS-based interactome analyses
Protein bands were excised from gels stained with colloidal blue (Thermo Fisher Scientific) before in-gel digestion using modified trypsin (Promega, sequencing grade). The resulting peptides were analysed by online nanoliquid chromatography coupled to

Broth/Chlo/Kan (TB - Formedium) expression culture (using a 2 ml inoculum), which was incubated at 37°C. Upon reaching an OD600 of 0.5-0.8, cultures were ice-cooled to 20°C for 10 min, then induced with 500 μM IPTG (Euromedex) for 12 h, after which cultures were centrifuged and stored as dry pellets at -80°C.

Protein purification
Culture pellets were resuspended in 50 mM Tris pH 7.5, 300 mM NaCl and 5 µM β-mercaptoethanol (BME), with the addition of complete protease inhibitor (1 tablet per 50 mL of lysis buffer). Following resuspension, lysis was performed on ice by sonication for 10 minutes (30 sec on / 30 sec off, 45° amplitude). Clarification was then performed by centrifugation for 1 h at 12000 g / 4°C, after which the supernatant was supplemented with 20 mM imidazole and incubated with 5 mL Ni-NTA resin with a stirring magnet at 4°C for 30 min. The resin was retained by gravity in a Bio-Rad glass column and washed with 100 mL of washing buffer (50 mM Tris pH 7.5, 1 M NaCl, 2 mM BME and 20 mM imidazole). His-tagged TgCPSF4-YTH was then eluted in 50 mM Tris pH 7.5, 300 mM NaCl, 300 mM imidazole, 2% glycerol and 2 mM BME, and dialyzed overnight with TEV protease to remove the 8*histidine tag in a buffer composed of 50 mM Tris pH 7.5, 150 mM NaCl, 2% glycerol and 2 mM BME. Non-cleaved forms of TgCPSF4-YTH were removed by flowing the sample through 1 mL of pre-equilibrated Ni-NTA. All subsequent liquid chromatography steps were performed on an Akta Purifier. Nucleic acid contaminants were removed by directly binding CPSF4-YTH onto a 5 mL heparin column (GE Healthcare) and eluting with a 40 mL NaCl step gradient (150 mM to 2 M) in a 50 mM Tris pH 7.5, 2 mM BME, 2% glycerol buffer system. Following elution from heparin, fractions of interest were pooled and concentrated to 1 mL using 10 kDa cut-off concentrators before being injected on an S75 column for size fractionation in a buffer containing 50 mM Tris pH 7.5, 150 mM NaCl, 1 mM BME. Following the size exclusion step, final fractions were pooled, concentrated to a minimum of 15 mg/mL with Centricon 10 kDa concentrators, flash frozen in liquid nitrogen and stored at -80°C.

Isothermal titration calorimetry
The affinity of at-YTH or tg-YTH for RNA or RNAm6A was determined using a

Crystals were subsequently fished out in cryo-loops (Hampton) and directly flash-frozen in their crystallization mother liquor. All PEG and associated chemical formulations were obtained from Sigma and purchased as chemical compounds. X-ray diffraction data for m6A / CPSF4-YTH crystals were collected by the autonomous European

Minknow v20.06.5 and guppy v4.09. Basecalling was performed during the run using the fast-basecalling algorithm with a Q score cutoff >7. Long read alignment (to ME49-Toxodb-13 and TAIR10 reference fasta files) was performed using Minimap2 (ver 2.1) with the following parameters: "-ax splice -k14 -uf -G 5000 -t 10 --secondary=no --sam-hit-only" for Toxoplasma and "-ax splice -k14 -uf -G 20000 -t 10 --secondary=no --sam-hit-only" for Arabidopsis. Aligned reads were converted to bam, sorted and indexed using Samtools. For T. gondii datasets, most sequencing runs were stopped after having generated between 400k and 500k aligned reads to keep a standard of comparison (T. gondii reads varying between 30 and 70% of total mRNA depending on the preparation).
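The alignment settings quoted above differ between the two organisms only in the maximum intron length. As a hedged sketch, the command line could be assembled as follows; the function names, the `organism` keys and the file names are hypothetical, and only the flags themselves are taken from the text.

```python
import shlex

# Maximum intron length passed to -G, as quoted in the text for each organism.
MAX_INTRON = {"toxoplasma": 5000, "arabidopsis": 20000}


def minimap2_command(reference_fasta, reads_fastq, organism, threads=10):
    """Build the minimap2 argument list for spliced alignment of direct RNA reads."""
    return [
        "minimap2",
        "-ax", "splice",                   # spliced alignment with SAM output
        "-k14",                            # k-mer size of 14
        "-uf",                             # splice sites on the transcript strand only
        "-G", str(MAX_INTRON[organism]),   # maximum intron length
        "-t", str(threads),                # worker threads
        "--secondary=no",                  # no secondary alignments
        "--sam-hit-only",                  # omit unmapped reads from the SAM output
        reference_fasta,
        reads_fastq,
    ]


def as_shell(command):
    """Render an argument list as a single shell-quoted string."""
    return shlex.join(command)
```

The returned list can be handed to `subprocess.run` with the standard output redirected to a SAM file, which is then converted, sorted and indexed with Samtools as described in the text.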

[Figure 11. Labels recovered from the figure: MORC KD; nanopore DRS read alignment; error sites Z score.]