RNA-dependent chromatin association of transcription elongation factors and Pol II CTD kinases

For transcription through chromatin, RNA polymerase (Pol) II associates with elongation factors (EFs). Here we show that many EFs crosslink to RNA emerging from transcribing Pol II in the yeast Saccharomyces cerevisiae. Most EFs crosslink preferentially to mRNAs, rather than unstable non-coding RNAs. RNA contributes to chromatin association of many EFs, including the Pol II serine 2 kinases Ctk1 and Bur1 and the histone H3 methyltransferases Set1 and Set2. The Ctk1 kinase complex binds RNA in vitro, consistent with direct EF-RNA interaction. Set1 recruitment to genes in vivo depends on its RNA recognition motifs (RRMs). These results strongly suggest that nascent RNA contributes to EF recruitment to transcribing Pol II. We propose that EF-RNA interactions facilitate assembly of the elongation complex on transcribed genes when RNA emerges from Pol II, and that loss of EF-RNA interactions upon RNA cleavage at the polyadenylation site triggers disassembly of the elongation complex. DOI: http://dx.doi.org/10.7554/eLife.25637.001


Introduction
For productive transcription through chromatin, RNA polymerase (Pol) II associates with general elongation factors (EFs) (Perales and Bentley, 2009;Shilatifard, 2004;Shilatifard et al., 2003;Sims et al., 2004) that are recruited to the body of transcribed genes in yeast. EFs in yeast include Spt5 (a subunit of human DSIF), the histone chaperone Spt6, and the Paf1 complex (Paf1C). The Pol II C-terminal domain (CTD) kinases Bur1 (human CDK9) and Ctk1 (human CDK12), and their cyclin partners Bur2 and Ctk2, respectively, can also be classified as EFs. In addition, the histone methyltransferases Set1 (a subunit of the COMPASS complex), Set2, and Dot1 are recruited to elongating Pol II to set the 'active' histone marks H3K4me3, H3K36me3, and H3K79me3, respectively.
However, other recruitment mechanisms exist because mutations in EFs that prevent their polymerase interactions do not abolish gene occupancy of such factors, including Bur1, Paf1C subunits, Spt6, and Set2 (Ng et al., 2003;Qiu et al., 2012, 2009;Mayer et al., 2010;Zhou et al., 2009;Krogan et al., 2003b). Further, it remains unknown how the yeast CTD serine 2 (Ser2) kinase Ctk1 is recruited, which is apparently a prerequisite for recruitment of Spt6 and Set2, because these factors bind the Ser2-phosphorylated CTD (Dengl et al., 2009;Kizer et al., 2005;Li et al., 2003;Phatnani et al., 2004;Sun et al., 2010;Yoh et al., 2007). More generally, it is unknown whether and how EFs can distinguish transcribing Pol II from free or initiating polymerase based on polymerase interactions alone, in particular at an early stage of elongation when Ser2 phosphorylation is absent.
Here we report that most EFs in yeast, including, most notably, CTD Ser2 kinases and histone H3 methyltransferases, directly crosslink to nascent pre-mRNA in vivo. We find that crosslinking preferences can differ for coding RNAs and non-coding (nc) RNAs. We further show that chromatin association of many EFs depends on RNA. We also directly tested one prominent EF for RNA binding in vitro, and found that recombinant, purified Ctk1-containing kinase complex CTDK-I strongly binds RNA in the absence of other components. Moreover, we show that the N-terminal region of Set1 that contains two RNA recognition motifs (RRMs) is required for full Set1 recruitment to genes in vivo. Based on these results we suggest a model where nascent RNA contributes to the assembly and stability of the Pol II elongation complex. RNA-EF interactions provide a missing link for understanding the coordination of the transcription cycle.

Elongation factors directly crosslink to RNA in vivo
To investigate whether EFs interact with RNA in vivo, we used photoactivatable ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) (Hafner et al., 2010), a method that detects and maps direct protein-RNA interactions without chemical crosslinkers. We applied our recently optimized PAR-CLIP protocol (Baejen et al., 2014) to 14 EFs of the yeast Saccharomyces cerevisiae (Table 1, Materials and methods). These EFs included Spt5, Spt6, the five Paf1C subunits Cdc73, Ctr9, Leo1, Rtf1, and Paf1, the kinases Bur1 and Ctk1, the cyclins Bur2 and Ctk2, and the histone methyltransferases Set1, Set2, and Dot1.
For 12 of these 14 EFs we obtained PAR-CLIP signals that were more than two-fold above background, showing that these EFs interact with RNA in vivo ( Figure 1, Figure 1-figure supplement 1A). We obtained between 42,000 and 520,000 high-confidence protein-RNA crosslinking sites per factor with p-values below 0.005 ( Table 1). The obtained data sets were highly reproducible (Figure 1-figure supplement 1B). To estimate background RNA binding, we collected PAR-CLIP data for the transcription initiation factor TFIIB that is recruited to promoter DNA before nascent RNA is made (Sainsbury et al., 2015). Only very low levels of background binding were observed, further emphasizing the significance of EF-RNA interactions detected by UV crosslinking.
We then classified EFs into factors with moderate and high PAR-CLIP signals, based on their fold enrichments (>2 and >4 fold, respectively) over background TFIIB signals ( Figure 1). Spt5, Set1, Ctk1, Spt6, Ctk2 and Bur1 showed high PAR-CLIP signals (Figure 1, Figure 1-figure supplement 1A, Table 1). EFs with moderate signals included Rtf1, Ctr9, Cdc73, Bur2, Set2 and Dot1. PAR-CLIP signals were clearly specific for individual subunits of known complexes. For instance, only the Paf1C subunits Rtf1, Cdc73 and Ctr9 bound RNA according to the PAR-CLIP results, and the same subunits bound radioactively labeled RNA after immunoprecipitation (Figure 1-figure supplement 1C). A very low background signal was observed for other subunits, whereas the enriched bands were due to the protein of interest. These data revealed that many EFs directly bind RNA in vivo, including Pol II Ser2 kinases and histone H3 methyltransferases.
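The thresholding described above can be sketched in a few lines. This is a minimal illustration only: the function name `classify_factor`, the label "background" for factors at or below two-fold, and the example values are assumptions made here and are not taken from the study.

```python
def classify_factor(fold_enrichment, moderate=2.0, high=4.0):
    """Classify a factor by its PAR-CLIP fold enrichment over the TFIIB background.

    Thresholds follow the text (>2-fold moderate, >4-fold high); the label
    'background' for everything else is an assumption of this sketch.
    """
    if fold_enrichment > high:
        return "high"
    if fold_enrichment > moderate:
        return "moderate"
    return "background"


# Hypothetical fold enrichments, for illustration only (not measured values).
example_enrichments = {"FactorA": 6.3, "FactorB": 2.8, "FactorC": 1.1}
example_classes = {name: classify_factor(value)
                   for name, value in example_enrichments.items()}
```

Strict inequalities are used so that a factor at exactly two-fold or four-fold falls into the lower class, matching the "more than two-fold" wording.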

Comparisons of PAR-CLIP data require normalization
We have previously noted the importance of normalizing the raw PAR-CLIP signal, as measured by the number of U-to-C transitions per U site, to account for differences in RNA abundance (Baejen et al., 2014). Briefly, the raw PAR-CLIP signal is proportional to the occupancy of the factor on RNA and to the concentration of RNAs covering the U site. Therefore, normalization is crucial to enable comparison of PAR-CLIP signals between individual transcripts and transcript classes. Relative occupancies can be estimated by dividing the observed PAR-CLIP signal by RNA-Seq reads that have been obtained under the same experimental conditions (Baejen et al., 2014). An alternative approach is to divide the observed PAR-CLIP signal by a PAR-CLIP signal obtained for Pol II (Baejen et al., 2017), although this is only suitable for proteins that associate with nascent RNA during transcription, which is the case for the EFs studied here.
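As a rough illustration of the two normalization options, the sketch below divides a raw per-position PAR-CLIP signal by a reference signal, which can be either RNA-Seq coverage or Pol II PAR-CLIP signal. The function name, the pseudocount used to avoid division by zero, and the example numbers are assumptions of this sketch and do not reproduce the published pipeline.

```python
def normalize_occupancy(par_clip, reference, pseudocount=1.0):
    """Estimate relative occupancy by dividing raw PAR-CLIP signal by a reference.

    par_clip  : raw PAR-CLIP signal per U site (e.g. U-to-C transition counts)
    reference : RNA-Seq coverage or Pol II PAR-CLIP signal at the same sites
    pseudocount : added to the reference to avoid division by zero
                  (an assumption of this sketch, not a documented parameter)
    """
    if len(par_clip) != len(reference):
        raise ValueError("signal and reference must cover the same positions")
    return [p / (r + pseudocount) for p, r in zip(par_clip, reference)]


# Hypothetical values, for illustration only.
raw_signal = [10.0, 20.0, 5.0]
rna_seq_coverage = [99.0, 199.0, 4.0]
relative_occupancy = normalize_occupancy(raw_signal, rna_seq_coverage)
```

In this toy example the third position has the lowest raw signal but the highest relative occupancy, because the reference coverage there is low. The same effect inflates normalized signal over unstable RNAs when RNA-Seq coverage is used as the reference.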
In Figure 2 we investigate how the two different normalization methods affect EF occupancy profiles on mRNA transcripts. For two representative EFs, Ctk2 and Spt5, the raw data (Figure 2A) were either normalized with RNA-Seq reads (Figure 2B) or with reads from Pol II (Rpb1 subunit) PAR-CLIP data (Figure 2C). Meta-transcript profiles are shown in Figure 2D. In the case of Ctk2, the raw data profile and the Pol II normalized profile look very similar, whereas the RNA-normalized profile shows slightly less occupancy of Ctk2 in the 3′ part of the transcripts, due to the slightly higher RNA-Seq signal in this region (Figure 2B, bottom). The PAR-CLIP signal for Spt5 is enriched around the 5′-end of mRNAs and decreases towards the 3′-end, and this was independent of the normalization approach (Figure 2D, bottom). However, Spt5 signals peak just downstream of the pA site, and the size of this peak varies depending on the normalization approach. This is due to the intrinsic instability of transcripts downstream of the pA site, which reduces the number of RNA-Seq reads, and artificially increases the PAR-CLIP peak after RNA-Seq-based normalization. Taken together, the PAR-CLIP metagene profiles over stable transcripts were largely independent of the type of normalization used, whereas normalization becomes very important when crosslinking to unstable RNAs is investigated. Indeed, when we compare meta-profiles over cryptic unstable transcripts (CUTs) versus stable mRNAs using the different normalization methods (Figure 2-figure supplement 1), we observe that for proteins that bind CUTs (e.g. Spt5) the relative signal over CUTs increases when total RNA-Seq reads are used for normalization, similar to the effect seen for unstable transcripts downstream of the pA site (Figure 2D, bottom).
Since we were interested in comparing EF occupancies between transcript classes, including unstable RNAs, we used Pol II PAR-CLIP normalization to calculate normalized EF PAR-CLIP occupancies, and used these for further analysis.

EF localization along mRNA transcripts
To localize EFs on transcripts, we mapped the Pol II normalized PAR-CLIP occupancies onto transcripts in different classes (Materials and methods). We then calculated factor occupancies for 2532 mRNA transcripts that were filtered to reduce ambiguous signals from overlapping transcripts. We calculated heat maps with occupancies averaged around the transcript 5′-end, which corresponds to the transcription start site (TSS), and around the polyadenylation (pA) site (Figure 3A). The obtained profiles were also visible on individual transcripts (Figure 3-figure supplement 1A).
Generally, PAR-CLIP occupancies were high at the 5′-end of mRNAs and decreased shortly before the pA site, with a few exceptions (Figure 3A). First, the histone methyltransferases Set2 and Dot1, for which the corresponding methylation marks accumulate in gene bodies (Bannister et al., 2005;Pokholok et al., 2005), showed more RNA-binding sites over transcript bodies. Second, Set1 crosslinked to mRNAs mainly near the beginning of transcripts, which was expected since Set1 and its methylation mark, H3K4me3, are observed in promoter-proximal regions of genes (Ng et al., 2003). Third, the kinases Ctk1 and Bur1 and their cyclin partners Ctk2 and Bur2 were enriched near the 5′-end but also in the transcript body. The 5′-peak for Bur1-Bur2 slightly preceded that of Ctk1-Ctk2. The three Paf1C subunits Cdc73, Ctr9 and Rtf1 showed occupancy profiles similar to those of the kinases but with a focused peak at the 5′-end. Fourth, Spt5 and Spt6 showed high PAR-CLIP occupancy at the 5′-end of mRNAs and decreased occupancy towards the pA site. This analysis revealed specific differences in EF localization on mRNAs, and additionally suggested that EFs bind nascent RNA during transcription.

EFs bind nascent pre-mRNA
[Figure 3 legend, fragment] [...] (Figure 1). Only factors with average RNA-binding occupancies > 2 fold above background are shown. The Spt5 PAR-CLIP profile reveals a peak downstream of the pA site that is discussed in detail elsewhere (Baejen et al., 2017). The color code shows the occupancy relative to the maximum occupancy per profile (dark blue). (B) EFs bind to pre-mRNA. Processing indices (PIs) measure preferential binding of factors to uncleaved pre-mRNA with respect to cleaved RNA, computed as log2 odds ratios uncleaved versus cleaved RNA bound by the factor (Materials and methods). The PIs for Pab1 and Pub1, as typical factors binding mature mRNA (Baejen et al., 2014), are shown for comparison. (C) Colocalization of factor crosslinking sites on transcripts. Euclidean distances between pairwise colocalization measures were subjected to average-linkage hierarchical clustering (Materials and methods). The cluster dendrogram shows similarities in crosslinking locations on transcripts between EFs and published RNA processing factors (Baejen et al., 2014;Schulz et al., 2013).

To test whether EFs interact with nascent pre-mRNA or with spliced, mature mRNA, we measured factor occupancies at introns, which are co-transcriptionally spliced out and subsequently degraded (Carrillo Oesterreich et al., 2016). All EFs crosslinked to introns (Figure 3-figure supplement 1B), indicating that they bind pre-mRNA. Most EFs bound to introns with a frequency that was comparable to that at exons, although Spt5 and Set1 showed slightly higher occupancy within introns, whereas Bur2, Set2 and Dot1 showed lower occupancy. Together with earlier reports of co-transcriptional splicing (Tennyson et al., 1995;Listerman et al., 2006), our data show that EFs interact with nascent pre-mRNA. However, only ~4% of yeast genes contain introns (Qin et al., 2016), preventing general statements related to all pre-mRNAs. We therefore calculated a processing index (PI) that measures preferential binding of factors to uncleaved pre-mRNA with respect to cleaved RNA (Materials and methods) (Baejen et al., 2014). All EFs showed positive PIs, indicating binding to pre-mRNA, in contrast to the negative PIs that we previously obtained for typical RNA binders of processed, mature mRNA, such as Pab1 and Pub1 (Figure 3B) (Baejen et al., 2014). We conclude that EFs preferentially interact with nascent pre-mRNA. We next investigated where EFs localize on RNAs in relation to previously mapped mRNA biogenesis factors (Baejen et al., 2014).
We determined the extent of factor colocalization by computing the average occupancy of factor A within ±20 nucleotides (nt) around RNA-binding sites of factor B and subjected the pairwise colocalization measures to hierarchical clustering (Figure 3C, Materials and methods). We found that Spt5 colocalizes with the Cbc2 subunit of the cap-binding complex, consistent with its recruitment during early elongation. Both Ctk1 and Bur1 colocalized with binding sites of Set1 and splicing factors. Paf1C subunits colocalized with Set2, whereas RNA 3′-processing and surveillance factors formed separate groups (Figure 3C). Together these data show a distinct distribution of EFs over RNAs, and suggested that EFs cooperate with other mRNA biogenesis factors during pre-mRNA binding.
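The colocalization measure described above (average occupancy of factor A within ±20 nt of the binding sites of factor B) might be computed along the following lines. This is a sketch under simplifying assumptions: a single transcript represented as a per-nucleotide occupancy list, 0-based site positions, windows clipped at transcript ends, and overlapping windows counted once per site. The function name and example data are hypothetical, and the subsequent hierarchical clustering step is not shown.

```python
def colocalization(occupancy_a, sites_b, window=20):
    """Mean occupancy of factor A within +/- `window` nt of factor B's sites.

    occupancy_a : per-nucleotide occupancy of factor A along one transcript
    sites_b     : 0-based positions of factor B crosslinking sites
    Windows are clipped at the transcript ends; positions shared by
    overlapping windows are counted once per site (a simplification).
    """
    values = []
    length = len(occupancy_a)
    for site in sites_b:
        start = max(0, site - window)
        end = min(length, site + window + 1)
        values.extend(occupancy_a[start:end])
    return sum(values) / len(values) if values else 0.0


# Hypothetical 100-nt transcript: factor A occupies positions 40-60.
occupancy_a = [1.0 if 40 <= i <= 60 else 0.0 for i in range(100)]
score_near = colocalization(occupancy_a, sites_b=[50])   # site inside the occupied block
score_far = colocalization(occupancy_a, sites_b=[90])    # site far from the occupied block
```

Computing this score for every ordered pair of factors gives a matrix of pairwise colocalization measures, which is the kind of input that hierarchical clustering operates on.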

Most EFs preferentially interact with coding transcripts
We next analyzed our PAR-CLIP data for EF binding to non-coding Pol II transcripts including short-lived cryptic unstable transcripts (CUTs), which often arise from upstream antisense transcription of bidirectional promoters (Wyers et al., 2005;Xu et al., 2009). We selected CUTs with a minimum length of 350 nt and compared transcript-averaged RNA-binding occupancies between CUTs and mRNAs (see Figure 4A). This revealed that EFs bind to these transcript classes with distinct preferences. Spt5 was equally distributed between CUTs and mRNAs whereas Set1 preferentially bound mRNAs. All other EFs were depleted at CUTs relative to their mRNA occupancies (Figure 4A). This was essentially independent of RNA length (Figure 4-figure supplement 1A). Thus, most EFs preferentially crosslink to coding RNAs.
We then analyzed PAR-CLIP signals at bidirectional promoters, which produce mRNA in one direction and a CUT in the divergent direction ( Figure 4B). We observed clear differences in PAR-CLIP signals for divergent directions. As in Figure 4A, Set1 and Spt5 showed high signals on CUTs and mRNAs ( Figure 4B, top) whereas all other EFs bound exclusively to mRNAs ( Figure 4B, bottom). These differences were also observed when the analysis was restricted to bidirectional promoters producing CUTs and mRNAs of similar lengths (Figure 4-figure supplement 1B).
How can some EFs distinguish between CUTs and mRNAs? We carried out motif analysis around the strongest PAR-CLIP sites for each EF using XXmotif (Luehr et al., 2012) and could not find any significantly enriched motifs, indicating that EFs bind RNA in a non-specific manner. We hypothesize that another RNA-binding factor blocks binding of EFs to CUTs. CUTs are rapidly degraded by a surveillance system, which includes Nrd1 (Schulz et al., 2013;Vasiljeva et al., 2008;Steinmetz and Brow, 1996). Nrd1 selectively binds to CUTs ( Figure 4C) via motifs that are enriched in CUTs compared to mRNAs (Schulz et al., 2013). Binding of Nrd1 to CUTs might hinder RNA binding of some EFs, especially those which possess lower RNA binding affinity. This may explain how stable elongation complexes preferentially assemble on mRNAs.

Chromatin association of EFs depends on RNA
We next investigated whether RNA binding of EFs contributes to their association with chromatin. Yeast cells were lysed and incubated with buffer containing RNases or with buffer only. Chromatin was isolated and associated protein factors were detected by Western blotting (Materials and methods). We found that RNase treatment strongly decreased the levels of chromatin-associated enzymes Set1, Set2, Dot1, Bur1, Ctk1, and the cyclins Bur2 and Ctk2 (Figure 5). Thus, RNA stabilizes chromatin association of these factors. Two non-enzymatic EFs also depended on RNA for chromatin association, although less strongly. With respect to Paf1C, Rtf1 was partially lost upon RNase treatment, whereas Leo1 and Paf1 were not significantly affected (Figure 5). Spt5 binding to chromatin also depended on RNA, whereas Spt6 was not significantly affected by RNase treatment (Figure 5). These data are generally consistent with our PAR-CLIP results. The discrepancies between RNA-dependent chromatin association and PAR-CLIP results for Spt6 (high PAR-CLIP signal versus RNA-independent chromatin binding) and Dot1 (low PAR-CLIP signal versus strong RNA-dependent chromatin binding) can be explained by additional protein-protein interactions, and by the dependence of PAR-CLIP on the concentration of the RNA-interacting protein in the cell (Chong et al., 2015;Kulak et al., 2014).

[Figure 4 legend, fragment] [...] occupancy profiles for sense mRNA (right) and divergent CUT (left) were centered around their 5′-end (TSS) [−75 nt to +400 nt]. We considered only bidirectional promoters producing mRNAs and CUTs that did not overlap with any other transcripts in the depicted region. After normalization, average mRNA and CUT profiles were rescaled, setting the maximum occupancy to one and the minimum occupancy to 0 (see also [...]

[Figure 5 legend, fragment] [...] Western blot analysis (top) and quantitative densitometry (bottom) of exemplary EFs bound to chromatin before and after treatment with RNase A/T1 mix. H3 was used as loading control. Densitometry data are expressed as mean ± SD from two to three independent experiments. *p<0.05; **p<0.01; ***p<0.001; n.s. = not significant (one-way ANOVA Dunnett post-hoc test). DOI: 10.7554/eLife.25637.011

As a negative control, we subjected TFIIB to the RNase assay. We observed no differences in chromatin binding after RNase treatment (Figure 5), consistent with recruitment of TFIIB to DNA during transcription initiation (Sainsbury et al., 2015). Also as expected, RNase treatment did not affect association of Pol II with chromatin, showing that the observed losses of EFs from chromatin upon RNase treatment were not due to a loss of Pol II (Figure 5). These controls and the above results show that the association of many EFs with chromatin depends on RNA.

Ctk1 kinase complex binds RNA in vitro
The observed RNA-EF crosslinking in vivo and the RNA-dependent chromatin association data strongly suggested that EFs can directly bind RNA. To investigate this in vitro, we prepared one EF complex in recombinant form. We chose the prominent Ser2 kinase complex CTDK-I that comprises Ctk1, Ctk2, and the small subunit Ctk3 (Mühlbacher et al., 2015;Sterner et al., 1995). CTDK-I is the main yeast kinase responsible for phosphorylating the Pol II CTD at Ser2 (Patturajan et al., 1999;Cho et al., 2001), and this is a decisive event in establishing a mature Pol II elongation complex. Further, the RNA-dependent chromatin association of Ctk1 and Ctk2 was most unexpected, as RNA interactions had already been reported for several other EFs (compare introduction).
We co-expressed recombinant Ctk1, Ctk2, and Ctk3 in insect cells and purified a complete, intact CTDK-I complex (Materials and methods, Figure 6A). We then tested the purified CTDK-I complex for its kinase activity using a purified GST-CTD construct and dephosphorylated full-length S. cerevisiae Pol II (Materials and methods). Both the GST-CTD and the Rpb1 subunit of Pol II were readily phosphorylated by CTDK-I at the Ser2 position in vitro ( Figure 6B,C), showing that our purified CTDK-I complex was active.
We then tested the purified CTDK-I complex for RNA binding in vitro. We performed fluorescence anisotropy titration experiments using single-stranded (ss) RNA oligonucleotides with 45% or 24% GC content and bearing a 5′ FAM label (Figure 6D,E). CTDK-I bound both ssRNAs with similar affinities (Figure 6D). We also tested U- or A-rich sequences for association with CTDK-I and found some preference for U-rich RNA (Figure 6E, Figure 6-figure supplement 1). Fitting the data with binding curves by linear regression resulted in apparent Kd values in the nanomolar range (Figure 6D,E). All experiments were done in the presence of tRNA as competitor, indicating that flexible, single-stranded nucleic acids are preferentially bound. Consistent with this, CTDK-I bound to duplex DNA much more weakly (dsDNA, Figure 6E). These experiments show that the EF complex CTDK-I binds to single-stranded RNA in vitro, consistent with direct EF-RNA interactions in vivo.
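To make the idea of extracting an apparent Kd from an anisotropy titration concrete, the sketch below fits a simple single-site binding isotherm to synthetic data. It is not the fitting procedure used in the study: the hyperbolic model (which assumes the labeled RNA concentration is well below Kd), the grid search over Kd, the function names, and all numbers are assumptions chosen for illustration.

```python
def fraction_bound(protein_nM, kd_nM):
    """Single-site isotherm, assuming labeled RNA concentration << Kd."""
    return protein_nM / (kd_nM + protein_nM)


def fit_kd(protein_nM, anisotropy, kd_min=0.1, kd_max=1e5, steps=2000):
    """Estimate an apparent Kd by scanning log-spaced Kd values.

    For each candidate Kd the baseline and amplitude of the curve
    (anisotropy = baseline + span * fraction_bound) are obtained by
    ordinary least squares; the Kd with the smallest squared error wins.
    Returns (kd, baseline, span).
    """
    n = len(protein_nM)
    mean_y = sum(anisotropy) / n
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        x = [fraction_bound(p, kd) for p in protein_nM]
        mean_x = sum(x) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue
        span = sum((xi - mean_x) * (yi - mean_y)
                   for xi, yi in zip(x, anisotropy)) / sxx
        baseline = mean_y - span * mean_x
        sse = sum((baseline + span * xi - yi) ** 2
                  for xi, yi in zip(x, anisotropy))
        if best is None or sse < best[0]:
            best = (sse, kd, baseline, span)
    _, kd, baseline, span = best
    return kd, baseline, span


# Synthetic, noise-free titration (hypothetical values, not measured data).
concentrations_nM = [0.0, 5.0, 10.0, 25.0, 50.0, 100.0, 250.0, 500.0, 1000.0]
true_kd, true_baseline, true_span = 50.0, 0.05, 0.10
signal = [true_baseline + true_span * fraction_bound(c, true_kd)
          for c in concentrations_nM]
fitted_kd, fitted_baseline, fitted_span = fit_kd(concentrations_nM, signal)
```

With real data a dedicated nonlinear least-squares routine and a model that accounts for ligand depletion would usually be preferable. The grid search is used here only to keep the example dependency-free.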

Evidence that RNA contributes to EF recruitment
We also measured gene occupancies of EFs using ChIP-Seq and compared them with our PAR-CLIP occupancies (Figure 7). The obtained ChIP-Seq data sets were highly reproducible (Figure 7-figure supplement 1). For comparability with PAR-CLIP data, we collected ChIP-Seq data, although ChIP data are available for single genes or genome-wide using various other techniques or set-ups (Keogh et al., 2003;Kim et al., 2004;Kizer et al., 2005;Krogan et al., 2003b;Liu et al., 2005;Mayer et al., 2010;Ng et al., 2003;Pokholok et al., 2005;Weiner et al., 2015). Metagene analysis of our ChIP-Seq data revealed that EF occupancy increased within 100-600 bp downstream of the TSS, and was generally high in gene bodies (Figure 7, red lines). In contrast, PAR-CLIP results showed that EFs interacted with RNA already from around 20 nt downstream of the capped 5′-end of mRNAs (Figure 7, blue lines). This difference was most pronounced for Set2, which occupies transcripts at the 5′-end but showed peak levels of genome association only in the downstream region, with peak levels 450-300 bp upstream of the pA site. These results are consistent with the idea that RNA contributes to EF recruitment to transcribed genes, and that the contribution of RNA-based recruitment differs for different EFs.
Comparison of our histone methyltransferase PAR-CLIP data sets with ChIP-Seq data of the corresponding methylation marks (Figure 7, left, orange lines) provides further support of the model that RNA binding can contribute to EF recruitment to transcribed regions. In the direction of transcription, the PAR-CLIP signals for methyltransferases increased first, followed by an onset of ChIP-Seq signals for the respective histone methylation marks, which in turn preceded the increase in ChIP-Seq signals for the enzymes (Figure 7, left). This sequence of signal onsets is consistent with the model that these EFs are recruited by nascent RNA and then modify histones as Pol II moves downstream.
To test the model of RNA-based recruitment for a particular factor, we investigated whether Set1 gene occupancy depends on the N-terminal region of the protein that contains two RNA recognition motifs (RRMs, residues 247-375 and 376-579) that bind RNA in vitro (Trésaugues et al., 2006). We performed ChIP-qPCR analysis for Set1 in a strain lacking its N-terminal residues 1-579 (ΔRRM-Set1-TAP, Materials and methods) (Figure 8). Additionally, we carried out Set1 ChIP-qPCR in a mutant lacking Paf1 (ΔPaf1 Set1-TAP) because the Paf1 complex was shown to contribute to Set1 recruitment (Krogan et al., 2003a). We compared Set1 gene occupancy levels of both mutant strains (ΔRRM-Set1-TAP and ΔPaf1 Set1-TAP) with the full-length protein occupancy in a Set1-TAP strain. All strains expressed similar levels of Set1 and the Pol II subunit Rpb3 (Figure 8A). We analyzed protein occupancy at different genomic regions of four housekeeping genes (Figure 8B) and at a nontranscribed region of chromosome V. Gene regions within the first 1000 bp downstream of the TSS showed a severe decrease in ΔRRM-Set1 occupancy (Figure 8C; genomic regions 1, 2, 4 and 6). Similarly, we also detected a decrease in Set1 occupancy in the absence of Paf1, confirming the role of Paf1C in Set1 recruitment (Figure 8C). These results indicate that Set1 recruitment depends not only on the Paf1 complex, but also on binding to nascent RNA. Taken together, several lines of evidence presented here strongly suggest that interactions of EFs with nascent RNA contribute to EF recruitment to actively transcribed genes in vivo.

Discussion
Here we present a large set of system-wide occupancy data for yeast transcription elongation factors on RNA (PAR-CLIP) and DNA (ChIP-Seq), and complementary biochemical data. The remarkable finding from our work is that many EFs interact with nascent RNA in vivo. Additional in vitro results support these findings and indicate that RNA can contribute to EF recruitment and the stability of the transcription elongation complex. For Set1 we further demonstrate that the two RNA recognition motifs are required for Set1 recruitment to genes in vivo. These results extend our understanding of how the transcription elongation complex is assembled and maintained on active genes. The emerging view from our data is that nascent RNA contributes to EF recruitment and elongation complex stability to different extents for different EFs. We note that our results do not reveal whether all EFs studied here are initially recruited by RNA, and which EFs establish RNA interactions only after they have been recruited by alternative interactions, although EF binding in the very 5′-region of transcripts argues for an RNA-based recruitment model.

Our results also address the long-standing question of how the yeast CTD Ser2 kinases Ctk1 and Bur1, which are essential for transcription elongation, are recruited to transcribing Pol II. The Pol II Ser2 kinases give rise to strong PAR-CLIP signals and their chromatin association is strongly dependent on RNA. In addition, we show that purified CTDK-I complex strongly binds to RNA in vitro. Together, these findings indicate that nascent RNA plays an important role in recruiting Ser2 kinases to transcribing Pol II. Binding of the Ser2 kinases near the RNA 5′-end is consistent with stabilization of these kinases on the elongation complex by the cap-binding complex (Hossain et al., 2013;Lidschreiber et al., 2013). A model of kinase recruitment by capped RNA predicts that these enzymes are lost from the transcribing enzyme upon RNA cleavage at the pA site, and this is indeed observed by ChIP-Seq. In conclusion, RNA-based recruitment of Ser2 kinases explains why Ser2 phosphorylation of the CTD is restricted to transcribing polymerases, whereas free or initiating polymerases are not phosphorylated at Ser2 residues.

Figure 8. Deletion of Set1 RRMs impairs its recruitment to genes. (A) Western blot analysis of Set1-TAP (top) and Rpb3 (bottom) in a Set1-TAP strain (left), a strain lacking the first 579 amino acids of Set1 (ΔRRM-Set1-TAP; middle) and a ΔPaf1 Set1-TAP strain (right); bands are shown for biological duplicates of yeast cell cultures before formaldehyde crosslinking. Set1 was detected using an antibody directed against its C-terminal TAP tag. As a control, Pol II was detected using an antibody against the Rpb3 subunit in all three strains. (B) Schematic localization of gene regions analyzed via ChIP-qPCR. Set1 recruitment was monitored at one gene region of ADH1 (1) and two different gene regions of ILV5 (2 and 3), PDC1 (4 and 5) and PMA1 (6 and 7). (C) ChIP analysis reveals that Set1 occupancy is reduced in ΔPaf1 cells (ΔPaf1 Set1-TAP) as well as in a truncated version of Set1 that lacks its RRM domains (ΔRRM-Set1-TAP). ChIP data are expressed as mean ± SD from two independent experiments. *p<0.05; **p<0.01 (two sample t-test). DOI: 10.7554/eLife.25637.016
How can some EFs bind both RNA and Pol II? EFs are generally modular and contain multiple domains that can be involved in RNA or protein interactions. However, the same domain can mediate both RNA and protein interactions, as documented for the RNA export factor Yra1, which contains an RNA recognition motif (RRM) domain that binds both RNA and the phosphorylated CTD (MacKellar and Greenleaf, 2011). Set1 contains two adjacent RRM domains (Trésaugues et al., 2006), and Set2 contains an SRI domain that binds the phosphorylated CTD (Dengl et al., 2009;Sun et al., 2010;Yoh et al., 2007;MacKellar and Greenleaf, 2011), but may also bind RNA. The three Paf1C subunits that bind RNA in vivo, namely Cdc73, Ctr9 and Rtf1, also interact with the phosphorylated CTD and the phosphorylated C-terminal region (CTR) of Spt5 in vitro (Qiu et al., 2012). Rtf1 contains a positively charged Plus-3 domain (Finn et al., 2014) that binds the phosphorylated CTR (Wier et al., 2013) and single-stranded nucleic acids (de Jong et al., 2008). We predict that many EFs contain domains that can interact with RNA or with the phosphorylated CTD or CTR, which resemble RNA in their flexible nature and negative charge. Whereas for some EFs binding to RNA or the CTD may be mutually exclusive, others can bind both Pol II and RNA at the same time, for example Spt5. Due to a lack of solubility of individually expressed EF subunits, and the difficulty of preparing EF complexes in recombinant and pure form in large quantities, we had to limit our in vitro RNA-binding analysis to CTDK-I.
Finally, we predict that RNA-based recruitment of EFs provides a missing link in our understanding of how the transcription cycle is coordinated. When the initiation complex assembles at the promoter, TFIIH phosphorylates Ser5 residues in the CTD and this enables recruitment of the capping enzyme (Cho et al., 1997;Fabrega et al., 2003;Rodriguez et al., 2000;Schroeder et al., 2000;Schwer and Shuman, 2011). The nascent RNA then receives a 5′-cap (Martinez-Rucobo et al., 2015), and capped RNA could then help to recruit EFs. The requirement for a cap on RNA befits the observation that Ser5 phosphorylation is needed for high gene occupancy with some EFs (Qiu et al., 2012, 2009, 2006;Ng et al., 2003). RNA-based recruitment of the major Ser2 kinase, Ctk1, would then lead to CTD phosphorylation on Ser2 residues and stable binding of other EFs. Eventually, transcription of a pA site triggers RNA cleavage, and this would facilitate loss of many RNA-bound EFs and render the polymerase prone to transcription termination. Thus, the transcribing Pol II complex may be viewed as a self-organizing system that is encoded in the DNA, but only realized on the level of RNA, which plays crucial roles in complex assembly and disassembly.

Strains and antibodies
Saccharomyces cerevisiae (Sc) BY4741 strains containing C-terminally TAP-tagged genes (Open Biosystems, Germany) were tested for expression of the correctly tagged protein using the Peroxidase Anti-Peroxidase (PAP; Sigma, P1291, St. Louis, MO) antibody. To obtain the ΔPaf1 Set1-TAP strain, a ΔPaf1-KanMX6 cassette, amplified from the pFA6a-KanMX6 vector (Supplementary file 1), was introduced by homologous recombination into a Set1-TAP strain. A DNA fragment coding for Set1 residues 580 to 1080 and a C-terminal TAP tag was amplified from genomic DNA from a Set1-TAP strain (Supplementary file 1) and transformed into a ΔSet1 strain (Open Biosystems) by homologous recombination. Additional antibodies used were anti-Histone H3 (HRP; Abcam, ab21054, UK), anti-Histone H3 (tri methyl K4; Abcam, ab8580), anti-Histone H3 (tri methyl K36; Abcam, ab9050), anti-Histone H3 (tri methyl K79; Abcam, ab2621), IgG from rabbit serum (directed against the protein A content of the C-terminal TAP tag of proteins; Sigma, I5006), anti-rat IgG (HRP; Sigma, A9037) and anti-Ser2P (3E10; kindly provided by Dirk Eick [Chapman et al., 2007]).

PAR-CLIP of S. cerevisiae proteins
PAR-CLIP was performed as described (Baejen et al., 2014;Schulz et al., 2013), with modifications. The full protocol is described here for convenience. Yeast cells expressing the TAP-tagged protein were grown at 30˚C to OD600 ~0.5 in minimal medium (CSM mixture, Formedium, UK) containing 10 mg/L uracil, 100 mM 4-thiouracil (4tU) and 2% glucose. 4-Thiouracil was added to a final concentration of 1 mM and cells were grown further for 4 hr. Following RNA labeling, cells were harvested, resuspended in 1× PBS and UV-irradiated on ice with an energy dose of 12 J/cm2 at 365 nm under continuous shaking.
Cells were harvested, resuspended in lysis buffer (50 mM Tris-HCl pH 7.5, 100 mM NaCl, 0.5% sodium deoxycholate, 0.1% SDS, 0.5% NP-40), and disrupted by bead beating (FastPrep-24 Instrument, MP Biomedicals, LLC., France) in the presence of 1 mL of silica-zirconium beads (Roth, Germany) for 40 s at 4 m/s, followed by an incubation of the sample for 1 min on ice. This was repeated eight times. The success of the cell lysis was monitored by photometric measurements and the cell lysis efficiency was usually >80%. Samples were solubilized for 1 min via sonication with a Covaris S220 instrument (Covaris, UK) using the following parameters: Peak Incident Power (W): 140; Duty Factor: 5%; Cycles per Burst: 200. The lysate was cleared by centrifugation. Immunoprecipitation was performed on a rotating wheel overnight at 4˚C with rabbit IgG-conjugated Protein G magnetic beads (Invitrogen, Germany). Beads were washed twice in wash buffer (50 mM Tris-HCl pH 7.5, 500 mM NaCl, 0.5% sodium deoxycholate, 0.1% SDS, 0.5% NP-40) and once in T1 buffer (50 mM Tris-HCl pH 7.5, 2 mM EDTA). Immunoprecipitated and crosslinked RNA was partially digested with 50 U of RNase T1 per mL for 20 min at 25˚C and 400 rpm. Beads were washed twice in T1 buffer and phosphatase reaction buffer (50 mM Tris-HCl pH 7.0, 1 mM MgCl2, 0.1 mM ZnCl2). For dephosphorylation, 1× Antarctic phosphatase reaction buffer (NEB, Germany) with 1 U/mL of Antarctic phosphatase and 1 U/mL of RNase OUT (Invitrogen) were added and the suspension was incubated at 37˚C for 30 min and 800 rpm. Beads were washed once in phosphatase wash buffer (50 mM Tris-HCl pH 7.5, 20 mM EGTA, 0.5% NP-40) and twice in polynucleotide kinase (PNK) buffer (50 mM Tris-HCl pH 7.5, 50 mM NaCl, 10 mM MgCl2). Beads were resuspended in 1× T4 PNK reaction buffer A (Fermentas, Germany) with a final concentration of 1 U/mL T4 PNK and 1 U/mL RNase OUT.
Phosphorylation of PAR-CLIP samples was performed using either 1 mM ATP per mL (cold labeling) or 0.5 mCi of γ-32P-ATP per mL (radioactive labeling). The bead suspension was incubated for 1 hr at 37˚C and 800 rpm and washed in PNK buffer. For visualization of protein-RNA interactions, the radioactively labeled samples were subjected to SDS-PAGE analysis. Radioactive RNA-protein bands were detected with the Typhoon FLA 9500 instrument (Typhoon, Sweden).
We calculated the P-values for true crosslinking sites as described (Baejen et al., 2014). Briefly, we quantitatively modeled the null hypothesis, that is, the probability that the T-to-C mismatches observed in reads covering a certain T nucleotide in the genome were not caused by crosslinks between the immunoprecipitated factor and RNA but were due to other sources of mismatches. Owing to the high sensitivity of our experimental PAR-CLIP procedure, we could set a very stringent P-value cut-off of 0.005 and a minimum coverage threshold of two. For true crosslinking sites passing these thresholds, the PAR-CLIP-induced T-to-C transitions strongly dominate over the contributions by sequencing errors and SNPs. For any given T site in the transcriptome, the number of reads showing the T-to-C transition is proportional to the occupancy of the factor on the RNA times the concentration of RNAs covering the T site. Therefore, the occupancy of the factor on the RNA is proportional to the number of reads showing the T-to-C transition divided by the concentration of RNAs covering the T site. This concentration was estimated either from the RNA-Seq read coverage measured under comparable conditions as described (Baejen et al., 2014) or from the read coverage obtained in an Rpb1 PAR-CLIP experiment (this study) and was used to obtain normalized occupancies. We compared RNA-normalized and Pol II (Rpb1)-normalized occupancy profiles and found that the latter were less prone to biases arising from difficulties in measuring unstable RNA species, including CUTs, introns and nascent transcripts downstream of the pA site.
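The normalization described above can be illustrated with a minimal sketch. It assumes per-site T-to-C read counts and a matching coverage track (RNA-Seq or Rpb1 PAR-CLIP) are already available as plain lists; the function name, the `min_coverage` guard and the use of `None` for uncovered sites are illustrative assumptions, not part of the published pipeline.

```python
def normalized_occupancy(tc_reads, coverage, min_coverage=1.0):
    """Divide T-to-C read counts by an RNA-abundance proxy, site by site.

    tc_reads     -- number of reads with a T-to-C transition at each T site
    coverage     -- normalizing read coverage at the same sites
    min_coverage -- hypothetical guard: sites below this coverage are
                    reported as None instead of being divided by ~0
    """
    if len(tc_reads) != len(coverage):
        raise ValueError("tc_reads and coverage must have the same length")
    occupancy = []
    for n_tc, cov in zip(tc_reads, coverage):
        if cov < min_coverage:
            occupancy.append(None)  # no reliable abundance estimate
        else:
            occupancy.append(n_tc / cov)
    return occupancy


if __name__ == "__main__":
    print(normalized_occupancy([4, 0, 6], [2, 5, 0]))
```

The result is proportional to occupancy only; absolute occupancies cannot be recovered from this ratio.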

PAR-CLIP data analysis
For transcript annotation, we used the recent TIF-Seq data from Pelechano et al. (2013) to derive TSS and pA site annotations for 5578 coding genes. TSS and TTS positions of non-coding RNAs were taken from Xu et al. (2009) for CUTs and from the Saccharomyces Genome Database (SGD, version R64-2-1) for snoRNAs. Annotated transcripts were distance-filtered for downstream analysis to reduce ambiguous signals from overlapping transcripts. Unless stated otherwise, mRNA transcripts were selected to be at least 150 nt away from neighboring transcripts on the same strand. Unless stated otherwise, mRNAs and CUTs were selected to be 800-5000 nt and 350-1500 nt long, respectively. Bidirectional promoters were selected as those for which the distance between the TSS of an mRNA and the TSS of a divergent CUT was smaller than 350 bp. Moreover, only mRNAs and CUTs that did not overlap with any other transcript in the region from their TSS to 400 nt downstream on the same strand were considered.
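As an illustration of the length and distance filters, the following sketch applies them to a small list of transcripts. The `Transcript` record, the 1-based inclusive coordinates and the decision to apply the 150 nt same-strand distance rule to every transcript passed in are assumptions made for this example; the text states the distance rule for mRNAs only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    strand: str   # "+" or "-"
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    kind: str     # "mRNA" or "CUT"


# Length ranges in nt as stated in the text.
LENGTH_RANGES = {"mRNA": (800, 5000), "CUT": (350, 1500)}


def gap(a, b):
    """Number of nucleotides separating two transcripts (0 if they overlap or abut)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end) - 1)


def passes_filters(t, others, min_gap=150):
    """Length filter plus minimum distance to same-strand neighbours."""
    lo, hi = LENGTH_RANGES[t.kind]
    if not lo <= t.end - t.start + 1 <= hi:
        return False
    for o in others:
        if o is t or o.chrom != t.chrom or o.strand != t.strand:
            continue
        if gap(t, o) < min_gap:
            return False
    return True


def filter_transcripts(transcripts, min_gap=150):
    return [t for t in transcripts if passes_filters(t, transcripts, min_gap)]


if __name__ == "__main__":
    demo = [
        Transcript("a", "chrI", "+", 1, 1000, "mRNA"),
        Transcript("b", "chrI", "+", 1100, 2100, "mRNA"),  # only 99 nt from "a"
        Transcript("c", "chrI", "-", 1050, 2000, "mRNA"),  # other strand
    ]
    print([t.name for t in filter_transcripts(demo)])
```

The pairwise loop is quadratic in the number of transcripts, which is acceptable for a few thousand yeast transcripts; a sorted sweep would be preferable for larger annotations.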
To generate transcript class-averaged heat maps and profiles, transcripts were aligned at their 5′-end ('TSS') and pA sites and either scaled to the same length (median) or cut around the TSS and pA sites before taking the average RNA-binding occupancy at each genomic position. Average occupancies were smoothed (sliding window averaging using a 61 nt window, 30 nt to either side of the current position) and for each factor individually re-scaled between 0 (minimum signal) and 1 (maximum signal) for all figures except Figure 1-figure supplement 1A, for which all factors were globally scaled to show the relative strength of factor binding. To compare averaged RNA-binding occupancies between transcript classes, they were scaled together by setting min (transcript class 1, transcript class 2) to 0 and max (transcript class 1, transcript class 2) to 1 (Figure 2-figure supplement 1, Figure 4 and Figure 4-figure supplement 1).
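The smoothing and per-factor rescaling can be sketched as follows. The treatment of profile edges (the window is truncated rather than padded) and the handling of a flat profile are assumptions of this example, since the text does not specify them.

```python
def smooth(values, half_width=30):
    """Sliding-window mean; half_width=30 gives the 61 nt window from the text.

    Assumption: near the ends of the profile the window is truncated.
    """
    n = len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - half_width):min(n, i + half_width + 1)]
        out.append(sum(window) / len(window))
    return out


def rescale_01(values):
    """Rescale so that the minimum maps to 0 and the maximum to 1."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [0.0 for _ in values]  # flat profile: nothing to scale
    return [(v - vmin) / (vmax - vmin) for v in values]


if __name__ == "__main__":
    profile = [0, 0, 3, 0, 0]
    print(rescale_01(smooth(profile, half_width=1)))
```

For the joint scaling of two transcript classes, the same `rescale_01` logic would be applied with the minimum and maximum taken over both profiles together.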
For generation of non-averaged heat maps of filtered mRNAs (Figure 2 and Figure 3-figure supplement 1A), transcripts were sorted by length and aligned at their 5′-end ('TSS'). Smoothed occupancies were binned in cells of 20 nucleotide positions times 10 transcripts to avoid aliasing effects due to limited resolution of the plots. The color code displays the occupancy of the PAR-CLIPped factor (with the 97% quantile of these bins scaled to 1). In Figure 3-figure supplement 1B, all introns (SGD annotation) with lengths between 150 and 650 nt were aligned at the 5′-splice site (5′SS) and the occupancy of each intron is displayed without binning in either x or y direction.
PAR-CLIP processing indices (PIs) (Figure 3B) were calculated essentially as described (Baejen et al., 2014; Schulz et al., 2013). We assume that read counts (not crosslinking sites) $N_{\mathrm{down}}$ downstream of a pA site can only arise from pre-mRNAs, $N_{\mathrm{down}} = N_{\mathrm{prem}}$, whereas read counts $N_{\mathrm{up}}$ upstream of a pA site are a mixture of mature mRNA counts $N_{\mathrm{mat}}$ and pre-mRNA counts $N_{\mathrm{prem}}$. Therefore, $N_{\mathrm{up}} = N_{\mathrm{mat}} + N_{\mathrm{prem}}$. For increased robustness with regard to different transcript isoforms and uncertainties in the exact location of pA sites, we computed $N_{\mathrm{up}}^{i}$ and $N_{\mathrm{down}}^{i}$ as averages of the read counts for each transcript $i$ of a given annotation $A$. The processing index was then obtained from the transcriptome-wide averages of $N_{\mathrm{up}}$ and $N_{\mathrm{down}}$.
Colocalization analysis (Figure 3C) was done as described (Baejen et al., 2014), with modifications. Briefly, to calculate the tendency of pairs of factors A and B to bind locations in the transcriptome near each other, we computed the average occupancy of factor B within ±20 nt of occupancy peaks of factor A (unsmoothed occupancy data). First, crosslink sites of factor A are sorted according to their occupancy and the strongest n = 3000 sites are selected.
For each crosslink site $a_i$ of this selection, the maximum occupancy value of factor B, $m_i^B$, is identified based on the occupancies of factor B within ±20 nt around $a_i$. The average colocalization $c$ is then given by $c = \frac{1}{n}\sum_{i=1}^{n} m_i^B$. Next, the background binding $b$ of factor B is defined as the median of all occupancies of factor B. The colocalization is defined as $\log_2(c/b)$. Finally, we constructed a data matrix containing the calculated colocalization values between all EF pairs. After data normalization, the derived colocalization dissimilarity matrix (Euclidean distance) was subjected to average-linkage hierarchical clustering (Figure 3C).
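The colocalization score for one pair of factors can be sketched as below. The example assumes that both occupancy tracks are lists indexed by position along a single coordinate axis; taking the background median over non-zero occupancies only, and skipping positions where factor A has no signal, are assumptions introduced here to keep the ratio defined.

```python
import math
from statistics import median


def colocalization(occ_a, occ_b, n_top=3000, flank=20):
    """log2(c / b) for factor B around the strongest sites of factor A.

    occ_a, occ_b -- unsmoothed occupancies per position (same length)
    n_top        -- number of strongest factor-A sites to use
    flank        -- window half-width in nt around each factor-A site
    """
    ranked = sorted(range(len(occ_a)), key=lambda i: occ_a[i], reverse=True)
    top_sites = [i for i in ranked[:n_top] if occ_a[i] > 0]
    positive_b = [v for v in occ_b if v > 0]  # assumption: background over bound sites
    if not top_sites or not positive_b:
        raise ValueError("no crosslink sites available")
    # m_i^B: maximum occupancy of factor B within +/- flank nt of site a_i
    maxima = [max(occ_b[max(0, i - flank):i + flank + 1]) for i in top_sites]
    c = sum(maxima) / len(maxima)
    if c <= 0:
        raise ValueError("factor B has no signal near factor A sites")
    b = median(positive_b)
    return math.log2(c / b)


if __name__ == "__main__":
    a = [0, 5, 0, 0, 0, 0, 2, 0]
    b = [1, 0, 4, 0, 0, 0, 0, 2]
    print(colocalization(a, b, n_top=2, flank=1))
```

Computing this value for every ordered pair of factors yields the matrix that is subsequently normalized and clustered.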

ChIP-Seq
ChIP was performed as described, with modifications. Yeast strains were grown in 600 mL YPD medium to mid-log phase (OD600 ~0.8). Cell cultures were treated with formaldehyde (1%, Sigma, F1635) for 20 min at 20˚C. Crosslinking was quenched with 75 mL of 3 M glycine for 5 min at 20˚C. All subsequent steps were performed at 4˚C with pre-cooled buffers and in the presence of a fresh protease-inhibitor mix (1 mM Leupeptin, 2 mM Pepstatin A, 100 mM Phenylmethylsulfonyl fluoride, 280 mM Benzamidine). Cells were collected by centrifugation, washed with 1× TBS (20 mM Tris-HCl at pH 7.5, 150 mM NaCl) and twice with lysis buffer (50 mM HEPES-KOH at pH 7.5, 150 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% Na deoxycholate, 0.1% SDS). Cell pellets were resuspended in 2 mL lysis buffer and cell lysis was performed as described above for PAR-CLIP.
Chromatin was washed with lysis buffer and solubilized via sonication with a Covaris S220 instrument (Covaris) to yield an average DNA fragment size of 200 bp as determined on an Agilent 2200 TapeStation instrument (see below). This was achieved by sonicating the sample for 18 min using the following parameters: Peak Incident Power (W): 140; Duty Factor: 5%; Cycles per Burst: 200. 30 mL of the washed and fragmented chromatin samples were saved as input material and for control of the average chromatin fragment size. The remaining chromatin sample was immunoprecipitated with 100 mL antibody-coated and prewashed magnetic Dynabeads Protein G (Life Technologies, UK) at 4˚C for 3 hr (ChIP of TAP-tagged proteins) or overnight (ChIP with protein-specific antibodies) on a turning wheel. Immunoprecipitated chromatin was washed five times with ChIP wash buffer (100 mM Tris-HCl at pH 7.5, 500 mM LiCl, 1% NP-40, 1% Na deoxycholate) and once with TE buffer (10 mM Tris-HCl at pH 7.5, 1 mM EDTA). Immunoprecipitated chromatin was eluted for 10 min at 70˚C in the presence of ChIP elution buffer (100 mM sodium bicarbonate, 1% SDS). Eluted immunoprecipitated chromatin as well as input material were incubated with 10 mL RNase A (10 mg/mL) at 37˚C for 30 min and subsequently subjected to Proteinase K (20 mL of 20 mg/mL Proteinase K, Bioline, BIO-37084, Germany) digestion at 37˚C for 2 hr and reversal of crosslinks (at 65˚C overnight).
IP DNA and input samples were purified with the QIAquick MinElute PCR Purification Kit (Qiagen, Germany) according to the manufacturer's instructions. Elution was performed by adding 15 mL ddH2O to the columns three times, with a 5 min incubation in between. The average chromatin fragment size for each experiment was verified using 1 mL of the purified input samples on an Agilent 2200 TapeStation instrument using a D1000 ScreenTape (Agilent Technologies). DNA concentration of the IP and input samples was determined with Qubit 1.0, dsDNA HS (Invitrogen). Before preparing libraries for Illumina sequencing, IP and input samples were analyzed via qRT-PCR on four housekeeping genes to assess quality of sample DNA (see below). For Illumina sequencing of ChIP samples, 1-10 ng of IP or input DNA were used for library preparation according to the manufacturer's recommendations using the ThruPLEX DNA-Seq Kit (Rubicon Genomics, Inc., Ann Arbor, MI). Libraries were qualified on an Agilent 2200 TapeStation instrument and quantified with Qubit 1.0. Libraries were pooled and sequenced on an Illumina HiSeq 1500 sequencer.

Quantitative real-time PCR
For ChIP experiments, input and immunoprecipitated (IP) samples were analyzed by qPCR to assess the extent of protein occupancy at different genomic regions. Primer pairs directed against promoter, coding and terminator regions of the housekeeping genes PMA1, PDC1, MUP1, ILV5, ADH1 and ALD5 as well as against a heterochromatic control region of chromosome V (chrV) were designed and the corresponding PCR efficiencies determined. All primer pairs used in this study had PCR efficiencies in the range of 95-100%. PCR reactions contained 1 mL DNA template, 1.6 mL of 10 mM primer pairs and 10 mL iQ SYBR Green Supermix (Bio-Rad, Hercules, CA). Quantitative PCR was performed on a qTOWER 2.2 Real-Time System (Analytik Jena AG, Germany) using a 3 min denaturing step at 95˚C, followed by 40 cycles of 15 s at 95˚C, 30 s at 61˚C and 15 s at 72˚C. Threshold cycle (Ct) values were determined using the corresponding qPCRsoft 3.1 software. Percent input was determined for each IP sample as 100 × 2^(adjusted input Ct − Ct(IP)), where the adjusted input Ct is the raw input Ct minus log2 of the factor needed to scale the input to 100%. Sequence information of primer pairs is available in Supplementary file 1.
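The percent-input calculation can be written out as a short sketch. The function name and the `input_fraction` parameter (the fraction of chromatin that the input sample represents, for example 0.01 for a 1% input) are illustrative assumptions; the actual input fraction used in this study is not restated here.

```python
import math


def percent_input(ct_input, ct_ip, input_fraction):
    """Percent input = 100 * 2^(adjusted input Ct - Ct(IP)).

    ct_input       -- raw Ct of the input sample
    ct_ip          -- Ct of the IP sample
    input_fraction -- hypothetical fraction of chromatin saved as input
    """
    # Scale the input to 100% by subtracting log2 of the dilution factor.
    adjusted_input = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input - ct_ip)


if __name__ == "__main__":
    # 1% input with Ct 25 and an IP with Ct 28
    print(percent_input(25.0, 28.0, 0.01))
```

The formula assumes a PCR efficiency close to 100% (a doubling per cycle), which matches the 95-100% efficiencies reported for the primer pairs.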

ChIP-Seq data processing and analysis
Paired-end 50 bp reads were aligned to the S. cerevisiae genome (sacCer3, version 64.2.1) using the short read aligner Bowtie (version 2.2.3) (Langmead and Salzberg, 2012). SAMTools was used to quality filter SAM files. Alignments with MAPQ smaller than 7 (-q 7) were skipped and only proper pairs (-f99, -f147, -f83, -f163) were selected. The BEDTools toolset (Quinlan and Hall, 2010) was used to obtain coverage tracks that were subsequently imported into R/Bioconductor where further processing of the data was carried out. Normalization between IP and input was done using the signal extraction scaling (SES) factor obtained with the estimateScaleFactor function from deepTools (Ramírez et al., 2014) with options -l 100 -n 100000 and the median fragment size (-f) estimated from the data (around 200 bp). ChIP enrichments were obtained by dividing SES-normalized IP intensities by the corresponding input intensities: log2(IP/Input).
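To make the alignment filter concrete, the sketch below decodes the four SAM flag values listed above and applies the MAPQ cut-off. It treats the four values as an exact whitelist, which is an assumption about how the individual SAMTools calls were combined; the helper names are illustrative.

```python
# SAM flag bits relevant for paired-end reads (from the SAM specification).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x10: "read_reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}

# The four flag values named in the text: both orientations of a proper pair.
PROPER_PAIR_FLAGS = {99, 147, 83, 163}


def decode_flag(flag):
    """Return the names of the listed bits that are set in a SAM flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def keep_alignment(flag, mapq, min_mapq=7):
    """Keep properly paired alignments with MAPQ of at least min_mapq."""
    return mapq >= min_mapq and flag in PROPER_PAIR_FLAGS


if __name__ == "__main__":
    for f in sorted(PROPER_PAIR_FLAGS):
        print(f, decode_flag(f))
```

Flags 99 and 147 describe a pair whose first read maps to the forward strand, and flags 83 and 163 describe the opposite orientation.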
The same transcript annotations as for PAR-CLIP data analysis (see above) were used for ChIP-Seq data analysis, except that filtering criteria had to be more stringent due to the lack of strand specificity and lower resolution of ChIP-Seq data. Thus, for Figure 7, the distance filtering between transcripts was increased to 200 bp and transcripts on both strands were considered.

Chromatin association assay
Yeast cultures were grown in 200 mL of YPD medium at 30˚C to mid-log phase (OD600 ~0.8). Subsequent steps were performed at 4˚C with precooled buffers and in the presence of a fresh protease-inhibitor mix. Cells were collected by centrifugation, washed with 1× TBS buffer and with lysis buffer (50 mM Tris-HCl pH 7.5, 100 mM NaCl, 0.5% sodium deoxycholate, 0.1% SDS, 0.5% NP-40). Cell pellets were flash frozen in liquid nitrogen and stored at −80˚C. Pellets were thawed, resuspended in 1 mL lysis buffer, and disrupted via bead beating (see PAR-CLIP Materials and methods). The lysate was divided into two samples. One half was treated with 7.5 U of RNase A and 300 U of RNase T1 (Ambion, UK); the other half was treated with the same volume of the RNase storage buffer (10 mM HEPES pH 7.5, 20 mM NaCl, 0.1% Triton X-100, 1 mM EDTA, 50% glycerol). After 30 min incubation at room temperature, chromatin was isolated by centrifugation at 15,000 rpm for 15 min. Chromatin was solubilized in 1 mL lysis buffer via sonication with a Covaris S220 instrument (Covaris). Chromatin solutions were then analyzed by SDS-PAGE and Western blotting against the C-terminal TAP tag of the analyzed factor and against H3, which served as loading control. We performed three independent biological replicates for TFIIB, Rpb1, Rtf1, Paf1, Ctk2, Bur1, Set2 and Spt6 and two independent biological replicates for Leo1, Ctk1, Bur1, Set1, Dot1 and Spt5. Band intensities were quantified using ImageJ software (1.49v). For statistical analysis, multiple group comparisons were done by one-way ANOVA with Dunnett post-hoc test. Data are presented as mean ± SD. Differences were considered significant when p<0.05 (*p<0.05; **p<0.01; ***p<0.001; n.s. = not significant).

Cloning and expression of S. cerevisiae CTDK-I protein complex
The full-length subunits of the CTDK-I complex, Ctk1, Ctk2 and Ctk3, were amplified from genomic yeast DNA and cloned into modified pFastBac vectors via ligation-independent cloning (LIC) (a gift of Scott Gradia, UC Berkeley, vectors 438-A, 438-C (Addgene: 55218, 55220)). Ctk2 bears an N-terminal 6x His MBP tag followed by a tobacco etch virus (TEV) protease cleavage site. Individual subunits were combined into a single plasmid by successive rounds of ligation-independent cloning. Each subunit is preceded by a PolH promoter and followed by an SV40 termination site. Purified plasmid DNA (0.5 mg) was electroporated into DH10EMBacY cells to generate bacmids (Berger et al., 2004). Bacmids were prepared from positive clones by isopropanol precipitation and transfected into Sf9 cells (ThermoFisher, UK) grown in Sf-900 III SFM (ThermoFisher) with X-tremeGENE9 transfection reagent (Sigma) to generate V0 virus. V0 virus was harvested 72 hr after transfection. V1 virus was produced by infecting 25 mL of Sf21 cells (Expression Technologies, UK) grown at 27˚C, 300 rpm with V0 virus (1E6 cells/mL, 1:50 (v/v) cells:virus). V1 viruses were harvested 48 hr after proliferation arrest and stored at 4˚C. For protein expression, 600 mL of Hi5 cells (Expression Technologies) (1E6 cells/mL) grown in ESF921 medium (Expression Technologies) were infected with 200 mL of V1 virus and grown for 72 hr at 27˚C. Cells were harvested by centrifugation (238 × g, 4˚C, 30 min), resuspended in lysis buffer at 4˚C (400 mM NaCl, 20 mM Na·HEPES pH 7.4, 10% glycerol (v/v), 1 mM DTT, 30 mM imidazole pH 8.0, 0.284 mg/mL leupeptin, 1.37 mg/mL pepstatin A, 0.17 mg/mL PMSF, 0.33 mg/mL benzamidine), snap frozen, and stored at −80˚C. The insect cell lines were not tested for mycoplasma contamination and the identity of the cells was not confirmed.

Fluorescence anisotropy assays with CTDK-I
5′-FAM labeled ssRNA and dsDNA were obtained from Integrated DNA Technologies and dissolved in water to 100 mM. Sequences for ssRNA were 24% GC, A-rich (AAUAUUCAAGACGAUUUAGACGAUAAUAUCAUA), 24% GC, U-rich (AUGUUGUAUGAUAUCUUGCUAACUUAAUUUGAU), 45% GC, A-rich (AAGCAGCCAAACAAGCAGUCAACAUCAAGUCGU) and 45% GC, U-rich (UUCGUCGGUUUGUGCGUCAGUUGUAGUUCAUCA). The dsDNA sequence corresponds to the 45% GC, A-rich ssRNA sequence. The 24% GC, A-rich, 24% GC, U-rich and 45% GC, A-rich sequences correspond to natural coding sequences in S. cerevisiae. RNA oligos were unfolded by incubating the RNA at 95˚C for 1 min and transferring to ice for 10 min. The oligonucleotides were diluted in water for all experiments. Purified CTDK-I was serially diluted in two-fold steps in dilution buffer (200 mM NaCl, 20 mM Na·HEPES pH 7.4, 1 mM DTT and 10% glycerol). Nucleic acids (8 nM final concentration) were added on ice and the reaction was incubated for 10 min. The assay was brought to a final volume of 30 mL and incubated for 20 min at RT in the dark (final conditions: 100 mM NaCl, 2 mM MgCl2, 20 mM Na·HEPES pH 7.4, 1 mM TCEP, 4% glycerol, 0.01 mg/mL BSA and 5 mg/mL yeast tRNA (Sigma) as a competitor for non-specific binding). 18 mL of each reaction were transferred to a Greiner 384 Flat Bottom Black Small volume plate. Fluorescence anisotropy was measured at 30˚C with an Infinite M1000Pro reader (Tecan, Switzerland) with an excitation wavelength of 470 nm (±5 nm), an emission wavelength of 518 nm (±20 nm) and a gain of 63. All experiments were done in triplicate and analyzed with GraphPad Prism Version 7. Binding curves were fitted with a single-site quadratic binding equation:
$$y = B_{\max} \cdot \frac{(x + L + K_{d,\mathrm{app}}) - \sqrt{(x + L + K_{d,\mathrm{app}})^{2} - 4xL}}{2L}$$
where $B_{\max}$ is the maximum specific binding, $L$ is the concentration of nucleic acid, $x$ is the concentration of CTDK-I, and $K_{d,\mathrm{app}}$ is the apparent dissociation constant for CTDK-I and nucleic acid.
Error bars represent the standard deviation from the mean of three experimental replicates.
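For readers who wish to evaluate such a binding curve outside GraphPad Prism, the following sketch implements the standard single-site quadratic binding model, y = B_max · [(x + L + K_d,app) − sqrt((x + L + K_d,app)² − 4xL)] / (2L), using the symbols defined above. That the fits used exactly this parameterization is an assumption; the function name and the example values are illustrative, and all concentrations must share one unit.

```python
import math


def quadratic_binding(x, L, kd_app, b_max):
    """Single-site quadratic binding model.

    x      -- protein (CTDK-I) concentration
    L      -- labeled nucleic acid concentration (same unit as x)
    kd_app -- apparent dissociation constant (same unit as x)
    b_max  -- signal change at saturation
    """
    s = x + L + kd_app
    # Fraction of nucleic acid bound, accounting for ligand depletion.
    fraction_bound = (s - math.sqrt(s * s - 4.0 * x * L)) / (2.0 * L)
    return b_max * fraction_bound


if __name__ == "__main__":
    # Hypothetical values: 8 nM probe, Kd,app of 100 nM, Bmax of 0.2
    for conc in (0.0, 50.0, 100.0, 400.0, 1600.0):
        print(conc, round(quadratic_binding(conc, 8.0, 100.0, 0.2), 4))
```

Unlike the simple hyperbolic model, the quadratic form does not assume that the free protein concentration equals the total protein concentration, which matters when K_d,app is comparable to the probe concentration.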

Accession numbers
Data have been deposited in NCBI's Gene Expression Omnibus (Edgar et al., 2002) and are accessible through GEO Series accession number GSE81822.