Definition of RNA Polymerase II CoTC Terminator Elements in the Human Genome

SUMMARY
Mammalian RNA polymerase II (Pol II) transcription termination is an essential step in protein-coding gene expression that is mediated by pre-mRNA processing activities and DNA-encoded terminator elements. Although much is known about the role of pre-mRNA processing in termination, our understanding of the characteristics and generality of terminator elements is limited. Whereas promoter databases list up to 40,000 known and potential Pol II promoter sequences, fewer than ten Pol II terminator sequences have been described. Using our knowledge of the human β-globin terminator mechanism, we have developed a selection strategy for mapping mammalian Pol II terminator elements. We report the identification of 78 cotranscriptional cleavage (CoTC)-type terminator elements at endogenous gene loci. The results of this analysis pave the way for the full understanding of Pol II termination pathways and their roles in gene expression.


INTRODUCTION
The transcription cycle consists of three stages: initiation, in which RNA polymerase engages with the DNA template; elongation, in which it translocates along the DNA template, synthesizing the RNA copy of the gene; and termination, in which the polymerase disengages from the DNA template. In mammals, promoter and terminator DNA sequences are well described for genes transcribed by RNA polymerases I and III (Richard and Manley, 2009). For genes transcribed by RNA polymerase II (Pol II), including all protein-coding genes, there is an extensive literature describing promoter sequences. The Mammalian Promoter Database lists 24,967 known and an additional 17,926 potential human Pol II promoters (Gupta et al., 2011); however, fewer than ten verified mammalian Pol II terminator sequences have been described (Proudfoot, 1989; Richard and Manley, 2009).
The major reason for this discrepancy is that Pol II termination is coupled to the complex steps involved in pre-mRNA processing and is entirely dependent upon the presence of a functional poly(A) signal (Whitelaw and Proudfoot, 1986; Connelly and Manley, 1988). Pre-mRNA cleavage at the poly(A) site, mediated by the 3′ end processing complex (Shi et al., 2009), generates two RNA products: a 5′ cleavage product that is stabilized by polyadenylation as it is processed into mRNA, and a 3′ cleavage product that is subject to rapid degradation by the 5′-3′ exonuclease Xrn2. Such Xrn2-mediated transcript degradation has been shown to have a role in Pol II termination (West et al., 2004). The above could lead one to conclude that the poly(A) site sequence, typically characterized by the hexanucleotide sequence AATAAA followed by a GU/U-rich region (Proudfoot, 2011), is the sole Pol II terminator signal. That this is indeed the case in lower eukaryotes is supported by a number of studies, mainly in yeast, which provide a detailed understanding of the roles of a host of RNA processing factors in the Pol II termination process (Richard and Manley, 2009; Kuehner et al., 2011). For mammalian genes, however, the poly(A) signal is not the only sequence that is required for Pol II termination. Studies in our laboratory and others have shown the existence of dedicated DNA-encoded Pol II terminator elements located downstream of poly(A) signals (Proudfoot, 1989; Tantravahi et al., 1993; Dye and Proudfoot, 2001; Plant et al., 2005; Gromak et al., 2006; West et al., 2006).
From these studies, it appears that there are two broad categories of terminator sequence: G-rich sequences that enhance poly(A) site cleavage and subsequent Pol II termination by pausing Pol II near to the poly(A) signal (Proudfoot et al., 2002; Gromak et al., 2006) and AT-rich terminator sequences, located 1-2 kb downstream of the poly(A) site, which mediate rapid cotranscriptional cleavage of nascent transcripts, prior to poly(A) site cleavage (Dye and Proudfoot, 2001; Plant et al., 2005; West et al., 2006). RNA degradation initiating at cotranscriptional cleavage (CoTC) sites leads to release of Pol II and associated unprocessed pre-mRNA from the DNA template (West et al., 2008). This distinguishing feature of CoTC-mediated termination is supported by electron microscopic studies in Drosophila that show that release of pre-mRNA from transcription sites prior to 3′ end processing is a common occurrence (Osheim et al., 2002).
One of the major reasons for the paucity of mammalian Pol II terminator sequences in the literature is that examination of termination mechanisms is hindered by the technical difficulty of mapping nascent transcripts in nuclear run on experiments (Proudfoot, 1989). Here we have used a CLIP-seq strategy (Licatalosi et al., 2008) to distinguish pre-mRNAs that are released from the DNA template before cleavage at the poly(A) site, in order to identify genes that utilize the CoTC termination pathway. Thorough testing of potential terminator sequences from a number of candidate gene loci shows that we have isolated authentic Pol II terminators and indicates that CoTC-mediated termination is a feature of a significant proportion of mammalian genes.

CLIP-seq-Based CoTC Terminator Mapping Strategy
Detailed transcriptional analysis of the human β-globin gene has shown that CoTC of β-globin 3′ flanking region transcripts leads to release of Pol II and associated pre-mRNA from the chromatin template prior to cleavage/polyadenylation at the poly(A) site. Interestingly, 3′ end processing of these released pre-mRNAs (referred to herein as unprocessed pre-mRNAs) occurs posttranscriptionally, in the nucleoplasm (Figure 1A) (West et al., 2008). Following on from these observations, we have developed a strategy, using in vivo UV crosslinking immunoprecipitation (IP) with antibody to the CstF-64 pre-mRNA processing factor (MacDonald et al., 1994), to select such unprocessed nucleoplasmic pre-mRNAs in order to identify genes that use the CoTC termination pathway.
In an initial pilot experiment, HeLa cells transiently transfected with a β-globin minigene construct bTERM (that has an HIV-LTR promoter) and a plasmid encoding the viral transactivator Tat (pTat) were subjected to UV crosslinking followed by nuclear fractionation into chromatin (Ch) and nucleoplasm (N) fractions, as described previously (West et al., 2008). IP was then conducted on the nucleoplasm fraction using CstF-64 antibody to precipitate unprocessed β-globin pre-mRNA, which was detected by RT-PCR using β-globin and control 7SK snRNA primers (Figure 1B). In lane 1, the detection of PCR products representing unprocessed β-globin pre-mRNA and mature 7SK transcripts confirms the presence of both RNA species in the input nucleoplasm fraction. In control lane 2, no PCR products are detected, confirming that 7SK and β-globin pre-mRNA do not interact with IgG. In lane 3, RT-PCR of nucleoplasmic RNA precipitated with the CstF-64 antibody shows the presence of a PCR product for β-globin but not 7SK, confirming a specific interaction of CstF-64 with nucleoplasmic β-globin pre-mRNA.
We next performed CstF-64 IP, combined with high throughput sequencing (CLIP-seq) (Licatalosi et al., 2008), to identify endogenous CstF-64 interacting nucleoplasmic pre-mRNAs. HeLa cells transiently transfected with bTERM and pTat were UV crosslinked prior to nuclear fractionation. The nucleoplasm fraction was then treated with high (40 U/ml; lanes 1-4) or low (4 U/ml; lanes 5-8) RNaseI before IP with CstF-64 and IgG antibodies. Immunoprecipitated RNA was 5′ end-labeled with [γ-32P]ATP and protein-RNA complexes were separated by PAGE and transferred to a nitrocellulose membrane (Figure 1C). IP of UV-treated cells with CstF-64 antibody (lanes 4 and 8) results in two prominent radiolabeled bands. (The absence of bands in control lanes 1-3 and 5-7 confirms the specificity of the CstF-64 IP experiment; see legend for full details.) The lower 70 kDa band is the expected size for CstF-64 and the upper ~200 kDa band is possibly a CstF-64 dimer. The bands detected in lane 8 (4 U/ml) are weaker than those detected in lane 4 (40 U/ml) and appear within a radioactive smear extending from 40-300 kDa, which reflects partial RNA digestion due to limiting RNaseI. RNA was eluted from the membrane at a position 20-30 kDa above the 70 kDa CstF-64 band to obtain CstF-64/RNA complexes containing 50-80 nucleotide (nt) RNAs. Eluted RNAs were ligated to adaptors for reverse transcription and PCR amplification. This experiment was repeated twice more on untransfected cells. PCR products from each experiment were analyzed by high throughput sequencing (HITS). A total of 1,285 CLIP regions, each supported by at least two independent read alignments in the pooled samples, were identified from the three repeats. These CLIP regions were significantly enriched in genic (RefSeq+RNA genes) but not intergenic regions and, within genes, they showed significant enrichment in extended 3′ UTR regions, but not in exons or introns (graphs, Figure 1D).
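The enrichment statement above can be illustrated with a one-sided binomial test, which is the test named in the Figure 1D legend. The following is a minimal sketch, not the authors' analysis code: the function names are hypothetical, and the hit count (900) and chance probability (0.40) in the example are invented purely for illustration; only the total of 1,285 CLIP regions comes from the text.

```python
from math import exp, lgamma, log


def log_binomial_pmf(i: int, n: int, p: float) -> float:
    """Natural log of P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    log_choose = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return log_choose + i * log(p) + (n - i) * log(1.0 - p)


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """One-sided p value P(X >= k): chance of k or more hits out of n trials."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    # Sum the tail in log space per term to avoid overflow for large n.
    total = sum(exp(log_binomial_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)


# Hypothetical illustration: 1,285 CLIP regions (from the text), of which an
# ASSUMED 900 fall in a category ASSUMED to cover 40% of positions by chance.
p_value = binomial_upper_tail(k=900, n=1285, p=0.40)
```

Here `p` stands for the fraction of regions expected in the category if CLIP regions were distributed at random; how that expectation was derived is described in the paper's Experimental Procedures, not in this sketch.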
Seventy-eight genes (Table 1) containing CstF-64 CLIP regions within 3′ UTRs (i.e., in proximity to putative poly(A) sites) were selected as CoTC candidates (see Experimental Procedures for details). Importantly, this list contains the β-globin gene, as three unique CLIP reads were mapped in the proximity of its annotated poly(A) site in the transfected sample (Figure 1E).

CLIP-seq Positive Candidate Pre-mRNAs Are Released from the DNA Template
According to our hypothesis, the detection of a particular unprocessed pre-mRNA in the nucleoplasm is a marker of CoTC-type termination occurring at the corresponding gene locus. Therefore, to further test the CLIP-seq data, we analyzed the nuclear distribution of unprocessed pre-mRNA, from a subset of CLIP-seq positive genes, by quantitative radioactive RT-PCR (qRT-PCR) of chromatin and nucleoplasm fractions (Figure 2A). Analysis of pre-mRNA in the chromatin (Ch) fraction (lane 1) shows that unprocessed, presumably nascent, pre-mRNAs are detected at each of the candidate gene loci. Likewise, for the nucleoplasm (N) fraction (lane 2), unprocessed pre-mRNA from each of the candidate gene loci is detected. Quantitative analysis of the PCR products, in lanes 1 and 2, confirms that significant amounts of unprocessed pre-mRNA (19%-55%) are released to the nucleoplasm fraction from the corresponding gene loci (Figure 2A, graph). Variation in the amount of nucleoplasmic pre-mRNA from different genes may indicate variation in the efficiency of pre-mRNA release, gene-specific rates of 3′ end processing, or differential RNA stability. Importantly, these data confirm our detection of unprocessed nucleoplasmic pre-mRNAs in our CLIP-seq analysis and support the hypothesis that a CoTC-type termination mechanism operates at these gene loci.
Figure 1 (legend, panels B-E). (B) RT-PCR analysis of immunoprecipitated nucleoplasmic RNA. RNA in control input (lanes 1 and 4) and immunoprecipitated with rabbit IgG (lanes 2 and 5) or CstF-64 antibody (lanes 3 and 6) was subjected to RT-PCR analysis using β-globin-specific primers (blue arrows in diagram beside the data panel) and primers for 7SK snRNA. The lack of bands in lanes 4-6 (−RTase) confirms the absence of contaminating DNA in all samples. (C) Purification of CstF-64-RNA covalent complexes by SDS-PAGE following treatment with high (40 U/ml, lanes 1-4) or low (4 U/ml, lanes 5-8) RNaseI. No protein-RNA complexes were detected in the absence of UV irradiation (lanes 1, 2, 5, and 6) or following control immunoprecipitation with rabbit IgG (lanes 3 and 7), demonstrating the specificity of the CstF-64 antibody/CstF-64-RNA complex interaction. The red dashed box, in lane 8, indicates the area of the gel from which CstF-64-RNA complexes were eluted for subsequent HITS analysis. (D) Bar graphs showing the distribution of CLIP reads in annotated genic and intergenic regions (left panel) and in intron, exon, and 3′ UTRs of annotated genes (right panel). P values (brackets) for the significance of relative enrichment, with respect to what would be expected by chance in the case of a random distribution, were calculated using a one-sided binomial test. *Indicates significant enrichment (p value < 0.001). (E) Diagram of the human β-globin gene with CLIP tags (dark blue squares) mapped to the sequence shown below the diagram. The annotated poly(A) site is indicated in green. Red letters highlight G/U stretches representing potential CstF-64 binding sites.

To control for the possibility that the presence of unprocessed nucleoplasmic pre-mRNA is a general feature of Pol II transcribed genes, we employed the same qRT-PCR strategy to examine the nuclear distribution of GAPDH, PKM2, and ENO1 pre-mRNAs, as these highly expressed transcripts (Kapranov et al., 2007) were not detected in our CLIP-seq analysis. As shown in Figure 2B, analysis of pre-mRNAs in the chromatin (Ch) fraction (lane 1) detected unprocessed, presumably nascent, pre-mRNAs at each of the candidate gene loci. In lane 2, analysis of the nucleoplasm fraction detected only faint PCR products representing unprocessed nucleoplasmic pre-mRNAs from these gene loci. Quantitative analysis of the radiolabeled RT-PCR products derived from chromatin and nucleoplasm fractions (graph, Figure 2B) shows that unprocessed nucleoplasmic pre-mRNAs represent merely 4%-7% of total pre-mRNA from these gene loci. These data provide further validation of the CLIP-seq experiment and indicate that, on the basis of our selection criteria (detection of unprocessed nucleoplasmic pre-mRNA), it is likely that the GAPDH, PKM2, and ENO1 genes do not employ a CoTC termination mechanism. We predict that the termination mechanism employed at these gene loci is similar to the previously described pause-type, where poly(A) site cleavage precedes Pol II release from the DNA template (Gromak et al., 2006; West et al., 2008) (Figure 2B).
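The percentages quoted for released pre-mRNA (19%-55% for CLIP-seq positive genes, 4%-7% for the controls) are the nucleoplasmic signal expressed as a share of the combined chromatin and nucleoplasm signal. A minimal sketch of that calculation follows; the function name and the example band intensities are hypothetical and stand in for PhosphoImage quantitation values.

```python
def released_percent(chromatin_signal: float, nucleoplasm_signal: float) -> float:
    """Nucleoplasmic pre-mRNA as a percentage of total (Ch + N) pre-mRNA signal."""
    if chromatin_signal < 0 or nucleoplasm_signal < 0:
        raise ValueError("band intensities cannot be negative")
    total = chromatin_signal + nucleoplasm_signal
    if total == 0:
        raise ValueError("no signal detected in either fraction")
    return 100.0 * nucleoplasm_signal / total


# Hypothetical band intensities in arbitrary units, for illustration only.
example = released_percent(chromatin_signal=750.0, nucleoplasm_signal=250.0)
```

The sketch deliberately applies no cutoff between CoTC-type and pause-type genes, because the text reports ranges for the two groups rather than a classification threshold.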

Mapping of Terminator Elements at CLIP-seq Positive Gene Loci
We next developed a strategy to map terminator elements at the CLIP-seq positive gene loci. Transcript cleavage occurring downstream of and prior to cleavage at the poly(A) site is an intrinsic part of the CoTC terminator mechanism (Dye and Proudfoot, 2001; West et al., 2008) (Figure 1A). Therefore we reasoned that it would be possible to infer the location of potential CoTC terminator sequences, in the 3′ flanking regions of CLIP-seq positive genes, by mapping sites of transcript cleavage using an RT-PCR approach. Total nuclear RNA was reverse transcribed using random primers and the resulting cDNA was PCR amplified using gene-specific primers complementary to 3′ flanking region transcripts. For CCNB1, PCR amplification was carried out using a single forward primer (F), located upstream of the CCNB1 poly(A) site, in combination with reverse primers (R1-R5) located at increasing distance downstream of the poly(A) site in the CCNB1 3′ flanking region (Figure 3A, diagram). PCR amplification using the F/R1-R3 primer pairs resulted in the detection of 180 bp, 580 bp, and 1.1 kb bands (lanes 6-8), which correspond precisely to bands derived from control amplification of genomic DNA using the same primer pairs (lanes 1-3). PCR amplification using F/R4 and F/R5 primer pairs does not result in detectable levels of product (lanes 9 and 10), even though corresponding PCR products are derived from control amplification of genomic DNA (lanes 4 and 5). Thus RT-PCR analysis shows that continuous CCNB1 pre-mRNA is detected up to ~1.1 kb downstream of the poly(A) site (lane 8, F/R3 primer pair), with no continuous RNA detected beyond this point (lanes 9 and 10). From these data, we estimated that a potential CoTC terminator element (PCTE) was located between 470 and 1,780 bp downstream of the CCNB1 poly(A) site (the region bordered by primers R2 and R4).
This observation is similar to that reported for the β-globin gene, where significant CoTC activity occurs at a position ~1.0 kb downstream of the β-globin poly(A) site (Dye and Proudfoot, 2001). We next adopted the same terminator mapping strategy for four other CLIP-seq positive genes (AKIRIN1, PTCH2, THOC2, and WDR13). Transcript cleavage (3′ flanking region) was mapped to positions 0.8-1.2 kb downstream of the respective poly(A) sites and was used to estimate the location of PCTEs (Figure 3B). Confirmation of these results comes from control experiments, measuring RT-PCR efficiency on full-length in vitro transcribed CCNB1, PTCH2, and WDR13 3′ flanking region transcripts (Figure S1).
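The logic of the primer-walk mapping can be sketched as follows: transcript cleavage is bracketed by the most distal reverse primer that still yields an RT-PCR product and the most proximal one that does not. This is an illustrative sketch only. The type and function names are hypothetical, and the positions given for R1 and R5 are invented; the positions for R2, R3, and R4 follow the approximate distances quoted in the text for CCNB1.

```python
from typing import List, NamedTuple, Optional, Tuple


class PrimerResult(NamedTuple):
    name: str
    position_bp: int  # distance of the reverse primer downstream of the poly(A) site
    detected: bool    # True if RT-PCR with the shared forward primer gave a product


def bracket_cleavage(results: List[PrimerResult]) -> Optional[Tuple[int, int]]:
    """Return (last position with product, first position without product).

    Returns None when every primer pair gives a product, i.e. no loss of
    transcript continuity was observed within the region surveyed.
    """
    lower = 0  # the poly(A) site itself, if even the first primer fails
    for result in sorted(results, key=lambda r: r.position_bp):
        if result.detected:
            lower = result.position_bp
        else:
            # Stop at the first primer without product; later primers are ignored.
            return (lower, result.position_bp)
    return None


ccnb1_walk = [
    PrimerResult("R1", 100, True),    # position is a hypothetical placeholder
    PrimerResult("R2", 470, True),
    PrimerResult("R3", 1100, True),
    PrimerResult("R4", 1780, False),
    PrimerResult("R5", 2500, False),  # position is a hypothetical placeholder
]
interval = bracket_cleavage(ccnb1_walk)
```

With these inputs the bracket runs from R3 to R4. The PCTE tested in the paper spans a wider window, from R2 to R4, so that sequence upstream of the last detectable product is also included in the candidate element.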

Testing Candidate Terminator Elements
The five newly identified PCTEs were placed in the termination reporter plasmid (bDTERM) and tested for terminator activity by RNase protection assay (RPA) (Plant et al., 2005). Nuclear RNA isolated from HeLa cells transiently transfected with each of the candidate reporter constructs, positive (bTERM) or negative (bDTERM) terminator control constructs, and pTat was hybridized to an antisense radiolabeled riboprobe spanning the reporter gene HIV-LTR (Figure 4A). Following RNase digestion, protected products were analyzed by PAGE (Figure 4B). In lane 1, no protection products were detected, confirming the absence of β-globin mRNA in untransfected HeLa cells. In lane 2 (bTERM), the prominent 85 nt band (labeled mRNA) at the base of the gel results from hybridization of the riboprobe to the 5′ end of the β-globin mRNA. The weaker 242 nt band (labeled RT) results from hybridization of the riboprobe to read-through transcripts derived from Pol II transcription proceeding around the plasmid into the HIV-LTR. In lane 3 (bDTERM), a weaker mRNA band was detected together with a prominent read-through band that reflects increased Pol II read-through transcription in the absence of the β-globin terminator element. In lane 4 (bCCNB1), the reduced intensity of the read-through band shows that the CCNB1 PCTE effectively blocks Pol II read-through transcription. Furthermore, the corresponding increase in the mRNA band shows that β-globin mRNA recovers to wild-type level in the presence of the CCNB1 PCTE. The low level of the read-through band in all candidate PCTE samples (lanes 5-8 and graph below the data panel) indicates that each PCTE has terminator activity.
Although RPA is useful for screening terminator activity, it is an indirect method and read-through transcript levels could be affected by differential RNA stability. Therefore, we next employed nuclear run on (NRO) analysis to measure termination by the CCNB1 and PTCH2 PCTEs. NRO analysis was conducted on nuclei isolated from HeLa cells transfected with bCCNB1, bPTCH2, and three control constructs (bTERM, bDTERM, and b4-7) along with pTat. Resulting radiolabeled nascent transcripts were hybridized to the nylon filter shown in Figure 4C. For the positive control construct (bTERM), prominent hybridization signals are detected in the gene body and post poly(A) site region (probes P, B3, and B4, respectively) and background level signals are detected over probes A and U3, which are located downstream of the terminator, showing that efficient termination occurs before Pol II reaches region A of the plasmid template (Dye and Proudfoot, 2001). For the termination negative control (bDTERM), in the absence of the terminator sequence, hybridization signals are detected over all probes P-U3. The presence of prominent signals over probes A and U3 shows that in the absence of the terminator Pol II transcribes the entire plasmid. The transcription profiles resulting from NRO analysis of bCCNB1 and bPTCH2 are essentially identical to bTERM, with prominent radioactive signals detected over probes P, B3 and B4 and only background [...]

[...] Figure 1A) (Dye and Proudfoot, 2001; West et al., 2008). To determine if a similar order of events occurs at the CCNB1 terminator, we began by measuring the distribution of transcribing Pol II in the endogenous CCNB1 terminator region, by qRT-PCR analysis of nascent transcripts. Total nuclear RNA was reverse transcribed using random primers and the resulting cDNA was PCR amplified, using primer pairs to detect transcripts of the CCNB1 poly(A) site and terminator regions (see upper diagram, Figure 5A).
The resulting PCR products were quantified by PhosphoImage analysis before plotting on the graph (gray bars, Figure 5A). PCR amplification, using primer pair F1/R1, results in a prominent signal reflecting the high abundance of transcripts in the poly(A) site region. PCR amplification of cDNA representing CCNB1 terminator transcripts, using primer pair F2/R2, indicates relatively high transcript abundance at the 5′ end of the terminator. However, transcript abundance is significantly decreased by the middle of the terminator (primer pair F3/R3), falling to ~10% at the 3′ end of the terminator element (primer pair F4/R4). Examination of the post-terminator region, using primer pair F5/R5, indicates a further decrease in active Pol II level. These data show that the level of transcriptionally engaged Pol II decreases as it proceeds through the terminator region, indicative of Pol II termination. We next examined the continuity of CCNB1 terminator transcripts by qPCR of random primed cDNA, using primer pairs composed of a single forward primer (F1), located immediately upstream of the CCNB1 poly(A) site, in combination with five different reverse primers (R1-R5; see lower diagram, Figure 5A). The resulting radioactive PCR products were quantified by PhosphoImage analysis before plotting on the graph (black bars, Figure 5A). This analysis shows that whereas high levels of continuous transcripts are detected with reverse primers positioned before the terminator element, very few or none (7%-0%) are detected with primers positioned within or downstream of the terminator element. These data show that nascent transcripts of the 5′ end of the CCNB1 terminator are cotranscriptionally cleaved and, when combined with results from the measurement of transcript distribution above, indicate that transcript cleavage precedes Pol II termination occurring throughout the CCNB1 terminator.
The profile of transcript discontinuity followed by Pol II termination is similar to that described for the human β-globin gene terminator (Dye and Proudfoot, 2001) and is indicative of the presence of the CoTC termination mechanism at the CCNB1 gene locus. To test if other candidate genes utilize the CoTC termination pathway, we conducted analogous qRT-PCR terminator transcript analysis on the endogenous WDR13 gene and observed terminator transcript discontinuity occurring before Pol II termination, again indicating the presence of the CoTC termination mechanism (Figure S2).

Terminator and CoTC Activities Localize to the 5′ End of the CCNB1 Terminator Element
To analyze the CCNB1 terminator in more detail, it was divided into three subfragments (labeled A, B, and C) that were placed in the reporter plasmid bDTERM, forming bCCNB1A, bCCNB1B, and bCCNB1C (diagram, Figure 5B). The termination capacity of each subfragment was compared to that of the full-length CCNB1 terminator (in bCCNB1) by RPA. Nuclear RNA isolated from HeLa cells transiently transfected with bCCNB1, bCCNB1A, bCCNB1B, or bCCNB1C, positive (bTERM) and negative (bDTERM) control constructs, and pTat was hybridized to an antisense radiolabeled riboprobe spanning the HIV-LTR (Figure 5B). The positive (bTERM, lane 2) and negative (bDTERM, lane 3) controls show that bTERM promotes efficient Pol II termination, as indicated by the reduced read-through signal in lane 2. In lane 4 (bCCNB1), restoration of read-through and mRNA protection products to the levels seen with the termination positive control bTERM (lane 2) confirms that the full CCNB1 terminator mediates efficient Pol II termination. In lane 5 (bCCNB1A) a similar pattern of low read-through and high mRNA signal is observed, indicating that region A of the CCNB1 terminator mediates efficient Pol II termination. In lanes 6 (bCCNB1B) and 7 (bCCNB1C), the significantly lower level of the mRNA band and higher level of the read-through band indicate that regions B and C of the CCNB1 terminator have reduced terminator activity. This observation is confirmed by quantitative PhosphoImage analysis of the radiolabeled protection products, shown in the graph below the data panel. We next tested the CCNB1 terminator subfragments for CoTC activity by measuring transcript abundance using qRT-PCR. Nuclear RNA isolated from HeLa cells transiently transfected with bCCNB1A, bCCNB1B, or bCCNB1C and pTat was reverse transcribed with PCR primers labeled BR and terR, which are complementary to plasmid sequence on either side of the inserted CCNB1 terminator subfragments (diagram, Figure 5C). We then conducted PCR on the resultant cDNA using primer pairs BF/BR and terF/terR to analyze transcripts from upstream of and across the terminator subfragments (Figure 5C). The absence of PCR products in lane 1 confirms that there is no background signal in untransfected HeLa cells. In lane 2 (bCCNB1A), amplification with the BF/BR primer pair yields a prominent PCR product, representing transcripts from upstream of terminator subfragment A. However, PCR amplification of the region A transcript, with the terF/terR primer pair, results in a very low abundance PCR product, indicating that few continuous RNA transcripts extend across this region. In contrast, bCCNB1B and bCCNB1C generated prominent PCR products with both primer pairs, indicating that abundant continuous RNA transcripts extend across these subfragments of the CCNB1 terminator. The very low level of RT-PCR product representing transcripts of subfragment A indicates that they are subject to CoTC. The correspondence of the robust terminator activity of subfragment A, shown by RPA (Figure 5B), and the observed discontinuity of its transcript, shown both here (lane 2, Figure 5C) and in qRT-PCR of terminator transcripts from the endogenous gene locus (Figure 5A), contrasts with the weak terminator activity and apparent stability of region B and C transcripts. These data provide further compelling evidence that Pol II termination on the CCNB1 gene is mediated by the CoTC termination mechanism and confirm that, using the CLIP-seq strategy, we have successfully identified authentic Pol II terminator elements.

Figure 2 (legend; beginning missing). [...] 1 and 3) and nucleoplasm (N, lanes 2 and 4) fractions. cDNA synthesis was primed using random primers, in reactions with (+RTase, lanes 1 and 2) or without (−RTase, lanes 3 and 4) the addition of reverse transcriptase to control for the presence of contaminating DNA. PCR amplification (23 cycles) of the resulting cDNA was conducted using gene-specific primers (indicated by blue arrows in the diagrams above the data panels) spanning annotated poly(A) sites (UCSC Genome Browser; http://genome.ucsc.edu/). (The number of PCR cycles was determined to be within the linear range [data not shown] and therefore accurately reflects RNA abundance.) Radiolabeled RT-PCR products from chromatin and nucleoplasm fractions (lanes 1 and 2) were quantitated by PhosphoImage analysis and the proportion of released nucleoplasmic pre-mRNA (% total) calculated and displayed in the graphs below the data panels. The lack of PCR products in lanes 3 and 4 (−RTase) confirms the absence of contaminating DNA in all samples. The diagrams above the data panels illustrating CoTC-type (A) and pause-type (B) termination mechanisms are labeled as Figure 1A, except for the pause element (blue bar) and scissors (indicating cleavage by the 3′ end processing complex) in (B). Error bars represent the results of three experimental repeats.

Bioinformatic Analysis of CoTC Terminator Sequences
In order to search for conserved DNA or RNA sequences involved in the CoTC termination mechanism, we next conducted a detailed bioinformatic analysis of the 3′ flanking regions (0-2 kb downstream of the CLIP-seq sites) of all 78 CoTC candidate genes. From this analysis we found that candidate gene 3′ flanking regions are slightly more AT- and T-rich than equally sized regions downstream of the annotated pA sites of other protein coding genes (Figure S3). This finding is in agreement with our analysis of the β-globin terminator that has shown the importance of AT-rich sequences in CoTC-mediated termination of β-globin gene transcription (Dye and Proudfoot, 2001; White et al., 2013). Next, in an effort to identify possible trans-acting factors in the CoTC termination process, we screened the candidate gene 3′ flanking regions for the presence of DNA binding motifs of known transcription factors represented in the professional version of the TRANSFAC database, which includes 1,665 binding motif matrices. We found that none of these motifs were significantly enriched or depleted in the tested set when compared to the corresponding region of other protein coding genes. Finally, we conducted a search for potential novel sequence motifs by using MEME software (Bailey and Elkan, 1994). Although, as expected, some weak motifs were identified in a subset of candidate gene 3′ flanking regions using this approach (see Figure S4), their relevance to transcription termination is not clear. Thus our bioinformatic analysis shows that CoTC terminators are not characterized by a simple sequence motif and indicates that factors apart from DNA sequence are involved in the CoTC termination process.
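The base composition comparison described above rests on two simple quantities per sequence: the fraction of A plus T and the fraction of T alone. A minimal sketch is shown below; the function name and the example sequence are hypothetical, and the choice to ignore ambiguous bases such as N is an assumption of this sketch rather than a detail taken from the paper.

```python
from typing import Tuple


def at_and_t_fraction(sequence: str) -> Tuple[float, float]:
    """Return (fraction of A+T, fraction of T) among unambiguous bases."""
    bases = [b for b in sequence.upper() if b in "ACGT"]  # assumption: skip N etc.
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at_fraction = sum(b in "AT" for b in bases) / len(bases)
    t_fraction = bases.count("T") / len(bases)
    return at_fraction, t_fraction


# Hypothetical 12 nt sequence used only to show the call.
at_frac, t_frac = at_and_t_fraction("AATTTTGCATNN")
```

In practice each candidate flanking region would be compared with equally sized regions downstream of the annotated pA sites of other protein coding genes, as the text describes; the statistical comparison itself is not reproduced here.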

DISCUSSION
Transcription termination is an important, yet relatively overlooked, aspect of the Pol II transcription cycle. Major reasons for this are the considerable technical difficulties involved in the analysis of nascent Pol II transcripts and the fact that Pol II termination has not, as yet, been recapitulated in vitro. However, the finding that Pol II transcription termination on the human β-globin gene, which occurs by the CoTC terminator mechanism (Dye and Proudfoot, 2001), involves release of β-globin pre-mRNA from the chromatin template to the nucleoplasm (West et al., 2008) has enabled us to develop a method for identification of Pol II terminator elements. We have conducted IP of nucleoplasmic RNA, using antibody against the pre-mRNA processing factor CstF-64, to select pre-mRNAs that are released from transcription sites prior to 3′ end processing. Mass sequencing of such CstF-64 interacting pre-mRNAs enabled the identification of 78 candidate genes for the CoTC termination pathway. Detailed transcriptional analysis of the 3′ flanking regions of a randomly selected subset of five candidate genes (CCNB1, PTCH2, WDR13, THOC2, and AKIRIN1) resulted in the identification of CoTC terminator elements located 0.5-2 kb downstream of the candidate gene poly(A) sites. Each terminator element promotes efficient Pol II termination with the most potent, from the CCNB1 and PTCH2 gene 3′ flanking regions, mediating 100% Pol II termination in nuclear run on assays. From these results we predict that the remaining 73 candidate genes contain CoTC terminators within their 3′ flanking regions.

Figure 5 (legend; beginning missing). [...] Diagrams, below the graph, show primer pairs (blue arrows) used in transcript distribution and transcript continuity analyses. The CCNB1 poly(A) site (green arrowhead) and terminator element (red bar) are indicated. In the graph, (*) indicates that no PCR product was detected with the indicated primer pairs. Error bars represent the results of three experimental repeats. (B) RPA of CCNB1 terminator fragments. In the diagram of the bDTERM reporter plasmid, dashed red lines indicate the insertion site of CCNB1 terminator fragments (labeled colored bars). Lane 1, untransfected cells; lanes 2-7, transfected cells. Control RNase digestion of the riboprobe is shown in lane 8 (tRNA +) beside undigested riboprobe (tRNA −, lane 9). For each sample RT and P protection products were quantified by PhosphoImage analysis and the relative abundance of the RT product (RT/Total) was calculated and displayed in the graph below the data panel. Error bars represent the results of three experimental repeats. (C) qRT-PCR analysis of the continuity of CCNB1 terminator subfragment transcripts. In the diagram, the location of PCR primers (red and blue arrows) relative to terminator fragments (red bar) is shown. Lane 1, qRT-PCR of untransfected cells. Lanes 2-4, cells transfected with CCNB1 terminator fragment constructs. Lanes 5-8, control qRT-PCR of −RTase samples. cDNA in all samples was amplified using 15 PCR cycles, which was determined to be within the linear range (data not shown) and therefore accurately reflects RNA abundance. See also Figure S2.
Although we have identified CoTC terminators at a number of gene loci, our data indicate that this number is limited because we have not reached saturation in identification of unprocessed pre-mRNAs in the nucleoplasm. This is possibly due to both the relatively low abundance of these species and gene-specific variation in the strength of the poly(A) site-CstF-64 interaction (Takagaki and Manley, 1997; Martin et al., 2012). Thus it is likely that many more protein coding genes employ the CoTC termination mechanism. Supporting evidence for this suggestion comes from an electron microscopic study of Pol II transcription in Drosophila. In this study, of over 100 unidentified Pol II transcribed genes, it was found that Pol II termination and pre-mRNA release occurred prior to pre-mRNA 3′ end processing for 64% of these genes (Osheim et al., 2002). Although the mechanism of Pol II termination at these gene loci is unknown, the observation of abundant released pre-mRNA is suggestive of a CoTC-type termination pathway.
In order to understand more about the possible role of DNA sequence in the CoTC termination mechanism, we conducted a detailed bioinformatic analysis of the 3′ flanking regions (0-2 kb downstream of the CLIP-seq sites) of all 78 CoTC candidate genes. From this analysis, we found that these regions are more AT- and T-rich than equally sized regions downstream of the annotated pA sites of other protein coding genes. Although a search for known transcription factor DNA binding motifs in the candidate gene 3′ flanking regions, using the TRANSFAC database (1,665 binding motif matrices), revealed no matches, a search for potential novel sequence motifs using MEME software (Bailey and Elkan, 1994) did identify a number of weak motifs in a subset of candidate gene 3′ flanking regions. Further work will be required to determine the importance of these sequence motifs in the CoTC termination process. Combining these data with our understanding of the role of AT-rich sequences in the human β-globin terminator (Dye and Proudfoot, 2001; White et al., 2013) enables us to state that CoTC terminator sequences are complex and are not characterized by a simple sequence motif. An interesting possibility is that the length and sequence composition of CoTC terminator elements may have effects on nucleosome organization (Kaplan et al., 2009) that may be instrumental in the Pol II termination process.
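The base-composition comparison described above can be illustrated with a minimal sketch. The way composition was actually computed is not stated in the text, so the function names (`downstream_window`, `base_fractions`), the 0-based coordinate convention, and the handling of ambiguous bases are all assumptions made for illustration only.

```python
# Hypothetical helpers: none of these names come from the original analysis.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def downstream_window(chrom_seq, site, strand, length=2000):
    """Return up to `length` bases downstream of the 0-based position `site`,
    written in transcript orientation (assumed convention)."""
    if strand == "+":
        return chrom_seq[site + 1 : site + 1 + length]
    # Minus strand: downstream means lower genomic coordinates,
    # so take the preceding bases and reverse-complement them.
    start = max(0, site - length)
    return chrom_seq[start:site].translate(_COMPLEMENT)[::-1]


def base_fractions(seq):
    """Return the AT fraction and T fraction of a sequence,
    ignoring any character that is not A, C, G or T."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return {"AT": 0.0, "T": 0.0}
    n = len(bases)
    return {
        "AT": sum(b in "AT" for b in bases) / n,
        "T": bases.count("T") / n,
    }


# Toy example on a made-up sequence
toy = "ACGTTTTA"
window = downstream_window(toy, 4, "-", length=3)
fractions = base_fractions("ATTTGC")
```

Applying `base_fractions` to the 2 kb windows of the candidate genes and to equally sized windows from other genes would give the two distributions to compare; the choice of statistical test is left open here because the text does not specify one.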
Apart from mapping CoTC terminators our CLIP-seq strategy illuminates another termination pathway at endogenous gene loci. Analysis of pre-mRNA from the GAPDH, PKM2, and ENO1 genes (that were not selected by IP of nucleoplasmic pre-mRNA) shows that for these genes, cotranscriptional poly(A) site cleavage precedes release of Pol II from the chromatin template. Such an order of events corresponds to the pausing model of transcription termination where it is envisioned that G-rich sequences, located immediately downstream of the poly(A) site, cause a transient pause in Pol II progression that effectively enhances 3′ end processing (Gromak et al., 2006). In agreement with this model the GAPDH, PKM2, and ENO1 genes all have enrichment of G residues in the post poly(A) site region, which correlates with a recent chromatin immunoprecipitation (ChIP) analysis showing significant Pol II accumulation at the 3′ ends of the GAPDH and ENO1 genes (Brannan et al., 2012).
Although we have discussed Pol II termination in terms of CoTC and pausing models (Figure 6), this may be an oversimplification. Results herein and in previous analyses of CoTC sequences show significant variation in CoTC terminator efficiency (Dye and Proudfoot, 2001; Plant et al., 2005; West et al., 2006). Considering the sequence-specificity of CoTC (AT-rich) and pause (G-rich) terminator elements, it is likely that the relative contribution of each termination mechanism, at individual gene loci, is directed by 3′ flanking region sequence composition. This leads us to speculate that, especially in the light of the observation that CoTC termination can enhance levels of gene expression (West and Proudfoot, 2009), gene-specific 3′ flanking region sequence composition could have subtle, but important, effects on gene expression.
This study marks a successful attempt to map Pol II terminator elements at endogenous gene loci. It has enabled the characterization of a significant number of CoTC terminator elements, which we predict to be a common feature of mammalian genes, and the visualization of a different termination mechanism, possibly pause-type, occurring at gene loci that were not selected using the CLIP-seq methodology. We anticipate that application of the range of techniques described herein will enable the definition of many more terminator elements and lead to a deeper understanding of mammalian Pol II termination pathways and their roles in gene expression.

PCR Primers and RNA Linkers
A list of oligonucleotide sequences used as PCR primers and RNA linkers is given in Table S1.

Transfection Procedure
Transient transfection of HeLa cells was performed as previously described (West et al., 2008).

Nuclear RNA Fractionation
Nuclear RNA fractionation was performed as previously described (West et al., 2008).

HITS Library Preparation and Data Processing
HITS library preparation was performed as described (Licatalosi et al., 2008). Samples from three independent repeats of CstF-64 CLIP-seq were submitted to high-throughput sequencing from the 5′ ends using the Illumina Hi-Seq 100 nt single-end reads protocol (Source Bioscience). Repeats 1 and 2 were multiplexed and sequenced together in the same lane. During pre-processing, samples 1 and 2 were demultiplexed using barcodes; low quality reads (mean Q < 30 within the first 50 bases) were removed and 3′ sequences matching ligated adaptors or putative oligo-A tails were trimmed. Reads shorter than 24 nt were discarded and longer reads trimmed to 50 nt before Bowtie alignment to the hg18 assembly of the human genome allowing for three mismatches. From a total of 97.5 M reads (40.1 M, 32.9 M, and 24.5 M, in repeats 1, 2, and 3, respectively), 49.8 M aligned, of which 38.5 M (13.1 M, 10.9 M, and 13.8 M in repeats 1, 2, and 3, respectively) matched unique sites (only these were considered in the subsequent analysis). Each of the experimental repeats resulted in a high level of duplication indicated by many reads aligned to the same genomic location, probably due to the low amount of pre-mRNA targets recovered from the nucleoplasm in the CLIP experiment. To avoid bias, PCR duplicates were removed and each unique read-alignment location was considered only once, resulting in 9,658 (3,410, 2,765, and 3,483, in repeats 1, 2, and 3, respectively) read-alignment sites. To maximize sensitivity, reads from the three repeats were pooled for subsequent analysis. Overlapping read-alignments (extended by 100 bp) were merged to "regions." A total of 1,285 regions, supported by at least two independent read-alignments, were considered significant. Many regions were identified in only one repeat, indicating that the experiment was far from saturation in detection of nucleoplasmic CstF-64 binding targets. To determine their genomic distribution, significant regions were related to NCBI RNA reference sequences (RefSeq) and short RNA gene (RNA genes) annotation tracks, downloaded from the UCSC genome browser. To identify genes that employ the CoTC termination pathway we selected pre-mRNAs with a CstF-64 CLIP region in 3′ UTRs, extended by 200 bp downstream (to account for variability in poly(A) site usage and imprecision in transcript-end annotation).

Figure 6. Diagram of CoTC and Pause-Type Pol II Termination Pathways
In the CoTC termination pathway transcripts of AT-rich terminator elements (red bar) are cleaved by CoTC activity (red scissors) promoting Pol II release before poly(A) site cleavage, mediated by the 3′ processing complex (green scissors). In the pause-type termination pathway, Pol II transcriptional pausing, at G-rich sequences (blue bar), enhances pre-mRNA cleavage at the poly(A) site, leading to Pol II release. Icons and symbols as in Figure 1A.
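The read-level filters and region-calling steps described under HITS Library Preparation and Data Processing can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the pipeline that was actually run: the function names are hypothetical, adaptor and oligo-A trimming are omitted, the 100 bp extension is assumed to apply to both ends of each alignment, merging is assumed to be strand-specific, and identical locations found in different repeats are assumed to count as independent read-alignments.

```python
from collections import defaultdict

# Thresholds taken from the text
MIN_MEAN_Q = 30   # minimum mean quality within the first 50 bases
Q_WINDOW = 50
MIN_LEN = 24      # reads shorter than this are discarded
MAX_LEN = 50      # longer reads are trimmed to this length
EXTEND_BP = 100   # extension applied before merging (both ends: assumption)
MIN_SUPPORT = 2   # independent read-alignments needed per significant region


def preprocess_read(seq, quals):
    """Return the length-trimmed read, or None if it is discarded.
    Adaptor and oligo-A trimming are not modelled in this sketch."""
    window = quals[:Q_WINDOW]
    if not window or sum(window) / len(window) < MIN_MEAN_Q:
        return None
    if len(seq) < MIN_LEN:
        return None
    return seq[:MAX_LEN]


def call_regions(alignments):
    """Merge extended read-alignments into regions.

    `alignments` holds (repeat, chrom, strand, start, end) tuples with
    0-based half-open coordinates (assumed convention). Returns a list of
    (chrom, strand, start, end, n_support) for regions with enough support.
    """
    # Collapse PCR duplicates: one entry per unique location within a repeat.
    unique = set(alignments)
    by_key = defaultdict(list)
    for _repeat, chrom, strand, start, end in unique:
        by_key[(chrom, strand)].append((max(0, start - EXTEND_BP), end + EXTEND_BP))

    regions = []
    for (chrom, strand), intervals in sorted(by_key.items()):
        intervals.sort()
        cur_start, cur_end = intervals[0]
        support = 1
        for start, end in intervals[1:]:
            if start < cur_end:  # overlaps the growing region
                cur_end = max(cur_end, end)
                support += 1
            else:
                if support >= MIN_SUPPORT:
                    regions.append((chrom, strand, cur_start, cur_end, support))
                cur_start, cur_end, support = start, end, 1
        if support >= MIN_SUPPORT:
            regions.append((chrom, strand, cur_start, cur_end, support))
    return regions


# Toy example with made-up coordinates
toy_alignments = [
    (1, "chr1", "+", 1000, 1050),
    (1, "chr1", "+", 1000, 1050),  # PCR duplicate within repeat 1
    (2, "chr1", "+", 1100, 1150),
    (3, "chr2", "-", 500, 550),    # single alignment, below the support threshold
]
toy_regions = call_regions(toy_alignments)
```

In the toy example the duplicate in repeat 1 is collapsed, the two chr1 alignments merge into one region supported by two read-alignments, and the lone chr2 alignment is not reported.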

RT-PCR and qRT-PCR
cDNA was synthesized using Superscript III (Invitrogen). DNA amplification was performed using Go-Taq DNA polymerase (Promega). When conducting qRT-PCR, PCR products were amplified with [α-32P]dCTP (Perkin Elmer). PCR products were applied to 6% polyacrylamide gels and radioactive signals quantified by PhosphoImager (Fuji).

RNase Protection Analysis
RNase protection analysis was performed as described previously (Plant et al., 2005).

NRO Analysis and Single-Stranded DNA Probes
NRO analysis and single-stranded M13 probes used are as described previously (West et al., 2008). Quantitation of NRO hybridization signals by PhosphoImager analysis is based on the average of multiple experiments after subtraction of background signal, shown by probe M.
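As a rough illustration of the NRO quantitation described above, the sketch below subtracts the background probe M signal within each experiment and then averages across experiments. The order of operations (subtract, then average), the function name `nro_signal`, and the data layout are assumptions; any further normalization that may have been applied is not stated in the text and is not modelled.

```python
from statistics import mean


def nro_signal(experiments, background_probe="M"):
    """Average background-subtracted hybridization signal per probe.

    `experiments` is a list of dicts mapping probe name to raw signal;
    every dict is assumed to contain the same probes, including probe M.
    """
    probes = [p for p in experiments[0] if p != background_probe]
    return {
        p: mean(exp[p] - exp[background_probe] for exp in experiments)
        for p in probes
    }


# Toy example with made-up signal values
toy_experiments = [
    {"M": 10, "A": 110, "B": 30},
    {"M": 20, "A": 100, "B": 20},
]
toy_signal = nro_signal(toy_experiments)
```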

ACCESSION NUMBERS
The sequences generated for this work have been deposited in the ArrayExpress Archive under accession number E-MTAB-1375 (http://www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-1375).

SUPPLEMENTAL INFORMATION
Supplemental Information includes four figures and one table and can be found with this article online at http://dx.doi.org/10.1016/j.celrep.2013.03.012.

LICENSING INFORMATION
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.