Hyperphosphorylated C-terminal Repeat Domain-associating Proteins in the Nuclear Proteome Link Transcription to DNA/Chromatin Modification and RNA Processing*

Using an interaction blot approach to search in the human nuclear proteome, we identified eight novel proteins that bind the hyperphosphorylated C-terminal repeat domain (phosphoCTD) of RNA polymerase II. Unexpectedly, five of the new phosphoCTD-associating proteins (PCAPs) represent either enzymes that act on DNA and chromatin (topoisomerase I, DNA (cytosine-5) methyltransferase 1, poly(ADP-ribose) polymerase-1) or proteins known to bind DNA (heterogeneous nuclear ribonucleoprotein (hnRNP) U/SAF-A, hnRNP D). The other three PCAPs represent factors involved in pre-mRNA metabolism as anticipated (CA150, NSAP1/hnRNP Q, hnRNP R) (note that hnRNP U/SAF-A and hnRNP D are also implicated in pre-mRNA metabolism). Identifying as PCAPs proteins involved in diverse DNA transactions suggests that the range of phosphoCTD functions extends far beyond just transcription and RNA processing. In view of the activities possessed by the DNA-directed PCAPs, it is likely that the phosphoCTD plays important roles in genome integrity, epigenetic regulation, and potentially nuclear structure. We present a model in which the phosphoCTD association of the PCAPs poises them to act either on the nascent transcript or on the DNA/chromatin template. We propose that the phosphoCTD of elongating RNA polymerase II is a major organizer of nuclear functions.


Sherry M. Carty and Arno L. Greenleaf ‡

Molecular & Cellular Proteomics 1:598 -610, 2002.
The C-terminal repeat domain (CTD), a non-catalytic "tail" on the largest subunit of RNA polymerase II (RNAP II), is a unique structure made up of as many as 52 repeats of the consensus heptamer YSPTSPS (1). That each seven-residue repeat contains five hydroxyl-containing side chains suggests that the CTD could be a good target for phosphorylation, and indeed, in elongating RNAP II the CTD is hyperphosphorylated (2-4); in preinitiating RNAP II, on the other hand, the CTD is not phosphorylated (e.g. see Ref. 5). The amino acid composition of the CTD also renders it very hydrophilic, and it is largely unstructured in aqueous solution (6,7); in mammals a fully stretched out CTD could potentially extend some 1200 Å away from the globular body of the polymerase (whose diameter is ~150 Å, Ref. 8) (see e.g. Ref. 9). Its unusual properties render the CTD well suited to function as an extended scaffold to which transcription factors and other proteins could bind; indeed, a growing body of data indicates that such a scaffolding role is the principal function of the CTD. Experiments focusing on the hyperphosphorylated CTD characteristic of elongating RNAP II have revealed that a number of RNA processing factors bind to the phosphoCTD (e.g. see Refs. 10 and 11). Binding to the phosphoCTD tethers such factors to the transcription machinery where they can carry out their functions more efficiently (e.g. see Refs. 12-16); furthermore, certain factors or processes are stimulated by phosphoCTD binding (e.g. see . In addition, a pool of nontranscribing RNAP II carries a phosphoCTD and is associated with certain transcription and processing factors in potential assembly areas called "transcriptosomes" (see Ref. 20).
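The length estimate above can be checked with a back-of-envelope calculation. The sketch below is illustrative only: the rise of about 3.4 Å per residue for a fully extended polypeptide chain is an assumption introduced here, not a value taken from the text, and the function names are hypothetical.

```python
# Rough estimate of the length of a fully extended CTD.
HEPTAD = "YSPTSPS"            # consensus repeat, from the text
N_REPEATS_HUMAN = 52          # number of repeats in mammals, from the text
RISE_PER_RESIDUE_A = 3.4      # ASSUMPTION: approx. rise per residue (in Å) of an extended chain


def hydroxyl_residues_per_repeat(repeat=HEPTAD):
    """Count Ser, Thr, and Tyr residues (hydroxyl-containing side chains)."""
    return sum(repeat.count(aa) for aa in "STY")


def extended_length_angstrom(n_repeats, rise=RISE_PER_RESIDUE_A):
    """Length in Å of n_repeats heptads if every residue is fully extended."""
    return n_repeats * len(HEPTAD) * rise


if __name__ == "__main__":
    print(hydroxyl_residues_per_repeat())              # hydroxyl side chains per heptad
    print(extended_length_angstrom(N_REPEATS_HUMAN))   # estimated length in Å
```

Under this assumption, 52 repeats of 7 residues (364 residues) give a length of the same order as the 1200 Å quoted above, roughly eight times the ~150 Å diameter of the polymerase body.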
RNA processing proteins known to bind directly to the phosphoCTD include the following: subunits of capping enzyme (21-23), subunits of 3′ end formation factors (24,25), splicing factors (9), splicing factor-like proteins (26-29), and an apparent small nucleolar RNA 3′ end factor (30). A few other proteins of disparate function have been shown to bind directly to the phosphoCTD, including the prolyl isomerase ESS1 (31), the histone acetyltransferase p300/CBP-associated factor (32), and the KRAB/Cys2-His2 zinc finger protein ZNF74 (33). Significantly, however, the currently known phosphoCTD-associating proteins (PCAPs) were not, for the most part, discovered in a systematic way, and it is therefore very likely that numerous PCAPs remain to be discovered. Because discovering additional PCAPs promised to generate new insights into phosphoCTD function, we began a systematic search for novel human PCAPs in the nuclear proteome.
To carry out the search for new human PCAPs we exploited the ability of yeast CTD kinase I (34-36) to hyperphosphorylate a recombinant CTD fusion protein, using the kinase to generate a radiolabeled, exhaustively phosphorylated probe (37). The [32P]phosphoCTD probe was then used in an interaction blot (Far-Western) approach to identify proteins in HeLa nuclear extracts that bind directly to the phosphoCTD. Numerous PCAPs were detected in the extract, which was then fractionated to purify the proteins for identification by mass spectrometry. Following this strategy we have so far succeeded in identifying eight previously unknown human PCAPs. One subset of the new PCAPs consists of known or suspected RNA processing factors as anticipated. Excitingly, another subset of the new PCAPs consists of enzymes with activities directed toward DNA or chromatin rather than toward pre-mRNA. Because the enzymes include DNA topoisomerase I, DNA (cytosine-5) methyltransferase 1, and poly(ADP-ribose) polymerase-1, our findings hint that the phosphoCTD of RNAP II plays previously unsuspected roles in several DNA-centered processes, including modifications involved in epigenetic regulation. As is the case for the RNA-directed PCAPs, binding of these DNA-directed proteins to the phosphoCTD should poise them to efficiently carry out their functions. Accordingly, we propose that the phosphoCTD of elongating RNAP II is a major organizer of diverse nuclear processes.

EXPERIMENTAL PROCEDURES
Suppliers-MacroPrep High S and MacroPrep High Q resins were from Bio-Rad. O-Phospho-L-serine and O-phospho-L-serine-agarose were from Sigma. Hybond-C Extra was from Amersham Biosciences. [32P]ATP was from ICN Biomedicals.
Salt Extraction of HeLa Nuclear Proteins-HeLa nuclei were prepared, extracted with 0.15 M NaCl, and centrifuged (38). Trial extractions of this nuclear pellet were carried out by sequential homogenization at increasing NaCl concentrations in HGE++ (25 mM Hepes, pH 7.6, 15% glycerol, 0.1 mM EDTA + phenylmethylsulfonyl fluoride (1:1000 dilution of saturated solution in isopropanol) + 10 mM NaF). After each homogenization and centrifugation (150,000 × g for 60 min at 4°C) the supernatant was kept ("extract"), and the pellet was extracted with the next salt step. For the preparative extraction, nuclear pellet (25 ml, from 0.15 M NaCl extraction) was homogenized on ice in a final volume of 200 ml of 0.5 M NaCl in HGE++ and centrifuged (150,000 × g for 60 min at 4°C). The supernatant, termed the 0.5 M NaCl extract of the nuclear pellet, was stored at −80°C.
High S Column of 0.5 M NaCl Extract of Nuclear Pellet-An aliquot (140 mg of protein in 33 ml) of the 0.5 M NaCl extract of the nuclear pellet was subjected to chromatography on a 5-ml High S column as described previously (26). The flow-through was at 0.1 M NaCl, and the salt steps were at 0.25, 0.5, 0.75, and 1.0 M NaCl.
Phosphoserine Columns of NaCl High S Eluates-Five ml of the 0.5 M NaCl High S eluate were diluted to 0.0165 M with HGE and applied to a 0.5-ml Ser(P) column equilibrated in 0.0165 M NaCl, HGE. The column was eluted stepwise with 3 × 0.5 ml of 0.01, 0.05, 0.1, 0.2, 0.4, 0.6, and 1 M Ser(P) in HGE. The 0.75 M NaCl High S eluate (5 ml) was identically subjected to Ser(P) column chromatography. Samples were analyzed on duplicate 4-15% SDS gels; one was Coomassie-stained, and one was analyzed by phosphoCTD interaction blotting.
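The dilution step described above follows the usual C1V1 = C2V2 relation. A minimal sketch of that arithmetic is given below; the function names are hypothetical, and the calculation assumes that the diluent (HGE) contributes no NaCl and that volumes are additive.

```python
def final_volume_ml(start_volume_ml, start_conc_m, target_conc_m):
    """Final volume needed to bring a solution from start_conc_m to target_conc_m.

    Assumes a salt-free diluent and additive volumes (C1 * V1 = C2 * V2).
    """
    if target_conc_m <= 0 or target_conc_m > start_conc_m:
        raise ValueError("target concentration must be positive and not exceed the start")
    return start_volume_ml * start_conc_m / target_conc_m


def diluent_volume_ml(start_volume_ml, start_conc_m, target_conc_m):
    """Volume of salt-free buffer to add to reach the target concentration."""
    return final_volume_ml(start_volume_ml, start_conc_m, target_conc_m) - start_volume_ml


if __name__ == "__main__":
    # Values from the text: 5 ml of 0.5 M NaCl eluate diluted to 0.0165 M NaCl.
    print(round(final_volume_ml(5.0, 0.5, 0.0165), 1))
    print(round(diluent_volume_ml(5.0, 0.5, 0.0165), 1))
```

With the concentrations as printed, the 5-ml eluate corresponds to a loading volume of roughly 150 ml, about a 30-fold dilution.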
Q Column of 0.1 M Ser(P) Eluate-The 0.1 M Ser(P) fraction was diluted to 0.075 M in HGE and applied to a 0.5-ml Q column equilibrated with 0.075 M NaCl, HGE. Proteins were eluted stepwise with 3 × 0.5 ml of 0.075, 0.1, 0.2, 0.3, 0.4, and 1 M NaCl. Fractions were gel-analyzed as before.
PhosphoCTD Interaction Blotting-Proteins were separated on duplicate 4-15% precast SDS gels (Bio-Rad) and either stained with Coomassie Blue or electroblotted to nitrocellulose. Membranes were then incubated overnight at 4°C in phosphate-buffered saline, 0.2% Tween 20, 3% dry milk, 10 mM NaF, phenylmethylsulfonyl fluoride, 2 mM dithiothreitol. Blots were probed in the above solution with 32P-labeled hyperphosphorylated β-galactosidase-yeast CTD (37) for 4 h at 4°C. Note that each batch of phosphoCTD probe was incubated with yeast CTDK-I until complete mobility shift of the fusion protein was achieved (34,37). Membranes were washed in phosphate-buffered saline, 0.2% Tween 20 for 4 × 10 min at 4°C and then exposed overnight to film or a PhosphorImager screen.
Identification of PCAPs-Well separated protein bands (Coomassie-stained gels) representing phosphoCTD-associating proteins (interaction blots), as indicated in the text and figures, were excised and subjected to in-gel trypsinization and MALDI mass spectrometry analysis (at either the Keck Foundation Biotechnology Resource Lab, Yale University or the Laboratory for Protein Microsequencing and Proteomic Mass Spectrometry, University of Massachusetts Medical School, Worcester Foundation Campus, Shrewsbury, MA (www.umassmed.edu/proteomic/)). The masses of the tryptic peptides revealed the identities of the putative PCAPs as summarized in Table I. In several cases, MS/MS analysis of selected peptides confirmed the identity or provided the identity (see Table I

RESULTS
Extracting PCAPs from the Nuclear Pellet-We sought a compartment of the nuclear proteome likely to be enriched for proteins that interact directly with the phosphoCTD of elongating RNAP II. Because elongating RNAP II is tightly associated with its template and is found in the pelleting fraction of nuclei extracted with low salt (40), we decided to use as starting material a centrifugally prepared pellet from HeLa cell nuclei. Because this material is generated at roughly physiological salt concentrations, it seemed likely that many PCAPs bound to the phosphoCTD in the living cells should remain bound during preparation of the nuclear pellet. Indeed, interaction blot (Far-Western) analysis of the proteins in the nuclear pellet confirmed that it was a rich source of PCAPs (Fig.  1). To dissociate the PCAPs from the phosphoCTD and solubilize them for fractionation, we extracted the pellet with increasing salt concentrations. Interaction blot analysis indicated that 0.5 M NaCl was effective in solubilizing the majority of PCAPs (Fig. 1), and we picked that concentration for our standard protocol.
Fractionating the PCAPs-To make possible the identification of individual PCAPs by mass spectrometry, we fractionated the nuclear extract with a series of column chromatography steps. At each step, column fractions were monitored by interaction blot analysis to identify fractions enriched for PCAPs. PCAP-enriched fractions were subjected to additional column chromatography steps; when a PCAP was pure enough, it was identified by excision from a stained gel followed by mass spectrometric analysis of trypsin fragments (see "Experimental Procedures," and e.g. see Ref. 31). The fractionation scheme is summarized diagrammatically in Fig. 2.
High S column chromatography of the nuclear extract separated PCAPs into several fractions with the majority of the strong phosphoCTD binding activities in the 0.5 and 0.75 M salt steps (Fig. 3). To further purify the proteins possessing phosphoCTD binding activity, the two High S fractions were separately applied to phosphoserine columns, which were eluted with steps of increasing Ser(P) (Fig. 2).
Chromatography of the High S 0.5 M salt step fraction on a Ser(P) column cleanly separated two major PCAPs into the 0.1 M salt step fraction (Fig. 4A, lane 4, ~115-kDa species) and the 0.2 + 0.4 M salt step fractions (~100-kDa species, lanes 5 and 6). Note the specificity of the interaction blot results; most of the major protein species (Stained Gel) do not show any binding to the phosphoCTD probe (Interaction Blot). The proteins representing the predominant PCAP in the 0.1 M step and the 0.2 M step fractions were excised from a stained gel (Fig. 4A) and identified by mass spectrometry (see "Experimental Procedures," Table I were excised from a stained gel (more heavily loaded than that in Fig. 4B) and identified. The ~200-kDa PCAP was identified as DNA (cytosine-5) methyltransferase 1 (DNMT1; Table I and below). The ~105-kDa PCAP and the ~90-kDa PCAP were both found to be hnRNPs, namely hnRNP U and hnRNP R, respectively (see Table I, Fig. 5, and below).
Proteins in the 0.1 M salt step of the first Ser(P) column (Ser(P) column on left in Fig. 2; onput = 0.5 M step of the High S column) were fractionated by chromatography on High Q resin (Fig. 2). PCAPs were found in the 0.2 M salt step fraction and in the flow-through of the High Q column (data not shown). The phosphoCTD binding properties of the most prominent PCAP in the 0.2 M step fraction, CA150, have been previously described (see Fig. 1 in Carty et al. (26)); its identification is further discussed below (see Fig. 5). The two other major PCAPs in this fraction (see Fig. 1 in Carty et al. (26)) were identified as hnRNP R (also found in a Ser(P) column fraction, above) and NSAP1, a protein related to hnRNP R (see Fig. 5, Table I, and below). Interestingly, the flow-through contained another phosphoCTD-associating hnRNP, hnRNP D (see Fig. 5, Table I, and below).
Identifying the PCAPs-Protein bands in excised gel slices were subjected to trypsin digestion followed by mass spectrometry (see "Experimental Procedures"). We initially categorized the proteins as belonging to three different identification confidence classes depending on the mass spectra and their analyses. Confidence class A contained proteins identified by both MALDI and MS/MS analyses. A MALDI spectrum illustrating a class A protein, CA150, is presented in Fig. 5A. Twenty-four mass values from the spectrum were used to conduct a MS-Fit search of the National Center for Biotechnology Information (NCBI) protein database (parameters in legend for Fig. 5); the search yielded human CA150 as the number one hit with 22 matches. This identification was confirmed by MS/MS analysis of one peptide (m/z = 833) and independent MS-Tag search (not shown). Three other proteins were identified similarly: PARP, NSAP1, and hnRNP R (i.e. essentially unambiguous identification by the original MALDI spectrum followed by confirmation for one or more peptides by MS/MS analysis; see Table I).
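The principle behind a peptide mass fingerprint search of this kind can be sketched in a few lines. The code below is a simplified illustration, not the MS-Fit algorithm: it assumes complete tryptic cleavage (C-terminal to Lys or Arg, except before Pro), no missed cleavages, no modifications, singly protonated monoisotopic masses, and a fixed mass tolerance. The function names and the toy sequence in the usage example are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to the summed residues
PROTON = 1.00728   # mass of the proton in the [M+H]+ ion


def tryptic_peptides(seq):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed_mz, seq, tol_da=0.5):
    """Number of observed masses within tol_da of any theoretical tryptic peptide."""
    theoretical = [mh_plus(p) for p in tryptic_peptides(seq)]
    return sum(any(abs(o - t) <= tol_da for t in theoretical) for o in observed_mz)


if __name__ == "__main__":
    toy_sequence = "GKAAR"  # hypothetical sequence, for illustration only
    print(tryptic_peptides(toy_sequence))
    print(count_matches([204.13, 999.0], toy_sequence))
```

A database search repeats this comparison for every candidate protein and ranks the candidates by the number (or statistical weight) of matching masses, which is the sense in which CA150 was the top hit with 22 of 24 masses matched.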
Another three proteins were identified either by MALDI analysis or by MS/MS analysis, and they were placed in confidence class B. For example, the MALDI analysis of hnRNP U was suggestive but not conclusive (Table I). However, a peptide of m/z 1715 was subjected to MS/MS (Fig. 5B), and MS-Tag unambiguously identified it as a segment of hnRNP U. In a second case, the original MALDI results indicated that a mixture of proteins had been analyzed, but MS/MS on a peptide of m/z 915 (Fig. 5C) revealed a unique sequence that MS-Tag found to be part of hnRNP D. Finally, MS/MS was not applied to topoisomerase I, but the protein was clearly identified by analysis of its MALDI spectrum using ProFound and PeptideSearch (not shown). A third confidence class was represented by DNMT1 for which MS/MS was not carried out, and the MALDI analyses were suggestive but not conclusive (Table I; but see below).
In summary, our interaction blot analysis of the HeLa nuclear pellet revealed a number of easily detectable proteins that bind directly to the hyperphosphorylated CTD. Applying two or three simple fractionation steps to the pellet extract allowed us to obtain material suitable for identification by mass spectrometry, and we identified eight of the phosphoCTD-associating proteins (see Table I).

Figure legend (fragment): ... (Fig. 2). Dots indicate bands that were excised and identified. P-Ser, phosphoserine.
The PCAPs fall into two major classes: proteins implicated principally in RNA processing and proteins involved in DNA transactions. The first class was expected, whereas the second class was unanticipated.
Confirming the Identity and PhosphoCTD Binding of PCAPs-To confirm the assigned identities, we carried out additional experiments with most of the PCAPs. First, of course, we have already published experiments on CA150 demonstrating both that recombinant segments of the protein (primarily FF domain-containing segments) bind the phosphoCTD and that antibodies to CA150 co-precipitate preferentially RNAP II0 (RNA polymerase II carrying a hyperphosphorylated CTD) from nuclear extracts (26). We have now similarly utilized recombinant proteins or specific antibodies to confirm the phosphoCTD binding and identity of additional PCAPs.
In the case of DNA topoisomerase I, we confirmed phosphoCTD binding for the human enzyme and for Topo I from yeast. Recombinant human Topo I was subjected to interaction blot analysis; the results clearly show that it binds to the phosphoCTD (Fig. 6A). For yeast, we also tested purified recombinant Topo I. As seen in Fig. 6A, the yeast enzyme binds the phosphoCTD. Control experiments with non-phosphorylated CTD showed no binding to the topoisomerases (data not shown). Therefore we conclude that our identification of the PCAP in lane 5 of Fig. 4A was correct and that topoisomerase I binds directly to the phosphoCTD.
To confirm phosphoCTD binding of DNMT1 we obtained purified recombinant protein (New England Biolabs, catalog no. M0230) and compared its binding to that of a related enzyme from bacteria, HhaI DNA methyltransferase (New England Biolabs, catalog no. M0217). HhaI methyltransferase is not expected to bind the phosphoCTD as bacterial RNA polymerases do not contain a CTD. As shown in Fig. 6B, recombinant human DNMT1 binds the phosphoCTD (lanes 2 and 3), whereas the bacterial methyltransferase does not (lane 1). In addition to the interaction blot analysis, preliminary experiments carried out in collaboration with S. Pradahn (New England Biolabs) also suggest that DNMT1 is associated selectively with RNAP II0 in nuclear extracts, reminiscent of the situation for CA150.
For the third DNA/chromatin-modifying PCAP, PARP, we obtained purified bovine enzyme (BIOMOL, catalog no. SE-165; 30-50% pure) and tested its phosphoCTD binding capability. As shown in Fig. 6C, purified PARP (113 kDa) binds the phosphoCTD (lanes 2). Several significant contaminants in the preparation do not bind (lanes 2) nor does a β-galactosidase control protein present in higher amounts than PARP (lanes 1). We conclude that indeed PARP is a phosphoCTD-associating protein.
To investigate one of the hnRNPs identified in our PCAP hunt, we obtained recombinant hnRNP D (a kind gift of N.  Maizels) and subjected it to interaction blot analysis. The experiment of Fig. 6D demonstrates that purified hnRNP D indeed binds to the phosphoCTD. A control experiment indicated that the non-phosphorylated CTD does not bind (data not shown). Thus hnRNP D is a phosphoCTD-associating protein.
Two more hnRNPs were identified as PCAPs, hnRNP R and hnRNP U (see Figs. 2 and 4B). We believe these identities will be confirmed by future tests because yet a fourth hnRNP was found among our newly identified PCAPs, namely NSAP1 (Fig. 2). Although NSAP1 was first identified as a protein that binds the NS1 protein of minute virus of mice (41), it has more recently been identified as one of three splice variants of hnRNP Q (42) (other synonyms for Q splice variants are SYNCRIP and GRY-RBP; Refs. 42-44). We obtained antibodies against NSAP1 and used them to test the assigned identity of the phosphoCTD-binding protein. We found that the phosphoCTD-binding protein band detected by interaction blot analysis also reacted with antibody specific to NSAP1 (data not shown).

Fig. 6 legend (fragment): ... 2 and 3), bacterial HhaI (lane 1) and the FF5 domain of human CA150 (lane 4; fused to glutathione S-transferase) were analyzed by PCTD interaction blotting. Protein amounts in lanes 1, 2, and 4 were roughly equal. C, mammalian PARP (lanes 2; band at "113" kDa; ~30% pure) and bacterial β-galactosidase (lanes 1) were subjected to SDS-PAGE, electroblotted to nitrocellulose, and either stained (India Ink) or incubated with phosphoCTD probe. The amount of β-galactosidase was estimated to be ~0.8 μg. D, aliquots of hnRNP D were analyzed by SDS-PAGE and either Coomassie-stained (lane 1) or subjected to phosphoCTD interaction blotting (lane 2).

The Approach
We have demonstrated that interaction blotting with a [32P]phosphoCTD probe is an efficient way to detect proteins that bind directly to the phosphoCTD and that after modest purification the proteins can be identified via mass spectrometry. Applying this approach to a nuclear pellet from HeLa cells, a starting material that contains chromatin-bound, elongating RNAP II (and in fact >90% of the total RNAP II; see Ref. 32), we have identified eight novel human PCAPs to date. For most of the PCAPs, we have verified phosphoCTD binding capability by follow-up tests using recombinant and/or purified proteins. The known properties of the new PCAPs have suggested novel and unanticipated functions for the phosphoCTD in coordinating diverse nuclear processes.
The probe we used to detect human PCAPs was the yeast CTD hyperphosphorylated with yeast CTDK-I. The yeast domain, as used, consists of 25 repeats of the consensus CTD heptamer YSPTSPS with only eight non-consensus amino acids along its entire 175-residue length (37). Note that the human CTD is twice as long (52 repeats) and that it contains numerous positively charged residues in its distal (C-terminal) half (e.g. see Ref. 1). Such charged, non-consensus repeats are not found in the yeast CTD, and our probe therefore is an analogue of the proximal half of the mammalian CTD. The kinase we used, CTDK-I (34-36), is homologous to human P-TEFb (45) and to other biochemically uncharacterized human open reading frames (46). In view of the sequence relatedness and because CTDK-I and P-TEFb are both involved in phosphorylating RNAP II for elongation (47,48), the pattern of phosphates on our probe is expected to be very similar to that generated on the consensus repeats of the human CTD by P-TEFb or other elongation-related CTD kinases. Current in vivo data suggest that elongating RNAP II is predominantly phosphorylated on Ser-2 of the heptad repeats (49-51).
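The description of the probe as 25 heptads (175 residues) with a small number of non-consensus residues suggests a simple way to tabulate deviations from the consensus. The sketch below is illustrative only: it assumes the repeats are in register with no insertions or deletions, the function names are hypothetical, and the toy sequence is invented for the example and is not the yeast CTD sequence.

```python
CONSENSUS = "YSPTSPS"  # consensus heptad, from the text


def heptads(seq):
    """Split a CTD sequence into consecutive seven-residue repeats."""
    if len(seq) % len(CONSENSUS) != 0:
        raise ValueError("sequence length is not a multiple of 7")
    return [seq[i:i + 7] for i in range(0, len(seq), 7)]


def non_consensus_positions(seq):
    """Return (repeat index, position in repeat, residue) for each deviation."""
    return [
        (r, p, aa)
        for r, repeat in enumerate(heptads(seq))
        for p, aa in enumerate(repeat)
        if aa != CONSENSUS[p]
    ]


if __name__ == "__main__":
    # Hypothetical 25-repeat sequence with one substitution in the last repeat.
    toy_ctd = CONSENSUS * 24 + "YSPTSPK"
    print(len(toy_ctd))                       # total residues in 25 heptads
    print(non_consensus_positions(toy_ctd))   # deviations from consensus
```

Applied to a real CTD sequence, the same tabulation would show where non-consensus residues, such as the positively charged residues in the distal half of the human CTD, are concentrated.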
As for any proteomics interaction search, questions of binding specificity and in vivo relevance of binding arise. In our studies, the specificity of binding is extremely high in that only a very small percentage of nuclear proteins in any one fraction (stained gels) binds the phosphoCTD (interaction blots) (e.g. see Figs. 1, 3, and 4). This degree of binding selectivity recapitulates the specificity observed earlier (9,26,31,52). One possible source of artifactual interaction might be adventitious binding of the negatively charged phosphoCTD to protein domains that normally bind other negatively charged polymers such as DNA or RNA. A published study, together with data presented below, provides a good indication that this possibility is not a significant problem. Morris and Greenleaf (9) found that probing bacterial extracts with the phosphoCTD revealed no PCAPs (see Fig. 3, lane 13 of Ref. 9), although such extracts contain hundreds of DNA- and RNA-binding proteins. This finding strongly suggests that the large number of eukaryotic homologues of these bacterial nucleic acid-binding proteins also will not bind the phosphoCTD. Further, two results in the current study contain excellent controls that help dissipate concern about this type of fortuitous interaction. First, whereas bacterial DNA methyltransferase HhaI is a close structural relative of the catalytic domain of human DNMT1 and both enzymes presumably bind DNA very similarly, only the human enzyme binds the phosphoCTD (Fig. 6B). Second, although full-length Topo I binds the phosphoCTD (Fig. 6A), this binding is not mediated by the catalytic (DNA binding) domain but rather by the non-catalytic N-terminal domain. Thus, although some DNA or RNA binding domains may ultimately be found to also bind the phosphoCTD, we conclude that nonspecific phosphoCTD binding by such proteins does not contribute significantly to our PCAP discovery efforts.
The question of functional relevance of the binding interactions revealed by PCAP identification is also important to consider.
The following examples illustrate that, in all cases of which we are aware, combined characterization of both the in vitro binding properties and in vivo behavior of PCAPs yields results that invariably support the functional relevance of the PCAP-phosphoCTD interaction. In particular, the majority of these proteins display binding that depends not on the extent of CTD phosphorylation but on a particular pattern of CTD phosphorylation, a pattern present on the CTD when the PCAPs bind RNAP II in vivo. The best extant examples are from the genetically tractable yeasts. As one example, capping guanylyltransferase (Pce1) and capping triphosphatase (Pct1) from fission yeast bind to CTD repeats phosphorylated either uniquely at Ser-5 or doubly at Ser-2 + Ser-5; however, they bind weakly or not at all to CTD repeats phosphorylated on Ser-2 (53). These findings are consistent with current data indicating that RNAP II at the 5′ ends of transcription units is preferentially phosphorylated on Ser-5, whereas RNAP II at the middle or 3′ ends of transcription units is preferentially phosphorylated on Ser-2 (50). Also consistent with these two sets of data, capping enzyme cross-links to 5′ ends of genes much better than to the middle or 3′ ends of genes (50,54). Furthermore, genetic studies reveal synthetic lethal interactions between genes for capping enzyme and a CTD kinase with Ser-5 specificity (55). Another good example comprises cleavage/polyadenylation factors. Budding yeast Pcf11, a constituent of cleavage factor IA, shows specificity opposite to that of capping enzymes, binding to CTD repeats containing Ser-2 phosphate but not to repeats containing Ser-5 phosphate (56). This binding specificity parallels the shift from Ser-5 to Ser-2 phosphorylation as RNAP II progresses from the 5′ to 3′ end of a transcription unit; also in parallel, RNAP II at the 3′ end of a gene is associated with more Pcf11 than RNAP II at the 5′ end of a gene (56).
Furthermore, a Pcf11-interacting protein (Pti1) shows genetic interactions with Ctk1, the catalytic subunit of the CTD kinase principally responsible for Ser-2 phosphorylation of elongating RNAP II (see Ref. 51). A third example is the yeast prolyl isomerase Ess1, which potentially associates with RNAP II at many points along entire transcription units; Ess1 binds to CTD repeats with multiple phosphates, but its binding is relatively insensitive to phosphorylation pattern (31,52). Additional examples of phosphorylation pattern-specific binding come from studies of mammalian PCAPs, although these investigations, for the most part, are less complete than the yeast studies due to a lack of extensive in vivo functional correlates. As one example, a splicing-like factor discovered in the Corden laboratory (57), SCAF8, has been shown to bind preferentially to a CTD phosphorylated at both Ser-2 and Ser-5, whereas another factor, SCAF1, binds preferentially to a CTD phosphorylated at either Ser-2 or Ser-5. Presumably these binding preferences will be found to reflect functional features of these proteins. Overall, although the binding specificity of only a few PCAPs has been determined to date, it is clear that CTD binding of most PCAPs depends on the pattern of CTD phosphorylation rather than merely on the presence of phosphates (negative charges). Moreover, where the binding properties of a PCAP have been determined they are in complete accord with the known or inferred in vivo function of that PCAP. Accordingly, we predict that future characterizations of the human PCAPs identified in this study will reveal that these proteins bind the phosphoCTD in vivo in functionally relevant ways.
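The pattern preferences cited in the preceding paragraphs can be collected into a small lookup table. The sketch below encodes only the preferences stated in the text; the data structure and function names are hypothetical, a pattern absent from a factor's set means "not reported as preferred" rather than "shown not to bind", and Ess1 is omitted because its binding is described as relatively insensitive to pattern.

```python
# Phosphorylation patterns within one heptad, given as sets of phosphoserine positions.
SER2 = frozenset({2})
SER5 = frozenset({5})
SER2_SER5 = frozenset({2, 5})

# Preferred patterns as summarized in the text (illustrative encoding only).
REPORTED_PREFERENCES = {
    "Pce1": {SER5, SER2_SER5},    # fission yeast capping guanylyltransferase
    "Pct1": {SER5, SER2_SER5},    # fission yeast capping triphosphatase
    "Pcf11": {SER2},              # budding yeast cleavage factor IA subunit
    "SCAF8": {SER2_SER5},         # mammalian splicing-like factor
    "SCAF1": {SER2, SER5},        # mammalian factor
}


def is_reported_preference(factor, phospho_sites):
    """True if the text reports this pattern as preferred by the factor."""
    return frozenset(phospho_sites) in REPORTED_PREFERENCES[factor]


def factors_preferring(phospho_sites):
    """Alphabetical list of factors reported to prefer the given pattern."""
    pattern = frozenset(phospho_sites)
    return sorted(f for f, pats in REPORTED_PREFERENCES.items() if pattern in pats)


if __name__ == "__main__":
    print(factors_preferring({5}))      # factors preferring Ser-5 phosphate alone
    print(factors_preferring({2}))      # factors preferring Ser-2 phosphate alone
    print(factors_preferring({2, 5}))   # factors preferring doubly phosphorylated repeats
```

Laid out this way, the opposite specificities of the capping enzymes and Pcf11 are easy to see alongside the shift from Ser-5 to Ser-2 phosphorylation along a transcription unit.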
The sensitivity of the interaction blot technique as performed with the phosphoCTD makes this approach quite useful. Because the ability to detect a specific PCAP depends both on its abundance in the fraction being tested and on its affinity for the phosphoCTD under interaction blot conditions, the exact nature of the starting material and its treatment prior to probing have a large influence on which PCAPs are detected. Thus, faint bands in the total extract can become strong bands in later chromatographic fractions. The repeat nature of the CTD makes it an unusual probe; because it contains multiple repeats (i.e. multiple potential binding sites for a PCAP), the effective binding to the immobilized proteins is expected to be tighter than that for a single repeat (9). This feature may enhance the detection efficacy of the method.
Our approach has already generated numerous PCAPs that will be of some interest for future characterizations. Detailed studies of the binding interactions between the phosphoCTD-interacting domain of a PCAP and differently phosphorylated CTD peptides will be an essential part of determining how the two partners interact and how those interactions relate to the functions of the proteins in vivo (see above and see Refs. 17, 52, 56, and 57). Although the phosphoCTD-interacting domains of most of the new PCAPs remain to be identified, CA150 was found to interact with the phosphoCTD principally through its so-called FF motifs (26); this observation was the first described specific function for an FF "domain." It will be enlightening to identify the phosphoCTD-interacting domains from additional PCAPs and to characterize and compare motifs mediating phosphoCTD-PCAP interactions.

The Proteins
Topoisomerase I-The first DNA enzyme we identified as a PCAP was DNA topoisomerase I. It has been known for years that Topo I is present at sites of actively transcribed chromatin (58); one of its principal functions at those sites is thought to be preventing RNA/DNA heteroduplex formation by removing transient supercoils that arise as a consequence of transcription (59,60). However, because Topo I also functions at sites of activity of other nuclear RNAPs (61), it is perhaps surprising that it binds the phosphoCTD (PCTD), a feature unique to RNAP II. There may be more than one reason for the association of Topo I with the PCTD. The great length of most RNAP II transcription units might necessitate a mechanism to ensure that Topo I will be near the transcribing polymerase to solve topological problems generated by extensive RNAP II translocation (see also Ref. 62). Also, it may be that recruitment of Topo I to the polymerase is important for roles not related to the catalytic activity of Topo I (63,64). In any case, it is striking that Hsieh and colleagues (65) demonstrated that only a noncatalytic N-terminal segment of Topo I is required to target a fused reporter protein (β-galactosidase) to sites of active transcription in vivo.
DNA (Cytosine-5) Methyltransferase 1-Another DNA-modifying enzyme among the PCAPs we identified is DNMT1, which adds a methyl group to the C-5 position of cytosines in CpG dinucleotides (66). DNMT1 is a large protein with many "regulatory" domains attached to a C-terminal catalytic domain (e.g. see Refs. 67 and 68). Although the actual in vivo functions of DNMT1 are not universally agreed upon, it seems clear that one of its roles is to methylate mobile DNA elements, ensuring their transcriptional silence (69). However, the mechanisms by which specific CpGs in the mammalian genome are chosen for methylation, by DNMT1 or by other DNMTs, are not understood. Most CpGs in euchromatic regions of the human genome reside in introns (with many of these in mobile elements), and these CpGs are normally methylated (70); a smaller fraction of CpGs resides in CpG islands around certain promoters, and these CpGs are normally not methylated (71,72). We believe it noteworthy that most normally methylated CpGs in euchromatin are in regions of the genome traversed by RNAP II0 (e.g. the introns). We propose the following explanation: DNMT1 is recruited to the PCTD of elongating RNAP II and thereby assumes a position from which it can scan the large fraction of the human genome that passes through RNAP II (see Fig. 7). Attached to the polymerase, DNMT1 adds methyl groups to unmethylated CpGs as they pass by (e.g. those found in a newly acquired mobile element). Once methylated because of this transcriptional coupling, CpGs internal to transcription units are normally maintained in a methylated state during DNA replication as generally supposed (see also Ref. 73 for a relationship between transcription and methylation).
Poly(ADP-ribose) Polymerase-1-PARP-1 has been implicated in both DNA and RNA transactions. Its activity is important for genome integrity, and it has a role or roles in dealing with DNA damage (e.g. see Ref. 74). In response to certain DNA damage, for example, it covalently modifies chromosomal proteins, including histones (e.g. see Refs. 75 and 76). PARP also has been shown to play a role in transcriptional regulation, notably participating in activation involving NF-κB and human T cell leukemia virus Tax protein (77-79) but also participating in repression (80). Strikingly, in its role as transcriptional activator, the catalytic activity of PARP is dispensable (a finding paralleling that for Topo I, see above). Ironically, PARP was found decades ago to be one of the originally described general transcription factors, TFIIC (81); however, the functional significance of this finding was generally discounted until recently. Additionally, PARP has been found as a component of a complex involved in Ig class switching (82); remarkably, this DNA recombination-related functional implication is shared with another new PCAP, hnRNP D (see below). It is likely that several of the many activities of PARP will be found to be influenced by or dependent on its PCTD binding.
hnRNP D-Like PARP, hnRNP D has been linked to both RNA and DNA transactions. It plays a role in mRNA turnover (as AUF1) in the cytoplasm (83,84), but it also is part of a transcription factor in the nucleus (39). Intriguingly, hnRNP D binds G quartet DNA structures, and this property is the basis for the suggestion that hnRNP D plays a role in Ig switch recombination (85). hnRNP D associates with nucleolin (85), which is also found in the putative switch recombinase activity along with PARP (see above). Our demonstration of the phosphoCTD binding capability of hnRNP D adds a new perspective to views of its roles and possible mechanisms by which those roles are accomplished.
CA150-CA150 was initially described in a transcriptional context, and in vivo results indicate that it can modulate transcription elongation on some genes (86,87). On the other hand, in domain structure and in known protein-protein interactions, it bears striking similarities to the splicing factor Prp40 (FBP11/HYPC) (see e.g. Ref. 9). As mentioned, we showed earlier that the segment of CA150 principally responsible for PCTD binding is the FF motif-containing C-terminal part of the protein (26); the corresponding part of Prp40 similarly binds the PCTD (9). Furthermore, Prp40 binds to the branchpoint-binding protein, whereas CA150 binds to the human counterpart of that protein, SF1 (88). Our results suggest that its association with the PCTD stations CA150 in a position from which it can play roles in both transcription elongation and splicing.

FIG. 7. Working model of PCAPs bound to PCTD of elongating RNAP II0. RNAP II0 (red with pink phosphates) is traversing the template (DNA, brown; meant to represent both DNA and chromatin) and synthesizing a pre-mRNA transcript (green). The model depicts one subset of PCAPs (brownish tints) below the phosphoCTD interacting with DNA/chromatin and another subset (greenish tints) interacting (or poised to interact) with the RNA transcript or RNA-bound factors. FBP11 is a mammalian counterpart of yeast Prp40, and SF1 is a mammalian counterpart of yeast branchpoint-binding protein (see Refs. 9 and 14). Placement of proteins along the phosphoCTD is arbitrary. Note that, although not explicitly indicated, our results also fit nicely with the idea of a transcription factory in which the DNA template is drawn through a stationary polymerase (62); indeed, the phosphoCTD and some of its PCAPs may play important roles in the structure of such factories (see e.g. Ref. 16).
hnRNP U-Like several other newly recognized PCAPs, hnRNP U (SAF-A) has been implicated in processes involving both DNA and RNA. In terms of RNA transactions, for example, hnRNP U has been shown to associate with and affect the activity of the glucocorticoid receptor (89,90). Interestingly, hnRNP U has been associated with the process of CTD phosphorylation previously, but the earlier work implicated U protein as a TFIIH-binding inhibitor of CTD phosphorylation (91); it is not yet clear how our results, and the PCTD binding of hnRNP U, will mesh with the previous study. In terms of DNA transactions, hnRNP U (SAF-A) has been found to be a nuclear matrix protein that binds to scaffold attachment regions of DNA (92,93). In view of the matrix association of hnRNP U, it is almost certainly not coincidental that another protein associated with the matrix is in fact RNA polymerase II0, the hyperphosphorylated form of RNAP II (94-96). It will be instructive to unravel the significance of these interrelated properties of hnRNP U and RNAP II.
NSAP1 and hnRNP R-NSAP1 and hnRNP R are two PCAPs that share about 80% sequence identity. NSAP1 was identified by yeast two-hybrid screening as a HeLa protein that binds to the NS1 regulatory protein of the minute virus of mice (41). It has also been identified as a component of a complex involved in mRNA turnover along with hnRNP D (97). NSAP1 has more recently been identified as one of three splice variants of hnRNP Q (42) (other synonyms for Q splice variants are SYNCRIP and GRY-RBP; Refs. 42-44). Remarkably, hnRNP Q proteins have been found to bind to SMN protein, mutations in which cause spinal muscular atrophy (42,98) probably by disturbing spliceosome formation and splicing. Furthermore, hnRNP R also associates with the SMN complex (42,98). Their associations with SMN suggest that hnRNP Q and hnRNP R may be involved in spliceosome formation, and immunodepletion experiments directly implicate Q proteins in splicing (42); this implication is consistent with the presence of hnRNP Q3 (GRY-RBP) in spliceosomes (99). That hnRNPs Q and R interact with both the phosphoCTD and the splicing-involved SMN complex is provocative given that the SMN complex has been shown previously to display interactions with RNAP II (100). Our new findings, although not addressing when the factors associate with RNAP II during its functional cycle (see e.g. Refs. 20 and 101), are significant in suggesting new physical links between the splicing and transcription machineries mediated through hnRNPs Q/R and the SMN complex. This suggestion is consistent with and extends the proposal that spliceosome assembly occurs co-transcriptionally and is facilitated by tethering the interacting components to the phosphoCTD of elongating RNAP II (9,14).

Implications
The connections revealed by our findings, between the phosphoCTD and several factors involved in diverse events in the nucleus, suggest that the phosphoCTD physically links a large number of nuclear processes to the transcription apparatus. The variety of these processes greatly exceeds that previously known to be CTD-linked. Our observations and their implications suggest that the phosphoCTD plays a major role in both physical and functional organization of the nucleus (see Fig. 7). In terms of RNA processing, our findings supplement and expand previously known connections to the phosphoCTD. Furthermore, our results implicate the phosphoCTD in a set of DNA transactions not previously known to be transcriptase-linked. Beyond providing new insights into nuclear organization, our results also point out the potential medical relevance of the phosphoCTD and its protein interactions. Many of the PCAPs we have identified have been linked to processes that are disturbed in diseases. As one example, our hypothesis that the transcription connection of DNMT1 is normally involved in establishing methylation patterns on RNAP II-transcribed stretches of DNA can be logically extended to propose that some altered methylation patterns, such as those observed in cancer cells (e.g. see Refs. 102 and 103), are related to this transcription connection. Another example is found in the case of NSAP1/hnRNP Q. If hnRNP Q tethers SMN complex to the phosphoCTD, as our results suggest, it is then easy to suppose that the spinal muscular atrophy-causing mutations in SMN protein, which delete the segment of SMN that binds hnRNP Q (42), may result in inappropriate uncoupling of SMN from elongating RNAP II0 with attendant disastrous consequences. A third example is PARP, mutations in which result in the inability to recover from genotoxic stress at least partly because of defects in DNA repair (75). If one function of PARP is to scan for DNA damage from the vantage point of a phosphoCTD attachment, it is easy to see how absence of this enzyme, or possibly just the loss of its ability to bind the phosphoCTD, could lead to problems with genome stability. These few examples should suffice to indicate the potential scientific and medical importance of delving further into this unusual transcriptase tail and the mechanisms by which its many protein interactions influence nuclear structure and function.
Astell for antiserum against NSAP1, Mary-Ann Bjornsti for yeast Topo I, Tao-shih Hsieh and Jianhong Wu for the blot of human Topo I, Nancy Maizels for hnRNP D, and Sriharsa Pradhan for DNMT1 and for communicating unpublished results. We thank Jan Jones, Hemali Phatnani, and David Skaar for advice on the manuscript, and we acknowledge Caroline Chromey for preparation of Fig. 7. We thank Timothy Bestor for helpful discussions.
* This work was supported in part by a grant-in-aid from the American Heart Association (to A. L. G.) and by National Institutes of Health Tumor Immunology Training Grant 5T32CA09058 (to S. M. C.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.