Proteomics studies of the interactome of RNA polymerase II C-terminal repeated domain

Background Eukaryotic RNA polymerase II contains a C-terminal repeated domain (CTD) consisting of 52 consensus heptad repeats of Y1S2P3T4S5P6S7 that mediate interactions with many cellular proteins to regulate transcription elongation, RNA processing and chromatin structure. A number of CTD-binding proteins have been identified and the crystal structures of several protein-CTD complexes have demonstrated considerable conformational flexibility of the heptad repeats in those interactions. Furthermore, phosphorylation of the CTD at tyrosine, serine and threonine residues can regulate the CTD-protein interactions. Although the interactions of CTD with specific proteins have been elucidated at the atomic level, the capacity and specificity of the CTD-interactome in mammalian cells is not yet determined. Results A proteomic study was conducted to examine the mammalian CTD-interactome. We utilized six synthetic peptides each consisting of four consensus CTD-repeats with different combinations of serine and tyrosine phosphorylation as affinity-probes to pull-down nuclear proteins from HeLa cells. The pull-down fractions were then analyzed by MUDPIT mass spectrometry, which identified 100 proteins with the majority from the phospho-CTD pull-downs. Proteins pulled-down by serine-phosphorylated CTD-peptides included those containing the previously defined CTD-interacting domain (CID). Using SILAC mass spectrometry, we showed that the in vivo interaction of RNA polymerase II with the mammalian CID-containing RPRD1B is disrupted by CID mutation. We also showed that the CID from four mammalian proteins interacted with pS2-phosphorylated but not pY1pS2-doubly phosphorylated CTD-peptides. However, we also found proteins that were preferentially pulled-down by pY1pS2- or pY1pS5-doubly phosphorylated CTD-peptides. 
We prepared an antibody against tyrosine phosphorylated CTD and showed that ionizing radiation (IR) induced a transient increase in CTD tyrosine phosphorylation by immunoblotting. Combining SILAC and IMAC purification of phospho-peptides, we found that IR regulated the phosphorylation at four CTD tyrosine sites in different ways. Conclusion Upon phosphorylation, the 52 repeats of the CTD have the capacity to generate a large number of binding sites for cellular proteins. This study confirms previous findings that serine phosphorylation stimulates whereas tyrosine phosphorylation inhibits the protein-binding activity of the CTD. However, tyrosine phosphorylation of the CTD can also stimulate other CTD-protein interactions. The CTD-peptide affinity pull-down method described here can be adopted to survey the mammalian CTD-interactome in various cell types and under different biological conditions.

The mammalian RNAPII-CTD is also phosphorylated on Y1 [10]. Both ABL1 and ABL2 (ARG) tyrosine kinases can catalyze the stoichiometric phosphorylation of CTD-Y1 on RNAPII in vitro [10][11][12][13][14]. Recent phospho-proteomics studies have mapped several tyrosine phosphorylation sites in the mammalian RNAPII-CTD [15,16], and the yeast RNAPII is also phosphorylated on tyrosine by an unknown kinase [17]. An increase in the levels of RNAPII tyrosine phosphorylation has been observed following DNA damage and correlated with the activation of nuclear ABL tyrosine kinase in mammalian cell lines and mouse tissues [11,18]. To determine the effect of Y1-phosphorylation (pY1) on the CTD-protein binding function, we used CTD-peptides as baits to pull down mammalian cellular proteins and identified these CTD-interacting proteins by mass spectrometry. We used six different CTD peptides, each with four consensus heptad repeats and a unique phosphorylation pattern (no phosphorylation, pY1, pS2, pS5, pY1pS2, pY1pS5). We found a number of RNA-binding proteins in the pS2- and the pS5-peptide pull-down fractions; however, those proteins were not pulled down by the doubly phosphorylated pY1pS2-CTD or pY1pS5-CTD peptides. The negative effect of pY1 on the interaction of pS2-CTD with the CTD-interaction domain (CID) confirmed a previous report [17]. However, our study also identified proteins that were preferentially pulled down by the pY1pS2- or the pY1pS5-CTD peptide, suggesting that tyrosine phosphorylation can either inhibit or stimulate the protein binding activities of the CTD.

Plasmids
The 3XMyc-tagged human SCAF4 CTD-interacting domain construct was generated by ligating PCR products of SCAF4 into KpnI/XhoI-digested pcDNA3.0. Three rounds of PCR were used. The first round used forward primer 5Kpn-IMYC15CID (5′-GAC CTA GGT GGG GAA CAG AAA CTG ATT TCG GAA GAA GAT CTC ATG GAC GCC GTC-3′) and reverse primer XhoI15CID (5′-CCG CTC GAG TTA CGC TGC CAT GTC-3′). The second round used forward primer 5KpnI2ndMYC (5′-GAT CTG GGA GGC GAG CAG AAG CTA ATA TCC GAG GAA GAC CTA GGT GGG-3′) and reverse primer XhoI15CID (5′-CCG CTC GAG TTA CGC TGC CAT GTC-3′). The third round used forward primer 5KpnI3rdMYC (5′-GGG GTA CCA TGG AAC AAA AAC TCA TCT CAG AAG AGG ATC TGG GAG GC-3′) and reverse primer XhoI15CID (5′-CCG CTC GAG TTA CGC TGC CAT GTC-3′). The 3XMyc-tagged human full-length mutant RPRD1B construct was generated by ligating a 344 bp DNA fragment containing the mutations (N57S, D58S, Q61K, N62R), synthesized by GeneScript, into KpnI/EcoRI-digested RPRD1B pcDNA5.0/FRT. The mutations were designed on the basis of amino acid analysis and modeling of the Pcf11 CID onto the RPRD1B CID, generously provided by Dr. Dong Wang, University of California, San Diego. 6XMyc-p72 was used unmodified, as previously published [19]. The AblPPn plasmid was generated by two rounds of ligation: in the first ligation, an SbfI/SalI-digested fragment from CMV-Abl-PP [20] was ligated to a PCR product of the SalI/μNES/XbaI fragment from the Abl NES mutant plasmid [21] to generate an SbfI/XbaI fragment containing the NLS and μNES. In the second round, this SbfI/XbaI fragment was ligated into SbfI/XbaI-digested CMV-Abl-PP-Nuc. The YF-CTD mutant plasmid used in our studies was pAT7Rpb1(FSPTSPS)18+Cterm Amr, which expresses a cDNA of the human Pol II large subunit with a truncated CTD containing 18 heptad repeats in which the tyrosine residue is mutated to phenylalanine, followed by a complete CTD C-terminus. This expressed cDNA has an amino-terminal B10 epitope tag and a carboxy-terminal 6XHis tag and was a gift from Dr. David Bentley, University of Colorado-Denver.
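The three PCR rounds used to build the 3XMyc tag rely on each new forward primer ending with the sequence that begins the previous round's forward primer, so each round extends the product by one Myc repeat. The following is a minimal sketch of how that nesting can be checked from the primer sequences listed above. The function name `suffix_prefix_overlap` and the variable names are illustrative assumptions and are not part of the original protocol.

```python
def suffix_prefix_overlap(outer: str, inner: str) -> int:
    """Length of the longest 3' end of `outer` that equals the 5' end of `inner`."""
    outer = outer.replace(" ", "").upper()
    inner = inner.replace(" ", "").upper()
    # Try the longest possible overlap first and shrink until a match is found.
    for k in range(min(len(outer), len(inner)), 0, -1):
        if outer[-k:] == inner[:k]:
            return k
    return 0


# Forward primers as listed in the text (5' to 3').
ROUND1_FWD = "GAC CTA GGT GGG GAA CAG AAA CTG ATT TCG GAA GAA GAT CTC ATG GAC GCC GTC"
ROUND2_FWD = "GAT CTG GGA GGC GAG CAG AAG CTA ATA TCC GAG GAA GAC CTA GGT GGG"
ROUND3_FWD = "GGG GTA CCA TGG AAC AAA AAC TCA TCT CAG AAG AGG ATC TGG GAG GC"
KPNI_SITE = "GGTACC"  # KpnI recognition sequence

# Each later-round primer should anneal to the 5' end added in the round before.
overlap_round2 = suffix_prefix_overlap(ROUND2_FWD, ROUND1_FWD)
overlap_round3 = suffix_prefix_overlap(ROUND3_FWD, ROUND2_FWD)
has_kpni = KPNI_SITE in ROUND3_FWD.replace(" ", "")

print("round 2 primer overlaps round 1 primer by", overlap_round2, "nt")
print("round 3 primer overlaps round 2 primer by", overlap_round3, "nt")
print("KpnI site present in round 3 primer:", has_kpni)
```

With the sequences as printed, each later-round primer shares a 12-nucleotide 3′ overlap with the primer from the preceding round, and only the third-round primer carries the KpnI site used for cloning.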

CTD-peptide affinity chromatography
HeLa nuclear extracts were prepared as previously described [22], with the following modifications: the nuclear pellet was sonicated in lysis buffer and spun for 30 min at maximum speed in a tabletop centrifuge. The supernatant was collected and contained both nucleoplasmic and chromatin-bound proteins. The lysis buffer consisted of 10 mM HEPES, pH 7.9, 200 mM NaCl, 1.5 mM MgCl2, 0.2 mM EDTA, 0.5 mM DTT, 0.5 % NP40, 0.125 % sodium deoxycholate, 0.05 % SDS, and 10 % glycerol. Before use, 10 mM Na2VO4, 10 mM β-glycerophosphate, 1 mM NaF, 1 mM PMSF, and 1X protease inhibitor cocktail (Roche) were added. The nuclear extract was incubated with 4 µg of CTD antibody 8WG16 overnight at 4 °C to immunoprecipitate RNAPII using protein A/G beads (Pierce). Prior to immunoprecipitation, the NaCl concentration was adjusted to 400 mM for 30 min. The supernatant, i.e., nuclear extract immunodepleted of RNAPII, was adjusted to 150 mM NaCl. For each of the six CTD peptides (four consensus repeats with differing phosphorylation patterns), 100 pmol was attached to streptavidin-magnetic beads per the manufacturer's instructions (Roche) and incubated with 5 mg of immunodepleted nuclear extract for 6 h at 4 °C. Beads were washed three times with binding buffer (150 mM NaCl) and eluted with SDS-PAGE sample buffer, and fractions were silver stained after running on a 4-20 % gel. The eluted fractions were analyzed by mass spectrometry.

Multidimensional protein identification technology (MUDPIT) mass spectrometry
Proteins were reduced and alkylated using 1 mM Tris (2-carboxyethyl) phosphine (Fisher, AC36383) at 94 °C for 5 min and 2.5 mM iodoacetamide (Fisher, AC12227) at 37 °C in the dark for 30 min, respectively. Proteins were digested with 1 μg trypsin (Roche, 03 708 969 001) overnight. The supernatant was collected and centrifuged through a 0.22 μm filter (Fisher# 07-200-386). An Agilent 1100 HPLC system (Agilent Technologies, Santa Clara, CA) delivered a flow rate of 500 nL per minute to a 3-phase capillary chromatography column through a splitter. Using a custom pressure cell, 5 µm Zorbax SB-C18 (Agilent) was packed into fused silica capillary tubing (200 µm ID, 360 µm OD, 20 cm long) to form the first reverse phase column (RP1). A 5 cm long strong cation exchange (SCX) column packed with 5 µm PolySulfoethyl (PolyLC, Inc.) was connected to RP1 using a zero dead volume 1 µm filter (Upchurch, M548) attached to the exit of the RP1 column. A fused silica capillary (100 µm ID, 360 µm OD, 20 cm long) packed with 5 µm Zorbax SB-C18 (Agilent) was connected to the SCX column as the analytical column (the second reverse phase column). The electrospray end of the fused silica tubing was pulled to a sharp tip with an inner diameter smaller than 1 µm using a laser puller (Sutter P-2000). The peptide mixtures were loaded onto the RP1 column using the same custom pressure cell. To avoid sample carry-over and maintain reproducibility, columns were not re-used: a new set of three columns of the same length was used for each sample. Peptides were first eluted from the RP1 column onto the SCX column using a 0-80 % acetonitrile gradient over 150 min. The peptides were then fractionated on the SCX column using a series of 7 salt steps (0, 20, 40, 60, 80, 100 mM, and 1 M ammonium acetate for 20 min), each followed by high-resolution reverse phase separation using an acetonitrile gradient of 0-80 % over 120 min.
The mass spectrometer was operated in positive ion mode with a source temperature of 150 °C and a spray voltage of 1500 V. Data-dependent analysis and gas phase separation were employed. The full MS scan range of 300-2000 m/z was divided into 3 smaller scan ranges (300-800, 800-1100, 1100-2000 Da) to improve the dynamic range. Each MS scan was followed by 4 MS/MS scans of the most intense ions from the parent MS scan. A dynamic exclusion of 1 min was used to improve the duty cycle of MS/MS scans. Raw data were extracted and searched using Spectrum Mill (Agilent, version A.03.02). MS/MS spectra with a sequence tag length of 1 or less were considered poor spectra and discarded. The remaining MS/MS spectra were searched against the IPI (International Protein Index) database limited to human taxonomy (v3.31, 67,533 protein sequences). The enzyme parameter was limited to fully tryptic peptides with a maximum of 1 missed cleavage. All other search parameters were set to Spectrum Mill's default settings (carbamidomethylation of cysteines, ±2.5 Da for precursor ions, ±0.7 Da for fragment ions, and a minimum matched peak intensity of 50 %). Search results for individual spectra were automatically validated using the filtering criteria listed in the following table.
Table. Filtering criteria for autovalidation of database search results
A concatenated forward-reverse protein database was constructed to calculate the in situ false discovery rate (FDR). The tryptic peptides in the reverse database were compared to the forward database and were shuffled if they matched any tryptic peptide from the forward database. The total number of protein sequences in the combined database is 135,069. Proteins that share common peptides were grouped to address database redundancy. The proteins within the same group shared the same set or subset of unique peptides. Only proteins with 2 or more unique peptides were validated.
A total of 100 proteins were observed in the pull-down samples containing CTD peptides (non-modified CTD, pY1, pS2, pS5, pY1pS2, and pY1pS5) but not in the beads-only control samples. Functional annotation of these 100 proteins was completed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) [23,24]. No proteins from the reverse database passed the filters described above, which implies that the FDR of our protein list is less than 1 %.
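The protein-level validation described above (at least two unique peptides per protein, with decoy hits used to estimate the FDR) can be illustrated with a short sketch. This is not the Spectrum Mill implementation. The `REV_` decoy prefix, the function names and the toy hits are assumptions made for illustration, distinct peptide sequences stand in for the "unique peptides" of the text, and the ratio of decoy to target proteins is only one common way to estimate the FDR.

```python
from collections import defaultdict


def validated_proteins(peptide_hits, min_unique=2):
    """Keep proteins supported by at least `min_unique` distinct peptide sequences.

    `peptide_hits` is an iterable of (protein_id, peptide_sequence) pairs.
    """
    peptides_by_protein = defaultdict(set)
    for protein_id, peptide in peptide_hits:
        peptides_by_protein[protein_id].add(peptide)
    return {p for p, seqs in peptides_by_protein.items() if len(seqs) >= min_unique}


def decoy_fdr(protein_ids, decoy_prefix="REV_"):
    """Estimate the FDR as (decoy proteins) / (target proteins) among accepted hits."""
    decoys = sum(1 for p in protein_ids if p.startswith(decoy_prefix))
    targets = len(protein_ids) - decoys
    return decoys / targets if targets else 0.0


# Toy hits with hypothetical identifiers; REV_ marks a reversed (decoy) entry.
hits = [
    ("PROT_A", "AAK"), ("PROT_A", "CCR"),   # two distinct peptides: kept
    ("PROT_B", "DDK"), ("PROT_B", "DDK"),   # the same peptide twice: dropped
    ("REV_PROT_C", "EEK"),                  # single decoy peptide: dropped
]
accepted = validated_proteins(hits)
print(sorted(accepted), decoy_fdr(accepted))
```

When no decoy protein survives the filter, as reported for the 100-protein list, this estimator returns zero, which is consistent with the statement that the FDR is below 1 %.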

Surface plasmon resonance
Sensorgrams were recorded on a Biacore T200 instrument using streptavidin (SA) chips. All experiments were conducted at 25 °C, and approximately 1000 response units of biotinylated CTD peptide (2.6 nM) were immobilized on the chip in a high-salt buffer (500 mM NaCl, 10 mM Tris, pH 7.5, 0.5 mM EDTA). Sensorgrams were run using flow cell 1 (FC1) as an unmodified reference. Data were collected for FC2, FC3 and FC4, which contained differentially phosphorylated CTD peptides. FC2 contained unphosphorylated CTD peptide, FC3 contained a CTD peptide phosphorylated only on the serine residue, and FC4 contained a CTD peptide phosphorylated on both a serine and a tyrosine residue. In all cases, 1.2 nM of CID protein was flowed over the chip at 20 μl/min with a 3 min contact time and a 3 min dissociation phase. The running buffer used for the binding experiments was 10 mM Tris, pH 7.5, 150 mM NaCl, 3 mM DTT, and 0.2 mM EDTA. Regeneration was achieved using an 8-min pulse of high-salt buffer. Each of the four GST-CID fusion proteins was expressed in BL21 E. coli from pDEST™24-CID and purified using glutathione Sepharose (GE Healthcare, Piscataway, NJ) according to the manufacturer's instructions.

Co-immunoprecipitation of recombinant GST-CTD with Myc-SCAF4-CID
Human embryonic kidney (HEK) 293T cells were cultured in DMEM supplemented with 10 % fetal bovine serum (Hyclone) and 100 µg/ml each of penicillin and streptomycin. 293T cells grown in 10 cm plates to 80 % confluence were transfected with the indicated plasmid using Lipofectamine (Invitrogen) or GeneTran (Biomiga). Cells were harvested in cold PBS, lysed in NETN buffer (20 mM Tris pH 8.0, 100 mM NaCl, 0.5 mM EDTA, 0.5 % Nonidet P-40) on ice for 20 min, sonicated and treated with RNase and DNase for 1 h. For co-immunoprecipitation experiments, 250-500 µg of total protein was incubated with either anti-Myc (9E10)-conjugated agarose beads or mouse IgG-coupled A/G Sepharose beads for 1 h at 4 °C.

Preparation of partially purified RNA polymerase II from HeLa cell nuclear extract
RNA polymerase II was obtained from HeLa cells treated with 8 Gy IR. Following IR treatment, HeLa cells were washed once in ice-cold 1X PBS and harvested by centrifugation at 1500g. The cell pellet was incubated on ice for 10 min in buffer containing 10 mM HEPES, pH 7.9, 100 mM NaCl, 1.5 mM MgCl2, 0.2 mM EDTA, 0.5 mM DTT, 1.0 % saponin, and 10 % glycerol. Before use, 10 mM Na2VO4, 10 mM β-glycerophosphate, 1 mM NaF, 1 mM PMSF, and 1X protease inhibitor cocktail (Roche) were added. After incubation, cells were pelleted by centrifugation at 1000g, resuspended in phosphate buffer (PB), layered onto a 30 % sucrose cushion, and centrifuged at 1500g for 10 min. The pellet (nuclei) was resuspended in 10 mM HEPES, pH 7.9, 150 mM NaCl, 1.5 mM MgCl2, 0.2 mM EDTA, 0.5 mM DTT, 1 % Triton X-100, 10 % glycerol, layered on top of a 30 % sucrose cushion, and centrifuged at 1500g for 10 min. The pellet was washed three times with PB, and the crude chromatin pellet was extracted using increasing concentrations of ammonium sulfate, (NH4)2SO4. The supernatant from the 0.5 M fraction contained enriched RNA polymerase II and was used for phosphopeptide mapping.

Phosphopeptide purification and mapping using immobilized metal affinity chromatography (IMAC)
IMAC resin was prepared as previously described [25]. Ni was stripped from the NTA resin with 5 mM EDTA, pH 8.0, 100 mM NaCl while rotating at room temperature for 1 h in a 50 ml Falcon tube. The stripped resin was pelleted by centrifugation at 1500g and washed twice with 50 ml of water followed by 50 ml of 0.6 % acetic acid; 50 ml of 100 mM FeCl3 in 0.3 % acetic acid was then used to coordinate iron to the NTA resin. Following overnight incubation, the resin was washed three times: once with 50 ml of 0.6 % acetic acid, then twice with 50 ml each of 0.1 % acetic acid. After the last wash, the volume of the resin was estimated, and the resin was resuspended in 0.1 % acetic acid as a 50 % (vol/vol) slurry and stored at 4 °C. All common chemicals used for IMAC resin preparation were purchased from Sigma-Aldrich. SDS was added to the partially purified RNA polymerase II to a final concentration of 1 %. The sample was then reduced and alkylated with 5 mM DTT for 5 min at 50 °C and 30 mM iodoacetamide for 45 min at room temperature in the dark, respectively. Proteins were precipitated by adding 3 volumes of 50 % (vol/vol) ethanol/acetone for 1 h at 4 °C. The protein pellet was resuspended in a buffer composed of 100 mM Tris (pH 8.0) and 8 M urea. The protein concentration was measured by Bradford assay, and 1X TBS was used to dilute the urea to a final concentration of 2 M. A 10 mg sample was digested with 0.1 mg of trypsin overnight at 37 °C. Following overnight digestion, the sample was acidified with TFA to a final concentration of 0.2 % and centrifuged at 4000g for 15 min. The soluble peptides were loaded onto a 500 mg Sep-Pak18 column, washed twice with 3 ml of 1 % acetic acid, eluted with a buffer composed of 80 % acetonitrile and 0.1 % acetic acid, and dried by SpeedVac. The dried peptides were resuspended in 100 μl of 1 % acetic acid and loaded onto an IMAC column containing 70 μl of beads.
The IMAC column was washed twice with buffer containing 25 % acetonitrile, 100 mM NaCl, and 0.1 % acetic acid, followed by one wash with 1 % acetic acid and one wash with water. The bound peptides were finally eluted with 210 μl of 6 % ammonium and dried by SpeedVac. Phosphopeptides were resuspended in 80 % acetonitrile and fractionated using a 2 mm Amide-80 column. Fractionated samples were resuspended in 5 μl of 1 % TFA, and a 70 min linear gradient from 10 to 40 % ACN with 0.1 % formic acid was used to run the samples into an LTQ Orbitrap XL, similar to the procedure previously described [25].

MS data analysis using SEQUEST
The tandem mass spectra were searched on a Sorcerer-SEQUEST system (SageN, San Jose, CA) using a human semi-tryptic IPI database, version 3.80 (downloaded from http://www.ebi.ac.uk/IPI), and quantified using XPRESS software from TPP v4.3 rev 1 (Institute for Systems Biology). The search parameters were: monoisotopic masses, a parental mass tolerance of 50 ppm, a maximum of three modifications per peptide, and a variable modification of 79.966331 amu for phosphorylation of serine, threonine, and tyrosine.
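To make the search parameters concrete, the sketch below shows how a 50 ppm precursor tolerance and a variable phosphorylation shift of 79.966331 translate into the candidate masses a search engine has to consider. It is an illustration only and does not reproduce SEQUEST's internal logic. The function names and the example peptide mass of 1000.0 are assumptions.

```python
PHOSPHO_SHIFT = 79.966331   # variable modification on S, T or Y (value from the text)
PPM_TOLERANCE = 50.0        # parental mass tolerance (value from the text)
MAX_MODS = 3                # maximum modifications per peptide (value from the text)


def ppm_window(mass, ppm=PPM_TOLERANCE):
    """Return the (low, high) mass window for a given tolerance in ppm."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def candidate_masses(unmodified_mass, n_sty_sites, max_mods=MAX_MODS):
    """Masses for 0..k phosphorylations, where k is capped by sites and max_mods."""
    k_max = min(n_sty_sites, max_mods)
    return [unmodified_mass + k * PHOSPHO_SHIFT for k in range(k_max + 1)]


def within_tolerance(observed, theoretical, ppm=PPM_TOLERANCE):
    """True if the observed mass falls inside the ppm window of the theoretical mass."""
    low, high = ppm_window(theoretical, ppm)
    return low <= observed <= high


# Hypothetical peptide of mass 1000.0 with five S/T/Y residues.
masses = candidate_masses(1000.0, n_sty_sites=5)
print(masses)
print(within_tolerance(1000.04, 1000.0), within_tolerance(1000.06, 1000.0))
```

A CTD heptad repeat contains five phosphorylatable residues (Y1, S2, T4, S5, S7), so the cap of three modifications per peptide limits which multiply phosphorylated CTD peptides such a search can report.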

SILAC labeling for identification of CID-dependent interactions with RPRD1B
RPRD1B 293 Flp-In cells were grown under the conditions used in [26]. Briefly, cells were grown in either heavy or light complete DMEM medium with 10 % dialyzed FBS. Before the cells reached 80 % confluence, tetracycline (TET) induction was initiated for 36 h. After induction of 3XMYC-RPRD1B WT or 3XMYC-RPRD1B MT, immunoprecipitation was performed using total cell lysate. The immunoprecipitated RPRD1B samples were combined, reduced, alkylated, and digested with 1 μg of trypsin. Samples were then desalted on a 50 mg Sep-Pak18 column and dried by SpeedVac. Dried samples were run on a 1 mm Amide-80 column and analyzed by MS as previously described [25]. The median heavy-to-light SILAC ratio was calculated for proteins identified with more than three unique peptides, and proteins passing a cutoff of 1.0 for this ratio were determined.
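The ratio filter described above can be sketched as follows. This is one reading of the procedure (a per-protein median of peptide heavy-to-light ratios, restricted to proteins with more than three unique peptides, then a cutoff of 1.0) and is not the XPRESS implementation. The function name, the protein labels and the ratio values are hypothetical.

```python
from statistics import median


def enriched_proteins(peptide_ratios, min_peptides=4, cutoff=1.0):
    """Return {protein: median H/L ratio} for proteins passing both filters.

    `peptide_ratios` maps protein -> {peptide sequence: heavy/light ratio}.
    Proteins need more than three unique peptides (min_peptides=4) and a
    median ratio above `cutoff`.
    """
    selected = {}
    for protein, ratios in peptide_ratios.items():
        if len(ratios) < min_peptides:
            continue  # too few unique peptides to trust the median
        protein_median = median(ratios.values())
        if protein_median > cutoff:
            selected[protein] = protein_median
    return selected


# Hypothetical data: heavy = WT RPRD1B pull-down, light = MT RPRD1B pull-down.
toy = {
    "PROT_A": {"pep1": 2.0, "pep2": 4.0, "pep3": 8.0, "pep4": 10.0},
    "PROT_B": {"pep1": 0.5, "pep2": 0.6, "pep3": 0.7, "pep4": 0.8},
    "PROT_C": {"pep1": 9.0, "pep2": 9.0, "pep3": 9.0},
}
result = enriched_proteins(toy)
print(result)
```

Using the median rather than the mean keeps a single outlying peptide ratio from dominating the protein-level value, which matters when only a handful of peptides are quantified per protein.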

SILAC labeling for identification of CTD phosphorylation sites affected by ionizing radiation
Three independent SILAC labeling experiments were performed to determine the effect of ionizing radiation (IR) on CTD phosphorylation in HeLa cells. In each experiment, the cells labeled with the heavy amino acids were treated with 8 Gy IR and collected at 2 h after radiation exposure. The cells labeled with the light amino acids were (1) un-irradiated, (2) irradiated with 8 Gy IR and collected 30 min after radiation exposure, or (3) irradiated with 8 Gy IR and collected 60 min after radiation exposure. The lysates from each pair of heavy and light amino acids labeled cells were mixed and RNA polymerase II partially purified as described above. The partially purified RNA polymerase II was then subjected to trypsin digestion and phospho-peptide analysis as described above. The phosphorylation sites in the phospho-containing CTD peptides were quantitated using XPRESS software from TPP v4.3 rev 1 (Institute for Systems Biology) described above.

Phospho-CTD-peptide pull-down of cellular proteins
A previous study employed biotinylated CTD-peptides with different combinations of serine phosphorylation to investigate the interaction of cellular proteins with the CTD repeats [27]. We adopted this approach to identify CTD-interacting proteins from HeLa nuclear extracts. We synthesized six CTD-peptides (Fig. 1a), each containing four Y1S2P3T4S5P6S7 consensus repeats with a biotin at the N-terminus, and each with a different phosphorylation status: (1) unphosphorylated, (2) phosphorylated at the four tyrosines at the first position (pY1), (3) phosphorylated at the four serines at the second position (pS2), (4) phosphorylated at the four serines at the fifth position (pS5) and (5, 6) combinations thereof (pY1pS2 and pY1pS5). HeLa nuclear extracts immunodepleted of the endogenous RNAPII were incubated with each of the six CTD peptides, and bound proteins were identified using multidimensional protein identification technology (MUDPIT). Silver staining displayed the complexity of each of the streptavidin pull-down fractions (Fig. 1b) and showed that the pS2 and pS5 CTD-peptides pulled down more proteins than the unphosphorylated or the pY1 CTD-peptides (Fig. 1b, compare lanes 5 and 6 to lanes 3 and 4). Furthermore, the pattern of protein bands pulled down by the doubly phosphorylated pY1pS2 or pY1pS5 CTD-peptides was dissimilar to that pulled down by the singly phosphorylated CTD-peptides (Fig. 1b, compare lane 7 to lane 5, and lane 8 to lane 6). The six pull-down fractions were analyzed by mass spectrometry in two independent experiments: the first by analysis of silver-stained gel bands and the second by MUDPIT analysis of the entire pull-down fraction. A total of 100 proteins were identified from the MUDPIT experiment, as summarized in Table 1. Of these, several were also identified by the analysis of gel bands (see proteins marked with ** in Table 1).
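The six affinity probes described above differ only in which positions of the consensus heptad carry a phosphate. A minimal sketch of that design, assuming a simple text notation in which a phosphorylated residue is written with a leading lowercase "p" (for example "pY"), is given below. The notation, the dictionary keys and the function name are illustrative choices, not the authors' nomenclature for the synthetic peptides.

```python
HEPTAD = "YSPTSPS"  # consensus repeat Y1 S2 P3 T4 S5 P6 S7

# Phosphorylated positions (1-based, within each heptad) for the six probes.
PHOSPHO_PATTERNS = {
    "unphosphorylated": (),
    "pY1": (1,),
    "pS2": (2,),
    "pS5": (5,),
    "pY1pS2": (1, 2),
    "pY1pS5": (1, 5),
}


def build_peptide(phospho_positions, repeats=4):
    """Write out an N-terminally biotinylated peptide of `repeats` heptads."""
    one_repeat = "".join(
        ("p" + residue) if position in phospho_positions else residue
        for position, residue in enumerate(HEPTAD, start=1)
    )
    return "biotin-" + one_repeat * repeats


PEPTIDES = {name: build_peptide(pos) for name, pos in PHOSPHO_PATTERNS.items()}
for name, sequence in PEPTIDES.items():
    print(f"{name:>16}: {sequence}")
```

Because every repeat in a given probe carries the same modification, each singly phosphorylated peptide presents four phosphates and each doubly phosphorylated peptide presents eight.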
Fig. 1 Proteomic analysis of proteins pulled down by phosphorylated CTD peptides. a Summary of experimental strategy. Six CTD-peptides with phosphorylation sites marked in red and biotinylation at the N-terminus marked as a circle were synthesized and used as affinity probes to pull down proteins from HeLa nuclear extracts. b A representative silver-stained gel of proteins pulled down by the six CTD-peptides. c, d Graphical representations of bioinformatics analysis of CTD-interacting proteins separated by GO terms in biological process (c) or molecular function (d). The CTD-interacting proteins are listed in Table 1
Some of the proteins in Table 1 are known to directly interact with serine-phosphorylated CTD, e.g., those containing the CTD-interacting domain (CID) (see below). Other proteins pulled down by the CTD-peptides may represent direct, indirect or non-specific interactions. It cannot be ruled out that these interactions are RNA or DNA dependent, because the nuclear extracts were not treated with nucleases to remove RNA or DNA. A likely example of a non-specific interaction is GAPDH, an abundant cytosolic protein detected in the pull-down fractions of 4 CTD-peptides (Table 1). However, many other proteins were pulled down by only one of the six CTD peptides tested (Table 1). Bioinformatics analysis of the proteins listed in Table 1 using the Database for Annotation, Visualization and Integrated Discovery (DAVID) found that the majority of them fall into the Biological Process of RNA splicing and metabolism (Fig. 1c). DAVID also found the Biological Process of translation and the structural constituent of ribosome to be represented (Fig. 1c, d). Given the abundance of ribosomal constituents and the cytoplasmic location of translation, the ribosomal proteins in the pull-down fractions most likely represent non-specific interactions.
On the other hand, Table 1 contains several RNA-binding proteins that are related to known components of the human spliceosomal complexes [28], and those interactions are likely to be relevant because the CTD is known to regulate RNA splicing.

CID binds pS 2 -CTD but not pY 1 pS 2 -CTD
The CTD-interacting domain (CID) was previously identified by a yeast two-hybrid screen for CTD-binding proteins [29]. Subsequent studies determined that the CID of the transcription termination factor Pcf11 interacts with CTD phosphorylated on serine residues (pS2-CTD) [30], but not with CTD that is doubly phosphorylated on tyrosine and serine (pY1pS2-CTD) [17]. In Fig. 2a, the complex of Pcf11-CID with pS2-CTD is overlaid with the CID of SCAF8 [6,7,30,31]. Although Pcf11 was not among the proteins pulled down by the pS2-CTD peptide, four other CID proteins, namely SCAF8, SCAF4, RPRD1B, and RPRD2, were identified in the pS2-CTD but not in the pY1pS2-CTD pull-down fractions (Table 1; Fig. 2b). To validate the differential interaction between the CID and the differently phosphorylated CTD peptides, we expressed and purified the CIDs from SCAF4, SCAF8, RPRD1B, and RPRD2 as GST-fusion proteins from bacteria (Fig. 2c). Direct interaction between each CID and the biotin-CTD, biotin-pS2-CTD and biotin-pY1pS2-CTD peptides was analyzed by surface plasmon resonance using streptavidin-coated Biacore chips (Fig. 2d) [27,32]. Consistent with the MUDPIT results (Table 1) as well as previously published reports [17,30], we detected binding of all four CIDs to the pS2-CTD peptide but not to the unphosphorylated CTD peptide or the doubly phosphorylated pY1pS2-CTD peptide (Fig. 2d).
We then examined the interaction of the SCAF4-CID with a recombinant GST-CTD protein and with the CTD peptides by immunoprecipitation and pull-down assays (Fig. 3). As shown in Fig. 3b 8). Total cell lysates were probed with antibodies for GST or Myc to determine the levels of the transfected proteins. The cell lysates were each reacted with anti-Myc (9E10) or IgG conjugated agarose beads and the precipitated samples were then immunoblotted with anti-GST (Fig. 1c). The results showed that Myc-SCAF4-CID but not Myc-p72b (encoded by DDX17, which is another RNA binding protein involved in RNA processing) interacted with GST-CTD in co-transfected cells (Fig. 3c, lane  7). In Fig. 3d, total lysate from HEK293T cells transfected with the Myc-SCAF4-CID expression plasmid was reacted with pS 2 -CTD and pY 1 pS 2 -CTD peptides over a range of concentrations. The pull-down fractions were then probed with anti-Myc. Densitometry quantification of the immunoblots detecting Myc showed a concentration-dependent interaction of Myc-SCAF4-CID with the pS 2 -CTD peptide but not the pY 1 pS 2 -CTD peptide (Fig. 3d). Together, results shown in Table 1, Figs. 2 and 3 establish that CTD tyrosine-1 phosphorylation disrupts the CID interaction with pS 2 -CTD. These results are consistent with a previous report that Pcf11 interaction with the CTD is disrupted by CTD tyrosine phosphorylation [17].

CID-dependent interaction of RPRD1B with RNA polymerase II
To demonstrate that a mammalian CID-containing protein associates with endogenous RNA polymerase II, we generated mutations in the CID of the human RPRD1B protein. The mutant RPRD1B contains four amino acid substitutions in its CID: N57S, D58S, Q61K and N62R (Fig. 4a) [7]. To determine whether these CID mutations disrupt RPRD1B interaction with endogenous RNA polymerase II, HEK293T cells were transfected with the wild type or mutant Myc-tagged RPRD1B expression plasmids and the amount of RNAPII or pS2-CTD in the anti-Myc (9E10) immunoprecipitates was detected by immunoblotting (Fig. 4b). Immunoblotting of total lysates (Fig. 4b, lanes 1-3) with anti-Myc showed that the wild type (WT) and the CID-mutant (MT) RPRD1B were both expressed in the transfected cells.
Fig. 4 (caption, partial) ... 11-20, lower panel). Note that GST-CTD associated with WT but not MT RPRD1B, and that AblPPn disrupted WT RPRD1B interaction with GST-CTD (compare lanes 17-19). e Diagram of SILAC mass spectrometry strategy used to identify proteins that associated with WT but not MT RPRD1B. Heavy and light isotope labeling was conducted after tetracycline-induced expression of the WT and MT RPRD1B in HEK293T cells. See Tables 3 and 4 for summaries of SILAC results
We next used SILAC proteomics to identify cellular proteins that differentially associated with the WT vs. the MT RPRD1B in HEK293T cells (Fig. 4e; Table 2). We constructed HEK293 cells to stably express either the WT or the MT RPRD1B from a tetracycline-inducible promoter. Following tetracycline induction, we labeled the WT RPRD1B-expressing cells with heavy amino acids and the MT cells with light amino acids, subjected the labeled lysates to immunoprecipitation with anti-Myc-conjugated beads, and analyzed the resulting immunoprecipitates by tandem mass spectrometry. As summarized in Table 2, 79 proteins were identified with a median WT/MT ratio greater than 1.0. It is important to note that the bait protein (RPRD1B) had a median WT/MT ratio of 1.16. The mass spectrometry analysis achieved over 77% coverage of RPRD1B from 22 distinct peptides. Bioinformatics analysis of WT-RPRD1B-associated proteins found that the top two biological processes represented were RNA processing (p value 1.1E−29) and mRNA metabolic process (p value 9.7E−27) (Table 3). The top two cellular components represented were ribonucleoprotein complex (p value 2.4E−20) and nucleoplasm (p value 2.0E−17) (Table 3). The top molecular function represented was RNA binding (p value 7.5E−24) (Table 3). As summarized in Table 4, six RNA polymerase II subunits were found to associate with wild-type RPRD1B, each with a WT/MT ratio greater than 2.5, which is significantly above the ratio of the bait RPRD1B protein. Over 20 unique peptides were identified for Rpb1, the largest subunit, which contains the CTD, with a median WT/MT ratio of 9.0. These results confirmed that the CID of RPRD1B is important for its association with the endogenous RNAPII enzyme complex in mammalian cells. Future investigation of the interactions between RPRD1B and the proteins identified in this study (Table 2)
Fig. 5 Reactivity of anti-pY1-CTD with CTD-peptides and endogenous RNAPII. a Representative images of ELISA results. Each column is coated with a different CTD peptide as indicated at the bottom of the columns. Each row is reacted with a different antibody as indicated to the left of the rows. 4G10 is a mouse monoclonal antibody that reacts with phosphotyrosine; 8WG16 is a mouse monoclonal antibody that reacts with un-phosphorylated CTD; α-pS5 is a rabbit polyclonal antibody that reacts with pS5 in CTD; α-pS2 is a rabbit polyclonal antibody that reacts with pS2 in CTD; α-pY1 is a rabbit polyclonal antibody raised against a pY1pS5-CTD peptide. b Quantification of ELISA results. Numbers shown are mean ± SD (n = 3). c Phosphotyrosine (pTyr) competed with the binding of α-pY1 to pY1-containing CTD peptides. d α-pY1-CTD reacts with endogenous RNAPII. Lysates from HEK293T cells transfected with vector or AblPPn were immunoprecipitated with 8WG16 or IgG and then probed with N20 or anti-pY1. Transfection with AblPPn increased the pY1-reactivity in whole cell lysate (input) and in 8WG16-precipitated RNAPII (upper panels). Because the RNAPII CTD contains 52 repeats and not all 52 repeats are stoichiometrically phosphorylated in vivo, 8WG16 can react with RNAPII that contains some unphosphorylated CTD repeats and some pY1-CTD repeats. In the reciprocal immunoprecipitation (IP), whole cell lysates were reacted with anti-pY1 and then immunoblotted with anti-pS5-CTD or 8WG16. Note that anti-pY1 immunoprecipitated RNAPII that reacted with anti-pS5-CTD or 8WG16. IIO, RNAPII containing hyper-phosphorylated CTD; IIA, RNAPII with hypo-phosphorylated CTD. e The previously reported 3D12 antibody [17] does not react with pY1-CTD. Increasing amounts of total lysates from HEK293T cells transfected with vector or AblPPn were immunoblotted with the indicated antibodies. Note that AblPPn did not alter the levels of pS2- or pS5-reactive RNAPII but increased the levels of pY1-reactive RNAPII. The pY1-reactivity was competed with phosphotyrosine (pTyr). Note that 3D12 reacts with the IIA form of RNAPII. Reactivity with 3D12 was not affected by the expression of AblPPn. Phosphotyrosine does not inhibit the 3D12 reactivity with RNAPIIA. f Anti-pY1-CTD does not react with the YF-CTD mutant. HEK293T cells were transfected with combinations of AblPPn, GST, GST-CTD or an RNAPII with a truncated CTD in which all Y1 residues are mutated to F (phenylalanine). Whole cell lysates were immunoblotted with the indicated antibodies. Note that anti-pS5 reacted with the endogenous RNAPII, GST-CTD and the YF-RNAPII. The authenticity of the YF-RNAPII was established by its reactivity with B10 antibody [34]. The YF-RNAPII did not react with anti-pY1-CTD.
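
The comparison of each protein's median WT/MT ratio against the ratio of the RPRD1B bait can be illustrated with a minimal sketch. The function names and the peptide-level ratios below are hypothetical, chosen only so that the medians reproduce two values stated in the text (1.16 for the bait and 9.0 for Rpb1); this is not the analysis pipeline used in the study.

```python
from statistics import median


def median_ratio(heavy_light_ratios):
    """Median heavy/light (WT/MT) ratio over a protein's matched peptide pairs."""
    return median(heavy_light_ratios)


def enriched_over_bait(peptide_ratios, bait):
    """Return proteins whose median WT/MT ratio exceeds the bait's median ratio."""
    bait_ratio = median_ratio(peptide_ratios[bait])
    hits = {}
    for name, ratios in peptide_ratios.items():
        if name == bait:
            continue
        m = median_ratio(ratios)
        if m > bait_ratio:
            hits[name] = m
    return hits


# Hypothetical peptide-level ratios (illustrative only, not data from Table 2 or 4).
peptide_ratios = {
    "RPRD1B": [1.10, 1.16, 1.25],  # bait; median 1.16 as stated in the text
    "Rpb1": [7.5, 9.0, 11.2],      # median 9.0 as stated in the text
    "PREY_X": [0.9, 1.0, 1.1],     # hypothetical protein not enriched with WT
}

hits = enriched_over_bait(peptide_ratios, bait="RPRD1B")
print(hits)  # {'Rpb1': 9.0}
```

Normalizing against the bait matters because the WT and MT baits were themselves recovered at slightly different levels (ratio 1.16), so only ratios clearly above that value indicate CID-dependent association.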

Characterization of antibodies for tyrosine-phosphorylated CTD
Antibodies for serine-2 and serine-5 phosphorylated CTD have been available for many years; however, antibodies for tyrosine-1-phosphorylated CTD were only recently reported [17]. To develop anti-pY1-CTD antibodies, we immunized rabbits with three different tyrosine-phosphorylated CTD peptides (pY1-consensus peptide, pY1pS2-consensus peptide, and pY1pS5-consensus peptide) and purified phospho-specific antibodies by peptide-affinity chromatography. We found that the pY1-consensus peptide generated anti-pY1 antibody of low affinity. Immunization with the pY1pS2-peptide generated antibodies that reacted with pY1, pS2, pY1pS2, pS5, and pY1pS5-CTD peptides. However, immunization with the pY1pS5-CTD peptide generated an antibody that reacted with pY1-CTD, pY1pS2-CTD and pY1pS5-CTD but not the serine-only phosphorylated peptides (Fig. 5a, b). This reactivity was competed by phosphotyrosine (Fig. 5c), demonstrating that the antibody recognizes the pY1-epitope. The pY1 antibody also reacted with endogenous RNA polymerase II in cells transfected with AblPPn (Fig. 5d). We found a significant increase in the reactivity of endogenous RNAPII with our anti-pY1 antibody in cells transfected with AblPPn (Fig. 5e). The ectopic expression of AblPPn did not alter the reactivity of RNAPII with the pS5- or the pS2-CTD antibodies (Fig. 5e). We purchased the previously reported pY1-CTD antibody 3D12 [17]. Despite the report that this antibody reacts with Abl-phosphorylated CTD, we could not reproduce that result. As shown in Fig. 5e, the 3D12 antibody reacted with the unphosphorylated RNAPII and its reactivity was not stimulated by the ectopic expression of AblPPn. To further demonstrate the specificity of our pY1-CTD antibody, we tested its reactivity against the YF-CTD mutant of RNAPII. As shown in Fig. 5f, the pY1-CTD antibody did not react with the YF-CTD mutant.
Table 2 legend. Lists proteins that were identified to preferentially associate with wild type (WT) RPRD1B, but not CID-mutated (MT) RPRD1B, utilizing a SILAC mass spectrometry strategy where heavy and light isotope labeling was conducted after tetracycline-induced expression of the WT and MT RPRD1B in HEK293T cells. The official gene symbols of the identified proteins are listed in the first column. The column labeled Median WT/MT is the median ratio between a heavy and light matching peptide identified for that protein. The Unique peptides column shows the total number of peptides that were identified and exist in only one protein regardless of peptide length. The Amino acid coverage column shows the percentage of the protein's sequence represented by the peptides identified in the MS analysis.

Ionizing radiation alters CTD tyrosine phosphorylation
Previous studies have shown that nuclear Abl is activated by DNA damage to phosphorylate RNAPII-CTD on tyrosine [11,18]. We therefore examined IR-induced tyrosine phosphorylation of RNA polymerase II CTD using phospho-proteomics combined with SILAC. A multistep purification strategy was established to generate an enriched, partially purified fraction of RNA polymerase II that preserved its native phosphorylation state (Fig. 6a). The fractions were characterized using immunoblotting to detect total RNA polymerase II and phosphorylation of serine 2 or serine 5 on CTD (Fig. 6b). SILAC tandem mass spectrometry was then used to compare the CTD phospho-peptides at 2 h after exposure to 8 Gy ionizing radiation (IR) relative to un-irradiated, 30 min irradiated or 60 min irradiated cells (Fig. 6c). As summarized in Table 5, our analysis identified a subset of the previously identified CTD phosphorylation sites, i.e., Y-1874, Y-1881, Y-1909 and Y-1916, that are in the vicinity of the few Lys residues in the CTD. Among this subset of trypsin-released peptides, our SILAC analysis showed that ionizing radiation affected CTD tyrosine phosphorylation in several ways.
A phospho-peptide containing pY-1874 and pY-1881 but no pS or pT showed similar levels (ratio of 0.91) between non-irradiated cells and cells at 2 h after irradiation, but a reduced ratio (0.5) when the comparison was made between cells at 30 min and at 2 h after irradiation, suggesting that IR caused a transient reduction in pY-1874 and pY-1881 at 30 min with a return to the un-irradiated level by 2 h (Table 5). The ratio of 0.75 between the 60 min and 2 h irradiated samples was consistent with this transient reduction and recovery of phosphorylation at these two tyrosine sites.
A phospho-peptide containing pY-1909 and also pS-1917 and pS-1920 showed reduced levels in un-irradiated and 30 min-irradiated relative to 2 h-irradiated samples (Table 5). This result suggests that IR caused an increase in the abundance of this pY-containing CTD peptide between 30 min and 2 h after IR. A pY-1909, pS-1915 and pS-1920 peptide also showed increased abundance with time from 30 to 60 min relative to 2 h after irradiation (Table 5). Interestingly, a peptide with pY-1909 and pS-1920 was found at higher levels in un-irradiated cells when compared to 2 h-irradiated cells (Table 5). There are two possible interpretations of these results. First, the decrease in the pY-1909/pS-1920 peptide may be coupled to the increase in the pY-1909/pS-1917/pS-1920 peptide, suggesting that IR induced the phosphorylation of S-1917. Second, the decrease in the pY-1909/pS-1920 peptide is not related to the increase in the pY-1909/pS-1917/pS-1920 peptide, in that these two phosphorylation configurations occurred on different RNAPII molecules and IR regulated their levels independently, depending on the sub-genomic locations of these different RNAPII molecules.
With peptides containing the pY-1916 site, our SILAC analyses consistently showed a reduction in abundance at 2 h after IR (Table 5). It thus appears that exposure to ionizing radiation has a complex effect on CTD tyrosine phosphorylation, depending on the phosphorylation site and neighboring pS and pT status. Immunoblotting of total lysates from the HeLa cells used in the SILAC experiment showed a net increase in phospho-ATM up to 2 h after irradiation but a transient net increase in pY1-reactivity at 30 and 60 min after irradiation (Fig. 6d). The net increase in pY1 levels at 30 and 60 min after IR treatment was likely to have resulted from phosphorylation at other Y1 sites that were not detected by the SILAC mapping of tryptic CTD peptides.
Table 4 RNA polymerase subunits identified in RPRD1B interactome. Lists RNA polymerase subunits identified to associate with wild type (WT) RPRD1B, but not the CID-mutated (MT) RPRD1B, from the SILAC mass spectrometry experiment where heavy and light isotope labeling was conducted after tetracycline-induced expression of the WT and MT RPRD1B in HEK293T cells. The official gene symbols of the identified proteins are listed in the first column. The column labeled Median WT/MT is the median ratio between a heavy and light matching peptide. The Unique peptides column shows the total number of peptides that were identified and exist in only one protein regardless of peptide length. The Amino acid coverage column shows the percentage of the protein's sequence represented by the peptides identified in the MS analysis.
Fig. 6 Effect of ionizing radiation on CTD phosphorylation. a Purification scheme used to generate partially purified RNAPII from HeLa cell nuclear extract. b Detection of RNAPII with three different antibodies in fractions shown in (a). c Diagram of the SILAC mass spectrometry strategy used to examine CTD phosphorylation alterations at 0 h, 30 or 60 min relative to 2 h after exposure to 8 Gy of ionizing radiation. d RNAPII and CTD phosphorylation in IR-treated cells. Un-fractionated lysates from the indicated HeLa cells at the indicated time (minutes) after 8 Gy of IR were immunoblotted with the indicated antibodies. Note that IR stimulated the phosphorylation of ATM and increased the levels of pY1-CTD without changing the levels of pS2 or pS5 CTD.
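
The time-course reading of the pY-1874/pY-1881 peptide can be made explicit by re-expressing the three reported SILAC ratios relative to the un-irradiated sample. The sketch below assumes that each reported ratio is (earlier time point) / (2 h after IR); that orientation, and the helper name `rescale_to_baseline`, are assumptions for illustration rather than details given in the text.

```python
# SILAC ratios for the pY-1874/pY-1881 peptide as stated in the text.
# Assumption: each value is (time point) / (2 h after IR), so 2 h is 1.00 by definition.
ratio_vs_2h = {"0 min": 0.91, "30 min": 0.50, "60 min": 0.75, "120 min": 1.00}


def rescale_to_baseline(ratios, baseline="0 min"):
    """Re-express every ratio relative to the chosen baseline sample."""
    base = ratios[baseline]
    return {t: round(r / base, 2) for t, r in ratios.items()}


relative = rescale_to_baseline(ratio_vs_2h)
for time_point, value in relative.items():
    print(f"{time_point:>7}: {value:.2f}")
```

Under this assumption the peptide falls to roughly half of its un-irradiated level at 30 min, partially recovers by 60 min, and is back near the starting level at 2 h, which is the transient reduction and recovery described above.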

Discussions
Phosphorylation of the CTD generates "codes" for the selective binding of cellular proteins to regulate RNA processing and chromatin structure during transcription elongation [1]. Because each of the 52 repeats of the CTD can be phosphorylated on multiple residues, and because proteins can bind to more than one repeat, the theoretical complexity of the "CTD code" is immense.
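
To give a rough sense of scale for this theoretical complexity, the short calculation below counts phosphorylation patterns under deliberately simplified assumptions: every repeat is treated as the consensus heptad, each Tyr, Ser and Thr is independently either phosphorylated or not, and no other modification or conformational state is considered. These assumptions are ours for illustration; the real CTD includes non-consensus repeats, so the result is an order-of-magnitude estimate only.

```python
CONSENSUS = "YSPTSPS"            # Y1 S2 P3 T4 S5 P6 S7
PHOSPHO_ACCEPTORS = set("YST")   # residues that can carry a phosphate

# Count phosphorylatable positions in one consensus repeat (Y1, S2, T4, S5, S7).
sites_per_repeat = sum(res in PHOSPHO_ACCEPTORS for res in CONSENSUS)
# Each site is either phosphorylated or not (simplifying assumption).
states_per_repeat = 2 ** sites_per_repeat


def phospho_states(n_repeats):
    """Number of distinct phosphorylation patterns across n consensus repeats."""
    return states_per_repeat ** n_repeats


four_repeat = phospho_states(4)   # size of the four-repeat peptides used as probes
full_ctd = phospho_states(52)     # full-length CTD, all repeats treated as consensus

print(sites_per_repeat, states_per_repeat)  # 5 32
print(f"{four_repeat:,}")                   # 1,048,576
print(f"{full_ctd:.2e}")                    # on the order of 10^78
```

Even a four-repeat peptide has over a million possible patterns under these assumptions, so the six phospho-peptides used as affinity probes sample only a very small part of the possible space.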
In this study, we show that synthetic peptides with four heptad repeats of CTD can be used to pull down mammalian cellular proteins that directly or indirectly interact with the CTD. This approach has identified proteins containing the well-established CTD-interacting domain (CID). This approach also led to the finding that CTD-tyrosine phosphorylation could interfere with the direct binding of CID to pS2-CTD, consistent with a recently published report that CTD-tyrosine phosphorylation inhibits RNAPII interaction with the Pcf11 transcription termination factor [17]. However, the mass spectrometry analysis has also identified several proteins that interacted with tyrosine/serine doubly phosphorylated CTD peptides.

Conclusions
While the CTD-peptide pull-down method cannot distinguish between direct and indirect binding to alternatively phosphorylated CTD-repeats, it provides a way to survey the proteomic landscape associated with specified