Large Scale Phosphoproteome Profiles Comprehensive Features of Mouse Embryonic Stem Cells*

Embryonic stem cells are pluripotent and capable of unlimited self-renewal. Elucidation of the underlying molecular mechanism may contribute to the advancement of cell-based regenerative medicine. In the present work, we performed a large scale analysis of the phosphoproteome in mouse embryonic stem (mES) cells. Using multiplex strategies, we detected 4581 proteins and 3970 high confidence distinct phosphosites in 1642 phosphoproteins. Notably, 22 prominent phosphorylated stem cell marker proteins with 39 novel phosphosites were identified for the first time by mass spectrometry, including phosphorylation sites in NANOG (Ser-65) and RE1 silencing transcription factor (Ser-950 and Thr-953). Quantitative profiles of NANOG peptides obtained during the differentiation of mES cells revealed that the abundance of phosphopeptides and non-phosphopeptides decreased with different trends. To our knowledge, this study presents the largest global characterization of phosphorylation in mES cells. Compared with a study of ultimately differentiated tissue cells, a bioinformatics analysis of the phosphorylation data set revealed a consistent phosphorylation motif in human and mouse ES cells. Moreover, investigations into phosphorylation conservation suggested that phosphoproteins were more conserved in the undifferentiated ES cell state than in the ultimately differentiated tissue cell state. However, the opposite conclusion was drawn from this conservation comparison with phosphosites. Overall, this work provides an overview of phosphorylation in mES cells and is a valuable resource for the future understanding of basic biology in mES cells.

Embryonic stem (ES) cells, which are derived from the inner cell mass of early blastocysts (1), are pluripotent and capable of self-renewal (2). Therefore, they hold promise for cell-based regenerative medicine (3). In recent years, significant progress has been made toward understanding the characteristics of ES cells, including uncovering some notable transcription factors, such as OCT4 (4,5), SOX2 (6), and NANOG (7,8). These key regulators are thought to be critical for the differentiation of ES cells because of their unique expression profiles and their essential roles in early development. High throughput, genome-wide studies have also discovered hundreds of genes that are probably critical for the self-renewal and pluripotency of ES cells (9,10). In addition to the transcription of genes, the phenotype of ES cells was also found to be tightly regulated by the translation of proteins whose active/inactive status is often determined by phosphorylation or other post-translational modifications (11). To obtain a more comprehensive understanding of the molecular mechanisms underlying ES cell pluripotency and differentiation, some transcriptome and proteome studies have been performed on human or mouse ES cells (12-14).
Mass spectrometry (MS)-based proteomics provides the means to determine the abundance, modification, localization, and interaction of proteins on a large scale (15). Efforts at proteomics analysis have led to full-scale profiling of the proteome of mouse embryonic stem (mES) cells, resulting in the investigation of nearly 2000 proteins (16,17). Other recent studies have reported the identification of 2389 and 5111 proteins in mES cells, including some previously undetected stem cell markers (18,19).
All of the studies mentioned above provide new insights into ES cell protein expression profiling. Nevertheless, some studies indicate that numerous decisive events in cellular responses are mediated or determined not only by changes in protein abundance but also by protein post-translational modifications (20). Among these protein modifications, the phosphorylation of proteins through kinase activity was reported to act as a critical regulator of ES cell function (21). The largest phosphoproteome analysis of human ES (hES) cells performed to date identified 10,844 phosphorylation sites, which were detected via the combination of two peptide fragmentation methods (22). In addition, two groups recently identified additional phosphoproteomes and profiled their dynamic activities in hES cells (23,24). However, several major challenges remain unresolved, such as the low abundance of phosphoproteins in general, dynamic site occupancy, and inherently poor fragmentation of phosphopeptides. In contrast with hES cells, a global view of the mES cell phosphoproteome has not yet been fully achieved. mES cells are the most important models that can be used to investigate the "stemness" characteristics of stem cells; therefore, comprehensive information on the phosphorylation states and specific phosphosites in the mES cell proteome will provide an important resource for stem cell research.
In the current study, we attempted to profile a large phosphoproteome data set from mES cells to further explore their phosphorylation characteristics. Furthermore, we developed a "StemCell Project" web site to retrieve and analyze the data from the study. This information provides an overview of phosphorylation in mES cells and should be a valuable resource for the future understanding of the basic biology in mES cells.

EXPERIMENTAL PROCEDURES
mES Cell Culture-An E14.1 cell strain, a substrain of the E14 strain of ES cells derived from the blastocyst of a 129/Ola strain mouse, was purchased from ATCC. The cells were cultured under feeder-free conditions and maintained in Dulbecco's modified Eagle's medium with 15% fetal bovine serum, 0.1 mM β-mercaptoethanol (Sigma), 0.1 mM minimal essential medium nonessential amino acids (Chemicon), and 1000 units/ml leukemia inhibitory factor (Chemicon). The medium was changed every day, and cells were passaged every other day. For expansion, the E14 cells within three passages after thawing were trypsinized into a single cell suspension and plated into Petri dishes at a density of 1.0 × 10⁴ cells/cm². Two days after plating, the cells were harvested for proteomics analysis.
Alkaline Phosphatase Staining and Immunofluorescence-In parallel with the cells to be harvested, E14.1 cells were plated into a 24-well dish at a density of 1.0 × 10⁴ cells/cm². Two days after plating, cells were stained for alkaline phosphatase, SSEA-1, OCT3/4, SOX2, and NANOG expression as a quality control for pluripotency. Alkaline phosphatase staining was carried out using an Alkaline Phosphatase Detection kit (Chemicon, catalog number SCR004) following the manufacturer's instructions. For immunofluorescence, cells were washed twice with PBS, fixed with 4% PFA in PBS for 30 min, permeabilized with 0.2% Triton X-100 for 10 min, and blocked with 1% BSA in PBS for 2 h. Cells were then incubated with a primary antibody diluted 1:100 in 1% BSA in PBS for 2 h, washed twice with 1% BSA in PBS, incubated with a FITC- or Texas Red-conjugated secondary antibody diluted 1:100 in 1% BSA in PBS for 2 h, washed twice with 1% BSA in PBS, covered with PBS, and scanned for fluorescence under a microscope. Antibodies used were as follows: SSEA-1 (480, Santa Cruz Biotechnology, Santa Cruz, CA, sc21702), OCT3/4 (C-10, Santa Cruz Biotechnology, sc5279), SOX2 (R&D Systems, MAB2018), and NANOG (R&D Systems, AF2729).
Phosphoprotein Purification-To purify the phosphoproteins from cell lysate, the PhosphoProtein Purification kit from Qiagen (Valencia, CA; catalog number 37101) was applied. The procedures were conducted according to the manufacturer's instructions. A total of 5 mg of crude protein sample was used as the starting sample, and ~200 μg of phosphorylated proteins was obtained for further analysis.
Strong Anion Exchange (SAX)/TiO2 Fractionation and Enrichment of Phosphopeptides-SAX chromatography was used for fractionation of phosphopeptides of mES cells. Peptides (2 mg) were dissolved in pH 8.5 buffer and loaded onto an SAX column (1.0 mm × 15 cm, Column Technology Inc.) at a flow rate of 0.3 ml/min. During the loading step, the flow-through fraction was collected. The bound peptides were fractionated by a linear gradient at 0.2 ml/min from pH 8.5 to pH 2.0 in 150 min (7.5 min/fraction), and 20 fractions were collected in total. Based on the UV trace, all the fractions were combined, resulting in five fractions that were used for TiO2 enrichment. The flow-through was enriched four times using TiO2.
The phosphopeptide enrichment procedure using TiO2 beads was described elsewhere (25-27) and was used with some modification. In detail, the TiO2 beads (GL Sciences, Tokyo, Japan) were preincubated first in 200 μl of loading buffer (65% ACN, 2% TFA, saturated by glutamic acid) for acidification. The fractions of the SAX chromatography were redissolved in 200 μl of loading buffer and then incubated with ~2 mg of TiO2 beads. The peptide-bead slurry was incubated and centrifuged, and the supernatant was discarded. The incubated beads were then washed with 800 μl of wash buffer I (65% ACN, 0.5% TFA) and wash buffer II (65% ACN, 0.1% TFA). The bound peptides were eluted once with 200 μl of elution buffer I (300 mM NH4OH, 50% ACN) and twice with 200 μl of elution buffer II (500 mM NH4OH, 60% ACN). All the incubation, washing, and elution procedures were performed with end-over-end rotation for 20 min at room temperature. The eluates were dried down and reconstituted in 0.1% FA in H2O for MS analysis.
Subcellular Fractionation and Protein Digestion-The stepwise subcellular fractionation procedures were performed using the ProteoExtract™ Subcellular Proteome Extraction kit from Calbiochem with minor modification. Briefly, adherent feeder-free cultured ES cells were gently washed with 1.0 ml of cold wash buffer twice on ice. Detached cells were carefully transferred to a tube and fractionated as well. With ice-cold extraction buffers I-III, the subcellular fractions (cytosolic and nuclear extractions) were collected for proteomics analysis. Protein concentrations were determined by the method of Bradford using BSA as the standard (Bio-Rad). The tryptic digestion of 400 μg of protein from each fraction was processed as described elsewhere (28).
Phosphatase Treatment-Antarctic phosphatase treatment was performed on total cell lysate according to the manufacturer's instructions (M0289L, Antarctic phosphatase, New England Biolabs) (29). A total of 200 μg of lysate was redissolved in 20 μl of reaction buffer (supplier-provided), 2 μl of 10% SDS, and 158 μl of H2O. After vortex mixing, 20 μl of 1 unit/μl Antarctic phosphatase was added, and the sample was incubated at 37°C for 2 h. Finally, the proteins for Western blot validation were recovered after ultrafiltration at 15,000 × g for 1 h at 4°C.
Total cell lysate (50 μg each) with and without phosphatase treatment was subjected to 8 and 15% SDS-PAGE, and then the proteins in the gel were transferred to a nitrocellulose membrane. The membranes were incubated first with the appropriate primary antibodies (anti-NANOG: M-149, Santa Cruz Biotechnology; anti-REST: 07-579, Upstate, Temecula, CA) and then with HRP-conjugated secondary antibodies for 45 min. The proteins were detected by enhanced chemiluminescence (ECL Plus, GE Healthcare). Signals of bands from Western blotting were scanned with PDQuest GS-710 on a flat bed scanner.
Yin-Yang Multidimensional Liquid Chromatography (MDLC) On-line Fractionation and Mass Spectrometry Analysis-To achieve an in-depth view of mES cells in global and phosphoproteomics analyses, the strategy for peptide fractionation included three methods on the yin-yang MDLC system (30). The system involves three subsystems with pH continuous gradient elution: 1) strong cation exchange/reversed-phase (SCX/RP) LC-MS/MS (10 steps), 2) SAX/RP LC-MS/MS (10 steps), and 3) SCX/SAX LC-MS/MS (10 steps each). For the SCX/SAX method, the flow-through fractions of peptide samples were collected after loading onto the SCX column, lyophilized, redissolved, and then fractionated on the SAX/RP LC-MS/MS subsystem. The peptide fractionation was performed on an on-line two-dimensional LC-MS/MS system using the pH continuous on-line gradient developed by our laboratory (31) with some modifications. In detail, two 400-μg tryptic peptide mixtures of cytosol/nuclear fractions were dissolved in 80 μl of pH 2.5 buffer (pH 8.5 for SAX) before being loaded on an SCX (or SAX) column (320 μm × 100 mm, Column Technology Inc.) followed by 10 continuous on-line pH gradients from pH 2.5 to pH 8.5 for SCX (pH 8.5 to pH 2.0 for SAX). Each fraction from SCX (or SAX) was eluted to the trap columns (Agilent Technologies) alternately and separated on the analytic column (75 μm × 150 mm, Column Technology Inc.). The reversed-phase gradient for analysis was from 2 to 40% mobile phase (0.1% formic acid in acetonitrile (v/v)) in 180 min at 300 nl/min after the split. Another equivalent peptide mixture of cytosol (or nuclear) fraction was dissolved in 80 μl of pH 2.5 buffer and loaded on the SCX column. The flow-through fraction was collected, lyophilized, and redissolved in 80 μl of pH 8.5 buffer prior to loading on the SAX column. The fractionation on the SCX column and the SAX column elution were performed individually on a two-dimensional LC-MS/MS system as mentioned before.
For fractions obtained from subcellular fractionation, the cytosol and nuclear fractions were analyzed repeatedly by SCX and SAX fractionation. Collectively, there were 40 fractions for the cytosol and nuclear extractions, respectively.
All the fractions from RP separation were analyzed in the LTQ-Orbitrap (ThermoFisher Scientific, San Jose, CA) as described (32). The nanospray ionization source was mounted, and the voltage was set at 1.90 kV. Normalized collision energy was 35%. Data-dependent collection of the 10 most intense ions with collision-induced dissociation from each full scan (performed in the Orbitrap) was selected for MS2 analyses (performed in the ion trap). Dynamic exclusion settings were as follows: repeat count, 2; repeat duration, 30 s; exclusion duration, 90 s. The resolution of the Orbitrap mass analyzer was set at 100,000 (m/z 400) for the precursor ion scans.
Data Collection and Database Search-The peak lists of all 108 raw files used for the database search were produced by MaxQuant software (default parameters, version 1.0.13.13) (33). The generated .par and .msm files were searched using the Mascot search engine (version 2.2.0, Matrix Science) (34) against a concatenated forward and reversed mouse International Protein Index protein sequence database (IPI mouse, version 3.52) containing 55,303 protein sequences and 262 commonly observed contaminants (111,130 protein entries in total). For the Mascot search, the following parameters were used. Cysteine carbamidomethylation was selected as a fixed modification, and methionine oxidation, protein N-terminal acetylation, and phosphorylation on serine, threonine, and tyrosine were selected as variable modifications. Up to two missing cleavage points of trypsin were allowed. The precursor ion mass tolerance was 7 ppm, and the fragment ion mass tolerance was 0.5 Da for MS/MS spectra. The database search results (.dat files) were further processed with MaxQuant.
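The concatenated forward and reversed database and the mass tolerances described above can be illustrated with a minimal Python sketch. It is not the MaxQuant or Mascot implementation; the function names (`reverse_decoy`, `ppm_to_da`), the `REV_` prefix, and the toy accessions and sequences are hypothetical and serve only to show how a reversed decoy set doubles the number of entries and how a ppm tolerance translates into an absolute m/z window.

```python
# Illustrative target-decoy sketch; names and toy data are assumptions,
# not part of the MaxQuant or Mascot software.

def reverse_decoy(entries, prefix="REV_"):
    """Return the target entries followed by reversed-sequence decoys."""
    decoys = [(prefix + accession, sequence[::-1]) for accession, sequence in entries]
    return list(entries) + decoys

def ppm_to_da(mz, ppm):
    """Convert a relative tolerance (ppm) into an absolute m/z window."""
    return mz * ppm / 1e6

# Toy target database (hypothetical accessions and sequences).
targets = [("P1", "MKTAYIAK"), ("P2", "MSSPDLPLK")]
combined = reverse_decoy(targets)

# Entry count from the text: forward plus reversed copies of
# 55,303 IPI mouse sequences and 262 contaminants.
n_entries = 2 * (55_303 + 262)

# Width of the 7 ppm precursor tolerance at m/z 1000.
window = ppm_to_da(1000.0, 7.0)
```

Under these assumptions the entry count reproduces the 111,130 protein entries stated for the search database.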
All identifications from contaminants were removed first. For protein identification, two peptides were required in which at least one was unique in the database (35). If the identified peptide sequences of one protein were equal to or contained the peptide set of another protein, then the two proteins were grouped together by MaxQuant and reported as one protein group. The false discovery rates (FDRs) for peptide, protein, and phosphosite (the function "apply site FDR separately" was activated) were all set at 1%, and a minimum length of six amino acids was used for peptide identification. The high confidence phosphosites were filtered based on the PTM scores as described (20). In detail, class I phosphorylation sites are defined by a localization probability of at least 0.75 and a probability localization score difference greater than or equal to 5 (20). The distributions of the site localization probabilities, Mascot scores, and PTM scores of the identified phosphopeptides are shown in supplemental Fig. 3. All the analyses for phosphorylation were based on class I phosphorylation sites. The identified proteins (supplemental Table 1), (phospho)peptides (supplemental Tables 2 and 3), and MS/MS spectra for phosphopeptides (supplemental Fig. 5) can also be accessed at our StemCell Project web site. The MS raw data were converted using the PRIDE Converter tool (36) and can be retrieved from the Proteomics Identifications (PRIDE) database (http://www.ebi.ac.uk/pride) (37) with accession numbers 12897-12908.
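The class I criterion can be expressed as a simple filter. The following Python sketch assumes that both thresholds are inclusive and uses hypothetical site records; the field names and values are invented for illustration and do not come from the data set.

```python
# Minimal sketch of a class I phosphosite filter; thresholds follow the
# text, inclusiveness of the 0.75 cutoff is an assumption.

def is_class_one(loc_prob, score_diff, min_prob=0.75, min_diff=5.0):
    """True if a site meets both localization criteria."""
    return loc_prob >= min_prob and score_diff >= min_diff

# Hypothetical site records (not real identifications).
sites = [
    {"site": "siteA", "loc_prob": 0.99, "score_diff": 23.1},
    {"site": "siteB", "loc_prob": 0.80, "score_diff": 3.2},   # fails score difference
    {"site": "siteC", "loc_prob": 0.60, "score_diff": 8.0},   # fails probability
]

class_one = [s["site"] for s in sites
             if is_class_one(s["loc_prob"], s["score_diff"])]
```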
Mouse ES Cell Differentiation and Multiple Reaction Monitoring Mass Spectrometry-To induce differentiation, the mES cells were cultured following withdrawal of LIF for various times (0, 12, and 24 h). To validate the phosphosites identified in the current study, a multiple reaction monitoring (MRM) experiment was also performed to explore the abundance profiling at different time points. Isotope-labeled peptides (Fmoc-[13C6]Leu; labeled form) and peptides with the same sequence without the isotope (unlabeled form) were selected, synthesized (GL Biochem, Shanghai, China), and purified by the manufacturer (>95% purity). Dried peptides were reconstituted in 0.1% formic acid. The entire analysis was performed by reversed-phase high performance liquid chromatography on a 1200 HPLC system (Agilent Technologies) coupled on line to a 6410 QQQ mass spectrometer (Agilent Technologies). In detail, the one-dimensional LC system was set up using one C18 trap column (300 μm × 5 mm, Agilent Technologies) followed by an analytical C18 column (150 μm × 100 mm, Column Technology Inc.). To elute the peptides, a gradient of 5-40% buffer B (90% acetonitrile in 0.1% formic acid) was used. The parameters for the mass spectrometer were as follows: capillary voltage at 4000 V, drying gas at 300°C at 3.0 liters/min, and nebulizer gas at 18 p.s.i. MRM transitions were optimized with MassHunter software (Agilent Technologies) and acquired at unit resolution in both the Q1 and Q3 quadrupoles to maximize sensitivity. Finally, two transitions per peptide (a quantifier and a qualifier) were selected and monitored. For quantitation analysis, 40 μg of tryptic peptides from mES cells were resolubilized in 0.1% formic acid spiked with 1 pmol of internal standard peptides.
The labeled non-phosphopeptide PSSEDLPLQGSPDSSTSPK with 3+ charge state (m/z 645.6483; L, [13C6]Leu) and labeled phosphopeptide PSSEDLPLQGS*PDSST*SPK with 3+ charge state (m/z 698.9591; L, [13C6]Leu; the asterisk indicates the phosphorylated residue) were used as internal standard peptides and monitored by MRM.
For each spiked and endogenous peptide, the best transitions (non-phosphopeptide: 645.6 → 818.4 and 643.6 → 818.4; collision energy, 13; dwell time, 80 ms; phosphopeptide: 699.0 → 733.8 and 697.0 → 730.8; collision energy, 9; dwell time, 80 ms) were used for further analysis. In addition to the qualifier, the retention time was also used to confirm the peptide elution pattern. The peak area was calculated by MassHunter software (Agilent Technologies). All the results obtained from MRM experiments were averaged from triplicate measurements to decrease variation. All the MRM spectra can be found in supplemental Fig. 4.
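The precursor m/z values of the labeled and unlabeled peptides can be approximated from standard monoisotopic masses. The Python sketch below is an illustration only: it assumes that exactly one of the two leucines carries the [13C6] label (the text does not state which one), and the helper name `precursor_mz` and the rounded mass constants are choices made here rather than values taken from the study.

```python
# Approximate precursor m/z from monoisotopic residue masses (Da).
# Assumption: one [13C6]Leu per labeled peptide.

RESIDUE = {"P": 97.05276, "S": 87.03203, "E": 129.04259, "D": 115.02694,
           "L": 113.08406, "Q": 128.05858, "G": 57.02146, "T": 101.04768,
           "K": 128.09496}
WATER = 18.01056
PROTON = 1.007276
PHOSPHO = 79.96633   # HPO3 added per phosphorylated residue
C13_6 = 6.02013      # mass shift of six 13C atoms replacing 12C

def precursor_mz(sequence, charge, n_phospho=0, n_labeled_leu=0):
    """Monoisotopic m/z of a protonated peptide ion."""
    mass = sum(RESIDUE[aa] for aa in sequence) + WATER
    mass += n_phospho * PHOSPHO + n_labeled_leu * C13_6
    return (mass + charge * PROTON) / charge

SEQ = "PSSEDLPLQGSPDSSTSPK"
light_nonphospho = precursor_mz(SEQ, 3)
heavy_nonphospho = precursor_mz(SEQ, 3, n_labeled_leu=1)
light_phospho = precursor_mz(SEQ, 3, n_phospho=2)
heavy_phospho = precursor_mz(SEQ, 3, n_phospho=2, n_labeled_leu=1)
```

With these constants the heavy precursors fall within 0.01 m/z of the reported 645.6483 and 698.9591, and the light precursors round to the Q1 values 643.6 and 697.0 used for the endogenous peptides.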
Defining Stemness Proteins-Using previous studies, we predefined a list of 233 stemness genes, which could probably uncover the core stem cell characteristics that underlie self-renewal and generate differentiated progeny. Of these genes, 216 have been proven to be enriched in mouse embryonic, neural, and hematopoietic stem cells (38), and 15 were identified to be ES cell-specific markers (19). SALL4 (39) and REST (40) were confirmed as two main regulators in maintaining the pluripotency of stem cells. In our work, a total of 52 proteins and 22 phosphoproteins from the stemness genes were confidently identified (Table I and supplemental Table 5).
Protein Overlap Analysis-From three previous publications on mES cells, we extracted 846 (16), 1871 (17), and 2389 (18) proteins. Proteins with an identical sequence in each study were regarded as unique proteins, and the three studies yielded a total of 3870 unique proteins referred to in the "previous studies (3870)" data set (Fig. 2A). The same procedure was used to compare the three large data sets in Fig. 2A (previous studies (3870), previous study (5111), and our study (4581)), and 1060 shared proteins were found among them.
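The merging and overlap procedure can be sketched with set operations, using the protein sequence as the identity key. The sequences and variable names below are toy placeholders, not entries from any of the cited data sets.

```python
# Toy sketch of sequence-based redundancy removal and overlap counting.

def unique_by_sequence(*datasets):
    """Merge data sets, counting identical sequences as one protein."""
    merged = set()
    for dataset in datasets:
        merged.update(dataset)
    return merged

# Hypothetical sequences standing in for three earlier studies.
study_a = {"MKTAYIAK", "MSSPDLPLK"}
study_b = {"MSSPDLPLK", "MGSLQR"}
study_c = {"MGSLQR", "MAEEK"}
previous_studies = unique_by_sequence(study_a, study_b, study_c)

# Two further hypothetical data sets to intersect with the merged one.
previous_study = {"MSSPDLPLK", "MKTAYIAK"}
our_study = {"MSSPDLPLK", "MAEEK", "MNEWPK"}

shared = previous_studies & previous_study & our_study
```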
Conservation Analysis of Phosphoproteins and Phosphosites-Conservation of phosphoproteins between humans and mice was defined as the ratio of orthologous phosphoproteins to total phosphoproteins. The phosphoproteins and phosphosites of hES cells were extracted from published studies (22-24). The redundancy of phosphoproteins was reduced if their sequences from individual data sets were identical. The homolog gene data from mice and humans were downloaded from NCBI (HomoloGene Release 63) and were used as a reference to find the homologous and orthologous phosphoproteins between humans and mice; only one copy of multiple paralog genes in mice or humans was selected in the following analysis. As a result, the degree of phosphoprotein conservation in ES and tissue cells was 25% (1015/4019) and 17% (605/3473), respectively. A χ² test was used to test the significance of the conservation difference of phosphoproteins between ES and tissue cells with input numbers of 1015, 4019, 605, and 3473. The resultant p value was less than 1e-10. To analyze the conservation differences of phosphoproteins between ES and tissue cells in specific gene function modules (Fig. 5A), the orthologous phosphoproteins in ES cells and tissue cells were classified in gene ontology terms (biological processes and molecular functions). The significance of the conservation difference of the individual gene ontology terms was tested using the same method described above for global phosphoproteins.
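The significance test on the phosphoprotein counts can be reproduced approximately with a Pearson χ² test on a 2 × 2 table. The Python sketch below makes two assumptions that the text does not confirm: the table is laid out as conserved versus non-conserved phosphoproteins for ES and tissue cells, and no continuity correction is applied. The function name `chi_square_2x2` is introduced here for illustration.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Counts from the text: orthologous (conserved) and total phosphoproteins.
es_conserved, es_total = 1015, 4019
tissue_conserved, tissue_total = 605, 3473

chi2, p_value = chi_square_2x2(
    es_conserved, es_total - es_conserved,
    tissue_conserved, tissue_total - tissue_conserved,
)
```

Under this layout the p value falls below 1e-10, in line with the reported result; a different arrangement of the four input numbers would give a different statistic.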
Phosphosite conservation between mice and humans was only considered in the orthologous phosphoproteins of the two species: the 1015 phosphoproteins in ES cells and 605 phosphoproteins in tissue cells. The orthologous phosphosites between mice and humans were obtained by performing a BLASTP search of each orthologous phosphoprotein in ES cells and tissue cells against itself. For each orthologous phosphoprotein, the highest score of the local region of the protein sequence from the BLASTP results was used to find the orthologous phosphosites. Only one copy of the multiple paralogous phosphosites was selected from the BLASTP results in mouse or human. Conservation of phosphosites between humans and mice was defined as the ratio of orthologous phosphosites to total phosphosites. As a result, the degree of phosphosite conservation in ES and tissue cells was 29% (1924/6667) and 40% (1222/3074), respectively. A χ² test was used to test the significance of the conservation difference of phosphosites between ES and tissue cells with input numbers of 1924, 6667, 1222, and 3074. The resultant p value was less than 1e-10. The orthologous phosphosites in ES and tissue cells were classified in gene ontology terms (biological processes and molecular functions) based on the InterPro domain annotation file in ipi.HUMAN.IPC and ipi.MOUSE.IPC (ftp://ftp.ebi.ac.uk/pub/databases/IPI). The conservation differences of phosphosites between ES and tissue cells in specific gene function modules are shown in Fig. 5B. The significance of the conservation difference for the individual gene ontology terms was tested using the same method described above for global phosphosite analysis. The phosphosites, which were identified only in ES cells or tissue cells, were annotated in gene ontology terms and are shown in Fig. 5C with different colors.
All significance analyses of specific terms of phosphosites between ES and tissue cells were conducted with a χ² test (supplemental Table 7), and only the specific term with a p value less than 0.05 was considered.
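Locating an orthologous phosphosite amounts to carrying a residue position across a pairwise alignment. The Python sketch below does not run BLASTP; it assumes that the gapped alignment strings and their start coordinates have already been obtained, and the function names and toy sequences are hypothetical.

```python
def map_site(aligned_query, aligned_subject, query_pos,
             query_start=1, subject_start=1):
    """Map a 1-based query residue position onto the subject sequence
    through a gapped alignment; return None if it aligns to a gap
    or lies outside the aligned region."""
    q_idx = query_start - 1
    s_idx = subject_start - 1
    for q_char, s_char in zip(aligned_query, aligned_subject):
        if q_char != "-":
            q_idx += 1
        if s_char != "-":
            s_idx += 1
        if q_char != "-" and q_idx == query_pos:
            return None if s_char == "-" else s_idx
    return None

def percent_conserved(n_orthologous, n_total):
    """Conservation as the percentage of orthologous sites among all sites."""
    return 100.0 * n_orthologous / n_total

# Toy alignment (hypothetical sequences, '-' marks a gap).
mouse = "MKS-PDSS"
human = "MKSAPD-S"

site_in_human = map_site(mouse, human, 4)    # mouse Pro-4
gap_site = map_site(mouse, human, 6)         # mouse Ser-6 aligns to a gap
```

The same ratio function applied to the counts in the text gives the reported 29 and 40% after rounding.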

Experimental Strategy
The entire work flow is shown in Fig. 1. To confirm the undifferentiated state of the E14 cells used in the experiment, we first observed their morphology and performed alkaline phosphatase/SSEA-1 staining (supplemental Fig. 1). Before harvesting, the cells were examined under a microscope to ensure that the mES cells were more than 90% confluent and had a high nucleus/cytosol ratio (supplemental Fig. 1A). Immunostaining of cells cultured under the same conditions for the murine-specific stem cell surface markers alkaline phosphatase and SSEA-1 and for the expression of OCT4, SOX2, and NANOG, which are master regulators of mES cells, strongly demonstrates the undifferentiated state of the mES cells used in the assays (supplemental Fig. 1, B, C, and D).
To reduce the complexity of the mES cell proteome, we used standard cell subcellular fractionation methods (see "Experimental Procedures") and obtained the cytosolic and nuclear extractions. The quality of the fractionation procedure was examined in parallel by evaluating the expression of the cytosolic and nuclear markers Hsp90 and Lamin B, respectively, by Western blotting. The predominance of the respective markers in the relevant protein extraction confirmed the adequacy of the fractionation (supplemental Fig. 2). To maximize the coverage of the mES cell phosphoproteome, we used two additional strategies for phosphorylation enrichment at the protein and peptide levels. One strategy was to use a purification kit utilizing anti-Ser(P)/Thr(P) antibodies, and the second strategy was to use SAX off-line fractionation combined with TiO2 enrichment (see "Experimental Procedures"). Finally, all peptide fractions were individually separated using the on-line yin-yang MDLC system or one-dimensional LC coupled with an LTQ-Orbitrap mass spectrometer with high mass accuracy (absolute average mass accuracy of 1.14 ppm). For fractions obtained from subcellular fractionation, the cytosolic and nuclear fractions were analyzed repeatedly by SCX and SAX fractionation.
When combined with all of the stringently filtered peptides, we report a collective mES cell proteome of 4581 proteins with a final FDR of less than 1% at the protein level (supplemental Table 1). The data set was compared with four previous large scale studies on mES cells (16-19), and the results were categorized into three data sets (Fig. 2A). Altogether, 1060 proteins overlapped and comprised 27, 21, and 23% of the individual data sets. These independent investigations of the mES cell proteome were cross-complementary and increased the number of mES cell proteins to 9297 regardless of differences in the instruments, search engines, and final criteria used.
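The overlap percentages follow directly from the sizes of the three data sets given under "Experimental Procedures." A short Python check, with dictionary keys chosen here for readability, is:

```python
# Share of the 1060 overlapping proteins in each data set (sizes from the text).
shared = 1060
dataset_sizes = {
    "previous studies": 3870,
    "previous study": 5111,
    "our study": 4581,
}

overlap_percent = {name: round(100 * shared / size)
                   for name, size in dataset_sizes.items()}
```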

mES Cell Phosphoproteome Profiling
Establishment of mES Cell Phosphoproteome-To comprehensively profile the phosphoproteome of mES cells, we used not only affinity purification for phosphoproteins and SAX off-line fractionation combined with TiO2 enrichment for phosphopeptides but also subcellular fractionation followed by chromatographic enrichment for phosphopeptides. With an FDR below 1% for peptides and phosphosites (see "Experimental Procedures"), our approach confidently identified 3970 distinct phosphosites (supplemental Table 2 and supplemental Fig. 5) distributed over 2880 unique phosphopeptides (36,682 phosphopeptides; supplemental Table 3) derived from 1642 phosphoproteins; this is the largest phosphorylation data set obtained for mES cells to date. Furthermore, to make efficient use of the data set and share it for future research (42), we developed a web site, the StemCell Project, where users can retrieve and analyze proteome and phosphoproteome data from large scale mES cell studies.
In general, various types of phosphopeptides could be recovered as previously reported with respect to their isolation specificity (43). Using multiplex and complementary strategies, we confidently identified 1570, 1509, and 1936 unique phosphorylation sites using the SAX/TiO2, anti-Ser(P)/Thr(P), and yin-yang MDLC strategies, respectively (Fig. 2B). Of the 2880 unique phosphopeptides identified, 79% were singly phosphorylated peptides, 17% were doubly phosphorylated peptides, and 4% were multiply phosphorylated peptides. Furthermore, the distribution of phosphoserine (Ser(P)), phosphothreonine (Thr(P)), and phosphotyrosine (Tyr(P)) was determined to be 87, 12, and 1%, respectively, which is close to the phosphorylated amino acid distribution found in hES cells (22). To achieve an overview of the identified modifications, we used the PTMBlast tool in SysPTM (44) to check the status of the 3970 phosphosites. In total, 1463 (37%) phosphorylation sites were found in SysPTM, and 2507 (63%) novel phosphorylation sites were identified for the first time in mES cells (Fig. 2C).
Representation of mES Cell Phosphoproteome-We investigated the phosphoproteome representation of mES cells by molecular weight, isoelectric point (pI), and amino acid composition (Fig. 2, D-F). As a comparison, the corresponding attributes of 4581 identified proteins and proteins in the IPI mouse database (version 3.52) were used as the background. The molecular weight distributions of the three data sets displayed a similar trend; however, the proteome and phosphoproteome shifted slightly toward a higher range of molecular weights. This shift could arise partly because of the limitation of the current work flow in which small proteins might be lost. The pI profiles appear as two waves representing acidic and basic proteins, and all cross in the range from pH 2 to pH 13. Because acidic phosphopeptides were more likely to be identified based on the strategy used (32), the proteins in the phosphoproteome and proteome were more likely to fall into the acidic range of the isoelectric point than those in the database. For the amino acid distribution analysis, we could not find an obvious pattern of occurrence for any amino acid appearing in the phosphoproteome, although most phosphorylation events occur at serine, threonine, and tyrosine sites. In general, the mES cell phosphoproteome analyzed shows no obvious artificial bias and can thus be taken as a reasonable representation.

FIG. 1. Flowchart of strategy used to profile mES cell phosphoproteome. Total cell lysates were digested using trypsin and fractionated by SAX followed by TiO2 enrichment (SAX/TiO2). Each fraction was separated by one-dimensional LC-MS/MS. Using antibody-based purification, the cell lysates were enriched for phosphorylated serine/threonine proteins (anti-Ser(P)/Thr(P) enrichment). Meanwhile, the cells were harvested, and subcellular fractionation was used to reduce the complexity. The two resulting fractions (cytosolic and nuclear) were digested and separated twice using an on-line yin-yang MDLC system. FT, flow-through fraction.
Some signaling pathways have been reported to function in maintaining the undifferentiated state of mES cells. These pathways include the JAK-STAT3, MAPK, SHP2-Ras-ERK, PI3K-AKT, mammalian target of rapamycin (mTOR), TGFβ, and Wnt signaling pathways and many others (45-50). Many protein components of these pathways were identified in the current work, and some were phosphorylated (supplemental Table 4). Although enrichment and fractionation methods were applied, the relatively low discovery rate for phosphorylation sites in these pathways may indicate that more intensive approaches for phosphorylation analysis are needed, especially for targeted research on some signaling pathways.
Kinases and phosphatases dominate the reversible conversion between the phosphorylated and dephosphorylated protein forms, which is a key process in controlling cascade signal transduction in mES cells. We identified several Ser/Thr, Tyr, and dual specificity kinase/phosphatase families (supplemental Table 4), including the CAMK, CDK, MAPK, and MAP2K families. A total of 18 phosphorylated kinases (24 specific phosphosites) and seven phosphorylated phosphatases (12 specific phosphosites) were identified.
Most signals from the surroundings are transmitted into living cells through transmembrane proteins. We predicted the transmembrane status of the 4581 identified proteins using TMHMM2.0 (51). In total, 148 proteins were predicted to contain transmembrane helices, and 28 of those proteins were phosphorylated (supplemental Table 4).
Identification of Stemness Phosphoproteins-To further explore the phosphosites in some key regulators of mES cells, 52 stemness proteins were chosen for analysis (supplemental Table 5). The functions of these proteins were closely associated with mES cell characteristics (see "Experimental Procedures"). Twenty-two of the proteins were found to contain specific phosphosites that had not been identified previously (Table I). It is worth noting that although some of these proteins were previously identified as being phosphorylated (NANOG (29) and REST (52)), this study was the first to confidently identify a series of specific phosphosites in these critical mES cell proteins. These proteins include NANOG (Table I). Furthermore, to validate the phospho-NANOG and -REST proteins, we also performed a dephosphorylation treatment with phosphatase. As shown in Fig. 3, C and D, the treated NANOG and REST are present as single bands on immunoblots (29,52).

TABLE I. Prominent phosphorylated stem cell markers identified in current study
For all identified genes and proteins related to markers, see supplemental Table 5. For all confident identifications, the phosphorylation sites are marked with an asterisk. The literature search of the phosphorylation sites was based on the UniProt database (71).
To determine the biological relevance of the novel phosphorylation sites identified in the current study, an MRM assay was developed to monitor any change in NANOG phosphorylation following LIF withdrawal at three different time points (0, 12, and 24 h). Because of the highly sensitive and robust quantitative results provided by MRM, the phosphorylated and non-phosphorylated forms of NANOG could be monitored in a single experiment (Fig. 3E and supplemental Fig. 4). In agreement with previous reports (53) and as shown in Fig. 3E (643.6 → 818.4), the level of NANOG decreased following LIF withdrawal. Our results clearly show that both peptide forms of NANOG exhibit decreased abundance. However, the abundance of the phosphorylated peptide (Fig. 3E; 697.0 → 730.8) decreased rapidly immediately following LIF withdrawal (from 0 to 12 h), and no decrease could be observed from 12 to 24 h. In contrast, the abundance of the non-phosphopeptide began to fall drastically from 12 to 24 h in the absence of LIF.

Phosphorylation Motif Discovery
To identify significant phosphorylation motifs in ES cells compared with ultimately differentiated tissue cells, the Motif-X algorithm (41) was used to extract phosphorylation motifs from the mES cell phosphorylation data that we generated and the hES cell phosphorylation data collected from the published literature (22)(23)(24)(54). The phosphoproteins and phosphosites from mouse and human tissue cells that were used as the background data set were extracted from the SysPTM database, which stores tissue phosphorylation data collected from individual mass spectrometry-based work (see "Experimental Procedures"). To be more specific, phosphopeptides containing the 3970 confidently identified phosphosites of mES cells were prealigned at ±6 amino acids from the central phosphorylated Ser/Thr/Tyr without any extension to the end of the N or C terminus. The resulting 3957 prealigned peptides were submitted to the Motif-X algorithm. A total of 6058 prealigned phosphopeptides of hES cells, 8927 prealigned phosphopeptides of mouse tissue cells, and 6121 prealigned phosphopeptides of human tissue cells were also submitted in parallel to the Motif-X algorithm. As a result, 19 and 26 phosphorylation motifs were discovered in mES and hES cells, respectively (supplemental Table 6). Interestingly, only the Pro-directed SP phosphorylation motif was found in both mouse and human ES cells (Fig. 4A). We also asked whether this motif occurred in the identified phosphorylated master gene products. Surprisingly, the SP motif was prevalent in some pivotal stem cell factors (NANOG, REST, SALL4, and UTF1) (Fig. 4B). The phosphosites located in the motif were potential substrates of the p38_group and CDK2,3_group (Table I). The identification of their true kinases would clarify the function of these phosphorylation events in mES cells.
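The prealignment step described above can be illustrated with a minimal sketch. It assumes that a phosphosite is discarded whenever a full ±6 window would run past either terminus, which is one plausible reading of "without any extension" and of why 3957 rather than 3970 peptides were submitted. The function name `prealign`, the zero-based site index, and the toy sequence are hypothetical and are not taken from the study or from Motif-X.

```python
from typing import Optional

PHOSPHO_RESIDUES = "STY"  # Ser, Thr, Tyr


def prealign(sequence: str, site_index: int, flank: int = 6) -> Optional[str]:
    """Return the (2 * flank + 1)-residue window centered on a phosphosite.

    site_index is zero-based (an assumption of this sketch). Returns None
    when the window would extend past the N or C terminus, so such sites
    are dropped instead of being padded.
    """
    if sequence[site_index] not in PHOSPHO_RESIDUES:
        raise ValueError("central residue must be Ser, Thr, or Tyr")
    start = site_index - flank
    end = site_index + flank + 1
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]


# Toy protein sequence, invented purely for illustration.
toy = "MGAKDLSPEERTSPLKQW"
windows = [prealign(toy, i) for i in (6, 11, 12)]
print(windows)
```

In this toy case the sites at indices 6 and 11 yield 13-residue windows, whereas the site at index 12 is dropped because fewer than six residues follow it.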

Phosphorylation Conservation in ES Cells
To investigate evolutionary differences in the phosphorylation events between undifferentiated ES cells and ultimately differentiated tissue cells, we performed a conservation analysis of the phosphoproteins and phosphosites. The conservation analysis of phosphoproteins was conducted as described elsewhere (55). The data were derived from the current work on mES cells, other phosphorylation sources of hES cells, and mouse and human tissue cells (see "Experimental Procedures").
It is important to determine whether the phosphoproteins are more conserved in the undifferentiated ES cell state or in the ultimately differentiated tissue cell state. Conservation of phosphoproteins between humans and mice was defined as the ratio of orthologous phosphoproteins to total phosphoproteins (see "Experimental Procedures"). Of the 1533 and 3501 orthologous phosphoproteins in mES and hES cells, respectively (22)(23)(24), a total of 1015 phosphoproteins were found to overlap with a degree of conservation of 25% (1015 of 4019). On the other hand, of the 2897 and 1181 orthologous phosphoproteins in mouse and human tissue cells, respectively, 605 shared phosphoproteins were found with a degree of conservation of 17% (605 of 3473). The significant χ² value of the two data sets (p < 1.0e−10) indicated that, based on the present data, phosphoproteins were more conserved in ES cells than in tissue cells on a global basis (see "Experimental Procedures"). The global phosphoprotein conservation analysis described above was further separated into specific gene ontology terms (Fig. 5A and supplemental Table 7), which suggested that the biological function modules of potassium ion transport, ion channel activity, and transmembrane transport were more conserved in ES cells, whereas hydrolase activity, oxidoreductase activity, transcription, and many other activities were less conserved when compared with tissue cells. A phosphosite was defined as conserved between humans and mice only if the orthologous sites were phosphorylated in the two species (55). The phosphosite conservation analysis was only considered in orthologous phosphoproteins of the two species (1015 phosphoproteins in ES cells and 605 phosphoproteins in tissue cells). The degree of phosphosite conservation was calculated using a method similar to that used for phosphoproteins.
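The phosphoprotein conservation figures reported above can be reproduced with a short calculation. The sketch below takes the denominator to be the union of the two species' orthologous phosphoproteins (mouse + human − shared), which matches the reported totals of 4019 and 3473. The χ² test is written as a standard 2 × 2 Pearson test without continuity correction; this is an assumption, because the exact test configuration is given only in "Experimental Procedures". The function names are hypothetical.

```python
import math


def conservation(n_mouse: int, n_human: int, n_shared: int):
    """Degree of conservation = shared / (mouse + human - shared)."""
    union = n_mouse + n_human - n_shared
    return n_shared / union, union


def chi_square_2x2(conserved_1: int, total_1: int,
                   conserved_2: int, total_2: int):
    """Pearson chi-square (1 degree of freedom, no continuity correction)
    comparing two conserved/non-conserved proportions."""
    a, b = conserved_1, total_1 - conserved_1
    c, d = conserved_2, total_2 - conserved_2
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts quoted in the text for phosphoproteins.
es_ratio, es_total = conservation(1533, 3501, 1015)
tissue_ratio, tissue_total = conservation(2897, 1181, 605)
chi2, p_value = chi_square_2x2(1015, es_total, 605, tissue_total)

print(f"ES cells:     {es_ratio:.1%} (1015 of {es_total})")
print(f"Tissue cells: {tissue_ratio:.1%} (605 of {tissue_total})")
print(f"chi-square = {chi2:.1f}, p = {p_value:.1e}")
```

Under these assumptions the p value falls well below the 1.0e−10 threshold quoted in the text. The same two functions apply unchanged to phosphosite counts.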
Of the 2995 and 5596 orthologous phosphosites in mES and hES cells, respectively, 1924 shared phosphosites were discovered with a degree of conservation of 29% (1924 of 6667). Additionally, of the 2853 and 1443 orthologous phosphosites in mouse and human tissue cells, respectively, 1222 phosphosites were found in the two species with a degree of conservation of 40% (1222 of 3074). Intriguingly, the conservation analysis of phosphosites yielded the opposite of the result observed for phosphoproteins. In general, the phosphosites in tissue cells were more conserved than those in ES cells (p < 1.0e−10). In tissue cells, more conserved phosphosites were found in domains possessing ATP binding and transcription regulation functions than in ES cells (Fig. 5B). However, many of the dominant phosphosites in ES cells were located in domains involved in nucleic acid binding, translation regulation, RNA processing, and DNA methylation (Fig. 5C).

DISCUSSION

In recent years, the proteome of mES cells has been widely studied; however, the phosphoproteome of mES cells has not been characterized. In the current study, we integrated multiplex approaches to enrich the phosphoproteins and phosphopeptides from mES cells and confidently identified 3970 phosphosites from 1642 phosphoproteins. These data establish a large scale phosphorylation data set that contains a list of novel specific phosphosites in some key regulators of mES cells. They also allow for subsequent extensive bioinformatics analysis of the comprehensive phosphorylation features in mES cells.
The phosphorylation data that we generated included 22 phosphorylated stemness proteins with 87 specific phosphosites, most of which could be phosphorylated by the CK2, CDK2,3, or MAPK kinase group. CK2 is a positive regulator of the canonical Wnt signaling pathway, which is involved in stem cell fate determination (46,56,57). CDKs dominantly control the rhythm of the cell cycle and thus regulate the self-renewal of stem cells (58). Classical MAPK signaling pathways have been confirmed to regulate stem cells from the early steps of differentiation to maturation (59,60). By phosphorylating the core stemness proteins, which induces them into active/inactive states, all of the putative kinases could orchestrate the regulation of stem cell self-renewal or differentiation (24).
NANOG is a key regulator in the maintenance of pluripotency and self-renewal in mES cells and was previously suggested to be phosphorylated, but the exact phosphosite(s) was unclear (8,29). For the first time, we have successfully identified the exact phosphorylation sites of the NANOG protein (Fig. 3A). It is significant that the novel phosphosite(s) are located in the N terminus of NANOG, which appears to be a serine-rich sequence related to transactivation (61). Additionally, we performed a quantitative MS experiment to analyze NANOG phosphopeptide changes following cell differentiation. Unexpectedly, we observed different trends for the decrease in the phosphopeptide and non-phosphopeptide levels (Fig. 3E). Phosphatases can control the activity of phosphorylated proteins and protein degradation. The two different profiles may imply a unique regulatory mechanism for the phosphorylation events of NANOG. Moreover, such modifications of "master" regulators could play a role in the mechanism of pluripotency (62). However, further efforts are needed to study the biological importance and functional significance of this mechanism. Additionally, some notable proteins regulated by NANOG, including RIF1 (63) and REST (Table I), were shown to be phosphorylated in our study. For the first time, we identified the specific phosphosites in REST (Fig. 3B). REST is a newly discovered element that is required for maintaining the self-renewal and pluripotency of mES cells (40).
The abundant SP motif in pivotal factors, including NANOG, REST, SALL4, and UTF1, may suggest that the responsible kinases are active in mES cells. It could also be hypothesized that the phosphorylation events of these prominent transcription factors occur downstream of MAPKs and cell cycle-related signaling pathways. Therefore, they probably act as key hubs connecting transcriptional regulatory networks and signaling pathways in mES cells, events that have been found to be necessary in maintaining the undifferentiated state of mES cells (50, 64-70).
Moreover, we studied the evolutionary difference between ES and tissue cells of human and mouse. At the global phosphoprotein level, the degree of phosphorylation conservation was higher in ES cells than in tissue cells, implying that phosphoproteins are more likely to be engaged in the essential functions of ES cells. In contrast, the degree of phosphorylation conservation was reversed at the global phosphosite level, which could suggest that the phosphosites are more variable in ES cells. The more rapid evolutionary change of the phosphosites might indicate more flexible regulatory activities in ES cells, which may contribute to adaptation to new developmental processes in differentiation. As for specific biological function modules, some might be more conserved in ES cells than in tissue cells at the phosphoprotein or phosphosite level. For example, ion channel activity and transmembrane transport were more conserved in ES cells with respect to phosphoproteins (Fig. 5A). A series of phosphosites found only in ES cells were located in domains of DNA- or RNA-related processes or functions including nucleic acid binding, DNA methylation, and mRNA/rRNA processing. These biological activities have been shown to be closely connected to the maintenance of pluripotency and self-renewal in ES cells.

CONCLUSION

ES cells possess the remarkable potential to develop into all cell types, which is believed to hold great promise for regenerative medicine and cancer therapies. In the current study, we focused on the large scale analysis of phosphorylation events in mES cells at the proteomic level. We obtained the most comprehensive phosphorylation event data in mES cells to date. Altogether, 52 protein markers of mES cells were identified, including SOX2, OCT4, REST, UTF1, and NANOG. Twenty-two of those proteins were found to be phosphorylated. Additionally, we constructed a database that contains all of the information reported in the current study.
These data extend our knowledge of the global phosphorylation state in mES cells and are a valuable resource for the understanding and future study of the basic biology of mES cells.