Integrating phosphoproteomics in systems biology

Phosphorylation of serine, threonine and tyrosine plays significant roles in cellular signal transduction and in modifying multiple protein functions. Phosphoproteins are coordinated and regulated by a network of kinases, phosphatases and phospho-binding proteins, which modify the phosphorylation states, recognize unique phosphopeptides, or target proteins for degradation. Detailed and complete information on the structure and dynamics of these networks is required to better understand fundamental mechanisms of cellular processes and diseases. High-throughput technologies have been developed to investigate phosphoproteomes in model organisms and human diseases. Among them, mass spectrometry (MS)-based technologies are the major platforms and have been widely applied, which has led to explosive growth of phosphoproteomic data in recent years. New bioinformatics tools are needed to analyze and make sense of these data. Moreover, most research has focused on individual phosphoproteins and kinases. To gain a more complete knowledge of cellular processes, systems biology approaches, including pathways and networks modeling, have to be applied to integrate all components of the phosphorylation machinery, including kinases, phosphatases, their substrates, and phospho-binding proteins. This review presents the latest developments of bioinformatics methods and attempts to apply systems biology to analyze phosphoproteomics data generated by MS-based technologies. Challenges and future directions in this field will be also discussed.

to site occupancy, and the phosphorylation events are usually dynamic and transient. To systematically study phosphoproteins, several highthroughput phosphoproteomic technologies, such as reverse phase protein array, phospho-specific flow cytometry, and mass spectrometry (MS) based technologies have been developed as summarized in recent reviews [7,8]. Among them, MS based technologies have become major platforms that have been routinely applied to identify phosphoproteins and phosphosites at a global, unbiased, and quantifiable level [9][10][11][12][13]. In a typical MS-based phosphoproteomic experiment, the procedure can be divided in four stages: sample preparation, including cell fractionation and protein digestion; enrichment of phosphopeptides via affinity purification; analysis via liquid chromatography (LC) coupled with tandem MS; and finally, localization and quantification of phosphosites/ phosphoproteins using bioinformatics approach (summarized in Fig. 1). Recently, several excellent articles have reviewed the technical details of phosphoproteomic experiments, i.e., the first three steps [14][15][16]. In this review, we focus on the latest developments of bioinformatics approaches to annotate and integrate phosphoproteomic data, including phosphosites identification, comparative study of phosphoproteins, and construction of phosphorylation networks.
Systems biology is a relative new field that studies the properties and models of biological systems from systematic measurements, i.e., the "omics" data from high-throughput experiments. Systems biology approaches aim to reveal the property and behavior of dynamic and complex biological systems in the system level instead of individual parts [17]. Network construction and modeling are very important parts of systems biology; while signal transduction plays essential roles in the regulation and coordination of networks, and is in many cases mediated by protein phosphorylation [4,5]. Protein kinases and phosphatases are essential components of phosphorylation process: kinases add phosphate groups to their substrates; while protein phosphatases facilitate the reverse reaction. Pathways for protein phosphorylation are large and interconnected networks, involving kinases, phosphatase and their substrates. Large amounts of phosphoproteomics data generated from high-throughput experiments make it possible to construct a range of cell type, tissue, organism, and disease-specific phosphorylation networks, and they have allowed us to investigate the functional phosphorylation associated signaling states. In this review, we will present the recent progress in this exciting field.

Phosphosite localization
MS/MS spectra generated from phosphoproteomics experiments are used to identify phosphopeptides by applying database search tools, such as MASCOT [18] and SEQUEST [19]. However, most search engines are not designed or optimized for identification of phosphopeptides, and they don't provide reliable confidence levels for the exact localization of possible phosphosites (i.e., the identification of exact amino acids phosphorylated). This becomes crucial if there are two or more potential phosphosites in detected peptides. The situation becomes even more complicated when the phosphopeptides have low abundance and when intense neutral loss peaks in the MS/MS spectra dominate signals of interest. Statistical methods have been developed to score the reliability of phosphosite localization using one of two following strategies as reviewed in [20]: estimating the probability of each candidate phosphosite based on site-determining ions in MS/MS spectra, examples include the popular A-score [21] and PTM score [9]. Alternate methods include calculating a score based on the difference of database search outputs between different site assignments for a given phosphopeptide, as implemented in Mascot Delta Score [22] and SLIP score [23]. In this section, we present details of these methods, recent developments, and results from the comparison of different phosphosite localization methods.
Gygi and colleagues proposed the A-score, a measurement of the confidence for the correct phosphosite localization in a given peptide that has two or more potential phosphosites [21]. First, based on the peptide sequence, fragment ions that are able to uniquely assign a specific phosphosite are identified and termed "site-determining ions"; then, in the corresponding MS/MS spectrum, the site-determining ions matched peaks are identified and counted for each possible phosphosite; the cumulative binomial probability is calculated using the total number of site-determining ions and the number of matched ions; finally, the probabilities for the top two candidates are used to calculate A-score by formula − 10 × log(P 1 ) + 10 × log(P 2 ), where P i is the probability for the best two candidates [21]. The PTM sore is calculated in a very similar way [9], the major difference is the selection of ions: all detected ions are used to calculate PTM score; in the case of A-score only "site-determining ions" are used. Taus et al., observed that the density of peaks in different regions is different across each spectrum, and developed phosphoRS, which extends the A-score method by taking this into account [24].
More recently, Nesvizhskii and colleagues developed LuciPHOr, which uses mass accuracy (m/z) and peak intensities for phosphosite localization scoring [25]. LuciPHor is based on the principle that the distribution of peak intensity and m/z for random peaks and peaks with fragment ions are distinguishable, and an odds ratio can be calculated for each peak. There are four steps to compute LuciPHOr score: peak assignment, dynamic training, computing odds ratio, and score calculation. First, peaks with matched ions and random peaks are identified; then, distribution of the intensity and m/z for matched peaks and random peaks are established in the dynamic training step separately. The log odds of a matched peak can be computed using the determined distribution for each peak; a cumulative log odds for a spectrum (i.e. scores for all matched ion in a spectrum are summed up) can be assigned to each candidate site, and the final LuciPHOr score is the difference of cumulative log odds between the top two candidate sites. The authors claimed that LuciPHOr identified more correct phosphosites at equivalent false location rates compared to A-score and Mascot Delta Score [25].
The search engines utilized by phosphoproteomics consider all possible phosphosites in a peptide, and provide an identification score for each phosphopeptide. Thus, the score from a search engine can be used to estimate the site localization reliability. Beausoleil et al. first proposed the normalized Mascot delta score (i.e., the difference between the top two Mascot scores divided by the best Mascot score) for scoring the phosphosite localization and found that its performance was inferior compared with A-score [21]. However, Savitski et al. showed that the Mascot Delta score (MD-score, the difference between the top two Mascot scores) achieves similar performance with A-score, and the performance of normalized Mascot delta score is impaired by the low quality search results [22]. Methods for other search engine and engine-independent methods have been developed [23,26]. One serious drawback of this type method is that the scores of two possible phosphosites are very similar, even equal, for around 10% of the cases. Thus, this method cannot distinguish them.
The performances of different methods for phosphosite localization have been compared in several studies. Marx et al. showed that MDscore, phosphoRS and PTM score have overall similar accuracies, but the results are complementary among these approaches, which implies that there is still room for improvement [27]. This and other studies also demonstrated that localization methods have to be adjusted based on experimental designs (e.g., low vs. high mass accuracy or electron transfer dissociation vs. higher collisional dissociation) [25,27,28]. However, the performances of various phosphosite localization methods have not been validated by extensive experimental examination.
False localization rate (FLR) is a general and useful concept to measure the reliability of phosphosite localization though accurate estimation of the FLR is challenging because peptides can have significantly different uncertainty in phosphosite localization [20]. Furthermore, the incorrect phosphosite localizations are not randomly distributed across peptides; they are often near neighbors to the correct sites. Consequently, the use of decoy sequences, which are commonly used for measuring the reliability of peptide and protein identification, cannot provide accurate estimation for the error of phosphosite localization. To estimate these errors, approaches of phosphorylation estimation in silico have been developed, which computationally phosphorylates amino acid residues that typically do not occur in nature [23,25]. However, the field of FLR calculation is still in its early stages, and more study is critical for improved phosphoproteomics analyses.

Phosphoproteomics data resources
After phosphosites are determined by localization methods, many are deposited into public repositories. Due to recent advances in technologies, the amount of phosphoproteomics data in the public depositories has been growing exponentially in recent years [29]. In this section, we will present an overview of data resources in phosphoproteomics including phosphosite databases and phosphorylation networks.
Over the last decade, many public depositories have been developed to store phosphosite information, including Phospho.ELM [30] and phosphoSitePlus [31], which are commonly used in phosphoproteomics studies. Phospho.ELM contains experimentally identified phosphorylation sites in eukaryotic proteins; the current version (April, 2014) is 9.0, which contains 42,573 phosphosites for 8718 phosphoproteins. The current phosphoSitePlus has 144,042 phosphosites for 16,390 proteins, and it also includes other PTM sites, such as 12,300 acetylation sites. There are two major differences between these two resources. 1) Phospho.ELM focuses more on human: the proportions of protein instances are Homo sapiens (62%) and Mus musculus (16%), while the numbers are 51% and 42% for phosphoSitePlus. 2) phosphoSitePlus covers much more journals and papers (over 13,000 papers and 600 different journals), and it adds unpublished results from Cell Signaling Technology research groups. Some general databases also contain phosphorylation information for many proteins, such as HPRD [32] and UniProt [33], while databases for some specific organisms have also been developed, like PhosPhat for Arabidopsis thaliana [34], P 3 DB for six plants species [35] and phosphoPep for Drosophila [36]. Many post-translational modification (PTM) databases include phosphosite information since it is the most common PTM; examples are Phosida [37], SysPTM [38] and dbPTM [39].
Cataloging phosphosites can help to understand mechanisms of signaling and interactions; however other information is also needed, like the corresponding kinases and phosphatases that are active towards the known phosphosites. These can be used to create wideranging phosphorylation networks. The most reliable and extensive information to date is on kinase-substrate networks. High-throughput technologies and computational approaches have been developed to identify these relevant kinase-substrate relationships, and the results are deposited into network databases, like phosphoNetworks and phosphoPOINT [40][41][42]. PhosphoNetworks contains high-resolution phosphorylation networks derived from protein microarray and MSbased experiments, and it integrates a range of approaches to store, visualize, and analyze phosphorylation data [41]. The phosphorylation networks in PhosphoNetworks connect 230 human kinases with 2591 phosphorylation sites on 652 substrate proteins. Using a computational approach to integrate phosphoproteomics, protein interaction, and gene expression data, Yang et al. established phosphoPOINT, a database of the interactions among kinases, their potential substrates, and interacting proteins [42]. phosphoPOINT includes 4195 phosphoproteins, 518 kinases, and their corresponding protein-protein interaction (PPI) partners.
Complementary to identification of phosphosites by high-throughput experiments, computational methods have been previously developed to predict phosphosites [43]. Many state-of-the-art machine-learning approaches have been used to predict phosphosites, including artificial neural network [43], support vector machine [44], and random forest [45]. Besides protein sequences, some methods also used structural and evolutionary information to improve predictions [37,43,46]. The available methods and challenges in this field have been reviewed and compared for phosphorylation site prediction in recent articles [47,48].
Although phosphoproteomics data have increased dramatically in public repositories, it is important to note that the majority of the data are generated from high-throughput technologies and not validated by methods with higher accuracy. Thus, these databases thus have several problems associated with them aside from potentially inaccurate site identification. First, we do not know the proportion of phosphosites that are functionally active as opposed to sites that are modified by off-target effects of kinases. This is discussed extensively below. Second, phosphosite localization might not be correct if neighboring potential phosphosites exist. Third, there is minimal information on the stoichiometry of phosphosites in current databases, due to the difficulty of quantification. Such information is very important and can provide clues on the significance of the phosphorylation. Highthroughput methods to measure phosphorylation stoichiometry have been proposed recently [49], but the performance has not been extensively examined.

Functionality and evolution of phosphosites
Over 150,000 phosphosites have been documented in databases, and we expect that the number will continue to increase rapidly. However, the functional effects of most of them have not been investigated with sufficient care to fully understand their significance. Recently, it was proposed that some phosphosites have little or even no biological functional effect at all [50]. This concept can be compared to that of "driver/passenger" mutations in cancer genome [51]. In this respect there may be "driver" phosphorylations that are most important for mediating functional changes as well as "passenger" phosphorylations, which occur due to the normal operation of the cellular machinery but have little or no effect on the regulatory functions. One of the arguments to support this proposal is the lack of evolutionary conservation for some phosphosites. This section presents the recent development on the comparative phosphoproteomics along with the debates about the non-functional phosphorylation.
Landry et al. compared the phosphoproteomes of yeasts and vertebrates and found that phosphoproteomes evolve rapidly in these two lineages, and phosphosites with known functions are much more conserved than those without annotated functions [52]. They estimated that around 65% of phosphosites is not conserved. By comparison, for human vs. mouse orthologs, another study reported that 10-15% phosphosites are found not to be under strong constraint [37]. It was proposed that the non-conserved phosphosites might be non-functional [50,52,53]. However, other studies demonstrate that the functional phosphosites might not be necessarily conserved, and rapidly evolving phosphoproteomes could play important roles in evolution [54]. Furthermore, evolutionary constraints might act at the phosphorylation network level, instead of at the level of individual phosphorylated sequences [55,56].
By analyzing 308 substrates of cyclin-dependent kinas Cdk1 in yeast, Holt et al. found that 90% of phosphosites is in loops and disordered regions of proteins, and the "phosphorylated regions" are conserved across yeast species (i.e., the same loop regions are phosphorylated), but the positions of phosphosites are much less constrained [54]. At least two major mechanisms have been proposed for phosphorylation to regulate protein function: conformation change and phosphopeptide binding [4], as illustrated in Fig. 2. Precise conformation changes require the exact sites to be phosphorylated (i.e., the phosphosites are well conserved across species). On the other hand, phosphopeptide-mediated binding mechanisms may not have such strong requirements and sequence alignments in fact are challenging in these regions. Thus, the authors proposed that Cdk1 modifies substrate functions by phosphopeptide-binding mechanisms: the addition of phosphates to a protein sub-domain disrupts or generates interactions with other proteins and the phosphorylation provides a general guide (or repulsion) for signaling the binding partner [54]. The authors also suggested that the less conserved phosphosites could help to generate new regulatory mechanisms [54].
Tan et al. compared the human phosphoproteome with distantly related species (fly, worm and yeast) and identified 479 phosphosites that were conserved between human and at least one other species, which represents only 11% of 4448 human phosphosites [55]. The corresponding kinases for the 479 conserved phosphosites were identified using a computational approach [57], and conserved kinase-substrate networks (778 phosphorylation events in 698 human proteins) were constructed. Their results demonstrated that part of phosphorylation events is conserved at both sequence and network levels. However, although it is plausible that the conserved sites are functional, the converse is not necessarily true; i.e., the non-conserved sites are not necessarily nonfunctional. Nevertheless, their study suggests the importance of considering the phosphorylation networks instead of phosphosites alone.

Phosphorylation and phospho-signaling networks
Phosphorylation events can be portrayed as products of phosphorylation networks, which consist of kinases, their substrates, and their corresponding functional relationships. Many kinase substrates are also kinases, and in many cases, kinases are auto-phosphorylated. Thus, kinases are highly regulated by phosphorylation, and the phosphorylation networks are complex and an important part of signaling pathways. Numerous studies have attempted to construct and characterize phosphorylation networks using both experimental and computational approaches in model organisms and diseases [9,12,13,40,55,[58][59][60][61][62][63][64]. Significant insights have been gained through these studies. For example, phosphoproteomics studies of stimulated HeLa cells demonstrated that protein phosphorylation can regulate the transcription levels of many downstream targets, and phosphosites are regulated differently within the same protein, suggesting that functional effects of phosphosites within a protein are site-dependent [9]. Another lung cancer phosphoproteomics study found different combination of activated kinases in different cancer patients, implying the importance of the personalized treatments [12]. In a recent review, Liu et al., summarized the methods to construct phosphorylation networks and outlined recent developments of identifying phospho-signatures from phosphorylation networks [65]. To avoid redundancy, this section will focus on recent developments of phosphorylation-associated signaling networks (phospho-signaling networks).
Phospho-signaling networks are the networks and pathways in which phosphorylation plays a significant role in mediation of the signal transduction. Phosphorylation networks are part of phospho-signaling networks in which there are multiple important components, such as protein kinases, phosphatases, and their substrates, and phosphobinding proteins. The relationship among kinases, phosphatases, and phospho-binding proteins is illustrated in Fig. 3. Phosphatases have the opposite function of kinases: they remove the phosphate group from phosphoproteins by hydrolyzing phosphoric acid monoesters into a phosphate ion and a molecule with a free hydroxyl group. Kinases have been the focus of research due to their importance in signaling, while phosphatases were considered more passive housekeeping enzymes and not as important as kinases. However, recent studies clearly show that both kinases and phosphatases are very important for signal transduction: in many cases, kinases control the amplitude of the signal while phosphatases govern the rate and duration of the signal [66][67][68]. Thus, phosphatase databases and phosphatasesubstrate networks have been developed to facilitate the study of phospho-signaling networks [69][70][71][72]. Based on the substrates, phosphatases can be divided into tyrosine-specific, serine/threoninespecific, dual specificity phosphatases, and others such as lipid phosphatases. There are 226 human phosphatases known to date, and around 100 of them are classified as tyrosine-specific, implicating that phosphatases are important for phospho-tyrosine related signal transduction [67,[69][70][71]73]. Interestingly, the number of known tyrosine kinases encoded in human genome is similar with the number of tyrosine phosphatases [1] (110 tyrosine kinase, and 100 tyrosine-specific phosphatases). By contrast, only 38 out of 226 human known phosphatases are serine/threonine-specific, but there are more than 400 kinases with known or predicted function to phosphorylate serine/threonine [70]. Protein phosphatase 1 (PP1) and protein phosphatase-2A (PP2A) are responsible for most dephosphorylation reactions counteracting the effects of hundreds of kinases. Their crucial roles in modifying protein functions and in phospho-signaling networks are now beginning to be appreciated [70]. The sequence analysis of phosphatase targets show that the + 1 position is preferred as proline for some phosphatase (CDC14A, CDC14B, SCP, PP5, PP2B, and PP2A), but it is not the exclusive substrate specificity: PP5, PP2A and PP2B recognize other motifs as well [70,[74][75][76]. Furthermore, in general, the substrate specificity is weak for most phosphatases (particularly serine/threonine-specific phosphatases), suggesting the catalytic domains might be guided to act on their substrates by complex mechanisms [77], for example PP2A has multiple regulatory subunits that likely control specificity [78].
Phosphatases and kinases can share substrates and docking sites or even interact directly, and biological insights can be gained through investigating these relationships. For example, in a global protein kinase and phosphatase interaction network of budding yeast, phosphatase CDC14 interacts with around 20 kinases and five other phosphatases, indicating its significance for kinase signaling and DNA damage response [79]. Another study showed that CDC14A/B and cyclindependent kinases (CDKs) share several common substrates, confirming that CDC14A/B plays a role in cell cycle regulation [70]. In the case of common substrates, simulation studies demonstrated that the phosphorylation states can reach distinct steady states and the number of such states increases with the number of phosphosites, which highlights the effects of the interactions between kinases and phosphatases [80]. Furthermore, overlapping or common docking sites for kinase and phosphatases suggests a competition mechanism for regulating the phosphorylation states of the shared substrates [81][82][83].
Phospho-binding proteins are another important component of phospho-signaling networks. One characteristic feature of cellular signaling is the separation of enzyme activity from target recognition. Target recognition is often achieved by the cooperation of proteins with binding domains and proteins with target motif sequences. The Src homology-2 or SH2 domain is one well-characterized peptide-binding domain, and it recognizes and binds phosphotyrosine residues in specific sequence contexts [4]. There are 111 human proteins containing SH2 domains, which can be divided into 38 families based on sequence [84]. Other widespread phospho-binding domains include phosphotyrosinebinding (PTB), 14-3-3, and forkhead-associated (FHA) domains [85][86][87]. These proteins have diversified functions, such as kinase, phosphatase, transcription factor, and regulation of cytoskeletal architecture and cell adhesion. Thus, they play significant roles in phospho-signaling networks. Proteins without phospho-binding domains can also be affected by phosphorylation, which is the case for proteins with PDZ or SH3 domain. The recognition target of both domains has Ser or Thr, whose phosphorylation can influence the domain interactions and consequently result in uncoupling of signaling [88][89][90]. Thus, the study of phosphorylation on signaling should go beyond kinase/phosphatase and substrates and apply a systems biology approach that integrates multiple potential interactions, each of which contains a different slice of information [17].
Although the importance of both phosphatases and phosphobinding proteins in signaling pathways has been accepted in recent years [4,73], the studies are still far less comprehensive compared to those of kinases. To complete our knowledge of signaling processes and diseases mechanism, much more work on these important proteins are needed to identify their functions, substrates, and effects using systems biology approaches in particular. Systems biology approaches and the construction of phospho-signaling networks can also reveal the high level properties of systems, improve our understanding of diseases, and help develop new biomarkers and treatments. In a recent study, Koytiger et al. developed a protein microarray technology to systematically investigate interactions between human proteins with SH2 or PTB domain and receptor/adaptor proteins with phosphotyrosines [91,92]. They constructed a protein-protein interaction network, and found that it has a very high degree of connectivity, and many hub nodes are well-know oncogene, moreover, some of them are the primary targets of new anticancer drugs (e.g. EGFR, ERBB2, RET, KIT and ABL1) [92].

Integrating phosphoproteomics with other data
Various high throughput technologies have been developed and widely applied to profile different types of cellular components, such as microarray and next generation sequencing for DNA or mRNA, and high-resolution MS for proteins and protein-protein interactions. Combination of these technologies can generate various types of omics data from the same cell type or individual, and integrating diverse omics data Fig. 3. Relationships among kinase, phosphatase and phospho-binding protein. Note that the phosphoprotein and phospho-binding protein could also be a kinase or phosphatase, which makes for a quite sophisticated phospho-signaling network.
can provide significant insights for the mechanism of cellular process or diseases [93,94]. However, integrating diverse type of data is always challenging due to the differences in the technological methods and complexity of cellular process, which include post transcription regulation and post-translational modification [95]. Recently, several studies proposed new methods to integrate phosphoproteomics with other data to investigate signaling pathways. In complex diseases like cancer, mutated genes can dysregulate several pathways simultaneously; this consequently leads to dramatic changes in levels of many genes/ proteins, such as mRNA expression, protein abundance, and phosphorylation states [92,96,97]. To identify the dysregulated gene/proteins and pathways in cancer cells, Balbin et al. have attempted to integrate phosphoproteomics with transcriptomics, and proteomics [98]. They developed a novel method to identify proteins and pathways associated with mutated KRAS in non-small cell lung cancer (NSCLC) cell lines. To overcome the low overlap among data sets (only two differentially expressed proteins shared among three data sets at the level of corrected p-value 5%), they designed a metric S to identify transcripts, proteins and phosphoproteins with differential abundances uniquely or consistently seen across different data sets. First, the logtransformed fold changes (LFC) of each protein/gene was computed, and z-transformed for each data set separately. Then, the S score of one protein is the weighted sum of z-transformed LFC; while the weight is a function of the size of data set. Using this metric, 115 differentially abundant proteins were identified at adjusted P-value 5% level. Further pathway analyses based on these 115 proteins suggested an active and targetable subnetwork composing of KRAS, MET, LCK and PAK1 in KRAS-dependent NSCLC cell lines [98].
In another study, phosphosites detected by phosphoproteomics have been integrated with cancer genomics data to study the phosphorylation signaling in cancer. Reimand et al. developed ActiveDriver, a method to identify frequently mutated phosphosites and genes/ proteins in cancer [99]. The method is based on linear regression and applies likelihood ratio test to identify genes that have phosphosites with unexpected high mutation rates. ActiveDriver was applied to 87,060 known phosphosites and mutation data from The Cancer Genomics Atlas (TCGA) and identified 150 known or candidate cancer genes whose phosphosites are affected by the mutations [100]. Further analysis showed that those mutations likely led to rewiring of the involved signaling pathways.
Phosphoproteomics has also been integrated with PPI network to reveal network properties of phosphoproteins. Yachie et al. integrated yeast phosphoproteome with protein structure, proteome abundance, and PPI networks to investigate the effects of phosphorylation on protein interaction patterns [101]. They suggest that phosphoproteins have more interactions than other proteins, and the interacting partners of phosphoproteins are more diversified, implying that phosphorylation makes a large contribution to PPI diversity and might have significant roles other than signaling. They also found that interacting proteins tend to be phosphorylated together most likely by the same kinase.
Phosphorylation is only one of dozens frequent PTMs; others include glycosylation, acetylation, methylation, ubiquitination, and SUMOylation. Recent studies revealed extensive crosstalk and interplay between phosphorylation and other PTMs [14,102], for example, the competition between phosphorylation and O-linked glycosylation for Ser and Thr [103][104][105]; and the promotion/inhibition of ubiquitination by phosphorylation [106]. Recent studies investigated the co-evolution and functional association of PTMs, and constructed a network of associated PTMs that regulates multiple functional states of proteins [107,108]. By integrating protein interaction networks with several PTMs, Woodsmith et al. found that multiple PTMs are enriched in some disordered regions, which have high mutation rate in cancer and can be important for cellular processes [109]. New high throughput methods have developed to enrich phosphorylation, ubiquitylation and/or acetylation for same sample, and enable the integrated analysis for different PTMs [110,111]. However, currently sensitive and robust high throughput approaches are available only for few types of PTM, new reliable methods are needed to study the interplay and crosstalk of all common PTMs.

Challenges and future directions
Several challenges still remain to be tackled in the phosphoproteomics field. Due to the limitation of current technologies, many phosphosites have yet to be discovered, and knowledge is limited for the majority of known phosphosites, e.g., lack of functional annotation and abundance information. Moreover, other type of phosphosites, such as phosphohistidine, is much less studied. Histidine phosphorylation was discovered five decades ago, reported to present in both prokaryotes and eukaryotes, and might be more common than phosphotyrosines [112]. Studies showed that it plays significant role in signaling processes in some species and is implicated in certain human diseases. Detection of phosphohistidine is challenging due to its chemical instability [113]. However, recent discovery of phosphohistidine antibodies might provide a valuable method to tackle this challenge [114].
Another challenge is the identification of functionally important phosphosites. Although the proposal of nonfunctional phosphorylation is controversial, some phosphosites likely have more important roles than others. Several computational methods have been developed to distinguish those impotent ones based on the protein structure, but their performance is hard to evaluate [115,116].
As mentioned above, the phospho-signaling network is composed of phosphoproteins, kinases, phosphatases, and phospho-binding proteins. Phosphoproteomics experiments provide large amounts of information about phosphoproteins but minimal information for other components. Computational methods based on motif sequences or PPI network have been developed for prediction of kinases that is associated with phosphosites [57,117], but the method is very limited for the phosphatase and phospho-binding proteins. Consequently, our knowledge of signaling process is incomplete and heavily biased. A lot more work is needed to elucidate the sequence motifs and other factors that affect substrate specificity for phosphatase and interacting partners of phosphoproteins, and these will be the basis for the development of computational methods for prediction.