Extracellular Protein Phosphorylation, the Neglected Side of the Modification

The very existence of extracellular phosphorylation has long been questioned, although casein phosphorylation was discovered a century ago. Moreover, several modification sites located on secreted proteins or on the extracellular or lumenal domains of transmembrane proteins have been catalogued in large-scale phosphorylation analyses, although most of these studies did not consider this aspect of subcellular localization. Our review presents examples in which additional analyses of already public data sets revealed a wealth of information about this “neglected side” of the modification. We also summarize the accumulated knowledge about extracellular phosphorylation, including the discovery of Golgi-resident kinases and the particular difficulties encountered in targeted analyses. We hope that future phosphorylation studies will not ignore phosphorylation outside the cell, and that further discoveries will shed more light on its biological role.

Molecular & Cellular Proteomics 16: 10.1074/mcp.O116.064188, 1-7, 2017.
Phosphorylation is the main regulatory modification of intracellular proteins and is involved in the majority of cellular processes. The existence of this modification outside the cell has long been disputed, despite the fact that caseins, which are abundant in milk, were the first phosphoproteins discovered, in the late 1800s (1). Gradually, other secreted proteins and peptide hormones were reported to carry this modification. Recently (as of November 3, 2015), a query of the UniProt database (2) yielded 131 proteins when the entries were filtered using the following criteria: species, Homo sapiens; subcellular localization, secreted; post-translational modification (PTM), phospho*; reviewed entries only (supplemental Table S1). Approximately three-fourths of these proteins are indeed modified in vivo on an amino acid side chain that normally resides "outside" of the cell, a category that also includes the lumenal side of the endoplasmic reticulum (ER) and the Golgi. The accumulating evidence has slowly changed the old dogma that restricted phosphorylation to the intracellular milieu, yet our understanding of the regulation and exact role of the modification on extracellular proteins is still very limited.
Are These Proteins Secreted Phosphorylated or Modified Extracellularly?-In the classical secretion pathway, a short signal sequence directs the protein to the ER; the protein then travels through the Golgi network and is finally released into the extracellular space. During this transport the protein may undergo post-translational modifications. A specific kinase activity, distinct from that of cytoplasmic kinases, was detected in Golgi preparations isolated from lactating mammary gland, liver, spleen, and other organs (3,4). This kinase was shown to phosphorylate proteins within a Ser-Xxx-Glu/pSer motif, to resist inhibitors of other protein kinases, to use Mn2+ instead of Mg2+ as an activator, and to use ATP but not GTP as the phosphate donor (4). However, this kinase escaped identification for a long time. The first Golgi-resident kinase to be identified was Four-jointed (Fj) in Drosophila (5). Fj was shown to phosphorylate cadherin domains of the transmembrane proteins Fat and Dachsous, albeit with different site specificities. A PSI-BLAST (Position-Specific Iterated Basic Local Alignment Search Tool) search performed with Fjx1, the human ortholog of Fj, identified members of the Family with sequence similarity 20 (Fam20) and 198 (Fam198) as further potential Golgi kinases (6). Eventually, Fam20C was identified as "the real casein kinase", featuring all the characteristics described decades earlier and listed above. It resides in the Golgi but can also be found in an N-terminally truncated, secreted form (6,7). Phosphorylation of its substrate proteins occurs intracellularly (7,8), yet Fam20C may also exert its activity extracellularly. Fam20C was shown to phosphorylate a wide array of secreted proteins involved in biomineralization, lipid homeostasis, wound healing, cell adhesion, and migration (7).
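Because the Fam20C consensus is a short linear motif, candidate sites can be screened in a protein sequence with a regular expression. The sketch below is a hypothetical helper, not part of the cited studies. It assumes one-letter residue codes, with an already phosphorylated Ser written as lowercase `s`. The optional `exclude_pro` flag, also an assumption of this sketch, drops motifs with Pro in the Xxx position. A motif match only flags a candidate; it does not establish that Fam20C modifies the site.

```python
import re


def fam20c_motif_sites(seq: str, exclude_pro: bool = False) -> list[int]:
    """1-based positions of Ser residues that start an S-x-E/pS motif.

    Assumed notation: uppercase one-letter codes, with an already
    phosphorylated Ser written as lowercase "s" (it can serve as the +2
    residue, and can itself be the Ser under test).
    """
    middle = "[^P]" if exclude_pro else "."
    # Zero-width lookahead, so overlapping motifs (e.g. S-S-E-E) are all reported.
    pattern = re.compile(rf"(?=[Ss]{middle}[Es])")
    return [m.start() + 1 for m in pattern.finditer(seq)]
```

For the toy sequence `KSAEPSPEQSGs`, the function returns `[2, 6, 10]`. With `exclude_pro=True` it returns `[2, 10]`, because the Ser at position 6 is followed by Pro.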
Another newly identified kinase, vertebrate lonesome kinase (VLK), which shows a preference for Tyr residues, also localizes to the secretory pathway (9,10). High VLK expression was detected in platelets, and upon stimulation with thrombin receptor activating peptide (TRAP), VLK was rapidly and quantitatively released from platelets. Besides phosphorylating substrate proteins within the secretory pathway, secreted VLK was found to modify targets in the extracellular environment. Although numerous VLK substrates were identified, no consensus motif could be determined (10).
The releasate of activated platelets has also been found to contain other kinases that can phosphorylate extracellular proteins if sufficient ATP is available. Platelet-secreted casein kinase(s) were shown to phosphorylate protein S. This phosphorylation enhanced its cofactor activity for activated protein C, thereby affecting its anticoagulant properties (11). Upon thrombin stimulation, platelets released a cAMP-dependent protein kinase (PKA) that phosphorylated vitronectin (12). Several protein kinase C isozymes were also found to be secreted from platelets upon thrombin stimulation (13).
Proteins lacking a signal peptide, and thus not entering the classical ER/Golgi secretion pathway, may be released into the extracellular space by unconventional means of protein export, either via direct translocation across the membrane or via secretion in various vesicles (14). A secretome analysis of the MCF7 human breast cancer cell line found that only half of the detected proteins were secreted via the classical ER/Golgi pathway or shed from the plasma membrane; the other half used unconventional secretion (15). Intracellular phosphorylation of these proteins may precede their export, and in certain cases phosphorylation was shown to be a prerequisite for transport. Fibroblast growth factor 2 (FGF2) is secreted by direct translocation across the plasma membrane, and its phosphorylation at Tyr82 by Tec kinase is required for secretion (16). Heat stress-induced translocation of annexin 2 depends on phosphorylation of Tyr23 (17). The high mobility group box 1 protein (HMGB1) is a DNA chaperone that also acts extracellularly in the regulation of inflammation. Phosphorylation of HMGB1 within its nuclear localization signal (NLS) regions results in relocation of the protein from the nucleus to the cytoplasm, followed by secretion (18). It should be noted, however, that phosphoproteins or kinase activities detected outside the cell may also be artifacts originating from broken cells, and may therefore not be physiologically relevant.
Large-scale MS Analyses Significantly Contribute to Extracellular Phosphorylation Data Accumulation-Most information on the phosphorylation of extracellular proteins can be retrieved from mass spectrometric data, and large-scale analyses of tissues or whole cells are a major source of such data. Interestingly, the resulting huge data sets are practically never interrogated for the subcellular localization of the modification, and the extracellular phosphorylation sites detected are not reported separately. This is a pity, because such information has been available for years but has mostly been ignored. We performed this analysis for a few data sets with the following caveats: (1) we accepted the reported phosphorylation data at face value, (2) we trusted the UniProt subcellular localization labels, with some reservations, (3) we kept proteins secreted via alternative pathways, and (4) we subjected proteins that localize to the ER/Golgi to topology prediction using the Constrained Consensus Topology prediction method (CCTOP, http://cctop.enzim.ttk.mta.hu) (19) and kept only those with lumenal modification sites.
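The four filtering rules above can be expressed as a small decision function. This is a minimal sketch under stated assumptions, not the authors' actual pipeline. The `Protein` record, the keyword matching on UniProt-style localization strings, and the `(start, end, side)` topology segments (as a predictor such as CCTOP might supply them) are all hypothetical representations chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical keywords matched against UniProt-style localization strings.
SECRETED_KEYWORDS = ("Secreted",)
ER_GOLGI_KEYWORDS = ("Endoplasmic reticulum", "Golgi apparatus")


@dataclass
class Protein:
    accession: str
    localizations: list[str]
    # Predicted topology as (start, end, side) segments, 1-based and inclusive;
    # side is "lumenal", "cytoplasmic" or "membrane" (assumed labels).
    topology: list[tuple[int, int, str]] = field(default_factory=list)


def has_label(protein: Protein, keywords: tuple[str, ...]) -> bool:
    """True if any localization string contains any of the keywords."""
    return any(k in label for label in protein.localizations for k in keywords)


def side_of(protein: Protein, position: int) -> Optional[str]:
    """Predicted topological side of a residue, or None if not covered."""
    for start, end, side in protein.topology:
        if start <= position <= end:
            return side
    return None


def keep_site(protein: Protein, position: int) -> bool:
    # Rule (1): the reported site is accepted at face value; no re-scoring here.
    # Rules (2)-(3): a "Secreted" label is trusted, whether export is classical
    # or unconventional.
    if has_label(protein, SECRETED_KEYWORDS):
        return True
    # Rule (4): ER/Golgi proteins count only if the site is predicted lumenal.
    if has_label(protein, ER_GOLGI_KEYWORDS):
        return side_of(protein, position) == "lumenal"
    return False


if __name__ == "__main__":
    # Golgin B1 sites discussed in this review; only the cytoplasmic segment
    # stated there (residues 1-3235) is encoded.
    golgin_b1 = Protein("Q14789-2", ["Golgi apparatus membrane"], [(1, 3235, "cytoplasmic")])
    print([keep_site(golgin_b1, p) for p in (128, 139, 676, 1756, 1758, 3145)])
```

Run as a script, the block prints `[False, False, False, False, False, False]`. Every listed Golgin B1 site falls within the cytoplasmic segment, so none of them passes rule (4).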
Many plasma proteins are synthesized in hepatocytes; therefore, the in-depth phosphoproteome analysis of a liver homogenate performed by Bian et al. proved to be a rich source of extracellular phosphorylation data (20). Their workflow of "enzyme-assisted" two-dimensional reversed phase chromatography, supplemented with phosphopeptide enrichment using Ti(IV)-immobilized metal affinity chromatography (IMAC), resulted in phosphosite mapping of 54 secreted and 51 ER- or Golgi-localized proteins with lumenal phosphorylation sites (supplemental Table S2). Similarly, a large amount of extracellular phosphorylation data could be retrieved from the analysis of breast cancer xenograft (human in mouse) and human ovarian tumor samples performed by Mertins et al. (21). Combined, 489 secreted and 275 ER/Golgi-resident phosphoproteins were identified in that study using high pH reversed phase fractionation followed by Fe(III)-IMAC phosphopeptide enrichment and LC-MS/MS analysis (supplemental Table S3). Strong cation exchange chromatography (SCX) followed by TiO2 enrichment, performed by Sharma et al. in the analysis of a human cancer cell line, also yielded extensive phosphoproteome coverage (22). The authors of this study did pay some attention to extracellular phosphorylation. However, they considered only Fam20C phosphorylation within Ser-Xxx-Glu/pSer motifs, and did not permit Pro in the "middle" position. Their search yielded 101 phosphosites on 64 proteins. Unfortunately, a match to a three-residue motif does not identify the kinase responsible. The authors did not check whether the modified regions ever enter the ER/Golgi system, so the list contains numerous cytoplasmic modification sites that were possibly phosphorylated by kinases other than Fam20C. For example, Golgin subfamily B member 1 (UniProt Q14789-2) was included in the list with phosphorylated Ser128, Ser139, Ser676, Ser1756, Ser1758, and Ser3145. However, according to the protein's topology, all of these sites are located in its cytoplasmic domain (residues 1-3235). Our re-evaluation, which was not restricted to Fam20C phosphorylation, revealed 71 secreted and an additional 90 ER/Golgi-resident phosphoproteins in their study (supplemental Table S4).
Extracellular Phosphorylation Analysis from Cell Culture Media-Although whole cell analyses may contain extensive data on the phosphorylation of secreted proteins, more specific information can be extracted from targeted analyses of the culture media of specific cell lines. Tagliabracci et al. compared the phosphoproteomes secreted into the culture media by wild type and Fam20C knockout/knockdown HepG2 liver cells (7). As Fam20C is highly expressed in lactating mammary gland and mineralized tissue, they extended their analyses to breast epithelial and osteoblast-like cell lines to map modification sites linked to Fam20C kinase activity. Using Fe(III)-IMAC phosphopeptide enrichment, they identified over 100 secreted phosphoproteins in the culture media of these cell lines, and the majority of the modification sites responded to Fam20C depletion. Interestingly, about one-third of the phosphosites whose intensity decreased upon Fam20C depletion did not match the consensus Ser-Xxx-Glu/pSer sequence. Phosphosite mapping of insulin-like growth factor-binding protein 1 phosphorylated in vitro by Fam20C also revealed modifications outside Ser-Xxx-Glu/pSer motifs. The authors speculated that the recognition motif of Fam20C might be significantly broader than originally suspected. However, kinase assays performed with synthetic peptides could extend Fam20C activity only to the Ser-Xxx-Gln-Xxx-Xxx-Asp/Glu-Asp/Glu-Asp/Glu sequences present in proline-rich phosphoprotein 1 (PRP1) (7,23). In addition, one has to consider the potential existence of "phosphorylation cascades" in the Golgi, in which eliminating a kinase affects not only its direct substrates but also all "downstream" modifications.
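Once each Fam20C-dependent site is classified against the known motifs, the "one-third" figure above becomes a simple tally. The sketch below is hypothetical. It assumes that sites are supplied as (sequence, 1-based position) pairs, with phosphoserines written as lowercase `s`. It checks only two motifs: the canonical Ser-Xxx-Glu/pSer motif and the extended PRP1-type motif named in the text.

```python
import re

# S-x-E/pS, the canonical Fam20C consensus ("s" = phosphoserine).
CANONICAL = re.compile(r"S.[Es]")
# S-x-Q-x-x-D/E-D/E-D/E, the extended motif reported for PRP1 peptides.
EXTENDED = re.compile(r"S.Q..[DE]{3}")


def classify_site(seq: str, pos: int) -> str:
    """Classify the phosphoserine at 1-based position pos of seq."""
    residue = seq[pos - 1]
    if residue not in "Ss":
        raise ValueError(f"position {pos} is {residue!r}, not Ser")
    # Normalize the site itself to "S" so both patterns anchor on it.
    window = "S" + seq[pos:]
    if CANONICAL.match(window):
        return "canonical"
    if EXTENDED.match(window):
        return "extended"
    return "non-consensus"


def non_consensus_fraction(sites: list[tuple[str, int]]) -> float:
    """Fraction of (sequence, position) sites that match neither motif."""
    labels = [classify_site(seq, pos) for seq, pos in sites]
    return labels.count("non-consensus") / len(labels)
```

In the toy sequence `ASAEGSLQKKDEEGSTPK`, Ser2 is classified as canonical, Ser6 as extended, and Ser15 as non-consensus. The non-consensus fraction for these three sites is therefore 1/3.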
Cancer cells are known to differ in intercellular communication and in their cell adhesion and migration properties, which might partly be due to changes in the phosphorylation pattern of the proteins they secrete into their microenvironment. In a phosphorylation analysis of the secretomes of different luminal and basal type breast cancer cell lines, over 1700 phosphoproteins were identified altogether by combining hydrophilic interaction liquid chromatography (HILIC) with TiO2 phosphopeptide enrichment (24). Approximately half of these phosphoproteins were predicted to be extracellular or membrane proteins based on gene ontology (GO) annotation and on bioinformatic analyses using the SignalP (classical secretion), SecretomeP (unconventional secretion), and TMHMM (transmembrane domains) algorithms. The proteins showing subtype-specific phosphorylation sites were involved in, among other processes, cell signaling and interaction, antigen presentation, and cellular assembly and organization.
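Combining several predictors requires a rule that resolves their outputs into one call per protein. The text does not state how the predictions were combined, so the function below is only an illustrative sketch. The `Predictions` fields stand in for outputs already parsed from SignalP, SecretomeP, TMHMM, and GO. The precedence order is an assumption. The 0.6 SecretomeP cutoff is an assumed default that should be checked against the tool's documentation.

```python
from dataclasses import dataclass


@dataclass
class Predictions:
    has_signal_peptide: bool  # e.g. a positive SignalP call
    secretomep_score: float   # SecretomeP NN-score (non-classical secretion)
    tm_helices: int           # TM helices outside the signal peptide (e.g. TMHMM)
    go_extracellular: bool    # GO cellular component mentions the extracellular region


def predicted_location(p: Predictions, secretomep_cutoff: float = 0.6) -> str:
    """Assign one coarse category; rule order and cutoff are assumptions."""
    if p.tm_helices > 0:
        return "membrane"
    if p.has_signal_peptide:
        return "classical secretion"
    if p.secretomep_score >= secretomep_cutoff:
        return "unconventional secretion"
    if p.go_extracellular:
        return "extracellular (annotation only)"
    return "intracellular"


def non_intracellular_fraction(preds: list[Predictions]) -> float:
    """Share of proteins predicted to be extracellular or membrane-associated."""
    calls = [predicted_location(p) for p in preds]
    return sum(c != "intracellular" for c in calls) / len(calls)
```

Checking the transmembrane count first keeps membrane proteins with signal peptides out of the secreted categories. Other orderings are equally defensible.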
Analytical Challenges in Extracellular Phosphorylation Studies-Although the secretome can be studied from the culture media of different cell lines, phosphorylation analysis performed directly on readily available body fluids has the greatest potential for clinical use. Only a limited number of such reports exist so far, and the phosphorylation data they provide fall far short of those from studies targeting intracellular phosphorylation. In cerebrospinal fluid (CSF), 25-45 phosphoproteins were identified from 60-200 µg of starting material, either by combining SCX fractionation with phosphoSer-specific molecularly imprinted polymers or by using TiO2 enrichment (25,26). Plasma/serum phosphoproteome analyses fared even worse in this respect: 40-100 µl of plasma (~2-5 mg protein) yielded 30-70 phosphoproteins using TiO2 enrichment (either alone or in combination with SCX) or Fe(III)-IMAC (27-29). The largest plasma phosphopeptide data set was reported by Zawadzka et al. in parallel with the breast cancer cell line secretome analysis (24). On average, 130 phosphoproteins could be identified from 1 ml of plasma (~50 mg protein) from control subjects or breast cancer patients using HILIC fractionation combined with TiO2 enrichment. Phosphorylation analysis of saliva also yielded moderate results: a total of 85 phosphoproteins were identified from ~50 mg of protein after SCX fractionation and Fe(III)-IMAC phosphopeptide enrichment (30).
One of the main reasons for the relatively low yield of body fluid phosphorylation analyses is the low level of the modification, combined with the complexity and high dynamic range of the proteins present in such matrices. In some cases, immunodepletion was used to remove up to the 14 most abundant plasma proteins and thereby increase the depth of analysis (24,25,28). However, a comparison of TiO2 enrichment applied to serum with or without depletion of albumin and IgG showed a considerable loss of phosphopeptides in the depleted fraction (27). Depletion of highly abundant proteins was also found to be detrimental in the phosphoproteome analysis of CSF (26). Indeed, off-target removal has been reported for albumin-specific and other immunodepletion kits (31-33). This may be due to the role of albumin as a carrier of various compounds, or to nonspecific binding to the affinity matrix. Additionally, ten of the 14 proteins targeted by the MARS-14 column are themselves phosphorylated (29). Hexapeptide bead libraries (ProteoMiner) offer another means of reducing the dynamic range and were used in the saliva phosphoproteome analysis by Stone et al. (30). This approach, however, may distort quantitative results.
A highly efficient enrichment pipeline is a prerequisite for successful phosphorylation analysis of complex matrices such as serum. The examples above show the repertoire of tools available for isolating phosphopeptides. The most widely used are the metal affinity techniques, TiO2 and IMAC, often in combination with other orthogonal separations. Although very powerful, these methods are far less selective in CSF or serum than in whole cell lysates. In the latter, selectivity (the proportion of enriched peptides that are phosphorylated) as high as 90% could be achieved (34), whereas in body fluids selectivity is typically around 30-60% (26,27,29). This is a consequence of the high dynamic range and the overall low extent of the modification. Additionally, TiO2 and, more recently, IMAC were shown to co-isolate sialylated glycopeptides under certain conditions (29,35,36). In the HeLa phosphopeptide data set obtained after SCX fractionation and subsequent TiO2 enrichment, discussed earlier (22), 12.5% of the MS/MS spectra contained the glycan-specific HexNAc oxonium ion (m/z 204.087) (37). N-glycosylation is the most frequent and abundant modification on secreted and membrane proteins; thus, an even higher glycopeptide contribution is expected when extracellular or membrane proteins are probed. Indeed, approximately one-third of the MS/MS spectra contained glycan-specific oxonium ions in data sets obtained after Fe(III)-IMAC phosphopeptide enrichment of serum (29,38). Glycopeptides interfere with phosphorylation analysis at multiple levels. First, they reduce enrichment efficiency by competing for the active sites of the affinity material. Second, coeluting glycopeptides can lead to undersampling of phosphopeptides in the LC-MS analysis of the enriched fraction. In selected cases this interference was taken into account: in the CSF analysis, Bahl et al. (26) removed N-glycans by peptide N-glycosidase F treatment prior to phosphopeptide enrichment.
As deglycosylation may adversely affect protein solubility, it should be performed on the digested sample. Deglycosylation, however, could be cost-prohibitive, especially when large numbers of serum samples are processed. Additionally, removal of N-glycans converts the Asn residues to Asp, which increases the acidic character of the peptide and can induce nonspecific binding to the affinity material. For IMAC, we demonstrated that solvents with a high acetonitrile content promote glycopeptide binding to the sorbent through a mixed mode of metal affinity and hydrophilic interaction (29). Eliminating the latter component, either by lowering the acetonitrile content or by replacing acetonitrile with methanol (a strong solvent in HILIC), completely abolished glycopeptide capture. TiO2 enrichment is also performed in solvents with a high acetonitrile ratio; therefore, a similar mixed mode of metal affinity and hydrophilic interaction is expected to govern glycopeptide capture on this sorbent.
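Two quantities discussed in this paragraph are simple ratios: enrichment selectivity, and the share of MS/MS spectra carrying the HexNAc oxonium ion. The sketch below computes both from already parsed data. The input shapes, the `"Phospho"` label, and the 10 ppm matching tolerance are assumptions, not parameters taken from the cited studies. The input shapes are a peak list per spectrum and a set of modification names per identified peptide. Detecting only the m/z 204.087 ion will miss glycopeptide spectra in which this ion is absent or below the noise level.

```python
# m/z of the HexNAc oxonium ion quoted in the text.
HEXNAC_OXONIUM_MZ = 204.087


def has_oxonium_ion(peaks_mz: list[float], target: float = HEXNAC_OXONIUM_MZ,
                    tol_ppm: float = 10.0) -> bool:
    """True if any fragment peak lies within tol_ppm of the target m/z."""
    tol = target * tol_ppm * 1e-6
    return any(abs(mz - target) <= tol for mz in peaks_mz)


def glycan_spectrum_fraction(spectra: list[list[float]], **kwargs) -> float:
    """Fraction of MS/MS spectra flagged as likely glycopeptide spectra."""
    flagged = sum(has_oxonium_ion(peaks, **kwargs) for peaks in spectra)
    return flagged / len(spectra)


def enrichment_selectivity(peptide_mods: list[set[str]]) -> float:
    """Fraction of identified peptides carrying at least one phosphate."""
    phospho = sum("Phospho" in mods for mods in peptide_mods)
    return phospho / len(peptide_mods)
```

With this approach, a serum enrichment in which one-third of the spectra are flagged would show directly how much of the instrument time went to glycopeptides rather than phosphopeptides.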
Bioinformatics Challenges-Large-scale phosphopeptide data sets are processed in an automated fashion, and the database search results rarely undergo manual validation. Whether a spectrum contains sufficient information for modification site assignment depends on the MS/MS activation method and on the peptide sequence. Even if the phosphopeptide is identified correctly, the reliability of the site assignment can be questionable, especially if multiple Ser/Thr residues are in close proximity. In such cases all candidate sites are often reported as modified, which calls for caution when handling large-scale mass spectrometric data. Improvements in this respect came from two directions. Additional bioinformatic tools were developed that calculate the probability of site assignments from prior search outputs (39-42), and a scoring algorithm was integrated directly into a search engine to measure the confidence of modification site localization (43). At present, however, only a single search engine, Protein Prospector, directly indicates when there is insufficient information for reliable site assignment. There are also recent initiatives to make all mass spectral data publicly available. Despite these initiatives, public databases contain errors. PhosphoSitePlus (44) lists over 1000 phosphoproteins that may be considered extracellular or transmembrane proteins according to UniProt or GO annotation. Among the modification sites listed for these proteins, the proportion of phosphorylated Tyr residues is unusually high. This may be partly because a large share of the information in this database was obtained from affinity enrichments using phosphoTyr antibodies, but it also reflects the uncertainties of site assignment. In addition, human or software errors may be introduced into curated lists.
For example, Tyr225 in osteopontin and Tyr158 in insulin-like growth factor-binding protein 1 were incorrectly listed as Fam20C phosphorylation sites in the UniProt database, although these residues were originally reported as TMT-modified (7) (supplemental Table S1).
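The ambiguity described above can be made explicit by enumerating every way the observed number of phosphate groups could be placed on the acceptor residues of an identified peptide. If more than one placement is possible, the assignment rests entirely on site-determining fragment ions. The sketch is illustrative only, and it ignores fragment evidence. The 0.75 localization-probability cutoff in `confident_sites` is an assumed, commonly used convention, not a value from this review. The probabilities themselves would come from a dedicated localization tool.

```python
from itertools import combinations


def candidate_placements(peptide: str, n_phospho: int,
                         acceptors: str = "STY") -> list[tuple[int, ...]]:
    """All ways to place n_phospho phosphates on acceptor residues (1-based)."""
    sites = [i + 1 for i, aa in enumerate(peptide) if aa in acceptors]
    return list(combinations(sites, n_phospho))


def is_ambiguous(peptide: str, n_phospho: int, acceptors: str = "STY") -> bool:
    """True when the sequence alone does not fix the phosphosite positions."""
    return len(candidate_placements(peptide, n_phospho, acceptors)) > 1


def confident_sites(site_probs: dict[int, float], threshold: float = 0.75) -> set[int]:
    """Keep positions whose localization probability reaches the threshold."""
    return {pos for pos, prob in site_probs.items() if prob >= threshold}
```

For a singly phosphorylated `GSSTPEK`, the three placements `(2,)`, `(3,)`, and `(4,)` are all possible. If a localization tool split the probability roughly evenly between Ser2 and Ser3, `confident_sites` would keep neither site, rather than reporting both as modified.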
Even when the sequence identification and the site assignments are correct, we may not know where the modified protein actually came from. In this respect, we may rely on the "Subcellular localization" information in the UniProt database and/or on GO annotations, but the available information varies. For certain entries, the localization is straightforward, such as "Golgi apparatus membrane; Single-pass type I membrane protein." More frequently, however, this information reads like a complete list of potential cellular localizations, including everything from the nucleus to the extracellular space. For numerous proteins, this diverse localization list reflects known translocations. For example, as mentioned above, HMGB1 may indeed occur in the nucleus as well as in the cytoplasm, and may also be secreted. Nevertheless, this uncertainty about the "origin of the protein" hinders the characterization of subcellular localization-specific modifications.
Last but not least, the topology of membrane proteins has to be considered when characterizing extracellular phosphorylation. Fortunately, prediction-based topology information is available for many proteins and is routinely listed in UniProt. In a previous large-scale glycosylation study we found this information very reliable: among hundreds of assignments, we encountered only two examples that indicated glycosylation on a predicted cytoplasmic domain (45). In addition, several topology prediction servers are available to assess the potential localization of modification sites; here, CCTOP was used to predict the topology of the ER/Golgi-resident phosphoproteins.
Biological Significance of the Modification-Improved analytical methodologies continuously expand the class of secreted phosphoproteins, while clarification of the biological importance of the modification lags far behind. The most established role of extracellular phosphorylation is related to the formation of mineralized tissue. The structural matrix of bones and teeth is made up of collagen. The effects of collagen phosphorylation on processing, fibril formation, and protein interactions have recently been reviewed (46). Noncollagenous extracellular matrix proteins are also crucial in bone and tooth formation, via the control of hydroxyapatite crystallization (47,48). These secretory calcium-binding phosphoproteins belong to the small integrin-binding ligand N-linked glycoprotein (SIBLING) family and are heavily phosphorylated (49). They are osteopontin (OPN), dentin matrix protein-1 (DMP1), bone sialoprotein (BSP), matrix extracellular phosphoglycoprotein (MEPE), and dentin sialophosphoprotein (DSPP). Abnormal phosphorylation of these proteins is implicated in severe bone or tooth defects (6,50). OPN is the most extensively studied member of the SIBLING family. The majority of its Ser residues (37 of 42 in human OPN) and a few Thr residues are reported to be phosphorylated, and more than half of these residues are targets of the Fam20C kinase. The phosphorylated residues are mainly found in clusters and are highly conserved in mammals (51,52). OPN displays a variety of PTMs and processing events that are linked to its diverse biological functions. Besides its essential role in inhibiting bone mineralization, it is involved in cell attachment and signaling, inflammation, and tumorigenesis (52,53).
Milk caseins are also related to the secretory calcium-binding phosphoproteins (54,55). Phosphoserine clusters within Ser-Ser-Ser-Glu-Glu sequences (called phosphate centers) present in αs1-, αs2-, and β-caseins show high affinity for calcium, resulting in sequestration of amorphous calcium phosphate (ACP) (56). The resulting nanoclusters, with a core of ACP surrounded by casein molecules, assemble into casein micelles. These structures keep milk supersaturated in calcium and phosphate without calcification (57,58). Similar sequestration of ACP into calcium phosphate nanoclusters was demonstrated for OPN peptides. This suggests that nanoclusters formed by other secreted phosphoproteins might help inhibit calcification in soft and mineralized tissues, the extracellular matrix, or biofluids (59).
Phosphorylation analyses performed on plasma/serum samples revealed the modification predominantly on proteins involved in blood coagulation, lipid transport and homeostasis, complement activation, proteolysis, cell adhesion, and signaling (27-29). How the modification affects these processes, however, is largely unknown. Recently, phosphorylation of fibroblast growth factor 23 (FGF23) at Ser180 was shown to inhibit glycosylation at Thr178 in the subtilisin-like proprotein convertase cleavage region. This in turn affects secretion of the full-length, biologically active hormone, which regulates phosphate retention in the kidney (60,61). Thus, circulating phosphate levels are balanced by the cross-talk of glycosylation, phosphorylation, and proteolytic processing of FGF23. Earlier, phosphorylation of factor XI by a platelet-derived kinase was shown to increase its susceptibility to cleavage by factor XIIa or thrombin (62), suggesting a possible regulation of the coagulation cascade.
Conclusions and Perspectives-Large-scale phosphoproteomic analyses, predominantly of whole cell lysates or conditioned media, have led to an expansion of extracellular phosphorylation data, although this is not yet widely recognized. Properly cataloguing such data in public databases would improve our understanding of the functions this PTM fulfills outside the cell. Currently, the PhosphoSitePlus database holds the largest collection of phosphorylation data, and this collection can be queried for cellular localization at the protein level.
Unfortunately, results from the direct analysis of biofluids remain limited, because the extreme dynamic range of these matrices currently restricts MS-based phospho-screening. However, there are initial attempts to find biomarker candidates in plasma or to assess variation in the phosphorylation level of plasma phosphoproteins (24,63). Continuous improvements in MS sensitivity will facilitate the detection of lower abundance phosphoproteins. Yet it remains an open question how deep we can dig into this huge dynamic range. The difficulty grows when modified counterparts of the higher abundance components reach the level of low abundance proteins and further increase complexity. Such counterparts may be unexpected post-translational variants or artifacts of sample handling. It has also been reported that any single enrichment protocol reveals only a portion of the "phosphoproteome" (64-66). Perhaps the combined effort of multiple groups could deliver nearly comprehensive results.
Within the cell, phosphorylation is a highly regulated process, in which the concerted action of a plethora of kinases and phosphatases affects the vast majority of intracellular proteins. The lower abundance of extracellular phosphorylation and the limited number of kinases and phosphatases detected outside the cell cast doubt on whether similar control exists extracellularly. However, the wealth of extracellular phosphorylation data and recent targeted analyses suggest that further secretory pathway kinases remain to be discovered (67). Deciphering the actual roles of the modification on extracellular proteins is also urgently needed to clarify this puzzling picture.