Discovery of O-GlcNAc-6-phosphate Modified Proteins in Large-scale Phosphoproteomics Data

Phosphorylated O-GlcNAc is a novel post-translational modification that has so far only been found on the neuronal protein AP180 from the rat (Graham et al., J. Proteome Res. 2011, 10, 2725–2733). Upon collision-induced dissociation, the modification generates a highly mass-deficient fragment ion (m/z 284.0530) that can be used as a reporter for the identification of phosphorylated O-GlcNAc. Using a publicly available mouse brain phosphoproteome data set, we employed our recently developed Oscore software to re-evaluate high resolution/high accuracy tandem mass spectra and discovered the modification on 23 peptides corresponding to 11 mouse proteins. The systematic analysis of 220 candidate phosphoGlcNAc tandem mass spectra as well as a synthetic standard enabled the dissection of the major phosphoGlcNAc fragmentation pathways, suggesting that the modification is O-GlcNAc-6-phosphate. We find that the classical O-GlcNAc modification often exists on the same peptides, indicating that O-GlcNAc-6-phosphate may biosynthetically arise in two steps involving the O-GlcNAc transferase and a currently unknown kinase. Many of the identified proteins are involved in synaptic transmission, and for Ca2+/calmodulin kinase IV, the O-GlcNAc-6-phosphate modification was found in the vicinity of two autophosphorylation sites required for full activation of the kinase, suggesting a potential regulatory role for O-GlcNAc-6-phosphate. By re-analyzing mass spectrometric data from human embryonic and induced pluripotent stem cells, our study also identified Zinc finger protein 462 (ZNF462) as the first human O-GlcNAc-6-phosphate modified protein. Collectively, the data suggest that O-GlcNAc-6-phosphate is a general post-translational modification of mammalian proteins with a variety of possible cellular functions.

The attachment of N-acetylglucosamine (O-GlcNAc) to serine and threonine residues of nuclear and cytoplasmic proteins is a dynamic post-translational modification with emerging roles in important cellular processes such as transcription, translation, cytokinesis, and signaling (1-4). O-GlcNAcylation has been linked to phosphorylation because both modifications can occupy the same or adjacent sites (2), and a functional relationship between the two modifications has been identified in some cases. For instance, the interplay between O-GlcNAcylation and phosphorylation modulates the stability and activity of p53 (5). However, recent data revealed the frequent co-occurrence of O-GlcNAc and phosphate at proximal sites (6), suggesting that reciprocal regulation by O-GlcNAcylation and phosphorylation may not be a very general mechanism. Moreover, it has also been found that the distribution of O-GlcNAc sites relative to phosphorylation sites is rather random and that the modification rates at sites detected with both modifications are almost equal, indicating that, on a global level, the substrate recognition of both pathways is not interconnected (7).
The identification of O-GlcNAc-modified proteins is typically achieved by combining selective enrichment and liquid chromatography tandem mass spectrometry (LC-MS/MS). In mass spectrometry-based proteomics, peptides are usually analyzed by some form of collision-induced dissociation (CID). However, owing to the lability of the O-glycosidic bond under typical CID conditions, the direct and simultaneous identification of O-GlcNAc peptides and sites is difficult. Fragment ion spectra of O-GlcNAc peptides are dominated by the sugar fragments, and the GlcNAc oxonium ion cannot be distinguished from other isobaric HexNAc epimers (e.g. GalNAc). Still, the fragment ions generated by the cleavage of the O-glycosidic bond define a highly useful pattern, which significantly facilitates the (automated) discovery of glycopeptides in general and O-GlcNAc peptides in particular, even in complex samples (8-14). The specificity of these diagnostic fragment ions is further increased when they are identified from high resolution and high mass accuracy tandem MS spectra (14,15). To interrogate such data systematically, we have recently developed a simple scoring scheme, termed Oscore, which automatically assesses tandem mass spectra for the presence and intensity of O-GlcNAc (HexNAc) diagnostic fragment ions and, in turn, allows ranking spectra according to their probability of representing an O-GlcNAc peptide (15). A combined search strategy using the protein identification software Mascot and our Oscore algorithm enabled the identification of hundreds of O-GlcNAc peptides from large-scale proteome data (6).
Very recently, phosphorylated O-GlcNAc (phosphoGlcNAc) has been identified for the first time on the synapse-specific protein AP180 purified from rat brain (16). In light of this exciting discovery, we here report on the extension of the combined Mascot/Oscore approach for the discovery of proteins modified with phosphoGlcNAc. We first adapted the Oscore for the detection of phosphoGlcNAc and then reassessed a large-scale phosphoproteomic data set from murine brain (17). This led to the discovery of 23 phosphoGlcNAc peptides on 11 phosphoGlcNAc proteins. Based on the fragmentation patterns of 220 candidate phosphoGlcNAc spectra and a synthetic standard, we deduced O-GlcNAc-6-phosphate as the most likely molecular entity. Finally, the reanalysis of a phosphoproteome study of human embryonic (hES) and induced pluripotent stem (iPS) cells (18) revealed, for the first time, evidence that a human protein may be modified by O-GlcNAc-6-phosphate, suggesting that this PTM may exist more generally in mammalian systems.
Data Analysis-The mass spectrometric data were processed essentially as described (6). The data processing with Mascot Distiller 2.4.2.0 (Matrix Science, London, UK) was slightly modified to account for the particular fragmentation behavior of O-GlcNAc-6-phosphate peptides. Briefly, the isotope fitting for fragments below m/z 285 was disabled during peak picking, and the Oscore script was adapted to consider diagnostic O-GlcNAc-6-phosphate fragment ion features (Table I) within a mass tolerance of 10 ppm. For the identification of O-GlcNAc-6-phosphate peptides, the generated peak list files contained only spectra for which an Oscore could be calculated. The peak list files were then searched with Mascot 2.3.0 against the UniProtKB complete mouse proteome (download date 26.10.2010, 73,688 sequences) combined with sequences of common contaminants. For the phosphoproteome data set of hES and iPS cells (18), spectra were searched against a subset database generated with Scaffold 3.3.1 (Proteome Software, Portland, OR) that included only protein identifications from the respective full proteome data set (11,288 sequences). Carbamidomethylation of cysteine residues, oxidation of methionine, HexNAc modification of serine, threonine, and asparagine residues, phosphoHexNAc modification of serine and threonine residues, as well as phosphorylation at serine, threonine, and tyrosine residues were taken into account as variable modifications. Where appropriate, 4-plex or 8-plex iTRAQ was set as a fixed modification at peptide amino termini and lysine side chains. Enzyme specificity was set to trypsin with up to two missed cleavage sites. The target-decoy option of Mascot was enabled, peptide mass tolerance was set to 10 ppm, and fragment mass tolerance was set to 0.02 Da. Search results were imported into Scaffold 3.3.1 and filtered for O-GlcNAc and O-GlcNAc-6-phosphate containing peptides.
Candidate O-GlcNAc and O-GlcNAc-6-phosphate spectra were inspected and validated manually (see Supplemental Spectra). Ascore-based localization probabilities for phosphosites (19) from the complete mouse brain phosphoproteome data set were calculated with Scaffold PTM 1.1.3 (Proteome Software, Portland, OR).
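The 10 ppm tolerance used for diagnostic fragment ions can be illustrated with a minimal Python sketch. This is not the Oscore script itself; the function names, the example peak list, and the choice to report the most intense matching peak are assumptions made only for illustration. The single target value, m/z 284.0530, is the phosphoHexNAc oxonium ion reported in the text.

```python
# Illustrative sketch only: NOT the authors' Oscore implementation.
# Function names and the example peak list are hypothetical.

def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error of an observed peak in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def find_diagnostic_ions(peaks, diagnostic_mz, tol_ppm=10.0):
    """Return, for each diagnostic ion, the most intense peak within tol_ppm.

    peaks:         list of (mz, intensity) tuples from one MS/MS spectrum
    diagnostic_mz: dict mapping an ion name to its theoretical m/z
    """
    hits = {}
    for name, target in diagnostic_mz.items():
        matches = [
            (mz, intensity)
            for mz, intensity in peaks
            if abs(ppm_error(mz, target)) <= tol_ppm
        ]
        if matches:
            mz, intensity = max(matches, key=lambda peak: peak[1])
            hits[name] = (mz, intensity, ppm_error(mz, target))
    return hits


# Only the m/z 284.0530 reporter ion is taken from the text.
DIAGNOSTIC_MZ = {"phosphoHexNAc oxonium": 284.0530}

# Hypothetical low-mass region of an HCD spectrum: (m/z, intensity).
example_peaks = [
    (126.0551, 5200.0),
    (284.0532, 9100.0),  # within 10 ppm of the reporter ion
    (284.0600, 300.0),   # roughly 25 ppm away, so it is rejected
    (300.1234, 800.0),
]

hits = find_diagnostic_ions(example_peaks, DIAGNOSTIC_MZ)
for name, (mz, intensity, err) in hits.items():
    print(f"{name}: m/z {mz:.4f}, intensity {intensity:.0f}, {err:+.2f} ppm")
```

At m/z 284, a 10 ppm window corresponds to less than 0.003 Da, which is why high mass accuracy fragment spectra are needed for this kind of filtering.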
A list of known human and murine O-GlcNAc proteins and sites was compiled from recent publications (14, 20-22) as well as from the databases dbOGAP (23) and PhosphositePlus (24). Similarly, phosphosite and other PTM information was retrieved from UniProtKB and PhosphositePlus. The Oscore script is available from www.wzw.tum.de/proteomics/content/research/software/ and the peaklist files for the O-GlcNAc-6-phosphate data can be downloaded from ProteomeCommons.org Tranche using the following hash key: e/G1JUUhZmrUXt4FWJpA0Op0svUWyKxbxu3UEXfARFUjki4t-5Bpht3qVSBMWyRAW8B7zIMfR6nYZL16vGDqcrvzibDgAAAAAAA-ADFA== (passphrase: zuapfqvp23el).

Identification of O-GlcNAc- and O-GlcNAc-phosphate Modified Peptides From Mouse Brain-We previously applied a simple O-GlcNAc protein identification strategy that uses the Oscore as a means to re-assess peptide-spectrum matches (PSMs) from standard database search algorithms. This led to the identification of hundreds of O-GlcNAc peptides from large-scale proteomic and phosphoproteomic data sets (6). We hypothesized that it may be possible to identify O-GlcNAc-phosphate modified peptides in a similar way from phosphoproteomic data sets, as these peptides may co-purify with ordinary phosphopeptides during biochemical enrichment and should exhibit a fragmentation pattern that can be readily discovered using the Oscore algorithm (15). To this end, we downloaded a publicly available mouse brain phosphoproteomic data set (17) acquired on a dual pressure linear ion trap Orbitrap hybrid mass spectrometer using higher energy collision dissociation (HCD) (25) and adapted the Oscore configuration to consider typical O-GlcNAc-phosphate and O-GlcNAc diagnostic fragment ions (see below and Table I).
Using the combination of Mascot database searching and Oscore-based evaluation of PSMs, we identified 23 O-GlcNAc-phosphate and 34 O-GlcNAc peptides (Table II). Fig. 1 shows an example of an O-GlcNAc-phosphate modified peptide derived from the SH3 domain containing scaffold protein Shank2. We note that O-GlcNAc and O-GlcNAc-phosphate fragments are not readily distinguishable by mass spectrometry from other possible HexNAc(-phosphate) epimers, but represent the most likely explanation for the identified HexNAc-phosphate peptides (see ref. 15 and below).
Phosphorylation of O-GlcNAc Likely Occurs at Position 6-The low m/z region of HCD spectra contains diagnostic reporter ions common and specific to GlcNAc and GlcNAc-phosphate modified peptides (Table I). The high mass accuracy of HCD spectra readily allows the identification of the very mass-deficient reporter ions containing the phosphate moiety. These ions are strong indicators for O-GlcNAc-phosphate peptides (Table I, Fig. 2A), and the most abundant ion is typically the phosphoHexNAc oxonium ion at m/z 284.0530.
Theoretically, the phosphate moiety may be attached to the 3-, 4-, or 6-hydroxy group of the sugar. Several lines of evidence, however, suggest that the phosphate group is attached to position 6. Based on ion count statistics for 220 experimental GlcNAc-phosphate spectra, we dissected the major fragmentation pathways (Fig. 2). The typical fragmentation route of a GlcNAc oxonium ion initially proceeds by the thermodynamically driven elimination of two water molecules to form an aromatic ring, followed by the loss of a ketene (originating from the acetyl group) and a formaldehyde molecule (originating from the hydroxymethyl group) (15). For the phosphoGlcNAc oxonium ion, a very similar fragmentation pattern can be formulated. First, two water molecules involving the hydroxyls at positions 3 and 4 are eliminated to form the aromatic species (m/z 248). From here, the fragmentation pathway branches. The m/z 248 species can lose HPO3 from position 6 to form an aromatic oxonium ion (m/z 168). This ion can further eliminate a formaldehyde or ketene molecule, resulting in abundant signals at m/z 138 and 126, respectively. In an alternative pathway, the m/z 248 ion retains the phosphate moiety and loses a ketene to give rise to a fragment at m/z 206, which subsequently loses the HPO3 group to generate the m/z 126 ion. In contrast, a presumed GlcNAc-3-phosphate or GlcNAc-4-phosphate would be expected to fragment differently. To form an aromatic system, such molecules would have to lose the phosphate group along with water. Elimination of HPO3 or H3PO4 would thus result in signals at m/z 204 and 186, respectively. However, these ions are barely observed. Further evidence for the proposed molecular structure of the modification comes from the analysis of a synthetic GlcNAc-6-phosphate standard. The fragmentation pathways deduced from the experimental phosphoGlcNAc peptide spectra (Fig. 2A, upper spectrum) are exactly mirrored in HCD spectra of the GlcNAc-6-phosphate standard (Fig. 2A, lower spectrum). It therefore appears very likely that the phosphorylation is localized to the 6-position of the GlcNAc moiety.
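The nominal fragment masses in this pathway follow from simple neutral-loss arithmetic, sketched below in Python. The elemental composition assumed for the phosphoHexNAc oxonium ion (C8H15NO8P, singly charged) is not stated in the text; it is an assumption chosen because it reproduces the reported m/z 284.0530. Monoisotopic atomic masses are standard values, and all names in the code are illustrative. On this arithmetic, formaldehyde loss (about 30 Da) from m/z 168 gives m/z 138, and ketene loss (about 42 Da) gives m/z 126.

```python
# Neutral-loss arithmetic for the proposed fragmentation pathways.
# ASSUMPTION: the precursor composition C8H15NO8P (1+) is inferred here,
# not taken from the article; it reproduces the reported m/z 284.0530.

MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
                "O": 15.9949146, "P": 30.9737620}
ELECTRON = 0.00054858


def mass(formula):
    """Monoisotopic mass of a composition given as {element: count}."""
    return sum(MONOISOTOPIC[element] * n for element, n in formula.items())


def lose(mz, *neutrals):
    """m/z of a singly charged ion after losing the given neutral molecules."""
    return mz - sum(mass(neutral) for neutral in neutrals)


H2O = {"H": 2, "O": 1}
HPO3 = {"H": 1, "P": 1, "O": 3}
H3PO4 = {"H": 3, "P": 1, "O": 4}
KETENE = {"C": 2, "H": 2, "O": 1}
FORMALDEHYDE = {"C": 1, "H": 2, "O": 1}

# Singly charged cation: subtract one electron mass.
precursor = mass({"C": 8, "H": 15, "N": 1, "O": 8, "P": 1}) - ELECTRON

# Proposed GlcNAc-6-phosphate pathways.
m248 = lose(precursor, H2O, H2O)    # aromatic species after two water losses
m168 = lose(m248, HPO3)             # branch 1: phosphate lost first
m138 = lose(m168, FORMALDEHYDE)
m126_via_168 = lose(m168, KETENE)
m206 = lose(m248, KETENE)           # branch 2: phosphate retained
m126_via_206 = lose(m206, HPO3)

# Ions expected instead for a 3- or 4-phosphate (reported as barely observed).
m204 = lose(precursor, HPO3)
m186 = lose(precursor, H3PO4)

for label, mz in [("precursor", precursor), ("-2 H2O", m248),
                  ("-HPO3", m168), ("-CH2O", m138),
                  ("-C2H2O", m126_via_168), ("248 -C2H2O", m206),
                  ("206 -HPO3", m126_via_206),
                  ("284 -HPO3", m204), ("284 -H3PO4", m186)]:
    print(f"{label:12s} {mz:.4f}")
```

Both branches converge on the same m/z 126 ion, since the same three neutral losses (HPO3, ketene, and the two waters) are removed in a different order.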
The Occurrence of O-GlcNAc-6-phosphate is Closely Linked to that of O-GlcNAc-Overall, we identified 23 O-GlcNAc-6-phosphate peptides from 11 proteins along with 34 O-GlcNAc peptides from 25 proteins (Table II and supplemental Tables S1 and S2). For six of the O-GlcNAc peptides, the modified residues could be deduced from the HCD spectra. Unfortunately, this was only possible for a single O-GlcNAc-6-phosphate peptide, a shortcoming that will have to be addressed by ETD measurements in the future. Still, the identified O-GlcNAc-6-phosphate peptides include Thr-310 of clathrin coat assembly protein AP180, which is the only protein and only site reported thus far (16).
Fig. 2 (partial caption). Lower spectrum: HCD spectrum of a synthetic GlcNAc-6-phosphate standard. The two fragmentation spectra are virtually identical, supporting the assignment of GlcNAc-6-phosphate. B, Average ion counts for the most abundant diagnostic fragment ions from 220 experimental GlcNAc-phosphate spectra. The intensity distribution of the fragment ions is consistent with the two proposed fragmentation pathways of GlcNAc-6-phosphate and with the intensity distribution of the synthetic standard.
Interestingly, the occurrence of O-GlcNAc-6-phosphate appears to be closely related to that of O-GlcNAc. Five out of the 11 O-GlcNAc-6-phosphate proteins identified here were also identified to contain O-GlcNAc-modified peptides. Another four O-GlcNAc-6-phosphate proteins harbor reported O-GlcNAcylation sites in mouse or human. In particular, the proteins Bassoon and Piccolo, both of which are known to be highly O-GlcNAc-modified (21), were identified with six unique O-GlcNAc-6-phosphate peptides each. Similar observations can be made at the peptide level. Four peptides were identified in their respective O-GlcNAc-6-phosphate and O-GlcNAc forms, and six additional O-GlcNAc-6-phosphate peptides overlap with reported O-GlcNAc sites. We note that O-GlcNAc as well as O-GlcNAc-6-phosphate modified peptides contain, on average, 6.5 serine or threonine residues, which is considerably higher than the 3.3 for phosphopeptides and the 1.5 for unmodified peptides (6), suggesting that O-GlcNAc-6-phosphate also preferentially occurs in protein regions of low compositional complexity. Taken together, almost 50% of all O-GlcNAc-6-phosphate peptides can also be found as O-GlcNAc-modified peptides, indicating that the biosynthetic route to GlcNAc-6-phosphate may proceed in two distinct steps. Presumably, the O-GlcNAc transferase (OGT) first attaches the O-GlcNAc moiety to a target residue, followed by a second step in which a yet unknown kinase (possibly GlcNAc kinase, NAGK) phosphorylates the O-GlcNAc moiety at position 6. Given that GlcNAc-6-phosphate is also a precursor for UDP-GlcNAc (26), we cannot exclude the possibility that OGT may accept a putative UDP-GlcNAc-6-phosphate as a substrate and thus directly transfer GlcNAc-6-phosphate to a protein target.
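The overlap figures quoted above can be checked with a few lines of Python. The sketch assumes that the "almost 50%" statement combines the four peptides found in both forms with the six peptides overlapping reported O-GlcNAc sites; this reading is an interpretation and is not stated explicitly in the text. Variable names are illustrative.

```python
# Counts taken from the text; how they are combined is an ASSUMPTION.
total_phospho_glcnac_peptides = 23
peptides_seen_in_both_forms = 4        # O-GlcNAc and O-GlcNAc-6-phosphate forms
peptides_on_reported_oglcnac_sites = 6

total_phospho_glcnac_proteins = 11
proteins_with_oglcnac_here = 5         # O-GlcNAc peptides found in this study
proteins_with_reported_oglcnac = 4     # reported sites in mouse or human

peptide_fraction = (
    peptides_seen_in_both_forms + peptides_on_reported_oglcnac_sites
) / total_phospho_glcnac_peptides
protein_fraction = (
    proteins_with_oglcnac_here + proteins_with_reported_oglcnac
) / total_phospho_glcnac_proteins

print(f"peptides linked to O-GlcNAc: {peptide_fraction:.1%}")
print(f"proteins linked to O-GlcNAc: {protein_fraction:.1%}")
```

Under this reading, 10 of 23 peptides and 9 of 11 proteins show a link to O-GlcNAc.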
The occurrence of O-GlcNAc-6-phosphate exhibits strong links not only to O-GlcNAc but also to phosphorylation. All O-GlcNAc-6-phosphate-modified proteins identified here are known phosphoproteins and were indeed found to be phosphorylated in the present study. In fact, we found seven O-GlcNAc-6-phosphate-modified peptides that also carry one or two phosphates on the same peptide, and two further peptides that overlap with phosphosites identified in the complete phosphoproteome data set.
O-GlcNAc-6-phosphate Modified Proteins in Mouse Brain-The proteins identified in this study to be O-GlcNAc-6-phosphate modified are of quite high abundance in mouse brain and may thus only represent the 'tip of the iceberg' of all proteins that may carry the modification. Interestingly though, many of the identified proteins serve multiple important functions in neurons, ranging from regulation of sodium/potassium-coupled chloride cotransporters (the serine/threonine kinase WNK2) to the regulation of neurotransmitter release and retrieval (Bassoon, Piccolo, AP180, TOM1-like protein 2) as well as postsynaptic structural organization (the SH3 domain containing scaffold protein Shank2). Perhaps the most striking finding is that CaMKIV can be O-GlcNAc-6-phosphate modified. CaMKIV belongs to the Ca2+/calmodulin kinase signal cascade, which, in the nervous system, exerts key functions in signal transduction, gene transcription, synaptic plasticity, and behavior (27). Located in the nucleus, CaMKIV directly and indirectly regulates the activity of several important transcription factors, including CREB. In addition, CaMKIV is highly O-GlcNAcylated, and its activity toward CREB can be reciprocally modulated by phosphorylation and O-GlcNAcylation of adjacent sites in the active site (28). Further evidence suggests that CaMKIV specifically phosphorylates and activates OGT upon depolarization of neuronal cells, suggesting that OGT is a downstream target of CaMKIV and activates the transcription factor AP-1 (also an O-GlcNAc protein (29, 30)). As depicted in Fig. 3, the O-GlcNAc-6-phosphate modified residue is located between Thr-5 and Ser-33 at the amino terminus of CaMKIV, and the identified peptide overlaps with two serine residues (Ser-11 and Ser-12 in murine CaMKIV) that are autophosphorylated and required for full activation of CaMKIV.
CaMKIV regulation is complex and involves Ca2+/calmodulin binding, phosphorylation in its activation loop, as well as autophosphorylation (31,32). The modification with O-GlcNAc-6-phosphate in this region may therefore represent a novel, potentially functional feature in the regulation of CaMKIV. Clearly, future work needs to address whether and to what extent the modification is functionally relevant for any of the identified proteins in general or during CaMK signaling in particular.
ZNF-462 is the First Human O-GlcNAc-6-phosphate Modified Protein-Mouse brain is well known to contain relatively high O-GlcNAc levels, and given the apparent relationship between O-GlcNAc-6-phosphate and O-GlcNAc, the discovery of O-GlcNAc-6-phosphate proteins from this biological source may not be too surprising. To investigate whether O-GlcNAc-6-phosphate is also found on human proteins, we analyzed a number of published data sets obtained from different cancer and stem cell lines representing more than 36,000 phosphopeptides (18,33,34). From the hES and iPS cell line data (18), numerous O-GlcNAc proteins were identified (6), and O-GlcNAc-6-phosphate was identified on Zinc finger protein 462 (ZNF-462). Unfortunately, no clear function has been assigned to this protein yet (other than DNA binding and a putative involvement in transcriptional regulation); hence, we are unable to speculate about any functional significance of its modification by O-GlcNAc-6-phosphate. However, the first identification of a human O-GlcNAc-6-phosphate protein clearly indicates that the modification not only exists in rodents (and brain for that matter), but possibly represents a general novel post-translational protein modification in mammalian cells with potential functional significance.

CONCLUSIONS
The Oscore-based re-assessment of high resolution tandem mass spectra from published phosphoproteomic studies enabled the identification of 12 O-GlcNAc-6-phosphate modified proteins, including the first human O-GlcNAc-6-phosphate modified protein. This shows that O-GlcNAc-6-phosphate is not a singular protein modification (16) and that it is sufficiently stable and abundant to be detected in the presence of tens of thousands of phosphopeptides. Thus, we expect that mining phosphoproteomic data will substantially increase the number of proteins known to be modified in this way. Still, more efficient biochemical enrichment tools as well as MS techniques such as ETD that preserve the modification will likely be required for the proteome-wide investigation of O-GlcNAc-6-phosphate in the future. Beyond merely enumerating modified peptides, the identification of the corresponding O-GlcNAc kinase(s) as well as potentially involved phosphatases will clearly be important steps toward a basic understanding of this novel post-translational modification.