Confident Phosphorylation Site Localization Using the Mascot Delta Score

Large-scale phosphorylation analysis is increasingly becoming a focus of proteomic research. Although it is now possible to identify thousands of phosphorylated peptides in a biological system, confident site localization remains challenging. Here we validate the Mascot Delta Score (MD-score) as a simple method that achieves similar sensitivity and specificity for phosphosite localization as the published Ascore, which is mainly used in conjunction with Sequest. The MD-score was evaluated using liquid chromatography-tandem MS data of 180 individually synthesized phosphopeptides with precisely known phosphorylation sites. We tested the MD-score for a wide range of commonly available fragmentation methods and found it to be applicable throughout with high statistical significance. However, the different fragmentation techniques differ strongly in their ability to localize phosphorylation sites. At 1% false localization rate, the highest number of correctly assigned phosphopeptides was achieved by higher energy collision induced dissociation in combination with an Orbitrap mass analyzer, followed very closely by low resolution ion trap spectra obtained after electron transfer dissociation. Both these methods are significantly better than low resolution spectra acquired after collision induced dissociation and multi stage activation. Score thresholds determined from simple calibration functions for each fragmentation method were stable over replicate analyses of the phosphopeptide set. The MD-score outperforms the Ascore for tyrosine phosphorylated peptides, and we further show that the ability to call sites correctly improves as the distance between two candidate sites within a peptide sequence increases. The MD-score does not require complex computational steps, which makes it attractive in terms of practical utility.
We provide all mass spectra and the synthetic peptides to the community so that the development of present and future localization software can be benchmarked and any laboratory can determine MD-scores and localization probabilities for their individual analytical set up.

Post translational modifications (PTMs) of proteins are being actively pursued owing to their broad biological significance. In particular, recent advances in liquid chromatography and mass spectrometry have made the large scale enrichment, identification and quantification of phosphopeptides feasible (1-6). At the same time, it has become increasingly difficult if not impossible to verify both identification and phosphorylation site assignments by manual inspection of tandem mass spectra (7). For reasons of throughput and objectivity, the automatic assignment of phosphorylation to the correct amino acid in a peptide has become an important yet challenging task (2,8,9). Owing to the frequently observed loss of the phosphate group in the gas phase, tandem MS spectra of phosphopeptides are often not straightforward to interpret (10). This complicates site localization because, unlike for peptide identification, the detection of one or few particular fragment ions is often required for unambiguous results. If a particular fragmentation technique does not generate these ions efficiently or the employed mass spectrometer cannot efficiently detect them, site localization may be significantly impaired. The situation is further complicated by the presence of multiple potential sites of modification in a peptide and the fact that phosphopeptides are often identified by single spectra only. Many of the common automated peptide identification tools such as Mascot and Sequest (11,12) do not explicitly score for proper PTM site assignment. In Mascot, ion scores are computed for each spectrum to peptide match, which may include alternative PTM site localizations within a peptide, but the ion score alone generally does not suffice to call a phosphorylation site correctly (9).
To overcome issues of throughput and objectivity in phosphorylation site localization, several computational approaches have been published over the past few years (2,8,9,13-17). The two best known are the Ascore (9) and the highly similar PTM score (2), which both use empirically collected information on the fragmentation behavior of phosphopeptides and check for the presence and intensity order of diagnostic fragment ions in tandem mass spectra. The Ascore for example then calculates the localization specific probability for every possible amino acid site present in a given peptide. Although both the Ascore and PTM score algorithms appear to work well, there are shortcomings. For instance, the Ascore (as implemented in available software) is incompatible with the widely used search engine Mascot. The PTM score was developed on large-scale phosphorylation data sets from cell lines without validating the score performance on phosphopeptides with known phosphorylation sites. Both scores (as published and implemented in available software) are only enabled for CID spectra acquired on low resolution ion trap analyzers and their performance on other fragmentation types has not been systematically evaluated. The SloMo method (8) is an adaptation of the Ascore for electron capture dissociation and electron transfer dissociation (ETD) data but (as implemented in available software) only accepts Sequest and OMSSA search results.
In light of the above, there is still a need for additional tools that are more widely applicable to multiple mass spectrometry platforms and fragmentation types or fill gaps in available methods. Using results of database search engines directly for phosphorylation site localization would be attractive for this purpose. For ion trap collision induced dissociation (CID) data and the search engine Mascot, Beausoleil et al. (9) evaluated the performance of a normalized Mascot delta ions score (normalized MD-score) that calculates the difference between the top two Mascot ion scores of alternative phosphorylation sites in the same peptide sequence divided by the ion score of the top ranking site. The authors found the normalized MD-score to be inferior in performance to the Ascore. The Heck laboratory has recently used the MD-score without normalization, but did not evaluate or validate its performance (18,19).
In this work, we re-evaluated the ability of the MD-score to estimate the probability of correct phosphorylation site localization for commonly used peptide fragmentation types on three types of mass spectrometers using 180 synthetic phosphopeptides with precisely known phosphorylation sites. We found the MD-score to be applicable throughout and provide MD-score distributions, thresholds and scoring functions tailored for each fragmentation technique that researchers may use as guidelines to determine the false localization rates (FLRs) of phosphorylation site assignments made by Mascot. In addition, we provide the complete liquid chromatography-tandem MS (LC-MS/MS) data so that developers of localization software can benchmark the performance of these tools against a standard data set. Finally, we make the physical phosphopeptide collection available to the community so that any laboratory can determine and implement MD-score characteristics for their particular analytical set up.

EXPERIMENTAL PROCEDURES
Peptide Synthesis-Based on a list of naturally occurring phosphopeptides (20), 180 peptides including positional p-site isomers (Supplemental Table S1) were synthesized individually by solid phase synthesis at a scale of 2 μmol on a parallel peptide synthesizer (Intavis, Cologne) following the standard Fmoc strategy. Fmoc protected amino acids were obtained from Intavis. Crude peptides were quality controlled by matrix-assisted laser desorption-time-of-flight MS (MALDI-TOF MS) and used for subsequent LC-MS/MS without further purification. Annotated tandem mass spectra of all synthesized phosphopeptides are documented in Supplemental Fig. S13. Peptides were either analyzed individually by LC-MS/MS or as five pooled mixtures (Supplemental Table S2). For all mixtures, peptides were chosen such that no phosphorylation site isomers were present in any one mixture. The synthesized peptides vary in length between 5 and 28 residues (average 16, Supplemental Table S1) and contain 14% Ser, 6% Thr, and 5% Tyr residues, which is similar to the "homo sapiens EGF data set" in Phosida (av. 14 residues, 16% Ser, 5% Thr, 1% Tyr) (21). Peptides are detected as 2+ (72%), 3+ (26%), and 4+ (2%) precursor ions and 33% of the peptides contain one missed protease cleavage site (5% contain two such sites), which is similar to other phosphorylation studies using trypsin as the protease and electrospray ionization. The higher incidence of pY-containing peptides in our set compared with that typically found in large-scale studies was driven by the need to investigate a sufficient number of these peptides for statistical analysis. Multiply phosphorylated peptides are under-represented in our study (~10% here versus ~20% in Phosida). Therefore, although the same trends for singly and doubly phosphorylated peptides are observed, MD-score thresholds should be carefully assessed in these cases.
LC-MS/MS of Individual Phosphopeptides on a QTOF Micro-One hundred and eighty synthesized phosphopeptides were each subjected to a 20 min LC-MS/MS run using a 75 μm × 50 mm reversed phase column (ReproSil-Pur C18, Dr. Maisch, Germany) and a CapLC instrument (Waters, UK) coupled on-line to a QTOF Micro (Waters, UK). Separation was performed within 15 min using a linear gradient from 2% to 35% acetonitrile in 0.1% formic acid and an effective flow-rate of 250 nl/min (passive flow-splitting). The eluent was sprayed via emitter tips (New Objective, Woburn, MA) butt-connected to the analytical column. Survey spectra were collected for 1 s followed by collecting CID spectra for the top three most abundant signals for 3 s (nitrogen was used as the collision gas, m/z and charge dependent collision energy was between 18 and 46; dynamic precursor exclusion 30 s, minimal precursor intensity 15 counts). Custom in-house software was used that read the centroid data from raw tandem mass spectra and converted these into Mascot generic file format (mgf). No further peak processing was performed. Mgf files were searched using Mascot (2.2) with carbamidomethyl cysteine as fixed modification and oxidized methionine, acetylated protein N terminus, and phosphorylation of serine, threonine, and tyrosine as variable modifications. Trypsin was specified as the proteolytic enzyme and up to three missed cleavages were allowed. The mass tolerance of the precursor ion was set to 0.6 Da and that of fragment ions was set to 0.4 Da. The data was searched against an in-house curated version of the human International Protein Index database combined with a decoy version thereof (22). This database contains a total of 163,476 protein sequences (50% forward, 50% reverse) and represents a nonredundant composite of International Protein Index versions 1.0-3.54 and the sequences of bovine serum albumin, porcine trypsin, and mouse, rat, and sheep keratins.
LC-MS/MS Analysis of Phosphopeptide Mixtures on an LTQ-Orbitrap XL ETD-Nanoflow LC-MS/MS was performed by coupling a nanoLC Ultra 1D plus (Eksigent, Dublin, CA) to an LTQ Orbitrap XL ETD mass spectrometer (ThermoFisher Scientific), using a custom packed 20 mm × 75 μm ReproSil-Pur C18 (Dr. Maisch, Germany) trap column followed by a custom packed 400 mm × 75 μm ReproSil-Pur C18 (Dr. Maisch, Germany) analytical column. Separation was performed within 60 min using a gradient from 0% to 40% acetonitrile in 0.1% formic acid. The eluent was sprayed via emitter tips (New Objective) butt-connected to the analytical column. The mass spectrometer was operated in data dependent acquisition mode, automatically switching between MS and MS2 (dynamic exclusion off, minimal precursor intensity 1000). Full scan MS spectra were acquired in the Orbitrap at a resolution of 60,000 at m/z 400 after accumulating ions to a target value of 1 × 10^6. In separate runs, the five most intense ions were selected for fragmentation by either collision induced dissociation (CID), multi stage activation (MSA), electron transfer dissociation (ETD), ETD with supplemental activation (ETDSA) or higher-energy collision-induced dissociation (HCD). For CID and MSA (activation of neutral losses of 98, 49, 32.6, and 24.5), peptides were fragmented after accumulating ions to a target value of 5000 and a max. injection time of 500 ms. The fragment ions were recorded in the LTQ ion trap. For ETD with and without supplemental activation, peptides were fragmented after accumulating ions to a target value of 5000 and a max. injection time of 500 ms. Fluoranthene was used as the ETD reagent and the reaction time in the ion trap was dependent on the charge state (100 ms for 2+ ions, 66.7 ms for 3+ ions and 50 ms for 4+ ions). The fragment ions were recorded in the linear trap quadrupole (LTQ) ion trap. For HCD experiments, full scan MS spectra were acquired at a resolution of 30,000 at m/z 400.
Peptides were fragmented after accumulating ions to a target setting of 50,000 and using a normalized collision energy of 40%. Fragment ions were detected at a resolution of 7500 at m/z 400 in the Orbitrap. Custom in-house software was used that read the centroid data from raw tandem mass spectra and converted these into Mascot generic file format (mgf). No further peak processing was performed. Mgf files were searched using Mascot (2.2) with carbamidomethyl cysteine as fixed modification and oxidized methionine, acetylated protein N terminus, and phosphorylation of serine, threonine, and tyrosine as variable modifications (allowing neutral loss of 98 for pS/T but not pY peptides). Trypsin was specified as the proteolytic enzyme and up to three missed cleavages were allowed. The mass tolerance of the precursor ion was set to 10 ppm and that of fragment ions was set to either 0.5 Da (CID, MSA, ETD, ETDSA, and HCD) or 0.02 Da (HCD; see Supplemental Fig. S1 for choice of HCD search tolerance). The data was searched against an in-house curated version of the human International Protein Index database combined with a decoy version thereof (22). This database contains a total of 163,476 protein sequences (50% forward, 50% reverse) and represents a nonredundant composite of International Protein Index versions 1.0-3.54 and the sequences of bovine serum albumin, porcine trypsin, and mouse, rat, and sheep keratins. Searches were performed with and without prior filtering of tandem mass spectra (as described (23)). Deisotoping and deconvolution of HCD spectra was performed as described (24).
LC-MS/MS Analysis of Phosphopeptide Mixtures on an LTQ-Phosphopeptide mixtures (Supplemental Table 2) were analyzed in duplicate on an LTQ ion trap mass spectrometer (ThermoFisher) coupled to a NanoLC 1D plus (Eksigent). Peptides were trapped on a custom made 0.3 mm × 5 mm (ID) trap column (ReproSil-Pur C18, Dr. Maisch, Germany) followed by a custom made 50 cm × 75 μm (ID) reversed phase tip-column (ReproSil-Pur C18, Dr. Maisch, Germany) and gradient elution was performed from 2% acetonitrile to 40% acetonitrile in 0.1% formic acid within 2 h. The mass spectrometer was operated using the XCalibur Developers kit 2.0.7 in data dependent acquisition mode, automatically switching between MS and MS2 (dynamic exclusion 30 s, minimal precursor intensity 1,000). Full scan MS spectra were acquired at a mass range of m/z 400-1200 after accumulating ions to a target value of 30,000 within 50 ms. For the four most intense ions, the charge state was determined by a zoom scan (target value: 7000, max accumulation time: 50 ms), followed by CID fragmentation or MSA (neutral losses of m/z 98, 49, 32.6, and 24.5) both with target values of 10,000 and 35% normalized collision energy. Custom in-house software was used that read the centroid data from raw tandem mass spectra and converted these into Mascot generic file format (mgf). No further peak processing was performed. Mgf files were searched using Mascot (2.2) with carbamidomethyl cysteine as fixed modification and oxidized methionine, acetylated protein N terminus, and phosphorylation of serine, threonine, and tyrosine as variable modifications (allowing neutral loss of 98 for pS/T but not pY peptides). Trypsin was specified as the proteolytic enzyme and up to three missed cleavages were allowed. The mass tolerance of the precursor ion was set to 3 Da and that of fragment ions was set to 0.6 Da. The data was searched against an in-house curated version of the human International Protein Index database combined with a decoy version thereof (22). This database contains a total of 163,476 protein sequences (50% forward, 50% reverse) and represents a nonredundant composite of International Protein Index versions 1.0-3.54 and the sequences of bovine serum albumin, porcine trypsin, and mouse, rat, and sheep keratins. Searches were performed with and without prior filtering of tandem mass spectra. Filtering was performed as described (23).
Phosphorylation Site Localization-The MD-score was computed from Mascot search result files by determining the difference between the best and second best Mascot ion scores for alternative phosphorylation site localizations on an otherwise identical peptide sequence. The normalized MD-score (nMD-score) was calculated by dividing the MD-score by the best Mascot ion score (9). A custom version of the Ascore algorithm was implemented in Python following the manuscript of Beausoleil et al. (9). False localization rate (FLR) calculation for all scores was performed by dividing the number of incorrect site assignments by the total number of site assignments as a function of the score. For the combination of the Ascore and the MD-score, we fitted a straight line (fixed at origin) to the data in Fig. 2B. The slope coefficient of the fit was 0.33. We then multiplied Ascore values by 0.33 to put both scores onto the same scale. Subsequently we picked the highest scaled Ascore/MD-score pair for each identified phosphorylation site and calculated its FLR.
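The score definitions above can be illustrated with a minimal Python sketch. It is not the software used in this study: the function names are hypothetical, the input is assumed to be the list of Mascot ion scores for all site isomers of one peptide-spectrum match, and the combined score is read here as the larger of the MD-score and the Ascore scaled by the reported slope of 0.33, which is one possible interpretation of the procedure.

```python
from typing import Sequence

# Slope of the line fitted through the origin in Fig. 2B (value taken from the text).
ASCORE_SCALE = 0.33


def md_score(ion_scores: Sequence[float]) -> float:
    """Best minus second-best Mascot ion score among site isomers of one peptide."""
    ranked = sorted(ion_scores, reverse=True)
    if len(ranked) < 2:
        # With a single candidate site there is no alternative localization to compare.
        raise ValueError("need ion scores for at least two candidate localizations")
    return ranked[0] - ranked[1]


def normalized_md_score(ion_scores: Sequence[float]) -> float:
    """MD-score divided by the best ion score (nMD-score)."""
    best = max(ion_scores)
    if best <= 0:
        return 0.0  # assumption: treat non-positive best scores as uninformative
    return md_score(ion_scores) / best


def combined_score(md: float, ascore: float) -> float:
    """Assumed combination rule: the higher of MD-score and scaled Ascore."""
    return max(md, ASCORE_SCALE * ascore)
```

For example, ion scores of 45, 30, and 12 for three site isomers give an MD-score of 15 and an nMD-score of 15/45. The sketch ignores the case in which the two scores favor different sites.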
Data and Reagent Availability-All MS data (raw mass spectrometer output as well as generated peak list files) and a Scaffold result file for the QTOF data are available from the Tranche data repository under the project name: Mascot Delta score; https://proteomecommons.org/tranche/. All synthetic peptides used in this study are available from Intavis AG (Cologne, Germany; http://www.intavis.com).

MD-Score Features and Performance Evaluation-The peptide identification scores of Mascot or other search engines are not in themselves necessarily a good indicator for the correct localization of a phosphorylation site within a peptide sequence. We thus revisited whether the Mascot Delta Score (MD-score), which simply reflects the difference between the highest and second highest Mascot ion scores for candidate phosphorylation sites on an identical peptide sequence in a database search, would be a suitable criterion for site localization. Based on a set of naturally occurring phosphopeptides (20), a collection of 180 phosphopeptides (129 pS/pT; 48 pY; 3 mixed pS/pT/pY peptides; 164 singly and 16 doubly phosphorylated) with precisely known phosphorylation sites and multiple positional isomers were synthesized individually and analyzed separately by LC-MS/MS employing CID on a QTOF Micro instrument. The properties of these peptides are similar to those found in other phosphorylation studies and should thus be a good set of standards for the purpose of evaluating the MD-score for p-site localization (see also discussion). The LC-MS/MS approach generated both strong and weak tandem MS data for each peptide as would be the case in a typical analysis of a proteomic sample. In total, 2174 MS/MS spectra were matched to the 180 different phosphopeptides (229 peptides when considering partially oxidized Met residues) corresponding to 9.5 spectra per peptide on average (range of 1-62 spectra per peptide). For all peptide-spectrum matches, Mascot ion score, Ascore, MD-score, and normalized MD-score values were obtained and the number of correct/incorrect localizations as well as the false localization rates were determined based on the known phosphorylation sites of the synthetic peptides. Fig. 1 shows that the distributions of all three localization scores strongly discriminate correct and incorrect phosphorylation site assignments whereas the Mascot score alone is not a confident measure for the reliability of phosphorylation site assignment (see also Supplemental Fig. S2). Analysis of the data shown in Figs. 1A-C reveals that the Ascore matched a total of 1446 spectra to the correct site (138 incorrect) resulting in a total FLR of 9%. The MD-score (and nMD-score) was slightly more sensitive and made 1639 correct (201 incorrect) assignments but at the expense of a slightly higher total FLR of 11%. For about 10% of all tandem mass spectra, the MD-score (and nMD-score) was zero, indicating that no judgment between correct and incorrect site localization could be made. From the score distributions, one can easily derive FLR thresholds (say 1%) that may be used for the analysis of similar samples. For the Ascore, a threshold of 22 is required to reach 1% FLR, at which the Ascore made 884 correct assignments. The respective threshold of the MD-score is 10, at which 899 correct assignments are made. The 1% FLR threshold for the normalized MD-score is 0.36, at which 574 site assignments were correct. We note here that the cutoff values calculated for this relatively small peptide set are very similar to the ones originally determined for the Ascore (threshold 20) and normalized MD-score (threshold 0.4) (9). Collectively, this data shows that the MD-score and Ascore perform similarly well on our data and both substantially better than the normalized MD-score (Fig. 1D).
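Deriving an FLR threshold from labeled score distributions, as described above, can be sketched as follows. This is an illustrative reading rather than the authors' code: it assumes the FLR at a given cutoff is the fraction of incorrect assignments among all assignments scoring at or above that cutoff, and the function names and input layout (pairs of score and correctness flag) are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Psm = Tuple[float, bool]  # (localization score, site assignment is correct)


def flr_at_threshold(psms: Iterable[Psm], threshold: float) -> float:
    """Fraction of incorrect site assignments among those scoring >= threshold."""
    kept: List[bool] = [correct for score, correct in psms if score >= threshold]
    if not kept:
        return 0.0  # nothing passes the cutoff, so nothing is wrongly localized
    incorrect = sum(1 for correct in kept if not correct)
    return incorrect / len(kept)


def lowest_threshold_for_flr(psms: List[Psm], target_flr: float = 0.01) -> Optional[float]:
    """Smallest observed score whose cumulative FLR does not exceed target_flr."""
    for candidate in sorted({score for score, _ in psms}):
        if flr_at_threshold(psms, candidate) <= target_flr:
            return candidate
    return None  # no cutoff among the observed scores reaches the target
```

Because the empirical FLR need not fall monotonically with the score, a more conservative variant would require every higher cutoff to meet the target as well.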
Because the CID fragmentation behavior of pS/pT and pY containing phosphopeptides can be quite distinct, we next investigated if the Ascore and MD-score would be biased in their ability to deal with phosphorylation site localization to the different amino acids. To address this, the pS/T and pY data was analyzed separately. As evident from Fig. 2A (and Supplemental Fig. S3), the Ascore works particularly well for pS/pT peptides (883 correct assignments at 1% FLR threshold of 20; total FLR 6%). At 1% FLR, the Ascore shows 40% higher sensitivity compared with the MD-score (615 correct assignments at 1% FLR threshold of 10; total FLR 11%). At 3% FLR, both scores have about equal sensitivity. Fig. 2A also shows that the MD-score outperforms the Ascore for pY-peptides by a factor of eight (306 correct assignments at 1% FLR threshold of 7; total FLR 10% versus 36 correct assignments at 1% FLR threshold of 39; total FLR 20%). At 3% or 5% FLR, the MD-score still outperforms the Ascore by a factor of three. Taken together, the results show that the Ascore performs extremely well for pS/T peptides but is strongly biased against pY-peptides. Conversely, the MD-score does not show a strong bias between the different phospho-amino acids but is indeed less sensitive than the Ascore for pS/T peptides.
As expected, a plot of observed MD-score and Ascore values for the >2,000 peptide to spectrum matches against the determined FLR values shows that the FLR drops rapidly as the score rises (Supplemental Fig. S4). The distribution can be fitted to the sum of two exponentials, e.g. FLR = A * exp(-C * MD-score) + B * exp(-D * MD-score) (see Supplemental Fig. S12 for values of the constants A-D), which allows calculation of the probability of correct site localization for any given score threshold. This often is a useful alternative to filtering data to fixed FLR cutoffs. Table I shows that the MD-score thresholds derived from the fit are virtually identical to those obtained from counting correct/incorrect peptide to spectrum matches with the result that the numerical FLR values derived from the fit are also very close to those determined from counting correct/incorrect matches.
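As a hedged illustration of how such a calibration function might be used, the sketch below evaluates the two-exponential model and inverts it by bisection to obtain the score needed for a target FLR. The function names are hypothetical, and any constants passed in are placeholders; the fitted values of A-D are given in Supplemental Fig. S12 and are not reproduced here. The inversion assumes non-negative constants so that the modeled FLR decreases with the score.

```python
import math


def flr_model(score: float, a: float, b: float, c: float, d: float) -> float:
    """FLR = A * exp(-C * score) + B * exp(-D * score)."""
    return a * math.exp(-c * score) + b * math.exp(-d * score)


def score_for_flr(target: float, a: float, b: float, c: float, d: float,
                  lo: float = 0.0, hi: float = 200.0, tol: float = 1e-6) -> float:
    """Smallest score in [lo, hi] at which the modeled FLR is <= target (bisection)."""
    if flr_model(lo, a, b, c, d) <= target:
        return lo
    if flr_model(hi, a, b, c, d) > target:
        raise ValueError("target FLR is not reached within the search interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if flr_model(mid, a, b, c, d) > target:
            lo = mid  # FLR still too high, move towards higher scores
        else:
            hi = mid  # target met, try a lower score
    return hi


def localization_probability(score: float, a: float, b: float, c: float, d: float) -> float:
    """Modeled probability that a site assignment passing this cutoff is correct."""
    return 1.0 - flr_model(score, a, b, c, d)
```

With the fitted constants for a given fragmentation method, `score_for_flr(0.01, A, B, C, D)` would return the MD-score cutoff corresponding to 1% FLR under this model.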
As shown above, both MD-score and Ascore have individual strengths and weaknesses in using information from tandem MS spectra to infer the site of modification to the different amino acids. A prominent difference is that, for calculation of the Ascore, fragment ions derived from the phosphorylated amino acid and the corresponding neutral loss are considered, provided both ion signals are of sufficient abundance. In contrast, Mascot mainly considers the best scoring ion series (either the one containing the phosphorylated amino acid or the one containing the neutral loss of phosphoric acid). Consequently, the two scores show a positive albeit not very strong correlation (R² = 0.33, Fig. 2B). This observation led us to try to combine the two scores and indeed, Fig. 2C shows that the combination of both scores leads to ~20% higher sensitivity at 1% FLR (25% higher at 3% FLR). There may be other factors influencing the correlation of the two scores but there is insufficient data to investigate more subtle effects such as amino acid composition or sequence features. Next we examined how the spacing of two alternative phosphorylation sites within a peptide sequence influenced the performance of both scores. The synthesized phosphopeptide collection contains many such examples that enabled us to check for potential bias in the MD-score and Ascore for positional alternatives. Although both scores can generally discriminate alternative phosphorylation sites, Fig. 3 shows that a significant bias exists toward more reliable localization in cases where the two alternative sites are more than one amino acid residue apart. At 1% FLR, an MD-score of 14 is required for discriminating adjacent phosphorylation sites whereas an MD-score of 7 suffices if the putative phosphorylation sites are further apart. The same effect is observed for the Ascore and the respective thresholds are 40 for adjacent sites and 18 for sites with larger spacing. This observation highlights that global FLR values apply to the "average" peptide in any data set but should be used with caution when assessing site localizations of individual or subsets of peptides containing particular sequence features. Scarcity of suitable data usually impairs development of feature specific score thresholds, e.g. for the specific case of site spacing.
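Where enough data exist to support it, a spacing-aware cutoff could be applied along the following lines. The two thresholds are the 1% FLR MD-score values reported above for the QTOF CID data (14 for adjacent sites, 7 for sites further apart); the function name and the use of residue positions of the best and second best candidate sites are illustrative assumptions, and the values should not be transferred to other instruments or fragmentation methods without recalibration.

```python
# 1% FLR MD-score thresholds reported in the text for QTOF CID data.
MD_THRESHOLD_ADJACENT = 14.0  # candidate sites on neighboring residues
MD_THRESHOLD_SPACED = 7.0     # candidate sites more than one residue apart


def passes_spacing_aware_cutoff(md: float, best_site: int, runner_up_site: int) -> bool:
    """Apply the stricter MD-score cutoff when the two candidate sites are adjacent.

    best_site and runner_up_site are 1-based residue positions of the top two
    candidate phosphorylation sites within the peptide (hypothetical inputs).
    """
    if best_site == runner_up_site:
        raise ValueError("candidate sites must differ")
    adjacent = abs(best_site - runner_up_site) == 1
    threshold = MD_THRESHOLD_ADJACENT if adjacent else MD_THRESHOLD_SPACED
    return md >= threshold
```

An MD-score of 10 would thus be accepted for candidate sites at positions 5 and 8 but rejected for positions 5 and 6.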
Different Fragmentation Techniques Require Different MD-Score Thresholds-Above, we described the MD-score characteristics and its properties using CID spectra generated on a QTOF instrument. We next explored the utility of the MD-score for other fragmentation techniques commonly used in proteomics. In particular, we generated five phosphopeptide mixtures (Supplemental Table S2) and analyzed the MD-scores of these pools following LC-MS/MS on an LTQ (CID and MSA) and an LTQ-Orbitrap XL ETD mass spectrometer (CID, MSA, ETD, ETDSA, and HCD). The outcomes of these experiments are summarized in Fig. 4 and Table I. Interestingly, at 1% FLR (spectrum level), HCD (following de-isotoping and charge deconvolution) identified the highest number of unique phosphopeptides (n = 131) with the correct phosphorylation localization, closely followed by low resolution ETD spectra without (n = 127) or with (n = 116) supplemental activation (see also Supplemental Figs. S5 and S11). At 5% FLR, the difference between HCD and ETD diminishes (Supplemental Fig. S11). Spectra collected by multistage activation (MSA) were significantly more successful than resonance activation CID performed on both the LTQ instrument (Fig. 4A and Supplemental Fig. S6) and the Orbitrap instrument (Fig. 4B). However, both methods on both instruments performed significantly worse than HCD and ETD, implying that the high fragment ion mass accuracy afforded by HCD and the lack of neutral losses in ETD spectra provide more specific site localization information than low resolution CID and MSA spectra (see also below).
FIG. 2. Comparison of MD-Score and Ascore. A, The MD-Score (red) and Ascore (blue) perform similarly well for S/T phosphorylated (solid lines) peptides but the MD-Score outperforms the Ascore for Y-phosphorylated (dotted lines) peptides. B, Site localization scores made by the MD-Score and the Ascore vary significantly but show a positive correlation. C, Combining the two scorings improves the overall performance of phosphorylation site localization at all FLR thresholds, MD-score (red), Ascore (blue) and combined (green).
The overlap of all peptides detected by HCD (deisotoped, charge deconvoluted) and ETD is very high (91%). However, the correlation of MD-scores between the two techniques was rather weak (R² = 0.33, Supplemental Fig. S7). Not only do the two techniques generate completely different fragment ions, they also have different cleavage preferences with respect to amino acid sequence context (25). Another reason for this behavior turned out to be the charge states of the fragmented precursors. Although HCD spectra (de-isotoped, charge deconvoluted) from 2+ and 3+ precursors had average MD-scores of 15.4 and 16.3 respectively, the corresponding ETD spectra showed average MD-scores of 14.0 and 19.8 for the respective 2+ and 3+ precursors. This reflects the general observation that more highly charged precursors (3+ or higher) tend to yield better ETD spectra compared with doubly charged precursors. As a side note, the MD-scores for unprocessed HCD data are 13.5 for 2+ ions and 10.8 for 3+ ions, highlighting the benefit of de-isotoping and charge deconvolution for Mascot searching in general and p-site localization in particular. The overlap of all peptides detected by CID and MSA is also very high (89%, LTQ instrument, filtered data) and their MD-scores correlate much better than those for ETD and HCD (R² = 0.63, Supplemental Fig. S7), implying that the fragment ions used for successful site localization are not completely distinct.
TABLE I footnote: Only the best scoring spectrum is used for counting correct and incorrect assignments; numerical FLR (fit) is calculated from the fitted MD-score distribution.
FIG. 3. Influence of phosphorylation site spacing on localization accuracy. MD-Score (A) and Ascore (B) phosphorylation site assignments are more reliable if two putative phosphorylation sites are more than one amino acid apart (red lines) compared with sites that are adjacent (blue lines).
Because our phosphopeptide mixtures are not overly complex, multiple tandem MS spectra were generated for each peptide (average of 4-12 depending on fragmentation technique). Although this redundancy allowed us to generate robust FLR values, typical phosphoproteomics studies probably contain fewer spectra per peptide. In order to examine if this would influence the results, we repeated the data analysis using only the best scoring spectrum per peptide. The respective column in Table I shows that this mainly leads to a decrease in incorrect localizations, indicating that the use of score thresholds determined here would also be useful for data sets with fewer available spectra per peptide. An alternative way to test the predictive value of the MD-score thresholds would be to divide the data in two sets, determine the score thresholds for one set and test if the same FLR values would be found for the other set. Because of the limited number of unique phosphopeptides available to us, we instead chose to address this point by replicate analysis. Fig. 4C shows the FLR versus MD-score distributions of two independently acquired and analyzed CID and MSA experiments using a 2-fold difference in the amount of material injected for analysis (LTQ instrument). Despite the differences in analyte quantity, the two replicates almost perfectly superimpose, showing that FLR thresholds determined for one of the data sets can be transferred to the other. We suspected that the success of the MD-score using HCD data is in part driven by fragment ion mass accuracy. To test this hypothesis, we searched the very same HCD data with either low (0.5 Da) or high (0.02 Da) fragment ion accuracy. At 1% FLR, the number of correct localizations concomitantly increased from 44 to 85 unique peptides (Table I, Supplemental Fig. S8).
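The reduction to one spectrum per peptide described above could look like the sketch below. It is only an illustration: the text does not state which score defines the "best scoring" spectrum, so ranking by MD-score is an assumption here (the Mascot ion score would be an equally plausible choice), and the function name and tuple layout are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (peptide identifier, MD-score, site assignment is correct)
PeptideSpectrumMatch = Tuple[str, float, bool]


def best_spectrum_per_peptide(
    psms: Iterable[PeptideSpectrumMatch],
) -> Dict[str, Tuple[float, bool]]:
    """Keep, for every peptide, only the spectrum with the highest score."""
    best: Dict[str, Tuple[float, bool]] = {}
    for peptide, score, correct in psms:
        # Ties keep the first spectrum encountered.
        if peptide not in best or score > best[peptide][0]:
            best[peptide] = (score, correct)
    return best
```

The resulting one-entry-per-peptide table can then be counted for correct and incorrect localizations at a given cutoff in the same way as the full spectrum-level data.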
Comparing site assignments for resonance CID data collected on an ion trap to those obtained from CID on a QTOF instrument (Supplemental Fig. S9) reveals that there are significantly fewer mistakes in the QTOF data. Because we did not observe an obvious difference in the average MD-scores of pS/pT/pY peptides on the two instruments, the differences in localization performance are presumably owing to several combined effects. The QTOF offers better fragment ion mass accuracy than an ion trap and contains sequence ions that are frequently lost in ion traps owing to their inherent inability to stabilize low m/z fragment ions (low mass cutoff). In addition, the neutral losses typically observed for pS/pT peptides are less pronounced on QTOF type instruments than on ion traps. On the other hand, ion trap spectra usually contain more abundant b-ions than QTOF spectra but the net effect of the above factors is that QTOF CID data leads to the matching of more fragment ions relevant for site localization and hence better localization performance.
Our results also highlight that processing of tandem mass spectra can have an effect on the success of p-site localization (Table I). It has previously been shown that filtering tandem MS spectra to remove fragment ions with low signal-to-noise ratio improves the peptide identification rate from low resolution ion trap spectra (26,27). Such filtering is also used in the Ascore and PTM score algorithms, and our filtered CID, MSA, and HCD data show that it also improves the success of phosphorylation site localization by the MD-score by ~15% (Table I, Supplemental Figs. S10 and S11). An alternative data processing step leading to much improved site localization is to de-isotope and charge deconvolute HCD spectra. Both steps improve the Mascot ion score because de-isotoping reduces the number of signals the search algorithm has to consider and charge deconvolution reduces the number of random matches that arise from splitting the sequence information over two (or more) ion series. Both effects likely drive the improvement not only of the Mascot ion score but also of the MD-score.
Scoring Positional Phosphopeptide Isomers Using the MD-Score
About 50% of the set of 180 synthesized phosphopeptides represent positional isomers. The data presented above and in Fig. 3 illustrate that the MD-score can also distinguish the majority of these cases. To illustrate this utility, Fig. 5 shows ETD spectra of the peptide ETTTSPKKYYLAEK (derived from the Tyrosine-protein kinase Tec) in which each of the four adjacent Thr or Ser residues in turn was synthesized to carry one phosphate group. Evidently, all four spectra are highly similar; all but a few c-ions are identical and rarely permit site localization because they cover only the C-terminal (i.e. unmodified) part of the peptides. Instead, the correct localization primarily relies on the few z-ions representing the N-terminal part of the peptides. Still, the minimal MD-score in all cases is ≥9, which assigns the correct phosphorylation site in each case with >99% confidence (ETD score threshold is 7, see Table I). Thus, the MD-score greatly helps to arrive at an objective assessment of the most likely phosphorylation site either by itself or in conjunction with manual spectrum interpretation.
FIG. 4. False localization rates and reproducibility of MD-Score thresholds for different types of tandem mass spectra. A, phosphorylation site assignments from spectra collected on an LTQ linear ion trap mass spectrometer. B, site assignments from spectra collected on a hybrid LTQ-Orbitrap mass spectrometer. Fitting a sum of two exponentials of the type FLR = A * exp(-C * MDscore) + B * exp(-D * MDscore) to these curves allows calculation of FLR values for any phosphopeptide assigned by Mascot in a tandem MS specific manner (for values of constants see Supplemental Fig. S12). C, Fitted FLR versus MD-score curves computed from two independent MSA and CID experiments show that the curves and score thresholds are highly reproducible.
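The calibration function given in the legend of Fig. 4, FLR = A * exp(-C * MDscore) + B * exp(-D * MDscore), can be turned into a score threshold by solving for the MD-score at which the fitted FLR drops to the desired level. The following is a minimal sketch of that idea and not the authors' implementation. The function names and, in particular, the constants are hypothetical placeholders chosen only for illustration; the actual fitted constants are those of Supplemental Fig. S12 and are not reproduced here. The sketch also assumes that all four constants are positive, so that the fitted curve decreases monotonically.

```python
import math


def flr(md_score, a, b, c, d):
    """Fitted false localization rate: A*exp(-C*s) + B*exp(-D*s)."""
    return a * math.exp(-c * md_score) + b * math.exp(-d * md_score)


def md_threshold(target_flr, a, b, c, d, lo=0.0, hi=200.0, tol=1e-6):
    """Smallest MD-score whose fitted FLR is <= target_flr, found by bisection.

    Assumes a, b, c, d > 0 so that the fitted FLR decreases with the score.
    """
    if flr(lo, a, b, c, d) <= target_flr:
        return lo
    if flr(hi, a, b, c, d) > target_flr:
        raise ValueError("target FLR is not reached within the search range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if flr(mid, a, b, c, d) > target_flr:
            lo = mid  # FLR still too high: threshold lies above mid
        else:
            hi = mid  # FLR low enough: threshold lies at or below mid
    return hi


# HYPOTHETICAL constants for illustration only (NOT those of Supplemental Fig. S12).
CONSTANTS = dict(a=0.30, b=0.20, c=0.25, d=0.60)

threshold_1pct = md_threshold(0.01, **CONSTANTS)
print(f"MD-score needed for 1% FLR (hypothetical fit): {threshold_1pct:.2f}")
```

With real constants for a given fragmentation technique, the same two functions would give both the FLR of an individual site assignment and the score cutoff for any chosen FLR.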
Because phosphopeptide isomers can very often be separated by reversed phase liquid chromatography using shallow gradients, site assignment by the MD-score will generally be possible in an LC-MS/MS experiment. However, if isomeric peptides do happen to co-elute under the chromatographic conditions employed, conclusive site identification may only be possible for the most abundant isomer.
DISCUSSION
In this study, we have re-evaluated the performance of a Mascot delta score (MD-score) metric for its ability to localize phosphorylation sites in peptides. Instead of using the Mascot ion score itself, the MD-score measures the difference in Mascot ion scores between the two best alternative phosphorylation site assignments suggested by the database search. We generated a significant number of diverse and individually synthesized phosphopeptides with precisely defined phosphorylation sites and properties similar to those found in typical phosphoproteomics studies. This set of reagents allowed us to explore the merits of the MD-score in detail and to calibrate the score for different use cases. As a result, false localization rates for phosphorylation site assignments made by the Mascot search engine can be computed for phosphopeptide spectra generated by many commonly used tandem mass spectrometry techniques, which we think is a useful extension to the available set of tools for phosphorylation site localization.
FIG. 5. Example ETD spectra of the peptide ETTTSPKKYYLAEK with a single phosphate group on four alternative adjacent S/T sites. Mascot ion scores are all above identity threshold confirming the peptide sequence but only the MD-Score allows confident assignment of the correct phosphorylation site in these isomeric phosphopeptides.
We note that the MD-score is not a new idea but our work suggests that it has more merit than previously appreciated. In the original Ascore publication (9), Beausoleil et al. tested a normalized delta score (taking the difference in the ions score for the top two ranking peptides and dividing that difference by the first ranking peptide's ions score) for low resolution ion trap CID spectra but found it to be inferior to the Ascore. We applied the same methods to the analysis of our phosphopeptide LC-MS/MS data and in addition evaluated the performance of a straight score difference (that is, taking the difference in the ions score for the top two ranking peptides). The results of a direct comparison between the methods are shown in Fig. 1. Our data confirm the previous results obtained by Beausoleil et al. that the normalized MD-score is significantly poorer than the Ascore. We also find a very similar cutoff to reach 99% localization confidence (0.36 in our study versus 0.4 in the Beausoleil study). However, Fig. 1 also clearly shows that the straight MD-score significantly outperforms the normalized MD-score and is very similar in overall performance to the Ascore. The reason for the poor performance of the normalized MD-score is that it does not distinguish between high and low quality database search results. For example, two alternative sites with Mascot scores of 60 and 40, respectively, generate the same normalized MD-score as two alternative sites with Mascot scores of 6 and 4. Clearly, such score normalization will negatively impact the ability to call a p-site correctly by allowing too many obviously poor assignments. The Heck laboratory recently also used the delta ion score of Mascot database search results for assessing alternative phosphorylation sites from CID and ETD data (18,19) but did not establish whether the statistical assumptions made by the Mascot ion score are equally applicable for scoring p-site localization for these two fragmentation techniques. Even though the MD-score and Ascore show similar overall performance, there are significant differences in detail. The MD-score outperforms the Ascore for tyrosine phosphorylated peptides whereas the Ascore does so for S/T phosphorylation (Fig. 2).
This observation may not be surprising given that the Ascore was developed on a data set dominated by S/T phosphorylation. For the same reason, phosphorylation sites with high MD-scores may not necessarily also have high Ascores and vice versa. Consequently, combining the MD-score and Ascore leads to a moderate improvement in sensitivity and specificity over using one score alone. There are other published studies addressing the issue of phosphorylation site identification by either database searching using different search engines (9,14) or other localization scores (2, 8, 9, 14 -17). It was beyond the scope of this work to compare these methods systematically to the MD-score but it can be anticipated that differences between search engines and site localization scores will exist depending on which criteria (and with which weighting) are used for site identification. Most approaches use empirical information about how phosphopeptides fragment in the gas phase but in X! Tandem (28), for example, phosphorylation motif information can also be used to bias phosphorylation site localization results. A noteworthy feature of the MD-score is that its numerical value and statistical significance are independent of the size of the database searched. The same MD-scores are in fact obtained when searching the human subset of Swissprot (16,000 entries), the complete Swissprot (258,000 entries) or the full NCBInr database (4,627,000 entries) with or without modifications in addition to variable phosphorylation on Ser, Thr, and Tyr residues. Therefore, MD-score values determined for a particular tandem MS method may be used for scoring large or small data sets alike.
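The difference between the straight and the normalized delta score discussed above can be made concrete with the score pairs used in the text (60 versus 40 and 6 versus 4). The sketch below only restates the two definitions given in the text; the function names are illustrative and not part of Mascot.

```python
def straight_md_score(best, second):
    """Straight MD-score: difference in ions score of the top two site assignments."""
    return best - second


def normalized_md_score(best, second):
    """Normalized delta score: the same difference divided by the top ions score."""
    if best <= 0:
        raise ValueError("the top-ranking ions score must be positive")
    return (best - second) / best


# Score pairs taken from the example in the text.
for best, second in [(60, 40), (6, 4)]:
    print(
        f"scores {best}/{second}: "
        f"straight = {straight_md_score(best, second)}, "
        f"normalized = {normalized_md_score(best, second):.2f}"
    )
```

Both pairs give the same normalized value of one third, whereas the straight difference is 20 for the high quality match and only 2 for the low quality match, which is why only the straight score can reject the weak assignment.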
Using score differences from database search engines is generally attractive because it does not require specialized informatics tools and our results show that the MD-score can be used for many fragmentation techniques. However, it should be noted that the MD-score is not an absolute or universal measure of phosphorylation site localization probability because the MD-score distributions and significance thresholds are different for every fragmentation technique, an important point not discovered or addressed by previous studies. Using our set of synthetic phosphopeptides with precisely known sites allowed us to calibrate the MD-score so that false localization rates can be derived for any of the fragmentation techniques investigated (Fig. 4, Table I, Supplemental Fig. S12). To stress the point by example, an MD-score of 11 is required for correct site localization in low resolution resonance CID spectra (1% FLR) but an MD-score of 4 suffices for low resolution ETDSA spectra to reach the same level of confidence. We suspect that any site localization score rooted in database searching will be prone to differences depending on the fragmentation technique used, again stressing the importance of deriving FLR values for each technique.
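In practice, the technique dependence described above means that a site assignment has to be judged against the threshold of the fragmentation method that produced the spectrum. The sketch below illustrates this with the three 1% FLR thresholds quoted in the text (11 for low resolution resonance CID, 7 for ETD, 4 for ETDSA). The dictionary keys, the function name and the use of "greater than or equal" as the acceptance rule are assumptions made for illustration; thresholds for other methods would have to be taken from Table I or determined locally.

```python
# 1% FLR MD-score thresholds quoted in the text; the keys are illustrative labels.
MD_THRESHOLDS_1PCT_FLR = {
    "CID": 11,   # low resolution resonance CID
    "ETD": 7,    # low resolution ETD
    "ETDSA": 4,  # ETD with supplemental activation
}


def is_confidently_localized(md_score, technique, thresholds=MD_THRESHOLDS_1PCT_FLR):
    """Return True if md_score reaches the threshold of the given technique.

    Assumption: a score equal to the threshold counts as passing.
    """
    if technique not in thresholds:
        raise ValueError(f"no calibrated threshold for technique {technique!r}")
    return md_score >= thresholds[technique]


# The same MD-score of 9 is sufficient for ETD data but not for resonance CID data.
for technique in ("CID", "ETD", "ETDSA"):
    print(technique, is_confidently_localized(9, technique))
```

Raising an error for an uncalibrated technique, rather than falling back to a default cutoff, reflects the point made in the text that thresholds should not be transferred between fragmentation methods.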
A pre-requisite for successful phosphorylation site determination is of course the ability to identify the underlying peptide in the first place. Because of the gas phase fragmentation behavior of phosphopeptides, not all fragmentation techniques and mass analyzers are equally suitable. We therefore used the phosphopeptide collection to investigate the phosphorylation site localization accuracy of the MD-score for all commonly used fragmentation techniques. The results confirmed earlier observations that low resolution resonance activation CID spectra are neither particularly sensitive nor very accurate in correctly assigning the site of phosphorylation (4,29). Multistage activation, MS3 or data filtering routines have been shown to moderately improve the number of phosphopeptide identifications and localizations (30-32) and, again, our data agree with these studies. Villen et al. have argued that MSA performs less well than CID for the large-scale identification of phosphopeptides because of the extra time required to record MSA spectra compared with resonance CID spectra (32), so that CID would simply outnumber MSA and thus be more productive overall. However, there are also other reports that conclude the opposite (31). Our study focused on the quality of site localization rather than p-peptide identification and the data clearly suggest that MSA spectra offer benefits. At the same time, we find that this benefit is more pronounced for an LTQ instrument than for an Orbitrap, which is in line with the Villen study. A somewhat unexpected observation was that HCD fragmentation can be just as successful as ETD fragmentation (Table I), which is commonly referred to as the method of choice for phosphorylation site determination (33).
This observation can be attributed to several factors; the HCD spectrum does not suffer from the low mass cutoff inherent to ion trap spectra and thus contains information on parts of the peptide sequence that are not available from ion trap mass analyzers. In addition, the increased fragment ion mass accuracy offered by the Orbitrap analyzer reduces the probability of randomly assigning fragment ions. Further, the low spectral noise in HCD spectra allows Mascot to score low abundance ions important for localization more often than possible in ion trap spectra. Last, the high resolution of HCD spectra allow de-isotoping and charge deconvolution of fragment ions both of which lead to higher Mascot scores. The lowest absolute significance threshold was obtained for ETD with supplemental activation, which can be rationalized by the more efficient fragmentation of charge reduced precursor ions as well as the presence of ETD and CID type fragment ions, which represents more information than ETD fragment ions alone. At the same time, ETDSA identified fewer peptides than HCD with correct phosphorylation site localization. This is because the majority of the synthetic tryptic peptides were observed as doubly charged ions that favor their identification by HCD over ETD (see Results section for details). We chose to base the presentation and interpretation of our data on 1% FLR. This is very stringent for the purpose of site localization. Although the same trends are also observed at a less conservative but acceptable level of 5% FLR (Supplemental Fig. S11), the subtle differences among ETD, ETDSA, and the HCD data processing varieties diminish. Given the good performance of ETD and HCD, it would have been interesting to combine ETD with fragment ion recording in the Orbitrap. Unfortunately, the sensitivity of this experiment on our instrument is very poor, so that we were unable to generate meaningful data. 
Future work in this area may include ETD on the latest generation of QTOF and Orbitrap instruments. It should be noted again that the above conclusions are drawn for successful site localization and therefore do not necessarily imply more productive p-peptide identification in, e.g., large-scale studies because the acquisition of both ETD and HCD spectra needs more time than that of resonance CID spectra. However, a very recent report suggests that HCD is in fact a very competitive method for large-scale phosphorylation identification (34). Although we see the same trends for singly and doubly phosphorylated peptides, the rather low number of these peptides in our data set makes it currently difficult to anticipate if HCD or some form of ETD will be more successful for correct site localization of multiply phosphorylated peptides. As a side note, our data and other recent studies (19,35) show that the reported gas phase phosphorylation site rearrangement of peptides (36) is not a major concern for phosphoproteomic studies.
Despite the success of the MD-score phosphorylation site localization, there are several aspects that should be given thought. First, it should be noted that the FLRs we present in this study both for the MD-score and the Ascore are "global" in the sense that they are unaware of parameters such as spacing between possible phosphorylation sites, composition or secondary structure of flanking amino acid sequences etc. Our observation that p-sites that are further apart are more likely to be called correctly indicates that a dependence on spacing indeed exists. In our simple evaluation (adjacent sites versus spaced sites) the effect is actually quite large. Consequently, caution should be applied when assessing the localization information reported for individual peptides containing many potential sites. It would indeed be very interesting to evaluate spacing (and others) as a parameter more systematically to be able to keep a fixed FLR for all peptides. However, a very large number of peptides (estimated thousands) with precisely known sites would be required to reach statistically sound conclusions, which was beyond the scope of this study. Another point for consideration relates to the localization of multiple phosphorylations to peptides containing many potential acceptor sites. Mascot tests a maximum of 256 permutations of a modification on a peptide sequence in order to keep the required computation time at a reasonable level. For singly phosphorylated peptides, the 256 permutation cap limits the number of potential sites to 256. For doubly phosphorylated peptides, the limit is 23 possible sites, for triply phosphorylated peptides the limit is 12 sites and for four, five, and six phosphates on a single peptide, the limit is 10 sites. We analyzed our data as well as the data presented in the Ascore and PTM score manuscripts (2,9) in this regard and found that there is not a single sequence in our phosphopeptide collection that would exceed these limits. 
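The site limits quoted above are what one obtains if the number of permutations for k phosphate groups on n candidate sites is counted as the binomial coefficient C(n, k) and capped at 256. That counting rule is an assumption made here for illustration (it is not taken from the Mascot documentation), but it reproduces the numbers given in the text. The constant and function names are hypothetical.

```python
from math import comb  # available in Python 3.8 and later

MAX_PERMUTATIONS = 256  # permutation cap stated in the text


def max_candidate_sites(n_phosphates, cap=MAX_PERMUTATIONS):
    """Largest number of candidate sites n with C(n, n_phosphates) <= cap.

    Assumes that permutations are counted as "n choose k" site combinations.
    """
    n = n_phosphates  # C(k, k) = 1 is always within the cap
    while comb(n + 1, n_phosphates) <= cap:
        n += 1
    return n


for k in range(1, 7):
    n = max_candidate_sites(k)
    print(f"{k} phosphate(s): up to {n} candidate sites ({comb(n, k)} permutations)")
```

Under this assumption, one more candidate site than the stated limit already exceeds the cap, for example C(24, 2) = 276 for doubly and C(13, 3) = 286 for triply phosphorylated peptides.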
The same is true for the 2872 phosphopeptide sequences listed in the Ascore paper. Of the 18,958 phosphopeptide sequences listed in the PTM score paper, 70 are above the limit (0.4%). We conclude from these data that, although one should be aware of the 256 permutation limit imposed by Mascot (other search engines are likely to have similar caps), it does not constitute a severe issue for large-scale phosphoproteomics in general or for the MD-score in particular. The limit does, however, become more relevant for "middle-down" proteomic approaches, particularly if multiple modifications are present on a reasonably large peptide. A third aspect relates to how easily the MD-score thresholds determined here can be transferred to other data sets. We showed that the score thresholds are very reproducible among replicate experiments. However, as for any other localization approach, one cannot comprehensively rule out the possibility that changes to experimental parameters such as data acquisition settings might change the content of tandem mass spectra such that score thresholds shift to slightly different values. This is why we not only make our data available to the community but also provide the peptide collection so that individual laboratories can determine MD-score thresholds for their individual analytical setup.
CONCLUSIONS
Our data show that the MD-score is a valuable tool for the objective assessment of phosphorylation site assignments made by Mascot, which should further improve the reliability of small and large-scale phosphoproteomics studies. The use of individual synthetic phosphopeptides with precisely known phosphorylation sites independently validates approaches such as the Ascore, PTM score and other similar scoring schemes that were developed on large phosphopeptide data sets in which the exact sites were not always known a priori.
This is particularly important for large-scale studies in which it is no longer practical to validate each phosphorylation site assignment by manual inspection of the tandem mass spectra. It might in fact be argued that manual inspection of tandem mass spectra may be more error prone than an automated objective scoring scheme such as the MD-score. The MD-score concept is applicable to many fragmentation techniques and can be obtained easily from Mascot database search results. Given that Mascot is one of the most widely used protein identification software tools in proteomics, the MD-score will enable many laboratories to assess their phosphorylation data objectively without the need for using somewhat arbitrary identification score thresholds. We are making all LC-MS/MS data as well as the phosphopeptide collection available to the community so that any laboratory may be able to perform similar types of analysis as we did and adapt the reported scores to their analytical environment.