Multi-laboratory experiment PME11 for the standardization of phosphoproteome analysis
Journal of Proteomics

Global analysis of protein phosphorylation by mass spectrometry proteomic techniques has emerged in recent decades as a powerful tool in biological and biomedical research. However, there are several factors that make the global study of the phosphoproteome more challenging than measuring non-modified proteins. The low stoichiometry of the phosphorylated species and the need to retrieve residue-specific information require particular attention to sample preparation, data acquisition and processing to ensure reproducibility, qualitative and quantitative robustness and ample phosphoproteome coverage in phosphoproteomic workflows. Aiming to investigate the effect of different variables on the performance of proteome-wide phosphoprotein analysis protocols, ProteoRed-ISCIII and EuPA launched the Proteomics Multicentric Experiment 11 (PME11). A reference sample consisting of a yeast protein extract spiked with different amounts of a phosphomix standard (Sigma/Merck) was distributed to 31 laboratories around the globe. Thirty-six datasets from 23 laboratories were analyzed. Our results indicate the suitability of the PME11 reference sample to benchmark and optimize phosphoproteomics strategies, weighing the influence of different factors, as well as to rank intra- and inter-laboratory performance.

Many aspects of cell biology are regulated by reversible protein phosphorylation networks that involve thousands of phosphorylation events. In the last decade, multiple methods have been developed to identify and quantify the phosphorylation sites involved, as well as their modulation and dynamics under physiological and pathological conditions. Global post-translational modification analysis based on cutting-edge mass spectrometry technology has emerged as the premier tool in many laboratories worldwide to investigate the complexity of signaling pathways and their crosstalk [1][2][3].
In 2016 the Spanish Proteomics Network ProteoRed-ISCIII proposed the PME11 multi-laboratory experiment as part of the EuPA Standardization Initiative. The aim was to evaluate the performance and reproducibility of phosphopeptide enrichment procedures and to test the usefulness of phosphopeptide mixture standards to set up, monitor, and troubleshoot phosphopeptide analysis pipelines. The reference samples analyzed in the study (PME11-A1, A2, A3) consisted of a yeast tryptic digest (125 μg of a C-18 purified peptide digest) spiked with three different amounts (100, 250 and 500 fmol) of a mixture of 20 human phosphopeptide standards containing light isotopes (Phosphomix 1 and 2 from Sigma-Aldrich, product references MSPL1 and MSPL2, Table 1). Each participating laboratory received two aliquots of each of the three samples (SUPP INFO 1&2), lyophilized from a water-acetonitrile mixture and shipped on dry ice. One additional vial, PME11-B, containing 2 pmol of each of the corresponding isotopically labeled heavy Phosphomix standard peptides (Sigma-Aldrich MSP1H and MSP2H), was distributed in dried form for subsequent quantitative analyses. Upon receipt, participants were instructed to redissolve the samples in the buffer appropriate for the selected enrichment procedure. Enriched phosphopeptides were then analyzed by LC-MS/MS (three replicates) following the recommended guidelines (10 to 30% of the enriched sample and a 60 min 0-35% acetonitrile gradient). Analysis of pre-enrichment samples was also recommended. Detailed descriptions of the experimental settings, reference sample and analysis guidelines were provided to the participants (SUPP INFO 1&2).
Recently, a related study conducted by several laboratories within the MS Resource Pillar of the HUPO Human Proteome Project has been reported [4]. In that study, a standard set of 94 phosphopeptides and their nonphosphorylated counterparts was analyzed both as a neat sample and in a yeast background. Unlike the HUPO study samples, the samples proposed in the present study allowed the enrichment of endogenous yeast phosphopeptides to be assessed, in conditions and amounts similar to those of a real sample. In addition, the spiked-in phosphopeptide standards were provided in isotopically labeled and unlabeled form, allowing not only the assessment of targeted phosphopeptide analysis but also the estimation of the yield of the enrichment procedures used.
Under the coordination of ProteoRed-ISCIII, 36 datasets were received from 23 laboratories (Table 2) distributed across Europe (Spain, France, Switzerland, United Kingdom, and Sweden) and the USA. Individual reports including experimental details and results were prepared by each participant using the template specifically designed for this experiment. MS/MS files (mgf format) were also submitted to the coordination unit for centralized processing and integration, which will be described elsewhere. Some laboratories provided several datasets corresponding to different analytical pipelines, which allowed specific evaluation of the experimental conditions tested, since the user and instrument were the same in these cases. Shotgun analysis results were used to evaluate the general performance of each laboratory in terms of the number of yeast phosphopeptides identified, the efficiency of the enrichment procedure (phosphopeptides/total peptides ratio) and the detection of spiked-in phosphopeptide standards.
In light of the dispersion of the analytical conditions used by the participating labs, a comprehensive statistical analysis may have limitations. Nevertheless, several outcomes are worth discussing, taking into consideration the interlaboratory nature of the present experiment. Samples were processed following different protocols in eight different mass spectrometers, as summarized in Table 2 and the supplementary information. A first clear outcome is that intra-laboratory reproducibility is in general very good, as shown by the error bars in Fig. 1A, with a median %CV between triplicate analyses of 9.16% (Table 2). It should be noted that these correspond to triplicate experiments, including both the enrichment step and the LCMS analysis.
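The intra-laboratory %CV quoted above can be illustrated with a minimal sketch. It assumes the %CV is the sample standard deviation divided by the mean of the phosphopeptide counts from the three replicates; the text does not state which estimator was used. The dataset names and counts below are hypothetical and are not PME11 data.

```python
from statistics import mean, median, stdev


def percent_cv(values):
    """Coefficient of variation in percent (sample SD / mean * 100)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; %CV is undefined")
    return 100.0 * stdev(values) / m


# Hypothetical phosphopeptide counts from triplicate experiments (not PME11 data).
triplicates = {
    "dataset_a": [950, 1000, 1050],
    "dataset_b": [480, 500, 520],
    "dataset_c": [1900, 2000, 2200],
}

# One %CV per dataset, then the median across datasets.
cv_by_dataset = {name: percent_cv(counts) for name, counts in triplicates.items()}
median_cv = median(cv_by_dataset.values())

for name, cv in cv_by_dataset.items():
    print(f"{name}: %CV = {cv:.2f}")
print(f"median %CV = {median_cv:.2f}")
```

Reporting the median rather than the mean keeps a single poorly reproducible dataset from dominating the summary.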
Regarding inter-laboratory comparison, the number of phosphopeptides identified in the different experiments spans a wide range, with an average value of 1026 (Fig. 1A, B and Table 2). One of the main factors explaining this wide range is, of course, the technical capability of the different instruments used. To roughly estimate the contribution of this factor, normalized values were calculated (black points in the graph) using as normalization factor the ratio between the reported number of total peptides in the analysis of the pre-enrichment sample for each experiment (Table 2) and the average value for all the experiments. Using this normalization to "compensate" for instrument performance, the inter-laboratory %CV for the number of phosphopeptides decreases from 66% to 36% (Fig. 1B).
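The normalization described above can be sketched as follows. This is an illustration under stated assumptions: the factor for each experiment is its pre-enrichment total peptide count divided by the average over all experiments, as in the text, and the phosphopeptide count is assumed to be divided by that factor (the direction of the correction is not spelled out in the text). All numbers are hypothetical.

```python
from statistics import mean, stdev


def normalize_counts(phospho_counts, pre_enrichment_totals):
    """Divide each phosphopeptide count by its instrument-performance factor.

    factor_i = pre_enrichment_total_i / mean(pre_enrichment_totals)
    """
    if len(phospho_counts) != len(pre_enrichment_totals):
        raise ValueError("both lists must have one entry per experiment")
    avg_total = mean(pre_enrichment_totals)
    factors = [total / avg_total for total in pre_enrichment_totals]
    return [count / factor for count, factor in zip(phospho_counts, factors)]


def percent_cv(values):
    """Coefficient of variation in percent (sample SD / mean * 100)."""
    return 100.0 * stdev(values) / mean(values)


# Hypothetical values for three experiments (not PME11 data).
phospho = [400, 900, 2100]          # phosphopeptides identified after enrichment
pre_totals = [5000, 10000, 20000]   # total peptides in the pre-enrichment run

normalized = normalize_counts(phospho, pre_totals)
print(f"raw %CV        = {percent_cv(phospho):.1f}")
print(f"normalized %CV = {percent_cv(normalized):.1f}")
```

When phosphopeptide counts scale with the pre-enrichment peptide counts, the normalized values converge and the inter-experiment %CV drops, which is the behavior reported for Fig. 1B.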
Other factors accounting for this variability certainly include the enrichment protocol used and the parameters used for data processing and database searches; the variability also reflects the differing expertise of the laboratories. This is apparent when comparing the results from laboratories using the same type of enrichment and an identical instrument (see for example L14 vs L09, L28 vs L15, or L13 vs L20, in Fig. 1A).
The amount of sample analyzed (Table 2) also influences the result, as illustrated by the data from L23: around 500 phosphopeptides were detected upon processing 10% of the sample in an Orbitrap XL (L23-1 and -4), whereas about 1000 and 2000 identifications were obtained when 2% and 20% of the sample were processed in an Orbitrap Fusion Lumos (L23-2 and -3, and L23-5 and -6, respectively) (Fig. 1A and Table 2).
The enrichment selectivity (Fig. 1C) spans from 15 to 90%. Overall, there is no clear correlation between the observed selectivity and the number of phosphopeptides identified in each of the experiments, which is influenced, as discussed, by many other factors.
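As an illustration of how selectivity and its relation to phosphopeptide counts could be examined, the sketch below computes selectivity as the phosphopeptides/total peptides ratio (expressed in percent) and a Pearson correlation coefficient. The choice of Pearson's r is an assumption made here for illustration; the text does not state how the correlation was assessed. All values are hypothetical.

```python
import math


def selectivity_percent(n_phospho, n_total):
    """Enrichment selectivity: phosphopeptides as a percentage of all peptides."""
    if n_total <= 0:
        raise ValueError("total peptide count must be positive")
    return 100.0 * n_phospho / n_total


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (phosphopeptides, total peptides) per enriched dataset.
datasets = [(450, 500), (300, 2000), (1200, 1600), (800, 2500)]

phospho_counts = [p for p, _ in datasets]
selectivities = [selectivity_percent(p, t) for p, t in datasets]

r = pearson_r(selectivities, phospho_counts)
print("selectivity (%):", [round(s, 1) for s in selectivities])
print(f"Pearson r vs. phosphopeptide count: {r:.2f}")
```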
TiO2 was the preferred enrichment method, representing more than 80% of the analyses, and resulted in higher enrichment selectivity (above 50%) than IMAC (below 40%). This is also the case when comparing TiO2 versus IMAC enrichment data from the same laboratory, such as data from L23 and L13. These results seem to be in agreement with previous data reporting increased selectivity of TiO2 compared to metal affinity chromatography [5]. It has also been reported that IMAC enrichment would favor the identification of polyphosphorylated peptides [6]. In the results gathered in this study, no significant differences have been observed in this respect (data not shown). However, the small number of IMAC analyses, together with the variety of protocols and instruments used, precludes a general conclusion. Combination of two enrichment steps, either TiO2-TiO2 or TiO2-IMAC, increased the enrichment efficiency notwithstanding the total number of phosphopeptides identified (comparing data obtained in the same instrument in different labs), as deduced from L25 data (Fig. 1A and B).
Data from TiO2 and IMAC enrichment for L23 and L13, as well as data from L28, where different sample/TiO2 matrix ratios were assayed (see Table 2), exemplify a general trend in which higher selectivity of the enrichment step results in a significant increase in the number of phosphopeptides detected. This behavior would be consistent with a "masking" effect caused by a higher ratio of non-phosphorylated peptides to phosphopeptides in the enriched sample.
The enrichment chromatography format did not have any systematic effect on either the number of phosphopeptides detected or the enrichment capacity; the observed variations result from inter-operator variability.
Detection of the phosphopeptide standards required an enrichment step, regardless of the amount of standard spiked into the yeast extract (approx. 100, 50 or 20 fmol on column). The frequency of detection, defined as the proportion of laboratories detecting a given peptide in the three samples, was above 60% for most phosphopeptides (12 of 20) and around 50% for five, while three phosphopeptides were not detected in any lab, likely because their small size and highly hydrophilic nature prevented their retention in the C18 precolumn (Fig. 2). No significant differences in phosphomix standard detection were observed among the different instruments or enrichment methods used. The phosphomix peptide standards are in general readily observable, even at the lowest concentration assayed, with the exceptions described, and so can be useful for quantitative purposes to measure the yield of a particular enrichment experiment.

Table 1. Phosphomix phosphopeptide standards: sequence, peptide ID and phosphorylated residue(s).

Sequence                    ID      Phosphorylated residue(s)
...                         ...     S5
ADEPSSEESDLEIDK             1_7     S6, S9
ADEPSSEESDLEIDK             2_6     S9
ELSNSPLRENSFGSPLEFR         1_9     S5, S14
ELSNSPLRENSFGSPLEFR         2_9     S3, S5
ETQSPEQVK                   2_3     T2
EVQAEQPSSSSPR               1_5     S10
FEDEGAGFEESSETGDYEEK        1_8     S12
HQYSDYDYHSSSEK              2_7     Y8, S12
LGPGRPLPTFPTSECTSDVEPDTR    2_10    T12
LPQETAR                     2_1     T5
NTPSQHSHSIQHSPER            2_8     S4, S9
RDSLGTYSSR                  1_3     T6
RSYSRSR                     1_2     Y3, S4
RYSSRSR                     2_2     S3, S4
SPTEYHEPVYANPFYRPTTPQR      1_10    Y10, T19
SRSPSSPELNNK                2_5     S1, S5
TKLITQLRDAK                 1_4     T1, T5
VIEDNEYTAR                  2_4     Y7
VLHSGSR                     1_1     S6
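The frequency of detection can be computed per standard peptide as sketched below. The sketch assumes that "detecting a given peptide in the three samples" means detection in all of PME11-A1, A2 and A3; the data layout, lab names and detections are hypothetical, and only the peptide IDs follow the style of Table 1.

```python
def detection_frequency(detections, peptide_ids, samples=("A1", "A2", "A3")):
    """Fraction of labs that detected each peptide in every listed sample.

    detections: {lab: {sample: set of peptide IDs detected in that sample}}
    """
    n_labs = len(detections)
    if n_labs == 0:
        raise ValueError("no laboratories provided")
    frequency = {}
    for pid in peptide_ids:
        # Count labs where the peptide appears in all samples (assumed criterion).
        hits = sum(
            all(pid in by_sample.get(sample, set()) for sample in samples)
            for by_sample in detections.values()
        )
        frequency[pid] = hits / n_labs
    return frequency


# Hypothetical detections for two labs (not PME11 data).
detections = {
    "lab_x": {"A1": {"1_7", "2_3"}, "A2": {"1_7", "2_3"}, "A3": {"1_7", "2_3"}},
    "lab_y": {"A1": {"1_7"}, "A2": {"1_7", "2_3"}, "A3": {"1_7"}},
}

freq = detection_frequency(detections, ["1_7", "2_3", "1_1"])
for pid, value in freq.items():
    print(f"{pid}: detected in all three samples by {value:.0%} of labs")
```

A looser criterion (detection in at least one sample) would only require replacing `all` with `any` in the inner expression.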
In conclusion, the use of different protocols, instruments and operators provides a wide range of experimental conditions that is well suited to testing the suitability of the reference material described here for inter- and intra-lab protocol benchmarking, indicating strengths, weaknesses, and guidance for optimization (Stage-Tip vs batch, sample/medium ratio). Overall, we propose that the use of a standardized reference material in a multi-lab study is a useful resource for technology testing, as has been extensively demonstrated [7][8][9][10], and provides excellent references to set up protocols and rank the performance of individual labs, contributing to the democratization of sophisticated proteomics pipelines under standardized conditions. We think that the results described here demonstrate that the standard proposed in this study is a suitable reference material for the assessment and optimization of phosphoproteomic analysis and certainly provide valuable information to dig deeper into the pros and cons of phosphoproteomics workflows.

Table 2. Datasets gathered in the study, experimental settings, and summary of the main results. Datasets coded with the same L number correspond to experiments performed in the same laboratory using different enrichment or LCMS analysis conditions.